Exploring specific predictors of psychosis onset over a 2-year period: A decision-tree model

Aim: The fluctuating symptoms of clinical high risk for psychosis hamper conversion prediction models. Exploring specific symptoms using machine learning has proven fruitful in addressing this challenge. The aim of this study is to explore specific predictors and generate atheoretical hypotheses of onset using a close-monitoring, machine-learning approach. Methods: Study participants (N = 96; mean age 16.55 years; male-to-female ratio 46:54) were recruited from the Prevention of Psychosis Study in Rogaland, Norway. Participants were assessed using the Structured Interview for Psychosis Risk Syndromes (SIPS) at 13 separate time points across 2 years, yielding 247 specific scores. A machine-learning decision-tree analysis (i) examined potential SIPS predictors of psychosis conversion and (ii) hierarchically ranked these predictors. Results: Four of the 247 specific SIPS symptom scores were significant: (i) reduced expression of emotion at baseline, (ii) experience of emotions and self at 5 months, (iii) perceptual abnormalities/hallucinations at 3 months and (iv) ideational richness at 6 months. No SIPS symptom scores obtained after 6 months of follow-up predicted psychosis. Conclusions: Study findings suggest that early negative symptoms, particularly those observable by peers and arguably a risk factor for social exclusion, predicted psychosis. Self-expression and social behaviour might prove relevant entry points for early intervention in psychosis and psychosis risk. Testing study results in larger samples and at other sites is warranted.

As psychosis is arguably one of the mental illnesses most disruptive to living a fulfilling life, better prediction of its onset has been a main research focus for many years. Earlier and more precise identification of risk could open up possibilities for earlier, better-targeted preventive efforts. However, research is still hampered by several problems. First, the definition of clinical high risk for psychosis (CHR-P) is restricted to positive symptoms only. This is puzzling, as the functional consequences of psychosis lie mainly in the social realm (Ajnakina et al., 2019; Anglin et al., 2020; McGorry et al., 2018; Moritz et al., 2019). Furthermore, besides attenuated psychotic symptoms, negative symptoms and poor functioning are also associated with an increased risk of psychosis. Early CHR-P symptoms typically emerge during adolescence (Catalan et al., 2020; Raballo et al., 2020), a sensitive and dynamic developmental period in which mature relationships and identity gradually take shape (Blakemore, 2018). Adolescence is accompanied by limitations in reflexivity, emotion regulation and the ability to consider consequences before acting, which increases vulnerability to external influences (Blakemore, 2018). Further, the use of social media for social engagement accelerates during this period, adding a layer of social complexity for adolescents to master (Bjornestad et al., 2020). A recent umbrella review of 42 meta-analyses identified significant impairments in work or educational functioning, social functioning, neurobiological functioning and quality of life in CHR-P individuals. The overall risk of psychosis conversion was 22% at 3 years and was highest (38%) in the subgroup with brief and limited intermittent psychotic symptoms. Substance use, comorbid mental disorders, suicidal ideation and accumulated sociodemographic risk factors were also common.
Second, there is a challenge regarding the timing of symptom detection: time intervals between follow-up assessments are mostly long. Identifying in more detail when symptoms and psychological problems related to CHR-P emerge could shed light on their place in a trajectory towards psychosis and on their consequences for psychosocial development. Third, research is typically based on sum-score analyses of composite constructs that merge several specific symptoms. Adding specificity by studying individual phenomena and particular symptoms in more detail could provide a fruitful way forward.
Machine learning has shown promise for predicting psychosis onset in individuals at risk (Sanfelici et al., 2020). A simple variant of machine learning is decision-tree modelling, such as chi-squared automatic interaction detection (CHAID; Kass, 1980). A key feature of both machine learning and decision-tree modelling is that they require no a priori, theoretically based hypotheses, in this case regarding psychosis prediction. Further, such models handle large numbers of predictors, which allows researchers to be less strict about which predictors are admitted to their models. All model variables are treated as equally eligible, and the output is readily interpretable. Consequently, although researchers still choose the model variables, the impact of their preconceptions and biases is reduced. On this basis, decision-tree modelling seems appropriate for generating hypotheses from large numbers of specific predictors of psychosis onset.
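The CHAID procedure (chi-squared automatic interaction detection; Kass, 1980) that underlies this kind of tree chooses, at each node, the categorical predictor most strongly associated with the outcome according to a Bonferroni-adjusted chi-square test. The standard-library Python sketch below illustrates only that selection step. It is a simplification, not the SPSS implementation used in the study: it skips CHAID's merging of similar categories and replaces Kass's merge-based Bonferroni multiplier with a plain correction across candidate predictors. The function names and toy data are hypothetical.

```python
import math
from collections import Counter


def chi2_sf(x: float, df: int) -> float:
    """Upper-tail probability P(X > x) for a chi-square variable with df > 0."""
    y = x / 2.0
    if df % 2 == 0:
        # Even df: closed-form Poisson-type sum.
        return math.exp(-y) * sum(y ** k / math.factorial(k) for k in range(df // 2))
    # Odd df: start from the df = 1 case and add recurrence terms.
    total = math.erfc(math.sqrt(y))
    for k in range((df - 1) // 2):
        total += math.exp(-y) * y ** (k + 0.5) / math.gamma(k + 1.5)
    return total


def pearson_chi2(categories, outcome):
    """Pearson chi-square statistic and degrees of freedom for two categorical sequences."""
    n = len(outcome)
    row_totals = Counter(categories)
    col_totals = Counter(outcome)
    cells = Counter(zip(categories, outcome))
    stat = 0.0
    for r, n_r in row_totals.items():
        for c, n_c in col_totals.items():
            expected = n_r * n_c / n
            stat += (cells[(r, c)] - expected) ** 2 / expected
    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    return stat, df


def best_split(predictors, outcome, alpha=0.05):
    """Pick the predictor with the smallest Bonferroni-adjusted p value, or None.

    Simplification: CHAID's category merging and Kass's merge-based Bonferroni
    multiplier are omitted; p values are multiplied by the number of predictors.
    """
    m = len(predictors)
    best = None
    for name, categories in predictors.items():
        stat, df = pearson_chi2(categories, outcome)
        p = chi2_sf(stat, df) if df > 0 else 1.0
        p_adj = min(1.0, p * m)
        if best is None or p_adj < best[1]:
            best = (name, p_adj)
    return best if best is not None and best[1] < alpha else None


# Toy data (invented for illustration, not study data): 5 converters, 15 non-converters.
outcome = [1] * 5 + [0] * 15
toy_predictors = {
    "item_A": ["high"] * 7 + ["low"] * 13,  # all converters score "high"
    "item_B": ["x", "y"] * 10,              # unrelated to conversion
}
print(best_split(toy_predictors, outcome))  # item_A is selected
```

Growing a full tree would repeat this selection within each child node until no predictor passes the significance threshold or a stopping rule such as a minimum node size is reached.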
In this study, we aimed to explore specific symptom predictors and to generate atheoretical hypotheses of psychosis onset by applying a decision-tree approach to data from frequent assessments.
We focus on symptoms derived from the Structured Interview for Psychosis Risk Syndromes (SIPS) (Miller et al., 1999). SIPS includes positive, negative, disorganization and general symptoms subscales, which enables a multifaceted symptom basis for analysing CHR-P individuals.
2 | METHOD
2.1 | Sample and recruitment
The sample was recruited from the ongoing Prevention of Psychosis Study (POP; Joa et al., 2015), a naturalistic longitudinal CHR-P study in Rogaland, Norway. Between 2012 and 2018, the POP study included a population-based cohort of CHR-P individuals from a catchment area of approximately 300 000 inhabitants. Participants were recruited through intensified case detection within secondary mental-health clinical services and the general population (for detailed descriptions of awareness campaigns, recruitment and treatment, see Joa et al., 2021). POP was approved by the Regional Committee for Medical Research Ethics Health Region West, Norway (2009/949). All participants provided written informed consent.
In the present study, 141 participants were eligible for inclusion and 104 of these gave informed consent. Of those, eight (7.7%) were excluded, either because of missing data on the main outcome variable, conversion to psychosis (N = 7), or because they had no symptom data (N = 1). Thus, 96 participants were included. Owing to missing data at different assessments, the sample size varied across the study period. Participants converting to psychosis were excluded from the study at the time of conversion and offered inclusion in the Early Treatment and Intervention in Psychosis-2 (TIPS 2) study (Joa et al., 2008). Hence, only assessments prior to conversion were included in the analysis. As participants could re-enter the study after missing one or more assessments, missing assessments were counted from 1 month to 2 years. Participants who missed fewer than six assessments were compared with participants who missed more than six assessments. No significant differences between these two groups were identified in baseline age, gender or any of the four SIPS scales (mean scores). Serial means were imputed for missing SIPS mean scores at the 6-, 12-, 18- and 24-month follow-ups used in Table 1. Adding the imputed serial mean scores did not alter which correlations were significant, nor did it substantially change the significant associations. Attrition thus appears to be random with respect to both baseline and follow-up characteristics.
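Assuming that "serial mean" imputation here means series-mean replacement (each missing value replaced by the mean of the observed values of the same variable, as in the "Series mean" method of SPSS's Replace Missing Values procedure), the idea can be sketched in a few lines of Python. The function name and the example scores are hypothetical, not study data.

```python
from statistics import mean
from typing import List, Optional


def series_mean_impute(values: List[Optional[float]]) -> List[float]:
    """Replace missing values (None) with the mean of the observed values."""
    observed = [v for v in values if v is not None]
    if not observed:
        raise ValueError("no observed values to average")
    fill = mean(observed)
    return [fill if v is None else v for v in values]


# Hypothetical SIPS negative-scale mean scores at the 12-month follow-up.
scores_12m = [1.5, None, 2.0, 2.5, None]
print(series_mean_impute(scores_12m))  # [1.5, 2.0, 2.0, 2.5, 2.0]
```

Because every gap receives the same value, this method leaves the variable's mean unchanged but shrinks its variance, which is one reason to check, as the authors did, that significant associations are not altered.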

| CHR-P inclusion and exclusion criteria
Individuals included in the POP study met the following criteria: living in the catchment area; being 13-65 years of age; meeting diagnostic criteria for CHR-P based on the SIPS (Miller et al., 1999); having symptoms not better accounted for by an Axis I, Axis II or substance use disorder, based on the Structured Clinical Interview for DSM Disorders (SCID-I; First et al., 1995), with the exception of schizotypal personality disorder (the presence of any of these disorders was not in itself an automatic reason for exclusion); understanding and speaking one of the mutually intelligible Scandinavian languages (Norwegian, Danish or Swedish); and being able to provide informed consent. Participants under 16 years of age also required consent from parents or guardians.
Individuals were excluded if they met current or lifetime criteria for any psychotic disorder; were currently using any antipsychotic medication; had used antipsychotic medication (regardless of dosage) for more than 4 weeks in their lifetime; had known neurological or endocrine disorders related to CHR-P symptoms; or had an IQ below 70.

| Measures
The SIPS is a semi-structured interview targeting experiences of attenuated symptoms and other indicators of psychosis risk (Miller et al., 1999). The SIPS subscales include positive symptoms (five items), negative symptoms (six items), symptoms of disorganization (four items) and general/affective symptoms (four items), with the positive symptom scale defining psychosis risk (see Supporting Information 1 for a full list of SIPS-items). The SIPS identifies three clinical high-risk syndromes: Attenuated positive symptoms (APS), brief intermittent psychotic symptoms (BIPS), and genetic risk and/or deterioration syndrome (Yung & McGorry, 1996). It was administered to participants 13 times over 2 years, starting with baseline (T1), then monthly for 6 months (T2, T3, T4, T5, T6, T7) and then every 3 months until 2 years post baseline (T8, T9, T10, T11, T12, T13).
Two hundred and forty-seven specific SIPS symptom scores (19 items measured at 13 time points) were obtained over this period and treated as variables in the statistical modelling.
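The arithmetic behind the 13 assessments and 247 variables can be checked with a short Python sketch. The item codes follow the SIPS subscale sizes given above (P1-P5, N1-N6, D1-D4, G1-G4); the `item_mmonth` variable-naming scheme is a hypothetical choice for illustration.

```python
# Assessment schedule described above: baseline (month 0), monthly to month 6,
# then every 3 months up to month 24.
MONTHS = list(range(0, 7)) + list(range(9, 25, 3))
TIMEPOINTS = {f"T{i}": month for i, month in enumerate(MONTHS, start=1)}

# SIPS item codes per subscale: positive (5), negative (6), disorganization (4), general (4).
ITEMS = (
    [f"P{i}" for i in range(1, 6)]
    + [f"N{i}" for i in range(1, 7)]
    + [f"D{i}" for i in range(1, 5)]
    + [f"G{i}" for i in range(1, 5)]
)

# One model variable per item and time point (hypothetical naming scheme).
VARIABLES = [f"{item}_m{month}" for month in MONTHS for item in ITEMS]

print(len(MONTHS), len(ITEMS), len(VARIABLES))  # 13 19 247
print(TIMEPOINTS["T8"], TIMEPOINTS["T13"])      # 9 24
```

Thirteen time points also give the 12 assessment intervals within which conversion could be registered.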

| Procedure
Psychiatric nurses trained in interviewing for psychosis spectrum disorders conducted the SIPS interviews. Trained clinical researchers conducted the SCID for diagnosis (First et al., 1995). Consensus regarding the CHR-P state was reached during weekly diagnostic meetings. Reliability of the SCID in the research group was satisfactory at kappa = 0.9 in 2012 (Weibell et al., 2013). Regular reliability training was undertaken to avoid drift.

| Outcome measure-Psychosis onset
Psychosis onset was operationalized according to the SIPS Presence of Psychotic Syndrome criteria (one or more SIPS positive symptom scores of six or higher that were seriously disorganising or dangerous or lasted on average more than 1 h per day, 4 days per week over a 1-month period) within any of the 12 assessment intervals of the follow-up. Conversion to psychosis was operationalized as a single dichotomous variable (0 = not converted, 1 = converted).
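The onset criterion can be written as a small decision rule. The Python sketch below encodes the paraphrase given above: a positive item at a score of six, plus either serious disorganisation/danger or a frequency of more than 1 h per day on at least 4 days per week sustained for at least 1 month. The record fields are hypothetical, and the sketch is not a substitute for the official SIPS scoring manual.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class PositiveItemRating:
    """One SIPS positive-item rating at one assessment (fields are hypothetical)."""
    score: int                        # 0-6 severity score
    disorganising_or_dangerous: bool  # seriously disorganising or dangerous
    hours_per_day: float              # average hours per day with the symptom
    days_per_week: float              # average days per week with the symptom
    months_at_this_level: float       # how long this frequency has been sustained


def meets_psychosis_threshold(r: PositiveItemRating) -> bool:
    """Presence-of-psychotic-syndrome rule for one item, as paraphrased in the text."""
    if r.score < 6:
        return False
    frequent = (r.hours_per_day > 1 and r.days_per_week >= 4
                and r.months_at_this_level >= 1)
    return r.disorganising_or_dangerous or frequent


def converted(ratings: Iterable[PositiveItemRating]) -> int:
    """Dichotomous outcome: 1 if any positive item meets the criterion, else 0."""
    return int(any(meets_psychosis_threshold(r) for r in ratings))


example = [
    PositiveItemRating(4, False, 3.0, 6, 2.0),  # attenuated: score below 6
    PositiveItemRating(6, False, 2.0, 5, 1.5),  # psychotic level, frequent enough
]
print(converted(example))  # 1
```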

| Statistical analyses
Statistical analyses were carried out using SPSS version 25 (IBM Corp, 2017). Two-way analysis of variance was performed to examine the distribution of key variables and basic demographic characteristics of age and gender across the dichotomous conversion variable.
Between-group differences were estimated using t tests (normally distributed […] were included in the CHAID model, and missingness was allowed, as is common in CHAID models: because the data were categorical, missing data were treated as a separate category on each variable, in the same way as the other categories, and were reported along with them (see Figure 1). […] 63.5% had at least 6.
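Treating missingness as its own category means that a missed assessment becomes an ordinary level of the predictor, which a CHAID split can later group with observed scores (the Results describe, for example, a split on N5 into "above zero or missing"). A minimal Python sketch of that encoding, with hypothetical helper names and invented data:

```python
from collections import Counter
from typing import List, Optional, Set


def to_categories(scores: List[Optional[int]]) -> List[str]:
    """Keep each observed score as its own category and code None as 'missing'."""
    return ["missing" if s is None else str(s) for s in scores]


def merge(categories: List[str], to_merge: Set[str], label: str) -> List[str]:
    """Relabel a group of categories as one, e.g. scores above zero plus 'missing'."""
    return [label if c in to_merge else c for c in categories]


# Hypothetical N5 scores at 6 months (None = assessment not attended) and outcomes.
n5_6m = [0, 2, None, 0, 3, None, 1, 0]
outcome = [0, 1, 1, 0, 1, 0, 0, 0]

above_zero_or_missing = {"1", "2", "3", "4", "5", "6", "missing"}
binary = merge(to_categories(n5_6m), above_zero_or_missing, ">0 or missing")
table = Counter(zip(binary, outcome))  # cell counts of the 2 x 2 table
print(table)
```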
Participants were young (mean age 16.5 years, SD 3.0), with an even male-to-female ratio (46:54). Age and gender were not associated with conversion to psychosis. A significantly higher mean item score for negative symptoms (2.43 vs. 1.69, t = −3.11, p < 0.01) was found for participants who later converted to psychosis. No other symptoms at baseline were associated with psychosis conversion.

| Decision-tree model
The decision-tree model is presented in Figure 1 and Table 2. Of the 247 variables (SIPS items) included, five were key for predicting conversion to psychosis. Four of these five were significant with a p value below 0.05; one (node 6) had a p value of 0.066.
(The lowest p value was, naturally, found for Node 0, and p values increased along the branches; only Node 6 had a p value above 0.05.) The first splitting variable (node 0) was item "N3 Expression of emotion" at baseline, which separated out the smaller branch (43 participants); this branch contained 16 of the 19 converters, all of whom scored above zero.
Following this branch, the second splitting variable (node 2) was "N4 Experience of emotions and self" at 5 months. All 16 converters with a score above zero on N3 at baseline also had a score above 1, or a missing value, on N4 at 5 months. The third variable (node 6) was "D1 Odd behaviour or appearance" at 1 month; two of the 16 participants converting to psychosis had a score above 0. Returning to the first split and following the larger branch (53 of 96 participants), the next splitting variable (node 1), the fourth variable in the model, was "P4 Perceptual abnormalities/hallucinations" at 3 months. The fifth variable (node 4) was "N5 Ideational richness" at 6 months, split into above zero or missing versus zero. On this larger branch, all three participants converting to psychosis combined a score above three on perceptual abnormalities with a score above zero on ideational richness. Other variables were not significant and were therefore not included.
The model comprised a total of 10 nodes and had a maximum tree depth of three levels. Six of these nodes were terminal, that is, they did not branch into new nodes. The estimated risk (misclassification rate) was 0.125, with an SE of 0.034. The model correctly classified 67 of 77 non-converted cases (87.0%) and 17 of 19 converted cases (89.5%), that is, 84 of 96 cases overall (87.5%) (Table 2).
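The reported summary figures follow from the four confusion-matrix counts. The Python sketch below recomputes them; it assumes that the SE of the risk estimate is the binomial standard error sqrt(R(1 − R)/N), an assumption that reproduces the reported 0.034.

```python
import math


def tree_performance(tp: int, fn: int, tn: int, fp: int) -> dict:
    """Classification summary for a binary tree from its confusion-matrix counts."""
    n = tp + fn + tn + fp
    risk = (fn + fp) / n                  # resubstitution misclassification rate
    return {
        "sensitivity": tp / (tp + fn),    # converters classified as converters
        "specificity": tn / (tn + fp),    # non-converters classified as non-converters
        "accuracy": (tp + tn) / n,
        "risk": risk,
        "risk_se": math.sqrt(risk * (1 - risk) / n),  # binomial SE (assumption)
    }


# Counts reported in the text: 17/19 converters and 67/77 non-converters correct.
summary = tree_performance(tp=17, fn=2, tn=67, fp=10)
for key, value in summary.items():
    print(f"{key}: {value:.3f}")
# Prints one line each: sensitivity 0.895, specificity 0.870,
# accuracy 0.875, risk 0.125, risk_se 0.034
```

These are resubstitution figures, computed on the same data used to grow the tree, so they are likely to be optimistic for new samples.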

| DISCUSSION
The main findings of this study were, first, that the decision-tree model suggested that only four of 247 specific SIPS symptom scores were significant and useful for correctly predicting psychosis onset. Second, early SIPS scores carried more weight than later ones: none of the item scores obtained later than 6 months were significant. Third, the item "reduced expression of emotion" (N3) at baseline was best for splitting the sample into converters and non-converters; a score of one or higher was present in 16 of 19 converters. Fourth, three of the four significant predictors are classified as negative symptoms, which are usually not included in the definition of CHR-P: reduced expression of emotion, reduced experience of emotions and self, and reduced ideational richness. However, one positive symptom (perceptual abnormalities/hallucinations at 3 months) also contributed to prediction.

| Early symptoms and their social consequences
Some of the predictors found in this study appear to have a common denominator: they are observable by peers, and they could affect the social domain. Reduced emotional expression and odd behaviour or appearance are directly observable. These features can lead to social marginalization, as people in a social context do not necessarily tolerate "otherness" and may ultimately reject it (Becker, 1963; Bjornestad et al., 2020). A recent study of social interaction among adolescents (Bjornestad et al., 2020) indicates that these behaviours lead to diminished social interaction and, ultimately, to social exclusion by peers. Further, although they are subjective and private, poor ideational richness and perceptual abnormalities or hallucinations, along with a reduced experience of emotions and self, can become observable through coping strategies such as withdrawal from social contact (Schultze-Lutter & Theodoridou, 2017).
Early failure in mastering social skills is common in CHR-P (Fusar-Poli et al., 2020). The importance of social participation for developing and practising core human capacities such as self-agency (Bandura, 1982) and mentalization (Fonagy & Bateman, 2006) implies that behaviours associated with the study predictors may produce long-term psychological and social setbacks, not only in terms of psychosis but also in a more general psychosocial sense. For example, individuals with CHR-P who do not develop psychosis continue to display subthreshold psychotic symptoms (Addington et al., 2019), meet criteria for other mental disorders (Addington et al., 2017) and have limitations in social functioning. Targeting specific observable behaviours that lead to social exclusion therefore seems a relevant entry point for early prediction models and preventive efforts in CHR-P.

| Predictive value of negative symptoms
It may seem paradoxical that negative symptoms were so strongly predictive of psychosis, whereas most early warning signs or "high-risk" symptoms are defined as either attenuated or brief positive symptoms. This may reflect the difficulty of distinguishing emerging negative symptoms from typical adolescent features such as puberty, insecurity, heartbreak, depression and substance use (Blakemore, 2018); their specificity appears to be low. Nevertheless, negative symptoms are important because they are associated with high levels of disability and poor prognosis both in psychosis (Marder & Galderisi, 2017) and in CHR-P (Carrión et al., 2020), and a central goal of CHR-P intervention is to reduce the risk of adverse outcomes through early intervention. This goal is inspired by research showing that reducing the duration of untreated psychosis (DUP) through early intervention can have beneficial prognostic effects (Fusar-Poli, McGorry, & Kane, 2017; Hegelstad et al., 2012). However, reducing DUP in individuals with insidious psychosis onsets dominated by negative symptoms has proven difficult (Marder & Galderisi, 2017). We therefore argue for a more detailed investigation of early negative symptoms in order to harness their potential for predicting psychosis. Investigating the duration of untreated negative symptoms (DUN) would then be within reach and could be a fruitful way forward. Furthermore, the specific items identified in this study could possibly improve the specificity of the "psychosis risk calculator" (Cannon et al., 2016).

| Clinical implications
Our findings suggest that the CHR-P symptoms that are the most predictive of psychosis conversion also increase risk for social exclusion.
Hence, identification of these symptoms should be followed by clinical interventions aimed at increased social participation, agency and social skills training. Such efforts have proven promising, both in psychosis (Cella et al., 2017;Davidson et al., 2004) and CHR-P (Santesteban-Echarri et al., 2020).

| Limitations
The primary limitations of this study are the small sample size and the high attrition rate. Attrition represents a loss of valuable information and may weaken the study's generalizability. However, attrition appeared to be random with respect to baseline characteristics, and the sample seems comparable (e.g., in age, gender, SIPS symptom levels and functioning) to other CHR-P samples internationally. A major limitation of the machine-learning analysis is the possibility of overfitting, which reduces the model's applicability to other datasets and future data collections. We acknowledge that, until the model is tested on independent data, this limits its usability. However, given the exploratory nature of this study and its goal of generating hypotheses for future research, we believe the risk of overfitting is acceptable here. We plan to test this model further in this and other populations with larger sample sizes.
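Before new data are collected, one common way to gauge overfitting is stratified k-fold cross-validation: the tree is regrown on all but one fold and evaluated on the held-out fold, so each participant is scored by a model that never saw them. The Python sketch below shows only the fold assignment, using this study's class sizes (19 converters, 77 non-converters); the function name is hypothetical and the tree-growing step is not included.

```python
import random
from typing import List, Sequence


def stratified_folds(labels: Sequence[int], k: int = 5, seed: int = 0) -> List[List[int]]:
    """Split indices into k folds so each fold has a similar share of each class."""
    rng = random.Random(seed)
    folds: List[List[int]] = [[] for _ in range(k)]
    position = 0  # continue where the previous class stopped, to balance fold sizes
    for cls in sorted(set(labels)):
        indices = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(indices)
        for i in indices:
            folds[position % k].append(i)
            position += 1
    return folds


# Class sizes as in this study: 19 converters and 77 non-converters.
labels = [1] * 19 + [0] * 77
for fold in stratified_folds(labels, k=5):
    print(len(fold), sum(labels[i] for i in fold))  # fold size, converters in fold
```

With only 19 converters, each held-out fold contains three or four converters, so cross-validated estimates would themselves be imprecise; this is another reason the larger samples mentioned above matter.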
The machine-learning analysis was also limited by false-negative classifications in 10.5% of converters (2 of 19) and false-positive classifications in 13.0% of non-converters (10 of 77).
Furthermore, although this is a study of specific SIPS items, the items target quite general domains; hence, discrimination between variables may have been limited. Finally, as exploration and hypothesis generation were the main aims of this investigation, we allowed a significance level of up to p < 0.10 in the decision-tree analyses. This posed no problem for the main branches of the model; however, the significance level of Node 6 ("1 month: D1 Odd behaviour or appearance"; see Figure 1) may reduce the predictive value of this variable.

ACKNOWLEDGEMENTS
The present study was supported by a grant from Health West Foundation 911508 and grant 911881 and also supported by grant 913184 from the Norwegian Extra Foundation for Health and Rehabilitation through EXTRA funds.

CONFLICT OF INTEREST
The authors declare no conflicts of interest. The funding sources provided no input into the analyses or presentation of these data. […] all phases of the paper and were involved in study design, provided scientific oversight throughout the project, gave detailed comments on the paper across several drafts, and edited the paper.

DATA AVAILABILITY STATEMENT
The datasets presented in this article are not readily available because according to Norwegian law, data sharing requires approvals from the