Natural speech algorithm applied to baseline interview data can predict which patients will respond to psilocybin for treatment-resistant depression

This machine-learning study (n=17) was able to predict the therapeutic effectiveness of psilocybin for treatment-resistant depression using an algorithm applied to natural speech data from the baseline interviews. The results were 85% accurate and 75% precise.

Authors

  • Ashton, M.
  • Carhart-Harris, R. L.
  • Carrillo, F.

Published

Journal of Affective Disorders
Individual Study

Abstract

Background: Natural speech analytics has seen some improvements over recent years, and this has opened a window for objective and quantitative diagnosis in psychiatry. Here, we used a machine-learning algorithm applied to natural speech to ask whether language properties measured before psilocybin for treatment-resistant depression can predict for which patients it will be effective and for which it will not.

Methods: A baseline autobiographical memory interview was conducted and transcribed. Patients with treatment-resistant depression received 2 doses of psilocybin, 10 mg and 25 mg, 7 days apart. Psychological support was provided before, during and after all dosing sessions. Quantitative speech measures were applied to the interview data from 17 patients and 18 untreated age-matched healthy control subjects. A machine-learning algorithm was used to classify between controls and patients and predict treatment response.

Results: Speech analytics and machine learning successfully differentiated depressed patients from healthy controls and identified treatment responders from non-responders with a significant accuracy of 85% (75% precision).

Conclusions: Automatic natural language analysis was used to predict effective response to treatment with psilocybin, suggesting that these tools offer a highly cost-effective facility for screening individuals for treatment suitability and sensitivity.

Limitations: The sample size was small and replication is required to strengthen inferences on these results.

Research Summary of 'Natural speech algorithm applied to baseline interview data can predict which patients will respond to psilocybin for treatment-resistant depression'

Introduction

Quantitative analysis of natural speech has advanced in recent years and is increasingly applied to psychiatric problems. Previous work has used automated measures of speech coherence and emotion to identify conditions such as schizophrenia and mood disorders, and to predict clinical trajectories; these studies indicate that language features can serve as objective diagnostic and prognostic markers. However, it remained unclear whether pre-treatment speech patterns could predict who will respond to psychedelic-assisted therapy for depression. Carrillo and colleagues set out to test whether automated natural language analytics combined with machine learning could predict clinical response to psilocybin in patients with treatment-resistant depression (TRD). The study applied emotional-sentiment measures to autobiographical interview transcripts collected before treatment and used a classifier to distinguish healthy controls from depressed patients and to predict which patients later responded to psilocybin treatment.

Methods

This was an open-label clinical study, sponsored by Imperial College London, in which patients with TRD received two oral doses of psilocybin (10 mg then 25 mg) one week apart, with psychological support provided before, during and after dosing. Seventeen patients completed the study (mean age 44.59, SD 10.97, 5 females). Eighteen age- and sex-matched healthy controls were recruited separately (mean age 36.44, SD 17.23, 7 females). The primary clinical outcome was the Quick Inventory of Depressive Symptoms (QIDS-16), measured at baseline and again 5 weeks after the 25 mg dose; treatment response was predefined as a ≥ 50% reduction in QIDS score at 5 weeks. The extracted text does not detail inclusion or exclusion criteria, and it describes no randomisation, consistent with the open-label design; the authors direct readers to the main trial publication and online supplement for further procedural details.

Participants completed an autobiographical memory test (AMT), a structured interview that elicits specific memories in response to cue words. Patients undertook the AMT approximately 2 weeks before the first psilocybin dose, while controls completed the task at their convenience. The interviews typically took about 10 minutes to administer. All AMT sessions were audio recorded and then transcribed by named research staff (L.F., J.S. and P.A.). Two balanced versions of the AMT were used across subjects.

For speech analysis, the researchers applied an automated Emotional Analysis algorithm that assigns positive and negative sentiment scores to individual words (decimal values between 0 and 1). For each transcript they calculated two features: AVG P (average positivity across words) and AVG N (average negativity across words). Each subject was represented by these two features, and a Gaussian Naive Bayes classifier, a probabilistic machine-learning algorithm, was used to perform two classification tasks: patients versus controls, and responders versus non-responders. A 7-fold cross-validation scheme was applied to estimate classification performance, and permutation testing was used to assess whether accuracies exceeded chance; according to the extracted text, details of the cross-validation and permutation procedures are provided in the supplement.
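
The two-feature pipeline described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' code: the lexicon `LEXICON`, the tokenisation rule and the names `transcript_features` and `GaussianNaiveBayes` are all hypothetical, since the extracted text does not give the word scores or the implementation of the Emotional Analysis algorithm.

```python
import math
import re

# Hypothetical word-level sentiment lexicon: word -> (positivity, negativity),
# each score in [0, 1]. The lexicon used in the study is not given in the text.
LEXICON = {
    "happy": (0.8, 0.0),
    "love": (0.9, 0.0),
    "sad": (0.0, 0.7),
    "afraid": (0.0, 0.6),
}


def transcript_features(text):
    """Return (AVG P, AVG N): mean positivity and negativity over all words."""
    words = re.findall(r"[a-z']+", text.lower())
    if not words:
        return (0.0, 0.0)
    # Words missing from the lexicon count as neutral (0, 0) but still
    # contribute to the word total (an assumption of this sketch).
    pos = sum(LEXICON.get(w, (0.0, 0.0))[0] for w in words)
    neg = sum(LEXICON.get(w, (0.0, 0.0))[1] for w in words)
    return (pos / len(words), neg / len(words))


class GaussianNaiveBayes:
    """Minimal Gaussian Naive Bayes: one independent normal per feature and class."""

    def fit(self, X, y):
        self.stats = {}
        for label in sorted(set(y)):
            rows = [x for x, t in zip(X, y) if t == label]
            n = len(rows)
            means = [sum(col) / n for col in zip(*rows)]
            # A small floor on the variance avoids division by zero.
            variances = [
                sum((v - m) ** 2 for v in col) / n + 1e-9
                for col, m in zip(zip(*rows), means)
            ]
            self.stats[label] = (math.log(n / len(y)), means, variances)
        return self

    def _log_posterior(self, x, label):
        log_prior, means, variances = self.stats[label]
        return log_prior + sum(
            -0.5 * math.log(2 * math.pi * var) - (v - m) ** 2 / (2 * var)
            for v, m, var in zip(x, means, variances)
        )

    def predict(self, X):
        # Pick the class with the highest (unnormalised) log posterior.
        return [
            max(self.stats, key=lambda label: self._log_posterior(x, label))
            for x in X
        ]
```

In use, each transcript would be reduced to one `(AVG P, AVG N)` pair with `transcript_features`, and the list of pairs would be passed to `GaussianNaiveBayes().fit(X, y)` together with the group labels.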

Results

The speech analytics distinguished depressed patients from healthy controls primarily via lower use of positive emotional words. Controls had a higher AVG P (0.0532 ± 0.013) than patients (0.0384 ± 0.011), a difference that reached significance (t-test p = 0.0011). AVG N did not differ significantly between groups (p = 0.4). Using AVG P and AVG N as inputs to a Gaussian Naive Bayes classifier with 7-fold cross-validation, the model classified patients versus controls with a mean accuracy of 82.85%, precision 0.82, recall 0.82 (sensitivity 0.82), and specificity 0.83. Permutation testing (1,000 trials) indicated this accuracy was significantly greater than chance (p < 0.05). For the main clinical question, 7 of 17 patients (41%) met the predefined response criterion (≥ 50% reduction in QIDS at 5 weeks) and were labelled responders; 10 were non-responders. Mean AVG P and AVG N did not differ significantly between responders (AVG P 0.0334 ± 0.0132; AVG N 0.0368 ± 0.0088) and non-responders (AVG P 0.0418 ± 0.0091; AVG N 0.041 ± 0.0082). Despite the lack of a simple univariate difference, the same classifier trained on AVG P and AVG N achieved an accuracy of 85% for predicting responders versus non-responders (precision 0.75) using 7-fold cross-validation. Permutation testing placed this accuracy in the upper 97th percentile of the null distribution, indicating significance at p < 0.05. The authors note that responders tended to use fewer emotional words at baseline, especially fewer positive words, which they suggest may reflect greater capacity for change in those patients.
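
For readers less familiar with the reported metrics, the sketch below shows how accuracy, precision, recall (the same quantity as sensitivity) and specificity follow from confusion-matrix counts. The function name and the counts in the usage note are illustrative assumptions only; the extracted text does not report the study's confusion matrices, and cross-validated means need not correspond to a single table of counts.

```python
def classification_metrics(tp, fp, tn, fn):
    """Compute standard binary-classification metrics from confusion counts.

    tp/fp/tn/fn are true positives, false positives, true negatives and
    false negatives, with "positive" meaning the class of interest
    (for example, patient or responder).
    """
    def ratio(numerator, denominator):
        # Return None when a metric is undefined (empty denominator).
        return numerator / denominator if denominator else None

    return {
        "accuracy": ratio(tp + tn, tp + fp + tn + fn),
        "precision": ratio(tp, tp + fp),    # of predicted positives, share correct
        "recall": ratio(tp, tp + fn),       # identical to sensitivity
        "specificity": ratio(tn, tn + fp),  # of true negatives, share detected
    }
```

For example, with hypothetical counts `classification_metrics(tp=30, fp=10, tn=50, fn=10)`, accuracy is 80/100 and precision is 30/40. A classifier can therefore have high accuracy with lower precision when false positives are concentrated in the smaller class.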

Discussion

Carrillo and colleagues interpret the findings as evidence that short, structured autobiographical interviews analysed by automated natural language tools can both detect depressive status and predict clinical response to psilocybin in TRD. They emphasise that the AMT required little time to administer (around 10 minutes) yet provided features sufficient for above-chance classification and prediction when combined with a simple machine-learning classifier. The discussion situates these results within broader work showing that language features and the quality of acute psychedelic experiences relate to clinical outcomes; the authors note that psilocybin’s idiosyncratic acute effects have previously been linked to longer-term response. They further argue that, given the low marginal cost of applying automated speech-analysis software, these methods could offer a cost-effective screening tool to identify individuals more likely to benefit from psychedelic treatment. Finally, the authors propose directions for future research, specifically testing the specificity of the observed relationships and assessing whether the predictive associations generalise to other interventions and clinical outcomes.

View full paper sections

METHODS

London, was sponsored by Imperial College London, and was carried out in accordance with Good Clinical Practice Guidelines. It was an open-label design in which patients with TRD received two doses of psilocybin (10 mg and 25 mg) one week apart. The autobiographical memory test (AMT) was performed by patients (n = 17) and age- and sex-matched controls (n = 18), who were recruited separately. For more details on the design and procedures of the main trial, see the main trial publication.

The AMT is a structured interview in which participants are asked to provide specific autobiographical memories in response to specific cue words. For example, the cue word "newspaper" may be read to a participant, who might then reply "When I was about 8 years old, I remember a dog biting my arm as I tried to pick up a newspaper" etc. Two different but balanced versions of the task, each with a different set of word cues, were completed across the sample, but there were no between-group differences in the completed versions. Patients completed their AMT interviews approximately 2 weeks prior to receiving their first dose of psilocybin and the matched controls did theirs at their convenience. All AMT interviews were audio recorded and transcribed (by L.F., J.S. and P.A.). More details on study procedures can be found in the online supplement and the main outcomes of the trial are published elsewhere.

The sample comprised 17 patients (mean age = 44.59 (SD = 10.97), 5 females) and 18 healthy control subjects (mean age = 36.44 (SD = 17.23), 7 females). The primary outcome measure, the Quick Inventory of Depressive Symptoms (QIDS-16), was rated by both sets of participants at baseline, and patients rated it again 5 weeks after the 25 mg psilocybin dose. Treatment response was defined as a ≥ 50% reduction in QIDS scores at 5 weeks. There were 7 treatment responders and 10 non-responders at 5 weeks, a response rate of 41%.
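
The response criterion is simple arithmetic, and a short sketch makes the threshold explicit. The function names and the example scores are hypothetical; only the ≥ 50% rule and the 7-of-17 count come from the text.

```python
def is_responder(baseline_qids, week5_qids):
    """Return True if the QIDS score fell by at least 50% from baseline."""
    if baseline_qids <= 0:
        raise ValueError("baseline score must be positive to compute a reduction")
    reduction = (baseline_qids - week5_qids) / baseline_qids
    return reduction >= 0.5


def response_rate(responder_flags):
    """Fraction of patients flagged as responders."""
    return sum(responder_flags) / len(responder_flags)
```

With hypothetical scores, `is_responder(20, 10)` is a response (exactly 50% reduction) while `is_responder(20, 11)` is not. Seven responders out of 17 patients gives `response_rate([True] * 7 + [False] * 10)`, about 0.41.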

RESULTS

Before we addressed our main question, we asked whether our method can distinguish between controls and patients. A significant between-group difference was found in the rate of positive words used in participants' AMT interview responses, with patients using significantly fewer positive words: controls AVG P = 0.0532 ± 0.013 and patients AVG P = 0.0384 ± 0.011 (t-test p = 0.0011). The AVG N did not differ significantly between the two groups (p = 0.4). Using a machine learning classifier, with a 7-fold cross-validation scheme, to identify patients (versus controls) based on a combination of AVG P and N values, we obtained a mean accuracy of 82.85% (precision = 0.82, recall = 0.82, sensitivity = 0.82, specificity = 0.83). A control experiment using random permutation testing (1000 trials) (see online supplement) confirmed that this accuracy was significantly greater than chance (p < 0.05). Next, we tackled the main study question: whether pre-treatment speech could be predictive of subsequent treatment success. To do this, we employed the same machine learning approach as described above to identify responders from non-responders. AVG P and N values were not significantly different for responders (P: 0.0334 ± 0.0132, N: 0.0368 ± 0.0088) and non-responders (P: 0.0418 ± 0.0091, N: 0.041 ± 0.0082); however, using the same input formula as above, we were able to predict treatment response with an above-chance accuracy of 85% (precision 0.75, with a 7-fold cross-validation scheme and Gaussian Naive Bayes as the classifier algorithm). As can be discerned from the figure, AVG P was the most sensitive variable for distinguishing patients from controls, and for predicting responders versus non-responders.
On closer inspection of the data, it was found that responders used fewer emotional words at baseline (and fewer positive words especially) than non-responders, potentially reflecting a greater capacity for change in the responders that rendered them particularly sensitive to this treatment. Permutation testing revealed that the 85% accuracy was in the upper 97th percentile of the distribution and therefore significantly greater than chance (p < 0.05).
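
The logic of a permutation test on cross-validated accuracy can be sketched as follows. This is an illustration under assumptions rather than the supplement's procedure, which is not reproduced in the extracted text: the fold assignment, the p-value formula and all function names are hypothetical, and `nearest_centroid` is a deliberately simple stand-in classifier, not the study's Gaussian Naive Bayes.

```python
import random


def k_fold_accuracy(X, y, fit_predict, k):
    """Mean accuracy over k folds; fold f holds out samples f, f + k, f + 2k, ..."""
    n = len(X)
    fold_scores = []
    for fold in range(k):
        test_idx = [i for i in range(n) if i % k == fold]
        train_idx = [i for i in range(n) if i % k != fold]
        if not test_idx:
            continue
        predictions = fit_predict(
            [X[i] for i in train_idx],
            [y[i] for i in train_idx],
            [X[i] for i in test_idx],
        )
        correct = sum(p == y[i] for p, i in zip(predictions, test_idx))
        fold_scores.append(correct / len(test_idx))
    return sum(fold_scores) / len(fold_scores)


def nearest_centroid(train_X, train_y, test_X):
    """Stand-in classifier: assign each test point to the closest class mean."""
    centroids = {}
    for label in sorted(set(train_y)):
        rows = [x for x, t in zip(train_X, train_y) if t == label]
        centroids[label] = [sum(col) / len(rows) for col in zip(*rows)]

    def squared_distance(x, c):
        return sum((a - b) ** 2 for a, b in zip(x, c))

    return [
        min(centroids, key=lambda label: squared_distance(x, centroids[label]))
        for x in test_X
    ]


def permutation_p_value(X, y, fit_predict, k=7, n_permutations=1000, seed=0):
    """Compare the observed accuracy with accuracies under shuffled labels."""
    rng = random.Random(seed)
    observed = k_fold_accuracy(X, y, fit_predict, k)
    shuffled = list(y)
    at_least_as_good = 0
    for _ in range(n_permutations):
        rng.shuffle(shuffled)  # break any link between features and labels
        if k_fold_accuracy(X, shuffled, fit_predict, k) >= observed:
            at_least_as_good += 1
    # Add-one correction so the estimated p-value is never exactly zero.
    return observed, (at_least_as_good + 1) / (n_permutations + 1)
```

Calling `permutation_p_value(X, y, nearest_centroid)` returns the observed accuracy and an estimated p-value. Shuffling the labels keeps the class sizes fixed, so the null distribution reflects the class imbalance (here, 7 responders versus 10 non-responders) rather than assuming a 50% chance level.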

CONCLUSION

In the present study, natural speech analytics combined with machine learning was able to differentiate depressed patients from healthy controls and predict responders versus non-responders in a clinical trial of psilocybin for treatment-resistant depression. The AMT interviews that produced the data on which these analyses were performed took little longer than 10 min to perform, yet were able to distinguish depression from health and predict treatment response with a significant level of precision. Psilocybin, like other psychedelics, has idiosyncratic acute effects, and the quality of the acute drug experience has been found to be strongly predictive of subsequent long-term clinical outcomes. Psilocybin is currently being studied as a treatment for a range of different psychiatric disorders, and particularly depression. As well as providing further support for the diagnostic potential of natural speech analytics, the present results, combined with the near-to-zero application cost of these software methods, suggest that these tools offer a highly cost-effective facility for screening individuals for treatment suitability and sensitivity. Future work may test the specificity of the highlighted relationships and whether they generalize to other interventions and outcomes.
