
Cognitive outcomes following psilocybin-assisted therapy in treatment-resistant depression: A post-hoc analysis of a randomized, waitlist-controlled trial

This secondary analysis (n=26) of adults with treatment-resistant depression from an open-label psilocybin-assisted psychotherapy trial found statistically significant improvements in processing speed and executive function at two weeks post-treatment, with gains on Trail Making Tests remaining significant after adjusting for depressive symptoms; however, reliable change indices showed that the proportion of participants achieving meaningful improvement (4.2–12.5%) did not exceed chance expectations, suggesting observed gains may reflect practice effects rather than genuine procognitive benefits.

Authors

  • Johnson, D. E.
  • Meshkat, S.
  • Kaczmarek, E. S.

Published

Progress in Neuro-Psychopharmacology and Biological Psychiatry
Individual study

Research Summary of 'Cognitive outcomes following psilocybin-assisted therapy in treatment-resistant depression: A post-hoc analysis of a randomized, waitlist-controlled trial'

Introduction

Depression, including unipolar and bipolar forms, frequently involves persistent cognitive difficulties across domains such as attention, processing speed, executive function, language, and episodic memory. These cognitive impairments are common in treatment‑resistant depression (TRD), are associated with poorer functioning and quality of life, and may reflect disrupted neuroplasticity in fronto‑limbic circuits (for example, medial prefrontal cortex, anterior cingulate cortex and hippocampus). Conventional antidepressants often fail to remediate cognitive symptoms, prompting interest in treatments that more directly engage neuroplastic mechanisms. Psilocybin is a serotonergic 5‑HT2A agonist proposed to act as a “psychoplastogen” that can rapidly induce neuroplastic changes in cortical regions implicated in executive function and processing speed. Clinical evidence for psilocybin’s cognitive effects in depression is limited. Johnson and colleagues therefore conducted a retrospective post‑hoc analysis of data from an investigator‑initiated randomized, waitlist‑controlled, open‑label trial. They evaluated whether a single 25 mg dose of psilocybin administered with brief psychotherapeutic support was associated with changes in processing speed and executive function. The primary cognitive measures were the Digit Symbol Substitution Test (DSST) and Trail Making Tests A and B (TMT‑A/B), assessed at baseline, one day and two weeks post‑treatment. The researchers hypothesised that performance would improve at two weeks and that mood improvements would be associated with, but not fully account for, any cognitive changes. They also planned reliable change index (RCI) analyses to estimate whether observed gains exceeded expected practice effects.

Methods

This analysis used data from a randomized, waitlist‑controlled, open‑label clinical trial run at a community clinic in Mississauga, Ontario. The parent trial enrolled adults aged 18–75 with a primary diagnosis of major depressive disorder (MDD) or bipolar II disorder (BD‑II). Participants were experiencing a major depressive episode of ≥3 months and met criteria for treatment resistance (nonresponse to ≥2 adequate antidepressant trials). Participants discontinued antidepressants, antipsychotics, ketamine/esketamine and augmenting agents for ≥5 half‑lives prior to screening; BD‑II participants could continue mood stabilisers at clinical discretion. Participants were randomised to immediate treatment or a two‑week delayed (waitlist) condition. The parent trial allowed up to three psilocybin doses over six months, but the present post‑hoc analysis focused exclusively on cognitive change following the first 25 mg oral dose of synthetic psilocybin given in a supportive setting. Participants received one preparatory and two integration psychotherapy sessions, each lasting 1–2 hours, and therapist dyads offered non‑specific support during the 6–8 hour dosing session. Because the parent trial was not powered to detect between‑group efficacy differences, the immediate and waitlist arms were pooled into a single open‑label cohort for this analysis. Of the 30 participants who received at least one 25 mg dose, three lacked baseline cognitive data and one withdrew before follow‑up, leaving 26 with baseline cognitive assessments. Of these, 25 were retained in the longitudinal models. One participant was tested only during the waitlist period and had no post‑dose data. Another lacked the two‑week assessment but was retained under maximum likelihood estimation. The RCI analyses used the 24 participants with both pre‑dose and two‑week data.
Cognitive outcomes were the DSST (processing speed, sustained attention and some executive demands) and TMT‑A/B (visual attention/psychomotor speed and set‑shifting/executive function). A TMTB‑A difference score was used to isolate executive demands from motor/processing speed. Tests were administered at baseline, one day and two weeks post‑dose; waitlist‑arm assessments were time‑shifted and harmonised to these labels for pooled analysis. An administration error gave participants 90 s on the DSST rather than the normative 120 s. DSST scores were therefore prorated to 120‑s equivalents for the normative z‑score comparisons and the RCI analyses, while unadjusted raw scores were used in the mixed models. Baseline performance was converted to age‑adjusted z‑scores using published norms, and impairment was defined as a score at least 1 SD below the normative mean (z ≤ −1.0). Statistical analyses comprised separate linear mixed models (LMMs) for each cognitive outcome using raw scores. Each model included participant random intercepts and categorical timepoint (pre‑dose, one‑day, two‑weeks) as the main fixed effect. Models adjusted for age, sex and treatment group (immediate vs waitlist), and included a timepoint‑by‑group interaction to test differential change by assignment. Two complementary LMMs were run per outcome. Model 1 estimated total change over time. Model 2 added time‑varying MADRS total score (a clinician‑rated depression scale) to estimate change independent of concurrent depressive symptom change. Model diagnostics informed a log transformation of the TMTB‑A difference (ln[TMTB‑A + 1]) to improve residual normality; other outcomes were analysed untransformed. Missing data were handled via maximum likelihood under an assumption of missing at random. To assess clinically meaningful individual change, RCIs were computed using the Iverson method with two sets of reference values. The clinical reference came from the 14 waitlist participants tested at Day 0 and again at Day 14, before receiving psilocybin. The second set used published normative test‑retest parameters.
An RCI threshold of ±1.645 (90% CI) classified reliable improvement or decline; one‑tailed binomial tests assessed whether observed proportions exceeded chance (5%). One‑sample t‑tests on continuous RCI scores were also conducted.

Results

Sample characteristics and baseline performance: Twenty‑six participants completed pre‑dose cognitive testing (mean age 43.1 years, SD 12.7; 65% male; 73% white). Most had MDD (n = 22, 85%) and 4 had BD‑II. Education was high: 92% had at least some college or university education. Baseline depressive severity averaged MADRS 27.8 (SD 7.3), and mean duration of illness was 17.1 years. Treatment resistance was pronounced (mean 11.7 failed medication trials). On age‑adjusted norms, a subset showed impairment (≥1 SD below the mean): DSST n = 7 (26.9%), TMT‑A n = 4 (15.4%), TMT‑B n = 4 (15.4%), TMTB‑A n = 3 (11.5%).

Linear mixed model findings: In unadjusted LMMs (Model 1), there was a significant main effect of timepoint for all cognitive outcomes. DSST scores improved from baseline to one‑day (mean difference = 4.72, 95% CI [1.96, 7.49], p < .001) and from baseline to two‑weeks (mean difference = 4.26, 95% CI [0.27, 8.25], p = .034). There was no significant change between one‑day and two‑weeks. TMT‑A completion times decreased (improved) from baseline to one‑day (mean difference = −5.45 s, 95% CI [−9.25, −1.66], p = .003) and to two‑weeks (mean difference = −6.00 s, 95% CI [−9.42, −2.58], p < .001). TMT‑B showed significant improvement from baseline to two‑weeks (mean difference = −12.27 s, p < .001) and from one‑day to two‑weeks (mean difference = −8.80 s, p = .039). The change from baseline to one‑day was not significant. The log‑transformed TMTB‑A difference score improved significantly from baseline to two‑weeks (mean difference ≈ −0.33 to −0.35 on the log scale, p between .005 and .022 across models). Treatment group and age were generally non‑significant predictors; sex effects favoured females on some TMT metrics in unadjusted models.

When MADRS was included as a time‑varying covariate (Model 2), results varied by test. The DSST timepoint effect was attenuated and no longer statistically significant (F(2, 27.36) = 2.95, p = .069). Higher MADRS scores were associated with worse DSST performance (β = −0.17, p = .048), suggesting DSST gains were partly mood‑related. In contrast, improvements on TMT‑A, TMT‑B and the TMTB‑A difference remained significant after adjusting for MADRS, and MADRS itself did not significantly predict these TMT outcomes. This supports a degree of mood‑independent change on measures of processing speed/executive function. Sex effects on TMT‑A were attenuated after adjusting for MADRS, but sex remained a significant predictor for TMT‑B and TMTB‑A in some models.

Reliable change analyses: Using the clinical waitlist reference (pre‑psilocybin retest values), reliable improvement (RCI > 1.645) was observed in 2/24 participants (8.3%) on the DSST, 2/24 (8.3%) on the TMT‑A, and 3/24 (12.5%) on the TMT‑B. None of these proportions significantly exceeded the 5% expected by chance (binomial p‑values .116–.339). Group‑level one‑sample t‑tests on continuous RCI scores showed:
  • no significant change for the DSST (t(23) = −0.291, p = .387);
  • a trend for TMT‑A (t(23) = 1.703, p = .051);
  • a significant group‑level improvement for TMT‑B (t(23) = 2.224, p = .018, d = 0.454).
Using normative reference estimates gave similar categorical results: reliable improvement was seen in 3 participants (12%) on the DSST, 2 (8.3%) on TMT‑A, and 1 (4.2%) on TMT‑B. None of these proportions exceeded chance, and no group‑level effect reached significance under this reference. Reliable decline (RCI < −1.645) occurred for 2/24 participants (8.3%) on the DSST under both reference frameworks, but this proportion did not exceed chance. No reliable decline was observed on TMT‑A or TMT‑B.
Overall, the researchers report modest, short‑term group‑level improvements on processing speed and executive function measures following a single 25 mg psilocybin dose; some TMT changes remained significant after controlling for depressive symptom change, whereas DSST gains appeared partly explained by mood improvement. However, only a minority of individuals showed RCI‑defined reliable improvement, and proportions did not significantly exceed chance.

Discussion

The authors interpret their findings as preliminary evidence that psilocybin‑assisted psychotherapy (PAP) may be associated with modest, short‑term improvements in tasks tapping processing speed and executive function in people with treatment‑resistant depression. Improvements on DSST and TMT‑A appeared as early as one day and persisted to two weeks, while TMT‑B and the executive index (TMTB‑A) reached significance at two weeks. The attenuation of DSST effects when controlling for MADRS suggests that improvements on that test were at least partly mediated by concurrent mood change. TMT findings, by contrast, remained significant after adjustment, indicating the possibility of mood‑independent cognitive effects. The findings broadly align with the few previous trials that reported modest or domain‑specific cognitive changes following psilocybin, and extend them by including an earlier (one‑day) assessment. The authors take the absence of short‑term cognitive deterioration as support for the short‑term cognitive safety of psilocybin, with practical implications for post‑dose monitoring and activity restrictions. The authors acknowledge several important limitations that constrain interpretation. Chief among these are:
  • the small sample size;
  • lack of placebo control and blinding;
  • a limited cognitive battery (only DSST and TMTs);
  • short follow‑up (two weeks);
  • practice effects from repeated testing with identical forms;
  • a DSST administration error requiring prorating from 90 s to 120 s for normative comparisons.
These factors raise the possibility that at least part of the observed improvement reflects practice or measurement variability rather than true procognitive effects. The sample’s demographic homogeneity (predominantly white and highly educated) and predominance of MDD over BD‑II further limit generalisability.
The authors also note unequal test exposure across randomised groups and that several potentially relevant covariates (age at depression onset, inflammatory markers, metabolic comorbidities, detailed treatment history, childhood adversity) were not controlled for. They further discuss the inability of the design to disentangle the contributions of psychotherapeutic support versus psilocybin itself. For future work the authors recommend adequately powered randomized controlled trials with larger, more diverse samples, expanded cognitive batteries covering additional domains, longer follow‑up to assess durability, inclusion of patient‑reported outcome measures, pre‑registration of cognitive endpoints, and incorporation of neurobiological measures (for example fMRI, EEG) and analytic approaches (for example mediation or latent growth models) to clarify mechanisms and whether cognitive change is independent from, contributes to, or results from mood and functional recovery. They emphasise that cognition should be assessed as an independent treatment domain in psychedelic research rather than solely as a secondary mood‑related endpoint.

Conclusion

In this exploratory post‑hoc analysis, Johnson and colleagues report that a single 25 mg psilocybin dose given within a brief psychotherapeutic framework was associated with modest, short‑term improvements on measures of executive function and processing speed in people with treatment‑resistant depression. However, the authors conclude that these results should be interpreted cautiously, given the small sample size, absence of a placebo control, short follow‑up and potential practice effects (including a DSST administration error). The observed changes may reflect practice‑related gains or other non‑specific influences rather than a direct procognitive effect of psilocybin. Controlled trials with more comprehensive designs are required to determine whether cognition is a meaningful and distinct therapeutic target of psilocybin‑assisted therapy.


STUDY DESIGN AND PARTICIPANTS

This retrospective, post-hoc analysis draws on data from an investigator-initiated psilocybin clinical trial. This was a randomized, waitlist-controlled, open-label clinical trial conducted at Braxia Health (formerly the Canadian Rapid Treatment Center of Excellence), a community-based clinic in Mississauga, Ontario, Canada. The primary objective was to evaluate the feasibility of a brief four-session model of psilocybin-assisted psychotherapy (PAP) for TRD and treatment-resistant bipolar depression (TRBD) with broader eligibility (e.g., including both MDD and BD-II, personality disorders, presence of suicidality, no upper limit of failed treatment trials). The trial also evaluated the feasibility, efficacy, and safety of single and repeated doses of psilocybin. The trial protocol was approved by an independent Institutional Review Board (Advarra; Pro00056530; BCDF001) and registered on ClinicalTrials.gov (NCT05029466). Verbal and written consent were obtained from all study participants prior to conducting any study-related activities. As detailed previously and summarized in Supplementary Table, eligible participants were adults aged 18 to 75 years with a primary diagnosis of MDD or BD-II, currently experiencing a major depressive episode (MDE) of at least three months in duration. All participants met criteria for TRD, defined as nonresponse to at least two guideline-concordant pharmacological treatments at an adequate dose and duration during the current MDE. There was no upper limit on the number of prior failed treatments. To avoid the confounding effects and potential interactions of concurrent antidepressant use, interested participants were required to discontinue all antidepressants, antipsychotics, ketamine, esketamine or any augmenting medications. Discontinuation was required for a minimum of five half-lives before screening and for the duration of the study. Based on clinical discretion, BD-II participants could continue conventional mood stabilizers (e.g., lithium, lamotrigine, valproate) as recommended by their prescribing physician.
Participants were randomized to either the immediate treatment arm or the two-week delayed-treatment (waitlist) control condition. Regardless of group assignment, all participants (n = 30) received at least one 25 mg dose of synthetic psilocybin (powder obtained from Usona Institute and dissolved in water) administered in a supportive non-clinical environment. Based on safety, tolerability, response, and relapse, as determined by the clinician's judgment and available clinical measures, participants could be eligible for up to two additional doses (three doses total) over the six-month study duration. However, the present analysis focused exclusively on cognitive changes following the first 25 mg dose of psilocybin. Each dosing session was preceded by one preparatory psychotherapy session (1-2 h) focused on psychoeducation and intention setting. It was followed by two integration sessions (1-2 h each) focused on reflection and meaning-making. In addition, therapist dyads provided nonspecific psychological support throughout the 6- to 8-h dosing session. The total non-dosing therapeutic contact time was approximately 4.5 h. Given the small sample size and the parent trial's primary focus on feasibility rather than efficacy, this post-hoc analysis was not powered to detect between-group differences. Accordingly, data from both the immediate and waitlist arms were pooled to form a single open-label cohort to evaluate the effects of one 25 mg dose of psilocybin on cognitive functioning at one day and two weeks post-treatment. Participants who lacked baseline cognitive data (n = 3) or withdrew before follow-up cognitive assessments (n = 1) were excluded. Of the remaining 26 participants, one individual completed cognitive assessments during the waitlist period but did not complete any assessments following psilocybin administration.
This participant's baseline demographic and cognitive data were retained for descriptive analyses and for estimating practice effects in the RCI analysis, but were excluded from all other post-treatment analyses. Another participant completed the baseline and one-day post-psilocybin assessments but was missing the two-week follow-up; their data were retained for the linear mixed model (LMM) but excluded from the RCI analysis. Although monthly follow-up assessments were conducted for up to six months, these timepoints were excluded from the present analyses due to substantial missing data. For similar reasons, we did not assess the effects of repeated psilocybin doses.
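The exclusions above produce three different analysis samples. The tally below is a hypothetical bookkeeping sketch; the function and variable names are illustrative and not from the study. It derives the sample sizes for baseline descriptives, the LMMs, and the RCI analysis from the counts stated in the text.

```python
def analysis_samples(dosed: int = 30) -> dict[str, int]:
    """Hypothetical tally of the analysis samples described in the text."""
    no_baseline_cognition = 3      # lacked baseline cognitive data
    withdrew_before_followup = 1   # withdrew before follow-up testing
    baseline = dosed - no_baseline_cognition - withdrew_before_followup

    no_post_dose_testing = 1       # tested only during the waitlist period
    lmm = baseline - no_post_dose_testing  # ML estimation keeps partial follow-up

    missing_two_week = 1           # had Day 1 but not Day 14: kept in LMM, dropped from RCI
    rci = lmm - missing_two_week
    return {"baseline": baseline, "lmm": lmm, "rci": rci}

print(analysis_samples())  # {'baseline': 26, 'lmm': 25, 'rci': 24}
```

The RCI figure of 24 matches the denominators used in the reliable-change results (for example, 2/24).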

DEMOGRAPHIC AND CLINICAL VARIABLES

At baseline, we collected self-reported demographic information from all participants, including age, sex, race/ethnicity, and highest level of educational attainment. Clinical variables were also obtained at study entry, including total illness duration and prior psychiatric treatment history. Depressive symptom severity was assessed using the Montgomery-Åsberg Depression Rating Scale (MADRS). MADRS assessments were conducted at four timepoints: baseline (morning of psilocybin administration), and one day, one week, and two weeks following psilocybin administration. The total MADRS score served as the primary clinical outcome for tracking changes in global depressive symptoms over time, with higher scores indicating greater severity. Other outcome measures beyond the scope of this secondary analysis were also collected. Feasibility, tolerability, and preliminary efficacy results from this trial are reported separately.

COGNITIVE ASSESSMENTS

Cognitive functioning was evaluated using two standardized paper-and-pencil tests: the DSST and the TMT-A/-B. The DSST assesses processing speed, sustained attention, and executive function. Participants are provided with a key that pairs numbers with specific symbols. Below the key are rows of numbers with blank spaces underneath. Participants are asked to fill in the corresponding symbol for each number as quickly as possible within 90 s. The outcome of interest is the total number of correct symbols completed, with higher total scores indicating better performance. TMT-A assesses visual attention and psychomotor speed. On the test, participants are required to connect 25 numbered circles in ascending order as quickly as possible. TMT-B assesses executive function abilities, such as cognitive flexibility and set-shifting, and requires participants to alternate between numbers and letters as quickly as possible. Completion time (in seconds) served as the outcome of interest for TMT-A and TMT-B. To better isolate executive function abilities from motor/processing speed, we computed a difference score (TMTB-A), as done previously. This difference score helps control for individual variability in processing speed. A smaller difference score suggests less difficulty managing the additional cognitive demands of TMT-B, and thus reflects more efficient executive functioning. As illustrated in Fig., cognitive assessments were administered at multiple timepoints in both the immediate and waitlist arms. The immediate treatment arm completed the DSST and TMT at Day 0 (prior to psilocybin ingestion), Day 1 (one-day post-psilocybin), and Day 14 (two-weeks post-psilocybin). The waitlist arm completed the tests at Day 0 (start of waitlist period), Day 14 (end of waitlist period, pre-psilocybin ingestion), Day 15 (one-day post-psilocybin), and Day 28 (two-weeks post-psilocybin).
To harmonize data across groups, the Day 14 pre-psilocybin assessment in the waitlist arm was redefined as the "baseline," equivalent to Day 0 in the immediate arm. Accordingly, Day 15 and Day 28 in the waitlist arm were re-labelled as Day 1 and Day 14, respectively. Throughout the manuscript, these harmonized labels are used to enable pooled analyses.
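The relabelling rule and the TMTB-A difference score can be expressed as a small data-preparation step. This is a minimal sketch under assumed names. The record layout (participant ID, arm, study day, TMT-A and TMT-B times in seconds) and the function `harmonize` are hypothetical, not the study's data format.

```python
# Map each arm's study day onto the pooled timeline (hypothetical layout).
HARMONIZED_DAY = {
    "immediate": {0: 0, 1: 1, 14: 14},
    "waitlist": {14: 0, 15: 1, 28: 14},  # waitlist Day 14 becomes the pooled baseline
}
LABELS = {0: "baseline", 1: "one-day", 14: "two-weeks"}

def harmonize(records):
    """Relabel study days to pooled timepoints and add the TMTB-A difference score.

    records: iterable of (participant_id, arm, study_day, tmt_a_s, tmt_b_s).
    """
    out = []
    for pid, arm, day, tmt_a, tmt_b in records:
        mapped = HARMONIZED_DAY[arm].get(day)
        if mapped is None:
            # The waitlist arm's Day 0 test has no pooled label; the text uses it
            # for practice-effect estimates rather than the pooled timeline.
            continue
        out.append({
            "pid": pid,
            "arm": arm,
            "timepoint": LABELS[mapped],
            "tmt_a": tmt_a,
            "tmt_b": tmt_b,
            "tmtb_minus_a": tmt_b - tmt_a,  # smaller = more efficient set-shifting
        })
    return out

rows = harmonize([
    ("W01", "waitlist", 0, 35.0, 80.0),   # pre-waitlist test: not pooled
    ("W01", "waitlist", 14, 30.0, 70.0),  # pooled baseline
    ("I01", "immediate", 1, 28.0, 60.0),  # pooled one-day
])
for row in rows:
    print(row["pid"], row["timepoint"], row["tmtb_minus_a"])
```

With these made-up values, the loop prints `W01 baseline 40.0` and `I01 one-day 32.0`.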

BASELINE COGNITIVE IMPAIRMENT

To evaluate the degree of cognitive impairment at the pre-dose assessment (Day 0 or Day 14), raw scores from the DSST and TMT were converted into age-adjusted z-scores for each participant (n = 26) using published normative datasets. For the DSST, normative data were obtained from the Wechsler Adult Intelligence Scale-Third Edition (WAIS-III) Administration and Scoring Manual. Due to an administration error, participants were given 90 s to complete the task instead of the 120 s specified in the WAIS-III manual. To account for this, raw scores were prorated to their 120-s equivalents by multiplying the total correct responses by a factor of 1.33 (i.e., 120 ÷ 90). Each participant's prorated raw score and age were then used to determine the corresponding WAIS-III scaled score, which was subsequently converted to a z-score. Prorated DSST values were used for this baseline analysis and for the RCI analyses described below; unadjusted raw scores were used in the LMM. For the TMT-A and TMT-B, normative data were drawn from Tombaugh, which provides comprehensive age- and education-stratified norms based on a sample of 911 community-dwelling adults (18-89 years). We restricted our normative comparisons to the 12+ education subgroup across all age groups because all participants in our study had ≥12 years of education. For the derived TMTB-A score, normative means were calculated by subtracting the age- and education-adjusted mean for the TMT-A from the TMT-B, using data from the same normative sample. The SD of the TMTB-A score was estimated using the Variance Sum Law, incorporating the reported correlation between the TMT-A and TMT-B (r = 0.74). This approach allowed us to base all TMT-related z-scores on a single dataset, avoiding discrepancies introduced by mixing normative sources. For the DSST, lower raw scores reflect greater impairment, so negative z-scores directly indicate worse performance.
For the TMT-A, TMT-B and TMTB-A, where longer completion times denote worse performance, z-scores were multiplied by -1. This keeps the measures consistent, so that more negative z-scores represent greater impairment when comparing baseline cognitive functioning. A threshold of 1 SD below the normative mean (z ≤ -1.0) was used to define cognitive impairment. This criterion was selected to enhance sensitivity to more subtle deficits in our highly educated sample.
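The normative conversions described in this section reduce to a few formulas. The sketch below is a minimal illustration that assumes the normative means and SDs are looked up separately; the WAIS-III raw-to-scaled tables and the Tombaugh norms are not reproduced. All function names and example inputs are hypothetical. The sketch uses the exact ratio 120/90 rather than the rounded factor 1.33. It also relies on the standard Wechsler scaled-score metric (mean 10, SD 3) for the scaled-to-z step.

```python
import math

def prorate_dsst(raw_correct_90s: float) -> float:
    """Scale a 90-s DSST score to its 120-s equivalent (exact factor 120/90)."""
    return raw_correct_90s * 120 / 90

def tmt_difference_norms(mean_a: float, sd_a: float, mean_b: float, sd_b: float,
                         r_ab: float = 0.74) -> tuple[float, float]:
    """Normative mean and SD of TMTB-A from TMT-A and TMT-B norms (Variance Sum Law)."""
    # Var(B - A) = Var(A) + Var(B) - 2 * r * SD(A) * SD(B)
    sd_diff = math.sqrt(sd_a**2 + sd_b**2 - 2 * r_ab * sd_a * sd_b)
    return mean_b - mean_a, sd_diff

def wechsler_scaled_to_z(scaled: float) -> float:
    """Wechsler scaled scores have mean 10 and SD 3."""
    return (scaled - 10) / 3

def z_score(raw: float, norm_mean: float, norm_sd: float, timed: bool) -> float:
    """z-score against a normative cell; timed tests are sign-flipped so negative = worse."""
    z = (raw - norm_mean) / norm_sd
    return -z if timed else z

def impaired(z: float, threshold: float = -1.0) -> bool:
    """Impairment flag: at least 1 SD below the normative mean."""
    return z <= threshold

# Illustrative values only (not study norms)
print(prorate_dsst(45))                                 # 60.0
print(tmt_difference_norms(30.0, 10.0, 70.0, 20.0))     # mean 40.0, SD about 14.28
print(impaired(z_score(40.0, 30.0, 10.0, timed=True)))  # True
```

The sign flip applies only after the z-score is computed. With it, a single `impaired` rule works for both the DSST (higher is better) and the timed TMT measures (lower is better).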

LINEAR MIXED MODELS

To evaluate changes in cognitive performance following psilocybin treatment, we conducted separate LMMs for each cognitive outcome using their raw unadjusted scores (i.e., DSST total correct symbols, TMT-A completion time, TMT-B completion time, and TMTB-A difference score). Each model included a random intercept for participant ID to account for repeated observations. Timepoint entered as a categorical fixed effect with three levels: pre-dose, one-day post-dose, and two-weeks post-dose. The models also adjusted for age, sex, and treatment group (immediate vs. waitlist), with the latter included to account for unequal test exposure across groups. Relatedly, we also included an interaction term between timepoint and treatment group to examine whether cognitive changes over time differed by group assignment. Pairwise comparisons were conducted with Bonferroni-adjusted p-values to control for Type I error, and 95 % confidence intervals were reported for all estimated mean differences. To capture both total and mood-independent effects, two complementary LMMs were performed for each cognitive outcome:
  • Model 1 (total effect) examined overall change in cognitive performance over time without covarying for mood.
  • Model 2 (mood-independent effect) included MADRS total score as a time-varying covariate to estimate cognitive change independent of concurrent mood improvement.
Model assumptions were examined using residual-vs-fitted plots for linearity and homoscedasticity, and Q-Q plots and histograms for residual normality. The residuals from the TMTB-A LMM demonstrated non-normality and heteroscedasticity, as evidenced by visual inspection of residual plots and significant Shapiro-Wilk test results (p < .001). A natural logarithmic transformation [ln(TMTB-A + 1)] was applied, which improved residual normality and homoscedasticity. Both LMMs with the TMTB-A were therefore conducted using the log-transformed values.
Other outcomes retained their raw form due to approximate normality and the robustness of LMMs to moderate violations of normality assumptions. Model fit was evaluated using Akaike Information Criterion (AIC) values, which led to the selection of an unstructured covariance matrix. Missing data were handled using maximum likelihood estimation under the assumption that data were missing at random (MAR), allowing all participants (n = 25), including one with incomplete follow-up data, to be retained in the analyses.
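Two small computations in this section can be sketched directly: the ln(TMTB-A + 1) transform and the Bonferroni adjustment for the three pairwise timepoint comparisons. This is the generic textbook form with hypothetical function names, not the authors' software. Fitting the LMM itself (random intercepts, unstructured covariance, ML estimation) is left to a mixed-model package and is not shown.

```python
import math

def log_tmtb_minus_a(diff_seconds: float) -> float:
    """ln(TMTB-A + 1); the +1 keeps a difference of 0 defined (requires diff > -1)."""
    if diff_seconds <= -1:
        raise ValueError("ln(TMTB-A + 1) is undefined for TMTB-A <= -1")
    return math.log(diff_seconds + 1)

def bonferroni(p_values: list[float]) -> list[float]:
    """Multiply each p-value by the number of comparisons and cap the result at 1."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]

# Three pairwise comparisons among three timepoints, so m = 3 (illustrative p-values)
print(bonferroni([0.25, 0.125, 0.5]))  # [0.75, 0.375, 1.0]

# A difference of -0.35 on the ln(x + 1) scale corresponds to a ratio of about 0.70
print(round(math.exp(-0.35), 3))  # 0.705
```

The cap at 1 explains why adjusted pairwise values of p = 1.000 appear in the results. The back-transformed ratio is offered only for intuition. It reads a log-scale difference as an approximate multiplicative change in the model-implied value of TMTB-A + 1, and the authors do not report it.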

RELIABLE CHANGE ANALYSIS

To determine whether changes in cognitive performance following psilocybin treatment exceeded what would be expected from measurement error and practice effects, we calculated RCIs for the DSST, TMT-A, and TMT-B. We used the Iverson method, which incorporates the SDs of the first and second timepoints to account for increased variability in follow-up scores. We derived expected change scores (accounting for practice effects) and standard errors of the difference (SED) from two reference samples to capture both clinical and normative expectations. The clinical reference values were obtained from the 14 participants in the waitlist condition who completed cognitive testing at baseline (Day 0) and again two weeks later (Day 14), prior to receiving psilocybin. Prorated mean change scores and SDs from this group were used to model expected practice effects in a treatment-resistant sample. Due to the small sample size, we used published test-retest reliability coefficients rather than estimating them internally. For the DSST, we used the average coefficient across age groups from the WAIS-III Technical Manual (r = 0.84). For the TMT-A and TMT-B, we used coefficients from a published study (r = 0.658 and r = 0.769, respectively). The normative reference values were derived from the literature. For the DSST, normative mean change scores, SDs, and test-retest reliability (r = 0.84) were obtained from the WAIS-III Technical Manual, which reports scaled scores based on a 120-s administration. We prorated each participant's 90-s raw score to its 120-s equivalent before converting to an age-adjusted scaled score to compute RCI values. For the TMT-A and TMT-B, normative mean change scores and SDs over a two- to three-week retest interval were obtained from a longitudinal study of practice effects in healthy adults. Test-retest reliabilities were again drawn from the same published source used for the clinical reference.
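The Iverson-style RCI described above can be written out directly. This is a minimal sketch of the standard form. Each timepoint's standard error of measurement is SD·√(1 − r), the SED combines the two SEMs, and the reference-sample mean change is subtracted as the practice effect. The function and parameter names are illustrative. The sign flip for timed tests is an assumption. The text classifies RCI > 1.645 as improvement on every test, which implies that faster TMT times were scored as positive change.

```python
import math

def iverson_rci(x1: float, x2: float, sd1: float, sd2: float, r_xx: float,
                practice_effect: float, lower_is_better: bool = False) -> float:
    """Reliable change index with an Iverson-style SED and practice-effect correction.

    x1, x2: a participant's pre- and post-dose scores
    sd1, sd2: reference-sample SDs at test and retest
    r_xx: test-retest reliability (e.g., 0.84 for the DSST)
    practice_effect: reference-sample mean change (retest minus test)
    lower_is_better: True for timed tests such as TMT-A/TMT-B (assumed convention)
    """
    sem1 = sd1 * math.sqrt(1 - r_xx)
    sem2 = sd2 * math.sqrt(1 - r_xx)
    sed = math.sqrt(sem1**2 + sem2**2)
    rci = ((x2 - x1) - practice_effect) / sed
    # Flip the sign for timed tests so that a positive RCI always means improvement.
    return -rci if lower_is_better else rci

# Illustrative numbers only: a 10-point gain against an expected 2-point practice effect
print(round(iverson_rci(50, 60, 10, 10, 0.84, 2), 3))  # 1.414, below the 1.645 cutoff
```

In this framing, the clinical and normative references differ only in the `sd1`, `sd2`, and `practice_effect` values (and, for the TMT normative case, the source of `r_xx`) passed to the same function.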
After computing individual RCI values, we evaluated the proportion of participants who exceeded the RCI threshold of ±1.645, corresponding to a 90 % confidence interval. Participants with RCI scores >1.645 were classified as showing reliable improvement, and those with RCI scores < -1.645 as showing reliable decline. To assess whether the observed proportion of participants showing reliable improvement significantly exceeded what would be expected by chance alone (i.e., 5 %), we conducted one-tailed binomial tests. These tests were repeated separately for each cognitive test and reference dataset (waitlist-based and normative). In addition to the categorical classification of reliable change, we conducted one-sample t-tests on the continuous RCI scores to evaluate whether the average magnitude of cognitive change exceeded zero (no change) across the full sample.
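The decision rule and chance test described above can be sketched in standard-library Python. This is a minimal illustration with hypothetical function names, not the authors' code. The binomial upper tail is computed exactly. The one-sample t statistic and Cohen's d (mean/SD) are returned without a p-value, because that step needs a t-distribution routine from a statistics library.

```python
import math
import statistics

THRESHOLD = 1.645  # bounds of the two-sided 90% band

def classify(rci: float) -> str:
    """Categorical reliable-change label for one participant."""
    if rci > THRESHOLD:
        return "improved"
    if rci < -THRESHOLD:
        return "declined"
    return "no reliable change"

def binomial_upper_tail(k: int, n: int, p: float = 0.05) -> float:
    """One-tailed P(X >= k) for X ~ Binomial(n, p)."""
    return sum(math.comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))

def one_sample_t(values: list[float]) -> tuple[float, float]:
    """t statistic against zero and Cohen's d (mean / SD) for continuous RCI scores."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)
    t = mean / (sd / math.sqrt(len(values)))
    return t, mean / sd

print(round(binomial_upper_tail(2, 24), 3))  # 0.339
print(round(binomial_upper_tail(3, 24), 3))  # 0.116
```

As a check, these two tails equal the range of binomial p-values reported for 2/24 and 3/24 reliable improvers (.116–.339). Because d = t/√n for a one-sample test, the reported TMT-B result t(23) = 2.224 corresponds to d = 2.224/√24 ≈ 0.454.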

SAMPLE CHARACTERISTICS

Demographic and clinical characteristics of the 26 participants who had completed pre-dose cognitive assessments are presented in Table. Briefly, the mean age (SD) of the sample was 43.1 years (12.7), and the majority identified as male (65 %) and white (73 %). Most participants had a primary diagnosis of MDD (n = 22; 85 %), with the remaining diagnosed with BD-II (n = 4; 15 %). Educational attainment was relatively high, with 92 % having completed at least some college or university education. Clinically, participants reported significant and persistent depressive symptoms, with a mean baseline MADRS score of 27.8 (7.3) and an average illness duration of 17.1 years (11.2). Treatment resistance was pronounced, with a mean of 11.7 (5.9) failed medication trials. Suicidal ideation (SI) at baseline was mild on average, with a mean score of 1.7 (1.5) on the MADRS suicidality item.

PRE-TREATMENT COGNITIVE PERFORMANCE

At baseline, participants' performance on the DSST, TMT-A, TMT-B, and TMTB-A was generally within age-adjusted normative expectations. However, a notable subset demonstrated cognitive weaknesses or impairments (i.e., scores ≥1 SD below the normative mean) on the DSST (n = 7; 26.9 %), TMT-A (n = 4; 15.4 %), TMT-B (n = 4; 15.4 %), and TMTB-A (n = 3; 11.5 %) (Fig.). Of the 10 participants who showed impairment on at least one measure:
  • six were impaired on a single task (four on the DSST, two on TMT-A);
  • one was impaired on both the TMT-B and TMTB-A;
  • one was impaired on all tasks except the TMT-A;
  • one was impaired on all tasks except the TMTB-A;
  • one demonstrated impairment across all four cognitive tasks.
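The participant-level breakdown can be cross-checked against the per-test counts reported in the same paragraph. The short tally below is illustrative and not part of the study's analysis. It encodes the ten impairment patterns exactly as listed and counts how often each test appears.

```python
from collections import Counter

# Impairment patterns for the 10 impaired participants, as listed in the text.
PATTERNS = (
    [{"DSST"}] * 4
    + [{"TMT-A"}] * 2
    + [{"TMT-B", "TMTB-A"}]
    + [{"DSST", "TMT-B", "TMTB-A"}]            # all except TMT-A
    + [{"DSST", "TMT-A", "TMT-B"}]             # all except TMTB-A
    + [{"DSST", "TMT-A", "TMT-B", "TMTB-A"}]   # all four tasks
)

per_task = Counter(task for pattern in PATTERNS for task in pattern)
print(len(PATTERNS), dict(sorted(per_task.items())))
# 10 {'DSST': 7, 'TMT-A': 4, 'TMT-B': 4, 'TMTB-A': 3}
```

The tally matches the reported counts: 7 participants impaired on the DSST, 4 on TMT-A, 4 on TMT-B, and 3 on TMTB-A.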

DSST

A significant main effect of timepoint was observed on DSST performance (F(2, 22.70) = 11.14, p < .001) (Fig.). Pairwise comparisons revealed significant improvements from baseline to one-day post-dose (mean difference = 4.72, 95 % CI [1.96, 7.49], p < .001) and two-weeks post-dose (mean difference = 4.26, 95 % CI [0.27, 8.25], p = .034). No significant difference emerged between the one-day and two-week follow-ups (mean difference = -0.46, 95 % CI, p = 1.000), suggesting that gains were sustained. No significant main effects were observed for treatment group (F(1, 24.45) = 0.63, p = .436), age (F(1, 21.97) = 0.38, p = .545), or sex (F(1, 22.17) = 0.66, p = .426). The timepoint-by-group interaction was also non-significant (F(2, 22.70) = 2.33, p = .120).
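Pairwise p values of exactly 1.000, as reported above, are characteristic of a Bonferroni-type correction. Under that correction, each raw p value is multiplied by the number of comparisons and capped at 1. The correction method is not stated in this section, so this is an assumption. The sketch below shows the adjustment for the three timepoint contrasts (baseline vs. one-day, baseline vs. two-week, one-day vs. two-week). The raw p values in the example are illustrative only, not study data.

```python
def bonferroni(p_values):
    """Multiply each raw p value by the number of comparisons, capping at 1."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]


# Illustrative (not reported) raw p values for the three pairwise contrasts:
# baseline vs one-day, baseline vs two-week, one-day vs two-week.
raw = [0.0004, 0.011, 0.62]
print(bonferroni(raw))
```

Any raw p value above 1/3 is capped at 1 under this scheme, so a reported p = 1.000 says only that the contrast was far from significant, not that the two timepoints were identical.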

TMT-A

There was a significant main effect of timepoint on TMT-A performance (F(2, 22.99) = 10.31, p < .001), with reduced mean completion times from baseline to one-day (mean difference = -5.45, 95 % CI [-9.25, -1.66], p = .003) and two-weeks post-dose (mean difference = -6.0, 95 % CI [-9.42, -2.58], p < .001) (Fig.). There was no significant change between the one-day and two-week post-dose assessments (p = 1.000). A significant main effect of sex emerged (F(1, 20.67) = 5.69, p = .027), with females performing faster (β = -6.51, 95 % CI [-12.20 to -0.83]). Treatment group (F(1, 22.86) = 0.22, p = .647), age (F(1, 20.18) = 0.85, p = .367), and the timepoint-by-group interaction (F(2, 22.99) = 2.02, p = .155) were not significant.

TMT-B

There was a significant main effect of timepoint on TMT-B

TMTB-A

There was a significant main effect of timepoint on log-transformed TMTB-A scores (F(2, 22.48) = 6.495, p = .006), and pairwise comparisons revealed that scores improved significantly from baseline to two-weeks post-treatment (mean difference = -0.35, 95 % CI [-0.60, -0.10], p = .005) (Fig.). No significant differences were observed from baseline to one-day (mean difference = -0.05, 95 % CI [-0.38, 0.28], p = 1.00) or from one-day to two-weeks post-treatment (mean difference = -0.30, 95 % CI [-0.66, 0.07], p = .135). A significant main effect of sex was found (F(1, 22.09) = 8.214, p = .009), with females performing better (β = -0.38, 95 % CI [-0.65 to -0.10]). Treatment group (F(1, 22.48) = 0.04, p = .853), age (F(1, 21.40) = 0.04, p = .837), and the timepoint-by-group interaction (F(2, 22.47) = 1.08, p = .356) were not significant.
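Because the TMTB-A analysis used log-transformed scores, the mean differences above are on the log scale. Assuming natural logarithms (the base is not stated in this section), exponentiating a difference gives a multiplicative ratio of geometric-mean scores (follow-up relative to baseline). The sketch applies this to the baseline-to-two-week contrast; the helper name is hypothetical.

```python
import math


def log_diff_to_ratio(diff, ci_low, ci_high):
    """Back-transform a natural-log mean difference and its CI to ratios."""
    return math.exp(diff), math.exp(ci_low), math.exp(ci_high)


# Baseline vs two-weeks post-treatment, log-scale values from the text.
ratio, low, high = log_diff_to_ratio(-0.35, -0.60, -0.10)
print(f"ratio = {ratio:.2f} (95 % CI {low:.2f} to {high:.2f})")
print(f"approx. {100 * (1 - ratio):.0f} % lower TMTB-A score at two weeks")
```

Under the natural-log assumption, the two-week TMTB-A score would be about 0.70 times baseline (95 % CI roughly 0.55 to 0.90), i.e. roughly 30 % lower on the geometric-mean scale.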

DSST

With MADRS scores included as a time-varying covariate, and in contrast to the original model, the main effect of timepoint was no longer significant (F(2, 27.36) = 2.95, p = .069) (Fig.). Pairwise comparisons revealed no significant difference between baseline and one-day post-dose (mean difference = 3.10, 95 % CI [-0.24, 6.44], p = .076) or between baseline and two-weeks post-dose (mean difference = 2.74, 95 % CI [-1.61, 7.09], p = .359). There was also no significant change between the one-day and two-week post-dose assessments (p = 1.000). Notably, MADRS scores significantly predicted DSST performance (F(1, 31.47) = 4.22, p = .048), with higher depressive symptoms associated with poorer performance (β = -0.17, 95 % CI: -0.35 to -0.001). A significant timepoint-by-group interaction also emerged (F(

TMT-A

As in the original model, there was a significant main effect of timepoint (F(2, 24.28) = 11.27, p < .001), indicating overall improvement in TMT-A performance over time, with reduced mean completion times from baseline to one-day (mean difference = -6.51, 95 % CI, p = .002) and two-weeks post-dose (mean difference = -7.0, 95 % CI, p < .001) (Fig.). There was no significant change between the one-day and two-week post-dose assessments (p = 1.000). MADRS scores were not significantly associated with TMT-A performance (F(1, 36.58) = 1.86, p = .181). Notably, the main effect of sex was attenuated and no longer significant (F(1, 24.09) = 3.09, p = .091), but all other effects remained consistent: no significant effects were observed for treatment group (F(1, 24.10) = 0.05, p = .832), age (F(1, 19.26) = 0.67, p = .423), or timepoint-by-group (F(2, 22.66) = 2.45, p = .109).

TMT-B

Consistent with the original model, there was a significant main effect of timepoint (F(2, 25.89) = 11.76, p < .001) and sex (F(1, 25.33) = 7.60, p = .011, β = -14.95, 95 % CI) on TMT-B performance (Fig.). Performance improved from baseline to two-weeks post-treatment (mean difference = -12.27, 95 % CI, p < .001) and from one-day to two-weeks post-treatment (mean difference = -8.80, 95 % CI, p = .039). The change from baseline to one-day post-treatment was not significant (mean difference = -3.47, 95 % CI, p = 1.000). Importantly, MADRS score itself was not a significant predictor of TMT-B performance (F(1, 28.841) = 0.651, p = .426), and all other covariates and interactions remained non-significant, including treatment group (F(1, 25.52) = 0.17, p = .680), age (F(1, 20.49) = 0.24, p = .626), and timepoint-by-group (F(2, 23.82) = 1.52, p = .238).

TMTB-A

Despite the inclusion of MADRS score as a covariate, the effects of timepoint (F(2, 25.71) = 4.81, p = .017) and sex (F(1, 25.23) = 7.30, p = .012, β = -0.39, 95 % CI [-0.69 to -0.09]) remained significant, while MADRS itself was not a significant predictor of log-transformed TMTB-A scores (F(1, 38.93) = 0.06, p = .815) (Fig.). Pairwise comparisons revealed that scores improved significantly from baseline to two-weeks post-treatment (mean difference = -0.33, 95 % CI [-0.63, -0.04], p = .022). No significant differences were observed from baseline to one-day (mean difference = -0.03, 95 % CI, p = 1.000) or from one-day to two-weeks post-treatment (mean difference = -0.30, 95 % CI [-0.66, 0.06], p = .131). The inclusion of MADRS also did not meaningfully alter the model or affect the significance of other predictors, including treatment group (F(1, 24.06) = 0.06, p = .806), age (F(1, 20.29) = 0.04, p = .841), and the timepoint-by-group interaction (F(2, 22.50) = 0.95, p = .403).

CLINICAL REFERENCE GROUP

All individual RCI values used in these analyses are provided in Supplementary Table. When using the waitlist control sample to model expected change, two participants (8.3 %) on the DSST (Supplementary Fig.), two participants (8.3 %) on the TMT-A (Supplementary Fig.), and three participants (12.5 %) on the TMT-B (Supplementary Fig.) met the reliable improvement threshold. Although each of these individuals exceeded the RCI threshold, the overall proportion did not significantly exceed the 5 % expected by chance based on binomial testing (p = .339; p = .339; p = .116, respectively). One-sample t-tests on the continuous RCI scores further supported these patterns. No significant change was observed on the DSST (t(23) = -0.291, p = .387, d = -0.059), while the TMT-A showed a nonsignificant trend toward improvement (t(23) = 1.703, p = .051, d = 0.348). The TMT-B demonstrated significant group-level improvement (t(23) = 2.224, p = .018, d = 0.454), aligning with the higher proportion of individuals exceeding the RCI threshold on this measure.
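For a one-sample t-test, Cohen's d is the mean divided by the SD, which is algebraically t/√n. With n = 24 (df = 23), the reported effect sizes can therefore be reproduced from the t statistics alone, as a quick consistency check. The helper name below is hypothetical.

```python
import math


def cohens_d_from_t(t, n):
    """One-sample Cohen's d = mean / SD = t / sqrt(n)."""
    return t / math.sqrt(n)


# t statistics reported for the waitlist-referenced RCI scores (n = 24).
reported = {"DSST": -0.291, "TMT-A": 1.703, "TMT-B": 2.224}
for test, t in reported.items():
    print(test, round(cohens_d_from_t(t, 24), 3))
```

This yields d = -0.059, 0.348 and 0.454, matching the values above. The same relation also reproduces the normative-reference effect sizes (d = 0.054, 0.18 and 0.27 for t = 0.263, 0.899 and 1.310).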

NORMATIVE REFERENCE GROUP

Using normative practice effects and variability estimates, reliable improvement was observed in three participants (12.5 %) for the DSST, two participants (8.3 %) for the TMT-A, and one participant (4.2 %) for the TMT-B. None of these proportions significantly exceeded the 5 % expected by chance (p = .116; p = .339; p = .708, respectively). Group-level analyses of continuous RCI scores mirrored these findings: DSST (t(23) = 0.263, p = .397, d = 0.054), TMT-A (t(23) = 0.899, p = .189, d = 0.18), and TMT-B (t(23) = 1.310, p = .102, d = 0.27) all showed small, non-significant effects.

RELIABLE DECLINE

To assess potential cognitive worsening, we applied the inverse threshold (RCI < -1.645). Under both the clinical and the normative reference frameworks, 2 of 24 participants (8.3 %) demonstrated reliable decline on the DSST. In both cases, this proportion did not significantly exceed the 5 % expected by chance based on binomial testing (p = .339). No participants showed reliable decline on the TMT-A or TMT-B under either reference framework.

PSILOCYBIN-ASSOCIATED COGNITIVE IMPROVEMENTS

This study investigated cognitive outcomes following PAP in individuals with TRD and TRBD. Consistent with our hypothesis, we observed statistically significant improvements across all cognitive measures over time. Improvements on the DSST and TMT-A emerged as early as one-day post-treatment and were sustained through the two-week follow-up. In contrast, improvements on the TMT-B and TMTB-A reached significance only at the two-week timepoint. For the DSST, improvements were attenuated after adjusting for concurrent reductions in depressive symptoms, suggesting that observed gains may be partially attributable to mood improvements. However, changes in the TMT-A, TMT-B, and TMTB-A remained significant after controlling for depression severity, raising the possibility of mood-independent procognitive effects with PAP. Despite these encouraging group-level effects, reliable change analyses indicated that only a minority of participants exhibited improvements exceeding thresholds for clinically meaningful change. Moreover, the proportion of participants who demonstrated reliable improvement did not significantly exceed what would be expected by chance across any cognitive domain. These findings suggest that while PAP may be associated with modest improvements in cognitive performance, such effects were not consistently observed across individuals and may reflect practice effects or measurement variability rather than robust treatment-related change.

RELATIONSHIP TO PRIOR RESEARCH ON PSILOCYBIN AND COGNITION

Our findings both align with and extend the limited existing literature on the cognitive effects of psilocybin. To date, only two studies have systematically evaluated cognitive outcomes following psilocybin administration in patients with depression. An open-label trial of 24 individuals with MDD reported significant improvements in cognitive flexibility, as measured by the Penn Conditional Exclusion Test (PCET), persisting up to four weeks after two psilocybin sessions. Notably, these cognitive gains were not correlated with antidepressant response. In the largest randomized controlled psilocybin trial to date (N = 233), small, non-significant improvements in DSST performance were observed three weeks after a single psilocybin dose, with a least squares mean difference of 1.5 points (95 % CI: -0.8 to 3.8) between the 25 mg and 1 mg groups. Although key methodological differences may account for discrepancies in the magnitude, timing, and domains of cognitive change observed, our two-week results align with these prior findings in several respects. As in the larger randomized trial, we observed modest improvements in DSST performance that did not reach statistical significance after adjusting for mood, nor did they meet criteria for clinically meaningful change. Furthermore, echoing the open-label trial, we found that cognitive gains were largely dissociable from antidepressant response, reinforcing the possibility that psilocybin's effects on cognition may occur independently of mood improvements. However, this dissociation could not be examined in the larger randomized trial, which did not report associations between cognitive and mood outcomes. Our study also adds to this literature by incorporating an earlier, one-day post-treatment assessment. At both the one-day and two-week timepoints, we found no evidence of cognitive deterioration, supporting the short-term safety and tolerability of psilocybin from a cognitive standpoint. 
The one-day post-treatment findings carry important practical implications for clinical implementation, particularly regarding post-dose monitoring protocols and discharge instructions (e.g., limitations on driving, operating heavy machinery). Taken together, these findings suggest that psilocybin does not impair cognitive functioning in the short term and may be associated with subtle improvements in performance. However, given the lack of a placebo control and the small sample size, these results should not be interpreted as evidence of procognitive effects. The durability, clinical relevance, and mechanistic basis of these apparent changes remain uncertain and warrant investigation in adequately powered randomized controlled trials.

STRENGTHS AND LIMITATIONS

Our study has several methodological strengths that advance the emerging literature on psilocybin's effects on cognitive functioning in TRD. First, our cognitive battery, while brief, was broader than those employed in previous psilocybin trials. Specifically, the inclusion of the TMT-A, TMT-B, and the derived executive function index (TMTB-A) allowed for a more nuanced examination of cognitive domains beyond processing speed. Second, we applied a rigorous analytic framework that included LMMs, RCIs, and exploratory regression analyses (see Supplementary Material). This multimodal approach allowed us to evaluate change at both group and individual levels, distinguish true cognitive effects from measurement error and practice effects, and test for associations with antidepressant and antisuicidal outcomes (see Supplementary Material). Third, by modelling reliable change using both normative data and a clinically matched waitlist sample, we enhanced the interpretability of the results. Specifically, normative benchmarks enabled comparison to general population standards, while the clinical reference provided a contextually informed estimate of expected change in TRD. Discrepancies between these models underscore how reference group selection influences conclusions about clinical significance. Despite these strengths, several limitations must be acknowledged. Our small sample size limited statistical power, increased the risk of Type I and Type II errors, and restricted the generalizability of our findings. Our sample was also predominantly white, highly educated, and composed mainly of individuals with a primary diagnosis of MDD, limiting applicability to more diverse clinical populations or individuals with bipolar depression. We also lacked a placebo control group and blinding, which limited our ability to isolate treatment effects from expectancy or other non-specific factors. 
Cognition was evaluated using only two tests, the DSST and the TMT, which assess processing speed and executive functioning. Several important areas, including sustained attention, other aspects of executive function, language, episodic memory, visuospatial abilities, and social cognition, were not evaluated. In addition, the follow-up period was limited to two weeks, preventing conclusions about the durability or long-term impact of cognitive changes post-PAP. The short retest intervals (one-day and two-weeks post-treatment) further limit interpretability, as brief retest intervals can exaggerate practice effects. This limitation is particularly salient given that identical test forms were administered at each timepoint, and repeated exposure to the same stimuli likely amplified learning-related gains. Consequently, a substantial portion of the observed cognitive improvements may reflect practice effects rather than true treatment-related change. An administrative error during DSST testing further limits interpretation, as scores were prorated to adjust for a shorter administration time (90 s instead of 120 s). While this approach allowed for standardization, it assumes a linear rate of symbol substitution over time, which may not accurately capture variations in response speed or fatigue. As a result, the prorated scores introduce uncertainty regarding the validity of DSST performance estimates, particularly when referencing normative data or computing reliable change indices. This adjustment may have inflated or attenuated apparent improvements and should therefore be interpreted with caution. Moreover, while we included validated, clinician-administered assessments of cognition and mood in this analysis, we did not include patient-reported outcome measures (PROMs). 
Clinician-rated scales, while useful for standardized evaluation, may not fully capture the lived experiences or priorities of participants and restrict interpretation of whether observed changes in cognition or mood were experienced as personally valuable or clinically meaningful by participants. Although we attempted to account for practice effects in our models, the unequal number of cognitive assessments between groups (three in the immediate group vs. four in the waitlist group) may have introduced differential learning effects that could not be fully separated from treatment-related change. A further limitation concerns the use of the pre-dose assessment as the cognitive baseline for the waitlist group. Because this was not a true first exposure to the tests, practice effects may have artificially inflated performance scores. Therefore, we may have underestimated the proportion of individuals who were cognitively impaired at baseline. Finally, we did not formally control for several factors thought to influence cognitive functioning in depression, including age of depression onset, educational attainment, MDD subtype, inflammatory status, metabolic comorbidities, treatment history, and childhood adversity.
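The DSST proration noted among the limitations amounts to scaling the raw 90-second score by 120/90. This is a minimal sketch, assuming simple multiplicative scaling; the exact rounding rule used in the study is not stated, and the function name and example score are hypothetical.

```python
STANDARD_SECONDS = 120      # standard DSST time limit
ADMINISTERED_SECONDS = 90   # time limit actually administered


def prorate_dsst(raw_score, administered=ADMINISTERED_SECONDS,
                 standard=STANDARD_SECONDS):
    """Scale a DSST score to the standard time limit, assuming a constant
    substitution rate over the whole interval (the linearity assumption)."""
    return raw_score * standard / administered


print(prorate_dsst(45))  # hypothetical raw score of 45 symbols in 90 s -> 60.0
```

Because the scaling factor is 4/3, any noise in the raw score is amplified by the same factor. In addition, if participants slow down over the final 30 seconds, the prorated value would overestimate a full-length score.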

IMPLICATIONS AND FUTURE DIRECTIONS

These preliminary findings underscore the need for future randomized controlled trials with larger samples, expanded cognitive batteries, and longer-term follow-up. Larger samples would allow for examination of potential moderators such as sex, diagnosis (e.g., MDD vs. BD-II), and physical health factors (e.g., metabolic comorbidities), while extended follow-ups are necessary to determine whether early cognitive changes translate into lasting functional improvements. The underlying mechanisms of psilocybin-related cognitive change also remain unclear. Improvements may reflect direct neuroplastic effects, such as enhanced prefrontal or hippocampal functioning, increased dendritic spine density, upregulation of BDNF, or altered default mode network connectivity. Alternatively, improvements may emerge indirectly through modulation of emotion-related ("hot") cognitive processes. For instance, reductions in rumination, increased psychological flexibility, and enhanced emotional regulation may reduce cognitive load or attentional interference, thereby facilitating improvements in "cold" cognitive domains such as processing speed and executive functioning. Future studies that incorporate neurobiological markers (e.g., functional magnetic resonance imaging, electroencephalography) alongside cognitive assessments could help clarify the mechanisms linking psilocybin to cognitive outcomes. Another key avenue for future research involves clarifying the contribution of psychotherapy to the observed cognitive effects. Given that we administered psilocybin within a structured psychotherapeutic framework, it remains unclear whether improvements in cognitive performance were attributable to psilocybin itself, the therapeutic support provided, or the synergistic interaction between the two. 
While this question cannot be resolved within the current design, existing evidence suggests that psychotherapy alone may have a limited impact on objective cognitive functioning in mood disorders. For example, several randomized controlled trials have failed to demonstrate meaningful improvements across a wide range of neuropsychological domains following evidence-based psychotherapies such as cognitive behavioural therapy, schema therapy, or mindfulness-based cognitive therapy. Although some signal has emerged for improvements in emotional processing or executive functioning, these findings are inconsistent and often limited by small sample sizes, heterogeneous test batteries, and methodological variability. Thus, while psychotherapy likely plays an important role in patient safety, emotional processing, and integration of the psychedelic experience, the contribution of psychotherapy alone to cognitive change remains uncertain. Finally, the observed dissociation between cognitive and mood outcomes suggests these effects may follow partially distinct trajectories, underscoring the importance of assessing cognition as an independent treatment domain in psychedelic research rather than viewing it solely as a secondary endpoint tied to mood response. Although cognitive improvements did not reliably predict antidepressant or antisuicidal outcomes in our sample (see Supplementary Material), they may still hold clinical relevance, particularly if they occur independently of mood change and contribute to functional recovery. Incorporating validated PROMs will be essential for capturing patient perspectives on treatment efficacy, thereby enhancing the interpretability and clinical utility of objective findings. 
Future trials should pre-register cognitive endpoints and apply analytic approaches such as mediation models or latent growth curve modelling to formally test directional hypotheses regarding the role of cognitive functioning in treatment response. These methods will help clarify whether cognitive improvements contribute to, result from, or occur independently of changes in mood and functioning.

CONCLUSION

This exploratory post hoc analysis offers preliminary evidence that PAP may be associated with modest, short-term improvements in performance on tasks of executive function and processing speed among individuals with TRD. However, given the small sample size and absence of a placebo control group, these findings should be interpreted with caution. The observed changes may reflect practice-related gains or other nonspecific factors rather than a direct procognitive effect. Nonetheless, they underscore the importance of incorporating cognitive measures into future controlled trials to better understand whether cognition represents a meaningful and distinct therapeutic target of PAP.

ETHICAL STATEMENT

A community Institutional Review Board (Advarra) approved this trial (Pro00056530; BCDF001). Both written and verbal informed consent were obtained from all participants prior to initiating any study-related procedures. This trial was registered on ClinicalTrials.gov (NCT05029466) prior to starting recruitment.
