Post-acute psychological effects of classical serotonergic psychedelics: A systematic review and meta-analysis
This systematic review and meta-analysis (2020) of the post-acute effects of psychedelics finds large between-group effects (Hedges' g ≈ 1) across several outcome domains, but also flags high risk of bias, heterogeneity across studies, and possible publication bias in current research.
Authors
- Chen, Z.
- Deole, G.
- Goldberg, S. B.
Published
Abstract
“Background: Scientific interest in the therapeutic effects of classical psychedelics has increased in the past two decades. The psychological effects of these substances outside the period of acute intoxication have not been fully characterized. This study aimed to: (1) quantify the effects of psilocybin, ayahuasca, and LSD on psychological outcomes in the post-acute period; (2) test moderators of these effects; and (3) evaluate adverse effects and risk of bias. Methods: We conducted a systematic review and meta-analysis of experimental studies (single-group pre-post or randomized controlled trials) that involved administration of psilocybin, ayahuasca, or LSD to clinical or non-clinical samples and assessed psychological outcomes ≥24 hours postadministration. Effects were summarized by study design, timepoint, and outcome domain. Results: A total of 34 studies (24 unique samples, n = 549, mean longest follow-up = 55.34 weeks) were included. Classical psychedelics showed significant within-group pre-post and between-group placebo-controlled effects on a range of outcomes including targeted symptoms within psychiatric samples, negative and positive affect-related measures, social outcomes, and existential/spiritual outcomes, with large between-group effect in these domains (Hedges’ gs = 0.84 to 1.08). Moderator tests suggest some effects may be larger in clinical samples. Evidence of effects on big five personality traits and mindfulness was weak. There was no evidence of post-acute adverse effects. Conclusions: High risk of bias in several domains, heterogeneity across studies, and indications of publication bias for some models highlight the need for careful, large-scale, placebo-controlled randomized trials.”
Research Summary of 'Post-acute psychological effects of classical serotonergic psychedelics: A systematic review and meta-analysis'
Introduction
Psychedelic substances have a long history of ritual and medicinal use and re-emerged as a focus of scientific research over the past two decades after a hiatus following mid-20th century legislative restrictions. Classical serotonergic psychedelics share a common pharmacology (5-HT2A receptor agonism) and characteristic subjective effects; recent clinical and non-clinical studies have evaluated psilocybin, ayahuasca (DMT with MAO-A inhibitors), and LSD for outcomes ranging from psychiatric symptoms to well-being, spirituality and personality. Narrative reviews and a small number of meta-analyses have suggested potential therapeutic benefit and acceptable short-term safety, but a comprehensive quantitative characterisation of post-acute psychological effects across outcome domains remained lacking. Goldberg and colleagues designed this study to fill that gap. The paper reports a systematic review and meta-analysis of experimental studies (randomised controlled trials and single-group pre–post designs) that administered psilocybin, ayahuasca or LSD in controlled settings and measured psychological outcomes at least 24 hours after administration. The stated aims were to quantify post-acute effects across multiple outcome domains, test study-level moderators (psychedelic type, clinical versus non-clinical samples, presence of behavioural support, percentage female), and evaluate adverse effects and risk of bias across the literature.
Methods
The review followed PRISMA guidance and was preregistered, with some deviations specified by the authors: the analysis was restricted to post-acute outcomes (≥24 hours), moderation by specific diagnoses was avoided due to limited data, and outcomes were aggregated into coherent categories. Eligible studies administered psilocybin, ayahuasca or LSD in experimental (not naturalistic) settings, reported at least one psychological outcome assessed ≥24 hours post-administration, and provided data necessary to compute effect sizes. Both clinical and non-clinical samples and both between-group (including placebo-controlled RCTs and crossovers) and within-group pre–post designs were eligible. Studies that reported only post-treatment data without baseline or appropriate control comparisons were excluded. Six electronic databases were searched (PubMed, CINAHL, PsycINFO, Web of Science, Scopus, Cochrane) for studies from 1990 up to the search window (23–31 October 2019), supplemented by hand-searching recent reviews. Two reviewers independently screened records and extracted data; inter-rater reliability was reported as good to excellent. Extracted items included study design, psychedelic type and dose, control condition, inclusion criteria, adverse events, timing of post-treatment and follow-up assessments, presence of behavioural support (preparation/therapy), sample demographics, country, retention, and items necessary for Cochrane risk-of-bias assessment. Outcomes were grouped into 14 domains (adverse effects; targeted psychiatric symptoms; depression in samples with depression; negative affect; positive affect; social outcomes; behaviour; existential/spiritual outcomes; mindfulness; and the Big Five personality traits: openness, neuroticism, extraversion, agreeableness, conscientiousness). Risk of bias within studies was rated across five Cochrane domains (selection, performance, detection, attrition, reporting). 
Effect sizes were computed using standard meta-analytic approaches: within-group pre–post and pre–follow-up effects were calculated assuming a correlation of r = 0.50 between timepoints when needed; between-group effects were computed either as the difference between within-group effects or as Cohen’s d when pre–post data were not available. Summary statistics were converted to Hedges’ g to correct for small-sample bias and pooled using random-effects models. Heterogeneity was quantified with I² (the percentage of total variability due to between-study heterogeneity). Publication bias was explored with trim-and-fill and fail-safe N calculations, although the authors treated these tests as exploratory due to small numbers in some models. Four moderators were pre-specified: psychedelic type (psilocybin versus LSD/ayahuasca), clinical sample status, presence of behavioural support, and percentage female. Sensitivity analyses excluded identified outliers using a function that flags studies whose confidence intervals do not overlap the omnibus effect CI.
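The conversion from Cohen's d to Hedges' g mentioned above can be illustrated with a minimal Python sketch. It uses the common approximate correction factor J = 1 - 3 / (4·df - 1); the function name `hedges_g` and the example values are hypothetical, and the paper's own R code may have used a different (for example exact) correction.

```python
def hedges_g(d: float, df: int) -> float:
    """Shrink Cohen's d toward zero with the approximate small-sample factor J.

    df is the degrees of freedom of the design, e.g. n1 + n2 - 2 for two
    independent groups or n - 1 for a single-group pre-post design.
    """
    if df <= 1:
        raise ValueError("df must be greater than 1")
    j = 1 - 3 / (4 * df - 1)  # approximate correction factor
    return d * j


# Hypothetical example: d = 1.0 from a single-group study with n = 23 (df = 22).
g = hedges_g(1.0, 22)
print(round(g, 4))
```

The correction matters most for the small samples typical of this literature: with df = 22 the estimate shrinks by about 3.4%, and the shrinkage vanishes as df grows.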
Results
Study selection yielded 14,591 citations; after deduplication and screening, 34 reports representing 24 unique samples and 549 participants were retained (studies published 2006–2020). Half of the studies used single-group pre–post designs (50.0%), with the remainder being within-subject RCTs (crossovers; 16.7%) or between-group RCTs (33.3%). Psilocybin was the most studied agent (58.3% of studies), with ayahuasca in 25.0% and LSD in 16.7%. The average timing of the first post-test was 5.54 weeks (SD = 6.48; range 0–26 weeks); 54.2% of studies included at least one follow-up, with the last follow-up on average 53.34 weeks post-treatment (SD = 64.25; range 3–234.9). Mean sample size was 22.88 (SD = 17.42; range 6–85), mean age 42.13 years, and samples were 51.5% female. Just under half of the studies (45.8%) recruited clinical samples, most commonly depression (k = 4) and life‑threatening illness with comorbid anxiety/depression (k = 3). Risk-of-bias assessments varied by design, with single-group studies lacking randomisation and blinding features. Domains most at risk were blinding of participants/personnel and blinding of outcome assessment; selective reporting was often rated unclear. Concerning adverse events, 79.2% of studies reported on adverse effects; none reported serious adverse events such as death or hospitalisation. Transient effects commonly reported included headache, anxiety, nausea and increased blood pressure. Where longer-term adverse measures were available, meta-analytic within-group effects suggested reductions in adverse symptoms at post-treatment and follow-up (Hedges’ g = 0.40 and 0.50, respectively); note that a positive g was coded to indicate improvement (here, a reduction in adverse effects). Within-group meta-analyses showed statistically significant improvements across multiple domains at both post-treatment and follow-up. 
Domains with consistent beneficial effects included targeted psychiatric symptoms in clinical samples, depression in depressed samples, negative affect, positive affect, social outcomes, and existential/spiritual outcomes; within-group Hedges’ g estimates ranged approximately from 0.44 (positive affect) up to 2.06 (depression in depressed samples). Behaviour and mindfulness showed improvements at post-treatment though follow-up estimates were not always available. Most Big Five personality traits did not change, with the exception of a small increase in openness. Heterogeneity tended to be substantial (I² > 50% for many models). Between-group (placebo-controlled) analyses at the longest follow-up likewise favoured psychedelics, with moderate-to-large effects for targeted psychiatric symptoms, negative affect, positive affect, social outcomes, behaviour, and existential/spiritual outcomes; between-group Hedges’ g values reported ranged from 0.84 to 1.16. No reliable between-group effects were observed for personality dimensions. Again, heterogeneity was generally high. Across-study bias assessments detected funnel-plot asymmetry in eight models; trim-and-fill adjustments left most statistically significant findings intact, but one model (within-group pre–post social outcomes) became non-significant after adjustment (adjusted g = 0.43). Fail-safe Ns varied from 0 to 803; applying Rosenberg’s guideline (fail-safe N > 5n + 10) indicated that some effects (within-group adverse effects, social outcomes, openness, mindfulness, and between-group behaviour) were not robust to potential publication bias. Moderator analyses were constrained by limited numbers of studies for many outcomes. Clinical samples tended to show larger improvements on some outcomes (negative affect, positive affect, adverse effects, existential/spiritual outcomes, extraversion). 
Psychedelic type generally did not moderate effects, except psilocybin produced larger within-group pre–post increases in mindfulness. Presence of behavioural support did not meaningfully moderate effects, and percentage female was not a consistent moderator (one exception: higher female percentage associated with smaller increases in extraversion at pre‑follow‑up). Sensitivity analyses excluding outliers did not change significance and effect-size changes were small (Δg ≤ 0.26).
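The fail-safe N guideline cited in the Results (an effect is treated as robust when fail-safe N > 5n + 10) is simple enough to express directly. The sketch below assumes that n is the number of studies in the model; the function name and the study counts in the example are hypothetical and are not taken from the paper's tables.

```python
def robust_to_publication_bias(fail_safe_n: float, n_studies: int) -> bool:
    """Return True when the fail-safe N exceeds the 5n + 10 threshold."""
    threshold = 5 * n_studies + 10
    return fail_safe_n > threshold


# Hypothetical models with 10 studies each (threshold = 60).
print(robust_to_publication_bias(803, 10))  # far above the threshold
print(robust_to_publication_bias(0, 10))    # below the threshold
```

Because the threshold grows with the number of studies, a model built on only two or three studies can pass with a small fail-safe N, which is one reason the authors treated these checks as exploratory.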
Discussion
Goldberg and colleagues present what they describe as the first comprehensive meta-analysis focusing on post-acute psychological effects of classical serotonergic psychedelics. Despite a modest evidence base (k = 34 reports, 24 unique samples, n = 549), the study found moderate-to-large improvements in a range of psychological domains. Most notable was a large between-group effect for targeted psychiatric symptoms in clinical samples (Hedges’ g = 1.08), which the authors contextualise as comparable to or larger than effect sizes often reported for psychotherapy versus waitlist or for antidepressants versus placebo, and robust to publication-bias adjustment and outlier exclusion. Psychedelics also showed favourable between-group effects on affective, social, behavioural, and existential/spiritual measures, while evidence for enduring changes in personality and mindfulness was weaker and more sensitive to publication-bias considerations. The authors emphasise important caveats. Risk of bias within included studies was common, especially for blinding and selective reporting, and many studies used single‑group designs without randomisation. Difficulty blinding psychoactive interventions is a prominent concern for causal attribution. Small sample sizes, heterogeneity in doses, provision of behavioural support, and other design features limited the precision and generalisability of pooled estimates. The participant pool lacked racial/ethnic diversity in many studies and selection processes may have introduced expectancy or other sampling biases. Trim-and-fill and fail-safe N analyses suggested some models were vulnerable to publication bias, and the authors note that many studies were not preregistered or did not explicitly report intention‑to‑treat analyses, leaving attrition and selective reporting as addressable sources of bias. 
In terms of implications, the investigators argue the results support further, larger-scale, carefully controlled randomised trials to clarify efficacy for specific clinical conditions (notably depression and anxiety) and to define the role and necessary dose of accompanying behavioural support. They recommend extended follow-up, inclusion of populations formerly excluded from trials (for safety evaluation), naturalistic and population-based designs to examine real-world safety and persistence of effects, and more consistent preregistration to reduce selective reporting. While no clear evidence of persistent adverse effects emerged in the analysed studies, the authors call for vigilant adverse-event monitoring in future work, especially when broadening inclusion criteria beyond the relatively screened populations represented in the current literature.
INTRODUCTION
Humans have intentionally consumed psychoactive substances for thousands of years. Psychedelic substances, in particular, figure prominently in indigenous medical and religious practices around the world. Scientific interest during the 1950s and 1960s in the therapeutic potential of both plant-based psychedelics (e.g., psilocybin) and synthetic psychedelics (e.g., lysergic acid diethylamide) largely ceased following legislative changes during the 1970s and 1980s. Research has resumed in the past two decades. While early work in this contemporary period focused on pharmacokinetics or the use of psychedelics as a model for psychiatric conditions, a growing number of studies are again evaluating the therapeutic potential of psychedelics. Classical psychedelics are a class of psychoactive substances that share both mode of action (agonism of the 5-HT2A receptor; Carhart-Harris, 2019) and psychoactive effects (marked cognitive, affective, and perceptual changes). Members of this class that have received recent scientific attention include psilocybin, ayahuasca, and LSD. Psilocybin (4-phosphoryloxy-N,N-dimethyltryptamine) is a naturally occurring plant alkaloid used ritualistically for spiritual and healing purposes by indigenous cultures in Mexico and South America. Ayahuasca is a plant-based serotonergic psychedelic also used ritualistically by indigenous cultures in South America. The psychoactive effects of ayahuasca are due to N,N-dimethyltryptamine (DMT) coupled with reversible monoamine oxidase inhibitors (MAO-A). LSD is a synthetic psychedelic first synthesized in 1943 by Albert Hofmann that is both a serotonin and dopamine receptor agonist. Numerous studies in the 1960s investigated the therapeutic effects of LSD for the treatment of addiction and other clinical applications (e.g., end-of-life distress). Research halted as LSD became associated with the countercultural revolution of the late 1960s coupled with concerns regarding its safety. 
Studies have begun reexamining the therapeutic potential of classical psychedelics for clinical conditions including depression, anxiety, and substance use. Often psychedelics are paired with behavioral interventions intended to maximize benefits by enhancing the mental "set" and physical "setting". Other studies have examined effects in nonclinical samples on measures of well-being, personality, and associated constructs (e.g., mindfulness, spirituality). Several systematic reviews have examined the safety and efficacy of psychedelics for both clinical and non-clinical populations. These narrative reviews consistently suggest psychedelics can be safely administered (i.e., adverse effects are minimal and transient) and may reduce depression and anxiety symptoms, provide psychological benefits in the context of life-threatening disease, and induce mystical experiences associated with enduring changes in personality and attitudes. Despite several well-conducted systematic reviews, only two quantitative reviews (i.e., meta-analyses) have characterized the efficacy of psychedelics. The first meta-analyzed six randomized controlled trials published between 1966 and 1970 testing LSD for alcoholism, finding LSD substantially reduced substance misuse (odds ratio = 1.96). The second found that psilocybin was associated with large reductions in depression and anxiety across four recent studies (Hedges' gs = 0.82 to 1.47). The available reviews suggest psychedelics may have therapeutic potential. Yet, a clear quantitative depiction of the breadth of this literature is lacking. A comprehensive meta-analysis would be valuable for characterizing the magnitude and variability (i.e., heterogeneity) of the effect of psychedelics across psychological outcomes, including but not limited to psychiatric symptoms. Such a meta-analysis would be particularly valuable for clarifying effects that have been inconsistent in prior studies (e.g., effects on personality). 
The small sample size in many primary studies (e.g., mean n = 29.25) also recommends the use of meta-analysis, which allows aggregation across studies. Lastly, meta-analysis offers the opportunity to examine whether various study-level features (e.g., psychedelic type, behavioral support) moderate effects. The current study sought to address this gap in the literature by quantitatively synthesizing psychological effects from experimental studies testing psilocybin, ayahuasca, or LSD. We focus on these three substances due to their shared mechanism of action (5-HT2A receptor agonism) and subjective effects. Other psychoactive compounds that produce partially overlapping effects through partially overlapping mechanisms were not considered (e.g., entactogens such as 3,4-methylenedioxymethamphetamine). Given our interest in therapeutic applications, we focus on effects outside of the acute period of intoxication. To provide the most comprehensive depiction, we included studies with either clinical or non-clinical (i.e., healthy) samples. Likewise, we included both between-group (e.g., RCTs) and within-group (e.g., pre-post) designs. Four study-level characteristics (psychedelic type, clinical sample, presence of behavioral support, percentage female) were examined as moderators. We also assess adverse effects and risk of bias within and between studies.
METHOD
PROTOCOL AND REGISTRATION
We followed the PRISMA guidelines. This meta-analysis was preregistered through the Open Science Framework. Upon reviewing the available studies, we made several deviations. First, we restricted our focus to post-acute effects given that the acute hallucinogenic effects have been well characterized and are less relevant for therapeutic purposes. Second, there were insufficient studies to test moderation by specific clinical condition (e.g., depression vs. anxiety disorders). Instead, we report results restricted to clinical samples and to samples with depression. Third, no waitlist control conditions were available to compare with placebo-controlled studies. Fourth, we aggregated outcomes into conceptually coherent categories based on measures reported across studies. This led to the addition of some categories (e.g., adverse effects) and exclusion of some that were rarely reported (e.g., substance use).
ELIGIBILITY CRITERIA
Eligible studies involved the administration of psilocybin, ayahuasca, or LSD within an experimental setting (i.e., not a naturalistic setting). Studies were required to report at least one psychological outcome. We maintained a broad definition of psychological to include psychiatric symptoms as well as non-clinical measures (e.g., well-being, spirituality). However, measures primarily focused on the acute psychedelic experience itself (e.g., altered states of consciousness; Studerus, Gamma, & Vollenweider, 2010) were excluded. Outcomes were assessed outside of the period of acute intoxication, which we operationalized as ≥ 24 hours post-administration of the psychedelic, consistent with prior studies. Studies with and without behavioral support were eligible. Both single-group (e.g., within-group pre-post) and between-group designs (e.g., placebo-controlled RCT) were eligible. Both clinical and nonclinical samples were eligible. No restriction was placed on language or publication status. Studies were excluded if they were missing data necessary for computing effect sizes. Studies that only reported post-treatment data without a baseline measurement or a relevant control group (e.g., persisting effects at post-treatment for a single-group design) were excluded. Principal investigators of completed clinical trials were contacted regarding available results.
INFORMATION SOURCES
We searched six databases: PubMed, CINAHL, PsycINFO, Web of Science, Scopus, and Cochrane. We restricted our search to studies from the contemporary period of psychedelic research (1990 or later). This window captured the period when research on classical psychedelics resumed and excluded earlier studies conducted under sufficiently different methodological standards such that safety and efficacy data may not be interpretable. The search was conducted between October 23rd and 31st, 2019. In addition, we hand searched recent systematic reviews.
SEARCH
We paired search terms associated with the three psychedelics of interest (e.g., "psilocybin," "ayahuasca," "LSD," "psychedelic*") with terms related to both clinical (e.g., "mental disorders," "depression," "anx*") and non-clinical populations (e.g., "well-being," "quality of life," "healthy"). The full search terms for all six databases are shown in Supplemental Materials Table.
STUDY SELECTION
Two authors independently reviewed each title and/or abstract of potential studies for inclusion. Full texts were reviewed for studies that passed initial screening. Disagreements were discussed with the first author until consensus was reached.
DATA COLLECTION PROCESS
Standardized spreadsheets were developed for study- and effect size-level coding. The first and second authors independently extracted data. Inter-rater reliabilities were good to excellent.
DATA ITEMS
In addition to data necessary for computing effect sizes (e.g., sample sizes, means, standard deviations), we extracted: (1) study design, (2) psychedelic type, dose, and control condition, (3) inclusion criteria, (4) adverse events, (5) post-treatment and follow-up timing, (6) behavioral support, (7) sample age and sex composition, (8) country, and (9) retention. We also extracted data necessary for coding risk of bias with the Cochrane tool. Outcomes were grouped into categories that were intended to be both parsimonious and conceptually coherent. This yielded 14 categories: adverse effects (i.e., symptoms potentially associated with negative drug effects such as psychotic symptoms or mania), targeted symptoms of psychiatric disorders (e.g., alcohol use for samples with alcohol use disorder), depression for samples with depression (as this was the most common psychiatric disorder studied), negative affect-related outcomes (e.g., negative mood, anxiety), positive affect-related outcomes (e.g., joy), social outcomes (e.g., altruism), behavior (e.g., observer-rated behavior change), existential and spiritual outcomes (e.g., death transcendence, lifetime mystical experience), mindfulness, and the big five personality traits (i.e., openness, neuroticism, extraversion, agreeableness, conscientiousness).
RISK OF BIAS IN INDIVIDUAL STUDIES
Risk of bias was evaluated using the Cochrane tool. Bias was assessed across five domains: selection bias (random sequence generation, allocation concealment), performance bias (blinding of participants and personnel), detection bias (blinding of outcome assessors), attrition bias (incomplete outcome data), and reporting bias (selective reporting). For each study, an evaluation of low, high, or unclear risk of bias was made.
SUMMARY MEASURES
Effect sizes in standardized units were calculated using standard meta-analytic methods. Specifically, a within-group pre-post and pre-follow-up d was computed for all studies providing eligible data. The pre-post effect used baseline and the first available data collected post-treatment. To provide the most conservative estimate of effects at follow-up, pre-follow-up effects used data from the last available follow-up. For within-group effects, we assumed a correlation of rxx = .50 between timepoints. For controlled studies, a between-group effect size was also computed. When pre-post data were available for both the treatment and control conditions, within-group effects were computed for each group separately. Then, the between-group effect was computed as the difference between within-group effects (i.e., del). This effect size has the advantage of accounting for baseline data. When within-group effects were not available (e.g., outcomes like persisting effects assessed only at post-treatment), a between-group Cohen's d was computed. To provide the most conservative estimate of controlled effects, we used data from the last available follow-up timepoint. For randomized controlled cross-over designs in which both groups ultimately received the active treatment, we used data from the last timepoint prior to cross-over. For within-person RCTs that included multiple dosages, we compared the placebo condition with the highest dose condition. In order to decrease the influence of selective reporting bias, we attempted to represent all outcome measures that were assessed. Authors were contacted regarding measures described in the Method section but not included in the Results section. When data remained missing at the time of analysis, we represented effects described in the text as non-significant as d = 0.00. Authors were also contacted when adverse effects were not mentioned in the published report.
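As a rough illustration of the two effect sizes described above, the following Python sketch computes a within-group standardized mean change and then a between-group effect as the difference of two within-group effects. Several details are assumptions not stated in the text: the change is standardized by the baseline SD, the sampling variance uses the common approximation 2(1 - r)/n + d²/(2n), and the two groups are treated as independent so their variances add. All function names and numbers are hypothetical.

```python
def within_group_d(mean_pre, mean_post, sd_pre, n, r=0.5):
    """Standardized mean change (post minus pre, in baseline SD units).

    r is the assumed pre-post correlation (0.50 in the review).
    Returns the effect and its approximate sampling variance.
    """
    d = (mean_post - mean_pre) / sd_pre
    var = 2 * (1 - r) / n + d ** 2 / (2 * n)
    return d, var


def between_group_d(treatment, control):
    """Difference between two within-group effects from independent groups."""
    d = treatment[0] - control[0]
    var = treatment[1] + control[1]  # variances add for independent groups
    return d, var


# Hypothetical depression scores (lower = better), 12 participants per arm.
treat = within_group_d(mean_pre=20, mean_post=10, sd_pre=5, n=12)
ctrl = within_group_d(mean_pre=20, mean_post=18, sd_pre=5, n=12)
d_between, var_between = between_group_d(treat, ctrl)

# Reverse the sign so that a positive value indicates improvement.
print(round(-d_between, 2), round(var_between, 2))
```

Because the control arm's own pre-post change is subtracted, this effect size credits the treatment only with improvement beyond what the placebo group showed, which is the sense in which it accounts for baseline data.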
SYNTHESIS OF RESULTS
Using standard meta-analytic methods, effects were aggregated first within measure (e.g., subscales of the Depression Anxiety and Stress Scale) and then within study using the 'MAd' package in R. As noted previously, separate analyses examined effects for specific outcome domains. Meta-analytic effect sizes with associated 95% confidence intervals were computed when at least two studies were available for a specific estimate. Summary effects were converted from Cohen's d to Hedges' g in order to account for small sample bias. As appropriate, the sign for each effect was reversed so that a positive g always indicated improvement (e.g., decreased depression, increased well-being). Magnitude was interpreted based on established guidelines. Separate aggregate effect size estimates were computed for within-group effects at post-treatment and follow-up and for between-group effects at last available post-treatment assessment. Heterogeneity was characterized using I² (i.e., the proportion of total variability that is due to between-study heterogeneity) and interpreted based on established guidelines. Random effects models with weighting based on the inverse of the variance of each study's effect size were implemented through the 'metafor' package.
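The inverse-variance random-effects pooling and the I² statistic described above can be sketched in a few lines of Python. This is not the 'metafor' implementation: for simplicity it uses the DerSimonian-Laird estimator of the between-study variance, whereas the estimator used in the paper is not stated in the text. The function name and the example effect sizes are hypothetical.

```python
import math


def random_effects(effects, variances):
    """Pool effects with a DerSimonian-Laird random-effects model.

    Returns (pooled effect, standard error, tau^2, I^2 in percent).
    """
    k = len(effects)
    w = [1 / v for v in variances]                      # fixed-effect weights
    sw = sum(w)
    fixed = sum(wi * yi for wi, yi in zip(w, effects)) / sw
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, effects))
    df = k - 1
    c = sw - sum(wi ** 2 for wi in w) / sw
    tau2 = max(0.0, (q - df) / c)                       # between-study variance
    i2 = max(0.0, (q - df) / q) * 100 if q > 0 else 0.0
    w_re = [1 / (v + tau2) for v in variances]          # random-effects weights
    pooled = sum(wi * yi for wi, yi in zip(w_re, effects)) / sum(w_re)
    se = math.sqrt(1 / sum(w_re))
    return pooled, se, tau2, i2


# Three hypothetical studies with equal precision but very different effects.
pooled, se, tau2, i2 = random_effects([0.2, 0.8, 1.4], [0.04, 0.04, 0.04])
print(round(pooled, 2), round(se, 3), round(tau2, 2), round(i2, 1))
```

When studies disagree this much, tau² inflates every study's variance, so the pooled confidence interval widens and I² approaches the high values (> 50%) reported for many models in the review.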
RISK OF BIAS ACROSS STUDIES
We assessed publication bias using trim-and-fill analyses in the 'metafor' package. When funnel plot asymmetry was detected, an adjusted effect size was computed with studies imputed to account for asymmetry. Due to the small number of studies in some analyses, which limits statistical power, these tests were considered exploratory. In addition, we calculated the fail-safe Ns to represent the number of non-significant results that would need to exist to nullify an observed effect.
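To make the fail-safe N idea concrete, here is a minimal Python sketch of one classic variant, Rosenthal's method: it asks how many unpublished studies averaging a null result would be needed to push the combined one-tailed p-value above alpha. The text does not say which variant the authors used, so this choice is an assumption, and the function name and inputs are hypothetical.

```python
import math
from statistics import NormalDist


def rosenthal_fail_safe_n(effects, variances, alpha=0.05):
    """Rosenthal's fail-safe N from study effects and sampling variances."""
    z = [y / math.sqrt(v) for y, v in zip(effects, variances)]  # per-study z
    z_crit = NormalDist().inv_cdf(1 - alpha)                     # one-tailed
    n = sum(z) ** 2 / z_crit ** 2 - len(z)
    return max(0.0, n)


# Four hypothetical studies, each with g = 0.5 and SE = 0.25 (z = 2).
print(round(rosenthal_fail_safe_n([0.5] * 4, [0.0625] * 4), 1))
```

A value near zero, as reported for some models in the review, means that even a handful of unpublished null results could overturn the finding.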
ADDITIONAL ANALYSES
We tested four study-level characteristics as moderators. These included the psychedelic type (coded as 1 = psilocybin, 0 = LSD or ayahuasca), whether the sample was clinical (i.e., required elevated symptoms of a medical/psychiatric diagnosis for inclusion) or non-clinical (i.e., healthy controls), whether behavioral support was provided (e.g., pre-treatment preparation), and percentage female. Psilocybin was compared with LSD or ayahuasca as the majority of studies investigated psilocybin (k = 14). Insufficient studies were available to adequately compare psilocybin with LSD (k = 4) and ayahuasca (k = 6) separately, or LSD and ayahuasca with each other. We also conducted sensitivity analyses with outliers excluded. There are several methods for identifying outliers in meta-analysis. We used the 'find.outliers' function, which defines an outlier as a study whose confidence interval does not overlap the omnibus effect confidence interval.
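The outlier rule described above (a study is an outlier when its confidence interval does not overlap the pooled confidence interval) can be sketched as follows. This Python function is a hypothetical stand-in written for illustration, not the R 'find.outliers' function itself, and the example data are invented.

```python
def flag_outliers(effects, std_errors, pooled_ci, z=1.96):
    """Return indices of studies whose 95% CI lies entirely outside pooled_ci."""
    pooled_low, pooled_high = pooled_ci
    flagged = []
    for i, (y, se) in enumerate(zip(effects, std_errors)):
        low, high = y - z * se, y + z * se
        # No overlap: the study CI is wholly below or wholly above the pooled CI.
        if high < pooled_low or low > pooled_high:
            flagged.append(i)
    return flagged


# Hypothetical studies; the third has a very large effect.
print(flag_outliers([0.5, 0.6, 3.0], [0.2, 0.2, 0.3], pooled_ci=(0.4, 1.2)))
```

A sensitivity analysis would then refit the model without the flagged studies and compare the pooled estimate with the original, as the review does when reporting changes in g after outlier exclusion.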
STUDY SELECTION
Our search produced a total of 14,591 citations. After removing 4,540 duplicates, 10,051 unique titles and/or abstracts were reviewed. After applying our exclusion criteria (Figure), we retained 34 studies representing 24 unique samples and 549 participants (see Supplemental Materials Table for a list of the 34 studies). Studies were published between 2006 and 2020.
STUDY CHARACTERISTICS
Study-level characteristics are reported in Table. Half of the studies used single-group pre-post designs (50.0%) with the remainder being within-group RCTs (i.e., participants received all conditions in random order; 16.7%), or between-group RCTs (33.3%). The majority of studies tested psilocybin (58.3%) with 25.0% testing ayahuasca and 16.7% testing LSD. Dosages of each psychedelic and placebo control conditions are listed in Supplemental Materials Table. Post-test assessment occurred on average at 5.54 weeks post-treatment (SD=6.48, range=0 to 26.00). Most studies (54.2%) included a follow-up assessment. For studies with a follow-up assessment, last follow-up occurred on average 53.34 weeks (SD=64.25) post-treatment (range=3 to 234.90). Retention at post-treatment was 94.5% (SD=10.0) and 85.6% (SD=16.9) at follow-up. Sample sizes were generally small, on average 22.88 participants (SD=17.42, range=6 to 85). Mean age was 42.13 years old and the samples were 51.5% female. Among the studies that reported race/ethnicity (37.5% of studies), 74.6% were non-Hispanic white or Caucasian. Studies were conducted in the US (45.8%), Europe (41.7%), and Brazil (12.5%). Approximately half of the studies (45.8%) included participants with clinical conditions. The most common clinical condition was depression (k=4). Other clinical conditions included cancer/life-threatening diseases with comorbid anxiety and/or depression (k=3), alcohol dependence (k=1), smoking (k=1), and AIDS (k=1).
RISK OF BIAS WITHIN STUDIES
Risk of bias varied, often based on whether a single-group design was used (Supplemental Materials Table). Single-group designs lacked randomization and other features (e.g., blinding) that increase confidence that effects are associated with the active treatment. Risk of bias also varied across domains (Figure). Blinding of participants and personnel and blinding of outcome assessment were the domains most at risk for bias. Selective reporting bias was commonly rated as unclear due to difficulty determining whether the reported outcomes were planned.
RESULTS OF INDIVIDUAL STUDIES
Effect size-level data are reported by study, domain, timepoint, and design in Supplemental Materials Table. The outcome measures included across studies are listed in Supplemental Materials Table along with their corresponding domain.
SYNTHESIS OF RESULTS
Adverse effects. Adverse effects were available for 79.2% of studies (Supplemental Materials Table). Among those reporting adverse effects, none reported serious adverse effects (e.g., death, hospitalization). Commonly reported transient adverse effects included headache, anxiety, nausea, and increased blood pressure. Several studies (29.2%) also included measures of longer-term adverse effects that could be used to quantify the magnitude of these effects (e.g., psychotic symptoms, mania, persisting negative effects; see Supplemental Materials Table). There was no evidence that psychedelics increased risk for adverse effects. In fact, within-group effects suggested decreased adverse effects at post-treatment and follow-up (gs=0.40 and 0.50, respectively; Table). As noted above, a positive effect size indicates a reduction in adverse effects. Heterogeneity was low for within-group pre-post comparisons but moderate to high for within-group pre-follow-up and between-group comparisons.
Within-group effects. Psychedelics showed statistically significant within-group improvements across several outcome domains at both post-treatment and follow-up (Table, Figure). Domains showing beneficial effects included targeted symptoms within psychiatric samples, depression within samples with depression, negative affect, positive affect, social outcomes, and existential/spiritual outcomes. Associated effect sizes ranged from gs=0.44 (positive affect) to 2.06 (depression) and were fairly similar in magnitude at post-treatment and follow-up. Psychedelics showed improvements in behavior and mindfulness at post-treatment, although estimates were not available at follow-up. Psychedelics were not associated with changes in big five personality dimensions, with the exception of openness, which showed a small increase. Heterogeneity was generally high (I² > 50%).
Between-group effects. Moderate to large and statistically significant between-group effects favored psychedelics relative to placebo controls across several outcome domains at longest follow-up. These included targeted symptoms within psychiatric samples, negative affect, positive affect, social outcomes, behavior, and existential/spiritual outcomes. Effect sizes ranged from gs=0.84 to 1.16. There was no evidence of between-group effects on personality. Heterogeneity was generally high (I² > 50%).
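For readers less familiar with the two statistics quoted throughout this section, the following is a minimal sketch of the standard textbook formulas for a between-group Hedges' g (a standardized mean difference with a small-sample correction) and for I² (the share of variability in effects attributed to heterogeneity rather than sampling error). The review does not state which exact estimators were used, and within-group effects would require a different variance formula, so this is an illustration under stated assumptions and not the authors' procedure. Function names and all input numbers are hypothetical.

```python
import math


def hedges_g(mean_tx, mean_ctrl, sd_tx, sd_ctrl, n_tx, n_ctrl):
    """Between-group standardized mean difference with small-sample correction."""
    df = n_tx + n_ctrl - 2
    pooled_sd = math.sqrt(
        ((n_tx - 1) * sd_tx ** 2 + (n_ctrl - 1) * sd_ctrl ** 2) / df
    )
    d = (mean_tx - mean_ctrl) / pooled_sd
    correction = 1 - 3 / (4 * df - 1)  # common approximation to the correction factor
    return d * correction


def i_squared(effects, variances):
    """I² in percent, computed from Cochran's Q with inverse-variance weights."""
    weights = [1 / v for v in variances]
    pooled = sum(w * e for w, e in zip(weights, effects)) / sum(weights)
    q = sum(w * (e - pooled) ** 2 for w, e in zip(weights, effects))
    df = len(effects) - 1
    if q <= 0:
        return 0.0
    return max(0.0, (q - df) / q) * 100


# Hypothetical inputs, chosen only to exercise the functions.
g = hedges_g(mean_tx=10, mean_ctrl=0, sd_tx=10, sd_ctrl=10, n_tx=10, n_ctrl=10)
heterogeneity = i_squared(effects=[0.0, 2.0], variances=[0.1, 0.1])
print(f"g = {g:.3f}, I^2 = {heterogeneity:.1f}%")
```

The sign of g depends on how each outcome is coded; in this review, scores are oriented so that a positive value indicates improvement (for example, a reduction in adverse effects).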
RISK OF BIAS ACROSS STUDIES
There was evidence of funnel plot asymmetry (i.e., publication bias) in eight models (Table). Statistical significance was not impacted by trim-and-fill adjustment for this asymmetry, with one exception: the within-group pre-post effect on social outcomes, which became non-significant (g=0.43). Fail-safe Ns ranged from 0 to 803. Based on Rosenberg's guidelines (i.e., fail-safe N>5n + 10, where n=number of published studies), within-group effects on adverse effects, social outcomes, openness, and mindfulness as well as between-group effects on behavior were not robust against publication bias.
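The robustness rule quoted above (fail-safe N > 5n + 10) is simple enough to express directly. The sketch below is a minimal illustration; the function names are hypothetical, and the study counts in the usage lines are made up, because this section does not report the number of studies behind each model. Only the fail-safe values 0 and 803 are taken from the reported range.

```python
def failsafe_threshold(n_studies):
    """Minimum value the fail-safe N must exceed under the 5n + 10 rule."""
    return 5 * n_studies + 10


def is_robust(failsafe_n, n_studies):
    """True if the fail-safe N exceeds the 5n + 10 threshold."""
    return failsafe_n > failsafe_threshold(n_studies)


# Hypothetical study counts; only the fail-safe Ns come from the reported range.
print(is_robust(failsafe_n=0, n_studies=3))
print(is_robust(failsafe_n=803, n_studies=10))
```

The threshold grows with the number of studies, so a model that pools more studies needs a larger fail-safe N to count as robust.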
ADDITIONAL ANALYSES
Due to insufficient studies, not all moderators could be tested for all models (see Supplemental Materials Table). Clinical samples were associated with larger improvements for some comparisons in the domains of negative affect, positive affect, adverse effects, existential/spiritual outcomes, and extraversion. Psychedelic type did not moderate effects, with the exception of within-group pre-post effects on mindfulness, for which psilocybin produced larger increases. Presence of behavioral support did not moderate effects. Percentage female did not moderate effects, with the exception of within-group pre-follow-up effects on extraversion, for which a higher percentage female was associated with smaller increases. Models with outliers removed are reported in Supplemental Materials Table. No significance tests changed as a result of removing outliers, and effect sizes were similar in magnitude (change in g≤0.26).
DISCUSSION
To our knowledge, this is the first comprehensive meta-analysis of experimental studies testing the post-acute effects of psychedelics. Although based on a relatively small number of studies and participants (k=34 studies and 24 unique samples, n=549), results suggest psychedelics may produce beneficial effects. Most relevant for psychiatric samples, large and statistically significant effects were detected for targeted symptoms (g=1.08) when psychedelics were compared with placebo controls in RCTs. As points of comparison, this effect is on par with or larger than that achieved by psychotherapy relative to waitlist (e.g., d=0.80) and antidepressants relative to placebo (e.g., ds=0.42 to 0.17). Moreover, this effect appears robust to publication bias and not influenced by outliers. Psychedelics also compared favorably with placebo controls on measures related to negative and positive affect; on measures of social, behavior, and existential/spiritual outcomes; and on depression in samples with depression (although the effect on behavior was not robust to fail-safe N). The superiority over placebo controls supports the possibility of specific effects; however, this conclusion is necessarily uncertain given the difficulty of blinding psychedelics. Within-group effects were similar in magnitude and statistical significance, and support the notion that beneficial effects may persist at follow-up. Although adverse effects were not available for 20.8% of studies, effects reported were transient and no serious adverse events occurred. Quantitative assessment of longer-term adverse effects similarly suggests that transient psychological effects do not typically remain elevated during the post-acute period and may even reduce in some instances. Evidence supporting the effects of psychedelics on personality and mindfulness was less compelling and less robust to tests of publication bias.
Due to the limited number of studies and variation across studies in design features, we were limited in our ability to test moderators. Nonetheless, it appears that some effects may be larger for clinical samples. Psychedelic type, presence of behavioral support, and percentage female generally did not moderate effects, although confounding with other design characteristics (e.g., amount of behavioral support, clinical sample) makes these null findings tenuous. It does appear that moderate to large reductions in psychiatric symptoms have been achieved in studies testing psilocybin with relatively little behavioral support (e.g., one to three sessions). Future clinical trials and meta-analyses should clarify the requisite dosage of behavioral support. Although the most comprehensive quantitative review to date, our study remained limited in sample size and associated statistical power. Indeed, the sample available in the entire literature reviewed (n=549) is considerably smaller than that from large-scale RCTs (e.g., n=952 in Project MATCH; Project Match Research). This highlights the inherent uncertainty in conclusions drawn. An additional complication is the degree to which generalizations can be made from the individuals who chose to participate in the available experimental studies, given psychedelics remain Schedule I substances in most study locations. While selection bias may have produced inflated effect size estimates (e.g., selecting individuals most open to the possibility of change through psychedelic treatments, higher expectancy), some studies included healthy controls with previous use of psychedelics, which could have created ceiling effects (i.e., therapeutic effects were achieved at baseline through prior use). A relatively modest amount of racial/ethnic diversity and a lack of reporting on sample race/ethnicity in the available studies are further important limitations that must be addressed.
While we attempted to aggregate effects in conceptually coherent ways, there remained methodological heterogeneity (e.g., psychedelic dose, provision of behavioral support) that was either not modeled or tested in underpowered ways. This makes it impossible to provide recommendations regarding the specific treatment characteristics most strongly linked to beneficial effects. Similarly, although results generally did not change when accounting for publication bias, trim-and-fill analyses were also likely underpowered. A broader, potentially more pernicious limitation is risk of bias within the available studies. As noted, obviously psychoactive substances may be particularly difficult to adequately double blind. However, several studies included features that may increase the strength of the placebo condition (e.g., using methylphenidate or other psychoactive agents, making specific treatment conditions and study aims ambiguous). Two potential sources of bias that would be relatively straightforward to address are risks associated with attrition and selective reporting. None of the included studies explicitly used an intention-to-treat analysis, although this would be a straightforward way to address attrition bias. Of note, studies rated here as low on attrition bias generally had no attrition. Selective reporting could be reduced through more consistent pre-registration of study hypotheses. While several included studies were preregistered (e.g., clinicaltrials.gov), many were not, making it difficult to ascertain the degree to which the reported outcomes were specified a priori versus drawn from a larger number of unpublished outcomes (i.e., increasing risk for opportunistic bias; DeCoster, Sparks, Sparks, Sparks, & Sparks, 2015). It did not appear that any of the included studies published their hypotheses using the Open Science Framework or similar platforms (e.g., AsPredicted.Org).
While perhaps unsurprising given that these platforms are relatively new and some contemporary research on psychedelics has been exploratory in nature and may not have had a priori hypotheses, explicit pre-registration of study hypotheses and analysis plans could help reduce selective reporting bias and increase confidence in this body of literature. These limitations notwithstanding, the current study joins the two previous meta-analyses suggesting that psychedelics are a class of substances worthy of further exploration. Careful, large-scale, placebo-controlled RCTs are especially needed to clarify the empirical status for specific clinical conditions (e.g., depression) as well as for non-clinical applications. Particularly promising applications may include the use of psilocybin for the treatment of anxiety and depression, although ayahuasca and LSD may also prove beneficial for these indications. While based on only one study each in the contemporary period, the use of psilocybin for smoking cessation and LSD for alcohol use are also promising avenues for future exploration, given the prevalence, health burden, and recalcitrance associated with both nicotine and alcohol use disorders. Future studies could pursue the pairing of psychedelics with behavioral interventions and nonpsychotherapeutic approaches (e.g., meditation retreats) to enhance well-being and support flourishing in both clinical and non-clinical samples. However, it is crucial that future work investigating clinical and non-clinical applications of psychedelics carefully evaluate adverse effects. While we found no clear evidence of persistent adverse effects, many of the included studies excluded individuals with personal or family histories of psychiatric conditions (e.g., bipolar disorder, psychotic disorders).
Future studies using alternative designs (e.g., naturalistic and population-based surveys, case reports); extending long-term follow-up to measure protracted effects and naturalistic use in trial participants; and examining safety in previously excluded samples (e.g., contraindicated family histories; personality disorder) may help clarify potential risks.
Note: Behav = inclusion of behavioral support (e.g., preparation prior to psychedelic administration); N = sample size; tx = treatment; cont = control; Wkpost = week of post-treatment assessment; WkFU = week of follow-up assessment; Fem = female; Retpost = % of sample retained at post-treatment assessment; RetFU = % of sample retained at follow-up assessment; NA = not available; life-threat disease = life-threatening disease.
Study Details
- Study Type: meta
- Population: humans
- Characteristics: meta analysis, literature review
- Journal