The entropic tongue: Disorganization of natural language under LSD

This placebo-controlled study (n=20) suggests that speech produced under the influence of LSD (75 μg) exhibits more entropy than normal speech. This allowed machine learning programs to identify speech produced under the influence of LSD without analyzing semantic content.

Authors

  • Enzo Tagliazucchi

Published

Consciousness and Cognition
Individual Study

Abstract

Serotonergic psychedelics have been suggested to mirror certain aspects of psychosis, and, more generally, elicit a state of consciousness underpinned by increased entropy of ongoing neural activity. We investigated the hypothesis that language produced under the effects of lysergic acid diethylamide (LSD) should exhibit increased entropy and reduced semantic coherence. Computational analysis of interviews conducted at two different time points after 75 μg of intravenous LSD verified this prediction. Non-semantic analysis of speech organization revealed increased verbosity and a reduced lexicon, changes that are more similar to those observed during manic psychoses than in schizophrenia, which was confirmed by direct comparison with reference samples. Importantly, features related to language organization allowed machine learning classifiers to identify speech under LSD with accuracy comparable to that obtained by examining semantic content. These results constitute a quantitative and objective characterization of disorganized natural speech as a landmark feature of the psychedelic state.

Research Summary of 'The entropic tongue: Disorganization of natural language under LSD'

Introduction

Sanz and colleagues situate their study within two related lines of inquiry: first, that classic serotonergic psychedelics (acting at 5-HT2A receptors) profoundly alter consciousness and have historically been considered “psychotomimetic”; and second, the entropic brain hypothesis, which proposes that psychedelics raise the entropy or unpredictability of ongoing neural activity. Earlier work reported that psychedelics render speech less predictable and enhance free association, but contemporary, quantitative analyses of natural, spontaneous speech under psychedelics have been scarce.

This study set out to test the central hypothesis that speech produced during the acute effects of lysergic acid diethylamide (LSD) would show increased markers of disorganization relative to placebo. To do so, the investigators applied automated natural language processing and graph-theoretical methods to transcribed interview material, using Shannon information entropy, Word2Vec-based semantic coherence measures, a novel embedding-rank (compressibility) metric, speech-graph topology, a previously defined disorganization index, and machine-learning classifiers to distinguish LSD from placebo speech. The aim was both to quantify speech disorganization in the psychedelic state and to compare its profile with speech patterns seen in psychotic disorders.

Methods

Design and participants: Twenty healthy volunteers attended two experimental sessions at least two weeks apart and received 75 μg of intravenous LSD in one session and saline placebo in the other, in random order. Neuroimaging sessions (fMRI at approximately 120–150 minutes post-infusion, labelled Time 1; and MEG at ≈225 minutes, Time 2) were followed by open-ended interviews asking participants to report feelings and spontaneous thoughts during scanning. One participant withdrew, so their data were excluded.

Speech preprocessing: Only participant speech was analysed. Transcripts were converted to lower case, stripped of punctuation and stop words, tokenized, part-of-speech tagged, and lemmatised using NLTK and WordNet. Words shorter than three letters and items not in the WordNet dictionary (≈3.6% of unique words, mostly vocalisations) were excluded. Counts of the major parts of speech did not differ significantly between conditions.

Semantic analyses: A pre-trained Word2Vec embedding (Google News; 300 dimensions) mapped each unique word to a vector. Semantic coherence was operationalised as the temporal variance of cosine distances between consecutive word vectors (semantic variability): higher variance indicates larger semantic jumps and lower coherence. Following prior work, the mean cosine similarity between all words in an interview and a set of ten pre-defined terms (“visual”, “pattern”, “relax”, “listen”, “mood”, “stimulate”, “normal”, “ego”, “fear”, “reality”) provided ten semantic-content features. Words absent from the Word2Vec vocabulary were discarded for these analyses. The authors also computed Shannon’s information entropy of each interview (H = −Σ_i p_i log2 p_i, where p_i is the relative frequency of word i) as a measure of unpredictability.

Embedding-rank (compressibility): For each transcript, the sequence of word vectors produced an N × 300 matrix. The approximate matrix rank (via singular value decomposition with a 0.5 threshold) quantified the minimum dimensionality needed to represent the transcript without substantial information loss; lower rank indicates greater redundancy or compressibility.

Graph-based non-semantic analysis: Speech graphs were constructed with unique lemmatised words as nodes and directed edges for consecutive occurrences. Metrics extracted included Average Total Degree (ATD), Largest Strongly Connected Component (LSC), Average Clustering Coefficient (CC), density, diameter and recurrence measures (L1–L3, PE, RE), among others. To control for verbosity effects, each interview’s words were randomly shuffled 1000 times and graph metrics were normalised by the average of these random graphs. Features invariant under shuffling were excluded from classifier inputs.

Disorganization index and clinical comparison: The study applied a previously published disorganization index (DI = 93.91 - 3.08 × E + 0.21 × LSC, with E = average edges and LSC computed on 30-word sliding windows) to compare LSD and placebo speech with reference samples comprising 20 individuals with schizophrenia, 20 with bipolar disorder and 20 controls. Interviews shorter than 30 words were excluded, yielding reduced sample counts (Time 1: LSD n=17, placebo (PCB) n=17; Time 2: LSD n=15, PCB n=3, the latter excluded from analysis due to small size).

Machine learning and statistics: A 5-fold cross-validated random forest (1000 trees) classified LSD vs placebo using either the ten semantic-content features or the normalised non-semantic graph metrics, with performance assessed by area under the ROC curve (AUC). Statistical comparisons employed two-tailed Wilcoxon signed-rank tests with Bonferroni correction where indicated. Classifier significance was evaluated by repeating training/testing 1000 times with and without label shuffling to derive empirical p-values.
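The label-shuffling logic behind the empirical p-values can be illustrated with a minimal sketch. It is a simplification and not the authors' pipeline: the cross-validated random forest is replaced by the AUC of a single hypothetical feature, and the function names, toy data and the +1 correction are assumptions made for illustration.

```python
import random

def auc(scores, labels):
    """Area under the ROC curve via pairwise comparison (ties count 0.5)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def empirical_p(scores, labels, n_shuffles=1000, seed=0):
    """Fraction of label-shuffled AUCs that reach the observed AUC."""
    rng = random.Random(seed)
    observed = auc(scores, labels)
    shuffled = list(labels)
    hits = 0
    for _ in range(n_shuffles):
        rng.shuffle(shuffled)  # break any link between feature and condition
        if auc(scores, shuffled) >= observed:
            hits += 1
    # Assumed +1 correction so the estimate can never be exactly zero
    return observed, (hits + 1) / (n_shuffles + 1)

# Hypothetical toy data: one feature value per interview, 1 = LSD, 0 = placebo
scores = [0.9, 0.8, 0.75, 0.6, 0.55, 0.5, 0.4, 0.3, 0.2, 0.1]
labels = [1, 1, 1, 0, 1, 0, 1, 0, 0, 0]
observed_auc, p_value = empirical_p(scores, labels)
```

Shuffling the labels yields AUC values centred near 0.5, so the share of shuffled runs that match or beat the observed AUC serves as the empirical p-value.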
Sample size justification referenced prior related studies and power calculations indicating n≈20 per group as adequate with α = 0.05 and β = 0.2.
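
The speech-graph construction, shuffle normalisation and disorganization index described above can be sketched in plain Python. This is an illustrative sketch under stated assumptions, not the authors' code: the function names are hypothetical, E is taken as the number of unique edges per window, and windows advance one word at a time; the summary does not specify either detail.

```python
import random

def speech_graph(words):
    """Directed graph: unique words are nodes, consecutive pairs are edges."""
    return set(words), set(zip(words, words[1:]))

def largest_scc(nodes, edges):
    """Size of the largest strongly connected component (Kosaraju's algorithm)."""
    fwd = {n: [] for n in nodes}
    rev = {n: [] for n in nodes}
    for a, b in edges:
        fwd[a].append(b)
        rev[b].append(a)

    def dfs(start, adj, seen, out):
        # Iterative depth-first search that appends nodes in post-order
        stack = [(start, iter(adj[start]))]
        seen.add(start)
        while stack:
            node, neighbours = stack[-1]
            for nxt in neighbours:
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append((nxt, iter(adj[nxt])))
                    break
            else:
                stack.pop()
                out.append(node)

    order, seen = [], set()
    for n in nodes:
        if n not in seen:
            dfs(n, fwd, seen, order)
    best, seen = 0, set()
    for n in reversed(order):  # second pass on the reversed graph
        if n not in seen:
            component = []
            dfs(n, rev, seen, component)
            best = max(best, len(component))
    return best

def normalised_metric(words, metric, n_shuffles=1000, seed=0):
    """Observed metric divided by its mean over randomly shuffled word orders."""
    rng = random.Random(seed)
    shuffled = list(words)
    total = 0.0
    for _ in range(n_shuffles):
        rng.shuffle(shuffled)
        total += metric(shuffled)
    return metric(words) / (total / n_shuffles)

def disorganization_index(words, window=30):
    """DI = 93.91 - 3.08 * E + 0.21 * LSC, averaged over sliding windows."""
    if len(words) < window:
        raise ValueError("interview shorter than one window")
    e_vals, lsc_vals = [], []
    for i in range(len(words) - window + 1):  # assumed step of one word
        nodes, edges = speech_graph(words[i:i + window])
        e_vals.append(len(edges))  # assumed: unique edges per window
        lsc_vals.append(largest_scc(nodes, edges))
    e_mean = sum(e_vals) / len(e_vals)
    lsc_mean = sum(lsc_vals) / len(lsc_vals)
    return 93.91 - 3.08 * e_mean + 0.21 * lsc_mean

# Hypothetical toy transcript (already lemmatised)
words = "the cat saw the dog and the dog saw the cat".split()
nodes, edges = speech_graph(words)
lsc = largest_scc(nodes, edges)
edge_ratio = normalised_metric(words, lambda w: len(speech_graph(w)[1]), n_shuffles=200)
```

The node count is unchanged by shuffling, so its normalised value is always 1; this is the kind of shuffle-invariant feature the Methods describe as excluded from the classifier inputs.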

Results

Corpus and comparisons: Analyses contrasted four groups (LSD/PCB at Time 1 and Time 2), with most statistical comparisons performed separately at each time point.

Word frequency and semantic-content: Word-cloud comparisons showed that, under LSD, participants more frequently used perception-related terms (e.g. “music”, “hear”, “visual”, “pattern”), while placebo interviews featured words associated with relaxation and low vigilance (e.g. “relax”, “asleep”, “calm”). When computing mean semantic similarity to the ten pre-defined terms, placebo yielded higher similarity to “mood” at both Time 1 (Z = 3.14, p = 0.0017) and Time 2 (Z = 2.89, p = 0.0038), differences that survived Bonferroni correction. LSD increased similarity to “reality” at Time 1 (Z = 2.17, p = 0.03) and to “ego” and “fear” at Time 2 (Z = 2.17, p = 0.03 and Z = 2.74, p = 0.0062 respectively), but these effects did not survive multiple-comparison correction.

Speech graph metrics: Representative speech-graph visualisations showed clear network differences between LSD and placebo. Quantitatively, LSD was associated with increased verbosity (higher word count) but a reduced lexicon (smaller diameter, lower ATD per word and fewer nodes per word). Recurrence metrics (L1, L2, L3 and PE) increased under LSD at Time 1, while global connectivity parameters (Average Shortest Path, Diameter, LSC and Density) were lower. Boxplots indicated significant group differences for clustering coefficient, density, LSC, L3, RE and word count; these patterns suggest more cyclic repetitions and fewer ordered sequences of unique terms during the peak LSD effect.

Disorganization index and clinical comparison: Applying the speech-graph disorganization index, the LSD condition differed significantly from the schizophrenia group at both Time 1 (Z = 3.43, p = 0.0006) and Time 2 (Z = 3.89, p = 0.0001). By contrast, LSD did not differ significantly from the bipolar disorder group (Time 1 Z = 0.56, p = 0.5729; Time 2 Z = 1.77, p = 0.0772) nor from matched controls (Time 1 Z = 1.75, p = 0.0797; Time 2 Z = 0.48, p = 0.6366). Placebo comparisons showed similar distinctions from schizophrenia.

Entropy, semantic variability and rank: Shannon information entropy was higher under LSD than placebo at both Time 1 and Time 2, indicating increased unpredictability of word usage. Semantic variability (variance of consecutive-word cosine distances) and the embedding-matrix rank were higher under LSD at Time 2, consistent with larger semantic jumps and reduced compressibility of discourse in the later stage examined.

Machine learning classification: The random forest classifier achieved comparable performance using semantic-content versus non-semantic graph features at Time 1. For semantic features the mean AUC was 0.7570 ± 0.0003 versus 0.507 ± 0.004 with label shuffling (empirical p = 0.015). For non-semantic features the mean AUC was 0.7493 ± 0.0002 versus 0.495 ± 0.004 with shuffling (p = 0.015). These results indicate that both feature sets discriminated LSD from placebo above chance.
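
To make the entropy and semantic-variability measures reported above concrete, here is a minimal sketch of both. It rests on loud assumptions: the toy two-dimensional vectors stand in for the 300-dimensional Word2Vec embedding, the function names are hypothetical, and the use of population rather than sample variance is a guess, since the summary does not say which was used.

```python
import math
from collections import Counter
from statistics import pvariance

def shannon_entropy(words):
    """H = -sum(p_i * log2(p_i)) over the relative frequency of each word."""
    counts = Counter(words)
    total = len(words)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

def cosine_distance(u, v):
    """One minus the cosine similarity of two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)

def semantic_variability(words, vectors):
    """Variance of cosine distances between consecutive word vectors."""
    kept = [vectors[w] for w in words if w in vectors]  # drop out-of-vocabulary words
    distances = [cosine_distance(a, b) for a, b in zip(kept, kept[1:])]
    return pvariance(distances)  # assumed: population variance

# Hypothetical toy embedding standing in for Word2Vec
toy_vectors = {"light": (1.0, 0.0), "sound": (0.0, 1.0)}

uniform = shannon_entropy(["a", "b", "c", "d"])   # four equally likely words
repeated = shannon_entropy(["a", "a", "b", "b"])  # two equally likely words
jumpy = semantic_variability(["light", "light", "sound"], toy_vectors)
steady = semantic_variability(["light", "sound", "light", "sound"], toy_vectors)
```

In the toy data, the transcript with a larger effective vocabulary has higher entropy, and the transcript whose consecutive distances are uneven (0 then 1) has higher semantic variability than the one that alternates at a constant distance.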

Discussion

Sanz and colleagues interpret their findings as evidence that the acute LSD state is characterised by less predictable, more disorganized spontaneous speech. Semantic analyses revealed that LSD shifted content toward perceptual terms and produced higher entropy and greater semantic variability, especially at the later time point. Graph-based metrics showed a simultaneous increase in verbosity and a reduction in lexicon, with greater local recurrence and reduced global connectedness, a pattern the authors consider more similar to speech observed in manic psychoses than that typical of schizophrenia.

The investigators frame these results in relation to the entropic brain hypothesis: increased entropy at the neural level should, they argue, be mirrored in behaviour and subjective reports, here operationalised as speech unpredictability and reduced semantic coherence. They note that semantic and non-semantic features were roughly equally informative for classifying LSD vs placebo, suggesting complementary perspectives on discourse organisation.

Key limitations acknowledged include the constrained speech task (open-ended reports of spontaneous thought rather than performance on a broader battery of language tasks), modest sample sizes and reduced numbers for some comparisons (notably Time 2 placebo), and the heterogeneity inherent in psychiatric diagnostic categories, which complicates direct mapping between drug-induced states and clinical disorders. The authors also highlight that they did not test correlations between neural entropy measures and speech-derived entropy in this dataset, so direct brain–behaviour coupling remains to be established.

Looking ahead, the study team recommends larger samples, dose–response studies, comparisons with other drugs and non-drug altered states, and integration of computational linguistic measures with neuroimaging and biological profiling. They suggest that automated analysis of natural language is a practical, objective, and potentially valuable tool for screening, monitoring and predicting outcomes in psychedelic research and therapy, while noting practical constraints on collecting speech during high-dose therapeutic sessions.

Conclusion

The authors conclude that natural speech becomes measurably more disorganized during the acute effects of LSD, and that this disorganisation more closely resembles patterns seen in manic psychoses than in schizophrenia. Both semantic-content and non-semantic graph features can distinguish LSD from placebo with similar effectiveness. They propose that these findings are consistent with a framework in which 5-HT2A receptor agonists increase entropy in brain activity and associated thought processes. Future work should expand sample sizes, explore other doses and compounds, include non-drug altered states, and pair language analytics with neuroimaging to further elucidate the relationship between neural entropy, subjective experience, and therapeutic mechanisms.
