Language Models Learn Sentiment and Substance from 11,000 Psychoactive Experiences
Using BERT on 11,816 public drug testimonials, the authors extracted 28-dimensional sentiment and biochemical/demographic signals and, via canonical correlation analysis, linked testimonial language to receptor-binding profiles, revealing 11 latent receptor–experience factors that were mapped onto a 3D cortical atlas. The models converged on a lucid–mundane experiential axis and linked specific drugs to emotions (e.g. MDMA with “Love”, DMT/5-MeO-DMT with “Mystical Experiences”); the authors propose that real-time biofeedback could help steer therapeutic psychedelic sessions.
Authors
- Ballentine, G.
- Friedman, S. F.
Abstract
With novel hallucinogens poised to enter psychiatry, we lack a unified framework for quantifying which changes in consciousness are optimal for treatment. Using transformers (i.e. BERT) and 11,816 publicly-available drug testimonials, we first predicted 28-dimensions of sentiment across each narrative, validated with psychiatrist annotations. Secondly, BERT was trained to predict biochemical and demographic information from testimonials. Thirdly, canonical correlation analysis (CCA) linked 52 drugs’ receptor affinities with testimonial word usage, revealing 11 latent receptor-experience factors, mapped to a 3D cortical atlas. Together, these 3 machine learning methods elucidate a neurobiologically-informed, temporally-sensitive portrait of drug-induced subjective experiences. Different models’ results converged, revealing a pervasive distinction between lucid and mundane phenomena. MDMA was linked to “Love”, DMT and 5-MeO-DMT to “Mystical Experiences”, and other tryptamines to “Surprise”, “Curiosity” and “Realization”. Applying these models to real-time biofeedback, practitioners could harness them to guide the course of therapeutic sessions.
Research Summary of 'Language Models Learn Sentiment and Substance from 11,000 Psychoactive Experiences'
Introduction
The paper situates itself in the challenge of quantifying the rich, temporally unfolding subjective states induced by psychoactive substances, noting that these subjective qualities correlate with clinical outcomes in psychedelic-assisted therapy. The authors argue that conventional questionnaires compress a vast experiential space into a few researcher-selected scalar measures and therefore miss temporal dynamics (for example peak-end and primacy effects) that may condition therapeutic benefit. They propose that natural language contained in patient or user testimonials, combined with modern natural language processing (NLP), can provide multi-dimensional, temporally sensitive measures of subjective experience and thereby help link phenomenology to neurobiology. Ballentine and colleagues set out to build and compare three machine learning approaches that together produce neurobiologically informed, temporally resolved portraits of drug-induced experiences. Using a large corpus of 11,816 Erowid testimonials, pretrained transformer encoders fine-tuned in supervised and transfer-learning paradigms were used to recover multi-dimensional sentiment trajectories and other metadata, while canonical correlation analysis (CCA) linked testimonial word-usage to receptor affinity fingerprints for 52 drugs. The stated aim is to demonstrate that these complementary methods converge on meaningful experiential structure that can be anchored to pharmacology and mapped to cortical gene-expression, thereby suggesting routes toward data-driven neurofeedback and personalised therapeutic guidance.
Methods
Data sources comprised 11,816 Erowid user testimonials, receptor affinity data at up to 61 receptor subtypes for 52 drugs (primarily from the Psychoactive Drug Screening Program, supplemented by Rickli et al. for eight phenethylamines), RNA gene-expression measures for 200 brain regions from the Allen Brain Atlas, and 58,000 Reddit posts annotated with 28 emotions (the GoEmotions dataset). Chemical and pharmacologic class labels were taken from Psychonaut Wiki. Text preprocessing removed occurrences of drug names (scientific, common, colloquial and misspellings) to avoid trivial cues, tokenised the texts and capped model inputs at 512 tokens. A sliding-window inference scheme built trajectories from contiguous blocks: testimonials longer than the window were split into contiguous blocks and each block processed, while testimonials shorter than the window were padded. Dynamic Time Warping (DTW) was used to compare averaged sentiment trajectories between drugs and classes. Two transformer-based models were fine-tuned. Both used a base bidirectional transformer (BERT) encoder with approximately 109 million parameters; pooled outputs were fed through dropout and task-specific dense heads. BERTowid is a multitask, multi-label supervised model trained directly on Erowid excerpts to predict categorical labels (drug identity, chemical and pharmacologic classes, metadata tags, self-reported gender) and regression targets (age, receptor affinities, and CCA component weights). Classification losses minimised cross-entropy, regressions minimised mean squared error; due to sensitivity when optimising mixed losses, the authors trained and serialised separate multitask models for classification and regression. Hyperparameters included AdamW optimisation, learning rate 1e-5, batch size 32 and 16 epochs; a typical run with window size 64 words required ~3 hours on an NVIDIA V100 GPU.
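The mixed objective described above can be illustrated with a minimal numpy sketch. The task weights, shapes and data below are hypothetical stand-ins, not the authors' code; as the paper notes, the authors ultimately trained separate models for the classification and regression losses.

```python
import numpy as np

def multitask_loss(class_logits, class_labels, reg_preds, reg_targets,
                   w_cls=1.0, w_reg=1.0):
    """Cross-entropy over a classification head plus MSE over a regression
    head, combined with (hypothetical) task weights."""
    # numerically stable softmax cross-entropy
    z = class_logits - class_logits.max(axis=1, keepdims=True)
    log_probs = z - np.log(np.exp(z).sum(axis=1, keepdims=True))
    ce = -log_probs[np.arange(len(class_labels)), class_labels].mean()
    mse = ((reg_preds - reg_targets) ** 2).mean()
    return w_cls * ce + w_reg * mse

# toy batch: 4 excerpts, 3 drug classes, 2 regression targets
logits = np.array([[5., 0., 0.], [0., 5., 0.], [0., 0., 5.], [5., 0., 0.]])
labels = np.array([0, 1, 2, 0])
preds = np.zeros((4, 2))
targets = np.zeros((4, 2))
loss = multitask_loss(logits, labels, preds, targets)
```

Scaling `w_cls` and `w_reg` is one knob for the loss-sensitivity issue the authors describe; training the two heads as separate models, as they did, is the other.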
Imbalanced labels were explored with weighted cross-entropy, with mixed effects on precision. BERTiment is a transfer-learning model: the same base BERT encoder was fine-tuned on the GoEmotions Reddit corpus to predict 28 binary emotion labels (training split 70-20-10). Hyperparameters matched BERTowid except dropout was set to 0.5; training ran for 16 epochs (~2 hours on a V100). BERTiment produced temporally resolved emotion predictions when applied to testimonial window series. Canonical Correlation Analysis (CCA) provided an independent, linear linkage between bag-of-words representations of whole testimonials and receptor-affinity fingerprints. The scikit-learn CCA implementation was used to extract paired components (word weightings and receptor weightings). Receptor-component weights were projected to cortex using Allen Brain Atlas RNA expression quantities. The authors contrasted the CCA (linear, bag-of-words, whole-text) with the transformer models (nonlinear, positionally encoded, excerpt-based) and compared cross-model consistency.
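The word-usage-to-receptor linkage can be sketched in plain numpy. This is a minimal, ridge-regularised CCA standing in for the scikit-learn implementation the paper used; the "word" and "receptor" matrices below are toy stand-ins sharing one latent factor.

```python
import numpy as np

def cca(X, Y, n_components=2, reg=1e-3):
    """Minimal CCA: whiten each view, then SVD the cross-covariance.
    Returns projection weights for X and Y and the canonical correlations."""
    X = X - X.mean(axis=0)
    Y = Y - Y.mean(axis=0)
    n = len(X)

    def inv_sqrt(C):
        w, V = np.linalg.eigh(C)
        return V @ np.diag(1.0 / np.sqrt(np.maximum(w, 1e-12))) @ V.T

    Cxx = X.T @ X / n + reg * np.eye(X.shape[1])
    Cyy = Y.T @ Y / n + reg * np.eye(Y.shape[1])
    K = inv_sqrt(Cxx) @ (X.T @ Y / n) @ inv_sqrt(Cyy)
    U, s, Vt = np.linalg.svd(K)
    return (inv_sqrt(Cxx) @ U[:, :n_components],
            inv_sqrt(Cyy) @ Vt[:n_components].T,
            s[:n_components])

# toy stand-ins: 300 "testimonials" x 6 word counts and x 4 receptor
# affinities, all noisy copies of a single shared latent factor
rng = np.random.default_rng(0)
latent = rng.normal(size=300)
words = np.column_stack([latent + 0.1 * rng.normal(size=300) for _ in range(6)])
receptors = np.column_stack([latent + 0.1 * rng.normal(size=300) for _ in range(4)])
Wx, Wy, corrs = cca(words, receptors)
```

Because both views share one latent factor, the first canonical correlation comes out near 1 while the second stays small, mirroring how the paper's CCA components are ordered by explained covariance.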
Results
The analysis used 11,816 Erowid testimonials and produced convergent results across three analytical approaches. BERTowid achieved substantive discrimination across multiple pharmacological granularities: 52 drugs, 22 chemical types, 10 pharmacologic classes, 30 receptor subtypes and 11 CCA-derived components. Some semantic metadata tags were particularly learnable (for example "Medical Use", "Mystical Experiences", "Alone", "Addiction Habituation") with ROC AUCs ranging from 0.88 to 0.95 for the better-performing tags. Model confusions tended to mirror biochemical and pharmacological relationships (e.g. phenethylamines vs tryptamines are more likely to be misclassified for one another than for benzodiazepines or opioids). BERTowid also produced gender classification with ROC AUC 0.85 and age regression with Pearson correlation 0.56 despite missingness and class imbalance in self-reported demographics. The ability to recover sensitive features was noted as a step towards debiasing. BERTiment generalized emotion detection to the testimonial domain: emotion ROC AUCs on unseen data ranged from 0.72 for "Realization" to 0.97 for "Love". A hedonic-tone classifier trained on IMDB reviews provided a complementary axis that sorted emotions by valence: e.g. "Admiration", "Pride", "Love" correlated positively; "Annoyance", "Disgust" correlated negatively. Clinical validation consisted of a psychiatrist manually adjudicating 393 emotions from 256 Erowid excerpts; human labels were within the model's top 10 predicted emotions 87% of the time, top 5 73% and top 1 42%, reported to be within inter-human variability from the original GoEmotions resource. Temporal sentiment trajectories revealed pharmacologically meaningful patterns via DTW. MDMA and the entactogen MDA showed high and increasing trajectories for "Love", distinguishing them from other drugs. 
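ROC AUC figures like those above can be reproduced with the standard rank statistic. This is a pure-Python sketch on toy scores, not the authors' evaluation code.

```python
def roc_auc(scores, labels):
    """ROC AUC as the probability that a random positive outscores a
    random negative (ties count half)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# toy tag predictions: a perfect ranking gives AUC 1.0
auc = roc_auc([0.9, 0.8, 0.3, 0.1], [1, 1, 0, 0])
```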
Stimulants (cocaine, amphetamine, methamphetamine) exhibited gradually rising "Sadness" trajectories, while antidepressants (paroxetine, venlafaxine, sertraline) started higher on sadness but declined over the testimonial. Tryptamines (notably DMT and 5-MeO-DMT) showed elevated "Surprise", "Curiosity" and "Realization" relative to phenethylamines, which leaned more to "Admiration", "Excitement" and "Gratitude". Salvinorin A (a diterpenoid) shared high rankings on "Surprise", "Curiosity", "Realization" and also ranked highly on "Mystical Experience", paralleling short-acting tryptamines. Opioids were lower on curiosity-related emotions but higher on "Relief". Across drugs, "Neutral" sentiment typically decreased over the course of a testimonial while "Optimism" rose sharply near the end, resembling a peak-end effect. CCA uncovered 11 statistically significant components that linked word-usage to receptor-affinity profiles. The dominant component (CCA 0) contrasted a pole of abstract, transcendental and perceptual language (terms such as "reality", "universe", "peak", visuals, music) associated with DMT, LSD and psilocin and serotonergic affinities (5-HT2A, 5-HT1A, 5-HT2C) with expression concentrated in medial prefrontal cortex, against a pole of mundane suffering words ("depression", "pain", "addiction", "work") associated with stimulants and oxycodone and affinities at MOR, DOR, NET and DAT with expression in posterior cingulate and inferior parietal lobule. CCA-derived components often produced spatially contiguous cortical maps and bilateral symmetry when receptor weights were projected using Allen Brain Atlas gene-expression. BERTowid could predict CCA component weights from testimonial excerpts with Pearson correlations that declined with component order (for example 0.68 for CCA 0 down to 0.24 for CCA 11), indicating components explaining more variance were easier to learn from text. 
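The cortical-projection step behind those maps reduces to a single weighted sum over Allen-style expression data. All shapes and values below are hypothetical stand-ins for illustration, not the paper's data.

```python
import numpy as np

rng = np.random.default_rng(1)
# hypothetical Allen-style data: 200 cortical regions x 30 receptor genes
expression = rng.random((200, 30))
# one CCA component's receptor weightings (e.g. strong serotonergic loadings)
receptor_weights = rng.normal(size=30)

# each region's score is its expression profile weighted by the component
region_map = expression @ receptor_weights          # shape: (200,)
top_regions = np.argsort(region_map)[::-1][:5]      # most-expressing regions
```

In the paper, the analogue of `region_map` is what gets rendered on the 3D cortical atlas for each component.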
Overall, the three methods (BERTowid, BERTiment and CCA) converged on a broad dichotomy between "mystical/expansive" experiential language tied to serotonergic psychedelics and a "mundane/suffering" pole tied to stimulants and certain psychiatric medications. MDMA in particular occupied a uniquely positive affective niche across models, aligning with clinical interest in its therapeutic potential.
Discussion
Ballentine and colleagues interpret their findings as evidence that combining linear CCA and nonlinear transformer models can produce unified, biochemically anchored and temporally sensitive representations of psychoactive experiences from retrospective text. They emphasise that these language-derived representations capture clinically relevant qualities (the intensity of mystical experience; the depth of joy, anger and grief) and that temporally resolved trajectories reflect known pharmacological distinctions and anecdotal phenomenology reported by psychonauts and researchers. The authors situate their results alongside prior NLP analyses of Erowid, noting agreement on associations such as antidepressants being linked to negative affect (which they partly attribute to ascertainment bias). They also highlight that their transformer-based construction of detailed temporal trajectories is a novel contribution beyond earlier bag-of-words or topic modelling approaches. The CCA findings, in particular the dominant component contrasting lucid, transcendental phenomena with somatic suffering and addiction-related themes, are presented as reinforcing the transformer-derived emotion landscapes. Limitations acknowledged by the authors include inherent noise and biases in crowd-sourced retrospective reports (uncertainty about dosage, chronology, impurities and effects on memory), and the interpretive limits of receptor affinity data (affinity does not indicate functional effects such as agonism versus antagonism or biased signalling). They note ascertainment bias in the dataset may confound associations between drug classes and affective language. For future research and application, the authors propose combining these linguistic trajectories with concurrent high-temporal-resolution modalities (EEG, ECG, fMRI) to build cross-modal representations of acute drug states and to enable real-time neurofeedback.
They suggest such combined representations could aid personalised, responsive psychoactive sessions in clinical practice and that transformer models’ zero-shot capabilities might be harnessed to monitor and guide emotional dynamics during therapy. The authors present these applications as plausible next steps while recognising that prospective clinical outcome data and cross-modal validation would be required to establish therapeutic utility.
RESULTS:
We amassed a corpus of 11,816 psychoactive experiences from Erowid, which we semantically and chemically characterize with two BERT-based models and one Canonical Correlation Analysis (CCA); see Fig. The supervised model, BERTowid, is trained using multi-task, multi-label classification and regression directly on Erowid testimonials and associated metadata. BERTowid is trained to "read" a 512 token excerpt from the testimonial and predict the associated drug, its chemical and pharmacological class, self-reported gender and age, 52 metadata tags, 11 canonical correlation component weightings, and 30 receptor affinities. Table 1 shows the taxonomy and testimonial counts. The transfer-learning model, BERTiment, is trained to detect 28 sentiments simultaneously on a corpus from Reddit. It then makes inferences on Erowid, revealing sentimental trajectories which we show agree with psychiatrist adjudications, validate expected emotional associations, and are consistent with pharmacological groupings. Both models generalize to unseen data, demonstrating how machine learning on crowd-sourced, noisy semantic data can lead to diverse biochemical inferences. Note for instance in Fig. how the entactogens MDA and MDMA, the opioids, and the antidepressants all track together.
[Figure caption, partially recovered: … (b) and individual drug level (c); see Table 1 for the full drug taxonomy. For clarity only 12 representative drugs of the 52 included are shown; see other figures for comparisons involving all drugs. Right: BERTowid trajectories for each of the 3 levels of drug classification (from the left panel) on the metadata tags "Mystical Experiences" (d), "Addiction Habituation" (e), "Depression" (f), and "Rave Dance Event" (g). Note the concordance between the entactogens MDA and MDMA and the antidepressants sertraline and venlafaxine.]
This point is reinforced by our findings from CCA, which identified a latent structure of 11 statistically-significant components mapping between the semantic data and the receptor affinity profiles in a self-supervised fashion. CCA is a linear model and relies on a bag-of-words representation of the entirety of the testimonial text, while the transformers are deep nonlinear neural networks which positionally-encode a subset of text excerpted from the testimonials. Despite the large differences in representation and model, BERTowid learns to infer the CCA weightings, while many BERTiment emotion-scapes reveal drug rankings similar to those given by CCA 0, see Fig.
BERTOWID
There is noise inherent in any crowd-sourced, open dataset like Erowid, which includes reports from many illegal substances rife with potential impurities and misrepresentations. Nonetheless, BERTowid shows powerful discrimination at several different granularities of pharmacology, classifying amongst 52 drugs, 22 ligand chemical types, 10 pharmacologic classes, 30 receptor subtypes, and 11 CCA weights. Table 1 shows the taxonomies and testimonial counts. Model mistakes are consistent with expected biochemical and pharmacological groupings. For example, the psychedelic chemical classes of phenethylamines and tryptamines are much more likely to be mistaken for each other than for a benzodiazepine, antidepressant, or opioid (see confusion matrices in Supplementary Fig.). Semantic tags are also learnable, with some of the best-performing being "Medical Use", "Mystical Experiences", "Alone" and "Addiction Habituation", with areas under the receiver operating characteristic curves (ROC AUC) ranging from 0.88 to 0.95; areas under the precision-recall and ROC curves for all tags are shown in Supplementary Fig. Confirming its reputation as the "spirit molecule", DMT displayed heightened trajectories for the tag "Mystical Experiences" and, even more dramatically, for the tag "Entities and Beings", echoing themes uncovered in manual DMT-specific analyses. As expected, the "Depression" tag trajectory highlights antidepressants, while the "Addiction Habituation" tag is consistently elevated for the stimulants cocaine and methamphetamine, see Supplementary Fig. Some testimonials include self-reported age and gender, with which we trained gender-classifying (ROC AUC 0.85) and age-regressing (Pearson correlation 0.56) output heads. Despite missingness, gender class imbalance, and skew towards younger individuals, we can predict age and gender from these reports.
Accurate detection of sensitive features, such as these, is a critical first step towards de-biasing predictions through iterative removal of confounded subspaces. Given their potential role in healthcare, the ability to apply these models without bias is of utmost importance. Through an entirely different analytic paradigm, BERTowid appears to broadly confirm the salience and ranking of the 11 CCA components (described below). Test set performance on the CCA components drops off almost exactly with their ordering by CCA, with Pearson correlations ranging from 0.68 for CCA 0 to 0.24 for CCA 11. Components which explain more variance between testimonials and affinity are also more effectively learned directly from testimonials.
BERTIMENT
BERTiment is trained to predict 28 emotion classifications from Reddit annotations. All emotion predictions generalize to unseen data with ROC AUCs ranging from 0.72 for "Realization" to 0.97 for "Love". Further validation is provided by a hedonic tone classifying BERT model trained with positive and negative movie reviews from IMDB. The signed Pearson correlations between BERTiment and the hedonic tone predictions neatly sort the fine-grained emotion taxonomy. The emotions "Admiration", "Pride", "Approval", and "Love" have the highest positive correlations, while "Annoyance", "Nervousness", "Embarrassment", "Disapproval", and "Disgust" have the largest negative correlations. Originating from entirely different datasets (i.e. movie reviews and Reddit posts) and evaluated on a third, orthogonal dataset (i.e. the Erowid testimonials), these models learned mutually reinforcing representations of sentiment, albeit at different levels of granularity, as shown in Fig. Domain expert validation for the specific context of the emotions contained in reports of psychoactive experience was provided by a clinical psychiatrist, who manually adjudicated 393 emotions from 256 Erowid excerpts. Concordance between the model and the psychiatrist was within the range of inter-human variability as reported in the original GoEmotions paper. Specifically, human labeled emotions were in the top 10 BERTiment emotions for 87% of the labels, in the top 5 for 73%, and in the top 1 for 42%, see Fig.
panel (k).
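The top-k concordance used for the psychiatrist validation can be computed as follows. This is a sketch with hypothetical scores and label names, not the authors' code.

```python
def topk_concordance(predictions, human_labels, k):
    """Fraction of human-adjudicated emotion labels that appear among the
    model's k highest-scoring emotions for the same excerpt."""
    hits = 0
    for scores, label in zip(predictions, human_labels):
        top_k = sorted(scores, key=scores.get, reverse=True)[:k]
        hits += label in top_k
    return hits / len(human_labels)

# toy example: two excerpts, three scored emotions each
preds = [{"love": 0.9, "joy": 0.5, "fear": 0.1},
         {"love": 0.2, "joy": 0.3, "fear": 0.8}]
labels = ["love", "joy"]
```

Run over all 393 adjudicated labels with k = 10, 5, and 1, this metric yields the 87%, 73%, and 42% figures reported above.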
Qualitative manual inspection confirms that the extreme (positive and negative) predictions for each sentiment were prominent examples of the emotion (or its opposite), see Supplementary Table. As expected with extreme language, profanities, capitalization and modifiers like "very" and "so" are common. To quantitatively evaluate the sentimental trajectories, Dynamic Time Warping (DTW) measured the distance between the averaged trajectories for each emotion and each pharmacological and biochemical class, see Fig. and Supplementary Fig. 4. The DTW reveals emotional landscapes that conform to expectations based on pharmacological classifications, molecular structure, questionnaires, and anecdotal reports of drug phenomenology. The DTW matrices are skew-symmetric, with the sign indicating which of the drugs had the higher mean predicted emotion. Fig. provides a comprehensive view of the emotional content as determined by BERTiment in the Erowid dataset. Ordered by hedonic tone, the most negative sentiments are associated with antidepressants and antipsychotics; in the middle we find pharmacological classes that are used clinically but also abused recreationally, like opioids and deliriants; and at the positive extreme we see psychedelics and entactogens. The association of psychiatric medications with negative emotions is confounded by ascertainment bias of those who seek out these medications, and does not necessarily reflect their efficacy. Zooming in from the broad emotionscapes to a singular molecule, MDMA is characterized with both BERTowid and BERTiment in Fig. The trajectory of "Love" during MDMA testimonials starts high and ends higher, fitting for a drug colloquially known as the "love-drug". This arc is clearly distinguished from all other drugs, though closely tracked by the related entactogen, MDA.
Supplementary Fig. shows the "Sadness" trajectories of the stimulants (cocaine, amphetamine, and methamphetamine) are tightly coupled and rise gradually over the course of the testimonial, while the antidepressants (paroxetine, venlafaxine, and sertraline) start much higher than the stimulants but gradually fall. In contrast, the stimulants and antidepressants start with similar "Anger" levels, but over the course of the report methamphetamine and cocaine rise dramatically while the antidepressants increase somewhat less. The emotions "Realization", "Curiosity", "Confusion", "Surprise", and "Amusement" are consistently elevated in subjective testimonials of hallucinogens and psychedelics as compared to other drug classes, most notably the opioids. This constellation of emotional trajectories provides additional discernment within the broad, overlapping classes of hallucinogens and psychedelics, as shown in Fig. For example, Salvia and DMT are both high in "Realization" and "Curiosity"; however, Salvia triggers more "Confusion", while DMT generates more "Surprise". PCP in contrast is high in "Confusion" and "Amusement", but lower in "Realization", "Surprise" and "Curiosity". The opioids are consistently lower in all of these emotions, but "Relief" provides an interesting counterpoint, as it is higher in opioids than in hallucinogens or psychedelics, which one would expect for drugs widely prescribed for their pain-relieving effects. Sentiment trajectories for a subset of phenethylamines, lysergamides, and tryptamines, with opioids and ketamine for comparison, are also shown. Note how, for tryptamines in particular, the weightings for "Surprise" and "Curiosity" are far greater than they are for MDMA, whereas "Relief" is much higher for opioids than for any of the psychedelics. Notably, not every emotion clearly distinguishes the drugs: the emotions "Neutral" and "Optimism" are quite conserved across pharmacological classes.
For every drug analyzed, "Neutral" decreases as the testimonial proceeds, while "Optimism" trajectories for every drug increase dramatically near the end, resembling a ski-jump, see Supplementary Fig. As if the peak-end rule is a self-fulfilling prophecy, testimonials for all drugs tend to end on an optimistic note. The reduction in "Neutral" over the course of a trip is expected, as a drug's effects reveal themselves to the user over time; a similar reduction in neutral sentiment over the course of a narrative was also shown with the IMDB dataset.
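The trajectory comparisons above can be sketched in pure Python. The paper uses the fastdtw package; this exact-DTW version, with the sign convention that produces the skew-symmetric matrices, is a minimal stand-in.

```python
def dtw(a, b):
    """Classic dynamic-time-warping distance with absolute-difference cost."""
    n, m = len(a), len(b)
    D = [[float("inf")] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            D[i][j] = cost + min(D[i - 1][j], D[i][j - 1], D[i - 1][j - 1])
    return D[n][m]

def signed_dtw(a, b):
    """Positive when trajectory a has the higher mean emotion, so the full
    pairwise drug-by-drug matrix comes out skew-symmetric."""
    d = dtw(a, b)
    return d if sum(a) / len(a) >= sum(b) / len(b) else -d
```

Applying `signed_dtw` to each pair of averaged emotion trajectories yields one skew-symmetric matrix per emotion, as described above.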
CANONICAL CORRELATION ANALYSIS
Our pattern-learning strategy uncovered 11 discrete components that integrate subjective descriptions with the neurotransmitter affinity fingerprint of each psychoactive drug. The dominant factor described a vector characterized by visual-beauty at one extreme and somatic-suffering at the other. The second and third most explanatory factors contained poles of depression-insomnia and impulsivity-addiction contrasted with perception-celebration and cosmic-expansion, respectively. The eleven significant components reveal latent patterns of correlation between drug receptor affinity and word-usage; each is described in detail in the Supplementary Figures (CCA 0 through CCA 10). Although not imposed by our analytic model, the cortical mappings of factors often turned out to be spatially contiguous with smooth transitions of expression strength between neighboring brain regions. These mappings were often found to be mirrored in homologous brain regions in the left and right hemispheres. These observations are noteworthy because all steps of our modeling pipeline were blind to brain regions and were informed only by receptor affinities and natural language. The dominant component explained the largest portion of joint variation between receptor affinity profiles and word-usage frequencies. This central component isolated a semantic theme characterized by abstract, transcendental terms such as reality, universe, everything, consciousness, and peak as well as a constellation of perceptual phenomena including visuals, patterns, music, and sound. These terms were linked to the drugs DMT, LSD, and psilocin, as well as mostly serotonergic receptor affinity at 5-HT2A, 5-HT1A, and 5-HT2C, and the expression of these receptors in the medial prefrontal cortex. The opposite extreme of this factor flagged words that describe a theme of mundane suffering: depression, pain, addiction, sleep, awake, daily, weeks, months, work, and anxiety.
This latter theme was linked to the drugs cocaine, amphetamine, methamphetamine, and oxycodone, as well as receptor affinity at MOR, DOR, NET, and DAT, while the expression of these receptor genes was densest in the posterior cingulate cortex and the inferior parietal lobule. Notably, this component evoked a similar structure found by the BERTiment model for 17 of its 28 emotions, see Fig.
CONVERGENCES BETWEEN BERTIMENT, BERTOWID, AND CCA COMPONENTS
Remarkably, the three distinct ML approaches elucidated similar findings. MDMA was found to elicit a uniquely positive palette of affective attributes that are of particular interest given its apparent efficacy as a treatment for PTSD. At the most positive extreme of the IMDB-derived spectrum of hedonic tone, MDMA was ranked highest among all 52 compounds for "Admiration", "Pride", "Approval", "Love", "Excitement", and "Gratitude", and was only narrowly edged out by opioids for "Joy". BERTowid tag weightings for entactogens (Supplementary Figures) exhibited heightened levels of festivity and dancing, but also relational, emotional, and effusive (e.g. "Glowing") phenomena. Using CCA to incorporate its receptor-affinity fingerprint, MDMA was most closely linked with receptor-experience patterns highlighting perception-celebration and emotional-extremes, with receptor gene-expression in the visual and primary sensory cortices. Results also show important distinctions amongst psychedelic subclasses. In particular, the tryptamines exhibit elevated levels of "Curiosity", "Surprise", and "Realization", while phenethylamines highlight relatively higher levels of "Admiration", "Excitement", and "Gratitude". Tryptamines, especially the powerful, short-acting DMT and 5-MeO-DMT, had higher "Mystical Experiences" tag weightings than phenethylamines like mescaline and 2C-E. Interestingly, the chemically distinct diterpenoid compound Salvinorin A shared with these prototypical tryptamines high levels of "Surprise", "Curiosity", and "Realization", in addition to a very high ranking for "Mystical Experience". Drug factorization in the dominant CCA component (CCA 0) aggregated stimulants (including MDMA) and psychiatric medications into one pole, and associated them with phenomena relating to suffering, addiction, and the mundane.
Through an entirely separate analysis, BERTiment DTW charts demonstrated these drug classes all ranked highest for a constellation of emotions compatible with this CCA theme such as "Disappointment", "Grief", "Annoyance", "Disapproval", "Disgust", "Sadness", and "Anger". The opposite pole grouped psychedelic and hallucinogenic drugs together with terms highlighting a theme of abstract, lucid, and beatific phenomena. BERTiment DTW found these drugs to have much higher scores for "Curiosity", "Admiration", "Amusement", "Surprise", and "Realization". This dichotomy between the gloomy, prosaic, quotidian aspects of human sentiment and the more abstract, expansive, and creative aspects of human potential was judged by the CCA to be the most explanatory distinction in this large corpus of drugs, and the majority of sentiment dimensions fit neatly into the same schema.
DISCUSSION:
Utilizing linear CCA and nonlinear transformers, supervised or self-supervised, we've trained models which represent diverse drug experiences in a unified, biochemically-informed, temporally-sensitive way. Derived directly from natural language, these representations contain information like the intensity of mystical experience and the depth of joy, anger, and grief: qualities of subjective experience that are of great clinical import for psychiatry. Sequentially applying these models to retrospective reports creates evocative narrative trajectories, which reflect pharmacological distinctions and conform to expectations of subjective effects reported by psychonauts and researchers. Our findings also generally dovetail with prior efforts to use natural language processing tools to analyze the Erowid testimonial database. For example, one recent study found, in agreement with our results, that antidepressants were associated with words denoting negative affect and less with mystical phenomena, which we both attributed at least in part to ascertainment bias of depressed subjects seeking medication. Also, a pioneering study from 2012 noted surprising similarity between pharmacologically distinct but similarly short-acting drugs like Salvinorin A and DMT. Our analysis detected strong similarities between these drugs, but we were further able to parse fine-grained subjective differences, with DMT being linked to relatively higher levels of "Surprise" while Salvinorin was associated with more "Confusion". Notably, compared to the aforementioned prior efforts, the ability to construct detailed trajectories of experience using transformers that we introduce in this paper is a novel addition to the field. The peak-end rule inspired us to look for such trajectories since it established the counterintuitive finding that there are times when more pain is preferred to less.
But this is a rule, not a law, and exceptions such as whether pleasure is increasing or decreasing when an experience ends can be even more important than the peak level of pleasure felt. There may well be further variations and subtleties by which experiential trajectories shape how memories are formed. The shapes of the trajectories produced by BERTowid and BERTiment capture this nuance and, if combined with clinical outcome data in a prospective manner, may identify more temporally-mediated "rules" that mediate how drug-induced experiences impact overall mental health. While the trajectories we constructed unfold from the narrative language in the trip report, the methods we describe naturally apply to other streams of phenomenological and neurochemical data. Modalities as diverse as EEG, ECG, fMRI, and other biometrics are amenable to trajectory construction, simply by replacing the BERT models with appropriately pre-trained encoders (e.g. a 1D CNN for EEGs or ECGs). Excitingly, such modalities can be sampled with high temporal resolution and independently from the patient's recollection, mitigating the issues inherent in self-reported datasets like Erowid, where there is uncertainty about dosage, chronology, and drugs' impact on memory. On the other hand, given that recent machine learning models trained on cross-modal representations have shown improved phenotype prediction, combining parallel subjective and neuroimaging datastreams may build more useful and holistic representations of acute drug states. By tethering retrospective linguistic reconstructions of drug experiences to specific timepoints in their corresponding EEG trajectories, we can construct powerful cross-modal representations of conscious states. Such cross-modal representations that provide a common measure spanning psychiatric illness, pharmacological intervention, and acute psychedelic experiences could be transformative to clinical practice.
Combining real-time datastreams with the zero-shot learning capability of transformers, it is possible to envision a future of personalized, responsive psychoactive sessions. Music appreciation has been shown to be enhanced by certain psychedelics, but what if therapists could monitor and adjust the emotional palette as it is felt, in real time? The tables of extreme predictions from our models (Supplementary Tables) make clear that transformers learn semantics reflecting both the training label and its opposite meaning. Word2Vec demonstrated that directions, not just points, are meaningfully encoded in the latent spaces of natural language (e.g. King - Queen = Man - Woman). The directions connecting the extremes of "Love" or "Mystical Experience" could guide trips as they unfold, with real-time neurofeedback modulating diverse environmental inputs. In this manner, psychedelic therapists of the future may use machine learning to help their patients navigate latent spaces of subjective experience towards psychoactive journeys of minimal risk and maximal therapeutic benefit.
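The direction-based steering described above can be illustrated with a minimal sketch. All values here are hypothetical stand-ins: real embeddings would come from a trained model such as BERTiment (28-dimensional), not be hand-set, and the `project` helper is introduced purely for illustration.

```python
import numpy as np

# Hypothetical 3-dimensional sentiment embeddings; a real model's outputs
# would be 28-dimensional and learned, not hand-set as they are here.
low_love = np.array([0.1, 0.2, 0.0])    # embedding of a low-"Love" report
high_love = np.array([0.9, 0.8, 0.1])   # embedding of a high-"Love" report

# The direction connecting two experiential extremes defines a traversable axis.
love_axis = high_love - low_love
love_axis /= np.linalg.norm(love_axis)

def project(state, axis):
    """Score a session state by its projection onto an experiential axis."""
    return float(np.dot(state, axis))

# A neurofeedback loop could compare successive projections to detect
# movement toward or away from the targeted phenomenology.
earlier, later = project(low_love, love_axis), project(high_love, love_axis)
assert later > earlier
```

The key design point is that only the direction matters, so the same axis can score any session state regardless of where in the latent space it sits.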
ONLINE METHODS:
Several datasets were leveraged in this study: the Erowid dataset of 11,816 drug testimonials; receptor affinities at 61 receptor subtypes for 44 drugs from the Psychoactive Drug Screening Program (PDSP) (Supplementary Figure), augmented with affinities for 8 phenethylamines from Rickli et al.; RNA gene expression data for 200 brain regions from the Allen Brain Atlas; and 58K Reddit posts with 28 human-annotated emotions. The 10 pharmacologic classes and the 22 chemical classes were retrieved from the Psychonaut Wiki.
DATA PREPROCESSING FOR TRANSFORMERS
Scraped testimonials were parsed for metadata, drug-masked, and tokenized. Drug-masking removed all occurrences of drug names in the testimonial text, including scientific, common, and colloquial nomenclature as well as misspellings. See the Supplementary Table and code for the full list of masked words. Models were initialized with pre-trained weights for the base BERT encoders. All initial model weights are publicly available. Except where otherwise noted, the base BERT model was trained with the Stanford Sentiment Treebank data and is available at TensorFlow Hub. The pooled output from the base BERT model was extended with a dropout layer followed by a dense layer for each task (e.g. BERTiment has 28 distinct outputs, one for each binary emotion classification: present or absent). Code necessary to replicate our findings is available at: . While testimonials vary in length, the input to BERT models is at most 512 tokens. A sliding-window inference step used all available data by creating prediction series of varying lengths for each testimonial from each model. Different window sizes are compared in Supplementary Fig. When the window size exceeds the testimonial size, the input is zero-padded. When the window size is smaller than the testimonial size, the testimonial is split into contiguous blocks of text and the model is applied to each in turn, constructing a trajectory of inferences. Dynamic Time Warping (DTW) quantified inter-trajectory distances, using an implementation from the fastdtw Python package. The Broad Institute's ML4H tools were used for model evaluation.
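The sliding-window procedure can be sketched as follows. This is a minimal illustration, not the authors' code: the `toy_sentiment` scorer stands in for a BERT model (real inference returns a 28-dimensional sentiment vector per window of up to 512 tokens), and padding uses a `"[PAD]"` sentinel in place of zero-padding. The resulting per-window trajectories could then be compared with DTW, e.g. via the fastdtw package the paper cites.

```python
def windows(tokens, size, pad="[PAD]"):
    """Split a tokenized testimonial into contiguous fixed-size blocks.

    Testimonials shorter than the window are padded to the window size,
    as is the final partial block of a long testimonial.
    """
    blocks = [tokens[i:i + size] for i in range(0, max(len(tokens), 1), size)]
    blocks[-1] = blocks[-1] + [pad] * (size - len(blocks[-1]))
    return blocks

def toy_sentiment(block, positive=frozenset({"love", "joy", "bliss"})):
    """Toy stand-in for a per-window model: fraction of 'positive' tokens."""
    return sum(token in positive for token in block) / len(block)

# Applying the model window by window yields a trajectory of inferences.
tokens = "at the peak i felt pure love and joy then quiet confusion".split()
trajectory = [toy_sentiment(b) for b in windows(tokens, size=4)]
assert trajectory == [0.0, 0.25, 0.25]   # 12 tokens -> 3 windows of 4
```

Swapping `toy_sentiment` for a fine-tuned encoder is the only change needed to produce the multi-dimensional trajectories described in the text.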
BERT-BASED MODEL ENCODER FINE-TUNING
The encoder backbone of both BERTowid and BERTiment is a bidirectional transformer (BERT) trained with a masked language model objective. BERTowid and BERTiment add output heads, which take the BERT encoder representation as input and fine-tune it for new tasks. The encoder backbone contains 109 million parameters. Base models are compared in Supplementary Fig., showing similar performance. Prior to the output layer for the fine-tuning task, we insert a dropout layer; see Supplementary Fig. 8. The AdamW stochastic gradient descent optimizer was used with an initial learning rate of 1e-5 and a batch size of 32. After 16 epochs, the model with the minimum validation loss is serialized for downstream inference.
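The head architecture (dropout followed by one dense layer per task) can be sketched numerically. This NumPy forward pass is an illustration under assumed shapes (BERT-base pooled output is 768-dimensional; BERTiment has 28 binary heads), not the authors' TensorFlow implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

def dropout(x, rate=0.5, training=True):
    """Inverted dropout: zero a fraction of units, rescale the rest."""
    if not training:
        return x
    mask = rng.random(x.shape) >= rate
    return np.where(mask, x / (1.0 - rate), 0.0)

def dense_head(pooled, w, b):
    """A single task-specific dense output layer over the pooled encoding."""
    return pooled @ w + b

# Assumed shapes: a batch of 2 windows, each encoded to BERT-base's
# 768-dimensional pooled output; 28 heads, one per binary emotion.
pooled = rng.standard_normal((2, 768))
heads = [(rng.standard_normal((768, 1)) * 0.01, np.zeros(1)) for _ in range(28)]

logits = np.hstack([dense_head(dropout(pooled), w, b) for w, b in heads])
probs = 1.0 / (1.0 + np.exp(-logits))   # per-emotion probabilities
assert probs.shape == (2, 28)
```

In training, gradients from each head's loss flow back through the shared encoder; here only the forward structure is shown.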
TRAINING BERTOWID
The Erowid metadata, the inferred CCA weights, and the receptor affinities from PDSP provide diverse training labels for BERTowid. Both classification (e.g. drug, tag, gender) and regression tasks (e.g. age, affinity, CCA weights) are considered. All classification tasks are trained to minimize a cross-entropy loss, while regression models are trained to minimize the mean squared error of their predictions. Classification and regression with a single model requires a term to balance the two types of loss, but optimization was found to be sensitive to this value, requiring careful tuning to avoid convergence on only one of the loss types. To mitigate this, multitask BERTowid is trained and serialized separately for classification and regression. Supplementary Fig. compares multi-task and single-task models, showing a relatively small cost to the multitask approach. With a window size of 64 words and all categorical tasks, 16 epochs takes about 3 hours on an NVIDIA V100 GPU. Many of the categorical labels are class-imbalanced. This imbalance leads to poor performance on the less well-represented drugs and tags. To mitigate this we considered a weighted cross-entropy loss, which scaled the loss by the inverse of each label's prevalence to compensate for the imbalance. As Supplementary Fig. shows, the weighted loss did increase precision for less well-represented drugs (at the cost of reduced precision on the more prevalent drugs). However, the results for tags were less convincing, with only minor improvements in precision for the 4 rarest tags. A likely explanation is that less common tags are in fact less informative and may be applied less rigorously or consistently by the Erowid moderators; giving these less informative labels more weight in the loss function results in a worse model.
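The inverse-prevalence weighting can be made concrete with a short sketch. This is one standard way to implement the idea; the exact normalization the authors used is not specified, so the mean-one scaling below is an assumption.

```python
import numpy as np

def inverse_prevalence_weights(labels, n_classes):
    """Per-class weights proportional to the inverse of label prevalence.

    Normalized so the average weight is ~1 (an assumed convention).
    """
    counts = np.bincount(labels, minlength=n_classes).astype(float)
    weights = counts.sum() / np.maximum(counts, 1.0)
    return weights / weights.sum() * n_classes

def weighted_cross_entropy(probs, labels, weights):
    """Cross entropy with each example scaled by its class's weight."""
    per_example = -np.log(probs[np.arange(len(labels)), labels] + 1e-12)
    return float(np.mean(weights[labels] * per_example))

labels = np.array([0, 0, 0, 1])               # class 1 is rare
w = inverse_prevalence_weights(labels, n_classes=2)
assert w[1] > w[0]                            # rare class is up-weighted
```

Mistakes on the rare class now cost more, which is what pushes precision up for under-represented drugs at the expense of the prevalent ones, as described above.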
TRAINING BERTIMENT
The BERTiment training procedure has previously been described and evaluated. Our approach differs only in the dropout rate (0.5), learning rate (1e-5), and batch size (32), where for consistency we used the same hyper-parameters and base model used to train BERTowid. The GoEmotions data contains about 58K Reddit posts with 28 emotion annotations and is publicly available for download. We split the data 70-20-10 between training, testing, and validation sets. The 28-head BERTiment is trained for 16 epochs, which takes about 2 hours on an NVIDIA V100 GPU.
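The 70-20-10 split can be sketched as follows. The shuffling, seed, and exact rounding are assumptions for illustration; the paper does not specify them.

```python
import random

def split_70_20_10(items, seed=42):
    """Shuffle and partition into train/test/validation at 70/20/10.

    The fixed seed (an assumption) makes the split reproducible.
    """
    items = list(items)
    random.Random(seed).shuffle(items)
    n = len(items)
    a, b = int(0.7 * n), int(0.9 * n)
    return items[:a], items[a:b], items[b:]

train, test, val = split_70_20_10(range(100))
assert (len(train), len(test), len(val)) == (70, 20, 10)
```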
CANONICAL CORRELATION ANALYSIS
The CCA mapping between testimonials and affinities has previously been described in detail. The approach here extends from 27 drugs to 52 and from 40 receptor subtypes to 61, and exclusively sources affinity values from the PDSP or Rickli et al. Each receptor-semantic component was composed of two poles, each consisting of a weighted list of words and a weighted list of receptors. The receptor weights for each component were mapped to the cortex using Allen Brain Atlas receptor RNA expression quantities measured by invasive tissue probes. The scikit-learn package's implementation of the CCA algorithm was used.