Note: This work thread is part of the PECSS Project, advised by Dr. Rosa I. Arriaga and funded by the National Science Foundation (Award Number: 1915504).

PTSD is not always apparent. Some people do not know that they are suffering from PTSD, and getting them the right care at the right time is crucial for their mental health.

To address this, we set out to build a system that can predict these traces of trauma from the way a person speaks. On this page, I discuss the data, methods, and results associated with this work thread.

The 4 PTSD Symptom Clusters

A PTSD patient would most likely show one or more of these symptoms:

Posttraumatic Stress Disorder Checklist Items in Each DSM-IV Symptom Cluster. Image taken from the “Posttraumatic Stress Disorder Symptom Clusters in Service Members Predict New-Onset Depression Among Military Spouses” paper.

The Research Question

Can we detect the presence of the 4 PTSD Symptom Clusters (Reexperiencing, Hyperarousal, Emotional Numbing, Effortful Avoidance) from a person’s posts on social media platforms (such as Reddit)?

Data

We collated publicly available data from Reddit, creating two datasets for our use case:

  1. Randomly sampled r/ptsd posts with multilabel symptom cluster annotations (Dataset-1): This dataset consists of 500 randomly sampled posts from r/ptsd, of which we manually annotated 350 for the 4 symptom clusters. We kept the annotated set nearly balanced between symptom posts (179) and non-symptom posts (171).
  2. Posts, Comments, and Replies from r/ptsd 2008-2022 (Dataset-2): This dataset contains all ~40,000 posts, along with their comments and replies, made on the r/ptsd subreddit from its inception in 2008 until February 1, 2022.

Methodology

We took a 3-step approach to answering the research question:

  1. Literature Survey: We first conducted a thorough literature survey to identify relevant work in this domain. To annotate the Reddit posts accurately with symptom cluster information, we used the Posttraumatic Stress Disorder Checklist Items as listed in the “Posttraumatic Stress Disorder Symptom Clusters in Service Members Predict New-Onset Depression Among Military Spouses” paper.
  2. Multilabel Symptom Cluster Classification: We developed a multilabel classification model for PTSD symptom clusters, splitting Dataset-1 into training and test sets. To find the best-performing model, we experimented with different algorithms such as One-vs-Rest, Binary Relevance, Classifier Chain, and Label Powerset.
  3. Running the classifier on Reddit data: After selecting the best classifiers from the previous step, we ran them on Dataset-2 to see how the PTSD symptom clusters appear across the entire r/ptsd subreddit. We also investigated how people with a PTSD self-diagnosis talk about symptom clusters compared to people who haven’t been diagnosed with PTSD.
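The four algorithms named in step 2 differ mainly in how they turn the multilabel targets into problems an ordinary classifier can handle. The sketch below illustrates those target transformations in plain Python; it is not the project's code. The annotations, the two-number feature vectors, and all function names are hypothetical, and the base classifier and text features are left out because the write-up does not specify them. In the multilabel setting, One-vs-Rest and Binary Relevance are both commonly implemented as one independent binary classifier per label, which is what the sketch assumes.

```python
from collections import Counter

# Cluster names come from the research question; the order is an arbitrary choice.
CLUSTERS = ["Reexperiencing", "Hyperarousal", "Emotional Numbing", "Effortful Avoidance"]


def to_indicator(labels):
    """Encode a set of cluster names as a 0/1 vector aligned with CLUSTERS."""
    return [1 if c in labels else 0 for c in CLUSTERS]


def binary_relevance_targets(Y):
    """One binary target column per cluster (assumed to match One-vs-Rest too)."""
    return {c: [row[i] for row in Y] for i, c in enumerate(CLUSTERS)}


def label_powerset_targets(Y):
    """Treat each distinct label combination as one class of a multiclass problem."""
    return ["".join(str(v) for v in row) for row in Y]


def classifier_chain_inputs(X, Y, position):
    """Training inputs for the classifier at `position` in the chain:
    the original features plus the true labels of all earlier clusters."""
    return [list(x) + row[:position] for x, row in zip(X, Y)]


# Hypothetical annotations for four posts (not taken from Dataset-1).
annotations = [
    {"Reexperiencing", "Hyperarousal"},
    set(),  # a non-symptom post
    {"Emotional Numbing"},
    {"Reexperiencing", "Hyperarousal"},
]
Y = [to_indicator(a) for a in annotations]
X = [[0.2, 0.9], [0.1, 0.0], [0.7, 0.3], [0.4, 0.8]]  # placeholder feature vectors

br = binary_relevance_targets(Y)
lp = label_powerset_targets(Y)
chain_third = classifier_chain_inputs(X, Y, position=2)

print(br["Hyperarousal"])  # targets for the Hyperarousal binary classifier
print(Counter(lp))         # class frequencies under Label Powerset
print(chain_third[0])      # inputs seen by the third classifier in the chain
```

Per-label approaches keep each cluster's classifier simple but ignore correlations between clusters. Classifier Chain and Label Powerset can capture those correlations, at the cost of error propagation along the chain or of sparse classes when many label combinations are rare.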

Results

  1. The highest model accuracy was recorded with the One-vs-Rest algorithm, with these per-cluster scores:
    1. Reexperiencing: 75.87%
    2. Hyperarousal: 80.95%
    3. Emotional Numbing: 83.49%
    4. Effortful Avoidance: 89.52%
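The write-up does not say exactly how these scores were computed. One plausible reading is per-cluster accuracy: for each cluster, the fraction of test posts where the predicted presence or absence matches the annotation. The sketch below shows that calculation under that assumption, alongside the stricter subset accuracy (all four clusters correct at once) for contrast. The label matrices are hypothetical and unrelated to the reported numbers.

```python
def per_label_accuracy(y_true, y_pred):
    """Accuracy computed separately for each label column."""
    n_posts = len(y_true)
    n_labels = len(y_true[0])
    return [
        sum(t[j] == p[j] for t, p in zip(y_true, y_pred)) / n_posts
        for j in range(n_labels)
    ]


def subset_accuracy(y_true, y_pred):
    """Fraction of posts whose entire label vector is predicted correctly."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


# Hypothetical test labels and predictions; columns follow the order
# Reexperiencing, Hyperarousal, Emotional Numbing, Effortful Avoidance.
y_true = [[1, 1, 0, 0], [0, 0, 0, 0], [0, 0, 1, 0], [1, 0, 0, 1]]
y_pred = [[1, 0, 0, 0], [0, 0, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1]]

label_scores = per_label_accuracy(y_true, y_pred)
exact_match = subset_accuracy(y_true, y_pred)

print(label_scores)
print(exact_match)
```

Per-cluster accuracy can look high for a cluster that rarely occurs, since predicting "absent" everywhere is often correct, so it is worth reading such scores next to each cluster's prevalence in the test set.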