Text Classification of Dreams Using NLP and scikit-learn

About

Many people struggle to interpret their dreams and the meaning of the symbols and elements in them. A tool that automatically codes specific elements of dreams could highlight significant themes that may offer insight into an individual dreamer’s unconscious. This project trains a model to identify the emotion present in a written dream account. It uses the dream-report dataset from the paper “Our dreams, our selves: automatic analysis of dream reports”, whose authors sourced the data, a collection of more than 20,000 dream reports, from dreambank.net. The dataset contains journal-like accounts of dreams from a number of individuals, coded with the Hall/Van de Castle dream coding system, which psychologists developed as a method for quantitative content analysis of dreams. The system assigns quantitative values to several dream elements: characters (male/female, animal, family, etc.), aggressive or friendly interactions, and negative or positive emotions. The dataset also includes information on the dreamers’ profiles and the dates of the dreams.

For this machine learning project, I trained and tested a Random Forest classifier on the text of the dream reports. The process was as follows:

1. Import the Python libraries.
2. Load the dream dataset and inspect the data.
3. Clean the data:
   - Split the emotions_code column to keep only the first emotion code (for simplicity).
   - Keep only the rows with the emotion codes we want to work with, and define those codes in a Python dictionary.
   - Change the data type from string to list.
4. Split the data into training and test sets.
5. Build a pipeline that tokenizes the text, computes term frequencies, and trains a random forest classifier.
6. Evaluate performance on the test set:
   - Confusion matrix
   - Plot of the confusion matrix
   - Classification report
7. Tune parameters using GridSearchCV and determine which parameters are best.
8. Use the trained classifier to predict the emotion code of a new dream account.
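The loading and inspection step could be sketched with pandas as follows. This is a minimal sketch under assumptions: the column name text_dream and the comma-separated format of emotions_code are guesses about the dataset's layout, and an in-memory CSV with invented rows stands in for the real file, whose name is not given here.

```python
import io

import pandas as pd

# Hypothetical stand-in for the real dataset file. The "text_dream" column name
# and the comma-separated "emotions_code" format are assumptions.
csv_data = io.StringIO(
    "text_dream,emotions_code\n"
    '"I was flying over the ocean and laughing.","HA"\n'
    '"A stranger chased me through a dark alley.","AP, CO"\n'
    '"My grandmother was crying at the station.","SD"\n'
)

dreams = pd.read_csv(csv_data)

# Quick inspection: dimensions, column types and non-null counts, first rows.
print(dreams.shape)
dreams.info()
print(dreams.head())
```

With the real data, the same calls reveal how many reports lack an emotion code, which matters for the cleaning step.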
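The cleaning steps might look like the sketch below. It assumes the codes to keep are the five standard Hall/Van de Castle emotion classes (anger, apprehension, sadness, confusion, happiness); the project's actual selection may differ. The toy rows, the text_dream column and the comma-separated code format are hypothetical. "Change the data type from string to list" is read here as turning the text and label columns into Python lists for the vectorizer, which is an interpretation.

```python
import pandas as pd

# Hypothetical toy rows; "XX" is a made-up code outside the dictionary.
dreams = pd.DataFrame(
    {
        "text_dream": [
            "I was flying over the ocean and laughing.",
            "A stranger chased me through a dark alley.",
            "My grandmother was crying at the station.",
            "I was at a party with old friends.",
            "The house had no doors and I felt lost.",
        ],
        "emotions_code": ["HA", "AP, CO", "SD", None, "XX"],
    }
)

# Assumed codes to keep: the five Hall/Van de Castle emotion classes.
emotion_codes = {
    "AN": "anger",
    "AP": "apprehension",
    "SD": "sadness",
    "CO": "confusion",
    "HA": "happiness",
}

# Drop reports without a code, then keep only the first code in each list.
dreams = dreams.dropna(subset=["emotions_code"]).copy()
dreams["emotion"] = dreams["emotions_code"].str.split(",").str[0].str.strip()

# Keep only rows whose first code is one of the classes in the dictionary.
dreams = dreams[dreams["emotion"].isin(list(emotion_codes))]

# Plain Python lists of texts and labels for the scikit-learn pipeline.
texts = dreams["text_dream"].tolist()
labels = dreams["emotion"].tolist()
print(labels, [emotion_codes[code] for code in labels])
```

Keeping only the first code turns a multi-label problem into a single-label one, which is the simplification the cleaning step describes.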
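The train/test split and the pipeline could be written roughly as follows. CountVectorizer tokenizes the text and counts terms, TfidfTransformer rescales those term frequencies, and RandomForestClassifier does the classification. The step names (vect, tfidf, clf), the tf-idf weighting, the 25% test size and the toy corpus are assumptions for illustration.

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import CountVectorizer, TfidfTransformer
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline

# Hypothetical toy corpus standing in for the cleaned dream texts and codes.
texts = [
    "I laughed with friends at a sunny beach party",
    "We danced and celebrated and I felt so happy",
    "A warm reunion with my family made me smile",
    "I won a prize and everyone cheered for me",
    "A monster chased me down a dark hallway",
    "I was falling and could not scream for help",
    "Someone broke into the house and I hid in fear",
    "A storm flooded the street and I panicked",
]
labels = ["HA"] * 4 + ["AP"] * 4

# Hold out 25% of the reports for testing, keeping class proportions equal.
X_train, X_test, y_train, y_test = train_test_split(
    texts, labels, test_size=0.25, random_state=42, stratify=labels
)

# Tokenize and count terms, reweight by tf-idf, then fit a random forest.
pipeline = Pipeline(
    [
        ("vect", CountVectorizer()),
        ("tfidf", TfidfTransformer()),
        ("clf", RandomForestClassifier(n_estimators=100, random_state=42)),
    ]
)
pipeline.fit(X_train, y_train)
predictions = pipeline.predict(X_test)
print(list(zip(X_test, predictions)))
```

Wrapping the vectorizer and classifier in one Pipeline means the vocabulary is learned only from the training split, so no test-set information leaks into the features.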
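For the evaluation, scikit-learn's confusion_matrix and classification_report summarize the test-set results. The labels below are hypothetical, chosen only to show the output format; in the project, y_test and the pipeline's predictions would take their place.

```python
from sklearn.metrics import classification_report, confusion_matrix

# Hypothetical true and predicted codes for six test reports (illustration only).
y_test = ["HA", "HA", "AP", "AP", "SD", "SD"]
y_pred = ["HA", "AP", "AP", "AP", "SD", "HA"]

class_order = ["AP", "HA", "SD"]

# Rows are true classes and columns are predicted classes, both in class_order.
cm = confusion_matrix(y_test, y_pred, labels=class_order)
print(cm)

# Per-class precision, recall, F1 score and support.
print(classification_report(y_test, y_pred, labels=class_order))
report = classification_report(y_test, y_pred, labels=class_order, output_dict=True)
```

Here the AP row is [2, 0, 0]: both apprehension reports were recognized, but the AP column also holds one happiness report, so AP has perfect recall and a precision of 2/3.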
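Parameter tuning with GridSearchCV might look like the sketch below. The grid values, the three-fold cross-validation and the toy corpus are assumptions, not the project's settings. Each parameter name is prefixed with its pipeline step name and a double underscore, which is how GridSearchCV addresses parameters inside a Pipeline. After fitting, best_params_ holds the winning combination.

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import CountVectorizer, TfidfTransformer
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline

# Hypothetical toy corpus (four reports per class).
texts = [
    "I laughed with friends at a sunny beach party",
    "We danced and celebrated and I felt so happy",
    "A warm reunion with my family made me smile",
    "I won a prize and everyone cheered for me",
    "A monster chased me down a dark hallway",
    "I was falling and could not scream for help",
    "Someone broke into the house and I hid in fear",
    "A storm flooded the street and I panicked",
]
labels = ["HA"] * 4 + ["AP"] * 4

pipeline = Pipeline(
    [
        ("vect", CountVectorizer()),
        ("tfidf", TfidfTransformer()),
        ("clf", RandomForestClassifier(random_state=42)),
    ]
)

# Candidate values, addressed as <step name>__<parameter>.
param_grid = {
    "vect__ngram_range": [(1, 1), (1, 2)],
    "tfidf__use_idf": [True, False],
    "clf__n_estimators": [25, 50],
}

# Tries every combination with stratified 3-fold cross-validation,
# then refits the best one on all the data.
grid = GridSearchCV(pipeline, param_grid, cv=3)
grid.fit(texts, labels)

print(grid.best_params_)
print(grid.best_score_)
```

Because refit is on by default, grid.best_estimator_ is a fully trained pipeline that can be used directly for prediction.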
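Predicting the emotion code of a new dream comes down to calling the fitted pipeline's predict method on a list of texts. This sketch trains on a hypothetical toy corpus so that it runs on its own. The new dream text is invented, and with so little training data the prediction itself is not meaningful. The small code-to-name dictionary is a stand-in for the one defined during cleaning.

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import CountVectorizer, TfidfTransformer
from sklearn.pipeline import Pipeline

# Hypothetical toy training data.
texts = [
    "I laughed with friends at a sunny beach party",
    "We danced and celebrated and I felt so happy",
    "A warm reunion with my family made me smile",
    "I won a prize and everyone cheered for me",
    "A monster chased me down a dark hallway",
    "I was falling and could not scream for help",
    "Someone broke into the house and I hid in fear",
    "A storm flooded the street and I panicked",
]
labels = ["HA"] * 4 + ["AP"] * 4
emotion_codes = {"HA": "happiness", "AP": "apprehension"}

pipeline = Pipeline(
    [
        ("vect", CountVectorizer()),
        ("tfidf", TfidfTransformer()),
        ("clf", RandomForestClassifier(n_estimators=100, random_state=42)),
    ]
)
pipeline.fit(texts, labels)

# predict expects an iterable of documents, so wrap the single dream in a list.
new_dream = "I was lost in a dark forest and something was chasing me"
predicted_code = pipeline.predict([new_dream])[0]

# Class probabilities, keyed by the class labels the forest learned.
probabilities = dict(zip(pipeline.classes_, pipeline.predict_proba([new_dream])[0]))
print(predicted_code, emotion_codes[predicted_code])
print(probabilities)
```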