Neural Net Writes The Office

A Deep Learning Recurrent Neural Net (RNN), written in Python, trained to write additional scripts for The Office.

The first step in this project was some basic exploration and data cleaning. The dataset was already pre-processed, so there was not much cleaning to do. All the better, let the fun viz work begin! The dataset I'm working from can be found in the data folder as "the-office-lines - scripts.csv". It contains every line said during filming of The Office, including deleted scenes, and is organized by ID, Season, Episode, Scene, Speaker, Line, and Deleted Scene Flag.

All of the visualizations were done using Seaborn. I am a big fan of this project. It is a very clever front-end for Matplotlib that lets you make really detailed and professional plots in one line of code. If you have never used it, I highly recommend checking out its website!

The first plot shows the number of episodes per season of The Office.
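As a rough sketch of how the episodes-per-season plot could be produced, the snippet below counts distinct episodes per season and hands the result to Seaborn's `barplot`. It is not the original notebook code: the column names `season` and `episode` and the helper names are assumptions for illustration, so check them against the actual CSV header.

```python
import csv
from collections import defaultdict


def load_rows(csv_path):
    """Read the CSV into a list of dicts keyed by column name."""
    with open(csv_path, newline="", encoding="utf-8") as f:
        return list(csv.DictReader(f))


def episodes_per_season(rows):
    """Count distinct episode numbers per season.

    The keys "season" and "episode" are assumed column names,
    not confirmed against the real file.
    """
    episodes = defaultdict(set)
    for row in rows:
        episodes[int(row["season"])].add(int(row["episode"]))
    return {season: len(eps) for season, eps in sorted(episodes.items())}


def plot_episodes_per_season(counts):
    """Draw the per-season counts as a bar chart."""
    # Imported here so the counting helper works without plotting libraries.
    import matplotlib.pyplot as plt
    import seaborn as sns

    ax = sns.barplot(x=list(counts.keys()), y=list(counts.values()))
    ax.set(xlabel="Season", ylabel="Episodes")
    plt.show()
```

Calling `plot_episodes_per_season(episodes_per_season(load_rows(path)))` with the path to the CSV in the data folder would draw the chart. The counting is kept separate from the plotting so it can be checked on a few hand-written rows first.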

I decided the best way to feed the data into my neural net was to combine the lines of text from the dataset into one .txt file in the form of "Speaker": Text. Before throwing this data into the net, I wanted to see how many lines of text each season contains, to get a better idea of which season would have the most influence on my result. As an avid fan, I know there are differences in tone and comedy between seasons of The Office, particularly after Michael Scott left. I heavily considered removing the seasons after Michael's departure (as NBC should have...) but in the end I stayed true to the series and used the entire dataset.
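The step of flattening the dataset into a single text file might look something like the sketch below. It assumes the columns are named `speaker` and `line_text` and that each output line is written as `Speaker: Text` without quotes. Both are guesses made for illustration, as are the function names.

```python
import csv


def build_corpus(rows):
    """Join dataset rows into one string, one "Speaker: Text" line per row.

    "speaker" and "line_text" are assumed column names. Rows with an
    empty speaker or empty text are skipped.
    """
    lines = []
    for row in rows:
        speaker = row["speaker"].strip()
        text = row["line_text"].strip()
        if speaker and text:
            lines.append(f"{speaker}: {text}")
    return "\n".join(lines)


def write_corpus(csv_path, txt_path):
    """Read the CSV and write the combined corpus to a .txt file."""
    with open(csv_path, newline="", encoding="utf-8") as src:
        corpus = build_corpus(csv.DictReader(src))
    with open(txt_path, "w", encoding="utf-8") as dst:
        dst.write(corpus)
```

Keeping the speaker name at the start of every line gives the network a chance to learn who tends to say what, which is presumably why the speaker was kept in the training text.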

Last but not least, as a personal interest, I wanted to visualize how often each character says the famous "That's what she said" phrase. To no one's surprise, Michael leads the characters by far.

The RNN that I ultimately decided on trained for 125 epochs, with a sequence length of 15 and a batch size of 512. I used an RNN size of 1024, 3 LSTM layers, and a learning rate of 0.01.

This project was accomplished with Python, Seaborn, and TensorFlow. I want to send a big shout-out to my GTX 1080; without it I would still be waiting on this model to train!
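To make the sequence length and batch size mentioned above a little more concrete, here is a minimal, framework-free sketch of how a tokenized script could be cut into training batches. The hyperparameter values come from this write-up. Everything else is an assumption: the `HPARAMS` and `make_batches` names are hypothetical, the write-up does not say whether tokens are words or characters, and the actual TensorFlow input pipeline may well differ.

```python
# Hyperparameters as reported in this write-up.
HPARAMS = {
    "epochs": 125,
    "seq_length": 15,
    "batch_size": 512,
    "rnn_size": 1024,
    "num_layers": 3,
    "learning_rate": 0.01,
}


def make_batches(token_ids, batch_size, seq_length):
    """Split a flat list of token ids into (inputs, targets) batches.

    Each target sequence is its input sequence shifted one step ahead,
    the usual setup for next-token prediction. Leftover tokens that do
    not fill a whole batch are dropped.
    """
    window = batch_size * seq_length
    # Reserve one extra token so the final target sequence is complete.
    n_batches = (len(token_ids) - 1) // window
    batches = []
    for b in range(n_batches):
        start = b * window
        inputs, targets = [], []
        for i in range(batch_size):
            lo = start + i * seq_length
            inputs.append(token_ids[lo:lo + seq_length])
            targets.append(token_ids[lo + 1:lo + seq_length + 1])
        batches.append((inputs, targets))
    return batches
```

With the settings above, each batch would hold 512 sequences of 15 tokens, so one batch covers 7,680 tokens of script text.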