Using an RNN-CNN with Embedding to identify a movie genre from a script.
The goal of this project was to gain experience with, and a better understanding of, how recurrent layers can be mixed with convolutional layers. I wanted to see if I could build a deep learning model that could classify lines of text as belonging to a certain movie genre based on word patterns and sentence structure. I was also eager to start playing around with TensorFlow 2, seeing as Keras is now its default API.
The first step in this project was gathering movie scripts classified by genre. I acquired roughly 15 scripts each for comedy, horror, and thriller and began pre-processing. After examining the scripts, I noticed that most of them were delimited by character name and line number in all caps, so I began by removing all of the all-caps words and numbers. I also removed all punctuation. Lastly, I used spaCy's lemmatization functionality to help simplify the vocabulary.
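A minimal sketch of this kind of cleanup might look like the following. It is not the exact code from the project: the function name `clean_script`, the choice to keep single capital letters such as "I", and the final lowercasing are all assumptions made for illustration, and the spaCy lemmatization step is left out so the example runs with only the standard library.

```python
import re
import string


def clean_script(text: str) -> str:
    """Strip all-caps words, numbers, and punctuation from raw script text.

    Hypothetical helper for illustration only.
    """
    # Remove words written entirely in capitals (two or more letters),
    # such as character names. Single capitals like "I" are kept (assumption).
    text = re.sub(r"\b[A-Z]{2,}\b", " ", text)
    # Remove digits, such as line numbers.
    text = re.sub(r"\d+", " ", text)
    # Remove all ASCII punctuation characters.
    text = text.translate(str.maketrans("", "", string.punctuation))
    # Lowercase and collapse whitespace (lowercasing is an assumption).
    return " ".join(text.lower().split())


print(clean_script("JOHN 12 I can't believe it's here!"))
```

Lemmatization would then run on the cleaned text, for example by passing it through a loaded spaCy pipeline and joining each token's lemma.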
After that, I combined all of the scripts into three separate strings, one per genre, then broke them into lists of no more than 25 words each. From here, I turned them into Pandas DataFrames and hard-coded the genre of each. I also used these scripts to create a word index dictionary so I could convert the words to numbers and vice versa.
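The chunking and word index steps can be sketched roughly as below. The helper names `chunk_words` and `build_word_index`, the toy text, and the choice to reserve index 0 for padding are assumptions for illustration. The rows are kept as plain dictionaries here; they could be handed to `pandas.DataFrame` in the same shape.

```python
def chunk_words(words, size=25):
    """Split a list of words into consecutive chunks of at most `size` words."""
    return [words[i:i + size] for i in range(0, len(words), size)]


def build_word_index(words):
    """Map each distinct word to an integer id, and build the reverse mapping.

    Ids start at 1 so that 0 can be used for padding (assumption).
    """
    word_to_id = {}
    for word in words:
        if word not in word_to_id:
            word_to_id[word] = len(word_to_id) + 1
    id_to_word = {idx: word for word, idx in word_to_id.items()}
    return word_to_id, id_to_word


# Toy stand-in for the combined, cleaned text of each genre.
genre_text = {
    "comedy": "the cat saw the dog",
    "horror": "the door opened slowly",
}

# One row per chunk, with the genre label hard-coded from the source string.
rows = [
    {"words": chunk, "genre": genre}
    for genre, text in genre_text.items()
    for chunk in chunk_words(text.split(), size=3)
]

all_words = [word for row in rows for word in row["words"]]
word_to_id, id_to_word = build_word_index(all_words)
encoded = [[word_to_id[word] for word in row["words"]] for row in rows]
```

Chunks shorter than the maximum length would still need padding before being fed to a network, which is why an id is set aside for that purpose in this sketch.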
Once the pre-processing was done, I was able to begin experimenting with different model architectures. The configuration that gave the best performance was fairly simple. I used a Long Short-Term Memory (LSTM) layer that returned its full sequence of outputs, one per time step, into a 1-dimensional convolutional layer. This was followed by Batch Normalization and Max Pooling. From there, those outputs were passed into a Dense layer for further calculation and lastly into a final Dense layer with 3 output nodes and Softmax activation.
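For readers who want to see the shape of such a network, here is a rough Keras sketch of the layer order described above. Only the ordering of layers comes from the description; the vocabulary size, embedding width, unit counts, kernel size, pool size, activations, the `Flatten` layer, the optimizer, and the loss are all assumed values chosen for illustration, not the settings used in the project.

```python
import tensorflow as tf
from tensorflow.keras import layers


def build_model(vocab_size=10000, seq_len=25, num_classes=3):
    """Embedding -> LSTM -> Conv1D -> BatchNorm -> MaxPool -> Dense -> Softmax.

    All hyperparameters here are illustrative assumptions.
    """
    model = tf.keras.Sequential([
        tf.keras.Input(shape=(seq_len,)),
        # Turn integer word ids into dense vectors.
        layers.Embedding(input_dim=vocab_size, output_dim=64),
        # return_sequences=True emits one output per time step,
        # which gives the convolution a sequence to slide over.
        layers.LSTM(64, return_sequences=True),
        layers.Conv1D(filters=64, kernel_size=3, activation="relu"),
        layers.BatchNormalization(),
        layers.MaxPooling1D(pool_size=2),
        # Flatten is an assumption: the pooled sequence has to be collapsed
        # to a vector before the Dense layers.
        layers.Flatten(),
        layers.Dense(64, activation="relu"),
        # One output node per genre.
        layers.Dense(num_classes, activation="softmax"),
    ])
    model.compile(
        optimizer="adam",
        loss="sparse_categorical_crossentropy",
        metrics=["accuracy"],
    )
    return model


model = build_model()
model.summary()
```

The key detail is `return_sequences=True`: without it the LSTM would output a single vector per sample, and the 1-D convolution would have no time axis to work on.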
With this structure and only 10 epochs of training on my GTX 1080 Ti, I was able to achieve a weighted accuracy of 79% across the three classes.
For next steps, I'd like to test how well the model performs at classifying entire movies based on the predictions for all of the lines in their scripts. For example, if a movie has mostly comedy lines, would the model recognize that movie as a comedy?
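One simple way to try this would be a majority vote over the per-chunk predictions. The sketch below assumes each chunk prediction is a list of three probabilities and that the label order is comedy, horror, thriller; both the order and the function name `classify_movie` are hypothetical.

```python
from collections import Counter

# Assumed label order; it must match the order used when training the model.
GENRES = ["comedy", "horror", "thriller"]


def classify_movie(chunk_probs):
    """Return the genre predicted for the largest number of chunks.

    `chunk_probs` is a list of per-chunk probability lists, one value per genre.
    """
    # Index of the highest-probability genre for each chunk.
    votes = Counter(
        max(range(len(probs)), key=probs.__getitem__) for probs in chunk_probs
    )
    winner, _ = votes.most_common(1)[0]
    return GENRES[winner]


example = [
    [0.7, 0.2, 0.1],
    [0.6, 0.3, 0.1],
    [0.1, 0.8, 0.1],
]
print(classify_movie(example))
```

An alternative would be to average the probabilities across chunks before picking a genre, which lets confident predictions count for more than borderline ones.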
In the picture here, you can see ten randomly sampled outputs from the validation set. The model performs quite well!
I was very excited to start this project. I very rarely get to work with Embedding networks, and I am glad that I was able to use RNN and CNN layers together! Make sure to visit my git repo below to check out the code!