In this project, I built the World Models architecture (https://worldmodels.github.io/) from scratch using TensorFlow and Keras!
This project was extremely fun! Before this, I had only ever combined at most two Neural Networks together.
The goal of this project was to build and train an AI that could play the OpenAI Gym Car Racing game [Reference 1]. In this game, you drive a race car around a track, trying to stay on the track and get as far around it as possible before time runs out.
This is essentially a reinforcement learning problem. To solve it, I combined three components: a Convolutional Neural Network (CNN) with a Variational Auto-Encoder (VAE) as the car's eyes, a Mixture Density Network (MDN) combined with a Recurrent Neural Network (RNN) as the thinker, and a Controller as the decision maker.
To the left, you can see the beginning of the code for the CNN-VAE class. This class is responsible for providing "vision" to the rest of the network. It takes each frame of the environment (the image of the car and the track) and converts it to a matrix of pixel values. That matrix is then passed through Convolution and Max-Pooling layers, which act as feature extractors. Through backpropagation, the CNN learns which information about the car's environment it needs to pass on to the rest of the network.
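Here is a minimal Keras sketch of the conv/max-pool feature extractor described above. It is not the original CNN-VAE class shown in the image. The 64x64 input size is borrowed from the World Models paper, which resizes frames to 64x64. The filter counts and the `build_cnn_encoder` name are assumptions chosen for the example.

```python
import numpy as np
from tensorflow import keras

def build_cnn_encoder(input_shape=(64, 64, 3)):
    """Conv + max-pool feature extractor for one game frame (sizes are assumptions)."""
    inputs = keras.Input(shape=input_shape)
    x = inputs
    for filters in (32, 64, 128):
        # Convolution: learns local patterns such as track edges and grass.
        x = keras.layers.Conv2D(filters, 3, padding="same", activation="relu")(x)
        # Max-pooling: halves width and height, keeping the strongest responses.
        x = keras.layers.MaxPooling2D(pool_size=2)(x)
    features = keras.layers.Flatten()(x)
    return keras.Model(inputs, features, name="cnn_encoder")

encoder = build_cnn_encoder()
frame = np.random.rand(1, 64, 64, 3).astype("float32")  # stand-in for one frame scaled to [0, 1]
features = encoder.predict(frame, verbose=0)
print(features.shape)  # (1, 8192)
```

Each pooling step halves the width and height (64 → 32 → 16 → 8), so one frame becomes 8 × 8 × 128 = 8,192 numbers. In the full model, these features would feed the VAE's bottleneck rather than being used directly.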
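The VAE is a crucial piece of the network. Ordinary Auto-Encoders are also used for feature extraction. They squeeze a large number of input features through a small bottleneck and then try to rebuild the original input from it. Because the bottleneck is small, the network has to learn which information is most meaningful and keep only that.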
Here is an example: suppose I want to predict whether or not someone is in college. I'm given 300 columns of personal data, ranging from their age to the color of the shoes they wear. Instead of ranking these features by importance by hand, I can build an Auto-Encoder that compresses those 300 features into 10 learned features that capture most of the useful information!
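A plain Auto-Encoder for the college example could look like the minimal Keras sketch below. The layer sizes, training settings, and random stand-in data are all hypothetical. The point is the overall shape of the model: 300 columns go in, a 10-number bottleneck sits in the middle, and the model is trained to reproduce its own input.

```python
import numpy as np
from tensorflow import keras

n_features, n_latent = 300, 10

inputs = keras.Input(shape=(n_features,))
h = keras.layers.Dense(64, activation="relu")(inputs)
code = keras.layers.Dense(n_latent, name="code")(h)   # the 10-number bottleneck
h = keras.layers.Dense(64, activation="relu")(code)
outputs = keras.layers.Dense(n_features)(h)           # rebuild all 300 columns

autoencoder = keras.Model(inputs, outputs)
encoder = keras.Model(inputs, code)                   # reuse the first half on its own
autoencoder.compile(optimizer="adam", loss="mse")

# Hypothetical stand-in data: 1,000 people x 300 numeric columns, already scaled.
X = np.random.rand(1000, n_features).astype("float32")
autoencoder.fit(X, X, epochs=5, batch_size=32, verbose=0)  # the input is also the target

compressed = encoder.predict(X, verbose=0)
print(compressed.shape)  # (1000, 10)
```

The 10 outputs of `encoder` are learned mixtures of the original columns rather than a ranked subset of them. They could then be fed to a separate classifier that predicts college or not.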
A Variational Auto-Encoder also shrinks the output of the CNN down to a small number of features, but with one difference. Instead of a single fixed value, it learns a mean and a spread for each feature and samples a slightly randomized value from that range. It then expands these values back to their original size. This gives the network a dream-like view of its input: the compressed version of each frame that the MDN-RNN receives is slightly fuzzy and unclear. This does increase training time and resources. In exchange, it brings a big benefit to the overall process: the entire network becomes much more resilient, since the rest of the network learns from slightly varied versions of each frame.
[Reference 2 - Left-IMG]
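The "randomized value" step is usually implemented with the reparameterization trick. The encoder outputs a mean `mu` and a log-variance `log_var` for each latent feature, and the sample is `z = mu + sigma * epsilon` with random `epsilon`. The minimal Keras sketch below assumes 64x64 RGB frames and a 32-dimensional latent, the sizes used in the World Models paper. The layer choices and the `Sampling` and `vae_loss` names are illustrative and are not taken from the original project.

```python
import tensorflow as tf
from tensorflow import keras

LATENT_DIM = 32  # assumption: latent size used in the World Models paper

class Sampling(keras.layers.Layer):
    """Reparameterization trick: z = mu + sigma * epsilon, with epsilon ~ N(0, I)."""
    def call(self, inputs):
        mu, log_var = inputs
        epsilon = tf.random.normal(shape=tf.shape(mu))
        return mu + tf.exp(0.5 * log_var) * epsilon  # sigma = exp(log_var / 2)

# Encoder: 64x64x3 frame -> (mu, log_var) -> randomly sampled z
frame_in = keras.Input(shape=(64, 64, 3))
x = frame_in
for filters in (32, 64, 128):
    x = keras.layers.Conv2D(filters, 3, strides=2, padding="same", activation="relu")(x)
x = keras.layers.Flatten()(x)
mu = keras.layers.Dense(LATENT_DIM, name="mu")(x)
log_var = keras.layers.Dense(LATENT_DIM, name="log_var")(x)
z = Sampling()([mu, log_var])
vae_encoder = keras.Model(frame_in, [mu, log_var, z], name="vae_encoder")

# Decoder: z -> reconstructed 64x64x3 frame
z_in = keras.Input(shape=(LATENT_DIM,))
y = keras.layers.Dense(8 * 8 * 128, activation="relu")(z_in)
y = keras.layers.Reshape((8, 8, 128))(y)
for filters in (64, 32):
    y = keras.layers.Conv2DTranspose(filters, 3, strides=2, padding="same", activation="relu")(y)
recon_out = keras.layers.Conv2DTranspose(3, 3, strides=2, padding="same", activation="sigmoid")(y)
vae_decoder = keras.Model(z_in, recon_out, name="vae_decoder")

def vae_loss(frames, recon, mu, log_var):
    # Reconstruction error: squared pixel differences, summed per frame.
    rec = tf.reduce_sum(tf.square(frames - recon), axis=[1, 2, 3])
    # KL divergence from N(mu, sigma^2) to N(0, I): keeps the latent space smooth.
    kl = -0.5 * tf.reduce_sum(1.0 + log_var - tf.square(mu) - tf.exp(log_var), axis=1)
    return tf.reduce_mean(rec + kl)

frames = tf.random.uniform((4, 64, 64, 3))  # stand-in for 4 frames scaled to [0, 1]
mu_v, log_var_v, z_v = vae_encoder(frames)
recon_v = vae_decoder(z_v)
loss = vae_loss(frames, recon_v, mu_v, log_var_v)
print(z_v.shape, recon_v.shape)  # (4, 32) (4, 64, 64, 3)
```

The KL term in `vae_loss` pulls every `(mu, sigma)` pair toward a standard normal distribution. This encourages nearby latent vectors to decode to similar-looking frames.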
The MDN-RNN helps the network make better decisions. This part of the network learns and remembers what has happened so far, and uses that memory to predict, or anticipate, what the car will experience next. Given the current compressed frame and the action just taken, it outputs a probability distribution over what the next compressed frame will look like. Networks of this type are typically used for sequence generation. For example, the Magenta project has used a similar network to predict what you are drawing in real time as you draw it! [Reference 3]
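Below is a minimal sketch of an MDN-RNN in Keras. It assumes the sizes from the World Models paper: a 256-unit LSTM and a mixture of 5 Gaussians for each of the 32 latent features. It also assumes a 3-number action. At each step, the model reads the current latent vector plus the action. It then outputs a mixing weight, a mean, and a log standard deviation for every mixture component. `mdn_nll` is the negative log-likelihood loss the model would be trained on. The names and the exact parameter layout are assumptions for illustration.

```python
import math
import tensorflow as tf
from tensorflow import keras

LATENT_DIM, ACTION_DIM = 32, 3  # assumption: 32-dim VAE latent; steer, gas, brake
HIDDEN, N_MIX = 256, 5          # assumption: sizes from the World Models paper

# Each time step's input is the latent vector z_t concatenated with the action a_t.
seq_in = keras.Input(shape=(None, LATENT_DIM + ACTION_DIM))
h_seq = keras.layers.LSTM(HIDDEN, return_sequences=True)(seq_in)  # the "memory"
# Per latent feature and mixture component: a mixing logit, a mean, a log std-dev.
mdn_out = keras.layers.Dense(3 * N_MIX * LATENT_DIM)(h_seq)
mdn_rnn = keras.Model(seq_in, mdn_out, name="mdn_rnn")

def split_mdn(params):
    """Reshape to (batch, time, LATENT_DIM, 3 * N_MIX) and split the three parameter groups."""
    new_shape = tf.concat([tf.shape(params)[:2], [LATENT_DIM, 3 * N_MIX]], axis=0)
    logit_pi, mu, log_sigma = tf.split(tf.reshape(params, new_shape), 3, axis=-1)
    log_pi = tf.nn.log_softmax(logit_pi, axis=-1)  # mixing weights sum to 1 per feature
    return log_pi, mu, log_sigma

def mdn_nll(z_next, params):
    """Negative log-likelihood of the actual next latent vector under the predicted mixture."""
    log_pi, mu, log_sigma = split_mdn(params)
    z = tf.expand_dims(z_next, -1)  # add an axis so z broadcasts over the N_MIX components
    log_gauss = (-0.5 * tf.square((z - mu) / tf.exp(log_sigma))
                 - log_sigma - 0.5 * math.log(2.0 * math.pi))
    # log sum_k pi_k * N(z | mu_k, sigma_k), computed stably in log space
    log_prob = tf.reduce_logsumexp(log_pi + log_gauss, axis=-1)
    return -tf.reduce_mean(log_prob)

seq = tf.random.uniform((2, 10, LATENT_DIM + ACTION_DIM))  # stand-in: 2 sequences of 10 steps
z_next = tf.random.uniform((2, 10, LATENT_DIM))            # stand-in targets
params = mdn_rnn(seq)
loss = mdn_nll(z_next, params)
print(params.shape)  # (2, 10, 480)
```

To train the model, you would minimize `mdn_nll`, with `z_next` set to the VAE's latent vector for the following frame. This teaches the model to assign high probability to what actually happens next.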
From here, all that is left to do is turn the RNN's output into a Probability Density Function (a mixture of Gaussians) describing what the car is likely to see next. The Controller then uses this information to tell the car whether to speed up, slow down, and turn left or right!
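The NumPy sketch below illustrates both parts of this step under stated assumptions. First, `sample_next_z` draws a next latent vector from the Gaussian mixture the MDN predicts. Second, `LinearController` follows the World Models paper's design: a single linear layer over the latent vector `z` and the LSTM hidden state `h`. The squashing functions map the three outputs into Car Racing's action ranges: steering in [-1, 1], gas and brake in [0, 1]. They are one simple choice among several. The class and function names are hypothetical, and a sampled `z` stands in for the VAE's output.

```python
import numpy as np

LATENT_DIM, HIDDEN, N_MIX = 32, 256, 5  # assumption: sizes from the World Models paper

def sample_next_z(log_pi, mu, log_sigma, rng):
    """Draw one predicted next latent vector from the MDN's Gaussian mixture.
    Each argument has shape (LATENT_DIM, N_MIX)."""
    pi = np.exp(log_pi)
    pi /= pi.sum(axis=1, keepdims=True)  # guard against rounding before sampling
    # For every latent feature, pick one mixture component according to its weight.
    k = np.array([rng.choice(N_MIX, p=p) for p in pi])
    rows = np.arange(LATENT_DIM)
    # Then sample from that component's Gaussian.
    return mu[rows, k] + np.exp(log_sigma[rows, k]) * rng.standard_normal(LATENT_DIM)

class LinearController:
    """a = W [z, h] + b, squashed into Car Racing's action ranges (squashing is our choice)."""

    def __init__(self, rng):
        self.W = rng.normal(scale=0.1, size=(3, LATENT_DIM + HIDDEN))
        self.b = np.zeros(3)

    def act(self, z, h):
        raw = self.W @ np.concatenate([z, h]) + self.b
        steer = np.tanh(raw[0])                # -1 = full left, +1 = full right
        gas = 1.0 / (1.0 + np.exp(-raw[1]))    # 0..1: how hard to speed up
        brake = 1.0 / (1.0 + np.exp(-raw[2]))  # 0..1: how hard to slow down
        return np.array([steer, gas, brake])

rng = np.random.default_rng(0)
# Stand-in MDN output: equal weights, zero means, unit standard deviations.
log_pi = np.log(np.full((LATENT_DIM, N_MIX), 1.0 / N_MIX))
z = sample_next_z(log_pi, np.zeros((LATENT_DIM, N_MIX)), np.zeros((LATENT_DIM, N_MIX)), rng)
h = np.zeros(HIDDEN)  # stand-in for the LSTM hidden state
controller = LinearController(rng)
action = controller.act(z, h)
print(action)  # order: steer, gas, brake
```

With a 32-number `z` and a 256-number `h`, the controller has only (32 + 256) × 3 + 3 = 867 parameters. The World Models paper trains these with an evolution strategy (CMA-ES) rather than with backpropagation.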
I loved working on this project, and it has definitely inspired me to try combining several Neural Networks like this in the future!