I've spent the past while understanding a lot more about what self-driving actually is + how we can commercialize and create generalizable, end2end self-driving.
Recently, I shifted from building DataGAN to focusing on a goal that I've had for quite a while now. The TL;DR of what I'm doing: take in monocular camera inputs and generate control values, i.e. steering, acceleration, and brake.
What I've realized with DataGAN is that I don't really know enough. I've been trying to create new approaches to self-driving without fully understanding it.
That's where end2end learning comes in. I've been super fascinated by it for a while and really want to understand it better. It's also the best way I can go from 0 → 1 right now and make legit, compounding progress.
Figure 1. Proposed architecture for the end2end model
The overall architecture that I’m currently building uses a Convolutional Neural Network to predict speed and steering given an input image [similar to what NVIDIA did]. Why? CNNs have shown state-of-the-art performance on tasks such as detection, steering, speed prediction, and a lot more. That makes them the best way for me to predict speed + steering [and solve these regression problems].
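To make that concrete, here's a minimal sketch in PyTorch of an NVIDIA-style CNN that regresses steering and speed from a single frame. The layer sizes follow NVIDIA's published end-to-end conv stack, but the whole thing is an illustrative assumption on my part, not the exact model I'm building:

```python
# Sketch of a PilotNet-style CNN regressing steering and speed from one
# RGB frame. Layer sizes follow NVIDIA's end-to-end driving paper; this
# is an illustrative assumption, not the final model.
import torch
import torch.nn as nn

class EndToEndCNN(nn.Module):
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 24, 5, stride=2), nn.ReLU(),   # 66x200 -> 31x98
            nn.Conv2d(24, 36, 5, stride=2), nn.ReLU(),  # -> 14x47
            nn.Conv2d(36, 48, 5, stride=2), nn.ReLU(),  # -> 5x22
            nn.Conv2d(48, 64, 3), nn.ReLU(),            # -> 3x20
            nn.Conv2d(64, 64, 3), nn.ReLU(),            # -> 1x18
        )
        self.head = nn.Sequential(
            nn.Flatten(),                      # 64 * 1 * 18 = 1152 features
            nn.Linear(1152, 100), nn.ReLU(),
            nn.Linear(100, 50), nn.ReLU(),
            nn.Linear(50, 2),                  # two outputs: steering, speed
        )

    def forward(self, x):
        return self.head(self.features(x))

model = EndToEndCNN()
out = model(torch.randn(1, 3, 66, 200))  # one 66x200 RGB frame
```

Training this is then just regression against recorded human driving: mean-squared error between the two outputs and the logged steering + speed values.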
I’m also using an autoencoder as an auxiliary network to see whether the model can understand the current scene and predict what it will look like one frame later. This is a metric I’m using to check whether the network understands the current scene and how steering + speed change its prediction for the next frame.
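The idea above can be sketched as a control-conditioned autoencoder: encode the current frame, concatenate the steering + speed values into the latent, and decode a prediction of the next frame. The architecture and sizes below are my own illustrative assumptions:

```python
# Sketch of a control-conditioned autoencoder: encode the current frame,
# append steering + speed to the latent vector, decode the next frame.
# Sizes (64x64 input, 128-dim latent) are illustrative assumptions.
import torch
import torch.nn as nn

class NextFramePredictor(nn.Module):
    def __init__(self, latent_dim=128):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 16, 4, stride=2, padding=1), nn.ReLU(),   # 64 -> 32
            nn.Conv2d(16, 32, 4, stride=2, padding=1), nn.ReLU(),  # 32 -> 16
            nn.Conv2d(32, 64, 4, stride=2, padding=1), nn.ReLU(),  # 16 -> 8
            nn.Flatten(),
            nn.Linear(64 * 8 * 8, latent_dim),
        )
        # latent + 2 control values (steering, speed) -> decoder input
        self.expand = nn.Linear(latent_dim + 2, 64 * 8 * 8)
        self.decoder = nn.Sequential(
            nn.Unflatten(1, (64, 8, 8)),
            nn.ConvTranspose2d(64, 32, 4, stride=2, padding=1), nn.ReLU(),   # 8 -> 16
            nn.ConvTranspose2d(32, 16, 4, stride=2, padding=1), nn.ReLU(),   # 16 -> 32
            nn.ConvTranspose2d(16, 3, 4, stride=2, padding=1), nn.Sigmoid(), # 32 -> 64
        )

    def forward(self, frame, controls):
        z = self.encoder(frame)
        z = torch.cat([z, controls], dim=1)  # condition on steering + speed
        return self.decoder(self.expand(z))

model = NextFramePredictor()
pred = model(torch.randn(2, 3, 64, 64), torch.randn(2, 2))
```

Comparing the decoded prediction against the true next frame (e.g. with a pixel-wise reconstruction loss) is one way to score how well the network has internalized scene dynamics.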
Problems that I’m trying to figure out:
Here’s a demo of steering on a dataset that I’ve been playing around with:
My ask right now: