During the past nine months, an Nvidia engineering team built a self-driving car with one camera, one Drive-PX embedded computer and only 72 hours of training data. Nvidia published an academic preprint of the results of the DAVE2 project, entitled “End to End Learning for Self-Driving Cars,” on arXiv.org, which is hosted by the Cornell University Library.
The Nvidia project, called DAVE2, is named after a 10-year-old Defense Advanced Research Projects Agency (DARPA) project known as DARPA Autonomous Vehicle (DAVE). Although neural networks and autonomous vehicles seem like a just-invented-now technology, researchers such as Google’s Geoffrey Hinton, Facebook’s Yann LeCun and the University of Montreal’s Yoshua Bengio have collaboratively researched this branch of artificial intelligence for more than two decades. And the DARPA DAVE project’s application of neural networks to autonomous vehicles was preceded by the ALVINN project developed at Carnegie Mellon in 1989. What has changed is that GPUs have made building on their research economically feasible.
Neural networks and image recognition applications such as self-driving cars have exploded recently for two reasons. First, the graphics processing units (GPUs) used to render graphics in mobile phones became powerful and inexpensive. GPUs densely packed onto board-level supercomputers are very good at solving massively parallel neural network problems and are inexpensive enough for every AI researcher and software developer to buy. Second, large, labeled image datasets have become available to train massively parallel neural networks implemented on GPUs to see and perceive the world of objects captured by cameras.
Mapping human driving patterns
The Nvidia team trained a convolutional neural network (CNN) to map raw pixels from a single front-facing camera directly to steering commands. Nvidia’s breakthrough is that the system taught itself by watching how a human drove: it learned the internal representations of the processing steps needed to see the road ahead and steer the vehicle, without being explicitly trained to detect features such as roads and lanes.
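To make the idea of mapping pixels directly to a steering command concrete, here is a minimal sketch in plain Python. It is a toy, not Nvidia’s network: the single 3x3 kernel, the global average and the names conv2d_valid and predict_steering are all assumptions chosen for illustration, and the real system stacks many learned layers.

```python
from typing import List

Image = List[List[float]]  # grayscale pixels as rows of floats


def conv2d_valid(image: Image, kernel: Image) -> Image:
    """Slide `kernel` over `image` with no padding and return the feature map.

    Like most deep learning libraries, this computes a cross-correlation
    (the kernel is not flipped).
    """
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(image[r + i][c + j] * kernel[i][j]
                for i in range(kh) for j in range(kw))
            for c in range(out_w)
        ]
        for r in range(out_h)
    ]


def relu(feature_map: Image) -> Image:
    """Zero out negative activations."""
    return [[max(0.0, v) for v in row] for row in feature_map]


def predict_steering(image: Image, kernel: Image,
                     weight: float, bias: float) -> float:
    """Toy pixels-to-steering mapping: conv -> ReLU -> global average -> linear.

    `kernel`, `weight` and `bias` stand in for parameters that training
    would learn; they are hypothetical, not values from the paper.
    """
    fmap = relu(conv2d_valid(image, kernel))
    values = [v for row in fmap for v in row]
    pooled = sum(values) / len(values)
    return weight * pooled + bias
```

Nothing in this sketch names a lane or a road edge. Whatever features the kernel responds to are determined only by the parameters, which is the sense in which the system is not explicitly told what to detect.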
Although in operation the system uses one camera and one Drive-PX embedded computer, the training system used three cameras and two computers. These acquired three-dimensional video images and steering angles from a vehicle driven by a human, and that data was used to train the system to see and drive.
Nvidia used the steering angle as the training signal, pairing the human driver’s steering with the bitmap images recorded by the cameras. From these pairs, the CNN learned to create the internal representations of the processing steps of driving, such as detecting useful road features like lanes, cars and road outlines.
The network was trained with Torch 7, an open-source machine learning system, to produce the processing steps that autonomously perceived the road, other vehicles and obstacles to steer the test vehicles. Training used video sampled at 10 frames per second (fps) because adjacent frames at 30 fps were too similar to add much value to learning. The test vehicles were a 2016 Lincoln MKZ and a 2013 Ford Focus.
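The frame-rate reduction can be sketched as simple decimation. This assumes the cameras recorded at 30 fps and that every third frame was kept; the article does not say how the 10 fps stream was actually produced, and subsample_frames is a hypothetical helper.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def subsample_frames(frames: Sequence[T],
                     source_fps: int = 30,
                     target_fps: int = 10) -> List[T]:
    """Keep every (source_fps // target_fps)-th frame, starting with the first.

    Assumes the source rate is a whole multiple of the target rate, so
    the kept frames are evenly spaced in time.
    """
    if source_fps <= 0 or target_fps <= 0:
        raise ValueError("frame rates must be positive")
    if source_fps % target_fps != 0:
        raise ValueError("source_fps must be a multiple of target_fps")
    step = source_fps // target_fps
    return list(frames[::step])
```

For one second of 30 fps video, this keeps frames 0, 3, 6 and so on up to 27, which is 10 frames.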
The core of the machine-learning process was simulated steering by the CNN using Torch 7. The CNN’s steering commands, produced in simulated response to the 10 fps images taken from a car driven by a human, were compared to the human steering angles. The difference between the two was used to adjust the network, which taught the system to see and steer. The test data used in simulation was based on video recordings of three hours of driving over test routes, amounting to a total distance of 100 miles.
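The compare-and-correct loop can be illustrated with a deliberately tiny model. The sketch below assumes a mean-squared-error loss between predicted and human steering angles and fits a one-feature linear model by gradient descent. The single scalar feature per frame is a hypothetical stand-in for the CNN’s output, and the actual system adjusts all of the network’s weights by backpropagation in Torch 7 rather than just two numbers.

```python
from typing import Sequence, Tuple


def mean_squared_error(predicted: Sequence[float],
                       human: Sequence[float]) -> float:
    """Average squared gap between predicted and human steering angles."""
    if len(predicted) != len(human) or not human:
        raise ValueError("inputs must be non-empty and the same length")
    return sum((p - h) ** 2 for p, h in zip(predicted, human)) / len(human)


def train_linear_steering(features: Sequence[float],
                          human_angles: Sequence[float],
                          learning_rate: float = 0.1,
                          epochs: int = 200) -> Tuple[float, float]:
    """Fit angle ~ w * feature + b by gradient descent on the MSE."""
    if len(features) != len(human_angles) or not features:
        raise ValueError("inputs must be non-empty and the same length")
    w, b = 0.0, 0.0
    n = len(features)
    for _ in range(epochs):
        # Error of the current model on every recorded frame.
        errors = [w * x + b - y for x, y in zip(features, human_angles)]
        # Gradients of the mean squared error with respect to w and b.
        grad_w = 2.0 * sum(e * x for e, x in zip(errors, features)) / n
        grad_b = 2.0 * sum(errors) / n
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b
    return w, b
```

Each pass shrinks the gap between what the model would have steered and what the human actually steered, which is the same feedback signal the paragraph above describes at a much larger scale.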
When the CNN driving simulation performed well, further machine learning and testing were moved to test vehicles on the road. On-road testing improved the system, with a human driver supervising the autonomous car and intervening when the autonomous system erred. Each correction was fed to the machine-learning system to improve the accuracy of the steering process. In 10 miles of driving on New Jersey’s Garden State Parkway, the vehicle operated 100 percent autonomously. Overall in early testing, the vehicle operated 98 percent autonomously.
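The article does not say how the autonomy percentage was computed. One plausible definition, and the one the paper appears to use, charges a fixed time penalty for each human intervention and reports the remaining share of the drive. The 6-second default and the function name autonomy_percent below are assumptions for illustration.

```python
def autonomy_percent(interventions: int,
                     elapsed_seconds: float,
                     seconds_per_intervention: float = 6.0) -> float:
    """Share of a drive counted as autonomous, in percent.

    Each human intervention is charged a fixed time penalty
    (assumed here to be 6 seconds) against the elapsed drive time.
    """
    if elapsed_seconds <= 0:
        raise ValueError("elapsed_seconds must be positive")
    if interventions < 0:
        raise ValueError("interventions cannot be negative")
    penalty = interventions * seconds_per_intervention
    return (1.0 - penalty / elapsed_seconds) * 100.0
```

Under this definition, a drive with no interventions scores 100 percent, and 10 interventions in a 600-second drive score roughly 90 percent.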
Nvidia demonstrated that CNNs can learn the entire task of lane detection and road following without manually and explicitly decomposing it into road or lane marking detection, semantic abstraction, path planning and control. Using Torch 7, the system learned from fewer than 100 hours of training data to operate a vehicle autonomously in diverse weather and lighting conditions, on highways and side roads. Nvidia released a video with its paper that shows examples of the system autonomously steering the test vehicles.
The Nvidia team indicated that the system is not yet ready for production by stating in its paper:
“More work is needed to improve the robustness of the network, to find methods to verify the robustness, and to improve visualization of the network-internal processing steps.”
Based on the video, it’s fairly certain that the engineering team at every company building or planning to build an autonomous vehicle is reading this paper right now and discussing the results. Building this autonomous vehicle prototype could put Nvidia in the position to be a leading supplier of massively parallel GPU systems to all of the autonomous car manufacturers.