Seeing The Road Ahead: The Undervalued Self-Driving Asset of Data

Gene Munster
April 13, 2018

This is the fourth in a series of notes based on our deep dive into computer perception for autonomous vehicles. Autonomy is a question of when, not if. In this series, we’ll outline our thoughts on the key components that will enable fully autonomous driving. See our previous notes (Computer Perception Outlook 2030, What You Need To Know About LiDAR, The Importance of Cameras To Self-Driving Vehicles).

A self-driving car will only be as strong as the data it trains on.

The role driving data will play in developing fully autonomous vehicles is often understated, and companies that possess large, high-quality driving data sets are much further ahead of their peers than some may think. Driving data is key because it is the core input that will train the artificial intelligence models that operate autonomous vehicles. The more data these models have, the more scenarios they can prepare for, and, in turn, the stronger the entire system becomes. However, obtaining good driving data is not an easy task, and it is virtually impossible to gather data on every single driving scenario in all types of weather conditions. Thanks to improvements in computer graphics technology, many companies are relying on simulated data to train their self-driving models. In the paragraphs below, we dive deeper into the pros and cons of using simulated data versus real data and identify the data leaders among self-driving car companies.

Simulation vs. real data. We recently spoke to a computer vision expert at the University of Michigan about the difference between using simulated data and real-world data. He believes that simulated training data is valuable because most of the data collected during on-road driving is innocuous. Real-time driving data is only truly interesting when there is a critical event or an unusual scenario. Simulation data lets you test on critical events constantly. However, he also noted that the AI in simulated data is still programmed by a human and tends to act differently than humans on the road. All autonomous systems will train with some simulated data and will therefore require fine-tuning to factor in the difference between human and machine driving styles. That said, we do not want to underestimate the value of real data and believe capturing non-programmed scenarios will play a key role in preparing AVs for all situations that may arise.

Waymo’s large data lead. The more miles an autonomous vehicle drives, the more real data the system can capture, and the more robust the system can become. Companies that are approved to test autonomous driving in California are responsible for recording and publishing the number of autonomous miles driven. As of April 1st, 52 companies have been issued permits to test autonomous vehicles in California, and as shown in the graph below, Waymo had driven 352,545 autonomous miles in the state as of November 30th, 2017. In February 2018, Waymo announced it had driven over 5 million miles in total. This announcement came only ~3 months after the company announced crossing the 4-million-mile mark. While testing takes place in many states other than California, these data points suggest that Waymo has a very large data lead over its peers, which may translate to a large lead in the race to full autonomy.
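
For a rough sense of that pace, the two milestones can be turned into a monthly rate. The sketch below is our own back-of-the-envelope arithmetic, not a figure Waymo has published. It assumes exactly 1 million miles added over exactly 3 months, while the announcements say "over 5 million" and "~3 months", so the result is approximate. The function name is ours.

```python
def miles_per_month(start_miles, end_miles, months):
    """Average miles added per month between two cumulative milestones."""
    return (end_miles - start_miles) / months


# Assumption: the 4M and 5M milestones were exactly 3 months apart.
pace = miles_per_month(4_000_000, 5_000_000, 3)

print(f"{pace:,.0f} miles per month")  # prints "333,333 miles per month"
```

At that rate, Waymo would be adding nearly as many miles each month as the 352,545 autonomous miles it reported for California.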

Tesla lurking in the shadows. While Tesla is one of the 52 companies approved to test autonomous vehicles in California, it did not test on state roads in 2017. However, the company acknowledged in the report it filed with the California DMV that it conducts testing to develop autonomous vehicles via simulation, in laboratories, on test tracks, and on public roads in various locations around the world. Tesla also highlighted that it has a fleet of hundreds of thousands of customer-owned vehicles that test autonomous technology in “shadow-mode” during their normal operation. Shadow mode is a feature that runs in the background without actuating vehicle controls in order to provide data on how the features would perform in real-world and real-time conditions. This has allowed Tesla to gather billions of miles of passive real-world driving data to develop its autonomous technology. This data is extremely valuable in training autonomous vehicles to interact with the real world, and, in our eyes, makes Tesla one of the top contenders in the race for full autonomy.
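
To make the shadow-mode idea concrete, here is a minimal sketch of how such a comparison could work in principle. It is purely illustrative and is not Tesla's implementation, which has not been published in this detail. The `Frame` record, the `shadow_log` function, and the 5-degree tolerance are all hypothetical names and values we chose for the example.

```python
from dataclasses import dataclass


@dataclass
class Frame:
    timestamp: float         # seconds since the start of the drive
    driver_steering: float   # what the human actually did (degrees)
    model_steering: float    # what the software would have done (degrees)


def shadow_log(frames, tolerance_deg=5.0):
    """Return the frames where the model disagrees with the driver.

    Nothing here sends a command to the vehicle. The model's output is
    only compared against the driver's action and recorded.
    """
    return [
        f for f in frames
        if abs(f.model_steering - f.driver_steering) > tolerance_deg
    ]


# Hypothetical sample data: only the second frame exceeds the tolerance.
frames = [
    Frame(0.0, 0.0, 1.0),
    Frame(0.1, 2.0, 12.0),
    Frame(0.2, -3.0, -2.5),
]
disagreements = shadow_log(frames)
print(len(disagreements))
```

The design point is that the human stays in control throughout. The disagreements are the interesting data, since they flag the moments where the software would have behaved differently from a real driver.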

Disclaimer: We actively write about the themes in which we invest: virtual reality, augmented reality, artificial intelligence, and robotics. From time to time, we will write about companies that are in our portfolio. Content on this site including opinions on specific themes in technology, market estimates, and estimates and commentary regarding publicly traded or private companies is not intended for use in making investment decisions. We hold no obligation to update any of our projections. We express no warranties about any estimates or opinions we make.