Our experience of the world is intensely visual. Researchers suggest that over half of the brain's processing power is devoted to what we see. We talk a lot about how artificial intelligence will transform the world around us by automating physical tasks and knowledge work. For such a system to exist, we must first teach it to see. The field devoted to this is called computer vision, and it is one of the most basic and crucial elements of artificial intelligence. At a high level, endowing machines with the power of sight seems simple: just slap on a webcam and press record. However, vision is our most complex cognitive ability, and machines must not only see but also understand what they are seeing. They must be able to derive insights from the entirely new layer of data that lies all around them and act on that information.
Despite being an important driver of innovation today, computer vision is little understood by those outside of the tech world. Here are a handful of facts that help put some context around what computer vision is and how far we’ve come in developing it.
1.) Computer scientists first started thinking about vision roughly 50 years ago. In 1966, MIT professor Seymour Papert gave a group of students a summer assignment: attach a camera to a computer and have it describe what it saw, dividing images into “likely objects, likely background areas, and chaos.” Clearly, this was more than a summer project, as we are still working on it half a century later, but it laid the groundwork for what would become one of the fastest-growing and most exciting areas of computer science.
2.) While computer vision (CV) has not reached parity with human ability, its uses are already widespread, and some may be surprising. Barcode scanning, the yellow first-down line in football broadcasts, camera stabilization, friend tagging on Facebook, Snapchat filters, and Google Street View are all common uses of CV.
3.) In some narrow use cases, computer vision is already as effective as human vision, or more so. Google’s CV team developed a model that can detect diabetic retinopathy in retinal photos as well as, or better than, human ophthalmologists. Diabetic retinopathy is a complication that can cause blindness in diabetic patients, but it is treatable if caught early. With a model that has been trained on more than 100,000 retinal images, Google uses CV to screen retinal photos in hopes of earlier identification.
4.) One of the first major industries being transformed by computer vision is an old one you might not expect: farming. Prospera, a startup based in Tel Aviv, uses camera tech to monitor crops and detect diseases like blight. John Deere just paid $305M for a computer vision company called Blue River. Its technology can identify unwanted plants and douse them with a focused spray of herbicide, eliminating the need to coat entire fields in harmful chemicals. Beyond these examples, there are countless aerial and ground-based drones that monitor crops and soil, as well as robots that use vision to pick produce.
5.) Fei-Fei Li, head of Stanford’s Vision Lab and one of the world’s leading CV researchers, compares computer vision today to a young child. Although computers can “see” better than humans in some narrow use cases, even small children are experts at one thing: making sense of the world around them. No one tells a child how to see. Children learn through real-world examples. If we think of a child’s eyes as cameras, they take a picture about every 200 milliseconds (the average time in which an eye movement is made). So by age 3, the child will have seen hundreds of millions of pictures, which is an extensive training set for a model. Seeing is relatively simple, but understanding context and explaining it is extremely complex. That complexity is why over 50% of the cortex, the surface of the brain, is devoted to processing visual information.
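To sanity-check the "hundreds of millions of pictures" figure, here is a back-of-the-envelope sketch in Python. The 200 millisecond interval and the age of 3 come from the comparison above. The 12 waking hours per day is our own assumption, not part of the original estimate, and the function name is purely illustrative.

```python
# Rough check of the "hundreds of millions of pictures by age 3" estimate.
# Assumption (ours, not from the original comparison): the child is awake
# about 12 hours a day.

SECONDS_PER_HOUR = 3600
DAYS_PER_YEAR = 365


def pictures_seen(years: float,
                  ms_per_picture: int = 200,
                  waking_hours_per_day: float = 12.0) -> int:
    """Estimate how many 'pictures' a pair of eyes takes over a number of years."""
    # One picture every 200 ms works out to 5 pictures per second.
    pictures_per_second = 1000 / ms_per_picture
    waking_seconds = years * DAYS_PER_YEAR * waking_hours_per_day * SECONDS_PER_HOUR
    return round(pictures_per_second * waking_seconds)


if __name__ == "__main__":
    print(f"{pictures_seen(3):,}")  # prints 236,520,000
```

Even with half of each day spent asleep, the total lands around 237 million pictures. Counting all 24 hours only doubles it to about 473 million, so "hundreds of millions" holds up either way.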
6.) This thinking is what led Fei-Fei Li to start ImageNet in 2007, a database of more than 14 million labeled images for use in image recognition software. Since 2010, a subset of that dataset has been used in the annual ImageNet Large Scale Visual Recognition Challenge, a competition in which teams put their algorithms to the test and which pushes researchers and computer scientists to raise the bar for computer vision. Don’t worry, the database includes 62,000 images of cats.
7.) Autonomous driving is probably the biggest opportunity in computer vision today. Creating a self-driving car is almost entirely a computer vision challenge, and a worthy one: about 1.25 million people die in road traffic crashes each year. Aside from figuring out the technology, there are also questions of ethics, like the classic trolley problem: Should a self-driving vehicle swerve into a situation that would kill or injure its passengers in order to save a greater number of people in its current path? Lawyers and politicians might have to sort that one out.
8.) There’s an accelerator program specifically focused on computer vision, and we’re excited to be participating as mentors. Betaworks is launching Visioncamp, an 11-week program dedicated to ‘camera-first’ applications and services starting in Q1 2018. Betaworks wants to “explore everything that becomes possible when the camera knows what it’s seeing.”
We’re just scratching the surface of what computer vision can accomplish in the future. Self-driving cars, automated manufacturing, augmented and virtual reality, healthcare, surveillance, image recognition, helpful robots, and countless other spaces will all heavily employ CV. The future will be seen.