Computer vision can now recognize thousands of objects

The field of computer vision has vastly improved since it began in the 1960s. Computers can now quickly and accurately recognize thousands of faces, as well as a growing number of other objects. Although computer vision currently lacks the subtlety, versatility, and general capabilities of human vision, the gap is steadily closing. Dr. Fei-Fei Li is an expert in the field of computer vision, and she and her colleagues at Stanford University are working on advancing the state of the art. In an interview with Sander Olson for Next Big Future, Dr. Li describes the importance of math and algorithms to computer vision, and why she believes that computer vision systems may be able to match human visual systems within the next twenty years.
 
Fei-Fei Li
 
Question: You have been working in the field of computer vision for more than a decade. How has the field changed since you started?
 
The field of computer vision has grown in all dimensions since I began studying this technology in the late 1990s. There are many more active researchers, more students, more startup companies, more money. The technology has evolved as well. For example, when I began, scientists were concentrating almost exclusively on getting computers to recognize faces. Now, researchers are getting computers to quickly and accurately recognize all manner of objects. So it is an exciting time to study computer vision.
 
Question: Can these advances be primarily attributed to Moore’s law?
 
Although Moore’s law has been a boon to computer vision research, computer vision is fundamentally a mathematical science. So it has always been at the forefront of using the most advanced mathematical tools. We have created larger, more complex, and more effective algorithms that can accomplish more tasks.
 
Question: To what extent can computer vision take advantage of multi-core processors and GPUs?
 
Multi-core processors and GPUs offer massive parallelism. Algorithms that can take advantage of that parallelism can see substantial performance benefits. But these technologies are still young, so we hope to be able to refine our algorithms to better exploit many-core architectures in the future.
 
Question: Will future computers have hardware specifically dedicated to computer vision?
 
Absolutely. Intel, which is a sponsor of our research, is well aware of this fact. For certain applications, there will probably be application-specific integrated circuits (ASICs) that can quickly process visual information. That would free up the CPU to handle other tasks.

Question: Is the field of computer vision hampered more by insufficient hardware or software?
 
Neither the hardware nor the software currently available can be considered close to being optimal. Computer vision is sufficiently hard that it simultaneously pushes hardware and software to the maximum. But more than anything we need better algorithms and a better mathematical understanding of the nature of vision.
 
Question: What companies are funding your computer vision research?
 
My lab has won several Google research awards. We also collaborate with Microsoft, Intel, NEC, and Kodak. But most of our funding comes from government institutions, such as the NSF, NIH, and other agencies.
 
Question: How will the Kinect device affect computer vision research?
 
Kinect was born out of computer vision research, so we are proud to see this development. When I worked at Microsoft as a visiting scientist, I worked alongside the researchers who were developing the Kinect. So computer vision research gave birth to the Kinect, and the Kinect will in turn lead to new computer vision devices.
 
Question: Is one industry in particular pushing computer vision?
 
Many industries are pushing computer vision, including Internet search engines, industrial inspection, and medical imaging. The social media industry is definitely a major driver. People are uploading huge numbers of photos, and 24 hours of video are uploaded to YouTube every minute. To meet the challenge of dealing with all of this data, we will need to create sophisticated systems that can quickly interpret and organize vast streams of visual data. The field of robotics is also pushing computer vision, since visual intelligence is by far the most important component of intelligence.
 
Question: How wide is the gap between the state of the art in computer vision and human vision?
 
Comparing human and computer vision is somewhat akin to comparing airplanes and birds. Computer vision is generally less advanced than human vision, but is already doing tasks that human vision cannot do. For instance, computer vision systems are sifting through vast numbers of faces, looking for matches. Computers today are beginning to recognize thousands of categories of objects, so computer vision is definitely catching up with human vision capabilities.
 
Question: Is funding for computer vision increasing?
 
Funding is always a problem, because the competition for government funds is fierce. Money for my own lab comes from the government, although we also receive corporate assistance. Funding for computer vision is generally increasing, since most people now realize the importance of this field. But more and faster progress would be made if more funding were available.
 
Question: Are there any objects that a computer vision system is incapable of recognizing?
 
Computers cannot yet recognize all objects, and they cannot match the subtlety of human vision. For instance, they cannot differentiate between a BMW X3 and a BMW X5. But the number of objects that computers can quickly recognize is rapidly increasing.
 
Question: How long before a computer vision system fully matches the capabilities of the human visual system?
 
I’m optimistic that computer systems could come close to human systems within the next several decades. I would like to see this development within the next 20 years. Given Moore’s law, we are now primarily limited by suboptimal algorithms and insufficient funding.
 
Question: When computer vision systems match the dexterity of human visual systems, will that indicate that human level AI is near?
 
The answer is a partial yes. Vision is one of the most important components of human intelligence. So solving computer vision is a big part of solving AI. I am encouraged by the steady progress that I see my colleagues in the AI field making.
 
Question: How much progress do you expect from the computer vision field within the next decade?
 
With sufficient funding, I hope that vision systems in 2022 will be able to quickly recognize virtually any object. Having this capability could lead to breakthroughs in the field of robotics and AI in general. Progress in the field of computer vision should match or even exceed the progress made during the past decade.