Facebook’s new computer vision model achieves state-of-the-art performance

Facebook today announced an AI model trained on a billion images that ostensibly achieves state-of-the-art results on a range of computer vision benchmarks.

Facebook attributes those results to a computer vision model called SEER, short for SElf-supERvised. SEER contains a billion parameters and can learn from any random group of images on the internet without the need for curation or annotation.

Parameters, a fundamental component of machine learning systems, are the values a model learns from historical training data.
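For a sense of what that means in practice, a model's parameter count is just the total size of its learnable weight tensors. The short PyTorch sketch below uses a tiny stand-in network (not SEER) to show what "a billion parameters" is counting:

```python
import torch.nn as nn

# A small stand-in convolutional network (not SEER), used only to show
# what "parameters" refers to: the learnable weights and biases.
model = nn.Sequential(
    nn.Conv2d(3, 64, kernel_size=3, padding=1),
    nn.ReLU(),
    nn.AdaptiveAvgPool2d(1),
    nn.Flatten(),
    nn.Linear(64, 1000),
)

# Each weight tensor contributes numel() values; SEER's total is roughly 1e9.
num_params = sum(p.numel() for p in model.parameters())
print(f"{num_params:,} learnable parameters")
```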

Facebook researchers found that scaling AI systems to work with complex image data required at least two core components: a training algorithm that doesn't depend on labels, and an architecture efficient enough to handle the scale. For the algorithm, SEER uses SwAV, which was born of the company's investigations into self-supervised learning; SwAV clusters images with similar visual concepts online, during training, and uses those cluster assignments as the training signal.
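The swapped-assignment idea at the heart of SwAV can be sketched in a few lines. The PyTorch snippet below is an illustration under simplifying assumptions (toy embeddings, a handful of prototypes, a plain Sinkhorn-Knopp normalization), not Facebook's implementation:

```python
import torch
import torch.nn.functional as F

def sinkhorn(scores, eps=0.05, n_iters=3):
    """Sinkhorn-Knopp: turn prototype scores into soft cluster assignments
    ("codes") that are roughly balanced across prototypes."""
    Q = torch.exp(scores / eps).T           # (num_prototypes, batch)
    Q /= Q.sum()
    K, B = Q.shape
    for _ in range(n_iters):
        Q /= Q.sum(dim=1, keepdim=True); Q /= K   # normalize rows
        Q /= Q.sum(dim=0, keepdim=True); Q /= B   # normalize columns
    return (Q * B).T                        # (batch, num_prototypes)

def swav_loss(z1, z2, prototypes, temperature=0.1):
    """Swapped prediction: the code computed from one augmented view
    supervises the prototype scores of the other view, and vice versa."""
    z1, z2 = F.normalize(z1, dim=1), F.normalize(z2, dim=1)
    protos = F.normalize(prototypes, dim=1)
    s1, s2 = z1 @ protos.T, z2 @ protos.T   # prototype scores per view
    with torch.no_grad():
        q1, q2 = sinkhorn(s1), sinkhorn(s2) # codes, treated as targets
    p1 = F.log_softmax(s1 / temperature, dim=1)
    p2 = F.log_softmax(s2 / temperature, dim=1)
    return -0.5 * ((q1 * p2).sum(dim=1) + (q2 * p1).sum(dim=1)).mean()

# Toy usage: 8 images, 128-dim embeddings from two augmented views,
# and 32 learnable prototype vectors.
z_view1, z_view2 = torch.randn(8, 128), torch.randn(8, 128)
prototypes = torch.nn.Parameter(torch.randn(32, 128))
print(swav_loss(z_view1, z_view2, prototypes).item())
```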

Training models at SEER’s size also required an architecture that was efficient in terms of runtime and memory without compromising on accuracy, according to Facebook. The researchers behind SEER opted for RegNets, a family of ConvNets capable of scaling to billions or potentially trillions of parameters while fitting within runtime and memory constraints.
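Smaller members of the RegNet family ship with torchvision (v0.11 or later). The snippet below simply instantiates two of them and counts their parameters to give a sense of the design space; SEER's billion-parameter RegNet is far larger than anything available off the shelf.

```python
import torchvision.models as models

# Off-the-shelf RegNet variants from torchvision (randomly initialized by
# default, so nothing is downloaded). SEER's RegNet is much larger, but the
# family is the same: a design space of simple, regular ConvNet stages.
for name, builder in [("regnet_y_16gf", models.regnet_y_16gf),
                      ("regnet_y_32gf", models.regnet_y_32gf)]:
    net = builder()
    n = sum(p.numel() for p in net.parameters())
    print(f"{name}: {n / 1e6:.1f}M parameters")
```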

Facebook software engineer Priya Goyal said SEER was trained for 30 days on 512 NVIDIA V100 GPUs, each with 32GB of memory. The last piece that made SEER possible was a general-purpose library called VISSL, short for VIsion library for state-of-the-art Self Supervised Learning. VISSL, which Facebook is open-sourcing today, supports self-supervised training with a variety of modern machine learning methods.

After pretraining on a billion public Instagram images, SEER outperformed the most advanced self-supervised systems, Facebook says. It also outperformed existing models on downstream tasks including object detection, segmentation, and image classification. When trained with just 10% of the examples in the popular ImageNet dataset, SEER still managed to hit 77.9% accuracy. And when trained with just 1%, SEER was 60.5% accurate.
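A low-shot evaluation along those lines might look like the sketch below, which fine-tunes a pretrained backbone on a random 10% slice of a labeled dataset. The backbone (an ImageNet-pretrained ResNet-50 standing in for SEER's self-supervised RegNet), the dataset path, and the training settings are all placeholders, not details from Facebook's experiments; the snippet assumes a recent torchvision.

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, Subset
import torchvision
from torchvision import transforms

# Stand-in pretrained backbone with a fresh classification head.
backbone = torchvision.models.resnet50(weights="IMAGENET1K_V1")
backbone.fc = nn.Linear(backbone.fc.in_features, 1000)

transform = transforms.Compose([
    transforms.RandomResizedCrop(224),
    transforms.ToTensor(),
])
# Placeholder path to a labeled, ImageNet-style dataset.
full_train = torchvision.datasets.ImageFolder("/path/to/imagenet/train", transform)

# Keep only ~10% of the labeled examples, mimicking the low-shot setting.
keep = torch.randperm(len(full_train))[: len(full_train) // 10]
loader = DataLoader(Subset(full_train, keep.tolist()), batch_size=256, shuffle=True)

optimizer = torch.optim.SGD(backbone.parameters(), lr=0.01, momentum=0.9)
criterion = nn.CrossEntropyLoss()

backbone.train()
for images, labels in loader:
    optimizer.zero_grad()
    loss = criterion(backbone(images), labels)
    loss.backward()
    optimizer.step()
```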

When asked whether the Instagram users whose images were used to train SEER were notified or given an opportunity to opt out of the research, Goyal noted that Facebook informs Instagram account holders in its data policy that it uses information like pictures to support research, including the kind underpinning SEER. That said, Facebook doesn’t plan to share the images or the SEER model itself, in part because the model might contain unintended biases.

“Self-supervised learning has long been a focus for Facebook AI because it enables machines to learn directly from the vast amount of information available in the world, rather than just from training data created specifically for AI research,” Facebook wrote in a blog post.