At three months old, babies already have an intuition about how things around them behave, without anyone explicitly teaching them the rules of the game. They rapidly develop this ability by soaking up data from their external environments, forming a sort of “common sense” about the dynamics of the physical world.
While recent AI models have already trounced humans at everything from gameplay to solving decades-old scientific conundrums, they still struggle to develop an intuition about the physical world.
This month, researchers at Google-owned DeepMind took inspiration from developmental psychology and built an AI, called PLATO, that naturally extracts simple rules about the world by watching videos.
Netflix and chill didn’t work on its own; the AI model only learned the rules of our physical world when given a basic idea of objects, such as what their boundaries are, where they are, and how they move.
Similar to babies, the AI expressed “surprise” when shown magical situations that didn’t make sense, like a ball rolling up a ramp. In a way, PLATO hits the sweet spot between nature and nurture.
Developmental psychologists have long argued about whether babies can learn simply by finding patterns in data from experience alone. PLATO suggests the answer is no, at least not for this particular task. To be clear, PLATO isn’t a digital replica of a three-month-old baby, and was never designed to be.
At just three months old, most babies won’t bat an eye if they drop a toy and it falls to the ground; they’ve already picked up the concept of gravity. That’s great news for AI: it means that rather than building robots to physically explore their environment, it’s possible to imbue a sense of physics into AI through videos.
In contrast, even the most sophisticated deep learning models still struggle to build a sense of our physical world, which limits how much they can engage with it, making them almost literally minds in the clouds.
So how do you measure a baby’s understanding of everyday physics? “Luckily for us, developmental psychologists have spent decades studying what infants know about the physical world,” wrote lead scientist Dr. Luis Piloto.
Show a baby a ball rolling up a hill, randomly disappearing, or suddenly going in the opposite direction, and the baby will stare at the anomaly for longer than it would when observing events that match its expectations.
In the new study, the team adapted this violation-of-expectation (VoE) paradigm for testing AI. They tackled five different physical concepts to build PLATO. Among those are solidity, the idea that two objects can’t pass through each other, and continuity, the idea that things exist and don’t blink out even when hidden by another object.
To build PLATO, the team started with a standard two-pronged approach in AI. One component, the perceptual model, takes in visual data and parses an image into discrete objects.
The other component, a dynamics predictor, builds a “physics engine” of sorts that maps objects or scenarios and guesses how something would behave in real life. This setup gave PLATO an initial idea of the physical properties of objects, such as their position and how fast they’re moving.
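As a rough illustration of what an object-level description with position and speed might look like, here is a minimal Python sketch. The `ObjectState` record and the `predict_next` function are hypothetical names invented for this example, and the constant-velocity rule is a deliberately simple stand-in: PLATO learns its predictions from video rather than following a hand-written formula.

```python
from dataclasses import dataclass


@dataclass
class ObjectState:
    """Hypothetical per-object record: where the object is and how fast it moves."""
    x: float
    y: float
    vx: float
    vy: float


def predict_next(state: ObjectState, dt: float = 1.0) -> ObjectState:
    """Toy predictor (an assumption, not PLATO's learned model):
    the object keeps moving at its current velocity for one frame."""
    return ObjectState(
        x=state.x + state.vx * dt,
        y=state.y + state.vy * dt,
        vx=state.vx,
        vy=state.vy,
    )


# A ball moving right and slightly downward.
ball = ObjectState(x=0.0, y=2.0, vx=1.0, vy=-0.5)
next_ball = predict_next(ball)
print(next_ball)
```

The point of the sketch is only that once a scene is broken into objects with explicit properties, guessing the next frame becomes a question about each object rather than about raw pixels.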
The team showed PLATO under 30 hours of synthetic videos from an open-sourced dataset. PLATO eventually learned to predict how a single object would move in the next video frame, and also updated its memory for that object.
The team then presented PLATO with both a normal scene and an impossible one, such as a ball suddenly disappearing. By measuring the difference between the actual event and PLATO’s predictions, the team could gauge the AI’s level of “surprise,” which went through the roof for magical events. Challenged with a completely different dataset developed by MIT, featuring, among other items, rabbits and bowling pins, PLATO expertly discriminated between impossible and realistic events.
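The surprise measure described above, the gap between what was predicted and what actually happened, can be sketched in a few lines of Python. This is an illustration under stated assumptions: the article does not say which error measure the team used, so the squared distance here is a guess, and the `surprise` function and the position lists are made up for the example.

```python
def surprise(predicted, observed):
    """Sum of squared distances between predicted and observed (x, y)
    positions, frame by frame. Squared error is an assumed choice."""
    return sum(
        (px - ox) ** 2 + (py - oy) ** 2
        for (px, py), (ox, oy) in zip(predicted, observed)
    )


# Hypothetical predictions for a ball rolling steadily to the right.
predicted = [(1.0, 0.0), (2.0, 0.0), (3.0, 0.0)]

# A physically plausible video: the ball wobbles slightly but stays on course.
possible = [(1.0, 0.0), (2.0, 0.5), (3.0, 0.0)]

# An "impossible" video: the ball teleports in the last frame.
impossible = [(1.0, 0.0), (2.0, 0.0), (9.0, 0.0)]

surprise_possible = surprise(predicted, possible)
surprise_impossible = surprise(predicted, impossible)
print(surprise_possible, surprise_impossible)
```

Comparing the two scores mirrors the infant experiments: a larger score for the impossible clip plays the role of a longer stare.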
PLATO had never “seen” a rabbit before, yet without any re-training, it showed surprise when a rabbit defied the laws of physics. Similar to babies, PLATO was able to capture its physical intuition with as little as 28 hours of video training. PLATO isn’t meant as an AI model for infant reasoning.
From prosthetics to self-driving cars, an intuitive grasp of the physical world bridges the amorphous digital world of 0s and 1s and everyday, run-of-the-mill reality. It’s an ability that comes naturally for kids around four years old, and if embedded into AI models, could dramatically help them understand social interactions.
The new study builds on the idea that our early months of life are a rich resource for developing AI with common sense. The authors are releasing their dataset for others to build on and explore an AI model’s ability to interact with more complex physical concepts, including videos from the real world.