To train a robot to navigate a house, you either need to give it a lot of real time in a lot of real houses, or a lot of virtual time in a lot of virtual houses. The latter is definitely the better option, and Facebook and Matterport are working together to make thousands of virtual, interactive digital twins of real spaces available for researchers and their voracious young AIs.
On Facebook’s side the big advance is in two parts: the new Habitat 2.0 training environment and the dataset they created to enable it. You may remember Habitat from a couple years back; in the pursuit of what it calls “embodied AI,” which is to say AI models that interact with the real world, Facebook assembled a number of passably photorealistic virtual environments for them to navigate.
Many robots and AIs have learned things like movement and object recognition in idealized, unrealistic spaces that resemble games more than reality. A real-world living room is a very different thing from a reconstructed one. By learning to move about in something that looks like reality, an AI’s knowledge will transfer more readily to real-world applications like home robotics.
But ultimately these environments were only polygon-deep, with minimal interaction and no real physical simulation: if a robot bumped into a table, the table didn’t fall over and spill items everywhere. The robot could go to the kitchen, but it couldn’t open the fridge or pull something out of the sink. Habitat 2.0 and the new ReplicaCAD dataset change that with increased interactivity and 3D objects instead of simply interpreted 3D surfaces.
Simulated robots in these new apartment-scale environments can roll around like before, but when they arrive at an object, they can actually do something with it. For instance, if a robot’s task is to pick up a fork from the dining room table and go place it in the sink, a couple years ago picking up and putting down the fork would just be assumed, since you couldn’t actually simulate it effectively. In the new Habitat system the fork is physically simulated, as is the table it’s on, the sink it’s going to, and so on. That makes it more computationally intense, but also way more useful.
They’re not the first to get to this stage by a long shot, but the whole field is moving along at a rapid clip and each time a new system comes out it leapfrogs the others in some ways and points at the next big bottleneck or opportunity. In this case Habitat 2.0’s nearest competition is probably AI2’s ManipulaTHOR, which combines room-scale environments with physical object simulation.
Where Habitat has it beat is in speed: according to the paper describing it, the simulator can run roughly 50-100 times faster, which means a robot can get that much more training done per second of computation. (The comparisons aren’t exact by any means and the systems are distinct in other ways.)
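To make the speed claim concrete, here is a back-of-the-envelope sketch in Python. The baseline rate and the step budget are hypothetical placeholders chosen for illustration, not figures from the paper; only the 50x and 100x multipliers come from the reported range.

```python
def training_time_hours(total_steps: int, steps_per_second: float) -> float:
    """Wall-clock hours needed to simulate `total_steps` at a given rate."""
    return total_steps / steps_per_second / 3600.0


if __name__ == "__main__":
    # Hypothetical numbers for illustration only (not taken from the paper).
    baseline_steps_per_second = 100.0
    step_budget = 100_000_000

    # 1x is the hypothetical baseline; 50x and 100x bracket the reported range.
    for speedup in (1, 50, 100):
        hours = training_time_hours(step_budget, baseline_steps_per_second * speedup)
        print(f"{speedup:>3}x faster: {hours:8.2f} hours for {step_budget:,} steps")
```

The point is only that simulation time scales inversely with simulator speed, so a fixed compute budget buys proportionally more experience.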
The dataset used for it is called ReplicaCAD, and it’s essentially the original room-level scans recreated with custom 3D models. This is a painstaking manual process, Facebook admitted, and they’re looking into ways of scaling it, but it provides a very useful end product.
More detail and more types of physical simulation are on the roadmap. Basic objects, movements, and robotic presences are supported, but fidelity had to give way to speed at this stage.
Matterport is also making some big moves in partnership with Facebook. After making a huge platform expansion over the last couple years, the company has assembled an enormous collection of 3D-scanned buildings. Though it has worked with researchers before, the company decided it was time to make a larger part of its trove available to the community.
“We’ve Matterported every type of physical structure in existence, or close to it. Homes, high-rises, hospitals, office spaces, cruise ships, jets, Taco Bells, McDonalds… and all the info that is contained in a digital twin is very important to research,” CEO RJ Pittman told me. “We thought for sure this would have implications for everything from doing computer vision to robotics to identifying household objects. Facebook didn’t need any convincing… for Habitat and embodied AI it is right down the center of the fairway.”
To that end it created a dataset, HM3D, of a thousand meticulously 3D-captured interiors, from the home scans that real estate browsers may recognize to businesses and public spaces. It’s the largest such collection that has been made widely available.
The environments, which are scanned and interpreted by an AI trained on precise digital twins, are dimensionally accurate to the point where, for example, exact numbers for window surface area or total closet volume can be calculated. It’s a helpfully realistic playground for AI models, and while the resulting dataset isn’t interactive (yet) it is very reflective of the real world in all its variance. (It’s distinct from the Facebook interactive dataset but could form the basis for an expansion.)
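As a rough illustration of how such measurements fall out of a dimensionally accurate scan, here is a minimal Python sketch that computes surface area and enclosed volume from a triangle mesh. It is not Matterport's pipeline or the HM3D file format: the function names and the box-shaped "closet" are hypothetical, and the volume formula assumes a closed mesh whose triangles are consistently oriented.

```python
import math
from typing import List, Sequence, Tuple

Vec3 = Tuple[float, float, float]
Triangle = Tuple[int, int, int]  # indices into the vertex list


def _sub(a: Vec3, b: Vec3) -> Vec3:
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _cross(a: Vec3, b: Vec3) -> Vec3:
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def _dot(a: Vec3, b: Vec3) -> float:
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def surface_area(vertices: Sequence[Vec3], triangles: Sequence[Triangle]) -> float:
    """Sum of triangle areas: half the length of each edge cross product."""
    total = 0.0
    for i, j, k in triangles:
        n = _cross(_sub(vertices[j], vertices[i]), _sub(vertices[k], vertices[i]))
        total += 0.5 * math.sqrt(_dot(n, n))
    return total


def enclosed_volume(vertices: Sequence[Vec3], triangles: Sequence[Triangle]) -> float:
    """Volume of a closed, consistently oriented mesh via signed tetrahedra."""
    total = 0.0
    for i, j, k in triangles:
        total += _dot(vertices[i], _cross(vertices[j], vertices[k])) / 6.0
    return abs(total)


def box_mesh(w: float, d: float, h: float) -> Tuple[List[Vec3], List[Triangle]]:
    """Hypothetical stand-in for a scanned closet: an axis-aligned box."""
    vertices: List[Vec3] = [
        (0, 0, 0), (w, 0, 0), (w, d, 0), (0, d, 0),
        (0, 0, h), (w, 0, h), (w, d, h), (0, d, h),
    ]
    # Two triangles per face, wound so every normal points outward.
    triangles: List[Triangle] = [
        (0, 3, 2), (0, 2, 1),  # bottom
        (4, 5, 6), (4, 6, 7),  # top
        (0, 1, 5), (0, 5, 4),  # front
        (3, 7, 6), (3, 6, 2),  # back
        (0, 4, 7), (0, 7, 3),  # left
        (1, 2, 6), (1, 6, 5),  # right
    ]
    return vertices, triangles


if __name__ == "__main__":
    verts, tris = box_mesh(1.0, 0.6, 2.4)  # metres; made-up closet dimensions
    print(f"closet volume: {enclosed_volume(verts, tris):.2f} m^3")
    print(f"closet surface area: {surface_area(verts, tris):.2f} m^2")
```

A window's area would come from `surface_area` applied to just the triangles labeled as that window, which is why accurate geometry alone is enough to answer such questions.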
“It is specifically a diversified dataset,” said Pittman. “We wanted to be sure we had a rich grouping of different real world environments — you need that diversity of data if you want to get the most mileage out of it training an AI or robot.”
All the data was volunteered by the owners of the spaces, so don’t worry that it’s been sucked up unethically by some small print. Ultimately, Pittman explained, the company wants to create a larger, more parameterized dataset that can be accessed by API — realistic virtual spaces as a service, basically.
“Maybe you’re building a hospitality robot, for bed and breakfasts of a certain style in the U.S. Wouldn’t it be great to be able to get a thousand of those?” he mused. “We want to see how far we can push advancements with this first dataset, get those learnings, then continue to work with the research community and our own developers and go from there. This is an important launching point for us.”
Both datasets will be open and available for researchers everywhere to use.