The dirty work of training robots: XDOF raises $70M to build the data pipelines AI labs desperately need

2026/06/18 00:05
BitcoinWorld

Two weeks ago, OpenAI announced it would relaunch its robotics program, which it shut down in 2021. The move is the latest signal that the biggest AI labs are racing to teach machines to operate in the physical world. But building capable robots requires something the AI industry doesn’t yet have: training data at the scale of what powers language models. That gap is creating a new kind of infrastructure business.

Unlike large language models, which are trained on a vast sea of publicly available text, robots need data that captures physical interaction, and that kind of data barely exists. YouTube videos and footage captured by gig workers are low-fidelity and hard to map onto what actually happens when a robot touches an object. Enter XDOF (pronounced “ecks-doff”), a startup emerging from stealth today that is betting the next great bottleneck in AI isn’t models or chips, but the data feedback loop needed to teach robots how to interact with the physical world.

Building the data ecosystem for physical AI

XDOF aims to build the data pipelines, collection tools, and annotation systems that frontier labs and robotics companies can’t easily build themselves. The company has raised $70 million from Thrive Capital, Spark Capital, a16z, Lux, and WndrCo to do it. Co-founder and CEO Philippe Wu says XDOF, which has about 60 employees, is already working with 20 customers, including several frontier AI labs, though he could not name them.

“All of the top labs are trying to pursue robotics,” Wu said in an interview. “We’ve already seen some of the downfalls of falling a little bit behind in the language model race … you don’t want to be in this type of situation where you pursue this technology too late, and everyone is in this boat where physical AI is the next frontier.”

Wu ran into this problem himself as a PhD student at UC Berkeley, where his focus was on enabling robots to learn skills from large-scale data sets. There was just one problem. “We didn’t have large-scale data to work with,” he said. “There was this chicken-and-egg problem — we first needed to actually collect data before we could even ask how to train a foundation model for robotics.”

Wu and his future XDOF co-founder and CTO, Fred Shentu, worked on a project called GELLO, a low-cost teleoperation system that lets a human operator control a robotic arm to generate training data. “It ended up becoming a very influential paper in robotics, because a lot of people had similar needs and bottlenecks, and many started leveraging this type of device for data collection,” Wu said.

Spotting the opportunity, Wu, Shentu, and third co-founder and Chief Operating Officer Nemo Jin launched XDOF in October 2024 to provide a data ecosystem for companies pursuing robotics models. Mindful that data provision alone can be a dead-end business, the company is also focused on data cleaning, tooling, and annotation — creating a self-reinforcing feedback loop for robot trainers.

The ABC dataset: A new benchmark for robotics research

As a starting point, the company is partnering with UC Berkeley’s AI Research lab to release what it believes is the largest collection of high-quality robot training data ever assembled, dubbed ABC. It includes 130,000 trajectories of robot manipulation data, 300 hours of simulation data, and 100 hours of evaluations. According to the team, pre-training data at this scale has never before been available to academic researchers.

“We’ve seen in language, image generation, and other fields, that when models and data are released, the community achieves things that you wouldn’t necessarily have expected,” David McAllister, a Berkeley PhD student who helped organize the release, told BitcoinWorld. The team has already used the data to train robots on benchmark tasks like folding T-shirts, flattening boxes, and loading AirPods into their cases.

Three tiers of data collection

The company plans to work across three tiers of a data pyramid. The most valuable tier is teleoperation data collected on the actual robot being deployed. Next comes more general data gathered by teleoperated robots, as with GELLO. The base of the pyramid is “egocentric” data gathered by humans performing everyday tasks, for which XDOF plans to build its own wearable sensors.

“Your camera choice is going to affect the quality of your data — which is going to affect how your hand-tracking algorithm performs,” Wu said. “If you don’t design the hardware well from the start, the data you collect might have very specific problems that you didn’t anticipate.”

The company plans to hire and train armies of teleoperators and egocentric data operators around the world — a labor-intensive model that raises an obvious question: Why aren’t the major labs doing this data production work themselves?

“You need a warehouse of hundreds of thousands of square feet with hundreds of robots,” Wu said. “You need to maintain these robots, calibrate their physical parameters, and properly train operators.” It’s a build-out that requires focus, capital, and operational scale that most AI labs would rather outsource — which is precisely the market XDOF is betting on.

Why this matters for the AI industry

The emergence of XDOF signals a broader shift in the AI landscape. As frontier labs race toward physical AI — robots that can operate in unstructured human environments — the data bottleneck is becoming as critical as computing power or model architecture. Companies that can provide reliable, high-quality training data for physical interaction are positioning themselves as essential infrastructure providers.

The name XDOF is a play on the robotics term “degrees of freedom,” which describes the number of independent motions a robot can perform. Your arm, from shoulder to wrist, has seven degrees of freedom. Humanoid robotics company Figure.AI’s latest robot has 30. The X in the company’s name captures its ambition: “Arbitrary degrees of freedom, unlimited degrees of freedom,” Wu said.

Conclusion

XDOF’s $70 million raise and its emergence from stealth underscore a growing recognition in the AI industry: the path to capable physical AI runs through data infrastructure, not just better models. If more labs follow OpenAI’s lead in restarting robotics programs, demand for high-quality, physically grounded training data is likely to intensify. XDOF is positioning itself at the center of that demand, building the pipelines that could help determine which companies succeed in the race to build robots that can actually work in the real world.

FAQs

Q1: What is XDOF and what does it do?
XDOF is a startup that builds data pipelines, collection tools, and annotation systems for training robots. It provides the physical-world training data that AI labs need to teach robots how to interact with their environments.

Q2: Why is robot training data different from language model training data?
Language models can be trained on vast amounts of text available on the internet. Robot training data must capture physical interactions — like grasping objects or folding clothes — which requires specialized collection methods like teleoperation or wearable sensors.

Q3: How much funding has XDOF raised and who are the investors?
XDOF has raised $70 million from Thrive Capital, Spark Capital, a16z, Lux, and WndrCo. The company has about 60 employees and is already working with 20 customers, including several frontier AI labs.

This post The dirty work of training robots: XDOF raises $70M to build the data pipelines AI labs desperately need first appeared on BitcoinWorld.