Three years in, models are 100x more capable. Why is AI still just a chat box?

After two years building AI products, I’m more and more convinced of one thing: the chat box is the Nokia of the AI era.

It’s just the first form of AI we got our hands on. Higher-dimensional products will replace it.


I. First-gen AI: chat boxes — write your prompts

Regular people aren’t bad at AI because AI is dumb. They’re bad at it because they don’t know what to ask, or how to ask.

Quick example.

Before AI, there was a phrase that got thrown around constantly: “Can’t you just Google it?”

A decade after search engines went mainstream, plenty of regular people still couldn’t break what they wanted into keywords. If “turn a question into 3-5 search keywords” already had a barrier, then “write a complete prompt with full context, clear goals, and verifiable tests” is a much higher one.

Just look at how many DeepSeek and Doubao courses are selling on the market right now.

So I keep saying: the real difficulty for regular people using AI isn’t technical. It’s expressing what they want clearly.

Claude Code and Cursor are obviously built for programmers. A regular person sees the UI and closes it instantly. Doubao is a bit better, but watch the people around you who say they want to learn AI — almost none of them actually use Doubao well.

They don’t know how to follow up. They don’t know how to give context. They often haven’t even thought through what they want. When AI answers off-target, they have no idea how to course-correct. One message bounces off another, and both sides get stuck.

A hot product doesn’t mean the form is right. What matters first is what users actually need.

It wasn’t until OpenClaw arrived that AI agents started leaking out of the small programmer crowd and reaching everyone else.


II. Second-gen AI: agents — anticipate what humans need

OpenClaw launched in November 2025 (originally Clawdbot, renamed to OpenClaw in January 2026). It hit 100K GitHub stars in 3 months — the fastest star growth of any open-source project in GitHub history. No IDE, no terminal, you don’t even have to leave WeChat. Drop a message in your chat app and AI handles it.

What expands AI’s reach has never been technical sophistication. It’s lowering the barrier to use.

OpenClaw pointed at a direction: AI that does things on its own.

You wake up in the morning, and your “lobster” has already gathered today’s important news and written it up as an analysis report for you to review.

Your meeting ends, and the lobster has already summarized the notes and flagged the key points.

Users get more done with fewer prompts. Efficiency goes up, the barrier and the mental load come way down.

Still not enough. Most work today still needs the user to plan things out. What people really want is an agent that’s foolproof: say nothing, and AI knows what you’re trying to do.

That sounds impossible, but parts of it already exist. The most familiar one is Computer Use.

Anthropic shipped the first consumer version in October 2024. They gave Claude a wild new ability: looking at your screen, moving the mouse like a human, typing on your keyboard. You hand it a task, it opens the app, finds the info on the web, fills out forms, hits submit.
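Conceptually, this kind of agent is just a loop: capture the screen, ask a multimodal model what to do next, replay the action with the mouse and keyboard, repeat. Here’s a minimal Python sketch of that loop; the `ask_model` function and the action format are illustrative assumptions, not Anthropic’s or OpenAI’s actual APIs.

```python
# Minimal Computer Use-style agent loop (illustrative sketch only).
# `ask_model` is a hypothetical stand-in for a multimodal model call.
import io
import pyautogui  # takes screenshots, moves the mouse, types

def ask_model(task: str, screenshot_png: bytes) -> dict:
    """Hypothetical model call. Expected to return an action dict such as
    {"type": "click", "x": 512, "y": 300}, {"type": "type", "text": "hi"},
    or {"type": "done"}."""
    raise NotImplementedError("wire up your model provider here")

def run_agent(task: str, max_steps: int = 20) -> None:
    for _ in range(max_steps):
        # 1. See: capture the current screen
        buf = io.BytesIO()
        pyautogui.screenshot().save(buf, format="PNG")

        # 2. Think: ask the model for the next UI action
        action = ask_model(task, buf.getvalue())

        # 3. Act: replay the action with mouse and keyboard
        if action["type"] == "click":
            pyautogui.click(action["x"], action["y"])
        elif action["type"] == "type":
            pyautogui.write(action["text"])
        elif action["type"] == "done":
            return

# run_agent("Find tomorrow's weather and paste it into my notes")
```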

In January 2025, OpenAI followed with Operator, “AI’s own browser” that reads pages, clicks buttons, and fills forms like a human. That July, Operator merged into ChatGPT and became ChatGPT Agent.

Around the same time, products like Perplexity Comet started fusing AI with the browsing experience more deeply: not just searching for answers, but organizing information, comparing options, and suggesting next steps.

Human Security’s data is even more direct: since July 2025, network requests from agentic browsers are up 6900%. AI is taking over screen operations faster than anything before it.

In March 2026, Anthropic pushed Computer Use further: background parallel tasks, scheduled tasks, and the Dispatch feature — “go out for dinner, AI keeps working for you.”

The shared direction across these products:

AI no longer waits for you to ask. It watches what you do to figure out what you need.

  • It understands the page you’re looking at
  • It answers questions based on context
  • It occasionally runs simple operations for you

Now picture this:

  • A truly mature Computer Use agent that monitors your screen all day
  • Runs silently and predicts in real time what problems you’re about to hit
  • The moment it detects something, it pops up a few suggested fixes, you pick one, and it executes

No more opening a chatbot every time you have a problem, scratching your head over how to phrase the prompt, then walking through the AI’s solution step by step yourself.

The technology isn’t there yet: slow, unstable, very sensitive to page changes, bad at complex flows. Two big issues remain unsolved:

  • Privacy: AI has to know what you looked at, what you clicked, where you hesitated, what you copied. That’s basically full behavioral monitoring.
  • Performance and cost: Real-time AI inference means analyzing every page and reasoning over every action. With current models, latency is too high and token costs are too steep. Not worth it.
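To make the cost point concrete, here’s a rough back-of-envelope estimate in Python. Every number below (capture rate, tokens per screenshot, price per million tokens) is an assumption picked for illustration, not a measurement.

```python
# Back-of-envelope cost of "AI watches your screen all day".
# All numbers are illustrative assumptions, not measurements.
captures_per_minute = 12          # assume one screenshot every 5 seconds
active_hours = 8                  # one working day
tokens_per_capture = 1_500        # assume ~1.5k tokens to encode a screenshot
usd_per_million_tokens = 3.0      # assume a mid-tier multimodal input price

captures = captures_per_minute * 60 * active_hours            # 5,760 captures
input_tokens = captures * tokens_per_capture                  # ~8.6M tokens
daily_cost = input_tokens / 1_000_000 * usd_per_million_tokens

print(f"{captures} captures, ~{input_tokens / 1e6:.1f}M tokens, ~${daily_cost:.0f}/day")
# roughly $26/day for inputs alone, before any reasoning output or latency
```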

If models keep improving and these problems get solved, Computer Use becomes the technology that lets regular people use AI without writing prompts.

But Computer Use still has a hard ceiling. You can only use AI on a computer. The moment you’re away from a screen, how does AI predict what you need next?

An agent trapped in a screen knows everything about the world, but not enough about you. That’s where software has to hand the ball off to hardware.


III. Third-gen AI: wearables + AI — they know everything about you

Before talking about glasses, watches, and brain interfaces, here’s a principle a lot of people still miss: AI’s output quality depends heavily on the user’s input quality.

The people who use AI brilliantly today almost all have a complete “second brain” behind them. A personal knowledge base built up over years or even decades, with all their notes, project docs, meeting transcripts, codebases, even every past conversation. They feed all of this to AI, and AI becomes a version that serves only them. Knows their preferences, their go-to libraries, their thinking patterns, their entire history.
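Mechanically, that second brain is mostly retrieval: embed every note once, then pull the most relevant ones into the prompt on each question. Here’s a toy sketch of the pattern; the hash-based `embed()` is only a placeholder for a real embedding model.

```python
# Toy "second brain" retrieval sketch. The hash-based embed() is a stand-in
# for a real embedding model; the retrieval pattern is the point.
import numpy as np

def embed(text: str, dim: int = 256) -> np.ndarray:
    # Placeholder embedding: hash words into a fixed-size vector.
    v = np.zeros(dim)
    for word in text.lower().split():
        v[hash(word) % dim] += 1.0
    return v

class SecondBrain:
    def __init__(self) -> None:
        self.notes: list[str] = []
        self.vectors: list[np.ndarray] = []

    def add(self, note: str) -> None:
        self.notes.append(note)
        self.vectors.append(embed(note))

    def recall(self, question: str, k: int = 3) -> list[str]:
        # Rank notes by cosine similarity to the question
        q = embed(question)
        scores = [
            float(v @ q / (np.linalg.norm(v) * np.linalg.norm(q) + 1e-9))
            for v in self.vectors
        ]
        top = sorted(range(len(scores)), key=scores.__getitem__, reverse=True)[:k]
        return [self.notes[i] for i in top]

brain = SecondBrain()
brain.add("2026-02-03 meeting: promised the client a revised quote by Friday")
brain.add("Prefers concise answers; hates walls of bullet points")

# Retrieved notes get prepended to the prompt, so the model answers with
# your context instead of a generic one.
context = brain.recall("What did I promise the client?")
prompt = "Context from my notes:\n" + "\n".join(context) + "\n\nQuestion: ..."
```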

That’s why the AI of a power user is just smarter than the AI of a regular person. The model isn’t stronger; the input quality is higher and the customization is deeper. The “lobster” of a power user just gets it.

For a regular person to build the whole system today, there are three mountains in the way:

  • Hardware barrier: you need a Mac or a powerful PC, install a stack of tools, configure environments, hook up APIs, try to deploy local models.
  • Content barrier: you need a long-term writing and recording habit, and the discipline to digitize and structure it. Most people can’t keep a journal for three days. Their work docs vanish when they leave a job. Meeting recordings never get saved.
  • Engineering barrier: you need basic agent knowledge to build your own knowledge base, set up workflows, and continuously tune the whole setup.

Stack those three together and you get a wall that separates 99% of regular people from “high-intelligence AI.”

But flip the framing — what if AI is something you wear?

Imagine you’re wearing AI glasses that stay on all day, an AI watch on your wrist, a hat on your head that reads your attention. They build that second brain in the cloud for you, frictionlessly, in the background.

What you hear, see, say, write, type, even your physical health — all of it gets transcribed, tagged, and archived automatically.

In no time, everyone has their own “digital twin” — a life record more complete than your own memory. You don’t have to scramble for a prompt to get AI to understand you. It is you.

When that day comes, all the courses out there charging you to learn prompt engineering, RAG, or how to build a second brain go completely obsolete. You don’t need to “manage” a knowledge base. Every input and output syncs to your personal cloud automatically, gets categorized, gets fed to AI.

The skills that look “must-learn” today are really transitional skills before AI goes mainstream — like learning “how to use a Nokia for email” in 2000.

The exciting part is that plenty of startups are already working on this, and consumer products are starting to ship.

Limitless (acquired by Meta), Bee (acquired by Amazon), Omi — necklaces, wristbands, tiny clip-ons in every form. They’re already on the necks, wrists, and collars of a chunk of Silicon Valley. Different shapes, same logic: wear it all day, record audio for 12+ hours, and at night the day’s audio gets transcribed, tagged, deduplicated, and archived into your personal knowledge base. The next morning your AI shows up with the context of “what you said yesterday, what you heard, what you promised to whom.”
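That nightly flow is conceptually simple: transcribe, tag, deduplicate, archive. Here’s a rough sketch of the pipeline, where `transcribe()` and `tag()` are hypothetical stand-ins for a speech-to-text model and an LLM labeling call.

```python
# Sketch of a wearable's nightly life-log pipeline (hypothetical helpers).
import hashlib
import json
from datetime import date
from pathlib import Path

def transcribe(audio_path: Path) -> list[str]:
    # Stand-in for a speech-to-text model returning one string per utterance
    raise NotImplementedError

def tag(utterance: str) -> list[str]:
    # Stand-in for an LLM call that labels topics, people, and commitments
    raise NotImplementedError

def nightly_archive(audio_files: list[Path], knowledge_base: Path) -> None:
    seen: set[str] = set()
    entries = []
    for audio in audio_files:
        for utterance in transcribe(audio):
            digest = hashlib.sha1(utterance.encode()).hexdigest()
            if digest in seen:          # drop repeated phrases (dedup)
                continue
            seen.add(digest)
            entries.append({"text": utterance, "tags": tag(utterance)})
    out = knowledge_base / f"{date.today()}.json"
    out.write_text(json.dumps(entries, ensure_ascii=False, indent=2))
```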

Limitless got 10,000 preorders in 24 hours when it launched. In December 2025, Meta acquired the entire Limitless team and tech and folded it into Reality Labs.

CES 2026 had a whole row of booths for these “record your whole life” devices. Some commentators called it “an incoming Black Mirror,” but the more accurate framing is this — it’s the most underrated piece of the next-generation AI product puzzle.

Maybe today only the AI-savvy crowd is excited about it, but wasn’t GPT-3 the same? Once a killer product like ChatGPT lands, it goes mainstream fast.

This is what the hardware layer is really solving: giving AI an entry point so it can record your life, truly understand you, and become you.

Once you see this, glasses, Apple, and brain interfaces start to make a different kind of sense.


Thread one: smart glasses

I wrote a piece on smart glasses earlier, “Smart Glasses: The Next Trillion-Dollar Market You Can’t Ignore” — worth a read if you’re curious.

Meta’s CTO Andrew Bosworth said something on the a16z blog this year:

“The next wave of consumer tech won’t run on taps and swipes — it’ll run on intent.”

“Run on intent” — for AI to do things before you even speak, it needs to see what you see and hear what you hear.

Glasses are the most elegant form for this. Closest to your eyes and ears, lightweight, hands-free, and they don’t block your view. What they see is what you see. What they hear is what you hear.

Meta is the most aggressive player. Zuckerberg is personally pushing it for a reason — he’s betting on “the next consumer electronics entry point.”

Thread two: Apple’s CEO change

This also explains why Apple announced a CEO change in April 2026 — John Ternus, hardware engineer by background, takes over from Tim Cook in September. His past scope at Apple covered the hardware engineering teams for iPhone, iPad, Mac, Apple Watch, AirPods, and Vision Pro.

CNBC’s deep dive on the move spelled the signal out clearly: AI differentiation is no longer about cloud scale or model performance. It’s about integrating silicon and software on the device.

Ternus taking over is basically Apple admitting: for the next decade, Apple won’t compete with OpenAI or Anthropic on models. Apple will pack AI into every device on your body.

Vision Pro shipped only about 45,000 units in Q4 2025. Not a hit. But Apple didn’t drop spatial computing. They swapped the CEO for a hardware guy and plan to ship AI glasses next year.

What that tells you: Vision Pro was too heavy, too expensive, and the ecosystem wasn’t there. But the “spatial computing + wearable + AI” direction is right. Apple’s bet is that Ternus, having absorbed the Vision Pro lessons, can do better.

Thread three: brain-computer interfaces

Even more aggressive than smart glasses are brain-computer interfaces. The topic felt like sci-fi a few years ago, but non-invasive BCIs (no skull surgery — just a headband or hat) are already here.

  • BrainCo — founder Han Bicheng, a Harvard PhD dropout who started a medical-focused BCI company. In September 2025 they shipped the Revo2 prosthetic hand: 383 grams, 0.1mm precision, 50 newtons of grip strength. Amputees wearing it can play piano. In January 2026, BrainCo’s pediatric ADHD treatment device, Focus Xin, passed China’s medical device approval and is now being sold in hospitals. Earlier this year they filed for a Hong Kong IPO at over $1.3B valuation. One of the first BCI companies to scale into mass production.

  • Sabi — this California startup just exited stealth in April 2026, with early OpenAI investor Vinod Khosla backing them. Sabi claims they’ve collected the world’s largest neural dataset and trained the strongest Brain Foundation Model. The product they’re shipping is a beanie, packed with 70,000 to 100,000 ultra-dense EEG sensors, decoding your inner monologue directly: think a sentence, see it on the screen. No talking, no typing. First version ships at the end of this year, next version is a baseball cap, target speed is 30 wpm — slower than most people type, but once the basic loop closes, performance climbs fast.

People call Sabi “the non-invasive Neuralink competitor.” I think they have it backwards. Neuralink’s open-skull-and-implant route is destined to stay niche medical. The product that actually ends up on every head will be something like Sabi: a hat you just put on.

One has already entered the medical system. Another is shipping with 100K sensors. BCIs went from “still 10 years out” to “this year” in two years.

These three hardware threads — glasses, Apple, BCIs — look like they’re each going their own way. They’re actually doing the same thing: giving AI the legs it needs to walk into the physical world, through wearables.

But hardware has a fatal bottleneck: cloud LLMs can’t handle the real-time physical world. Latency is too high, and the models have no physical intuition.

You can’t send every frame from glasses to the cloud and wait for GPT to reason. Neural signals from a BCI hat need millisecond responses. They can’t wait for network round trips. A watch’s battery can’t take the power draw of a giant model.

So the hardware bottleneck kicks the ball back down to the model layer. If AI is going to live in trillions of wearables, it needs a brand-new architecture. A small AI that runs on the chip in your device.


IV. World models

World models are the underlying engine that lets AI understand the physical world.

Every frame from glasses, every neural signal from a BCI hat, every millisecond of motion data from a watch — send it all back to the cloud for GPT to process and the latency makes the whole thing impossible. Before AI finishes saying “watch out for that car,” the car has hit you.

There’s only one path to truly seamless wearable AI: world models. And that path runs through Yann LeCun.

LeCun, a Turing Award winner who spent 10+ years as Meta’s Chief AI Scientist, left the company in November 2025. Why would a 65-year-old AI legend quit? Because he thinks Meta is doubling down on the wrong path.

In January 2026, in an MIT Technology Review interview, he said:

“LLMs are limited to the discrete world of text. They can’t truly reason or plan, because they lack a model of the world.”

In another interview, he went deeper: the essence of intelligence isn’t “being able to talk.” It’s “predicting the consequences of action”. Before you reach for a cup, your brain has already simulated how your hand will move and whether the cup will tip — that’s what a world model means. LLMs only learn relationships between words. They don’t know apples fall from trees, or that water can’t flow uphill.

Another LeCun quote pierces the industry’s self-hypnosis:

“Language turned out to be the easy part. The hard part is the physical world.”

So you see AI passing exams and writing code today, yet there are still no household robots and no L5 self-driving. By LeCun’s read, that’s inevitable. The current LLM path can’t get to the physical world.

He went on to start AMI Labs in Paris, and raised $1.03B in two months at a $3.5B valuation.

This isn’t just LeCun’s faith. The VC market is putting real money behind a new path. Multiple players are now in the world model space in 2026:

  • AMI Labs (LeCun) — $3.5B valuation
  • World Labs (Fei-Fei Li) — $5B valuation
  • Runway — $5.3B valuation

The difference between world models and LLMs:

LLMs understand text — what word should come after another word.

World models learn the rules of the physical world — objects can’t pass through walls, balls fall when you throw them, water doesn’t flow uphill — letting AI understand the physical world the way humans do, through video.

In March 2026, LeCun’s team published a paper:

“LeWorldModel: Stable End-to-End Joint-Embedding Predictive Architecture from Pixels” arxiv: 2603.19312

This paper solved a stubborn problem in world model research that had lingered for years — representation collapse.

Old world models cheated. They figured out that mapping every input to the same vector trivially satisfied the training objective, so that’s what they did. Dogs, cars, people — all squashed into identical vectors. The model looked like it was learning, but it understood zero physical rules.

LeWM solved this with an elegant mathematical regularizer (SIGReg):

  • 15M parameters — 100,000x smaller than GPT-4
  • Trains in a few hours on a single GPU
  • 48x faster than mainstream baselines on robot planning tasks

What does 48x mean in practice? Same planning task — old methods take 47 seconds, LeWM takes 0.98 seconds.
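To make the collapse problem and the fix concrete, here’s a toy JEPA-style training step in PyTorch. This is not the paper’s SIGReg; it uses a plain variance floor (in the spirit of VICReg-style regularizers) purely to show why a prediction loss alone collapses and why an explicit anti-collapse term is needed.

```python
# Toy JEPA-style step with an explicit anti-collapse term.
# Not SIGReg: a simple variance penalty, for illustration only.
import torch
import torch.nn as nn

enc = nn.Sequential(nn.Flatten(), nn.Linear(64 * 64, 128))   # frame -> embedding
pred = nn.Linear(128, 128)                                    # predicts next embedding
opt = torch.optim.Adam(list(enc.parameters()) + list(pred.parameters()), lr=1e-3)

def train_step(frame_t: torch.Tensor, frame_t1: torch.Tensor) -> float:
    z_t = enc(frame_t)                      # embedding of the current frame
    with torch.no_grad():
        z_t1 = enc(frame_t1)                # target embedding (stop-gradient)

    # The prediction loss alone is trivially minimized by mapping every
    # frame to the same constant vector: that's representation collapse.
    pred_loss = ((pred(z_t) - z_t1) ** 2).mean()

    # Anti-collapse regularizer: keep each embedding dimension's variance
    # across the batch above a floor, so embeddings can't all be identical.
    std = z_t.std(dim=0)
    var_loss = torch.relu(1.0 - std).mean()

    loss = pred_loss + 25.0 * var_loss
    opt.zero_grad()
    loss.backward()
    opt.step()
    return loss.item()

# Fake batch of consecutive 64x64 grayscale frames, just to run the step
f_t, f_t1 = torch.rand(32, 64, 64), torch.rand(32, 64, 64)
print(train_step(f_t, f_t1))
```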

The paper’s contribution is pushing the “end-to-end JEPA world model” path from “theoretically possible” to “an engineering starting point.” It isn’t a production-grade model that runs on glasses, but it’s a key milestone — proof you can stably train a small world model without a pretrained large model.

This is the most promising path toward an on-device world model. Once the paradigm is right, the rest is engineering.

The trend is clear:

The AI of the future isn’t one all-purpose brain. It’s two brains with different jobs.

  • Brain A (LLM) — handles language, creativity, reasoning, runs in the cloud
  • Brain B (JEPA / world model) — handles physics, space, perception, runs on the device

Your glasses, your BCI hat, every wearable on your body in the future — all of them run Brain B. It’s the most important engine for getting AI into human life without friction.
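As code, that division of labor is just a dispatch rule: physical-world signals stay on the device with the small model, and only explicit language-level requests go to the cloud. A conceptual sketch, with hypothetical function names.

```python
# Conceptual two-brain dispatch (hypothetical functions, illustration only).
from dataclasses import dataclass

@dataclass
class Event:
    kind: str        # "frame", "neural", "motion", or "user_request"
    payload: object

def on_device_world_model(event: Event) -> str | None:
    # Brain B: small model on the wearable's chip, millisecond latency.
    # Returns a warning or suggestion, or None if nothing needs attention.
    raise NotImplementedError

def cloud_llm(request: str, context: str) -> str:
    # Brain A: large model in the cloud, used for language and reasoning.
    raise NotImplementedError

def handle(event: Event, context: str) -> str | None:
    if event.kind in {"frame", "neural", "motion"}:
        # Physical-world signals never leave the device
        return on_device_world_model(event)
    # Only explicit, language-level requests go over the network
    return cloud_llm(str(event.payload), context)
```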


Apple swapped in a hardware CEO. Meta sold 7 million smart glasses in a year. Capital is putting tens of billions on world models. BrainCo’s BCI is already treating ADHD in hospitals.

Four seemingly unrelated things, all hinting at where AI is going:

Prompts are too hard to write → so AI has to anticipate → anticipating needs real-time user data → that data needs wearables to capture it automatically → all that data needs world models to digest it.

AI’s task for the next decade is to walk out of the chat box and into everyone’s life.