World models are becoming the next real battleground in AI - not just chat, not just image generation, but systems that can simulate how environments behave, change, and respond.
And the field is not converging on one idea; it is splitting into competing philosophies.
LeCun and AMI Labs are betting that true intelligence will come from latent world models like JEPA (Joint Embedding Predictive Architecture), where the system predicts abstract structure instead of reconstructing every pixel, and AMI has already raised more than $1B to pursue that path.
Runway and OpenAI Sora (RIP) represent another camp: generative world models that learn by predicting and rendering the world itself, with Runway now shipping GWM-1 variants for explorable worlds, robotics simulation, and avatars.
Google DeepMind's Genie 3 pushes this even further toward real-time interactivity, generating navigable environments at 24 fps and letting users modify the world live with new prompts.
World Labs, founded by Fei-Fei Li, is especially interesting because it is aiming at spatial intelligence more directly: generating full 3D scenes from a single image or prompt, with geometry, depth, and navigation built in.
Then there is the code-based world model direction, where LLMs generate executable programs that simulate environments; in formal domains, research shows that planning over these programs can be 4–6 orders of magnitude faster than relying on neural rollouts alone.
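To make that concrete, here is a minimal sketch of the pattern - all names (the toy grid world, step, plan) are invented for illustration, not taken from any specific paper. In a code world model, the LLM would emit the transition function as ordinary code, and a classical planner then searches over it instead of rolling out a neural simulator for every candidate action:

```python
from collections import deque

# Stand-in for LLM-generated simulation code: in the code world model
# pattern the language model writes this transition function; here it is
# hand-written for a toy 4x4 grid with two wall cells.
WALLS = {(1, 1), (2, 1)}
ACTIONS = {"up": (0, -1), "down": (0, 1), "left": (-1, 0), "right": (1, 0)}

def step(state, action):
    """Deterministic transition: returns the next (x, y) state."""
    x, y = state
    dx, dy = ACTIONS[action]
    nxt = (x + dx, y + dy)
    if nxt in WALLS or not (0 <= nxt[0] < 4 and 0 <= nxt[1] < 4):
        return state  # blocked moves leave the state unchanged
    return nxt

def plan(start, goal):
    """Breadth-first search over the executable model: each expansion is
    a cheap function call, not a neural network rollout."""
    frontier = deque([(start, [])])
    seen = {start}
    while frontier:
        state, path = frontier.popleft()
        if state == goal:
            return path
        for action in ACTIONS:
            nxt = step(state, action)
            if nxt not in seen:
                seen.add(nxt)
                frontier.append((nxt, path + [action]))
    return None

print(plan((0, 0), (3, 3)))  # a shortest action sequence, here 6 moves
```

Because the model is plain code, exploring an action costs a function call rather than a forward pass through a large network - which is where the orders-of-magnitude speedups in formal domains come from.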
To me, this is the important shift: AI is moving from describing the world to modeling the world.
And once a system can model a world, it can do much more than generate media - it can plan, test actions, reason about consequences, and eventually become a real design or robotics engine.
My bet is that there will not be one dominant world model architecture.
We’ll likely end up with different stacks for different needs: latent models for abstraction and planning, video models for realism, 3D models for spatial interaction, and code-based models for precision and control.
For anyone building in design, robotics, games, or spatial computing, this feels like the beginning of a new foundational layer - not just models that generate outputs, but models that can simulate possibility.
The companies that matter in the next wave of AI may not be the ones with the best chatbot. They may be the ones that build the best simulation layer for reality.
It also makes me wonder whether systems like Spellshape - which turn intent into structured modeling briefs, executable spatial actions, and editable 3D outcomes - are an early form of a design world model.
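If that reading is right, the interface to such a system would look less like a prompt box and more like a typed brief compiled into editable actions. A hypothetical sketch of that data flow - every field and function name here is mine, not Spellshape's actual API:

```python
from dataclasses import dataclass, field

# Hypothetical shape of a "design world model" pipeline: intent in,
# structured brief, then executable spatial actions out. All names are
# invented for illustration and do not reflect Spellshape's API.
@dataclass
class ModelingBrief:
    intent: str                        # e.g. "a 6m x 4m reading room"
    constraints: list[str] = field(default_factory=list)
    units: str = "meters"

@dataclass
class SpatialAction:
    op: str        # e.g. "place", "extrude", "align"
    target: str    # the scene element the action edits
    params: dict   # editable parameters rather than baked-in pixels

def compile_brief(brief: ModelingBrief) -> list[SpatialAction]:
    """Stub: a real system would plan these against a live 3D scene."""
    return [SpatialAction(op="place", target="room",
                          params={"width": 6.0, "depth": 4.0})]

actions = compile_brief(ModelingBrief(intent="a 6m x 4m reading room"))
```

The point of the shape is the last step: the output is a set of parameterized actions, so the result stays editable - closer to a simulation state than to a rendered artifact.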
This article was originally published on DEV Community and written by Stepan Kukharskiy.