WorldDirector: Building Controllable World Simulators with Persistent Dynamic Memory
Abstract
WorldDirector enables controllable video generation with persistent object memory by decoupling semantic motion planning from visual rendering through LLM coordination of 3D trajectories and camera movements.
We present WorldDirector, a highly controllable video world model framework designed for persistent dynamic object memory and unrestricted viewpoint exploration. Unlike existing world models that entangle physical dynamics with pixel rendering and rely on continuous visual observation to sustain motion, our framework explicitly decouples semantic motion orchestration from visual generation. By leveraging an LLM to coordinate 3D trajectories with camera movements and subsequently employing these orchestrated trajectories as control signals for video generation, our approach ensures strict physical logic and appearance stability, successfully preserving the exact visual identities of dynamic entities even when they re-enter the scene after prolonged periods out of view. Experimental results demonstrate that our method supports the synthesis of complex and extended events with unprecedented controllability and persistent dynamic object memory.
Project Page: https://worlddirector.github.io/
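To make the decoupling idea in the abstract concrete, here is a minimal sketch, not the authors' implementation. It assumes that object motion is planned as a 3D trajectory independent of the camera, and that per-frame control signals are obtained by projecting that trajectory into whatever camera pose is active. Every name here (`Camera`, `project`, `linear_trajectory`, `control_signals`) is hypothetical, the LLM planner is replaced by a hand-written straight-line trajectory, and the video generator that would consume the signals is omitted.

```python
import math
from dataclasses import dataclass


@dataclass
class Camera:
    """Hypothetical pinhole camera; yaw rotates the view about the vertical (y) axis."""
    position: tuple
    yaw: float                # radians; yaw = 0 looks down the +z axis
    focal: float = 1.0
    half_width: float = 1.0   # image plane spans [-half_width, half_width]


def project(point, cam):
    """Return (u, v) image-plane coordinates of a world point, or None if out of view."""
    dx = point[0] - cam.position[0]
    dy = point[1] - cam.position[1]
    dz = point[2] - cam.position[2]
    c, s = math.cos(cam.yaw), math.sin(cam.yaw)
    x = c * dx - s * dz       # component along the camera's right axis
    z = s * dx + c * dz       # component along the camera's forward axis
    if z <= 0:                # behind the camera
        return None
    u = cam.focal * x / z
    v = cam.focal * dy / z
    if abs(u) > cam.half_width or abs(v) > cam.half_width:
        return None           # outside the image bounds
    return (u, v)


def linear_trajectory(start, velocity):
    """Stand-in for a planned trajectory: position is a function of time only."""
    def at(t):
        return tuple(p + v * t for p, v in zip(start, velocity))
    return at


def control_signals(trajectories, cameras):
    """One dict per frame, mapping entity name to its projection (None while unseen)."""
    return [
        {name: project(traj(t), cam) for name, traj in trajectories.items()}
        for t, cam in enumerate(cameras)
    ]


# A car drives along +x while the camera looks at it, turns away, then looks back.
trajectories = {"car": linear_trajectory((0.0, 0.0, 5.0), (1.0, 0.0, 0.0))}
origin = (0.0, 0.0, 0.0)
cameras = [
    Camera(origin, 0.0),
    Camera(origin, 0.0),
    Camera(origin, math.pi),  # facing away: the car is not observed
    Camera(origin, math.pi),
    Camera(origin, 0.0),      # facing the scene again
]
frames = control_signals(trajectories, cameras)
for t, frame in enumerate(frames):
    print(t, frame["car"])
```

Because the car's position depends only on time, it keeps advancing during the two frames in which the camera faces away, and it reappears at the location its trajectory dictates rather than where it was last seen. The same entity key (`"car"`) persists across the gap, which is the kind of identity bookkeeping a downstream generator would need in order to render the same object on re-entry.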
Community
This is an automated message from the Librarian Bot. I found the following papers similar to this paper.
The following papers were recommended by the Semantic Scholar API
- MBench: A Comprehensive Benchmark on Memory Capability for Video World Models (2026)
- DecMem: Towards Minute-Long Consistent World Generation with Decoupled Memory (2026)
- ViPSim: Collaborating Visual and Parameter Spaces for Consistent Long-Horizon Embodied World Models (2026)
- Directing the World: Fast Autoregressive Video Generation with Compositional Human-Camera Control (2026)
- WorldCraft: From Camera Navigation to Object Manipulation in Interactive Video World Models (2026)
- AnyScene: Towards Highly Controllable Driving Scene Generation at Anywhere and Beyond (2026)
- Compression and Retrieval: Implicit Memory Retrieval for Video World Models (2026)
Get this paper in your agent:
hf papers read 2607.02517
Don't have the latest CLI?
curl -LsSf https://hf.co/cli/install.sh | bash