Genie
Genie (Generative Interactive Environments) is a foundation world model trained on large collections of Internet videos. Given a single image prompt, whether a sketch, a photograph or a synthetic scene, it generates a playable environment that matches the input imagery. Genie is trained without annotated action labels: it infers latent controls and spatial dynamics directly from video data, and users then explore and control the generated world through those inferred controls.
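The idea of recovering controls from unlabeled video can be illustrated with a deliberately tiny sketch. This is not Genie's architecture, which the text above does not specify. It is a toy stand-in that assumes each "frame" is just a 2D sprite position and that latent controls can be found by clustering frame-to-frame changes into a small discrete codebook. The names `fit_latent_actions`, `step`, and `num_actions` are hypothetical and chosen only for illustration.

```python
import numpy as np

def fit_latent_actions(frames, num_actions, iters=20):
    """Cluster frame-to-frame changes into a small discrete codebook (toy k-means)."""
    deltas = np.diff(frames, axis=0)  # (T-1, D) unlabeled transitions
    # Deterministic init: pick transitions spread across the sorted first component.
    order = np.argsort(deltas[:, 0])
    picks = order[np.linspace(0, len(deltas) - 1, num_actions).astype(int)]
    codebook = deltas[picks].astype(float)
    for _ in range(iters):
        # Assign every transition to its nearest code.
        dists = np.linalg.norm(deltas[:, None, :] - codebook[None, :, :], axis=-1)
        codes = dists.argmin(axis=1)
        # Move each code to the mean of the transitions assigned to it.
        for k in range(num_actions):
            if np.any(codes == k):
                codebook[k] = deltas[codes == k].mean(axis=0)
    return codebook, codes

def step(frame, action, codebook):
    """Toy dynamics: apply the change associated with a latent action."""
    return frame + codebook[action]

# Synthetic "video": a sprite that moves right, left, or up (labels are never shown to the fit).
true_moves = np.array([[1, 0], [-1, 0], [0, 1]])
rng = np.random.default_rng(0)
hidden = np.repeat(np.arange(3), 10)
rng.shuffle(hidden)
frames = np.vstack([np.zeros((1, 2)), np.cumsum(true_moves[hidden], axis=0)])

codebook, codes = fit_latent_actions(frames, num_actions=3)

# "Play" the environment: start from the first frame and replay the inferred actions.
frame = frames[0]
for action in codes:
    frame = step(frame, action, codebook)
```

In this toy setting the codebook entries end up corresponding to the three hidden moves, in an arbitrary order, which is the sense in which the controls are "latent": a user has to discover what each code does by trying it. A real system would operate on learned video representations rather than raw positions.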
The project aims to bridge generative AI and interactive media: creators can sketch or render a scene and bring it to "life," while AI agents can be trained across a practically unlimited variety of generated environments. Although the demonstrations focus on 2D platforming domains, the method is presented as generalizable to more complex environments and modalities.