Sora
video
A multimodal text‑to‑video model that can generate, remix, and extend videos from textual prompts, images, or existing clips.
OpenAI’s Sora is a multimodal text‑to‑video model that can generate, remix, and extend videos from textual prompts, images, or existing clips. It supports video outputs up to 1080p and up to 20 seconds in duration, with tooling like a “storyboard” interface that gives users frame‑level control, and features to blend, layer, or animate user assets.
Sora builds on OpenAI’s earlier work in DALL·E and GPT, using a diffusion‑transformer architecture and techniques for maintaining consistency across frames. The platform also incorporates safety and moderation mechanisms—filtering disallowed prompts, planning detection tools for AI‑generated video, and metadata tracing—intended to mitigate misuse risks as the system is developed and iterated.