Getting Started With RunwayML Gen-2
When I first opened RunwayML Gen-2, I didn’t think it would be that different from Runway’s older video generation tools. But within the first five minutes, I found myself staring at a blank prompt box wondering what exactly I was supposed to type. Unlike other tools that guide you with dropdowns or presets, Gen-2 just gives you an empty “Generate” box and says go. There’s a learning curve, but once you get it, the creative potential is wild.
Gen-2 is not a traditional video editor where you trim and stack video clips. It’s a text-to-video AI tool, which means you give it a written idea—like “a hyper-realistic robot walking through a rainy city at night”—and it tries to generate motion visuals that match. It’s built for speculative, concept-art-type workflows more than polished final edits. That said, it does let you edit existing footage, or use an image or video as a starting point—something I’ll break down below because the modes behave very differently.
The interface looks like a minimal workspace: timeline at the bottom, asset tray on the left, and preview in the center. You can drag in existing media or work entirely from AI generation without any footage at all. Sound isn’t part of Gen-2’s feature set right now—what you get is silent video (MP4 export), so anything audio-related needs to be layered in after.
The first time I got Gen-2 to make a decent 4-second animation, I tried 12 different prompts and tweaked the grammar each time. One version added a comma—“a splashy lava ocean, neon sky”—and that totally shifted the scene layout. It started generating glowing rock instead of liquid. My takeaway? Prompts behave like code. Punctuation and word order really affect the output.
In short: Gen-2 offers AI-generated video clips made from text, images, or video clips, but unlike drag-and-drop editors, this one needs very precise input to work well.
Prompt-to-Video Mode And Prompt Syntax
This is where most people start—text-to-video generation using the Prompt-to-Video feature. I’ll be honest: most results are weird the first time. You type in plain English, but some words just completely confuse the model.
For reference, here are two prompts I tested:
| Prompt | What Happened |
|---|---|
| “A shark swimming through a desert made of glass” | Generated a blue-ish shark flickering through a landscape that looked like sand, but glass was ignored. Odd camera shakes too. |
| “A realistic golden retriever sitting at a cafe, in Paris, during sunset” | More successful—dog shape was accurate. Cafe had chairs fused in the background. Vibe was more early morning than sunset. |
There’s no ControlNet-style conditioning or fine-grained camera control. If you’re coming from tools like Midjourney, where you can add aspect-ratio tricks and seed values to get consistent outputs, Gen-2 doesn’t go that deep yet. But here’s what tends to help prompt crafting for Gen-2 (see the sketch after this list):
- Lead with the focal subject: Always describe the main element first. It prioritizes the first clause.
- Avoid metaphor or poetic words: Saying something like “a lonely wind through regretful trees” just tanks the generation.
- Use cinematic phrases: Words like “slow-motion,” “aerial view,” or “shallow focus” get better camera angles.
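To make that concrete, here's a tiny Python helper I use when iterating. It has nothing to do with Runway itself; it's just my own way of keeping the subject, setting, and camera phrase in the recommended order so I can vary one piece (or one comma) at a time.

```python
# Hypothetical helper (my own naming, not part of Runway) that assembles a prompt
# in the order that worked best for me: focal subject first, then setting, then a
# cinematic camera phrase. Keeping the pieces separate makes it easy to test
# punctuation and word-order variations one at a time.
def build_prompt(subject: str, setting: str, camera: str = "") -> str:
    parts = [subject, setting]
    if camera:
        parts.append(camera)
    # Commas matter: joining with ", " vs. " " can change the scene layout.
    return ", ".join(p.strip() for p in parts if p.strip())

print(build_prompt(
    subject="a realistic golden retriever sitting at a cafe",
    setting="Paris at sunset",
    camera="shallow focus",
))
# -> "a realistic golden retriever sitting at a cafe, Paris at sunset, shallow focus"
```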
This mode generates around 4 seconds of footage by default. You can extend it to about 8 seconds if you’re lucky, but longer videos often break continuity. One time I tried generating “a cat jumping onto a table in a morning-lit kitchen” and got four nice seconds… then two seconds of glitched static. It also tends to forget the subject halfway through, especially if motion is involved.
In short: Treat prompts as literal. Structure your request like you’re building a movie shot-list, not writing a screenplay.
Image-to-Video Mode Differences
This mode uses a single image as a reference, then animates it across time based on textual prompts or just motion inference. It’s more reliable than prompt-only generation if you’re struggling to get coherent results. I’ve used it to animate Midjourney renders, and surprisingly, as long as the input image is clean (no weird limbs or warped eyes), Gen-2 builds nice slow pans and subtle movements.
What’s wild is that even without a prompt, the AI decides on its own what to animate. I ran a shot of a robot in a forest with no prompt and it decided to tilt the camera upwards, as if revealing a sky. Another version spun the robot slightly. You’re not getting keyframe-like control. You’re trusting vibes.
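Because the cleanliness of the input image matters more than anything else here, I run a quick local sanity check before uploading. This is only a Pillow sketch with my own rough thresholds; Runway doesn't enforce any of this:

```python
# Quick local sanity check on an image before sending it to Image-to-Video.
# The thresholds are my own rough rules of thumb, not Runway requirements.
from PIL import Image

def check_source_image(path: str, min_side: int = 768) -> list[str]:
    warnings = []
    img = Image.open(path)
    w, h = img.size
    if min(w, h) < min_side:
        warnings.append(f"Small image ({w}x{h}); motion may smear when scaled up.")
    ratio = w / h
    if ratio > 2.5 or ratio < 0.4:
        warnings.append(f"Extreme aspect ratio ({ratio:.2f}); expect cropping.")
    if img.mode not in ("RGB", "RGBA"):
        warnings.append(f"Unusual color mode ({img.mode}); convert to RGB first.")
    return warnings

for warning in check_source_image("robot_in_forest.png"):
    print("WARNING:", warning)
```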
Here’s a table comparing the Prompt-to-Video and Image-to-Video modes:
| Feature | Prompt-to-Video | Image-to-Video |
|---|---|---|
| Input | Text only | Image (with optional prompt) |
| Output Behavior | Subject + scene invented | Stays close to image style |
| Best Use Case | Concept ideas from scratch | Stylized animation from static illustrations |
| Failure Mode | Glitched subjects or nonsense camera paths | Stiff motion or over-stretched morphing |
Keep in mind: the motion Gen-2 generates from an image is more about parallax-style illusion, not full-frame animation. If your image has a lot of overlapping shapes or intricate shadows, the animation might smear those in weird ways.
In short: This is your mode if you already have a strong visual and just need subtle camera movement or life added.
Video-to-Video Stylization And Limitations
This one’s both impressive and frustrating. Video-to-video lets you upload your own motion clip (a few seconds long) and apply a style and transformation over it—kind of like Stable Diffusion but with time. You can’t apply filters like “make it Pixar” and expect frame-perfect results. But you can achieve painting-like sequences, or make your footage look like a sci-fi hallucination.
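Since it only wants a few seconds of motion, I trim longer footage locally before uploading. Here's a minimal sketch that shells out to ffmpeg from Python; it assumes ffmpeg is installed, and the paths and the 4-second window are just example values:

```python
# Trim a longer clip down to a few seconds before uploading it for Video-to-Video.
# Assumes ffmpeg is on your PATH; the file names and the 4-second window are examples.
import subprocess

def trim_clip(src: str, dst: str, start: float = 0.0, duration: float = 4.0) -> None:
    subprocess.run(
        [
            "ffmpeg", "-y",
            "-ss", str(start),    # seek to the start point
            "-i", src,
            "-t", str(duration),  # keep only this many seconds
            "-c:v", "libx264",    # re-encode so the cut lands cleanly
            "-an",                # Gen-2 outputs silent video anyway, so drop audio
            dst,
        ],
        check=True,
    )

trim_clip("drone_over_lake.mp4", "drone_over_lake_4s.mp4", start=12.0)
```

Keeping the trimmed copy separate also means the untouched original is always on disk to re-upload, which matters given the lack of undo I mention below.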
My test used drone footage flying over a lake. I applied a vague prompt—“van Gogh style with vibrant color swirls”—and got two versions. The first had painterly textures but completely broke the motion at frame three. The other retained the horizon and lake shape but swapped the trees for curly golden brushstrokes.
It behaves more like a dream interpreter than a style transfer tool. And it doesn’t keep faces stable. I tried stylizing a clip of my friend walking through a street market and every three frames his jaw reshaped like soft clay. No way to lock identity yet.
Also, there’s no prompt history or undo. So if you stylize a clip and are unhappy with it, you won’t get the original back unless you re-upload it.
In short: This mode works best for ambient b-rolls, not for anything needing character consistency or facial clarity.
Editing Workflow Inside Gen-2
This confused me a bit the first time. RunwayML positions itself as an editor, but it’s not like Premiere Pro or Kapwing. You can’t slice footage across multiple tracks. You can’t layer audio. It’s more like a storyboard composer that lets you arrange short clips in sequence and add soft transitions.
Each generated segment shows up as a block in the timeline. You can trim it, duplicate it, reorder it. But there’s no real “cutting” beyond setting start and stop points. Each block is locked as-is—you can’t modify the clip’s internal style or prompt once it’s saved. If you want to change a dog’s expression mid-animation? Nope. Re-render the whole thing.
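The mental model that finally clicked for me is a list of locked blocks you can trim, duplicate, and reorder, but never open up. Here's a toy sketch of that idea; the names are mine, not Runway's internals:

```python
# Toy model of how I think about the Gen-2 timeline: a list of locked blocks
# you can trim, duplicate, and reorder, but never edit internally.
# The names are my own; this is a mental model, not Runway's actual data structure.
from dataclasses import dataclass, replace
from typing import Optional

@dataclass(frozen=True)  # frozen: a rendered clip's content can't be changed in place
class Block:
    prompt: str
    length_s: float
    trim_start: float = 0.0
    trim_end: Optional[float] = None

timeline = [
    Block("volcano above alien jungle at dusk", 4.0),
    Block("slow aerial pan over the same jungle", 4.0),
]

# Allowed operations: trim, duplicate, reorder.
timeline[0] = replace(timeline[0], trim_end=3.0)  # trim the first block to 3 seconds
timeline.append(timeline[1])                      # duplicate a block
timeline.reverse()                                # reorder the sequence

# Not allowed: editing what's inside a block. Changing the dog's expression
# means generating a brand-new block and slotting it in.
```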
One thing I appreciated was how fast export works—once you’re done arranging 3–4 blocks, you click Export Project and it crunches it in-browser within minutes. But export defaults to 720p unless you upscale (which uses credits).
In short: This isn’t a nonlinear editor. Think of it as a visual script where each paragraph is a 4-second movie clip.
My Most Successful Use Cases So Far
This tool shines most when you treat it like a concept generator for visuals. Some of the best use cases I tried:
- Fantasy establishing shots: Like a “volcano above alien jungle at dusk.” Great for storyboarding.
- Motion concept art for pitches: Short clips with movement for explaining mood in client decks.
- Image-to-video for Instagram teasers: Took a sci-fi Midjourney render, animated it, and cropped for Reels. Worked better than stills.
What didn’t work:
- Trying to build multi-step actions (e.g., “a person walks to a door and opens it”)
- Maintaining consistent identities across several clips
- Creating video memes or speech-driven content
In short: Think ambiance, not narrative. Think dream-scene, not how-to explainer.
Export Tips And Resolution Realities
You export via the “Download” tab after adding your clips to the timeline. By default, videos export at 720p. If you want full HD (1080p) or higher, you can upscale inside Runway itself—but this uses credits, and results vary.
The upscaler worked great for still-image-based videos. But when I applied it to my “market street” hallucination clip, one face came out sparkly but the rest of the frame decayed into noise. Upscaling doesn’t fix motion blur glitches—it just cleans the existing pixels.
If you plan to color grade or edit afterward in a tool like DaVinci, stick with the default export and don’t upscale inside Runway. It’s better to apply LUTs afterward on stable footage instead of relying on Gen-2’s unpredictable sharpening.
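If you do grade afterward, applying a .cube LUT to the default 720p export is a single ffmpeg call. This sketch assumes ffmpeg (with its lut3d filter) is installed, and the file names are placeholders:

```python
# Apply a .cube LUT to the default 720p export instead of relying on in-app
# upscaling. Assumes ffmpeg is installed; the file names are placeholders.
import subprocess

def apply_lut(src: str, lut_path: str, dst: str) -> None:
    subprocess.run(
        ["ffmpeg", "-y", "-i", src, "-vf", f"lut3d={lut_path}", "-c:v", "libx264", dst],
        check=True,
    )

apply_lut("runway_export_720p.mp4", "teal_orange.cube", "graded.mp4")
```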
In short: Export only when you’re happy, and don’t treat upscaling as a fix-all.
Final Thoughts On Where It Fits
RunwayML Gen-2 isn’t for everyone. If you’re looking for AI video to push social content or explainers quickly, you’re better off recording with real footage and auto-captioning it. But if you deal with concept creation, world-building, or surreal moodboards—it’s like having a dream synthesizer in your browser.
You have to accept it’ll get some things wrong. A sky might melt halfway, or a horse might have two legs replaced by flags. But when it gets it right—like that one render of “abandoned carnival spinning slowly underwater,” backlit perfectly through rippling waves—it feels like magic.