Calliope Best Practices
Last updated: June 5, 2026.
This guide is for users and AI assistants preparing videos, scripts, narrations, templates, Motion Graphics, or API calls for Calliope.
Public version: https://calliopelabs.co/best-practices
Step 1: format and template
In Step 1, choose the base format for the video:
- 1.A Short: vertical video from 20 to 80 seconds.
- 1.B Long-Form Video: horizontal video from 2 to 15 minutes.
- 1.C Auto-Broll: a video, usually talking-head, where Calliope adds AI B-Roll.
- Template: the visual format that inherits visual style, references, captions, music, SFX, Motion Graphics, and models saved in the project/template.
Step 2: script, narration, and generation
In Step 2, choose exactly one input mode:
- AI Writer: the user has an idea and wants Calliope to write the narration.
- Manual Script: the user already has the exact text that must be narrated.
- Uploaded Narration: the user already has an MP3/WAV audio file and wants to use it as-is.
Do not mix these modes.
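The one-mode rule can be checked before anything is submitted. The following is a minimal Python sketch, assuming a hypothetical helper that receives the three possible inputs. The function and parameter names are illustrative and are not part of Calliope.

```python
from typing import Optional


def pick_input_mode(
    idea: Optional[str] = None,
    script: Optional[str] = None,
    narration_path: Optional[str] = None,
) -> str:
    """Return the single Step 2 input mode, or raise if modes are mixed.

    All parameter names are hypothetical; they only mirror the three modes.
    """
    provided = {
        "AI Writer": bool(idea and idea.strip()),
        "Manual Script": bool(script and script.strip()),
        "Uploaded Narration": bool(narration_path),
    }
    chosen = [mode for mode, present in provided.items() if present]
    if len(chosen) != 1:
        # Zero inputs or more than one input both break the rule.
        raise ValueError(f"Choose exactly one input mode, got: {chosen or 'none'}")
    return chosen[0]
```

Calling pick_input_mode(idea="Roman roads") returns "AI Writer", while passing both a script and a narration file raises an error instead of silently preferring one of them.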
AI Writer
Use AI Writer when the user has an idea, but not a final narration script.
The textarea can include the topic and general instructions for what the scenes should contain: places, objects, actions, characters, eras, or shot types. These instructions help Calliope write a more aligned script and then generate more coherent visual scenes.
Do not promise that an exact storyboard will be executed 1:1. Calliope decides the final number of scenes based on duration, template, cost/models, and the selected visual frequency. For that reason, we recommend a script or story outline rather than a visual storyboard.
Example (the more specific and detailed, the better):
Create a 5-minute video about why Roman roads are the foundation of modern roads.
The first sentence must be exactly: "What would have happened if Roman roads had never existed?"
Structure: Hook + 3 fast points, no long intro: Marcus Aurelius, Roman engineering, drainage + conclusion + CTA.
Tone: curiosity-driven, fast, "what if?" style.
Ending: CTA: "Buy my E-Book! Link in BIO."
Manual Script
Use Manual Script when the narration text is already final.
Write only the words that should be spoken. Do not include:
- Scene numbers.
- Durations.
- Camera instructions.
- Storyboard notes.
- Instructions like "Create a 360-second video".
- Lines like "Scene 1: show a man walking", unless you want those words to be narrated.
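A quick pre-check can catch most of the items in the list above before a script is pasted. The following is a rough Python sketch. The patterns are heuristic assumptions chosen for illustration, not rules that Calliope enforces, and they will miss some cases and flag some legitimate narration.

```python
import re
from typing import List, Tuple

# Heuristic patterns for text that is direction rather than narration.
# They are illustrative assumptions, not rules enforced by Calliope.
NON_SPOKEN_PATTERNS = {
    "scene label": re.compile(r"^\s*scene\s+\d+\s*[:.\-]", re.IGNORECASE),
    "timestamp or duration": re.compile(r"^\s*[\[(]?\d{1,2}:\d{2}"),
    "generation instruction": re.compile(
        r"^\s*(create|generate|make)\s+an?\s+[\w-]+\s+(video|short)\b",
        re.IGNORECASE,
    ),
    "camera or storyboard note": re.compile(
        r"^\s*[\[(]?\s*(camera|shot|storyboard|b-roll)\b", re.IGNORECASE
    ),
}


def find_non_spoken_lines(script: str) -> List[Tuple[int, str]]:
    """Return (line_number, reason) for lines that look like instructions."""
    findings = []
    for number, line in enumerate(script.splitlines(), start=1):
        for reason, pattern in NON_SPOKEN_PATTERNS.items():
            if pattern.search(line):
                findings.append((number, reason))
                break  # one reason per line is enough
    return findings
```

An empty result does not prove the script is clean. It only means none of these simple patterns matched, so a final read-through is still worthwhile.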
Good Manual Script:
What would have happened if Roman roads had never existed? Roman roads are the foundation of modern roads. First, Emperor Marcus Aurelius built a road network to connect his empire... Second, Roman engineering made it possible to build durable roads with several layers of materials... Third, the Roman drainage system prevented flooding and kept roads usable. Without Roman roads, transport and trade would have been much harder, and the modern world would be very different. Buy my E-Book! Link in BIO.
To control visuals precisely, generate a first version and then edit the prompts for specific scenes in Step 3.
Uploaded Narration
Use Uploaded Narration when the audio already exists.
Rules:
- Upload MP3 or WAV.
- Shorts: approximately 20 to 80 seconds.
- Videos: approximately 2 to 15 minutes.
- Leave the script textarea empty.
- Let Calliope transcribe the audio and align visuals to the real audio.
- Use the template, visual references, visual frequency, and later prompt editing to direct the visuals.
Do not tell the user to paste a long storyboard into the main textarea while also uploading audio. That text is not a separate visual-direction field: the real narration source will be the uploaded audio.
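The duration rules above can be verified locally before uploading. The following is a small Python sketch using only the standard library. The helper names and the tolerance parameter are assumptions (the guide gives the ranges as approximate). Only WAV files are measured here, because reading MP3 duration would need a third-party decoder that is not shown.

```python
import wave

# Approximate ranges from this guide, in seconds.
DURATION_RANGES = {
    "short": (20, 80),           # Shorts: about 20 to 80 seconds
    "video": (2 * 60, 15 * 60),  # Videos: about 2 to 15 minutes
}


def wav_duration_seconds(path: str) -> float:
    """Return the length of a WAV file in seconds."""
    with wave.open(path, "rb") as wav_file:
        return wav_file.getnframes() / wav_file.getframerate()


def narration_fits(format_name: str, duration_seconds: float,
                   tolerance: float = 0.0) -> bool:
    """Check a duration against the guide's range for the chosen format.

    tolerance is a hypothetical allowance in seconds, since the ranges
    are described as approximate.
    """
    low, high = DURATION_RANGES[format_name]
    return low - tolerance <= duration_seconds <= high + tolerance
```

For example, narration_fits("short", wav_duration_seconds("voice.wav")) tells you whether a recording is in range for a Short before you spend an upload on it.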
Available Settings In Step 2
From the UI, the user can choose language, voice, voice quality, preview voices, upload narration, generate a script, generate voiceover, regenerate voiceover, and select background music once the audio/script has been generated.
From the cost/quality panel, the user can change image quality, enable/disable animation, choose video quality, choose how often clips are animated, and move the image visual-frequency slider. Visual frequency is configured with that control, not by writing "generate 22 scenes" in the textarea.
If Auto-Complete is enabled, Calliope will generate assets immediately after audio, without a script/audio review in Step 2. If it is disabled, the user can review before continuing.
We recommend not enabling animation (I2V) unless the template is already optimized for it. It is usually better to click "Animate" in Step 3 after checking that the scenes are good. This saves cost if regenerations are needed.
Step 3: assets, timeline, and render
In Step 3, the user reviews the generated content before the final render.
From the UI, the user can:
- Preview the video and select clips in the timeline.
- Edit the prompt of a completed or failed clip and regenerate only that clip.
- Change the image/video model for a regeneration when the UI allows it.
- Animate available image clips.
- Generate a new clip in empty V2 spaces with a custom prompt.
- Upload personal images or videos to the timeline.
- Adjust timing, split clips, and move/trim personal assets.
- Edit captions, effects, Motion Graphics, music, SFX, narration volume, and music volume from the sidebars.
- Move or delete Motion Graphics and SFX badges in the timeline.
- Replace uploaded narration when the UI allows it.
- Save changes, resume blocked jobs, or confirm render.
Best visual-control hierarchy:
- Use the right template.
- Configure visual style, references, and visual frequency before generation.
- Keep the script concrete. Specific nouns and actions create better scenes.
- Review the first asset plan.
- Edit the prompts of weak scenes directly.
- Regenerate only the bad clips instead of restarting the whole job.
Avoid abstract briefs like "make it attractive" or "add cool visuals". Use concrete objects, places, actions, and constraints.
Template Workflow
A template is a Calliope project saved as a reusable production format.
To create a good template:
- Start from the most similar existing template, not from scratch.
- Configure visual style and references.
- Configure voice, language, captions, music, SFX, Motion Graphics, and model quality.
- Save the project.
- Reuse that project as a template for future jobs.
- For API usage, download and read the API Skill.
Visual Style:
Important: if you want text to appear in the scenes, you must request it explicitly, e.g.: "OBEY this instruction above the rest: I explicitly want text, phrases, symbols or titles inside the images/scenes (As in Refs 1 and 2), diagrams with details or explanations are also allowed (As in Ref 5)". We strongly recommend adding well-labeled Visual Refs that show examples of this.
Motion Graphics
Use Motion Graphics for reusable overlays: counters, maps, stats panels, timelines, callouts, labels, and branded visual systems.
Rules:
- A Motion Graphic must be visually specific, short, and reusable.
- Editable text must be a text variable.
- Finite visual options must be dropdown/choice variables.
- Do not create conceptual variables that do not visibly change the result.
- The selector label must explain when the graphic should appear in a script, not only how it looks.
- Keep Motion Graphics focused. One strong reusable component is better than a generic overlay that tries to do everything.
Example:
Create a reusable Motion Graphic for a history video: a 5-second ancient map callout with a large region, a highlighted route, and a small year label. Make the region and year editable. Make the route style a dropdown with "land", "sea", and "mixed".
Captions, music, and SFX
Captions:
- Use the template/editor to control the exact style.
- In the API, captions is only an on/off switch.
- Omit captions to inherit the template style.
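The on/off-or-omit behavior can be sketched as follows in Python. Only the captions field name comes from this guide. The helper name and the idea of a dictionary payload are assumptions for illustration, so check the API docs for the real request shape.

```python
from typing import Any, Dict, Optional


def with_captions(payload: Dict[str, Any],
                  captions: Optional[bool] = None) -> Dict[str, Any]:
    """Return a copy of payload with the captions switch set only on request.

    None means "leave the key out" so the template's caption style is
    inherited. The payload structure itself is hypothetical.
    """
    result = dict(payload)
    result.pop("captions", None)  # start from "inherit the template"
    if captions is not None:
        result["captions"] = bool(captions)  # plain on/off, no styling
    return result
```

Keeping None distinct from False matters here: False switches captions off, while None leaves the decision to the template.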
Music and SFX:
- Configure them in the template for repeatable formats.
- Use SFX for emphasis, transitions, and Motion Graphics.
- In long-form, use SFX moderately unless the format requires frequent effects.
API and automation
API docs: https://calliopelabs.co/api/v1/docs
Checklist for AI assistants
Before writing for Calliope, ask:
- Is it a Short, a Long-Form Video, or Auto-Broll?
- Which template/project should be used?
- Is the source AI Writer, Manual Script, or Uploaded Narration?
- What is the target duration?
- What visual style or references should the template use?
- Are captions, music, SFX, or Motion Graphics needed?
- What should be avoided?
Then generate the right output for the selected mode:
- AI Writer: write a short, concrete brief for Calliope.
- Manual Script: write only spoken narration.
- Uploaded Narration: do not write text for the main textarea; help the user improve the template, style, and post-generation scene prompts.
Common mistakes
- Uploading narration and also pasting a storyboard into the script textarea.
- Putting scene instructions in Manual Script as if they were invisible instructions.
- Using vague visual requests instead of concrete subjects and actions.
- Creating a new template before testing a similar existing one.
- Expecting the API captions field to control detailed caption styling.
- Asking for exact scene timing before knowing the narration/audio.
- Regenerating the whole job when only one clip prompt needs editing.