API Notes
To use an asset ID in asset:// format in image_url, first create an asset group via spark_asset_create_group, then upload an image asset via spark_asset_create to obtain the ID.
Authentication
authorization string required
All APIs require authentication via Bearer Token.
Get API Key:
Visit the API Key Management page to get your API Key.
Usage:
Add to request header:
Authorization: Bearer YOUR_API_KEY
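A minimal Python sketch of building the authentication header. Reading the key from an environment variable named `SPARK_API_KEY` and sending JSON request bodies are assumptions for illustration, not documented requirements.

```python
from __future__ import annotations

import os


def auth_headers(api_key: str) -> dict[str, str]:
    """Return request headers that carry the API key as a Bearer token."""
    if not api_key:
        raise ValueError("API key must be a non-empty string")
    return {
        "Authorization": f"Bearer {api_key}",
        # Assumption: request bodies are sent as JSON.
        "Content-Type": "application/json",
    }


# Hypothetical environment variable name; falls back to the placeholder key.
headers = auth_headers(os.environ.get("SPARK_API_KEY", "YOUR_API_KEY"))
```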
Parameters
model string required
Model ID to use for the request
Value: spark_dance_v2_0
content object[] required
Input content for video generation. Supported media include text, image, video, and audio. Supported combinations:
Text only
Text (optional) + image
Text (optional) + video
Text (optional) + image + audio
Text (optional) + image + video
Text (optional) + video + audio
Text (optional) + image + video + audio
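The supported combinations above can be checked before a request is sent. This sketch assumes each content item is a dict whose `type` is one of the documented values (`text`, `image_url`, `video_url`, `audio_url`); it only checks which media are present, not item counts or roles. The function name is hypothetical.

```python
from __future__ import annotations

# Maps each content item "type" to the media kind used in the list above.
TYPE_TO_MEDIA = {
    "text": "text",
    "image_url": "image",
    "video_url": "video",
    "audio_url": "audio",
}

# Non-text media sets from the supported combinations (text is optional in
# every combination except "Text only").
SUPPORTED_MEDIA = {
    frozenset({"image"}),
    frozenset({"video"}),
    frozenset({"image", "audio"}),
    frozenset({"image", "video"}),
    frozenset({"video", "audio"}),
    frozenset({"image", "video", "audio"}),
}


def is_supported_combination(content: list[dict]) -> bool:
    """Check whether the media types in `content` form a supported combination."""
    media = set()
    for item in content:
        kind = TYPE_TO_MEDIA.get(item.get("type"))
        if kind is None:
            raise ValueError(f"unknown content type: {item.get('type')!r}")
        media.add(kind)
    non_text = frozenset(media - {"text"})
    if not non_text:
        # "Text only" is valid only if a text item is actually present.
        return "text" in media
    return non_text in SUPPORTED_MEDIA
```

Note that audio never appears without an image or video, which matches the audio constraint documented further down.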
Text Information
object
Text information input to the model
type string required
Type of input content
Value: text
text string required
Text prompt input to the model, describing the expected generated video. Keeping the prompt under 1000 words is recommended: overly long prompts scatter information, and the model may ignore details and focus only on key points, leaving elements missing from the generated video
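A small sketch of a text content item builder that warns when the prompt exceeds the recommended length. The helper name is hypothetical, and counting words by splitting on whitespace is a rough proxy chosen here (it undercounts languages written without spaces).

```python
from __future__ import annotations

import warnings

# Recommended (not enforced) prompt length from the description above.
RECOMMENDED_MAX_WORDS = 1000


def text_item(prompt: str) -> dict:
    """Build a text content item, warning when the prompt looks too long."""
    if not prompt.strip():
        raise ValueError("prompt must not be empty")
    # Whitespace-split word count: a rough proxy, not the service's own measure.
    word_count = len(prompt.split())
    if word_count > RECOMMENDED_MAX_WORDS:
        warnings.warn(
            f"prompt has {word_count} words; the model may drop details beyond "
            f"about {RECOMMENDED_MAX_WORDS} words"
        )
    return {"type": "text", "text": prompt}
```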
Image Information
object
Image information input to the model
type string required
Type of input content
Value: image_url
image_url object required
Image object input to the model
url string required
Image URL, Base64-encoded image string, or asset ID:
Image URL: The public URL of the image
Base64 string: The local file converted to a Base64-encoded string, in the format data:image/<image format>;base64,<Base64 string>. Note that <image format> must be lowercase, for example data:image/png;base64,{base64_image}
Asset ID: The ID of a preconfigured asset or digital character, in the format asset://<ASSET_ID>
Requirements for a single uploaded image:
Format: jpeg, png, webp, bmp, tiff, gif
Aspect ratio (width/height): (0.4, 2.5)
Width and height (px): (300, 6000)
Size: A single image must be less than 30 MB, and the request body must not exceed 64 MB. Do not use Base64 encoding for large files
Number of images: 1 for first frame; 2 for first and last frame; 1-9 for reference images
role string conditional required
Position or purpose of the image
⚠️ Image to video (first frame), image to video (first and last frame), and multimodal reference video generation are three mutually exclusive scenarios and cannot be combined. In multimodal reference video generation, you can use the prompt to designate a reference image as the first or last frame, which indirectly achieves “first and last frame + multimodal reference”. If the first and last frames must strictly match the specified images, always use the image-to-video first/last frame feature (set role to first_frame/last_frame)
Image to video (first frame): Pass 1 image_url object. Set role to first_frame or leave it blank
Image to video (first and last frame): Pass 2 image_url objects. First frame: role=first_frame (required). Last frame: role=last_frame (required)
Image to video (reference image): Pass 1-9 image_url objects. role=reference_image (required)
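A sketch of client-side checks for the image rules above: building a Base64 data URI, checking dimensions, and classifying image roles into the three mutually exclusive scenarios. The helper names are hypothetical. The sketch assumes `None` stands for a role left blank, reads "30 MB" as 30 × 1024 × 1024 bytes, and treats the documented ranges as open intervals.

```python
from __future__ import annotations

import base64

ALLOWED_FORMATS = {"jpeg", "png", "webp", "bmp", "tiff", "gif"}
# Assumption: "30 MB" means 30 * 1024 * 1024 bytes.
MAX_IMAGE_BYTES = 30 * 1024 * 1024


def image_data_uri(data: bytes, fmt: str) -> str:
    """Encode raw image bytes as data:image/<format>;base64,<data>."""
    fmt = fmt.lower()  # the format name must be lowercase
    if fmt not in ALLOWED_FORMATS:
        raise ValueError(f"unsupported image format: {fmt!r}")
    if len(data) >= MAX_IMAGE_BYTES:
        raise ValueError("a single image must be smaller than 30 MB")
    return f"data:image/{fmt};base64,{base64.b64encode(data).decode('ascii')}"


def check_dimensions(width: int, height: int) -> None:
    """Reject images outside the documented size and aspect-ratio ranges.

    The ranges are written as open intervals, so the bounds themselves are
    treated as invalid here.
    """
    if not (300 < width < 6000 and 300 < height < 6000):
        raise ValueError("width and height must be in (300, 6000) px")
    if not 0.4 < width / height < 2.5:
        raise ValueError("aspect ratio (width/height) must be in (0.4, 2.5)")


def image_scenario(roles: list[str | None]) -> str:
    """Classify the roles of all image_url items into one of the three scenarios."""
    if roles in ([None], ["first_frame"]):
        return "first_frame"
    if len(roles) == 2 and set(roles) == {"first_frame", "last_frame"}:
        return "first_last_frame"
    if 1 <= len(roles) <= 9 and all(r == "reference_image" for r in roles):
        return "reference_image"
    # Mixing roles means mixing scenarios, which is not allowed.
    raise ValueError(f"invalid or mixed image roles: {roles}")
```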
Video Information
object
Video input to the model
type string required
Type of input content
Value: video_url
video_url object required
Video object input to the model
url string required
Video URL or asset ID
role string conditional required
Position or purpose of the video
Options: reference_video
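A minimal sketch of a video content item. The helper name is hypothetical, and placing `role` next to `type` at the item level (rather than inside the `video_url` object) is an assumption, since the flattened field list does not show nesting.

```python
from __future__ import annotations


def video_item(url: str, role: str = "reference_video") -> dict:
    """Build a video_url content item for multimodal reference generation.

    Assumption: `role` sits next to `type` at the item level.
    """
    if not url.startswith(("http://", "https://", "asset://")):
        raise ValueError("url must be a public URL or an asset:// ID")
    if role != "reference_video":
        raise ValueError("reference_video is the only documented video role")
    return {"type": "video_url", "video_url": {"url": url}, "role": role}
```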
Audio Information
object
Audio input to the model. At least 1 reference video or image must be included when using audio input
type string required
Type of input content
Value: audio_url
audio_url object required
Audio object input to the model
role string conditional required
Position or purpose of the audio
Options: reference_audio
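A sketch of an audio item builder plus the rule that audio needs at least one image or video. Both helper names are hypothetical. The fields inside `audio_url` are not listed above, so the `url` key is assumed by analogy with `image_url` and `video_url`, and `role` is assumed to sit at the item level.

```python
from __future__ import annotations


def audio_item(url: str) -> dict:
    """Build an audio_url content item.

    Assumptions: audio_url carries a `url` field and `role` is item-level.
    """
    return {"type": "audio_url", "audio_url": {"url": url}, "role": "reference_audio"}


def check_audio_has_visual_reference(content: list[dict]) -> None:
    """Raise if audio is present without at least one image or video item."""
    types = {item.get("type") for item in content}
    if "audio_url" in types and not types & {"image_url", "video_url"}:
        raise ValueError("audio input requires at least one reference image or video")
```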
generate_audio boolean
Controls whether the generated video includes sound synchronized with the footage
true: The output video includes synchronized audio. The model automatically generates matching voices, sound effects, and background music based on the text prompt and visual content. Putting dialogue in double quotes is recommended for better audio generation. For example: The man stopped the woman and said: “Remember, you can’t point at the moon with your finger in the future.”
Note: All generated videos with audio are mono, regardless of the number of channels in the input audio
false: The output video is silent
Default: true
resolution string
Video resolution
Default: 720p
Options: 480p, 720p, 1080p
ratio string
Aspect ratio of the generated video
When ratio is set to adaptive, the model will automatically adapt the aspect ratio according to the generation scenario:
Text to video: Intelligently select the most appropriate aspect ratio based on the input prompt
First frame / first and last frame to video: Automatically select the closest aspect ratio based on the ratio of the uploaded first frame image
Multimodal reference video generation: Determined by the intent of the user’s prompt. For first-frame video generation, video editing, or video extension, the closest aspect ratio is chosen based on the corresponding image or video; otherwise, it is chosen based on the first uploaded media file (priority: video > image)
Default: adaptive
Options: 16:9, 4:3, 1:1, 3:4, 9:16, 21:9, adaptive
Width and height pixel values corresponding to different aspect ratios:
| Resolution | Aspect Ratio | Dimension (Width × Height) |
|---|---|---|
| 480p | 16:9 | 864×496 |
| 480p | 4:3 | 752×560 |
| 480p | 1:1 | 640×640 |
| 480p | 3:4 | 560×752 |
| 480p | 9:16 | 496×864 |
| 480p | 21:9 | 992×432 |
| 720p | 16:9 | 1280×720 |
| 720p | 4:3 | 1112×834 |
| 720p | 1:1 | 960×960 |
| 720p | 3:4 | 834×1112 |
| 720p | 9:16 | 720×1280 |
| 720p | 21:9 | 1470×630 |
| 1080p | 16:9 | 1920×1080 |
| 1080p | 4:3 | 1664×1248 |
| 1080p | 1:1 | 1440×1440 |
| 1080p | 3:4 | 1248×1664 |
| 1080p | 9:16 | 1080×1920 |
| 1080p | 21:9 | 2206×946 |
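With a fixed ratio, the expected frame size can be looked up from the table. A small sketch with the table copied into a dict; the function name is illustrative only. With `adaptive`, the model picks the ratio, so no size is known in advance.

```python
from __future__ import annotations

# Output size (width, height) per resolution and aspect ratio, from the table above.
DIMENSIONS = {
    "480p": {"16:9": (864, 496), "4:3": (752, 560), "1:1": (640, 640),
             "3:4": (560, 752), "9:16": (496, 864), "21:9": (992, 432)},
    "720p": {"16:9": (1280, 720), "4:3": (1112, 834), "1:1": (960, 960),
             "3:4": (834, 1112), "9:16": (720, 1280), "21:9": (1470, 630)},
    "1080p": {"16:9": (1920, 1080), "4:3": (1664, 1248), "1:1": (1440, 1440),
              "3:4": (1248, 1664), "9:16": (1080, 1920), "21:9": (2206, 946)},
}


def output_size(resolution: str, ratio: str) -> tuple[int, int]:
    """Look up the output (width, height) for a fixed aspect ratio."""
    if ratio == "adaptive":
        raise ValueError("with ratio=adaptive the size is not known up front")
    try:
        return DIMENSIONS[resolution][ratio]
    except KeyError:
        raise ValueError(f"unsupported combination: {resolution} / {ratio}") from None
```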
duration integer
Generated video duration in seconds. Only integers are supported
Two configuration methods:
Specify specific duration: Any integer within the valid range
Intelligent specification: Set to -1 to let the model choose an appropriate length (in whole seconds) within the valid range. The actual duration of the generated video is returned in the duration parameter of the task query API
Note: Video duration affects billing, so set it carefully
Default: 5
Range: 4 - 15, or -1 (auto)
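Putting the parameters together, a request body can be assembled and validated as below. This is a sketch: the function name is hypothetical, the 4-15 range is assumed to be inclusive, and the endpoint that receives the body is not given in this section.

```python
from __future__ import annotations

MODEL_ID = "spark_dance_v2_0"
RESOLUTIONS = {"480p", "720p", "1080p"}
RATIOS = {"16:9", "4:3", "1:1", "3:4", "9:16", "21:9", "adaptive"}


def build_request(
    content: list[dict],
    *,
    generate_audio: bool = True,
    resolution: str = "720p",
    ratio: str = "adaptive",
    duration: int = 5,
) -> dict:
    """Assemble a request body using the documented defaults and ranges."""
    if not content:
        raise ValueError("content must contain at least one item")
    if resolution not in RESOLUTIONS:
        raise ValueError(f"unsupported resolution: {resolution!r}")
    if ratio not in RATIOS:
        raise ValueError(f"unsupported ratio: {ratio!r}")
    # Only integers are accepted (bool is excluded because it subclasses int).
    if isinstance(duration, bool) or not isinstance(duration, int):
        raise TypeError("duration must be an integer")
    # Assumption: 4-15 is inclusive; -1 lets the model choose the length.
    if duration != -1 and not 4 <= duration <= 15:
        raise ValueError("duration must be in 4-15 or -1 (auto)")
    return {
        "model": MODEL_ID,
        "content": content,
        "generate_audio": generate_audio,
        "resolution": resolution,
        "ratio": ratio,
        "duration": duration,
    }
```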
Polling
Since result generation takes time, you need to poll the task status after creating the task.
The initial response only returns information such as the task ID and initial status. The final result must be obtained by polling the task status endpoint using the task ID.
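The polling loop can be sketched as below. The status-query endpoint is not given in this section, so the HTTP call is abstracted as a caller-supplied `fetch_status` function (hypothetical). The 5-second interval and 10-minute timeout are arbitrary choices, and a top-level `status` field is assumed from the response description.

```python
from __future__ import annotations

import time
from typing import Callable

# Only "completed" and "failed" are documented; any other value is treated
# as still running.
TERMINAL_STATUSES = {"completed", "failed"}


def poll_task(
    fetch_status: Callable[[str], dict],
    task_id: str,
    *,
    interval: float = 5.0,
    timeout: float = 600.0,
    sleep: Callable[[float], None] = time.sleep,
) -> dict:
    """Call fetch_status(task_id) until the task reaches a terminal status."""
    deadline = time.monotonic() + timeout
    while True:
        task = fetch_status(task_id)
        if task.get("status") in TERMINAL_STATUSES:
            return task
        if time.monotonic() >= deadline:
            raise TimeoutError(f"task {task_id} did not finish within {timeout} s")
        sleep(interval)
```

Injecting `sleep` keeps the loop testable without real waiting.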
Response Format
error object
Error information, only present when status is failed
code string
Error code
message string
Detailed error message
output array
Generation results, only present when status is completed
content array
List of generated resource content
type string
Resource type
Value: image | video
url string
Processed resource URL
jobId string
Remote task ID
usage object
Usage statistics, only present when status is completed
cost string
Total cost in USD
discount number
Discount amount
metadata object
Metadata information
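A sketch of reading a finished task. It assumes `status` is a top-level field and that each entry of `output` holds a `content` list of `{type, url}` objects; this nesting is inferred from the field list above and may differ in practice. `cost` is parsed with `Decimal` because it is returned as a string. `TaskFailed` and `extract_result` are hypothetical names.

```python
from __future__ import annotations

from decimal import Decimal


class TaskFailed(RuntimeError):
    """Raised when the task status is "failed"; carries the error code and message."""

    def __init__(self, code: str, message: str) -> None:
        super().__init__(f"{code}: {message}")
        self.code = code
        self.message = message


def extract_result(task: dict) -> tuple[list[str], Decimal | None]:
    """Return (video URLs, total cost in USD) for a completed task."""
    status = task.get("status")
    if status == "failed":
        err = task.get("error") or {}
        raise TaskFailed(err.get("code", "unknown"), err.get("message", ""))
    if status != "completed":
        raise ValueError(f"task not finished (status={status!r})")
    urls = [
        item["url"]
        for entry in task.get("output", [])
        for item in entry.get("content", [])
        if item.get("type") == "video"
    ]
    usage = task.get("usage") or {}
    cost = Decimal(usage["cost"]) if "cost" in usage else None
    return urls, cost
```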