Spark dance 2.0 Fast

spark_dance_v2_0_fast

Spark dance 2.0 Fast is a faster variant of the Seedance 2.0 multimodal video creation model, offering quicker generation with support for text, image, video, and audio inputs.

API Notes

An asset ID in asset:// format, as used in image_url, is obtained in two steps: first create an asset group via spark_asset_create_group, then upload an image asset via spark_asset_create to obtain the ID.

Authentication

authorization string required

All APIs require authentication via Bearer Token.

Get API Key:

Visit the API Key Management page to get your API key.

Usage:

Add to request header:

Authorization: Bearer YOUR_API_KEY
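
As a minimal sketch, the header can be built in Python as shown here. The helper name auth_headers is hypothetical, and the Content-Type entry is an assumption (the source does not state the request body encoding).

```python
def auth_headers(api_key: str) -> dict[str, str]:
    """Build HTTP headers that carry the API key as a Bearer token."""
    if not api_key or not api_key.strip():
        raise ValueError("api_key must be a non-empty string")
    return {
        "Authorization": f"Bearer {api_key.strip()}",
        # Assumption: requests are sent as JSON; the source does not say so.
        "Content-Type": "application/json",
    }
```

The returned dictionary can be passed as the headers argument of whichever HTTP client you use.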

Parameters

model string required

Model ID to use for the request

Value: spark_dance_v2_0_fast


content object[] required

Input content for video generation. Supported media include text, image, video, and audio. Supported combinations:

Text only
Text (optional) + image
Text (optional) + video
Text (optional) + image + audio
Text (optional) + image + video
Text (optional) + video + audio
Text (optional) + image + video + audio
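
The combinations listed above can be checked on the client before sending a request. The following is a sketch only: the function name is hypothetical, and it assumes each content item carries a type of text, image_url, video_url, or audio_url as described in the object definitions in this section.

```python
# Media sets taken from the supported combinations list; text is optional
# in every combination except "Text only".
ALLOWED_MEDIA_SETS = {
    frozenset({"image_url"}),
    frozenset({"video_url"}),
    frozenset({"image_url", "audio_url"}),
    frozenset({"image_url", "video_url"}),
    frozenset({"video_url", "audio_url"}),
    frozenset({"image_url", "video_url", "audio_url"}),
}


def is_supported_combination(content: list[dict]) -> bool:
    """Return True if the item types in `content` form a listed combination."""
    types = {item.get("type") for item in content}
    media = frozenset(types - {"text"})
    if not media:
        # No media at all: only valid as the "Text only" combination.
        return "text" in types
    return media in ALLOWED_MEDIA_SETS
```

Audio on its own (with or without text) is rejected, which matches the rule that audio input needs at least one reference image or video.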

Text Information object

Text information input to the model

type string required

Type of input content

Value: text

text string required

Text prompt describing the video to generate. Keep the prompt under 1000 words: in longer prompts the information becomes scattered, and the model may focus on the key points and ignore details, so elements can be missing from the generated video.

Image Information object

Image information input to the model

type string required

Type of input content

Value: image_url

image_url object required

Image object input to the model

url string required

Image URL, Base64 string of image, or asset ID

Image URL: Enter the public URL of the image

Base64 string: Convert the local file to a Base64 encoded string. Format: data:image/<image format>;base64,<Base64 string>. Note that <image format> must be lowercase, for example data:image/png;base64,{base64_image}

Asset ID: ID of preconfigured assets and digital characters, following the format asset://<ASSET_ID>
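
The Base64 format described above can be produced with the Python standard library. This is a sketch: the function names are hypothetical, and mapping a .jpg file extension to jpeg is an assumption based on the format list in this section.

```python
import base64
from pathlib import Path


def to_data_url(data: bytes, image_format: str) -> str:
    """Encode raw image bytes as data:image/<image format>;base64,<Base64 string>."""
    fmt = image_format.lower()  # the format name must be lowercase
    encoded = base64.b64encode(data).decode("ascii")
    return f"data:image/{fmt};base64,{encoded}"


def file_to_data_url(path: str) -> str:
    """Read a local image and encode it, taking the format from the extension."""
    p = Path(path)
    fmt = p.suffix.lstrip(".").lower()
    if fmt == "jpg":
        fmt = "jpeg"  # assumption: .jpg files are declared as jpeg
    return to_data_url(p.read_bytes(), fmt)
```

The resulting string goes in the url field of the image_url object. For large files, a public URL or asset ID is the better choice because of the request body size limit.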

Requirements for a single uploaded image:
Format: jpeg, png, webp, bmp, tiff, gif
Aspect ratio (width/height): (0.4, 2.5)
Width and height (px): (300, 6000)
Size: A single image must be less than 30 MB. Request body size must not exceed 64 MB. Do not use Base64 encoding for large files
Number of images: First frame — 1 image; First and last frame — 2 images; Reference image — 1-9 images
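
A client-side pre-check of these requirements might look like the sketch below. It rests on several assumptions: the parentheses in (0.4, 2.5) and (300, 6000) are read as exclusive bounds, 1 MB is taken as 1024 × 1024 bytes, and the caller already knows the image's width, height, and byte size. The function name is hypothetical.

```python
ALLOWED_FORMATS = {"jpeg", "png", "webp", "bmp", "tiff", "gif"}
MAX_IMAGE_BYTES = 30 * 1024 * 1024  # assumption: 1 MB = 1024 * 1024 bytes


def image_problems(width: int, height: int, size_bytes: int, image_format: str) -> list[str]:
    """Return a list of requirement violations; an empty list means the image passes."""
    problems = []
    if image_format.lower() not in ALLOWED_FORMATS:
        problems.append(f"unsupported format: {image_format}")
    # Assumption: bounds are exclusive, as the parentheses in the source suggest.
    if not (300 < width < 6000 and 300 < height < 6000):
        problems.append("width and height must each be between 300 and 6000 px")
    if height <= 0 or not (0.4 < width / height < 2.5):
        problems.append("aspect ratio (width/height) must be between 0.4 and 2.5")
    if size_bytes >= MAX_IMAGE_BYTES:
        problems.append("image must be smaller than 30 MB")
    return problems
```

If the service turns out to accept boundary values such as exactly 300 px, the strict comparisons should be relaxed accordingly.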

role string conditional required

Position or purpose of the image

⚠️ Image to video (first frame), image to video (first and last frame), and multimodal reference video generation are three mutually exclusive scenarios and cannot be used together. For multimodal reference video generation, you can specify the reference image as the first/last frame via the prompt to indirectly achieve “first and last frame + multimodal reference”. If you need to strictly ensure that the first and last frames are consistent with the specified images, always use the image-to-video first/last frame feature (configure role to first_frame / last_frame)

Image to video (first frame):
Pass 1 image_url object. role value: first_frame or leave blank

Image to video (first and last frame):
Pass 2 image_url objects. First frame: role = first_frame (required). Last frame: role = last_frame (required)

Image to video (reference image):
Pass 1-9 image_url objects. role = reference_image (required)
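
The three mutually exclusive image scenarios can be told apart from the roles alone. The sketch below uses a hypothetical function name and assumes that role sits next to type on each content item; the flattened parameter listing does not make the nesting explicit.

```python
from typing import Optional


def classify_image_scenario(content: list[dict]) -> Optional[str]:
    """Name the image scenario used by `content`, or None if the roles are invalid."""
    images = [item for item in content if item.get("type") == "image_url"]
    if not images:
        return "no_images"
    roles = [img.get("role") for img in images]
    # Reference images: 1-9 images, every one marked reference_image.
    if all(r == "reference_image" for r in roles):
        return "reference_image" if len(images) <= 9 else None
    # First frame: exactly 1 image, role first_frame or omitted.
    if len(images) == 1 and roles[0] in (None, "first_frame"):
        return "first_frame"
    # First and last frame: exactly 2 images, one of each role.
    if len(images) == 2 and set(roles) == {"first_frame", "last_frame"}:
        return "first_and_last_frame"
    return None  # mixed or incomplete roles match no scenario
```

A None result covers cases such as mixing first_frame with reference_image, which the warning above rules out.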

Video Information object

Video input to the model

type string required

Type of input content

Value: video_url

video_url object required

Video object input to the model

url string required

Video URL or asset ID

role string conditional required

Position or purpose of the video

Options: reference_video

Audio Information object

Audio input to the model. At least 1 reference video or image must be included when using audio input

type string required

Type of input content

Value: audio_url

audio_url object required

Audio object input to the model

role string conditional required

Position or purpose of the audio

Options: reference_audio


generate_audio boolean

Controls whether the generated video includes sound synchronized with the footage

true: The output video includes synchronized audio. The model automatically generates matching voices, sound effects, and background music based on the text prompt and visual content. To improve audio generation, put dialogue in double quotes. For example: The man stopped the woman and said: “Remember, you can’t point at the moon with your finger in the future.”

Note: Audio in generated videos is always mono, regardless of the number of channels in the input audio

false: The video output is a silent video

Default: true


resolution string

Video resolution

Default: 720p

Options: 480p, 720p


ratio string

Aspect ratio of the generated video

When ratio is set to adaptive, the model will automatically adapt the aspect ratio according to the generation scenario:
Text to video: Intelligently select the most appropriate aspect ratio based on the input prompt
First frame / first and last frame to video: Automatically select the closest aspect ratio based on the ratio of the uploaded first frame image
Multimodal reference video generation: The ratio is chosen based on the intent of the prompt. For first-frame video generation, video editing, or video extension, the model selects the aspect ratio closest to that of the corresponding image or video; otherwise, it selects the aspect ratio closest to that of the first uploaded media file (priority: video > image)

Default: adaptive

Options: 16:9, 4:3, 1:1, 3:4, 9:16, 21:9, adaptive

Width and height pixel values corresponding to different aspect ratios:

Resolution   Aspect Ratio   Dimension (Width × Height)
480p         16:9           864×496
480p         4:3            752×560
480p         1:1            640×640
480p         3:4            560×752
480p         9:16           496×864
480p         21:9           992×432
720p         16:9           1280×720
720p         4:3            1112×834
720p         1:1            960×960
720p         3:4            834×1112
720p         9:16           720×1280
720p         21:9           1470×630
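
For client code that needs the output size in advance, the table above can be kept as a lookup. This is a sketch with hypothetical names; the adaptive ratio has no fixed size, so the lookup returns None for it.

```python
from typing import Optional

# (resolution, ratio) -> (width, height) in pixels, copied from the table above.
DIMENSIONS = {
    ("480p", "16:9"): (864, 496),
    ("480p", "4:3"): (752, 560),
    ("480p", "1:1"): (640, 640),
    ("480p", "3:4"): (560, 752),
    ("480p", "9:16"): (496, 864),
    ("480p", "21:9"): (992, 432),
    ("720p", "16:9"): (1280, 720),
    ("720p", "4:3"): (1112, 834),
    ("720p", "1:1"): (960, 960),
    ("720p", "3:4"): (834, 1112),
    ("720p", "9:16"): (720, 1280),
    ("720p", "21:9"): (1470, 630),
}


def output_size(resolution: str, ratio: str) -> Optional[tuple[int, int]]:
    """Return (width, height) for a fixed ratio, or None (e.g. for adaptive)."""
    return DIMENSIONS.get((resolution, ratio))
```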

duration integer

Generated video duration in seconds. Only integers are supported

Two configuration methods:
Fixed duration: Set any integer within the valid range
Automatic duration: Set to -1 and the model selects an appropriate video length (in whole seconds) within the valid range. The actual duration of the generated video is returned in the duration parameter of the task query API

Note: Video duration affects billing, so set it carefully

Default: 5

Range: 4 - 15, or -1 (auto)
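
Putting the parameters of this section together, a request body could be assembled as in the sketch below. The function names are hypothetical, and placing every parameter at the top level of one JSON object is an assumption drawn from the parameter listing; the source does not show a complete request.

```python
MODEL_ID = "spark_dance_v2_0_fast"
RESOLUTIONS = {"480p", "720p"}
RATIOS = {"16:9", "4:3", "1:1", "3:4", "9:16", "21:9", "adaptive"}


def is_valid_duration(value) -> bool:
    """True for an integer in 4-15, or -1 to let the model choose."""
    if isinstance(value, bool) or not isinstance(value, int):
        return False  # only integers are supported (bool is excluded explicitly)
    return value == -1 or 4 <= value <= 15


def build_request(content: list, *, generate_audio: bool = True,
                  resolution: str = "720p", ratio: str = "adaptive",
                  duration: int = 5) -> dict:
    """Validate the options against the documented values and build the body."""
    if not content:
        raise ValueError("content is required")
    if resolution not in RESOLUTIONS:
        raise ValueError(f"resolution must be one of {sorted(RESOLUTIONS)}")
    if ratio not in RATIOS:
        raise ValueError(f"ratio must be one of {sorted(RATIOS)}")
    if not is_valid_duration(duration):
        raise ValueError("duration must be an integer in 4-15, or -1")
    return {
        "model": MODEL_ID,
        "content": content,
        "generate_audio": generate_audio,
        "resolution": resolution,
        "ratio": ratio,
        "duration": duration,
    }
```

The keyword defaults mirror the documented defaults (true, 720p, adaptive, 5), so a call with only content produces the same request as omitting the optional parameters.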


Polling

Since result generation takes time, you need to poll the task status after creating the task.

The initial response only returns information such as the task ID and initial status. The final result must be obtained by polling the task status endpoint using the task ID.
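
A polling loop along these lines is one way to wait for the result. It is a sketch under stated assumptions: the task status endpoint is not named in this section, so the HTTP call is passed in as fetch_task (a hypothetical callable that returns the task as a dict), and the loop treats completed and failed, the two statuses mentioned under Response Format, as terminal. The interval and timeout values are arbitrary.

```python
import time
from typing import Callable

TERMINAL_STATUSES = {"completed", "failed"}


def poll_task(fetch_task: Callable[[str], dict], task_id: str,
              interval_s: float = 5.0, timeout_s: float = 600.0,
              sleep: Callable[[float], None] = time.sleep) -> dict:
    """Call fetch_task(task_id) until the task finishes or the timeout passes."""
    deadline = time.monotonic() + timeout_s
    while True:
        task = fetch_task(task_id)
        if task.get("status") in TERMINAL_STATUSES:
            return task
        if time.monotonic() >= deadline:
            raise TimeoutError(f"task {task_id} did not finish within {timeout_s} s")
        sleep(interval_s)  # wait before asking again
```

Injecting fetch_task and sleep keeps the loop independent of any HTTP library and makes it easy to exercise with canned responses.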

Response Format

error object

Error information, only present when status is failed

code string

Error code

message string

Detailed error message


output array

Generation results, only present when status is completed

content array

List of generated resource content

type string

Resource type

Value: image|video

url string

Processed resource URL

jobId string

Remote task ID


usage object

Usage statistics, only present when status is completed

cost string

Total cost in USD

discount number

Discount amount


metadata object

Metadata information
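
Reading the generated URLs out of a polled task could look like the sketch below. The nesting is an assumption inferred from the field listing above (output is a list whose entries each hold a content list of resources, and status sits at the top level); the function name is hypothetical.

```python
def result_urls(task: dict, resource_type: str = "video") -> list[str]:
    """Collect resource URLs of one type from a task response."""
    status = task.get("status")
    if status == "failed":
        # error is only present when status is failed
        err = task.get("error") or {}
        raise RuntimeError(f"{err.get('code')}: {err.get('message')}")
    if status != "completed":
        return []  # output is only present when status is completed
    urls = []
    for entry in task.get("output") or []:
        for item in entry.get("content") or []:
            if item.get("type") == resource_type and item.get("url"):
                urls.append(item["url"])
    return urls
```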