Spark dance 2.0 | Vtrix API Docs

API Notes

To use an asset ID in the asset:// format in image_url, first create an asset group via spark_asset_create_group, then upload an image asset via spark_asset_create to obtain the ID.

Authentication

authorization `string` required

All APIs require authentication via Bearer Token.

Get API Key:

Visit API Key Management Page to get your API Key.

Usage:

Add to request header:

Authorization: Bearer YOUR_API_KEY
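
As a minimal sketch of the header described above, the helper below builds the request headers in Python. The function name build_headers is hypothetical, and the Content-Type entry is an assumption: this page only specifies the Authorization header.

```python
def build_headers(api_key: str) -> dict[str, str]:
    """Return request headers carrying the Bearer token."""
    if not api_key or not api_key.strip():
        raise ValueError("API key must be a non-empty string")
    return {
        "Authorization": f"Bearer {api_key.strip()}",
        # Assumption: the API accepts JSON bodies; not stated on this page.
        "Content-Type": "application/json",
    }
```

Reading the key from an environment variable rather than hard-coding it keeps it out of source control.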

Parameters

model `string` required

Model ID to use for the request

Value: spark_dance_v2_0

content `object[]` required

Input content for video generation. Supported media include text, image, video, and audio. Supported combinations:

Text only
Text (optional) + image
Text (optional) + video
Text (optional) + image + audio
Text (optional) + image + video
Text (optional) + video + audio
Text (optional) + image + video + audio
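
The combinations listed above can be checked client-side before sending a request. The sketch below is one possible reading of that list, not an official validator: the helper name is hypothetical, and it only inspects the type field of each content item.

```python
# Media sets (ignoring text) that appear in the supported-combinations list.
SUPPORTED_MEDIA_SETS = {
    frozenset(),  # text only
    frozenset({"image_url"}),
    frozenset({"video_url"}),
    frozenset({"image_url", "audio_url"}),
    frozenset({"image_url", "video_url"}),
    frozenset({"video_url", "audio_url"}),
    frozenset({"image_url", "video_url", "audio_url"}),
}


def is_supported_combination(content: list[dict]) -> bool:
    """Return True if the content items form one of the listed combinations."""
    types = {item.get("type") for item in content}
    has_text = "text" in types
    media = frozenset(types - {"text"})
    if media not in SUPPORTED_MEDIA_SETS:
        return False
    # With no media at all, a text item is mandatory ("Text only").
    return has_text or bool(media)
```

Audio on its own, or audio with text only, is rejected because every listed combination that includes audio also includes an image or a video.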

Text Information object

Text information input to the model

type string required

Type of input content

Value: text

text string required

Text prompt describing the video to generate. Keep the prompt under 1000 words: overly long prompts dilute the information, so the model may focus only on the key points and drop details, leaving elements out of the generated video

Image Information object

Image information input to the model

type string required

Type of input content

Value: image_url

image_url object required

Image object input to the model

url string required

Image URL, Base64 string of image, or asset ID

Image URL: Enter the public URL of the image

Base64 string: Encode the local file as a Base64 string in the format data:image/<image format>;base64,<Base64 string>. <image format> must be lowercase, for example data:image/png;base64,{base64_image}

Asset ID: ID of preconfigured assets and digital characters, following the format asset://<ASSET_ID>
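
The Base64 and asset ID formats above can be produced with a few lines of Python. This is an illustrative sketch; both function names are hypothetical, and the asset ID passed in is assumed to come from spark_asset_create.

```python
import base64


def image_bytes_to_data_url(data: bytes, image_format: str) -> str:
    """Encode raw image bytes as data:image/<format>;base64,<payload>."""
    fmt = image_format.lower()  # the format segment must be lowercase
    encoded = base64.b64encode(data).decode("ascii")
    return f"data:image/{fmt};base64,{encoded}"


def asset_url(asset_id: str) -> str:
    """Wrap an asset ID in the asset://<ASSET_ID> form."""
    return f"asset://{asset_id}"
```

For a local file, read it in binary mode (for example with pathlib.Path(path).read_bytes()) and pass the bytes to image_bytes_to_data_url.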

Requirements for a single uploaded image:
Format: jpeg, png, webp, bmp, tiff, gif
Aspect ratio (width/height): (0.4, 2.5)
Width and height (px): (300, 6000)
Size: A single image must be less than 30 MB. Request body size must not exceed 64 MB. Do not use Base64 encoding for large files
Number of images: First frame — 1 image; First and last frame — 2 images; Reference image — 1-9 images
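
A pre-upload check against the requirements above might look like the following sketch. It assumes the bounds written as (0.4, 2.5) and (300, 6000) are exclusive and that 1 MB means 1024 × 1024 bytes; neither is confirmed on this page, and the function name is hypothetical.

```python
ALLOWED_FORMATS = {"jpeg", "png", "webp", "bmp", "tiff", "gif"}
MAX_IMAGE_BYTES = 30 * 1024 * 1024  # assumption: 1 MB = 1024 * 1024 bytes


def check_image(fmt: str, width: int, height: int, size_bytes: int) -> list[str]:
    """Return a list of requirement violations (empty if the image passes)."""
    problems = []
    if fmt.lower() not in ALLOWED_FORMATS:
        problems.append(f"unsupported format: {fmt}")
    # Assumption: interval bounds are exclusive, as the notation suggests.
    if not (300 < width < 6000 and 300 < height < 6000):
        problems.append("width and height must be within (300, 6000) px")
    if height > 0 and not (0.4 < width / height < 2.5):
        problems.append("aspect ratio must be within (0.4, 2.5)")
    if size_bytes >= MAX_IMAGE_BYTES:
        problems.append("image must be smaller than 30 MB")
    return problems
```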

role string conditional required

Position or purpose of the image

⚠️ Image to video (first frame), image to video (first and last frame), and multimodal reference video generation are mutually exclusive scenarios and cannot be combined in one request. In multimodal reference video generation, you can designate a reference image as the first or last frame in the prompt to approximate “first and last frame + multimodal reference”. If the first and last frames must strictly match the specified images, use the image-to-video first/last frame feature instead (set role to first_frame / last_frame)

Image to video (first frame):
Pass 1 image_url object. role value: first_frame or leave blank

Image to video (first and last frame):
Pass 2 image_url objects. First frame: role = first_frame (required). Last frame: role = last_frame (required)

Image to video (reference image):
Pass 1-9 image_url objects. role = reference_image (required)
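
The three image scenarios above can be sketched as a small builder that assigns roles and enforces the image counts. The mode names and the function name are hypothetical, and placing role next to type and image_url (rather than inside image_url) is an assumption about the JSON layout.

```python
def image_items(urls: list[str], mode: str) -> list[dict]:
    """Build image_url content items for one of the three exclusive scenarios."""
    if mode == "first_frame":
        if len(urls) != 1:
            raise ValueError("first-frame mode takes exactly 1 image")
        roles = ["first_frame"]
    elif mode == "first_last_frame":
        if len(urls) != 2:
            raise ValueError("first-and-last-frame mode takes exactly 2 images")
        roles = ["first_frame", "last_frame"]
    elif mode == "reference":
        if not 1 <= len(urls) <= 9:
            raise ValueError("reference mode takes 1 to 9 images")
        roles = ["reference_image"] * len(urls)
    else:
        raise ValueError(f"unknown mode: {mode}")
    # Assumption: role is a sibling of type and image_url in each item.
    return [
        {"type": "image_url", "image_url": {"url": url}, "role": role}
        for url, role in zip(urls, roles)
    ]
```

Because the builder takes a single mode, one call cannot mix first/last-frame roles with reference roles, which mirrors the mutual exclusivity described above.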

Video Information object

Video input to the model

type string required

Type of input content

Value: video_url

video_url object required

Video object input to the model

url string required

Video URL or asset ID

role string conditional required

Position or purpose of the video

Options: reference_video

Audio Information object

Audio input to the model. At least 1 reference video or image must be included when using audio input

type string required

Type of input content

Value: audio_url

audio_url object required

Audio object input to the model

role string conditional required

Position or purpose of the audio

Options: reference_audio

generate_audio `boolean`

Controls whether the generated video includes sound synchronized with the footage

true: The video output includes synchronized audio. The model automatically generates matching voices, sound effects, and background music based on the text prompt and visual content. Enclose dialogue in double quotes to improve audio generation. For example: The man stopped the woman and said: “Remember, you can’t point at the moon with your finger in the future.”

Note: All generated videos with audio are mono, regardless of the number of channels of the input audio

false: The video output is a silent video

Default: true

resolution `string`

Video resolution

Default: 720p

Options: 480p, 720p, 1080p

ratio `string`

Aspect ratio of the generated video

When ratio is set to adaptive, the model will automatically adapt the aspect ratio according to the generation scenario:
Text to video: Intelligently select the most appropriate aspect ratio based on the input prompt
First frame / first and last frame to video: Automatically select the closest aspect ratio based on the ratio of the uploaded first frame image
Multimodal reference video generation: Determined from the intent of the user’s prompt. For first-frame video generation, video editing, or video extension, the aspect ratio closest to the corresponding image or video is selected; otherwise, the aspect ratio closest to the first uploaded media file is selected (priority: video > image)

Default: adaptive

Options: 16:9, 4:3, 1:1, 3:4, 9:16, 21:9, adaptive

Output width and height in pixels for each resolution and aspect ratio:

Resolution	Aspect Ratio	Dimension (Width × Height)
480p	16:9	864×496
480p	4:3	752×560
480p	1:1	640×640
480p	3:4	560×752
480p	9:16	496×864
480p	21:9	992×432
720p	16:9	1280×720
720p	4:3	1112×834
720p	1:1	960×960
720p	3:4	834×1112
720p	9:16	720×1280
720p	21:9	1470×630
1080p	16:9	1920×1080
1080p	4:3	1664×1248
1080p	1:1	1440×1440
1080p	3:4	1248×1664
1080p	9:16	1080×1920
1080p	21:9	2206×946
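
For client code that needs the output size in advance, the table above can be expressed as a lookup. This is a convenience sketch with a hypothetical function name; it returns None for adaptive, since in that case the model chooses the ratio.

```python
from typing import Optional

# (resolution, ratio) -> (width, height), copied from the table above.
DIMENSIONS = {
    ("480p", "16:9"): (864, 496),
    ("480p", "4:3"): (752, 560),
    ("480p", "1:1"): (640, 640),
    ("480p", "3:4"): (560, 752),
    ("480p", "9:16"): (496, 864),
    ("480p", "21:9"): (992, 432),
    ("720p", "16:9"): (1280, 720),
    ("720p", "4:3"): (1112, 834),
    ("720p", "1:1"): (960, 960),
    ("720p", "3:4"): (834, 1112),
    ("720p", "9:16"): (720, 1280),
    ("720p", "21:9"): (1470, 630),
    ("1080p", "16:9"): (1920, 1080),
    ("1080p", "4:3"): (1664, 1248),
    ("1080p", "1:1"): (1440, 1440),
    ("1080p", "3:4"): (1248, 1664),
    ("1080p", "9:16"): (1080, 1920),
    ("1080p", "21:9"): (2206, 946),
}


def output_size(resolution: str, ratio: str) -> Optional[tuple[int, int]]:
    """Return (width, height) for a fixed ratio, or None for 'adaptive'."""
    if ratio == "adaptive":
        return None
    return DIMENSIONS[(resolution, ratio)]
```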

duration `integer`

Generated video duration in seconds. Only integers are supported

Two configuration methods:
Fixed duration: Any integer within the valid range
Automatic: Set to -1 and the model selects an appropriate video length (in whole seconds) within the valid range. The actual duration of the generated video is returned in the duration parameter of the task query API

Note: Video duration affects billing, so set it carefully

Default: 5

Range: 4 - 15, or -1 (auto)
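
Putting the parameters in this section together, a request body could be assembled as in the sketch below. The function name is hypothetical, and the layout (all parameters as top-level JSON fields) is an assumption based on how they are listed here; the defaults and allowed values are taken from this page.

```python
RESOLUTIONS = {"480p", "720p", "1080p"}
RATIOS = {"16:9", "4:3", "1:1", "3:4", "9:16", "21:9", "adaptive"}


def build_request_body(
    content: list[dict],
    *,
    generate_audio: bool = True,
    resolution: str = "720p",
    ratio: str = "adaptive",
    duration: int = 5,
) -> dict:
    """Validate the documented options and return a request body dict."""
    if resolution not in RESOLUTIONS:
        raise ValueError(f"unsupported resolution: {resolution}")
    if ratio not in RATIOS:
        raise ValueError(f"unsupported ratio: {ratio}")
    # bool is a subclass of int in Python, so reject it explicitly.
    if isinstance(duration, bool) or not isinstance(duration, int):
        raise ValueError("duration must be an integer")
    if duration != -1 and not 4 <= duration <= 15:
        raise ValueError("duration must be 4-15 seconds, or -1 for auto")
    return {
        "model": "spark_dance_v2_0",
        "content": content,
        "generate_audio": generate_audio,
        "resolution": resolution,
        "ratio": ratio,
        "duration": duration,
    }
```

Rejecting out-of-range durations locally avoids a failed request; since duration affects billing, an explicit value is often preferable to -1 when costs need to be predictable.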

Polling

Because generation takes time, poll the task status after creating the task.

The initial response only returns information such as the task ID and initial status. The final result must be obtained by polling the task status endpoint using the task ID.

See the examples on the right for polling requests and responses.
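
A generic polling loop for this workflow might look like the sketch below. Because this page does not give the status endpoint's URL, the HTTP call is left to a caller-supplied fetch_status function (hypothetical); the loop only relies on the status field and its completed and failed values, which appear under Response Format. The interval and timeout defaults are arbitrary assumptions.

```python
import time
from typing import Callable


def poll_task(
    fetch_status: Callable[[str], dict],
    task_id: str,
    interval_s: float = 5.0,
    timeout_s: float = 600.0,
    sleep: Callable[[float], None] = time.sleep,
) -> dict:
    """Call fetch_status(task_id) until the task completes, fails, or times out."""
    deadline = time.monotonic() + timeout_s
    while True:
        task = fetch_status(task_id)
        if task.get("status") in ("completed", "failed"):
            return task
        if time.monotonic() >= deadline:
            raise TimeoutError(f"task {task_id} did not finish in {timeout_s}s")
        sleep(interval_s)
```

Passing the fetch function in keeps the loop testable without network access and independent of any particular HTTP client.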

Response Format

error `object`

Error information, only present when status is failed

code string

Error code

message string

Detailed error message

output `array`

Generation results, only present when status is completed

content array

List of generated resource content

type string

Resource type

Value: image | video

url string

Processed resource URL

jobId string

Remote task ID

usage `object`

Usage statistics, only present when status is completed

cost string

Total cost in USD

discount number

Discount amount

metadata `object`

Metadata information
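
Once a task reaches a terminal status, the fields above can be read as in the following sketch. The nesting is an assumption inferred from the field order on this page (each output entry holding a content list of resources with type and url); the function name is hypothetical.

```python
def extract_video_urls(task: dict) -> list[str]:
    """Return video URLs from a finished task, or raise if the task failed."""
    if task.get("status") == "failed":
        err = task.get("error") or {}
        raise RuntimeError(f"{err.get('code')}: {err.get('message')}")
    urls = []
    # Assumption: output is a list of entries, each with a content list.
    for entry in task.get("output") or []:
        for resource in entry.get("content") or []:
            if resource.get("type") == "video" and resource.get("url"):
                urls.append(resource["url"])
    return urls
```

Since usage.cost is documented as a string, parsing it with decimal.Decimal rather than float avoids rounding surprises when summing costs.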
