Spark DreamO - Multi IP

spark_multi_dreamo

Spark DreamO - Multi IP is a multi-image editing model. It supports editing with 1 to 5 reference images while preserving character features.

API Tips

Input images must meet the following requirements:

Supported formats: JPEG, PNG only (JPEG format recommended)

File size: Maximum 4.7 MB

Image resolution: Maximum 4096 * 4096

Aspect ratio: Recommended range 16:9 to 9:16 (extreme aspect ratios may produce poor results or cause errors)
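
The requirements above can be checked on the client before a request is sent. The following is a minimal sketch, not part of the API: the helper names are hypothetical, 4.7 MB is assumed to mean 4,700,000 bytes, and the image width and height are assumed to be known already (for example from an image library).

```python
from typing import List, Optional

MAX_BYTES = 4_700_000  # assumption: 1 MB = 10**6 bytes
MAX_SIDE = 4096


def detect_format(data: bytes) -> Optional[str]:
    """Identify PNG or JPEG from the file signature, else None."""
    if data.startswith(b"\x89PNG\r\n\x1a\n"):
        return "PNG"
    if data.startswith(b"\xff\xd8\xff"):
        return "JPEG"
    return None


def check_image(data: bytes, width: int, height: int) -> List[str]:
    """Return a list of problems; an empty list means all checks passed."""
    problems = []
    if detect_format(data) is None:
        problems.append("format must be JPEG or PNG")
    if len(data) > MAX_BYTES:
        problems.append("file larger than 4.7 MB")
    if width > MAX_SIDE or height > MAX_SIDE:
        problems.append("resolution exceeds 4096 * 4096")
    ratio = width / height
    if not (9 / 16 <= ratio <= 16 / 9):
        # Only a recommendation in the docs, so callers may treat it as a warning.
        problems.append("aspect ratio outside recommended 9:16 to 16:9")
    return problems
```

The aspect-ratio check reflects a recommendation rather than a hard limit, so a caller may prefer to log that message instead of rejecting the image.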


Authentication

authorization string required

All APIs require authentication via Bearer Token.

Get API Key:

Visit API Key Management Page to get your API Key.

Usage:

Add to request header:

Authorization: Bearer YOUR_API_KEY
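
As an illustration, the header can be assembled in Python as follows. The function name is hypothetical, and the Content-Type value is an assumption (a JSON request body) that this page does not confirm.

```python
def build_headers(api_key: str) -> dict:
    """Build request headers carrying the Bearer token."""
    if not api_key:
        raise ValueError("API key is empty")
    return {
        "Authorization": f"Bearer {api_key}",
        # Assumption: the request body is sent as JSON.
        "Content-Type": "application/json",
    }
```

Reading the key from an environment variable rather than hard-coding it keeps it out of source control.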

Parameters

model string required

Model ID to use for the request

Value: spark_multi_dreamo


binary_data_base64 array required (one of two)

Image files in Base64 encoding. Supports up to 5 input images

Either image_urls or binary_data_base64 must be provided (one of two)


image_urls array required (one of two)

Image file URLs (must be publicly accessible). Supports up to 5 input images

Either image_urls or binary_data_base64 must be provided (one of two)
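
A small sketch of how the two image fields might be prepared. It assumes that exactly one of the two fields is sent and that binary_data_base64 takes plain Base64 strings without a data-URI prefix; neither detail is confirmed here. The helper name is hypothetical.

```python
import base64
from typing import Dict, List, Optional, Sequence

MAX_IMAGES = 5


def image_fields(
    image_urls: Optional[Sequence[str]] = None,
    image_bytes: Optional[Sequence[bytes]] = None,
) -> Dict[str, List[str]]:
    """Return the request field for the input images (URLs or Base64)."""
    # Assumption: the two fields are mutually exclusive.
    if bool(image_urls) == bool(image_bytes):
        raise ValueError("provide exactly one of image_urls or image_bytes")
    count = len(image_urls or image_bytes)
    if count > MAX_IMAGES:
        raise ValueError("at most 5 input images are supported")
    if image_urls:
        return {"image_urls": list(image_urls)}
    # Assumption: plain Base64 text, no "data:image/...;base64," prefix.
    encoded = [base64.b64encode(raw).decode("ascii") for raw in image_bytes]
    return {"binary_data_base64": encoded}
```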


prompt string required

Prompt for image editing, supports both Chinese and English

Recommended length is about 300 characters. Overly long prompts may not take effect or may cause errors


ref_type_list array

Reference type for each reference image. The length of this array must equal the number of reference images

The default reference type is AUTO, which matches the reference type automatically but increases inference time. When the reference type is known in advance, specify it manually for each image

IP: Reference subject features
ID: Reference facial features
STYLE: Reference style features
AUTO: Automatically match reference type (default)

Options: IP, ID, STYLE, AUTO

Default: AUTO
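
The length rule for ref_type_list is easy to violate when images are added or removed, so a client-side check can help. This sketch uses a hypothetical helper name and assumes that omitting the list is equivalent to sending AUTO for every image.

```python
from typing import List, Optional

REF_TYPES = {"IP", "ID", "STYLE", "AUTO"}


def normalize_ref_types(
    num_images: int, ref_types: Optional[List[str]] = None
) -> List[str]:
    """Return one valid reference type per image."""
    if ref_types is None:
        # Assumption: an omitted list behaves like AUTO for each image.
        return ["AUTO"] * num_images
    if len(ref_types) != num_images:
        raise ValueError("ref_type_list length must equal the number of images")
    unknown = [t for t in ref_types if t not in REF_TYPES]
    if unknown:
        raise ValueError(f"unknown reference types: {unknown}")
    return list(ref_types)
```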


guidance_scale1 number

Controls the consistency of generation results with text descriptions. Higher values result in higher text consistency but lower image consistency

Range: 1.0 to 7.0

Default: 2.5


guidance_scale2 number

Controls the consistency of generation results with images. Higher values result in higher image consistency but lower text consistency

Range: 1.0 to 7.0

Default: 2.5


ddim_steps integer

Number of steps for image generation

Range: 1 to 50

Default: 12
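
The documented ranges and defaults for guidance_scale1, guidance_scale2, and ddim_steps can be enforced before sending a request. A minimal sketch with a hypothetical helper name; how the server itself handles out-of-range values is not stated here.

```python
def sampling_params(
    guidance_scale1: float = 2.5,
    guidance_scale2: float = 2.5,
    ddim_steps: int = 12,
) -> dict:
    """Validate the sampling parameters against the documented ranges."""
    scales = (
        ("guidance_scale1", guidance_scale1),
        ("guidance_scale2", guidance_scale2),
    )
    for name, value in scales:
        if not 1.0 <= value <= 7.0:
            raise ValueError(f"{name} must be between 1.0 and 7.0")
    # bool is a subclass of int in Python, so reject it explicitly.
    if isinstance(ddim_steps, bool) or not isinstance(ddim_steps, int):
        raise ValueError("ddim_steps must be an integer")
    if not 1 <= ddim_steps <= 50:
        raise ValueError("ddim_steps must be between 1 and 50")
    return {
        "guidance_scale1": guidance_scale1,
        "guidance_scale2": guidance_scale2,
        "ddim_steps": ddim_steps,
    }
```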


swap_face boolean

Whether to use facial ID enhancement. When enabled, facial consistency is higher, but it may affect facial attribute editing such as expressions and makeup, and will increase processing time

Options: true, false

Default: false


use_rephraser boolean

Whether to rephrase the input text prompt to optimize results. It is recommended to keep this enabled under normal conditions

If the input text is very long, or you have a strong requirement not to change the prompt content, or you want to reduce processing time, you can disable this parameter

Options: true, false

Default: true


rephraser_level string

Granularity of intelligent prompt rewriting. Finer levels help the model better understand the reference images and prompt instructions, but increase processing time

Note that a finer level does not necessarily produce higher generation quality

general: General level
fine: Fine level
coarse: Coarse level

Options: general, fine, coarse

Default: general


seed integer

Random seed that determines the initial diffusion state. If the same positive integer seed is used and all other parameters are unchanged, the generated results will most likely be consistent

Default: -1 (random)


width integer

Width of the generated image

If the value exceeds the upper limit, make sure the width * height product is less than 2048 * 2048; such sizes may still cause abnormal results or timeouts

Recommended ratios and corresponding dimensions (width * height):
1:1: 1328 * 1328
4:3: 1472 * 1104
3:2: 1584 * 1056
16:9: 1664 * 936
21:9: 2016 * 864

Range: 512 to 2048

Default: 1328


height integer

Height of the generated image

If the value exceeds the upper limit, make sure the width * height product is less than 2048 * 2048; such sizes may still cause abnormal results or timeouts

Range: 512 to 2048

Default: 1328
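
The size rules for width and height can be summarized in a small classifier. This is one possible reading of the rules above, not a statement of server behavior: sizes inside 512 to 2048 on both sides are treated as normal, a side above 2048 is treated as risky while the pixel product stays below 2048 * 2048, and anything else is rejected. The names are hypothetical.

```python
# Recommended (width, height) pairs listed for the width parameter.
RECOMMENDED_SIZES = {
    "1:1": (1328, 1328),
    "4:3": (1472, 1104),
    "3:2": (1584, 1056),
    "16:9": (1664, 936),
    "21:9": (2016, 864),
}

MIN_SIDE = 512
MAX_SIDE = 2048
MAX_PIXELS = 2048 * 2048


def check_size(width: int, height: int) -> str:
    """Classify an output size as 'ok', 'risky', 'too_small' or 'too_large'."""
    if width < MIN_SIDE or height < MIN_SIDE:
        return "too_small"
    if width <= MAX_SIDE and height <= MAX_SIDE:
        return "ok"
    # A side exceeds 2048: tolerated only while the pixel count stays below the cap.
    if width * height < MAX_PIXELS:
        return "risky"
    return "too_large"
```

For example, check_size(2304, 1024) returns "risky" because one side exceeds 2048 while the product stays under the cap.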



Polling

Since image generation takes time, you need to poll the task status after creation

The initial response returns the task ID and initial status. The actual generation results must be obtained through polling the task status endpoint
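
A polling loop might look like the sketch below. The status endpoint and its URL are not given on this page, so the HTTP call is left to a caller-supplied fetch_status function (hypothetical), and the response key "status" is assumed from the field descriptions. Any status other than completed or failed is treated as still pending.

```python
import time
from typing import Callable


def wait_for_task(
    fetch_status: Callable[[], dict],
    interval: float = 2.0,
    timeout: float = 300.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> dict:
    """Poll until the task is completed or failed, or raise TimeoutError."""
    deadline = clock() + timeout
    while True:
        task = fetch_status()  # caller performs the actual HTTP request
        if task.get("status") in ("completed", "failed"):
            return task
        if clock() >= deadline:
            raise TimeoutError("task did not finish in time")
        sleep(interval)
```

The interval and timeout defaults are arbitrary placeholders; passing sleep and clock as arguments makes the loop testable without real delays.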

Response Format

error object

Error information. Only present when status is failed

code string

Error code

error_message string

Detailed error message


output array

Generation results. Only present when status is completed

content array

List of generated content

type string

Resource type

Value: image

url string

Image URL

jobId string

Remote job ID


usage object

Usage statistics. Only present when status is completed

cost string

Total cost in USD

discount number

Discount amount


metadata object

Metadata information
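
Once a task has finished, the image URLs can be pulled out of the response. The sketch assumes a nesting that the field list above suggests but does not spell out: output is a list of objects, each with a content list whose entries carry type, url, and jobId. The function name is hypothetical.

```python
from typing import List


def image_urls_from(task: dict) -> List[str]:
    """Collect image URLs from a finished task, or raise if it failed."""
    if task.get("status") == "failed":
        err = task.get("error") or {}
        raise RuntimeError(f"{err.get('code')}: {err.get('error_message')}")
    urls = []
    # Assumed nesting: output[] -> content[] -> {type, url, jobId}
    for item in task.get("output") or []:
        for part in item.get("content") or []:
            if part.get("type") == "image" and part.get("url"):
                urls.append(part["url"])
    return urls
```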


Error Codes

Error Code   Description
003013001    Missing prompt
003013002    Missing image
003013003    Invalid prompt length
003013004    Invalid parameter
003013095    Internal generation error
003013096    Result parsing error
003013097    HTTP error response
003013098    Status check error
003013099    Service unavailable
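
For client-side logging, the codes can be kept in a lookup table. In this sketch (hypothetical names) the codes are stored as strings because they start with zeros, which would be lost if a code were parsed as an integer; the helper pads such values back to nine digits.

```python
ERROR_CODES = {
    "003013001": "Missing prompt",
    "003013002": "Missing image",
    "003013003": "Invalid prompt length",
    "003013004": "Invalid parameter",
    "003013095": "Internal generation error",
    "003013096": "Result parsing error",
    "003013097": "HTTP error response",
    "003013098": "Status check error",
    "003013099": "Service unavailable",
}


def describe_error(code) -> str:
    """Map an error code to its description, restoring leading zeros."""
    key = str(code).zfill(9)
    return ERROR_CODES.get(key, "Unknown error code")
```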