Kirin Lipsync

kirin_lipsync

Kirin Lipsync is a lip-sync video generation model. It supports text-to-video and audio-to-video modes for generating lip-synced videos.

API Notes

The session_id and face_id parameters are returned by the kirin_identify_face API.

You must call the face identification API to obtain these values before using this lip-sync API.

Authentication

authorization string required

All APIs require authentication via Bearer Token.

Get API Key:

Visit the API Key Management page to get your API key.

Usage:

Add to request header:

Authorization: Bearer YOUR_API_KEY
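As a minimal sketch, the header described above could be built like this in Python. The helper name `auth_headers` is hypothetical, and the `Content-Type` value is an assumption (the page does not state the request body format).

```python
def auth_headers(api_key):
    """Build request headers carrying the Bearer token."""
    if not api_key:
        raise ValueError("api_key must be a non-empty string")
    return {
        "Authorization": f"Bearer {api_key}",
        # Assumption: the API accepts JSON request bodies.
        "Content-Type": "application/json",
    }


headers = auth_headers("YOUR_API_KEY")
```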

Parameters

model string required

Model ID to use for the request

Options: kirin_lipsync


session_id string required

Session ID, generated by the face identification interface


face_choose array required

Specifies the face to lip-sync, including the face ID and the audio reference. Only single-person lip sync is currently supported.

face_id string required

Face ID, returned by the face identification interface

audio_id string

Audio ID generated by the preview interface

Only audio generated within the last 30 days, with a duration between 2 and 60 seconds, is supported.

Provide exactly one of audio_id or sound_file; supplying both is not allowed.

sound_file string

Audio file

Accepts Base64-encoded audio or an audio URL (the URL must be accessible).

Supported formats: .mp3, .wav, .m4a. Maximum file size: 5 MB. A format mismatch or an oversized file returns an error code.

The audio duration must be between 2 and 60 seconds.

Provide exactly one of audio_id or sound_file; supplying both is not allowed.

The system validates the audio content and returns an error code if it finds a problem.

sound_start_time long required

Audio clip start time

Measured from the start of the original audio (0 ms), in milliseconds.

Audio before this point is trimmed off. The remaining clip must be at least 2 seconds long.

sound_end_time long required

Audio clip end time

Measured from the start of the original audio (0 ms), in milliseconds.

Audio after this point is trimmed off. The remaining clip must be at least 2 seconds long.

The end time must not exceed the total duration of the original audio.

sound_insert_time long required

Time at which the clipped audio is inserted into the video

Measured from the start of the video (0 ms), in milliseconds.

The time range of the inserted audio must overlap the interval in which the face is available for lip sync by at least 2 seconds.

The inserted audio must not start before the video starts or end after the video ends.

sound_volume float

Audio volume; higher value means louder volume

Range: 0 - 2

Default: 1

original_audio_volume float

Original video volume; higher value means louder volume

When original video has no audio, this parameter has no effect

Range: 0 - 2
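The constraints on a face_choose entry can be checked client-side before submitting a task. The following is a sketch only: the function name `validate_face_entry` and the `video_duration_ms` argument are hypothetical, and it assumes the inserted clip lasts `sound_end_time - sound_start_time` milliseconds. It does not check the overlap with the face's available interval, which depends on data returned by the face identification API.

```python
MIN_CLIP_MS = 2000  # clipped audio must be at least 2 seconds


def validate_face_entry(entry, video_duration_ms=None):
    """Return a list of problems found in one face_choose entry."""
    problems = []

    # Exactly one of audio_id / sound_file may be set.
    if bool(entry.get("audio_id")) == bool(entry.get("sound_file")):
        problems.append("provide exactly one of audio_id or sound_file")

    required = ("face_id", "sound_start_time", "sound_end_time", "sound_insert_time")
    for name in required:
        if entry.get(name) is None:
            problems.append(f"{name} is required")

    start = entry.get("sound_start_time")
    end = entry.get("sound_end_time")
    insert = entry.get("sound_insert_time")

    if start is not None and end is not None:
        if start < 0:
            problems.append("sound_start_time must be >= 0")
        if end - start < MIN_CLIP_MS:
            problems.append("clipped audio must be at least 2000 ms long")

    if insert is not None and insert < 0:
        problems.append("sound_insert_time must not precede the video start")

    # Assumption: the inserted clip ends at insert + (end - start).
    if video_duration_ms is not None and None not in (start, end, insert):
        if insert + (end - start) > video_duration_ms:
            problems.append("inserted audio must end before the video ends")

    for name in ("sound_volume", "original_audio_volume"):
        value = entry.get(name)
        if value is not None and not 0 <= value <= 2:
            problems.append(f"{name} must be between 0 and 2")

    return problems
```

An empty list means the entry passed these local checks; the server may still reject it for reasons that cannot be verified locally, such as the audio content or the original audio duration.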


external_task_id string

Custom task ID

User-defined task ID. It does not replace the system-generated task ID, but tasks can be queried by it.

The ID must be unique per user.


callback_url string

Callback URL for this task's result notifications. If set, the server sends a notification to this URL whenever the task status changes.
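Putting the parameters together, a request body might be assembled as sketched below. This assumes the body is a JSON object whose keys match the parameter names listed above and that face_choose is a one-element array; the helper name `build_request_body` and all example values are placeholders, not values from the API.

```python
import json


def build_request_body(session_id, face_entry, external_task_id=None, callback_url=None):
    """Assemble the request body; optional fields are omitted when unset."""
    body = {
        "model": "kirin_lipsync",
        "session_id": session_id,
        # Only single-person lip sync is supported, so one entry.
        "face_choose": [face_entry],
    }
    if external_task_id is not None:
        body["external_task_id"] = external_task_id
    if callback_url is not None:
        body["callback_url"] = callback_url
    return body


# Placeholder values for illustration only.
example = build_request_body(
    session_id="SESSION_ID_FROM_FACE_IDENTIFICATION",
    face_entry={
        "face_id": "FACE_ID_FROM_FACE_IDENTIFICATION",
        "audio_id": "AUDIO_ID_FROM_PREVIEW",
        "sound_start_time": 0,
        "sound_end_time": 5000,
        "sound_insert_time": 1000,
        "sound_volume": 1,
    },
)
payload = json.dumps(example)
```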


Polling

Lip-sync video generation takes time, so you need to poll the task status after creating a task.

The initial response returns only the task ID and the initial status. To obtain the generation results, poll the task status endpoint.
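A polling loop along these lines is one way to wait for a result. This page does not give the status endpoint, so the sketch takes a caller-supplied `fetch_status` function (hypothetical) that returns the task as a dict. It treats `completed` and `failed` as terminal, as implied by the Response Format section, and assumes any other status means the task is still in progress. The interval and timeout defaults are arbitrary.

```python
import time


def poll_until_done(fetch_status, interval_s=5.0, timeout_s=600.0, sleep=time.sleep):
    """Call fetch_status() repeatedly until the task completes or fails."""
    deadline = time.monotonic() + timeout_s
    while True:
        task = fetch_status()
        if task.get("status") in ("completed", "failed"):
            return task
        if time.monotonic() >= deadline:
            raise TimeoutError("task did not finish before the timeout")
        # Assumption: any other status means the task is still running.
        sleep(interval_s)
```

Passing `sleep` as a parameter keeps the loop testable without real delays. If a callback_url is configured, polling can serve as a fallback rather than the primary mechanism.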

Response Format

error object

Error information. Only present when status is failed.

code string

Error code

error_message string

Detailed error message


output array

Generation results. Only present when status is completed.

content array

List of generated content

type string

Resource type, e.g., video, image

url string

Generated content URL

duration number

Video duration

jobId string

Remote job ID


usage object

Usage statistics. Only present when status is completed.

cost string

Total cost in USD

discount number

Discount amount


metadata object

Metadata information
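The response fields above might be consumed as in the following sketch. It assumes a top-level `status` field, and that each element of `output` carries a `content` list whose items hold `type` and `url`; the exact nesting is an interpretation of the field list, and the names `LipsyncTaskError` and `extract_video_urls` are hypothetical.

```python
class LipsyncTaskError(Exception):
    """Raised when a task reports status 'failed'."""

    def __init__(self, code, message):
        super().__init__(f"{code}: {message}")
        self.code = code
        self.message = message


def extract_video_urls(task):
    """Return video URLs from a completed task, or raise if it failed."""
    status = task.get("status")
    if status == "failed":
        error = task.get("error") or {}
        raise LipsyncTaskError(error.get("code"), error.get("error_message"))
    if status != "completed":
        return []  # output is only present once the task has completed

    urls = []
    for item in task.get("output") or []:
        for content in item.get("content") or []:
            if content.get("type") == "video" and content.get("url"):
                urls.append(content["url"])
    return urls
```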


Error Codes

Error Code    Description
014002095     Internal generation error
014002096     Result parsing exception
014002097     HTTP error response
014002099     Sync generation exception
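For client-side logging, the codes listed above can be kept in a small lookup table, as sketched here. The codes are stored as strings, matching the `code string` field in the response; converting them to integers would drop the leading zero. The names `ERROR_DESCRIPTIONS` and `describe_error` are hypothetical.

```python
ERROR_DESCRIPTIONS = {
    "014002095": "Internal generation error",
    "014002096": "Result parsing exception",
    "014002097": "HTTP error response",
    "014002099": "Sync generation exception",
}


def describe_error(code):
    """Map an error code string to its documented description."""
    return ERROR_DESCRIPTIONS.get(code, f"Unknown error code: {code}")
```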