API Notes
The session_id and face_id parameters are returned by the kirin_identify_face API.
Call the face identification API first to obtain these values before using this lip-sync API.
Authentication
authorization string required
All APIs require authentication via Bearer Token.
Get API Key:
Visit API Key Management Page to get your API Key.
Usage:
Add to request header:
Authorization: Bearer YOUR_API_KEY
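As a minimal sketch, the header can be assembled in Python as shown here. The helper name `build_auth_headers` and the `Content-Type` entry are illustrative assumptions and are not part of the documented API.

```python
def build_auth_headers(api_key: str) -> dict[str, str]:
    """Return request headers carrying the Bearer token."""
    if not api_key:
        raise ValueError("api_key must be a non-empty string")
    return {
        "Authorization": f"Bearer {api_key}",
        # Assumption: the request body is sent as JSON.
        "Content-Type": "application/json",
    }
```

The returned dictionary can be passed as the headers argument of whichever HTTP client you use.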
Parameters
model string required
Model ID to use for the request
Options: kirin_lipsync
session_id string required
Session ID, generated by the face identification interface
face_choose array required
Specifies the face to lip-sync, including the face ID, audio reference, etc. Currently only single-person lip sync is supported
  face_id string required
  Face ID, returned by the face identification interface
  audio_id string
  Audio ID generated by the preview interface
  Only supports audio generated within the last 30 days, with a duration between 2 and 60 seconds
  Either audio_id or sound_file must be provided, but not both at the same time
  sound_file string
  Audio file
  Supports Base64-encoded audio or an audio URL (must be accessible)
  Supported formats: .mp3, .wav, .m4a; maximum file size 5MB. A format mismatch or oversized file returns an error code
  Only supports audio with a duration between 2 and 60 seconds
  Either audio_id or sound_file must be provided, but not both at the same time
  The system validates the audio content; any issues return an error code
  sound_start_time long required
  Audio clip start time
  Based on the original audio start time (0ms), unit: ms
  Audio before this point will be clipped. The clipped audio must be at least 2 seconds long
  sound_end_time long required
  Audio clip end time
  Based on the original audio start time (0ms), unit: ms
  Audio after this point will be clipped. The clipped audio must be at least 2 seconds long
  The end time must not exceed the total duration of the original audio
  sound_insert_time long required
  Insert time for the clipped audio
  Based on the video start time (0ms), unit: ms
  The inserted audio time range must overlap the available face lip-sync interval by at least 2 seconds
  The inserted audio must not start earlier than the video start time or end later than the video end time
  sound_volume float
  Audio volume; a higher value means louder volume
  Range: 0-2
  Default: 1
  original_audio_volume float
  Original video volume; a higher value means louder volume
  Has no effect when the original video has no audio
  Range: 0-2
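The face_choose constraints above can be checked on the client before a request is sent. The sketch below is one possible way to do that in Python; the helper name `build_face_choice` is hypothetical, and it assumes that all of the fields from face_id through original_audio_volume belong to a single face_choose entry. Checks that need server-side knowledge (total audio duration, video length, the available lip-sync interval) are deliberately left out.

```python
from typing import Optional

MIN_CLIP_MS = 2_000  # the clipped audio must be at least 2 seconds long


def build_face_choice(
    face_id: str,
    sound_start_time: int,
    sound_end_time: int,
    sound_insert_time: int,
    audio_id: Optional[str] = None,
    sound_file: Optional[str] = None,
    sound_volume: float = 1.0,
    original_audio_volume: Optional[float] = None,
) -> dict:
    """Validate and build one face_choose entry (all times in ms)."""
    # Exactly one audio source: audio_id or sound_file, never both or neither.
    if (audio_id is None) == (sound_file is None):
        raise ValueError("provide exactly one of audio_id or sound_file")
    if sound_start_time < 0 or sound_insert_time < 0:
        raise ValueError("times are offsets from 0ms and must not be negative")
    if sound_end_time - sound_start_time < MIN_CLIP_MS:
        raise ValueError("clipped audio must be at least 2 seconds long")
    volumes = (
        ("sound_volume", sound_volume),
        ("original_audio_volume", original_audio_volume),
    )
    for name, value in volumes:
        if value is not None and not 0 <= value <= 2:
            raise ValueError(f"{name} must be within the range 0-2")

    entry = {
        "face_id": face_id,
        "sound_start_time": sound_start_time,
        "sound_end_time": sound_end_time,
        "sound_insert_time": sound_insert_time,
        "sound_volume": sound_volume,
    }
    if audio_id is not None:
        entry["audio_id"] = audio_id
    else:
        entry["sound_file"] = sound_file
    # No default is documented for original_audio_volume, so omit it if unset.
    if original_audio_volume is not None:
        entry["original_audio_volume"] = original_audio_volume
    return entry
```

For example, `build_face_choice("FACE_ID", 0, 5000, 1000, audio_id="AUDIO_ID")` clips the first five seconds of the audio and inserts them one second into the video. The server remains the authority on validity; this only catches obvious mistakes early.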
external_task_id string
Custom task ID
User-defined task ID. Will not override system-generated task ID, but supports querying tasks by this ID
Please ensure uniqueness per user
callback_url string
Callback notification URL for this task result. If configured, server will actively notify when task status changes
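To show how the top-level parameters fit together, here is a hedged Python sketch that assembles a request body. The function name `build_lipsync_request` is hypothetical, and the endpoint URL is not stated in this section, so sending the request is left to the caller.

```python
import json
from typing import Optional


def build_lipsync_request(
    session_id: str,
    face_choose: list,
    external_task_id: Optional[str] = None,
    callback_url: Optional[str] = None,
) -> dict:
    """Assemble the request body from the documented parameters."""
    # Only single-person lip sync is currently supported.
    if len(face_choose) != 1:
        raise ValueError("face_choose must contain exactly one entry")
    body = {
        "model": "kirin_lipsync",
        "session_id": session_id,
        "face_choose": face_choose,
    }
    # Optional fields are included only when the caller sets them.
    if external_task_id is not None:
        body["external_task_id"] = external_task_id
    if callback_url is not None:
        body["callback_url"] = callback_url
    return body


if __name__ == "__main__":
    example = build_lipsync_request(
        "SESSION_ID",
        [{"face_id": "FACE_ID", "audio_id": "AUDIO_ID",
          "sound_start_time": 0, "sound_end_time": 5000,
          "sound_insert_time": 0}],
        external_task_id="my-task-001",
    )
    print(json.dumps(example, indent=2))
```

Leaving out unset optional fields keeps the payload minimal and avoids sending null values whose handling is not documented here.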
Polling
Since lip-sync video generation takes time, you need to poll the task status after creation
The initial response returns the task ID and initial status. The actual generation results must be obtained through polling the task status endpoint
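A polling loop along these lines is one way to wait for the result. This is a sketch under stated assumptions: the task status endpoint is not given in this section, so the caller supplies a hypothetical `fetch_status` callable that returns the parsed task response as a dictionary with a `status` field. Only `completed` and `failed` are named in this document, so any other status is treated as still in progress. The interval and timeout defaults are arbitrary.

```python
import time
from typing import Callable

TERMINAL_STATUSES = {"completed", "failed"}


def poll_task(
    fetch_status: Callable[[], dict],
    interval_s: float = 5.0,
    timeout_s: float = 600.0,
    sleep: Callable[[float], None] = time.sleep,
) -> dict:
    """Call fetch_status until the task reaches a terminal status."""
    deadline = time.monotonic() + timeout_s
    while True:
        task = fetch_status()
        if task.get("status") in TERMINAL_STATUSES:
            return task
        if time.monotonic() >= deadline:
            raise TimeoutError("task did not reach a terminal status in time")
        # Wait before the next check to avoid hammering the API.
        sleep(interval_s)
```

Injecting `fetch_status` and `sleep` keeps the loop independent of any particular HTTP client and makes it easy to test. If callback_url is configured, polling can serve as a fallback rather than the primary mechanism.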
Response Format
error object
Error information. Only present when status is failed.
  code string
  Error code
  error_message string
  Detailed error message
output array
Generation results. Only present when status is completed.
  content array
  List of generated content
  type string
  Resource type, e.g., video, image
  url string
  Generated content URL
  duration number
  Video duration
  jobId string
  Remote job ID
usage object
Usage statistics. Only present when status is completed.
  cost string
  Total cost in USD
  discount number
  Discount amount
metadata object
Metadata information
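The following Python sketch shows one way a client might read this response. The exact nesting is an assumption: it treats each element of `output` as an object with a `content` list whose items carry `type` and `url`, and it assumes the task response exposes a top-level `status` field. The names `LipSyncFailed` and `extract_video_urls` are hypothetical.

```python
class LipSyncFailed(Exception):
    """Raised when the task response reports status 'failed'."""

    def __init__(self, code: str, message: str):
        super().__init__(f"{code}: {message}")
        self.code = code
        self.message = message


def extract_video_urls(task: dict) -> list[str]:
    """Return the URLs of generated videos, or raise if the task failed."""
    status = task.get("status")
    if status == "failed":
        # The error object is only present when status is failed.
        err = task.get("error") or {}
        raise LipSyncFailed(err.get("code", ""), err.get("error_message", ""))
    if status != "completed":
        return []  # output is only present once the task has completed
    urls = []
    for item in task.get("output", []):
        for content in item.get("content", []):
            if content.get("type") == "video" and content.get("url"):
                urls.append(content["url"])
    return urls
```

If the real payload nests these fields differently, only the two loops need to change.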
Error Codes
| Error Code | Description |
|---|---|
| 014002095 | Internal generation error |
| 014002096 | Result parsing exception |
| 014002097 | HTTP error response |
| 014002099 | Sync generation exception |
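For client-side logging, the table can be mirrored as a lookup. This is a small Python sketch with hypothetical names (`ERROR_DESCRIPTIONS`, `describe_error`). The codes are kept as strings because the `code` field is a string and converting it to an integer would drop the leading zero.

```python
ERROR_DESCRIPTIONS = {
    "014002095": "Internal generation error",
    "014002096": "Result parsing exception",
    "014002097": "HTTP error response",
    "014002099": "Sync generation exception",
}


def describe_error(code: str) -> str:
    """Map an error code to its description from the table above."""
    return ERROR_DESCRIPTIONS.get(code, f"Unknown error code: {code}")
```

Codes that are not listed in the table fall through to a generic message instead of raising, so that new server-side codes do not break the client.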