Kirin Lipsync

API Notes

The session_id and face_id parameters are returned by the kirin_identify_face API.

Call the face identification API first to obtain these values, then call this lip-sync API.

Authentication

authorization `string` required

All APIs require authentication via Bearer Token.

Get API Key:

Visit the API Key Management page to get your API key.

Usage:

Add to request header:

Authorization: Bearer YOUR_API_KEY
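As a minimal sketch of the header described above, the helper below builds the request headers in Python. The function name `build_headers` is hypothetical and not part of the API; the Content-Type value is taken from the curl sample in this document.

```python
def build_headers(api_key: str) -> dict[str, str]:
    """Return the HTTP headers for an authenticated JSON request."""
    if not api_key:
        raise ValueError("api_key must be a non-empty string")
    return {
        "Content-Type": "application/json",
        # Bearer token scheme, as required by every endpoint.
        "Authorization": f"Bearer {api_key}",
    }
```

Keeping header construction in one place makes it harder to send a request without the token by accident.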

Parameters

model `string` required

Model ID to use for the request

Options: kirin_lipsync

session_id `string` required

Session ID, generated by the face identification interface

face_choose `array` required

Specifies the face to lip-sync, including the face ID and the audio reference. Only single-person lip sync is currently supported. Each entry carries the fields below, from face_id through original_audio_volume.

face_id `string` required

Face ID, returned by the face identification interface

audio_id `string`

Audio ID generated by the preview interface.

Only audio generated within the last 30 days, with a duration between 2 and 60 seconds, is supported.

Provide exactly one of audio_id or sound_file, never both.

sound_file `string`

Audio file, given either as Base64-encoded audio or as an audio URL that the service can access.

Supported formats: .mp3, .wav, .m4a. Maximum file size: 5MB. A format mismatch or an oversized file returns an error code.

Audio duration must be between 2 and 60 seconds.

Provide exactly one of audio_id or sound_file, never both.

The system validates the audio content and returns an error code if it finds a problem.
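The audio rules above can be pre-checked on the client before a request is sent. This is only a sketch under stated assumptions: `check_audio_source` is a hypothetical helper, "5MB" is read as 5 MiB, the format of a URL is guessed from its path extension (a heuristic; the server may judge by content instead), and any non-URL value is treated as Base64. Duration checks are left to the server.

```python
import base64
from urllib.parse import urlparse

ALLOWED_EXTENSIONS = (".mp3", ".wav", ".m4a")
MAX_AUDIO_BYTES = 5 * 1024 * 1024  # assumption: "5MB" is read as 5 MiB


def check_audio_source(face: dict) -> None:
    """Raise ValueError when a face_choose entry breaks the audio rules."""
    has_id = bool(face.get("audio_id"))
    has_file = bool(face.get("sound_file"))
    if has_id == has_file:
        # Either both were given or neither was.
        raise ValueError("provide exactly one of audio_id or sound_file")
    if has_id:
        return  # nothing else can be checked locally for a preview audio ID

    sound_file = face["sound_file"]
    if sound_file.startswith(("http://", "https://")):
        # Heuristic: judge the format by the extension of the URL path.
        path = urlparse(sound_file).path.lower()
        if not path.endswith(ALLOWED_EXTENSIONS):
            raise ValueError(f"unsupported audio format: {path}")
    else:
        # Anything that is not a URL is treated as Base64 audio.
        raw = base64.b64decode(sound_file, validate=True)
        if len(raw) > MAX_AUDIO_BYTES:
            raise ValueError("audio exceeds the 5MB limit")
```

A local check like this only catches obvious mistakes early; the server's own validation remains the authority.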

sound_start_time `long` required

Audio clip start time, in ms, measured from the start of the original audio (0ms).

Audio before this point is clipped. The clipped audio must be at least 2 seconds long.

sound_end_time `long` required

Audio clip end time, in ms, measured from the start of the original audio (0ms).

Audio after this point is clipped. The clipped audio must be at least 2 seconds long.

The end time must not exceed the total duration of the original audio.

sound_insert_time `long` required

Time at which the clipped audio is inserted, in ms, measured from the start of the video (0ms).

The inserted audio's time range must overlap the face's available lip-sync interval by at least 2 seconds.

The inserted audio must not start before the video starts or end after the video ends.
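The three timing parameters interact, so it may help to see the constraints worked through in code. The sketch below is illustrative only: `check_timing` and its keyword arguments are hypothetical, and it assumes the caller already knows the audio duration, the video duration, and the face's available lip-sync interval (presumably from the face identification response, whose exact fields are not described here).

```python
MIN_CLIP_MS = 2_000     # clipped audio must last at least 2 seconds
MIN_OVERLAP_MS = 2_000  # required overlap with the face's lip-sync interval


def check_timing(start_ms: int, end_ms: int, insert_ms: int, *,
                 audio_ms: int, video_ms: int,
                 face_start_ms: int, face_end_ms: int) -> tuple[int, int]:
    """Return the (start, end) of the inserted audio on the video timeline.

    Raises ValueError when a documented timing rule is violated.
    """
    if not 0 <= start_ms < end_ms <= audio_ms:
        raise ValueError("clip must lie inside the original audio")
    clip_ms = end_ms - start_ms
    if clip_ms < MIN_CLIP_MS:
        raise ValueError("clipped audio must be at least 2 seconds")

    insert_end_ms = insert_ms + clip_ms
    if insert_ms < 0 or insert_end_ms > video_ms:
        raise ValueError("inserted audio must fit inside the video")

    # Length of the intersection of [insert, insert_end] and the face interval.
    overlap_ms = min(insert_end_ms, face_end_ms) - max(insert_ms, face_start_ms)
    if overlap_ms < MIN_OVERLAP_MS:
        raise ValueError("overlap with the face interval is under 2 seconds")
    return insert_ms, insert_end_ms
```

For the values in this document's sample request (clip 0 to 5000 ms inserted at 0 ms), a 10-second video with the face visible throughout would pass every check.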

sound_volume `float`

Audio volume. Higher values are louder.

Range: 0 - 2

Default: 1

original_audio_volume `float`

Volume of the original video's audio. Higher values are louder.

Has no effect when the original video has no audio.

Range: 0 - 2

external_task_id `string`

Custom task ID

A user-defined task ID. It does not replace the system-generated task ID, but tasks can be queried by it.

Must be unique per user.

callback_url `string`

Callback URL for this task's result. If set, the server sends a notification whenever the task status changes.

Polling

Lip-sync video generation takes time, so poll the task status after creating the task.

The initial response returns only the task ID and initial status. Retrieve the generated results by polling the task status endpoint.
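A polling loop along these lines could look as follows. This is a sketch, not the official client: the task status endpoint is not specified in this document, so the caller passes in a hypothetical `fetch_status` function that returns the task as a dict. Only the two terminal statuses named in the response format (completed and failed) are assumed; the interval and timeout defaults are arbitrary.

```python
import time
from typing import Callable

TERMINAL_STATUSES = {"completed", "failed"}


def poll_task(fetch_status: Callable[[], dict],
              interval_s: float = 5.0,
              timeout_s: float = 600.0,
              sleep: Callable[[float], None] = time.sleep,
              clock: Callable[[], float] = time.monotonic) -> dict:
    """Call fetch_status until the task is completed or failed.

    fetch_status is supplied by the caller and wraps whatever request
    the task status endpoint needs.
    """
    deadline = clock() + timeout_s
    while True:
        task = fetch_status()
        if task.get("status") in TERMINAL_STATUSES:
            return task
        if clock() >= deadline:
            raise TimeoutError("task did not finish before the timeout")
        sleep(interval_s)
```

Injecting `sleep` and `clock` keeps the loop easy to test without real waiting. When callback_url is configured, polling can be run less often or used only as a fallback.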

Response Format

error `object`

Error information. Only present when status is failed.

code `string`

Error code

error_message `string`

Detailed error message

output `array`

Generation results. Only present when status is completed.

content `array`

List of generated content

type `string`

Resource type, e.g., video, image

url `string`

Generated content URL

duration `number`

Video duration

jobId `string`

Remote job ID

usage `object`

Usage statistics. Only present when status is completed.

cost `string`

Total cost in USD

discount `number`

Discount amount

metadata `object`

Metadata information
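To illustrate how the fields above fit together, the sketch below pulls video URLs out of a task response and turns a failed task into an exception. The names `GenerationFailed` and `extract_video_urls` are hypothetical; the structure follows the field list above (output is a list whose items hold a content list).

```python
class GenerationFailed(Exception):
    """Raised when a task response reports status "failed"."""


def extract_video_urls(task: dict) -> list[str]:
    """Return the URLs of generated videos, or [] if the task is not done."""
    status = task.get("status")
    if status == "failed":
        error = task.get("error") or {}
        raise GenerationFailed(
            f"{error.get('code')}: {error.get('error_message')}"
        )
    if status != "completed":
        return []  # output is only present once the task has completed

    urls = []
    for item in task.get("output") or []:
        for content in item.get("content") or []:
            if content.get("type") == "video" and content.get("url"):
                urls.append(content["url"])
    return urls
```

Returning an empty list for unfinished tasks lets the same function be called on every polled response.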

Error Codes

Error Code	Description
014002095	Internal generation error
014002096	Result parsing exception
014002097	HTTP error response
014002099	Sync generation exception
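The table above can be mirrored in a small lookup. In this sketch the codes are kept as strings, since error.code is documented as a string and an integer would lose the leading zero; `describe_error` is a hypothetical helper name.

```python
ERROR_CODES = {
    "014002095": "Internal generation error",
    "014002096": "Result parsing exception",
    "014002097": "HTTP error response",
    "014002099": "Sync generation exception",
}


def describe_error(code: str) -> str:
    """Map an error code from the response to its documented description."""
    return ERROR_CODES.get(code, f"Unknown error code {code}")
```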

curl --location 'https://cloud.vtrix.ai/model/v1/generation' \
  --header 'Content-Type: application/json' \
  --header 'Authorization: Bearer YOUR_API_KEY' \
  --data '{
    "model": "kirin_lipsync",
    "input": [
      {
        "params": {
          "session_id": "949665381905347148",
          "face_choose": [
            {
              "face_id": "0",
              "sound_file": "https://example.com/sample/test-audio.wav",
              "sound_start_time": 0,
              "sound_end_time": 5000,
              "sound_insert_time": 0,
              "sound_volume": 1,
              "original_audio_volume": 0.5
            }
          ]
        }
      }
    ]
  }'
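For readers calling the API from Python, the same request could be assembled with the standard library roughly as follows. This is a sketch: `build_lipsync_request` is a hypothetical helper, the payload shape is copied from the curl sample, and the use of POST is an assumption based on the request carrying a body.

```python
import json
import urllib.request

API_URL = "https://cloud.vtrix.ai/model/v1/generation"


def build_lipsync_request(api_key: str, session_id: str,
                          face: dict) -> urllib.request.Request:
    """Build (but do not send) a lip-sync generation request."""
    payload = {
        "model": "kirin_lipsync",
        "input": [
            {
                "params": {
                    "session_id": session_id,
                    # Only single-person lip sync is supported: one entry.
                    "face_choose": [face],
                }
            }
        ],
    }
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
        method="POST",  # assumption: the endpoint expects POST
    )
```

The returned object can be sent with `urllib.request.urlopen(request)`, and the response body parsed with `json.load`.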

{
  "id": "d5u5obte8783ap44qtj0",
  "created_at": 1769757744021,
  "status": "completed",
  "model": "kirin_lipsync",
  "output": [
    {
      "content": [
        {
          "type": "video",
          "url": "https://example.com/generated-video.mp4",
          "duration": 5,
          "jobId": "remote_job_id_12345"
        }
      ]
    }
  ],
  "usage": {
    "cost": "0.000500",
    "discount": 0,
    "input_tokens": null,
    "output_tokens": null,
    "quantity": 1,
    "time_per_unit": 0,
    "total_tokens": null,
    "unit_price": "0.000500",
    "user_discount": 1
  },
  "metadata": {
    "completed_at": 120.5,
    "in_queue_at": 0,
    "upload_at": 1.2,
    "usage": {
      "input_tokens": 20,
      "input_tokens_details": {
        "text_tokens": 20
      },
      "output_tokens": 0,
      "total_tokens": 20
    }
  }
}
