zsxkib / talknet-asd

🗣️ TalkNet-ASD: Detect who is speaking in a video

  • Public
  • 83 runs
  • T4

Input

file (required)

Path to the video

number
(minimum: 0, maximum: 1)

Scale factor for face detection; by default, frames are scaled to 0.25 of their original size

Default: 0.25

integer

Minimum number of frames for each shot

Default: 10

integer
(minimum: 1)

Number of missed detections allowed before tracking is stopped

Default: 10

integer
(minimum: 1)

Minimum face size in pixels

Default: 1

number
(minimum: 0, maximum: 1)

Scale factor for the bounding box

Default: 0.4

integer
(minimum: 0)

The start time of the video

Default: 0

integer

The duration of the video to extract; when set to -1, the whole video is extracted

Default: -1

boolean

Return results in json format

Default: true

boolean

Return bounding box coordinates as percentages of the video width and height

Default: false
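
The types, bounds, and defaults listed above can be checked locally before a run is submitted. The following is a minimal sketch only: the parameter names (`video_path`, `face_detection_scale`, and so on) and the `build_input` helper are hypothetical, since this page shows descriptions rather than field names, and the model's actual input keys may differ.

```python
# Hypothetical parameter names; only the defaults and bounds come from the list above.
DEFAULTS = {
    "face_detection_scale": 0.25,   # number, 0 to 1
    "min_frames_per_shot": 10,      # integer, no bounds listed
    "max_missed_detections": 10,    # integer, minimum 1
    "min_face_size": 1,             # integer, minimum 1 (pixels)
    "bbox_scale": 0.4,              # number, 0 to 1
    "start_time": 0,                # integer, minimum 0
    "duration": -1,                 # integer, -1 means the whole video
    "return_json": True,            # boolean
    "bbox_as_percentages": False,   # boolean
}

# (minimum, maximum) per parameter; None means no bound is listed.
BOUNDS = {
    "face_detection_scale": (0, 1),
    "max_missed_detections": (1, None),
    "min_face_size": (1, None),
    "bbox_scale": (0, 1),
    "start_time": (0, None),
}


def build_input(video_path, **overrides):
    """Merge overrides into the defaults and check the listed bounds."""
    unknown = set(overrides) - set(DEFAULTS)
    if unknown:
        raise ValueError(f"unknown parameters: {sorted(unknown)}")

    params = {**DEFAULTS, **overrides}
    for name, (low, high) in BOUNDS.items():
        value = params[name]
        if low is not None and value < low:
            raise ValueError(f"{name}={value} is below the minimum {low}")
        if high is not None and value > high:
            raise ValueError(f"{name}={value} is above the maximum {high}")

    # The video path is the only required input.
    return {"video_path": video_path, **params}
```

For example, `build_input("clip.mp4", start_time=5, duration=30)` returns a dictionary with the remaining values at their defaults, while `build_input("clip.mp4", bbox_scale=1.5)` raises `ValueError` because the value exceeds the listed maximum of 1.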

Output

Run time and cost

This model runs on Nvidia T4 GPU hardware. We don't yet have enough runs of this model to provide performance information.

Readme

Is someone talking? TalkNet: Audio-visual active speaker detection Model