jlamoreaux/voice-and-video

Given a brief prompt, minor context, and a starting image, create a script, video, and voice in one go.

Public

6 runs

Run jlamoreaux/voice-and-video with an API

Use one of our client libraries to get started quickly.

Input schema

The fields you can use to run this model with an API. If you don't give a value for a field, its default value is used.

Field	Type	Default value	Description
user_input	string		User's message or question to the chat agent
person_image	string		Image of the person who will appear to be talking in the video
agent_instructions	string	You are a helpful and knowledgeable assistant. Provide clear, concise, and informative responses.	Instructions that define the agent's role and behavior
context	string		Additional context or background information for the agent
voice	string	Emma	Voice to use for speech synthesis (one of: Emma, Sophia, Isabella, Marcus, David, Alexander)
personality	string	Professional	Speaking personality and style (one of: Professional, Bubbly, Chill)
response_length	string	medium	Desired length of response (short: ~500 chars, medium: ~1500 chars, long: ~3000 chars)

{
  "type": "object",
  "title": "Input",
  "required": [
    "user_input",
    "person_image"
  ],
  "properties": {
    "voice": {
      "enum": [
        "Emma",
        "Sophia",
        "Isabella",
        "Marcus",
        "David",
        "Alexander"
      ],
      "type": "string",
      "title": "voice",
      "description": "Voice to use for speech synthesis",
      "default": "Emma",
      "x-order": 4
    },
    "context": {
      "type": "string",
      "title": "Context",
      "default": "",
      "x-order": 3,
      "description": "Additional context or background information for the agent"
    },
    "user_input": {
      "type": "string",
      "title": "User Input",
      "x-order": 0,
      "description": "User's message or question to the chat agent"
    },
    "personality": {
      "enum": [
        "Professional",
        "Bubbly",
        "Chill"
      ],
      "type": "string",
      "title": "personality",
      "description": "Speaking personality and style",
      "default": "Professional",
      "x-order": 5
    },
    "person_image": {
      "type": "string",
      "title": "Person Image",
      "format": "uri",
      "x-order": 1,
      "description": "Image of the person who will appear to be talking in the video"
    },
    "response_length": {
      "enum": [
        "short",
        "medium",
        "long"
      ],
      "type": "string",
      "title": "response_length",
      "description": "Desired length of response (short: ~500 chars, medium: ~1500 chars, long: ~3000 chars)",
      "default": "medium",
      "x-order": 6
    },
    "agent_instructions": {
      "type": "string",
      "title": "Agent Instructions",
      "default": "You are a helpful and knowledgeable assistant. Provide clear, concise, and informative responses.",
      "x-order": 2,
      "description": "Instructions that define the agent's role and behavior"
    }
  }
}
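As an illustration of how the schema's required fields, defaults, and enums fit together, here is a minimal Python sketch that assembles an input payload and validates it locally before any request is sent. The helper name `build_input` and the example image URL are hypothetical and not part of the model's API; the allowed values and defaults are copied from the schema above.

```python
from typing import Any, Dict

# Allowed values copied from the input schema above.
VOICES = {"Emma", "Sophia", "Isabella", "Marcus", "David", "Alexander"}
PERSONALITIES = {"Professional", "Bubbly", "Chill"}
RESPONSE_LENGTHS = {"short", "medium", "long"}

# Defaults for the optional fields, as listed in the schema.
DEFAULTS: Dict[str, Any] = {
    "agent_instructions": (
        "You are a helpful and knowledgeable assistant. "
        "Provide clear, concise, and informative responses."
    ),
    "context": "",
    "voice": "Emma",
    "personality": "Professional",
    "response_length": "medium",
}


def build_input(user_input: str, person_image: str, **overrides: Any) -> Dict[str, Any]:
    """Hypothetical helper: merge overrides onto the defaults and validate enums."""
    if not user_input:
        raise ValueError("user_input is required")
    if not person_image:
        raise ValueError("person_image is required")

    unknown = set(overrides) - set(DEFAULTS)
    if unknown:
        raise ValueError(f"unknown fields: {sorted(unknown)}")

    payload = {
        "user_input": user_input,
        "person_image": person_image,
        **DEFAULTS,
        **overrides,
    }

    # Reject values outside the enums declared in the schema.
    checks = {
        "voice": VOICES,
        "personality": PERSONALITIES,
        "response_length": RESPONSE_LENGTHS,
    }
    for field, allowed in checks.items():
        if payload[field] not in allowed:
            raise ValueError(f"{field} must be one of {sorted(allowed)}")
    return payload


if __name__ == "__main__":
    # The image URL is a placeholder, not a real asset.
    print(build_input("What is a black hole?", "https://example.com/face.png", voice="Marcus"))
```

The resulting dictionary could then be passed as the input to whichever client library you use. The exact model reference to pass to the client (for example, whether a version suffix is needed) is not stated here, so check the model page before calling it.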

Output schema

The shape of the response you’ll get when you run this model with an API.

{
  "type": "object",
  "title": "Output",
  "required": [
    "video",
    "audio",
    "transcript"
  ],
  "properties": {
    "audio": {
      "type": "string",
      "title": "Audio",
      "format": "uri"
    },
    "video": {
      "type": "string",
      "title": "Video",
      "format": "uri"
    },
    "transcript": {
      "type": "string",
      "title": "Transcript"
    }
  }
}
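To show how a caller might consume a response of this shape, the following Python sketch checks that the three required fields are present and wraps them in a small dataclass. The names `ModelOutput` and `parse_output` are hypothetical. The `str()` coercion assumes a client library might wrap the URIs in its own object type, which this page does not confirm.

```python
from dataclasses import dataclass
from typing import Any, Mapping


@dataclass(frozen=True)
class ModelOutput:
    """Hypothetical container mirroring the output schema."""
    video: str       # URI of the generated video
    audio: str       # URI of the generated audio
    transcript: str  # Text of the generated script


def parse_output(raw: Mapping[str, Any]) -> ModelOutput:
    """Validate the required fields and return a typed result."""
    missing = [key for key in ("video", "audio", "transcript") if key not in raw]
    if missing:
        raise KeyError(f"output is missing required fields: {missing}")
    # str() is defensive: it leaves plain strings unchanged.
    return ModelOutput(
        video=str(raw["video"]),
        audio=str(raw["audio"]),
        transcript=str(raw["transcript"]),
    )


if __name__ == "__main__":
    # Placeholder values for illustration only; not real model output.
    sample = {
        "video": "https://example.com/out.mp4",
        "audio": "https://example.com/out.mp3",
        "transcript": "Hello there.",
    }
    print(parse_output(sample).transcript)
```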