Media utilities

These models provide utility functions for working with media like images, audio, and video. They serve as convenient building blocks for media processing pipelines and workflows.

Frequently asked questions

Which models are the fastest in the Media Utilities collection?

If you need quick, low-overhead processing—like extracting frames or audio—models such as lucataco/frame-extractor and lucataco/extract-audio are some of the speedier options. These utilities focus on simple transformations, so they typically run faster than more complex generation models.
Keep in mind that performance still depends on input file size and format.
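
For a quick call from code, the Replicate Python client keeps this to a few lines. The sketch below is a minimal example assuming the model accepts a video file input named "video"; the exact parameter names and output format are listed on the model page.

```python
import replicate

# Minimal sketch: pull the audio track out of a local video.
# The "video" input key and the output shape are assumptions;
# check the model page for the real schema.
output = replicate.run(
    "lucataco/extract-audio",
    input={"video": open("clip.mp4", "rb")},
)
print(output)  # typically a URL or file object pointing to the extracted audio
```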

Which models offer the best mix of flexibility and utility?

For more advanced workflows, models like fictions-ai/autocaption, charlesmccarthy/addwatermark, and falcons-ai/nsfw_image_detection add extra functionality such as captioning, watermarking, or filtering content.
If your workflow involves bulk processing or automation, combining lightweight extractors with these more feature-rich utilities can give you a solid balance between speed and capability.
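
As a sketch of that kind of combination, the snippet below chains a frame extractor with an NSFW classifier via the Replicate Python client. The input keys and output shapes here are assumptions; check each model's page for its actual schema.

```python
import replicate

# Hypothetical chain: split a video into frames, then classify each frame.
# Input keys ("video", "image") and output shapes are assumptions; consult
# each model's page for the real schema.
frames = replicate.run(
    "lucataco/frame-extractor",
    input={"video": open("clip.mp4", "rb")},
)

for frame in frames:
    verdict = replicate.run(
        "falcons-ai/nsfw_image_detection",
        input={"image": frame},
    )
    print(frame, verdict)
```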

What works best when I need to automate captioning or watermarking?

Models like fictions-ai/autocaption and charlesmccarthy/addwatermark are built for overlay tasks, so they slot easily into automated captioning or watermarking pipelines.

What should I pick when I need to split a video into frames or extract audio?

For low-level media manipulation, lightweight extractors such as lucataco/frame-extractor and lucataco/extract-audio handle frame splitting and audio extraction with minimal overhead.

How do the main types of utility models differ?

  • Extract/convert utilities: Handle transformations like frame extraction, audio splitting, or merging images.
  • Overlay/detection utilities: Add captions, watermarks, or detect content (e.g., NSFW classifiers).
  • Pipeline tools: Work well in automated workflows or chained processing tasks.

In general, these utilities are simpler and faster than generative models, but they’re often part of bigger workflows.

What kinds of outputs can I expect?

Utility models usually return:

  • Processed media (e.g., frames, audio files, or a video with overlays).
  • Image or video files with text or watermarks added.
  • Metadata or classification results (e.g., NSFW detection).

Always check the model page for exact input and output formats (e.g., MP4, WAV, PNG).
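
As a rough illustration of handling those outputs, the sketch below saves a model's results locally. It assumes the model returns plain URL strings; some models and newer client versions return file objects instead, so treat this handling as a starting point.

```python
import replicate
import urllib.request

# Sketch: run a utility model and save its results to disk.
# Assumes the output is a URL string or a list of URL strings;
# check the model page for what it actually returns.
output = replicate.run(
    "lucataco/frame-extractor",
    input={"video": open("clip.mp4", "rb")},
)

urls = output if isinstance(output, list) else [output]
for i, url in enumerate(urls):
    urllib.request.urlretrieve(url, f"frame_{i:04d}.png")
```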

How can I create or publish my own utility model?

You can package your own processing script or pipeline with Cog and publish it to Replicate under the Media Utilities collection.
Clearly define your input/output types (e.g., video → frames), set versioning, and configure sharing or pricing if needed.
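
To give a sense of what that looks like, here is a minimal, hypothetical Cog predictor that splits a video into frames. It is a sketch rather than an existing model: it assumes a cog.yaml that pins a Python version, lists ffmpeg as a system package, and points predict to "predict.py:Predictor".

```python
# predict.py: a minimal, hypothetical Cog predictor that splits a video
# into frames. Assumes cog.yaml declares ffmpeg as a system package.
import subprocess
from pathlib import Path as LocalPath
from typing import List

from cog import BasePredictor, Input, Path


class Predictor(BasePredictor):
    def predict(
        self,
        video: Path = Input(description="Video to split into frames"),
        fps: int = Input(description="Frames per second to extract", default=1),
    ) -> List[Path]:
        out_dir = LocalPath("/tmp/frames")
        out_dir.mkdir(parents=True, exist_ok=True)
        # Dump frames at the requested rate with ffmpeg
        subprocess.run(
            [
                "ffmpeg", "-y",
                "-i", str(video),
                "-vf", f"fps={fps}",
                str(out_dir / "frame_%04d.png"),
            ],
            check=True,
        )
        return [Path(p) for p in sorted(out_dir.glob("frame_*.png"))]
```

After testing locally with cog predict, cog push uploads the model to Replicate, where you can configure visibility and pricing.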

Can I use these utility models for commercial work?

Many models in the Media Utilities collection support commercial use, but licenses vary. Check each model’s card for attribution requirements or restrictions before using them in production workflows.

How do I use a utility model on Replicate?

  1. Choose a model from the Media Utilities collection.
  2. Upload or link your media file (video, image, or audio).
  3. Set any options (e.g., time segments, overlay text, or detection settings).
  4. Run the model to process the file.
  5. Download the result and use it in your workflow or automation pipeline.
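
The same five steps map directly onto the Python client. The sketch below assumes a watermarking model that takes a video file and a watermark text option; the real parameter names are on the model's page.

```python
import replicate

# Steps 1-5 in code: pick a model, supply the media file and options,
# run it, and use the result. The "video" and "watermark" input keys
# are assumptions; check the model page for the actual parameters.
output = replicate.run(
    "charlesmccarthy/addwatermark",
    input={
        "video": open("clip.mp4", "rb"),
        "watermark": "My Channel",
    },
)
print(output)  # usually a link to the processed file, ready to download
```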

What should I keep in mind when working with utility models?

  • Clean, well-formatted input files give the most predictable results.
  • Some utilities expect specific formats (e.g., MP4 for video, WAV for audio).
  • If you’re processing files at scale, plan ahead for run time and cost.
  • These models are best used as building blocks in a workflow, not for generating new creative content.
  • Always test on a few files before automating a large job.
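
For that last point, a dry run before a big batch can be as simple as the sketch below; the model choice and the "video" input key are assumptions.

```python
import replicate
from pathlib import Path

# Sketch: sanity-check a utility model on a handful of files before
# automating the full batch. Model and input key are assumptions.
sample = sorted(Path("videos").glob("*.mp4"))[:3]

for clip in sample:
    with open(clip, "rb") as f:
        output = replicate.run("lucataco/extract-audio", input={"video": f})
    print(clip.name, "->", output)
```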