nvidia/pdf-to-podcast | Run with an API on Replicate

Run time and cost

This model costs approximately $0.014 to run on Replicate, or 71 runs per $1, but this varies depending on your inputs. It is also open source and you can run it on your own computer with Docker.

This model runs on CPU hardware. Predictions typically complete within 139 seconds. The predict time for this model varies significantly based on the inputs.
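As a rough budgeting aid, the approximate figures above can be turned into a small calculation. This is a minimal sketch that assumes the quoted per-run cost and typical run time hold for your inputs, which they may not, since both vary. The function and constant names are illustrative only.

```python
import math

# Approximate figures quoted above; real values vary with the inputs.
COST_PER_RUN_USD = 0.014
TYPICAL_RUN_SECONDS = 139


def runs_per_dollar(cost_per_run: float = COST_PER_RUN_USD) -> int:
    """Whole number of runs that fit in one dollar."""
    return math.floor(1 / cost_per_run)


def estimate_batch(n_runs: int,
                   cost_per_run: float = COST_PER_RUN_USD,
                   seconds_per_run: float = TYPICAL_RUN_SECONDS):
    """Return (estimated cost in USD, estimated sequential run time in minutes)."""
    cost = n_runs * cost_per_run
    minutes = n_runs * seconds_per_run / 60
    return cost, minutes


if __name__ == "__main__":
    cost, minutes = estimate_batch(100)
    print(f"{runs_per_dollar()} runs per $1")
    print(f"100 runs: about ${cost:.2f}, about {minutes:.0f} minutes if run one after another")
```

Treat the result as an order-of-magnitude estimate, not a quote.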

Readme

This NVIDIA AI blueprint transforms PDFs into engaging audio content. Built on NVIDIA NIM, the blueprint is flexible and can run securely on a private network, delivering actionable insight without sharing sensitive data.

The blueprint accepts a Target PDF and, optionally, multiple Context PDFs. The Target PDF is the main source of information for the generated transcript, while the Context PDFs serve as additional reference material for the agent. The user can also specify an optional guide prompt that gives the agent-generated transcript a focus (e.g. “Focus on the key drivers for NVIDIA’s Q3 earnings report”).
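The three inputs described above (one required Target PDF, optional Context PDFs, optional guide prompt) can be sketched as a small payload builder. The field names `target_pdf`, `context_pdfs`, and `guide_prompt` are assumptions made for illustration; they are not taken from the model's published input schema, so check the schema on the model page before using them.

```python
from typing import Any, Dict, Optional, Sequence


def build_podcast_input(target_pdf: str,
                        context_pdfs: Optional[Sequence[str]] = None,
                        guide_prompt: Optional[str] = None) -> Dict[str, Any]:
    """Assemble an input payload for the blueprint.

    All key names below are hypothetical placeholders, not the model's
    confirmed input names.
    """
    if not target_pdf:
        raise ValueError("a Target PDF is required")

    payload: Dict[str, Any] = {"target_pdf": target_pdf}

    # Context PDFs are optional supporting references for the agent.
    if context_pdfs:
        payload["context_pdfs"] = list(context_pdfs)

    # The guide prompt is optional and steers the focus of the transcript.
    if guide_prompt:
        payload["guide_prompt"] = guide_prompt

    return payload


if __name__ == "__main__":
    example = build_podcast_input(
        "https://example.com/q3-report.pdf",  # placeholder URL
        context_pdfs=["https://example.com/q2-report.pdf"],  # placeholder URL
        guide_prompt="Focus on the key drivers for NVIDIA's Q3 earnings report",
    )
    print(example)
```

Once the real input names are confirmed, a dictionary like this could be passed as the `input` argument of the Replicate Python client's `replicate.run` call for `nvidia/pdf-to-podcast`.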

Software Components

NVIDIA NIM microservices - Response generation (Inference)
Document ingest and extraction - Docling
Text-to-speech - ElevenLabs
Redis - Redis
Storage - MinIO

Deploy

To run this blueprint securely on a private network, take a look at the instructions in the GitHub repository.