Example input:
{
  "file": "https://replicate.delivery/pbxt/NDm0JLRLZiYuq9mDaTixxeV2eDy6GyFtQgWzQvC0a2h1npA3/page_1.pdf",
  "output_format": "markdown_content"
}
Install Replicate's Node.js client library:
npm install replicate
Set the REPLICATE_API_TOKEN environment variable:
export REPLICATE_API_TOKEN=r8_cXL**********************************
This is your API token. Keep it to yourself.
Import and set up the client:
import Replicate from "replicate";
const replicate = new Replicate({
  auth: process.env.REPLICATE_API_TOKEN,
});
Run bytedance/dolphin using Replicate’s API. Check out the model's schema for an overview of inputs and outputs.
const output = await replicate.run(
  "bytedance/dolphin:19f1ad93970c2bf21442a842d01d97fb04a94a69d2b36dee43531a9cbae07e85",
  {
    input: {
      file: "https://replicate.delivery/pbxt/NDm0JLRLZiYuq9mDaTixxeV2eDy6GyFtQgWzQvC0a2h1npA3/page_1.pdf",
      output_format: "markdown_content"
    }
  }
);
console.log(output);
To learn more, take a look at the guide on getting started with Node.js.
Install Replicate's Python client library:
pip install replicate
Set the REPLICATE_API_TOKEN environment variable:
export REPLICATE_API_TOKEN=r8_cXL**********************************
This is your API token. Keep it to yourself.
Import the client:
import replicate
Run bytedance/dolphin using Replicate’s API. Check out the model's schema for an overview of inputs and outputs.
output = replicate.run(
    "bytedance/dolphin:19f1ad93970c2bf21442a842d01d97fb04a94a69d2b36dee43531a9cbae07e85",
    input={
        "file": "https://replicate.delivery/pbxt/NDm0JLRLZiYuq9mDaTixxeV2eDy6GyFtQgWzQvC0a2h1npA3/page_1.pdf",
        "output_format": "markdown_content"
    }
)
print(output)
To learn more, take a look at the guide on getting started with Python.
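With output_format set to markdown_content, the model returns the page as a single Markdown string (see the example prediction further below), so the result can be written straight to a local file. A minimal sketch of that, assuming the output comes back as a plain string and using a hypothetical filename page_1.md:
import replicate

# Run Dolphin and request the page as Markdown (same inputs as above).
output = replicate.run(
    "bytedance/dolphin:19f1ad93970c2bf21442a842d01d97fb04a94a69d2b36dee43531a9cbae07e85",
    input={
        "file": "https://replicate.delivery/pbxt/NDm0JLRLZiYuq9mDaTixxeV2eDy6GyFtQgWzQvC0a2h1npA3/page_1.pdf",
        "output_format": "markdown_content",
    },
)

# The example prediction below returns the extracted text as one Markdown
# string; write it to disk ("page_1.md" is a hypothetical name).
with open("page_1.md", "w", encoding="utf-8") as f:
    f.write(output)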
Set the REPLICATE_API_TOKEN environment variable:
export REPLICATE_API_TOKEN=r8_cXL**********************************
This is your API token. Keep it to yourself.
Run bytedance/dolphin using Replicate’s API. Check out the model's schema for an overview of inputs and outputs.
curl -s -X POST \
  -H "Authorization: Bearer $REPLICATE_API_TOKEN" \
  -H "Content-Type: application/json" \
  -H "Prefer: wait" \
  -d $'{
    "version": "bytedance/dolphin:19f1ad93970c2bf21442a842d01d97fb04a94a69d2b36dee43531a9cbae07e85",
    "input": {
      "file": "https://replicate.delivery/pbxt/NDm0JLRLZiYuq9mDaTixxeV2eDy6GyFtQgWzQvC0a2h1npA3/page_1.pdf",
      "output_format": "markdown_content"
    }
  }' \
  https://api.replicate.com/v1/predictions
To learn more, take a look at Replicate’s HTTP API reference docs.
A successful request returns a prediction object like the one below; the extracted Markdown is in the output field:
{
"id": "ctehr7x2nxrgc0cqhx5rchw1z4",
"model": "bytedance/dolphin",
"version": "19f1ad93970c2bf21442a842d01d97fb04a94a69d2b36dee43531a9cbae07e85",
"input": {
"file": "https://replicate.delivery/pbxt/NDm0JLRLZiYuq9mDaTixxeV2eDy6GyFtQgWzQvC0a2h1npA3/page_1.pdf",
"output_format": "markdown_content"
},
"logs": "2025-06-20 21:45:00.684 | INFO | predict:_process_document:86 - Document has 1 page(s)\n2025-06-20 21:45:00.871 | INFO | predict:_process_document:107 - Processing page 1/1\nLegacy behavior is being used. The current behavior will be deprecated in version 5.0.0. In the new behavior, if both images and text are provided, the default value of `add_special_tokens` will be changed to `False` when calling the tokenizer if `add_special_tokens` is unset. To test the new behavior, set `legacy=False`as a processor call argument.",
"output": "# LLaMA: Open and Efficient Foundation Language Models\n\nHugo Touvron; Thibaut Lavril; Gautier Izacard; Xavier Martinet Marie-Anne Lachaux, Timothee Lacroix, Baptiste Rozière, Naman Goyal Eric Hambro, Faisal Azhar, Aurelien Rodriguez, Armand Joulin Edouard Grave; Guillaume Lample $^⋆$\n\nMeta AI\n\n## Abstract\n\nWe introduce LLaMA, a collection of foundation language models ranging from 7B to 65B parameters. We train our models on trillions of tokens, and show that it is possible to train state-of-the-art models using publicly available datasets exclusively, without resorting to proprietary and inaccessible datasets. In particular, LLaMA-13B outperforms GPT-3 (17SB) on most benchmarks, and LLaMA65B is competitive with the best models. Chinchilla-70B and PaLM-540B. We release all our models to the research community $^1$ .\n\n## 1 Introduction\n\nLarge Languages Models (LLMs) trained on massive corpora of texts have shown their ability to perform new tasks from textual instructions or from a few examples ( Brown et al. , 2020 ) . These few-shot properties first appeared when scaling models to a sufficient size ( Kaplan et al. , 2020 ) , resulting in a line of work that focuses on further scaling these models ( Chowdhery et al. , 2022 ; Rae et al. , 2021 ) . These efforts are based on the assumption that more parameters will lead to better performance. However, recent work from Hoffmann et al. ( 2022 ) shows that, for a given compute budget, the best performances are not achieved by the largest models, but by smaller models trained on more data.\n\nThe objective of the scaling laws from Hoffmann et al. ( 2022 ) is to determine how to best scale the dataset and model sizes for a particular training compute budget. However, this objective disregards the inference budget, which becomes critical when serving a language model at scale. In this context, given a target level of performance, the preferred model is not the fastest to train but the fastest at inference, and although it may be cheaper to train a large model to reach a certain level of\n\nperformance, a smaller one trained longer will ultimately be cheaper at inference. For instance, although Hoffmann et al. ( 2022 ) recommends training a 10B model on 200B tokens, we find that the performance of a 7B model continues to improve even after 1T tokens.\n\nThe focus of this work is to train a series of language models that achieve the best possible performance at various inference budgets, by training on more tokens than what is typically used. The resulting models, called LLaMA , ranges from 7B to 65B parameters with competitive performance compared to the best existing LLMs. For instance, LLaMA-13B outperforms GPT-3 on most benchmarks, despite being 10 $\\times$ smaller. We believe that this model will help democratize the access and study of LLMs, since it can be run on a single GPU. At the higher-end of the scale, our 65B-parameter model is also competitive with the best large language models such as Chinchilla or PaLM-540B.\n\nUnlike Chinchilla, PaLM, or GPT-3, we only use publicly available data, making our work compatible with open-sourcing, while most existing models rely on data which is either not publicly available or undocumented (e.g. “ Books – 2TB\" or “ Social media conversations\"). There exist some exceptions, notably OPT ( Zhang et al. , 2022 ) , GPT-NeoX ( Black et al. , 2022 ) , BLOOM ( Scao et al. , 2022 ) and GLM ( Zeng et al. 
, 2022 ) , but none that are competitive with PaLM-62B or Chinchilla.\n\nIn the rest of this paper, we present an overview of the modifications we made to the transformer architecture ( Vaswani et al. , 2017 ) , as well as our training method. We then report the performance of our models and compare with others LLMs on a set of standard benchmarks. Finally, we expose some of the biases and toxicity encoded in our models, using some of the most recent benchmarks from the responsible AI community.\n\n* Equal contribution. Correspondence: (htouvron, thibautlav,gizacard,egrave,glample)@meta.com\n\nhttps://github.com/facebookresearch/llam\n\narXiv:2302.1397lv1 [cs.CL] 27 Feb 2023\n\n",
"data_removed": false,
"error": null,
"source": "web",
"status": "succeeded",
"created_at": "2025-06-20T21:43:08.207Z",
"started_at": "2025-06-20T21:45:00.520769Z",
"completed_at": "2025-06-20T21:45:08.835304Z",
"urls": {
"cancel": "https://api.replicate.com/v1/predictions/ctehr7x2nxrgc0cqhx5rchw1z4/cancel",
"get": "https://api.replicate.com/v1/predictions/ctehr7x2nxrgc0cqhx5rchw1z4",
"web": "https://replicate.com/p/ctehr7x2nxrgc0cqhx5rchw1z4"
},
"metrics": {
"predict_time": 8.314534958,
"total_time": 120.628304
}
}
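The curl example sends a Prefer: wait header, which asks the API to hold the request open until the prediction finishes; if it returns while the status is still starting or processing, the prediction can be re-fetched from the get URL in the urls object above. A minimal polling sketch, assuming the requests library as the HTTP client (an assumption, not something Replicate requires) and reusing the prediction ID from the example response:
import os
import time

import requests

API_TOKEN = os.environ["REPLICATE_API_TOKEN"]
# The "get" URL from the example prediction object above.
GET_URL = "https://api.replicate.com/v1/predictions/ctehr7x2nxrgc0cqhx5rchw1z4"

# Poll until the prediction reaches a terminal state.
while True:
    resp = requests.get(GET_URL, headers={"Authorization": f"Bearer {API_TOKEN}"})
    resp.raise_for_status()
    prediction = resp.json()
    if prediction["status"] in ("succeeded", "failed", "canceled"):
        break
    time.sleep(1)

if prediction["status"] == "succeeded":
    print(prediction["output"])   # the extracted Markdown
    print(prediction["metrics"])  # e.g. predict_time, in seconds
else:
    print(prediction["error"])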