Example input:
{
  "file": "https://replicate.delivery/pbxt/NDm0JLRLZiYuq9mDaTixxeV2eDy6GyFtQgWzQvC0a2h1npA3/page_1.pdf",
  "output_format": "markdown_content"
}
Install Replicate's Node.js client library:
npm install replicate
Set the REPLICATE_API_TOKEN environment variable:
export REPLICATE_API_TOKEN=r8_cXL**********************************
This is your API token. Keep it to yourself.
Import and set up the client:
import Replicate from "replicate";
const replicate = new Replicate({
  auth: process.env.REPLICATE_API_TOKEN,
});
Run bytedance/dolphin using Replicate’s API. Check out the model's schema for an overview of inputs and outputs.
const output = await replicate.run(
  "bytedance/dolphin:19f1ad93970c2bf21442a842d01d97fb04a94a69d2b36dee43531a9cbae07e85",
  {
    input: {
      file: "https://replicate.delivery/pbxt/NDm0JLRLZiYuq9mDaTixxeV2eDy6GyFtQgWzQvC0a2h1npA3/page_1.pdf",
      output_format: "markdown_content"
    }
  }
);
console.log(output);
To learn more, take a look at the guide on getting started with Node.js.
Install Replicate's Python client library:
pip install replicate
Set the REPLICATE_API_TOKEN environment variable:
export REPLICATE_API_TOKEN=r8_cXL**********************************
This is your API token. Keep it to yourself.
Import the client:
import replicate
Run bytedance/dolphin using Replicate’s API. Check out the model's schema for an overview of inputs and outputs.
output = replicate.run(
    "bytedance/dolphin:19f1ad93970c2bf21442a842d01d97fb04a94a69d2b36dee43531a9cbae07e85",
    input={
        "file": "https://replicate.delivery/pbxt/NDm0JLRLZiYuq9mDaTixxeV2eDy6GyFtQgWzQvC0a2h1npA3/page_1.pdf",
        "output_format": "markdown_content"
    }
)
print(output)
To learn more, take a look at the guide on getting started with Python.
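With output_format set to markdown_content, the model returns the page as a single Markdown string (see the example prediction further below), so the result can be written straight to a local file. A minimal sketch of that, assuming the output comes back as a plain string and using a hypothetical filename page_1.md:
import replicate

# Run Dolphin and request the page as Markdown (same inputs as above).
output = replicate.run(
    "bytedance/dolphin:19f1ad93970c2bf21442a842d01d97fb04a94a69d2b36dee43531a9cbae07e85",
    input={
        "file": "https://replicate.delivery/pbxt/NDm0JLRLZiYuq9mDaTixxeV2eDy6GyFtQgWzQvC0a2h1npA3/page_1.pdf",
        "output_format": "markdown_content",
    },
)

# The example prediction below returns the extracted text as one Markdown
# string; write it to disk ("page_1.md" is a hypothetical name).
with open("page_1.md", "w", encoding="utf-8") as f:
    f.write(output)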
Set the REPLICATE_API_TOKEN environment variable:
export REPLICATE_API_TOKEN=r8_cXL**********************************
This is your API token. Keep it to yourself.
Run bytedance/dolphin using Replicate’s API. Check out the model's schema for an overview of inputs and outputs.
curl -s -X POST \
  -H "Authorization: Bearer $REPLICATE_API_TOKEN" \
  -H "Content-Type: application/json" \
  -H "Prefer: wait" \
  -d $'{
    "version": "bytedance/dolphin:19f1ad93970c2bf21442a842d01d97fb04a94a69d2b36dee43531a9cbae07e85",
    "input": {
      "file": "https://replicate.delivery/pbxt/NDm0JLRLZiYuq9mDaTixxeV2eDy6GyFtQgWzQvC0a2h1npA3/page_1.pdf",
      "output_format": "markdown_content"
    }
  }' \
  https://api.replicate.com/v1/predictions
To learn more, take a look at Replicate’s HTTP API reference docs.
A successful request returns a prediction object like the one below; the extracted Markdown is in the output field:
{
"id": "ctehr7x2nxrgc0cqhx5rchw1z4",
"model": "bytedance/dolphin",
"version": "19f1ad93970c2bf21442a842d01d97fb04a94a69d2b36dee43531a9cbae07e85",
"input": {
"file": "https://replicate.delivery/pbxt/NDm0JLRLZiYuq9mDaTixxeV2eDy6GyFtQgWzQvC0a2h1npA3/page_1.pdf",
"output_format": "markdown_content"
},
"logs": "2025-06-20 21:45:00.684 | INFO | predict:_process_document:86 - Document has 1 page(s)\n2025-06-20 21:45:00.871 | INFO | predict:_process_document:107 - Processing page 1/1\nLegacy behavior is being used. The current behavior will be deprecated in version 5.0.0. In the new behavior, if both images and text are provided, the default value of `add_special_tokens` will be changed to `False` when calling the tokenizer if `add_special_tokens` is unset. To test the new behavior, set `legacy=False`as a processor call argument.",
"output": "# LLaMA: Open and Efficient Foundation Language Models\n\nHugo Touvron; Thibaut Lavril; Gautier Izacard; Xavier Martinet Marie-Anne Lachaux, Timothee Lacroix, Baptiste Rozière, Naman Goyal Eric Hambro, Faisal Azhar, Aurelien Rodriguez, Armand Joulin Edouard Grave; Guillaume Lample $^⋆$\n\nMeta AI\n\n## Abstract\n\nWe introduce LLaMA, a collection of foundation language models ranging from 7B to 65B parameters. We train our models on trillions of tokens, and show that it is possible to train state-of-the-art models using publicly available datasets exclusively, without resorting to proprietary and inaccessible datasets. In particular, LLaMA-13B outperforms GPT-3 (17SB) on most benchmarks, and LLaMA65B is competitive with the best models. Chinchilla-70B and PaLM-540B. We release all our models to the research community $^1$ .\n\n## 1 Introduction\n\nLarge Languages Models (LLMs) trained on massive corpora of texts have shown their ability to perform new tasks from textual instructions or from a few examples ( Brown et al. , 2020 ) . These few-shot properties first appeared when scaling models to a sufficient size ( Kaplan et al. , 2020 ) , resulting in a line of work that focuses on further scaling these models ( Chowdhery et al. , 2022 ; Rae et al. , 2021 ) . These efforts are based on the assumption that more parameters will lead to better performance. However, recent work from Hoffmann et al. ( 2022 ) shows that, for a given compute budget, the best performances are not achieved by the largest models, but by smaller models trained on more data.\n\nThe objective of the scaling laws from Hoffmann et al. ( 2022 ) is to determine how to best scale the dataset and model sizes for a particular training compute budget. However, this objective disregards the inference budget, which becomes critical when serving a language model at scale. In this context, given a target level of performance, the preferred model is not the fastest to train but the fastest at inference, and although it may be cheaper to train a large model to reach a certain level of\n\nperformance, a smaller one trained longer will ultimately be cheaper at inference. For instance, although Hoffmann et al. ( 2022 ) recommends training a 10B model on 200B tokens, we find that the performance of a 7B model continues to improve even after 1T tokens.\n\nThe focus of this work is to train a series of language models that achieve the best possible performance at various inference budgets, by training on more tokens than what is typically used. The resulting models, called LLaMA , ranges from 7B to 65B parameters with competitive performance compared to the best existing LLMs. For instance, LLaMA-13B outperforms GPT-3 on most benchmarks, despite being 10 $\\times$ smaller. We believe that this model will help democratize the access and study of LLMs, since it can be run on a single GPU. At the higher-end of the scale, our 65B-parameter model is also competitive with the best large language models such as Chinchilla or PaLM-540B.\n\nUnlike Chinchilla, PaLM, or GPT-3, we only use publicly available data, making our work compatible with open-sourcing, while most existing models rely on data which is either not publicly available or undocumented (e.g. “ Books – 2TB\" or “ Social media conversations\"). There exist some exceptions, notably OPT ( Zhang et al. , 2022 ) , GPT-NeoX ( Black et al. , 2022 ) , BLOOM ( Scao et al. , 2022 ) and GLM ( Zeng et al. 
, 2022 ) , but none that are competitive with PaLM-62B or Chinchilla.\n\nIn the rest of this paper, we present an overview of the modifications we made to the transformer architecture ( Vaswani et al. , 2017 ) , as well as our training method. We then report the performance of our models and compare with others LLMs on a set of standard benchmarks. Finally, we expose some of the biases and toxicity encoded in our models, using some of the most recent benchmarks from the responsible AI community.\n\n* Equal contribution. Correspondence: (htouvron, thibautlav,gizacard,egrave,glample)@meta.com\n\nhttps://github.com/facebookresearch/llam\n\narXiv:2302.1397lv1 [cs.CL] 27 Feb 2023\n\n",
"data_removed": false,
"error": null,
"source": "web",
"status": "succeeded",
"created_at": "2025-06-20T21:43:08.207Z",
"started_at": "2025-06-20T21:45:00.520769Z",
"completed_at": "2025-06-20T21:45:08.835304Z",
"urls": {
"cancel": "https://api.replicate.com/v1/predictions/ctehr7x2nxrgc0cqhx5rchw1z4/cancel",
"get": "https://api.replicate.com/v1/predictions/ctehr7x2nxrgc0cqhx5rchw1z4",
"web": "https://replicate.com/p/ctehr7x2nxrgc0cqhx5rchw1z4"
},
"metrics": {
"predict_time": 8.314534958,
"total_time": 120.628304
}
}
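The curl example sends a Prefer: wait header, which asks the API to hold the request open until the prediction finishes; if it returns while the status is still starting or processing, the prediction can be re-fetched from the get URL in the urls object above. A minimal polling sketch, assuming the requests library as the HTTP client (an assumption, not something Replicate requires) and reusing the prediction ID from the example response:
import os
import time

import requests

API_TOKEN = os.environ["REPLICATE_API_TOKEN"]
# The "get" URL from the example prediction object above.
GET_URL = "https://api.replicate.com/v1/predictions/ctehr7x2nxrgc0cqhx5rchw1z4"

# Poll until the prediction reaches a terminal state.
while True:
    resp = requests.get(GET_URL, headers={"Authorization": f"Bearer {API_TOKEN}"})
    resp.raise_for_status()
    prediction = resp.json()
    if prediction["status"] in ("succeeded", "failed", "canceled"):
        break
    time.sleep(1)

if prediction["status"] == "succeeded":
    print(prediction["output"])   # the extracted Markdown
    print(prediction["metrics"])  # e.g. predict_time, in seconds
else:
    print(prediction["error"])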