sljeff/dots.ocr:214a4fc4 | Run with an API on Replicate

You're looking at a specific version of this model.

These are the input fields you can use to run this model with an API. If you don’t give a value for a field, its default value is used.

Field	Type	Default value	Description
image	string		Input image for OCR
prompt	string	Please output the layout information from the PDF image, including each layout element's bbox, its category, and the corresponding text content within the bbox. 1. Bbox format: [x1, y1, x2, y2] 2. Layout Categories: The possible categories are ['Caption', 'Footnote', 'Formula', 'List-item', 'Page-footer', 'Page-header', 'Picture', 'Section-header', 'Table', 'Text', 'Title']. 3. Text Extraction & Formatting Rules: - Picture: For the 'Picture' category, the text field should be omitted. - Formula: Format its text as LaTeX. - Table: Format its text as HTML. - All Others (Text, Title, etc.): Format their text as Markdown. 4. Constraints: - The output text must be the original text from the image, with no translation. - All layout elements must be sorted according to human reading order. 5. Final Output: The entire output must be a single JSON object.	Prompt to guide the extraction
temperature	number	0.1 (max: 2)	Temperature for sampling (lower = more deterministic)
max_tokens	integer	16384 (min: 1, max: 32768)	Maximum number of tokens to generate
top_p	number	1 (max: 1)	Top-p sampling parameter
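As a rough illustration of the table above, the sketch below builds an input payload and checks it against the documented limits before any request is sent. The helper name `build_input` is hypothetical, and the code assumes `image` is passed as a URL or data URI string, which the table does not specify. Only the limits listed in the table are enforced.

```python
from typing import Any, Dict, Optional

# Limits copied from the field table; nothing else is enforced.
MAX_TEMPERATURE = 2.0
MIN_MAX_TOKENS = 1
MAX_MAX_TOKENS = 32768
MAX_TOP_P = 1.0


def build_input(
    image: str,
    prompt: Optional[str] = None,
    temperature: float = 0.1,
    max_tokens: int = 16384,
    top_p: float = 1.0,
) -> Dict[str, Any]:
    """Build an input dict for the model (hypothetical helper).

    `image` has no default in the table, so it is treated as required.
    `prompt` is left out when None so the model's default prompt applies.
    """
    if not image:
        raise ValueError("image is required")
    if temperature > MAX_TEMPERATURE:
        raise ValueError(f"temperature must be <= {MAX_TEMPERATURE}")
    if not MIN_MAX_TOKENS <= max_tokens <= MAX_MAX_TOKENS:
        raise ValueError(
            f"max_tokens must be in [{MIN_MAX_TOKENS}, {MAX_MAX_TOKENS}]"
        )
    if top_p > MAX_TOP_P:
        raise ValueError(f"top_p must be <= {MAX_TOP_P}")

    payload: Dict[str, Any] = {
        "image": image,
        "temperature": temperature,
        "max_tokens": max_tokens,
        "top_p": top_p,
    }
    if prompt is not None:
        payload["prompt"] = prompt
    return payload


# Placeholder URL, for illustration only.
payload = build_input("https://example.com/page.png", temperature=0.0)
```

A dict like `payload` is the kind of value that would be passed as the model input to an API client. Validating locally is optional, but it surfaces out-of-range values before a request is made.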

This is the shape of the response you’ll get when you run this model with an API.

Schema

{"title": "Output", "type": "string"}
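Because the schema declares the output as a plain string while the default prompt asks for JSON, a caller will likely want to decode it defensively. The sketch below is one way to do that. The function name `parse_layout` and the sample string are hypothetical, and the keys `bbox`, `category`, and `text` are an assumption based on the wording of the default prompt, not a documented output format.

```python
import json
from typing import Any, Optional


def parse_layout(raw: str) -> Optional[Any]:
    """Decode the model's string output as JSON.

    Returns the decoded value, or None when the string is not valid JSON
    (for example, if generation was cut off at max_tokens).
    """
    try:
        return json.loads(raw)
    except json.JSONDecodeError:
        return None


# Hypothetical sample output; real output may be shaped differently.
sample = '[{"bbox": [10, 20, 200, 60], "category": "Title", "text": "# Report"}]'
layout = parse_layout(sample)

if layout is not None:
    for element in layout:
        # .get() avoids a KeyError for 'Picture' elements, whose text is omitted.
        print(element["category"], element["bbox"], element.get("text"))
```

Returning `None` instead of raising keeps the caller in control of how to handle truncated or malformed output, such as retrying with a larger `max_tokens`.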