You're looking at a specific version of this model. Jump to the model overview.

sljeff /dots.ocr:214a4fc4

Input schema

The fields you can use to run this model with an API. If you don’t give a value for a field its default value will be used.

Field Type Default value Description
image
string
Input image for OCR
prompt
string
Please output the layout information from the PDF image, including each layout element's bbox, its category, and the corresponding text content within the bbox. 1. Bbox format: [x1, y1, x2, y2] 2. Layout Categories: The possible categories are ['Caption', 'Footnote', 'Formula', 'List-item', 'Page-footer', 'Page-header', 'Picture', 'Section-header', 'Table', 'Text', 'Title']. 3. Text Extraction & Formatting Rules: - Picture: For the 'Picture' category, the text field should be omitted. - Formula: Format its text as LaTeX. - Table: Format its text as HTML. - All Others (Text, Title, etc.): Format their text as Markdown. 4. Constraints: - The output text must be the original text from the image, with no translation. - All layout elements must be sorted according to human reading order. 5. Final Output: The entire output must be a single JSON object.
Prompt to guide the extraction
temperature
number
0.1

Max: 2

Temperature for sampling (lower = more deterministic)
max_tokens
integer
16384

Min: 1

Max: 32768

Maximum number of tokens to generate
top_p
number
1

Max: 1

Top-p sampling parameter

Output schema

The shape of the response you’ll get when you run this model with an API.

Schema
{'title': 'Output', 'type': 'string'}