
datalab-to/marker:c00ae56f

Input schema

The fields you can use to run this model with an API. If you don’t give a value for a field, its default value will be used.

file (string)
Input file. Must be one of: .pdf, .doc, .docx, .ppt, .pptx, .png, .jpg, .jpeg, .webp.

max_pages (integer, minimum 1)
Maximum number of pages to process. Mutually exclusive with page_range: do not set both.

page_range (string)
Pages to parse, as a comma-separated list of page numbers and page ranges such as 0,5-10,20. For example, '0,2-4' processes pages 0, 2, 3, and 4. Mutually exclusive with max_pages: do not set both.

force_ocr (boolean, default: False)
Force OCR on all pages of the PDF.

format_lines (boolean, default: False)
Format the lines in the output to detect inline math and text styles.

paginate (boolean, default: False)
Paginate the output, separating pages with horizontal rules.

strip_existing_ocr (boolean, default: False)
Strip existing OCR text from the PDF and re-run OCR.

disable_image_extraction (boolean, default: False)
Disable image extraction from the PDF.

disable_ocr_math (boolean, default: False)
Disable inline math recognition in OCR.

use_llm (boolean, default: False)
Use an LLM to significantly improve accuracy, at the cost of higher latency.

mode (string, default: fast)
Output mode: fast (lowest latency), balanced, or accurate (slowest).

output_format (string, default: markdown)
Output format for the text: 'json', 'html', 'markdown', or 'chunks'. To request several formats, separate them with commas, e.g. 'markdown,json'.

skip_cache (boolean, default: False)
Skip the cache and re-run inference.

save_checkpoint (boolean, default: False)
Save a checkpoint after processing.

block_correction_prompt (string)
Optional prompt used to align the output more closely with specific requirements.

page_schema (string)
Schema for structured extraction, given as a Pydantic model's JSON schema serialized to a string.

segmentation_schema (string)
Schema for document segmentation, given as a JSON string of segment names and descriptions.

additional_config (string)
Additional configuration options as a JSON string. Supports keys such as 'disable_links', 'keep_pageheader_in_output', 'keep_pagefooter_in_output', 'filter_blank_pages', 'drop_repeated_text', 'layout_coverage_threshold', 'merge_threshold', 'height_tolerance', 'gap_threshold', 'image_threshold', 'min_line_length', 'level_count', 'default_level', 'no_merge_tables_across_pages', 'force_layout_block'. See the full documentation at https://documentation.datalab.to/api-reference/marker

visualize_output (boolean, default: False)
Generate visualization images that overlay the detected text regions on the original document, for OCR debugging.
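The page_range syntax and the rule that max_pages and page_range are mutually exclusive can be checked on the client before a request is sent. The following is a minimal sketch, assuming ranges are inclusive (as the '0,2-4' example implies). The helpers expand_page_range and check_page_limits are hypothetical and are not part of Marker or any Replicate SDK; the model still performs its own parsing.

```python
from typing import Optional


def expand_page_range(page_range: str) -> list[int]:
    """Expand a page_range string such as '0,2-4' into [0, 2, 3, 4].

    Hypothetical client-side helper. Ranges are treated as inclusive,
    matching the '0,2-4' example in the input schema.
    """
    pages: list[int] = []
    for part in page_range.split(","):
        part = part.strip()
        if not part:
            raise ValueError("empty entry in page_range")
        if "-" in part:
            start_s, end_s = part.split("-", 1)
            start, end = int(start_s), int(end_s)  # int("") raises ValueError
            if start > end:
                raise ValueError(f"descending range: {part!r}")
            pages.extend(range(start, end + 1))  # +1 makes the end inclusive
        else:
            pages.append(int(part))
    return pages


def check_page_limits(max_pages: Optional[int] = None,
                      page_range: Optional[str] = None) -> None:
    """Reject inputs that set both fields or use a max_pages below 1."""
    if max_pages is not None and page_range is not None:
        raise ValueError("max_pages and page_range are mutually exclusive")
    if max_pages is not None and max_pages < 1:
        raise ValueError("max_pages must be at least 1")
    if page_range is not None:
        expand_page_range(page_range)  # raises ValueError on malformed input


print(expand_page_range("0,2-4"))  # [0, 2, 3, 4]
```

Validating locally only catches malformed strings early. Whether a page number exceeds the document length can only be known once the file is read, so that check is left to the model.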
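Several fields take structured data encoded as strings: output_format joins multiple formats with commas, and additional_config and page_schema are JSON strings. The sketch below assembles an input dict under those rules. build_marker_input is a hypothetical helper, the file URL is a placeholder, and the boolean values given for the additional_config keys are an assumption (the linked documentation is authoritative). The schema dict is hand-written in the shape that Pydantic v2's model_json_schema() produces for a simple model.

```python
import json
import os
from urllib.parse import urlparse

# Allowed values copied from the input schema above.
ALLOWED_EXTENSIONS = {".pdf", ".doc", ".docx", ".ppt", ".pptx",
                      ".png", ".jpg", ".jpeg", ".webp"}
OUTPUT_FORMATS = {"json", "html", "markdown", "chunks"}
MODES = {"fast", "balanced", "accurate"}


def build_marker_input(file_url, output_formats=("markdown",), mode="fast",
                       additional_config=None, page_schema=None, **options):
    """Assemble an input dict for the model (hypothetical helper, not an SDK API)."""
    ext = os.path.splitext(urlparse(file_url).path)[1].lower()
    if ext not in ALLOWED_EXTENSIONS:
        raise ValueError(f"unsupported file type: {ext or '(none)'}")
    unknown = set(output_formats) - OUTPUT_FORMATS
    if unknown:
        raise ValueError(f"unknown output formats: {sorted(unknown)}")
    if mode not in MODES:
        raise ValueError(f"unknown mode: {mode}")

    payload = {
        **options,  # e.g. use_llm=True, paginate=True; cannot override the keys below
        "file": file_url,
        # Several formats are requested as one comma-separated string.
        "output_format": ",".join(output_formats),
        "mode": mode,
    }
    # These fields are strings in the schema, so dicts are serialized to JSON.
    if additional_config is not None:
        payload["additional_config"] = json.dumps(additional_config)
    if page_schema is not None:
        payload["page_schema"] = json.dumps(page_schema)
    return payload


# Hand-written JSON Schema in the shape Pydantic's model_json_schema() emits.
invoice_schema = {
    "title": "Invoice",
    "type": "object",
    "properties": {
        "invoice_number": {"title": "Invoice Number", "type": "string"},
        "total": {"title": "Total", "type": "number"},
    },
    "required": ["invoice_number", "total"],
}

payload = build_marker_input(
    "https://example.com/report.pdf",  # placeholder URL
    output_formats=("markdown", "json"),
    mode="balanced",
    # Assumption: these keys take booleans; check the linked documentation.
    additional_config={"filter_blank_pages": True, "disable_links": True},
    page_schema=invoice_schema,
    use_llm=True,
)
print(payload["output_format"])  # markdown,json
```

The resulting dict is the kind of value one would pass as input= to the Replicate Python client's replicate.run(). Spreading **options first lets the validated keys take precedence if a caller passes a conflicting option by name.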

Output schema

The shape of the response you’ll get when you run this model with an API.

Schema
{'properties': {'chunks': {'items': {'additionalProperties': True,
                                     'type': 'object'},
                           'nullable': True,
                           'title': 'Chunks',
                           'type': 'array'},
                'html': {'nullable': True, 'title': 'Html', 'type': 'string'},
                'images': {'items': {'anyOf': [],
                                     'format': 'uri',
                                     'type': 'string'},
                           'nullable': True,
                           'title': 'Images',
                           'type': 'array'},
                'json_data': {'additionalProperties': True,
                              'nullable': True,
                              'title': 'Json Data',
                              'type': 'object'},
                'markdown': {'nullable': True,
                             'title': 'Markdown',
                             'type': 'string'},
                'metadata': {'additionalProperties': True,
                             'nullable': True,
                             'title': 'Metadata',
                             'type': 'object'},
                'page_count': {'title': 'Page Count', 'type': 'integer'},
                'visualization_images': {'items': {'anyOf': [],
                                                   'format': 'uri',
                                                   'type': 'string'},
                                         'nullable': True,
                                         'title': 'Visualization Images',
                                         'type': 'array'}},
 'required': ['page_count'],
 'title': 'MarkerOutput',
 'type': 'object'}
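Only page_count is required in this schema. Every other property is nullable and is presumably filled only for the formats and options requested. The following is a minimal sketch of consuming a response under that reading. summarize_marker_output is a hypothetical helper, and the sample response is invented for illustration rather than taken from real model output.

```python
from typing import Any

# Nullable properties listed in the output schema above.
OPTIONAL_FIELDS = ("markdown", "html", "json_data", "chunks", "metadata",
                   "images", "visualization_images")


def summarize_marker_output(output: dict[str, Any]) -> dict[str, Any]:
    """Check the one required field and report which optional fields are filled."""
    page_count = output.get("page_count")
    # bool is a subclass of int in Python, so exclude it explicitly.
    if not isinstance(page_count, int) or isinstance(page_count, bool):
        raise ValueError("page_count is required and must be an integer")
    # Nullable fields: treat None the same as a missing key.
    present = sorted(k for k in OPTIONAL_FIELDS if output.get(k) is not None)
    return {
        "page_count": page_count,
        "present": present,
        "image_count": len(output.get("images") or []),
    }


# Illustrative response; values are invented, not real model output.
sample = {
    "page_count": 3,
    "markdown": "# Title\n\nBody text",
    "html": None,
    "images": ["https://example.com/img-0.png"],
}
print(summarize_marker_output(sample))
# {'page_count': 3, 'present': ['images', 'markdown'], 'image_count': 1}
```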