datalab-to/marker:c00ae56f | Run with an API on Replicate

Input schema

The fields you can use to run this model with an API. If you don't give a value for a field, its default value will be used.

Field	Type	Default value	Description
file	string		Input file. Must be one of: .pdf, .doc, .docx, .ppt, .pptx, .png, .jpg, .jpeg, .webp
max_pages	integer		Maximum number of pages to process (minimum: 1). Cannot be set together with page_range; the two parameters are mutually exclusive
page_range	string		Comma-separated pages and inclusive ranges to parse, e.g. 0,5-10,20. Pages are 0-indexed: '0,2-4' processes pages 0, 2, 3, and 4. Cannot be set together with max_pages; the two parameters are mutually exclusive
force_ocr	boolean	False	Force OCR on all pages of the PDF
format_lines	boolean	False	Format the lines in the output to detect inline math and styles
paginate	boolean	False	Whether to paginate the output. Each page will be separated by horizontal rules
strip_existing_ocr	boolean	False	Strip existing OCR text from the PDF and re-run OCR
disable_image_extraction	boolean	False	Disable image extraction from the PDF
disable_ocr_math	boolean	False	Disable inline math recognition in OCR
use_llm	boolean	False	Use an LLM to significantly improve accuracy, at the cost of higher latency
mode	string	fast	Output mode: fast (lowest latency), balanced, or accurate (slowest)
output_format	string	markdown	Output format for the text: 'json', 'html', 'markdown', or 'chunks'. Multiple formats can be requested as a comma-separated list, e.g. 'markdown,json'
skip_cache	boolean	False	Skip the cache and re-run inference
save_checkpoint	boolean	False	Save checkpoint after processing
block_correction_prompt	string		Optional prompt to improve output alignment to specific requirements
page_schema	string		Schema for structured extraction (JSON string of Pydantic schema)
segmentation_schema	string		Schema for document segmentation (JSON string with segment names and descriptions)
additional_config	string		Additional configuration options as JSON string. Supports keys like 'disable_links', 'keep_pageheader_in_output', 'keep_pagefooter_in_output', 'filter_blank_pages', 'drop_repeated_text', 'layout_coverage_threshold', 'merge_threshold', 'height_tolerance', 'gap_threshold', 'image_threshold', 'min_line_length', 'level_count', 'default_level', 'no_merge_tables_across_pages', 'force_layout_block'. See full documentation at https://documentation.datalab.to/api-reference/marker
visualize_output	boolean	False	Generate visualization images showing detected text regions overlaid on original document for OCR debugging
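Several constraints in the table above can be checked on the client before a request is sent: max_pages and page_range are mutually exclusive, output_format takes a comma-separated list, and additional_config is a JSON string rather than a nested object. The sketch below is hypothetical: `build_marker_input` and `expand_page_range` are illustrative helpers, not part of Replicate's or Datalab's API. It assumes page ranges are inclusive on both ends, as the '0,2-4' example implies, and it omits any field left at its documented default.

```python
import json
from typing import Any, Optional

# Allowed values taken from the input table above.
VALID_FORMATS = {"json", "html", "markdown", "chunks"}
VALID_MODES = {"fast", "balanced", "accurate"}


def expand_page_range(page_range: str) -> list[int]:
    """Expand a page_range string such as '0,2-4' into [0, 2, 3, 4].

    Hypothetical helper; assumes 'a-b' is inclusive on both ends.
    """
    pages: list[int] = []
    for part in page_range.split(","):
        part = part.strip()
        if "-" in part:
            start, end = (int(x) for x in part.split("-", 1))
            if start > end:
                raise ValueError(f"descending range: {part!r}")
            pages.extend(range(start, end + 1))
        else:
            pages.append(int(part))
    return pages


def build_marker_input(
    file: str,
    *,
    max_pages: Optional[int] = None,
    page_range: Optional[str] = None,
    output_formats: tuple[str, ...] = ("markdown",),
    mode: str = "fast",
    additional_config: Optional[dict[str, Any]] = None,
    **extra: Any,
) -> dict[str, Any]:
    """Hypothetical helper: build an input dict, omitting fields left at their defaults."""
    if max_pages is not None and page_range is not None:
        raise ValueError("max_pages and page_range are mutually exclusive")
    if max_pages is not None and max_pages < 1:
        raise ValueError("max_pages must be >= 1")
    unknown = set(output_formats) - VALID_FORMATS
    if unknown:
        raise ValueError(f"unsupported output_format(s): {sorted(unknown)}")
    if mode not in VALID_MODES:
        raise ValueError(f"unsupported mode: {mode!r}")

    payload: dict[str, Any] = {"file": file}
    if max_pages is not None:
        payload["max_pages"] = max_pages
    if page_range is not None:
        expand_page_range(page_range)  # raises ValueError on a malformed range
        payload["page_range"] = page_range
    if output_formats != ("markdown",):
        payload["output_format"] = ",".join(output_formats)
    if mode != "fast":
        payload["mode"] = mode
    if additional_config:
        # additional_config must be sent as a JSON string.
        payload["additional_config"] = json.dumps(additional_config)
    # Other fields (force_ocr, use_llm, page_schema, ...) default to False or empty,
    # so only truthy values are forwarded.
    payload.update({k: v for k, v in extra.items() if v})
    return payload


payload = build_marker_input(
    "https://example.com/report.pdf",
    page_range="0,2-4",
    output_formats=("markdown", "json"),
    use_llm=True,
    force_ocr=False,
    additional_config={"filter_blank_pages": True},
)
print(payload)
```

The resulting dict is what you would pass as the `input` argument of a Replicate client call such as `replicate.run(...)`, together with this model version's identifier. Validating locally only catches the constraints documented here; the API remains the authority on what it accepts.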

Output schema

The shape of the response you’ll get when you run this model with an API.

Schema

{
  "properties": {
    "chunks": {
      "items": {"additionalProperties": true, "type": "object"},
      "nullable": true,
      "title": "Chunks",
      "type": "array"
    },
    "html": {"nullable": true, "title": "Html", "type": "string"},
    "images": {
      "items": {"anyOf": [], "format": "uri", "type": "string"},
      "nullable": true,
      "title": "Images",
      "type": "array"
    },
    "json_data": {
      "additionalProperties": true,
      "nullable": true,
      "title": "Json Data",
      "type": "object"
    },
    "markdown": {"nullable": true, "title": "Markdown", "type": "string"},
    "metadata": {
      "additionalProperties": true,
      "nullable": true,
      "title": "Metadata",
      "type": "object"
    },
    "page_count": {"title": "Page Count", "type": "integer"},
    "visualization_images": {
      "items": {"anyOf": [], "format": "uri", "type": "string"},
      "nullable": true,
      "title": "Visualization Images",
      "type": "array"
    }
  },
  "required": ["page_count"],
  "title": "MarkerOutput",
  "type": "object"
}
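In this schema only page_count is required; every other field is nullable, so client code should check which outputs were actually populated rather than assume all of them are present. The sketch below mirrors the schema with a dataclass. `MarkerOutput.from_dict` and `available_formats` are hypothetical names, and the mapping of the 'json' output_format onto the `json_data` field is an assumption based on the field names, not something this page states.

```python
from dataclasses import dataclass, fields
from typing import Any, Optional


@dataclass
class MarkerOutput:
    """Hypothetical typed mirror of the MarkerOutput schema; only page_count is required."""

    page_count: int
    markdown: Optional[str] = None
    html: Optional[str] = None
    json_data: Optional[dict[str, Any]] = None
    chunks: Optional[list[dict[str, Any]]] = None
    metadata: Optional[dict[str, Any]] = None
    images: Optional[list[str]] = None  # URIs
    visualization_images: Optional[list[str]] = None  # URIs

    @classmethod
    def from_dict(cls, raw: dict[str, Any]) -> "MarkerOutput":
        pc = raw.get("page_count")
        # bool is a subclass of int in Python, so reject it explicitly.
        if not isinstance(pc, int) or isinstance(pc, bool):
            raise ValueError("page_count is required and must be an integer")
        known = {f.name for f in fields(cls)}
        # Keys not in the schema are ignored rather than treated as errors.
        return cls(**{k: v for k, v in raw.items() if k in known})

    def available_formats(self) -> list[str]:
        """Names of text outputs that are non-null (assumes 'json' maps to json_data)."""
        by_format = {
            "markdown": self.markdown,
            "html": self.html,
            "json": self.json_data,
            "chunks": self.chunks,
        }
        return [name for name, value in by_format.items() if value is not None]


result = MarkerOutput.from_dict({"page_count": 3, "markdown": "# Title", "html": None})
print(result.page_count, result.available_formats())
```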