datalab-to/ocr:0f698414

Input schema

These are the fields you can set when running this model through an API. If you don’t supply a value for a field, its default value is used.

| Field | Type | Default value | Description |
|---|---|---|---|
| file | string | | Input file. Must be one of: .pdf, .doc, .docx, .ppt, .pptx, .png, .jpg, .jpeg, .webp |
| max_pages | integer | | Maximum number of pages to process (minimum: 1). Cannot be specified if page_range is set; these parameters are mutually exclusive |
| visualize | boolean | False | Draw red polygons on the input image(s) to visualize detected text regions and return the annotated images |
| page_range | string | | Page range to parse, comma separated like 0,5-10,20. Example: '0,2-4' will process pages 0, 2, 3, and 4. Cannot be specified if max_pages is set; these parameters are mutually exclusive |
| skip_cache | boolean | False | Bypass the server-side cache and force re-processing. By default, identical requests are cached to save time and cost. Enable this to get fresh results |
| return_pages | boolean | False | Return detailed page information including text lines, bounding boxes, polygons, and character-level data. When disabled, only text and page_count will be returned |
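
The constraints listed for the input fields can be checked on the client before a request is sent. The following is a minimal sketch, not part of the model's API: `parse_page_range` and `build_input` are hypothetical helper names, the example URL is a placeholder, and treating `file` as a URL or path whose suffix can be inspected is an assumption.

```python
from pathlib import Path
from typing import Any, Dict, List, Optional

# Extensions listed in the input schema for the `file` field.
ALLOWED_EXTENSIONS = {".pdf", ".doc", ".docx", ".ppt", ".pptx",
                      ".png", ".jpg", ".jpeg", ".webp"}


def parse_page_range(page_range: str) -> List[int]:
    """Expand a string such as '0,2-4' into [0, 2, 3, 4]."""
    pages: List[int] = []
    for part in page_range.split(","):
        part = part.strip()
        if "-" in part:
            start_text, end_text = part.split("-", 1)
            start, end = int(start_text), int(end_text)
            if start > end:
                raise ValueError(f"descending range: {part!r}")
            pages.extend(range(start, end + 1))
        else:
            pages.append(int(part))  # int() raises ValueError on bad input
    return pages


def build_input(file: str,
                max_pages: Optional[int] = None,
                page_range: Optional[str] = None,
                visualize: bool = False,
                skip_cache: bool = False,
                return_pages: bool = False) -> Dict[str, Any]:
    """Assemble an input payload, enforcing the documented constraints."""
    # Assumption: `file` is a URL or path; only its suffix is checked here.
    suffix = Path(file.split("?", 1)[0]).suffix.lower()
    if suffix not in ALLOWED_EXTENSIONS:
        raise ValueError(f"unsupported file type: {file!r}")
    if max_pages is not None and page_range is not None:
        raise ValueError("max_pages and page_range are mutually exclusive")
    if max_pages is not None and max_pages < 1:
        raise ValueError("max_pages must be at least 1")

    payload: Dict[str, Any] = {
        "file": file,
        "visualize": visualize,
        "skip_cache": skip_cache,
        "return_pages": return_pages,
    }
    if max_pages is not None:
        payload["max_pages"] = max_pages
    if page_range is not None:
        parse_page_range(page_range)  # validate the syntax only
        payload["page_range"] = page_range
    return payload


# Placeholder URL, used only for illustration.
payload = build_input("https://example.com/report.pdf", page_range="0,2-4")
print(payload)
```

The resulting dictionary is meant to be passed as the input of whichever client call you use to run the model. Only one of `max_pages` and `page_range` ever ends up in it.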

Output schema

The shape of the response you’ll get when you run this model with an API.

Schema
{'properties': {'page_count': {'nullable': True,
                               'title': 'Page Count',
                               'type': 'integer'},
                'pages': {'items': {'additionalProperties': True,
                                    'type': 'object'},
                          'nullable': True,
                          'title': 'Pages',
                          'type': 'array'},
                'text': {'title': 'Text', 'type': 'string'},
                'visualizations': {'items': {'anyOf': [],
                                             'format': 'uri',
                                             'type': 'string'},
                                   'nullable': True,
                                   'title': 'Visualizations',
                                   'type': 'array'}},
 'required': ['text'],
 'title': 'OCROutput',
 'type': 'object'}
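
As a rough illustration of how a response matching this schema might be read, the sketch below assumes the response has already been decoded into a Python dictionary. `OCRResult` and `parse_output` are hypothetical names, and the sample response is made up. Only `text` is required by the schema, so the nullable fields fall back to `None`.

```python
from dataclasses import dataclass
from typing import Any, Dict, List, Optional


@dataclass
class OCRResult:
    text: str                                     # the only required field
    page_count: Optional[int] = None              # nullable integer
    pages: Optional[List[Dict[str, Any]]] = None  # nullable array of objects
    visualizations: Optional[List[str]] = None    # nullable array of URI strings


def parse_output(raw: Dict[str, Any]) -> OCRResult:
    """Map a decoded response onto OCRResult, tolerating absent optional keys."""
    if "text" not in raw:
        raise KeyError("response is missing the required 'text' field")
    return OCRResult(
        text=raw["text"],
        page_count=raw.get("page_count"),
        pages=raw.get("pages"),
        visualizations=raw.get("visualizations"),
    )


# Made-up response, as it might look with return_pages and visualize disabled.
example = {"text": "Hello world", "page_count": 1,
           "pages": None, "visualizations": None}
result = parse_output(example)
print(result.text, result.page_count)
```

Because each `pages` item allows additional properties, the sketch keeps those items as plain dictionaries and does not guess at their internal structure.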