
datalab-to /ocr:909c96fc

Input schema

The fields you can use to run this model with an API. If you don't give a value for a field, its default value will be used.

file (string)
    Input file. Must be one of: .pdf, .doc, .docx, .ppt, .pptx, .png, .jpg, .jpeg, .webp

max_pages (integer, min: 1)
    Maximum number of pages to process. Cannot be specified if page_range is set; the two parameters are mutually exclusive.

visualize (boolean, default: False)
    Draw red polygons on the input image(s) to visualize detected text regions and return the annotated images.

page_range (string)
    Page range to parse, comma separated, like 0,5-10,20. Example: '0,2-4' will process pages 0, 2, 3, and 4. Cannot be specified if max_pages is set; the two parameters are mutually exclusive.

skip_cache (boolean, default: False)
    Bypass the server-side cache and force re-processing. By default, identical requests are cached to save time and cost. Enable this to get fresh results.

return_text (boolean, default: False)
    Return extracted text as a single string with all text lines concatenated, each line separated by a newline character.

return_pages (boolean, default: True)
    Return detailed page information including text lines, bounding boxes, polygons, and character-level data. When disabled, only text and page_count will be returned.
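The max_pages / page_range exclusivity and the page-range syntax above can be checked client-side before sending a request. The sketch below assumes nothing about the server; `parse_page_range` and `build_input` are hypothetical helper names, not part of the model's API.

```python
def parse_page_range(spec):
    """Expand a comma-separated page range like '0,2-4' into [0, 2, 3, 4]."""
    pages = []
    for part in spec.split(","):
        if "-" in part:
            start, end = part.split("-")
            pages.extend(range(int(start), int(end) + 1))
        else:
            pages.append(int(part))
    return pages


def build_input(file, max_pages=None, page_range=None, **flags):
    """Assemble an input payload, enforcing the documented mutual exclusion."""
    if max_pages is not None and page_range is not None:
        raise ValueError("max_pages and page_range are mutually exclusive")
    payload = {"file": file, **flags}
    if max_pages is not None:
        payload["max_pages"] = max_pages
    if page_range is not None:
        payload["page_range"] = page_range
    return payload
```

For example, `build_input("doc.pdf", page_range="0,2-4", visualize=True)` yields a payload touching pages 0, 2, 3, and 4, while passing both `max_pages` and `page_range` raises immediately instead of failing server-side.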

Output schema

The shape of the response you’ll get when you run this model with an API.

Schema
{
  "title": "OCROutput",
  "type": "object",
  "description": "OCR output with optional text, pages, and visualizations",
  "properties": {
    "page_count": {
      "description": "Total number of pages processed. Only returned if return_pages=true"
    },
    "pages": {
      "description": "List of pages with detailed OCR information. Only returned if return_pages=true. Each page contains: page (number), image_bbox (coordinates), and text_lines (list of detected text). Each text_line has: text (string), confidence (0-1), bbox ([x1,y1,x2,y2]), polygon ([[x1,y1],[x2,y2],[x3,y3],[x4,y4]]), and chars (character-level data with text, bbox, polygon, confidence, bbox_valid)"
    },
    "text": {
      "description": "Extracted text as a single string with all text lines concatenated, separated by newlines. Only returned if return_text=true"
    },
    "visualizations": {
      "description": "List of images with red polygons drawn around detected text regions. Only returned if visualize=true"
    }
  }
}
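Given a response shaped like the schema above, the per-page structure can be walked to recover text or filter by confidence. This is a sketch over a plain dict, assuming the documented keys (pages, text_lines, text, confidence); the helper names are made up for illustration.

```python
def collect_text(output):
    """Concatenate all text lines across pages, mirroring the return_text field."""
    lines = []
    for page in output.get("pages", []):
        for line in page["text_lines"]:
            lines.append(line["text"])
    return "\n".join(lines)


def high_confidence_lines(output, threshold=0.9):
    """Keep only text lines whose per-line confidence (0-1) meets the threshold."""
    return [
        line
        for page in output.get("pages", [])
        for line in page["text_lines"]
        if line["confidence"] >= threshold
    ]
```

With return_pages=true this reconstructs the same string return_text would have produced, and the confidence filter is a common first pass before downstream parsing.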