Florence-2 Advanced OCR (Offline & Robust)

This is a custom deployment of Microsoft’s Florence-2-Large, optimized for high-throughput document OCR.

Unlike standard implementations, this version includes:

* Robust Pre-processing: “Squash” resizing strategy (1024x1024) to maximize text density and prevent model hallucinations on rectangular documents.
* Auto-Orientation: Automatically fixes EXIF rotation issues for phone-captured documents.
* Structured JSON Output: Returns clean, parsed JSON with both raw text and bounding box regions.
* Multi-Image Support: Process batch uploads (e.g., full PDF pages converted to images) in a single request.
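To make the “Squash” strategy concrete, here is a minimal sketch of the geometry only. It assumes that squashing means stretching each axis independently to 1024 pixels, ignoring aspect ratio. The names `squashScale` and `toSquashed` are hypothetical and not part of this deployment, and the sketch makes no claim about which coordinate space the returned bounding boxes use.

```javascript
// Target side length taken from the feature list above.
const TARGET_SIZE = 1024;

// Per-axis scale factors when a width x height image is stretched
// to TARGET_SIZE x TARGET_SIZE without preserving aspect ratio.
function squashScale(width, height, target = TARGET_SIZE) {
  return { sx: target / width, sy: target / height };
}

// Map a box [x1, y1, x2, y2] from original pixels into the squashed frame.
function toSquashed([x1, y1, x2, y2], width, height) {
  const { sx, sy } = squashScale(width, height);
  return [x1 * sx, y1 * sy, x2 * sx, y2 * sy];
}

// A wide 2048x512 scan is halved horizontally and doubled vertically.
console.log(squashScale(2048, 512));
```

Under this assumption, a wide document is compressed horizontally and stretched vertically, so text fills the whole square input instead of leaving padding.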

Input Parameters

Input	Type	Default	Description
`images`	`List[Path]`	Required	A list of image files to process. Max 50 images per request. Supports JPG, PNG, WEBP.
`task`	`String`	`"OCR"`	The task to perform. • `OCR`: Extracts raw text only. • `OCR_WITH_REGION`: Extracts text + bounding box coordinates.
`include_image`	`Boolean`	`False`	If `True`, returns a base64-encoded copy of the image with bounding boxes drawn on it (only works with `OCR_WITH_REGION`).
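The constraints in the table can be checked on the client before a request is sent. The sketch below is an illustration, not part of the deployed model: `buildInputs` and `hasSupportedExtension` are hypothetical helpers, and accepting `.jpeg` alongside `.jpg` is an assumption. Lists longer than 50 images are split into several input objects, one per request.

```javascript
// Limits mirrored from the parameter table above.
const MAX_IMAGES_PER_REQUEST = 50;
// Assumption: ".jpeg" is accepted wherever ".jpg" is.
const SUPPORTED_EXTENSIONS = [".jpg", ".jpeg", ".png", ".webp"];

function hasSupportedExtension(pathOrUrl) {
  // Drop any query string or fragment before checking the extension.
  const clean = pathOrUrl.split(/[?#]/)[0].toLowerCase();
  return SUPPORTED_EXTENSIONS.some((ext) => clean.endsWith(ext));
}

// Returns one input object per request, each with at most 50 images.
function buildInputs(images, task = "OCR", includeImage = false) {
  if (!Array.isArray(images) || images.length === 0) {
    throw new Error("`images` is required and must not be empty");
  }
  if (task !== "OCR" && task !== "OCR_WITH_REGION") {
    throw new Error(`Unsupported task: ${task}`);
  }
  const unsupported = images.filter((p) => !hasSupportedExtension(p));
  if (unsupported.length > 0) {
    throw new Error(`Unsupported image format: ${unsupported.join(", ")}`);
  }
  // include_image only has an effect with OCR_WITH_REGION.
  const include_image = includeImage && task === "OCR_WITH_REGION";
  const inputs = [];
  for (let i = 0; i < images.length; i += MAX_IMAGES_PER_REQUEST) {
    inputs.push({
      images: images.slice(i, i + MAX_IMAGES_PER_REQUEST),
      task,
      include_image,
    });
  }
  return inputs;
}
```

Each returned object can be passed as the `input` of a separate request.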

Output Schema

The model returns its result as a JSON string, which you must parse in your client application.
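A minimal sketch of a defensive parsing step follows. The helper name `parseOcrOutput` is hypothetical. It assumes the client may hand back either the raw string or, in some setups, an already-parsed object, and it rejects anything that does not decode to a JSON object.

```javascript
// Parse the model output, tolerating an already-parsed object.
function parseOcrOutput(output) {
  // JSON.parse throws a SyntaxError on malformed input; let it propagate.
  const parsed = typeof output === "string" ? JSON.parse(output) : output;
  if (parsed === null || typeof parsed !== "object" || Array.isArray(parsed)) {
    throw new TypeError("Expected the model output to decode to a JSON object");
  }
  return parsed;
}

const parsedExample = parseOcrOutput('{"task": "OCR", "total_pages": 1}');
console.log(parsedExample.total_pages);
```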

Top-Level Structure

{
  "task": "OCR_WITH_REGION",
  "total_pages": 1,
  "results": [ ... ]
}

Page Object Structure

Field	Type	Description
`page`	`Integer`	The index of the page (1-based).
`type`	`String`	The task performed (e.g., `OCR`).
`text`	`String`	The full extracted text combined into a single string.
`regions`	`Array`	A list of structured objects linking text to its location.
`image_base64`	`String`	(Optional) The base64 PNG string of the debug image if `include_image=True`.
`error`	`String`	Error message if the specific page failed, otherwise `null`.
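Because `error` is reported per page, one failed page need not invalidate the whole batch. The sketch below separates usable pages from failed ones. `splitPages` is a hypothetical helper that takes the array of page objects described in the table above.

```javascript
// Split page objects into successful pages and per-page failures.
function splitPages(pageObjects) {
  const ok = [];
  const failed = [];
  for (const page of pageObjects) {
    // A null or missing `error` means the page was processed.
    if (page.error != null) {
      failed.push({ page: page.page, error: page.error });
    } else {
      ok.push(page);
    }
  }
  return { ok, failed };
}

// Illustrative data only; the error message text is made up.
const splitExample = splitPages([
  { page: 1, type: "OCR", text: "Hello", regions: [], error: null },
  { page: 2, type: "OCR", text: "", regions: [], error: "decode failed" },
]);
console.log(splitExample.failed);
```

The `failed` list keeps the 1-based page index, which makes it easy to retry only those pages.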

Region Object Structure

The regions array provides precise localization for every line of text detected.

{
  "text": "It was a cold windy night",
  "bbox": [100.5, 200.0, 500.2, 250.0]  // [x1, y1, x2, y2]
}
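Two common things to do with a region are to convert its corner coordinates into a position plus size, and to order regions roughly top to bottom. The sketch below assumes the origin is at the top-left with y increasing downward, which is not stated above. `bboxToRect` and `sortReadingOrder` are hypothetical helpers, and the sort is deliberately naive: it does not handle multi-column layouts.

```javascript
// Convert [x1, y1, x2, y2] into { x, y, width, height }.
function bboxToRect([x1, y1, x2, y2]) {
  return { x: x1, y: y1, width: x2 - x1, height: y2 - y1 };
}

// Return a sorted copy: by top edge first, then by left edge.
function sortReadingOrder(regions) {
  return [...regions].sort(
    (a, b) => a.bbox[1] - b.bbox[1] || a.bbox[0] - b.bbox[0]
  );
}

const sortedExample = sortReadingOrder([
  { text: "DATE: 2024-01-01", bbox: [50.0, 90.0, 200.0, 110.0] },
  { text: "INVOICE #1024", bbox: [50.0, 50.0, 200.0, 80.0] },
]);
console.log(sortedExample.map((region) => region.text));
console.log(bboxToRect(sortedExample[0].bbox));
```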

Example Response

{
  "task": "OCR_WITH_REGION",
  "total_pages": 1,
  "results": [
    {
      "page": 1,
      "type": "OCR_WITH_REGION",
      "text": "INVOICE #1024\nDATE: 2024-01-01",
      "regions": [
        {
          "text": "INVOICE #1024",
          "bbox": [50.0, 50.0, 200.0, 80.0]
        },
        {
          "text": "DATE: 2024-01-01",
          "bbox": [50.0, 90.0, 200.0, 110.0]
        }
      ],
      "image_base64": null,
      "error": null
    }
  ]
}
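When `include_image` is `True`, `image_base64` holds the debug PNG instead of `null`. The sketch below turns it into raw bytes in Node.js. `decodeDebugImage` is a hypothetical helper, and stripping a `data:` URI prefix is a precaution based on an assumption, since the exact string format is not specified above.

```javascript
// Decode a page's debug image into a Node.js Buffer, or return null.
function decodeDebugImage(page) {
  if (typeof page.image_base64 !== "string" || page.image_base64.length === 0) {
    return null; // No debug image was requested or produced.
  }
  // Assumption: tolerate an optional "data:image/png;base64," prefix.
  const b64 = page.image_base64.replace(/^data:image\/[a-z]+;base64,/, "");
  return Buffer.from(b64, "base64");
}

// With the example response above, image_base64 is null.
console.log(decodeDebugImage({ page: 1, image_base64: null }));
```

The returned `Buffer` can then be written to disk, for example with `fs.writeFileSync`.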

Usage Examples

JavaScript

import Replicate from "replicate";
const replicate = new Replicate();

const output = await replicate.run(
  "your-username/florence-2-ocr:latest",
  {
    input: {
      images: [
        "https://example.com/doc_sample.jpg"
      ],
      task: "OCR",
      include_image: false
    }
  }
);

// Replicate usually returns the string output directly
const result = JSON.parse(output);
console.log(result.results[0].text);
