rocketcoder/florence-2-lg-ocr

A vision model that excels at batch OCR processing


Florence-2 Advanced OCR (Offline & Robust)

This is a custom deployment of Microsoft’s Florence-2-Large, optimized for high-throughput document OCR.

Unlike standard implementations, this version includes:

* Robust pre-processing: a "squash" resizing strategy (1024x1024) to maximize text density and prevent model hallucinations on rectangular documents.
* Auto-orientation: automatically fixes EXIF rotation issues for phone-captured documents.
* Structured JSON output: returns clean, parsed JSON with both raw text and bounding-box regions.
* Multi-image support: process batch uploads (e.g., full PDF pages converted to images) in a single request.

Input Parameters

| Input | Type | Default | Description |
|---|---|---|---|
| images | List[Path] | Required | A list of image files to process. Max 50 images per request. Supports JPG, PNG, WEBP. |
| task | String | "OCR" | The task to perform. `OCR`: extracts raw text only. `OCR_WITH_REGION`: extracts text plus bounding-box coordinates. |
| include_image | Boolean | False | If True, returns a base64-encoded copy of the image with bounding boxes drawn on it (only works with `OCR_WITH_REGION`). |
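A minimal input payload mirroring the parameters above might look like this (the image URL is a placeholder, not a real document):

```javascript
// Minimal input payload for this model, following the parameter table above.
// The image URL is illustrative only.
const input = {
  images: ["https://example.com/invoice_page_1.png"], // up to 50 JPG/PNG/WEBP files
  task: "OCR_WITH_REGION", // or "OCR" for raw text only
  include_image: false, // true draws bounding boxes (OCR_WITH_REGION only)
};
console.log(Object.keys(input));
```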

Output Schema

The model returns a single JSON string. Parse this string in your client application before accessing any fields.

Top-Level Structure

{
  "task": "OCR_WITH_REGION",
  "total_pages": 1,
  "pages": [ ... ]
}

Page Object Structure

| Field | Type | Description |
|---|---|---|
| page | Integer | The index of the page (1-based). |
| type | String | The task performed (e.g., OCR). |
| text | String | The full extracted text combined into a single string. |
| regions | Array | A list of structured objects linking text to its location. |
| image_base64 | String | (Optional) The base64 PNG string of the debug image if include_image=True. |
| error | String | Error message if the specific page failed, otherwise null. |

Region Object Structure

The regions array provides precise localization for every line of text detected.

{
  "text": "It was a cold windy night",
  "bbox": [100.5, 200.0, 500.2, 250.0]  // [x1, y1, x2, y2]
}
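Two common client-side operations on regions are computing a box's size and restoring top-to-bottom reading order. A sketch (these helpers are not part of the model API):

```javascript
// Derive width/height from a [x1, y1, x2, y2] bbox.
function bboxSize(bbox) {
  const [x1, y1, x2, y2] = bbox;
  return { width: x2 - x1, height: y2 - y1 };
}

// Sort regions into reading order: top edge (y1) first, then left edge (x1).
function sortReadingOrder(regions) {
  return [...regions].sort(
    (a, b) => a.bbox[1] - b.bbox[1] || a.bbox[0] - b.bbox[0]
  );
}

const regions = [
  { text: "DATE: 2024-01-01", bbox: [50.0, 90.0, 200.0, 110.0] },
  { text: "INVOICE #1024", bbox: [50.0, 50.0, 200.0, 80.0] },
];
console.log(sortReadingOrder(regions).map((r) => r.text));
// ["INVOICE #1024", "DATE: 2024-01-01"]
```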

Example Response

{
  "task": "OCR_WITH_REGION",
  "total_pages": 1,
  "results": [
    {
      "page": 1,
      "type": "OCR_WITH_REGION",
      "text": "INVOICE #1024\nDATE: 2024-01-01",
      "regions": [
        {
          "text": "INVOICE #1024",
          "bbox": [50.0, 50.0, 200.0, 80.0]
        },
        {
          "text": "DATE: 2024-01-01",
          "bbox": [50.0, 90.0, 200.0, 110.0]
        }
      ],
      "image_base64": null,
      "error": null
    }
  ]
}
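On the client, a response like the one above can be walked page by page, skipping any page that reports a per-page error (a sketch; the sample object below is the example response inlined in place of a live API call):

```javascript
// Parse the JSON string and collect text from successful pages.
// The raw string here stands in for the model's actual output.
const raw = JSON.stringify({
  task: "OCR_WITH_REGION",
  total_pages: 1,
  results: [
    {
      page: 1,
      type: "OCR_WITH_REGION",
      text: "INVOICE #1024\nDATE: 2024-01-01",
      regions: [
        { text: "INVOICE #1024", bbox: [50.0, 50.0, 200.0, 80.0] },
        { text: "DATE: 2024-01-01", bbox: [50.0, 90.0, 200.0, 110.0] },
      ],
      image_base64: null,
      error: null,
    },
  ],
});

const result = JSON.parse(raw);
const texts = [];
for (const page of result.results) {
  if (page.error) {
    // The schema exposes errors per page, so one failed page
    // need not discard the rest of the batch.
    console.warn(`Page ${page.page} failed: ${page.error}`);
    continue;
  }
  texts.push(page.text);
}
console.log(texts.join("\n\n"));
```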

Usage Examples

JavaScript

import Replicate from "replicate";
const replicate = new Replicate();

const output = await replicate.run(
  "your-username/florence-2-ocr:latest",
  {
    input: {
      images: [
        "[https://example.com/doc_sample.jpg](https://example.com/doc_sample.jpg)"
      ],
      task: "OCR",
      include_image: false
    }
  }
);

// The model's output is a JSON string; parse it before accessing fields.
const result = JSON.parse(output);
console.log(result.results[0].text);