rocketcoder/florence-2-lg-ocr

A vision model that excels at batch OCR processing

Public
18 runs

Run time and cost

This model runs on Nvidia T4 GPU hardware. We don't yet have enough runs of this model to provide performance information.

Readme

Florence-2 Advanced OCR (Offline & Robust)

This is a custom deployment of Microsoft’s Florence-2-Large, optimized for high-throughput document OCR.

Unlike standard implementations, this version includes:

* Robust pre-processing: a "squash" resizing strategy (1024x1024) to maximize text density and prevent model hallucinations on rectangular documents.
* Auto-orientation: automatically fixes EXIF rotation issues for phone-captured documents.
* Structured JSON output: returns clean, parsed JSON with both raw text and bounding-box regions.
* Multi-image support: processes batch uploads (e.g., full PDF pages converted to images) in a single request.

Input Parameters

| Input | Type | Default | Description |
|---|---|---|---|
| images | List[Path] | Required | A list of image files to process. Max 50 images per request. Supports JPG, PNG, and WEBP. |
| task | String | "OCR" | The task to perform. `OCR` extracts raw text only; `OCR_WITH_REGION` extracts text plus bounding-box coordinates. |
| include_image | Boolean | False | If True, returns a base64-encoded copy of the image with bounding boxes drawn on it (only applies with `OCR_WITH_REGION`). |
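As a quick illustration, a request body matching these parameters might look like the following (the URLs are placeholders):

```javascript
// Sketch of a batch request body; up to 50 images per request.
const input = {
  images: [
    "https://example.com/page-1.jpg",
    "https://example.com/page-2.jpg",
  ],
  task: "OCR_WITH_REGION", // or "OCR" for text only
  include_image: false,    // set true for a debug image with boxes drawn on it
};
console.log(input.images.length); // 2
```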

Output Schema

The model returns a JSON String. You must parse this string in your client application.
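For example, a minimal sketch of consuming the output in JavaScript (`rawOutput` below is a stand-in payload mirroring the schema documented here, not a real model response):

```javascript
// The model's output is a JSON *string*, not an object, so parse it first.
const rawOutput = JSON.stringify({
  task: "OCR",
  total_pages: 1,
  results: [
    { page: 1, type: "OCR", text: "Hello world", regions: [], image_base64: null, error: null },
  ],
});

const parsed = JSON.parse(rawOutput);
console.log(parsed.total_pages);     // number of pages processed
console.log(parsed.results[0].text); // extracted text of the first page
```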

Top-Level Structure

{
  "task": "OCR_WITH_REGION",
  "total_pages": 1,
  "pages": [ ... ]
}

Page Object Structure

| Field | Type | Description |
|---|---|---|
| page | Integer | The index of the page (1-based). |
| type | String | The task performed (e.g., `OCR`). |
| text | String | The full extracted text combined into a single string. |
| regions | Array | A list of structured objects linking text to its location. |
| image_base64 | String | (Optional) The base64 PNG string of the debug image if `include_image=True`; otherwise null. |
| error | String | Error message if the specific page failed; otherwise null. |

Region Object Structure

The regions array provides precise localization for every line of text detected.

{
  "text": "It was a cold windy night",
  "bbox": [100.5, 200.0, 500.2, 250.0]  // [x1, y1, x2, y2]
}
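Since each bbox is `[x1, y1, x2, y2]` in pixel coordinates, width, height, and reading order can be derived directly. A small sketch (the sample regions are invented for illustration):

```javascript
// Sort regions top-to-bottom by y1 and derive box dimensions from the
// [x1, y1, x2, y2] format. Sample data is illustrative only.
const regions = [
  { text: "DATE: 2024-01-01", bbox: [50, 90, 200, 110] },
  { text: "INVOICE #1024", bbox: [50, 50, 200, 80] },
];

const byReadingOrder = [...regions].sort((a, b) => a.bbox[1] - b.bbox[1]);
console.log(byReadingOrder.map((r) => r.text).join("\n"));

const [x1, y1, x2, y2] = byReadingOrder[0].bbox;
const width = x2 - x1;  // 150
const height = y2 - y1; // 30
```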

Example Response

{
  "task": "OCR_WITH_REGION",
  "total_pages": 1,
  "results": [
    {
      "page": 1,
      "type": "OCR_WITH_REGION",
      "text": "INVOICE #1024\nDATE: 2024-01-01",
      "regions": [
        {
          "text": "INVOICE #1024",
          "bbox": [50.0, 50.0, 200.0, 80.0]
        },
        {
          "text": "DATE: 2024-01-01",
          "bbox": [50.0, 90.0, 200.0, 110.0]
        }
      ],
      "image_base64": null,
      "error": null
    }
  ]
}
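Because failures are reported per page via the `error` field rather than by failing the whole request, a client can separate successful pages from failed ones. A sketch (the payload is invented for illustration):

```javascript
// Split a response into successful pages and failed page numbers.
const response = {
  task: "OCR",
  total_pages: 2,
  results: [
    { page: 1, type: "OCR", text: "Page one text", regions: [], image_base64: null, error: null },
    { page: 2, type: "OCR", text: "", regions: [], image_base64: null, error: "decode failed" },
  ],
};

const okPages = response.results.filter((p) => p.error === null);
const failedPageNumbers = response.results
  .filter((p) => p.error !== null)
  .map((p) => p.page);
console.log(okPages.length, failedPageNumbers); // 1 [ 2 ]
```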

Usage Examples

JavaScript

import Replicate from "replicate";
const replicate = new Replicate();

const output = await replicate.run(
  "your-username/florence-2-ocr:latest",
  {
    input: {
      images: [
        "[https://example.com/doc_sample.jpg](https://example.com/doc_sample.jpg)"
      ],
      task: "OCR",
      include_image: false
    }
  }
);

// Replicate usually returns the string output directly
const result = JSON.parse(output);
console.log(result.results[0].text);
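When `include_image` is true (with `OCR_WITH_REGION`), each page may carry a base64 PNG in `image_base64`. Decoding it in Node.js is straightforward with `Buffer`; the helper name and demo payload below are illustrative:

```javascript
// Decode the optional image_base64 field into raw PNG bytes.
// Returns null when the field wasn't requested (it defaults to null).
function decodeDebugImage(page) {
  if (!page.image_base64) return null;
  return Buffer.from(page.image_base64, "base64");
}

// Stand-in payload (not a real PNG) just to show the round trip:
const demoPage = { page: 1, image_base64: Buffer.from("png-bytes").toString("base64") };
const bytes = decodeDebugImage(demoPage);
console.log(bytes ? bytes.length : "no image"); // 9
```

Write the buffer to disk (e.g., `fs.writeFileSync("page-1-debug.png", bytes)`) to inspect the drawn bounding boxes.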