Florence-2 Advanced OCR (Offline & Robust)
This is a custom deployment of Microsoft’s Florence-2-Large, optimized for high-throughput document OCR.
Unlike standard implementations, this version includes:

* Robust Pre-processing: A “squash” resizing strategy resizes every image straight to 1024x1024 without padding or preserving the aspect ratio. Text fills the whole input canvas, which is intended to reduce model hallucinations on rectangular documents.
* Auto-Orientation: Applies the EXIF orientation tag before inference, so phone photos stored sideways are processed upright.
* Structured JSON Output: Returns structured JSON containing both the raw text and bounding-box regions.
* Multi-Image Support: Processes a batch of images (e.g., the pages of a PDF converted to images) in a single request.
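A “squash” resize usually scales the horizontal and vertical axes by different factors. The sketch below shows that arithmetic and how a box in the squashed 1024x1024 space would map back to the original image. `squashScale` and `unsquashBox` are hypothetical helpers, not part of this model's interface. Which coordinate space the returned `bbox` values use is not specified, so check real outputs before relying on the mapping.

```javascript
// Side length of the square canvas used by the "squash" resize.
const SQUASH_SIZE = 1024;

// Scale factors applied when an origWidth x origHeight image is squashed
// to size x size. The aspect ratio is NOT preserved, so sx and sy can differ.
function squashScale(origWidth, origHeight, size = SQUASH_SIZE) {
  return { sx: size / origWidth, sy: size / origHeight };
}

// Map an [x1, y1, x2, y2] box from squashed coordinates back to the original image.
// Assumption (hypothetical): the box is expressed in the size x size squashed space.
function unsquashBox(bbox, origWidth, origHeight, size = SQUASH_SIZE) {
  const { sx, sy } = squashScale(origWidth, origHeight, size);
  const [x1, y1, x2, y2] = bbox;
  return [x1 / sx, y1 / sy, x2 / sx, y2 / sy];
}
```

Take a 2048x4096 phone photo as an example. `squashScale(2048, 4096)` gives `{ sx: 0.5, sy: 0.25 }`, and `unsquashBox([100, 200, 500, 250], 2048, 4096)` returns `[200, 800, 1000, 1000]`.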
Input Parameters
| Input | Type | Default | Description |
|---|---|---|---|
| `images` | `List[Path]` | Required | A list of image files to process. Max 50 images per request. Supports JPG, PNG, and WEBP. |
| `task` | `String` | `"OCR"` | The task to perform: `OCR` extracts raw text only; `OCR_WITH_REGION` extracts text plus bounding-box coordinates. |
| `include_image` | `Boolean` | `False` | If `True`, returns a base64-encoded copy of the image with bounding boxes drawn on it (only works with `OCR_WITH_REGION`). |
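The constraints in the input table can be checked on the client before a request is sent. The sketch below does this. `buildOcrInput` is a hypothetical helper, and throwing when `include_image` is paired with plain `OCR` is a strictness choice of this sketch. The table says the combination does not work, but it does not say how the server responds to it.

```javascript
// Limits and options taken from the input table.
const MAX_IMAGES = 50;
const TASKS = ["OCR", "OCR_WITH_REGION"];

// Hypothetical client-side helper: builds the `input` object for a request
// and rejects combinations the table describes as unsupported.
function buildOcrInput(images, { task = "OCR", includeImage = false } = {}) {
  if (!Array.isArray(images) || images.length === 0) {
    throw new Error("images must be a non-empty array");
  }
  if (images.length > MAX_IMAGES) {
    throw new Error(`at most ${MAX_IMAGES} images per request, got ${images.length}`);
  }
  if (!TASKS.includes(task)) {
    throw new Error(`unknown task: ${task}`);
  }
  if (includeImage && task !== "OCR_WITH_REGION") {
    throw new Error("include_image only works with OCR_WITH_REGION");
  }
  // Field names match the model's input parameters.
  return { images, task, include_image: includeImage };
}
```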
Output Schema
The model returns a single JSON-encoded string rather than an object. Your client application must parse it (for example, with `JSON.parse` in JavaScript).
Top-Level Structure
```json
{
  "task": "OCR_WITH_REGION",
  "total_pages": 1,
  "results": [ ... ]
}
```
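The output arrives as a string, so a client typically parses it once and checks the top-level fields before reading any pages. The sketch below does that. It reads the page list from `results`, the key used in the example response and the JavaScript usage example. `parseOcrOutput` is a hypothetical helper. Treating a mismatch between `total_pages` and the number of results as a warning is an assumption of this sketch.

```javascript
// Hypothetical helper: turns the model's raw output into an object and checks
// the top-level fields. Accepts the JSON string or an already-parsed object.
function parseOcrOutput(raw) {
  const data = typeof raw === "string" ? JSON.parse(raw) : raw;
  if (typeof data.task !== "string") {
    throw new Error("missing 'task' field");
  }
  if (!Array.isArray(data.results)) {
    throw new Error("missing 'results' array");
  }
  if (data.total_pages !== data.results.length) {
    // Assumed to agree normally; surface the mismatch instead of failing.
    console.warn(`total_pages=${data.total_pages} but got ${data.results.length} results`);
  }
  return data;
}
```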
Page Object Structure
| Field | Type | Description |
|---|---|---|
| `page` | Integer | The 1-based index of the page. |
| `type` | String | The task performed (e.g., `OCR`). |
| `text` | String | The full extracted text, combined into a single string. |
| `regions` | Array | A list of structured objects linking text to its location. |
| `image_base64` | String | (Optional) The base64 PNG string of the debug image if `include_image=True`; otherwise `null`. |
| `error` | String | Error message if this specific page failed; otherwise `null`. |
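The per-page `error` field means one bad image does not have to fail the whole batch. The sketch below splits parsed pages into successes and failures and joins the successful text in page order. `splitPages` is a hypothetical helper, and the blank-line separator between pages is an arbitrary choice.

```javascript
// Hypothetical helper: separates pages by their `error` field
// (null when the page succeeded) and joins successful text in page order.
function splitPages(pages) {
  const ok = [];
  const failed = [];
  for (const p of pages) {
    if (p.error == null) {
      ok.push(p);
    } else {
      failed.push({ page: p.page, error: p.error });
    }
  }
  // Page numbers are 1-based; sort a copy so the input array is untouched.
  const text = ok
    .slice()
    .sort((a, b) => a.page - b.page)
    .map((p) => p.text)
    .join("\n\n");
  return { text, failed };
}
```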
Region Object Structure
Each entry in the `regions` array pairs one detected line of text with its bounding box, given as `bbox: [x1, y1, x2, y2]`:

```json
{
  "text": "It was a cold windy night",
  "bbox": [100.5, 200.0, 500.2, 250.0]
}
```
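Two common ways to consume `regions` are converting each box to the `x, y, width, height` form that drawing APIs such as `CanvasRenderingContext2D.strokeRect` take, and sorting lines into reading order. The sketch below does both. `toRect` and `readingOrder` are hypothetical helpers. The sort assumes image coordinates in which `y` grows downward and `[x1, y1]` is the top-left corner.

```javascript
// Convert an [x1, y1, x2, y2] box into { x, y, width, height }.
function toRect([x1, y1, x2, y2]) {
  return { x: x1, y: y1, width: x2 - x1, height: y2 - y1 };
}

// Sort regions top-to-bottom (by y1), breaking ties left-to-right (by x1).
// Assumption: y grows downward, as is usual for image pixel coordinates.
function readingOrder(regions) {
  return regions
    .slice()
    .sort((a, b) => a.bbox[1] - b.bbox[1] || a.bbox[0] - b.bbox[0]);
}
```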
Example Response
```json
{
  "task": "OCR_WITH_REGION",
  "total_pages": 1,
  "results": [
    {
      "page": 1,
      "type": "OCR_WITH_REGION",
      "text": "INVOICE #1024\nDATE: 2024-01-01",
      "regions": [
        {
          "text": "INVOICE #1024",
          "bbox": [50.0, 50.0, 200.0, 80.0]
        },
        {
          "text": "DATE: 2024-01-01",
          "bbox": [50.0, 90.0, 200.0, 110.0]
        }
      ],
      "image_base64": null,
      "error": null
    }
  ]
}
```
Usage Examples
JavaScript
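```javascript
// Run as an ES module (this example uses top-level await).
import Replicate from "replicate";

// The client reads your API token from the REPLICATE_API_TOKEN environment variable.
const replicate = new Replicate();

const output = await replicate.run(
  // Replace with your model's owner/name and its version ID.
  "your-username/florence-2-ocr:VERSION_ID",
  {
    input: {
      images: ["https://example.com/doc_sample.jpg"],
      task: "OCR",
      include_image: false,
    },
  }
);

// The model outputs a JSON-encoded string; parse it before use.
const result = typeof output === "string" ? JSON.parse(output) : output;
console.log(result.results[0].text);
```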
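With `task: "OCR_WITH_REGION"` and `include_image: true`, each page's `image_base64` field holds a PNG. The sketch below turns it into a data URL that can be used as an `<img>` source. `debugImageUrl` is a hypothetical helper. It assumes the field holds raw base64 without a `data:` prefix and passes through any value that already has one. In Node, the same string can instead be written to disk with `Buffer.from(b64, "base64")` and `fs.writeFileSync`.

```javascript
// Hypothetical helper: builds a data URL for a page's debug image,
// or returns null when no image was returned (e.g., include_image was false).
function debugImageUrl(page) {
  const b64 = page.image_base64;
  if (!b64) return null;
  // Assumption: raw base64 PNG; keep values that already carry a data: prefix.
  return b64.startsWith("data:") ? b64 : `data:image/png;base64,${b64}`;
}
```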