stspanho/spectacles-yolov7-trainer:f6dda6a0
Input schema
The fields you can use to run this model with an API. If you don’t give a value for a field, its default value is used.
| Field | Type | Default value | Description |
|---|---|---|---|
| images_zip_url | string | | URL to a .zip of real Spectacles frames (flat folder of images). |
| classes | string | coffee cup | Comma-separated class list. Must line up with the objects described in synthetic_prompts and the objects visible in the real Spectacles frames, e.g. "coffee cup" or "banana, apple". |
| synthetic_prompts | string | first-person POV wide-angle snapshot from smart glasses, a white ceramic coffee cup with steam on a wooden kitchen counter, eye-level, soft morning daylight from a window; head-tilted-down POV wide-angle smart-glasses view, a takeaway coffee cup with a brown lid on a cafe table with coffee rings, warm tungsten overhead light; first-person 45-degree downward POV snapshot from smart glasses, an espresso cup on a saucer on a marble counter, slight motion blur, cafe ambient light; near top-down POV from head-mounted smart glasses, a latte mug with foam art on a wooden desk next to a laptop, cool daylight; first-person POV looking down at an angle, a stainless steel travel mug on an office desk cluttered with notebooks and pens, mixed fluorescent + window light; overhead first-person POV wide-angle snapshot looking straight down, a paper coffee cup with a corrugated sleeve on a cafe table, surface filling most of the frame; Dutch-angle first-person smart-glasses view, a glass mug of black coffee on a glass coffee table in a living room, low golden-hour light; eye-level POV snapshot captured while walking, a takeaway coffee cup held loosely out of frame, blurred kitchen counter background | Semicolon-separated FLUX.1-schnell prompts. Each prompt should describe ONE scene as if seen through Spectacles: include (1) a first-person/POV camera angle (eye-level, head-tilted-down, top-down...), (2) the target object with concrete visual detail, (3) the surface it sits on, and (4) the lighting. Prompts are round-robined across `synthetic_count` frames with unique seeds for variety. The default is a set of Spectacles-POV coffee-cup scenes; replace every entry to retarget the synthetic data. To train on more than one class, mix prompts for each class across the list (roughly synthetic_count / len(prompts) per scene). |
| synthetic_count | integer | 100 (Max: 1000) | Number of synthetic frames to generate (0 = skip Flux entirely). |
| epochs | integer | 200 (Min: 1, Max: 1000) | None |
| batch_size | integer | 64 (Min: 1, Max: 128) | None |
| img_size | None | 224 | Training + export image size (must be a multiple of 32). 224 is Snap's SnapML recipe for Spectacles; 320/416/512/640 give higher accuracy at higher on-device cost. |
| sam_score_threshold | number | 0.5 (Max: 1) | Minimum SAM 3 detection confidence for an annotation to be kept. Lower (e.g. 0.3) -> more boxes per image but noisier labels; higher (e.g. 0.7) -> fewer, cleaner labels but you may drop images entirely. If a run errors with 'SAM 3 produced no detections above threshold', lower this. Tune by enabling include_dataset and inspecting the .txt labels. |
| include_dataset | boolean | False | If true, bundle the SAM 3-annotated train/val dataset into the output zip. |
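
The limits in the input schema can be checked locally before a run is submitted. The following is a minimal sketch, not part of the model: the helper names (`split_list`, `frames_per_prompt`, `build_input`) are hypothetical, the lower bounds for `synthetic_count` and `sam_score_threshold` are assumptions (the page lists only maximums), and `frames_per_prompt` assumes a plain round-robin that starts from the first prompt.

```python
def split_list(text, sep):
    """Split a separator-delimited field and drop empty entries."""
    return [part.strip() for part in text.split(sep) if part.strip()]


def frames_per_prompt(prompts, synthetic_count):
    """Count how many frames each prompt gets under a plain round-robin.

    Assumption: frame i uses prompts[i % len(prompts)].
    """
    if not prompts:
        return []
    base, extra = divmod(synthetic_count, len(prompts))
    # The first `extra` prompts receive one additional frame.
    return [base + (1 if i < extra else 0) for i in range(len(prompts))]


def build_input(images_zip_url, classes="coffee cup", synthetic_prompts=None,
                synthetic_count=100, epochs=200, batch_size=64, img_size=224,
                sam_score_threshold=0.5, include_dataset=False):
    """Return an input dict after checking the documented ranges."""
    if not split_list(classes, ","):
        raise ValueError("classes must name at least one class")
    if not 0 <= synthetic_count <= 1000:  # lower bound of 0 is an assumption
        raise ValueError("synthetic_count must be between 0 and 1000")
    if not 1 <= epochs <= 1000:
        raise ValueError("epochs must be between 1 and 1000")
    if not 1 <= batch_size <= 128:
        raise ValueError("batch_size must be between 1 and 128")
    if img_size <= 0 or img_size % 32 != 0:
        raise ValueError("img_size must be a positive multiple of 32")
    if not 0 <= sam_score_threshold <= 1:  # lower bound of 0 is an assumption
        raise ValueError("sam_score_threshold must be between 0 and 1")

    payload = {
        "images_zip_url": images_zip_url,
        "classes": classes,
        "synthetic_count": synthetic_count,
        "epochs": epochs,
        "batch_size": batch_size,
        "img_size": img_size,
        "sam_score_threshold": sam_score_threshold,
        "include_dataset": include_dataset,
    }
    # Leave synthetic_prompts out so the model falls back to its default.
    if synthetic_prompts is not None:
        if not split_list(synthetic_prompts, ";"):
            raise ValueError("synthetic_prompts must contain at least one prompt")
        payload["synthetic_prompts"] = synthetic_prompts
    return payload


# Submitting the payload with the Replicate Python client would look roughly
# like this (the version identifier is copied as shown on this page and may
# need to be replaced with the full version hash):
#
#   import replicate
#   output = replicate.run(
#       "stspanho/spectacles-yolov7-trainer:f6dda6a0",
#       input=build_input("https://example.com/frames.zip"),  # placeholder URL
#   )
```

Under the round-robin assumption, the default list of 8 prompts with `synthetic_count` = 100 gives 13 frames to each of the first four prompts and 12 to each of the rest, which matches the "roughly synthetic_count / len(prompts) per scene" guidance.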
Output schema
The shape of the response you’ll get when you run this model with an API.
Schema
{"format": "uri", "title": "Output", "type": "string"}
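
Because the output is a single URI string, the result has to be downloaded to be used. The sketch below assumes the URI is available as a plain `str` (some client versions may wrap it in a file-like object instead) and that it points at the output zip mentioned in the `include_dataset` description. The function names and the `output.zip` fallback are hypothetical.

```python
import os
import shutil
import urllib.request
from urllib.parse import urlparse


def output_filename(uri, fallback="output.zip"):
    """Derive a local filename from the URI path, ignoring any query string."""
    name = os.path.basename(urlparse(uri).path)
    return name or fallback


def download_output(uri, dest_dir="."):
    """Stream the file behind `uri` into dest_dir and return the local path."""
    if urlparse(uri).scheme not in ("http", "https"):
        raise ValueError("expected an http(s) URI, got: %r" % uri)
    dest = os.path.join(dest_dir, output_filename(uri))
    with urllib.request.urlopen(uri) as response, open(dest, "wb") as fh:
        shutil.copyfileobj(response, fh)
    return dest
```

Calling `download_output(output)` with the string returned by a run would save the file to the current directory.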