unityaisolutions/webdl
A beautiful, user-friendly website scraper built with HTTrack and Replicate Cog. Scrapes websites and returns them as ZIP archives with rich progress feedback.
Run unityaisolutions/webdl with an API
Use one of our client libraries to get started quickly. Clicking on a library will take you to the Playground tab where you can tweak different inputs, see the results, and copy the corresponding code to use in your own project.
Input schema
The fields you can use to run this model with an API. If you don't give a value for a field, its default value is used.
| Field | Type | Default value | Description |
|---|---|---|---|
| url | string | | The URL of the website to scrape (e.g., https://example.com) |
| max_depth | integer | 2 (Max: 10) | Maximum depth to crawl (0 = unlimited) |
| max_size | integer | 100 (Max: 1000) | Maximum size in MB to download (0 = unlimited) |
| external_links | boolean | False | Follow links to external domains |
| include_media | boolean | True | Download images, videos, and other media files |
{
  "type": "object",
  "title": "Input",
  "required": [
    "url"
  ],
  "properties": {
    "url": {
      "type": "string",
      "title": "Url",
      "x-order": 0,
      "description": "The URL of the website to scrape (e.g., https://example.com)"
    },
    "max_size": {
      "type": "integer",
      "title": "Max Size",
      "default": 100,
      "maximum": 1000,
      "minimum": 0,
      "x-order": 2,
      "description": "Maximum size in MB to download (0 = unlimited)"
    },
    "max_depth": {
      "type": "integer",
      "title": "Max Depth",
      "default": 2,
      "maximum": 10,
      "minimum": 0,
      "x-order": 1,
      "description": "Maximum depth to crawl (0 = unlimited)"
    },
    "include_media": {
      "type": "boolean",
      "title": "Include Media",
      "default": true,
      "x-order": 4,
      "description": "Download images, videos, and other media files"
    },
    "external_links": {
      "type": "boolean",
      "title": "External Links",
      "default": false,
      "x-order": 3,
      "description": "Follow links to external domains"
    }
  }
}
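The input schema can be checked on the client side before a request is sent. The following is a minimal sketch, not the model author's code: `build_input`, `run_webdl`, `DEFAULTS`, and `LIMITS` are hypothetical names introduced here, and the bounds and defaults are copied from the schema. The call through the `replicate` Python client assumes that package is installed and that `REPLICATE_API_TOKEN` is set. A community model may need the `owner/name:version` form; no version hash appears in this document, so none is used.

```python
from typing import Any

MODEL = "unityaisolutions/webdl"  # may need ":<version>" appended (assumption)

# Defaults and integer bounds copied from the input schema.
DEFAULTS = {
    "max_depth": 2,
    "max_size": 100,
    "external_links": False,
    "include_media": True,
}
LIMITS = {"max_depth": (0, 10), "max_size": (0, 1000)}


def build_input(url: str, **overrides: Any) -> dict[str, Any]:
    """Return a payload that satisfies the schema, or raise ValueError."""
    if not isinstance(url, str) or not url:
        raise ValueError("url is required and must be a non-empty string")
    unknown = set(overrides) - set(DEFAULTS)
    if unknown:
        raise ValueError(f"unknown fields: {sorted(unknown)}")

    payload = {"url": url, **DEFAULTS, **overrides}

    for name, (low, high) in LIMITS.items():
        value = payload[name]
        # bool is a subclass of int in Python, so reject it explicitly.
        if isinstance(value, bool) or not isinstance(value, int):
            raise ValueError(f"{name} must be an integer")
        if not low <= value <= high:
            raise ValueError(f"{name} must be between {low} and {high}")

    for name in ("external_links", "include_media"):
        if not isinstance(payload[name], bool):
            raise ValueError(f"{name} must be a boolean")

    return payload


def run_webdl(url: str, **overrides: Any) -> Any:
    """Hypothetical helper: validate the input, then run the model."""
    import replicate  # third-party client, imported lazily

    return replicate.run(MODEL, input=build_input(url, **overrides))
```

For example, `build_input("https://example.com", max_depth=3)` returns the full payload with the other fields at their defaults, while `max_depth=11` raises `ValueError` before any network request is made.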
Output schema
The shape of the response you’ll get when you run this model with an API.
{
  "type": "string",
  "title": "Output",
  "format": "uri"
}
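Since the output is a single URI string, and the model description says the scraped site comes back as a ZIP archive, a client would typically download that URI to disk. The sketch below uses only the Python standard library; `save_archive` and the default filename `site.zip` are hypothetical names chosen for illustration. Depending on the client library version, the value returned by a run may be a file-like object rather than a plain string, in which case its URL would need to be extracted first.

```python
import shutil
import urllib.request
import zipfile
from pathlib import Path


def save_archive(output_uri: str, dest: str = "site.zip") -> Path:
    """Download the archive at output_uri to dest and return its path."""
    dest_path = Path(dest)
    # Stream the response to disk instead of loading it all into memory.
    with urllib.request.urlopen(output_uri) as response, dest_path.open("wb") as fh:
        shutil.copyfileobj(response, fh)
    # Sanity check: the model is described as returning a ZIP archive.
    if not zipfile.is_zipfile(dest_path):
        raise ValueError(f"{output_uri} did not yield a ZIP archive")
    return dest_path
```

The saved file can then be unpacked with `zipfile.ZipFile(path).extractall(target_dir)`.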