Run deniyes/dolly-v2-12b-demo using Replicate’s API. Check out the model's schema for an overview of inputs and outputs.
import replicate

output = replicate.run(
    "deniyes/dolly-v2-12b-demo:ef548bcbf14a2dc42292c647523630085bdb7e4a65a8e405237fccdc03e4cbda",
    input={
        "top_k": 50,
        "top_p": 1,
        "prompt": "please compare the Cog and Blentoml",
        "decoding": "top_p",
        "max_length": 500,
        "temperature": 0.75,
        "repetition_penalty": 1.2
    }
)

# The deniyes/dolly-v2-12b-demo model can stream output as it's running.
# The predict method returns an iterator, and you can iterate over that output.
for item in output:
    # https://replicate.com/deniyes/dolly-v2-12b-demo/api#output-schema
    print(item, end="")
Cog - A computer program that allows you to create a virtual assistant.
Blentoml - An open-source language model platform for building conversational AI applications.
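If the response is needed as a single string rather than printed piece by piece, the streamed items can be joined. The following is a minimal sketch, assuming each item is a plain string that carries its own leading whitespace, as in the "output" array of the prediction shown on this page. The name `chunks` is a hypothetical stand-in for the iterator returned by replicate.run, so the snippet runs without an API token.

```python
# Hypothetical stand-in for the iterator returned by replicate.run(...);
# the values are the first few items of the "output" array on this page.
chunks = iter(["Cog", " -", " A", " computer", " program"])

# Each item already includes its leading space, so join with an empty separator.
text = "".join(chunks)
print(text)  # prints: Cog - A computer program
```

Joining with an empty string, rather than a space, avoids doubled spaces and keeps any newlines that arrive inside an item.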
{
"completed_at": "2024-07-24T11:49:01.119868Z",
"created_at": "2024-07-24T11:46:10.007000Z",
"data_removed": false,
"error": null,
"id": "zwczqjh3txrj40cgwj8bcx6gbm",
"input": {
"top_k": 50,
"top_p": 1,
"prompt": "please compare the Cog and Blentoml",
"decoding": "top_p",
"max_length": 500,
"temperature": 0.75,
"repetition_penalty": 1.2
},
"logs": "The attention mask and the pad token id were not set. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.\nSetting `pad_token_id` to `eos_token_id`:0 for open-end generation.",
"metrics": {
"predict_time": 1.811844952,
"total_time": 171.112868
},
"output": [
"Cog",
" -",
" A",
" computer",
" program",
" that",
" allows",
" you",
" to",
" create",
" a",
" virtual",
" assistant.\nBlentoml",
" -",
" An",
" open-source",
" language",
" model",
" platform",
" for",
" building",
" conversational",
" AI",
" applications."
],
"started_at": "2024-07-24T11:48:59.308023Z",
"status": "succeeded",
"urls": {
"stream": "https://streaming-api.svc.rno2.c.replicate.net/v1/streams/kxmvlh2dtsgltzyywrbn4n4j5ys7fwsbmsvnl4gw46abvkn62qfa",
"get": "https://api.replicate.com/v1/predictions/zwczqjh3txrj40cgwj8bcx6gbm",
"cancel": "https://api.replicate.com/v1/predictions/zwczqjh3txrj40cgwj8bcx6gbm/cancel"
},
"version": "ef548bcbf14a2dc42292c647523630085bdb7e4a65a8e405237fccdc03e4cbda"
}
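The timestamps in this prediction can be used to check the two values under "metrics". The sketch below uses only the Python standard library and the three timestamps copied from the JSON above. The label "waiting before start" is an interpretation on my part (it likely covers queueing and model start-up), not a field defined by the API.

```python
from datetime import datetime

def parse_ts(value: str) -> datetime:
    # Swap the trailing "Z" for an explicit offset so that
    # datetime.fromisoformat also accepts it on Python versions before 3.11.
    return datetime.fromisoformat(value.replace("Z", "+00:00"))

# Timestamps copied from the prediction shown above.
prediction = {
    "created_at": "2024-07-24T11:46:10.007000Z",
    "started_at": "2024-07-24T11:48:59.308023Z",
    "completed_at": "2024-07-24T11:49:01.119868Z",
}

created = parse_ts(prediction["created_at"])
started = parse_ts(prediction["started_at"])
completed = parse_ts(prediction["completed_at"])

wait_seconds = (started - created).total_seconds()
run_seconds = (completed - started).total_seconds()
total_seconds = (completed - created).total_seconds()

print(f"waiting before start: {wait_seconds:.3f} s")  # 169.301 s
print(f"running:              {run_seconds:.3f} s")   # 1.812 s
print(f"total:                {total_seconds:.3f} s") # 171.113 s
```

The last two figures agree with "predict_time" (1.811844952) and "total_time" (171.112868), so almost all of the roughly 171 seconds passed before the model started running.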
Run time and cost
This model runs on Nvidia A100 (80GB) GPU hardware.
We don't yet have enough runs of this model to provide performance information.
This model is cold. You'll get a fast response if the model is warm and already running, and a slower response if the model is cold and starting up.
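One way to observe the difference between a warm and a cold start from client code is to time how long the first streamed item takes to arrive. The helper below is a hedged sketch, not part of the Replicate client: `time_to_first_item` and `fake_cold_model` are hypothetical names, and the fake model simulates a start-up delay so the example runs without network access. In real use, the callable passed in would wrap a call such as replicate.run.

```python
import time
from typing import Callable, Iterable, Tuple

def time_to_first_item(start: Callable[[], Iterable[str]]) -> Tuple[float, str]:
    """Return (seconds until the first item arrived, that first item)."""
    began = time.monotonic()
    iterator = iter(start())  # the clock includes the call that starts the run
    first = next(iterator)
    return time.monotonic() - began, first

def fake_cold_model() -> Iterable[str]:
    # Hypothetical stand-in for a streaming model call.
    time.sleep(0.05)  # simulated start-up delay before the first item
    yield "Cog"
    yield " -"

delay, first = time_to_first_item(fake_cold_model)
print(f"first item {first!r} after {delay:.2f} s")
```

A monotonic clock is used so that system clock adjustments during a long cold start do not distort the measurement.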
Logs (zwczqjh3txrj40cgwj8bcx6gbm)
Succeeded
The attention mask and the pad token id were not set. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.
Setting `pad_token_id` to `eos_token_id`:0 for open-end generation.