Here's what's changing, starting with the new prices:
| Hardware | Before | After |
| --- | --- | --- |
| CPU | $0.000200 per second | $0.000100 per second ($0.36 per hour) |
| Nvidia T4 | $0.000550 per second | $0.000225 per second ($0.81 per hour) |
| Nvidia A40 | $0.001300 per second | $0.000575 per second ($2.07 per hour) |
| Nvidia A100 (40GB) | $0.002300 per second | $0.001150 per second ($4.14 per hour) |
| Nvidia A100 (80GB) | $0.003200 per second | $0.001400 per second ($5.04 per hour) |
When you run a model, it runs on a GPU instance. The model takes a bit of time to start up, then your prediction runs, then we keep the instance idle for a while after the prediction finishes so that subsequent requests are fast.
Currently, we charge you only for the time the model spends running a prediction. Soon, we're going to start charging for startup time and idle time on private models, at half the per-second price.
If you’re running a large volume of requests on private models, this will be significantly cheaper, because you’ll be making efficient use of the underlying instance. If you’re running a small number of requests, then this will be more expensive.
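To make that concrete, here's a rough sketch of the arithmetic using the new A40 prices from the table above. The startup, idle, and prediction durations below are made-up numbers for illustration; your model's actual times will vary:

```python
# Hypothetical cost comparison: the old billing model (prediction time only,
# at the old price) vs. the new one (prediction time at the new price, plus
# startup and idle time at half the new per-second price). Prices are the
# A40 numbers from the table; all durations are made up for illustration.

OLD_PRICE = 0.001300           # $/second, A40 before
NEW_PRICE = 0.000575           # $/second, A40 after
OVERHEAD_RATE = NEW_PRICE / 2  # startup and idle billed at half price

STARTUP_SEC = 60     # hypothetical: time to boot the model once
IDLE_SEC = 120       # hypothetical: idle window after the last prediction
PREDICTION_SEC = 10  # hypothetical: compute time per prediction

def old_cost(n: int) -> float:
    """Old scheme: pay only while predictions are running."""
    return n * PREDICTION_SEC * OLD_PRICE

def new_cost(n: int) -> float:
    """New scheme: one startup and one idle window, plus the predictions."""
    overhead = (STARTUP_SEC + IDLE_SEC) * OVERHEAD_RATE
    return overhead + n * PREDICTION_SEC * NEW_PRICE

for n in (1, 10, 100):
    print(f"{n:>3} predictions: old ${old_cost(n):.4f}  vs  new ${new_cost(n):.4f}")
```

With these made-up numbers, a single prediction per startup costs more under the new scheme, but a burst of ten already comes out cheaper, and at a hundred predictions the bill is less than half the old one.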
Private models will still scale to zero when you aren't using them, but we'll bill for that bit of extra compute time before they do. We're also going to let you control how long that idle window is.
This change applies automatically only to new users. For existing users, it's opt-in: nothing will change unless you want it to.
If you're just using public models, you can stop reading right now. We're rolling out the new prices over the course of the month. Enjoy your lower bill. 🍹
If you’re an existing user of private models, you’re not going to pay more. We want this to be unambiguously good news for you. If the new prices will save you money, you can switch over. If not, you can keep your current prices. Stay tuned for an email.
If you have any questions, get in touch with support.