We use CoreWeave as our cloud provider; they serve us GPUs by the hour. About 60% of our cloud costs come from RTX_A5000 GPUs serving models to users (inference costs), and the remaining 40% comes from A100_PCIE GPUs for training new models (training costs).

How costs will evolve

Cloud costs scale with usage, but the per-user cost will go down. There are two main forces driving costs down:

  1. Commercial: We already have deep discounts (high usage + good relationship), and we expect a further 31% reduction in cost through increased discounts as we scale our usage.
  2. Engineering: In the last 18 months we reduced the cost of inference by over 1000x with straightforward engineering fixes. There is still a handful of low-hanging fruit that we expect can halve costs further: optimizing our database usage (currently very inefficient), further Triton improvements, etc. Once fully implemented, these should give at least a 35% reduction in costs.
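
Taken together, and assuming the two reductions are independent and compound multiplicatively (a rough sketch, not a forecast), the combined effect would look like this:

```python
# Rough arithmetic: combined effect of the commercial (31%) and
# engineering (35%) reductions, assuming they compound multiplicatively.
commercial_reduction = 0.31   # expected from increased discounts at scale
engineering_reduction = 0.35  # expected from the remaining optimizations

remaining = (1 - commercial_reduction) * (1 - engineering_reduction)
print(f"Cost remaining:     {remaining:.0%}")      # ~45% of today's cost
print(f"Combined reduction: {1 - remaining:.0%}")  # ~55% overall
```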

Costs today

Material agreements: we negotiated deep discounts on GPUs

Our net discount rate on cloud costs is 42%; as we scale our usage we expect to increase this to 60%.
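As a sanity check (assuming discounts apply uniformly across our usage), moving the net discount from 42% to 60% means paying 40% of list price instead of 58%, which matches the ~31% further commercial reduction quoted above:

```python
# Sanity check: a 42% -> 60% net discount cuts the price paid
# from 58% to 40% of list, i.e. a ~31% further reduction.
paid_now = 1 - 0.42     # fraction of list price paid today
paid_target = 1 - 0.60  # fraction of list price paid at the target discount

further_reduction = 1 - paid_target / paid_now
print(f"Further reduction: {further_reduction:.0%}")  # ~31%
```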

For all of our CoreWeave usage we get a minimum 30% discount (twice the discount they give normal customers).

[Screenshot 2023-04-25 at 11.01.31.png]

[Screenshot 2023-04-25 at 11.01.44.png]

Specifically, for inference we have a 50% discount on 200 RTX_A5000 GPUs (this covers most of our usage).

[Screenshot 2023-04-25 at 10.51.51.png]

[Screenshot 2023-04-25 at 10.52.09.png]