We use Coreweave as our cloud provider; they serve us GPUs by the hour. About 60% of our cloud costs come from RTX_A5000 GPUs serving models to users (inference costs), and the remaining 40% from A100_PCIE GPUs used to train new models (training costs).
Cloud costs scale with usage, but per-user cost will go down. There are two main forces that drive cost down:
Our net discount rate on cloud costs is 42%; as we scale our usage, we expect this to rise to 60%.
All of our Coreweave usage carries a minimum 30% discount (twice the discount they give standard customers).
Specifically, for inference we have a 50% discount on 200 RTX_A5000 GPUs (the bulk of our usage).
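A quick sketch of how the 42% net rate falls out of the per-workload discounts. This assumes the 60/40 inference/training cost split is the right weighting for the blend, and that the 50% inference discount covers essentially all inference spend (both assumptions, not figures from the contract):

```python
def blended_discount(weights_and_discounts):
    """Weighted average of per-workload discount rates."""
    return sum(weight * discount for weight, discount in weights_and_discounts)

net = blended_discount([
    (0.60, 0.50),  # inference: ~60% of cloud costs, 50% discount (RTX_A5000)
    (0.40, 0.30),  # training:  ~40% of cloud costs, 30% minimum discount (A100_PCIE)
])
print(f"net discount: {net:.0%}")  # → net discount: 42%
```

Reaching the 60% target would require either deeper per-workload discounts or a larger share of spend on the more heavily discounted inference fleet.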