Status: Draft → Review → Approved
Authors: Evan | Qiang | Doug | Rick | Cloud SPE | Network Advisory Board
Timeline: 2 Weeks (Complete by End of November)
This RFC defines the Design Spec for Milestone 1.0 (NaaP MVP): a publicly observable SLA reporting system for the Livepeer Network-as-a-Product (NaaP).
It establishes a unified set of GPU and Network metrics, technical architecture, and API interfaces that enable real-time monitoring of Orchestrator performance and network demand across geographies and workflows.
This RFC establishes the data foundation for self-adaptive scaling and SLA-based orchestration in Milestone 2 by standardizing the telemetry schema and analytic models introduced here.
Success Criteria
Deliverable #1:
A dashboard that displays the core metrics (see Section 4) for the entire AI Network.
User story: As a community member, I can use this dashboard to learn how any specific Orchestrator (O) and workflow is performing, both historically and in real time, as measured by the metrics defined in Section 4.
Deliverable #2:
An MVP Gateway that specializes in executing “Test Loads” to monitor network performance metrics.
User story: As a community member, I have confidence in the overall network performance because a dedicated Gateway continuously monitors network performance and reliability, using transparent, publicly contributed, high-quality test datasets.
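As a rough illustration of how Test Load results could feed the dashboard, the sketch below groups probe results by Orchestrator and workflow and derives a success rate and a p95 latency for each pair. Everything here is an assumption made for illustration: the `LoadTestSample` record, its field names, the `summarize` helper, and the choice of these two statistics are hypothetical and do not stand in for the metric set defined in Section 4.

```python
import math
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class LoadTestSample:
    """One Test Load result (hypothetical record shape, not the RFC schema)."""
    orchestrator: str   # hypothetical Orchestrator identifier
    workflow: str       # hypothetical workflow name
    latency_ms: float   # end-to-end latency observed by the Gateway
    ok: bool            # whether the request completed successfully


def summarize(samples):
    """Group samples by (orchestrator, workflow) and compute simple statistics."""
    groups = defaultdict(list)
    for s in samples:
        groups[(s.orchestrator, s.workflow)].append(s)

    summary = {}
    for key, group in groups.items():
        # Latency is only meaningful for requests that succeeded.
        latencies = sorted(s.latency_ms for s in group if s.ok)
        success_rate = sum(1 for s in group if s.ok) / len(group)
        if latencies:
            # Nearest-rank percentile: smallest value covering 95% of samples.
            rank = math.ceil(0.95 * len(latencies))
            p95 = latencies[rank - 1]
        else:
            p95 = None
        summary[key] = {
            "requests": len(group),
            "success_rate": success_rate,
            "p95_latency_ms": p95,
        }
    return summary


samples = [
    LoadTestSample("orch-a", "text-to-image", 100.0, True),
    LoadTestSample("orch-a", "text-to-image", 120.0, True),
    LoadTestSample("orch-a", "text-to-image", 140.0, True),
    LoadTestSample("orch-a", "text-to-image", 200.0, True),
    LoadTestSample("orch-a", "text-to-image", 0.0, False),
    LoadTestSample("orch-b", "text-to-image", 0.0, False),
]
report = summarize(samples)
```

Keying the summary on the (Orchestrator, workflow) pair mirrors the dashboard user story above, where a community member looks up one Orchestrator and one workflow at a time.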
| User Role (As a …) | I want to … | In order to … | So that I can … |
|---|---|---|---|
| Orchestrator | Enroll and monitor my GPU capacity on the Livepeer Network | Know real-time SLA compliance and competitive positioning | optimize service competitiveness through SLA visibility |
| Gateway Provider | Operate a public gateway utility and load test tool | Validate GPU reliability and feed metrics into network analytics | select the best resources for the workload requests |
| Inference Provider (e.g. Daydream) | Deploy AI workflows and view service SLA data | Ensure my inferences meet industry-leading latency and cost targets | have full confidence in the underlying infrastructure and its SLAs |
| Community Observer / Researcher | Access public dashboard and APIs | Monitor network health and transparency of decentralized GPU performance | have a comprehensive view of what NaaP is and how NaaP is doing |
| Core Engineer | Validate metrics pipeline integrity to ensure the network meets published SLAs | Know the QoS of the network before committing engineering effort | reduce overall infrastructure risk |