Status: Draft → Review → Approved

Authors: Evan | Qiang | Doug | Rick | Cloud SPE | Network Advisory Board

Timeline: 2 Weeks (Complete by End of November)

1. Overview & Purpose

This RFC defines the Design Spec for Milestone 1.0 (NaaP MVP): a publicly observable SLA reporting system for the Livepeer Network-as-a-Product (NaaP).

It establishes a unified set of GPU and Network metrics, technical architecture, and API interfaces that enable real-time monitoring of Orchestrator performance and network demand across geographies and workflows.

This RFC establishes the data foundation for self-adaptive scaling and SLA-based orchestration in Milestone 2 by standardizing the telemetry schema and analytic models introduced here.

Success Criteria

Deliverable #1:

A dashboard that displays the core metrics (see Section 4) for the entire AI Network.

User story: As a community member, I can use this dashboard to see how any specific Orchestrator (O) and workflow is performing, both historically and in real time, as measured by the metrics defined in Section 4.

Deliverable #2:

An MVP Gateway that specializes in executing “Test Loads” to monitor network performance metrics.

User story: As a community member, I have confidence in the overall network performance because a dedicated Gateway continuously monitors network performance and reliability using transparent, publicly contributed, high-quality test datasets.

2. User Stories

| User Role | As a … | I want to … | So that I can … |
|---|---|---|---|
| Orchestrator | Enroll and monitor my GPU capacity on the Livepeer Network | Know real-time SLA compliance and competitive positioning | Optimize service competitiveness through SLA visibility |
| Gateway Provider | Operate a public gateway utility and load test tool | Validate GPU reliability and feed metrics into network analytics | Select the best resources for workload requests |
| Inference Provider (e.g. Daydream) | Deploy AI workflows and view service SLA data | Ensure my inferences meet industry-leading latency and cost targets | Have full confidence in the underlying infra and its SLAs |
| Community Observer / Researcher | Access the public dashboard and APIs | Monitor network health and the transparency of decentralized GPU performance | Have a comprehensive view of what NaaP is and how NaaP is doing |
| Core Engineer | Validate metrics pipeline integrity to ensure the network meets published SLAs | Know the QoS of the network before committing engineering effort | Reduce the overall infra risk |