Query Analytics Pipeline

A tool for deep query analytics, helping Indexers properly manage and scale infrastructure according to traffic and expected query volume.

Details:

Indexers operate Graph Nodes, which index on-chain data and process queries for dapps (Consumers). Indexing is a heavy workload, and query response time and performance are paramount for Indexers to meet dapp needs.

Graph Node logs information about each GraphQL query it executes. The Graph community is in need of a Query Analytics Pipeline tool that will help Indexers assess query patterns and optimize their infrastructure. The analytics pipeline should ingest these logs and summarize and present the data to provide deeper insight into the usage and performance of an indexer's infrastructure.
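As a rough illustration of the summarization step, the following Python sketch groups query log records by subgraph and query hash and reports call counts and latency figures. The JSON-lines input and the field names (`subgraph_id`, `query_hash`, `query_time_ms`) are assumptions made for this example; real Graph Node log lines would first need to be parsed into this shape.

```python
import json
import math
from collections import defaultdict


def summarize(lines):
    """Group records by (subgraph_id, query_hash) and compute basic latency stats.

    Each input line is assumed to be a JSON object with the hypothetical
    fields subgraph_id, query_hash and query_time_ms.
    """
    timings = defaultdict(list)
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        rec = json.loads(line)
        key = (rec["subgraph_id"], rec["query_hash"])
        timings[key].append(float(rec["query_time_ms"]))

    summary = {}
    for key, values in timings.items():
        values.sort()
        # Nearest-rank 95th percentile over the sorted timings
        rank = max(1, math.ceil(0.95 * len(values)))
        summary[key] = {
            "count": len(values),
            "total_ms": sum(values),
            "mean_ms": sum(values) / len(values),
            "p95_ms": values[rank - 1],
            "max_ms": values[-1],
        }
    return summary


if __name__ == "__main__":
    # Made-up sample records, for illustration only
    sample = [
        '{"subgraph_id": "QmA", "query_hash": "h1", "query_time_ms": 10}',
        '{"subgraph_id": "QmA", "query_hash": "h1", "query_time_ms": 30}',
        '{"subgraph_id": "QmB", "query_hash": "h2", "query_time_ms": 5}',
    ]
    result = summarize(sample)
    # Print the query shapes that consume the most total time first
    for key, stats in sorted(result.items(), key=lambda kv: -kv[1]["total_ms"]):
        print(key, stats)
```

Sorting by total time rather than mean time is a design choice: it surfaces query shapes that are cheap individually but dominate load through sheer volume.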

RFP Requirements:

A proposal to design and build an analytics pipeline tool that ingests data from existing services (such as graph-node) and from qlog (https://github.com/graphprotocol/qlog/), which may serve as a baseline.

The resulting work must provide deep insight into the usage and performance of the indexer's infrastructure and common query patterns
The tool must analyze indexes in Graph Node, assessing which indexes are being queried and which are not, and support building and removing indexes to optimize the database, which in turn will improve the cost model
The tool should ingest logs and efficiently summarize the data about past queries, so that the data can act as an input into query cost models
- Note: some of the raw analysis can currently be done with qlog, but qlog only processes data and lacks useful integration with analysis tools
The tool must compute different results taking into account different simulated query scenarios, based on types of subgraphs, query complexity, query volume, cost models etc.
The tool must be open source and interoperable with as many different databases as possible
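The index analysis requirement above could start from the database's own statistics. The sketch below assumes the Graph Node store is PostgreSQL and that rows have been fetched from the `pg_stat_user_indexes` view, whose `idx_scan` column counts the scans initiated on each index, joined to `pg_index`. The function name, the `max_scans` threshold, and the sample rows are hypothetical.

```python
# Query an Indexer might run against the Graph Node database (assumed PostgreSQL).
# idx_scan counts index scans since the statistics were last reset.
INDEX_USAGE_SQL = """
SELECT s.schemaname, s.relname, s.indexrelname, s.idx_scan,
       pg_relation_size(s.indexrelid) AS size_bytes,
       i.indisunique
FROM pg_stat_user_indexes s
JOIN pg_index i ON i.indexrelid = s.indexrelid
"""


def drop_candidates(rows, max_scans=0):
    """Return rarely scanned, non-unique indexes, largest first.

    rows: dicts with the columns selected by INDEX_USAGE_SQL.
    Unique indexes are excluded because they enforce constraints
    even when no query ever scans them.
    """
    candidates = [
        r for r in rows
        if r["idx_scan"] <= max_scans and not r["indisunique"]
    ]
    return sorted(candidates, key=lambda r: r["size_bytes"], reverse=True)


if __name__ == "__main__":
    # Made-up rows standing in for a real query result
    rows = [
        {"schemaname": "sgd1", "relname": "token", "indexrelname": "token_pkey",
         "idx_scan": 0, "size_bytes": 8192, "indisunique": True},
        {"schemaname": "sgd1", "relname": "token", "indexrelname": "attr_token_symbol",
         "idx_scan": 0, "size_bytes": 1048576, "indisunique": False},
        {"schemaname": "sgd1", "relname": "swap", "indexrelname": "attr_swap_timestamp",
         "idx_scan": 5321, "size_bytes": 4194304, "indisunique": False},
    ]
    for r in drop_candidates(rows):
        print(r["schemaname"], r["indexrelname"], r["size_bytes"])
```

Because the counters only cover the period since the last statistics reset, a zero scan count is a hint to investigate rather than proof that an index is safe to remove.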

Deliverables:

Proposal document with high-level architecture and expected scenarios that will be assessed, highlighting major expected functionality
Conduct user studies of other Indexers as needed, to gather accurate data and information for assessing realistic query scenarios
Document all research and data for the related scenarios and analytics tooling
Correlate DB index config with query log data to facilitate index setup optimizations
Tool that efficiently ingests and analyzes query data
- A PoC of the tool demonstrating partial capabilities may be shared with The Graph Foundation and the Indexer community for feedback before the final release, accompanied by its source code and the dataset used.
Recommendations for each scenario for scaling infrastructure, managing query pipelines, indexing behavior and cost modelling
A basic UI for visualizing query performance (for example, Grafana, as it is already a standard tool used by Indexers).
Final release: all documented source code pushed to GitHub
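For the deliverable that correlates DB index configuration with query log data, one possible starting point is to compare the columns covered by indexes with the fields that logged queries actually filter or sort on. The sketch below assumes both inputs have already been extracted: `indexed` maps each table to its set of indexed columns, and `filter_counts` records how often each (table, column) pair appears in logged query filters. All names and the `min_uses` threshold are hypothetical.

```python
def correlate(indexed, filter_counts, min_uses=100):
    """Compare index coverage with observed filter usage.

    indexed: dict mapping table name -> set of indexed column names.
    filter_counts: dict mapping (table, column) -> number of logged uses.
    Returns frequently filtered columns lacking an index, and indexed
    columns that never appeared in a logged filter.
    """
    missing = []
    for (table, column), uses in filter_counts.items():
        if uses >= min_uses and column not in indexed.get(table, set()):
            missing.append((table, column, uses))

    unused = []
    for table, columns in indexed.items():
        for column in columns:
            if filter_counts.get((table, column), 0) == 0:
                unused.append((table, column))

    missing.sort(key=lambda m: -m[2])  # most heavily used first
    unused.sort()
    return {"missing_index": missing, "unused_index": unused}


if __name__ == "__main__":
    # Made-up inputs, for illustration only
    indexed = {"swap": {"id", "timestamp"}, "token": {"id", "symbol"}}
    filter_counts = {
        ("swap", "timestamp"): 12000,
        ("swap", "pair"): 8000,
        ("token", "id"): 300,
        ("swap", "id"): 40,
    }
    print(correlate(indexed, filter_counts))
```

A report of this kind only suggests candidates; whether an index is worth building or removing still depends on table size, write volume, and the query plans the database actually chooses.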