High-Level Protocol Architecture

This page details the protocol that frontends can implement to interact with the backend.

Overview

The architecture is divided into one backend and multiple frontends. The backend implements almost all of the logic, data storage, communication with models, etc., while the frontends act largely as proxies that connect a particular user interaction domain with the backend. Examples of frontends are Discord bots, web applications, and Telegram bots: essentially anything that can interact with a human. Having the frontends act largely as proxies means we do not need to implement the same logic once per frontend. As long as a frontend implements our communication protocol, it can join our network.

graph TD
  Database --- Backend
  ModelInference --- Backend
  S3 --- Backend
  .... --- Backend
  Backend -- Protocol --- DiscordBot1
  Backend -- Protocol --- DiscordBot2
  Backend -- Protocol --- Website1
  Backend -- Protocol --- ...

The Protocol

At the heart of the effort is the protocol. It is designed to let any frontend communicate with the backend in a structured way. The frontend is responsible for facilitating this communication between the backend and the human in the way that is most appropriate for the particular channel the frontend runs on.

Example: One type of message could ask the user to rate some text from 1 to 5. The message might look something like this (YAML for readability):

type: TextRatingTask
data:
  text: "Johnny was a good boy! Very good!"
  range:
    min: 1
    max: 5

A Discord bot could implement this using reaction symbols, such as 1️⃣ 2️⃣ …, whereas a purely text-based frontend could prompt the user to respond with a number between 1 and 5.

<aside> 💡 Note that this means the frontend does not need to implement complex logic or templates. This is all handled in the backend.

</aside>
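As a rough illustration of how little a purely text-based frontend has to do, the sketch below renders the example message above into a prompt and validates the user's answer. The function names `render_text_rating_task` and `parse_rating` are hypothetical, and the message is assumed to arrive as a plain dictionary with the same keys as the YAML example; the real wire format may differ.

```python
from typing import Optional


def render_text_rating_task(message: dict) -> str:
    """Turn a TextRatingTask message into a prompt for a text-only channel."""
    if message["type"] != "TextRatingTask":
        raise ValueError(f"unsupported task type: {message['type']}")
    data = message["data"]
    low, high = data["range"]["min"], data["range"]["max"]
    return f"{data['text']}\nPlease rate this text from {low} to {high}."


def parse_rating(reply: str, message: dict) -> Optional[int]:
    """Return the rating if the reply is an integer inside the task's range, else None."""
    bounds = message["data"]["range"]
    try:
        rating = int(reply.strip())
    except ValueError:
        # Not a number at all; the frontend can simply ask again.
        return None
    return rating if bounds["min"] <= rating <= bounds["max"] else None
```

A frontend would show the rendered prompt to the user, call `parse_rating` on the answer, and ask again whenever it returns `None`. Everything else stays in the backend.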

Core Protocol Concepts

API Key: Each frontend gets its own API key, which it uses to authenticate with the backend via a bearer-token header. Never expose this key to human users.
User ID: Each frontend must assign stable IDs to its users and report the auth provider it uses for each of them. For example, a Discord bot would report users with auth provider “discord” and the full Discord user ID as the user ID. The same goes for a web frontend that authenticates users via Discord’s OAuth. Valid auth providers for now are “discord” and “local”. The latter (“local”) is used if the frontend implements its own custom auth. The frontend should also send the display name of the user along (needed for leaderboards etc.).
Task: A task (sometimes referred to as a “work package”) is a unit of work that the backend either gives to a user directly or posts to the open community. Usually, a user initiates a task through the frontend, signaling to the backend that they want to do some work, and the backend responds with a task. Examples of tasks are:
- Here is a conversation between the user and the assistant. Act as the assistant and write the next reply.
- Here is a set of prompts other people wrote. Please rank them by quality.
Post: A post is anything a user posts in the flow of a conversation. Mostly these are text posts (i.e., I type a message and hit enter), but they also include image posts, file uploads, etc.; essentially anything that counts as a separate “chat message”. Note that, for example, posting a single 👍 emoji counts as a post, but using the same emoji as a “reaction” to a post does not. We largely follow what Discord considers to be a post. We also assume that the content of a post is immutable. The frontend is also responsible for reading and sending the IDs of posts, for example the fully qualified post IDs in Discord. A special case here is the “initial post” of a task. Upon receiving a task, the frontend always posts that task as a post (to tell the user what to do). The user handling that task will mostly be responding or reacting to that initial post.
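The API key and user ID concepts above could be modeled roughly as follows. This is only a sketch: the class `FrontendUser`, its field names, and the helper `auth_headers` are hypothetical, and the use of the standard `Authorization: Bearer` header is an assumption based on the “bearer-token header” wording rather than a confirmed detail of the backend.

```python
from dataclasses import asdict, dataclass

# The two auth providers named as valid for now.
VALID_AUTH_PROVIDERS = {"discord", "local"}


@dataclass(frozen=True)
class FrontendUser:
    """What a frontend reports about a user (field names are illustrative)."""

    id: str             # stable ID assigned by the frontend, e.g. the Discord user ID
    display_name: str   # needed for leaderboards etc.
    auth_provider: str  # "discord" or "local"

    def __post_init__(self) -> None:
        if self.auth_provider not in VALID_AUTH_PROVIDERS:
            raise ValueError(f"unknown auth provider: {self.auth_provider!r}")


def auth_headers(api_key: str) -> dict:
    """Build the request headers; the key stays on the frontend, never with users."""
    return {"Authorization": f"Bearer {api_key}"}
```

Calling `asdict(user)` yields a plain dictionary that can be serialized and sent along with each request, so the frontend does not need to keep any per-user state beyond the ID mapping its platform already provides.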

<aside> 💡 The protocol could support multi-post interactions, but we don’t implement that yet. For now, the user is always asked to do one thing, e.g. to write a single next reply in a conversation, or to rank one set of replies. After that, the current task is done and the next task starts.

</aside>

Interaction: An interaction is, broadly, any action the user takes in response to a task. For example, posting a post is an interaction. Replying to a post with another post is an interaction. Providing a rating of something is an interaction, etc. The main types of interactions for now are:
- Text reply: a user replies to the task post using a text message
- Rating: a user provides a rating of something, usually a piece of text
- Ranking: a user provides a ranking (a list of indices) of a set of options

<aside> 💡 The goal here is to make the frontends as stateless as possible. Interactions with users are tracked via the user IDs and post IDs (which the frontend, e.g. a discord bot, can determine at any time). The backend has more structures internally to pull these things together.

</aside>
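To make the three interaction types more concrete, here is a minimal sketch of how a frontend might represent them before sending them to the backend. All class and field names are hypothetical. Each interaction carries only the user ID and the ID of the post it refers to, which is what keeps the frontend stateless. The validity check for rankings assumes zero-based indices and that every option appears exactly once; the text above only says “a list of indices”, so the backend may be more lenient.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class TextReply:
    user_id: str
    post_id: str  # the post being replied to, e.g. the initial task post
    text: str


@dataclass(frozen=True)
class Rating:
    user_id: str
    post_id: str
    rating: int


@dataclass(frozen=True)
class Ranking:
    user_id: str
    post_id: str
    ranking: List[int]  # indices into the set of options, best first (assumed)


def is_valid_ranking(ranking: List[int], num_options: int) -> bool:
    """Check that the ranking is a permutation of 0..num_options-1 (assumed rule)."""
    return sorted(ranking) == list(range(num_options))
```

For three options, `[2, 0, 1]` passes this check, while `[0, 0, 1]` (a duplicate) and `[0, 1]` (a missing option) do not.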

The Protocol’s Core Flow

The basic principle is always the same