After having reviewed and tested the alpha version of the Murmurations protocol (MP), here are some proposals to make it more robust, simpler and hopefully more widely adopted.
Data Structure
- Data needs to have context and a way to be validated
- Context is provided by defining what each term means - if the field name is "name", then a definition should be provided (e.g., writing it out: "the legal name of your organization", or providing a link to the definition: https://schema.org/name) - it's up to the schema author to decide how to give context to the fields
- Validation is enabled by providing rules to ensure input data conforms to expected values (e.g., type of input (string, number, etc.), minimum and/or maximum length, etc.)
- Both the context and validation rules can be contained in a JSON Schema: https://json-schema.org/ - there is no need for two separate files (as there currently are with base.json and base.context.jsonld in the MP repo) - context can be provided by making use of the title and description annotations (more info here: https://json-schema.org/understanding-json-schema/reference/generic.html#annotations)
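As a minimal sketch of the point above, the following Python snippet holds one schema that carries both context (title and description) and validation rules (type, minLength, maxLength, required). The schema URL, the "url" field and the length limits are illustrative assumptions, not part of the MP repo, and check_instance is a hand-rolled checker for a tiny subset of keywords rather than a full JSON Schema validator.

```python
# Hypothetical schema: the "$id" URL, field names and limits are illustrative only.
ORGANIZATION_SCHEMA = {
    "$schema": "https://json-schema.org/draft/2020-12/schema",
    "$id": "https://example.com/schemas/organization.json",
    "type": "object",
    "properties": {
        "name": {
            # Context: what the field means.
            "title": "Name",
            "description": "The legal name of your organization (https://schema.org/name)",
            # Validation: what values are accepted.
            "type": "string",
            "minLength": 1,
            "maxLength": 100,
        },
        "url": {
            "title": "Website",
            "description": "The organization's primary website address",
            "type": "string",
            "maxLength": 2000,
        },
    },
    "required": ["name"],
}


def field_context(schema):
    """Return {field: (title, description)} read from the schema's annotations."""
    return {
        field: (rules.get("title", ""), rules.get("description", ""))
        for field, rules in schema.get("properties", {}).items()
    }


def check_instance(instance, schema):
    """Check a flat object against a small subset of JSON Schema keywords.

    Handles only "required", the "string" type, "minLength" and "maxLength".
    Returns a list of error messages (empty when the instance passes).
    """
    errors = []
    for field in schema.get("required", []):
        if field not in instance:
            errors.append(f"{field}: missing required field")
    for field, rules in schema.get("properties", {}).items():
        if field not in instance:
            continue
        value = instance[field]
        if rules.get("type") == "string" and not isinstance(value, str):
            errors.append(f"{field}: expected a string")
        if isinstance(value, str):
            # Length keywords apply to strings only.
            if "minLength" in rules and len(value) < rules["minLength"]:
                errors.append(f"{field}: shorter than {rules['minLength']} characters")
            if "maxLength" in rules and len(value) > rules["maxLength"]:
                errors.append(f"{field}: longer than {rules['maxLength']} characters")
    return errors
```

Calling check_instance({"name": "Example Co-op"}, ORGANIZATION_SCHEMA) returns an empty list, while field_context(ORGANIZATION_SCHEMA) gives the human-readable meaning of each field from the same document. In practice a complete JSON Schema validator library would replace check_instance.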
Schema Definitions
- Schemas need to follow the JSON Schema specification
- Schemas will be drafted by individuals independently or on behalf of an organization - anyone can define a schema - perhaps it is the aggregators who will be the ones providing most of the schemas, because they will be defining what data they want to aggregate about the various organizations they want to map/reference and enable their users to explore
- Each schema definition must be hosted somewhere (i.e., must have a URL where it can be viewed/downloaded) so that it can be referenced and used to validate any instances of data on a node that use that schema
Node
- Each node stores its own data, preferably in the root directory in a murmurations.json file (e.g., https://example.com/murmurations.json) - by standardizing on this location, aggregators can attempt to find nodes without relying on an index
- Data consists of fields, objects and arrays that are validated using JSON Schema - the fields/objects/arrays are defined according to a Schema Definition
- A node references at least one Schema Definition against which the data can be validated (TODO: need to think about whether it is realistic to have multiple schema definitions in the same murmurations.json file - in theory it is, provided the node supplies all the fields/objects that every referenced schema needs in order to validate correctly, but it needs more thought and eventually testing before assuming it is doable) - this would be a killer feature if it is possible, so it needs further investigation/validation
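To make the multi-schema TODO more concrete, here is a rough Python sketch of a node document that references two Schema Definitions and is checked against both. The "schemas" key, the schema URLs and the field names (including "registrationNumber") are all hypothetical, and only the "required" keyword is checked; it is a thought experiment, not a tested design.

```python
import json

# Hypothetical murmurations.json content: the "schemas" key and all fields are assumptions.
NODE_JSON = """
{
  "schemas": [
    "https://example.org/schemas/organization.json",
    "https://example.org/schemas/uk-coop.json"
  ],
  "name": "Example Co-op",
  "url": "https://example.com",
  "registrationNumber": "12345"
}
"""

# Stand-ins for the hosted Schema Definitions; in practice these would be
# downloaded from the URLs the node references.
HOSTED_SCHEMAS = {
    "https://example.org/schemas/organization.json": {"required": ["name", "url"]},
    "https://example.org/schemas/uk-coop.json": {"required": ["name", "registrationNumber"]},
}


def missing_fields_per_schema(node, hosted_schemas):
    """Map each referenced schema URL to the required fields the node lacks.

    Only "required" is checked here; full validation would run the node data
    through a JSON Schema validator once per referenced schema.
    """
    report = {}
    for schema_url in node.get("schemas", []):
        schema = hosted_schemas.get(schema_url)
        if schema is None:
            report[schema_url] = ["<schema definition not available>"]
            continue
        report[schema_url] = [
            field for field in schema.get("required", []) if field not in node
        ]
    return report


node = json.loads(NODE_JSON)
report = missing_fields_per_schema(node, HOSTED_SCHEMAS)
# The node satisfies every referenced schema only if no schema reports missing fields.
all_valid = all(not missing for missing in report.values())
```

One thing this sketch does not cover: a schema that sets additionalProperties to false would reject the fields belonging to the other schema, so schema authors would probably need to leave additional properties open for this to work.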
Aggregator
- I was confused by the role of the aggregator initially, and now that I understand how it works in the alpha release, I am inclined to remove the networks list, which is simply a list of add-on schemas. See below under the Index section for why there should only be one schema type.
- Aggregators can define schemas, and they can request from the index the nodes that have referenced specific schemas. For example, a UK co-op mapping initiative may define a schema and then ask all of the UK co-ops to reference it on their node and provide the required data.
Index
- The big, unresolved question is "how much data should the index store about each node"? If too much is stored then the index risks becoming a "normative framework" and potentially alienating certain users, but if too little is stored then it risks becoming useless (i.e., aggregators have to crawl every single node to find out if they want to map them or not).
- Perhaps the best way to deal with this is to store only (1) the nodeUrl, (2) the last updated time and (3) the schemaUrl (ideally more than one - in other words, the link to the schema definition/s). Aggregators can then filter initially based on a schemaUrl. If I understand it correctly, this is essentially what is happening now anyway with the add-on schemas, so rather than having two types of schemas ("base" and "add-ons") just have one and keep everything simple.
- In other words, rather than doing what is done now in the alpha release ("Here's the base schema that you all have to use whether you want to or not, and then you can tack on some add-on schemas") just roll it all into one ("Just tell us which schemas you support and have added data for on your node").
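The proposed minimal index could look something like the Python sketch below: each entry holds only the nodeUrl, the last updated time and one or more schemaUrl values, and an aggregator filters on a schemaUrl before crawling any node. The class name, the function name and the sample URLs are hypothetical; the real index API may look quite different.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Tuple


@dataclass(frozen=True)
class IndexEntry:
    """The only data the index keeps about a node in this proposal."""
    node_url: str                 # nodeUrl
    last_updated: datetime        # last updated time
    schema_urls: Tuple[str, ...]  # schemaUrl(s) the node says it supports


def nodes_for_schema(index, schema_url, updated_since=None):
    """Return the node URLs that reference schema_url.

    If updated_since is given, only entries updated at or after it are kept,
    so an aggregator can skip nodes it has already crawled.
    """
    return [
        entry.node_url
        for entry in index
        if schema_url in entry.schema_urls
        and (updated_since is None or entry.last_updated >= updated_since)
    ]


# Illustrative data only: the schema URL and node URLs are made up.
UK_COOP_SCHEMA = "https://example.org/schemas/uk-coop.json"
INDEX = [
    IndexEntry(
        "https://coop-one.example/murmurations.json",
        datetime(2020, 1, 10, tzinfo=timezone.utc),
        (UK_COOP_SCHEMA,),
    ),
    IndexEntry(
        "https://bakery.example/murmurations.json",
        datetime(2020, 2, 1, tzinfo=timezone.utc),
        ("https://example.org/schemas/organization.json",),
    ),
    IndexEntry(
        "https://coop-two.example/murmurations.json",
        datetime(2020, 3, 5, tzinfo=timezone.utc),
        ("https://example.org/schemas/organization.json", UK_COOP_SCHEMA),
    ),
]

uk_coops = nodes_for_schema(INDEX, UK_COOP_SCHEMA)
recent_uk_coops = nodes_for_schema(
    INDEX, UK_COOP_SCHEMA, updated_since=datetime(2020, 2, 1, tzinfo=timezone.utc)
)
```

With a single schema type there is no "base" versus "add-on" distinction to model: the UK co-op aggregator from the earlier example simply asks for every node that lists its schema URL and then fetches the data from those nodes.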