Documentation of data format for statistics

<aside> 💡 This guide explains how to organize your statistical data so that it works with modern visualization tools. It focuses on multi-dimensional time-series data, that is, data that answers questions such as “What was the population of 20–30-year-old women in India in 1974?”

The guide uses concepts and best practices from the worlds of Linked Open Data and Business Intelligence, such as Measure, Dimension, Cube and Observation. You do not need to understand these concepts fully to produce an accurate dataset. The theoretical background is available here: https://www.w3.org/TR/vocab-data-cube/

</aside>

Here’s a complete example of what we’re creating:

https://github.com/datastory-org/sample-dataset

Preparing your statistical data – in a few simple steps 🚀

To understand your data we need to know:

What different things are you measuring? (Also known as “variables” or “indicators”.) ⇒ Store this in measures.csv.
What dimensions are you using? ⇒ Store this in dimensions.csv.
What’s the statistical data? ⇒ Store this in separate cube files, named after the dimensions they use. ⇒ For example cube-ageGroup-country-year.csv.
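
The cube file naming pattern can be sketched in a few lines of Python. This is only an illustration: the helper name `cube_filename` is hypothetical, and it assumes the dimension IDs are joined in the order given. The example name above happens to be alphabetical, but the guide states no ordering rule.

```python
def cube_filename(dimension_ids):
    """Build a cube file name such as 'cube-ageGroup-country-year.csv'.

    Assumption: IDs are joined with '-' in the order given; no ordering
    rule is stated in the guide.
    """
    if not dimension_ids:
        raise ValueError("a cube needs at least one dimension")
    return "cube-" + "-".join(dimension_ids) + ".csv"


print(cube_filename(["ageGroup", "country", "year"]))
# prints: cube-ageGroup-country-year.csv
```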
1. measures.csv

Create a file that lists your measures. Use “kebab-case” for IDs.

<aside> 💡 Tips:

Text fields can be stored in multiple languages by adding an @your-language suffix to the column name, for example name@en and name@sv.
Mandatory fields are marked with *

</aside>

id *	name@en *	name@sv	description@en	unit
income-per-person	Income per person	Inkomst per person	Per capita income is national income divided by population size.	dollar
life-expectancy	Life Expectancy	Förväntad livslängd	Life expectancy is the key metric for assessing population health. It tells us the average age of death in a population.	years
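
A quick way to check a measures file before publishing it is a small script like the one below. It is a minimal sketch, not an official validator: it assumes the file is comma-separated and that the header row omits the `*` markers used in the table above. The names `check_measures` and `split_language` are hypothetical.

```python
import csv
import io
import re

# kebab-case: lowercase words or digits joined by single hyphens
KEBAB_CASE = re.compile(r"^[a-z0-9]+(-[a-z0-9]+)*$")
MANDATORY = ("id", "name@en")

# Assumption: comma-separated, header without the '*' markers.
MEASURES_SAMPLE = """\
id,name@en,name@sv,description@en,unit
income-per-person,Income per person,Inkomst per person,Per capita income is national income divided by population size.,dollar
life-expectancy,Life Expectancy,Förväntad livslängd,Life expectancy is the key metric for assessing population health. It tells us the average age of death in a population.,years
"""


def check_measures(text):
    """Return a list of human-readable problems found in measures.csv text."""
    problems = []
    reader = csv.DictReader(io.StringIO(text))
    # The header is line 1, so data rows start at line 2.
    for line_no, row in enumerate(reader, start=2):
        for field in MANDATORY:
            if not (row.get(field) or "").strip():
                problems.append(f"line {line_no}: missing mandatory field {field!r}")
        measure_id = (row.get("id") or "").strip()
        if measure_id and not KEBAB_CASE.match(measure_id):
            problems.append(f"line {line_no}: id {measure_id!r} is not kebab-case")
    return problems


def split_language(column):
    """Split 'name@sv' into ('name', 'sv'); columns without '@' get None."""
    field, _, lang = column.partition("@")
    return field, (lang or None)


print(check_measures(MEASURES_SAMPLE))  # an empty list means no problems were found
print(split_language("name@sv"))
```

To check a real file, read it as UTF-8 text and pass the contents to `check_measures`.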

2. dimensions.csv

Create a file that lists your dimensions. Use “camelCase” for IDs.

Also assign a name and a “dataType” to your dimensions.

id *	name@en *	name@sv	dataType *
ageGroup	Age group	Åldersgrupp	string
country	Country	Land	entity
gender	Gender	Kön	entity
date	Date	Datum	date
year	Year	År	year
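
The dimension rules can be checked the same way. The sketch below makes two assumptions that go beyond this guide: the file is comma-separated with a header that omits the `*` markers, and the accepted `dataType` values are only the four that appear in the table above (the full list may be longer). The name `check_dimensions` is hypothetical.

```python
import csv
import io
import re

# camelCase: starts lowercase, later words start with one uppercase letter
CAMEL_CASE = re.compile(r"^[a-z][a-z0-9]*(?:[A-Z][a-z0-9]*)*$")
# Assumption: only the dataType values seen in the example table.
KNOWN_DATA_TYPES = {"string", "entity", "date", "year"}

DIMENSIONS_SAMPLE = """\
id,name@en,name@sv,dataType
ageGroup,Age group,Åldersgrupp,string
country,Country,Land,entity
gender,Gender,Kön,entity
date,Date,Datum,date
year,Year,År,year
"""


def check_dimensions(text):
    """Return a list of problems found in dimensions.csv text."""
    problems = []
    for row in csv.DictReader(io.StringIO(text)):
        dim_id = (row.get("id") or "").strip()
        if not CAMEL_CASE.match(dim_id):
            problems.append(f"id {dim_id!r} is not camelCase")
        if not (row.get("name@en") or "").strip():
            problems.append(f"dimension {dim_id!r} has no name@en")
        data_type = (row.get("dataType") or "").strip()
        if data_type not in KNOWN_DATA_TYPES:
            problems.append(f"dimension {dim_id!r} has unknown dataType {data_type!r}")
    return problems


print(check_dimensions(DIMENSIONS_SAMPLE))  # an empty list means no problems were found
```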