<aside> 💡 This guide explains how to organize your statistical data so that it works with modern visualization tools. It focuses primarily on multi-dimensional time-series data, that is, data that can answer questions such as “What was the population of 20-30 year old women in India in 1974?”
The guide draws on concepts and best practices from the worlds of Linked Open Data and Business Intelligence, such as Measure, Dimension, Cube and Observation. You do not need to understand these concepts fully to produce an accurate dataset. The theoretical background is available here: https://www.w3.org/TR/vocab-data-cube/
</aside>
Here’s a complete example of what we’re creating:
https://github.com/datastory-org/sample-dataset
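To make the vocabulary concrete, the question from the introduction can be read as a single observation: one value of a measure at one combination of dimension values. The sketch below is only an illustration. The `Observation` class, the measure ID `population` and the dimension values (`20-30`, `female`, `india`) are hypothetical names chosen for this example, and the value is left empty because the guide does not state it.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Observation:
    """One data point: a measure value at one combination of dimension values."""
    dimensions: dict          # dimension ID -> dimension value
    measure: str              # measure ID
    value: Optional[float]    # the measured number, if known


# "What was the population of 20-30 year old women in India in 1974?"
# All IDs and values below are hypothetical placeholders.
observation = Observation(
    dimensions={
        "ageGroup": "20-30",
        "gender": "female",
        "country": "india",
        "year": 1974,
    },
    measure="population",
    value=None,  # the guide poses the question but does not give the answer
)

print(sorted(observation.dimensions))
```

A cube is then a collection of such observations that all share the same set of dimensions.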
To understand your data we need to know:
What different things are you measuring? (Also known as “variables” or “indicators”.) ⇒ Store this in measures.csv.
What dimensions are you using? ⇒ Store this in dimensions.csv.
What’s the statistical data? ⇒ Store this in separate cube files, one per combination of dimensions, for example cube-ageGroup-country-year.csv.
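The cube file name in the example can be derived from the dimension IDs. Here is a minimal sketch, assuming the IDs are joined in alphabetical order. That ordering is an assumption: it matches the example name, but the guide does not state it as a rule. `cube_filename` is a hypothetical helper, not part of any tool.

```python
def cube_filename(dimension_ids):
    """Build a cube file name such as cube-ageGroup-country-year.csv.

    Assumption: dimension IDs are sorted alphabetically. This matches the
    guide's example but is not confirmed as a rule.
    """
    return "cube-" + "-".join(sorted(dimension_ids)) + ".csv"


print(cube_filename(["year", "country", "ageGroup"]))
# cube-ageGroup-country-year.csv
```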
1. measures.csv
Create a file that lists your measures. Use “kebab-case” for IDs.
**id** \* | **name@en** \* | name@sv | description@en | unit |
---|---|---|---|---|
income-per-person | Income per person | Inkomst per person | Per capita income is national income divided by population size. | dollar |
life-expectancy | Life Expectancy | Förväntad livslängd | Life expectancy is the key metric for assessing population health. It tells us the average age of death in a population. | years |
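A quick check like the following can catch ID mistakes before the file is published. It is a sketch under two assumptions: that the columns marked with an asterisk (`id` and `name@en`) are required, and that “kebab-case” means lowercase letters and digits separated by single hyphens. `check_measures` is a hypothetical helper, and the second sample row deliberately uses a camelCase ID to trigger a warning.

```python
import csv
import io
import re

# Assumption: kebab-case = lowercase letters/digits separated by single hyphens.
KEBAB_CASE = re.compile(r"^[a-z0-9]+(-[a-z0-9]+)*$")
# Assumption: the columns marked with * in the table are required.
REQUIRED_COLUMNS = ("id", "name@en")


def check_measures(csv_text):
    """Return a list of human-readable problems found in measures.csv content."""
    problems = []
    reader = csv.DictReader(io.StringIO(csv_text))
    # The header is line 1, so data rows start at line 2.
    for line_no, row in enumerate(reader, start=2):
        for column in REQUIRED_COLUMNS:
            if not (row.get(column) or "").strip():
                problems.append(f"line {line_no}: missing {column}")
        measure_id = (row.get("id") or "").strip()
        if measure_id and not KEBAB_CASE.match(measure_id):
            problems.append(f"line {line_no}: id {measure_id!r} is not kebab-case")
    return problems


sample = """id,name@en,name@sv,description@en,unit
income-per-person,Income per person,Inkomst per person,Per capita income is national income divided by population size.,dollar
lifeExpectancy,Life Expectancy,Förväntad livslängd,,years
"""

problems = check_measures(sample)
for problem in problems:
    print(problem)
```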
2. dimensions.csv
Create a file that lists your dimensions. Use “camelCase” for IDs.
Also assign a name and a “dataType” to your dimensions.
**id** \* | **name@en** \* | name@sv | **dataType** \* |
---|---|---|---|
ageGroup | Age group | Åldersgrupp | string |
country | Country | Land | entity |
gender | Gender | Kön | entity |
date | Date | Datum | date |
year | Year | År | year |
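The table above can also be produced and checked in code. The sketch below validates each row and then writes the rows as CSV to an in-memory buffer. It assumes that “camelCase” means a lowercase first letter with no separators, and that the four `dataType` values shown in the table (`string`, `entity`, `date`, `year`) are the only ones in use. The guide does not say whether other types exist. `dimension_problems` and `write_dimensions` are hypothetical helpers.

```python
import csv
import io
import re

# Assumption: camelCase = lowercase first letter, no hyphens or underscores.
CAMEL_CASE = re.compile(r"^[a-z][a-z0-9]*([A-Z][a-z0-9]*)*$")
# Assumption: only the dataType values shown in the table are valid.
KNOWN_DATA_TYPES = {"string", "entity", "date", "year"}
FIELDNAMES = ["id", "name@en", "name@sv", "dataType"]

rows = [
    {"id": "ageGroup", "name@en": "Age group", "name@sv": "Åldersgrupp", "dataType": "string"},
    {"id": "country", "name@en": "Country", "name@sv": "Land", "dataType": "entity"},
    {"id": "gender", "name@en": "Gender", "name@sv": "Kön", "dataType": "entity"},
    {"id": "date", "name@en": "Date", "name@sv": "Datum", "dataType": "date"},
    {"id": "year", "name@en": "Year", "name@sv": "År", "dataType": "year"},
]


def dimension_problems(row):
    """Return a list of problems for one dimensions.csv row."""
    problems = []
    if not CAMEL_CASE.match(row["id"]):
        problems.append(f"id {row['id']!r} is not camelCase")
    if row["dataType"] not in KNOWN_DATA_TYPES:
        problems.append(f"unknown dataType {row['dataType']!r}")
    return problems


def write_dimensions(rows, out):
    """Write the rows as CSV with a header line to a file-like object."""
    writer = csv.DictWriter(out, fieldnames=FIELDNAMES)
    writer.writeheader()
    writer.writerows(rows)


buffer = io.StringIO()
write_dimensions(rows, buffer)
print(buffer.getvalue())
```

To write a real file instead of an in-memory buffer, pass a file opened with `open("dimensions.csv", "w", newline="", encoding="utf-8")`.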