Prioritisation

What do you want to be able to do?

Access a data sample for each dataset in the catalogue. These data need not be (and shouldn’t be) real data, but they should conform to the schema that the real dataset has.

Why is this important?

This would allow researchers to get a good understanding of the data they will have access to before going through the data access request process.

Any suggestions for how we could solve this?

The attributes associated with different records could be permuted and a small subset be made available for variables that are not particularly sensitive. This randomised dataset would have minimal privacy risks (although they should be evaluated on a case by case basis, of course).

LE: I would like to be able to access a data sample (not real data) for each dataset in the catalogue so that I can get a good understanding of the data that I’ll have access to before going through the request process

DS: I agree! It would also be helpful to have a link to any open data that is produced from the same underlying source.