As my last post highlighted, I’ve been thinking about how we can find and discover datasets and their related APIs and services. I’m thinking of putting together some simple tools to help explore and encourage the kind of linking that my diagram illustrated.
There’s some related work going on in a few areas which is also worth mentioning:

- Within the UK Government Linked Data group there’s some work progressing around the notion of a “registry” for Linked Data that could be used to collect dataset metadata as well as supporting dataset discovery. There’s a draft specification which is open for comment. I’d recommend you ignore the term “registry” and see it more as a modular approach for supporting dataset discovery, lightweight Linked Data publishing, and “namespace management” (aka URL redirection). A registry function is really just one aspect of the model.
- There’s an Open Data on the Web workshop in April which will cover a range of topics including dataset discovery. My current thoughts are partly preparation for that event (and I’m on the Programme Committee).
- There’s been some discussion and a draft proposal for adding the Dataset type to Schema.org. This could result in the publication of more embedded metadata about datasets. I’m interested in tools that can extract that information and do something useful with it.
Thinking about these topics, I realised that there are many definitions of “dataset”. Unsurprisingly, it means different things in different contexts. If we’re defining models, registries and markup for describing datasets, we may need to get a sense of what these different definitions actually are.
As a result, I ended up looking around for a series of definitions and I thought I’d write them down here.