When I pulled the more than 5,000 datasets from 22 federal agencies after the implementation of OMB Memorandum M-13-13, Open Data Policy-Managing Information as an Asset, I was reminded how much work there is still to do around opening up government data. Overall I gave the efforts a C grade, because it seemed like agencies just rounded up a bunch of data lying around and published it to meet the deadline, without much regard for the actual use or consumption of the data.
Even with these thoughts, I know how hard it was to round up these 5,000 datasets in government, and because of that I can't get them out of my mind. I want to go through all of them, clean them up, and share them back with each agency. Obviously I can't do that, but what is the next best thing? As I was walking the other day, I thought it would be good to have a sort of adoption tool, so that anyone can step up, adopt a federal agency's dataset, and improve it.
As with most of my projects, it is hard to get them out of my head until I get at least a proof of concept up and running. So I developed Federal Agency Dataset Adoption, and published it to GitHub. I published a JSON listing of the 22 federal agencies, along with the data.json file that each agency published. Next I created a simple UI for browsing the agencies and their datasets and viewing each dataset's details and distributions, with the ability to "adopt" a dataset.
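To give a feel for what browsing an agency's data.json involves, here is a minimal sketch in Python. It is not the code behind the site. The field names (title, identifier, distribution, accessURL) are my assumption based on the Project Open Data metadata schema, the sample record is made up, and the helper names load_datasets and summarize are hypothetical.

```python
import json


def load_datasets(catalog_text):
    """Return the list of dataset entries from a data.json document.

    Assumption: the file is either a bare JSON array of datasets or an
    object that holds the array under a "dataset" key.
    """
    catalog = json.loads(catalog_text)
    if isinstance(catalog, dict):
        return catalog.get("dataset", [])
    return catalog


def summarize(dataset):
    """Pull out the fields a simple browsing UI would show."""
    distributions = dataset.get("distribution", [])
    return {
        "title": dataset.get("title", "(untitled)"),
        "identifier": dataset.get("identifier"),
        "distributions": [d.get("accessURL") for d in distributions],
    }


# Made-up sample record, for illustration only.
sample = json.dumps([
    {
        "title": "Example Dataset",
        "identifier": "EXAMPLE-001",
        "distribution": [
            {"accessURL": "https://example.gov/data.csv", "format": "csv"}
        ],
    }
])

for entry in load_datasets(sample):
    print(summarize(entry))
```

Accepting both a bare array and a wrapping object keeps the sketch tolerant of differences between agency files, which is a guess about what real files look like rather than something I have verified for all 22 agencies.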
When you choose to adopt a federal agency dataset, the site authenticates with your GitHub account using OAuth, then creates a repository to house any work that will occur. Each dataset you adopt gets its own branch within the repository, a README file, and a copy of the dataset's entry from its agency's data.json file.
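The adoption step can be sketched as a small planning function that works out what should land in the workspace, leaving the OAuth handshake and the actual repository and branch creation out entirely. This is an illustration under assumptions, not the site's implementation: the branch naming rule, the README.md filename and wording, and the function names branch_name and plan_adoption are all hypothetical.

```python
import json
import re


def branch_name(identifier):
    """Turn a dataset identifier into a branch-friendly slug (assumed rule)."""
    slug = re.sub(r"[^a-z0-9]+", "-", identifier.lower()).strip("-")
    return slug or "dataset"


def plan_adoption(agency, dataset):
    """Describe the branch and files for one adopted dataset.

    Nothing is sent to GitHub here; the result is just a plan that a
    separate, authenticated step could carry out.
    """
    title = dataset.get("title", "(untitled)")
    readme = (
        f"# {title}\n\n"
        f"Adopted from {agency}. This branch is a workspace for "
        "cleaning up and improving the dataset.\n"
    )
    return {
        "branch": branch_name(dataset.get("identifier", title)),
        "files": {
            "README.md": readme,
            # A copy of the dataset's own entry from the agency data.json.
            "data.json": json.dumps(dataset, indent=2),
        },
    }


# Made-up dataset entry, for illustration only.
plan = plan_adoption(
    "Example Agency",
    {"title": "Example Dataset", "identifier": "EXAMPLE 001"},
)
print(plan["branch"])
print(sorted(plan["files"]))
```

Keeping the plan separate from the GitHub calls means the naming and file layout can be checked without touching anyone's account.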
I would copy the actual datasets to the repo, but many of the references are just HTML or ASP pages, and you have to look up the data manually. Each repo is meant to be a workspace, and users who adopt datasets can just update the data.json to point to any new data distributions that are generated. I will programmatically pull these updates and register them with the master site on a regular basis.
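One way the update pull could work is to compare the adopter's copy of a dataset entry with the agency's original and pick out the distributions that are new. The sketch below assumes distributions are identified by an accessURL field, which is my reading of the data.json schema, and the function name new_distributions and the sample URLs are hypothetical.

```python
def new_distributions(original, adopted):
    """Return distributions in the adopted entry that the original lacks.

    Assumption: two distributions are the same if their accessURL matches.
    """
    known = {d.get("accessURL") for d in original.get("distribution", [])}
    return [
        d for d in adopted.get("distribution", [])
        if d.get("accessURL") not in known
    ]


# Made-up entries, for illustration only.
original_entry = {
    "identifier": "EXAMPLE-001",
    "distribution": [
        {"accessURL": "https://example.gov/report.asp", "format": "html"}
    ],
}
adopted_entry = {
    "identifier": "EXAMPLE-001",
    "distribution": [
        {"accessURL": "https://example.gov/report.asp", "format": "html"},
        {"accessURL": "https://example.org/cleaned.csv", "format": "csv"},
    ],
}

for dist in new_distributions(original_entry, adopted_entry):
    print(dist["accessURL"], dist["format"])
```

Only the differences would need to be registered with the master site, so the original agency listing stays untouched while alternate versions accumulate alongside it.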
The system is meant to help me track which datasets I'm working on, and if other people want to get involved, awesome! I envision it acting as a sort of distributed directory, in which agencies, and consumers of agency data, can find alternate versions of federal government data. Additionally, data-obsessed folks like me can clean up data and contribute back to the lifecycle in a federated way, using our own GitHub accounts combined with a centralized registry.
As with my other side projects, who knows where this will go. I'd like to get some other folks involved, and maybe get the attention of agencies, then drum up some funding so I can put some more cycles into it. If you are interested, you can get involved via the Federal Government Dataset Adoption GitHub repository.