Students for Urban Data Systems

at Carnegie Mellon University

Want to host open data? You can with CKAN!


By Matt Cleinman

If you visit many government open data websites, you may notice that they start to look very, very similar. (For some examples, look at the sites for the UK national government, Washington DC, and our own Western Pennsylvania Regional Data Center.) Your eyes aren't just going numb from staring at datasets: many of these sites are powered by CKAN.


What is CKAN? It's a behind-the-scenes secret that helps make open data possible. In the project's own words:

CKAN is a powerful data management system that makes data accessible – by providing tools to streamline publishing, sharing, finding and using data. CKAN is aimed at data publishers (national and regional governments, companies and organizations) wanting to make their data open and available.

Even better, this web application is open source, meaning that anyone can read the source code and add features for their own deployment. Even-even better, it is designed to incorporate extensions easily, so any organization that runs CKAN can adopt a feature someone else has built.

As part of the Master of Information Systems Management degree at Heinz College, all students complete a client-driven group capstone project in their final semester. As a member of SUDS, I was delighted that my team was assigned to work with the City of Philadelphia; every other group had a large corporate client.

Tim Wisniewski, Philadelphia’s Chief Data Officer, had several exciting ideas for projects.  With our end-of-semester time constraint in mind, we chose a CKAN extension that would streamline Philadelphia’s open data workflow.  (Some of his other proposals will be tackled by future MISM capstone teams!)

CKAN is wonderful, but it does not natively store a data dictionary (a kind of metadata) for each dataset. Philadelphia currently works around this by tracking data dictionaries in a separate system. Most datasets include a link to their metadata mixed in with the links to the data itself, and that link points to the separate metadata server.
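To see why this arrangement is awkward for data users, here is a minimal Python sketch that sorts a dataset's links into data and metadata. It assumes a response shaped like the JSON returned by CKAN's `package_show` action (a `result` object holding a `resources` list). The dataset, the URLs, and the rule that a link counts as metadata when its name contains "metadata" are all hypothetical; without native data dictionary support, a consumer is left relying on guesses like this one.

```python
import json

# Hypothetical, trimmed-down response in the shape of CKAN's package_show
# action: {"success": ..., "result": {"resources": [...]}}.
# Dataset name and URLs are invented for illustration.
SAMPLE_RESPONSE = json.loads("""
{
  "success": true,
  "result": {
    "name": "example-dataset",
    "resources": [
      {"name": "Example Data (CSV)", "format": "CSV",
       "url": "https://data.example.org/example.csv"},
      {"name": "Metadata", "format": "HTML",
       "url": "https://metadata.example.org/example"},
      {"name": "Example Data (API)", "format": "API",
       "url": "https://data.example.org/api/example"}
    ]
  }
}
""")


def split_resources(package):
    """Separate data links from metadata links in one dataset.

    Hypothetical heuristic: a resource whose name contains "metadata"
    (case-insensitive) is treated as a link to the data dictionary.
    """
    data, metadata = [], []
    for resource in package["resources"]:
        if "metadata" in resource.get("name", "").lower():
            metadata.append(resource["url"])
        else:
            data.append(resource["url"])
    return data, metadata


data_links, metadata_links = split_resources(SAMPLE_RESPONSE["result"])
print("data:", data_links)
print("metadata:", metadata_links)
```

A portal that labels its metadata links differently would break this heuristic, which is the fragility that storing data dictionaries inside CKAN itself avoids.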

What is metadata? If you’ve ever asked someone what column D in a spreadsheet represented, you have asked for metadata – it’s the information about the data.  “Column D is the number of clients impacted by the project described in Column A.  It should be a positive integer.”
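A data dictionary simply writes those answers down for every column. As an illustration only, here is a minimal Python sketch of one, mirroring the spreadsheet example above: each entry records a column's meaning and a rule its values should follow. The `FieldSpec` class, the `check_row` helper, and the description of Column A are hypothetical and are not how CKAN or our extension stores data dictionaries.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class FieldSpec:
    """One data dictionary entry: what a column means and what it may hold."""
    column: str
    description: str
    is_valid: Callable[[object], bool]


# Hypothetical data dictionary for the spreadsheet example.
DATA_DICTIONARY = [
    FieldSpec(
        "A",
        "Description of the project.",
        lambda v: isinstance(v, str) and v != "",
    ),
    FieldSpec(
        "D",
        "Number of clients impacted by the project described in Column A.",
        # "It should be a positive integer" (bools are excluded on purpose).
        lambda v: isinstance(v, int) and not isinstance(v, bool) and v > 0,
    ),
]


def check_row(row):
    """Return the columns whose values break the dictionary's rules."""
    return [
        spec.column
        for spec in DATA_DICTIONARY
        if not spec.is_valid(row.get(spec.column))
    ]


print(check_row({"A": "Job training", "D": 42}))  # []
print(check_row({"A": "Job training", "D": -3}))  # ['D']
```

Keeping the description and the rule side by side means the same record can answer a human's "what is column D?" and catch bad values automatically.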

Our challenge: learn about CKAN development and write an extension that adds native handling of data dictionaries. Great documentation is available, but CKAN is a fairly complex system: it uses Jinja templates for the frontend, Python on the backend, a PostgreSQL database, and many more technologies. Luckily, our team brought a diverse skill set to the project.

I’ll spare you the gory details, but we eventually got our extension working and tested.  Like CKAN itself, the extension is open-source, and we’ve been excited by the interest in it so far.  You can view our extension on GitHub.

CKAN was a perfect project for us: large enough to be complex and somewhat bewildering at first, but approachable enough that we could deliver a finished product. It stretched our skills, but in a manageable way. If you're looking to push your web app development abilities, consider contributing to CKAN: you'll sharpen your skills while supporting the open data movement!

Matt Cleinman is a recent grad of the Heinz College MISM program (’15). While writing this post, he realized he never actually learned why the application is named CKAN.
