Data Day 2016

by Eric Darsow

Our digital age has birthed another unusual occurrence: a tabling event devoted entirely to the idea of data. Organizations of all sizes and girths carted in maps made over several centuries, charts of dazzling design, and slews of glimmering screens. A 3D printing robot was even spotted spewing layers of plastic into cute shapes. Amid this flurry of patterns and coefficients, the Students for Urban Data Systems (SUDSers) teamed up with nerds from CMU’s CREATE Lab to referee the pesky spar between the number crunchers and the storytellers.

The so-called “numbers and narrative” divide is turning out to be a chasm of our own making. While the process of regressing a spreadsheet full of figures obviously lacks a well-told story’s emotional pin-pricks, cryptic tabular outputs can, in fact, add dimensions of extent and intensity to an issue first illuminated by a personal narrative.

For example, how are we to make sense of a sudden drop in high school test scores without talking to some teenagers about their experience bubbling in answers to mind-numbing test questions? The other direction works, too: few folks would object to augmenting an angry biker’s story about getting run off the road by a texting driver with a map showing ten years of bike crashes in Pittsburgh.

The SUDS + CREATE exhibit offered a safe crossing of this oft-feared number/narrative gap: it displayed a few statistics about a central topic, such as transportation, and then invited folks to write a story or question and physically connect it to an otherwise lonely, contextless number.

One attendee affixed a short story about his personal experience with skyrocketing housing prices in his home city of Seoul. Once the story was pinned and ready for connections, another visitor complemented it with satellite images (pixel data) showing the Korean capital’s stunning vertical growth since the mid-1980s. Adding sky shots of Austin, Texas’s metastasizing suburbanization over the same period placed the sky-high rent story in a global context.

Even young people (perhaps less demoralized by hours of myopic method design meetings) intuitively sense the value of a well-told story alongside a chart or graph. One 9-year-old who visited our station looked over a bar graph depicting the average number of bicycle crashes by hour of the day. After a few minutes of thinking and talking aloud about the bars and axes, he used a marker and construction paper to ask all future board viewers: Why are there so many more bike crashes at midnight than at 4:00 am? Given an average bedtime of 9:15 pm for children under ten in the United States, his wonder was about as genuine as it comes.

An enthusiastic transplant to Pittsburgh, Eric explores how the computerization of society affects our geographic communities, social landscapes, and work identities. Eric is eagerly wrapping up his graduate program in information systems at CMU and balances his screen-based life with woodworking and forays into “installation art.” He serves as SUDS’s Assistant Director of Outreach and interned at the CREATE Lab last summer.

Want to host open data? You can with CKAN!

By Matt Cleinman

If you visit many government open data websites, you may notice that they all start to look very, very similar. (For some examples, look at the UK national government, Washington DC, and our own Western Pennsylvania Regional Data Center.) Your eyes are not just going numb from looking at datasets: many of these sites are powered by CKAN.

What is CKAN? It’s a behind-the-scenes secret that helps make open data possible. In the project’s own words:

CKAN is a powerful data management system that makes data accessible – by providing tools to streamline publishing, sharing, finding and using data. CKAN is aimed at data publishers (national and regional governments, companies and organizations) wanting to make their data open and available.

Even better, this web application is open source, meaning that anyone can read the source code and add features for their own implementation. Better still, it is designed to incorporate extensions easily, so any organization that uses CKAN can adopt a feature you build.
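CKAN’s tools for finding and using data are also available to programs, not just to people clicking around a portal. CKAN exposes an Action API, and a keyword search over datasets goes through the `package_search` action at `/api/3/action/package_search`, which answers with JSON containing a `success` flag and a `result` object holding a `results` list of datasets. The minimal Python sketch below assumes that response shape; the helper names `package_search_url` and `dataset_titles` are our own hypothetical illustrations, not part of CKAN.

```python
import json
from urllib.parse import urlencode


def package_search_url(base_url, query, rows=5):
    """Build a CKAN Action API (v3) URL that searches datasets by keyword."""
    # urlencode escapes the query string (spaces become "+").
    params = urlencode({"q": query, "rows": rows})
    return f"{base_url.rstrip('/')}/api/3/action/package_search?{params}"


def dataset_titles(response_text):
    """Extract dataset titles from the JSON body of a package_search response."""
    payload = json.loads(response_text)
    if not payload.get("success"):
        # On failure, CKAN sets "success" to false and describes the problem in "error".
        raise RuntimeError(f"CKAN request failed: {payload.get('error')}")
    # Fall back to the URL-friendly dataset name when a dataset has no title.
    return [pkg.get("title") or pkg["name"] for pkg in payload["result"]["results"]]
```

To query a live portal, you could fetch the URL with `urllib.request.urlopen` and pass the decoded body to `dataset_titles`. For example, `package_search_url("https://data.wprdc.org", "bike")` would target the Western Pennsylvania Regional Data Center, assuming that remains its CKAN address. Keeping URL building and response parsing separate from the network call makes both pieces easy to test offline.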

Every student in Heinz College’s Master of Information Systems Management (MISM) program completes a client-driven group capstone project in their final semester. As a member of SUDS, I was delighted that my team was assigned to work with the City of Philadelphia; every other group had a large corporate client.

Tim Wisniewski, Philadelphia’s Chief Data Officer, had several exciting ideas for projects. With our end-of-semester deadline in mind, we chose to build a CKAN extension that would streamline Philadelphia’s open data workflow. (Some of his other proposals will be tackled by future MISM capstone teams!)

CKAN is wonderful, but it does not natively store a data dictionary (a form of metadata) for each dataset. Philadelphia currently works around this by tracking its data dictionaries in a separate system: most datasets include a metadata link mixed in with the links to the data, and that link points to the separate metadata server.

What is metadata? If you’ve ever asked someone what column D in a spreadsheet represents, you have asked for metadata: information about the data. “Column D is the number of clients impacted by the project described in Column A. It should be a positive integer.”
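To make the column D example concrete, here is a minimal Python sketch of a data dictionary and a check of one spreadsheet row against it. The `ColumnSpec` class, the `validate_row` helper, and the wording for column A are hypothetical illustrations; they are not part of CKAN or of our extension.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ColumnSpec:
    """One data-dictionary entry: what a column means and which values it allows."""
    name: str
    description: str
    kind: type
    minimum: Optional[float] = None


# Hypothetical dictionary for the spreadsheet described above.
DATA_DICTIONARY = [
    ColumnSpec("A", "Short description of the project", str),
    ColumnSpec("D", "Number of clients impacted by the project described in column A",
               int, minimum=1),  # "a positive integer" means at least 1
]


def validate_row(row, dictionary=DATA_DICTIONARY):
    """Return a list of problems found when checking a row against the dictionary."""
    problems = []
    for spec in dictionary:
        value = row.get(spec.name)
        if value is None:
            problems.append(f"column {spec.name} is missing")
        elif type(value) is not spec.kind:
            # An exact type check, so True/False are not accepted as integers.
            problems.append(f"column {spec.name} should be {spec.kind.__name__}, "
                            f"got {type(value).__name__}")
        elif spec.minimum is not None and value < spec.minimum:
            problems.append(f"column {spec.name} should be at least {spec.minimum}, got {value}")
    return problems
```

For instance, `validate_row({"A": "Tree planting", "D": -3})` reports that column D should be at least 1. Keeping the dictionary as plain data, separate from the checking code, is what makes it possible to store it next to the dataset it describes rather than on a separate server.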

Our challenge: learn about CKAN development and write an extension that allows native handling of data dictionaries. Great documentation is available, but CKAN is a fairly complex system: it uses Jinja templates for the frontend, Python on the backend, a PostgreSQL database, and many more technologies. Luckily, our team brought a diverse skill set to the project.

I’ll spare you the gory details, but we eventually got our extension working and tested. Like CKAN itself, the extension is open source, and we’ve been excited by the interest in it so far. You can view our extension on GitHub.

CKAN was a perfect project for us: large enough to be complex and somewhat bewildering at first, but understandable enough that we could deliver a finished product. It stretched our skills, but in a manageable way. If you want to push your web app development abilities, consider contributing to CKAN: sharpen your skills while supporting the open data movement!

Matt Cleinman is a recent grad of the Heinz College MISM program (’15). While writing this post, he realized he never actually learned why the application is named CKAN.