What’s happening with Healthy Ride?

By Jackson Whitmore

The analysis below provides a quick look at the first data released by Healthy Ride Pittsburgh, a bike share system operated by Pittsburgh Bike Share. The system opened at the end of May 2015. This analysis evaluates data from the first quarter of the system’s operations (May 30 to Sep 30). Additional data will be released quarterly.

System Overview

From 2015/06/30 to 2015/09/30 a total of 40,083 trips were taken. These trips were broken down into the following categories: Customer, Daily, Subscriber.

The total number of trips made by each user category over the time period is depicted in the chart below.


As we can see, almost all trips were made using either a Subscriber or Customer pass. According to the Healthy Ride website, Customer passes appear to represent pay-as-you-go riders, while Subscriber passes represent users who have purchased a recurring plan that allows unlimited 30- or 60-minute trips, depending on the plan.

Daily passes were hardly used, with only 39 rides taken. Interestingly, there does not appear to be a corresponding category for these passes on the Healthy Ride website. One hypothesis is that they were a promotional pass offered during the opening of the system, but this is quickly discredited since the passes were used from 2015/07/04 to 2015/09/23. It would be interesting to find out exactly what these trips represent.

Finally, there were some records which, for whatever reason, did not have a user type recorded.
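As a rough illustration of the tallies described above, the following sketch counts trips per user type and groups records with no user type under a separate label. The rows are made up, and the field name `usertype` and the label `Unknown` are assumptions rather than names taken from the released data.

```python
from collections import Counter

# Made-up sample rows. The field name "usertype" is an assumption and may
# not match the column name in the released Healthy Ride files.
trips = [
    {"trip_id": 1, "usertype": "Customer"},
    {"trip_id": 2, "usertype": "Subscriber"},
    {"trip_id": 3, "usertype": "Customer"},
    {"trip_id": 4, "usertype": "Daily"},
    {"trip_id": 5, "usertype": ""},  # record with no user type recorded
]


def count_by_user_type(rows):
    """Tally trips per user type, grouping blank values under 'Unknown'."""
    return Counter((row.get("usertype") or "Unknown") for row in rows)


counts = count_by_user_type(trips)
for user_type, n in counts.most_common():
    print(f"{user_type}: {n}")
```

Keeping the blank records under their own label, rather than silently dropping them, makes it easy to see how many trips fall outside the three named categories.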

Temporal Aspects of Trips

As bike share programs are becoming more and more popular across the country, urban policy analysts and planners are attempting to identify their effect on the movement of people in and around cities. While a full analysis of the usage patterns of Healthy Ride users is beyond the scope of this document, a few quick plots can help us get a feel for how the system was used during its first few months in existence.

The histograms below depict the number of trips by trip duration, in 10-minute increments, for each user type. Given the extremely small number of Daily pass users, they were dropped from the analysis, along with trips with no user type. Furthermore, trips over 500 minutes long were eliminated, as they do not represent the riding patterns of the vast majority of users.
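The filtering and binning just described might look something like the sketch below. The 10-minute bin width, the 500-minute cutoff, and the two retained user types come from the text; the sample records and the function name are hypothetical, and durations are assumed to be already expressed in minutes.

```python
from collections import Counter, defaultdict

BIN_WIDTH_MIN = 10       # histogram bin width, from the text
MAX_DURATION_MIN = 500   # longer trips are dropped, from the text
KEPT_USER_TYPES = {"Customer", "Subscriber"}

# Made-up (user type, duration in minutes) pairs for illustration only.
sample = [
    ("Subscriber", 7.5), ("Subscriber", 12.0), ("Subscriber", 9.9),
    ("Customer", 45.0), ("Customer", 61.2), ("Customer", 750.0),
    ("Daily", 20.0), ("", 15.0),
]


def duration_histogram(records):
    """Return {user_type: Counter({bin_start_minute: trip_count})}."""
    hist = defaultdict(Counter)
    for user_type, minutes in records:
        if user_type not in KEPT_USER_TYPES:
            continue  # drop Daily passes and trips with no user type
        if minutes > MAX_DURATION_MIN:
            continue  # drop very long trips
        bin_start = int(minutes // BIN_WIDTH_MIN) * BIN_WIDTH_MIN
        hist[user_type][bin_start] += 1
    return hist


hist = duration_histogram(sample)
for user_type in sorted(hist):
    for bin_start in sorted(hist[user_type]):
        count = hist[user_type][bin_start]
        print(f"{user_type} {bin_start}-{bin_start + BIN_WIDTH_MIN} min: {count}")
```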


It is immediately evident that riders making use of a subscription pass take many more short trips than those making use of a pay-as-you-go pass. This intuitively makes sense because subscribers are only charged for trips over 30 or 60 minutes. Essentially, they are reducing their overall cost per trip with each additional trip they make within their limit. Additionally, we would expect someone signing up for a subscription to anticipate using the pass repeatedly, most likely for short trips such as a leg of their commute or to complete an errand.

However, these numbers are aggregated over months so let’s take a look at them for each day of the week.


Interestingly, in the aggregate, the number of short trips made by subscription pass holders appears to remain relatively stable across the week. There is a subtle increase in the number of very short trips made by these users in the second half of the work week. Overall, this fits with our hypothesis that these riders planned on using the system frequently.

Riders with a customer pass see a large increase in trips made across all trip durations on Saturday and Sunday. Again, this makes sense as these users are most likely making use of the system for “leisure” rides which they may not make during the work week.

Let’s drill down some more and see how total system usage varied not just by day but by time of day as well.


The chart above shows average (mean) trips throughout the system by time of day for each day of the week. We can see that users of the customer pass mostly use the bikes on weekend afternoons, while users of the subscriber pass maintain a more consistent level of usage throughout the week. This fits with our previous hypotheses about how each type of pass holder uses the system. Finally, there is a spike in subscriber pass usage at the end of the workday, suggesting that these users favor the bike share as a means of transportation after work more than before it.
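One way to compute such an average is sketched below: trips are counted per (day of week, hour) pair and divided by the number of distinct calendar dates observed for that day of the week. This denominator is an assumption about how the averaging could be done, not a description of the method behind the chart, and the start times are made up.

```python
from collections import Counter, defaultdict
from datetime import datetime

WEEKDAYS = ["Monday", "Tuesday", "Wednesday", "Thursday",
            "Friday", "Saturday", "Sunday"]

# Made-up trip start times for illustration only.
starts = [
    "2015-07-04 14:05", "2015-07-04 14:40", "2015-07-04 17:10",  # Saturday
    "2015-07-11 14:20",                                           # Saturday
    "2015-07-06 17:30", "2015-07-13 17:45", "2015-07-13 17:50",  # Mondays
]


def mean_trips_by_weekday_hour(start_strings):
    """Mean trips per (weekday, hour), averaged over the dates observed."""
    trips = Counter()               # (weekday, hour) -> total trips
    dates_seen = defaultdict(set)   # weekday -> distinct calendar dates
    for text in start_strings:
        started = datetime.strptime(text, "%Y-%m-%d %H:%M")
        day = WEEKDAYS[started.weekday()]
        trips[(day, started.hour)] += 1
        dates_seen[day].add(started.date())
    return {
        (day, hour): count / len(dates_seen[day])
        for (day, hour), count in trips.items()
    }


means = mean_trips_by_weekday_hour(starts)
for (day, hour), value in sorted(means.items()):
    print(f"{day} {hour:02d}:00 -> {value:.2f} trips on average")
```

Dividing by the number of dates, rather than by the number of trips, is what lets a Saturday with no trips in a given hour still pull that hour's average down.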

Now that we have a general feel for the temporal characteristics of the system, we will take a look at how it is used spatially.

Spatial Aspects of Trips

The Healthy Ride system’s capacity is distributed throughout Pittsburgh but appears clustered in the Downtown area and then relatively segmented throughout the rest of the Pittsburgh area.


Unsurprisingly, the system seems to be centered around Downtown Pittsburgh. Complementing the stations Downtown is a cluster on the North Shore, including one of the system’s largest stations. These were likely positioned to serve Heinz Field and PNC Park as well as the large park and rides on the North Side.

Of note is how the system exists in a relatively segmented state. The “segmenting” refers to the gaps in station availability between major Pittsburgh neighborhoods. For example, there are no stations connecting Downtown to East Liberty or Oakland. Furthermore, while South Oakland and the University of Pittsburgh are relatively well served by the system, there is only one station near the Carnegie Mellon University campus.

These phenomena may be explained by the fact that the system was partly funded with federal Congestion Mitigation and Air Quality Improvement (CMAQ) funds. Funds distributed by this program are meant to reduce congestion and improve air quality via the reduction of car emissions. Thus, the current station placement scheme was most likely developed in a manner that prioritized these criteria over system connectivity.

The map below explores how the usage of the Healthy Ride stations varies by hour for weekdays and weekends.


It appears that the majority of the system’s trips originate Downtown on all days of the week. South Side also shows up as an origination hot spot. These two areas see a larger increase in average trips over the weekend than other areas, such as Shadyside, which remain relatively stable.


Looking at average trips by hour for the work week, we can see that system usage starts to pick up in the morning around 7am or 8am and then remains relatively constant throughout the course of the day. A large number of trips are made late at night in Downtown and Oakland, which may be a result of the large student populations in these areas. Either way, the larger than expected usage levels would likely prove interesting to investigate further.


Quite interestingly, it seems that fewer trips are made late at night on the weekends, relative to the daytime, than during the week. It is important to note the relative nature of this comparison, since in absolute terms many more trips are made during the weekend than during the week. Otherwise, weekend use seems to follow the same spatio-temporal patterns as the week.
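The distinction between relative and absolute comparisons can be made concrete with a small worked example. The numbers below are invented purely to illustrate the arithmetic and are not taken from the Healthy Ride data.

```python
# Made-up average trips per day, chosen only to illustrate the arithmetic.
usage = {
    "weekday": {"late_night": 20, "daytime": 180},
    "weekend": {"late_night": 30, "daytime": 470},
}


def late_night_ratio(counts):
    """Late-night trips expressed relative to daytime trips."""
    return counts["late_night"] / counts["daytime"]


for period, counts in usage.items():
    ratio = late_night_ratio(counts)
    print(f"{period}: {counts['late_night']} late-night trips, "
          f"{ratio:.1%} of daytime volume")
```

In this invented case the weekend has more late-night trips in absolute terms (30 versus 20), yet they are a smaller fraction of daytime volume, which is the kind of pattern the comparison above describes.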

Final Thoughts

The system as a whole seems to be used in the way that one would expect. For instance, there is a higher level of weekend usage, and the majority of trips are concentrated around major population and job centers such as Oakland and Downtown. However, there are some interesting aspects of the system’s current state, such as its preponderance of non-subscription users and the high level of late night weekday usage.

For future analyses, it would be interesting to examine the effect of weather on the system’s usage, to look into which stations generate trips and which stations terminate them during the peak periods, and, as more data is released, to look into capacity constraints within the system.

Jackson Whitmore is a Public Policy and Data Analytics student at Carnegie Mellon’s Heinz College. His interests lie at the intersection of data and cities, specifically the transportation systems that support them.

Want to host open data? You can with CKAN!

By Matt Cleinman

If you visit many government open data websites, you may notice that they all start to look very, very similar.  (For some examples, look at the UK national government, Washington DC, and our own Western Pennsylvania Regional Data Center.)  Your eyes are not going numb from looking at datasets – it’s that many are powered by CKAN.  


What is CKAN?  It’s a behind-the-scenes secret that helps make open data possible.  In their words:

CKAN is a powerful data management system that makes data accessible – by providing tools to streamline publishing, sharing, finding and using data. CKAN is aimed at data publishers (national and regional governments, companies and organizations) wanting to make their data open and available.

Even better, this web application is open source, meaning that anyone can see the source code and add features for their implementation.  Even-even better, it is designed to easily incorporate extensions so that any organization that uses CKAN can add your feature.

As part of the Master of Information Systems Management degree from Heinz College, all students participate in a client-driven group capstone project in their final semester.  Being a member of SUDS, I was delighted that my team was assigned to work with the City of Philadelphia, as every other group had a large corporate client.

Tim Wisniewski, Philadelphia’s Chief Data Officer, had several exciting ideas for projects.  With our end-of-semester time constraint in mind, we chose a CKAN extension that would streamline Philadelphia’s open data workflow.  (Some of his other proposals will be tackled by future MISM capstone teams!)

CKAN is wonderful, but does not allow for data dictionaries (or “metadata”) to be stored for each dataset. Philadelphia currently handles this by using a separate system to track the data dictionaries.  Most datasets contain a link to the metadata mixed in with the links to the data – and those links go to the metadata server.

What is metadata? If you’ve ever asked someone what column D in a spreadsheet represented, you have asked for metadata – it’s the information about the data.  “Column D is the number of clients impacted by the project described in Column A.  It should be a positive integer.”
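To make the idea concrete, here is a minimal sketch of a data dictionary entry for the spreadsheet example above, along with a check that a value satisfies it. The column name, the dictionary layout, and the `validate` helper are all hypothetical; this is not the format used by CKAN or by the extension described in this post.

```python
# Hypothetical data dictionary for the spreadsheet example in the text.
# The column name and the keys below are illustrative assumptions.
data_dictionary = {
    "clients_impacted": {
        "description": "Number of clients impacted by the project",
        "type": int,
        "minimum": 1,  # "positive integer" in the example
    },
}


def validate(column, value, dictionary=data_dictionary):
    """Return True if value matches the column's documented type and range."""
    spec = dictionary[column]
    # bool is a subclass of int in Python, so exclude it explicitly.
    if isinstance(value, bool) or not isinstance(value, spec["type"]):
        return False
    minimum = spec.get("minimum")
    return minimum is None or value >= minimum


print(validate("clients_impacted", 12))    # a plausible value
print(validate("clients_impacted", -3))    # negative, so rejected
print(validate("clients_impacted", "12"))  # text rather than an integer
```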

Our challenge: Learn about CKAN development and write an extension that allows native handling of data dictionaries.  Great documentation is available, but CKAN is a fairly complex system.  It uses Jinja for the frontend, Python on the backend, a PostgreSQL database, and many more technologies.  Luckily our team brought a diverse skillset to the project.

I’ll spare you the gory details, but we eventually got our extension working and tested.  Like CKAN itself, the extension is open-source, and we’ve been excited by the interest in it so far.  You can view our extension on GitHub.

CKAN was a perfect project for us: Large enough to be complex and somewhat bewildering at first, but understandable enough to be able to deliver the final product.  It stretched our skills, but in a manageable way.  For individuals looking to push their web app development abilities, consider contributing to CKAN – sharpen your skills while contributing to the open data movement!

Matt Cleinman is a recent grad of the Heinz College MISM program (’15). While writing this post, he realized he never actually learned why the application is named CKAN.

Learn more about SUDS: Email preference survey

If you’d like to learn more about SUDS or are interested in joining our permanent mailing list, please fill out the following 4-question form where you’ll enter your name, email, and contact preferences.

City bikers and weekend riders

By Lauren Renaud

At a SUDS Hack Night in October, I was introduced to Tableau for the first time and Pittsburgh Bike Share released their data from the first six months of operation. So naturally, I combined these two opportunities. First I looked at what time of day riders took out a bike. I was surprised to find roughly a bell curve (left figure below); I thought you’d see more of a peak at morning and evening commute times. So I looked at the counts by day of the week, and found fewer riders Monday to Friday than on the weekends (right figure below). In this plot, the horizontal color changes indicate counts per hour of day.

bike data - all

This seemed to indicate more casual weekend riders than I had anticipated. So I then broke out the days of the week by subscribers (monthly pass holders) against “customers”, or people picking up a bike for just one ride (right figure below). Now I saw a distinct difference: single-ride customers had many more rides on the weekends, while subscribers took bikes out throughout the week, mostly during the work week. I did the same thing for start times and saw the morning and evening peaks that I expected in the subscriber trends, while single-ride customers were likely to ride any time between noon and 7pm (left figure below).

bike data - split

Then lastly I broke out weekdays versus weekend (figure below), where you can see slight peaks for even non-subscribers at commute hours, and you can see a slightly more pronounced peak at lunchtime for subscribers.

bike data - weekend

I’m excited to work in Tableau more and delve a little deeper. Fun start!


Lauren Renaud is a Public Policy student at Carnegie Mellon’s Heinz College seeking to combine her background in social justice with her growing policy and data analysis skill set. She’s also a cyclist and transit rider who likes maps and exploring Pittsburgh. You can find some of Lauren’s other projects at www.laurenrenaud.com