What’s happening with Healthy Ride?

By Jackson Whitmore

The analysis below provides a quick look at the first data released by Healthy Ride Pittsburgh, a bike share system operated by Pittsburgh Bike Share. The system opened at the end of May 2015. This analysis evaluates data from the first quarter of the system’s operations (May 30 to Sep 30). Additional data will be released quarterly.

System Overview

From 2015/06/30 to 2015/09/30 a total of 40,083 trips were taken. These trips were broken down into the following categories: Customer, Daily, Subscriber.

The total number of trips made by each user category over the course of the time period is depicted in the chart below.


As we can see, almost all trips were made using either a Subscriber or Customer pass. After a quick look at the Healthy Ride website, it appears that Customer passes represent pay-as-you-go riders. Subscriber passes represent users who have purchased a recurring plan that allows for unlimited 30- or 60-minute trips, depending on the plan.

Daily passes were hardly used, with only 39 rides taken. Interestingly, there does not appear to be a corresponding category for these passes on the Healthy Ride website. One hypothesis is that they were a promotional pass offered during the opening of the system, but this is quickly discredited: the passes were used from 2015/07/04 to 2015/09/23. It would be interesting to find out exactly what these trips represent.

Finally, there were some records which, for whatever reason, did not have a user type recorded.

Temporal Aspects of Trips

As bike share programs are becoming more and more popular across the country, urban policy analysts and planners are attempting to identify their effect on the movement of people in and around cities. While a full analysis of the usage patterns of Healthy Ride users is beyond the scope of this document, a few quick plots can help us get a feel for how the system was used during its first few months in existence.

The histograms below depict the total number of trips taken by trip duration in 10-minute increments for each user type. Given the extremely small number of daily pass users, they were dropped from the analysis along with trips with no user type. Furthermore, trips over 500 minutes long were eliminated as they do not represent the riding patterns of the vast majority of users.
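As a rough illustration of the filtering and binning described above, here is a minimal sketch in plain Python. It is not the code behind the original charts. The record layout (a user type paired with a duration in seconds), the function name and the sample trips are all assumptions made for the example.

```python
from collections import Counter

BIN_MINUTES = 10                        # bin width used in the histograms
MAX_MINUTES = 500                       # longer trips are treated as outliers
KEPT_TYPES = {"Customer", "Subscriber"}


def duration_histogram(trips):
    """Count trips per (user type, start of 10-minute bin).

    `trips` is an iterable of (user_type, duration_seconds) pairs.
    The seconds unit is an assumption about the raw data.
    """
    counts = Counter()
    for user_type, duration_seconds in trips:
        if user_type not in KEPT_TYPES:
            continue  # drops Daily passes and records with no user type
        minutes = duration_seconds / 60
        if minutes > MAX_MINUTES:
            continue  # drops trips over 500 minutes
        bin_start = int(minutes // BIN_MINUTES) * BIN_MINUTES
        counts[(user_type, bin_start)] += 1
    return counts


# Made-up sample records, for illustration only.
sample_trips = [
    ("Subscriber", 420),    # 7 minutes
    ("Subscriber", 540),    # 9 minutes
    ("Subscriber", 1500),   # 25 minutes
    ("Customer", 2700),     # 45 minutes
    ("Customer", 36000),    # 600 minutes, excluded
    ("Daily", 900),         # excluded user type
    (None, 300),            # missing user type, excluded
]
histogram = duration_histogram(sample_trips)
```

Each key in `histogram` pairs a user type with the lower edge of its bin, so `("Subscriber", 0)` holds the trips shorter than 10 minutes.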


It is immediately evident that riders making use of a subscription pass take many more short trips than those making use of a pay-as-you-go pass. This makes intuitive sense because subscribers are only charged for trips over 30 or 60 minutes. Essentially, they are reducing their overall cost per trip with each additional trip they make within their limit. Additionally, we would expect someone signing up for a subscription to anticipate using the pass repeatedly, most likely for short trips such as a leg of their commute or to complete an errand.

However, these numbers are aggregated over months so let’s take a look at them for each day of the week.


It is interesting to note that, in the aggregate, it appears that the number of subscription pass holders using the system for short trips remains relatively stable. There is a subtle increase in the number of very short trips being made by these users in the second half of the work week. Overall, this fits with our hypothesis that these riders planned on frequently using the system.

Riders with a customer pass see a large increase in trips made across all trip durations on Saturday and Sunday. Again, this makes sense as these users are most likely making use of the system for “leisure” rides which they may not make during the work week.

Let’s drill down some more and see how total system usage varied not just by day but by time of day as well.


The above chart shows average (mean) trips throughout the system by time of day for each day of the week. We can see that users of the customer pass mostly use the bikes on weekend afternoons while users of the subscriber pass maintain a more consistent level of usage throughout the week. This fits with our previous hypotheses about how each type of pass holder utilizes the system. Finally, there is a spike in subscriber pass usage at the end of the workday indicating that for these users the bike share is favored more as a means of transportation after work than before.
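One way to compute a "mean trips by time of day for each day of the week" figure is to count trips per (weekday, hour) and divide by the number of calendar dates observed for that weekday. The sketch below assumes trip start times are available as `datetime` objects; the function name and sample timestamps are hypothetical, and dates with no trips at all would not be counted in the denominator.

```python
from collections import Counter, defaultdict
from datetime import datetime


def mean_trips_by_weekday_hour(start_times):
    """Return {(weekday, hour): mean trips per observed calendar day}.

    weekday follows datetime.weekday(): Monday is 0 and Sunday is 6.
    """
    trips = Counter()
    days_seen = defaultdict(set)
    for ts in start_times:
        trips[(ts.weekday(), ts.hour)] += 1
        days_seen[ts.weekday()].add(ts.date())
    return {
        (weekday, hour): count / len(days_seen[weekday])
        for (weekday, hour), count in trips.items()
    }


# Made-up start times: two Saturdays and one Monday in July 2015.
sample_starts = [
    datetime(2015, 7, 4, 14, 5),
    datetime(2015, 7, 4, 14, 40),
    datetime(2015, 7, 11, 14, 10),
    datetime(2015, 7, 6, 17, 30),
]
means = mean_trips_by_weekday_hour(sample_starts)
```

Running the same function separately on Customer and Subscriber trips would give one series per pass type.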

Now that we have a general feel for the temporal characteristics of the system, we will take a look at how it is used spatially.

Spatial Aspects of Trips

The Healthy Ride system’s capacity is distributed throughout Pittsburgh but appears clustered in the Downtown area and then relatively segmented throughout the rest of the Pittsburgh area.


Unsurprisingly, the system seems to be centered around Downtown Pittsburgh. Complementing the stations Downtown is a cluster of them on the North Shore, including one of the system’s largest stations. These were likely positioned to serve Heinz Field and PNC Park as well as the large park and rides on the North Side.

Of note is how the system exists in a relatively segmented state. The “segmenting” refers to the gaps in station availability between major Pittsburgh neighborhoods. For example, there are no stations connecting Downtown to East Liberty or Oakland. Furthermore, while South Oakland and the University of Pittsburgh are relatively well served by the system, there is only one station near the Carnegie Mellon University campus.

These phenomena may be explained by the fact that the system was partly funded with federal Congestion Mitigation and Air Quality Improvement (CMAQ) funds. Funds distributed by this program are meant to reduce congestion and improve air quality via the reduction of car emissions. Thus, the current station placement schema was most likely developed in a manner which prioritized these criteria over system connectivity.

The map below explores how the usage of the Healthy Ride stations varies by hour for weekdays and weekends.


It appears that the majority of the system’s trips originate Downtown on all days of the week. South Side also shows up as an origination hot spot. These two areas see a larger increase in average trips over the weekend than other areas, such as Shadyside, which remain relatively stable.


Looking at average trips by hour for the work week, we can see that system usage starts to pick up in the morning around 7am or 8am and then remains relatively constant throughout the course of the day. There are a large number of trips made late at night in Downtown and Oakland, which may be a result of the large student populations in these areas. Either way, the larger than expected usage levels would likely prove interesting to investigate further.


Quite interestingly, it seems that fewer trips are made late at night on the weekends, relative to the daytime, than during the week. It is important to note the relative nature of this comparison, since in absolute terms many more trips are made on the weekend than during the week. Otherwise, weekend use seems to follow the same spatio-temporal patterns as weekday use.

Final Thoughts

The system as a whole seems to be used in the way that one would expect. For instance, there is a higher level of weekend usage and the majority of trips are concentrated around major population/job centers such as Oakland and Downtown. However, there are some interesting aspects of the system’s current state, such as the system’s preponderance of non-subscription users and the high level of late night weekday usage.

For future analyses, it would be interesting to examine the effect of weather on the system’s usage, look into which stations generate trips and which stations terminate them during the peak periods, and as more data is released look into capacity constraints within the system.

Jackson Whitmore is a Public Policy and Data Analytics student at Carnegie Mellon’s Heinz College. His interests lie at the intersection of data and cities, specifically the transportation systems that support them.

City bikers and weekend riders

By Lauren Renaud

At a SUDS Hack Night in October, I was introduced to Tableau for the first time and Pittsburgh Bike Share released their data from the first six months of operation. So naturally, I combined these two opportunities. First I looked at what time of day riders took out a bike. I was surprised to find roughly a bell curve (left figure below); I thought you’d see more of a peak at morning and evening commute times. So I looked at the counts by day of the week, and found fewer riders Monday to Friday than on the weekends (right figure below). In this plot, the horizontal color changes indicate counts per hour of day.

bike data - all

This seemed to indicate more casual weekend riders than I had anticipated. So I then broke out the days of the week by subscribers (monthly pass holders) against “customers”, or people picking up a bike for just one ride (right figure below). Now I saw a distinct difference: single-ride customers had many more rides on the weekends, while subscribers took bikes out throughout the week, mostly during the work week. I did the same thing for start times and saw the morning and evening peaks that I expected in the subscriber trends, while single-ride customers were likely to ride any time between noon and 7pm (left figure below).

bike data - split

Then lastly I broke out weekdays versus weekend (figure below), where you can see slight peaks for even non-subscribers at commute hours, and you can see a slightly more pronounced peak at lunchtime for subscribers.

bike data - weekend

I’m excited to work in Tableau more and delve a little deeper. Fun start!


Lauren Renaud is a Public Policy student at Carnegie Mellon’s Heinz College seeking to combine her background in social justice with her growing policy and data analysis skill set. She’s also a cyclist and transit rider who likes maps and exploring Pittsburgh. You can find some of Lauren’s other projects at www.laurenrenaud.com



Do Pittsburgh’s buses bunch?

By Mark Egge, Ranjana Krishnamoorthy, Bhavna Prasad, Enbo Zhang, and Rohita Kamath

As bus riders, we wanted to know what trends of bus service levels can be learned from the data on bus locations. Like many mass transit systems, the Port Authority of Allegheny County (the entity that operates the bus system that services Pittsburgh and the surrounding vicinity) publishes real-time information about the locations of its vehicles in service. This information can be accessed through the Port Authority website (PortAuthority.org), or through various third-party apps and websites (e.g. pitlivebus.com). Unfortunately for would-be analysts, the Port Authority does not publish any historical bus location data. That is, the data published by the Port Authority cannot be used to answer questions about historical service delivery patterns.

In particular, we wanted to know if buses bunch, or cluster. We’ve observed, anecdotally, that the wait between bus arrivals can sometimes be much longer than scheduled, and that after a long delay, the buses often show up in pairs or even triples.

Building a data warehouse

To answer this question, we obtained an API key from the Port Authority (which allows Port Authority data to be retrieved in XML or JSON format) and built a data warehouse to capture and record the real-time bus location information. We capture the location of all buses on the 61A/61B/61C/61D routes once every sixty seconds. Additionally, to investigate if service levels are impacted by weather, we also capture the concurrent weather conditions (via WeatherUnderground’s developer API).

The data is retrieved from the Port Authority in XML format. We use Microsoft SQL Server Integration Services (SSIS) to extract, transform, and load this data into a data warehouse that captures historic bus location information. In addition to the vehicle location information, we also load in other dimensions useful for analysis, including the routes and the patterns (sequences of stops and waypoints) that constitute a route.  For those connected to the CMU network, you can access our database by following these instructions.
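To make the capture step concrete, here is a minimal sketch of a once-a-minute polling loop that appends each snapshot to a table. It deliberately swaps in Python’s built-in `sqlite3` for the SQL Server and SSIS setup described above, and it does not call the real Port Authority API: the `fetch` function, the column names, the vehicle IDs and the timestamps are all hypothetical stand-ins.

```python
import sqlite3


def create_store(conn):
    """Create a single fact table for vehicle positions (hypothetical schema)."""
    conn.execute(
        """CREATE TABLE IF NOT EXISTS vehicle_location (
               captured_at TEXT NOT NULL,
               vehicle_id  TEXT NOT NULL,
               route       TEXT NOT NULL,
               distance    REAL NOT NULL,
               PRIMARY KEY (captured_at, vehicle_id)
           )"""
    )


def record_snapshot(conn, captured_at, vehicles):
    """Append one poll's worth of vehicle positions."""
    rows = [(captured_at, v["vehicle_id"], v["route"], v["distance"]) for v in vehicles]
    conn.executemany("INSERT OR IGNORE INTO vehicle_location VALUES (?, ?, ?, ?)", rows)
    conn.commit()


def poll(conn, fetch, clock, sleep, polls, interval_seconds=60):
    """Call `fetch` once per interval and store what it returns."""
    for _ in range(polls):
        record_snapshot(conn, clock(), fetch())
        sleep(interval_seconds)


# Stand-ins so the sketch runs without network access or real waiting.
def fake_fetch():
    return [
        {"vehicle_id": "5501", "route": "61A", "distance": 1200.0},
        {"vehicle_id": "5507", "route": "61C", "distance": 8800.0},
    ]


fake_times = iter(["2015-11-02T08:00:00", "2015-11-02T08:01:00"])

conn = sqlite3.connect(":memory:")
create_store(conn)
poll(conn, fake_fetch, clock=lambda: next(fake_times), sleep=lambda seconds: None, polls=2)
row_count = conn.execute("SELECT COUNT(*) FROM vehicle_location").fetchone()[0]
```

In a real deployment `fetch` would wrap the API request, `clock` would return the current time, and `sleep` would be `time.sleep`. Passing them in as arguments keeps the loop easy to test.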

How bad is the bunching?

Our data substantiates our anecdotal observations. We see that buses do often end up travelling in bunches of two or more buses. The graph below shows the progress of buses along their routes from their downtown departure (at the bottom of the graph) to their arrival at Hamburg Hall (approximately 18,000 along the path from downtown on 61 bus routes). Each ascending line represents one vehicle. The horizontal distance between lines (at any fixed distance) shows the wait time between buses at that location. When lines are close together or overlapping, they represent buses that are clustered together (or, bunched).
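The "horizontal distance between lines" reading of the graph can be sketched in code: interpolate the time at which each vehicle passes a fixed point on the route, sort those times, and take the differences. This is an illustrative sketch rather than our warehouse query. The track format, the 3-minute bunching threshold and the sample numbers are assumptions, and distances are in whatever unit the feed reports.

```python
BUNCH_THRESHOLD_MINUTES = 3.0  # assumed cutoff for calling two buses "bunched"


def crossing_time(track, distance):
    """Linearly interpolate when a vehicle reaches `distance` along the route.

    `track` is a list of (time_minutes, distance_along_route) samples sorted
    by time. Returns None if the vehicle never reaches that distance.
    """
    for (t0, d0), (t1, d1) in zip(track, track[1:]):
        if d1 > d0 and d0 <= distance <= d1:
            return t0 + (t1 - t0) * (distance - d0) / (d1 - d0)
    return None


def headways_at(tracks, distance):
    """Return the gaps in minutes between successive buses at `distance`."""
    times = sorted(
        t
        for t in (crossing_time(track, distance) for track in tracks.values())
        if t is not None
    )
    return [later - earlier for earlier, later in zip(times, times[1:])]


# Made-up tracks: bus_b is slowed down and bus_c catches up with it.
tracks = {
    "bus_a": [(0.0, 0.0), (10.0, 9000.0)],
    "bus_b": [(10.0, 0.0), (28.0, 9000.0)],
    "bus_c": [(20.0, 0.0), (30.0, 9000.0)],
}
gaps_midway = headways_at(tracks, 4500.0)
gaps_end = headways_at(tracks, 9000.0)
bunched = [gap < BUNCH_THRESHOLD_MINUTES for gap in gaps_end]
```

In the sample data the three buses leave ten minutes apart, but by the end of the segment one gap has stretched and the other has nearly closed, which is the pattern the overlapping lines in the graph represent.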


As a result of bunching, the time between bus arrivals varies greatly. The box-and-whisker plot below shows wait times by hour of the day (at the outbound bus stop located in front of Hamburg Hall). Wait times are minimized (and have the least variance) just before rush hour (from 7:00 to 8:00 am, and from 4:00 pm to 5:00 pm), and on weekends and holidays. Wait times exhibit the greatest variance during the 6:00 – 7:00 pm hour.
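The numbers behind a box plot of this kind are the minimum, quartiles and maximum of the wait times in each hour. A minimal sketch of that summary using the standard library follows. The input format and the sample wait times are made up for illustration, and the "inclusive" quartile method is one of several reasonable choices.

```python
from collections import defaultdict
from statistics import quantiles


def wait_time_summary(observations):
    """Group wait times by hour of day and return box-plot statistics.

    `observations` is an iterable of (hour_of_day, wait_minutes) pairs.
    """
    by_hour = defaultdict(list)
    for hour, wait in observations:
        by_hour[hour].append(wait)

    summary = {}
    for hour, waits in by_hour.items():
        if len(waits) < 2:
            continue  # quantiles() needs at least two data points
        q1, median, q3 = quantiles(waits, n=4, method="inclusive")
        summary[hour] = {
            "min": min(waits),
            "q1": q1,
            "median": median,
            "q3": q3,
            "max": max(waits),
        }
    return summary


# Made-up wait times: a steady 7 am hour and an erratic 6 pm hour.
sample_waits = [(7, w) for w in (8, 9, 10, 11, 12)] + [(18, w) for w in (2, 3, 10, 18, 25)]
summary = wait_time_summary(sample_waits)
iqr_morning = summary[7]["q3"] - summary[7]["q1"]
iqr_evening = summary[18]["q3"] - summary[18]["q1"]
```

Both sample hours have the same median wait, but the interquartile range is far wider in the evening hour, which is the kind of spread a taller box indicates.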


Problems and solutions

Bunching is a widely observed and well-documented transportation phenomenon (see this great visual explanation). Bunching begins when a leading bus is delayed (for example, by a rush-hour crowd or a rider loading a bike), so that more riders than average are waiting for it at subsequent stops, which delays it further. If the trailing bus does not experience the same delay, it finds fewer riders than average waiting and gains on the leader. The phenomenon continues until buses end up operating in pairs or groups of three.
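The feedback loop can be seen in a deliberately simplified toy model, which is not fitted to our data. It assumes that the time a bus spends at each stop is proportional to the gap since the previous bus (a longer gap means more waiting riders), that travel time between stops is constant, and that buses cannot pass each other. All parameter values are arbitrary choices for illustration.

```python
def simulate_bunching(stops=12, scheduled_headway=10.0, travel_time=3.0,
                      dwell_per_gap_minute=0.1, initial_delay=2.0):
    """Track the gaps between three buses as they move from stop to stop.

    Bus 0 is assumed to stay on schedule, bus 1 starts `initial_delay`
    minutes late, and bus 2 follows on time. Returns two lists of gaps in
    minutes, one value per stop: (bus 0 to bus 1, bus 1 to bus 2).
    """
    arrivals = [0.0, scheduled_headway + initial_delay, 2 * scheduled_headway]
    lead_gaps, trail_gaps = [], []
    for _ in range(stops):
        lead_gap = arrivals[1] - arrivals[0]
        trail_gap = arrivals[2] - arrivals[1]
        lead_gaps.append(lead_gap)
        trail_gaps.append(trail_gap)
        dwells = [
            dwell_per_gap_minute * scheduled_headway,  # bus 0 sees a normal crowd
            dwell_per_gap_minute * lead_gap,           # late bus sees a bigger crowd
            dwell_per_gap_minute * trail_gap,          # follower sees a smaller crowd
        ]
        arrivals = [a + d + travel_time for a, d in zip(arrivals, dwells)]
        arrivals[2] = max(arrivals[2], arrivals[1])    # no overtaking
    return lead_gaps, trail_gaps


lead_gaps, trail_gaps = simulate_bunching()
```

Under these assumptions a two-minute delay is enough: the gap in front of the late bus keeps widening, while the gap behind it shrinks at every stop until the trailing bus has caught up.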

Creating more slack in the system reduces the frequency and severity of bunching, but requires more buses, more operators, or longer ride times. Real-time GPS tracking opens a window of opportunity for reducing bunching through better coordination. If a trailing bus is notified of a delay encountered by the preceding bus, it could reduce its travel speed to avoid having fewer-than-average riders at subsequent stops. Unfortunately, such remedies only function when a bus operator has some discretion in travel speed, which is seldom possible in Pittsburgh’s historic, narrow streets.


Mark Egge is a data analyst with a background in healthcare operations and entrepreneurship. He balances his work in GIS, data mining, and health information exchange with an abiding love of playing outside and exploring the natural world.

Ranjana Krishnamoorthy is a graduate student in the Master of Information Systems Management program at Carnegie Mellon University. She loves working with data and is passionate about exploring how technology can be used to improve a business, its management and processes.

Bhavna Prasad recently completed her Masters in Information Systems Management from the Heinz College at Carnegie Mellon University. She is passionate about technology and has a strong penchant to mold raw data into key business drivers for product decisions.

Enbo Zhang is a Public Policy & Management student and graduate teaching assistant at Carnegie Mellon’s Heinz College.

Rohita Kamath is a Summer MISM graduate student at Heinz College. She previously worked with Deloitte Consulting for 4 years, in the SAP practice. She loves working with technology and the idea of using data and technology to improve how companies are managed.

SUDS tours Google: better cities through public-private data partnerships

Last Wednesday, SUDS visited Google’s Pittsburgh office, on the eve of its 10-year anniversary celebrations. We got to see their famous hammock room, Kennywood-themed hallways, and micro kitchens stocked according to behavioral science. But as jealous as we were of the nap pods, the best part of the visit was a talk by CMU Computer Science PhD alumna Sarah Loos on Google’s Better Cities project.

Cities face huge challenges in monitoring and managing their transportation infrastructure. In the US alone, $124 billion is wasted each year in traffic jams. The Better Cities team at Google has been piloting methodologies that match up cities’ transport data with aggregate, anonymized snapshots of historical traffic statistics in order to yield insight and solutions to nasty traffic problems.

For example, Google partnered with the City of Amsterdam to validate sensor readings on the A10 highway, which can tell when cars are slowing down (and thus, if a traffic jam might occur). The city can then analyze the data and change speed limits on its digital signs and take other measures to mitigate the jam’s impact. The physical sensors are really accurate, but also really expensive to install and maintain. Google found that by combining only some of the sensor data with representative models of aggregate data, it could detect the same traffic patterns with a high level of accuracy. By reducing the number of sensors needed in each stretch of road, Amsterdam’s government can save between 50,000 and 100,000 euros per kilometer per year.


As cities are using more individual-level data from more sources, including public-private partnerships, Loos stressed the importance of keeping information anonymous and private. Her work is focusing on differential privacy algorithms, which add enough noise to the data to mask the influence of any one individual’s contribution to the set.
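To give a flavor of what "adding noise" means, here is a minimal sketch of the Laplace mechanism, a textbook differential privacy technique for releasing a count. It is a generic illustration and not a description of the algorithms Loos or Google actually use. The function names, the epsilon value and the example count are assumptions.

```python
import random


def laplace_noise(scale, rng):
    """Draw from a Laplace(0, scale) distribution.

    The difference of two independent exponential draws with mean `scale`
    follows a Laplace distribution with that scale.
    """
    return rng.expovariate(1 / scale) - rng.expovariate(1 / scale)


def private_count(true_count, epsilon, rng):
    """Release a count with noise calibrated to one person's influence.

    Adding or removing one individual changes a count by at most 1, so the
    sensitivity is 1 and the noise scale is 1 / epsilon.
    """
    sensitivity = 1.0
    return true_count + laplace_noise(sensitivity / epsilon, rng)


# Hypothetical example: 1,000 vehicles observed on a road segment.
rng = random.Random(0)
releases = [private_count(1000, epsilon=0.5, rng=rng) for _ in range(5000)]
average_release = sum(releases) / len(releases)
```

Each individual release is off by a few vehicles, which hides whether any one person was on the road, while the average over many releases stays close to the true count. A smaller epsilon means more noise and stronger privacy.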

These pilot projects are an exciting example of how simple, but smart, data collaboration can improve city management. And Loos and her Google team are looking for new cities to partner with—we hope Pittsburgh will be one of them!