City bikers and weekend riders

By Lauren Renaud

At a SUDS Hack Night in October, I was introduced to Tableau for the first time, and Pittsburgh Bike Share released data from its first six months of operation. So naturally, I combined these two opportunities. First I looked at what time of day riders took out a bike. I was surprised to find roughly a bell curve (left figure below); I had expected peaks at the morning and evening commute times. So I looked at the counts by day of the week, and found fewer riders Monday to Friday than on the weekends (right figure below). In this plot, the horizontal color changes indicate counts per hour of day.

bike data - all

This seemed to indicate more casual weekend riders than I had anticipated. So I then broke out the days of the week by subscribers (monthly pass holders) against “customers”, or people picking up a bike for just one ride (right figure below). Now I saw a distinct difference: single-ride customers took many more rides on the weekends, while subscribers took bikes out throughout the week, mostly during the work week. I did the same thing for start times and saw the morning and evening peaks that I expected in the subscriber trends, while single-ride customers were likely to ride any time between noon and 7pm (left figure below).
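The analysis above was done in Tableau, but the same breakdown can be sketched in a few lines of Python. This is only an illustration: the records, the "Subscriber"/"Customer" labels, and the timestamp format are made up for the example and are not the bike share's actual data layout.

```python
from collections import Counter
from datetime import datetime

# Hypothetical sample records: (start time, user type). Not real bike share data.
rides = [
    ("2015-07-06 08:15", "Subscriber"),  # Monday
    ("2015-07-06 17:40", "Subscriber"),  # Monday
    ("2015-07-08 08:50", "Subscriber"),  # Wednesday
    ("2015-07-11 13:05", "Customer"),    # Saturday
    ("2015-07-11 15:30", "Customer"),    # Saturday
    ("2015-07-12 14:10", "Customer"),    # Sunday
]

def summarize(rides):
    """Count rides per (user type, start hour) and per (user type, weekday/weekend)."""
    by_hour = Counter()
    by_day_type = Counter()
    for start, user_type in rides:
        ts = datetime.strptime(start, "%Y-%m-%d %H:%M")
        by_hour[(user_type, ts.hour)] += 1
        # datetime.weekday() returns 0 for Monday through 6 for Sunday
        day_type = "weekend" if ts.weekday() >= 5 else "weekday"
        by_day_type[(user_type, day_type)] += 1
    return by_hour, by_day_type

by_hour, by_day_type = summarize(rides)
```

Splitting the counters by user type is what separates the commuter peaks from the weekend afternoon rides.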

bike data - split

Lastly, I broke out weekdays versus weekends (figure below), where you can see slight peaks at commute hours even for non-subscribers, and a slightly more pronounced peak at lunchtime for subscribers.

bike data - weekend

I’m excited to work in Tableau more and delve a little deeper. Fun start!


Lauren Renaud is a Public Policy student at Carnegie Mellon’s Heinz College seeking to combine her background in social justice with her growing policy and data analysis skill set. She’s also a cyclist and transit rider who likes maps and exploring Pittsburgh. You can find some of Lauren’s other projects at



Do Pittsburgh’s buses bunch?

by Mark Egge, Ranjana Krishnamoorthy, Bhavna Prasad, Enbo Zhang, and Rohita Kamath

As bus riders, we wanted to know what trends in bus service levels can be learned from data on bus locations. Like many mass transit systems, the Port Authority of Allegheny County (the entity that operates the bus system serving Pittsburgh and the surrounding area) publishes real-time information about the locations of its vehicles in service. This information can be accessed through the Port Authority website or through various third-party apps and websites. Unfortunately for would-be analysts, the Port Authority does not publish any historical bus location data. That is, the data published by the Port Authority cannot be used to answer questions about historical service delivery patterns.

In particular, we wanted to know if buses bunch, or cluster. We’ve observed, anecdotally, that the wait between bus arrivals can sometimes be much longer than scheduled, and that after a long delay, the buses often show up in pairs or even triples.

Building a data warehouse

To answer this question, we obtained an API key from the Port Authority (which allows Port Authority data to be retrieved in XML or JSON format) and built a data warehouse to capture and record the real-time bus location information. We capture the location of all buses on the 61A/61B/61C/61D routes once every sixty seconds. Additionally, to investigate if service levels are impacted by weather, we also capture the concurrent weather conditions (via WeatherUnderground’s developer API).

The data is retrieved from the Port Authority in XML format. We use Microsoft SQL Server Integration Services (SSIS) to extract, transform, and load this data into a data warehouse that captures historical bus location information. In addition to the vehicle location information, we also load in other dimensions useful for analysis, including the routes and the patterns (sequences of stops and waypoints) that constitute a route. For those connected to the CMU network, you can access our database by following these instructions.
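To make the extract-and-load step concrete, here is a minimal sketch that parses one XML snapshot and appends it to a table. It swaps in Python and an in-memory SQLite database for the SSIS and SQL Server setup described above, and the XML element names (`vehicle`, `vid`, `rt`, `lat`, `lon`, `tmstmp`), the table name, and the sample values are all assumptions made for illustration, not the Port Authority's documented schema.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical response; element names and values are assumptions for this sketch
SAMPLE_XML = """
<response>
  <vehicle><vid>5501</vid><rt>61C</rt><lat>40.4443</lat><lon>-79.9455</lon><tmstmp>20151001 17:05</tmstmp></vehicle>
  <vehicle><vid>5502</vid><rt>61A</rt><lat>40.4410</lat><lon>-79.9959</lon><tmstmp>20151001 17:05</tmstmp></vehicle>
</response>
"""

def load_snapshot(conn, xml_text):
    """Parse one snapshot and append one row per vehicle; returns the row count."""
    conn.execute(
        """CREATE TABLE IF NOT EXISTS vehicle_location
           (vehicle_id TEXT, route TEXT, lat REAL, lon REAL, observed_at TEXT)"""
    )
    root = ET.fromstring(xml_text.strip())
    rows = [
        (v.findtext("vid"), v.findtext("rt"), float(v.findtext("lat")),
         float(v.findtext("lon")), v.findtext("tmstmp"))
        for v in root.iter("vehicle")
    ]
    conn.executemany("INSERT INTO vehicle_location VALUES (?, ?, ?, ?, ?)", rows)
    conn.commit()
    return len(rows)

conn = sqlite3.connect(":memory:")
loaded = load_snapshot(conn, SAMPLE_XML)
```

Calling `load_snapshot` once a minute with a fresh response would build up the kind of location history that the real-time feed alone does not keep.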

How bad is the bunching?

Our data substantiates our anecdotal observations: buses often end up travelling in bunches of two or more. The graph below shows the progress of buses along their routes from their downtown departure (at the bottom of the graph) to their arrival at Hamburg Hall (approximately 18,000 along the path from downtown on the 61 bus routes). Each ascending line represents one vehicle. The horizontal distance between lines (at any fixed distance) shows the wait time between buses at that location. When lines are close together or overlapping, they represent buses that are clustered together (that is, bunched).


As a result of bunching, the time between bus arrivals varies greatly. The box and whiskers plot below shows wait times by hour of the day (at the outbound bus stop located in front of Hamburg Hall). Wait times are shortest (and have the least variance) just before rush hour (from 7:00 to 8:00 am, and from 4:00 pm to 5:00 pm), and on weekends and holidays. Wait times exhibit the greatest variance during the 6:00 – 7:00 pm hour.
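The wait times discussed above are the gaps between consecutive arrivals at a stop. As a rough sketch of that calculation, the snippet below computes the gaps from a list of arrival times and counts how many fall under a cutoff. The arrival times are invented, and the two-minute cutoff for calling two buses "bunched" is an assumption for illustration, not a threshold taken from our analysis.

```python
def headways(arrival_minutes):
    """Gaps in minutes between consecutive arrivals at one stop."""
    times = sorted(arrival_minutes)
    return [later - earlier for earlier, later in zip(times, times[1:])]

def bunched_gaps(arrival_minutes, threshold=2):
    """Count gaps at or below the threshold (an assumed cutoff for 'bunched')."""
    return sum(1 for gap in headways(arrival_minutes) if gap <= threshold)

# Hypothetical arrivals at one stop, in minutes after 5:00 pm
arrivals = [0, 14, 15, 16, 35, 36, 50]
gaps = headways(arrivals)          # [14, 1, 1, 19, 1, 14]
bunches = bunched_gaps(arrivals)   # 3
```

In this made-up hour, three buses arrive within two minutes of each other and are followed by a 19-minute gap, which is the pattern riders experience as bunching.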


Problems and solutions

Bunching is a widely observed and well-documented transportation phenomenon (see this great visual explanation). Bunching starts when a leading bus is delayed (for example, by a rush-hour crowd or by loading a bike). The delay gives more riders than average time to gather at subsequent stops, and boarding them slows the leading bus further. If the trailing bus does not experience the same delay, it finds fewer riders than average, spends less time at each stop, and catches up. The phenomenon continues until buses end up operating in pairs or groups of three.

Creating more slack in the system reduces the frequency and severity of bunching, but requires more buses, operators, or longer ride times. Real-time GPS tracking opens a window of opportunity for reducing bunching through better coordination. If a trailing bus is notified of a delay encountered by the preceding bus, it could reduce its travel speed to avoid having fewer-than-average riders at subsequent stops. Unfortunately, such remedies only function when a bus operator has some discretion in travel speed, which is seldom possible in Pittsburgh’s historic, narrow streets.


Mark Egge is a data analyst with a background in healthcare operations and entrepreneurship. He balances his work in GIS, data mining, and health information exchange with an abiding love of playing outside and exploring the natural world.

Ranjana Krishnamoorthy is a graduate student in the Master of Information Systems Management program at Carnegie Mellon University. She loves working with data and is passionate about exploring how technology can be used to improve a business, its management and processes.

Bhavna Prasad recently completed her Masters in Information Systems Management from the Heinz College at Carnegie Mellon University. She is passionate about technology and has a strong penchant to mold raw data into key business drivers for product decisions.

Enbo Zhang is a Public Policy & Management student and graduate teaching assistant at Carnegie Mellon’s Heinz College.

Rohita Kamath is a Summer MISM graduate student in Heinz College. She previously worked with Deloitte Consulting for 4 years, in its SAP practice. She loves working on technology and the idea of using data and technology to improve how companies are managed.

Emojis of Pittsburgh

by Dan Tasse and Jennifer Chou

What do people in Squirrel Hill talk about?

Or, more interestingly, what do people in Squirrel Hill talk about that people in other neighborhoods don’t? What is it that makes Squirrel Hill Squirrel Hill? That’s the question we set out to answer with this project.


Most frequently tweeted words in each Pittsburgh neighborhood

How it works

We gathered all tweets geotagged in Pittsburgh over about a year, from December 2013 to January 2015. We sorted them by neighborhood (using boundaries provided by the WPRDC) and used a modified TF-IDF algorithm to figure out what words were specific to each neighborhood. This algorithm counts the frequency of a word in a given neighborhood, and then adjusts the word’s final score based on how many other neighborhoods also use that word.

For example, “Steelers” is used a lot in Squirrel Hill, but it’s also used in many other neighborhoods, so it has a pretty low score. “Tunnel”, however, is quite popular in Squirrel Hill (mostly due to people grousing about tunnel traffic), but not elsewhere. Similarly, “10a” is a popular bus used to get around Pitt, but isn’t used elsewhere, so “10a” shows up a lot in Oakland.
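To illustrate the scoring idea, here is a minimal sketch of plain TF-IDF with each neighborhood treated as one document. It is not our modified algorithm, whose details are not given here; the tokenization (lowercase, split on whitespace) and the sample tweets are assumptions made up for the example.

```python
import math
from collections import Counter

def tfidf_by_neighborhood(tweets_by_hood):
    """tweets_by_hood maps a neighborhood name to a list of tweet strings."""
    counts = {
        hood: Counter(word for tweet in tweets for word in tweet.lower().split())
        for hood, tweets in tweets_by_hood.items()
    }
    n_hoods = len(counts)
    # Number of neighborhoods in which each word appears at least once
    hoods_using = Counter(word for c in counts.values() for word in c)
    scores = {}
    for hood, c in counts.items():
        total = sum(c.values())
        # Term frequency in this neighborhood times log inverse neighborhood frequency
        scores[hood] = {
            word: (n / total) * math.log(n_hoods / hoods_using[word])
            for word, n in c.items()
        }
    return scores

# Made-up tweets, for illustration only
sample = {
    "Squirrel Hill": ["steelers game tonight", "tunnel traffic again", "stuck at the tunnel"],
    "Oakland": ["steelers win", "waiting for the 10a"],
    "Downtown": ["steelers parade", "lunch downtown"],
}
scores = tfidf_by_neighborhood(sample)
top_squirrel_hill = max(scores["Squirrel Hill"], key=scores["Squirrel Hill"].get)
```

Because “steelers” appears in every sample neighborhood, its inverse frequency term is log(1) = 0 and it drops out, while “tunnel” comes out on top in Squirrel Hill.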


Tweets referencing the “10a” bus

An emoji is worth…

These words just represent what people are talking about on Twitter. What are people feeling? To answer that question, we looked to the emojis people are tweeting. Emojis are an interesting new form of communication: one character can often say more than a word, so they can tell us about where people like to do certain things, or maybe even how people feel.


Top emojis in each ‘hood

For example, we can see that the zoo is up in Highland Park, and that people like watching baseball and football and drinking beer on the North Shore. Obvious enough. But did you know how popular the swimming pool in Oakland is, or the Christmas tree lighting downtown?

Future work, and so what?

There’s still work to do, of course. One major challenge is algorithmic: How do we combine these posts from multiple people into a representative aggregate? A lot of these words/emojis are boosted by one person tweeting them multiple times. We don’t want one person to dominate the neighborhood’s tweets, but we do want an avid basketball fan to count more than someone who just tweeted about basketball once.
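One possible way to strike that balance, offered only as a sketch and not something we have implemented, is to dampen each person's contribution with a logarithm, so that repeated use counts for more than a single use but far less than its raw total. The `1 + log(n)` weighting and the sample posts below are assumptions chosen for illustration.

```python
import math
from collections import Counter, defaultdict

def dampened_counts(posts):
    """posts is a list of (user, word) pairs from one neighborhood."""
    per_user = defaultdict(Counter)
    for user, word in posts:
        per_user[user][word] += 1
    totals = defaultdict(float)
    for words in per_user.values():
        for word, n in words.items():
            # One use contributes 1.0; 100 uses contribute about 5.6, not 100
            totals[word] += 1 + math.log(n)
    return dict(totals)

# Made-up posts: one avid fan versus three people who each mention pizza once
posts = [("fan", "basketball")] * 100 + [("a", "pizza"), ("b", "pizza"), ("c", "pizza")]
weights = dampened_counts(posts)
```

Under this weighting the avid fan still outweighs any single casual poster, but no longer swamps the neighborhood by a factor of a hundred.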

We hope this is the first step towards useful neighborhood guides. Imagine if you were moving to Pittsburgh for the first time, and looking for the right area to live in. Knowing that Squirrel Hill South has a lot of basketball fans, or that the top words in Lawrenceville are trendy bars or music venues, could really help you get a feel for the city and its many unique neighborhoods.

Try it out!

(Be patient; it’s on a free server so it’ll be a little slow.) And send any feedback or ideas to

Dan Tasse is a PhD student in Human-Computer Interaction at CMU. He’s interested in how we can use social media posts to help people understand their cities and neighborhoods better.

Jennifer Chou is an undergraduate studying Computer Science at CMU.

Energy for all in Nigeria

by Madeleine Gleave

Nigeria’s energy poverty crisis

Like many developing countries, Nigeria is facing an energy poverty crisis. The International Energy Agency (IEA) estimates that nearly 1.3 billion people globally lack access to electricity, and about half of these people live in Africa. Energy poverty has crippling side effects; no electricity also means no access to safer and healthier electric cooking and heating, powered health centers and refrigerated medicines, light to study at night, or electricity to run a business. In Nigeria, the average level of access is only 53%.

Despite being rich in natural resources required to produce energy, such as oil and gas, Nigeria’s energy infrastructure is lacking. Many people live near power plants and transmission lines, but aren’t yet connected to the grid. Others are in very remote areas where off-grid solutions, such as solar panels, may help them generate their own electricity long before a power line reaches them.

To explore this problem, I created a StoryMap in ArcGIS that shows the highly disparate levels of electricity access, energy demand, and infrastructure across Nigeria.

Check out the full StoryMap here:

Identifying the best electricity access solution

As Nigeria and its development partners look for energy access expansion solutions, how can they choose the best intervention for the best region? Where should they target grid connections, grid expansion, or off-grid solutions?

Selecting the best approach from this set of solutions depends on the context of the specific geographic area, and is influenced by existing levels of access, proximity to existing lines and power plants, level of urban development, demographic characteristics, and income levels. I developed a composite energy access index, mapped using a kernel density heat map, to evaluate an area’s suitability for each type of access intervention. The higher the index score (the red areas on the heat map seen here), the more suited the area is to grid supply. The lower the index score (pale yellow), the more suitable for off-grid power. Mid-range scores (the orange and dark yellow areas) are good candidates for grid expansion.
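As a simplified illustration of how a composite score can be mapped to an intervention, the sketch below averages a few indicators and applies two cutoffs. It leaves out the kernel density mapping entirely, and the indicator names, the equal weights, and the 0.33 and 0.66 cutoffs are all assumptions for the example rather than the values used in the StoryMap.

```python
def composite_index(indicators, weights=None):
    """Weighted average of indicators that are already scaled to the range 0 to 1."""
    if weights is None:
        weights = {name: 1.0 for name in indicators}  # equal weights (assumption)
    total_weight = sum(weights[name] for name in indicators)
    return sum(indicators[name] * weights[name] for name in indicators) / total_weight

def recommend(score, low=0.33, high=0.66):
    """Map a score to an intervention; the cutoffs are illustrative only."""
    if score >= high:
        return "grid supply"
    if score >= low:
        return "grid expansion"
    return "off-grid"

# Hypothetical area close to existing lines, with moderate current access
area = {"access": 0.5, "grid_proximity": 0.9, "urbanization": 0.8, "income": 0.6}
score = composite_index(area)
suggestion = recommend(score)
```

In practice the choice of weights matters a great deal, since it determines which areas land in the middle band.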


Madeleine Gleave is a Public Policy and Management student in the Heinz College at CMU. She is particularly passionate about using data to improve planning, management, and evaluation in international development policy and programs.

Why did Pittsburgh survive the housing slump?

by Nick Kharas and Emily Sasse

The Stability of Pittsburgh’s Property Market

Pittsburgh is known to have one of the most stable property markets in the United States. The city has not had a housing recession. It is safe from housing bubbles for a few reasons:

  • Land Value Tax – Historically, Pittsburgh’s taxation policies encouraged productive land use and steadied its housing market. The city taxed the value of land at a higher rate, and the value of buildings and improvements at a lower rate. Productive investors could maximize their after-tax return on investment, while speculating on idle land was not lucrative. However, this split-rate tax was discontinued in 2001.
  • Available Space – Unlike larger cities, Pittsburgh is not constrained in building space or in growing outward.

It is important to note that there has been a drop in property value in recent times. Home values have fallen over the last year. However, this does not yet indicate that there is a bubble in the property market.

  • Owners of high-quality real estate want to hold on to their property and not sell. Additionally, lower interest rates since 2010 encourage owners to hold on to low mortgage rates.
  • According to Mr. Hanna of Howard Hanna Real Estate Services, first-time buyers are finding it hard to get a mortgage, and millennial buyers who want to stay flexible and not be in Pittsburgh forever will not want to buy real estate in the city.
  • Home values are down in some neighborhoods, but are rising in neighborhoods like Lawrenceville. This is primarily because of the thriving restaurants, shops and jobs in the east side of the city.

To validate these views, we analyzed Pittsburgh property sales over the last four years (January 2012 to November 2015) found on the WPRDC open data portal. We can see that a majority of the property parcels involved in transactions are owned by individuals rather than corporations. This can give us a good indication of the housing market in Pittsburgh.

We also decided to look at the changes in the median property values in Pittsburgh over the last four years. We selected the median instead of the mean, as there are a few transactions in 2012 and 2013 with extremely high property values, and these outliers are adding unwanted bias to the average property values. Thus, the median is a good indicator of the common trend in Pittsburgh housing.
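A small example shows why a handful of extreme sales makes the mean misleading. The prices below are invented for illustration and are not taken from the WPRDC data.

```python
from statistics import mean, median

# Hypothetical sale prices in dollars; the last one is an extreme outlier
prices = [85_000, 92_000, 110_000, 125_000, 140_000, 12_000_000]

average_price = mean(prices)    # 2,092,000: pulled far upward by one sale
median_price = median(prices)   # 117,500: close to what a typical home sold for
```

Five of the six made-up homes sold for under $150,000, yet the mean is over $2 million, while the median stays near the typical sale.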


The data does seem to support the points mentioned already. In 2013, the median property value rose by only 6%, while in 2014 we saw a negligible fall of 0.31%. This indicates the stability of the city’s property market. The median property value did fall by about 10% in 2015. However, that alone is not sufficient evidence to conclude that there is a real estate bubble in the city.

In parallel, we also decided to check whether federal interest rates were having any impact on Pittsburgh’s real estate market. We found no statistically significant correlation between the two. We calculated the figures below to reach our result:

Correlation coefficient: -0.16
Significance test p-value: 0.25

A p-value of 0.25 means that, if property values and federal interest rates were in fact uncorrelated, a correlation at least this strong would still appear by chance about 25% of the time. For the correlation to be statistically significant, the p-value would need to be less than 0.05.
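For readers who want to try this kind of check themselves, here is a sketch that computes a Pearson correlation coefficient and estimates a p-value with a permutation test. The permutation test is a stand-in chosen because it needs only the standard library; it is not necessarily the significance test we ran, and the two series below are invented rather than our actual data.

```python
import math
import random

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)

def permutation_p_value(xs, ys, trials=2000, seed=0):
    """Share of random shuffles whose |r| is at least as large as the observed |r|."""
    rng = random.Random(seed)  # fixed seed so the estimate is repeatable
    observed = abs(pearson_r(xs, ys))
    shuffled = list(ys)
    hits = 0
    for _ in range(trials):
        rng.shuffle(shuffled)
        if abs(pearson_r(xs, shuffled)) >= observed:
            hits += 1
    return hits / trials

# Hypothetical monthly series, not the article's data
rates = [3.4, 3.5, 3.6, 3.9, 4.1, 4.4, 4.3, 4.2, 4.0, 3.9]
values = [101, 99, 104, 98, 103, 97, 102, 100, 105, 96]
r = pearson_r(rates, values)
p = permutation_p_value(rates, values)
```

Shuffling one series breaks any real link between the two, so the share of shuffles that match or beat the observed correlation estimates how often such a value arises by chance alone.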

Next Steps

This project focuses on property sales in Pittsburgh. Future work involves continuing to monitor property sales and relevant indicators in the county to determine whether current and historical trends continue. This will support informed policy decisions.

In addition to Pittsburgh, it would be interesting to extend this research to Allegheny County and surrounding counties, or the nation overall. This type of analysis would provide key insight into the Pittsburgh real estate market. It would reveal the performance of the Pittsburgh real estate market in comparison to real estate markets in similar cities across the United States.


Nick Kharas is pursuing a Masters in Information Systems Management at Carnegie Mellon University. He has a deep focus on emerging technologies in business intelligence (BI), advanced analytics and data science. Prior to this, he was a data warehousing professional at a Japanese multinational financial holding company. When not a data nerd, he enjoys travelling or just meeting new people. You can connect to Nick at

Emily Sasse is pursuing a Masters in Public Policy and Management: Data Analytics. During her time at Heinz, she has developed a keen interest in the study of business intelligence and data analytics. After graduation, she will join Accenture as a Digital Consultant in Boston, Massachusetts. Emily enjoys winter sports and exploring the east coast. You can connect to Emily at

The original project also had active contributions from Sridevi Yagati Venkateshdatta, Ranjani Padmanabhan and Jingwei Cao.