by Mark Egge, Ranjana Krishnamoorthy, Bhavna Prasad, Enbo Zhang, and Rohita Kamath
As bus riders, we wanted to know what trends in bus service levels can be learned from bus location data. Like many mass transit systems, the Port Authority of Allegheny County (the entity that operates the bus system serving Pittsburgh and the surrounding area) publishes real-time information about the locations of its vehicles in service. This information can be accessed through the Port Authority website (PortAuthority.org), or through various third-party apps and websites (e.g. pitlivebus.com). Unfortunately for would-be analysts, the Port Authority does not publish any historical bus location data. That is, the data published by the Port Authority cannot be used to answer questions about historical service delivery patterns.
In particular, we wanted to know if buses bunch, or cluster. We’ve observed, anecdotally, that the wait between bus arrivals can sometimes be much longer than scheduled, and that after a long delay, the buses often show up in pairs or even triples.
Building a data warehouse
To answer this question, we obtained an API key from the Port Authority (which allows Port Authority data to be retrieved in XML or JSON format) and built a data warehouse to capture and record the real-time bus location information. We capture the location of all buses on the 61A/61B/61C/61D routes once every sixty seconds. Additionally, to investigate if service levels are impacted by weather, we also capture the concurrent weather conditions (via WeatherUnderground’s developer API).
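The once-a-minute capture described above can be sketched as a small polling loop. This is a minimal illustration, not our production job: `fetch` and `store` are hypothetical callables standing in for the API request and the warehouse write, and only the route list and the sixty-second interval come from the text.

```python
import time

# Routes named in the article; everything else here is an illustrative assumption.
ROUTES = ("61A", "61B", "61C", "61D")


def poll_locations(fetch, store, interval_s=60.0, max_polls=None,
                   clock=time.monotonic, sleep=time.sleep):
    """Call fetch(ROUTES) every interval_s seconds and pass the result to store.

    fetch and store are hypothetical placeholders for the API call and the
    database write. Scheduling against a running deadline (next_due) keeps
    slow requests from pushing every later poll back.
    """
    next_due = clock()
    polls = 0
    while max_polls is None or polls < max_polls:
        try:
            store(fetch(ROUTES))
        except Exception as exc:
            # One failed request should not stop the capture.
            print(f"poll failed, will retry next cycle: {exc}")
        polls += 1
        next_due += interval_s
        remaining = next_due - clock()
        if remaining > 0:
            sleep(remaining)
    return polls
```

Passing `clock` and `sleep` as parameters is a design choice that makes the loop testable without waiting a real minute between polls.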
The data is retrieved from the Port Authority in XML format. We use Microsoft SQL Server Integration Services (SSIS) to extract, transform, and load this data into a data warehouse that captures historical bus location information. In addition to the vehicle location information, we also load other dimensions useful for analysis, including the routes and the patterns (sequences of stops and waypoints) that constitute a route. Readers connected to the CMU network can access our database by following these instructions.
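For readers without SSIS, the extract-and-load step can be approximated in a few lines of Python. The sketch below swaps in the standard library's XML parser and SQLite for SSIS and SQL Server. The element names (`vehicle`, `id`, `route`, `time`, `lat`, `lon`, `dist`), the table layout, and the sample values are all made up for illustration and are not the Port Authority's actual schema or data.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Made-up payload: element names and values are assumptions, not the real feed.
SAMPLE_XML = """<vehicles>
  <vehicle><id>5501</id><route>61C</route><time>2015-04-01T17:05:00</time>
    <lat>40.4443</lat><lon>-79.9436</lon><dist>18012</dist></vehicle>
  <vehicle><id>5517</id><route>61A</route><time>2015-04-01T17:05:00</time>
    <lat>40.4406</lat><lon>-79.9959</lon><dist>1250</dist></vehicle>
</vehicles>"""


def parse_vehicles(xml_text):
    """Extract one tuple per <vehicle> element (hypothetical schema)."""
    rows = []
    for v in ET.fromstring(xml_text).iter("vehicle"):
        rows.append((
            v.findtext("id"),
            v.findtext("route"),
            v.findtext("time"),
            float(v.findtext("lat")),
            float(v.findtext("lon")),
            float(v.findtext("dist")),  # distance travelled along the pattern
        ))
    return rows


def load_vehicles(conn, rows):
    """Append observations; the primary key makes reloading a snapshot harmless."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS vehicle_location ("
        "vehicle_id TEXT, route TEXT, observed_at TEXT, "
        "lat REAL, lon REAL, dist REAL, "
        "PRIMARY KEY (vehicle_id, observed_at))"
    )
    conn.executemany(
        "INSERT OR IGNORE INTO vehicle_location VALUES (?, ?, ?, ?, ?, ?)", rows
    )
    conn.commit()


conn = sqlite3.connect(":memory:")
load_vehicles(conn, parse_vehicles(SAMPLE_XML))
```

Keying each row on vehicle and observation time means a snapshot that is fetched twice is stored only once, which matters when the same location report appears in consecutive polls.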
How bad is the bunching?
Our data substantiates our anecdotal observations. Buses often do end up travelling in bunches of two or more. The graph below shows the progress of buses along their routes from their downtown departure (at the bottom of the graph) to their arrival at Hamburg Hall (approximately 18,000 along the path from downtown on 61 bus routes). Each ascending line represents one vehicle. The horizontal distance between lines (at any fixed distance) shows the wait time between buses at that location. When lines are close together or overlapping, they represent buses that are clustered together (that is, bunched).
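Reading wait times off such a graph amounts to finding when each vehicle's line crosses a fixed distance and differencing those times. A minimal sketch of that calculation follows. The track data, the vehicle labels, and the 120-second bunching threshold are illustrative assumptions, and distances are in whatever units the path distance is recorded in.

```python
def crossing_time(track, distance):
    """Interpolate when a bus reached `distance` along the path.

    track is a list of (time_s, distance) samples sorted by time, with
    distance non-decreasing. Returns None if the bus never got that far.
    """
    for (t0, d0), (t1, d1) in zip(track, track[1:]):
        if d1 > d0 and d0 <= distance <= d1:
            return t0 + (t1 - t0) * (distance - d0) / (d1 - d0)
    return None


def headways(tracks, distance):
    """Seconds between consecutive buses passing `distance`."""
    times = sorted(
        t for t in (crossing_time(tr, distance) for tr in tracks.values())
        if t is not None
    )
    return [later - earlier for earlier, later in zip(times, times[1:])]


# Made-up tracks: three buses leave at 0 s, 300 s, and 360 s at equal speed.
tracks = {
    "bus_a": [(0, 0), (600, 18000)],
    "bus_b": [(300, 0), (900, 18000)],
    "bus_c": [(360, 0), (960, 18000)],
}
gaps = headways(tracks, 9000)
bunched = [gap < 120 for gap in gaps]  # 120 s is an arbitrary threshold
```

With these inputs the second and third buses pass the halfway point one minute apart, so the second gap is flagged as bunched.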
As a result of bunching, the time between bus arrivals varies greatly. The box-and-whisker plot below shows wait times by hour of the day (at the outbound bus stop located in front of Hamburg Hall). Wait times are shortest (and have the least variance) just before rush hour (from 7:00 to 8:00 am, and from 4:00 to 5:00 pm), and on weekends and holidays. Wait times exhibit the greatest variance during the 6:00 to 7:00 pm hour.
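The numbers behind a box-and-whisker plot like this one are per-hour quartiles of the gaps between consecutive arrivals. The sketch below shows one way to compute them with the Python standard library. It assumes the arrivals all come from a single service day at a single stop, and the arrival times are invented for illustration.

```python
from collections import defaultdict
from datetime import datetime
from statistics import quantiles


def hourly_wait_stats(arrivals):
    """Quartiles of the wait between consecutive arrivals, keyed by hour.

    arrivals is a time-sorted list of datetimes for one stop on one day.
    Each wait is attributed to the hour in which the later bus arrived.
    """
    by_hour = defaultdict(list)
    for earlier, later in zip(arrivals, arrivals[1:]):
        by_hour[later.hour].append((later - earlier).total_seconds() / 60)

    stats = {}
    for hour, waits in sorted(by_hour.items()):
        if len(waits) < 2:
            continue  # quantiles() needs at least two data points
        q1, median, q3 = quantiles(waits, n=4, method="inclusive")
        stats[hour] = {"q1": q1, "median": median, "q3": q3, "iqr": q3 - q1}
    return stats


# Invented arrivals: even spacing in the 5 pm hour, bunching in the 6 pm hour.
day = (2015, 4, 1)
arrivals = [
    datetime(*day, 17, 0), datetime(*day, 17, 10), datetime(*day, 17, 20),
    datetime(*day, 17, 30), datetime(*day, 17, 40),
    datetime(*day, 18, 5), datetime(*day, 18, 7), datetime(*day, 18, 30),
]
stats = hourly_wait_stats(arrivals)
```

The interquartile range (`iqr`) is the height of the box for each hour, so an hour with bunched arrivals shows up as a taller box even when its median wait looks unremarkable.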
Problems and solutions
Bunching is a widely observed and well-documented transportation phenomenon (see this great visual explanation). Bunching begins when a leading bus is delayed (for example, by a rush-hour crowd or by loading a bike). Because more time has passed since the previous bus, more riders than average are waiting at subsequent stops, which delays the leading bus further. If the trailing bus does not experience the same delay, it finds fewer riders than average and gains on the leader. The cycle continues until buses end up operating in pairs or groups of three.
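This feedback loop is easy to reproduce in a toy model. The simulation below is only a sketch under simplifying assumptions: travel time between stops is constant, time spent at a stop is proportional to the gap since the previous bus arrived there, and buses cannot pass one another. Every parameter value is illustrative and none is fitted to Port Authority data.

```python
def simulate(num_buses=3, num_stops=12, headway=10.0, travel=3.0,
             load_factor=0.1, delayed_bus=1, delay_stop=2, delay=2.0):
    """Return arrive[bus][stop], in minutes, for a toy single-route model.

    load_factor is the dwell time per minute of gap to the bus ahead,
    standing in for riders accumulating at the stop. One bus receives a
    one-off extra delay at one stop; everything else is deterministic.
    """
    arrive = [[0.0] * num_stops for _ in range(num_buses)]
    for bus in range(num_buses):
        t = bus * headway  # scheduled departure from the first stop
        for stop in range(num_stops):
            if bus > 0:
                # A bus cannot pass the one ahead of it.
                t = max(t, arrive[bus - 1][stop])
                gap = t - arrive[bus - 1][stop]
            else:
                gap = headway  # the first bus always meets a normal crowd
            arrive[bus][stop] = t
            dwell = load_factor * gap
            if bus == delayed_bus and stop == delay_stop:
                dwell += delay
            t += dwell + travel
    return arrive


arrive = simulate()
for stop in (0, 3, 7, 11):
    lead_gap = arrive[1][stop] - arrive[0][stop]
    trail_gap = arrive[2][stop] - arrive[1][stop]
    print(f"stop {stop:2d}: gap ahead of delayed bus {lead_gap:5.2f} min, "
          f"gap behind it {trail_gap:5.2f} min")
```

In this model a single two-minute delay is enough to start the process: at each later stop the gap in front of the delayed bus widens while the gap behind it shrinks, which is the pairing pattern described above.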
Creating more slack in the system reduces the frequency and severity of bunching, but requires more buses, more operators, or longer ride times. Real-time GPS tracking opens a window of opportunity for reducing bunching through better coordination. If a trailing bus is notified of a delay encountered by the preceding bus, it could reduce its travel speed to avoid arriving early and finding fewer-than-average riders at subsequent stops. Unfortunately, such remedies only work when a bus operator has some discretion over travel speed, which is seldom possible in Pittsburgh’s historic, narrow streets.
Mark Egge is a data analyst with a background in healthcare operations and entrepreneurship. He balances his work in GIS, data mining, and health information exchange with an abiding love of playing outside and exploring the natural world.
Ranjana Krishnamoorthy is a graduate student in the Master of Information Systems Management program at Carnegie Mellon University. She loves working with data and is passionate about exploring how technology can be used to improve a business, its management, and its processes.
Bhavna Prasad recently completed her Masters in Information Systems Management from the Heinz College at Carnegie Mellon University. She is passionate about technology and has a strong penchant to mold raw data into key business drivers for product decisions.
Enbo Zhang is a Public Policy & Management student and graduate teaching assistant at Carnegie Mellon’s Heinz College.
Rohita Kamath is a Summer MISM graduate student at Heinz College. She previously worked at Deloitte Consulting for four years and has worked in SAP practice. She loves working with technology and the idea of using data and technology to improve how companies are managed.