Students for Urban Data Systems

at Carnegie Mellon University

Previous Projects

Predicting Building Fire Risk

With Pittsburgh Bureau of Fire, Metro21 Smart Cities Institute, and Pittsburgh Department of Innovation and Performance

SUDS members: Palak Narang (team lead), Jessica Lee, Jeffrey Chen, Fangyan Chen, Nathan Kuo, Amaya Taylor, Xingyuan Ying

The Fire Risk Analysis project uses fire incident and property data to develop predictive models of structure fire risk, in partnership with the Pittsburgh Bureau of Fire (PBF) and the Department of Innovation and Performance (I&P). PBF conducts regular fire inspections of commercial properties, as stipulated by the municipal fire code. This project helps PBF prioritize those inspections: risk scores from machine learning models are presented in a data dashboard and an interactive map, so that inspectors can target the properties at greatest risk of fire.

Map of high-risk commercial properties visualized on the “Burgh’s Eye View” interactive map

STATUS: Our team has developed and deployed a machine learning model to predict structure fire risk in properties around Pittsburgh. We trained the model on historical fire incident data from PBF, property inspection data from the Department of Permits, Licenses, and Inspections, and property assessment data from the Allegheny County property assessment office. We evaluated multiple model types (e.g. random forests, SVM, XGBoost) and tuned them to find the best-performing model, which generates a risk score for each commercial property in the city based on the predicted probability of a fire occurring at that address. The risk scores are displayed on a data dashboard and an interactive map developed by the Department of Innovation and Performance. Fire inspectors and fire chiefs use these tools to inform the Bureau of Fire’s prioritization of property fire inspections, so they can inspect the properties at greatest risk of fire.
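As a rough illustration of how per-property risk scores could feed an inspection priority list, the sketch below sorts properties by predicted fire probability and assigns each a risk tier. The property identifiers, probabilities, and tier cutoffs are all hypothetical; the thresholds used by the deployed model are not stated here.

```python
from typing import NamedTuple


class ScoredProperty(NamedTuple):
    property_id: str         # hypothetical identifier, e.g. a parcel ID
    fire_probability: float  # model output in [0, 1]


# Hypothetical cutoffs chosen for illustration only.
HIGH_CUTOFF = 0.7
MEDIUM_CUTOFF = 0.3


def risk_tier(probability: float) -> str:
    """Map a predicted probability to a coarse risk tier."""
    if probability >= HIGH_CUTOFF:
        return "high"
    if probability >= MEDIUM_CUTOFF:
        return "medium"
    return "low"


def prioritize(properties):
    """Return (id, probability, tier) tuples, highest risk first."""
    ranked = sorted(properties, key=lambda p: p.fire_probability, reverse=True)
    return [
        (p.property_id, p.fire_probability, risk_tier(p.fire_probability))
        for p in ranked
    ]


# Made-up scores for three properties.
example = [
    ScoredProperty("A", 0.12),
    ScoredProperty("B", 0.81),
    ScoredProperty("C", 0.45),
]
inspection_queue = prioritize(example)
```

Sorting on the raw probability rather than the tier keeps the ordering meaningful within a tier, which matters when inspectors can only visit a handful of properties at a time.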

The model is currently deployed on their servers and retrains on a regular basis as new data is generated. In the months since the first model was trained and risk scores generated, 14 of the 45 (31%) building-related fire incidents (not all of which are “working fires”) occurred in one of the medium or high-risk properties, significantly higher than the 0.20% base rate for fire incidents in the city. We are currently monitoring the stability of the model’s performance over time and conducting experiments using neural network models to better capture some of the temporal dynamics of incident events around the city. We have found that the model has remained quite stable over the 6 months since its official deployment in February 2018: the standard deviations of our key performance metrics were less than 0.01 across the 16 model iterations since February.
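The two figures above can be reproduced with simple arithmetic. The sketch below computes the share of incidents that fell in flagged properties and applies a stability check of the kind described, using the sample standard deviation. The metric values are hypothetical placeholders, not the project's results, and using the sample rather than the population standard deviation is an assumption.

```python
from statistics import stdev

# Counts reported above: 14 of 45 building-related incidents occurred
# in properties the model had rated medium or high risk.
incidents_in_flagged = 14
total_incidents = 45
share_flagged = incidents_in_flagged / total_incidents  # roughly 0.31


def is_stable(metric_values, tolerance=0.01):
    """True if the sample standard deviation across retraining
    iterations is below the tolerance."""
    return stdev(metric_values) < tolerance


# Hypothetical metric values from successive retraining runs.
metric_by_iteration = [0.801, 0.805, 0.798, 0.803, 0.800, 0.804]
stable = is_stable(metric_by_iteration)
```

In a monitoring setup, a check like `is_stable` could run after each retraining and flag the model for review when a metric starts to drift.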

Fire Risk Analysis Model Image

Our future work includes incorporating new data sets, experimenting with additional model types (e.g. recurrent neural networks, reinforcement learning, and/or “active learning”), and expanding this approach to predict fire risk in residential properties at the census block level. Future work may include making the model more easily usable by other cities’ fire agencies.

A technical report from the first phase of the project can be found here.

Open-source code for this project can be found here.

Singh Walia, B., Hu, Q., Chen, J., Chen, F., Lee, J., Kuo, N., Narang, P., Batts, J., Arnold, G., Madaio, M. (2018, July). A Dynamic Pipeline for Spatio-Temporal Fire Risk Prediction. In Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining (pp. 764-773). ACM. [pre-print pdf]

 

Understanding the Education Innovation Landscape in Pittsburgh

With Remake Learning

SUDS members: Carolina Arroyo (team lead), Allyson Fierro, Zach Goldstein, Sheena Jain, Eric Shapiro, Amit Sharma, Stephanie Truong, Angela Wang

Non-profits often use insights from data to maximize their impact. However, in many cases, organizations don’t have the ability to do this on their own. That’s where SUDS comes in.

In 2017, a SUDS team partnered with the education non-profit Remake Learning to help them use data to improve educational opportunities in the Pittsburgh metropolitan area. 

 

Remake Learning is a network of education organizations in southwestern Pennsylvania focused on STEAM (Science, Technology, Engineering, Arts, and Math) and “maker” education. Remake Learning supports these organizations by funding “makerspaces” – which use cutting-edge technologies for hands-on, project-based learning in fields like engineering and design – and other innovative educational initiatives in the region.

The SUDS team used geographic and financial data to derive valuable insights and develop data visualizations about the education innovation landscape of the Pittsburgh metro area. The SUDS team also analyzed data from Remake Learning’s events to help decision-makers decide where to best invest funds to maximize engagement and better serve underserved communities in the region.

Below is one example of an interactive data visualization developed by the SUDS team to visualize the makerspaces in the Pittsburgh area.

RL1

Remake Learning plans to use the interactive map SUDS built to help interested learners all across Pittsburgh find and make use of makerspaces near them. Leaders of existing makerspaces and founders of new ones can also use it to identify areas in Pittsburgh without easy access to spaces for informal STEAM learning.

This work aligns with SUDS’s mission to “do good with data”, and we’re thrilled that our student members have an opportunity to exercise their data science skills while giving back to the Pittsburgh community. You can learn more about Remake Learning’s work at remakelearning.org, and you can find other SUDS projects on this page!

 

 

Improving Fresh Food Access in Pittsburgh

With Just Harvest

SUDS members: David Mitre Becerril (team lead), Jessica Young, Sukrit Ajmani, Ashish Arora, Angela Liu, Julie Kim, Bai Xue, Jennifer Yang

Just Harvest is a non-profit organization working to end hunger by expanding access to fresh, healthy food. The Fresh Access program helps enable shoppers to use their food stamps – as well as credit and debit cards – to buy fresh, nutritious, and locally-grown food.

For this project, Just Harvest wants to better understand the population it serves and the farmers’ markets currently supported through Fresh Access. To do this, it wants an interactive map to visualize and understand the transactions that occur at the ~300 markets it works with, so it can better target its support and plan for future market partnerships. Our team of SUDS students is designing and developing this interactive visualization of the transaction and market data, along with a pipeline so the map can be updated as new data are collected. This also involves finding and integrating publicly available civic data on the City of Pittsburgh area, such as census data or SNAP data. As part of this work, the team is also conducting additional statistical analyses to understand patterns of market usage among Fresh Access users.

Some questions explored in this project include:

  1. Do SNAP sales decrease at the end of the month?
  2. How much does weather affect market turnout?
  3. Which clusters of markets are attended by the same individuals?
  4. How does vendor retention change over time?
  5. Are there “food desert” markets?
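The first question can be approached with a simple comparison of averages. The sketch below splits transactions into those in the last days of a month and those earlier in the month, then compares mean sale amounts. The 7-day window, the `(date, amount)` record layout, and the sample figures are assumptions for illustration; the actual Fresh Access data may be structured differently.

```python
import calendar
from datetime import date
from statistics import mean


def is_end_of_month(day: date, window: int = 7) -> bool:
    """True if `day` falls within the last `window` days of its month."""
    days_in_month = calendar.monthrange(day.year, day.month)[1]
    return day.day > days_in_month - window


def compare_snap_sales(transactions, window: int = 7):
    """Return (mean end-of-month amount, mean amount for the rest of the month).

    `transactions` is an iterable of (date, amount) pairs. Both groups
    must be non-empty, otherwise `mean` raises StatisticsError.
    """
    end_amounts, other_amounts = [], []
    for day, amount in transactions:
        if is_end_of_month(day, window):
            end_amounts.append(amount)
        else:
            other_amounts.append(amount)
    return mean(end_amounts), mean(other_amounts)


# Hypothetical SNAP transactions at one market in June 2017.
sample = [
    (date(2017, 6, 3), 40.0),
    (date(2017, 6, 10), 36.0),
    (date(2017, 6, 24), 22.0),
    (date(2017, 6, 27), 18.0),
]
end_mean, other_mean = compare_snap_sales(sample)
```

A difference in means is only a first look; a fuller analysis would control for market, season, and day of week before concluding that sales drop late in the month.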

 

 

 

Visualizing Water Treatment Data

With Allegheny Land Trust

SUDS members: Linda Kuster (team lead), Zachary Goldstein, Pengji Zhang, Shlok Goyal

Allegheny Land Trust (ALT) is a land conservation and stewardship organization which serves the Pittsburgh region. The mission of ALT is to “serve as the lead land trust conserving and stewarding lands that support the scenic, recreational and environmental well-being of communities in Allegheny County and its environs.”[1] As of 2015, ALT owned 572 acres of land, including Wingfield Pines, a site in Upper St. Clair and South Fayette in the southwestern corner of Allegheny County. The site was affected by Abandoned Mine Drainage until ALT worked with an environmental engineer to design a natural filtration system. Abandoned Mine Drainage water flows through a series of ponds which remove oxides before the water reaches Chartiers Creek. Wingfield Pines also offers recreational opportunities for the public.

Wingfield Pines Pond Filtration System

“The treatment system brings groundwater from close by and directs it into pond 1 (A). The water is moved slowly into the pie shaped settling ponds entering pond 2 and exiting at pond 5 (B). As the water leaves pond 5, most of the iron has been extracted, but the wetlands (C) ensure any remaining iron left will fall out with the unique design and native plants that maximize oxygen mixture.”[2]

In April 2017, ALT approached SUDS for help analyzing data collected at Wingfield Pines and designing data visualizations to demonstrate ALT’s impact there. The SUDS team members on this project have a diverse set of interests and backgrounds, including statistics, coding, and design. The team set out to complete the following:

  1. Clean, compile, and analyze data collected by a multitude of sources since 2011.
  2. Design data visualizations which convey the importance of ALT and Wingfield Pines for the public and donors.
  3. Embed data visualizations in ALT’s WordPress website.
  4. Ensure ALT has the ability to independently update visualizations with new data in the future.

The team presented their visualizations to ALT in early August. The project was then finalized several weeks later following revisions and testing. ALT hopes to increase awareness and donations for Wingfield Pines as a result of this collaboration.


[1] Allegheny Land Trust. 2013. “Strategic Plan.”  http://alleghenylandtrust.org/wp-content/uploads/2016/04/ALT_Strategic_Plan_2014-18.pdf.

[2] Allegheny Land Trust. “Wingfield Pines: Graphs and Analysis.” http://alt.pl1548.pairlitesite.com/properties/wingfield/science/chemistry/analysis.html.

 

 

Boys and Girls Club of Western Pennsylvania

The Boys and Girls Club of Western Pennsylvania (BGCA) is a non-profit organization that currently operates eight clubs in Allegheny County. Because of demographic shifts in the city of Pittsburgh over the last few years, some of these clubs are no longer located in areas where they can best serve underprivileged youth populations.

The BGCA is attempting to identify other areas in the city and county that would be suitable for developing new clubs or merging existing club locations. This requires an analysis of census and income data that they are not able to perform on their own.

Skillsets: Data analysis, Mapping, Spatial Analysis

 

Criminal Justice Reform with the Alliance for Police Accountability

Project leadership: Lizzie Silver and Lauren Renaud

We partnered with the Alliance for Police Accountability (APA) on a project that wove together personal experiences from community interviews with data analysis of larger trends.

This project involved a thorough exploration of the overlapping and confusing boundaries and jurisdictions of police forces in the County. Read the project’s first fascinating analysis on our blog.

The APA’s core goals were to reverse the war on drugs, break up concentrated poverty, and block the school-to-prison pipeline. Datasets we examined include the Allegheny County jail census, the Pittsburgh police blotter, Pittsburgh Department of Public Safety interactive reports, and the American Community Survey.

Specific Projects in this area:

  • Crime and Development in Larimer
    • Brandi Fisher from the Alliance for Police Accountability was interested in exploring crime hotspots in Larimer and comparing them to where new development is happening or is planned. The crime data were pulled from the WPRDC police blotter dataset. Some of this had been plotted in the past by Code for Pittsburgh, but this project involved working directly with Brandi on Larimer questions.
    • Skillsets: Mapping, Data Cleaning, Visualization, Qualitative and Quantitative analysis
  • Allegheny County Sentencing Analysis
    • Analyzing what factors impact lengths of sentences in negotiated plea deals using a dataset from the Pennsylvania Commission on Sentencing that includes race, gender, age, income, previous convictions, and level of offense.
    • Skillsets: Data Cleaning, Analysis, Visualization, Machine Learning, Data Mining

 

Hazelwood Community Data Portal

SUDS members involved: Justin Cole, Donghun Kang, Clayton Oeth, Will Levine, and Eric Darsow

In partnership with the Greater Hazelwood Community Collaborative, SUDS members built a web portal to feature the results of a community survey conducted to assess the civic interests and opportunities for service collaboration among residents of the Hazelwood neighborhood of Pittsburgh.

Project components involved analyzing survey data, mapping the results, creating a web-based portal, presenting the findings to community organizations in Hazelwood, and collaborating with artists local to Hazelwood.

Our data portal is here! Check it out!