Projects

Predicting Building Fire Risk

With Pittsburgh Bureau of Fire, Metro21 Smart Cities Institute, Pittsburgh Department of Innovation and Performance

SUDS Members: Palak Narang (Project Manager), Jessica Lee, Jeffrey Chen, Fangyan Chen, Nathan Kuo, Amaya Taylor, Xingyuan Ying

The Fire Risk Analysis project uses fire incident and property data to develop predictive models of structure fire risk in partnership with the Pittsburgh Bureau of Fire (PBF) and the Department of Innovation and Performance (I&P). PBF conducts regular fire inspections of commercial properties, as stipulated by the municipal fire code. This project helps PBF prioritize its property inspections using risk scores from machine learning models, presented in a data dashboard and an interactive map, so that inspections target the properties at greatest risk of fire.

Map of high-risk commercial properties visualized on the "Burgh's Eye View" interactive map

Outcome: Our team has developed and deployed a machine learning model to predict structure fire risk in properties around Pittsburgh. We trained the model using historical fire incident data from PBF, property inspection data from the Department of Permits, Licenses, and Inspections, and property assessment data from the Allegheny County property assessment office. We evaluated multiple model types (e.g., random forests, SVM, XGBoost) and tuned them to find the best-performing model, which generates a risk score for each commercial property in the city based on the predicted probability of a fire occurring at that address. The risk scores are displayed on a data dashboard and an interactive map developed by the Department of Innovation and Performance. Fire inspectors and fire chiefs use these tools to inform the Bureau of Fire’s prioritization of property fire inspections, so they can inspect the properties at greatest risk of fire.
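
To make the idea of a per-property risk score concrete, here is a minimal sketch of ranking properties by predicted probability and grouping them into tiers. It is not the project's code: the cutoffs, the helper names (`assign_tier`, `rank_properties`), and the example addresses and scores are all hypothetical, since the text does not state how scores are bucketed.

```python
# Hypothetical cutoffs; the project's actual thresholds are not stated in the text.
MEDIUM_CUTOFF = 0.3
HIGH_CUTOFF = 0.6


def assign_tier(probability):
    """Bucket a predicted fire probability into a coarse risk tier."""
    if probability >= HIGH_CUTOFF:
        return "high"
    if probability >= MEDIUM_CUTOFF:
        return "medium"
    return "low"


def rank_properties(scores):
    """Sort (address, probability) pairs from highest to lowest risk and add a tier."""
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    return [(address, prob, assign_tier(prob)) for address, prob in ranked]


# Made-up example scores, for illustration only.
example_scores = {
    "100 Example St": 0.72,
    "200 Sample Ave": 0.12,
    "300 Demo Blvd": 0.41,
}

for address, prob, tier in rank_properties(example_scores):
    print(f"{address}: {prob:.2f} ({tier})")
```

A ranked list like this is the kind of output a dashboard or map layer could consume, with the tier driving the color or filter shown to inspectors.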

The model is currently deployed on their servers and retrains on a regular basis as new data is generated. In the months since the first model was trained and risk scores generated, 14 of the 45 (31%) building-related fire incidents (not all of which are “working fires”) occurred in properties rated medium or high risk, significantly higher than the 0.20% base rate for fire incidents in the city. We are currently monitoring the stability of the model’s performance over time and conducting experiments using neural network models to better capture some of the temporal dynamics of incident events around the city. The model has remained quite stable over the 6 months since its official deployment in February 2018, with standard deviations of our key performance metrics below 0.01 across the 16 model iterations since then.
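
The stability check described above can be sketched as follows. This is a minimal illustration, assuming each retraining run logs one value per metric. The metric names, the helper functions, and the numbers are hypothetical placeholders rather than project results; only the 0.01 threshold comes from the text.

```python
from statistics import stdev

# Hypothetical per-iteration metric logs (illustrative values, not project results).
metric_history = {
    "auc": [0.812, 0.815, 0.809, 0.811, 0.814],
    "recall": [0.641, 0.652, 0.648, 0.639, 0.645],
}


def is_stable(values, threshold=0.01):
    """Return True when the sample standard deviation stays below the threshold."""
    return stdev(values) < threshold


def stability_report(history, threshold=0.01):
    """Map each metric name to a (standard deviation, stable flag) pair."""
    return {
        name: (stdev(values), is_stable(values, threshold))
        for name, values in history.items()
    }


report = stability_report(metric_history)
for name, (spread, stable) in report.items():
    print(f"{name}: stdev={spread:.4f} stable={stable}")
```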

Fire Risk Analysis Model Image

Our future work includes incorporating new data sets, experimenting with additional model types (e.g. recurrent neural networks, reinforcement learning, and/or “active learning”), and expanding this approach to predict fire risk in residential properties at the census block level. Future work may include making the model more easily usable by other cities’ fire agencies.

A technical report from the first phase of the project can be found here. Open-source code for this project can be found here.

Publication: Singh Walia, B., Hu, Q., Chen, J., Chen, F., Lee, J., Kuo, N., Narang, P., Batts, J., Arnold, G., & Madaio, M. (2018, July). A Dynamic Pipeline for Spatio-Temporal Fire Risk Prediction. In Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining (pp. 764-773). ACM. [pre-print pdf]

 

Understanding the Education Innovation Landscape in Pittsburgh

With Remake Learning

SUDS Members: Carolina Arroyo (Project Manager), Allyson Fierro, Zach Goldstein, Sheena Jain, Eric Shapiro, Amit Sharma, Stephanie Truong, Angela Wang

Non-profits often use insights from data to maximize their impact. However, in many cases, organizations don’t have the ability to do this on their own. That’s where SUDS comes in.

In 2017, a SUDS team partnered with the education non-profit Remake Learning to help them use data to improve educational opportunities in the Pittsburgh metropolitan area. 

Remake Learning is a network of education organizations in the Southwest Pennsylvania region focused on STEAM (Science, Technology, Engineering, Arts, and Math) and “maker” education. Remake Learning supports these organizations by providing funding support for “makerspaces” – which use cutting-edge technologies for hands-on, project-based learning in fields like engineering and design – and other innovative educational initiatives in the region.

The SUDS team used geographic and financial data to derive insights and develop data visualizations about the education innovation landscape of the Pittsburgh metro area. The team also analyzed data from Remake Learning’s events to help decision-makers determine where to invest funds to maximize engagement and better serve underserved communities in the region.

Below is one example of an interactive data visualization developed by the SUDS team to visualize the makerspaces in the Pittsburgh area.

[Figure: interactive map of makerspaces in the Pittsburgh area]

Remake Learning plans to use the interactive map SUDS built to help interested learners across Pittsburgh find and use makerspaces near them. Leaders of those organizations and founders of new makerspaces can also use it to target areas in Pittsburgh without easy access to spaces for informal STEAM learning.
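
As a rough sketch of how areas without easy access might be flagged, the example below measures the distance from each area to its nearest makerspace. It assumes latitude/longitude points and a 3 km cutoff. The coordinates, area names, cutoff, and helper names are hypothetical and are not taken from the map SUDS built.

```python
import math

EARTH_RADIUS_KM = 6371.0


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometers between two latitude/longitude points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlambda = math.radians(lon2 - lon1)
    a = (
        math.sin(dphi / 2) ** 2
        + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2
    )
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def nearest_distance_km(point, makerspaces):
    """Distance from a (lat, lon) point to the closest makerspace."""
    return min(haversine_km(point[0], point[1], lat, lon) for lat, lon in makerspaces)


def underserved(neighborhoods, makerspaces, max_km=3.0):
    """Names of areas whose nearest makerspace is farther away than max_km."""
    return [
        name
        for name, point in neighborhoods.items()
        if nearest_distance_km(point, makerspaces) > max_km
    ]


# Made-up coordinates in the Pittsburgh region, for illustration only.
makerspaces = [(40.4440, -79.9530), (40.4570, -79.9200)]
neighborhoods = {"Area A": (40.4450, -79.9500), "Area B": (40.3800, -80.0500)}

print(underserved(neighborhoods, makerspaces))
```

Straight-line distance is only a first approximation of access. Travel time by transit or on foot would be a closer measure if that data were available.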

This work aligns with SUDS’s mission to “do good with data”, and we’re thrilled that our student members have an opportunity to exercise their data science skills while giving back to the Pittsburgh community. You can learn more about Remake Learning’s work at remakelearning.org, and you can find other SUDS projects on this page!

Improving Fresh Food Access in Pittsburgh

With Just Harvest

SUDS Members: David Mitre Becerril (team lead), Jessica Young, Sukrit Ajmani, Ashish Arora, Angela Liu, Julie Kim, Bai Xue, Jennifer Yang

Just Harvest is a non-profit organization working to end hunger by expanding access to fresh, healthy food. Its Fresh Access program enables shoppers to use their food stamps – as well as credit and debit cards – to buy fresh, nutritious, and locally grown food.

For this project, Just Harvest wants to better understand the population it serves and the farmers’ markets currently supported through Fresh Access. To do this, it wants an interactive map that visualizes the transactions occurring at the ~300 markets it works with, so it can better target its support and plan future market partnerships. Our team of SUDS students is designing and developing this interactive visualization of the transaction and market data, along with a pipeline so the map can be updated as new data are collected. This also involves finding and integrating publicly available civic data on the Pittsburgh area, such as census or SNAP data. As part of this work, the team is also conducting additional statistical analyses to understand patterns of market usage among Fresh Access users.

Some questions explored in this project include:

  1. Do SNAP sales decrease at the end of the month?
  2. How much does weather affect market turnout?
  3. Are there clusters of markets attended by the same individuals?
  4. How does vendor retention change over time?
  5. Are any markets located in “food deserts”?
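
The first question above could be approached along these lines. This is a minimal sketch under stated assumptions, not the team's analysis: the record format, the 7-day definition of "end of the month", the helper names, and the sample amounts are all hypothetical.

```python
import calendar
from datetime import date
from statistics import mean


def is_end_of_month(day, window=7):
    """True when the date falls within the last `window` days of its month."""
    days_in_month = calendar.monthrange(day.year, day.month)[1]
    return day.day > days_in_month - window


def compare_snap_sales(transactions, window=7):
    """Average SNAP sales in the end-of-month window versus the rest of the month."""
    late = [amount for day, amount in transactions if is_end_of_month(day, window)]
    early = [amount for day, amount in transactions if not is_end_of_month(day, window)]
    return {
        "end_of_month": mean(late) if late else None,
        "rest_of_month": mean(early) if early else None,
    }


# Made-up (market date, SNAP sales in dollars) records, for illustration only.
sample = [
    (date(2018, 6, 2), 120.0),
    (date(2018, 6, 9), 110.0),
    (date(2018, 6, 23), 90.0),
    (date(2018, 6, 30), 60.0),
]

print(compare_snap_sales(sample))
```

A real analysis would also need to account for the number of market days in each window, seasonality, and weather before drawing conclusions.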

Visualizing Water Treatment Data

With Allegheny Land Trust

SUDS Members: Linda Kuster (team lead), Zachary Goldstein, Pengji Zhang, Shlok Goyal

Allegheny Land Trust (ALT) is a land conservation and stewardship organization which serves the Pittsburgh region. The mission of ALT is to “serve as the lead land trust conserving and stewarding lands that support the scenic, recreational and environmental well-being of communities in Allegheny County and its environs.”[1] As of 2015, ALT owned 572 acres of land, including Wingfield Pines, a site in Upper St. Clair and South Fayette in the southwestern corner of Allegheny County. The site was affected by Abandoned Mine Drainage until ALT worked with an environmental engineer to design a natural filtration system. Abandoned Mine Drainage water flows through a series of ponds which remove oxides before the water reaches Chartiers Creek. Wingfield Pines also offers recreational opportunities for the public.

Wingfield Pines Pond Filtration System

“The treatment system brings groundwater from close by and directs it into pond 1 (A). The water is moved slowly into the pie shaped settling ponds entering pond 2 and exiting at pond 5 (B). As the water leaves pond 5, most of the iron has been extracted, but the wetlands (C) ensure any remaining iron left will fall out with the unique design and native plants that maximize oxygen mixture.”[2]

In April 2017, ALT approached SUDS for help analyzing data collected at Wingfield Pines and designing data visualizations to demonstrate ALT’s impact there. The SUDS team members on this project have a diverse set of interests and backgrounds, including statistics, coding, and design. The team set out to complete the following:

  1. Clean, compile, and analyze data collected by multiple sources since 2011.
  2. Design data visualizations that convey the importance of ALT and Wingfield Pines to the public and donors.
  3. Embed the data visualizations in ALT’s WordPress website.
  4. Ensure ALT has the ability to independently update the visualizations with new data in the future.

The team presented their visualizations to ALT in early August. The project was then finalized several weeks later following revisions and testing. ALT hopes to increase awareness and donations for Wingfield Pines as a result of this collaboration.


[1] Allegheny Land Trust. 2013. “Strategic Plan.” http://alleghenylandtrust.org/wp-content/uploads/2016/04/ALT_Strategic_Plan_2014-18.pdf.

[2] Allegheny Land Trust. “Wingfield Pines: Graphs and Analysis.” http://alt.pl1548.pairlitesite.com/properties/wingfield/science/chemistry/analysis.html.

Boys and Girls Club of Western Pennsylvania

The BGCA is a non-profit organization that currently operates eight clubs in Allegheny County. Because of demographic shifts in the city of Pittsburgh over the last few years, some of these clubs are no longer located in areas where they can best serve underprivileged youth populations.

The BGCA is looking for other areas in the city and county that would be suitable for developing new clubs or merging existing club locations. This requires an analysis of census and income data that the organization is not able to perform on its own.


Criminal Justice Reform with the Alliance for Police Accountability

Project leadership: Lizzie Silver and Lauren Renaud

We partnered with the Alliance for Police Accountability (APA) on a project that wove together personal experiences from community interviews with data analysis of larger trends.

This project involved a thorough exploration of the overlapping and confusing boundaries and jurisdictions of police forces in the County. Read the project’s first fascinating analysis on our blog.

The APA’s core goals were to reverse the war on drugs, break up concentrated poverty, and block the school-to-prison pipeline. Datasets we’re examining include the Allegheny County jail census, the Pittsburgh police blotter, Pittsburgh Department of Public Safety interactive reports, and the American Community Survey.

Specific Projects in this area:

  • Crime and Development in Larimer
    • Brandi Fisher from the Alliance for Police Accountability was interested in exploring crime hotspots in Larimer and comparing them to where new development is happening or planned. The crime data were pulled from the WPRDC police blotter dataset. Some of these data had been plotted in the past by Code for Pittsburgh, but this project involved working directly with Brandi on questions specific to Larimer.
    • Skillsets: Mapping, Data Cleaning, Visualization, Qualitative and Quantitative analysis
  • Allegheny County Sentencing Analysis
    • Analyzing what factors impact lengths of sentences in negotiated plea deals using a dataset from the Pennsylvania Commission on Sentencing that includes race, gender, age, income, previous convictions, and level of offense.

Hazelwood Community Data Portal

SUDS members involved: Justin Cole, Donghun Kang, Clayton Oeth, Will Levine, and Eric Darsow

In partnership with the Greater Hazelwood Community Collaborative, SUDS members built a web portal to feature the results of a community survey conducted to assess the civic interests and opportunities for service collaboration among residents of the Hazelwood neighborhood of Pittsburgh.

Project components involved analyzing survey data, mapping the results, creating a web-based portal, presenting the findings to community organizations in Hazelwood, and collaborating with artists local to Hazelwood.

Our data portal is here! Check it out!

Environmental Health Project – Understanding environmental and health impact data

The Environmental Health Project (EHP) is a nonprofit public health organization that assists and supports residents of Southwestern Pennsylvania and beyond who believe their health has been, or could be, impacted by unconventional oil and gas development (UOGD, or “fracking”).

In partnership with EHP, this project aims to use data management tools to facilitate analysis of the impact of unconventional oil and gas development on public health in Southwestern Pennsylvania and New York.

For this project, EHP has a large number of data sets on residents’ health symptoms from local hospitals, as well as environmental pollution data (air quality, soil quality, etc.). EHP wants to develop a data pipeline that can ingest these data and join the various data sets at the level of granularity needed to best understand the impact that polluting facilities have on the health of residents of Southwestern Pennsylvania. Such a pipeline will allow EHP to run reproducible analyses that incorporate spatial (GIS) and temporal data, run models, and recommend appropriate environmental health interventions.
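
One way to join data sets at a shared level of granularity is sketched below. It assumes the join key is ZIP code plus month, which is only one possible choice. The field names (`zip_code`, `month`, `symptom_reports`, `pm25`), the helper names, and the records are hypothetical and do not describe EHP's actual data.

```python
from collections import defaultdict
from statistics import mean


def aggregate(records, value_key):
    """Average `value_key` for each (zip_code, month) pair."""
    grouped = defaultdict(list)
    for record in records:
        grouped[(record["zip_code"], record["month"])].append(record[value_key])
    return {key: mean(values) for key, values in grouped.items()}


def join_health_and_pollution(health_records, pollution_records):
    """Inner-join the two sources on (zip_code, month) after aggregating each one."""
    symptoms = aggregate(health_records, "symptom_reports")
    pollution = aggregate(pollution_records, "pm25")
    return [
        {
            "zip_code": zip_code,
            "month": month,
            "avg_symptom_reports": symptoms[(zip_code, month)],
            "avg_pm25": pollution[(zip_code, month)],
        }
        for zip_code, month in sorted(symptoms.keys() & pollution.keys())
    ]


# Made-up records with placeholder ZIP codes, for illustration only.
health = [
    {"zip_code": "zip_a", "month": "2018-01", "symptom_reports": 4},
    {"zip_code": "zip_a", "month": "2018-01", "symptom_reports": 6},
    {"zip_code": "zip_b", "month": "2018-01", "symptom_reports": 2},
]
pollution = [
    {"zip_code": "zip_a", "month": "2018-01", "pm25": 12.0},
    {"zip_code": "zip_a", "month": "2018-01", "pm25": 14.0},
]

print(join_health_and_pollution(health, pollution))
```

Aggregating each source to the shared key before joining keeps the result at one row per area and month, which makes later analyses easier to reproduce.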

The goal of the project is to develop a real-time, integrated data management system to support data-driven public health analysis and assistance, including manual and machine data collection, database development, data management across multiple locations, and data visualization.

The team is currently developing the database management interface and integrating the historical data, data collection channels, data dictionary, and data visualization.

Currently recruiting students with any of the following skills:

  • Database management (Postgres, SQL, etc.)
  • Full-stack / back-end (JavaScript, Flask, Node, Firebase, Ruby)
  • Data cleaning and analysis (Python, R)
  • Data visualization (R Shiny, D3.js)
  • GIS (Mapbox, ArcGIS, etc.)
  • Web app development (HTML/CSS, JavaScript, Flask, Node, Firebase)
  • Communications (WordPress, Twitter, Facebook/Instagram, writing copy, HTML/CSS, etc.)

Digital Redlining – Understanding internet access speeds in Pittsburgh, with Pittsburgh Dept. of Innovation and Performance

As part of its Inclusive Innovation initiatives, the Pittsburgh Department of Innovation and Performance (I&P) is working to assess the extent of “digital redlining” in the city, before taking action with internet service providers (ISPs). Digital redlining is the phenomenon where ISPs offer reduced internet speeds to various (typically lower-income) areas of a city, while still charging the same rates. Similar to the disproportionate loan rates offered to lower-income and racially-segregated communities throughout the 20th century (‘redlining’), the digital version of this exacerbates existing inequalities by reducing access to information.

For this project, I&P wants to collect internet speed data at large scale and map it across the city of Pittsburgh, to make the case to ISPs that they are engaging in inequitable redlining practices. This may involve adapting code from a related project in Louisville, “Speed Up Your City”, or developing code from scratch, depending on the availability of code and the needs of the project.
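
Once speed tests are collected, a first comparison across areas could look like the sketch below. It assumes each test is tagged with an area name and that a single advertised speed applies to every area. The field names, helper names, areas, and values are hypothetical, and this is not code from “Speed Up Your City”.

```python
from collections import defaultdict
from statistics import median


def median_speed_by_area(speed_samples):
    """Median measured download speed (Mbps) for each area."""
    by_area = defaultdict(list)
    for sample in speed_samples:
        by_area[sample["area"]].append(sample["download_mbps"])
    return {area: median(speeds) for area, speeds in by_area.items()}


def shortfall_by_area(speed_samples, advertised_mbps):
    """Fraction by which each area's median speed falls short of the advertised speed."""
    medians = median_speed_by_area(speed_samples)
    return {
        area: round(1 - speed / advertised_mbps, 3)
        for area, speed in medians.items()
    }


# Made-up speed tests, for illustration only.
speed_samples = [
    {"area": "Area A", "download_mbps": 95.0},
    {"area": "Area A", "download_mbps": 88.0},
    {"area": "Area A", "download_mbps": 91.0},
    {"area": "Area B", "download_mbps": 42.0},
    {"area": "Area B", "download_mbps": 38.0},
]

print(median_speed_by_area(speed_samples))
print(shortfall_by_area(speed_samples, advertised_mbps=100.0))
```

The median is used rather than the mean so that a few unusually fast or slow tests do not dominate an area's figure. Joining these results with socio-economic data by area would be the next step in assessing redlining.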

Currently recruiting students with any of the following skills:

  • Involvement in this project may require skills in data cleaning and analysis (Python, R), data visualization (R Shiny, D3.js), GIS (Mapbox, ArcGIS, etc.), web app development (HTML/CSS, JavaScript, Flask, Node, Firebase), or database management (Postgres, SQL, etc.).
  • Experience working with internet speed or socio-economic data is beneficial, but not required.
  • Communications (WordPress, Twitter, Facebook/Instagram, writing copy, HTML/CSS, etc.)

“We Use Data To Impact Our Community”