Potholes and Crime Stats – A Useful Package for Cleaning Data

Note: this is a guest post from Geoffrey Hing from FreeGeek Chicago. FreeGeek Chicago is working with us on a project to help analyze and visualize Cook County conviction data obtained through a FOIA request by the Chicago Justice Project. As part of the project, Geoffrey created a set of open source packages that we think will be useful to the overall civic innovation community.   —DXO

Potholes are an ever-present nuisance for Chicago residents. While not the biggest challenge facing Chicagoans, they make day-to-day navigation of the city more frustrating. This year, I’ve lost a car rim and had numerous near-crashes on my bike due to these street craters. I often feel like the blocks where streets have new pavement, or have had the largest potholes filled in, are more exceptional than the rutted norm.

Working with open data, I often run into problems that feel like potholes. They’re small, solvable problems, but they make it harder to get to the bigger problem or the new insight. These are the kinds of problems where dozens of civic hackers have hacked around the same little obstacles, their solutions buried somewhere in their code repositories. Working on a project covering records of convictions in Cook County criminal courts, our project team ran into one of these data potholes.

Who will watch the watchmen? Reed!

We wanted to analyze the number of convictions and the variance in sentencing based on the type of offense. We wanted to roll up the statutes under which people were convicted into a common, easily understandable set of offenses and categories. We decided to use the offenses that are part of the Illinois Uniform Crime Reporting (IUCR) program for our analysis. However, our data didn’t have fields mapping each record to an IUCR offense. Instead, it had fields describing the statute under either the Illinois Revised Statutes (ILRS) or the Illinois Compiled Statutes (ILCS). This page on the General Assembly website was the best reference I could find describing the differences between how laws are referenced in the two compilations. The Illinois State Police published a crosswalk between ILCS statutes and IUCR offenses, but it was in PDF format and not very useful for processing the thousands of records we had. Furthermore, our data had references to both ILRS and ILCS statutes, so we needed to convert the ILRS statutes to ILCS ones in order to look up the IUCR codes.

Values for statutes in our data set look like this:

38-12-4
38 9-1E
38-19-1-A
38-18-1-A
38   18-2
38   11-1
720-5/24-1.1(a)
720-5/21-1.3(a)

I ended up implementing an ilcs package for converting the ILRS references to ILCS references and an iucr package for looking up an IUCR offense based on an ILCS reference.

It’s just CSV!

There isn’t much to the Python code in these packages. Essentially, they provide classes that allow statutes or offenses to be represented as strings, compared, and used as keys in dictionaries.
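
For illustration, a class along these lines might look like the following. This is a hypothetical sketch, not the packages’ actual implementation; the class name and fields are made up, and the formatted string follows the "720 ILCS 5/12-4" style shown later in this post.

from collections import namedtuple

# Hypothetical sketch of a section-like value object; the real classes in
# the ilcs and iucr packages may use different names and fields.
class ILCSSection(namedtuple('ILCSSection', ['chapter', 'act_prefix', 'section'])):
    def __str__(self):
        # e.g. "720 ILCS 5/12-4"
        return "{} ILCS {}/{}".format(self.chapter, self.act_prefix, self.section)

# namedtuple provides value equality and hashing for free, so instances can
# be compared and used as dictionary keys:
offenses = {ILCSSection('720', '5', '12-4'): 'Aggravated Battery'}
print(offenses[ILCSSection('720', '5', '12-4')])  # Aggravated Battery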

The packages just wrap CSV versions of the crosswalks provided by the state. This isn’t the most performant solution, because it requires that the CSV be parsed when the packages are imported, but I wanted to make it easy for people to update the data, use the data in a spreadsheet or database without using the Python interface, or implement similar functionality in other programming languages.
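
As a rough sketch of how that import-time loading can work (the file name, column names, and function internals below are made up for illustration; the real packages’ data layout may differ):

import csv
import os

# Hypothetical: parse a bundled crosswalk CSV once, at import time,
# into a dictionary keyed on the ILRS chapter and paragraph.
_DATA_PATH = os.path.join(os.path.dirname(__file__), 'data', 'ilrs_to_ilcs.csv')
_ILRS_TO_ILCS = {}

with open(_DATA_PATH) as f:
    for row in csv.DictReader(f):
        key = (row['ilrs_chapter'], row['ilrs_paragraph'])
        # One ILRS section can map to multiple ILCS sections, so store lists
        _ILRS_TO_ILCS.setdefault(key, []).append(row)

def lookup_by_ilrs(chapter, paragraph):
    return _ILRS_TO_ILCS[(chapter, paragraph)]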

You can view or download the raw CSV data for the ILCS package here and for the IUCR package here.

Using the packages

Let’s look at an example of looking up an IUCR offense from an ILRS reference:

>>> import ilcs, iucr
>>> import re
>>> ilrs_re = re.compile(r'(?P<chapter>\d+)-(?P<paragraph>[-0-9]+)')
>>> # This is an example of an ILRS reference from our data
... ilrs_ref = '38-12-4'
>>> 
>>> # Parse the reference into chapter and paragraph parts
... m = ilrs_re.match(ilrs_ref)
>>> chapter, paragraph = m.groups()
>>> 
>>> # Lookup the ILCS section from the ILRS reference
... # Note that the lookup functions return lists because some ILRS sections
... # map to multiple ILCS sections
... ilcs_section = ilcs.lookup_by_ilrs(chapter=chapter, paragraph=paragraph)[0]
>>> 
>>> # The section object can evaluate to a nicely formatted string
... print(ilcs_section)
720 ILCS 5/12-4
>>> 
>>> # And you can access its individual components
... print(ilcs_section.chapter, ilcs_section.act_prefix, ilcs_section.section)
720 5 12-4
>>> 
>>> # Now let's look up the IUCR offense.
... # Again, the lookup function returns a list because in some cases,
... # an ILCS statute maps to multiple offenses
... iucr_offense = iucr.lookup_by_ilcs(ilcs_section.chapter, ilcs_section.act_prefix, ilcs_section.section)[0]
>>> 
>>> # An Offense object has various useful attributes
... print("The 4-digit code for the offense is {}".format(iucr_offense.code))
The 4-digit code for the offense is 0410
>>> print("The description of the offense is {}".format(iucr_offense.offense))
The description of the offense is Aggravated Battery
>>> print("The category of the offense is {}".format(iucr_offense.offense_category))
The category of the offense is Battery
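
Because the lookup functions return lists, you don’t have to take only the first match. Continuing the session above, a short loop like this (using the same attributes shown in the transcript) would print every IUCR offense that a given ILCS section maps to:

# Iterate over every IUCR offense the ILCS section maps to, rather than
# taking just the first item in the returned list
for offense in iucr.lookup_by_ilcs(ilcs_section.chapter, ilcs_section.act_prefix, ilcs_section.section):
    print(offense.code, offense.offense, offense.offense_category)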

Improvements

Here are a few areas where I could imagine improving the packages.

We’d love to hear about your use cases for these packages, and to get updates or corrections to the underlying CSV data, suggestions for improvements to the API, or pull requests implementing them.

The best way to provide this feedback is by opening an issue or a pull request through the GitHub repositories for the python-ilcs or python-iucr packages.

Documentation

There are docstrings for the public API of the packages, so you can do something like:

import ilcs
help(ilcs)
help(ilcs.lookup_by_ilrs)

and get some help on the classes and functions in the packages. However, as the API matures, it would be nice to have Sphinx-generated HTML documentation for the packages.

Exceptions

Currently, KeyError exceptions trickle up when a lookup of an ILCS section or IUCR offense fails. It’s probably better to catch these and raise more domain-specific exceptions.
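
A sketch of what that could look like (the exception name and the lookup internals here are hypothetical, not the packages’ current behavior):

# Hypothetical: wrap the dictionary KeyError in a domain-specific exception
class ILCSLookupError(KeyError):
    """Raised when an ILRS reference has no known ILCS equivalent."""

_ILRS_TO_ILCS = {}  # module-level crosswalk dict, as in the earlier sketch

def lookup_by_ilrs(chapter, paragraph):
    try:
        return _ILRS_TO_ILCS[(chapter, paragraph)]
    except KeyError:
        raise ILCSLookupError(
            "No ILCS section found for ILRS chapter {}, paragraph {}".format(
                chapter, paragraph))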

Fuzzy lookup or parsing

In our dataset, statutes were referenced in a variety of formats, often including subsection references. Because of this, we couldn’t just pass the raw values to the lookup functions in our packages. It might be good to add functions for parsing strings containing statute references into more standardized forms that can be used with the lookup functions, or to use something like jellyfish for approximate string matching.
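
For example, a normalization helper (just a sketch; the function name and rules below are made up) could strip subsection references and collapse the mixed separators we saw in our data before calling the lookup functions:

import re

def normalize_ilrs(raw):
    # Strip trailing subsection references like "(a)" and split on either
    # dashes or runs of spaces, as in values like "38-12-4" and "38   18-2"
    cleaned = re.sub(r'\([0-9a-z]+\)\s*$', '', raw.strip(), flags=re.IGNORECASE)
    m = re.match(r'(?P<chapter>\d+)[\s-]+(?P<paragraph>\S+)', cleaned)
    if m is None:
        return None
    return m.group('chapter'), m.group('paragraph')

print(normalize_ilrs('38-12-4'))    # ('38', '12-4')
print(normalize_ilrs('38   18-2'))  # ('38', '18-2')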

Crime and Punishment Release Party at FreeGeek Chicago

A couple of weeks ago we had a party to celebrate our shared work with FreeGeek Chicago on the Crime and Punishment in Chicago project. During the event, we heard from the people on the project team and got feedback on how the project can become even more helpful for community members, journalists, and policymakers.

I’m particularly proud of this project because it takes a hard look at the gaping data holes we have in seeing the full cycle of crime in our city. I am very proud of the large cache of crime reports available on the City data portal, but I am mindful that we seem to be no closer to having a true understanding of the system.

In my work at EveryBlock, I was responsible for finding crime data in cities all over the country. It gave me a great window into the day-to-day reality of the data and the differences in what’s published. When we had the opportunity to partner with Tracy Siska of the Chicago Justice Project as the main subject matter expert, and with community-based developers like Brian Peterson of FreeGeek Chicago’s Supreme Chi-Town Coding Crew, I knew we had something special.

New 2013 CPD Ford Explorer Police Interceptor


On the launch of Crime and Punishment in Chicago

Smart Chicago Collaborative is proud to launch our latest Civic Works Project: Crime and Punishment in Chicago. This project is a collaborative effort among Smart Chicago, FreeGeek Chicago, and the Chicago Justice Project.

Chicago Police Department Memorial at Buckingham Fountain

Photo by Chris Smith / Flickr

The Crime and Punishment in Chicago project provides an index of data sources regarding the criminal justice system in Chicago. We aggregate sources of data and document how the data is generated, how to get it, and what data is unavailable. This project is a key way we are using the Civic Works grant to apply data journalism, uncovering the value of data and telling the stories behind it.


Tonight: Social Justice + Civic Technology

There’s a lot of action in the world of civic innovation here in Chicago. Just today, Foodborne Chicago was the cover story for the RedEye and the focus of a front-page news story in the Tribune, and there’s a Wall Street Journal story highlighting some of the great things going on here.

#DirtyDining: Food-poisoning tweets get city follow-up
Health authorities seek out sickened Chicagoans, ask them to report restaurants

It’s fun to focus on the more scatological aspects of the work that’s going on. And lots of the work, while certainly helping people live better in Chicago, fails to directly address the lives of working people.

But tonight, at the weekly OpenGov Hack Night, we have a great opportunity to do just that (and it happens to be in the food industry!). Here’s a note from OpenGov Hack Night showrunner Derek Eder:

The next Open Gov Hack Night is tomorrow, Aug 13th at 6pm!

Matt Bruce with the Chicago Community Trust and Restaurant Opportunities Centers United will talk about the US Labor Department’s app challenge for creating a smartphone app that integrates the department’s publicly available enforcement data with consumer ratings, geo-positioning, and other relevant data sets. More details here.

Food will be provided by the Smart Chicago Collaborative! Please RSVP so we know how much to get.
Social justice is where it’s at.