Potholes and Crime Stats – A Useful Package for Cleaning Data

Note: this is a guest post from Geoffrey Hing from FreeGeek Chicago. FreeGeek Chicago is working with us on a project to help analyze and visualize Cook County conviction data obtained through a FOIA request by the Chicago Justice Project. As part of the project, Geoffrey created a set of open source packages that we think will be useful to the overall civic innovation community.   —DXO

Potholes are an ever-present nuisance for Chicago residents. While not the biggest challenge facing Chicagoans, they make navigating the city day to day more frustrating. This year, I’ve lost a car rim and had numerous near-crashes on my bike due to these street craters. I often feel like the blocks where streets have new pavement or have had the largest potholes filled in are more exceptional than the rutted norm.

Working with open data, I often run into problems that feel like potholes. They’re small, solvable problems, but they make it harder to get to the bigger problem or the new insight. These are the kinds of problems where dozens of civic hackers have hacked around the same little problems, their solutions buried somewhere in their code repositories. Working on a project covering records of convictions in Cook County criminal courts, our project team ran into one of these data potholes.

Who will watch the watchmen? Reed!

We wanted to analyze the number of convictions and variance in sentencing based on the type of offense. To do that, we wanted to roll up the statutes under which people were convicted into a common, easily understandable set of offenses and categories. We decided to use the offenses that are part of the Illinois Uniform Crime Reporting (IUCR) program for our analysis. However, our data didn’t have fields mapping each record to an IUCR offense. Instead, it had fields describing the statute under either the Illinois Revised Statutes (ILRS) or the Illinois Compiled Statutes (ILCS). This page on the General Assembly website was the best reference I could find describing the differences between how laws are referenced in the two law compilations. The Illinois State Police published a crosswalk between ILCS statutes and IUCR offenses, but it was in PDF format and not very useful for processing the thousands of records we had. Furthermore, our data had references to both ILRS and ILCS statutes, so we needed to convert each ILRS statute to an ILCS one in order to look up the IUCR code.

Values for statutes in our data set look like this:

38-12-4
38 9-1E
38-19-1-A
38-18-1-A
38   18-2
38   11-1
720-5/24-1.1(a)
720-5/21-1.3(a)

I ended up implementing an ilcs package for converting the ILRS references to ILCS references and an iucr package for looking up an IUCR offense based on an ILCS reference.

It’s just CSV!

There isn’t much to the Python code in these packages. Essentially, they provide classes that allow statutes or offenses to have a string representation, to be compared with each other, and to be used as keys in dictionaries.
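As a rough illustration of what such a class involves, here is a minimal sketch. The class name Section and its internals are assumptions made for this example, not the actual implementation in the ilcs package; only the attribute names and the string format mirror the example session in this post.

```python
from functools import total_ordering


@total_ordering
class Section:
    """Hypothetical stand-in for a statute section (not the real ilcs class)."""

    def __init__(self, chapter, act_prefix, section):
        self.chapter = chapter
        self.act_prefix = act_prefix
        self.section = section

    def _key(self):
        # A single tuple drives equality, ordering and hashing
        return (self.chapter, self.act_prefix, self.section)

    def __str__(self):
        return "{} ILCS {}/{}".format(self.chapter, self.act_prefix, self.section)

    def __eq__(self, other):
        if not isinstance(other, Section):
            return NotImplemented
        return self._key() == other._key()

    def __lt__(self, other):
        if not isinstance(other, Section):
            return NotImplemented
        return self._key() < other._key()

    def __hash__(self):
        # Equal sections hash alike, so they work as dictionary keys
        return hash(self._key())
```

Defining __eq__ and __hash__ from the same tuple is what lets two separately constructed objects for the same statute find the same dictionary entry.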

The packages just wrap CSV versions of the crosswalks provided by the state. This isn’t the most performant solution, because the CSV has to be parsed when the packages are imported, but I wanted to make it easy for people to update the data, use the data in a spreadsheet or database without using the Python interface, or implement similar functionality in other programming languages.
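To make the "parse at import time" idea concrete, here is a minimal sketch of loading a crosswalk CSV into a dictionary. The column names and the load_crosswalk function are hypothetical; the real CSV files may be laid out differently. The single sample row is the ILRS-to-ILCS mapping shown in the example session in this post.

```python
import csv
import io
from collections import defaultdict

# Illustrative data with hypothetical column names; the real files may differ.
SAMPLE_CSV = """ilrs_chapter,ilrs_paragraph,ilcs_chapter,ilcs_act_prefix,ilcs_section
38,12-4,720,5,12-4
"""


def load_crosswalk(csv_file):
    """Index rows by (ILRS chapter, ILRS paragraph).

    Values are lists because one ILRS reference can map to several ILCS sections.
    """
    index = defaultdict(list)
    for row in csv.DictReader(csv_file):
        key = (row["ilrs_chapter"], row["ilrs_paragraph"])
        index[key].append(
            (row["ilcs_chapter"], row["ilcs_act_prefix"], row["ilcs_section"])
        )
    return index


# Runs once at module import, which is the cost mentioned above
CROSSWALK = load_crosswalk(io.StringIO(SAMPLE_CSV))
print(CROSSWALK[("38", "12-4")])
```

In a real package the file would be opened from disk rather than from an in-memory string, but the indexing step would look much the same.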

You can view or download the raw CSV data for the ILCS package here and for the IUCR package here.

Using the packages

Let’s look at an example of looking up an IUCR offense from an ILRS reference:

>>> import ilcs, iucr
>>> import re
>>> ilrs_re = re.compile(r'(?P<chapter>\d+)-(?P<paragraph>[-0-9]+)')
>>> # This is an example of an ILRS reference from our data
... ilrs_ref = '38-12-4'
>>> 
>>> # Parse the reference into chapter and paragraph parts
... m = ilrs_re.match(ilrs_ref)
>>> chapter, paragraph = m.groups()
>>> 
>>> # Lookup the ILCS section from the ILRS reference
... # Note that the lookup functions return lists because some ILRS sections
... # map to multiple ILCS sections
... ilcs_section = ilcs.lookup_by_ilrs(chapter=chapter, paragraph=paragraph)[0]
>>> 
>>> # The section object can evaluate to a nicely formatted string
... print(ilcs_section)
720 ILCS 5/12-4
>>> 
>>> # And you can access its individual components
... print(ilcs_section.chapter, ilcs_section.act_prefix, ilcs_section.section)
720 5 12-4
>>> 
>>> # Now let's look up the IUCR offense.
... # Again, the lookup function returns a list because in some cases,
... # an ILCS statute maps to multiple offenses
... iucr_offense = iucr.lookup_by_ilcs(ilcs_section.chapter, ilcs_section.act_prefix, ilcs_section.section)[0]
>>> 
>>> # An Offense object has various useful attributes
... print("The 4-digit code for the offense is {}".format(iucr_offense.code))
The 4-digit code for the offense is 0410
>>> print("The description of the offense is {}".format(iucr_offense.offense))
The description of the offense is Aggravated Battery
>>> print("The category of the offense is {}".format(iucr_offense.offense_category))
The category of the offense is Battery

Improvements

These are a few areas where I could imagine improvements for the packages.

We’d love to hear about your use cases for these packages and to receive updates or corrections to the underlying CSV data, suggestions for improvements to the API, or pull requests implementing them.

The best way to provide this feedback is by opening an issue or a pull request through the GitHub repositories for the python-ilcs or python-iucr packages.

Documentation

There are docstrings for the public API of the packages, so you can do something like:

import ilcs
help(ilcs)
help(ilcs.lookup_by_ilrs)

and get some help about the classes and functions in the packages. However, as the API matures, it would be nice to have Sphinx-generated HTML documentation for the packages.

Exceptions

Currently, the KeyError exceptions trickle up when looking up ILCS sections or IUCR offenses. It’s probably better to catch these and raise more domain-specific exceptions.
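One way this could look is sketched below. StatuteLookupError, lookup_ilcs_or_raise and the toy index are all hypothetical names invented for this example; none of them exist in the ilcs or iucr packages.

```python
class StatuteLookupError(LookupError):
    """Hypothetical domain-specific error (not part of the real packages)."""


# Toy index standing in for the parsed crosswalk data
_ILRS_TO_ILCS = {("38", "12-4"): ["720 ILCS 5/12-4"]}


def lookup_ilcs_or_raise(chapter, paragraph):
    """Return matching ILCS sections, or raise an error that names the statute."""
    try:
        return _ILRS_TO_ILCS[(chapter, paragraph)]
    except KeyError:
        # "from None" hides the internal KeyError from the traceback
        raise StatuteLookupError(
            "No ILCS section found for ILRS chapter {}, paragraph {}".format(
                chapter, paragraph
            )
        ) from None
```

Callers can then catch StatuteLookupError specifically, and the message tells them which reference failed instead of showing a bare tuple key.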

Fuzzy lookup or parsing

In our dataset, statutes were referenced in a variety of formats, often including subsection references. Because of this, we couldn’t just pass the raw values to the lookup functions in our packages. It might be good to add functions for parsing strings containing statute references into more standardized formats that can be used with the lookup functions, or using something like jellyfish for doing approximate string matching.
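As a starting point, the sample values shown earlier in this post could be normalized with a pair of regular expressions. This is only a sketch: parse_reference and ParsedReference are hypothetical names, the patterns are assumptions based on those eight sample values alone, and any trailing text (such as "E", "-A" or "(a)") is kept as an uninterpreted suffix rather than parsed.

```python
import re
from typing import NamedTuple, Optional


class ParsedReference(NamedTuple):
    compilation: str            # "ILCS" or "ILRS"
    chapter: str
    act_prefix: Optional[str]   # only present for ILCS references
    section: str                # ILRS paragraph or ILCS section
    suffix: str                 # unparsed trailing text, e.g. "(a)"


# e.g. "720-5/24-1.1(a)"
ILCS_RE = re.compile(
    r'^(?P<chapter>\d+)-(?P<act_prefix>\d+)/'
    r'(?P<section>\d+-\d+(?:\.\d+)?)(?P<suffix>.*)$'
)
# e.g. "38-12-4", "38 9-1E", "38   18-2"
ILRS_RE = re.compile(
    r'^(?P<chapter>\d+)[\s-]+(?P<section>\d+-\d+(?:\.\d+)?)(?P<suffix>.*)$'
)


def parse_reference(raw):
    """Return a ParsedReference, or None if the value matches neither pattern."""
    value = raw.strip()
    m = ILCS_RE.match(value)
    if m:
        return ParsedReference("ILCS", m.group("chapter"), m.group("act_prefix"),
                               m.group("section"), m.group("suffix"))
    m = ILRS_RE.match(value)
    if m:
        return ParsedReference("ILRS", m.group("chapter"), None,
                               m.group("section"), m.group("suffix"))
    return None
```

The chapter and section fields of an ILRS result could then be passed to a lookup function, with the suffix kept alongside for anyone who needs the subsection detail.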

Crime and Punishment Release Party at FreeGeekChicago

A couple weeks ago we had a party to celebrate our shared work with FreeGeek Chicago on the Crime and Punishment in Chicago project.  During the event, we heard from the people on the project team and got feedback on how the project can become even more helpful for community members, journalists, and policymakers.

I’m particularly proud of this project, because it takes a hard look at the gaping data holes we have in seeing the full cycle of crime in our city. I am very proud of the large cache of crime reports available on the City data portal, but I am mindful that we seem to be no closer to having a true understanding of the system.

In my work at EveryBlock, I was responsible for finding crime data in cities all over the country. It gave me a great window into the day-to-day reality of the data and the differences in what’s published. When we had the opportunity to partner with Tracy Siska of the Chicago Justice Project as the main subject matter expert, and with community-based developers like Brian Peterson of FreeGeek Chicago’s Supreme Chi-Town Coding Crew, I knew we had something special.

New 2013 CPD Ford Explorer Police Interceptor


Recap of Cook County Land Bank Training

Last month, Smart Chicago took part in the Cook County Land Bank Training. We recorded all the sessions and have a recap of the proceedings below. Christopher Whitaker and Joshua Kalov of our team also created and taught the curriculum on researching vacant properties.

The Land Bank is a great resource for residents and a great partner to Smart Chicago!


Cook County President Toni Preckwinkle announcing a $10 million investment to the Cook County Land Bank


Honorary Chicago at OpenGov Hack Night

On Tuesday, August 5, 2014, Linda Zabors of Honorary Chicago talked about her experience mapping out Chicago’s brown honorary street signs. Smart Chicago’s Christopher Whitaker captured the entire presentation; see it after the jump or on our YouTube channel.


Honorary Street Sign for The Cool Gent, Photo by Flickr user Jessi

The ubiquitous brown street signs are placed to honor Chicago residents and require passage by the Chicago City Council.

Below the fold, Zabors talks about the work she’s done mapping out the street signs and researching the biographies of the honorees.


Deadline for Knight-Mozilla Fellows closing fast!

We’re big fans of the Knight-Mozilla News Fellows. For ten months, the fellows are embedded in newsrooms like the New York Times, Texas Tribune, ProPublica, and the Washington Post bringing their technology skills to the front lines of journalism.

So, with that in mind, we’re encouraging developers, designers, data scientists and all-around do-gooders to apply for the Knight-Mozilla Fellowship.

We have more details about the program below the fold. You can get the full story on the OpenNews website here.


Dan Sinker and the Knight-Mozilla OpenNews 2013 Fellows, Photo by Laurian Gridinoc


Tomorrow: Civic Tech 4-Pack with OpenTwin Cities (With live stream!)

This Tuesday: Bill Bushey, Laura Andersen, and Steven Clift with E-Democracy/Open Twin Cities will be in Chicago to visit with interesting civic tech, open gov, digital inclusion, and related projects. They’ll also be hosting a four-pack of events at 1871 Chicago.

OpenTwin Cities pitching during National Day of Civic Hacking, Photo by OpenTwin Cities


Christopher Whitaker will be running his usual live stream of the events on our livestream page. We’ve listed the schedule and more detailed information about the events below the fold.
