PDF Liberation Hackathon Resource Page

In preparation for the PDF Liberation Hackathon, we’re putting together a short how-to on different PDF liberation tools, starting with Tabula, a tool that extracts tables of data from PDFs.

Tabula

Tabula is an open source tool built by Manuel Aristarán with the help of ProPublica, La Nación DATA and Knight-Mozilla OpenNews.

When you first open Tabula, you’re given the option to load PDFs into the system. For this example, we’ve taken the monthly veterans report from the Illinois Department of Employment Security (currently only available in PDF) and loaded it into Tabula.

TabulaDemo1

Once you upload it, Tabula will process the file. This can take a little bit of time depending on the size of the file.

Once it’s loaded, you simply draw rectangles over the tables in the PDF.

TabulaDemo2

From there, Tabula will show you the data that it’s captured. You can then copy the data to the clipboard or download it to your local machine as a file. It’s that simple.

TabulaDemo3
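Once the table has been downloaded as a CSV, any script can read it. Here is a minimal sketch in Python, assuming the export has a header row. The column names and numbers are invented for illustration and do not come from the Illinois Department of Employment Security report.

```python
import csv
import io

# Made-up sample standing in for a CSV exported from Tabula.
# The column names and numbers are illustrative, not from the real report.
SAMPLE_CSV = """Region,Veterans Served,Placed
Chicago,"1,204",310
Rockford,212,48
"""

def load_rows(handle):
    """Read CSV rows into dictionaries keyed by the header row."""
    return list(csv.DictReader(handle))

def to_int(text):
    """Convert a number that may contain thousands separators, e.g. '1,204'."""
    return int(text.replace(",", "").strip())

rows = load_rows(io.StringIO(SAMPLE_CSV))
total_served = sum(to_int(row["Veterans Served"]) for row in rows)
print(total_served)
```

To use a real export, pass an opened file instead of the sample text, for example `open("export.csv", newline="")`, where `export.csv` is a hypothetical file name.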

You can find more information on how Tabula works on Source. You can also find a list of other PDF extraction tools on the national PDF Liberation Hackathon homepage.

OpenRefine

Once you get the data into a CSV, you may have to clean it up. A common tool for this is OpenRefine (formerly Google Refine). You can load a CSV file into OpenRefine and dig into the data to find possible data entry errors (somebody writing in Chicgo instead of Chicago), transform the data (change the format of a cell to show currency instead of text), and easily spot inconsistencies in the data (one set of entries classified as ‘phone’ and another as ‘phone number’).
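To give a sense of what this kind of check involves, here is a rough sketch in Python that flags rare spellings that closely resemble a more common one. It uses a simple similarity heuristic from the standard library and is not OpenRefine's own algorithm. The function names, the 0.8 cutoff, and the sample values are assumptions made for illustration.

```python
import difflib
from collections import Counter

def normalize(value):
    """Trim, lowercase, and collapse runs of whitespace."""
    return " ".join(value.lower().split())

def suggest_corrections(values, cutoff=0.8):
    """Map less common spellings to a more common close match.

    This is a simple similarity heuristic, not OpenRefine's own algorithm.
    """
    counts = Counter(normalize(v) for v in values)
    # Most frequent spellings first, so they act as the canonical forms.
    ranked = [value for value, _ in counts.most_common()]
    corrections = {}
    for i, candidate in enumerate(ranked):
        # Only compare against spellings ranked ahead of this one.
        matches = difflib.get_close_matches(candidate, ranked[:i], n=1, cutoff=cutoff)
        if matches:
            corrections[candidate] = matches[0]
    return corrections

# Illustrative values only.
cities = ["Chicago", "chicago ", "Chicgo", "Chicago", "Springfield"]
print(suggest_corrections(cities))
```

On this sample, the sketch suggests replacing ‘chicgo’ with ‘chicago’ and leaves ‘springfield’ alone. Suggestions like these should be reviewed by a person before being applied, since two similar-looking values can both be legitimate.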

OpenRefine also has comprehensive documentation on how to use it, including video tutorials. Here’s the video that introduces OpenRefine.

You can find out more information on OpenRefine on their website.

Google Fusion Tables

Once you have the data you’re interested in, you can load it into Google Fusion Tables in order to build apps that use the data.

Google Fusion Tables operates much the same way an Excel spreadsheet does. The difference is that you can use the Google Fusion Tables API to load data into your civic app. A good example of this is Derek Eder’s Searchable Map Template.

Do you have PDFs that need to be liberated? Interested in freeing the data?

If you have PDFs that you’d like to see data extracted from, you can fill out the Wufoo form here. If you’re interested in taking part in the PDF Liberation Hackathon, you can RSVP for the event here.

 

 

Hack Night Live: CivicLab

Today is the first Chicago OpenGov Hack Night of 2014!

Benjamin Sugar will introduce us to CivicLab, a new space that has opened up in Chicago to encourage ‘civic making.’ He’ll also touch on civic media and its relationship to activism and community organizing.

As always, we’ll be live streaming the event right here on this blog. The live stream will start at about 6:15 CST.

White House Safety Datapalooza Resources

On January 14th, the White House will host the Safety Datapalooza. The event will focus on safety data resources in the areas of transportation, food, occupational and consumer product safety, and disaster preparation and response.

Chicago’s civic innovation community has worked in these areas in some detail, and with more data being released, there’s potential for much more work to be done. Below, we’ve put together a collection of local and federal data sources, along with some examples of safety apps being used around the country.

Jim McGowan of the Chicago Red Cross demonstrates the volunteer connection app at their headquarters.

Join us for the PDF Liberation & OSM Hackathon on January 18 and 19

The Smart Chicago Collaborative will host the Chicago location for the PDF Liberation Hackathon 2014 at the offices of the Chicago Community Trust on Saturday, January 18, 2014, from 9 AM to 5 PM CST and on Sunday, January 19, 2014, from 9 AM to 1 PM CST.

We’re also hosting the Chicago Winter OpenStreetMap #editathon all day Saturday. Below the fold, we’ve got the details on both of these nationwide events.

Downtown Chicago crashes on OpenStreetMap, Photo By Steve Vance
