Smart Chicago’s FOIA Fest 2016 Data Crunching Session

On Saturday, March 12th I participated in the 4th annual FOIA Fest in Chicago. This is how the conference is described on its website:

This daylong conference, featuring more than two dozen journalists and other FOIA experts, kicks off national Sunshine Week, a time to celebrate the importance of access to public information. FOIAFest is made possible by the generous support of the Chicago Headline Club, Loyola University Chicago and the Robert R. McCormick Foundation.

I led a hands-on session on intermediate Microsoft Excel for researchers and journalists. Using Chicago’s 2012 – 2015 Lobbyist Compensation Data, we walked through analysis tips and tricks and built pivot tables. The goal was simple: show how someone might take a dataset and begin to tell a story with it.
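
For readers who would rather script than click, here is a minimal pandas sketch of the same pivot-table exercise. The file name and column names are illustrative assumptions, not the portal’s actual headers; check the CSV export before running it.

```python
# A minimal pandas sketch of the pivot-table exercise from the session.
# The file name and column names are assumptions -- check the actual
# headers in the data portal's CSV export before running this.
import pandas as pd

df = pd.read_csv("lobbyist_compensation_2012_2015.csv")

# Rows: lobbyists; columns: years; values: total compensation.
pivot = df.pivot_table(
    index="LOBBYIST_NAME",
    columns="YEAR",
    values="COMPENSATION_AMOUNT",
    aggfunc="sum",
    fill_value=0,
)

# Surface the ten highest totals for the most recent year.
print(pivot.sort_values(by=2015, ascending=False).head(10))
```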

You can see all the slides from the session here:

Here is the screencast video from the session:

A Taxonomy for Regional Data Ecosystems

This post is about designing a taxonomy for Chicagoland’s data ecosystem, and why a taxonomy would be useful for the growth and development of the ecosystem.

Taxonomies are used in many disciplines to organize knowledge. Carl Linnaeus’s taxonomy for classifying species — like Homo sapiens — is still used today, almost three hundred years later. It’s a good example of a nested hierarchy, where each category is a subset of a broader category. A strong taxonomy has a notation convention for classifying individual items and an organizing principle (or principles!) for putting items in relationships with each other. Structured lists, alphabetical order, numerical order, headers, indexes, tables of contents, the branching diagram, all kinds of finding aids — these things are so common now that most people take reference tools for granted. So what?

This is a diagram from Howard T. Odum’s 1971 study of Silver Springs, Florida, a pioneering effort to model the thermodynamic and material flows of an ecosystem. What are the elements of a data ecosystem?

The Ecosystem Project

The warrant for Smart Chicago’s “ecosystem project” is to build with, not for: to be at the service of people. Enter the Chicago School of Data. We’ve done interviews, surveys, and a convening, and we’re writing a book dedicated to how we shape Chicagoland’s data ecosystem to fit the needs of people in Chicagoland. Thanks to our documenters, the convening had an unprecedented amount of raw, in-the-moment documentation. We’ve classified these resources, archived them, and analyzed parts of the data already — data about how organizations in Chicagoland put data to work (how meta!). Now we need to make sure our work doesn’t collect dust. A taxonomy for Chicagoland’s regional ecosystem would turn our documentation into actionable intelligence.

I’m helping develop the structure of this taxonomy so it works for the community. The taxonomy is a way to format data about the organizations that participate in the regional ecosystem. With structured metadata, we’ll be able to manage the knowledge we have about these organizations, such as their sizes, missions, and skill gaps. Regardless of whether this specific structure (or a version of it) is used, an established taxonomy for managing knowledge about data ecosystems is a good idea. It will accelerate the hardest parts of building capacity, building technology skills, and building coalitions. Designed to get data to work for people, a simple reference directory for organizations in the ecosystem would help organizations find worn paths across technology skill gaps. It would help organizations quickly match themselves with other organizations facing similar challenges and sharing similar successes. A taxonomy acts as a backbone for this kind of reference directory.

Originally, we classified Chicago School of Data participants by industry. Participants were a university, a government department, a non-profit, or a private company. These buckets were useful when we were finding people to interview. We wanted to get as broad a cross-section of the landscape as possible. We didn’t want to miss the perspective of any of our partners. Over time, though, we found that these buckets weren’t specific enough for our purposes. They weren’t organized in a way that told us anything about how our partners really used data. Ideally, we wanted to look at an organization’s place in the ecosystem, its niche, and know exactly what support it needed around data and how else the organization could benefit — and be benefited by — the ecosystem.

Landscape Scan

I listened to and transcribed all the interviews and analyzed the pre-convening survey material. I tried to capture what were, by my lights, the main themes brought up during our scan of the ecosystem. I wrote a draft taxonomy in JSON. It’s okay if you don’t know JSON from a day salon. The drafts were guided by the idea that our data should work for people in the ecosystem, people we know and work with every day, and that it’d be easier to work with our project material if it were indexed. I classified ecosystem members, often non-profit organizations, as creators, consumers, and enablers of data. Wide nets, to be sure, so I introduced a few sub-classifications. The teased-out data ecosystem looked like the outline below (a code sketch of the same nesting follows it):

  • The data ecosystem has
    • Creators
      • Who open their data for free
      • Who open their data for a price
      • Who don’t open their data
        • Because of technical capacity
        • Because of cost
        • Because of legal agreements
        • Because of the public interest
        • Because of other reasons
    • Consumers
      • Who only consume free data
      • Who pay for some of their data
      • Who use data
        • To evaluate their own operations
        • To evaluate other organizations’ operations
        • And turn it into a digital product
        • And turn it into a printed product
    • Enablers who provide services and goods in the ecosystem such as
      • Volunteers
      • Consultants
      • Funded organizations
      • Paid organizations
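
The original JSON draft isn’t reproduced here, but a minimal sketch of the same nesting, written as a JSON-ready Python dict with illustrative key names (not the original draft’s), might look like this:

```python
import json

# A sketch of the draft taxonomy's nesting -- key names are illustrative,
# not the original draft's. Each empty list would hold the names of
# ecosystem members that fall into that niche.
draft_taxonomy = {
    "creators": {
        "open_for_free": [],
        "open_for_a_price": [],
        "closed_because_of": {
            "technical_capacity": [],
            "cost": [],
            "legal_agreements": [],
            "public_interest": [],
            "other_reasons": [],
        },
    },
    "consumers": {
        "free_data_only": [],
        "pay_for_some_data": [],
        "use_data_to": {
            "evaluate_own_operations": [],
            "evaluate_other_organizations": [],
            "make_digital_products": [],
            "make_printed_products": [],
        },
    },
    "enablers": {
        "volunteers": [],
        "consultants": [],
        "funded_organizations": [],
        "paid_organizations": [],
    },
}

print(json.dumps(draft_taxonomy, indent=2))
```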

This taxonomy gives you a better sense of the ecosystem’s niches, but it amounts to a bunch of redundant lists of participants. An institution as big as Chicago Public Schools, say, is clearly a creator, consumer, and enabler of data in the ecosystem. CPS shares some data while protecting other data. Different departments use data in different ways. One department might focus on general national trends in education policy while another focuses on budget allocation versus tenure within one district. CPS is a good test case. Its multi-faceted role shows that members of the data ecosystem aren’t easily classified.

Survey Responses

After the convening, held on September 19th and 20th, 2014, it was clear that the taxonomy needed revision. More detail was important, especially about how participants used data. Through a survey we found lots of ways data works in the ecosystem. I incorporated categories from our survey into the taxonomy, which changed the structure from:

  • The data ecosystem has
    • Consumers
      • Who use data
        • To evaluate their own operations
        • To evaluate other organizations’ operations
        • And turn it into a digital product
        • And turn it into a printed product

to look like this:

  • The data ecosystem has
    • Consumers
      • Who use data for
        • Resource allocation
        • Measuring impact
        • Advocacy and outreach
        • Understanding the needs of people served
        • Donor development
        • Operations
        • Research

Much better! This structure is more specific and it gives you a clearer picture of the many different ways organizations in the ecosystem use data. This set of categories is specific to our ecosystem; it was, after all, created in conversation with a specific group of partners, mostly from the Chicagoland region, for a conference. That said, my bet is that many organizations would say they use data for at least one of these seven reasons. It’s important for any taxonomy to be flexible enough for people to enter and update survey data, though. Surveys are one of the most important instruments in civic technology.
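
In the dict sketch above, this revision amounts to replacing the old “use_data_to” branch with the seven survey-derived categories (again, key names are illustrative):

```python
# The revised "consumers" subtree from the sketch above, with the old
# "use_data_to" branch replaced by the seven survey-derived categories.
consumers = {
    "free_data_only": [],
    "pay_for_some_data": [],
    "uses_data_for": {
        "resource_allocation": [],
        "measuring_impact": [],
        "advocacy_and_outreach": [],
        "understanding_needs_of_people_served": [],
        "donor_development": [],
        "operations": [],
        "research": [],
    },
}
```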

Most of our survey categories are not mutually exclusive. When you look at the responses from our 246 participating organizations, the network of responses they share is extremely dense, with many millions of possible combinations. Without more metadata, we may as well just read the raw survey results, to learn, for example, that desktop spreadsheets are the most used tool among respondents. These survey categories alone are not enough to see the ins and outs of the ecosystem, especially when there’s already other data available.

Moar Metadata

People have worked hard to classify organizations. We can build off their work. Our taxonomy can incorporate IRS codes, property tax identifiers, budget, size, whatever’s useful. There are several classification systems for economic entities, such as S&P’s Global Industry Classification Standard, FTSE’s Industry Classification Benchmark, and the UN’s International Standard Industrial Classification. The 501(c)(3) classification for a non-profit is one of twenty-nine types of 501(c) organization. The usefulness of these codes is measured by how many people actually use them to collect and organize data.

Categories developed by the National Center for Charitable Statistics (NCCS) are extremely useful for our ecosystem project and for other regional initiatives trying to get working inventories of their local data ecosystems. The NCCS categories are organized in the National Taxonomy of Exempt Entities (NTEE).

Incorporating our survey data, NCCS categories, and a few federal codes, an organization’s place in the taxonomy might look like the outline below (a code sketch of the same record follows it):

  • Organization name
    • Type
      • NTEE Code
        • A54
      • IRS Code
        • 501(c)(4)
      • EIN
        • 43-219431A
    • Size
      • Revenue
        • $1,000,000
      • Employees
        • 15
    • Scan
      • Uses data for
        • Resource allocation
          • Yes
        • Measuring impact
          • Yes
        • Advocacy and outreach
          • Yes
        • Understanding the needs of people served
          • Yes
        • Donor development
          • Yes
        • Operations
          • Yes
        • Research
          • Yes
      • Needs support in
        • Outreach
          • Yes
        • Analysis
          • No
      • Survey question…
        • Survey category #1
          • Value
        • Survey category #2
          • Value
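
Rendered in the same dict sketch, with invented values and guessed field names rather than the published schema, that record might look like this:

```python
# An illustrative organization record. Values are invented, and the field
# names are a guess at the structure, not the published JSON schema.
organization = {
    "name": "Example Organization",
    "type": {
        "ntee_code": "A54",        # National Taxonomy of Exempt Entities
        "irs_code": "501(c)(4)",
        "ein": "43-219431A",       # placeholder from the outline above
    },
    "size": {
        "revenue": 1000000,
        "employees": 15,
    },
    "scan": {
        "uses_data_for": {
            "resource_allocation": True,
            "measuring_impact": True,
            "advocacy_and_outreach": True,
            "understanding_needs_of_people_served": True,
            "donor_development": True,
            "operations": True,
            "research": True,
        },
        "needs_support_in": {
            "outreach": True,
            "analysis": False,
        },
    },
}
```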

An abstract version of this taxonomy is available in JSON here. The biggest design change reflects a guiding principle that our work should benefit people. During the first phase of outreach we cast the widest net, grouping organizations in buckets like “non-profit”, “university”, or “government”. These categories were useful when we were scanning the landscape and trying to include as many voices as possible. We changed the categories before the Chicago School of Data convening in September 2014 so that we could group organizations relative to how they worked with data. The JSON taxonomy classified organizations by whether they created, consumed, or enabled the data ecosystem. We added subfields and, after the convening, revised these subfields to include categories from our survey.

This version of the taxonomy does not use the “creator”, “consumer”, or “enabler” categories. It replaces these with organization names, which are now the ‘top’ categories in the structure. Under organization names I included a “Type” category with a few subfields, a “Scan” category to house our survey results, and a blanket “Size” category for other relevant fields, such as budget, employee numbers, and so on. It can be filled out with a simple Google Sheets template, and the template, in turn, could streamline the early research and design phases of capacity building.

Real working standards are hard to create. They’re products of collaborative work. Please add to or modify the taxonomy on GitHub.

Look out for the Chicago School of Data book, dropping January 2016.

Themes from #NNIP Dallas 2015

The National Neighborhood Indicators Partnership (NNIP) is a network of trusted city organizations committed to collecting, analyzing, and sharing neighborhood data in service to their communities. Partner organizations convene twice a year to share their work and collaborate on topics from policing to tracking investments in neighborhoods. Last week, I attended the NNIP meeting in Dallas, Texas.


It’s worth noting that the humans behind the number crunching and data visualizations were of extremely high quality. I was struck by the camaraderie, creativity, city pride, and good ole fashioned work ethic coursing through the NNIP culture.

It’s also worth noting that any conference or meeting that starts with a “what’s your favorite dataset?” icebreaker is just awesome.

Here’s a look at the major themes that arose throughout the three days of conversations, panels, and tours.

Neighborhood Data Needs Context

It was no accident that presenters from Dallas, Austin, and other cities had trouble making sense of neighborhood indicators without also nodding to historical and social context.

The first panel of the NNIP meeting was just as much about the origins of geographic inequity as it was about the data of geographic inequity. Nakia Douglas of the Barack Obama Male Leadership Academy, John Fullinwider of Lumin Education, Regina Montoya of the Mayor’s Poverty Task Force in Dallas, Theresa O’Donnell of the City of Dallas, and Donald Payton of the African American Genealogy Interest Group discussed the city’s “divides” – especially the prominent north-south divide. The panel pointed out that these modern inequalities stem from both historical and present racial discrimination.


Living out this need for context, NNIP scheduled tours in Dallas. I had the opportunity to visit the Cottages at Hickory Crossing, the city’s first Housing First community. The 50 single-occupancy homes, each approximately 400 square feet, are for residents who are homeless, mentally ill, or previously incarcerated. Future residents of the Cottages will have access to a suite of supportive on-site health and social services.

We walked through the construction, asked questions, and learned about the evaluation plans paired with the program. Even before residents have moved in, the Cottages are planning an evaluation of the initiative – tracking resident outcomes and savings to Dallas taxpayers, for example. Residents will be those who currently incur the highest costs to taxpayers by remaining homeless, less healthy, and less supported.


By the way, the Cottages at Hickory Crossing have their own Target registry if you would like to help furnish the homes!

NNIP Partners as Local Leaders & Conveners

Several NNIP partners discussed how they lead the conversations and collaborations around data within their cities. Many hold “Data Days” – events usually involving trainings and/or collaborations around neighborhood datasets of interest. Milwaukee’s Impact Inc. is holding their first Data Days this week. Charlotte, NC held their Data Days earlier in October.

One of the most interesting examples of data leadership? Every month Cleveland’s NNIP partner, the Center on Urban Poverty and Community Development at Case Western Reserve University, convenes all of the city’s organizations that collect data so they can share their work and build up a citywide data catalogue.

To accomplish their local work, NNIP partners form strong, trusted relationships with government agencies, police departments, and other public collectors of data. During the meeting, partners discussed what it took to open up in-demand local data and information for residents. One of my favorite insights came from Data-Driven Detroit (D3), which shared concrete advice for cities working with police departments to open up data.

Going forward, I hope NNIP partners can continue to discuss how data can build and repair community relationships in our cities. In Chicago there is so much work to do in this area. Data can be open and free, but if residents don’t trust it, there is still work to be done. Our own Kyla Williams spoke to this on social media while following the NNIP meeting remotely:

Data for Local Action

NNIP isn’t about data for data’s sake; it’s about turning data into informed local action. At the end of the day, if the data aren’t useful, used, or noticed, then they are worthless. It’s all about democratizing information for community empowerment and smart policy decisions. This theme echoed several times throughout the NNIP meeting. One example was Impact, Inc.’s mantra: “No data without stories, no stories without data.”

During the meeting, NNIP dared its partners to map their tech ecosystems. What does that mean? It means taking inventory of information lifecycles in your city and where residents and local organizations fall in those process maps. After all, it’s not enough to know how data is collected, analyzed, and repurposed; cities also need to know how neighborhood indicators and data stories can be turned into smart policy changes and smart local programs.

Here at Smart Chicago we’ve also been thinking about ecosystem definition, turning data into action, and formulating meaningful resident engagement around Chicago’s data work. Between the Array of Things, WindyGrid, and last year’s Chicago School of Data, there’s a lot to talk about! There are also essential Chicago partners with excellent neighborhood data: DePaul’s Institute for Housing Studies, the Woodstock Institute, and the Heartland Alliance. We need to work together to centralize our neighborhood data, engage with residents, and make sure that Chicago isn’t just a “smart city,” but a smart city that works for everyone.

NNIP as a Community of Learning

The NNIP meetings are called “meetings” and not conferences for a reason. There was a palpable roll-up-your-sleeves attitude across the participating partners. I heard stories of people traveling to friends and collaborators in other cities to help replicate successful work nationally. Again, this is a great group of humans.

Those of us visiting NNIP or attending for the first time certainly saw the value of these meetings. Collecting, using, and disseminating neighborhood data to improve your city can be slow work with long-term gains. Having a supportive national network facilitating peer learning seems like an essential ingredient to progress.


To see all NNIP documentation on the Dallas 2015 meeting, see their website.

City of Chicago Tech Plan Update

At Techweek, City of Chicago Chief Information Officer Brenna Berman announced an 18-month update to Chicago’s Tech Plan.

Chicago’s first Tech Plan launched in 2013 and laid out a strategy to establish Chicago as a national and global center of technological innovation.

Since its launch, Chicago’s civic technology community has made significant progress toward the goals of the Tech Plan.

As a civic organization devoted to improving lives in Chicago through technology, Smart Chicago is proud to be heavily involved in the implementation of Chicago’s Tech Plan.

Here are some highlights from the update.

Next Generation Infrastructure

Chicago is working with internal and external partners to improve the speed, availability, and affordability of broadband across the city. The City is preparing a Request for Proposals for companies to design, construct, implement, and manage a gigabit-speed broadband network.

In addition to broadband infrastructure, the City is also working to digitally connect its physical infrastructure. Part of this includes the launch of the Array of Things project, which will place a network of interactive, modular sensor boxes around Chicago to collect real-time data on the city’s environment, infrastructure, and activity for research and public use. (You can listen to their presentation at Chi Hack Night here.) You can already get up-to-the-hour updates on beach conditions thanks to sensors maintained by the Chicago Park District. The Department of Innovation and Technology has loaded the information onto the City’s data portal.
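
As a sketch of what that makes possible: the portal runs on Socrata, so recent readings can be pulled with a few lines of Python. The dataset identifier below is an assumption; search data.cityofchicago.org for the current beach sensor dataset.

```python
# Sketch: pull recent beach sensor readings from Chicago's data portal,
# which exposes datasets through Socrata's JSON API. The dataset ID is
# an assumption -- search data.cityofchicago.org for the current one.
import requests

DATASET_ID = "beach-water-quality"  # hypothetical; replace with the real ID
url = f"https://data.cityofchicago.org/resource/{DATASET_ID}.json"

resp = requests.get(url, params={"$limit": 5})
resp.raise_for_status()

for reading in resp.json():
    print(reading)
```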

Make Every Community a Smart Community

One of the major efforts of the civic technology community in Chicago is closing the digital divide in every neighborhood.

Much of the work in the coming months will focus on Connect Chicago. This effort, led by Smart Chicago in partnership with LISC Chicago, the Chicago Public Library, World Business Chicago, and the City of Chicago’s Department of Innovation and Technology, aligns citywide efforts to make Chicago the most skilled, most connected, most dynamic digital city in America.

Here’s more from the Tech Plan about the program:

As part of this initiative, program partners are creating a profile of a fully connected digital community that can be used as a benchmark and will provide best-practice toolkits and other resources to help all Chicago communities reach this benchmark.

If you’re interested in getting involved, you should reach out or join the Connect Chicago Meetup!

Another big part of the City’s strategy to close the digital divide in Chicago involves the Chicago Public Library. Libraries around the city already function as public computing centers and now they provide Internet to Go – a program where residents can check out laptops and 4G modems so that they can access the internet at home.

The City of Chicago and the civic tech community are also heavily focused not only on access, but on digital skills. The Chicago Public Library’s CyberNavigators program is set to be expanded, and Chicago Public Schools is working on implementing computer science curriculum at all schools.

On our end, Smart Chicago is working with Get In Chicago to run a youth-led tech program this summer. The conceptual model for this program is “youth-led tech,” which means teaching technology in the context of the needs and priorities of young people. Youth will learn how to use free and inexpensive web tools to make websites and use social media to build skills, generate revenue, and get jobs in the growing technology industry. They will also learn about all sorts of other jobs in tech: strategy, project management, design, and so on.

Effective Government

The City of Chicago’s Department of Innovation and Technology is also making great progress in using data to help city government be more efficient and effective. One of its first projects, WindyGrid, is a geospatial web application that strategically consolidates Chicago’s big data into one easily accessible location. WindyGrid presents a unified view of City operations—past and present—across a map of Chicago, giving City personnel access to the city’s spatial data, historically and in real time, to better coordinate resources and respond to incidents.

The City of Chicago will be open sourcing the project later this year on its GitHub page.

That’s not the only open source project the City has on the books. Chief Data Officer Tom Schenk Jr. recently spoke at Chi Hack Night about the City’s new system for predicting the riskiest restaurants in order to prioritize food inspections. The system finds critical food safety violations seven days faster. Aside from fewer people getting sick from foodborne illness in Chicago, there is another very important aspect of this work, one with national impact: the entire project is open source and reproducible from end to end.

Since the release of the Tech Plan, Smart Chicago has been working with the Chicago Department of Public Health on the Foodborne Chicago project. Foodborne listens to Twitter for tweets about food poisoning and converts them into city service requests. The Tech Plan update has some results from the project.

A study of the system, published by the Centers for Disease Control, found that during March 2013 – January 2014, FoodBorne Chicago identified 2,241 “food poisoning” tweets originating from Chicago and neighboring suburbs. The complaints identified 179 Chicago restaurant locations; at 133 (74.3%) locations, CDPH inspectors conducted unannounced health inspections. A total of 21 (15.8%) of the 133 restaurants reported through FoodBorne Chicago failed inspection and were closed; an additional 33 restaurants (24.8%) passed with conditions, indicating that serious or critical violations were identified and corrected during inspection or within a specified timeframe.

Chicago’s open data portal is also being expanded as part of the updated Tech Plan, having grown by more than 200 data sets over the last two years. Chicago was the first city to accept edits to select data sets through the City’s GitHub account.

Open311 is also getting an upgrade, with the City undergoing a procurement process to build a new 311 system. As part of the upgrade, the new system will be user-tested by the Civic User Testing Group.

Civic Innovation

A big part of the City’s strategy around civic innovation is supporting the work of civic technologists here in Chicago. As part of the Tech Plan, Smart Chicago will continue to support civic technology projects with developer resources, user testing, and financial support.

The Tech Plan also calls out our work with the Chicago School of Data. The two-day experience was wholly based on the feedback we received from dozens of surveys, months of interviews, and a huge amount of research into the work being done with data in the service of people. If you missed the conference, here are some of the key takeaways.

The Civic User Testing Group also plays a part in the Tech Plan and has recently been expanded to include all of Cook County.

Chicago Chief Information Officer Brenna Berman has stated that Chicago has the strongest civic innovation community in the country. A large part of that community is Chi Hack Night, now in its fourth year, with attendance regularly topping 100 people.

Technology Sector Growth

One of the thorniest issues for civic technologists is government procurement. The City has been meeting with different groups to talk about ways it can make it easier to buy products and services from smaller businesses and startups. (You can see Brenna Berman’s talk at the OpenGov Chicago Meetup here.)

As part of the Tech Plan, the City of Chicago is taking this on directly. Here’s the quote from the Tech Plan:

This summer, DoIT will release a Request for Qualifications for start-up and small-sized companies to join a new pool of pre-qualified vendors eligible for future City procurement opportunities. Companies who are deemed qualified will be placed into a pool and receive access to City contract opportunities in the areas of software application development and data analytics.

To further decrease the barriers facing smaller-sized companies in competing for City business, the City has modernized its insurance requirements to allow for pooled insurance plans. Start-ups that are members of an incubator, such as 1871, or smaller companies that come together for a group insurance plan, may now meet the City’s insurance requirements as a group. Insurance requirements were identified as a barrier to conducting business with the City in a series of listening sessions conducted over the past year with these companies.

This is a huge opportunity not only for civic tech companies but also for the city, which will be able to take advantage of the innovation coming out of these companies.

You can read the full tech plan here.

Denise Linn Joins Smart Chicago as Program Analyst

Today Denise Linn joins the Smart Chicago Collaborative as the Program Analyst. She will manage citywide ecosystem initiatives like Connect Chicago and the Chicago School of Data.

Denise comes to us from the Harvard Kennedy School, where she completed her Master in Public Policy degree and researched civic innovation and city-level Internet access projects. In 2015, she published “A Data-Driven Digital Inclusion Strategy for Gigabit Cities” and co-wrote the “Next Generation Network Connectivity Handbook.” She previously worked as an Economics Research Assistant in the Auctions & Spectrum Access Division of the Federal Communications Commission and is an alumna of the AmeriCorps VISTA program.

As Program Analyst, Denise will develop, execute, and manage the evaluation of Smart Chicago programming. She has primary responsibility for the day-to-day activities of Connect Chicago, the Chicago School of Data, and other data engagement projects like the Array of Things and the National Neighborhood Indicators Partnership.

You can follow her work on Twitter, LinkedIn, and SlideShare.

Please join me in welcoming Denise Linn.

Follow-up from On The Table 2015: Data Integrity for Small Businesses and Small Non-Profits

For On The Table 2015 I met with Heidi Massey and Ben Merriman over coffee and tea in the Loop. My idea for the conversation focused on creating an open consent form template — meaning, a web form users could fill out and then export as a Memorandum of Understanding (MOU), a Non-Disclosure Agreement (NDA), or a Data Sharing Agreement (DSA).

The different documents work in different contexts. Except when working with datasets protected by federal law (more on this later), calling an agreement between parties an MOU or a DSA is largely a matter of habit, while an NDA is a legally binding contract that says which types of confidential information should not be disclosed. Within legal limits, there’s nothing stopping you from writing agreements for your organization in the language and structure you prefer. Consider the purpose of the dataset, who has stakes in its integrity, and what might happen to the dataset in the future.

Organizations often keep boilerplate NDAs and MOUs on file. An employee, consultant, or other partner adds their details to the agreement. Both parties sign the agreement, and each keeps a copy. The agreement acts as a promise that, essentially, data stays where it belongs. Violations end the data sharing relationship.

We saw problems with agreements whose force relies on the color of law and a CYA — Cover Your Ass — mentality. So we tried to imagine how the language of the agreements could promote a culture of shared best practices. The conversation followed Heidi’s idea that small nonprofits have more in common with small businesses than they do with very large nonprofits. Here’s a plain English outline for a data agreement, which also works like a data integrity checklist.

People who are working with shared data should understand:

  • How the data is formatted for use. This means organizing the dataset into simple tables and, for example, using the same file type, naming conventions, and variable order.
  • The versions of the dataset. An original version of the dataset should be kept unmodified. Changes to the dataset should be made to a copy of the original version and documented in detail. The location of the original version of the dataset should be known but access restricted.
  • How long the data sharing agreement lasts. The dataset’s life cycle (how it gets created, where it can be transferred, and when, if at all, it is destroyed) is just as important as a straightforward timeline for deliverables.
  • How to keep information confidential. Avoiding accidental violations of the data sharing agreement is easier when everyone who works with the dataset is familiar with its terms of use. Access to a dataset can be controlled with password protection and defined read/write roles for users. Data cleaning is a crucial part of this process, ensuring that personally identifiable information is kept safe.
  • What costs come with sharing the data. This means being clear about who is in charge of updating the dataset, whether there are financial obligations associated with the data sharing process, and what risks come with breaches. Federal law regulates the sharing of datasets about school children (FERPA), medical information (HIPAA), and research on vulnerable populations (IRB review).
  • Specific use requirements. This is the nitty-gritty of data sharing. Use requirements specify whether a dataset can be shared with third parties, what other data (if any) can be linked to the dataset, and what changes can be made to the dataset.

Ben has written extensively about the consent process as it relates to the genetic material of vulnerable populations. A vulnerable person — say, a prisoner, child, or an indigenous person — consents to give a sample of their genetic material to a researcher for a study. The genetic material gets coded into a machine readable format and aggregated into a dataset with other samples. The researchers publish their study and offer the aggregated dataset to others for study.


Image from Anne Bowser and Janice Tsai’s “Supporting Ethical Web Research: A New Research Ethics Review”. Copyright held by International World Wide Web Conference Committee: http://dx.doi.org/10.1145/2736277.2741654.

As it stands, though, there is no way for people to revoke their consent once they give away their genetic material. The dilemma applies not just to genetic material but to any dataset that contains sensitive material. We thought people should have a say in what data counts as sensitive. An organization can limit how much data is shared in the first place, but technical and capacity limitations still stop the people “in” datasets from having a voice during the dataset’s full life cycle.

For more information, you can go to one of Smart Chicago’s meetups or review a list of informal groups here. The documentation is from last year’s Data Days conference, part of the Chicago School of Data project. There’s a large community in Chicago willing to teach people about data integrity. Check out Heidi’s resource list, which you can access and edit through Google.