Data Journalism

Written by Damian Trilling, Penny Sheets, & Frederic Hopp

Overview

One of the most important recent innovations in journalism is the increasing use of data. This development, often referred to as data journalism (or data-driven journalism), involves using computational techniques to work with, for instance, massive sets of documents (e.g., leaks) or government data, provided via APIs or scraped from the web.

In short, the increased availability of digital data, fueled by developments such as the trend towards open governance or the use of online media, has opened new ways for journalists to discover and research interesting and relevant stories. While the use of data in journalism is not new (there are examples of tables and data visualizations in newspapers from a century ago), the amount of data and their digital nature require new skills from journalists. At the same time, audiences are demanding greater transparency from news organizations, and the news cycle is ever-more choked with content, both of which challenge journalists to use data in ways that are creative, compelling, transparent, and innovative.

This course combines theoretical discussion of these developments and practical skills training. Next to reading and discussing relevant literature, students will be introduced to the programming language Python, which is widely used for retrieving data from the web and for analyzing both textual and numerical data. Additional topics include data visualization, how to find stories in large amounts of data, and dealing with messy data.

Goals

Upon completion of this course, students should be able to demonstrate that they…

  • are able to find an interesting and compelling story in a dataset;

  • are able to apply basic Python programming techniques learned in this course to process, analyze, and visualize the data;

  • are able to translate these skills and techniques into an original piece of data journalism of 750-1000 words;

  • are able, in groups, to identify an online tool relevant to data journalism and to teach their classmates how to use that tool in a short presentation and handout.

Course structure

We tackle these learning goals in three units: Gathering & Verifying Data (unit I); Analyzing Data (unit II); and Visualizing & Presenting Data (unit III). Each unit finishes with an assignment that draws on the skills and knowledge from that unit and builds toward your final assignment.

Additionally, there is a group project, where groups of students will select one of the many freely available online tutorials related to data journalism, learn from it, and prepare a short lesson & guide for their fellow students. This assignment is designed to maximize our knowledge sharing about this developing field during the course, as well as to familiarize you with the vast resources available to those of you who continue to pursue this sort of journalism in the future. More detailed information about the assignments will be made available in class.

Questions

During the Thursday sessions, we will have plenty of room for answering your questions. To collect questions—especially those about your individual projects—we will also use Canvas (please do not e-mail us technical questions directly; in almost all cases, others will benefit from the answer as well).

It is really difficult to answer programming-related questions without knowing the complete context of what you have done, and it is hard to find the exact solution if we cannot run and modify your code ourselves. We therefore introduce a set of guidelines for asking questions. Following them helps us answer your questions and makes sure that you get a good answer (instead of a lot of time-wasting clarification questions).

When asking a Python-related question, please always make sure to include the following elements:

  • Subject line (something more descriptive than “Error” or “Help needed”)

  • Environment (Google Colab or Anaconda; if the latter: on Linux, Mac, or Windows?)

  • Steps to reproduce (what have you done exactly?)

  • Expected Result (what do you think should happen?)

  • Actual Result (what actually happened?)

  • Visual Proof (screenshots, error messages, code)

  • Severity/Priority (does this prevent you from continuing to work (critical) or is it something you’d just like to know?)

In particular, always include:

  • The FULL error message (not just the last or first lines)

  • Your code (which usually boils down to your .ipynb file)

Keep in mind that we do not offer help with specifics of Microsoft Windows or macOS (such as file locations).

The easier you make it for us to see/reproduce your error, the more likely you are to get quick help.
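One way to gather the environment details and the full error message in one go is sketched below. This is only an illustration, not a required course tool: the function names and the failing example are hypothetical, and the Google Colab check is a common heuristic (it assumes Colab has already loaded its `google.colab` module) rather than an official test.

```python
import platform
import sys
import traceback


def describe_environment():
    """Return a short text summary of the Python environment."""
    # Heuristic (assumption): Colab notebooks have this module loaded.
    in_colab = "google.colab" in sys.modules
    lines = [
        f"Python version: {platform.python_version()}",
        f"Operating system: {platform.system()} {platform.release()}",
        f"Running in Google Colab: {in_colab}",
    ]
    return "\n".join(lines)


def full_error_report(func):
    """Run func() and return the complete traceback as text if it fails."""
    try:
        func()
    except Exception:
        # format_exc() returns the whole traceback, not just the last line
        return traceback.format_exc()
    return "No error raised."


def broken_example():
    # Hypothetical failing step, used only to demonstrate the report
    return int("not a number")


report = describe_environment() + "\n\n" + full_error_report(broken_example)
print(report)
```

You could paste the printed text into your question alongside your notebook, replacing `broken_example` with the step that fails for you.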

Asking Questions via Hypothesis

A very effective way to ask questions related to code or statistical analysis is via the Hypothesis annotation tool (hypothes.is). It lets you highlight a portion of code (or text) in our notebooks and attach a public comment or question. Create an account at https://web.hypothes.is/ and start highlighting in our online book. Hint: this is also a great way to keep personal annotations that only you can see.