Learning spreadsheet basics is one thing, but establishing data as a steady practice is another. Those who teach data journalism agree that getting the most out of these skills takes two things: establishing habits and collaborating across all levels of staff.
Establishing the ‘data state of mind’
Hilary Niles, who works as a data freelancer, said she would like to think sustainability can happen from the ground up – originating with reporters – but there needs to be editorial support. “There needs to be buy-in from the editors,” she said.
One way to support that grassroots sustainability is establishing a “data state of mind.” IRE and others use this phrase to describe the awareness that if you are looking into a topic, there is data on it somewhere out there. And not only that, but you can get it and examine it.
Derek Willis at ProPublica recommended that journalists practice simply requesting information in the form of data. Budgets and crime rates, he said, are examples of information every newsroom should be getting as data, making it easier to examine later. In addition, governments are releasing information in this format more and more.
Getting into a mindset of asking for data is one of the most important factors in becoming a data-savvy newsroom.
Data journalism “is more a way of thinking” than merely a technique, Bengtsson said. When reporters first started using phones to report stories, they didn’t say, “I’m now going to do telephone journalism,” she said. It was merely a technological progression in what they were already doing.
Keng, the data consultant, said a little data knowledge goes a long way. For example, Keng said, reporters might not realize how easy it is to make their own charts or web presentations, reducing the workload on the graphics or IT teams.
All of these advantages, however, require an initial time investment, followed by continued time allowances from higher-ups. Without that sustained support, training comes to naught. Editors who suspect data is of no use, and don’t afford their reporters enough time to work on data stories, can make those suspicions a self-fulfilling prophecy.
How to grow and sustain data capabilities, despite financial and staff limitations
Besides fears that data journalism is too hard or too time-consuming, there’s a prevailing idea that it’s prohibitively expensive. Some of this is rooted in fact, but the expense is less a feature of data work itself than of newsrooms that haven’t kept up with changing technology in general.
“Unfortunately, in many newsrooms there’s been successive resistance to lots of different kinds of technology,” Huffington Post technology and society editor Alex Howard said. “The very rapid change in delivery and distribution, which is now owned by tech companies, has put a lot of these papers in a difficult place.”
Newsrooms experience what Howard called “technical debt” – an unavoidable inheritance of the technology that was purchased and used by the generation before, for better or worse.
This isn’t just an issue for data work, then, but for all newsroom technology, from computers to email to social media. Content management systems, the programs that organize and publish content, are one example: many were designed for print news.
One way to overcome this technical debt, Howard said, is to pull from outside sources like GitHub, a website that hosts a universe of open-source tools: tools whose source code is public and that are generally free to use. The only obstacle, he said, is the understanding and skills needed to put them to work.
The most common tool for data use is one that most newsrooms probably already have installed, no matter how deep their technical debt: Microsoft Excel. While Excel isn’t technically a free tool, it’s part of the Microsoft Office suite, which has been installed on practically every office computer since the ’90s.
“Most of our work is done in Excel,” said Flor Coelho, a data and multimedia editor at La Nacion.
To expand beyond Excel’s borders, though, the team experimented with free tools like Google Sheets, an alternative to Excel, and Google Fusion Tables, software for making data visualizations.
Excel is a paid program, but with free alternatives like these, and thanks to rapidly spreading technology and the open-source mindset, a reporter could go through her whole career using only free tools.
Niles said she mainly uses free tools like OpenRefine, which cleans up and organizes messy data, and Silk, a website for turning data into graphics, alongside Excel for analysis.
“I may use Navicat Essentials (a paid app) to do some joining and analysis, if there’s something I can’t do easily in Excel,” she said. This $40 program helped Niles score one of her best stories: that the state of Vermont had no idea how much it was spending on IT services.
Niles got the story by making her own database out of public data that the state didn’t look at. “There’s just insights sitting there on the table waiting for somebody to find them,” she said. “We identified at least six follow-up stories to mine from the database.”
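The joining and totaling that Niles describes can be sketched with Python’s built-in sqlite3 module, which is free. This is only an illustration, not her actual method or data: the table names, columns and dollar figures below are invented, and a real project would load records obtained from the state.

```python
import sqlite3


def build_spending_db(contracts, payments):
    """Load two hypothetical public datasets into an in-memory database."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE contracts (contract_id TEXT PRIMARY KEY, agency TEXT, category TEXT)"
    )
    conn.execute("CREATE TABLE payments (contract_id TEXT, amount REAL)")
    conn.executemany("INSERT INTO contracts VALUES (?, ?, ?)", contracts)
    conn.executemany("INSERT INTO payments VALUES (?, ?)", payments)
    return conn


def spending_by_agency(conn, category):
    """Join payments to contracts and total the spending per agency."""
    rows = conn.execute(
        """
        SELECT c.agency, SUM(p.amount) AS total
        FROM payments AS p
        JOIN contracts AS c ON c.contract_id = p.contract_id
        WHERE c.category = ?
        GROUP BY c.agency
        ORDER BY total DESC
        """,
        (category,),
    )
    return rows.fetchall()


# Made-up sample records, for illustration only.
SAMPLE_CONTRACTS = [
    ("C1", "Health", "IT"),
    ("C2", "Transportation", "IT"),
    ("C3", "Health", "Construction"),
]
SAMPLE_PAYMENTS = [("C1", 1000.0), ("C1", 500.0), ("C2", 750.0), ("C3", 9000.0)]

conn = build_spending_db(SAMPLE_CONTRACTS, SAMPLE_PAYMENTS)
it_totals = spending_by_agency(conn, "IT")
print(it_totals)
```

Once the records sit in one structured database, each follow-up question becomes another short query rather than a new round of manual tallying.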
Not only did her data work provide a wealth of stories, Niles said, but her client, Vermont Public Radio, received accolades. A few donors called to say that Niles’s story was why they were renewing their memberships for the station.
Keng, the data consultant, said it’s important to convey to newsrooms how data projects can help their bottom line. If their business model is advertising-based, for example, data projects can increase their traffic. If they depend on funding from foundations, data can make a story more widely cited or published.
Take advantage of free tools
Ideally, journalists can in turn contribute to the open source community. An example is a tool called Tabula, which finds data tables inside PDFs and extracts them into a spreadsheet format that can be opened in Excel and analyzed. A team of journalists created the program with help from organizations like La Nacion and the Knight Foundation. The journalist-coders made the tool because they needed one, then shared it with other journalists and anyone else who needs to scrape data out of a PDF.
Free and open-source tools like Google Sheets and Tabula are a great way to start overcoming your newsroom’s technical debt. To sustain data work, though, actual foundations are needed. This can mean an investment of time and staff labor, where employees learn to use data for their work. It can also mean a financial investment, in the form of new software or tools.
Willis, at ProPublica, said this cost isn’t as prohibitive as it used to be. In the past, he said, newsrooms had to pay for the software needed to make data-based projects like maps.
“But for most things these days,” he said, “neither the software nor the hardware cost a lot of money to do. Technical costs are much, much less than they used to be.” When he was at The New York Times, Willis said, most of the software they used on the data team was open-source and free.
“(But) there’s no escaping that for many of these skills that if you don’t have them then there is an upfront cost in time,” he went on. “A lot of times people will get scared off by the initial investment. (But) it’ll pay off both in terms of the kinds of stories you’re able to do, and being able to build on those kinds of stories.”
All too often, he said, editors think of data work as a one-off project, like a feature story, where the reporter invests several days of work and then washes his hands of the project.
“I think that’s a mistake, because I think the payoff is definitely not ephemeral,” Willis said.
Willis gave the example of his computer program that checks the FEC website for him. “That gives me a competitive advantage as a reporter,” he said.
Like Niles, who derived half a dozen stories from one database, Willis has made himself – and, by extension, his newsroom – more efficient.
“If you’re doing something repetitive with a computer, then you’re probably doing it wrong,” he said. “There’s probably a better way to do it.”
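A program that checks a website on a reporter’s behalf can be quite small. The sketch below is not Willis’s program, and it names no real FEC address. It is a generic illustration, using only Python’s standard library, of one common approach: download a page, compute a fingerprint of its contents, and compare that fingerprint with the one saved from the previous check. The sample page contents are made up.

```python
import hashlib
import urllib.request


def fetch_page(url):
    """Download a page and return its raw bytes."""
    with urllib.request.urlopen(url, timeout=30) as response:
        return response.read()


def fingerprint(content):
    """Return a SHA-256 hex digest that changes whenever the content does."""
    return hashlib.sha256(content).hexdigest()


def has_changed(content, previous_fingerprint):
    """Compare new content with the last stored fingerprint.

    Returns a (changed, current_fingerprint) pair so the caller can
    save the new fingerprint for the next check.
    """
    current = fingerprint(content)
    return current != previous_fingerprint, current


# Offline demonstration with made-up page content (no network needed).
_, first = has_changed(b"3 filings listed", None)
unchanged, _ = has_changed(b"3 filings listed", first)
changed, _ = has_changed(b"4 filings listed", first)
print(unchanged, changed)
```

In practice a reporter would pass the result of `fetch_page(url)` to `has_changed`, run the script on a schedule and save the latest fingerprint to a file between runs. Hashing the whole page flags any change at all, including a timestamp in the footer, so a real monitor would likely compare only the part of the page that matters.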
Bridge the gap between reporters and editors
Once committed, reporters still need time and space to practice their skills, and that message doesn’t always reach editors and publishers.
One way to get buy-in from higher-ups, and so make data work sustainable, is to have “measures of success,” Huffington Post editor Alex Howard said. Measurements like Google rankings, web analytics, ad revenue and monetization can all influence the higher-ups at an outlet, and guide editors on where to allocate resources.
Data journalist Hilary Niles said when she’s pitching a data story to a newsroom, she always lays out how it will benefit their bottom line. She posits that a data visualization adds value to a text story, while previously unused data can drive web traffic.
Niles is a fan of creating databases, uploading them and then maintaining them. Lots of data, like the salaries of public figures, local budgets and crime statistics, can be updated every year with new data from the state.
At one point, the Texas Tribune was getting the majority of its web traffic from one such database: a large interactive spreadsheet of salaries of public officials.
For her story for Vermont Public Radio earlier this year, Niles obtained data from the state on its IT spending. She organized it, analyzed it and did some shoe-leather reporting, all of which led to at least six more stories. “Putting it into a structured format allowed for much keener analysis that revealed a virtual mine of public interest stories,” Niles said.
Flor Coelho at La Nacion agreed that you can get many stories from one database, and that this can in turn make your reporting, and data reporting in general, more time- and cost-effective.
For instance, La Nacion keeps an updated database of complaints phoned in to the Buenos Aires government. Whenever a new mayoral election comes along, they can ask the database, “What are people most upset about with the way the city is run?”
“So that gives you original content,” Coelho said. “You can ask different questions to the databases.”
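To make “asking different questions” of one database concrete, here is a minimal sketch using Python’s built-in sqlite3 module. It does not reflect La Nacion’s actual database: the table layout, complaint topics and records are invented for illustration.

```python
import sqlite3


def load_complaints(records):
    """Load (year, neighborhood, topic) records into an in-memory table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE complaints (year INTEGER, neighborhood TEXT, topic TEXT)"
    )
    conn.executemany("INSERT INTO complaints VALUES (?, ?, ?)", records)
    return conn


def top_topics(conn, year, limit=3):
    """Question 1: what did people complain about most in a given year?"""
    rows = conn.execute(
        """
        SELECT topic, COUNT(*) AS n
        FROM complaints
        WHERE year = ?
        GROUP BY topic
        ORDER BY n DESC, topic ASC
        LIMIT ?
        """,
        (year, limit),
    )
    return rows.fetchall()


def topic_by_neighborhood(conn, topic):
    """Question 2: where is one kind of complaint concentrated?"""
    rows = conn.execute(
        """
        SELECT neighborhood, COUNT(*) AS n
        FROM complaints
        WHERE topic = ?
        GROUP BY neighborhood
        ORDER BY n DESC, neighborhood ASC
        """,
        (topic,),
    )
    return rows.fetchall()


# Made-up sample records, for illustration only.
SAMPLE = [
    (2015, "Palermo", "potholes"),
    (2015, "Palermo", "street lighting"),
    (2015, "Recoleta", "potholes"),
    (2015, "Recoleta", "trash collection"),
    (2015, "Palermo", "potholes"),
    (2014, "Palermo", "trash collection"),
]

conn = load_complaints(SAMPLE)
print(top_topics(conn, 2015))
print(topic_by_neighborhood(conn, "potholes"))
```

The same table answers both questions. Only the query changes, which is why a maintained database can keep yielding original stories.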
Establishing databases that can be used again and again can make data journalism sustainable, just as a base knowledge of data can be applied again and again to different stories and sources.
Starting with free – or common – tools like Excel and Google Sheets can help publishers overcome technical debt, even though it may be necessary to pay for some software down the road. What matters more to sustained data work, though, is not the money spent on fancy software but the time given to reporters to practice, learn and experiment.