Artificial Informer - Issue One

An Interview with Chase Davis: Journalism + Machine Learning Pioneer

Chase Davis is a journalist based in his hometown, Minneapolis, where he's the Senior Digital Editor at the Star Tribune. He got his start working at The Houston Chronicle, eventually co-founding a media technology consultancy, Hot Type Consulting, which brought him to help launch The Texas Tribune. He's probably best known for leading the Interactive Desk at The New York Times and for being a major presence at NICAR conferences. He's spoken on topics including advanced computational techniques, like machine learning (ML), and innovation at small, local journalism organizations.

We talked to Chase for this first issue not only because he was way ahead of the curve on using ML in the newsroom but also because he has a wide range of practical experience both in the trenches of journalism and in the tech industry. He was also at The New York Times at a pivotal moment during its storied and successful transition to digital. We reached him via email for this interview. Our questions are in bold.

Just to get started, can you give us some background on how you got into journalism and where your interest in it comes from?

I followed a pretty typical path into journalism: I was the editor of my high school paper, went to journalism school at Mizzou, worked at student papers and did a bunch of internships, and kind of moved on from there. As for where the interest came from, I think I just liked the idea of having a career where constant learning was part of the job description.

As far as I understand, you don't have a formal computer science degree/education. How did you pick up your data and programming skills?

Not only do I not have a formal computer science degree — I basically failed the only computer science classes I ever took. I started learning to code for fun in middle school, and I kept it up by pursuing personal projects. In college I figured out that using data, basic programming, web scraping, etc., could help me get scoops as a reporter. And then, almost by accident, I discovered IRE and NICAR, which gave me an outlet for refining those skills and applying them more broadly to journalism.

Your ML presentation at NICAR 2013 with Jeff Larson was really far ahead of its time. Can you explain how you got into ML and applying it to journalism?

I started getting interested in machine learning pretty soon after I graduated college in 2006. It just seemed like an interesting new frontier with big potential in the world of data journalism. So I started studying it in my spare time, checking out books from libraries, trying to make sense of them, and making little prototypes. At some point I ended up on a couple of working groups, notably one put together by Brant Houston at the University of Illinois, between the journalism school and the National Center for Supercomputing Applications there. Those really convinced me there was some potential there.

For a while, machine learning in journalism felt like a solution in search of a problem. But there have been a few problems in recent years where knowing some machine learning has really been helpful. Having some background in the subject has made it easier to step in, see some of those opportunities, and put together solutions that make sense in the context of news.

A lot of focus is on where ML should be used in journalism. Are there any areas in journalism that you feel ML shouldn't be used?

Most of them. At this point I think the problem space for supervised learning, in particular, is relatively small and well defined: namely a handful of data cleaning, parsing and classification problems that defy rules-based approaches. Unsupervised learning is more of a question mark. But for most problems in journalism at this point, machine learning is both overkill and even a little dangerous — especially if you don't understand the techniques well enough to adjust for some of their weaknesses.

Are there any media organizations that you think are doing the best in using these new technologies?

Beyond the WaPo, NYT, ProPublica, etc., I think the Data Desk at the LA Times has done some incredible stuff.

Do you see understanding statistics and computer science as becoming a requirement for future journalists, if it's not already?

We're getting to the point where a basic understanding of things like probability and technology should be a requirement for any professional human, journalist or not. But do I think every journalist needs to be able to code? No. Should they all be able to run a regression analysis? No.

That said, I do think we need to stop with the willful ignorance: "I'm a journalist, I'm bad at math!" or "I don't do computers!" Not only are assertions like that self-defeating, they sound increasingly absurd as technology's role in society continues to grow.

Plus, our job is basically to learn things for a living. We don't just get to throw up our hands when the learning gets hard.

It's not ML, but since Excel still dominates data work in journalism, I like to ask people who know about advanced computational techniques: what do you think about the prevalence of Excel in the newsroom?

Excel has been the gateway drug for literally thousands of journalists to begin exploring data analysis in the newsroom. Without the right people learning Excel at the right times, and organizations like IRE/NICAR being there to teach them, I'm almost horrified to think of all the incredible stories that never would have been told.

Any advice for people looking to get into journalism in 2019 who can't go to college?

Do good work and don't be a jerk.

Seriously. In journalism more than most professions, it's all about the work. If you have a demonstrated history of doing good stuff, you're a humble, curious and decent human being, and you plug into the communities of people who do this kind of thing on a daily basis, there's a place for you in this industry.