Now You Can Mine Two Million Hours of Network News
Television Explorer is an online tool that lets anyone search for particular keywords from 150 television stations as far back as 2009.
A quiet little blog post over at the Internet Archive this week is worth paying attention to for anyone interested in how media is evolving in the 21st century. It announces the release of a massively powerful big data tool that lets anyone mine the output of network television news over the last six-and-a-half years.
It's an amazing little chunk of technology, really: The Television Explorer lets you punch in a given keyword - "Trump," say - and get back charts and interactive timelines that track when and how often the word popped up in U.S. television news as far back as 2009.
The tool works by scanning the closed captioning feed of 150 networks and individual affiliate stations back to 2009. That's nearly two million hours of TV news totaling more than 5.7 billion words. The Television Explorer tool is a collaboration between the people behind the Internet Archive's existing Television News Archive and the nonprofit open database initiative known as the GDELT Project (Global Database of Events, Language, and Tone).
The search tool is an expanded version of the 2016 Candidate Television Tracker, which tracked how TV news covered the various presidential candidates throughout the year.
Now, the system has been tweaked to facilitate even more sophisticated searches. You can combine a keyword with other words or phrases to narrow down the returns: for example, "Trump" and "tax returns." By filtering for particular networks, you can see how often CNN covered the topic, as opposed to Fox News, Bloomberg, PBS or Al Jazeera America.
Clearly, this is useful for assessing day-to-day political coverage in the media. But in its new incarnation, the tool opens up options to scan for any keyword or keyword combination, and adds some powerful new features.
With the previous Television News Archive interface, returns came back at the level of an hour or half-hour show. But the new tool breaks down results at the sentence level. Click an individual return, and you're shuttled to an interactive timeline feature where you can click on the video itself and get just the snippet you're searching for - an excerpt from CNN's "Anderson Cooper 360," for instance, or "Fox News Sunday."
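For readers curious what a sentence-level caption search involves, here is a minimal Python sketch of the idea: match a keyword (optionally alongside a second phrase) against individual caption sentences, then tally the hits by network. It is an illustration only, not the Television Explorer's actual code or API. The `CaptionSentence` record, the `search_captions` and `count_by_station` helpers, and the sample sentences are all invented for this example.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class CaptionSentence:
    """One closed-caption sentence (hypothetical record layout)."""
    station: str
    show: str
    text: str


def search_captions(sentences, keyword, context=None):
    """Return sentences containing `keyword` and, if given, `context`.

    Matching is a plain case-insensitive substring test, so "trump"
    would also match "trumpet"; a real system would tokenize the text.
    """
    keyword = keyword.lower()
    context = context.lower() if context else None
    hits = []
    for sentence in sentences:
        lowered = sentence.text.lower()
        if keyword in lowered and (context is None or context in lowered):
            hits.append(sentence)
    return hits


def count_by_station(hits):
    """Tally matching sentences per station."""
    return Counter(sentence.station for sentence in hits)


# Invented sample captions, for illustration only.
SAMPLE = [
    CaptionSentence("CNN", "Anderson Cooper 360",
                    "Trump has not released his tax returns."),
    CaptionSentence("CNN", "Anderson Cooper 360",
                    "Trump held a rally today."),
    CaptionSentence("Fox News", "Fox News Sunday",
                    "We asked about the tax returns Trump promised."),
    CaptionSentence("PBS", "PBS NewsHour",
                    "Robots are changing factory work."),
]

if __name__ == "__main__":
    narrowed = search_captions(SAMPLE, "Trump", context="tax returns")
    for station, count in count_by_station(narrowed).items():
        print(f"{station}: {count}")
```

Because each hit is a single sentence rather than a whole show, a result like this could point straight to the matching moment in the video, which is the difference the new interface is built around.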
To test the Television Explorer tool against my own paranoid interests, I ran some numbers. It turns out that the term "artificial intelligence" was used only 154 times on CNN in the last five years. Whew. The word "robots" clocked in at 2,012. Uh-oh. "Armageddon" popped up 847 times. Hmm.
The GDELT Project is largely the brainchild of Kalev Leetaru, professor at Georgetown University and a marquee name in the field of big data. Over at the GDELT website, Leetaru narrates a series of intriguing videos concerning the power of cutting-edge data mining:
"There are as many words posted to Twitter every single day today as in the entire New York Times in the last half century," Leetaru says in the opening video. "That's one of the most amazing things about the time period we live in today, just how much data is out there. The fundamental transformative thing to me, today, is that for the first time in our history we have computers that can actually process that.
"It's all unexplored, I feel like Indiana Jones some days."