“Each desk labors over section fronts, but pays little attention to promoting its work on social media.”
— New York Times Innovation Report, 2014
During a recent appearance in the IACS seminar series, New York Times Chief Data Scientist Chris Wiggins noted that virtually all news organizations have become startups: they are all firms “in search of a scalable business model.” In response to the challenges of the digital age, many “traditional” news organizations have stepped up efforts to promote their content on social media sites such as Facebook and Twitter, treating them as a new channel for attracting traffic and readers. We are interested in how these efforts differ among news organizations and whether we can predict their success.
Stated formally, we investigate two broad questions about news organizations' activity on social media. The first is descriptive: how does each organization use each social network, and how do their social media presences differ? The second is predictive: which measurable features of social media activity predict the popularity of news content? These two questions frame the analyses that follow.
We have identified 25 news organizations across several categories (newspapers, cable channels, wire services, etc.) to analyze. They are:
The Boston Globe, The Los Angeles Times, The New York Times, The Wall Street Journal, The Washington Post, USA Today, BBC, The Daily Mail, The Guardian, CNN, Fox News, MSNBC, ABC News, CBS News, NBC News, The Daily Beast, The Huffington Post, Slate, Agence France-Presse, The Associated Press, Reuters, Newsweek, Time, Yahoo News, and NPR News.
Several of last year's CS 109 projects used social media data, and researchers have begun to mine such data to startling ends. Rather than mine aggregate, general-purpose data, we focus on a narrower set of accounts across two platforms (Facebook and Twitter) to uncover the particular social media strategies each organization pursues.
While we believe our analysis is unique in its breadth, it is certainly inspired by earlier social media analyses of news content. Brian Abelson’s Pageviews Above Replacement paints a fascinating picture of the pageviews that accrue to different types of content @nytimes shares on Twitter, and it offers insight into strategic changes that might yield more traffic. The Times' leaked 2014 Innovation Report also discusses the paper's social media strategy extensively. Finally, Bitly analyzed social media traffic to its own links in 2012; its analysis bears directly on our observations about how posting time influences social response.
At a high level, our analysis draws on four primary data sources: Twitter, Facebook, Bitly, and a custom-built dataset of the final destination of each link our news organizations post. The methods used to build each dataset (a combination of APIs and web scraping) are discussed in the next section.
For context, the table below shows the number of records we retrieved from each data source:
| Data Source | Records |
|---|---|
| Twitter (Tweets) | 119,010 |
| Facebook (Posts) | 53,493 |
| Articles Matched (Between Facebook & Twitter) | 23,633 |
| Bitly (Click Statistics) | 137,675 links (23 organizations) |
| URL Paths (Link Destinations) | 196,372* |
*Many social media posts, especially on Facebook, contain more than one link, which is why we found more links than posts.
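Both the article-matching row and the footnote depend on pulling links out of post text and comparing them across platforms. This section does not describe the exact matching procedure, so the sketch below is only an illustration under stated assumptions. It assumes links have already been resolved to their final destinations. It treats two posts as referring to the same article when their normalized URLs are equal. The normalization rules are hypothetical choices made for this sketch: lowercase the host, drop `www.`, treat `http` and `https` as equivalent, trim trailing slashes, and discard `utm_*` tracking parameters. The helper names `extract_links`, `normalize`, and `match_articles` are also hypothetical.

```python
import re
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Simplified pattern (assumption): an http(s) URL runs until the next whitespace.
URL_RE = re.compile(r"https?://\S+")


def extract_links(text):
    """Return every http(s) URL in a post, trimming trailing punctuation."""
    return [u.rstrip(".,;:!?)\"'") for u in URL_RE.findall(text)]


def normalize(url):
    """Canonicalize an already-resolved URL using hypothetical matching rules."""
    parts = urlsplit(url)
    host = parts.netloc.lower()
    if host.startswith("www."):
        host = host[4:]
    # Drop utm_* tracking parameters; keep any others in their original order.
    kept = [(k, v) for k, v in parse_qsl(parts.query) if not k.startswith("utm_")]
    path = parts.path.rstrip("/") or "/"
    # Force https so http and https variants of the same page compare equal.
    return urlunsplit(("https", host, path, urlencode(kept), ""))


def match_articles(tweets, fb_posts):
    """Return the normalized URLs that appear on both Twitter and Facebook."""
    tw = {normalize(u) for t in tweets for u in extract_links(t)}
    fb = {normalize(u) for p in fb_posts for u in extract_links(p)}
    return tw & fb


if __name__ == "__main__":
    tweets = ["Read more: https://www.nytimes.com/2014/12/01/us/story.html?utm_source=twitter"]
    fb_posts = [
        "Two stories today: https://nytimes.com/2014/12/01/us/story.html "
        "and https://nytimes.com/2014/12/02/world/other.html."
    ]
    print(sum(len(extract_links(p)) for p in fb_posts))  # one post, two links
    print(match_articles(tweets, fb_posts))
```

In this toy example the single Facebook post yields two links. This mirrors the footnote's point that link counts can exceed post counts. Matching on normalized destinations, rather than on raw shortened links, is what lets a tweet and a Facebook post that use different shorteners or tracking parameters count as the same article.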