Technical Details

Our work stands on the shoulders of giants. These are the tools we found indispensable, and we think you will too.

Source Code

Source code is available on GitHub. Please fork and extend as you like. We would appreciate hearing about anything you change!

Data Engineering

Twitter, Facebook, and Bitly all provide helpful APIs that we used to retrieve their content and data. We used Twython to access the Twitter API and Beautiful Soup to scrape older tweets. Python’s json module and the excellent pandas library were indispensable to our engineering effort.
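The json-to-pandas step looks roughly like this. A minimal sketch with a made-up payload: the field names here are illustrative, not the exact Twitter API schema.

```python
import json

import pandas as pd

# Illustrative payload; real API responses carry many more fields.
raw = json.dumps([
    {"id": 1, "text": "First tweet", "retweet_count": 3},
    {"id": 2, "text": "Second tweet", "retweet_count": 7},
])

tweets = json.loads(raw)       # json module: string -> list of dicts
df = pd.DataFrame(tweets)      # pandas: list of dicts -> DataFrame
print(df["retweet_count"].sum())  # quick sanity check on the parsed data
```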

Python veterans will be unsurprised that much of our analysis employed pandas, matplotlib, and Seaborn.
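A typical analysis cell looked something like the following sketch: pandas does the aggregation, matplotlib draws it (Seaborn mostly supplied nicer defaults on top). The data here is hypothetical, just to show the pattern.

```python
import matplotlib
matplotlib.use("Agg")  # render off-screen; no display required
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical engagement numbers, for illustration only.
df = pd.DataFrame({
    "outlet": ["A", "A", "B", "B"],
    "shares": [10, 30, 20, 40],
})

summary = df.groupby("outlet")["shares"].mean()  # pandas aggregation

ax = summary.plot(kind="bar")                    # matplotlib bar chart
ax.set_ylabel("mean shares")
plt.tight_layout()
plt.savefig("shares.png")
```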

Clustering & Prediction

We performed principal component analysis and built our predictive models with scikit-learn. For prediction in particular, we relied on RidgeCV and RandomForestRegressor for our regression problems. We also explored support vector regression (SVR) but opted not to use it in our final work.
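The scikit-learn workflow can be sketched as follows. The data is synthetic and the component counts and alpha grid are placeholders, not our actual settings.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import RidgeCV
from sklearn.pipeline import make_pipeline

# Synthetic data: one high-variance feature drives the target.
rng = np.random.RandomState(0)
X = rng.randn(200, 10)
X[:, 0] *= 5
y = 3.0 * X[:, 0] + 0.1 * rng.randn(200)

# PCA for dimensionality reduction, then cross-validated ridge regression.
ridge = make_pipeline(PCA(n_components=5), RidgeCV(alphas=[0.1, 1.0, 10.0]))
ridge.fit(X, y)

# Random forest as the nonlinear alternative.
forest = RandomForestRegressor(n_estimators=100, random_state=0).fit(X, y)

print(ridge.score(X, y), forest.score(X, y))  # R^2 on the training data
```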

Natural Language Processing

NLTK (the Natural Language Toolkit) was our workhorse for language processing. Our sentiment analysis efforts relied on the WordNet lexical database and SentiWordNet sentiment classifications (both included with NLTK’s data). Our topic modeling results owe a debt of gratitude to gensim.

This website is generously hosted on GitHub. We enthusiastically recommend this setup to any future CS 109 students who wander this way and read all the way down this page.