Degree Type

Thesis

Date of Award

2016

Degree Name

Master of Science

Department

Computer Science

Major

Computer Science

First Advisor

Wallapak Tavanapong

Abstract

Twitter has become a political reality where political parties, presidential candidates, legislatures and journalists post tweets about the latest events sharing texts, pictures, hashtags, URLs, and mentioning other users. Gaining insight from the vast amount of political data on Twitter is only possible with proper computational tools.

We propose to store and manage Twitter data in an optimized Neo4j graph database for serving queries about political communication among state legislators of 50 U.S. states, state reporters, and presidential candidates for the 2016 presidential election. Our rationale for selecting this relatively new database technology is threefold: (1) ease of use in explicitly modeling and visualizing communication relationships among entities of interest; (2) flexibility to evolve the database overtime to quickly adapt to changes in user requirements; and (3) user-friendly intuitive query interface. We developed a Python-based Google App Engine application using Twitter API to collect tweets from the Twitter’s handlers of the aforementioned political actors. We employed best practice guidelines in graph database design to develop five different database models in order to distinguish the impact of each query optimization technique. We evaluated each of the models on the same set of tweets posted during January 1, 2016 to November 11, 2016 using the same set of queries of interest to political communication scholars in terms of the average query response times. Our experimental results confirmed the benefits of the best practice design guidelines. In addition, they show that the optimized database model is able to provide significant improvement in query response times. Reducing the number of hops used in the graph queries and using database indexes on most commonly used attributes reduced the average query response time in our dataset by as much as 74.52% and by 85.27%, respectively, compared to the reference model. Nevertheless, the reduction in the average query response time comes with the cost of the increase in graph database relationship store size by 5.49% compared to the reference model.

Our contributions are as follows. (1) The optimized Neo4j graph database that will be updated weekly with new tweets; the access to this database can be made available to political communication scholars. (2) The above findings added to currently limited guidelines in graph database designs. (3) The findings about political communication prior to the Iowa caucus of the 2016 primary presidential election.

DOI

https://doi.org/10.31274/etd-180810-5576

Copyright Owner

Prashant Kumar

Language

en

File Format

application/pdf

File Size

54 pages

Share

COinS