Degree Type

Thesis

Date of Award

2016

Degree Name

Master of Science

Department

Computer Science

Major

Computer Science

First Advisor

Johnny S. Wong

Abstract

Twitter is an online microblogging service launched in July 2006 that has rapidly gained worldwide popularity. It has more than 500 million users, of whom more than 332 million were active as of May 2015. The Twitter website is one of the ten most visited websites and has been described as “the SMS of the Internet.” It is widely used not only in people’s daily lives but also in politics, for example in running election campaigns and in mining or influencing public opinion.

We study the problem of automatically classifying tweets posted by the official accounts of a state’s Senate and House of Representatives, as well as by the accounts of individual senators and representatives, into one of the 21 policy agenda topics specified by the Policy Agenda Project [2]. This is a multi-class classification problem for short text, because each tweet is limited to 140 characters. Compared with traditional text classification, short text classification is distinctive in that the content is short and sparse, which makes it very challenging to extract useful features for classification. To achieve reasonable performance, we investigate three methods: a Support Vector Machine (SVM) with a linear kernel, an SVM with Topics Grouping and Tweets Merging, and Convolutional Neural Networks (CNN). In our experiments, the CNN method achieved the best performance, with 77.34% accuracy on an independent testing set of 1,388 tweets in the Iowa dataset. Furthermore, the CNN method is robust and stable without the need to manually tune the hyper-parameters, which are attributes of the neural network such as the number of hidden layers, the number of units per layer, and the connections per unit.
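
The sparsity that makes short-text features hard to extract can be illustrated with a minimal sketch. It is not the thesis's feature pipeline: the three tweets, the whitespace tokenizer, and the helper names `tokenize`, `bag_of_words`, and `density` are all hypothetical, chosen only to show how few vocabulary entries a single short message touches in a plain bag-of-words representation.

```python
from collections import Counter

# Hypothetical example tweets, invented for illustration
# (they are not taken from the thesis dataset).
tweets = [
    "Senate passes education funding bill today",
    "House committee debates new water quality rules",
    "Join us for a town hall on health care costs",
]


def tokenize(text):
    """Lowercase and split on whitespace (a deliberately naive tokenizer)."""
    return text.lower().split()


# Vocabulary over all tweets, sorted so the column order is stable.
vocab = sorted({tok for t in tweets for tok in tokenize(t)})


def bag_of_words(text):
    """Return a count vector with one entry per vocabulary word."""
    counts = Counter(tokenize(text))
    return [counts[w] for w in vocab]


def density(vec):
    """Fraction of entries in the vector that are non-zero."""
    return sum(1 for x in vec if x) / len(vec)


vectors = [bag_of_words(t) for t in tweets]

for tweet, vec in zip(tweets, vectors):
    print(f"{density(vec):.2f}  {tweet}")
```

Even with only three messages, each vector is mostly zeros, and in this toy sample no word is shared between tweets. With a realistic vocabulary of many thousands of words, a 140-character tweet would fill a far smaller fraction of its vector.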

DOI

https://doi.org/10.31274/etd-180810-5384

Copyright Owner

Rihui Li

Language

en

File Format

application/pdf

File Size

73 pages
