Degree Type


Date of Award


Degree Name

Doctor of Philosophy


Electrical and Computer Engineering


Computer Engineering

First Advisor

Srikanta Tirthapura


With the current explosion in the speed and volume of data, conventional computing systems cannot process large data sets efficiently. In this work, we study sampling methods for data streams and an application to insider threat detection.

The goal of random sampling is to select a subset of the original population that represents the whole population. In many real-world applications, a sample drawn from the population lets us estimate global statistical properties such as the mean, the variance, and the probability distribution. The goal of random sampling from a distributed stream is to select a subset of the union of the streams such that each element of the union is sampled with equal probability.
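As an illustration of the equal-probability requirement, the following is a minimal sketch of one common approach, not necessarily the algorithm developed in the dissertation. It assumes each site tags every element with an independent uniform random number and keeps the s smallest tags, and that a coordinator then keeps the s smallest tags across all sites. The names site_sample and merge_samples are hypothetical.

```python
import heapq
import random


def site_sample(stream, s, rng):
    """One site: tag each element with a uniform random number and
    keep the s elements with the smallest tags (hypothetical helper)."""
    tagged = ((rng.random(), item) for item in stream)
    return heapq.nsmallest(s, tagged, key=lambda pair: pair[0])


def merge_samples(site_samples, s):
    """Coordinator: the s smallest tags over all sites form a uniform
    sample (without replacement) of the union of the streams."""
    all_pairs = (pair for sample in site_samples for pair in sample)
    merged = heapq.nsmallest(s, all_pairs, key=lambda pair: pair[0])
    return [item for _, item in merged]


if __name__ == "__main__":
    rng = random.Random(7)  # fixed seed only to make the demo repeatable
    sites = [range(0, 100), range(100, 250)]
    per_site = [site_sample(stream, 10, rng) for stream in sites]
    print(merge_samples(per_site, 10))
```

Because the tags are independent and identically distributed, every element of the union is equally likely to be among the s smallest, regardless of which site observed it.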

In some cases, the “heavy hitters,” the elements with high frequency, dominate a random sample. Distinct sampling can be applied so that elements with low frequency are also represented: distinct sampling from a distributed stream extracts a subset of the set of distinct elements in the union of the streams. Sampling from the distinct elements of a population is an important task in database query optimization. Random sampling and distinct sampling are fundamental techniques for large-scale data analysis and for improving queries over database systems. We present algorithms, theoretical analysis, and experimental evaluations for random sampling and distinct sampling from a distributed stream.
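The difference from plain random sampling can be sketched as follows. This is an illustrative hash-based approach under stated assumptions, not necessarily the method analyzed in the dissertation: every site applies the same hash function, keeps the s distinct elements with the smallest hash values, and a coordinator merges the per-site results. The names hash_to_unit, site_distinct_sample, and merge_distinct are hypothetical.

```python
import hashlib
import heapq


def hash_to_unit(item):
    """Map an item to a pseudo-random value in [0, 1); all sites must
    use the same function (hypothetical helper)."""
    digest = hashlib.sha256(repr(item).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") / 2**64


def site_distinct_sample(stream, s):
    """One site: keep the s distinct items with the smallest hash values.
    Repeated items hash to the same value, so frequency has no effect."""
    kept = {}
    for item in stream:
        if item in kept:
            continue
        h = hash_to_unit(item)
        if len(kept) < s:
            kept[item] = h
        else:
            worst = max(kept, key=kept.get)
            if h < kept[worst]:
                del kept[worst]
                kept[item] = h
    return kept


def merge_distinct(site_samples, s):
    """Coordinator: keep the s smallest hash values over all sites."""
    union = {}
    for sample in site_samples:
        union.update(sample)
    return heapq.nsmallest(s, union, key=union.get)


if __name__ == "__main__":
    site_a = ["a"] * 1000 + ["b", "c"]      # "a" is a heavy hitter
    site_b = ["a"] * 500 + ["d", "e", "c"]
    samples = [site_distinct_sample(st, 3) for st in (site_a, site_b)]
    print(merge_distinct(samples, 3))
```

In this sketch the heavy hitter "a" has the same chance of being selected as any other distinct element, since selection depends only on the hash value and not on how often an element occurs.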

With attacks on computer systems increasing, it is important to protect computer systems and classified information from attackers. A growing number of attacks and data breaches originate from inside an organization; this is called an “insider threat.” According to recent reports, malicious insiders cause enormous damage to organizations. We propose two insider threat detection frameworks that monitor system logs and detect anomalous behavior: a scenario-based insider threat detection method and a session-based insider threat detection method.

We implement our frameworks in Java and present an experimental evaluation on a synthetic dataset.


Copyright Owner

Yung-Yu Chung



File Format


File Size

136 pages