Degree Type

Dissertation

Date of Award

2015

Degree Name

Doctor of Philosophy

Department

Electrical and Computer Engineering

First Advisor

Srikanta Tirthapura

Abstract

Graphs represent relationships. Some relationships can be represented as a deterministic graph, while others can only be represented using probabilities. Mining dense structures from graphs helps us find useful patterns in these relationships, with applications in a wide range of areas such as social network analysis and bioinformatics. Arguably the two most fundamental dense substructures are Maximal Cliques and Maximal Bicliques. The enumeration of both these structures is central to many data mining problems. With the advent of “big data”, real-world graphs have become massive. Recently, systems like MapReduce have evolved to process such large data. However, using these systems to mine dense substructures in massive graphs is an open question. In this thesis, we present novel parallel algorithms using MapReduce for the enumeration of Maximal Cliques and Maximal Bicliques in large graphs. We show that our algorithms are work-optimal and load-balanced. Further, we present a detailed evaluation which shows that the algorithm scales to large graphs with millions of edges and tens of millions of output structures. Finally, we consider the problem of Maximal Clique Enumeration in an Uncertain Graph, which is a probability distribution on a set of deterministic graphs. We define the notion of a maximal clique for an uncertain graph, give matching upper and lower bounds on the number of such structures, and present a near-optimal algorithm to mine all maximal cliques.
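
For readers unfamiliar with the term, the sketch below illustrates what enumerating maximal cliques means on a small deterministic graph. It uses the classic sequential Bron-Kerbosch procedure with pivoting, which is an assumption made here for illustration only; it is not the MapReduce algorithm developed in the thesis, and the function name `maximal_cliques` and the example graph are hypothetical.

```python
def maximal_cliques(adj):
    """Yield every maximal clique of an undirected graph.

    `adj` maps each vertex to the set of its neighbours (no self-loops).
    Illustrative sequential Bron-Kerbosch with pivoting; not the
    parallel MapReduce method described in the abstract.
    """
    def expand(r, p, x):
        # r: current clique, p: candidates that can extend r,
        # x: vertices already processed (used to detect non-maximality).
        if not p and not x:
            yield frozenset(r)
            return
        # Choose the pivot with the most neighbours among the candidates;
        # only non-neighbours of the pivot need to be tried at this level.
        pivot = max(p | x, key=lambda v: len(adj[v] & p))
        for v in list(p - adj[pivot]):
            yield from expand(r | {v}, p & adj[v], x & adj[v])
            p = p - {v}
            x = x | {v}

    yield from expand(set(), set(adj), set())


# Hypothetical example: a triangle 1-2-3 with a pendant edge 3-4.
example = {1: {2, 3}, 2: {1, 3}, 3: {1, 2, 4}, 4: {3}}
cliques = set(maximal_cliques(example))
```

Here `cliques` contains the two vertex sets {1, 2, 3} and {3, 4}: each is a clique, and neither can be enlarged by adding another vertex.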

Copyright Owner

Arko Provo Mukherjee

Language

en

File Format

application/pdf

File Size

101 pages

Research_Presentation.pdf (12361 kB)
Presentation explaining the thesis
