Title
Mining maximal cliques from a large graph using MapReduce: Tackling highly uneven subproblem sizes
Campus Units
Electrical and Computer Engineering
Document Type
Article
Publication Version
Submitted Manuscript
Publication Date
5-2015
Journal or Book Title
Journal of Parallel and Distributed Computing
Volume
79-80
First Page
104
Last Page
114
DOI
10.1016/j.jpdc.2014.08.011
Abstract
We consider Maximal Clique Enumeration (MCE) from a large graph. A maximal clique is perhaps the most fundamental dense substructure in a graph, and MCE is an important tool to discover densely connected subgraphs, with numerous applications to data mining on web graphs, social networks, and biological networks. While effective sequential methods for MCE are known, scalable parallel methods for MCE are still lacking.
We present a new parallel algorithm for MCE, Parallel Enumeration of Cliques using Ordering (PECO" role="presentation" style="box-sizing: border-box; display: inline-block; line-height: normal; font-size: 14.4px; word-spacing: normal; word-wrap: normal; white-space: nowrap; float: none; direction: ltr; max-width: none; max-height: none; min-width: 0px; min-height: 0px; border: 0px; padding: 0px; margin: 0px; position: relative;">PECO), designed for the MapReduce framework. Unlike previous works, which required a post-processing step to remove duplicate and non-maximal cliques, PECO" role="presentation" style="box-sizing: border-box; display: inline-block; line-height: normal; font-size: 14.4px; word-spacing: normal; word-wrap: normal; white-space: nowrap; float: none; direction: ltr; max-width: none; max-height: none; min-width: 0px; min-height: 0px; border: 0px; padding: 0px; margin: 0px; position: relative;">PECOenumerates only maximal cliques with no duplicates. The key technical ingredient is a total ordering of the vertices of the graph which is used in a novel way to achieve a load balanced distribution of work, and to eliminate redundant work among processors. We implemented PECO" role="presentation" style="box-sizing: border-box; display: inline-block; line-height: normal; font-size: 14.4px; word-spacing: normal; word-wrap: normal; white-space: nowrap; float: none; direction: ltr; max-width: none; max-height: none; min-width: 0px; min-height: 0px; border: 0px; padding: 0px; margin: 0px; position: relative;">PECO on Hadoop MapReduce, and our experiments on a cluster show that the algorithm can effectively process a variety of large real-world graphs with millions of vertices and tens of millions of maximal cliques, and scales well with the degree of available parallelism.
Copyright Owner
Elsevier Inc.
Copyright Date
2015
Language
en
File Format
application/pdf
Recommended Citation
Svendsen, Michael; Mukherjee, Arko Provo; and Tirthapura, Srikanta, "Mining maximal cliques from a large graph using MapReduce: Tackling highly uneven subproblem sizes" (2015). Electrical and Computer Engineering Publications. 145.
https://lib.dr.iastate.edu/ece_pubs/145
Comments
This is a manuscript of an article from Svendsen, Michael, Arko Provo Mukherjee, and Srikanta Tirthapura. "Mining maximal cliques from a large graph using mapreduce: Tackling highly uneven subproblem sizes." Journal of Parallel and Distributed Computing 79 (2015): 104-114. DOI: 10.1016/j.jpdc.2014.08.011. Posted with permission.