### Summary

The proposed research aims at designing highly efficient and scalable algorithms for large graph-based computations on modern data sets and emerging parallel platforms. Graph abstractions and graph-theoretic computations play a critical role in the analysis of data from experimental devices, medical data sets, socially-generated data from the web and mobile devices, and scientific simulation data. The proposal aims to provide a fundamental understanding of parallel graph analytics, with the creation of novel algorithmic frameworks. Motivated by current terascale applications in genomics, proteomics, and social network analytics, the project will undertake the clean-slate design of four algorithmic frameworks that capture broad classes of graph-based computations: traversal-based static graph computations, dynamic graph analytics, subgraph enumeration and pattern search computations, and multiscale and multilevel graph computations. This research will lead to the design of new memory-efficient graph representations and data structures, the creation of novel linear time parallel algorithmic strategies based on data partitioning, and a deeper understanding of architectural features that impact graph processing efficiency and scalability. The expected outcome is to enable non-parallel computing experts design graph analytics at orders-of-magnitude higher levels of abstraction and performance than the current state-of-the-art.

The proposed educational activities, closely related to the research goals, attempt to foster an environment of interdisciplinary computational research within Penn State. New graduate classes on parallel graph analysis, parallel algorithms for computational biology, and high-performance social data mining, will facilitate student involvement in current research activities of this project and its collaborators. The identification of new high-level data-centric parallel algorithm and software design principles will be a key broader impacts outcome. The project will actively collaborate with academic, industrial, and government laboratory partners that rely on graph-based analytics, release the algorithmic frameworks under open-source licenses, and train practitioners and interdisciplinary teams through virtual workshops and tutorials.

### Research Team

- Kamesh Madduri
- Thap Panitanarak, CSE PhD student, 2013-Present.
- Hongyuan Zhan, CSE PhD student, 2014-Present.
- George Slota, CSE PhD student, 2013-16.
- Ramachandran K. N., CSE MS student, 2014-16.
- Sindhuja Parimalarangan, CSE MS student, 2014-16.
- Sai Chirravuri, CSE MS student, 2013-14.
- Emmanuel Oppong, CMPEN undergrad student, 2014 SROP intern.
- Ucheoma Ukah, CMPSC undergrad student, 2014 SROP intern.

### Publications

- H. Zhan and K. Madduri, “Ranking Communities in Weighted Networks,”
*technical report*, Jul. 2016. - G. M. Slota and K. Madduri, “Parallel Color-coding,”
*Parallel Computing*, vol. 47, pp. 51-69, 2015. - G. M. Slota, S. Rajamanickam, and K. Madduri, “High-performance Graph Analytics on Manycore Processors,” in
*Proc. Int'l. Parallel and Distributed Proc. Symp. (IPDPS)*, May 2015. - G. M. Slota and K. Madduri, “Simple parallel biconnectivity algorithms for multicore platforms,” in
*Proc. Int'l. Conf. on High Performance Computing (HiPC)*, Dec 2014.

- G. M. Slota, K. Madduri, and S. Rajamanickam, “PULP: Scalable multi-objective multi-constraint partitioning for small-world networks,” in
*Proc. 2nd IEEE Int'l. Conf. on Big Data (BigData)*, Oct 2014.

- S. Chirravuri, RDF3X-MPI: A Partitioned RDF engine for Data-Parallel SPARQL Querying, Penn State Computer Science and Engineering Master of Science thesis, July 2014.
- T. Panitanarak and K. Madduri, "Performance Analysis of
Single-source Shortest Path Algorithms on Distributed-memory Systems,"
in
*Proc. SIAM Workshop on Combinatorial Scientific Computing (CSC)*, July 2014.

- G. Slota, S. Rajamanickam, and K. Madduri, "BFS and Coloring-based Parallel
Algorithms for Strongly Connected Components and Related Problems,” in
*Proc. Int'l. Parallel and Distributed Proc. Symp. (IPDPS)*, May 2014. - G. Slota and K. Madduri, “Complex Network Analysis
using Parallel Approximate Motif Counting,” in
*Proc. Int'l. Parallel and Distributed Proc. Symp. (IPDPS)*, May 2014. - G. Slota and K. Madduri, “Fast Approximate Subgraph
Counting and Enumeration,” in
*Proc. Int'l. Conf. on Parallel Processing (ICPP)*, Oct 2013.

### Talks and Presentations

- K. Madduri, A Combinatorially-Interpretable Matrix Factorization for Network Community Structure Evaluation, SIAM AN16.
- G. Slota, High-performance Graph Analytics on Manycore Processors, IPDPS 2015.
- G. Slota, PuLP: Complex Objective Partitioning of Small-World Networks Using Label Propagation, SIAM CSE15.
- K. Madduri, Simple Parallel Biconnectivity Algorithms for Multicore Platforms, HiPC 2014.
- K. Madduri, PULP: Fast and Simple Complex Network Partitioning, Dagstuhl Seminar 14461 on High-performance Graph Algorithms and Applications in Computational Science, 2014.
- G. Slota, PULP: Scalable multi-objective multi-constraint partitioning for small-world networks, BigData 2014.
- S. Chirravuri, RDF3X-MPI: A Partitioned RDF engine for Data-Parallel SPARQL Querying, Penn State CSE MS thesis presentation, 2014.
- K. Madduri, Performance Analysis of Single-source Shortest Path
Algorithms on Distributed-memory Systems, SIAM CSC14.

- G. Slota, BFS and Coloring-based Parallel Algorithms for Strongly Connected Components and Related Problems, IPDPS 2014.
- G. Slota, Complex Network Analysis using Parallel Approximate Motif Counting, IPDPS 2014.
- G. Slota, Parallel Strongly Connected Components in Shared Memory Architectures, SIAM PP14.
- K. Madduri, Characterizing Biological Networks using Subgraph Counting and Enumeration, SIAM PP14.
- K. Madduri, High-Performance Graph Analytics, Invited talks at IIIT Hyderabad, University of Hyderabad, and IDRBT, Jan 2014.
- G. Slota, Fast Approximate Subgraph Counting and Enumeration, ICPP 2013.
- G. Slota, FASCIA: Fast Approximate Subgraph Counting and Enumeration (poster), SIAM Network Science Workshop 2013.
- K. Madduri, Parallel analysis of Graph-structured data in Genomics and Proteomics, BigLS Workshop 2013.

### Software

- PULP graph partitioner, available in Zoltan2.
- Parallel biconnected components code, HiPC 2014.
- Parallel SSSP code, SIAM CSC14.
- Parallel implementation of RDF3X, Sai Chirravuri MS thesis code.
- FastPath: Parallel Color-coding applied to signaling pathway detection.
- FASCIA: Fast Parallel Subgraph Counting using Color-coding.