# COMPUTATIONAL MODELING, SIMULATION AND KNOWLEDGE EXTRACTION

*Feature Subspace Transformation for Enhancing K-Means Clustering*

**Abstract**

Unsupervised classification typically concerns identifying clusters of similar entities in an unlabeled dataset. Popular methods include clustering based on (i) distance-based metrics between the entities in the feature space (K-Means), and (ii) combinatorial properties in a weighted graph representation of the dataset (Multilevel K-Means). In this paper, we present a force-directed graph layout based feature subspace transformation (FST) scheme to transform the dataset before the application of K-Means. Our FST-K-Means method utilizes both distance-based and combinatorial attributes of the original dataset to seek improvements in the internal and external quality metrics of unsupervised classification. We demonstrate the effectiveness of FST-K-Means in improving classification quality relative to K-Means and Multilevel K-Means (GraClus). The quality of classification is measured by observing internal and external quality metrics on a test suite of datasets. Our results indicate that on average, the internal quality metric (cluster cohesiveness) is 20.2% better than K-Means, and 6.6% better than GraClus. More significantly, FST-K-Means improves the external quality metric (accuracy) of classification on average by 14.9% relative to K-Means and 23.6% relative to GraClus.
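As a rough illustration of the two kinds of quality metric mentioned above, the following sketch computes one internal and one external score for a given clustering. The exact definitions used in the paper are not stated here, so both are assumptions: cohesiveness is taken as the sum of squared distances from each point to its cluster centroid (lower is tighter), and accuracy as the fraction of points that carry the majority true class of their cluster. The function names and the toy data are hypothetical.

```python
from collections import Counter, defaultdict


def cohesiveness(points, labels):
    """Internal metric (assumed definition): sum of squared Euclidean
    distances from every point to the centroid of its cluster."""
    groups = defaultdict(list)
    for point, cluster in zip(points, labels):
        groups[cluster].append(point)
    total = 0.0
    for members in groups.values():
        dim = len(members[0])
        centroid = [sum(m[d] for m in members) / len(members) for d in range(dim)]
        total += sum((m[d] - centroid[d]) ** 2 for m in members for d in range(dim))
    return total


def accuracy(labels, truth):
    """External metric (assumed definition): each cluster is mapped to its
    majority true class; return the fraction of points matching that class."""
    by_cluster = defaultdict(list)
    for cluster, true_class in zip(labels, truth):
        by_cluster[cluster].append(true_class)
    correct = sum(Counter(classes).most_common(1)[0][1] for classes in by_cluster.values())
    return correct / len(truth)


# Hypothetical toy data: four 2-D points, two clusters, known classes.
points = [(0.0, 0.0), (0.0, 2.0), (10.0, 0.0), (10.0, 2.0)]
labels = [0, 0, 1, 1]
truth = ["a", "a", "b", "a"]

print(cohesiveness(points, labels))
print(accuracy(labels, truth))
```

Scores of this kind could be computed once for plain K-Means and once after a transformation step, which is how relative improvements such as those quoted above would be compared.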

**Publication**

- Feature Subspace Transformations for Enhancing K-Means Clustering, **A. Chatterjee**, S. Bhowmick, P. Raghavan, *In Proceedings of the International Conference on Information and Knowledge Management (CIKM)*, Toronto, Canada, 2010.

*Fast Multicore Simulation*

**Abstract**

One of the challenges in the design of multicore architectures is the fast evaluation of hardware design tradeoffs using simulation techniques. Simulation tools for multicore architectures tend to have long execution times that grow linearly with the number of cores simulated. We have developed two hybrid techniques for fast and accurate multicore simulation.

Our first method, the Monte Carlo Co-Simulation (MCCS) scheme, considers application phases, and within each phase, interleaves a Monte Carlo modeling scheme with a traditional simulator, such as Simics. Our second method, the Curve Fitting Based Simulation (CFBS) scheme, is tailored to evaluate the behavior of applications with multiple iterations, such as scientific applications that have consistent cycles per instruction (CPI) behavior within a subroutine over different iterations.
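The general idea behind a curve-fitting approach can be sketched as follows: simulate only the first few iterations in detail, fit a curve to the observed CPI, and extrapolate to later iterations instead of simulating them. This is a minimal sketch under stated assumptions, not the CFBS scheme itself. The choice of a straight line, the function names, and the sample CPI values are all hypothetical.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict_cpi(iteration, slope, intercept):
    """Extrapolate the fitted line to an iteration that was not simulated."""
    return slope * iteration + intercept


# Hypothetical CPI values from the first four iterations of one subroutine,
# as they might be reported by a detailed simulator.
sampled_iterations = [1, 2, 3, 4]
sampled_cpi = [2.0, 2.125, 2.25, 2.375]

slope, intercept = fit_line(sampled_iterations, sampled_cpi)
print(predict_cpi(8, slope, intercept))
```

A fit like this is only reasonable when CPI behaves consistently across iterations, which is the property of scientific applications that the abstract relies on.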

**Publication**

- Hybrid Techniques for Fast Multicore Simulation, **M. Shantharam**, P. Raghavan and M. Kandemir, Euro-Par 09: Proceedings of the 15th International Euro-Par Conference on Parallel Processing, pp. 122–134, Springer Verlag, 2009.

### Publications List in the area of Computational Modeling, Simulation and Knowledge Extraction:

**2009**

- Towards Low-Cost, High-Accuracy Classifiers for Linear Solver Selection, S. Bhowmick, B. Toth and P. Raghavan, Proceedings of International Conference on Computational Science 2009, pp. 463–472. Lecture Notes in Computer Science, Vol. 5544, Springer, 2009.

**2008**

- FAST: Force-directed Approximate Subspace Transformation for Improved Unsupervised Document Classification, **A. Chatterjee**, S. Bhowmick, P. Raghavan, *In Proceedings of the SIAM Textmining Workshop held in conjunction with SIAM Datamining*, Atlanta, GA, 2008.
- Readily Regenerable Reduced Microstructure Representations, K. Teranishi, P. Raghavan, J. Zhang, T. Wang, L. Q. Chen and Z. K. Liu, Computational Materials Science, DOI: 10.1016/j.commatsci.2007.07.015, 18 pages, March 2008.

**2007**

- An Evaluation of Limited Memory Sparse Linear Solvers for Thermo-Mechanical Applications, K. Teranishi, P. Raghavan, J. Sun and P. Michaleris, International Journal for Numerical Methods in Engineering, Wiley InterScience (www.interscience.wiley.com), DOI: 10.1002/nme.2239, 34 pages, November 2007.
- Si Nanotrees: Structure and Electronic Properties, M. Menon, E. Richter, I. Lee and P. Raghavan, J. of Computational and Theoretical Nanoscience, Vol. 4, pp. 250–256, 2007.
- Applications of the FETI-DP-RBS-LNA algorithm on Coupled Linear-Nonlinear Large Scale Problems with Localized Nonlinearities, J. Sun, P. Michaleris, A. Gupta and P. Raghavan, Lecture Notes in Computational Science and Engineering, Domain Decomposition Methods in Science and Engineering XVI, Vol. 55, pp. 431–438, 2007.

**2005**

- A Fast Implementation of the FETI-DP Method: FETI-DP-RBS-LNA and Applications on Large Scale Problems with Localized Nonlinearities, J. Sun, P. Michaleris, A. Gupta and P. Raghavan, International Journal for Numerical Methods in Engineering, Vol. 60, No. 4, pp. 833–858, 2005.
- Large Scale Simulations of Branched Si-nanowires, M. Menon, E. Richter, I. Lee, K. Teranishi and P. Raghavan, Proceedings of The IEEE/ACM International Workshop on High Performance Computing for Nano-science and Technology (HPCNano05), November 2005.
- Parallel Adaptive Solvers in Compressible PETSc-FUN3D Simulations, S. Bhowmick, D. Kaushik, L. McInnes, B. Norris and P. Raghavan, Argonne National Laboratory preprint ANL/MCS-P1279-0805, Proceedings of the 17th International Conference on Parallel Computational Fluid Dynamics, August 2005.

**2004**

- An Integrated Framework for Multi-Scale Materials Simulation and Design, Z. K. Liu, L. Q. Chen, P. Raghavan, Q. Du, J. O. Sofo, S. Langer and C. Wolverton, Journal of Computer-Aided Materials Design, Volume 11, No. 2-3, pp. 183–199, 2004.
- Towards A Grid Enabled System for Multicomponent Materials Design, K. Teranishi, P. Raghavan and Z. K. Liu, Proceedings of CCGrid04: IEEE International Symposium on Cluster Computing and the Grid, Chicago, Illinois, IEEE Computer Society Press, 6 pages, April 2004.

**2003**

- A Quality of Service Approach for High-Performance Numerical Components, P. Hovland, K. Keahey, L. C. McInnes, B. Norris, L. F. Diachin and P. Raghavan, Proceedings of the Workshop on Quality-of-Service in Component-Based Software Engineering, Software Technologies Conference, Toulouse, France, pp. 89–98, June 2003.
- Dimension Reduction in Spectral Element Methods, I. Lee, P. Raghavan, S. Schofield and P. Fischer, Computational Fluid and Solid Mechanics 2003, Proceedings of the Second MIT Conference on Computational Fluid and Solid Mechanics, Editor K. J. Bathe, Volume 2, pp. 2039–2042, June 2003.
- Adaptive Sparse Linear Solvers for Implicit CFD Using Newton-Krylov Algorithms, B. Norris, L. McInnes, S. Bhowmick and P. Raghavan, Proceedings of the Second MIT Conference on Computational Fluid and Solid Mechanics, Editor K. J. Bathe, Volume 2, pp. 1024–1028, June 2003.

**2002**

- Scalable Sparse Matrix Techniques for Modeling Crack Growth, P. Raghavan, M. A. James, J. C. Newman and B. R. Seshadri, Lecture Notes in Computer Science, Applied Parallel Computing, pp. 588–602, June 2002.
- Experiences with FETI-DP in a Production Level Finite Element Code, K. Pierson, G. Reese and P. Raghavan, Proceedings of the 14th International Conference on Domain Decomposition Methods, also available in the electronic archive at http://www.ddm.org/DD14/, pp. 233–240, January 2002.
- Large-scale Normal Coordinate Analysis on Distributed Memory Parallel Systems, C. Yang, P. Raghavan, L. Arrowood, D. W. Noid, B. G. Sumpter and R. E. Tuzun, Int. Journal of High Performance Computing Applications, Vol. 1, pp. 409–424, 2002.

**2001**

- Level Search Techniques for Scalable Information Filtering and Retrieval, M. W. Berry, P. Raghavan and X. Zhang, Information Processing and Management, Vol. 37, pp. 313–334, 2001.

**2000**

- A Grid Computing Environment for Enabling Large Scale Quantum Mechanical Simulations, J. J. Dongarra and P. Raghavan, Proceedings of GRID’2000: IEEE/ACM International Workshop on Grid Computing, Lecture Notes in Computer Science, No. 1971, Editors R. Buyya and M. Baker, pp. 102–110, December 2000.

**1997**

- A Comparison of Computational Complexities of HFEM and ABC Finite Element Methods, M. A. Nasir, P. Raghavan, W. C. Chew and M. T. Heath, Journal of Electromagnetic Waves and Applications, Vol. 11, pp. 1601–1617, 1997.

**1996**

- Sparse Matrix Reordering Schemes for Browsing Hypertext, M. W. Berry, B. Hendrickson and P. Raghavan, Lectures in Applied Mathematics, Vol. 32: The Mathematics of Numerical Analysis, pp. 99–123, 1996.

**1994**

- A Comparison of Computational Complexities of HFEM and ABC Based Finite Element Methods, M. A. Nasir, P. Raghavan, W. C. Chew and M. T. Heath, Proceedings of IEEE APS International Symposium, pp. 447–450, June 1994.