Campus Units

Mechanical Engineering, Materials Science and Engineering, Electrical and Computer Engineering

Document Type

Article
Publication Version

Published Version

Publication Date

2014
Journal or Book Title

Integrating Materials and Manufacturing Innovation





First Page

1

Last Page

20

Materials science research has witnessed an increasing use of data mining techniques in establishing process‐structure‐property relationships. Significant advances in high‐throughput experiments and computational capability have resulted in the generation of huge amounts of data. Various statistical methods are currently employed to reduce the noise, redundancy, and dimensionality of the data to make analysis more tractable. Popular methods for reduction (like principal component analysis) assume a linear relationship between the input and output variables. Recent developments in non‐linear reduction (neural networks, self‐organizing maps), though successful, have computational issues associated with convergence and scalability. Another significant barrier to the use of dimensionality reduction techniques in materials science is their lack of ease of use, owing to their complex mathematical formulations. This paper reviews various spectral‐based techniques that efficiently unravel linear and non‐linear structures in the data, which can subsequently be used to tractably investigate process‐structure‐property relationships. In addition, we describe techniques (based on graph‐theoretic analysis) to estimate the optimal dimensionality of the low‐dimensional parametric representation. We show how these techniques can be packaged into a modular, computationally scalable software framework with a graphical user interface, the Scalable Extensible Toolkit for Dimensionality Reduction (SETDiR). This interface helps to separate the mathematical and computational aspects from the materials science applications, thus significantly enhancing its utility to the materials science community. The applicability of this framework in constructing reduced‐order models of complicated materials datasets is illustrated with an example dataset of apatites described in structural descriptor space. Cluster analysis of the low‐dimensional plots yielded insights into the correlation between several structural descriptors, such as ionic radius and covalence, and characteristic properties such as apatite stability. This information is crucial because it can promote the use of apatite materials as a potential host system for immobilizing toxic elements.
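
As a rough illustration of what a linear spectral technique such as principal component analysis does, the sketch below computes the leading eigenpair of a covariance matrix by power iteration and reports the fraction of variance captured by one dimension. It is a minimal sketch and not SETDiR code: the function names, the toy data, and the choice of power iteration are assumptions made for illustration only.

```python
import math

def covariance_matrix(rows):
    """Sample covariance of data given as a list of equal-length rows."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - means[j] for j in range(d)] for r in rows]
    return [[sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(d)]
            for i in range(d)]

def leading_eigenpair(matrix, iterations=200):
    """Power iteration for the dominant eigenpair of a symmetric PSD matrix.

    Assumes the (hypothetical) start vector is not orthogonal to the
    dominant eigenvector; a production code would use a library solver.
    """
    d = len(matrix)
    v = [1.0 / (i + 1) for i in range(d)]
    for _ in range(iterations):
        w = [sum(matrix[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            break
        v = [x / norm for x in w]
    # Rayleigh quotient of the unit vector v gives the eigenvalue estimate.
    mv = [sum(matrix[i][j] * v[j] for j in range(d)) for i in range(d)]
    norm_sq = sum(x * x for x in v)
    eigenvalue = sum(v[i] * mv[i] for i in range(d)) / norm_sq
    return eigenvalue, v

# Toy two-descriptor data lying close to the line y = 2x (made up for illustration).
data = [(1.0, 2.0), (2.0, 4.1), (3.0, 5.9), (4.0, 8.0)]
cov = covariance_matrix(data)
eigenvalue, direction = leading_eigenpair(cov)
total_variance = sum(cov[i][i] for i in range(len(cov)))
explained = eigenvalue / total_variance
print(f"fraction of variance along first component: {explained:.4f}")
```

When one component explains nearly all of the variance, as in this toy case, the data are effectively one‐dimensional. Estimating how many such components are needed is the linear analogue of the dimensionality estimation mentioned in the abstract; the non‐linear methods the paper reviews replace the covariance matrix with other matrices built from the data.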


This article is published as Samudrala, Sai Kiranmayee, Prasanna Venkataraman Balachandran, Jaroslaw Zola, Krishna Rajan, and Baskar Ganapathysubramanian. "A software framework for data dimensionality reduction: application to chemical crystallography." Integrating Materials and Manufacturing Innovation 3, no. 1 (2014): 1-20. DOI: 10.1186/s40192-014-0017-5. Posted with permission.

Creative Commons License

This work is licensed under a Creative Commons Attribution 4.0 International License.

Copyright Owner

The Authors


