Welcome to ULaMDyn’s documentation!
ULaMDyn is a Python toolkit for advanced data analysis of nonadiabatic molecular dynamics (NAMD) simulations. Built on the Pandas and Scikit-Learn frameworks, the package offers tools for preprocessing, statistical analysis, and unsupervised learning of molecular dynamics (MD) trajectory data generated by the Newton-X program. ULaMDyn is designed to automate the search for hidden patterns in high-dimensional molecular datasets representing complex potential energy surfaces, making nonadiabatic dynamics simulations easier to interpret and understand.
General features
Data curation:
Efficiently collects and organizes output files from multiple MD trajectories into structured datasets.
Facilitates data sharing by exporting curated datasets in standard CSV format.
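The curation idea can be sketched with plain pandas. This is not ULaMDyn's own API: the helper `curate`, the column names (`TRAJ`, `time`, `Epot`), and the synthetic values are assumptions made for illustration. Per-trajectory tables are tagged with a trajectory identifier, stacked into one dataset, and written out as CSV.

```python
import io

import numpy as np
import pandas as pd


def curate(trajectories):
    """Stack per-trajectory tables into one DataFrame with a TRAJ column."""
    frames = []
    for traj_id, table in trajectories.items():
        frame = table.copy()
        frame.insert(0, "TRAJ", traj_id)  # tag every row with its trajectory
        frames.append(frame)
    return pd.concat(frames, ignore_index=True)


# Synthetic stand-ins for the properties parsed from two trajectories.
rng = np.random.default_rng(0)
trajectories = {
    traj_id: pd.DataFrame(
        {"time": np.arange(5) * 0.5, "Epot": rng.normal(size=5)}
    )
    for traj_id in (1, 2)
}

dataset = curate(trajectories)

# Export to CSV; an in-memory buffer stands in for a file on disk.
buffer = io.StringIO()
dataset.to_csv(buffer, index=False)
csv_text = buffer.getvalue()
```

Keeping the trajectory identifier as an ordinary column means the same table can later be grouped by trajectory or by time step.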
Statistical analysis:
Provides a complete statistical description of the data by computing the median, mean, standard deviation, skewness, and kurtosis across all available NAMD trajectories.
Uses the bootstrap algorithm to estimate the uncertainty of molecular properties at a given confidence level.
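As a rough illustration of these two steps, the sketch below computes the listed descriptors with pandas and a percentile bootstrap confidence interval with NumPy. The function names, the default of 2000 resamples, and the synthetic bond-length sample are assumptions for this example, not ULaMDyn's interface.

```python
import numpy as np
import pandas as pd


def describe_property(values):
    """Return median, mean, std, skewness, and excess kurtosis of a sample."""
    series = pd.Series(values)
    return {
        "median": series.median(),
        "mean": series.mean(),
        "std": series.std(),        # sample standard deviation (ddof=1)
        "skewness": series.skew(),
        "kurtosis": series.kurt(),  # excess kurtosis (0 for a normal distribution)
    }


def bootstrap_ci(values, statistic=np.mean, n_resamples=2000,
                 confidence=0.95, seed=0):
    """Percentile bootstrap confidence interval for `statistic`."""
    rng = np.random.default_rng(seed)
    values = np.asarray(values)
    # Resample with replacement and evaluate the statistic on each resample.
    idx = rng.integers(0, len(values), size=(n_resamples, len(values)))
    estimates = statistic(values[idx], axis=1)
    alpha = 1.0 - confidence
    lower, upper = np.quantile(estimates, [alpha / 2, 1 - alpha / 2])
    return lower, upper


# Synthetic bond-length sample (angstrom), one value per trajectory.
sample = np.random.default_rng(1).normal(loc=1.40, scale=0.05, size=200)
summary = describe_property(sample)
low, high = bootstrap_ci(sample)
```

The percentile method is only one of several bootstrap interval constructions; it is used here because it is the simplest to read.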
Normal mode analysis:
Includes an integrated normal mode analysis (NMA) module that identifies the key vibrational modes influencing the molecular dynamics through a principal component decomposition of the geometric displacements.
Ring-puckering analysis:
Calculates Cremer-Pople puckering parameters of cyclic fragments, providing insights into molecular conformational dynamics under light exposure.
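For readers unfamiliar with the puckering coordinates, here is a minimal NumPy sketch of the general Cremer-Pople construction for an N-membered ring. The function name `cremer_pople` and its return layout are hypothetical, and the code assumes the atoms are supplied in ring-bonding order; it is not taken from ULaMDyn.

```python
import numpy as np


def cremer_pople(coords):
    """Return (amplitudes q_m, phases phi_m, total amplitude Q) for one ring.

    coords: (N, 3) array of ring-atom positions in ring-bonding order.
    """
    coords = np.asarray(coords, dtype=float)
    n_atoms = len(coords)
    centered = coords - coords.mean(axis=0)  # origin at the geometric centre
    angles = 2.0 * np.pi * np.arange(n_atoms) / n_atoms

    # Mean-plane normal from the two auxiliary vectors R' and R''.
    r1 = (centered * np.sin(angles)[:, None]).sum(axis=0)
    r2 = (centered * np.cos(angles)[:, None]).sum(axis=0)
    normal = np.cross(r1, r2)
    normal /= np.linalg.norm(normal)
    z = centered @ normal  # out-of-plane displacement of each atom

    amplitudes, phases = [], []
    for m in range(2, (n_atoms - 1) // 2 + 1):
        a = np.sqrt(2.0 / n_atoms) * np.sum(z * np.cos(m * angles))
        b = -np.sqrt(2.0 / n_atoms) * np.sum(z * np.sin(m * angles))
        amplitudes.append(np.hypot(a, b))
        phases.append(np.arctan2(b, a) % (2.0 * np.pi))
    if n_atoms % 2 == 0:
        # Even rings have one extra signed amplitude without a phase angle.
        sign = (-1.0) ** np.arange(n_atoms)
        amplitudes.append(np.sum(sign * z) / np.sqrt(n_atoms))
    total = np.sqrt(np.sum(z**2))
    return np.array(amplitudes), np.array(phases), total


# Idealised chair: regular hexagon with alternating out-of-plane offsets.
h, radius = 0.25, 1.4
phi = 2.0 * np.pi * np.arange(6) / 6
chair = np.column_stack(
    [radius * np.cos(phi), radius * np.sin(phi), h * (-1.0) ** np.arange(6)]
)
q, phases, total_q = cremer_pople(chair)
```

For this idealised chair, q2 vanishes and the magnitude of q3 equals the total amplitude Q = sqrt(6) * h. The sign of q3 depends on the direction in which the ring atoms are numbered, because that direction fixes the orientation of the mean-plane normal.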
Unsupervised learning methods
In its current version, ULaMDyn provides a suite of linear and nonlinear unsupervised learning methods for dimensionality reduction and clustering analysis. With dimensionality reduction, the high-dimensional feature vectors representing molecular configurations are compressed into a few meaningful coordinates, enabling visual inspection of the underlying relationships and pattern discovery in molecular dynamics datasets. To search for patterns in the full-dimensional MD data, ULaMDyn implements a set of clustering algorithms that can be applied both in geometry space, where each point represents a specific molecular configuration sampled during the dynamics, and in trajectory space, where each MD trajectory is treated as a multivariate time series. Clustering in trajectory space groups trajectories by the similarity of their temporal evolution.
Dimensionality reduction:
Principal Component Analysis (PCA).
Kernel Principal Component Analysis (KPCA).
Isometric Mapping (Isomap).
t-distributed Stochastic Neighbor Embedding (t-SNE).
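Since the package is built on Scikit-Learn, the four methods above can be illustrated directly with the corresponding Scikit-Learn estimators. The sketch below is an assumption about typical usage, not ULaMDyn's own entry points; the random descriptor matrix and all hyperparameter values are placeholders.

```python
import numpy as np
from sklearn.decomposition import PCA, KernelPCA
from sklearn.manifold import TSNE, Isomap
from sklearn.preprocessing import StandardScaler

# Synthetic descriptor matrix: 150 geometries x 12 features (an assumption).
rng = np.random.default_rng(0)
features = rng.normal(size=(150, 12))
scaled = StandardScaler().fit_transform(features)

reducers = {
    "PCA": PCA(n_components=2),
    "KPCA": KernelPCA(n_components=2, kernel="rbf"),
    "Isomap": Isomap(n_components=2, n_neighbors=10),
    "t-SNE": TSNE(n_components=2, perplexity=30.0, random_state=0),
}

# Every estimator maps the 12-dimensional vectors to 2 coordinates.
embeddings = {
    name: model.fit_transform(scaled) for name, model in reducers.items()
}
```

PCA is the only linear method of the four; KPCA, Isomap, and t-SNE are nonlinear, and t-SNE in particular is meant for visualization because it does not provide a transform for unseen points.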
Clustering methods:
K-Means (geometries or trajectories).
Gaussian Mixture Model (GMM).
Hierarchical agglomerative clustering.
Spectral clustering.
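The listed clustering methods all have Scikit-Learn counterparts, shown below on synthetic data. This is a hedged sketch rather than ULaMDyn's interface: the blob data, the cluster counts, and the flattening of each trajectory into a single row (a simple stand-in for a dedicated time-series treatment) are assumptions for illustration.

```python
import numpy as np
from sklearn.cluster import AgglomerativeClustering, KMeans, SpectralClustering
from sklearn.datasets import make_blobs
from sklearn.mixture import GaussianMixture

# Geometry space: each row is one molecular configuration (synthetic blobs).
X, _ = make_blobs(n_samples=120, centers=3, n_features=4, random_state=0)

labels = {
    "K-Means": KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X),
    "GMM": GaussianMixture(n_components=3, random_state=0).fit_predict(X),
    "Hierarchical": AgglomerativeClustering(n_clusters=3).fit_predict(X),
    "Spectral": SpectralClustering(
        n_clusters=3, affinity="nearest_neighbors", random_state=0
    ).fit_predict(X),
}

# Trajectory space: each trajectory is a (time steps x features) series.
# Two synthetic behaviours: 10 trajectories without drift, 10 with drift.
rng = np.random.default_rng(0)
drift = np.repeat([0.0, 2.0], 10)[:, None, None]
ramp = np.linspace(0.0, 1.0, 50)[None, :, None]
series = rng.normal(scale=0.3, size=(20, 50, 2)) + drift * ramp

# Flatten equal-length series so that each trajectory becomes one sample.
traj_labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(
    series.reshape(len(series), -1)
)
```

In geometry space every sampled configuration receives a label, whereas in trajectory space there is one label per trajectory, which is what allows trajectories to be grouped by their overall time evolution.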
License
This package is freely available for use and distribution under the terms of the GNU General Public License (GPL), version 3.
Our team:
Light and Molecules group - Aix-Marseille University (AMU), France 🇫🇷
- Maintainers:
Max Pinheiro Jr (AMU), max.pinheiro-jr@univ-amu.fr
- Contributors:
Mariana Casal (AMU): Jupyter notebook tutorials
Bidhan Chandra Garain (AMU): SOAP descriptor interface
- Coordinator:
Prof. Mario Barbatti (AMU)
We welcome contributions and feedback. Feel free to fork the repository and submit a pull request to the “development” branch on GitLab.