Welcome to ULaMDyn’s documentation!

ULaMDyn is a Python toolkit for advanced data analysis of nonadiabatic molecular dynamics (NAMD) simulations. Built on the Pandas and Scikit-Learn libraries, it provides tools for preprocessing, statistical analysis, and unsupervised learning of molecular dynamics (MD) trajectory data generated by the Newton-X program. ULaMDyn is designed to automate the discovery of hidden patterns in high-dimensional molecular datasets that sample complex potential energy surfaces, making nonadiabatic dynamics simulations easier to interpret and understand.

General features

  • Data curation:

    • Efficiently collects and organizes output files from multiple MD trajectories into structured datasets.

    • Facilitates data sharing by exporting curated datasets in standard CSV format.

  • Statistical analysis:

    • Provides a complete statistical description of the data by computing the median, mean, standard deviation, skewness, and kurtosis across all available NAMD trajectories.

    • Estimates the uncertainty of molecular properties at a given confidence level using the bootstrap algorithm.

  • Normal mode analysis:

    • Includes an integrated normal mode analysis (NMA) module that identifies the vibrational modes most relevant to the dynamics through a principal component decomposition of the geometric displacements.

  • Ring-puckering analysis:

    • Calculates the Cremer-Pople puckering parameters of cyclic fragments, providing insight into photoinduced conformational dynamics.
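The data-curation step amounts to stacking per-trajectory tables into one long-format dataset. The sketch below shows the idea with plain pandas, assuming each trajectory has already been parsed into a table with one row per time step; the helper `merge_trajectories` and the column names `time`, `state`, and `e_gap` are hypothetical and do not reflect ULaMDyn's API or the Newton-X output format.

```python
import pandas as pd

def merge_trajectories(per_traj):
    """Stack per-trajectory tables into one long-format dataset.

    per_traj: dict mapping a trajectory ID to a DataFrame with one row per
    time step (hypothetical layout, not ULaMDyn's internal format).
    """
    frames = []
    for traj_id, df in per_traj.items():
        df = df.copy()
        df.insert(0, "traj", traj_id)  # tag every row with its trajectory ID
        frames.append(df)
    merged = pd.concat(frames, ignore_index=True)
    # One row per (trajectory, time step), in a reproducible order
    return merged.sort_values(["traj", "time"], ignore_index=True)

# Toy tables standing in for two parsed trajectories
trajs = {
    1: pd.DataFrame({"time": [0.0, 0.5], "state": [2, 2], "e_gap": [0.41, 0.35]}),
    2: pd.DataFrame({"time": [0.0, 0.5], "state": [2, 1], "e_gap": [0.44, 0.12]}),
}
dataset = merge_trajectories(trajs)
# Returns the CSV as a string; pass a file path instead to write it to disk
csv_text = dataset.to_csv(index=False)
```

Keeping a trajectory ID column in a single long table makes later per-trajectory grouping, and sharing the result as one CSV file, straightforward.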
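The statistical summary and the bootstrap uncertainty estimate can be illustrated with pandas and NumPy. This is a minimal sketch on synthetic data, not ULaMDyn's implementation: the column names, the helper `bootstrap_ci`, and the choice of a percentile bootstrap over per-trajectory averages are assumptions made for illustration.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
# Hypothetical long-format dataset: 20 trajectories x 50 time steps of one property
data = pd.DataFrame({
    "traj": np.repeat(np.arange(1, 21), 50),
    "e_gap": rng.gamma(shape=2.0, scale=0.2, size=1000),
})

# Descriptive statistics over all trajectories and time steps
summary = data["e_gap"].agg(["median", "mean", "std", "skew", "kurt"])

def bootstrap_ci(values, stat=np.mean, n_boot=2000, level=0.95, seed=0):
    """Percentile bootstrap confidence interval for stat(values)."""
    rng = np.random.default_rng(seed)
    values = np.asarray(values)
    # Recompute the statistic on n_boot resamples drawn with replacement
    boot = np.array([stat(rng.choice(values, size=values.size, replace=True))
                     for _ in range(n_boot)])
    alpha = (1.0 - level) / 2.0
    return np.quantile(boot, [alpha, 1.0 - alpha])

# Resample whole trajectories (their time averages), since frames within
# one trajectory are correlated and are not independent samples
per_traj = data.groupby("traj")["e_gap"].mean()
low, high = bootstrap_ci(per_traj.to_numpy())
```

Resampling trajectories rather than individual frames is a deliberate choice in this sketch: treating correlated frames as independent would make the interval look narrower than the data justify.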
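The principal component decomposition of geometric displacements can be sketched with NumPy alone. This illustrates PCA on Cartesian displacements under stated assumptions (frames already aligned to the reference geometry so no rigid rotation or translation remains, synthetic data). It is not ULaMDyn's NMA module, and the function name `displacement_pca` is hypothetical.

```python
import numpy as np

def displacement_pca(geoms, ref):
    """PCA of Cartesian displacements from a reference geometry.

    geoms: (n_frames, n_atoms, 3), assumed already aligned to ref
    ref:   (n_atoms, 3), e.g. a ground-state minimum
    Returns (explained-variance ratios, principal axes as rows).
    """
    disp = (geoms - ref).reshape(len(geoms), -1)  # (n_frames, 3 * n_atoms)
    disp = disp - disp.mean(axis=0)               # center each coordinate
    # Rows of vt are principal directions; s**2 is proportional to their variance
    _, s, vt = np.linalg.svd(disp, full_matrices=False)
    var = s**2
    return var / var.sum(), vt

rng = np.random.default_rng(1)
ref = rng.normal(size=(4, 3))
mode = rng.normal(size=(4, 3))
mode /= np.linalg.norm(mode)                      # unit collective displacement
amp = rng.normal(scale=0.3, size=50)
# Synthetic trajectory: one dominant collective motion plus small noise
geoms = ref + amp[:, None, None] * mode + rng.normal(scale=0.01, size=(50, 4, 3))
ratios, axes = displacement_pca(geoms, ref)
```

In this toy case the first principal axis recovers the planted collective motion and carries almost all of the variance, which is the kind of signal used to single out the dominant vibrational motions in a real trajectory ensemble.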
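The Cremer-Pople parameters follow from a standard definition. The ring atoms are projected onto a mean plane, and the out-of-plane displacements z_j are decomposed into amplitudes q_m and phases φ_m, plus a single amplitude q_{N/2} for even-membered rings. The total amplitude is Q = sqrt(Σ z_j²). The sketch below is an independent NumPy implementation of that definition, not ULaMDyn's code; the function name `cremer_pople`, its return layout, and the toy chair geometry are hypothetical.

```python
import numpy as np

def cremer_pople(coords):
    """Cremer-Pople puckering coordinates of an N-membered ring.

    coords: (N, 3) ring-atom positions listed in bonding order.
    Returns (Q, q, phi): total amplitude, dict m -> q_m, dict m -> phi_m (radians).
    """
    r = np.asarray(coords, dtype=float)
    n = len(r)
    r = r - r.mean(axis=0)                  # move origin to the ring centroid
    j = np.arange(n)
    # Mean-plane normal from the two Fourier-weighted position sums
    r1 = (np.sin(2 * np.pi * j / n)[:, None] * r).sum(axis=0)
    r2 = (np.cos(2 * np.pi * j / n)[:, None] * r).sum(axis=0)
    normal = np.cross(r1, r2)
    normal /= np.linalg.norm(normal)
    z = r @ normal                          # out-of-plane displacements
    q, phi = {}, {}
    for m in range(2, (n - 1) // 2 + 1):
        a = np.sqrt(2 / n) * np.sum(z * np.cos(2 * np.pi * m * j / n))
        b = -np.sqrt(2 / n) * np.sum(z * np.sin(2 * np.pi * m * j / n))
        q[m], phi[m] = np.hypot(a, b), np.arctan2(b, a) % (2 * np.pi)
    if n % 2 == 0:                          # extra amplitude for even rings
        q[n // 2] = np.sqrt(1 / n) * np.sum(z * (-1.0) ** j)
    return np.sqrt(np.sum(z**2)), q, phi

# Ideal chair: atoms on a circle, alternating up/down by h (toy geometry)
h, radius = 0.25, 1.45
ang = 2 * np.pi * np.arange(6) / 6
chair = np.column_stack([radius * np.cos(ang), radius * np.sin(ang),
                         h * (-1.0) ** np.arange(6)])
Q, q, phi = cremer_pople(chair)
theta = np.degrees(np.arctan2(q[2], q[3]))  # polar angle for six-membered rings
```

For six-membered rings the conventional spherical coordinates are (Q, θ, φ2) with θ = atan2(q2, q3). An ideal chair has q2 = 0, so θ lies at 0° or 180° depending on the direction in which the atoms are numbered.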

Unsupervised learning methods

In its current version, ULaMDyn provides a suite of linear and nonlinear unsupervised learning methods for dimensionality reduction and clustering analysis. Dimensionality reduction compresses the high-dimensional feature vectors that represent molecular configurations into a few meaningful coordinates, enabling visual inspection of the underlying relationships and pattern discovery in molecular dynamics datasets. To search for patterns in the full-dimensional MD data, ULaMDyn implements a set of clustering algorithms that can be applied in two spaces: in geometry space, where each point represents a specific molecular configuration sampled during the dynamics, and in trajectory space, where each MD trajectory is treated as a multivariate time series. Clustering in trajectory space groups trajectories by the similarity of their temporal evolution.
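The difference between geometry space and trajectory space comes down to what counts as one data point. The sketch below uses synthetic data and scikit-learn's `KMeans`. Flattening each trajectory into a single vector assumes that all trajectories share the same length and time grid and compares them with plain Euclidean distance. That is an assumption for illustration, not necessarily how ULaMDyn measures trajectory similarity.

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
n_traj, n_steps, n_feat = 12, 30, 4
# Hypothetical features with shape (trajectory, time step, descriptor):
# half the trajectories oscillate, the other half drift linearly
t = np.linspace(0, 1, n_steps)[None, :, None]
group = np.repeat([0, 1], n_traj // 2)[:, None, None]
X = np.where(group == 0, np.sin(3 * t), t) + 0.05 * rng.normal(size=(n_traj, n_steps, n_feat))

# Geometry space: every sampled configuration is one point
geom_space = X.reshape(-1, n_feat)            # (n_traj * n_steps, n_feat)
geom_labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(geom_space)

# Trajectory space: every trajectory is one point (its flattened time series)
traj_space = X.reshape(n_traj, -1)            # (n_traj, n_steps * n_feat)
traj_labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(traj_space)
```

Geometry-space labels describe regions of configuration space visited at any time. Trajectory-space labels separate the oscillating trajectories from the drifting ones by their whole temporal evolution. With real data, trajectories of unequal length would need truncation, padding, or a dedicated time-series distance before this step.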

  • Dimensionality reduction:

    • Principal Component Analysis (PCA).

    • Kernel Principal Component Analysis (KPCA).

    • Isometric Mapping (Isomap).

    • t-distributed Stochastic Neighbor Embedding (t-SNE).

  • Clustering methods:

    • K-Means (geometries or trajectories).

    • Gaussian Mixture Model (GMM).

    • Hierarchical agglomerative clustering.

    • Spectral clustering.
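ULaMDyn's own interface to these methods is not reproduced here. To illustrate what each listed method does, the sketch below applies the corresponding scikit-learn estimators directly to a synthetic feature matrix. The data, hyperparameters, and dictionary names are hypothetical, and ULaMDyn's defaults and preprocessing may differ.

```python
import numpy as np
from sklearn.cluster import AgglomerativeClustering, KMeans, SpectralClustering
from sklearn.decomposition import PCA, KernelPCA
from sklearn.manifold import TSNE, Isomap
from sklearn.mixture import GaussianMixture
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
# Synthetic stand-in for 200 configurations x 10 descriptors: a smooth
# one-parameter path through descriptor space plus small noise
s = np.linspace(0.0, 1.0, 200)
k = np.arange(1, 6)
X = np.hstack([np.cos(np.pi * np.outer(s, k)), np.sin(np.pi * np.outer(s, k))])
X = X + 0.01 * rng.normal(size=X.shape)
X = StandardScaler().fit_transform(X)  # put all descriptors on a common scale

# Dimensionality reduction to 2D for visual inspection
reducers = {
    "PCA": PCA(n_components=2),
    "KPCA": KernelPCA(n_components=2, kernel="rbf"),
    "Isomap": Isomap(n_neighbors=10, n_components=2),
    "t-SNE": TSNE(n_components=2, perplexity=30, random_state=0),
}
embeddings = {name: r.fit_transform(X) for name, r in reducers.items()}

# Clustering in the full 10-dimensional descriptor space
clusterers = {
    "K-Means": KMeans(n_clusters=2, n_init=10, random_state=0),
    "GMM": GaussianMixture(n_components=2, random_state=0),
    "Hierarchical": AgglomerativeClustering(n_clusters=2, linkage="ward"),
    "Spectral": SpectralClustering(n_clusters=2, affinity="nearest_neighbors",
                                   random_state=0),
}
labels = {name: c.fit_predict(X) for name, c in clusterers.items()}
```

The nonlinear embeddings depend strongly on `n_neighbors` (Isomap) and `perplexity` (t-SNE). t-SNE preserves local neighborhoods rather than global distances, so distances between well-separated groups in a t-SNE map should not be read quantitatively.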

License

This package is freely available for use and distribution under the terms of the GNU General Public License, version 3 (GPLv3).

Our team:

Light and Molecules group - Aix-Marseille University (AMU), France 🇫🇷

Maintainers:
Contributors:
  • Mariana Casal (AMU): Jupyter notebook tutorials

  • Bidhan Chandra Garain (AMU): SOAP descriptor interface

Coordinator:
  • Prof. Mario Barbatti (AMU)

Contributions and feedback are welcome. Feel free to fork the repository and open a merge request against the “development” branch on GitLab.