ulamdyn package
Comprehensive API documentation with detailed information on how to use the functions and/or classes of ULaMDyn package. This is automatically generated from source code and comments.
Subpackages
- ulamdyn.nma package
- ulamdyn.wrappers package
- Submodules
- ulamdyn.wrappers.bootstrap module
- ulamdyn.wrappers.clustering module
ClusteringAnalysis
ClusteringAnalysis.data_scaler
ClusteringAnalysis.descriptor
ClusteringAnalysis.dist_metric
ClusteringAnalysis.method
ClusteringAnalysis.mwc
ClusteringAnalysis.n_clusters
ClusteringAnalysis.n_cpus
ClusteringAnalysis.n_samples
ClusteringAnalysis.run()
ClusteringAnalysis.space
ClusteringAnalysis.time_step
ClusteringAnalysis.transform
- ulamdyn.wrappers.dim_reduction module
DimensionReductionAnalysis
DimensionReductionAnalysis.data_scaler
DimensionReductionAnalysis.descriptor
DimensionReductionAnalysis.dist_metric
DimensionReductionAnalysis.kernel
DimensionReductionAnalysis.method
DimensionReductionAnalysis.mwc
DimensionReductionAnalysis.n_cpus
DimensionReductionAnalysis.n_dim
DimensionReductionAnalysis.n_samples
DimensionReductionAnalysis.perplexity
DimensionReductionAnalysis.run()
DimensionReductionAnalysis.time_step
DimensionReductionAnalysis.transform
- ulamdyn.wrappers.ring_analysis module
- ulamdyn.wrappers.save_datasets module
- ulamdyn.wrappers.save_xyz module
- Module contents
- ulamdyn.unsup_models package
Submodules
ulamdyn.data_loader module
The ulamdyn.data_loader
module is used to build datasets from NAMD
output files.
- class ulamdyn.data_loader.GetCoords
Bases:
BaseClass
Read the Cartesian coordinates from Newton-X MD trajectories.
Note
In the case of NX classical series, the Cartesian XYZ coordinates can be read either from dyn.out or dyn.xyz. For NX new series, the data will be read from the h5 file available in each TRAJ directory. Repeated geometries are skipped.
It also provides a function to calculate the root-mean-squared deviation (RMSD) between each geometry read from the MD trajectories and a reference geometry. Before calculating the RMSD, the two geometries are aligned using the Kabsch algorithm (https://en.wikipedia.org/wiki/Kabsch_algorithm).
This class does not require arguments in its constructor. All the outputs generate by the class are given in angstroms.
Data attributes:
trajectories
(dict): contain indices of the trajectories availablewith the respective maximum time to read.
labels
(numpy.ndarray): stores the sequence of atom labels.eq_xyz
(numpy.ndarray): stores the XYZ matrix of the referencegeometry (geom.xyz).
xyz
(numpy.ndarray): stores the XYZ matrices of all geometries readfrom the TRAJ directories.
rmsd
(numpy.ndarray): vector of RMSD values between all geometriesand the reference one.
dataset
(pandas.dataframe): stores a dataframe of the flattenedXYZ matrices.
- align_geoms(xyz_data=None) None
Calculate the RMSD between the current and reference geometries.
Note
Before calculating the RMSD, the method uses the Kabsch algorithm to find the optimal alignment between the loaded molecular geometry for each time step t and the reference geometry.
The calculated RMSD values will be stored in the class attribute
rmsd_values
, while thexyz
attribute will be updated with the aligned geometries.
- build_dataframe() None
Create a DataFrame containing flattened XYZ coordinates from all MD trajectories.
After running this function, the class attribute
dataset
will be updated with the loaded DataFrame object.
- dataset
- eq_xyz
- from_dyn(outfile: str, tmax=100000) Tuple[ndarray, ndarray, ndarray]
Get XYZ coordinates from the dyn.out file of classical Newton-X.
- Parameters:
outfile (str) – name of the NX output file, dyn.out
- Returns:
tuple containing an array of strings defining the atom labels, a tensor of size (n_steps, n_atoms, 3) with all XYZ coordinates of a single MD trajectory, and an array with the time steps.
- Return type:
tuple in the form (numpy.ndarray, numpy.ndarray, numpy.ndarray)
- static from_h5(outfile: str) Tuple[ndarray, ndarray, ndarray]
Read the XYZ coordinates of the MD frames generated by Newton-X NS.
- Parameters:
outfile (str) – name of the NX output file, usually dyn.h5
- Returns:
tuple containing an array of strings defining the atom labels, a tensor of size (n_steps, n_atoms, 3) with all XYZ coordinates of a single MD trajectory, and an array with the time steps.
- Return type:
tuple in the form (numpy.ndarray, numpy.ndarray, numpy.ndarray)
- static from_xyz(xyzfile) Tuple[ndarray, ndarray]
Get XYZ coordinates from the dyn.xyz file of classical Newton-X.
This method has the same parameters and return type as in
from_dyn()
.
- labels
- read_all_trajs(calc_rmsd: bool = True) None
Concatenate the XYZ coordinates read from all MD trajectories.
After running this method, the class attributes
labels
andxyz
will be updated.
- read_eq_geom() None
Read the XYZ coordinates of a reference geometry.
Note
This method should be executed before calculating the RMSD with the
align_geoms()
function.A file with name geom.xyz must be provided in the working directory (TRAJECTORIES). After reading the coordinates, the method will update the class attributes
labels
andeq_xyz
.
- rmsd
- save_csv() None
Save all loaded geometries (raw format) into a csv file.
If the RMSD has been calculated, it will be included as an extra column in the XYZ coordinates data set.
The default name of the output file is all_coordinates.csv.
- traj_time
- trajectories
- xyz
- class ulamdyn.data_loader.GetCouplings
Bases:
BaseClass
Class used to read the Nonadiabatic Coupling Vectors (NAC) from the MD trajectories.
This class does not require arguments in its constructor. The outputs generate by the class is given in atomic units.
Data attributes:
trajectories
(dict): contain indices of the trajectories availablewith the respective maximum time to read.
all_nacs
(dict): store the NAC matrices (per state) for allMD trajectories.
dataset
(dict): dictionary of dataframes with the flattened gradientmatrices.
- all_nacs
- build_dataframe(save_csv=False) None
Generate a dataset (pandas.DataFrame object) with all NACs.
The XYZ matrices with the nonadiabatic couplings obtained for each molecular geometry is flattened into a one dimensional vector. In this way, every row of the dataset corresponds to one step of a given MD trajectory.
- Parameters:
save_csv (bool, optional) – if True enable the output of all NAC dataframes in a csv format, defaults to False. The default name of the exported dataset is all_nacs_s[n].csv, where n is the number of the state.
- datasets
- static from_h5(outfile) dict
Read NACs from one trajectory generated by the new Newton-X.
- Parameters:
outfile (str) – name of the NX output file in the .h5 format, usually dyn.h5.
- Returns:
stacked nonadiabatic coupling vectors (NAC) for all steps of a single MD trajectory, provided as a dictionary of tensors (numpy.ndarray, one for each state) with shape (n_steps, n_atoms, 3).
- Return type:
- static from_nxlog(outfile: str, tmax=100000) dict
Read NACs from one trajectory generated by the Newton-X CS.
- trajectories
- class ulamdyn.data_loader.GetGradients
Bases:
BaseClass
Class used to collect QM gradients from Newton-X MD trajectories.
This class does not require arguments in its constructor. The outputs generate by the class is given in eV/angstrom.
Data attributes:
trajectories
(dict): contain indices of the trajectories availablewith the respective maximum time to read.
all_grads
(dict): store the gradient matrices (per state) for allMD trajectories.
dataset
(dict): dictionary of dataframes with the flattened gradientmatrices.
- all_grads
- build_dataframe(save_csv=False) None
Generate a dataset (pandas.DataFrame object) with all gradients.
The XYZ matrices with the gradients of each molecular geometry is flattened into a one dimensional vector, such that every row of the dataset corresponds to one step of the MD trajectories.
- Parameters:
save_csv (bool, optional) – if True enable the output of all gradient dataframes in a csv format, defaults to False. The default name of the saved dataset is all_gradients_s[n].csv, where n is the number of the state.
- datasets
- static from_nxlog(outfile, tmax=100000) dict
Read XYZ gradients from one trajectory generated by Newton-X CS.
This method has the same parameters and return type as in
from_h5()
.
- trajectories
- class ulamdyn.data_loader.GetProperties
Bases:
BaseClass
Read all properties available in the Newton-X MD trajectories.
Note
This class does not require arguments in its constructor. All the energy quantities processed by the class are transformed from Ha to eV. For the other properties, the original units used in Newton-X are kept.
Data attributes:
trajectories
(dict): trajectories ID (TRAJXX) found in the workingdirectory are stored as keys of the dictionary, and the corresponding values are the maximum simulation time to be read.
dataset
(pd.dataframe): store a dataframe with all properties.num_states
(int): keep track of the number of states considered inthe MD simulation.
nx_version
(str): identify the Newton-X version, can be either cs(classical series) or ns (new series).
- dataset
- energies()
Read / process the energy information from the en.dat (classical NX) or .h5 (new NX) file.
- Returns:
a processed dataset with the information of all trajectories stacked, and containing the following columns “TRAJ”, “time”, “State”, “Total_Energy” plus the energy gaps between the accessible states (e.g., DE12) and binary columns to identify the hopping points (e.g., Hops_S21).
- Return type:
- mcscf_coefs()
Collect the squared coefficients of the MCSCF wavefunction.
Note
This method works only for NAMD trajectories generated with Columbus.
- Returns:
a dataset with the three highest MCSCF coefficients (the three largest) for each electronic state read from the NX output per state; if the class variable
dataset
has been already updated with some properties, the population data will be merged with the existing dataset.- Return type:
- nac_norm()
Calculate the norm of the nonadiabatic coupling matrices for each pair of states.
After running this function, the class variable
dataset
will be updated with the loaded NACs norm data.- Returns:
a dataset with the Frobenius norm of the nonadiabatic coupling matrices of all NX trajectories calculated for each possible transition between states; the number of columns in the NACs norm dataset is given by n_states * (n_states - 1)/2.
- Return type:
- num_states
- nx_version
- oscillator_strength()
Collect the oscillator strength from properties file.
Note
The oscillator strength (OSS) is not always available in the output. If needed, check the Newton-Xdocumentation to see what are the required keywords and methods. The OSS data can be read from either classical or new series NX trajectories.
- Returns:
a dataset with the oscillator strength information read from all available NX trajectories with one column for each transition between states; if the class variable
dataset
has been already updated with some properties, the oscillator strength data will be merged with the existing properties dataset.- Return type:
- populations()
Read and calculate the population for each accessible state.
Note
If the MD simulations were performed with the classical Newton-X, the populations will be calculated using the wavefunction coefficients. In the new Newton-X, the populations are already calculated and available in the .h5 file.
- Returns:
a dataset with the populations obtained from all available NX trajectories with one column per state; if the class variable
dataset
has been already updated with some properties, the population data will be merged with the existing dataset.- Return type:
- property save_csv: None
Save the dataset with QM properties read from the NX trajectories.
The following properties are included in the dataset:
trajectory index;
simulation time;
total energy;
energy gaps between states (eV);
oscillator strength (if available);
states population.
norm of the nonadiabatic coupling matrices (if available);
three highest MCSCF coefficients per state (only for NX/Columbus);
- trajectories
ulamdyn.data_writer module
- class ulamdyn.data_writer.Geometries(atom_labels, properties_data=None, add_properties=[])
Bases:
BaseClass
Handle and save XYZ coordinates for selected frames of MD trajectories.
- save_xyz(geoms_array: ndarray, out_name: str = 'selected_geoms.xyz') None
Save an XYZ file for a set of selected molecular geometries.
Note
The comment line of the XYZ file will contain a list of property values for each molecule.
- Parameters:
geoms_array (numpy.ndarray) – a 3D array containing the list of XYZ matrices.
out_name (str) – name of the XYZ file containing all the selected geometries, defaults to selected_geoms.xyz.
ulamdyn.descriptors module
Module used to generate ML descriptors from molecular geometries.
- class ulamdyn.descriptors.R2(all_geoms=None, use_mwc=False)
Bases:
GetCoords
Class used to convert the XYZ coordinates of molecular geometries into R2-type of descriptors.
The R2 descriptor is defined as the (flattened) matrix of all pairwise Euclidean distances between all atoms in the molecule. Since the matrix is symmetric with respect to the interchange of atom indices (i.e., Dij = Dji) only the lower triangular portion of the R2 matrix will be outputed in the final data set.
This class also provides a method to compute other molecular descriptors derived from the R2 distance matrix. They are:
- inverse R2 -> defined as \(1/R_{ij}\), similarly to the Coulomb
matrix descriptor.
- delta R2 -> difference between the R2 vector of the current geometry in
time t and the equivalent R2 vector of a reference geometry (typically the ground-state geometry), \(R_{ij}(t) - R_{ij}(ref)\).
- RE -> R2 vector normalized relative to equilibrium geometry,
\(R_{ij}(eq)/R_{ij}(t)\). For more details, check the reference J. Chem. Phys. 146, 244108 (2017).
- apply_to_delta
- build_descriptor(variant=None, apply_to_delta=None, save_csv=False)
Generate a dataframe with R2-based descriptors for all geometries.
- Parameters:
variant (str, optional) – molecular representation derived from the R2 descriptor, defaults to None.
apply_to_delta (str, optional) – select a nonlinear function to apply as a transformation (sigmoid or hyperbolic tangent) on delta-R2 descriptors, defaults to None.
save_csv (bool, optional) – if True export the data set with all calculated descriptors in a csv format with name all_geoms_r2.csv, defaults to False.
- Returns:
a dataframe object of shape (n_samples, n_atoms * (n_atoms - 1)/2), where each row is a vector with the R2-based descriptor computed for a given molecular geometry.
- Return type:
- inverse_transform(descriptor)
Apply all transformations in reverse order to recover the original R2 vector.
- mass_weights
- norm_factor
- r2_descriptor
- r2_ref_geom
- static reconstruct_xyz(r2_vec: ndarray) ndarray
Reconstruct the XYZ coordinates from the pairwise atom distance vector.
- Parameters:
r2_vec (numpy.ndarray) – Numpy 2D array of shape (n_samples, n_atoms * (n_atoms - 1)/2) that contains stacked R2 vectors.
- Returns:
Numpy 3D array of shape (n_samples, n_atoms, 3) containing the molecular geometries transformed back into the Cartesian coordinate space.
- Return type:
- variant
- static xyz_to_distances(xyz_matrix: ndarray) ndarray
Calculate the pairwise distance matrix for a given XYZ geometry.
- Parameters:
xyz_matrix (numpy.ndarray) – Cartesian coordinates of a molecular structure given as a matrix of shape (n_atoms, 3).
- Returns:
vector of size \(n_{atoms} (n_{atoms} - 1)/2\) containing the lower triangular portion of the R2 matrix.
- Return type:
- class ulamdyn.descriptors.RingParams(ring_atom_ind: list, ring_coords: ndarray = None)
Bases:
GetCoords
Class used to the Cremer-Pople parameters from ring coordinates.
- build_dataframe(save_csv=False)
Construct a data frame containing the Cremer-Pople parameters of all collected geometries.
- dataset
- eq_xyz
- get_conf_5memb(phi) str
Determine the class of a 5-membered ring deformation based on the CP parameters.
- get_conf_6memb(theta, phi) str
Determine the conformation class for a 6-membered ring based on the CP parameters.
- get_pucker_coords(coords: ndarray) ndarray
Calculate the Cremer-Pople puckering parameters for one ring.
- labels
- property ring_coords: ndarray
Filter the XYZ coordinates corresponding to the selected ring and translate the coordinates to the ring center.
- rmsd
- traj_time
- trajectories
- xyz
- class ulamdyn.descriptors.SOAPDescriptor(all_geoms=None, atoms=None, r_cut=14, n_max=8, l_max=6, average='outer', rbf='polynomial')
Bases:
GetCoords
Class used to generate SOAP (Smooth Overlap of Atomic Positions) descriptors for molecular geometries.
This class extends the GetCoords class, allowing it to handle molecular geometries and generate SOAP descriptors based on the atomic coordinates and species extracted from those geometries.
- atoms
- create_features() DataFrame
Generate SOAP descriptors for the provided molecular geometries.
- Returns:
Array of SOAP descriptors for each molecular geometry.
- Return type:
- soap
- species
- class ulamdyn.descriptors.ZMatrix(all_geoms=None)
Bases:
GetCoords
Class used to generate molecular descriptors using internal coordinates (Z-Matrix).
This class does not require arguments in its constructor. All quantities related to distances are given in angstrom, while the features derived from angles are provided in degrees.
In addition to the standard Z-Matrix, the class also provides a method to compute other variants of the Z-Matrix molecular descriptors:
delta Z-Matrix -> difference between the Z-Matrix representation of the current geometry in time t and the Z-Matrix of a reference geometry.
tanh Z-Matrix -> hyperbolic tangent transformation on all features of delta Z-Matrix.
sig Z-Matrix -> sigmoid transformation on all features of delta Z-Matrix.
Data attributes:
distancematrix
(numpy.ndarray): stores the full matrix of bonddistances for all geometries.
connectivity
(list): indices of connected atoms based on a distancecriterion of proximity.
angleconnectivity
(list): indices of three neighboring atoms tocompute angles.
dihedralconnectivity
(list): four indices of neighboring atoms tocalculate dihedrals.
zmat_ref_geom
(numpy.ndarray): stores the Z-matrix calculated forthe reference geometry.
- angleconnectivity
- build_descriptor(delta=False, apply_to_delta=None, save_csv=False)
Construct the standard Z-Matrix descriptor and other variants.
Note
By default, the algorithm will calculate the three main components of the Z-Matrix: bond distances, angles and dihedrals. An augmented version of the Z-Matrix descriptor can be also obtained by calculating additional distances, angles, dihedrals and/or bending angles using the methods provided in the class.
- Parameters:
delta (bool, optional) – if True, the Z_matrix feature vector of each geometry will be subtracted from the Z_Matrix of the reference geometry, defaults to False.
apply_to_delta (str, optional) – select a nonlinear function to apply as a transformation (sigmoid or hyperbolic tangent) on the delta Z-matrix, defaults to None.
save_csv (bool, optional) – if true save a single csv file named all_geoms_zmatrix.csv containing the Z-Matrix descriptors computed for all geometries available the MD trajectories., defaults to False.
- Returns:
a dataframe object with the (flattened) Z-Matrix descriptors stacked for all MD geometries.
- Return type:
- connectivity
- dihedralconnectivity
- distancematrix
- static get_angle(geom: ndarray, idx_atoms: list) float64
Calculate the angle formed by three selected atoms.
- Parameters:
geom (numpy.ndarray) – matrix of shape (natoms, 3) storing the XYZ coordinates of a single molecule.
idx_atoms (list) – a list of three indices corresponding to the atoms for which the angle will be calculated.
- Returns:
angle (in degrees) between three selected atoms.
- Return type:
numpy.float
- static get_bending(geom: ndarray, idx_atoms: list) float64
Calculate the bending angle between two planes of the molecule.
This method is particularly useful to describe large out-of-plane distortions in the molecular structure that involves more than four atoms. The bending angle is calculated by first defining two vectors, each one perpendicular to different molecular planes formed by two sets of three atoms. Then, the angle between the two vectors is obtained by calculating the inverse cosine of the scalar product between these vectors.
Note
By default, the bending angle is not used to construct the Z-Matrix descriptor. It can be used to construct an augmented version of the Z-Matrix that better captures changes in the molecular structure during the dynamics.
- Parameters:
geom (numpy.ndarray) – matrix of shape (natoms, 3) storing the XYZ coordinates of a single molecule.
idx_atoms (list) – a list of lists with three atom indices in each, used to define two molecular planes for which the angle will be calculated.
- Returns:
bending angle (in degrees) defined by six specified atoms.
- Return type:
np.float
- static get_dihedral(geom: ndarray, idx_atoms: list) float64
Calculate the dihedral angle formed by four selected atoms.
- Parameters:
geom (numpy.ndarray) – matrix of shape (natoms, 3) storing the XYZ coordinates of a single molecule.
idx_atoms (list) – a list of four indices to select the atoms for which the dihedral angle will be calculated.
- Returns:
dihedral angle (in degrees) formed by four specified atoms.
- Return type:
numpy.float
- static get_distance(geom: ndarray, idx_atoms: list) float64
Calculate the Euclidean distance between a pair of atoms.
- Parameters:
geom (numpy.ndarray) – matrix of shape (natoms, 3) storing the XYZ coordinates of a single molecule.
idx_atoms (list) – a pair of indices corresponding to the atoms for which the distance will be calculated.
- Returns:
Euclidean distance (in Angstrom) between two selected atoms.
- Return type:
numpy.float
- zmat_ref_geom
ulamdyn.kinetics module
- class ulamdyn.kinetics.GetVelocities(n_atoms=None)
Bases:
BaseClass
Class used to collect velocities from all NAMD trajectories generated by Newton-X.
- build_dataframe(save_csv=False) None
Generate a dataset (pandas.DataFrame object) containing all velocities.
The XYZ matrices with the velocities of each molecular geometry is flattened into a one dimensional vector, such that every row of the dataset corresponds to one step of the MD trajectories.
- Parameters:
save_csv (bool, optional) – if True, the dataframe will be exported in a csv format, defaults to False. The default name of the saved dataset is all_velocities.csv.
- dataset
- classmethod from_all_trajs(n_atoms=None)
Wrapper function to collect the MD velocities without instantiating the class.
- Parameters:
n_atoms (int, optional) – define the total number of atoms to consider in the velocities dataset, defaults to None. If None, the value will be taken from the geom.xyz file.
- Returns:
the full velocities data set given as a tensor of shape (n_steps, n_atoms, 3).
- Return type:
- from_dyn(outfile: str, tmax: float = 100000) ndarray
Read velocities from a single trajectory of Newton-X.
Note
The final velocities dataset will contains information only for the first set of n_atoms, as defined by the class attribute
n_atoms
.- Parameters:
- Returns:
tensor of shape (n_steps, n_atoms, 3) with all velocities data.
- Return type:
- from_h5(outfile: str) ndarray
Read velocities from the .h5 file generated by the new Newton-X.
Note
The full velocities dataset will be sliced to select only the first set of n_atoms, as defined by the class attribute
n_atoms
.- Parameters:
outfile (str) – name of the NX output file, usually dyn.h5.
- Returns:
tensor of shape (n_steps, n_atoms, 3) with all velocities data.
- Return type:
- n_atoms
- read_all_trajs()
Concatenate the XYZ velocities read from all available MD trajectories.
After running this method, the class attribute
veloc
will be updated with the full dataset of velocities read from the Newton-X output files.
- trajectories
- veloc
- class ulamdyn.kinetics.KineticEnergy(n_atoms=None)
Bases:
object
Class used to calculate the atomic or molecular components of the MD kinetic energy.
- atom_labels
- atom_mass
- build_dataframe(discretization_level: str = 'atom', n_mols_per_type: int = None, n_atoms_per_mol: int = None, save_csv: bool = False) DataFrame
Create a dataset containing the kinetic energies splitted by atoms or molecules.
- Parameters:
discretization_level (str, optional) – how to split the kinetic energy, defaults to “atom”
n_mols_per_type (int, optional) – number of molecules per subsystem, typically one solute and several solvent molecules as in QM/MM, defaults to None.
n_atoms_per_mol (int, optional) – number of atoms in each molecular subsystem, defaults to None.
save_csv (bool, optional) – export the dataset with the computed kinetic energies as a csv file, defaults to False
- Returns:
a structured dataset with the values of kinetic energies computed for all trajectories, where each row corresponds to one time step of a given trajectory and the columns refer to the discretization level (atoms or molecules).
- Return type:
- calc_per_atom() ndarray
Calculate the kinetic energy of each atom for all MD trajectories available.
- Returns:
matrix with the kinetic energy of the selected atoms arranged
in columns. :rtype: numpy.ndarray
- calc_per_molecule(n_mols_per_type: list, n_atoms_per_mol: list) ndarray
Sum the atomic contributions of kinetic energy for each molecule.
Note
In the current version, this function only supports two different types of molecules in the geom file, where the first set of atoms in the geom file should always correspond to the solute. The function will be generalized in future versions to support any number of different molecules.
- Parameters:
- Returns:
a matrix of shape (n_samples, n_mols_1 + n_mols_2) with the kinetic energy of each molecule in the system arranged in columns.
- Return type:
- energies
- n_atoms
- trajectories
- class ulamdyn.kinetics.VibrationalSpectra(n_atoms=None)
Bases:
GetVelocities
Calculate the vibrational density of states for each MD trajectory.
- build_dataframe(save_csv=False)
Create a dataset with information of vibrational spectra per MD trajectory.
- Parameters:
save_csv (bool, optional) – if True, export the dataset with the computed spectra to a csv file, defaults to False.
- Returns:
two-columns dataframe storing the vibrational frequencies (cm^-1) in the first columns and the density of states in the second one (a.u.)
- Return type:
- calc_all_trajs(mass_weighted=True)
Calculate the vibrational spectra for each MD trajectory.
- static calc_pdos(Vel: ndarray, dt: float, mass: ndarray = None) ndarray
Calculate the power spectrum of the velocity auto-correlation function.
- Parameters:
Vel (numpy.ndarray) – tensor of shape (n_steps, n_atoms, 3) with atomic velocities collected from a single MD trajectory.
dt (numpy.ndarray) – time step used to integrate the classical equations.
mass – if provided, rescale the velocities by the atomic mass of each specie; the default is None.
- Returns:
two-columns array containing the calculated frequencies (cm^-1) in the first columns and the density of states in the second one (a.u.)
- Return type:
- dataset
- n_atoms
- trajectories
- veloc
ulamdyn.nx_utils module
Module with auxiliary functions and constants used to process Newton-X data.
- ulamdyn.nx_utils.check_nx_trajs() dict
Check the status of each trajectory and set the maximum time step to read accordingly.
Note
If a two-columns file named ‘trajs_tmax.dat’ is available in the working directory, the data available for each trajectory will be collected until the tmax value defined in the file. Otherwise, the data will be read until the last step of the dynamics before any error message found in the output file.
- Returns:
A dictionary with the trajectory ID as the key and the tmax as the values.
- Return type:
- ulamdyn.nx_utils.get_labels_masses(traj_dir: str, n_atoms: int = None) Tuple[ndarray, ndarray]
Extract the atom labels and atomic masses from the NX geometry input.
- Parameters:
traj_dir (str) – Name of the TRAJ directory from which the geometry file will be read.
n_atoms (_type_, optional) – If provided, the first n_atoms of the XYZ coordinates matrix will be selected and returned, defaults to None.
- Returns:
Two separate numpy arrays containing the atom labels and the respective masses.
- Return type:
Tuple[np.ndarray, np.ndarray]
- ulamdyn.nx_utils.get_num_atoms() int
Get the number of atoms from a reference ‘geom.xyz’ file.
- Returns:
The number of atoms in a molecule or fragment of it.
- Return type:
- ulamdyn.nx_utils.get_traj_dirs() list
Generate a list of TRAJ directories to be processed.
- Returns:
A list with the name of all ‘TRAJXX’ directories (sorted in ascending order) generated by the NX simulations available in the current directory.
- Return type:
- ulamdyn.nx_utils.read_h5_nx(traj_dir: str)
Read trajectory data from h5 file generated by Newton-X new series.
- Parameters:
traj_dir – Name of the TRAJ directory from which the geometry file will be read.
- Returns:
H5py data object where trajectory data can be accessed via keywords, similarly to a dictionary.
- Return type:
h5py._hl.files.File
ulamdyn.statistics module
Module to perform statistical analysis of the MD datasets.
- ulamdyn.statistics.aggregate_data(data: DataFrame, vars_to_group: list = ['time']) DataFrame
Group data based on variables to calculate the statistical descriptors.
The statistical quantities calculated by this function are mean and median to describe the central tendency, and standard deviation to measure the variability or dispersion of the data. In addition, the skewness and kurtosis of the distribution are also computed. So each feature of the original input data will be unfolded into five new columns identified with the suffixes ‘_median’, ‘_mean’, ‘_std’, ‘_skew’, and ‘_kurt’.
- Parameters:
data (pandas.DataFrame) – input dataset containing the information extracted from all MD trajectories.
vars_to_group (list, optional) – set of variables used by the function to group the data, defaults to [“time”]
- Returns:
dataframe object with statistical description of the input data
- Return type:
- ulamdyn.statistics.bootstrap(dataframe, n_samples=None, n_repeats=1000, save_csv=False)
Estimate the mean by resampling the data with replacement.
- Parameters:
dataframe (pandas.DataFrame | modin.pandas.dataframe.DataFrame) – input dataset having the quantities extrated from all available MD trajectories; the dataset must contain the ‘TRAJ’ and ‘time’ columns
n_samples (int, optional) – number of trajectories considered in each round of the resampling, defaults to None
n_repeats (int, optional) – number of resampling to be performed, defaults to 1000
save_csv (bool, optional) – if true outputs the final bootstrapped data in a csv file, defaults to False
- Returns:
dataset containing all the bootstrapped data with estimated mean
- Return type:
- ulamdyn.statistics.calc_avg_occupations(df)
Calculate the fraction of trajectories in each state as a function of time.
Note
The fraction of trajectories (occupation) is an important quantity to assess the quality of the surface hopping (SH) simulations. It should be compared to the population of each state averaged over all trajectories, which is provided by the method
aggregate_data()
. If the ensemble of SH trajectories is statistically converged, the occupation and the average population should match.- Parameters:
df (pandas.DataFrame | modin.pandas.dataframe.DataFrame) – properties dataset with information collected from the available MD trajectories
- Returns:
a new dataset with the occupations calculated for each state
- Return type:
- ulamdyn.statistics.create_bootstrap_stats(boot_data, ci_level=95)
Generate the descriptive statistics for the bootstrapped data within a given confidence interval.
- Parameters:
boot_data (pandas.DataFrame | modin.pandas.dataframe.DataFrame) – bootstrapped dataframe generate by the
bootstrap()
functionci_level (int, optional) – Size of the confidence interval to draw when aggregating the data (by time) to estimate the statistical descriptors (median, mean, std), defaults to 95
- Returns:
dataframe grouped by the simulation time with each feature of the original dataset unfolded into three columns representing the statistical descriptors and other two columns storing the values for the lowest and highest variability as defined by the confidence interval
- Return type:
- ulamdyn.statistics.create_stats(selected_data, save_csv=False)
Create datasets with statistical summary of the requested quantities.
- Parameters:
- Returns:
one or several datasets containing the median, mean and standard deviation calculated for each feature of the input as a function of time; if the attribute is equal to ‘all’, the statistics will be calculated for the properties and descriptors (R2 and Z-Matrix) datasets.
- Return type:
dict(pandas.DataFrame) | dict(modin.pandas.dataframe.DataFrame)
- ulamdyn.statistics.stats_hopping(dataframe)
Generate a statistical summary for the hopping points.
- Parameters:
dataframe (pandas.DataFrame | modin.pandas.dataframe.DataFrame) – properties dataset with information collected from the available MD trajectories
- Returns:
a new dataset containing the total number of hops per trajectory and the minimum and maximum time in which the hops between each pair of electronic states occur.
- Return type:
Module contents
ULaMDyn - Unsupervised Learning analysis for Molecular Dynamics
ULaMDyn is a python package built on top of sklearn designed to perform data preprocessing, statistical and unsupervised learning analysis of (non-adiabatic) molecular dynamics simulations.
ULaMDyn is composed of several Python modules:
Data_Loader
Data_Writer
Statistics
Kinetics
Descriptors
Normal mode analysis
Unsup_Models
NMA
- class ulamdyn.ClusterGeoms(data, dt=None, indices=None, n_samples=None, scaler=None, random_state=51, n_cpus=-1, verbosity=0)
Bases:
Utils
Class to find groups of similar geometries in MD trajectories data
- gaussian_mixture(n_clusters=5, covariance='full', tol=0.0001, n_init=10, max_iter=500, init='k-means++', save_model=True)
Perform probabilist clustering in geometry space with GMM
- Parameters:
n_clusters (int, optional) – The number of clusters to find, which corresponds to the number of mixed gaussians. Defaults to 5.
covariance (str, optional) – String describing the type of covariance parameters to use, default is “full”. Acceptable values are “full”, “tied”, “diag”, “spherical” (equivalent to K-Means).
tol (float, optional) – Convergence criteria of the lower bound average gain, below which the EM iterations stop, defaults to 1e-4.
n_init (int, optional) – The number of initializations to perform, where best results are kept. The default is 10.
max_iter (int, optional) – The number of EM iterations to perform. Defaults to 500.
init – The method used to initialize the weights, the means and the precisions, default is “k-means++”. Acceptable strings are “k-means”, “k-means++”, “random”, or “‘random_from_data”.
- hierarchical(n_clusters=5, affinity='euclidean', connectivity=None, linkage='complete', distance_threshold=None, save_model=True)
Perform a hierarchical cluster analysis based on the agglomerative
- Parameters:
n_clusters (int, optional) – The number of clusters to find. It must be None if distance_threshold is not None, defaults to 5.
affinity (str, optional) – Metric used to compute the linkage. Can be “euclidean”, “l1”, “l2”, “manhattan”, “cosine”, or “precomputed”. If linkage is “ward”, only “euclidean” is accepted. If “precomputed”, a distance matrix (instead of a similarity matrix) is needed as input for the fit method, defaults to “euclidean”.
connectivity (array-like or callable, optional) – Connectivity matrix. Defines for each sample the neighboring samples following a given structure of the data. This can be a connectivity matrix itself or a callable that transforms the data into a connectivity matrix, such as derived from kneighbors_graph. Default is None, meaning that hierarchical clustering algorithm is unstructured.
linkage (str, optional) –
Define the linkage criterion to build the tree. It determines which distance to use between sets of observation. The algorithm will merge the pairs of cluster that minimize this criterion. The options are : + ‘ward’ -> minimizes the variance of the clusters
being merged.
- ’average’ -> uses the average of the distances of
each observation of the two sets.
- ’complete’ or ‘maximum’ -> uses the maximum distances
between all observations
of the two sets.
- ’single’ -> uses the minimum of the distances between
all observations of the two sets.
The default is “complete”.
distance_threshold (float, optional) – The linkage distance threshold above which, clusters will not be merged. If not None, n_clusters must be None and compute_full_tree must be True. Defaults to None.
save_model (bool, optional) – Store the trained parameters of the model in a binary file. Defaults to True.
- Returns:
Dataframe of shape (n_samples,) with cluster labels for each data point.
- Return type:
- kmeans(n_clusters=5, init='k-means++', n_init=100, max_iter=1000, convergence=1e-06, save_model=True)
Perform K-Means clustering in geometry space.
- Parameters:
n_clusters (int, list or str optional) – Specifies the number of clusters (and centroids) to form. Defaults to 5. If a list is provided, K-Means will run for each value in the list. If set to ‘best’, the algorithm will perform multiple runs with n_clusters ranging from 2 to 15. The best result is selected based on silhouette and Calinski-Harabasz scores.
init (str or array, optional) – Method for initialization : + ‘k-means++’ -> selects initial cluster centers for K-means clustering in a smart way to speed up convergence. + ‘random’ -> choose n_clusters observations (rows) at random from data for the initial centroids. + If an array is passed, it should be of shape (n_clusters, n_features) and gives the initial centers. Defaults to “k-means++”.
n_init (int, optional) – Number of time the k-means algorithm will be run with different centroid seeds. The final results will be the best output of n_init consecutive runs in terms of loss function, defaults to 500.
max_iter (int, optional) – Maximum number of iterations of the k-means algorithm for a single run, defaults to 1000.
convergence (float, optional) – Relative tolerance with regards to Frobenius norm of the difference in the cluster centers of two consecutive iterations to declare convergence. Defaults to 1e-06.
save_model (bool, optional) – Store the trained parameters of the model in a binary file, defaults to True.
- Returns:
Dataframe of shape (n_samples,) with cluster labels for each data point.
- Return type:
- spectral(n_clusters=5, n_components=10, n_init=100, affinity='rbf', gamma=0.01, n_neighbors=10, degree=3, coef0=1, kernel_params=None, save_model=True)
Apply clustering to a projection of the normalized Laplacian.
Note that Spectral Clustering is a highly expensive method due to the computation of the affinity matrix. Hence, this method is recommended only for small to medium size datasets (n_samples < 10000).
Note
This method is equivalent to kernel K-means (DOI: 10.1145/1014052.1014118). Spectral clustering is recommended for non-linearly separable dataset, where the individual clusters have a highly non-convex shape.
- Parameters:
n_clusters (int, optional) – The number of clusters to form which in this case corresponds to the dimension of the projection subspace. Defaults to 5. If a list is passed, spectral clustering will be run for all n_clusters in the list, whereas if the argument is equal to ‘best’, consecutive runs will be performed with n_clusters varying in the range of [2, 15]. In both cases, the final results will be the best output labels with respect to the clustering performance on the silhouette and Calinski-Harabasz scores.
n_components (int, optional) – Number of eigenvectors to use for the spectral embedding. Defaults to 10
n_init (int, optional) – Number of time the k-means algorithm will be run with different centroid seeds. The final results will be the best output of n_init consecutive runs in terms of inertia. Only used if assign_labels=’kmeans’. Defaults to 100.
affinity (str or callable, optional) –
Method used to construct the affinity matrix. The available options are : + ‘nearest_neighbors’: construct the affinity matrix
by computing a graph of nearest neighbors.
’rbf’: construct the affinity matrix using a radial basis function (RBF) kernel.
’precomputed_nearest_neighbors’: interpret X as a sparse graph of precomputed distances, and construct a binary affinity matrix from the n_neighbors nearest neighbors of each instance.
one of the kernels supported by pairwise_kernels.
The default method is “rbf”.
gamma (float, optional) – Kernel coefficient for rbf, poly, sigmoid, laplacian and chi2 kernels. Ignored for affinity=’nearest_neighbors’. Defaults to 0.01.
n_neighbors (int, optional) – Number of neighbors to use when constructing the affinity matrix using the nearest neighbors method. Ignored for affinity=’rbf’. Defaults to 20.
degree (int, optional) – Degree of the polynomial kernel. Ignored by other kernels. Defaults to 3.
coef0 (int, optional) – Zero coefficient for polynomial and sigmoid kernels. Ignored by other kernels. Defaults to 1.
kernel_params (dict or str, optional) – Parameters (keyword arguments) and values for kernel passed as callable object. Ignored by other kernels. Defaults to None.
save_model (bool, optional) – Store the trained parameters of the model in a binary file. Defaults to True.
- Returns:
Dataframe of shape (n_samples,) with cluster labels for each data point.
- Return type:
- class ulamdyn.ClusterTrajs(data, dt=None, scaler=None, random_state=42, n_cpus=-1, verbosity=0)
Bases:
Utils
Class used to find groups of similar trajectories in NAMD data.
- kmeans(n_clusters=3, metric='dtw', metric_params=None, n_init=5, max_iter=100, convergence=1e-06, save_model=True)
K-means clustering to group similar MD trajectories.
- Parameters:
n_clusters (int, optional) – Number of clusters to form, defaults to 3.
metric (str, optional) – Metric to be used for both cluster assignment and barycenter computation. Options: “euclidean”, “dtw”, “softdtw”. Defaults to “dtw”.
metric_params (dict or None, optional) – Parameter values for the chosen metric. Defaults to None.
n_init (int, optional) – Number of time the K-means algorithm will be run with different centroid seeds. The final results will be the best output of n_init consecutive runs in terms of inertia. Defaults to 5.
max_iter (int, optional) – Maximum number of iterations of the k-means algorithm for a single run. Defaults to 100.
convergence (float, optional) – Inertia variation threshold. If at some point, inertia varies less than this threshold between two consecutive iterations, the model is considered to have converged and the algorithm stops. Defaults to 1e-6.
save_model (bool, optional) – Store the trained parameters of the model in a binary file. Defaults to True.
- Returns:
Dataframe of shape (n_trajs,) with cluster labels assigned to each trajectory.
- Return type:
- transform()
Convert input data to time series format (tslearn) and apply scaler methods.
- Returns:
Three dimensional numpy array with shape (n_ts, n_steps, n_features)
- Return type:
- class ulamdyn.DimensionReduction(data, dt=None, n_samples=None, scaler=None, random_state=42, n_cpus=-1)
Bases:
Utils
Find low dimensional representation of MD trajectories data.
- isomap(n_components=2, n_neighbors=30, neighbors_algorithm='auto', metric=None, p=2, metric_params=None, calc_error=False)
Perform a nonlinear dimensionality reduction with Isometric Mapping
- Parameters:
n_components (int, optional) – Number of coordinates (features) for the low-dimensional manifold, defaults to 2.
n_neighbors (int, optional) – Number of neighbors to consider around each point, defaults to 12.
neighbors_algorithm (str, optional) – Method used for nearest neighbors search, defaults to “auto”
metric (str or callable, optional) – The metric to use when calculating distance between instances in a feature array. If metric is a string or callable, it must be one of the options allowed by sklearn.metrics.pairwise_distances for its metric parameter. If metric is “precomputed”, X is assumed to be a distance matrix and must be square. Defaults to “cosine”.
p (int, optional) – Parameter for the Minkowski metric from sklearn.metrics.pairwise pairwise_distances. When p = 1, this is equivalent to using manhattan_distance (l1), and euclidean_distance (l2) for p = 2. For arbitrary p minkowski_distance (l_p) is used. Defaults to 2.
metric_params (dict, optional) – Additional keyword arguments for the metric function. Defaults to None.
calc_error (bool, optional) – If True, the reconstruction error between the original and the projected data will be calculated defaults to False.
- Returns:
a new dataset with the transformed values where the coordinates of the low-dimensional manifold are stored in columns.
- Return type:
- kpca(n_components=2, kernel='rbf', gamma=None, degree=4, coef0=1, kernel_params=None, alpha=1.0, fit_inverse_transform=False)
Perform a nonlinear dimensionality reduction using kernel PCA.
- Parameters:
n_components (int, optional) – Number of components (features) to keep after KPCA transformation, defaults to 2.
kernel (str, optional) – Kernel function used in the transformation. The possible values are ‘linear’, ‘poly’, ‘rbf’, ‘sigmoid’, ‘cosine’ or precomputed’, defaults to “rbf”.
gamma (float, optional) – Kernel coefficient for rbf, poly and sigmoid kernels. Ignored by other kernels. If gamma is None, then it is set to 1/n_features. Defaults to None.
degree (int, optional) – Degree of polynomial kernel. Ignored by other kernels. Defaults to 4.
coef0 (int, optional) – Independent term in poly and sigmoid kernels. Ignored by other kernels. Defaults to 1.
kernel_params (dict, optional) – Parameters (keyword arguments) and values for kernel passed as callable object. Ignored by other kernels. Defaults to None.
alpha (float, optional) – Hyperparameter of the ridge regression that learns the inverse transform (when fit_inverse_transform=True), defaults to 1.0.
fit_inverse_transform (bool, optional) – Hyperparameter of the ridge regression that learns the inverse transform (when fit_inverse_transform=True), defaults to False.
- Returns:
a new dataset with the transformed values where the selected components are stored in columns.
- Return type:
- pca(n_components=2, calc_error=False, save_errors=False)
Perform linear dimension reduction with principal component analysis
Note
By default the percentage of variance explained by each of the selected components will be printed after the PCA analysis.
- Parameters:
n_components (int, optional.) – Number of principal components to keep defaults to 2.
calc_error (bool, optional) – If True, the reconstruction error between the original and the projected data will be calculated defaults to False.
save_errors (bool, optional) – If True, save the reconstruction error calculated for each sample to a csv file, defaults to False.
- Returns:
a new dataset with the transformed values where the selected components are stored in columns.
- Return type:
- tsne(n_components=2, perplexity=40.0, learning_rate=180.0, n_iter=2000, n_iter_without_progress=400, metric='euclidean', init='pca', verbose=1, method='barnes_hut')
Perform the t-distributed Stochastic Neighbor Embedding analysis.
- Parameters:
n_components (int, optional) – Number of coordinates (features) for the low-dimensional embedding, defaults to 2.
perplexity (float, optional) – This hyperparameter is used to control the attention between local and global aspects of the data, in a certain sense, by guessing the number of close neighbors each point has. Larger datasets usually require a larger perplexity. Consider selecting a value between 5 and 50. Different values can result in significantly different results. Defaults to 40.
learning_rate (float, optional) – The learning rate for t-SNE is usually in the range [10.0, 1000.0]. If the learning rate is too high, the data may look like a ‘ball’ with any point approximately equidistant from its nearest neighbours. If the learning rate is too low, most points may look compressed in a dense cloud with few outliers. If the cost function gets stuck in a bad local minimum increasing the learning rate may help. Defaults to 200.0
n_iter (int, optional) – Maximum number of iterations for the optimization. Should be at least 250. Defaults to 2000.
n_iter_without_progress (int, optional) – Maximum number of iterations without progress before we abort the optimization, used after 250 initial iterations with early exaggeration. Note that progress is only checked every 50 iterations so this value is rounded to the next multiple of 50. Defaults to 400.
metric (str or callable, optional) – The metric to use when calculating distance between instances in a feature array. If metric is a string, it must be one of the options allowed by scipy. spatial.distance.pdist for its metric parameter, or a metric listed in pairwise.PAIRWISE_DISTANCE_FUNCTIONS. If metric is “precomputed”, X is assumed to be a distance matrix. Alternatively, if metric is a callable function, it is called on each pair of instances (rows) and the resulting value recorded. The callable should take two arrays from X as input and return a value indicating the distance between them. The default is “euclidean” which is interpreted as squared euclidean distance. Defaults to “euclidean”.
init (str, optional) – Initialization of embedding. Possible options are ‘random’, ‘pca’, and a numpy array of shape (n_samples, n_components). PCA initialization cannot be used with precomputed distances and is usually more globally stable than random initialization. Defaults to “pca”.
verbose (int, optional) – Verbosity level. Defaults to 1
method (str, optional) – By default the gradient calculation algorithm uses Barnes-Hut approximation running in O(NlogN) time. method=’exact’ will run on the slower, but exact, algorithm in O(N^2) time. The exact algorithm should be used when nearest-neighbor errors need to be better than 3%. However, the exact method cannot scale to millions of examples. Defaults to “barnes_hut”.
- Returns:
a new dataset with the transformed values where the coordinates of the low-dimensional manifold are stored in columns.
- Return type:
- class ulamdyn.GeomSampling(descriptor, n_samples, n_clusters, transform=None, use_mwc=False, feat_scaler=None)
Bases:
object
Class used to sample new molecular geometries from GMM clustering.
- gen_geometries(num_geoms, from_cluster=0, where_prop='')
- property properties_data
- class ulamdyn.Geometries(atom_labels, properties_data=None, add_properties=[])
Bases:
BaseClass
Handle and save XYZ coordinates for selected frames of MD trajectories.
- save_xyz(geoms_array: ndarray, out_name: str = 'selected_geoms.xyz') None
Save an XYZ file for a set of selected molecular geometries.
Note
The comment line of the XYZ file will contain a list of property values for each molecule.
- Parameters:
geoms_array (numpy.ndarray) – a 3D array containing the list of XYZ matrices.
out_name (str) – name of the XYZ file containing all the selected geometries, defaults to selected_geoms.xyz.
- class ulamdyn.GetCoords
Bases:
BaseClass
Read the Cartesian coordinates from Newton-X MD trajectories.
Note
In the case of NX classical series, the Cartesian XYZ coordinates can be read either from dyn.out or dyn.xyz. For NX new series, the data will be read from the h5 file available in each TRAJ directory. Repeated geometries are skipped.
It also provides a function to calculate the root-mean-squared deviation (RMSD) between each geometry read from the MD trajectories and a reference geometry. Before calculating the RMSD, the two geometries are aligned using the Kabsch algorithm (https://en.wikipedia.org/wiki/Kabsch_algorithm).
This class does not require arguments in its constructor. All the outputs generate by the class are given in angstroms.
Data attributes:
trajectories
(dict): contain indices of the trajectories availablewith the respective maximum time to read.
labels
(numpy.ndarray): stores the sequence of atom labels.eq_xyz
(numpy.ndarray): stores the XYZ matrix of the referencegeometry (geom.xyz).
xyz
(numpy.ndarray): stores the XYZ matrices of all geometries readfrom the TRAJ directories.
rmsd
(numpy.ndarray): vector of RMSD values between all geometriesand the reference one.
dataset
(pandas.dataframe): stores a dataframe of the flattenedXYZ matrices.
- align_geoms(xyz_data=None) None
Calculate the RMSD between the current and reference geometries.
Note
Before calculating the RMSD, the method uses the Kabsch algorithm to find the optimal alignment between the loaded molecular geometry for each time step t and the reference geometry.
The calculated RMSD values will be stored in the class attribute
rmsd_values
, while thexyz
attribute will be updated with the aligned geometries.
- build_dataframe() None
Create a DataFrame containing flattened XYZ coordinates from all MD trajectories.
After running this function, the class attribute
dataset
will be updated with the loaded DataFrame object.
- dataset
- eq_xyz
- from_dyn(outfile: str, tmax=100000) Tuple[ndarray, ndarray, ndarray]
Get XYZ coordinates from the dyn.out file of classical Newton-X.
- Parameters:
outfile (str) – name of the NX output file, dyn.out
- Returns:
tuple containing an array of strings defining the atom labels, a tensor of size (n_steps, n_atoms, 3) with all XYZ coordinates of a single MD trajectory, and an array with the time steps.
- Return type:
tuple in the form (numpy.ndarray, numpy.ndarray, numpy.ndarray)
- static from_h5(outfile: str) Tuple[ndarray, ndarray, ndarray]
Read the XYZ coordinates of the MD frames generated by Newton-X NS.
- Parameters:
outfile (str) – name of the NX output file, usually dyn.h5
- Returns:
tuple containing an array of strings defining the atom labels, a tensor of size (n_steps, n_atoms, 3) with all XYZ coordinates of a single MD trajectory, and an array with the time steps.
- Return type:
tuple in the form (numpy.ndarray, numpy.ndarray, numpy.ndarray)
- static from_xyz(xyzfile) Tuple[ndarray, ndarray]
Get XYZ coordinates from the dyn.xyz file of classical Newton-X.
This method has the same parameters and return type as in
from_dyn()
.
- labels
- read_all_trajs(calc_rmsd: bool = True) None
Concatenate the XYZ coordinates read from all MD trajectories.
After running this method, the class attributes
labels
andxyz
will be updated.
- read_eq_geom() None
Read the XYZ coordinates of a reference geometry.
Note
This method should be executed before calculating the RMSD with the
align_geoms()
function.A file with name geom.xyz must be provided in the working directory (TRAJECTORIES). After reading the coordinates, the method will update the class attributes
labels
andeq_xyz
.
- rmsd
- save_csv() None
Save all loaded geometries (raw format) into a csv file.
If the RMSD has been calculated, it will be included as an extra column in the XYZ coordinates data set.
The default name of the output file is all_coordinates.csv.
- traj_time
- trajectories
- xyz
- class ulamdyn.GetCouplings
Bases:
BaseClass
Class used to read the Nonadiabatic Coupling Vectors (NAC) from the MD trajectories.
This class does not require arguments in its constructor. The outputs generate by the class is given in atomic units.
Data attributes:
trajectories
(dict): contain indices of the trajectories availablewith the respective maximum time to read.
all_nacs
(dict): store the NAC matrices (per state) for allMD trajectories.
dataset
(dict): dictionary of dataframes with the flattened gradientmatrices.
- all_nacs
- build_dataframe(save_csv=False) None
Generate a dataset (pandas.DataFrame object) with all NACs.
The XYZ matrices with the nonadiabatic couplings obtained for each molecular geometry is flattened into a one dimensional vector. In this way, every row of the dataset corresponds to one step of a given MD trajectory.
- Parameters:
save_csv (bool, optional) – if True enable the output of all NAC dataframes in a csv format, defaults to False. The default name of the exported dataset is all_nacs_s[n].csv, where n is the number of the state.
- datasets
- static from_h5(outfile) dict
Read NACs from one trajectory generated by the new Newton-X.
- Parameters:
outfile (str) – name of the NX output file in the .h5 format, usually dyn.h5.
- Returns:
stacked nonadiabatic coupling vectors (NAC) for all steps of a single MD trajectory, provided as a dictionary of tensors (numpy.ndarray, one for each state) with shape (n_steps, n_atoms, 3).
- Return type:
- static from_nxlog(outfile: str, tmax=100000) dict
Read NACs from one trajectory generated by the Newton-X CS.
- trajectories
- class ulamdyn.GetGradients
Bases:
BaseClass
Class used to collect QM gradients from Newton-X MD trajectories.
This class does not require arguments in its constructor. The outputs generate by the class is given in eV/angstrom.
Data attributes:
trajectories
(dict): contain indices of the trajectories availablewith the respective maximum time to read.
all_grads
(dict): store the gradient matrices (per state) for allMD trajectories.
dataset
(dict): dictionary of dataframes with the flattened gradientmatrices.
- all_grads
- build_dataframe(save_csv=False) None
Generate a dataset (pandas.DataFrame object) with all gradients.
The XYZ matrices with the gradients of each molecular geometry is flattened into a one dimensional vector, such that every row of the dataset corresponds to one step of the MD trajectories.
- Parameters:
save_csv (bool, optional) – if True enable the output of all gradient dataframes in a csv format, defaults to False. The default name of the saved dataset is all_gradients_s[n].csv, where n is the number of the state.
- datasets
- static from_nxlog(outfile, tmax=100000) dict
Read XYZ gradients from one trajectory generated by Newton-X CS.
This method has the same parameters and return type as in
from_h5()
.
- trajectories
- class ulamdyn.GetProperties
Bases:
BaseClass
Read all properties available in the Newton-X MD trajectories.
Note
This class does not require arguments in its constructor. All the energy quantities processed by the class are transformed from Ha to eV. For the other properties, the original units used in Newton-X are kept.
Data attributes:
trajectories
(dict): trajectories ID (TRAJXX) found in the workingdirectory are stored as keys of the dictionary, and the corresponding values are the maximum simulation time to be read.
dataset
(pd.dataframe): store a dataframe with all properties.num_states
(int): keep track of the number of states considered inthe MD simulation.
nx_version
(str): identify the Newton-X version, can be either cs(classical series) or ns (new series).
- dataset
- energies()
Read / process the energy information from the en.dat (classical NX) or .h5 (new NX) file.
- Returns:
a processed dataset with the information of all trajectories stacked, and containing the following columns “TRAJ”, “time”, “State”, “Total_Energy” plus the energy gaps between the accessible states (e.g., DE12) and binary columns to identify the hopping points (e.g., Hops_S21).
- Return type:
- mcscf_coefs()
Collect the squared coefficients of the MCSCF wavefunction.
Note
This method works only for NAMD trajectories generated with Columbus.
- Returns:
a dataset with the three highest MCSCF coefficients (the three largest) for each electronic state read from the NX output per state; if the class variable
dataset
has been already updated with some properties, the population data will be merged with the existing dataset.- Return type:
- nac_norm()
Calculate the norm of the nonadiabatic coupling matrices for each pair of states.
After running this function, the class variable
dataset
will be updated with the loaded NACs norm data.- Returns:
a dataset with the Frobenius norm of the nonadiabatic coupling matrices of all NX trajectories calculated for each possible transition between states; the number of columns in the NACs norm dataset is given by n_states * (n_states - 1)/2.
- Return type:
- num_states
- nx_version
- oscillator_strength()
Collect the oscillator strength from properties file.
Note
The oscillator strength (OSS) is not always available in the output. If needed, check the Newton-Xdocumentation to see what are the required keywords and methods. The OSS data can be read from either classical or new series NX trajectories.
- Returns:
a dataset with the oscillator strength information read from all available NX trajectories with one column for each transition between states; if the class variable
dataset
has been already updated with some properties, the oscillator strength data will be merged with the existing properties dataset.- Return type:
- populations()
Read and calculate the population for each accessible state.
Note
If the MD simulations were performed with the classical Newton-X, the populations will be calculated using the wavefunction coefficients. In the new Newton-X, the populations are already calculated and available in the .h5 file.
- Returns:
a dataset with the populations obtained from all available NX trajectories with one column per state; if the class variable
dataset
has been already updated with some properties, the population data will be merged with the existing dataset.- Return type:
- property save_csv: None
Save the dataset with QM properties read from the NX trajectories.
The following properties are included in the dataset:
trajectory index;
simulation time;
total energy;
energy gaps between states (eV);
oscillator strength (if available);
states population.
norm of the nonadiabatic coupling matrices (if available);
three highest MCSCF coefficients per state (only for NX/Columbus);
- trajectories
- class ulamdyn.GetVelocities(n_atoms=None)
Bases:
BaseClass
Class used to collect velocities from all NAMD trajectories generated by Newton-X.
- build_dataframe(save_csv=False) None
Generate a dataset (pandas.DataFrame object) containing all velocities.
The XYZ matrices with the velocities of each molecular geometry is flattened into a one dimensional vector, such that every row of the dataset corresponds to one step of the MD trajectories.
- Parameters:
save_csv (bool, optional) – if True, the dataframe will be exported in a csv format, defaults to False. The default name of the saved dataset is all_velocities.csv.
- dataset
- classmethod from_all_trajs(n_atoms=None)
Wrapper function to collect the MD velocities without instantiating the class.
- Parameters:
n_atoms (int, optional) – define the total number of atoms to consider in the velocities dataset, defaults to None. If None, the value will be taken from the geom.xyz file.
- Returns:
the full velocities data set given as a tensor of shape (n_steps, n_atoms, 3).
- Return type:
- from_dyn(outfile: str, tmax: float = 100000) ndarray
Read velocities from a single trajectory of Newton-X.
Note
The final velocities dataset will contains information only for the first set of n_atoms, as defined by the class attribute
n_atoms
.- Parameters:
- Returns:
tensor of shape (n_steps, n_atoms, 3) with all velocities data.
- Return type:
- from_h5(outfile: str) ndarray
Read velocities from the .h5 file generated by the new Newton-X.
Note
The full velocities dataset will be sliced to select only the first set of n_atoms, as defined by the class attribute
n_atoms
.- Parameters:
outfile (str) – name of the NX output file, usually dyn.h5.
- Returns:
tensor of shape (n_steps, n_atoms, 3) with all velocities data.
- Return type:
- n_atoms
- read_all_trajs()
Concatenate the XYZ velocities read from all available MD trajectories.
After running this method, the class attribute
veloc
will be updated with the full dataset of velocities read from the Newton-X output files.
- trajectories
- veloc
- class ulamdyn.KineticEnergy(n_atoms=None)
Bases:
object
Class used to calculate the atomic or molecular components of the MD kinetic energy.
- atom_labels
- atom_mass
- build_dataframe(discretization_level: str = 'atom', n_mols_per_type: int = None, n_atoms_per_mol: int = None, save_csv: bool = False) DataFrame
Create a dataset containing the kinetic energies splitted by atoms or molecules.
- Parameters:
discretization_level (str, optional) – how to split the kinetic energy, defaults to “atom”
n_mols_per_type (int, optional) – number of molecules per subsystem, typically one solute and several solvent molecules as in QM/MM, defaults to None.
n_atoms_per_mol (int, optional) – number of atoms in each molecular subsystem, defaults to None.
save_csv (bool, optional) – export the dataset with the computed kinetic energies as a csv file, defaults to False
- Returns:
a structured dataset with the values of kinetic energies computed for all trajectories, where each row corresponds to one time step of a given trajectory and the columns refer to the discretization level (atoms or molecules).
- Return type:
- calc_per_atom() ndarray
Calculate the kinetic energy of each atom for all MD trajectories available.
- Returns:
matrix with the kinetic energy of the selected atoms arranged
in columns. :rtype: numpy.ndarray
- calc_per_molecule(n_mols_per_type: list, n_atoms_per_mol: list) ndarray
Sum the atomic contributions of kinetic energy for each molecule.
Note
In the current version, this function only supports two different types of molecules in the geom file, where the first set of atoms in the geom file should always correspond to the solute. The function will be generalized in future versions to support any number of different molecules.
- Parameters:
- Returns:
a matrix of shape (n_samples, n_mols_1 + n_mols_2) with the kinetic energy of each molecule in the system arranged in columns.
- Return type:
- energies
- n_atoms
- trajectories
- class ulamdyn.NormalModeAnalysis
Bases:
GetCoords
- dataset
- eq_xyz
- labels
- rmsd
- run(molden_ref: str = 'freq.molden')
Perform the Normal Mode analysis over all MD geometries.
- Parameters:
molden_ref (str, optional) – Name of molden file containing the calculated frequencies for the reference geometry, defaults to freq.molden.
- traj_time
- trajectories
- xyz
- class ulamdyn.R2(all_geoms=None, use_mwc=False)
Bases:
GetCoords
Class used to convert the XYZ coordinates of molecular geometries into R2-type of descriptors.
The R2 descriptor is defined as the (flattened) matrix of all pairwise Euclidean distances between all atoms in the molecule. Since the matrix is symmetric with respect to the interchange of atom indices (i.e., Dij = Dji) only the lower triangular portion of the R2 matrix will be outputed in the final data set.
This class also provides a method to compute other molecular descriptors derived from the R2 distance matrix. They are:
- inverse R2 -> defined as \(1/R_{ij}\), similarly to the Coulomb
matrix descriptor.
- delta R2 -> difference between the R2 vector of the current geometry in
time t and the equivalent R2 vector of a reference geometry (typically the ground-state geometry), \(R_{ij}(t) - R_{ij}(ref)\).
- RE -> R2 vector normalized relative to equilibrium geometry,
\(R_{ij}(eq)/R_{ij}(t)\). For more details, check the reference J. Chem. Phys. 146, 244108 (2017).
- apply_to_delta
- build_descriptor(variant=None, apply_to_delta=None, save_csv=False)
Generate a dataframe with R2-based descriptors for all geometries.
- Parameters:
variant (str, optional) – molecular representation derived from the R2 descriptor, defaults to None.
apply_to_delta (str, optional) – select a nonlinear function to apply as a transformation (sigmoid or hyperbolic tangent) on delta-R2 descriptors, defaults to None.
save_csv (bool, optional) – if True export the data set with all calculated descriptors in a csv format with name all_geoms_r2.csv, defaults to False.
- Returns:
a dataframe object of shape (n_samples, n_atoms * (n_atoms - 1)/2), where each row is a vector with the R2-based descriptor computed for a given molecular geometry.
- Return type:
- inverse_transform(descriptor)
Apply all transformations in reverse order to recover the original R2 vector.
- mass_weights
- norm_factor
- r2_descriptor
- r2_ref_geom
- static reconstruct_xyz(r2_vec: ndarray) ndarray
Reconstruct the XYZ coordinates from the pairwise atom distance vector.
- Parameters:
r2_vec (numpy.ndarray) – Numpy 2D array of shape (n_samples, n_atoms * (n_atoms - 1)/2) that contains stacked R2 vectors.
- Returns:
Numpy 3D array of shape (n_samples, n_atoms, 3) containing the molecular geometries transformed back into the Cartesian coordinate space.
- Return type:
- variant
- static xyz_to_distances(xyz_matrix: ndarray) ndarray
Calculate the pairwise distance matrix for a given XYZ geometry.
- Parameters:
xyz_matrix (numpy.ndarray) – Cartesian coordinates of a molecular structure given as a matrix of shape (n_atoms, 3).
- Returns:
vector of size \(n_{atoms} (n_{atoms} - 1)/2\) containing the lower triangular portion of the R2 matrix.
- Return type:
- class ulamdyn.RingParams(ring_atom_ind: list, ring_coords: ndarray = None)
Bases:
GetCoords
Class used to the Cremer-Pople parameters from ring coordinates.
- build_dataframe(save_csv=False)
Construct a data frame containing the Cremer-Pople parameters of all collected geometries.
- dataset
- eq_xyz
- get_conf_5memb(phi) str
Determine the class of a 5-membered ring deformation based on the CP parameters.
- get_conf_6memb(theta, phi) str
Determine the conformation class for a 6-membered ring based on the CP parameters.
- get_pucker_coords(coords: ndarray) ndarray
Calculate the Cremer-Pople puckering parameters for one ring.
- labels
- property ring_coords: ndarray
Filter the XYZ coordinates corresponding to the selected ring and translate the coordinates to the ring center.
- rmsd
- traj_time
- trajectories
- xyz
- class ulamdyn.SOAPDescriptor(all_geoms=None, atoms=None, r_cut=14, n_max=8, l_max=6, average='outer', rbf='polynomial')
Bases:
GetCoords
Class used to generate SOAP (Smooth Overlap of Atomic Positions) descriptors for molecular geometries.
This class extends the GetCoords class, allowing it to handle molecular geometries and generate SOAP descriptors based on the atomic coordinates and species extracted from those geometries.
- atoms
- create_features() DataFrame
Generate SOAP descriptors for the provided molecular geometries.
- Returns:
Array of SOAP descriptors for each molecular geometry.
- Return type:
- soap
- species
- class ulamdyn.VibrationalSpectra(n_atoms=None)
Bases:
GetVelocities
Calculate the vibrational density of states for each MD trajectory.
- build_dataframe(save_csv=False)
Create a dataset with information of vibrational spectra per MD trajectory.
- Parameters:
save_csv (bool, optional) – if True, export the dataset with the computed spectra to a csv file, defaults to False.
- Returns:
two-columns dataframe storing the vibrational frequencies (cm^-1) in the first columns and the density of states in the second one (a.u.)
- Return type:
- calc_all_trajs(mass_weighted=True)
Calculate the vibrational spectra for each MD trajectory.
- static calc_pdos(Vel: ndarray, dt: float, mass: ndarray = None) ndarray
Calculate the power spectrum of the velocity auto-correlation function.
- Parameters:
Vel (numpy.ndarray) – tensor of shape (n_steps, n_atoms, 3) with atomic velocities collected from a single MD trajectory.
dt (numpy.ndarray) – time step used to integrate the classical equations.
mass – if provided, rescale the velocities by the atomic mass of each specie; the default is None.
- Returns:
two-columns array containing the calculated frequencies (cm^-1) in the first columns and the density of states in the second one (a.u.)
- Return type:
- dataset
- n_atoms
- trajectories
- veloc
- class ulamdyn.ZMatrix(all_geoms=None)
Bases:
GetCoords
Class used to generate molecular descriptors using internal coordinates (Z-Matrix).
This class does not require arguments in its constructor. All quantities related to distances are given in angstrom, while the features derived from angles are provided in degrees.
In addition to the standard Z-Matrix, the class also provides a method to compute other variants of the Z-Matrix molecular descriptors:
delta Z-Matrix -> difference between the Z-Matrix representation of the current geometry in time t and the Z-Matrix of a reference geometry.
tanh Z-Matrix -> hyperbolic tangent transformation on all features of delta Z-Matrix.
sig Z-Matrix -> sigmoid transformation on all features of delta Z-Matrix.
Data attributes:
distancematrix
(numpy.ndarray): stores the full matrix of bonddistances for all geometries.
connectivity
(list): indices of connected atoms based on a distancecriterion of proximity.
angleconnectivity
(list): indices of three neighboring atoms tocompute angles.
dihedralconnectivity
(list): four indices of neighboring atoms tocalculate dihedrals.
zmat_ref_geom
(numpy.ndarray): stores the Z-matrix calculated forthe reference geometry.
- angleconnectivity
- build_descriptor(delta=False, apply_to_delta=None, save_csv=False)
Construct the standard Z-Matrix descriptor and other variants.
Note
By default, the algorithm will calculate the three main components of the Z-Matrix: bond distances, angles and dihedrals. An augmented version of the Z-Matrix descriptor can be also obtained by calculating additional distances, angles, dihedrals and/or bending angles using the methods provided in the class.
- Parameters:
delta (bool, optional) – if True, the Z_matrix feature vector of each geometry will be subtracted from the Z_Matrix of the reference geometry, defaults to False.
apply_to_delta (str, optional) – select a nonlinear function to apply as a transformation (sigmoid or hyperbolic tangent) on the delta Z-matrix, defaults to None.
save_csv (bool, optional) – if true save a single csv file named all_geoms_zmatrix.csv containing the Z-Matrix descriptors computed for all geometries available the MD trajectories., defaults to False.
- Returns:
a dataframe object with the (flattened) Z-Matrix descriptors stacked for all MD geometries.
- Return type:
- connectivity
- dihedralconnectivity
- distancematrix
- static get_angle(geom: ndarray, idx_atoms: list) float64
Calculate the angle formed by three selected atoms.
- Parameters:
geom (numpy.ndarray) – matrix of shape (natoms, 3) storing the XYZ coordinates of a single molecule.
idx_atoms (list) – a list of three indices corresponding to the atoms for which the angle will be calculated.
- Returns:
angle (in degrees) between three selected atoms.
- Return type:
numpy.float
- static get_bending(geom: ndarray, idx_atoms: list) float64
Calculate the bending angle between two planes of the molecule.
This method is particularly useful to describe large out-of-plane distortions in the molecular structure that involves more than four atoms. The bending angle is calculated by first defining two vectors, each one perpendicular to different molecular planes formed by two sets of three atoms. Then, the angle between the two vectors is obtained by calculating the inverse cosine of the scalar product between these vectors.
Note
By default, the bending angle is not used to construct the Z-Matrix descriptor. It can be used to construct an augmented version of the Z-Matrix that better captures changes in the molecular structure during the dynamics.
- Parameters:
geom (numpy.ndarray) – matrix of shape (natoms, 3) storing the XYZ coordinates of a single molecule.
idx_atoms (list) – a list of lists with three atom indices in each, used to define two molecular planes for which the angle will be calculated.
- Returns:
bending angle (in degrees) defined by six specified atoms.
- Return type:
np.float
- static get_dihedral(geom: ndarray, idx_atoms: list) float64
Calculate the dihedral angle formed by four selected atoms.
- Parameters:
geom (numpy.ndarray) – matrix of shape (natoms, 3) storing the XYZ coordinates of a single molecule.
idx_atoms (list) – a list of four indices to select the atoms for which the dihedral angle will be calculated.
- Returns:
dihedral angle (in degrees) formed by four specified atoms.
- Return type:
numpy.float
- static get_distance(geom: ndarray, idx_atoms: list) float64
Calculate the Euclidean distance between a pair of atoms.
- Parameters:
geom (numpy.ndarray) – matrix of shape (natoms, 3) storing the XYZ coordinates of a single molecule.
idx_atoms (list) – a pair of indices corresponding to the atoms for which the distance will be calculated.
- Returns:
Euclidean distance (in Angstrom) between two selected atoms.
- Return type:
numpy.float
- zmat_ref_geom