Drug discovery can benefit greatly from robust high-throughput molecular dynamics (MD) simulations. Classical MD simulations rely heavily on the force field and its parameterization. Force field parameters are usually derived from quantum mechanical computations, then tabulated and assigned based on predefined atom types. While this approach has been used productively for a broad spectrum of standard biomolecules (e.g. proteins, lipids), its application to arbitrary small drug molecules is still hindered by the vast number of possible atomic interactions and orientations within an arbitrary molecule in the estimated 10^60 space of drug-like molecules. Therefore, drug design applications often require designing a complicated set of rules for assigning atom types and carrying out computationally costly quantum calculations to produce an accurate set of force field parameters before MD simulations can be run.
One of the prominent drawbacks of the whole methodology based on atom types and tabulated force field parameters is that it purposely maps atomic chemical properties from a continuous into a discrete space. This mapping is based solely on predefined rules (often ambiguous and sometimes contradictory) and thus introduces substantial human bias into the approach.
With this work, we demonstrate that the artificial reduction into a space of atom types to obtain force field parameters is not necessary. The discrete atom types can be bypassed by using continuous atomic embeddings that are mapped directly onto continuous force field parameters. To achieve this, we built a model based on message-passing graph neural networks (MP-GNN) and trained it to predict atom types within a molecule. The trained MP-GNN, with its classification layer removed, was used to produce atomic embeddings, which were then used to predict force field parameters such as bond lengths and bond angles. The model was trained on ~1M crystal structures of small molecules from the Cambridge Structural Database (www.ccdc.cam.ac.uk). We achieved a prediction accuracy of ~0.02 Å mean absolute error (MAE) for bond lengths and ~1.65° MAE for bond angles, which is beyond the limit of experimental resolution. Moreover, we analyzed the atomic embeddings using unsupervised clustering approaches and found that the embeddings capture the features of the atomic environment well and can resolve ambiguous situations in which rule-based atom typing and tabulated parameter assignment usually fail.
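The idea of continuous atomic embeddings can be illustrated with a minimal sketch. It is not the model described above: the mean-aggregation update, the one-hot element features, the additive bond readout, and the names `message_passing`, `predict_bond_length`, and `mean_absolute_error` are all assumptions made for illustration, and the readout weights are untrained placeholders. The sketch only shows how atoms of the same element acquire different embeddings from their bonded environment, and how a symmetric readout over two embeddings can yield a continuous bond parameter.

```python
def message_passing(features, bonds, rounds=1):
    """Toy message passing: mix each atom's vector with the mean of its neighbours'.

    features: one vector per atom (here: one-hot element encoding, an assumption).
    bonds:    list of (i, j) index pairs describing the molecular graph.
    """
    neighbours = {i: [] for i in range(len(features))}
    for a, b in bonds:
        neighbours[a].append(b)
        neighbours[b].append(a)

    h = [list(v) for v in features]
    for _ in range(rounds):
        updated = []
        for i, vec in enumerate(h):
            nbrs = neighbours[i]
            if nbrs:
                mean = [sum(h[j][k] for j in nbrs) / len(nbrs)
                        for k in range(len(vec))]
            else:
                mean = [0.0] * len(vec)
            # Equal-weight mix of the atom's own state and its environment.
            updated.append([0.5 * (s + m) for s, m in zip(vec, mean)])
        h = updated
    return h


def predict_bond_length(h, a, b, weights, bias):
    """Hypothetical linear readout on the sum of two atomic embeddings.

    Summing the embeddings makes the prediction independent of atom order.
    """
    pair = [x + y for x, y in zip(h[a], h[b])]
    return bias + sum(w * p for w, p in zip(weights, pair))


def mean_absolute_error(predicted, reference):
    """MAE, the metric quoted for bond lengths and bond angles."""
    return sum(abs(p - r) for p, r in zip(predicted, reference)) / len(predicted)


# Heavy atoms of ethanol (C-C-O) with one-hot features [is_carbon, is_oxygen].
features = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
bonds = [(0, 1), (1, 2)]
embeddings = message_passing(features, bonds, rounds=1)

# Placeholder (untrained) readout parameters, for illustration only.
weights, bias = [0.1, -0.05], 1.3
cc_length = predict_bond_length(embeddings, 0, 1, weights, bias)
co_length = predict_bond_length(embeddings, 1, 2, weights, bias)
```

In this toy example the two carbon atoms start from identical features but end up with different embeddings, because only one of them is bonded to oxygen. A rule-based scheme would have to encode that distinction as two separate atom types, whereas here it emerges as a continuous difference.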
We believe that this work is a first step towards changing the paradigm of force-field-based MD simulations by avoiding an artificial discretization of atomic features. Moreover, it will help to significantly reduce the computational cost of calculating parameters for drug molecules.