ModelX: A toolsuite for molecular modeling


ModelX is a toolsuite for biomolecular modeling. It includes version 1 of the double-stranded DNA (dsDNA)-protein docking algorithm PADA1 [1] (Protein Assisted DNA Assembly version 1), which has proven useful for predicting dsDNA-protein interactions using naked protein structures (in PDB format) as input. It uses fragment libraries as three-dimensional puzzle pieces to recompose the conformational space of interactions. PADA1 is part of the ModelX modeling toolsuite, which uses fragment libraries as building blocks to predict or reconstruct biomolecules and their interactions. The design of PADA1 relies on pairs (intX) of protein (pepX) and dsDNA (dnaX) fragments, each pair standing for an interaction. PADA1 uses a training set of 2103 high-quality DNA-protein complexes extracted from the Protein Data Bank (PDB). It includes a fast statistical force field, computed from the distances found in the training dataset (resolution better than 2.5 Å), to quickly evaluate and filter the 3D docking models.
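To make the idea of a distance-derived statistical force field concrete, here is a minimal sketch of a generic knowledge-based (inverse Boltzmann) potential. It is not the PADA1 force field: the function name `distance_potential`, the bin width, the distance cutoff, the pseudocount and the choice of reference distribution are all assumptions made for illustration.

```python
import math
from collections import Counter


def distance_potential(observed, reference, bin_width=0.5, max_dist=10.0,
                       pseudocount=1.0):
    """Pseudo-energy per distance bin: -ln(P_observed / P_reference).

    `observed` holds atom-pair distances (in Å) seen in a training set,
    `reference` holds distances from a background distribution.
    Hypothetical helper, not part of ModelX.
    """
    n_bins = int(max_dist / bin_width)

    def histogram(distances):
        counts = Counter()
        for d in distances:
            if 0 <= d < max_dist:
                counts[int(d / bin_width)] += 1
        # Pseudocounts keep empty bins from producing log(0).
        total = sum(counts.values()) + pseudocount * n_bins
        return [(counts[b] + pseudocount) / total for b in range(n_bins)]

    p_obs = histogram(observed)
    p_ref = histogram(reference)
    return [-math.log(o / r) for o, r in zip(p_obs, p_ref)]


# Toy data: contacts cluster around 3.2 Å, background is uniform.
observed = [3.2] * 50
reference = [0.25 + 0.5 * i for i in range(20)]
energies = distance_potential(observed, reference)
```

Bins that are over-represented in the training data relative to the background receive negative (favorable) pseudo-energies, so summing them over the atom pairs of a docked model gives a fast score that can be used for filtering.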


Out of 212 DNA-protein structures (validation dataset), we predicted the DNA binding region with less than 1.8 Å RMSD per residue in 209 cases (98%); for 2 of the 3 remaining structures, we also found the binding region using the PADA1 exhaustive mode. The returned structural templates can be used by the PADA1 force field (low resolution) or by the protein design software FoldX (high resolution) to scan random DNA sequences. We show that the quality of the docked templates also allows the sequence of the crystallized DNA molecule to be identified in 80% of the cases. We also illustrate how dsDNA conformational changes upon protein mutagenesis can be reconstituted, using a meganuclease and its variants. Thus, PADA1 can be used to discover DNA binding regions, dock dsDNA molecules, generate conformational diversity and, in combination with FoldX [2], identify the binding sequences.
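The success criterion above can be sketched in a few lines. How PADA1 defines and normalizes "RMSD per residue" is not specified here, so this sketch assumes a plain RMSD over matched atom coordinates, computed without superposition; the function names `rmsd` and `success_rate` are hypothetical.

```python
import math


def rmsd(coords_a, coords_b):
    """Root-mean-square deviation between two matched lists of (x, y, z)."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and equal in length")
    squared = sum(
        (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
        for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b)
    )
    return math.sqrt(squared / len(coords_a))


def success_rate(rmsd_values, threshold=1.8):
    """Count and fraction of cases whose RMSD falls below the threshold (Å)."""
    hits = sum(1 for r in rmsd_values if r < threshold)
    return hits, hits / len(rmsd_values)


# Toy example: a model shifted by 2 Å along z relative to the reference.
predicted = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
reference = [(0.0, 0.0, 2.0), (1.0, 0.0, 2.0)]
shift = rmsd(predicted, reference)
hits, fraction = success_rate([0.5, 1.0, 2.5, 1.79])
```

Applied to the figures reported above, 209 successful cases out of 212 correspond to a fraction of 209/212, or about 0.986.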


Density maps and ROC curves: (a) ROC curves for all predicted fragments for the three validation subsets and for all of them together, considering an RMSD threshold per residue < 1.8 Å, for 1 (pink), 2 (purple), and 3 (blue) mismatches. (b) Density maps for the ROC spaces corresponding to the same subsets; the x-axis represents RMSD per residue and the y-axis binding energy, for 1 mismatch. (c) Histogram of the frequency of cases for a given area under the curve or TP/FN rate for 1 (pink), 2 (purple), and 3 (blue) mismatches.
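For readers who want to build a comparable ROC analysis, here is a minimal sketch. It assumes that each predicted fragment carries a binding energy (lower is better) and a boolean label that is true when its RMSD per residue is below 1.8 Å. Whether the curves in the figure were built exactly this way is an assumption, and the function names `roc_curve` and `auc` are hypothetical.

```python
def roc_curve(energies, is_native_like):
    """ROC points (FPR, TPR) obtained by sweeping an energy cutoff upward."""
    pairs = sorted(zip(energies, is_native_like))
    positives = sum(1 for _, label in pairs if label)
    negatives = len(pairs) - positives
    if positives == 0 or negatives == 0:
        raise ValueError("need at least one positive and one negative case")

    points = [(0.0, 0.0)]
    tp = fp = 0
    i = 0
    while i < len(pairs):
        # Fragments with identical energy cross the cutoff together.
        j = i
        while j < len(pairs) and pairs[j][0] == pairs[i][0]:
            if pairs[j][1]:
                tp += 1
            else:
                fp += 1
            j += 1
        points.append((fp / negatives, tp / positives))
        i = j
    return points


def auc(points):
    """Area under the ROC curve by the trapezoidal rule."""
    return sum(
        (x1 - x0) * (y0 + y1) / 2
        for (x0, y0), (x1, y1) in zip(points, points[1:])
    )


# Toy data: four fragments, two of them within the RMSD threshold.
energies = [-4.0, -3.0, -2.0, -1.0]
labels = [True, False, True, False]
curve = roc_curve(energies, labels)
area = auc(curve)
```

An area of 1.0 means that every native-like fragment scores better than every non-native one, while 0.5 corresponds to a random ranking.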