Work in progress: The repository currently contains only the code and models for prediction. The full training and evaluation code, along with the datasets, will be released here upon publication.
CatPred predicts in vitro enzyme kinetic parameters (kcat, Km and Ki) using EC, Organism and Substrate features.
CatPred is compatible with Python 3.7 to 3.10 (inclusive) and PyTorch >= 1.8.0.
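As a quick sanity check before installing, the version bounds stated above can be tested from Python. This is a minimal sketch, not part of CatPred: the helper names (python_supported, parse_version, torch_supported) are hypothetical, and the PyTorch check only runs if torch is already installed.

```python
import sys

# Bounds copied from the compatibility statement in this README.
MIN_PYTHON = (3, 7)
MAX_PYTHON = (3, 10)
MIN_TORCH = (1, 8, 0)


def python_supported(version=None):
    """Return True if (major, minor) lies within the supported range."""
    if version is None:
        version = sys.version_info[:2]
    return MIN_PYTHON <= tuple(version[:2]) <= MAX_PYTHON


def parse_version(text):
    """Turn a string such as '1.9.0+cu102' into a tuple like (1, 9, 0)."""
    core = text.split("+")[0]  # drop a local build tag such as '+cu102'
    numbers = []
    for part in core.split(".")[:3]:
        digits = "".join(ch for ch in part if ch.isdigit())
        numbers.append(int(digits) if digits else 0)
    while len(numbers) < 3:  # pad '1.8' to (1, 8, 0)
        numbers.append(0)
    return tuple(numbers)


def torch_supported(version_text):
    """Return True if the given PyTorch version string meets the minimum."""
    return parse_version(version_text) >= MIN_TORCH


if __name__ == "__main__":
    print("Python OK:", python_supported())
    try:
        import torch
        print("PyTorch OK:", torch_supported(torch.__version__))
    except ImportError:
        print("PyTorch is not installed yet")
```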
Install PyTorch libraries
pip install torch==1.9.0
pip install torch-scatter torch-cluster -f https://pytorch-geometric.com/whl/torch-1.9.0+cu102.html
To install torch-scatter for other PyTorch or CUDA versions, please see the instructions at https://github.com/rusty1s/pytorch_scatter
PyTorch >= 1.13 is required to run TorchDrug on Apple silicon. torch-scatter and torch-cluster can be compiled from source. Note that TorchDrug doesn't support mps devices.
pip install torch==1.13.0
pip install git+https://github.com/rusty1s/pytorch_scatter.git
pip install git+https://github.com/rusty1s/pytorch_cluster.git
Clone this repo and install
git clone https://github.com/vedasheersh/CatPred.git # this repo main branch
cd CatPred
pip install .
Download the data folder and pre-trained models, and extract them into the root directory of the repository:
wget https://catpred.s3.amazonaws.com/data.tar.gz
tar -xvzf data.tar.gz
wget https://catpred.s3.amazonaws.com/models.tar.gz
tar -xvzf models.tar.gz
Prepare an input.csv file as shown in catpred/examples/demo.csv
- The first column should contain the EC number as per the Enzyme Commission (EC) classification. If the EC number is unknown at a particular level, use '-' as a placeholder. For example, if only the last level is unknown, use 1.1.1.-; if the last two levels are unknown, use 1.1.-.-
- The second column should contain the Organism name as per NCBI Taxonomy. Common names or short forms will not be processed. For a rare Organism or a new strain, use the NCBI Taxonomy website to find the Organism that you think is the closest match.
- The third column should contain a SMILES string that is readable by RDKit. You can use PubChem, BRENDA-Ligand or ChEBI to search for SMILES. Alternatively, you can use PubChem-Draw to generate a SMILES string for any molecule you draw.
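The column rules above can be checked before running a prediction. The following is a rough sketch under stated assumptions, not part of CatPred: the function names and the header names ("EC", "Organism", "SMILES") are hypothetical, so compare against catpred/examples/demo.csv for the actual layout. The SMILES check is skipped if RDKit is not installed, and Organism names are not validated here.

```python
import csv
import re

_NUMERIC_LEVEL = re.compile(r"[0-9]+")


def is_valid_ec(ec):
    """Check for four dot-separated levels, with '-' only for trailing unknown levels."""
    parts = ec.split(".")
    if len(parts) != 4 or not _NUMERIC_LEVEL.fullmatch(parts[0]):
        return False
    seen_dash = False
    for part in parts:
        if part == "-":
            seen_dash = True
        elif _NUMERIC_LEVEL.fullmatch(part) and not seen_dash:
            continue
        else:
            return False  # non-numeric level, or a number after a '-'
    return True


def smiles_is_readable(smiles):
    """Return True/False using RDKit, or None if RDKit is not installed."""
    try:
        from rdkit import Chem
    except ImportError:
        return None
    return Chem.MolFromSmiles(smiles) is not None


def write_input_csv(path, rows, header=("EC", "Organism", "SMILES")):
    """Write (ec, organism, smiles) rows; the header names are an assumption."""
    with open(path, "w", newline="") as handle:
        writer = csv.writer(handle)
        writer.writerow(header)
        for ec, organism, smiles in rows:
            if not is_valid_ec(ec):
                raise ValueError(f"Malformed EC number: {ec!r}")
            if smiles_is_readable(smiles) is False:
                raise ValueError(f"RDKit could not parse SMILES: {smiles!r}")
            writer.writerow([ec, organism, smiles])


if __name__ == "__main__":
    write_input_csv("input.csv", [("1.1.1.-", "Escherichia coli", "CCO")])
```

Validating up front keeps malformed rows from reaching the featurization step, where errors may be harder to trace back to a specific line.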
cd catpred
Use the Python script (python run-catpred.py):
usage: python run-catpred.py [-i] -input INPUT_CSV [-p] -parameter [PARAMETER]
The command first featurizes the input file using the pre-defined EC and Taxonomy vocabularies. It then adds RDKit fingerprints for the SMILES strings and saves the featurized inputs as a pandas DataFrame in input_feats.pkl.
The predictions are written to a .csv file named INPUT_CSV_results.csv
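Once a run finishes, the results file can be inspected with the standard library alone. This is a small sketch: preview_csv is a hypothetical helper, it assumes the first row of the file is a header, and it makes no assumption about which columns CatPred writes. Pass it the path of the results file produced by your run.

```python
import csv


def preview_csv(path, max_rows=5):
    """Return (header, rows) holding the first few rows of a CSV file."""
    with open(path, newline="") as handle:
        reader = csv.reader(handle)
        header = next(reader, [])  # assumed to be the column names
        rows = []
        for row in reader:
            if len(rows) >= max_rows:
                break
            rows.append(row)
    return header, rows


if __name__ == "__main__":
    import sys

    if len(sys.argv) > 1:
        header, rows = preview_csv(sys.argv[1])
        print(header)
        for row in rows:
            print(row)
```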
If you find the models useful in your research, we ask that you cite the relevant paper:
@article{In-preparation,
author={Boorla, Veda Sheersh and Maranas, Costas D},
title={CatPred: Machine Learning models for in vitro enzyme kinetic parameter prediction},
year={2023},
doi={},
url={},
journal={}
}
TorchDrug is released under the Apache-2.0 License. The LICENSE file is in the root directory of this source tree.
CatPred makes use of the TorchDrug library. TorchDrug is a PyTorch-based machine learning toolbox designed for several purposes.
- You can visit the original repos for TorchDrug and TorchProtein for more info.