
Machine Learning models for in vitro enzyme kinetic parameter prediction

Vedasheersh/CatPred

 
 


CatPred: Machine Learning models for in vitro enzyme kinetic parameter prediction

Work in progress: This repository currently contains only the code and models for prediction. The full training and evaluation code, along with the datasets, will be released here upon publication.

CatPred predicts in vitro enzyme kinetic parameters (kcat, Km and Ki) using EC, Organism and Substrate features.


Installing pre-requisites

Installation is compatible with 3.7 <= Python <= 3.10 and PyTorch >= 1.8.0.
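Before installing, it can help to confirm that the environment falls inside the stated version ranges. The following is a minimal sketch, not part of CatPred: the helper names `python_is_supported` and `torch_is_supported` are hypothetical, and the version bounds are taken from the sentence above.

```python
import sys


def python_is_supported(version=None):
    """Return True if the (major, minor) Python version is within 3.7 to 3.10."""
    major_minor = tuple((version or sys.version_info)[:2])
    return (3, 7) <= major_minor <= (3, 10)


def torch_is_supported(version_string):
    """Return True if a PyTorch version string such as '1.9.0+cu102' is >= 1.8."""
    # Drop the local build suffix (e.g. '+cu102') before comparing.
    release = version_string.split("+")[0]
    major, minor = (int(part) for part in release.split(".")[:2])
    return (major, minor) >= (1, 8)
```

Once PyTorch is installed, `torch_is_supported(torch.__version__)` checks the installed build, and `python_is_supported()` with no argument checks the running interpreter.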

Install PyTorch libraries

From Pip

pip install torch==1.9.0
pip install torch-scatter torch-cluster -f https://pytorch-geometric.com/whl/torch-1.9.0+cu102.html

To install torch-scatter for other PyTorch or CUDA versions, please see the instructions in https://github.com/rusty1s/pytorch_scatter

Apple Silicon (M1/M2 Chips)

PyTorch >= 1.13 is required to run TorchDrug on Apple silicon. torch-scatter and torch-cluster can be compiled from source. Note that TorchDrug does not support mps devices.

pip install torch==1.13.0
pip install git+https://github.com/rusty1s/pytorch_scatter.git
pip install git+https://github.com/rusty1s/pytorch_cluster.git

Now install CatPred

Clone this repo, install it, then download and extract the data folder:

git clone https://github.com/vedasheersh/CatPred.git  # this repo main branch
cd CatPred
pip install .
wget https://catpred.s3.amazonaws.com/data.tar.gz
tar -xvzf data.tar.gz

Both the data folder and the pre-trained models must be extracted into the repository root directory. Download and extract the pre-trained models with:

wget https://catpred.s3.amazonaws.com/models.tar.gz
tar -xvzf models.tar.gz
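On systems without wget or tar (for example a default macOS setup), the same download-and-extract step can be sketched with the Python standard library. This is an illustrative alternative, not part of CatPred; the function names `extract_archive` and `download_and_extract` are hypothetical, and the URLs are the ones given above.

```python
import tarfile
import urllib.request
from pathlib import Path


def extract_archive(archive_path, dest_dir="."):
    """Extract a .tar.gz archive into dest_dir and return dest_dir as a Path."""
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    with tarfile.open(archive_path, "r:gz") as archive:
        archive.extractall(dest)
    return dest


def download_and_extract(url, dest_dir="."):
    """Download url into dest_dir, then extract the archive in place."""
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    # Name the local file after the last path component of the URL.
    archive_path = dest / url.rsplit("/", 1)[-1]
    urllib.request.urlretrieve(url, str(archive_path))
    return extract_archive(archive_path, dest)
```

Run from the repository root, `download_and_extract("https://catpred.s3.amazonaws.com/data.tar.gz")` and `download_and_extract("https://catpred.s3.amazonaws.com/models.tar.gz")` would mirror the wget and tar commands above.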

Usage

Input preparation

Prepare an input.csv file as shown in catpred/examples/demo.csv

  1. The first column should contain the EC number as per Enzyme Classification. If the EC number is unknown at a particular level, use '-' as a placeholder. For example, if the last level is unknown, use 1.1.1.-; if the last two levels are unknown, use 1.1.-.-

  2. The second column should contain the Organism name as per NCBI Taxonomy. Common names or short forms will not be processed. In case of a rare Organism or a new strain, use the NCBI Taxonomy website to find the Organism that you think is the closest match.

  3. The third column should contain a SMILES string that is readable by RDKit. You can use PubChem, BRENDA-Ligand or ChEBI to search for SMILES. Alternatively, you can use PubChem-Draw to generate a SMILES string for any molecule you draw.
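The three column rules above can be checked before running predictions. The sketch below is an assumption-laden illustration, not CatPred's own validation: the function names are hypothetical, the EC check only accepts four levels of digits with trailing '-' placeholders, organism names are only checked for being non-empty (not looked up in NCBI Taxonomy), and the SMILES check uses RDKit's `Chem.MolFromSmiles` only if RDKit is importable.

```python
try:
    from rdkit import Chem  # optional: enables real SMILES parsing
except ImportError:
    Chem = None


def is_valid_ec(ec):
    """Accept four dot-separated levels; '-' is allowed only as a trailing placeholder."""
    parts = ec.strip().split(".")
    if len(parts) != 4:
        return False
    seen_dash = False
    for part in parts:
        if part == "-":
            seen_dash = True
        elif part.isdigit() and not seen_dash:
            continue
        else:
            return False
    return True


def is_valid_smiles(smiles):
    """Reject empty strings; parse with RDKit when it is available."""
    if not smiles.strip():
        return False
    if Chem is None:
        return True  # RDKit not installed, so only the non-empty check applies
    return Chem.MolFromSmiles(smiles) is not None


def check_rows(rows):
    """Return (row_index, problem) tuples for data rows of [EC, organism, SMILES]."""
    problems = []
    for index, row in enumerate(rows):
        if len(row) < 3:
            problems.append((index, "expected 3 columns"))
            continue
        ec, organism, smiles = (field.strip() for field in row[:3])
        if not is_valid_ec(ec):
            problems.append((index, "invalid EC number: " + ec))
        if not organism:
            problems.append((index, "missing organism name"))
        if not is_valid_smiles(smiles):
            problems.append((index, "unreadable SMILES: " + smiles))
    return problems
```

The rows can come from `csv.reader` over input.csv; if the file has a header row, as catpred/examples/demo.csv may, skip it before calling `check_rows`.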

Making predictions

cd catpred

Use the Python script run-catpred.py:

usage: python run-catpred.py [-i] -input INPUT_CSV [-p] -parameter [PARAMETER]

The command first featurizes the input file using the pre-defined EC and Taxonomy vocabularies. It then adds the RDKit fingerprints for the SMILES strings and saves the featurized inputs as a pickled pandas DataFrame, input_feats.pkl.

The predictions are written to a .csv file named INPUT_CSV_results.csv

Citations

If you find the models useful in your research, we ask that you cite the relevant paper:

@article{In-preparation,
  author={Boorla, Veda Sheersh and Maranas, Costas D},
  title={CatPred: Machine Learning models for in vitro enzyme kinetic parameter prediction},
  year={2023},
  doi={},
  url={},
  journal={}
}

License

TorchDrug is released under the Apache-2.0 License. The LICENSE file is in the root directory of this source tree.

Acknowledgements

CatPred makes use of the TorchDrug library. TorchDrug is a PyTorch-based machine learning toolbox designed for several purposes.

  • You can visit the original repos for TorchDrug and TorchProtein for more info.


