
Reproducibility and reliability of ECFP descriptors #136

Open
subercui opened this issue Apr 7, 2023 · 3 comments

Comments

@subercui

subercui commented Apr 7, 2023

Hi, when I run the same model on the same molecule multiple times, I get different results. Please see the following:

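Code snippet:

```python
import exmol
import skunk
# config and model_pred are defined elsewhere in the user's own setup
smiles_ = config["highlight_smiles"]
# sample molecules around smiles_ and score them with the model
space = exmol.sample_space(smiles_, model_pred, batched=True, num_samples=1000)
# fit a LIME-style explanation using ECFP substructure descriptors
exmol.lime_explain(space, descriptor_type="ECFP")
svg = exmol.plot_descriptors(space, return_svg=True)
skunk.display(svg)
```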

Results of three runs:
[Images: descriptor plots from the three runs]

I think this may be related to randomness in the sampled space; would setting a random seed somewhere improve reproducibility? My bigger concern, though, is how to interpret the results. Is there a way to make them more reliable?
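To check whether the run-to-run differences come from sampling, one option is to fix the global random seeds before each run. The sketch below assumes the sampler draws from Python's `random` module and/or NumPy's global generator; this has not been verified against exmol's `sample_space`. `fix_seeds` and `toy_sampler` are hypothetical names, and the toy sampler only stands in for a stochastic sampler.

```python
import random


def fix_seeds(seed: int) -> None:
    """Hypothetical helper: seed the global RNGs a sampler might draw from."""
    random.seed(seed)
    try:
        import numpy as np

        np.random.seed(seed)  # only matters if the sampler uses NumPy's global RNG
    except ImportError:
        pass  # NumPy not installed; only Python's RNG is seeded


def toy_sampler(smiles: str, num_samples: int) -> list[str]:
    """Hypothetical stand-in for a stochastic sampler: drop one random character.

    The strings are not meaningful molecules in general; they only show
    that the draws repeat when the seed is fixed.
    """
    samples = []
    for _ in range(num_samples):
        i = random.randrange(len(smiles))
        samples.append(smiles[:i] + smiles[i + 1:])
    return samples


fix_seeds(0)
run_a = toy_sampler("CCO", 5)
fix_seeds(0)
run_b = toy_sampler("CCO", 5)
print(run_a == run_b)  # True: same seed, same draws
```

A fixed seed only makes runs repeatable; it does not make any single run's explanation more trustworthy, which is the interpretation concern raised here.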

@whitead
Contributor

whitead commented Apr 7, 2023

Great question @subercui! This is something on our list of things to explore. @geemi725 - this is an important point. Can we explore this a bit?

@subercui
Author

subercui commented Apr 7, 2023

Thanks. I wonder what causes the randomness, and whether there is a way to reduce it somewhat. I tried increasing num_samples to 10000, but it doesn't seem to help.
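Since more samples did not help, it may be useful to measure how much the explanation actually changes between runs instead of comparing plots by eye. A minimal sketch, assuming you have collected, for each run, a mapping from descriptor name to fitted coefficient (extracting these from exmol is not shown here). The descriptor names and values below are toy placeholders, not results from the runs above. It reports the mean pairwise Jaccard overlap of the top-k descriptors by absolute coefficient, where 1.0 means every run picked the same set.

```python
from itertools import combinations


def top_k(coefs: dict[str, float], k: int) -> set[str]:
    """Names of the k descriptors with the largest absolute coefficients."""
    return set(sorted(coefs, key=lambda name: abs(coefs[name]), reverse=True)[:k])


def jaccard(a: set[str], b: set[str]) -> float:
    """|a & b| / |a | b|, defined as 1.0 when both sets are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def top_k_stability(runs: list[dict[str, float]], k: int = 5) -> float:
    """Mean pairwise Jaccard overlap of the top-k descriptor sets across runs."""
    if len(runs) < 2:
        raise ValueError("need at least two runs to compare")
    tops = [top_k(run, k) for run in runs]
    pairs = list(combinations(tops, 2))
    return sum(jaccard(a, b) for a, b in pairs) / len(pairs)


# Toy coefficients for three hypothetical runs (placeholder names and values).
runs = [
    {"frag_1": 0.9, "frag_2": -0.5, "frag_3": 0.1},
    {"frag_1": 0.8, "frag_2": 0.05, "frag_3": -0.6},
    {"frag_1": 0.7, "frag_2": -0.4, "frag_3": 0.2},
]
print(round(top_k_stability(runs, k=2), 3))  # 0.556
```

Computing this score for different descriptor types on the same molecule would give a rough, quantitative way to compare how stable their explanations are.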

@whitead
Copy link
Contributor

whitead commented Apr 7, 2023

@subercui ECFP often gives poor correlations in local vs. global explanations (it depends on the system), and those poor correlations make the p-values non-robust. MACCS is often better, as are custom descriptors for your application. We're working on improving this, but it typically cannot be addressed through the sample space; it's more a function of the fragments not fitting well.
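Reading "poor correlations" as a poor fit between the descriptor-based linear explanation and the model's predictions (an interpretation, not confirmed in the thread), one could check the fit before trusting the p-values. A minimal sketch, assuming you can obtain both the model outputs on the sampled space and the explanation's fitted values as lists. `pearson_r`, `fit_is_usable`, and the 0.5 cutoff are hypothetical choices, not part of exmol.

```python
import math


def pearson_r(xs: list[float], ys: list[float]) -> float:
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for constant input")
    return sxy / math.sqrt(sxx * syy)


def fit_is_usable(model_out: list[float], surrogate_out: list[float], min_r: float = 0.5) -> bool:
    """Hypothetical rule of thumb: read descriptor p-values only if r >= min_r."""
    return pearson_r(model_out, surrogate_out) >= min_r


print(fit_is_usable([1.0, 2.0, 3.0, 4.0], [1.1, 1.9, 3.2, 3.8]))   # True
print(fit_is_usable([1.0, 2.0, 3.0, 4.0], [1.0, -1.0, 1.0, -1.0]))  # False
```

The cutoff is arbitrary; the point is that when the explanation fits the model poorly, which fragments look significant in a given run says little about the model itself.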

Labels: None yet
Projects: None yet
Development: No branches or pull requests
2 participants