Reproducibility and reliability of ECFP descriptors #136

subercui · 2023-04-07T16:09:11Z

Hi, when I run the same model on the same molecule multiple times, I get different results. Please see the following:

Code snippets:

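import exmol
import skunk

smiles_ = config["highlight_smiles"]  # SMILES of the molecule to explain
# Sample a local chemical space around the molecule and label it with the model
space = exmol.sample_space(smiles_, model_pred, batched=True, num_samples=1000)
# Fit the LIME surrogate using ECFP descriptors, then plot the attributions
exmol.lime_explain(space, descriptor_type="ECFP")
svg = exmol.plot_descriptors(space, return_svg=True)
skunk.display(svg)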

Results of three runs:

I think this may be related to the randomness of the sampled space. Could setting a random seed somewhere improve reproducibility? My bigger concern is how to interpret the results. Is there a way to make them more reliable?
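As a rough illustration of the seeding idea, here is a minimal sketch. It assumes the sampler draws from Python's global `random` module and, if installed, NumPy's global RNG; whether `exmol.sample_space` actually does so (or also relies on another RNG, such as one inside RDKit) is not confirmed in this thread. `seed_everything` and `toy_sample_space` are hypothetical names used only for the demonstration.

```python
import random


def seed_everything(seed: int) -> None:
    """Seed the global RNGs that a stochastic sampler might draw from."""
    random.seed(seed)
    try:
        import numpy as np  # optional; skipped if NumPy is not installed

        np.random.seed(seed)
    except ImportError:
        pass


def toy_sample_space(n: int) -> list[int]:
    """Hypothetical stand-in for a stochastic sampler (not exmol's code)."""
    return [random.randint(0, 999) for _ in range(n)]


# Re-seeding before each run makes the two draws identical.
seed_everything(0)
first = toy_sample_space(5)
seed_everything(0)
second = toy_sample_space(5)
print(first == second)  # True
```

In practice the call to `seed_everything` would go immediately before `exmol.sample_space`. If the results still differ between runs, the randomness comes from a source these seeds do not control.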


whitead · 2023-04-07T19:27:41Z

Great question @subercui! This is something on our list of things to explore. @geemi725 - this is an important point. Can we explore this a bit?

subercui · 2023-04-07T19:38:14Z

Thanks. I wonder what causes the randomness, and whether there is a way to reduce it to some degree. I tried increasing num_samples up to 10000, but it doesn't seem to help.

whitead · 2023-04-07T19:48:04Z

@subercui ECFP often gives poor correlations between local and global explanations (it depends on the system), and those poor correlations make the p-values non-robust. MACCS is often better, as are custom descriptors for your application. We're working on improving this, but it typically cannot be addressed by changing the sample space. It's more a function of the fragments not fitting well.
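One way to put a number on how non-robust the rankings are is to repeat the explanation several times and measure how much the top-ranked descriptors overlap. The sketch below is not part of exmol: it assumes you have already collected, for each run, a mapping from descriptor name to t-statistic, and the helper names and the numbers are made up for illustration.

```python
from itertools import combinations


def top_k(tstats: dict[str, float], k: int) -> set[str]:
    """Return the k descriptor names with the largest absolute t-statistic."""
    ranked = sorted(tstats, key=lambda name: abs(tstats[name]), reverse=True)
    return set(ranked[:k])


def mean_jaccard(runs: list[dict[str, float]], k: int = 3) -> float:
    """Average pairwise Jaccard overlap of the top-k descriptors across runs."""
    if len(runs) < 2 or k < 1:
        raise ValueError("need at least two runs and k >= 1")
    tops = [top_k(run, k) for run in runs]
    scores = [len(a & b) / len(a | b) for a, b in combinations(tops, 2)]
    return sum(scores) / len(scores)


# Made-up t-statistics for illustration only; these are not exmol results.
runs = [
    {"frag_a": 4.0, "frag_b": -3.0, "frag_c": 2.0, "frag_d": 0.5},
    {"frag_a": 3.5, "frag_b": -2.5, "frag_d": 2.2, "frag_c": 0.4},
    {"frag_a": 4.2, "frag_c": 2.8, "frag_b": -2.6, "frag_d": 0.1},
]
score = mean_jaccard(runs, k=3)
print(round(score, 3))  # 0.667
```

A score near 1 means the same descriptors are highlighted in every run, and a score near 0 means the top features change from run to run. Computing this for `descriptor_type="ECFP"` and for `descriptor_type="MACCS"` on the same molecule would be one way to check the suggestion above for a given system.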

hgandhi2411 mentioned this issue on Jun 2, 2024: BBB example for LIME #142 (Open)
