
Reproducibility of GDN #68

Open
mhkim9714 opened this issue May 12, 2023 · 3 comments

Comments

@mhkim9714

Hello,
I am very impressed by your work and am trying to start my own anomaly detection research based on it.

The first thing I am trying to do is to reproduce the results for the SWaT dataset given in Table 2.
I followed the exact steps that you provided in scripts/readme.md for SWaT preprocessing.
After running process_swat.py, I got these statistics for the final data.

  • train.csv : (47520, 52)
  • test.csv : (44991, 52)

I noticed that these are slightly different from the data statistics given in Table 1 (my processed data contains 5 extra data points).

After creating train.csv, test.csv, and list.txt, I compared the created files with demo data (swat_train_demo.csv, swat_test_demo.csv) given in https://drive.google.com/drive/folders/1_4TlatKh-f7QhstaaY7YTSCs8D4ywbWc?usp=sharing.
However, the first 999 rows of the data didn't match.
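In case it is useful to anyone else checking this, the comparison was done roughly along these lines (a rough sketch, not part of the GDN code; the file paths are placeholders, and it assumes all shared columns are numeric, so any timestamp/label columns may need to be dropped first):

```python
import numpy as np
import pandas as pd

# Placeholder paths: my processed file vs. the demo file from the Google Drive folder.
mine = pd.read_csv("train.csv")
demo = pd.read_csv("swat_train_demo.csv")

print("shapes:", mine.shape, demo.shape)

n = min(len(mine), len(demo))
cols = [c for c in mine.columns if c in demo.columns]

# Compare the overlapping rows/columns with a small tolerance
# (assumes the shared columns are all numeric).
a = mine.loc[:n - 1, cols].to_numpy(dtype=float)
b = demo.loc[:n - 1, cols].to_numpy(dtype=float)
mismatch = ~np.isclose(a, b, rtol=1e-6, atol=1e-8)

rows_with_diff = np.where(mismatch.any(axis=1))[0]
print("first differing row:", rows_with_diff[0] if len(rows_with_diff) else "none")
```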

Finally, I ran your code multiple times with the same seed and data to see whether the performance varies between runs. Unfortunately, fixing the seed didn't help: the performance varied considerably from run to run. (For reference, I used the hyperparameter settings from #4.) I also tried running the code in a CPU-only environment, but the results are still not reproducible.

(1)
F1 score: 0.8163308589607635
precision: 0.9778963414634146
recall: 0.7007099945385036
(2)
F1 score: 0.7394631639063391
precision: 0.9926402943882244
recall: 0.5892954669579464
(3)
F1 score: 0.8220572640509013
precision: 0.9845020325203252
recall: 0.7054432914618606
(4)
F1 score: 0.8120639690887624
precision: 0.9895370128171593
recall: 0.6886947023484434
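Just to quantify the spread of those four runs (plain arithmetic, nothing GDN-specific):

```python
import statistics

f1 = [0.8163308589607635, 0.7394631639063391, 0.8220572640509013, 0.8120639690887624]
print(f"mean F1 = {statistics.mean(f1):.4f}, std = {statistics.stdev(f1):.4f}")
# -> mean F1 = 0.7975, std = 0.0389
```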

How did you evaluate your model for the results reported in the paper? Have you come across this problem before?

My questions can be summarized as follows.

  • Why does the difference in data statistics occur?
  • Why does following the exact preprocessing steps result in data that differs from the given demo data?
  • Why does fixing the seed not work in GDN? Is it related to the atomic (non-deterministic) operations in torch_scatter and torch_sparse? (See the sketch below.)
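For the third question, this is the kind of seeding/determinism setup I have in mind (a minimal sketch, not GDN's own code; even with it, scatter-style aggregation in torch_scatter / PyTorch Geometric uses atomic adds on the GPU and can remain non-deterministic):

```python
import os
import random

import numpy as np
import torch


def set_seed(seed: int = 0):
    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)
    torch.cuda.manual_seed_all(seed)
    # Make cuDNN deterministic and avoid autotuned (potentially non-deterministic) kernels.
    torch.backends.cudnn.deterministic = True
    torch.backends.cudnn.benchmark = False
    # Required by some CUDA ops for deterministic behaviour.
    os.environ["CUBLAS_WORKSPACE_CONFIG"] = ":4096:8"
    # Warn (rather than error) on ops with no deterministic implementation,
    # e.g. the scatter ops used in graph message passing.
    # (warn_only needs a recent PyTorch; drop it on older versions.)
    torch.use_deterministic_algorithms(True, warn_only=True)


set_seed(0)
```

If CPU runs also differ, the cause is presumably something other than the CUDA atomics, e.g. unseeded DataLoader shuffling or worker initialization.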

The same thing happened for WADI as well.

  • The data statistics are different.

    • train.csv : (102697, 128)
    • test.csv : (17280, 128)
  • The processed data and the demo data do not match.

  • The code is not reproducible with a fixed seed for the WADI dataset either.

  • The results are nowhere near the reported results in the paper.


Has anyone been successful at reproducing the results for SWaT and WADI?

@hamiid01

Would you be willing to share the code with me? I could not make GDN work with newer versions of torch_geometric.

@DavidDong004

I tried to reproduce the results on the WADI and SWaT datasets on my computer, but my results are much worse than both the original ones and the results you got. If it is convenient for you, could you please send me a copy of the code set up according to the ./scripts/readme.md file? Thank you very much. My email address is [email protected].

@KeepMovingXX

I am running into the same issue; did you solve it? The results differ even with the same settings.
