preparatory work for v0.12 #1527
Conversation
There seem to be some regressions; this is not a complete list by any means and is pending further investigation of the ecosystem.

- Metalhead should have benefited from the CUDA work but seems to have regressed in its backwards passes - this is for ResNet (by about 25%), comparing master against the last tag.
- ObjectDetector - used to be faster than darknet - fails to load, and is now slower after fixing that.

The changes made to CI are appreciated, but the Manifest change can be reserved for the time being. We use it for consistency and development anyway, and we want our CI runs to be meaningful (see the group conv PR, for example). The change to tests is curious, since we shouldn't be passing tests with that. The same was the case with the normalization PR, which still needs to pass tests before merging for 0.12.
Hard to pinpoint the perf regressions if you don't share something reproducible. I suspect any regressions are related to CUDA.jl, so there is little we can do here, but let's see when you share the script (or some reduced examples).
Again, this should be pinpointed to some specific layer/function. The failure to load is probably due to using functionality at the end of its deprecation cycle (e.g. losses are not exported from Flux anymore; they should be taken from Flux.Losses). You should open an issue/PR there. The model-zoo vision examples are working properly, and I couldn't see any noticeable perf regression.
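For instance, a minimal sketch of the deprecation referred to above (the tensors here are purely illustrative):

```julia
using Flux

ŷ, y = rand(Float32, 10), rand(Float32, 10)

# Old style, relying on the loss being exported from Flux directly (deprecated):
# loss = mse(ŷ, y)

# Current style: take losses from the Flux.Losses submodule.
loss = Flux.Losses.mse(ŷ, y)
```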
There is no such thing as Flux.batchnorm. If Torch.jl is using CUDA.CUDNN.batchnorm, then as long as it keeps its compat bound it will be fine (same as Flux); it won't be removed until CUDA 3.0.0 is released.
You should open an issue with an MWE.
The problem is that with a Manifest I cannot test with CUDA 2.4 and 2.6 on Julia 1.5 and 1.6, respectively.
Why not?
Since that PR has been open for more than a month, it would be nice if you could wrap it up and avoid derailing it with questionable changes (read: bugs) to the activation state and to the returned parameters. In any case, that shouldn't block v0.12, since it is an internal refactoring.
The tests are pretty standard and can be found here: https://gist.github.com/DhairyaLGandhi/12d06b9389e13130cf4c03c67885be91. CUDA uses Manifests for exactly this reason as well.
The PR seems fine to me, but let's address the regressions. Either we use
This script cannot be run (e.g. I can't install Metalhead). Also, there is a subtle bug: during the benchmarks new models are created, but gradients are computed with respect to the old params, so the gradient is nothing. I fixed the script and cannot see any performance difference between master and the latest release.
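A minimal sketch of the kind of bug described here (the model and sizes are made up, not taken from the gist): when the params are captured from one model but the loss uses a freshly created one, Zygote reports no gradient for them.

```julia
using Flux, Zygote

x = rand(Float32, 4, 8)

old_model = Dense(4, 2)
ps = Flux.params(old_model)              # params captured before the benchmark loop

new_model = Dense(4, 2)                  # a fresh model built inside the benchmark
gs = gradient(() -> sum(new_model(x)), ps)

# The old params never enter the loss, so all their gradient entries are `nothing`
# and the backward timing measures essentially nothing useful:
all(gs[p] === nothing for p in ps)       # true
```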
Regressions should be addressed, but I cannot see any. If someone (@DhairyaLGandhi?) claims to have spotted a bug or perf regression, they should open a corresponding issue and provide a reproducible example to be helpful. I'm opening a PR adding a benchmark script to a perf/ folder here, but we should definitely think of a longer-term solution.
In any case, this PR can be merged; the release is not supposed to be tagged right now.
If this isn't a PR to tag a release, then the changes are fine.
@@ -1,6 +1,6 @@
name = "Flux"
uuid = "587475ba-b771-5e3f-ad9e-33799f191a9c"
-version = "0.12.0-dev"
+version = "0.12.0"
Do we leave this as is? Can we do a commit to change it and tag that commit for the release?
Thanks for pointing out the bug. I still see numbers similar to what I reported earlier after fixing it. The script does require adding the appropriate bounds to Metalhead in a local clone (and possibly removing the dev from the version number in the Flux clone). The gradients wouldn't have affected the runtime anyway.
Even without #1535, let's put these benchmarks in a repo? It can just be quick and dirty, so that we can all commit/clone the same code for testing. It doesn't have to be a permanent solution, just something to make us all more comfortable about the release and on the same page.
bors r+
Build succeeded: |
1535: add vgg16 performance script r=CarloLucibello a=CarloLucibello

We could start collecting a few (possibly more granular) perf scripts, as a first crude benchmarking facility. Related discussion in #1527. On my laptop, performance is the same for Flux master and v0.11.6.

```julia
# Julia 1.6-rc1, Flux v0.11.6, CUDA v2.6.1, Zygote v0.6.3, NNlib v0.7.16
CPU benchmark
forward        1.752 s    (915 allocations: 288.05 MiB)
backward       4.166 s    (20840 allocations: 1.01 GiB)
forw and back  6.117 s    (51836 allocations: 1.63 GiB)
CUDA benchmark
forward        21.032 ms  (4115 allocations: 156.30 KiB)
backward       139.990 ms (9218 allocations: 365.88 KiB)
forw and back  168.142 ms (19244 allocations: 2.57 MiB)
```

```julia
# Julia 1.6-rc1, Flux master, CUDA v2.6.1, Zygote v0.6.3, NNlib v0.7.16
CPU benchmark
forward        1.682 s    (1123 allocations: 220.57 MiB)
backward       4.210 s    (20153 allocations: 1.08 GiB)
forw and back  6.382 s    (32100 allocations: 1.69 GiB)
CUDA benchmark
forward        21.704 ms  (4052 allocations: 149.59 KiB)
backward       139.710 ms (9832 allocations: 377.77 KiB)
forw and back  169.053 ms (20759 allocations: 2.32 MiB)
```

Co-authored-by: Carlo Lucibello <[email protected]>
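For reference, a rough sketch of how such forward/backward timings could be produced. This uses a small stand-in Chain rather than the actual perf/ VGG16 script; the model, input sizes, and loss here are assumptions for illustration.

```julia
using Flux, BenchmarkTools

# Small stand-in model; the real script benchmarks a VGG16.
m = Chain(Conv((3, 3), 3 => 16, relu, pad = 1),
          MaxPool((2, 2)),
          Flux.flatten,
          Dense(16 * 16 * 16, 10))

x  = rand(Float32, 32, 32, 3, 8)   # WHCN batch of 8 "images"
ps = Flux.params(m)
loss(x) = sum(m(x))

println("forward");       @btime loss($x)
println("backward");      @btime gradient(() -> loss($x), $ps)
println("forw and back"); @btime begin
    l, back = Flux.pullback(() -> loss($x), $ps)
    back(1f0)
end
```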
Flux is also not compatible with CUDA#master on 1.6.
Yeah, this is known and nothing that concerns v0.12. CUDA's next release will be breaking. We have to register NNlibCUDA.