preparatory work for v0.12 #1527

Merged
merged 4 commits into master from cl/v012 on Mar 12, 2021

Conversation

CarloLucibello
Member

No description provided.

@DhairyaLGandhi
Member

DhairyaLGandhi commented Mar 11, 2021

There seem to be some regressions; this is not a complete list by any means, and it is pending further investigation of the ecosystem.

Metalhead should have benefited from the CUDA work but seems to have regressed in its backwards passes; this is for ResNet (by about 25%):

master

```julia
julia> include("ctest.jl")
BenchmarkTools.Trial:
  memory estimate:  3.54 MiB
  allocs estimate:  82438
  --------------
  minimum time:     357.666 ms (0.00% GC)
  median time:      413.510 ms (0.00% GC)
  mean time:        401.555 ms (0.00% GC)
  maximum time:     435.641 ms (0.00% GC)
  --------------
  samples:          5
  evals/sample:     1
BenchmarkTools.Trial:
  memory estimate:  3.55 MiB
  allocs estimate:  80798
  --------------
  minimum time:     406.275 ms (0.00% GC)
  median time:      442.683 ms (0.00% GC)
  mean time:        461.172 ms (0.00% GC)
  maximum time:     517.030 ms (0.00% GC)
  --------------
  samples:          5
  evals/sample:     1
BenchmarkTools.Trial:
  memory estimate:  3.56 MiB
  allocs estimate:  81026
  --------------
  minimum time:     422.131 ms (0.00% GC)
  median time:      562.444 ms (0.00% GC)
  mean time:        629.683 ms (0.00% GC)
  maximum time:     904.475 ms (0.00% GC)
  --------------
  samples:          3
  evals/sample:     1
BenchmarkTools.Trial:
  memory estimate:  3.66 MiB
  allocs estimate:  86891
  --------------
  minimum time:     566.048 ms (2.05% GC)
  median time:      708.968 ms (1.80% GC)
  mean time:        688.524 ms (3.28% GC)
  maximum time:     770.113 ms (6.90% GC)
  --------------
  samples:          4
  evals/sample:     1
```

tag

```julia
julia> include("ctest.jl")
BenchmarkTools.Trial:
  memory estimate:  3.78 MiB
  allocs estimate:  88973
  --------------
  minimum time:     416.192 ms (0.00% GC)
  median time:      466.101 ms (0.00% GC)
  mean time:        456.617 ms (0.00% GC)
  maximum time:     493.553 ms (0.00% GC)
  --------------
  samples:          5
  evals/sample:     1
BenchmarkTools.Trial:
  memory estimate:  3.79 MiB
  allocs estimate:  87399
  --------------
  minimum time:     486.490 ms (0.00% GC)
  median time:      513.770 ms (0.00% GC)
  mean time:        517.863 ms (0.00% GC)
  maximum time:     557.423 ms (0.00% GC)
  --------------
  samples:          4
  evals/sample:     1
BenchmarkTools.Trial:
  memory estimate:  3.80 MiB
  allocs estimate:  87540
  --------------
  minimum time:     535.202 ms (0.00% GC)
  median time:      543.230 ms (0.00% GC)
  mean time:        549.727 ms (0.00% GC)
  maximum time:     577.248 ms (0.00% GC)
  --------------
  samples:          4
  evals/sample:     1
BenchmarkTools.Trial:
  memory estimate:  3.90 MiB
  allocs estimate:  93363
  --------------
  minimum time:     707.908 ms (1.23% GC)
  median time:      771.742 ms (1.13% GC)
  mean time:        821.113 ms (1.23% GC)
  maximum time:     983.688 ms (1.32% GC)
  --------------
  samples:          3
  evals/sample:     1
```

- ObjectDetector used to be faster than Darknet; it now fails to load, and is slower after fixing that.
- Torch.jl uses Flux.batchnorm (the same as CUDA's until recently, or even still today), and there's no replacement.
- EquivariantNetworks can't use StaticArrays in some kernels now.

The changes made to CI are appreciated, but the Manifest change can be held off for the time being. We use it for consistency and development anyway, and we want our CI runs to be meaningful (see the grouped conv case, for example).

The change to the tests is curious, since we shouldn't be passing tests with that. The same was the case with the normalization PR, which still needs to pass tests before merging for 0.12.

@CarloLucibello
Member Author

It's hard to pinpoint the perf regressions if you don't share something reproducible. I suspect any regression is related to CUDA.jl, so there is little we can do here, but let's see when you share the script (or some reduced examples).

> ObjectDetector used to be faster than Darknet; it now fails to load, and is slower after fixing that.

Again, this should be pinpointed to some specific layer/function. The failure to load is probably due to using features at the end of their deprecation cycle (e.g. losses are not exported from Flux anymore and should be obtained from Flux.Losses). You should open an issue/PR there.
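
As a minimal sketch of that change (assuming a typical loss such as `mse`; the exact functions ObjectDetector uses may differ):

```julia
using Flux

ŷ = rand(Float32, 10)
y = rand(Float32, 10)

# previously the losses were reachable directly from Flux, e.g. Flux.mse(ŷ, y);
# they now live in the Flux.Losses module:
Flux.Losses.mse(ŷ, y)
```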

The model-zoo vision examples are working properly, and I could not see any noticeable perf regression.

> Torch.jl uses Flux.batchnorm (the same as CUDA's until recently, or even still today), and there's no replacement.

There is no such thing as Flux.batchnorm. If Torch.jl is using CUDA.CUDNN.batchnorm, then as long as it keeps its compat bound it will be fine (same as Flux); it won't be removed until CUDA 3.0.0 is released.

> EquivariantNetworks can't use StaticArrays in some kernels now.

You should open an issue with an MWE.

> The changes made to CI are appreciated, but the Manifest change can be held off for the time being. We use it for consistency and development anyway, and we want our CI runs to be meaningful (see the grouped conv case, for example).

The problem is that with a Manifest I cannot test with CUDA 2.4 and 2.6 on Julia 1.5 and 1.6 respectively.
CI runs are very meaningful: you test with the latest versions of the dependencies. The vast majority of Julia packages don't use a Manifest; I don't see why we should do something different when it has no pros and a few cons in our situation.
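
As a rough sketch (not the actual CI setup) of the kind of per-version testing this enables, a job without a checked-in Manifest can simply pin the dependency it wants before running the suite:

```julia
# hypothetical per-job setup step: resolve against a specific CUDA release
using Pkg
Pkg.add(name = "CUDA", version = "2.4")   # e.g. "2.6" on the Julia 1.6 job
Pkg.test("Flux")
```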

> The change to the tests is curious, since we shouldn't be passing tests with that.

Why not? initW has been deprecated in favour of init.
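
For illustration, a minimal sketch of the rename (keyword names as in the v0.12-style API):

```julia
using Flux

# pre-0.12 (deprecated): Dense(10, 5, relu; initW = Flux.glorot_uniform)
# v0.12 onwards:
Dense(10, 5, relu; init = Flux.glorot_uniform)
```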

> The same was the case with the normalization PR, which still needs to pass tests before merging for 0.12.

Since that PR has been open for more than a month, it would be nice if you could wrap it up and avoid derailing it with questionable changes (read: bugs) to the activation state and to the returned parameters. In any case, that shouldn't block v0.12, since it is an internal refactoring.

@DhairyaLGandhi
Member

DhairyaLGandhi commented Mar 11, 2021

The tests are pretty standard and can be found here: https://gist.github.com/DhairyaLGandhi/12d06b9389e13130cf4c03c67885be91

CUDA uses Manifests for exactly this reason as well.

@darsnack
Member

The PR seems fine to me, but let's address the regressions. Either we use ModelZooKeeper or we need to put together a temporary repo with all the tests (so that we can all debug).

@CarloLucibello
Member Author

CarloLucibello commented Mar 12, 2021

> The tests are pretty standard and can be found here: https://gist.github.com/DhairyaLGandhi/12d06b9389e13130cf4c03c67885be91

This script cannot be run as-is (e.g. Metalhead can't be installed). Also, there is a subtle bug: during the benchmarks new models are created, but gradients are computed with respect to the old params, so the gradient is nothing.

I fixed the script and cannot see any performance difference between master and the latest release.
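
For illustration, a minimal sketch of the kind of mistake described above (a hypothetical toy model, not the actual gist code), together with the fix:

```julia
using Flux, BenchmarkTools

make_model() = Dense(10, 2)
x  = rand(Float32, 10, 8)
m  = make_model()
ps = Flux.params(m)

# buggy: a *new* model is built inside the benchmark, so its parameters are
# not in `ps` and the recorded gradients are `nothing`
@btime gradient(() -> sum(make_model()($x)), $ps)

# fixed: differentiate the model whose parameters are actually tracked
@btime gradient(() -> sum($m($x)), $ps)
```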

> The PR seems fine to me, but let's address the regressions. Either we use ModelZooKeeper or we need to put together a temporary repo with all the tests (so that we can all debug).

Regressions should be addressed, but I cannot see any. If someone (@DhairyaLGandhi?) claims to have spotted a bug or perf regression, they should open a corresponding issue and provide a reproducible example to be helpful.

I'm opening a PR adding a benchmark script to a perf/ folder here, but we should definitely think of a longer-term solution.
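
For reference, a rough sketch of what such a perf script could look like (a small VGG-style model for brevity; the actual script in the perf PR benchmarks VGG16 and may differ):

```julia
using Flux, BenchmarkTools

# small VGG-style block: 3x3 conv + relu, then 2x2 max-pool
vgg_block(cin, cout) = Chain(Conv((3, 3), cin => cout, relu; pad = 1), MaxPool((2, 2)))

model = Chain(
    vgg_block(3, 16),
    vgg_block(16, 32),
    vgg_block(32, 64),
    Flux.flatten,
    Dense(64 * 4 * 4, 10),
)

x  = rand(Float32, 32, 32, 3, 16)   # batch of 16 RGB 32x32 inputs
ps = Flux.params(model)
loss(m, x) = sum(m(x))

println("forward");  @btime $model($x)
println("backward"); @btime gradient(() -> loss($model, $x), $ps)
```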

@CarloLucibello
Member Author

In any case, this PR can be merged; the release is not supposed to be tagged right now.

Member

@darsnack darsnack left a comment

If this isn't a PR to tag a release, then the changes are fine.

```diff
@@ -1,6 +1,6 @@
 name = "Flux"
 uuid = "587475ba-b771-5e3f-ad9e-33799f191a9c"
-version = "0.12.0-dev"
+version = "0.12.0"
```
Member

Do we leave this as-is? Or should we do a separate commit to change it and tag that commit for the release?

@DhairyaLGandhi
Member

Thanks for pointing out the bug. I still see numbers similar to what I reported earlier after fixing it. The script does require adding the appropriate compat bounds to Metalhead in a local clone (and possibly removing the -dev suffix from the version number in the Flux clone). The gradients wouldn't have affected the runtime anyway.
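
Roughly, that local setup would look something like this (the paths are placeholders):

```julia
using Pkg
Pkg.develop(path = "path/to/Flux")       # clone with the -dev suffix dropped from the version
Pkg.develop(path = "path/to/Metalhead")  # clone with a widened Flux compat bound
```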

@darsnack
Member

Even without #1535, can we put these benchmarks in a repo? It can just be quick and dirty, so that we can all commit/clone the same code for testing. It doesn't have to be a permanent solution, just something that makes us all more comfortable about the release and keeps us on the same page.

@CarloLucibello
Member Author

bors r+

@bors
Contributor

bors bot commented Mar 12, 2021

Build succeeded:

@bors bors bot merged commit d9d03db into master Mar 12, 2021
bors bot added a commit that referenced this pull request Mar 14, 2021
1535: add vgg16 performance script r=CarloLucibello a=CarloLucibello

We could start collecting a few (possibly more granular) perf scripts, as a first crude benchmarking facility. 

Related discussion in #1527 .

On my laptop, performance is the same for Flux master and v0.11.6.

```julia
# Julia 1.6-rc1, Flux v0.11.6, CUDA v2.6.1, Zygote v0.6.3, NNlib v0.7.16
CPU benchmark
  forward
  1.752 s (915 allocations: 288.05 MiB)
  backward
  4.166 s (20840 allocations: 1.01 GiB)
  forw and back
  6.117 s (51836 allocations: 1.63 GiB)
CUDA benchmark
  forward
  21.032 ms (4115 allocations: 156.30 KiB)
  backward
  139.990 ms (9218 allocations: 365.88 KiB)
  forw and back
  168.142 ms (19244 allocations: 2.57 MiB)
```
```julia
# Julia 1.6-rc1, Flux master, CUDA v2.6.1, Zygote v0.6.3, NNlib v0.7.16
CPU benchmark
  forward
  1.682 s (1123 allocations: 220.57 MiB)
  backward
  4.210 s (20153 allocations: 1.08 GiB)
  forw and back
  6.382 s (32100 allocations: 1.69 GiB)
CUDA benchmark
  forward
  21.704 ms (4052 allocations: 149.59 KiB)
  backward
  139.710 ms (9832 allocations: 377.77 KiB)
  forw and back
  169.053 ms (20759 allocations: 2.32 MiB)
```


Co-authored-by: Carlo Lucibello <[email protected]>
@DhairyaLGandhi
Member

Flux is also not compatible with CUDA#master on Julia 1.6.

@CarloLucibello
Member Author

Yeah, this is known and nothing that concerns v0.12. CUDA's next release will be breaking; we have to register NNlibCUDA.

@CarloLucibello CarloLucibello deleted the cl/v012 branch April 7, 2022 07:03