Added Bilinear layer #1009

Merged: 37 commits into FluxML:master, Feb 11, 2021

Conversation

bhvieira (Contributor)

A basic implementation inspired by https://pytorch.org/docs/stable/nn.html#bilinear

I haven't exported it, because I think this layer is a bit more esoteric than the others.

It basically computes interactions between two sets of inputs.

I thought about augmenting it to also include the non-interaction terms (this can easily be done, e.g. by augmenting the data with a row of ones), but for now it simply mirrors PyTorch's layer.

I had to use splatting, vcat(x...) and hcat(x...), in the forward pass. I wanted to avoid it, but with reduce I couldn't get gradients. I think this can be improved, though.
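
For readers skimming the thread, here is a minimal sketch of what the layer computes, written as a plain comprehension rather than the PR's actual code (the function name and argument layout are illustrative assumptions):

using LinearAlgebra: dot

# z[k, n] = σ( x[:, n]' * W[k, :, :] * y[:, n] + b[k] )
# with W of size out × in1 × in2, and x, y holding one sample per column.
function bilinear_naive(W::AbstractArray{<:Real,3}, b::AbstractVector, σ,
                        x::AbstractMatrix, y::AbstractMatrix)
    z = [dot(x[:, n], W[k, :, :] * y[:, n]) for k in axes(W, 1), n in axes(x, 2)]
    return σ.(z .+ b)
end

# e.g. bilinear_naive(randn(7, 5, 4), zeros(7), tanh, randn(5, 3), randn(4, 3))  # 7×3 output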

@mcabbott (Member)

Seems like a reason to want FluxML/NNlib.jl#100.

But even without that, I think it's not hard to avoid the splats and go almost 100x faster. Some scribbles here: https://gist.github.com/mcabbott/29cc74f287a95724d6f561f4ed285624
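
For reference, the einsum approach from the gist boils the whole contraction down to one line. A hedged sketch using Einsum.jl (the sizes and index names here are made up, not the gist's exact code):

using Einsum

W, x, y = randn(7, 5, 4), randn(5, 3), randn(4, 3)
# Z[o, s] = Σ_{i,j} W[o, i, j] * x[i, s] * y[j, s]
@einsum Z[o, s] := W[o, i, j] * x[i, s] * y[j, s]
size(Z)  # (7, 3)

OMEinsum's @ein, mentioned further down, accepts essentially the same line.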

@bhvieira (Contributor, Author) commented Jan 31, 2020

But even without that, I think it's not hard to avoid the splats, and go almost 100x faster. Some scribbles here: https://gist.github.com/mcabbott/29cc74f287a95724d6f561f4ed285624

Cool stuff, I didn't know about @einsum (as a physicist, I'm especially pleased with it!), really concise. Feel free to push your code to my branch!

@bhvieira changed the title from "Added Billinear layer" to "Added Bilinear layer" on Jan 31, 2020
@mcabbott (Member) commented Feb 2, 2020

I updated the gist; now OMEinsum's @ein is the fastest way. (Curiously, PyTorch's layer is quite a bit slower, and using their einsum brings it down to only twice the time.) And I believe it will work on GPUs too, which the others may not? I haven't tried, as mine is broken.

However, I'm not sure that Flux wants to depend on that package, so I'm not sure what the best answer is.

@bhvieira (Contributor, Author) commented Feb 2, 2020

If we can't add that dependency, we can just fall back to your previous implementation. Thanks for the commit, @mcabbott!

Review thread on src/layers/basic.jl (outdated, resolved)
@bhvieira (Contributor, Author) commented Feb 2, 2020

@mcabbott where does eachcol come from?

@bhvieira (Contributor, Author) commented Feb 2, 2020

Oh, is it from DataFrames.jl? I think we could define our own here; DataFrames.jl would be a big dependency to include.

Edit: It's in Base, but I'm on Julia 1.0 right now, that's why I can't see it.

Review thread on src/layers/basic.jl (outdated, resolved)
@bhvieira (Contributor, Author) commented Feb 2, 2020

No, it's in Base, but perhaps only since 1.1? It is one line, though.

Yeah, I tend to stick to the LTS versions; I'll check it.
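
For anyone else stuck on an older Julia, a compat sketch along the lines of "it is one line" (the name here is made up for illustration, not necessarily the exact Base definition):

# Roughly what Base.eachcol (Julia ≥ 1.1) provides: a lazy iterator of column views.
eachcol_compat(A::AbstractMatrix) = (view(A, :, j) for j in axes(A, 2))

# e.g. collect(eachcol_compat(rand(3, 2)))  # two 3-element column views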

@bhvieira (Contributor, Author) commented Feb 3, 2020

The only remaining error is about reduce/hcat, probably its adjoint, as it appears in the gradient call in the last test.

I went ahead and removed its type annotations, but it'd be better to put other, suitable ones in place.
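
For context, the two spellings under discussion build the same matrix; the trouble at the time was only the AD rule (adjoint) for the reduce form. A tiny illustration:

xs = [rand(3, 2) for _ in 1:4]      # a vector of matrices
hcat(xs...) == reduce(hcat, xs)     # true: identical 3×8 result, no splatting in the latter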

@DhairyaLGandhi (Member)

eachcol, I think, is Julia v1.1+ only, so it will fail on earlier versions

@bhvieira (Contributor, Author) commented Feb 10, 2020

I got a new implementation working now, using Zygote.Buffer. It's faster than the previous one (edit: I'm not so sure it's faster now, but I liked the idea of reusing Zygote machinery where possible nonetheless), and the code now uses it:

#current
@btime b($x...); #  48.201 μs (609 allocations: 103.31 KiB)
@btime gradient(() -> sum(abs2.(b($x...))), params(b)); #  11.179 ms (62819 allocations: 8.26 MiB)

#previous
@btime b($x...); #  53.300 μs (1022 allocations: 69.66 KiB)
@btime gradient(() -> sum(abs2.(b($x...))), params(b)); #  11.262 ms (62819 allocations: 8.26 MiB)
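
For readers unfamiliar with it, Zygote.Buffer is an array-like scratch space that can be mutated inside a differentiated function and then copied back to a plain array. A hedged sketch of what a Buffer-based forward pass could look like (an illustration with assumed argument layout, not necessarily the PR's exact code):

using Zygote
using LinearAlgebra: dot

function bilinear_buffer(W, b, σ, x::AbstractMatrix, y::AbstractMatrix)
    out = Zygote.Buffer(x, size(W, 1), size(x, 2))   # mutable, AD-friendly scratch array
    for n in axes(x, 2), k in axes(W, 1)
        out[k, n] = dot(x[:, n], W[k, :, :] * y[:, n])
    end
    return σ.(copy(out) .+ b)                        # copy(::Buffer) returns an ordinary array
end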

@arnavs commented Feb 26, 2020

Actually, I'm having trouble chaining this to other layers.

I think you might need to add a multi-arg chain, e.g.

Flux.applychain(fs::Tuple, x, y) = Flux.applychain(Base.tail(fs), first(fs)(x, y))
(c::Chain)(x, y) = Flux.applychain(c.layers, x, y)

or something.

I see this error:

MethodError: no method matching (::Chain{Tuple{Bilinear{Array{Float32,3},Array{Float64,1},typeof(tanh)},Dense{typeof(tanh),Array{Float32,2},Array{Float32,1}},Dense{typeof(identity),Array{Float32,2},Array{Float32,1}}}})(::Array{Float64,1}, ::Array{Float64,1})
Closest candidates are:
  Any(::Any) at /Users/arnavsood/.julia/packages/Flux/2i5P1/src/layers/basic.jl:32

Stacktrace:
 [1] top-level scope at In[99]:1

@bhvieira (Contributor, Author)

@dhairyagandhi96 can this one be merged?

@arnavs commented Feb 27, 2020

@bhvieira Any reason to not add the chain methods? They were what I needed to get this to work.

@bhvieira (Contributor, Author)

@arnavs I think you deleted a comment or something? I didn't see your suggestion for some reason. I might look into it, but I could also do it in another PR.

@arnavs commented Feb 27, 2020

Yeah, I'd made an earlier comment just asking if this was ready to merge, and then a follow-up with the bug report.

Thanks for looking into it. Basically, we just need chains to act on two arguments; otherwise you can't use a bilinear layer as the first in a chain. Those two lines work for me, but perhaps there are better ways.

@bhvieira (Contributor, Author) commented Mar 1, 2020

I fixed that issue without touching Chain, @arnavs; see if it works for you now 🙂
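
For the record, one way to let a two-input layer sit at the front of an ordinary single-argument Chain, sketched with a toy stand-in layer (an illustration of the idea, not necessarily this PR's exact fix): accept the two inputs packed in a tuple and unpack them inside the layer.

struct PairSum end                       # hypothetical two-input toy layer
(l::PairSum)(x, y) = x .+ y              # the "real" two-argument forward pass
(l::PairSum)((x, y)::Tuple) = l(x, y)    # tuple method: a plain Chain can now call l((x, y))

# usage sketch: Chain(PairSum(), sum)((rand(3), rand(3)))  # the first layer unpacks the tuple itself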

@mcabbott (Member) commented Mar 2, 2020

Note that batched_mul is now merged (FluxML/NNlib.jl#100) and has a gradient (FluxML/Zygote.jl#531). It's not yet hooked up for CuArrays, but surely will be. I think this PR ought to use that instead of hacking its own version. I'm not sure how you are timing things, but if my updated gist is correct, then using this is about 1000 times faster.

@bhvieira (Contributor, Author) commented Mar 2, 2020

@mcabbott Gosh, this never stops, haha. It's cool that we can rely on batched_mul, but I wouldn't call using Buffer 'hacking' by any means. Anyway, can you open a PR against my branch again? Checks are failing because I probably did something wrong, so I'll look into that when I have time later today.

@mcabbott (Member) commented Mar 2, 2020

Sorry, no insult intended if that came off wrong; I'm guilty of earlier hacks. But this is a common operation which, like *, should ideally be outsourced to the professionals, and now at last we can easily do so.

function (a::Bilinear)(x::AbstractMatrix, y::AbstractMatrix)
    W, b, σ = a.W, a.b, a.σ

    d_z, d_x, d_y = size(W)
    d_x == size(x,1) && d_y == size(y,1) || throw(DimensionMismatch("number of rows in data must match W"))
    size(x,2) == size(y,2) || throw(DimensionMismatch("data inputs must agree on number of columns"))

    # @einsum Wy[o,i,s] := W[o,i,j] * y[j,s]
    Wy = reshape(reshape(W, (:, d_y)) * y, (d_z, d_x, :))

    # @einsum Z[o,s] := Wy[o,i,s] * x[i,s]
    Wyx = batched_mul(Wy, reshape(x, (d_x, 1, :)))
    Z = reshape(Wyx, (d_z, :))

    # @einsum out[o,s] := σ(Z[o,s] + b[o])
    σ.(Z .+ b)
end
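
To make the reshapes easier to verify, here is a hedged sanity check against a per-sample loop; it assumes only the fields a.W, a.b, a.σ used above and is not part of the PR:

using LinearAlgebra: dot

# Naive reference: z[o, s] = σ(x[:, s]' * W[o, :, :] * y[:, s] + b[o])
naive_bilinear(a, x, y) = a.σ.(
    [dot(x[:, s], a.W[o, :, :] * y[:, s]) for o in axes(a.W, 1), s in axes(x, 2)] .+ a.b
)

# Expectation: a(x, y) ≈ naive_bilinear(a, x, y), up to floating-point error.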

Review thread on src/layers/basic.jl (outdated, resolved)
@bhvieira (Contributor, Author) commented Mar 4, 2020

With the timely PRs by @mcabbott, I think we are set here and the functionality is better than ever. Is there anything else you think we should do here @dhairyagandhi96?

@bhvieira (Contributor, Author) commented Mar 4, 2020

Btw, should it be exported? Similarly "uncommon" functionalities aren't exported, so I did not include it, but I can add it if you deem it useful.

Review threads on src/layers/basic.jl (outdated, resolved)
@CarloLucibello (Member)

Looks good! I would leave it unexported.

Review thread on test/cuda/layers.jl (outdated, resolved)
@bhvieira (Contributor, Author) commented Feb 7, 2021

Would @test_nowarn not return gs_gpu to the local scope, @CarloLucibello?

@CarloLucibello (Member)

Could be. I didn't even know it existed, though. I'll just remove the test.

Review threads on test/cuda/layers.jl (outdated, resolved)
@CarloLucibello (Member)

I really hope this goes green; this commit-suggestion thing is becoming painful 😅

CarloLucibello previously approved these changes Feb 9, 2021
@CarloLucibello (Member)

victory!

bors r+

@bhvieira (Contributor, Author) commented Feb 9, 2021

@CarloLucibello thanks for the efforts, haha. I had no idea a simple equality test between GPU and CPU would take so much. Are GPU gradients stored as GPU arrays? Perhaps if we moved them back to the CPU it would've worked.

@CarloLucibello (Member)

bors r+

@CarloLucibello (Member)

bors r-

@CarloLucibello (Member)

bors r+

@CarloLucibello (Member)

@DhairyaLGandhi maybe you should just merge manually here

@CarloLucibello (Member)

bors r+

bors bot added a commit that referenced this pull request Feb 11, 2021
1009: Added Bilinear layer r=CarloLucibello a=bhvieira

Co-authored-by: Bruno Hebling Vieira <[email protected]>
Co-authored-by: Michael Abbott <[email protected]>
@DhairyaLGandhi merged commit 3bc42f2 into FluxML:master on Feb 11, 2021
@bors bot (Contributor) commented Feb 11, 2021

This PR was included in a batch that successfully built, but then failed to merge into master (it was a non-fast-forward update). It will be automatically retried.
