Cleaner training loop #1149

Merged (1 commit) on Jun 28, 2020

Conversation

DhairyaLGandhi (Member)

No description provided.

@DhairyaLGandhi (Member Author)

@CarloLucibello it would be good if DataLoader did this by default

Would the necessary change be to wrap the output of getdata in a tuple when it is called with a single index?
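A rough sketch of the idea, for illustration only; the getbatch helper and the column-indexing convention below are hypothetical, not Flux's actual DataLoader internals:

# Hypothetical helper: when there is only a single data collection, wrap its
# minibatch in a 1-tuple so the caller can splat it just like the
# multi-collection case.
getbatch(X::AbstractMatrix, ids) = (X[:, ids],)            # one collection -> 1-tuple
getbatch(Xs::Tuple, ids)         = map(X -> X[:, ids], Xs) # several collections -> tuple

X, Y = rand(3, 10), rand(2, 10)
getbatch(X, 1:4)        # ((3×4 matrix),)
getbatch((X, Y), 1:4)   # (3×4 matrix, 2×4 matrix)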

@CarloLucibello (Member)

The current behavior is not accidental; it is what I would expect when iterating over a single data collection:

  1. for x in DataLoader(X)
  2. for (x, y) in DataLoader(X, Y)

Knet and PyTorch do the same. We could add

  1. for (x,) in DataLoader((X,))
  2. for (x, y) in DataLoader((X, Y))

and maybe deprecate 2., but I would keep 1. as it is, along with the changes in this PR
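For reference, a sketch of how these patterns read in user code. 1. and 2. follow the DataLoader interface as it exists at this point (assuming the import from Flux.Data), while 3. and 4. are the proposed additions, shown commented out:

using Flux.Data: DataLoader

X, Y = rand(4, 100), rand(100)

# 1. single collection: each iteration yields the batch itself
for x in DataLoader(X; batchsize=10)
    @assert size(x) == (4, 10)
end

# 2. several collections: each iteration yields a tuple of batches
for (x, y) in DataLoader(X, Y; batchsize=10)
    @assert size(x) == (4, 10) && length(y) == 10
end

# Proposed additions:
# 3. for (x,)   in DataLoader((X,); batchsize=10)
# 4. for (x, y) in DataLoader((X, Y); batchsize=10)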

johnnychen94 (Contributor) commented Apr 26, 2020

I prefer to preserve the DataLoader((X, Y)) API for use as a combination of two datasets rather than as an alternative to DataLoader(X, Y).

@DhairyaLGandhi (Member Author)

I would prefer it if we followed consistent semantics. Since we have so far allowed every element of a dataset to be, essentially, a minibatch and its labels, it follows that single data collections should be represented by (x,).

@@ -56,6 +56,10 @@ function stop()
throw(StopException())
end

@inline batchmemaybe(x::AbstractArray) = tuple(x)
@inline batchmemaybe(x::AbstractArray{T}) where T <: AbstractArray = x
@inline batchmemaybe(x) = x
@CarloLucibello (Member) commented on the diff, Apr 26, 2020

Why not follow the current branching:

@inline batchmemaybe(x::AbstractArray{<:Number}) = tuple(x)
@inline batchmemaybe(x) = x

which is stricter about when splatting is bypassed?

DhairyaLGandhi (Member Author):

There are cases where we optimise over colors, and RGB values don't subtype Number.

We shouldn't disallow StructArray here either.
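A minimal illustration of that point, assuming the ColorTypes package: an image is an AbstractArray whose element type is a Colorant rather than a Number, so dispatching on AbstractArray{<:Number} would not cover it.

using ColorTypes  # assumes the ColorTypes package is available

img = [RGB(rand(), rand(), rand()) for _ in 1:28, _ in 1:28]

img isa AbstractArray   # true
eltype(img) <: Number   # false: RGB{Float64} is a Colorant, not a Number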

DhairyaLGandhi (Member Author):

This might be tricky if we had VecOfArray inputs

@CarloLucibello (Member)

I would prefer it if we followed consistent semantics. Since we have so far allowed every element of a dataset to be, essentially, a minibatch and its labels, it follows that single data collections should be represented by (x,).

I don't see why a data iterator for the unsupervised setting should follow the semantics of the supervised one.
In any case, I think this PR should be merged independently of what happens to DataLoader, since we want to support generic data iterators.

@bhvieira (Contributor)

Good to have multiple dispatch on this instead of the if/else, but the name of the function is uninformative. Perhaps a better name could be suggested?

@DhairyaLGandhi (Member Author)

I'll take suggestions

@DhairyaLGandhi (Member Author)

Assuming we require every data loader to return a tuple of arguments to the loss, we would remain consistent and generic.

In that case both supervised and unsupervised data loaders should just return a tuple of what they expect to be arguments to the loss.
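A minimal sketch of that convention (the toy model, losses, and data below are made up for illustration): any iterator works with train!, as long as each element is a tuple of arguments for the loss.

using Flux

W = rand(2, 4)
loss(x, y) = Flux.mse(W * x, y)          # supervised: two arguments
loss(x)    = Flux.mse(W * x, x[1:2, :])  # unsupervised: one argument

# Both loaders yield tuples of loss arguments, so train! can splat them uniformly.
supervised   = [(rand(4, 8), rand(2, 8)) for _ in 1:3]  # elements are (x, y)
unsupervised = [(rand(4, 8),)            for _ in 1:3]  # elements are (x,)

opt = Descent(0.1)
Flux.train!(loss, Flux.params(W), supervised, opt)
Flux.train!(loss, Flux.params(W), unsupervised, opt)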

@CarloLucibello (Member)

All right. So, maybe let's take a week or so to merge a few non-breaking PRs, tag a release, and then pipeline as many breaking PRs as we can for the release after that

CarloLucibello added this to the v0.11 milestone on Apr 27, 2020
@MikeInnes (Member)

Carlo's option of deleting the DataLoader(X, Y) method and documenting DataLoader((X, Y)) would be fine. Then we can delete the branch from the training loop and encourage either DataLoader((X,)) or zip(DataLoader(X)) in the unsupervised case.

I don't see why a data iterator for the unsupervised setting should follow the semantics of the supervised one.

Primarily because there is no unsupervised or supervised case as far as this code is concerned; just a one-arg and multi-arg case. Here are a few ways the current setup could go wrong:

  • Splatting arguments: loss(x...) and DataLoader(X...). This will throw a confusing error in the case where you happen to have only one X, where it should just work consistently.
  • Modifying existing code: if I change loss(x, y) to loss(x) (or vice versa), it's intuitive to change DataLoader(X, Y) to DataLoader(X). In fact the correct change is more subtle and non-obvious.
  • Hitting this branch unexpectedly: if you have data = [[1, 2, 3], [4, 5, 6], ...] we should call loss(1, 2, 3) etc. Every other iterator of args works, but Array{<:Number} will break with an unhelpful error (see the sketch after this list). We can't even assume iterators have consistent eltypes in Julia, so someone could write data = [(1, 2, 3), [4, 5, 6], (i for i in 1:3)] and get very inconsistent behaviour.
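To make the third bullet concrete, here is a standalone sketch using the stricter batchmemaybe variant suggested in the review above; tuple and generator elements splat fine, while a numeric Vector element silently takes the no-splat path and fails:

batchmemaybe(x::AbstractArray{<:Number}) = tuple(x)  # numeric arrays bypass splatting
batchmemaybe(x) = x

loss(a, b, c) = a + b + c

loss(batchmemaybe((1, 2, 3))...)          # tuple element: splatted, returns 6
loss(batchmemaybe((i for i in 1:3))...)   # generator element: splatted, returns 6
# loss(batchmemaybe([1, 2, 3])...)        # Vector{Int}: passed whole, throws a MethodError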

Perhaps these seem unlikely, but if we really feel that supervised and unsupervised are fundamentally separate and need to behave differently, we need to represent that with two separate APIs and different types, not with a distinction between one and multiple arguments.

Also, note that because of the last point, the original PR #1051 was breaking. So quickly un-breaking it in a patch release seems like less of a problem, since it's only restoring compatibility with v0.10.

@CarloLucibello (Member) commented Apr 27, 2020

Actually, I'm still quite torn. On second thought, for train! I would use the rule "a tuple argument will be splatted", implemented as

batchmemaybe(x) = tuple(x)
batchmemaybe(x::Tuple) = x

loss(batchmemaybe(x)...)

Would this cover any reasonable scenario?
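A quick sketch of how that rule plays out for tuple and non-tuple elements (the toy losses are made up):

batchmemaybe(x) = tuple(x)
batchmemaybe(x::Tuple) = x

loss(x)    = sum(abs2, x)        # one-argument loss
loss(x, y) = sum(abs2, x .- y)   # two-argument loss

d1 = rand(3, 4)                  # non-tuple element: wrapped, so loss(d1) is called
d2 = (rand(3, 4), rand(3, 4))    # tuple element: splatted, so loss(x, y) is called

loss(batchmemaybe(d1)...)
loss(batchmemaybe(d2)...)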

@DhairyaLGandhi (Member Author)

Agree that an un-breaking patch would be least disruptive

@MikeInnes (Member)

Just splatting tuples might be better, and if we do that it's easy for more advanced use cases to get reliable behaviour (just wrap everything in a tuple). There's still some potential for surprise since, in general, iterators and tuples behave the same in Julia, but it's a big improvement over the current situation.

I suggest we figure that out as a follow up to the fixes here (it would also technically be breaking).

@DhairyaLGandhi (Member Author)

I've added a simple backwards-compatibility check with a warning pointing users to the changed API. This, along with checking for batching within the train loop, should help avoid breaking code. A quick look would be helpful @MikeInnes.

@CarloLucibello (Member)

I'd like to preserve DataLoader's interface 1. (and possibly also 2.) along with 3. and 4. introduced here, and without any warning. Let me do this in another PR and see if you like it.

@DhairyaLGandhi (Member Author)

The PR as it stands is not breaking anymore, with a compat layer

@CarloLucibello (Member)

The PR as it stands is not breaking anymore, with a compat layer

It's still deprecating the current DataLoader interface; I'd like to avoid that.

@CarloLucibello (Member)

Also, this breaks the iteration behavior of DataLoader(X).

@MikeInnes (Member)

Let's do the following:

  • Delete the DataLoader(X, Y) method, but support DataLoader(X) and DataLoader((X, Y)). In principle this could be generalised to any nest of tuples, and the tuple structure you iterate over reflects what you put in.
  • Delete the branch in the training loop / batchmemaybe entirely to restore compatibility with v0.10.
  • In a follow-up PR we can consider adding a reverse batchmemaybe that explicitly splats only tuples, rather than not splatting only numeric arrays; this will have to go into v0.11.

@CarloLucibello (Member)

Let's do the following:

  • Delete the DataLoader(X, Y) method, but support DataLoader(X) and DataLoader((X, Y)). In principle this could be generalised to any nest of tuples, and the tuple structure you iterate over reflects what you put in.
  • Delete the branch in the training loop / batchmemaybe entirely to restore compatibility with v0.10.
  • In a follow-up PR we can consider adding a reverse batchmemaybe that explicitly splats only tuples, rather than not splatting only numeric arrays; this will have to go into v0.11.

I've implemented the DataLoader part of this in #1152.
We are breaking the DataLoader interface though. It's been around for more than a month and is used in recent model-zoo updates. This seems likely to cause more breakage than the previous change in train!, which has probably gone unnoticed. I suggest we implement all of the changes now and tag v0.11.

@DhairyaLGandhi (Member Author)

Hmm, is this closer to what you had in mind?

bors bot added a commit that referenced this pull request Jun 8, 2020
1152: extend dataloader r=CarloLucibello a=CarloLucibello

cf. discussion in #1149. Currently the DataLoader interface supports

1. `for x in DataLoader(X)`
2. `for (x, y) in DataLoader(X, Y)`

This PR adds

3. `for (x,) in DataLoader((X,))`
4. `for (x, y) in DataLoader((X, Y))`

Edit:
the constructor in 2. is removed in this PR

Co-authored-by: CarloLucibello <[email protected]>
@CarloLucibello (Member)

@dhairyagandhi96 bump. This should implement #1149 (comment), so that we are done with the train/DataLoader overhaul.

CarloLucibello mentioned this pull request on Jun 16, 2020
@cossio (Contributor) commented Jun 16, 2020

Just so this idea is not lost: we could splat a NamedTuple as keywords, loss(; nt...), which I think would fit very nicely with #1221 (@CarloLucibello's original comment, #1221 (comment)).

Something like

if d isa NamedTuple
  loss(; d...)
else
  loss(d...)
end

@DhairyaLGandhi (Member Author)

Ah, so the change here is to support a loss function which takes in named tuples? We should add a test here for that case as well

@CarloLucibello (Member) commented Jun 17, 2020

Actually, I wouldn't have train! specialize on named tuples as well; we can't possibly support every type an iterator can throw at us. I suggest train! just distinguish tuples and non-tuples, as per #1149 (comment).

In #1221 (comment) I was referring to something that could be done on the user side. That is, if they want to work with named tuples and keyword arguments, they can do something like

function loss(; x, y)
....
end

loss(nt) = loss(; nt...)  # helper for train!

train_loader = DataLoader((x=rand(10,10), y=rand(10)))

train!(loss, ps, train_loader, opt)

@CarloLucibello (Member)

a squash before merging would be nice

@DhairyaLGandhi (Member Author)

Now that I am looking at it, it means that I can no longer have a minibatch be something like [x, y] (i.e. not a tuple), which seems a little unfortunate. We are then making the API a bit more rigid in what it can expect.
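For illustration, under the tuples-only splatting rule a minibatch stored as a Vector is passed to the loss as a single argument, so it has to be converted to a tuple to be splatted (the toy loss is made up):

batchmemaybe(x) = tuple(x)
batchmemaybe(x::Tuple) = x

loss(x, y) = sum(abs2, x .- y)

d = [rand(3), rand(3)]            # minibatch stored as a Vector, not a Tuple
# loss(batchmemaybe(d)...)        # calls loss([x, y]): MethodError, there is no one-argument method

loss(batchmemaybe(Tuple(d))...)   # converting to a tuple restores the splat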

and compute the gradient of `loss(d)`.
Each `d` should return a collection of arguments to the `loss` function wrapped as a tuple.

In case datapoints `d` are of numeric array type, assume no splatting is needed and compute the gradient of `loss(d)`.
Review comment (Member):

Maybe the last 3 sentences could be replaced by

For each datapoint `d` in `data`, compute the gradient of  `loss` with respect to `params` through
backpropagation and call the optimizer `opt`. The way the batch `d` is passed to `loss` depends on the type of `d` :
- if `d` is a tuple, call  `loss(d...)`
- otherwise, call `loss(d)` 

Each `d` should return a collection of arguments to the `loss` function wrapped as a tuple.

In case datapoints `d` are of numeric array type, assume no splatting is needed and compute the gradient of `loss(d)`.
For each datapoint `d` in `data`, compute the gradient of `loss` with respect to `params` through backpropagation and call the optimizer `opt`. If `d` is a tuple of arguments to `loss`, call `loss(d...)`. Else, `loss` may handle `d` as desired.
Review comment (Member):

Line too long.
Also, the meaning of "loss may handle d as desired" is not clear.
Could you squash the commits after fixing this?

DhairyaLGandhi (Member Author):

Is this clearer?

@DhairyaLGandhi (Member Author)

oof seemingly chose some weird commits to squash. Will fix that

return array

add inline

fixes

formatting fixes

return tuple(getdata)

dataloader doc fixes

test fixes

more test fixes

add correct api to error

add backwards compat

remove branching

rm batchmemaybe

get rid of dataloader parts

white lines

add check for batching

explain tuple limit in docs

nicer doc string

cleaner doc string

add test
@CarloLucibello (Member)

bors r+

@bors bot (Contributor) commented Jun 28, 2020

Build succeeded:

bors bot merged commit 318ef9d into FluxML:master on Jun 28, 2020