Cleaner training loop #1149

Merged (1 commit) on Jun 28, 2020

Conversation

DhairyaLGandhi (Member)

No description provided.

@DhairyaLGandhi (Member Author)

@CarloLucibello it would be good if DataLoader did this by default

Would the necessary change be to wrap the output of getdata in a tuple when it is called with a single index?
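A rough sketch of the idea, for illustration only; the getbatch helper and the column-indexing convention below are hypothetical, not Flux's actual DataLoader internals:

# Hypothetical helper: when there is only a single data collection, wrap its
# minibatch in a 1-tuple so the caller can splat it just like the
# multi-collection case.
getbatch(X::AbstractMatrix, ids) = (X[:, ids],)            # one collection -> 1-tuple
getbatch(Xs::Tuple, ids)         = map(X -> X[:, ids], Xs) # several collections -> tuple

X, Y = rand(3, 10), rand(2, 10)
getbatch(X, 1:4)        # ((3×4 matrix),)
getbatch((X, Y), 1:4)   # (3×4 matrix, 2×4 matrix)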

@CarloLucibello (Member)

The current behavior is not accidental; it is what I would expect when iterating over a single data collection:

  1. for x in DataLoader(X)
  2. for (x, y) in DataLoader(X, Y)

Knet and PyTorch do the same. We could add

  1. for (x,) in DataLoader((X,))
  2. for (x, y) in DataLoader((X, Y))

and maybe deprecate 2., but I would keep 1. as it is, along with the changes in this PR
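For reference, a sketch of how these patterns read in user code. 1. and 2. follow the DataLoader interface as it exists at this point (assuming the import from Flux.Data), while 3. and 4. are the proposed additions, shown commented out:

using Flux.Data: DataLoader

X, Y = rand(4, 100), rand(100)

# 1. single collection: each iteration yields the batch itself
for x in DataLoader(X; batchsize=10)
    @assert size(x) == (4, 10)
end

# 2. several collections: each iteration yields a tuple of batches
for (x, y) in DataLoader(X, Y; batchsize=10)
    @assert size(x) == (4, 10) && length(y) == 10
end

# Proposed additions:
# 3. for (x,)   in DataLoader((X,); batchsize=10)
# 4. for (x, y) in DataLoader((X, Y); batchsize=10)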

johnnychen94 (Contributor) commented Apr 26, 2020

I prefer to preserve the DataLoader((X, Y)) API for use as a combination of two datasets rather than as an alternative to DataLoader(X, Y).

@DhairyaLGandhi (Member Author)

I would prefer it if we followed consistent semantics. Since we have so far allowed every element of a dataset to be, essentially, a minibatch and its labels, it follows that single data collections should be represented by (x,).

@@ -56,6 +56,10 @@ function stop()
throw(StopException())
end

@inline batchmemaybe(x::AbstractArray) = tuple(x)
@inline batchmemaybe(x::AbstractArray{T}) where T <: AbstractArray = x
@inline batchmemaybe(x) = x
@CarloLucibello (Member) commented on the diff, Apr 26, 2020

Why not follow the current branching:

@inline batchmemaybe(x::AbstractArray{<:Number}) = tuple(x)
@inline batchmemaybe(x) = x

which is stricter about when splatting is bypassed?

DhairyaLGandhi (Member Author):

There are cases where we optimise over colors, and RGB values don't subtype Number.

We shouldn't disallow StructArray here either.
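A minimal illustration of that point, assuming the ColorTypes package: an image is an AbstractArray whose element type is a Colorant rather than a Number, so dispatching on AbstractArray{<:Number} would not cover it.

using ColorTypes  # assumes the ColorTypes package is available

img = [RGB(rand(), rand(), rand()) for _ in 1:28, _ in 1:28]

img isa AbstractArray   # true
eltype(img) <: Number   # false: RGB{Float64} is a Colorant, not a Number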

DhairyaLGandhi (Member Author):

This might be tricky if we had VecOfArray inputs

@CarloLucibello (Member)

I would prefer it if we followed consistent semantics. Since we have so far allowed every element of a dataset to be, essentially, a minibatch and its labels, it follows that single data collections should be represented by (x,).

I don't see why a data iterator for the unsupervised setting should follow the semantics of the supervised one.
In any case, I think this PR should be merged independently of what happens to DataLoader, since we want to support generic data iterators.

@bhvieira (Contributor)

Good to have multiple dispatch on this instead of the if/else, but the name of the function is uninformative. Perhaps a better name could be suggested?

@DhairyaLGandhi (Member Author)

I'll take suggestions

@DhairyaLGandhi (Member Author)

Assuming we require every data loader to return a tuple of arguments to the loss, we would remain consistent and generic.

In that case both supervised and unsupervised data loaders should just return a tuple of what they expect to be arguments to the loss.
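A minimal sketch of that convention (the toy model, losses, and data below are made up for illustration): any iterator works with train!, as long as each element is a tuple of arguments for the loss.

using Flux

W = rand(2, 4)
loss(x, y) = Flux.mse(W * x, y)          # supervised: two arguments
loss(x)    = Flux.mse(W * x, x[1:2, :])  # unsupervised: one argument

# Both loaders yield tuples of loss arguments, so train! can splat them uniformly.
supervised   = [(rand(4, 8), rand(2, 8)) for _ in 1:3]  # elements are (x, y)
unsupervised = [(rand(4, 8),)            for _ in 1:3]  # elements are (x,)

opt = Descent(0.1)
Flux.train!(loss, Flux.params(W), supervised, opt)
Flux.train!(loss, Flux.params(W), unsupervised, opt)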

@CarloLucibello (Member)

All right. So, maybe let's take a week or so to merge a few non-breaking PRs, tag a release, and then pipeline as many breaking PRs as we can for the release after that

CarloLucibello added this to the v0.11 milestone on Apr 27, 2020
@MikeInnes (Member)

Carlo's option of deleting the DataLoader(X, Y) method and documenting DataLoader((X, Y)) would be fine. Then we can delete the branch from the training loop and encourage either DataLoader((X,)) or zip(DataLoader(X)) in the unsupervised case.

I don't see why a data iterator for the unsupervised setting should follow the semantics of the supervised one.

Primarily because there is no unsupervised or supervised case as far as this code is concerned; just a one-arg and multi-arg case. Here are a few ways the current setup could go wrong:

  • Splatting arguments: loss(x...) and DataLoader(X...). This will throw a confusing error in the case where you happen to have only one X, where it should just work consistently.
  • Modifying existing code: if I change loss(x, y) to loss(x) (or vice versa), it's intuitive to change DataLoader(X, Y) to DataLoader(X). In fact the correct change is more subtle and non-obvious.
  • Hitting this branch unexpectedly: if you have data = [[1, 2, 3], [4, 5, 6], ...] we should call loss(1, 2, 3) etc. Every other iterator of args works, but Array{<:Number} will break with an unhelpful error (see the sketch after this list). We can't even assume iterators have consistent eltypes in Julia, so someone could write data = [(1, 2, 3), [4, 5, 6], (i for i in 1:3)] and get very inconsistent behaviour.
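To make the third bullet concrete, here is a standalone sketch using the stricter batchmemaybe variant suggested in the review above; tuple and generator elements splat fine, while a numeric Vector element silently takes the no-splat path and fails:

batchmemaybe(x::AbstractArray{<:Number}) = tuple(x)  # numeric arrays bypass splatting
batchmemaybe(x) = x

loss(a, b, c) = a + b + c

loss(batchmemaybe((1, 2, 3))...)          # tuple element: splatted, returns 6
loss(batchmemaybe((i for i in 1:3))...)   # generator element: splatted, returns 6
# loss(batchmemaybe([1, 2, 3])...)        # Vector{Int}: passed whole, throws a MethodError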

Perhaps these seem unlikely, but if we really feel that supervised and unsupervised are fundamentally separate and need to behave differently, we need to represent that with two separate APIs and different types, not with a distinction between one and multiple arguments.

Also, note that because of the last point, the original PR #1051 was breaking. So quickly un-breaking it in a patch release seems like less of a problem, since it's only restoring compatibility with v0.10.

@CarloLucibello (Member) commented Apr 27, 2020

Actually, I'm still quite torn. On second thought, for train! I would use the rule "a tuple argument will be splatted", implemented as

batchmemaybe(x) = tuple(x)
batchmemaybe(x::Tuple) = x

loss(batchmemaybe(x)...)

Would this cover any reasonable scenario?
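A quick sketch of how that rule plays out for tuple and non-tuple elements (the toy losses are made up):

batchmemaybe(x) = tuple(x)
batchmemaybe(x::Tuple) = x

loss(x)    = sum(abs2, x)        # one-argument loss
loss(x, y) = sum(abs2, x .- y)   # two-argument loss

d1 = rand(3, 4)                  # non-tuple element: wrapped, so loss(d1) is called
d2 = (rand(3, 4), rand(3, 4))    # tuple element: splatted, so loss(x, y) is called

loss(batchmemaybe(d1)...)
loss(batchmemaybe(d2)...)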

@DhairyaLGandhi (Member Author)

Agree that an un-breaking patch would be least disruptive

@MikeInnes (Member)

Just splatting tuples might be better, and if we do that it's easy for more advanced use cases to get reliable behaviour (just wrap everything in a tuple). There's still some potential for surprise since, in general, iterators and tuples behave the same in Julia, but it's a big improvement over the current situation.

I suggest we figure that out as a follow up to the fixes here (it would also technically be breaking).

@DhairyaLGandhi (Member Author)

I've added a simple backwards-compatibility check with a warning pointing users to the changed API. This, along with checking for batching within the train loop, should help avoid breaking code. A quick look would be helpful @MikeInnes.

@CarloLucibello (Member)

I'd like to preserve DataLoader's interface 1. (and possibly also 2.) along with 3. and 4. introduced here, and without any warning. Let me do this in another PR and see if you like it.

@DhairyaLGandhi (Member Author)

The PR as it stands is not breaking anymore, with a compat layer

@CarloLucibello (Member)

The PR as it stands is not breaking anymore, with a compat layer

It's still deprecating the current DataLoader interface; I'd like to avoid that.

@CarloLucibello (Member)

Also, this breaks the iteration behavior of DataLoader(X).

@MikeInnes (Member)

Let's do the following:

  • Delete the DataLoader(X, Y) method, but support DataLoader(X) and DataLoader((X, Y)). In principle this could be generalised to any nest of tuples, and the tuple structure you iterate over reflects what you put in.
  • Delete the branch in the training loop / batchmemaybe entirely to restore compatibility with v0.10.
  • In a follow-up PR we can consider adding a reverse batchmemaybe that explicitly splats only tuples, rather than not splatting only numeric arrays; this will have to go into v0.11.

@CarloLucibello (Member)

Let's do the following:

  • Delete the DataLoader(X, Y) method, but support DataLoader(X) and DataLoader((X, Y)). In principle this could be generalised to any nest of tuples, and the tuple structure you iterate over reflects what you put in.
  • Delete the branch in the training loop / batchmemaybe entirely to restore compatibility with v0.10.
  • In a follow-up PR we can consider adding a reverse batchmemaybe that explicitly splats only tuples, rather than not splatting only numeric arrays; this will have to go into v0.11.

I've implemented the DataLoader part of this in #1152.
We are breaking the DataLoader interface though. It's been around for more than a month and is used in recent model-zoo updates. This seems likely to cause more breakage than the previous change in train!, which has probably gone unnoticed. I suggest we implement all of the changes now and tag v0.11.

@DhairyaLGandhi (Member Author)

Hmm, is this closer to what you had in mind?

bors bot added a commit that referenced this pull request Jun 8, 2020
1152: extend dataloader r=CarloLucibello a=CarloLucibello

cf. discussion in #1149. Currently the DataLoader interface supports

1. `for x in DataLoader(X)`
2. `for (x, y) in DataLoader(X, Y)`

This PR adds

3. `for (x,) in DataLoader((X,))`
4. `for (x, y) in DataLoader((X, Y))`

Edit:
the constructor in 2. is removed in this PR

Co-authored-by: CarloLucibello <[email protected]>
@CarloLucibello (Member)

@dhairyagandhi96 bump. This should implement #1149 (comment), so that we are done with the train/DataLoader overhaul.

CarloLucibello mentioned this pull request on Jun 16, 2020
@cossio (Contributor) commented Jun 16, 2020

Just so this idea is not lost: we could splat a NamedTuple as keywords, loss(; nt...), which I think would fit very nicely with #1221 (@CarloLucibello's original comment, #1221 (comment)).

Something like

if d isa NamedTuple
  loss(; d...)
else
  loss(d...)
end

@DhairyaLGandhi (Member Author)

Ah, so the change here is to support a loss function which takes in named tuples? We should add a test here for that case as well

@CarloLucibello (Member) commented Jun 17, 2020

Actually, I wouldn't have train! specialize on named tuples as well; we can't possibly support every type an iterator can throw at us. I suggest train! just distinguish tuples and non-tuples, as per #1149 (comment).

In #1221 (comment) I was referring to something that could be done on the user side. That is, if they want to work with named tuples and keyword arguments, they can do something like

function loss(; x, y)
....
end

loss(nt) = loss(; nt...)  # helper for train!

train_loader = DataLoader((x=rand(10,10), y=rand(10)))

train!(loss, ps, train_loader, opt)

@CarloLucibello (Member)

a squash before merging would be nice

@DhairyaLGandhi (Member Author)

Now that I am looking at it, it means that I can no longer have a minibatch be something like [x, y] (i.e. not a tuple), which seems a little unfortunate. We are then making the API a bit more rigid in what it can expect.
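For illustration, under the tuples-only splatting rule a minibatch stored as a Vector is passed to the loss as a single argument, so it has to be converted to a tuple to be splatted (the toy loss is made up):

batchmemaybe(x) = tuple(x)
batchmemaybe(x::Tuple) = x

loss(x, y) = sum(abs2, x .- y)

d = [rand(3), rand(3)]            # minibatch stored as a Vector, not a Tuple
# loss(batchmemaybe(d)...)        # calls loss([x, y]): MethodError, there is no one-argument method

loss(batchmemaybe(Tuple(d))...)   # converting to a tuple restores the splat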

and compute the gradient of `loss(d)`.
Each `d` should return a collection of arguments to the `loss` function wrapped as a tuple.

In case datapoints `d` are of numeric array type, assume no splatting is needed and compute the gradient of `loss(d)`.
Review comment (Member):

Maybe the last 3 sentences could be replaced by

For each datapoint `d` in `data`, compute the gradient of  `loss` with respect to `params` through
backpropagation and call the optimizer `opt`. The way the batch `d` is passed to `loss` depends on the type of `d` :
- if `d` is a tuple, call  `loss(d...)`
- otherwise, call `loss(d)` 

Each `d` should return a collection of arguments to the `loss` function wrapped as a tuple.

In case datapoints `d` are of numeric array type, assume no splatting is needed and compute the gradient of `loss(d)`.
For each datapoint `d` in `data`, compute the gradient of `loss` with respect to `params` through backpropagation and call the optimizer `opt`. If `d` is a tuple of arguments to `loss`, call `loss(d...)`. Else, `loss` may handle `d` as desired.
Review comment (Member):

Line too long.
Also, the meaning of "loss may handle d as desired" is not clear.
Could you squash the commits after fixing this?

DhairyaLGandhi (Member Author):

Is this clearer?

@DhairyaLGandhi (Member Author)

oof seemingly chose some weird commits to squash. Will fix that

return array

add inline

fixes

formatting fixes

return tuple(getdata)

dataloader doc fixes

test fixes

more test fixes

add correct api to error

add backwards compat

remove branching

rm batchmemaybe

get rid of dataloader parts

white lines

add check for batching

explain tuple limit in docs

nicer doc string

cleaner doc string

add test
@CarloLucibello (Member)

bors r+

@bors bot (Contributor) commented Jun 28, 2020

Build succeeded:

bors bot merged commit 318ef9d into FluxML:master on Jun 28, 2020