Sample weight sliced to work with cross validation, issue #358 #359
Conversation
Codecov Report
```diff
@@           Coverage Diff            @@
##           master     #359    +/-  ##
==========================================
+ Coverage   97.32%   97.32%   +<.01%
==========================================
  Files          49       49
  Lines        3138     3142      +4
  Branches      584      585      +1
==========================================
+ Hits         3054     3058      +4
  Misses         44       44
  Partials       40       40
```
Thanks for the PR @rg2410, the fix looks good. Some questions:
- Do all `fit` methods accept `sample_weight`? Maybe it would be better to pass it only if it's present in `fit_params`?
- Could you please add a test?
Thanks for the suggestions @lopuhin, I've added them in the last two commits.
Thank you @rg2410 , looks great 👍
Will be happy to merge as-is, but I left a few minor suggestions below.
```python
if weights is None:
    est = clone(self.estimator).fit(X[train], y[train], **fit_params)
else:
    est = clone(self.estimator).fit(
        X[train], y[train], sample_weight=weights[train], **fit_params)
```
This does the right thing, but it's possible to improve it a bit: instead of repeating almost the same line twice, you could modify `fit_params` (or create a new `fold_fit_params`) and put the modified `sample_weight` there. Then the same call handles both cases, avoiding the repetition.
Good suggestions. Have a look at the new changes and see if they make sense.
```python
@@ -214,8 +214,13 @@ def _cv_scores_importances(self, X, y, groups=None, **fit_params):
    cv = check_cv(self.cv, y, is_classifier(self.estimator))
    feature_importances = []  # type: List
    base_scores = []  # type: List[float]
    weights = fit_params.get('sample_weight', None)
    fit_params.pop('sample_weight', None)
```
A minor improvement is possible here: `pop` already returns what it popped, so these two lines can be reduced to one:

```python
weights = fit_params.pop('sample_weight', None)
```
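To illustrate the one-line form: `dict.pop(key, default)` removes the key and returns its value, or returns the default when the key is absent, so the separate `get`/`pop` pair is redundant. The dict contents here are made up for the demo:

```python
fit_params = {'sample_weight': [1.0, 2.0], 'verbose': True}

# Removes the key and returns its value in a single call.
weights = fit_params.pop('sample_weight', None)

# When the key is missing, the default is returned and nothing is removed.
empty_params = {}
missing = empty_params.pop('sample_weight', None)
```

Without the default argument, `pop` would raise `KeyError` on a missing key, so the `None` default keeps the unweighted case working.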
Done
Looks perfect, thank you @rg2410 👍
Changes that could fix the situation described in issue #358.