
Sample weight sliced to work with cross validation, issue #358 #359

Merged 4 commits into TeamHG-Memex:master on Jan 22, 2020

Conversation

@rg2410 (Contributor) commented Jan 17, 2020

Changes that could fix the situation described in issue #358.

@codecov-io commented Jan 17, 2020

Codecov Report

Merging #359 into master will increase coverage by <.01%.
The diff coverage is 100%.

@@            Coverage Diff             @@
##           master     #359      +/-   ##
==========================================
+ Coverage   97.32%   97.32%   +<.01%     
==========================================
  Files          49       49              
  Lines        3138     3142       +4     
  Branches      584      585       +1     
==========================================
+ Hits         3054     3058       +4     
  Misses         44       44              
  Partials       40       40
Impacted Files Coverage Δ
eli5/sklearn/permutation_importance.py 100% <100%> (ø) ⬆️

@lopuhin (Contributor) left a comment:

Thanks for the PR @rg2410, the fix looks good. Some questions:

  • do all fit methods accept sample_weight? Maybe it would be better to pass it only if it's present in fit_params?
  • could you please add a test?

@rg2410 (Contributor, Author) commented Jan 20, 2020

Thanks for the suggestions @lopuhin, I've added them in the last two commits.

@lopuhin (Contributor) left a comment:

Thank you @rg2410 , looks great 👍

Will be happy to merge as-is, but left a few minor suggestions below.

if weights is None:
est = clone(self.estimator).fit(X[train], y[train], **fit_params)
else:
est = clone(self.estimator).fit(X[train], y[train], sample_weight=weights[train], **fit_params)
Contributor:

This does the right thing, but it's possible to improve this a bit - instead of repeating almost the same line twice, it's possible to modify fit_params or create new fold_fit_params, putting modified sample_weight there - then the same call can be done for both cases, avoiding repetition.
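The suggested refactor could look something like the sketch below. This is illustrative only, not the PR's actual code: the fit_fold helper, the toy estimator, and the data are made up to show the idea of building per-fold fit params so a single fit() call covers both the weighted and unweighted cases.

```python
import numpy as np
from sklearn.base import clone
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import KFold

def fit_fold(estimator, X, y, train, weights=None, **fit_params):
    """Fit a clone of `estimator` on one CV fold (hypothetical helper).

    Instead of duplicating the fit() call in an if/else, copy the fit
    params for this fold and add the sliced sample_weight only when
    weights were actually given.
    """
    fold_fit_params = dict(fit_params)
    if weights is not None:
        fold_fit_params['sample_weight'] = weights[train]
    return clone(estimator).fit(X[train], y[train], **fold_fit_params)

# Toy data and a single CV fold, just to exercise both code paths.
rng = np.random.RandomState(0)
X = rng.randn(30, 3)
y = (X[:, 0] > 0).astype(int)
weights = rng.rand(30)
train, test = next(iter(KFold(n_splits=3).split(X)))

est = fit_fold(LogisticRegression(), X, y, train, weights=weights)
est_unweighted = fit_fold(LogisticRegression(), X, y, train)
```

Both calls go through the same fit() line, which is the repetition-avoiding shape the comment above describes.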

Contributor (Author):

Good suggestions! Have a look at the new changes and see if they make sense.

@@ -214,8 +214,13 @@ def _cv_scores_importances(self, X, y, groups=None, **fit_params):
cv = check_cv(self.cv, y, is_classifier(self.estimator))
feature_importances = [] # type: List
base_scores = [] # type: List[float]
weights = fit_params.get('sample_weight', None)
fit_params.pop('sample_weight', None)
Contributor:

A minor improvement is possible here: pop already returns the value it popped, so these two lines could be reduced to one:

 weights = fit_params.pop('sample_weight', None)
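For illustration, a standalone example of the dict.pop behavior the suggestion relies on (the toy fit_params dict is made up): pop with a default removes the key and returns its value, or returns the default if the key is absent, so the get() + pop() pair collapses into one call.

```python
# Toy dict standing in for the real fit_params.
fit_params = {'sample_weight': [0.5, 2.0], 'n_jobs': 1}

# Removes 'sample_weight' from the dict and returns its value.
weights = fit_params.pop('sample_weight', None)
print(weights)      # [0.5, 2.0]
print(fit_params)   # {'n_jobs': 1}

# A second pop finds no key and falls back to the default.
print(fit_params.pop('sample_weight', None))  # None
```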

Contributor (Author):

Done

@lopuhin (Contributor) left a comment:

Looks perfect, thank you @rg2410 👍

@lopuhin lopuhin merged commit 017c738 into TeamHG-Memex:master Jan 22, 2020
@rg2410 rg2410 deleted the issue-358 branch January 22, 2020 10:15