
Salary automl fairness #1836

Merged
merged 8 commits on Sep 14, 2024

Conversation

moonlanderr
Collaborator

<insert pull request description here>


Checklist

Please go through each entry in the below checklist and mark an 'X' if that condition has been met. Every entry should be marked with an 'X' to get the Pull Request approved.

  • All imports are in the first cell?
    • First block of imports are standard libraries
    • Second block are 3rd party libraries
    • Third block are all arcgis imports? Note that in some cases, for samples, it is a good idea to keep the imports next to where they are used, particularly for uncommonly used features that we want to highlight.
  • All GIS object instantiations are one of the following?
    • gis = GIS()
    • gis = GIS('home') or gis = GIS('pro')
    • gis = GIS(profile="your_online_portal")
    • gis = GIS(profile="your_enterprise_portal")
  • If this notebook requires setup or teardown, did you add the appropriate code to ./misc/setup.py and/or ./misc/teardown.py?
  • If this notebook references any portal items that need to be staged on AGOL/Python API playground, did you coordinate with a Python API team member to stage the item the correct way with the api_data_owner user?
  • If the notebook requires working with local data (such as CSV, FGDB, SHP, Raster files), upload the files as items to the Geosaurus Online Org using api_data_owner account and change the notebook to first download and unpack the files.
  • Code simplified & split out across multiple cells, useful comments?
  • Consistent voice/tense/narrative style? Thoroughly checked for typos?
  • All images used like <img src="base64str_here"> instead of <img src="https://some.url">? All map widgets contain a static image preview? (Call mapview_inst.take_screenshot() to do so)
  • All file paths are constructed in an OS-agnostic fashion with os.path.join()? (Instead of r"\foo\bar", os.path.join(os.path.sep, "foo", "bar"), etc.)
  • Is your code formatted using Jupyter Black? (You can use Jupyter Black to format your code directly in the notebook.)
  • If this notebook showcases deep learning capabilities, please go through the following checklist:
    • Are the inputs required for Export Training Data Using Deep Learning tool published on geosaurus org (api data owner account) and added in the notebook using gis.content.get function?
    • Is training data zipped and published as an Image Collection? Note: the whole folder is zipped with the same name as the notebook.
    • Are the inputs required for model inferencing published on geosaurus org (api data owner account) and added in the notebook using gis.content.get function? Note: This includes providing test raster and trained model.
    • Are the inferenced results displayed using a webmap widget?
  • IF YOU WANT THIS SAMPLE TO BE DISPLAYED ON THE DEVELOPERS.ARCGIS.COM WEBSITE, ping @jyaistMap so he can add it to the list for the next deploy.


@moonlanderr moonlanderr self-assigned this May 24, 2024

review-notebook-app bot commented Jun 5, 2024

View / edit / reply to this conversation on ReviewNB

KarthikDutt commented on 2024-06-05T09:58:17Z
----------------------------------------------------------------

I think we should clearly call out that we are trying to see if there is any bias towards a specific gender either in this section or in the next before we proceed further. This will set the context and problem statement upfront



review-notebook-app bot commented Jun 5, 2024

View / edit / reply to this conversation on ReviewNB

KarthikDutt commented on 2024-06-05T09:58:17Z
----------------------------------------------------------------

We can remove unwanted/unused imports here


moonlanderr commented on 2024-07-17T04:47:02Z
----------------------------------------------------------------

removed unwanted


review-notebook-app bot commented Jun 5, 2024

View / edit / reply to this conversation on ReviewNB

KarthikDutt commented on 2024-06-05T09:58:18Z
----------------------------------------------------------------

Incomplete sentence: "Using this dataset we will attempt to train a model to predict"

Also, any reason why we chose to keep only 104 records?


moonlanderr commented on 2024-07-17T04:46:48Z
----------------------------------------------------------------

corrected.

KarthikDutt commented on 2024-08-21T04:29:58Z
----------------------------------------------------------------

"The dataset consists of 32,561 records, with 21,790 males and 10,771 females, suggesting a bias favoring males." This statement is incorrect. The dataset is biased because males are more likely to be classified as earning more than 50K than women are, not because women are underrepresented in the dataset.

moonlanderr commented on 2024-09-06T05:42:17Z
----------------------------------------------------------------

corrected


review-notebook-app bot commented Jun 5, 2024

View / edit / reply to this conversation on ReviewNB

KarthikDutt commented on 2024-06-05T09:58:19Z
----------------------------------------------------------------

Before building the model, we might also consider showing if the data is inherently biased. This can be done by applying some formulas to calculate DPR and EOR. By doing so the story will be:

  • Before training the model, the raw data was biased.
  • After training the model, this bias is amplified. (general problem with ML models)
  • We mitigate it by using fairness features in the API

moonlanderr commented on 2024-07-17T04:49:48Z
----------------------------------------------------------------

I have now shown this by breaking down the male vs. female counts, which indicates the male bias.

KarthikDutt commented on 2024-08-21T04:30:20Z
----------------------------------------------------------------

Please refer to previous comment.

moonlanderr commented on 2024-09-06T05:45:20Z
----------------------------------------------------------------

added a basic analysis of male vs female higher salary stats that indicate bias
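
(For reference, a minimal sketch of such a check is shown below. It assumes the usual Adult Census Income column names, 'sex' and an 'income' column holding '>50K' / '<=50K' values; the notebook's actual column names and file path may differ.)

```python
import pandas as pd

df = pd.read_csv("adult_income.csv")  # hypothetical path to the salary dataset

# Share of each gender that actually earns more than 50K in the raw data.
# A large gap between these two rates (rather than the raw head counts)
# is what indicates the inherent bias discussed above.
high_income_rate = (
    df.assign(high_income=df["income"].str.strip().eq(">50K"))
      .groupby("sex")["high_income"]
      .mean()
)
print(high_income_rate)
```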


review-notebook-app bot commented Jun 5, 2024

View / edit / reply to this conversation on ReviewNB

KarthikDutt commented on 2024-06-05T09:58:20Z
----------------------------------------------------------------

Brief descriptions of what happens during prepare_data, fit, score, and report would be useful for users who are using the AutoML API for the first time.


moonlanderr commented on 2024-07-17T05:39:52Z
----------------------------------------------------------------

added descriptions
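
(For first-time users, a rough sketch of that workflow is shown below. It assumes the arcgis.learn tabular API, where the data preparation step is prepare_tabulardata, referred to as prepare_data in this thread; argument names, the income_df dataframe, and the column names are illustrative and should be checked against the official AutoML documentation.)

```python
from arcgis.learn import prepare_tabulardata, AutoML

# 1. Prepare the data: encode the explanatory variables and split train/validation.
data = prepare_tabulardata(
    input_features=income_df,       # the salary dataframe used in this notebook (assumed name)
    variable_predict="income",      # assumed label column: earns >50K or not
    explanatory_variables=[
        "age",
        ("education", True),        # categorical columns flagged as True
        ("occupation", True),       # (check the docs for the exact convention)
        ("sex", True),
    ],
)

# 2. Fit: AutoML searches over algorithms and hyperparameters and trains candidate models.
automl = AutoML(data=data)
automl.fit()

# 3. Score: accuracy of the best model on the validation split.
print(automl.score())

# 4. Report: detailed leaderboard and diagnostics for all trained models.
automl.report()
```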


review-notebook-app bot commented Jun 5, 2024

View / edit / reply to this conversation on ReviewNB

KarthikDutt commented on 2024-06-05T09:58:20Z
----------------------------------------------------------------

vanilla-trained may not be the correct term.


moonlanderr commented on 2024-07-17T05:39:33Z
----------------------------------------------------------------

replaced vanilla


review-notebook-app bot commented Jun 5, 2024

View / edit / reply to this conversation on ReviewNB

KarthikDutt commented on 2024-06-05T09:58:21Z
----------------------------------------------------------------

Brief explanation of the curves would be good to have


moonlanderr commented on 2024-07-17T06:26:00Z
----------------------------------------------------------------

added a brief explanation, along with a link to refer to for more detail


review-notebook-app bot commented Jun 5, 2024

View / edit / reply to this conversation on ReviewNB

KarthikDutt commented on 2024-06-05T09:58:22Z
----------------------------------------------------------------

Since we are bringing in the terms EOR and DPR, we must explain here what these terms mean.


moonlanderr commented on 2024-07-17T06:34:15Z
----------------------------------------------------------------

added the "how AutoML and fairness work" reference, where users can check the details


review-notebook-app bot commented Jun 5, 2024

View / edit / reply to this conversation on ReviewNB

KarthikDutt commented on 2024-06-05T09:58:23Z
----------------------------------------------------------------

Here, we should also explain, specific to this example/dataset, what EOR and DPR mean. For example:

  1. DPR means that we are solely concerned with how women in this dataset are represented in the high-income category.
  2. EOR means ..... .

moonlanderr commented on 2024-07-17T07:18:37Z
----------------------------------------------------------------

added brief explanation
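
(For readers who want concrete definitions, both ratios can also be computed with the open-source fairlearn package, used here purely as an illustration with toy labels; the notebook's own values come from the AutoML report. DPR is the smallest group selection rate divided by the largest; EOR applies the same min/max ratio to the true-positive and false-positive rates and reports the worse of the two.)

```python
from fairlearn.metrics import demographic_parity_ratio, equalized_odds_ratio

# Toy data: 1 = earns >50K, 0 = earns <=50K
y_true = [1, 0, 1, 0, 1, 0, 1, 0]
y_pred = [1, 0, 1, 1, 0, 0, 1, 0]
sex    = ["M", "M", "M", "M", "F", "F", "F", "F"]

# DPR: lowest group selection rate divided by the highest (1.0 = perfect parity)
print(demographic_parity_ratio(y_true, y_pred, sensitive_features=sex))

# EOR: min/max ratio of TPR and of FPR across groups, reporting the smaller of the two
print(equalized_odds_ratio(y_true, y_pred, sensitive_features=sex))
```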


review-notebook-app bot commented Jun 5, 2024

View / edit / reply to this conversation on ReviewNB

KarthikDutt commented on 2024-06-05T09:58:23Z
----------------------------------------------------------------

Here we will need to talk about sensitive_variable and other new fairness related parameters that we have added and explain what those parameters are and what values they can take.


moonlanderr commented on 2024-07-17T08:16:52Z
----------------------------------------------------------------

added a brief explanation of the sensitive variable
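
(A sketch of how those parameters might be passed is shown below. The names sensitive_variables, fairness_metric, and fairness_threshold are assumptions inferred from this thread and the fairness guide, not a verified signature, and data is the object returned by prepare_tabulardata earlier.)

```python
from arcgis.learn import AutoML

# Fairness-aware run (sketch): parameter names are assumptions from this thread;
# verify them against the AutoML fairness guide before use.
fair_automl = AutoML(
    data=data,                                    # prepared tabular data from earlier
    sensitive_variables=["sex"],                  # column(s) the model must not discriminate on
    fairness_metric="demographic_parity_ratio",   # or "equalized_odds_ratio"
    fairness_threshold=0.8,                       # minimum acceptable ratio; 1.0 means perfect parity
)
fair_automl.fit()
fair_automl.report()                              # leaderboard now includes the fairness metric
```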


review-notebook-app bot commented Jun 5, 2024

View / edit / reply to this conversation on ReviewNB

KarthikDutt commented on 2024-06-05T09:58:24Z
----------------------------------------------------------------

We will have to explain how we mitigated the bias. What strategy we are using to mitigate the bias.


moonlanderr commented on 2024-07-17T08:35:20Z
----------------------------------------------------------------

So here I thought to mention the grid search done by AutoML; do you mean something else by strategy? I am not sure what is happening internally. Could you add this section here, referring to the strategy it is using?

KarthikDutt commented on 2024-08-21T04:31:53Z
----------------------------------------------------------------

Internally we are using an approach called Reweighing.

Reweighing is a preprocessing technique that weights the examples in each (group, label) combination differently to ensure fairness before classification.

moonlanderr commented on 2024-09-06T06:51:39Z
----------------------------------------------------------------

added
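
(A minimal sketch of reweighing as described above: each (group, label) cell is weighted by P(group) * P(label) / P(group, label), so that the sensitive attribute and the label look statistically independent under the weights. Column names are illustrative.)

```python
import pandas as pd

def reweighing_weights(df: pd.DataFrame, group_col: str, label_col: str) -> pd.Series:
    """Kamiran-Calders reweighing: weight each (group, label) combination so the
    sensitive attribute and the label appear independent under the weights."""
    n = len(df)
    p_group = df[group_col].value_counts(normalize=True)      # P(group)
    p_label = df[label_col].value_counts(normalize=True)      # P(label)
    p_joint = df.groupby([group_col, label_col]).size() / n   # P(group, label)
    return df.apply(
        lambda row: p_group[row[group_col]] * p_label[row[label_col]]
        / p_joint[(row[group_col], row[label_col])],
        axis=1,
    )

# Toy usage: cells that are underrepresented relative to independence
# (for example, females earning >50K) receive weights above 1.
toy = pd.DataFrame({
    "sex":    ["M", "M", "M", "F", "F", "F"],
    "income": [">50K", ">50K", "<=50K", ">50K", "<=50K", "<=50K"],
})
print(toy.assign(weight=reweighing_weights(toy, "sex", "income")))
```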


review-notebook-app bot commented Jun 5, 2024

View / edit / reply to this conversation on ReviewNB

KarthikDutt commented on 2024-06-05T09:58:25Z
----------------------------------------------------------------

When we say 'Hence we can consider the model is now mitigated' , we will have to tell why we consider it mitigated


moonlanderr commented on 2024-07-17T08:38:34Z
----------------------------------------------------------------

I explained that here: "The model report shows that 2_Default_LightGBM_SampleWeigthing_Update_2 is the best trained model and the respective demographic_parity_ratio is now 0.84, up from 0.29, which is also higher than the minimum threshold of 0.80."


review-notebook-app bot commented Jun 5, 2024

View / edit / reply to this conversation on ReviewNB

KarthikDutt commented on 2024-06-05T09:58:25Z
----------------------------------------------------------------

Formatting is bad. If needed, we might use an image here so that it is more readable.


moonlanderr commented on 2024-07-17T08:40:40Z
----------------------------------------------------------------

It is showing the table properly in Firefox; is it due to the browser? I think it will be in table format when it gets published; otherwise, I will add it as an image.

KarthikDutt commented on 2024-08-21T04:33:15Z
----------------------------------------------------------------

I am using Chrome.

I feel it is better to use images. If you are sure that it will be fine after publishing, you can ignore this.

moonlanderr commented on 2024-09-06T07:01:23Z
----------------------------------------------------------------

Yes, it will be formatted further by the publishing team, which will give it the proper display.


review-notebook-app bot commented Jun 5, 2024

View / edit / reply to this conversation on ReviewNB

KarthikDutt commented on 2024-06-05T09:58:26Z
----------------------------------------------------------------

Bad formatting


moonlanderr commented on 2024-07-17T08:41:22Z
----------------------------------------------------------------

Same as above; it shows OK in the notebook on my machine.


review-notebook-app bot commented Jun 5, 2024

View / edit / reply to this conversation on ReviewNB

KarthikDutt commented on 2024-06-05T09:58:27Z
----------------------------------------------------------------

We will need to explain what selection rate means (Definition)


moonlanderr commented on 2024-07-17T08:48:05Z
----------------------------------------------------------------

added
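
(A small illustration of the definition added here: the selection rate of a group is simply the share of that group the model predicts as the positive class, earning more than 50K. Toy values below.)

```python
import pandas as pd

# Selection rate = share of each group predicted to be in the positive class (>50K)
preds = pd.DataFrame({
    "sex": ["M", "M", "M", "M", "F", "F", "F", "F"],   # toy values
    "predicted_high_income": [1, 0, 1, 1, 0, 0, 1, 0],
})
print(preds.groupby("sex")["predicted_high_income"].mean())
```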


review-notebook-app bot commented Jun 5, 2024

View / edit / reply to this conversation on ReviewNB

KarthikDutt commented on 2024-06-05T09:58:28Z
----------------------------------------------------------------

This indicates that females were more likely to be incorrectly classified as negative cases.

The above statement does not convey the problem that females might face because of this. Explicitly stating that the model is likely to classify a female as earning less than 50K would describe the problem better.


moonlanderr commented on 2024-07-17T08:51:14Z
----------------------------------------------------------------

added


review-notebook-app bot commented Jun 5, 2024

View / edit / reply to this conversation on ReviewNB

KarthikDutt commented on 2024-06-05T09:58:28Z
----------------------------------------------------------------

males are being incorrectly classified as negative cases after mitigation.

Can we use a better term instead of "negative cases"? Users might wonder what negative cases mean in this context.


moonlanderr commented on 2024-07-17T08:53:18Z
----------------------------------------------------------------

added


review-notebook-app bot commented Jun 5, 2024

View / edit / reply to this conversation on ReviewNB

KarthikDutt commented on 2024-06-05T09:58:29Z
----------------------------------------------------------------

We might have to add a couple more sentences to describe why we are reducing the threshold from 0.8 to 0.7.

We can probably start by acknowledging that with EOR and a threshold of 0.8, AutoML was not able to find a fair model, but it was able to mitigate the bias to the extent that EOR improved to 0.7, which is still a substantial improvement over 0.17.

Then we can explain that we can formalize this by reducing the threshold to 0.7 in the API as well.


moonlanderr commented on 2024-07-17T08:56:25Z
----------------------------------------------------------------

added
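
(Concretely, this corresponds to re-running the fairness-aware search with the relaxed threshold, with the same caveat as before: the parameter names are assumptions based on this thread, not a verified signature.)

```python
from arcgis.learn import AutoML

# EOR could not reach the default 0.8 threshold but improved from 0.17 to about 0.7,
# so the threshold is relaxed to formalize that result.
eor_automl = AutoML(
    data=data,                                  # prepared tabular data from earlier
    sensitive_variables=["sex"],
    fairness_metric="equalized_odds_ratio",
    fairness_threshold=0.7,                     # relaxed from 0.8, as discussed in this thread
)
eor_automl.fit()
```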


review-notebook-app bot commented Jun 5, 2024

View / edit / reply to this conversation on ReviewNB

KarthikDutt commented on 2024-06-05T09:58:30Z
----------------------------------------------------------------

Formatting is not readable.


moonlanderr commented on 2024-07-17T08:57:11Z
----------------------------------------------------------------

Same; I think it will get fixed once published.


review-notebook-app bot commented Jun 5, 2024

View / edit / reply to this conversation on ReviewNB

KarthikDutt commented on 2024-06-05T09:58:30Z
----------------------------------------------------------------

Formatting needs to be fixed.


moonlanderr commented on 2024-07-17T08:57:55Z
----------------------------------------------------------------

It will get fixed in publishing, as with earlier notebooks.


review-notebook-app bot commented Jun 5, 2024

View / edit / reply to this conversation on ReviewNB

KarthikDutt commented on 2024-06-05T09:58:31Z
----------------------------------------------------------------

The abbreviations FNR, FPR, and SR need to be expanded.

We might have to reword the drawbacks so that it does not appear as if everything we did to mitigate the bias came to nothing because new bias got introduced at the end.


moonlanderr commented on 2024-07-17T09:02:28Z
----------------------------------------------------------------

Removed part of the drawback; the rest is OK, slightly hinting that new biases have crept in, which can be worked on further.


review-notebook-app bot commented Jun 5, 2024

View / edit / reply to this conversation on ReviewNB

KarthikDutt commented on 2024-06-05T09:58:32Z
----------------------------------------------------------------

Conclusion must talk a little bit about how users can mitigate biases in their own datasets. What fairness parameters to choose and when.


moonlanderr commented on 2024-07-17T09:05:32Z
----------------------------------------------------------------

I think this will depend on the user's data and would be a bit open-ended to comment on. Besides, the AutoML fairness guide is there to help with that, and this notebook is already getting a bit heavy with technical detail and concepts.



review-notebook-app bot commented Sep 5, 2024

View / edit / reply to this conversation on ReviewNB

BP-Ent commented on 2024-09-05T21:53:08Z
----------------------------------------------------------------

In this study, we explored the application of fairness metrics in machine learning, particularly focusing on the limitations and benefits of Demographic Parity Ratio (DPR) and Equalized Odds Ratio (EOR) for fairness assessment.

First, we performed an initial fairness assessment of the model predicting salary by utilizing the demographic variable dataset and a vanilla automl workflow. The initial model showed discrepancies in fairness metrics, particularly with higher false positive rates for certain groups revealed by the Demographic Parity Ratio (DPR) and the Equalized Odds Ratio (EOR).

Subsequently, fairness mitigation was done first with DPR and then with EOR. While DPR addressed some aspects of fairness, it fell short in balancing false positive and false negative rates across groups, leading to suboptimal performance in fairness. Then mitigation using the Equalized Odds Ratio metric provided a more comprehensive fairness assessment by ensuring equal false positive and true positive rates across all groups, thereby addressing the limitations observed with DPR.

Finally, adjusting the threshold allowed automl to construct a fair model, which is useful for getting an Ensemble model. Otherwise, if the model is not able to construct a fair model, a model ensemble is not created.

Although some bias might still be present in the model, the mitigation workflow was able to reduce it significantly. Thus, continuous evaluation and refinement of the fairness workflow is crucial for achieving more equitable machine learning models and unbiased decision-making processes.


moonlanderr commented on 2024-09-06T08:51:52Z
----------------------------------------------------------------

added

BP-Ent
BP-Ent previously requested changes Sep 5, 2024
Collaborator

@BP-Ent BP-Ent left a comment

Suggested changes made on reviewNB


@moonlanderr
Collaborator Author

@BP-Ent, all suggestions have been added, please check.

@moonlanderr
Collaborator Author

@KarthikDutt, I have corrected the bias indication paragraph, please check.
