Iterative rectifier #534

leostimpfle · 2024-07-03T11:24:16Z

This PR implements the iterative rectifier algorithm by Correia et al. (2021, http://arxiv.org/abs/1903.01633). This is the first step towards issue #457.
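For reference, here is a minimal NumPy sketch of the iterative rectifier, based on our reading of Correia et al. It is not the implementation in this PR. It assumes a dense design matrix X, with any fixed effects passed as dummy columns. The large weight (1e8) on observations with y > 0 and the absolute tolerance (1e-5) are illustrative choices. The loop works as follows:

- Start from u = 1(y = 0).
- Regress u on X by weighted least squares and zero out tiny fitted values.
- Stop once all fitted values are non-negative. The rows with positive fitted values are the separated ones.
- Otherwise, rectify with max(·, 0) and repeat.

```python
import numpy as np


def iterative_rectifier(y, X, tol=1e-5, weight=1e8, maxiter=1000):
    """Return a boolean mask of observations flagged as separated.

    Simplified sketch (not pyfixest's code): X is a dense design matrix,
    fixed effects (if any) are dummy columns, tol/weight are illustrative.
    """
    y = np.asarray(y, dtype=float)
    X = np.asarray(X, dtype=float)
    positive = y > 0
    # Start from the indicator of zero outcomes.
    u = np.where(positive, 0.0, 1.0)
    # Heavy weights on y > 0 push the fitted values there towards zero.
    sqrt_w = np.sqrt(np.where(positive, weight, 1.0))
    for _ in range(maxiter):
        # Weighted least squares of u on X via row scaling by sqrt(weights).
        beta, *_ = np.linalg.lstsq(X * sqrt_w[:, None], u * sqrt_w, rcond=None)
        xi = X @ beta
        xi[np.abs(xi) < tol] = 0.0
        if np.all(xi >= 0):
            # Stop: zero-outcome rows with xi > 0 are separated (possibly none).
            return (xi > 0) & ~positive
        # Rectify (ReLU) and keep the target at zero where y > 0.
        u = np.where(positive, 0.0, np.maximum(xi, 0.0))
    raise RuntimeError("iterative rectifier did not converge")
```

Take X = [1, x], where the dummy x equals 1 exactly on the zero outcomes. The first weighted fit is already non-negative and flags those rows. A fixed absolute tolerance keeps the sketch short but is a simplification: if the fitted values shrink slowly, it can misclassify observations near the threshold.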

Updates

The main updates are:

- _check_for_separation is now a wrapper that calls the individual methods specified via the optional methods argument. By default, all methods are executed (_check_for_separation_fe and _check_for_separation_ir).
- _check_for_separation_ir is the newly implemented iterative rectifier check.
- The APIs now have an optional keyword argument separation_check that lets the user specify which separation checks to run (defaults to all methods).
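The wrapper design described above can be sketched as a small dispatch table. The names check_for_separation, SEPARATION_CHECKS and ctx are hypothetical, and so are the "fe"/"ir" keys, which mirror the separation_check=["ir"] call in the test snippet. The toy checks just return precomputed row indices. The point is that every check receives the same inputs and reads only what it needs, and the wrapper returns the union of flagged rows.

```python
from typing import Callable, Optional


# Hypothetical toy checks: each receives the same context and returns the
# row indices it flags as separated (the real checks live in fepois_.py).
def _check_fe(ctx: dict) -> set[int]:
    return set(ctx.get("fe_separated", ()))


def _check_ir(ctx: dict) -> set[int]:
    return set(ctx.get("ir_separated", ()))


SEPARATION_CHECKS: dict[str, Callable[[dict], set[int]]] = {
    "fe": _check_fe,
    "ir": _check_ir,
}


def check_for_separation(ctx: dict, methods: Optional[list[str]] = None) -> list[int]:
    """Run the requested checks (default: all) and return the union of flagged rows."""
    if methods is None:
        methods = list(SEPARATION_CHECKS)
    unknown = [m for m in methods if m not in SEPARATION_CHECKS]
    if unknown:
        raise ValueError(f"Unknown separation method(s): {unknown}")
    separated: set[int] = set()
    for method in methods:
        separated |= SEPARATION_CHECKS[method](ctx)
    return sorted(separated)
```

Raising on unknown names means a typo in the list of methods cannot silently skip a check.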

Tests

I have updated test_poisson with an additional, very basic example of separation that only the iterative rectifier detects.

The ppmlhdfe package contains many additional test cases here, and I have tested my implementation against them. All cases pass except:

- the datasets without regressors (02.csv, 03.csv, 04.csv, 12.csv, 13.csv): ppmlhdfe allows empty regressors while pyfixest does not (i.e., formulae of the form y | fe1 + fe2);
- 07.csv: running fepois on this dataset throws ValueError: Demeaning failed after 100_000 iterations. This dataset is also excluded from ppmlhdfe's separation tests in validate_tagsep.do, so I have not investigated the cause further.

Would it be useful if I extended test_poisson to include further test cases from ppmlhdfe?

leostimpfle · 2024-07-03T11:25:24Z

For reference, here's the code snippet I used to run the additional test cases:

import os

import pandas as pd
from pyfixest.estimation.estimation import fepois

# Local checkout of the ppmlhdfe separation test datasets
folder = r'/Users/leonardstimpfle/code/ppmlhdfe/test/separation_datasets'
fns = sorted(os.listdir(folder))
for fn in fns:
    # Skip non-data files and 07.csv (demeaning does not converge, see above)
    if not fn.endswith('.csv') or fn == '07.csv':
        continue
    data = pd.read_csv(os.path.join(folder, fn))

    fml = "y"
    regressors = data.columns[data.columns.str.startswith('x')]
    if regressors.empty:
        # pyfixest currently does not allow empty regressors
        continue
    fml += f" ~ {' + '.join(regressors)}"
    fixed_effects = data.columns[data.columns.str.startswith('id')]
    if not fixed_effects.empty:
        fml += f" | {' + '.join(fixed_effects)}"

    print(fn)
    print(f'Expect {data.separated.sum()} separated observations')
    # Run only the iterative rectifier check
    fitted_ir = fepois(fml, data=data, vcov="hetero", separation_check=["ir"])

s3alfisc · 2024-07-03T17:50:27Z

Wow, super cool, thank you Leo (@leostimpfle)! Did not expect this PR to come so soon =) I'll take a first look tonight, but I think reviewing the algos more carefully will take me until this weekend. Thank you! =)

leostimpfle · 2024-07-04T08:04:31Z

Thanks @s3alfisc! No rush, happy to have a chat to walk through the code if helpful.

In the meantime, I will look into the failed checks. It looks like I got something wrong with the type annotations.

codecov · 2024-07-14T14:41:38Z

Codecov Report

Attention: Patch coverage is 88.57143% with 8 lines in your changes missing coverage. Please review.

| Files | Coverage Δ |
| --- | --- |
| pyfixest/estimation/FixestMulti_.py | `82.87% <100.00%> (ø)` |
| pyfixest/estimation/estimation.py | `75.71% <100.00%> (ø)` |
| tests/test_poisson.py | `100.00% <100.00%> (ø)` |
| pyfixest/estimation/fepois_.py | `89.50% <86.20%> (ø)` |

... and 31 files with indirect coverage changes

leostimpfle · 2024-07-14T16:27:07Z

My initial implementation didn't account for complex formulae such as "Y ~ X1 + i(f1,X2, ref=4.0)". The issue is that while the iterative rectifier should work on the raw data with the original formula, _check_for_separation was applied to the already modified arrays Y and X in FixestMulti_.

In the latest version, _check_for_separation takes the additional arguments fml and data, and the iterative rectifier algorithm is applied to these. While _check_for_separation_fe and _check_for_separation_ir have the same call signature, the arguments they actually use do not overlap (the former uses Y, X, and fe; the latter uses fml and data).
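For comparison with the rectifier, a fixed-effects check of this kind only needs Y and fe. Below is a minimal pandas sketch, assuming the check flags an observation if, for some fixed effect, every outcome in its group is zero. The function name separated_by_fixed_effects is hypothetical, and this is not the PR's _check_for_separation_fe.

```python
import pandas as pd


def separated_by_fixed_effects(y, fe) -> "pd.Series":
    """Flag observations that sit in an all-zero group of any fixed effect.

    Hypothetical sketch; assumes non-negative outcomes y and a DataFrame fe
    with one column per fixed effect.
    """
    y = pd.Series(y).reset_index(drop=True)
    fe = pd.DataFrame(fe).reset_index(drop=True)
    mask = pd.Series(False, index=y.index)
    for col in fe.columns:
        # With y >= 0, a zero group sum means every outcome in the group is zero.
        group_sum = y.groupby(fe[col]).transform("sum")
        mask |= group_sum == 0
    return mask.to_numpy()
```

All flagged rows have y = 0, so removing them leaves the outcome sums of the remaining groups unchanged, and one pass over the fixed effects is enough. Such a check cannot detect separation driven by combinations of regressors, which is the case the iterative rectifier covers.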

@s3alfisc Please let me know any thoughts (appreciate it may not be the most elegant solution).

s3alfisc · 2024-07-31T19:36:30Z

This weekend I will review your PR @leostimpfle! Sorry about the long delay :-/

s3alfisc · 2024-09-29T12:04:19Z

Hi @leostimpfle, I finally got around to looking at the algorithm in more detail. Looks good to me! I merged the master branch into the PR and made the required updates. I have one minor question:

Currently, fepois only checks for separation when there are fixed effects. There is this line

pyfixest/pyfixest/estimation/fepois_.py, line 132 in a0c67cc:

if self._fe is not None:

which skips the separation checks if there is no fixed effect. Is this something we want to keep? If so, the test that you've implemented should not work, as it includes no fixed effect. Strangely, it passed on your CI run 🤔 One reason to keep things as they are is that we could not check for categoricals that are encoded as dummies, since the dummy encoding happens before the separation checks. So maybe it is better to keep things as they are?

If you have the time, it would also be awesome if we could implement a few more tests against ppmlhdfe based on the test datasets you linked; though if you're time-constrained, I am also happy to implement the tests myself =)

Best, Alex

leostimpfle · 2024-10-05T08:16:36Z

Hi @s3alfisc! In principle, the iterative rectifier should work without fixed effects, but I think it's fine to only run it if fixed effects are included in the specification. It's been a while since I looked at it, but I'll take a closer look over the next week and will also implement further tests against ppmlhdfe. Thanks again!

s3alfisc · 2024-10-05T10:21:33Z

Very cool, thanks Leo! =)

leostimpfle added 9 commits June 9, 2024 18:48

- type valid separation methods (4648c7d)
- type separation method (abe079b)
- add separation_check arg (1c8498d)
- update test_poisson (d67db99)
- fix bugs (09fc032)
- remove unused variable (d14e6bb)
- update test case to avoid perfect collinearity (7ce587c)
- improve rectifier (b4ee738)
- cosmetics (d66f69f)

leostimpfle added 3 commits July 4, 2024 19:08

- use Optional (d8d4891)
- ensure consistent index (6822451)
- use input fml and data (662b152)

s3alfisc self-requested a review July 31, 2024 19:36

s3alfisc added 2 commits September 29, 2024 12:00

- solve merge conflicts (820c9da)
- minor cleanups (d3190be)

s3alfisc added 2 commits September 29, 2024 14:22

- add test against pplmhdfe test data set 01 (5e2870a)
- readme txt to md (ee95873)
