
scd2 validity column values based on column in source data #1776

Open
jorritsandbrink opened this issue Sep 2, 2024 · 0 comments
Labels
enhancement New feature or request

jorritsandbrink (Collaborator) commented Sep 2, 2024

Feature description

There are currently two options for the values of the validity columns (i.e. "valid from" and "valid to"):

  • based on "pipeline run timestamp" ➜ default
  • based on an arbitrary value configured using the boundary_timestamp argument

I would like to see a third option:

  • based on source data column
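A minimal sketch of how the three options could be resolved for a single record. `resolve_boundary` is a hypothetical helper, not dlt's implementation, and the precedence (source column, then `boundary_timestamp`, then the pipeline run timestamp) is an assumption made for illustration:

```python
from datetime import datetime, timezone
from typing import Any, Mapping, Optional


def resolve_boundary(
    record: Mapping[str, Any],
    run_timestamp: datetime,
    boundary_timestamp: Optional[datetime] = None,
    boundary_timestamp_column: Optional[str] = None,
) -> Any:
    """Return the value to write to the validity columns for one record (hypothetical)."""
    if boundary_timestamp_column is not None:
        # Proposed option: use a column from the source data, e.g. updated_at.
        return record[boundary_timestamp_column]
    if boundary_timestamp is not None:
        # Existing option: an arbitrary value configured via boundary_timestamp.
        return boundary_timestamp
    # Existing default: the pipeline run timestamp.
    return run_timestamp


run_ts = datetime(2024, 9, 2, 12, 0, tzinfo=timezone.utc)
record = {"customer_id": 1, "updated_at": datetime(2024, 8, 30, tzinfo=timezone.utc)}

print(resolve_boundary(record, run_ts))  # 2024-09-02 12:00:00+00:00
print(resolve_boundary(record, run_ts, boundary_timestamp_column="updated_at"))  # 2024-08-30 00:00:00+00:00
```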

In some cases an updated_at-style column in the source data "naturally" provides a good value for the validity columns.
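To illustrate, a sketch that derives "valid from" and "valid to" values from an updated_at column when several versions of the same natural key are present. `build_validity` and the `valid_from`/`valid_to` output names are hypothetical (dlt's actual column names and merge logic may differ), and it assumes updated_at values sort chronologically (ISO-8601 strings here):

```python
from collections import defaultdict
from typing import Any, Dict, List


def build_validity(
    rows: List[Dict[str, Any]],
    natural_key: str = "customer_id",
    boundary_column: str = "updated_at",
) -> List[Dict[str, Any]]:
    """Attach valid_from/valid_to to each version, using a source column as the boundary."""
    # Group all versions by natural key (insertion order of keys is preserved).
    versions: Dict[Any, List[Dict[str, Any]]] = defaultdict(list)
    for row in rows:
        versions[row[natural_key]].append(row)

    out: List[Dict[str, Any]] = []
    for key_rows in versions.values():
        key_rows.sort(key=lambda r: r[boundary_column])
        for i, row in enumerate(key_rows):
            nxt = key_rows[i + 1] if i + 1 < len(key_rows) else None
            out.append({
                **row,
                # A version becomes valid at its own updated_at ...
                "valid_from": row[boundary_column],
                # ... and stops being valid when the next version starts (None = still active).
                "valid_to": nxt[boundary_column] if nxt is not None else None,
            })
    return out


rows = [
    {"customer_id": 1, "name": "a", "updated_at": "2024-08-01"},
    {"customer_id": 1, "name": "b", "updated_at": "2024-08-15"},
    {"customer_id": 2, "name": "c", "updated_at": "2024-08-10"},
]
for r in build_validity(rows):
    print(r["customer_id"], r["valid_from"], r["valid_to"])
# 1 2024-08-01 2024-08-15
# 1 2024-08-15 None
# 2 2024-08-10 None
```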

In that case it also makes sense to detect changes based on the updated_at column, i.e. to base the "row hash" on a composite of the natural key and updated_at, which together form the surrogate key.
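A sketch of the proposed change detection, where the row hash covers only the natural key and updated_at rather than the full row. `row_hash` and `detect_new_versions` are hypothetical helpers, and SHA-256 over a JSON payload is an arbitrary choice for illustration:

```python
import hashlib
import json
from typing import Any, Dict, List, Set


def row_hash(row: Dict[str, Any], natural_key: str, boundary_column: str) -> str:
    """Hash the composite (natural key, updated_at) instead of the full row."""
    payload = json.dumps([row[natural_key], row[boundary_column]], default=str)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def detect_new_versions(
    incoming: List[Dict[str, Any]],
    existing_hashes: Set[str],
    natural_key: str = "customer_id",
    boundary_column: str = "updated_at",
) -> List[Dict[str, Any]]:
    """Keep only rows whose composite hash has not been loaded before."""
    return [
        row for row in incoming
        if row_hash(row, natural_key, boundary_column) not in existing_hashes
    ]


loaded = [{"customer_id": 1, "name": "a", "updated_at": "2024-08-01"}]
hashes = {row_hash(r, "customer_id", "updated_at") for r in loaded}

incoming = [
    {"customer_id": 1, "name": "a (edited)", "updated_at": "2024-08-01"},  # same key + timestamp
    {"customer_id": 1, "name": "b", "updated_at": "2024-08-15"},           # new version
]
print([r["name"] for r in detect_new_versions(incoming, hashes)])  # ['b']
```

The trade-off is visible in the example: a row whose other columns change without a new updated_at value is treated as unchanged.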

Are you a dlt user?

Yes, I'm already a dlt user.

Use case

A user asked for it: #1601 (comment)

Proposed solution

Extend the public API with natural_key and boundary_timestamp_column arguments:

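import dlt

@dlt.resource(
    write_disposition={
        "disposition": "merge",
        "strategy": "scd2",
        # proposed arguments (not part of the current API):
        "natural_key": "customer_id",
        "boundary_timestamp_column": "updated_at",
    }
)
def customers():
    # yields records that contain the customer_id and updated_at columns
    ...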
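A rough in-memory sketch of how the proposed arguments could drive an scd2 merge. `scd2_merge` and the `valid_from`/`valid_to` names are hypothetical, not dlt internals, and the sketch assumes each batch only contains versions at or after the currently active ones (late-arriving records are not handled):

```python
from typing import Any, Dict, List

# Mirrors the proposed write_disposition; "natural_key" and
# "boundary_timestamp_column" are the arguments proposed in this issue.
WRITE_DISPOSITION = {
    "disposition": "merge",
    "strategy": "scd2",
    "natural_key": "customer_id",
    "boundary_timestamp_column": "updated_at",
}


def scd2_merge(
    table: List[Dict[str, Any]],
    batch: List[Dict[str, Any]],
    config: Dict[str, Any],
) -> List[Dict[str, Any]]:
    """Hypothetical scd2 merge driven by the proposed config keys (mutates table)."""
    key = config["natural_key"]
    ts = config["boundary_timestamp_column"]
    # Versions already present, identified by the composite (natural key, updated_at).
    seen = {(row[key], row[ts]) for row in table}
    for rec in sorted(batch, key=lambda r: r[ts]):
        if (rec[key], rec[ts]) in seen:
            continue  # same natural key and updated_at: treated as unchanged
        for row in table:
            # Retire the active version of this natural key at the new version's updated_at.
            if row[key] == rec[key] and row["valid_to"] is None:
                row["valid_to"] = rec[ts]
        table.append({**rec, "valid_from": rec[ts], "valid_to": None})
        seen.add((rec[key], rec[ts]))
    return table


table: List[Dict[str, Any]] = []
scd2_merge(table, [{"customer_id": 1, "name": "a", "updated_at": "2024-08-01"}], WRITE_DISPOSITION)
scd2_merge(
    table,
    [
        {"customer_id": 1, "name": "b", "updated_at": "2024-08-15"},
        {"customer_id": 1, "name": "a", "updated_at": "2024-08-01"},  # already loaded
    ],
    WRITE_DISPOSITION,
)
for row in table:
    print(row["name"], row["valid_from"], row["valid_to"])
# a 2024-08-01 2024-08-15
# b 2024-08-15 None
```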

Related issues

#1601 (comment) (specifically option 1)

jorritsandbrink added the enhancement label Sep 2, 2024
Projects
Status: Todo
Development

No branches or pull requests

1 participant