rest_api: Allow multiple resolve params in an endpoint config #1879

Open
willi-mueller opened this issue May 28, 2024 · 4 comments
Labels
enhancement New feature or request


@willi-mueller
Collaborator

dlt version

0.4.11

Source name

rest_api

Describe the problem

When I try to specify multiple resolve params, I get the following exception:

demos-py3.11➜  demos git:(main) ✗ RUNTIME__LOG_LEVEL=INFO python pokemon_pipeline.py
Traceback (most recent call last):
  File "/Users/vilasa/code/demos/pokemon_pipeline.py", line 60, in <module>
    pokemon_source = rest_api_source(pokemon_config)
                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/vilasa/code/demos/rest_api/__init__.py", line 89, in rest_api_source
    return decorated(config)
           ^^^^^^^^^^^^^^^^^
  File "/Users/vilasa/Library/Caches/pypoetry/virtualenvs/demos-C1MIco0Z-py3.11/lib/python3.11/site-packages/dlt/extract/decorators.py", line 243, in _wrap
    rv = conf_f(*args, **kwargs)
         ^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/vilasa/code/demos/rest_api/__init__.py", line 162, in rest_api_resources
    ) = build_resource_dependency_graph(
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/vilasa/code/demos/rest_api/config_setup.py", line 195, in build_resource_dependency_graph
    raise ValueError(
ValueError: Multiple resolved params for resource multiple_resolves: [ResolvedParam(param_name='berry_name', resolve_config=ResolveConfig(resource_name='berries', field_path='name')), ResolvedParam(param_name='pokemon_name', resolve_config=ResolveConfig(resource_name='pokemon', field_path='name'))]
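To make the failure concrete, here is a simplified model of the check behind this error. It is not dlt's code: the `ResolveConfig` and `ResolvedParam` dataclasses only mirror the names and repr printed in the traceback, and `find_resolved_params` and `check_single_resolve` are hypothetical helpers. It assumes that a resolve param is any endpoint param whose value is a dict with `"type": "resolve"`.

```python
from dataclasses import dataclass
from typing import Any, Dict, List


@dataclass
class ResolveConfig:
    # Simplified stand-in mirroring the traceback's repr (not dlt's class).
    resource_name: str
    field_path: str


@dataclass
class ResolvedParam:
    param_name: str
    resolve_config: ResolveConfig


def find_resolved_params(params: Dict[str, Any]) -> List[ResolvedParam]:
    """Collect every endpoint param declared as {"type": "resolve", ...}."""
    return [
        ResolvedParam(name, ResolveConfig(spec["resource"], spec["field"]))
        for name, spec in params.items()
        if isinstance(spec, dict) and spec.get("type") == "resolve"
    ]


def check_single_resolve(resource_name: str, params: Dict[str, Any]) -> List[ResolvedParam]:
    """Mimic the reported restriction: at most one resolved param per resource."""
    resolved = find_resolved_params(params)
    if len(resolved) > 1:
        raise ValueError(
            f"Multiple resolved params for resource {resource_name}: {resolved}"
        )
    return resolved


# Endpoint params of the `multiple_resolves` resource from the repro below.
params = {
    "limit": 1000,  # plain query param, not a resolve param
    "berry_name": {"type": "resolve", "resource": "berries", "field": "name"},
    "pokemon_name": {"type": "resolve", "resource": "pokemon", "field": "name"},
}
```

Calling `check_single_resolve("multiple_resolves", params)` raises a `ValueError` whose message matches the one in the traceback, while a resource with a single resolve param passes.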

Expected behavior

I can specify any number of resolve params in a single endpoint config.
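When the resolve params come from unrelated parent resources (like `berries` and `pokemon` in the repro), one possible semantics is a cross product: one request per combination of parent values. The sketch below assumes that semantics; `expand_paths` is a hypothetical helper and the berry and pokemon names are sample values, not API output.

```python
from itertools import product
from typing import Dict, Iterator, List


def expand_paths(template: str, resolved_values: Dict[str, List[str]]) -> Iterator[str]:
    """Yield one formatted path per combination of resolved values.

    Hypothetical helper: each key is a placeholder in `template`, each value
    lists the field values collected from that param's parent resource.
    """
    names = list(resolved_values)
    for combo in product(*(resolved_values[n] for n in names)):
        yield template.format(**dict(zip(names, combo)))


template = "foo/bar?first_resolve={berry_name}&second_resolve={pokemon_name}"
paths = list(
    expand_paths(
        template,
        {
            "berry_name": ["cheri", "chesto"],        # sample parent values
            "pokemon_name": ["bulbasaur", "ivysaur"],  # sample parent values
        },
    )
)
for p in paths:
    print(p)
```

The number of requests is the product of the parent sizes, so with large parents this grows quickly; that cost is one reason dlt needs to know how the params relate before iterating.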

Steps to reproduce

import dlt

from rest_api import rest_api_source

pokemon_config = {
    "client": {
        "base_url": "https://pokeapi.co/api/v2/",
    },
    "resource_defaults": {
        "write_disposition": "replace",
        "endpoint": {
            "params": {
                "limit": 1000,
            },
        },
    },
    "resources": [
        {"name": "berries", "endpoint": {"path": "berry"}, "selected": False},
        "pokemon",
        {
            "name": "multiple_resolves",
            "endpoint": {
                "path": "foo/bar?first_resolve={berry_name}&second_resolve={pokemon_name}",
                "params": {
                    "berry_name": {
                        "type": "resolve",
                        "resource": "berries",
                        "field": "name",
                    },
                    "pokemon_name": {
                        "type": "resolve",
                        "resource": "pokemon",
                        "field": "name",
                    },
                },
            },
        },
    ],
}

pokemon_source = rest_api_source(pokemon_config)

pipeline = dlt.pipeline(
    pipeline_name="pokemon_pipeline",
    destination="duckdb",
    dataset_name="pokemon",
    progress="log",
)

load_info = pipeline.run(pokemon_source)
print(load_info)

How are you using the source?

I run this source in production.

Operating system

macOS

Runtime environment

Local

Python version

3.11.4

dlt destination

duckdb

Additional information

Reported here: https://dlthub-community.slack.com/archives/C04DQA7JJN6/p1716833336657899?thread_ts=1716665462.485049&cid=C04DQA7JJN6

@burnash burnash self-assigned this May 28, 2024
@burnash burnash added the enhancement New feature or request label May 28, 2024
@jecolvin

jecolvin commented Aug 6, 2024

I'll add another vote for this enhancement! I'm currently working with an API that requires two different IDs in the endpoint path to return subtask data:

/api/v1/projects/<project_id>/assignments/<assignment_id>/subtasks
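Written with `{...}` placeholders, the style the rest_api config uses for path params, this endpoint has two placeholders that must both be filled before a request can be made. A small sketch using the standard library's `string.Formatter` to list the placeholders and fail when one is unresolved; `placeholders` and `format_path` are hypothetical helpers and the IDs are made up.

```python
from string import Formatter
from typing import Dict, Set


def placeholders(template: str) -> Set[str]:
    """Return the named {placeholders} that appear in a path template."""
    return {field for _, field, _, _ in Formatter().parse(template) if field}


def format_path(template: str, values: Dict[str, object]) -> str:
    """Fill every placeholder, failing loudly if any value is missing."""
    missing = placeholders(template) - set(values)
    if missing:
        raise KeyError(f"unresolved path params: {sorted(missing)}")
    return template.format(**values)


template = "/api/v1/projects/{project_id}/assignments/{assignment_id}/subtasks"
print(sorted(placeholders(template)))
print(format_path(template, {"project_id": 42, "assignment_id": 7}))
```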

@burnash
Collaborator

burnash commented Aug 6, 2024

Hey @jecolvin, thanks for the input. Could you please share how <project_id> and <assignment_id> are related? dlt needs to figure out how to iterate over both <project_id> and <assignment_id>.

@jecolvin

jecolvin commented Aug 6, 2024

For sure! <assignment_id> is a child of <project_id> in a one-to-many relationship -- each project has many assignments, but each assignment belongs to only one project.

Edit: If it helps, the assignments endpoint also depends on project_id; it is just the first part of the subtask endpoint: /api/v1/projects/<project_id>/assignments
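Because each assignment belongs to exactly one project, the two IDs are best iterated as a nested chain (projects, then each project's assignments) rather than as a cross product of all project IDs and all assignment IDs. The sketch below contrasts the two using hypothetical in-memory data in place of real API calls; `get_assignments` and `subtask_paths` are illustrative names only.

```python
from typing import Dict, Iterator, List

# Hypothetical in-memory stand-ins for the two API responses.
PROJECTS: List[Dict] = [{"id": 1}, {"id": 2}]
ASSIGNMENTS_BY_PROJECT: Dict[int, List[Dict]] = {
    1: [{"id": 10}, {"id": 11}],
    2: [{"id": 20}],
}


def get_assignments(project_id: int) -> List[Dict]:
    """Pretend call to /api/v1/projects/{project_id}/assignments."""
    return ASSIGNMENTS_BY_PROJECT.get(project_id, [])


def subtask_paths() -> Iterator[str]:
    """Walk the one-to-many chain: each assignment is visited under its own project."""
    for project in PROJECTS:
        for assignment in get_assignments(project["id"]):
            yield (
                f"/api/v1/projects/{project['id']}"
                f"/assignments/{assignment['id']}/subtasks"
            )


nested = list(subtask_paths())

# A naive cross product of all project IDs and all assignment IDs also
# produces pairs such as (1, 20) that do not exist in the API.
all_assignment_ids = [a["id"] for group in ASSIGNMENTS_BY_PROJECT.values() for a in group]
cross = [(p["id"], a) for p in PROJECTS for a in all_assignment_ids]
print(len(nested), len(cross))
```

The nested walk only issues requests for pairs that exist, which is the relationship information dlt would need in order to iterate correctly.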

@jecolvin

@burnash Follow-up on my work and a proposal for a v1 of this feature:

I discovered I could get the project_id from the assignments endpoint; it was just named differently than I expected (assignable_id). So I could populate both params from a single resource. Relevant code:

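    import dlt

    # `assignments` (the parent resource) and `client` (a RESTClient instance)
    # are defined elsewhere in the pipeline and omitted here.
    @dlt.transformer(data_from=assignments, write_disposition="replace")
    def subtasks(assignments):
        @dlt.defer
        def _get_subtasks(_assignment):
            # Both path IDs come from the same assignment record;
            # the project ID is stored under `assignable_id`.
            project_id = _assignment["assignable_id"]
            assignment_id = _assignment["id"]
            for page in client.paginate(
                path=f"projects/{project_id}/assignments/{assignment_id}/subtasks",
                params={"per_page": 250},
            ):
                yield page

        # One subtasks lookup per assignment received from the parent.
        for _assignment in assignments:
            yield _get_subtasks(_assignment)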

That might be a good v1 for multiple resolve params: allow declaring multiple resolve params as long as they all come from the same resource. That way, it would still be a single, straightforward transformer.
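The proposed v1 rule can be sketched without a cross product: validate that every resolve param names the same parent resource, then build exactly one child path per parent item. The params dict below only imitates the single-param syntax from the original repro; it is a hypothetical config shape, not a confirmed rest_api feature, and `check_same_parent` and `child_paths` are illustrative helpers.

```python
from typing import Any, Dict, Iterable, Iterator


def _resolve_specs(params: Dict[str, Any]) -> Dict[str, Dict[str, str]]:
    """Keep only params declared as {"type": "resolve", ...}."""
    return {
        name: spec
        for name, spec in params.items()
        if isinstance(spec, dict) and spec.get("type") == "resolve"
    }


def check_same_parent(params: Dict[str, Any]) -> str:
    """Return the single parent resource name, or raise if params span several."""
    parents = {spec["resource"] for spec in _resolve_specs(params).values()}
    if len(parents) != 1:
        raise ValueError(f"resolve params must share one parent resource, got {sorted(parents)}")
    return parents.pop()


def child_paths(
    template: str, params: Dict[str, Any], parent_items: Iterable[Dict[str, Any]]
) -> Iterator[str]:
    """Build one child path per parent item, reading every param from that item."""
    check_same_parent(params)
    fields = {name: spec["field"] for name, spec in _resolve_specs(params).items()}
    for item in parent_items:
        yield template.format(**{name: item[field] for name, field in fields.items()})


# Hypothetical config: both params resolve from the `assignments` resource.
params = {
    "project_id": {"type": "resolve", "resource": "assignments", "field": "assignable_id"},
    "assignment_id": {"type": "resolve", "resource": "assignments", "field": "id"},
}
assignments = [  # sample parent records
    {"id": 10, "assignable_id": 1},
    {"id": 11, "assignable_id": 1},
    {"id": 20, "assignable_id": 2},
]
paths = list(
    child_paths("projects/{project_id}/assignments/{assignment_id}/subtasks", params, assignments)
)
```

Since every value is read from the same parent record, the number of child requests equals the number of parent items, and mixing parents (as in the original berries/pokemon repro) is rejected up front.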

@burnash burnash transferred this issue from dlt-hub/verified-sources Sep 26, 2024
Projects
Status: Planned
Status: Todo
Development

No branches or pull requests

3 participants