Round 2 of refactoring for ParquetTools #5389

Open
malhotrashivam opened this issue Apr 19, 2024 · 0 comments
Labels
feature request (New feature or request), May2024, parquet (Related to the Parquet integration)
malhotrashivam commented Apr 19, 2024

PR #5358 significantly refactored the APIs in ParquetTools and marked a number of them as deprecated. The deprecated APIs will be removed soon as part of issue #5362.

Once we have removed the deprecated APIs, we should do another round of refactoring with the following ideas:

  • Right now ParquetInstructions is used for both reading and writing. We should consider splitting it into an ImmutableStyle ReadInstructions and WriteInstructions, moving read-only options like isRefreshing into ReadInstructions, write-only options like index columns into WriteInstructions, and so on.
  • Right now, we have writeTable for writing a single table, writeTables for writing a Table[], and writeKeyValuePartitionedTable for partitioned writing of a Table or PartitionedTable. We can refactor these to expose only writeTable. The writeTables case would be covered by accepting PartitionedTableFactory.of(Table[]) and writing the constituents out as a flat partitioned table. Partitioned writing would be requested by providing a KV_PARTITIONED layout instead of calling a separate API.
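
To make the two ideas above concrete, here is a rough, self-contained sketch of what the split instructions and the single write entry point might look like. Everything in it is a hypothetical stand-in and not the existing Deephaven API: the class ParquetRefactorSketch, the hand-written builder (the real change would presumably use the ImmutableStyle builders), the Layout enum with its SINGLE_FILE and FLAT_PARTITIONED values (only KV_PARTITIONED is named in this issue), and the use of a List<String> in place of Table or PartitionedTable. The writeTable method only returns a description of what it would write.

```java
import java.util.List;

// Hypothetical sketch: these stand-in types are NOT the Deephaven API.
final class ParquetRefactorSketch {

    /** Read-side options only; isRefreshing is the example named in the issue. */
    static final class ReadInstructions {
        private final boolean isRefreshing;

        private ReadInstructions(Builder b) {
            this.isRefreshing = b.isRefreshing;
        }

        boolean isRefreshing() {
            return isRefreshing;
        }

        static Builder builder() {
            return new Builder();
        }

        static final class Builder {
            private boolean isRefreshing; // assumed default: false

            Builder isRefreshing(boolean value) {
                this.isRefreshing = value;
                return this;
            }

            ReadInstructions build() {
                return new ReadInstructions(this);
            }
        }
    }

    /** Assumed layout names; only KV_PARTITIONED appears in the issue text. */
    enum Layout { SINGLE_FILE, FLAT_PARTITIONED, KV_PARTITIONED }

    /** Write-side options only: index columns plus the requested layout. */
    static final class WriteInstructions {
        private final List<String> indexColumns;
        private final Layout layout;

        WriteInstructions(List<String> indexColumns, Layout layout) {
            this.indexColumns = List.copyOf(indexColumns); // immutable copy
            this.layout = layout;
        }

        List<String> indexColumns() {
            return indexColumns;
        }

        Layout layout() {
            return layout;
        }
    }

    /**
     * Single entry point. The constituents list stands in for a table (one
     * element) or a partitioned table (many elements). Nothing is written;
     * the method returns a description of the output it would produce.
     */
    static String writeTable(List<String> constituents, String destination,
            WriteInstructions instructions) {
        switch (instructions.layout()) {
            case SINGLE_FILE:
                if (constituents.size() != 1) {
                    throw new IllegalArgumentException("SINGLE_FILE expects exactly one table");
                }
                return "single file: " + destination;
            case FLAT_PARTITIONED:
                return constituents.size() + " files directly under " + destination;
            case KV_PARTITIONED:
                return "key=value directories under " + destination;
            default:
                throw new IllegalStateException("unknown layout " + instructions.layout());
        }
    }

    public static void main(String[] args) {
        ReadInstructions read = ReadInstructions.builder().isRefreshing(true).build();
        WriteInstructions write = new WriteInstructions(List.of("Sym"), Layout.FLAT_PARTITIONED);
        System.out.println("refreshing read: " + read.isRefreshing());
        System.out.println(writeTable(List.of("t1", "t2"), "/tmp/out", write));
    }
}
```

With this shape, a caller can no longer set a write-only option on a read (or the reverse), and the choice between single-file, flat partitioned, and key-value partitioned output becomes a value in the instructions rather than a choice between three methods.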
malhotrashivam added the feature request and parquet labels on Apr 19, 2024
malhotrashivam self-assigned this on Apr 19, 2024
malhotrashivam added this to the 3. May 2024 milestone on Apr 19, 2024
pete-petey changed the milestone from 3. May 2024 to Backlog on Aug 26, 2024