Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Refactoring the ParquetTools read/write APIs #5358

Merged
merged 19 commits into from
Apr 22, 2024

Conversation

malhotrashivam
Copy link
Contributor

@malhotrashivam malhotrashivam commented Apr 13, 2024

  1. Marked a number of parquet read/write APIs as @deprecated. These include methods that accept java File objects, and will be replaced by methods accepting String instead.
  2. Added a few new ParquetInstruction to replace some APIs:
    • Methods which accept a TableDefinition have been deprecated and will be replaced by setting a new instruction for the definition.
    • Similarly, a new instruction for Parquet file layout has been added, which will replace APIs inside ParquetTools which had layout names in the method name (Ex. readSingleFileTable will be replaced by readTable with layout set as single file).
    • Added a new instruction for providing index columns for writing. This will be the new default approach and older APIs have been deleted since those were not released yet.
  3. Moved some public utility methods outside of ParquetTools, to prevent internal methods being exposed as APIs.
  4. In the python code, deprecated an argument col_definitions in favor of table_definition in parquet reading and writing APIs.

Initial work for this change was started as part of #5206, and has now been moved over to this PR.

@malhotrashivam malhotrashivam added parquet Related to the Parquet integration DocumentationNeeded NoReleaseNotesNeeded No release notes are needed. labels Apr 13, 2024
@malhotrashivam malhotrashivam added this to the 2. April 2024 milestone Apr 13, 2024
@malhotrashivam malhotrashivam self-assigned this Apr 13, 2024
@malhotrashivam malhotrashivam changed the title Refactoring the ParquetTools read/write APIs (WIP) Refactoring the ParquetTools read/write APIs Apr 13, 2024
@malhotrashivam malhotrashivam changed the title (WIP) Refactoring the ParquetTools read/write APIs Refactoring the ParquetTools read/write APIs Apr 15, 2024
@rcaudy
Copy link
Member

rcaudy commented Apr 16, 2024

Spitballing a bit, I wonder how much better can get these APIs with bifurcated read and write instructions.
Do we really need table[], destination[]? I imagine a PartitionedTableFactory.of(Table[]) could give us the Tables, but flat layout doesn't have a naming scheme right now.

py/server/deephaven/parquet.py Outdated Show resolved Hide resolved
py/server/deephaven/parquet.py Outdated Show resolved Hide resolved
py/server/deephaven/parquet.py Show resolved Hide resolved
py/server/tests/test_parquet.py Outdated Show resolved Hide resolved
chipkent
chipkent previously approved these changes Apr 22, 2024
@malhotrashivam malhotrashivam merged commit c6543b2 into deephaven:main Apr 22, 2024
15 checks passed
@github-actions github-actions bot locked and limited conversation to collaborators Apr 22, 2024
@deephaven-internal
Copy link
Contributor

Labels indicate documentation is required. Issues for documentation have been opened:

Community: deephaven/deephaven-docs-community#202

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
DocumentationNeeded NoReleaseNotesNeeded No release notes are needed. parquet Related to the Parquet integration
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants