Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Switch ArrowDict to signed and fix static datatype #7383

Merged
merged 1 commit into from
Sep 10, 2024

Conversation

jleibs
Copy link
Member

@jleibs jleibs commented Sep 9, 2024

What

Certain arrow implementations required that dictionary keys be signed.

Also the type returned by static data columns was missing the arraylist.

Checklist

  • I have read and agree to Contributor Guide and the Code of Conduct
  • I've included a screenshot or gif (if applicable)
  • I have tested the web demo (if applicable):
  • The PR title and labels are set such as to maximize their usefulness for the next release's CHANGELOG
  • If applicable, add a new check to the release checklist!
  • If have noted any breaking changes to the log API in CHANGELOG.md and the migration guide

To run all checks from main, comment on the PR with @rerun-bot full-check.

@jleibs jleibs added ⛃ re_datastore affects the datastore itself exclude from changelog PRs with this won't show up in CHANGELOG.md labels Sep 9, 2024
@jleibs jleibs marked this pull request as ready for review September 9, 2024 23:48
@abey79 abey79 self-requested a review September 10, 2024 07:35
@abey79
Copy link
Member

abey79 commented Sep 10, 2024

#7380 will need to be changed after that

@abey79 abey79 merged commit 175876a into main Sep 10, 2024
35 of 37 checks passed
@abey79 abey79 deleted the jleibs/dataframe_fixes branch September 10, 2024 07:38
abey79 added a commit that referenced this pull request Sep 10, 2024
@abey79
Copy link
Member

abey79 commented Sep 10, 2024

Certain arrow implementations required that dictionary keys be signed.

Out of curiosity, where did you encounter that?

abey79 added a commit that referenced this pull request Sep 10, 2024
### What

- Closes #7279

Major update to the dataframe view
- display the data return by the new `re_dataframe` crate
  - the PoV entity/component is now actually used
  - entities are now always columns
    - see #7379 
- use [`egui_table`](https://github.com/rerun-io/egui_table) for the
table
  - hierarchical header
  - sticky columns
  - and much more...

TODO:
- [x] fix after merging #7383

<img width="2670" alt="image"
src="https://github.com/user-attachments/assets/cf09b69b-3c82-4ba9-9425-bf60622efae4">


### Checklist
* [x] I have read and agree to [Contributor
Guide](https://github.com/rerun-io/rerun/blob/main/CONTRIBUTING.md) and
the [Code of
Conduct](https://github.com/rerun-io/rerun/blob/main/CODE_OF_CONDUCT.md)
* [x] I've included a screenshot or gif (if applicable)
* [x] I have tested the web demo (if applicable):
* Using examples from latest `main` build:
[rerun.io/viewer](https://rerun.io/viewer/pr/7380?manifest_url=https://app.rerun.io/version/main/examples_manifest.json)
* Using full set of examples from `nightly` build:
[rerun.io/viewer](https://rerun.io/viewer/pr/7380?manifest_url=https://app.rerun.io/version/nightly/examples_manifest.json)
* [x] The PR title and labels are set such as to maximize their
usefulness for the next release's CHANGELOG
* [x] If applicable, add a new check to the [release
checklist](https://github.com/rerun-io/rerun/blob/main/tests/python/release_checklist)!
* [x] If have noted any breaking changes to the log API in
`CHANGELOG.md` and the migration guide

- [PR Build Summary](https://build.rerun.io/pr/7380)
- [Recent benchmark results](https://build.rerun.io/graphs/crates.html)
- [Wasm size tracking](https://build.rerun.io/graphs/sizes.html)

To run all checks from `main`, comment on the PR with `@rerun-bot
full-check`.

---------

Co-authored-by: Emil Ernerfeldt <[email protected]>
Co-authored-by: Clement Rey <[email protected]>
@jleibs
Copy link
Member Author

jleibs commented Sep 10, 2024

Certain arrow implementations required that dictionary keys be signed.

Out of curiosity, where did you encounter that?

See for example the official c++ API: https://arrow.apache.org/docs/cpp/api/datatype.html#_CPPv410dictionaryRKNSt10shared_ptrI8DataTypeEERKNSt10shared_ptrI8DataTypeEEb

image

I also hit this when trying to convert a dictionary to pandas:

pyarrow.lib.ArrowTypeError: Converting unsigned dictionary indices to pandas not yet supported, index type: uint32

In general Arrow frequently falls back to usage of signed-integers in key design places in support of least-common-denominator approach to languages that don't handled unsigned values well

The spec for arrays

Array lengths are represented in the Arrow metadata as a 64-bit signed integer.

The spec for lists

The offsets are the same as in the variable-size binary case, and both 32-bit and 64-bit signed integer offsets are supported options for the offsets.

The spec for dictionary suggests:

Since unsigned integers can be more difficult to work with in some cases (e.g. in the JVM), we recommend preferring signed integers over unsigned integers for representing dictionary indices. Additionally, we recommend avoiding using 64-bit unsigned integer indices unless they are required by an application.

jprochazk pushed a commit that referenced this pull request Sep 11, 2024
### What

- Closes #7279

Major update to the dataframe view
- display the data return by the new `re_dataframe` crate
  - the PoV entity/component is now actually used
  - entities are now always columns
    - see #7379 
- use [`egui_table`](https://github.com/rerun-io/egui_table) for the
table
  - hierarchical header
  - sticky columns
  - and much more...

TODO:
- [x] fix after merging #7383

<img width="2670" alt="image"
src="https://github.com/user-attachments/assets/cf09b69b-3c82-4ba9-9425-bf60622efae4">


### Checklist
* [x] I have read and agree to [Contributor
Guide](https://github.com/rerun-io/rerun/blob/main/CONTRIBUTING.md) and
the [Code of
Conduct](https://github.com/rerun-io/rerun/blob/main/CODE_OF_CONDUCT.md)
* [x] I've included a screenshot or gif (if applicable)
* [x] I have tested the web demo (if applicable):
* Using examples from latest `main` build:
[rerun.io/viewer](https://rerun.io/viewer/pr/7380?manifest_url=https://app.rerun.io/version/main/examples_manifest.json)
* Using full set of examples from `nightly` build:
[rerun.io/viewer](https://rerun.io/viewer/pr/7380?manifest_url=https://app.rerun.io/version/nightly/examples_manifest.json)
* [x] The PR title and labels are set such as to maximize their
usefulness for the next release's CHANGELOG
* [x] If applicable, add a new check to the [release
checklist](https://github.com/rerun-io/rerun/blob/main/tests/python/release_checklist)!
* [x] If have noted any breaking changes to the log API in
`CHANGELOG.md` and the migration guide

- [PR Build Summary](https://build.rerun.io/pr/7380)
- [Recent benchmark results](https://build.rerun.io/graphs/crates.html)
- [Wasm size tracking](https://build.rerun.io/graphs/sizes.html)

To run all checks from `main`, comment on the PR with `@rerun-bot
full-check`.

---------

Co-authored-by: Emil Ernerfeldt <[email protected]>
Co-authored-by: Clement Rey <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
exclude from changelog PRs with this won't show up in CHANGELOG.md ⛃ re_datastore affects the datastore itself
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants