
Give python-isal a mention in the zlib/gzip documentation #98347

Open
rhpvorderman opened this issue Oct 17, 2022 · 3 comments
Labels
docs Documentation in the Doc dir


@rhpvorderman
Contributor

rhpvorderman commented Oct 17, 2022

Documentation

The documentation mentions several PyPI packages such as numpy and requests as an alternative for standard library packages.

I would like to propose that python-isal gets a mention in the "see also" section of the zlib and gzip documentation. Simply: "python-isal, faster zlib and gzip decompression and compression".

Since the documentation cannot recommend just any random project out there, here is the argument for why python-isal should get a mention.

  1. Python-isal uses stdlib code and is released under the PSF-2.0 license.
  2. As a result, the following improvements could be made to the stdlib code:

Python-isal is a "good citizen" of the Python ecosystem: all the improvements have been ported back to CPython. The useful thing about python-isal is that it allows the gzip and zlib code of CPython to evolve and get tested by a smaller group of users before it lands in CPython itself. The PRs above were all suggested only after the changes were found to be stable in releases of python-isal.
Having more python-isal users is therefore also beneficial to CPython itself. Users also benefit from being able to install a library that offers 2x faster decompression and 5x(!) faster compression. Hence a small one-liner in "see also" is warranted, in my opinion.
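Speedups like these depend on the data, the compression level, and the hardware, so it can be worth measuring on your own files. The following is a minimal timing sketch, not a benchmark from this thread. It assumes python-isal is installed and that `isal.isal_zlib.compress`/`decompress` can be called like their `zlib` counterparts at default settings; the two libraries' default levels are not directly comparable, so treat the numbers as rough.

```python
import timeit
import zlib

try:
    # Optional third-party dependency (python-isal); assumed here to mirror
    # zlib.compress()/zlib.decompress() when called with default arguments.
    from isal import isal_zlib
except ImportError:
    isal_zlib = None


def time_roundtrip(module, data, repeat=5):
    """Return (best compress seconds, best decompress seconds) at the module's default level."""
    compressed = module.compress(data)
    assert module.decompress(compressed) == data  # sanity check: lossless round trip
    compress_s = min(timeit.repeat(lambda: module.compress(data), number=1, repeat=repeat))
    decompress_s = min(timeit.repeat(lambda: module.decompress(compressed), number=1, repeat=repeat))
    return compress_s, decompress_s


if __name__ == "__main__":
    # Compressible sample (newline-separated integers); replace with real data.
    data = b"".join(b"%d\n" % i for i in range(200_000))
    for name, module in [("zlib", zlib), ("isal_zlib", isal_zlib)]:
        if module is None:
            print(f"{name}: not installed")
            continue
        c, d = time_roundtrip(module, data)
        print(f"{name}: compress {c * 1000:.1f} ms, decompress {d * 1000:.1f} ms")
```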

The next thing for python-isal to tackle is #89550. When a working solution is found, it will be ported back to CPython.

Disclosure: I am the python-isal maintainer.

@rhpvorderman rhpvorderman added the docs Documentation in the Doc dir label Oct 17, 2022
@encukou
Member

encukou commented Oct 20, 2022

> The documentation mentions several PyPI packages such as numpy and requests as an alternative for standard library packages.

And we haven't exactly had the best experience with doing that. Requests is a good example: nowadays you could argue that httpx is a much better choice, but no one really wants to do the arguing.

> The useful thing about python-isal is that it allows the gzip and zlib code of CPython to evolve and get tested by a smaller group of users before it lands in CPython itself. The PRs above were all suggested only after the changes were found to be stable in releases of python-isal.

That's a much better reason. Do you want to position the generic part of isal as an “upstream for gzip/zlib”, similarly to how importlib_resources powers importlib.resources or how tomli became tomllib?
Is having gzip.open/GzipFile one day taking the compression library as an argument a possible goal?
(Please don't take these as veiled promises, I'd just like to know your opinions.)

The zlib experts are @Yhg1s & @gpshead. I think it's fine to add the mention if the module experts agree. But both of them (and I) happen to be on the Steering Council, which should be able to figure out how to make policy decisions around mentioning other projects.

@gpshead
Member

gpshead commented Oct 20, 2022

Documentation is a living thing; we're free to change what it links to at any time (and backport that as far as desired into branch docs). So in that sense, I think mentioning packages such as python-isal, which are currently seen as a good idea, have a history of being useful, and that we'd like to see grow more community, from our zlib and gzip module docs is reasonable, even if we haven't worked out the desired long-term role of the external thing we're linking to.

@gpshead gpshead self-assigned this Oct 20, 2022
@rhpvorderman
Contributor Author

> That's a much better reason. Do you want to position the generic part of isal as an “upstream for gzip/zlib”, similarly to how importlib_resources powers importlib.resources or how tomli became tomllib?

Since python-isal is primarily a C extension, this is not possible. CPython uses Argument Clinic, but that does not work well for C extensions because Argument Clinic is tuned for a specific version of CPython rather than for all supported versions. Furthermore, I dislike the blocksoutputbuffer used in zlibmodule.c: it seems to me a much more convoluted solution than what was used previously, while it does not solve any problems that actually occur in the real world. (It is primarily useful for compressing/decompressing very large chunks in memory, but given the way computers work, a streaming solution will always be faster in such cases, so why optimize for large in-memory blocks?) Also, zlibmodule.c uses heap types, which make declaring types more difficult, so python-isal uses simple statically declared types. Finally, ISA-L and zlib have some minor API differences, mostly that ISA-L requires the developer to allocate memory.
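The streaming pattern referred to above can be illustrated with the standard-library `zlib` module: instead of decompressing one large in-memory blob, feed compressed chunks to a `decompressobj` and cap the size of each output piece so memory stays bounded. This is a sketch of the general technique, not code from python-isal or CPython; it assumes a single-member gzip stream, and the chunk and output sizes are arbitrary.

```python
import zlib


def stream_gunzip(chunks, max_out=64 * 1024):
    """Yield the decompressed contents of a single-member gzip stream piece by piece.

    `chunks` is any iterable of compressed byte strings (e.g. blocks read from a file).
    Each decompress() call returns at most `max_out` bytes, so memory use stays
    bounded regardless of the total decompressed size.
    """
    d = zlib.decompressobj(wbits=31)  # wbits=31: expect a gzip header and trailer
    for chunk in chunks:
        data = chunk
        while data:
            out = d.decompress(data, max_out)
            if out:
                yield out
            # Input that was held back because the max_out cap was reached.
            data = d.unconsumed_tail
    tail = d.flush()  # emit any output still buffered inside the decompressor
    if tail:
        yield tail


if __name__ == "__main__":
    import sys

    # Usage: python stream_gunzip.py file.gz  (prints the decompressed size)
    total = 0
    with open(sys.argv[1], "rb") as fh:
        blocks = iter(lambda: fh.read(128 * 1024), b"")
        for piece in stream_gunzip(blocks):
            total += len(piece)
    print(total)
```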

In conclusion, one-to-one copying is impossible. Having said that, I did make accommodations so that code exchange is still very much possible, for example by separating the argument-parsing code from the actual implementation code, as is done in CPython's extension modules. The changes I have backported to CPython so far required minor rewrites, but were by no means difficult.

> Is having gzip.open/GzipFile one day taking the compression library as an argument a possible goal?

I do not think this is necessary. Plain zlib is already quite fast. However, when you have data formats that use gzip/zlib compression and are measured on the order of gigabytes, zlib becomes a bottleneck. This is the case in bioinformatics, which is why I created the python-isal bindings. In use cases where zlib is not the bottleneck, or the extra time spent is not really noticeable, zlib is fine. Also, ISA-L does not have a zlib-compatible API. It is mostly similar, so it is quite easy to pick up for developers who know zlib, but having the library as a generic argument is going to be a pain. So I do not think it is worth it: there are implementation problems and the target audience is simply not big enough. Developers who know that compression/decompression is a bottleneck in their application can easily add python-isal as a dependency, and that will solve the problem for most users as well.
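For an application in that position, one common way to add python-isal as an optional dependency is an import fallback. The sketch below assumes `isal.igzip.open` accepts the same basic binary modes as `gzip.open`; it deliberately leaves the compression level at its default because the level ranges accepted by the two libraries are assumed to differ. The file name is hypothetical.

```python
import os
import tempfile

try:
    # Optional dependency: python-isal. Assumed here to be a drop-in for
    # gzip.open() when used with default settings.
    from isal import igzip as gzip_impl
except ImportError:
    import gzip as gzip_impl  # standard-library fallback


def write_and_read_back(path, payload):
    """Compress payload into a gzip file at path, then read it back."""
    with gzip_impl.open(path, "wb") as fh:
        fh.write(payload)
    with gzip_impl.open(path, "rb") as fh:
        return fh.read()


if __name__ == "__main__":
    payload = b"ACGT" * 100_000
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "reads.fastq.gz")  # hypothetical file name
        assert write_and_read_back(path, payload) == payload
        print(f"round trip OK using {gzip_impl.__name__}")
```

Whichever backend is picked, the output is an ordinary gzip file, so downstream tools that only know the stdlib `gzip` module can still read it.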

> So in that sense, I think mentioning packages such as python-isal, which are currently seen as a good idea, have a history of being useful, and that we'd like to see grow more community, from our zlib and gzip module docs is reasonable.

Thank you for the compliment on the "good idea package". These ideas come from a lot of experimenting to make things faster, which is sometimes more fun than video games.
