Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

zlib.inflateRawSync how do I get the number of bytes read? #8874

Closed
smyt1 opened this issue Sep 30, 2016 · 10 comments
Closed

zlib.inflateRawSync how do I get the number of bytes read? #8874

smyt1 opened this issue Sep 30, 2016 · 10 comments
Labels
feature request Issues that request new features to be added to Node.js. zlib Issues and PRs related to the zlib subsystem.

Comments

@smyt1
Copy link

smyt1 commented Sep 30, 2016

  • Version: v6.7.0 (applies to previous versions as well)
  • Platform:Darwin [hostname] 15.6.0 Darwin Kernel Version 15.6.0: Thu Jun 23 18:25:34 PDT 2016; root:xnu-3248.60.10~1/RELEASE_X86_64 x86_64
  • Subsystem: zlib

zlib.inflateRawSync returns the decompressed data. AFAICT it does not indicate how many bytes were read from the input buffer. Is there a way to find that?

Use case: some ZIP writers use the "data descriptor" feature of the pkzip file format. The CRC-32 checksum of the uncompressed data and the lengths actually appear directly after the deflated data, so you need to know the number of bytes that the deflator read in order to jump ahead to those fields. FWIW zip libraries leveraging zlib, like yauzl, do not bother with the checksum.

@mscdex mscdex added the zlib Issues and PRs related to the zlib subsystem. label Oct 1, 2016
@targos targos added the feature request Issues that request new features to be added to Node.js. label Oct 2, 2016
@bnoordhuis
Copy link
Member

The bytes consumed count is tracked internally but isn't exposed. It could be retrofitted onto the asynchronous API but it would be very awkward with the synchronous API. Either it would have to be a property on the returned buffer or a side channel such as a function or variable that holds the count of the last operation.

@AlexanderOMara
Copy link
Contributor

AlexanderOMara commented Apr 8, 2017

I would love to see this feature added. Having to use a non-async pure-JS implementation like pako, just to get this little bit of information is pretty lame.

It can be hijacked from the private API, by monkey-patching ._handle.writeSync or ._handle.callback but that's also not a real solutions, and requires re-implementing part of the module.

If we can decide on an API, I might be willing to implement it.

Some ideas:

  1. Extra callback arg:
zlib.inflate(buff, opts, function(err, buffer, read) {
});
  • Does not translate to synchronous.
  • Do any other callbacks take more than 2 arguments?
  1. An option for returning an object with buffer and this extra data:
zlib.inflate(buff, {bytesRead: true}, function(err, data) {
    var buffer = data.buffer;
    var bytesRead = data.bytesRead;
});
  • Can similarly work with synchronous.
  1. Property of buffer:
zlib.inflate(buff, opts, function(err, buffer) {
    var bytesRead = buffer.bytesRead;
});
  • Can similarly work with synchronous.

(see idea 4 before)

@bnoordhuis
Copy link
Member

Option 2 - opting in to a different return value / callback signature - is probably the most acceptable. There are other APIs in core that work the same.

(No panacea though, it destroys composability.)

@addaleax
Copy link
Member

addaleax commented Apr 9, 2017

Options 2 and 3 both sound good to me.

@smyt1
Copy link
Author

smyt1 commented Apr 9, 2017

I personally like option 3.

@bnoordhuis
Copy link
Member

Adding a property changes the object's shape (hidden class), that has performance implications.

@AlexanderOMara
Copy link
Contributor

One other possible issue, is supporting Node streams. I'm not sure any in my suggested ideas cover that use case.

@smyt1
Copy link
Author

smyt1 commented Apr 9, 2017

Another option is to append the length to the end of the buffer (written as an unsigned 32 bit integer) if requested. Default behavior would not write the length (so this would not disrupt existing code), but if you request the length the buffer will be 4 bytes longer. So long as buffer.slice(0,-4) is a light operation there would be no significant performance penalty besides allocating an extra 4 bytes per request.

@AlexanderOMara
Copy link
Contributor

AlexanderOMara commented Apr 10, 2017

For Node stream usage, somehow this information should be available here:

const MemoryStream = require('memorystream');
const stream = new MemoryStream();
const engine = zlib.createInflate();
engine.on('data', function(buffer) {
    // How to know how much data was read?
});
stream.pipe(engine);
stream.write(buffer);

Some ideas:

  1. Exposed via engine.bytesRead or similar.
  • This value would probably represent the amount read so far, increasing and updating each on data.
  1. Expose by property on buffer.
  • What value would be best exposed per buffer? The amount for that chunk? Of the whole thing?
  • Hidden class issue again.

This also gave me another idea. What if these zlib.create* objects had async and sync "process" methods like the method of zlib, where this information could be accessed. Sort-of a more-advanced OOP API:

var engine = zlib.createInflate();

// Async:
engine.process(buff, opts, function(err, buffer) {
    var bytesRead = engine.bytesRead;
});

// Sync:
var buffer = engine.processSync(buff, opts);
var bytesRead = engine.bytesRead;

I think process probably wouldn't be the best name, a better name could be chosen, but I think you get the idea.

@bnoordhuis
Copy link
Member

That's overengineering it. We're unlikely to introduce completely new APIs just to expose a simple data property.

As well, it's a variation of the side channel I mentioned earlier. You could accomplish the same thing by adding a zlib.inflateSync.bytesRead property that gets updated after every call.

AlexanderOMara added a commit to AlexanderOMara/node that referenced this issue May 10, 2017
Added bytesRead property to Zlib engines and an option to expose the engine to
convenience method callbacks.

Fixes: nodejs#8874
AlexanderOMara added a commit to AlexanderOMara/node that referenced this issue May 18, 2017
Added bytesRead property to Zlib engines

Refs: nodejs#8874
AlexanderOMara added a commit to AlexanderOMara/node that referenced this issue May 18, 2017
Added option to expose engine to convenience methods

Refs: nodejs#8874
AlexanderOMara added a commit to AlexanderOMara/node that referenced this issue May 22, 2017
AlexanderOMara added a commit to AlexanderOMara/node that referenced this issue May 22, 2017
AlexanderOMara added a commit to AlexanderOMara/node that referenced this issue May 22, 2017
AlexanderOMara added a commit to AlexanderOMara/node that referenced this issue May 24, 2017
Refactored bytesRead test to use callbacks instead of timers

Refs: nodejs#8874
AlexanderOMara added a commit to AlexanderOMara/node that referenced this issue May 24, 2017
AlexanderOMara added a commit to AlexanderOMara/node that referenced this issue May 24, 2017
addaleax pushed a commit that referenced this issue May 31, 2017
Added option to expose engine to convenience methods

Refs: #8874
PR-URL: #13089
Reviewed-By: James M Snell <[email protected]>
Reviewed-By: Anna Henningsen <[email protected]>
jasnell pushed a commit that referenced this issue Jun 5, 2017
Added option to expose engine to convenience methods

Refs: #8874
PR-URL: #13089
Reviewed-By: James M Snell <[email protected]>
Reviewed-By: Anna Henningsen <[email protected]>
jasnell pushed a commit that referenced this issue Jun 5, 2017
Added bytesRead property to Zlib engines

Fixes: #8874
PR-URL: #13088
Reviewed-By: Anna Henningsen <[email protected]>
Reviewed-By: Sam Roberts <[email protected]>
addaleax added a commit to addaleax/node that referenced this issue Mar 29, 2018
The introduction of `.bytesRead` to zlib streams was unfortunate,
because other streams in Node.js core use the exact opposite naming
of `.bytesRead` and `.bytesWritten`.

While one could see how the original naming makes sense in
a `Transform` stream context, we should try to work towards more
consistent APIs in core for these things.

This introduces `zlib.bytesWritten` and documentation-only deprecates
`zlib.bytesRead`.

(Luckily, the old property name hasn’t even been around for a whole
year yet.)

Refs: nodejs#8874
Refs: nodejs#13088
BridgeAR pushed a commit to BridgeAR/node that referenced this issue Apr 9, 2018
The introduction of `.bytesRead` to zlib streams was unfortunate,
because other streams in Node.js core use the exact opposite naming
of `.bytesRead` and `.bytesWritten`.

While one could see how the original naming makes sense in
a `Transform` stream context, we should try to work towards more
consistent APIs in core for these things.

This introduces `zlib.bytesWritten` and documentation-only deprecates
`zlib.bytesRead`.

PR-URL: nodejs#19414
Refs: nodejs#8874
Refs: nodejs#13088
Reviewed-By: Matteo Collina <[email protected]>
Reviewed-By: Сковорода Никита Андреевич <[email protected]>
Reviewed-By: James M Snell <[email protected]>
Reviewed-By: Colin Ihrig <[email protected]>
Reviewed-By: Sakthipriyan Vairamani <[email protected]>
Reviewed-By: Ruben Bridgewater <[email protected]>
targos pushed a commit that referenced this issue Apr 12, 2018
The introduction of `.bytesRead` to zlib streams was unfortunate,
because other streams in Node.js core use the exact opposite naming
of `.bytesRead` and `.bytesWritten`.

While one could see how the original naming makes sense in
a `Transform` stream context, we should try to work towards more
consistent APIs in core for these things.

This introduces `zlib.bytesWritten` and documentation-only deprecates
`zlib.bytesRead`.

PR-URL: #19414
Refs: #8874
Refs: #13088
Reviewed-By: Matteo Collina <[email protected]>
Reviewed-By: Сковорода Никита Андреевич <[email protected]>
Reviewed-By: James M Snell <[email protected]>
Reviewed-By: Colin Ihrig <[email protected]>
Reviewed-By: Sakthipriyan Vairamani <[email protected]>
Reviewed-By: Ruben Bridgewater <[email protected]>
BridgeAR pushed a commit to BridgeAR/node that referenced this issue May 1, 2018
The introduction of `.bytesRead` to zlib streams was unfortunate,
because other streams in Node.js core use the exact opposite naming
of `.bytesRead` and `.bytesWritten`.

While one could see how the original naming makes sense in
a `Transform` stream context, we should try to work towards more
consistent APIs in core for these things.

This introduces `zlib.bytesWritten` and documentation-only deprecates
`zlib.bytesRead`.

PR-URL: nodejs#19414
Refs: nodejs#8874
Refs: nodejs#13088
Reviewed-By: Matteo Collina <[email protected]>
Reviewed-By: Сковорода Никита Андреевич <[email protected]>
Reviewed-By: James M Snell <[email protected]>
Reviewed-By: Colin Ihrig <[email protected]>
Reviewed-By: Sakthipriyan Vairamani <[email protected]>
Reviewed-By: Ruben Bridgewater <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
feature request Issues that request new features to be added to Node.js. zlib Issues and PRs related to the zlib subsystem.
Projects
None yet
Development

Successfully merging a pull request may close this issue.

6 participants