# zip_kit.rbi
# typed: strong
module ZipKit
VERSION = T.let("6.3.1", T.untyped)
class Railtie < Rails::Railtie
end
# A ZIP archive contains a flat list of entries. These entries can implicitly
# create directories when the archive is expanded. For example, an entry with
# the filename of "some folder/file.docx" will make the unarchiving application
# create a directory called "some folder" automatically, and then deposit the
# file "file.docx" in that directory. These "implicit" directories can be
# arbitrarily nested, and create a tree structure of directories. That structure
# however is implicit as the archive contains a flat list.
#
# This creates opportunities for conflicts. For example, imagine the following
# structure:
#
# * `something/` - specifies an empty directory with the name "something"
# * `something` - specifies a file, creates a conflict
#
# This can be prevented with filename uniqueness checks. It does get funkier however
# as the rabbit hole goes down:
#
# * `dir/subdir/another_subdir/yet_another_subdir/file.bin` - declares a file and directories
# * `dir/subdir/another_subdir/yet_another_subdir` - declares a file at one of the levels, creates a conflict
#
# The results of this ZIP structure aren't very easy to predict as they depend on the
# application that opens the archive. For example, BOMArchiveHelper on macOS will expand files
# as they are declared in the ZIP, but once a conflict occurs it will fail with "error -21". It
# is not very transparent to the user why unarchiving fails, and the problem can only be
# prevented - reliably - when the archive gets created.
#
# Unfortunately that conflicts with another "magical" feature of ZipKit which automatically
# "fixes" duplicate filenames - filenames (paths) which have already been added to the archive.
# This fix is performed by appending (1), then (2) and so forth to the filename so that the
# conflict is avoided. This is not possible to apply to directories, because when one of the
# path components is reused in multiple filenames it means those entities should end up in
# the same directory (subdirectory) once the archive is opened.
#
# The `PathSet` keeps track of entries as they get added using 2 Sets (cheap presence checks),
# one for directories and one for files. It will raise a `Conflict` exception if there are
# files clobbering one another, or in case files collide with directories.
class PathSet
sig { void }
def initialize; end
# Adds a directory path to the set of known paths, including
# all the directories that contain it. So, calling
# add_directory_path("dir/dir2/dir3")
# will add "dir", "dir/dir2", "dir/dir2/dir3".
#
# _@param_ `path` — the path to the directory to add
sig { params(path: String).void }
def add_directory_path(path); end
# Adds a file path to the set of known paths, including
# all the directories that contain it. Once a file has been added,
# it is no longer possible to add a directory having the same path
# as this would cause conflict.
#
# The operation also adds all the containing directories for the file, so
# add_file_path("dir/dir2/file.doc")
# will add "dir" and "dir/dir2" as directories.
#
# _@param_ `file_path` — the path to the file to add
sig { params(file_path: String).void }
def add_file_path(file_path); end
# Tells whether a specific full path is already known to the PathSet.
# Can be a path for a directory or for a file.
#
# _@param_ `path_in_archive` — the path to check for inclusion
sig { params(path_in_archive: String).returns(T::Boolean) }
def include?(path_in_archive); end
# Clears the contained sets
sig { void }
def clear; end
# sord omit - no YARD type given for "path_in_archive", using untyped
# Adds the directory or file path to the path set
sig { params(path_in_archive: T.untyped).void }
def add_directory_or_file_path(path_in_archive); end
# sord omit - no YARD type given for "path", using untyped
# sord omit - no YARD return type given, using untyped
sig { params(path: T.untyped).returns(T.untyped) }
def non_empty_path_components(path); end
# sord omit - no YARD type given for "path", using untyped
# sord omit - no YARD return type given, using untyped
sig { params(path: T.untyped).returns(T.untyped) }
def path_and_ancestors(path); end
class Conflict < StandardError
end
class FileClobbersDirectory < ZipKit::PathSet::Conflict
end
class DirectoryClobbersFile < ZipKit::PathSet::Conflict
end
end
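# The two-Set bookkeeping described above can be illustrated with a minimal,
# self-contained sketch. This is NOT the gem's implementation - TinyPathSet and
# its method bodies are hypothetical - but it shows the same idea: expand each
# path into its ancestors, track files and directories in separate Sets, and
# raise a Conflict when one kind of entry clobbers the other.
#
# ```ruby
# require "set"
#
# class TinyPathSet
#   Conflict = Class.new(StandardError)
#
#   def initialize
#     @dirs = Set.new
#     @files = Set.new
#   end
#
#   # "dir/dir2/file.doc" => ["dir", "dir/dir2", "dir/dir2/file.doc"]
#   def path_and_ancestors(path)
#     parts = path.split("/").reject(&:empty?)
#     parts.each_index.map { |i| parts[0..i].join("/") }
#   end
#
#   def add_directory_path(path)
#     path_and_ancestors(path).each do |dir|
#       raise Conflict, "#{dir} already exists as a file" if @files.include?(dir)
#       @dirs << dir
#     end
#   end
#
#   def add_file_path(path)
#     *ancestors, file = path_and_ancestors(path)
#     ancestors.each { |dir| add_directory_path(dir) }
#     raise Conflict, "#{file} already exists as a directory" if @dirs.include?(file)
#     @files << file
#   end
#
#   def include?(path)
#     @dirs.include?(path) || @files.include?(path)
#   end
# end
#
# set = TinyPathSet.new
# set.add_file_path("dir/subdir/file.bin")
# set.include?("dir/subdir")        # the implicit directory is now known
# begin
#   set.add_file_path("dir/subdir") # a file clobbering a directory
# rescue TinyPathSet::Conflict => e
#   puts e.message
# end
# ```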
# Is used to write ZIP archives without having to read them back or to overwrite
# data. It outputs into any object that supports `<<` or `write`, namely:
#
# An `Array`, `File`, `IO`, `Socket` and even `String` can all be output destinations
# for the `Streamer`.
#
# You can also combine output through the `Streamer` with direct output to the destination,
# all while preserving the correct offsets in the ZIP file structures. This allows usage
# of `sendfile()` or socket `splice()` calls for "through" proxying.
#
# If you want to avoid data descriptors - or write data bypassing the Streamer -
# you need to know the CRC32 (as a uint) and the filesize upfront,
# before the writing of the entry body starts.
#
# ## Using the Streamer with runtime compression
#
# You can use the Streamer with data descriptors (the CRC32 and the sizes will be
# written after the file data). This allows non-rewinding on-the-fly compression.
# The streamer will pick the optimum compression method ("stored" or "deflated")
# depending on the nature of the byte stream you send into it (by using a small buffer).
# If you are compressing large files, the Deflater object that the Streamer controls
# will be regularly flushed to prevent memory inflation.
#
# ZipKit::Streamer.open(file_socket_or_string) do |zip|
# zip.write_file('mov.mp4') do |sink|
# File.open('mov.mp4', 'rb'){|source| IO.copy_stream(source, sink) }
# end
# zip.write_file('long-novel.txt') do |sink|
# File.open('novel.txt', 'rb'){|source| IO.copy_stream(source, sink) }
# end
# end
#
# The central directory will be written automatically at the end of the `open` block.
#
# ## Using the Streamer with entries of known size and having a known CRC32 checksum
#
# Streamer allows "IO splicing" - in this mode it will only control the metadata output,
# but you can write the data to the socket/file outside of the Streamer. For example, when
# using the sendfile gem:
#
# ZipKit::Streamer.open(socket) do | zip |
# zip.add_stored_entry(filename: "myfile1.bin", size: 9090821, crc32: 12485)
# socket.sendfile(tempfile1)
# zip.simulate_write(tempfile1.size)
#
# zip.add_stored_entry(filename: "myfile2.bin", size: 458678, crc32: 89568)
# socket.sendfile(tempfile2)
# zip.simulate_write(tempfile2.size)
# end
#
# Note that you need to use `simulate_write` in this case. This needs to happen since Streamer
# writes absolute offsets into the ZIP (local file header offsets and the like),
# and it relies on the output object to tell it how many bytes have been written
# so far. When using `sendfile` the Ruby write methods get bypassed entirely, and the
# offsets in the IO will not be updated - which will result in an invalid ZIP.
#
#
# ## On-the-fly deflate - using the Streamer with async/suspended writes and data descriptors
#
# If you are unable to use the block versions of `write_deflated_file` and `write_stored_file`
# there is an option to use a separate writer object. It gets returned from `write_deflated_file`
# and `write_stored_file` if you do not provide them with a block, and will accept data writes.
# Do note that you _must_ call `#close` on that object yourself:
#
# ZipKit::Streamer.open(socket) do | zip |
# w = zip.write_stored_file('mov.mp4')
# IO.copy_stream(source_io, w)
# w.close
# end
#
# The central directory will be written automatically at the end of the `open` block. If you need
# to manage the Streamer manually, or defer the central directory write until appropriate, use
# the constructor instead and call `Streamer#close`:
#
# zip = ZipKit::Streamer.new(out_io)
# .....
# zip.close
#
# Calling {Streamer#close} **will not** call `#close` on the underlying IO object.
class Streamer
include ZipKit::WriteShovel
STORED = T.let(0, T.untyped)
DEFLATED = T.let(8, T.untyped)
EntryBodySizeMismatch = T.let(Class.new(StandardError), T.untyped)
InvalidOutput = T.let(Class.new(ArgumentError), T.untyped)
Overflow = T.let(Class.new(StandardError), T.untyped)
UnknownMode = T.let(Class.new(StandardError), T.untyped)
OffsetOutOfSync = T.let(Class.new(StandardError), T.untyped)
# sord omit - no YARD return type given, using untyped
# Creates a new Streamer on top of the given IO-ish object and yields it. Once the given block
# returns, the Streamer will have its `close` method called, which will write out the central
# directory of the archive to the output.
#
# _@param_ `stream` — the destination IO for the ZIP (should respond to `tell` and `<<`)
#
# _@param_ `kwargs_for_new` — keyword arguments for #initialize
sig { params(stream: IO, kwargs_for_new: T::Hash[T.untyped, T.untyped]).returns(T.untyped) }
def self.open(stream, **kwargs_for_new); end
# sord duck - #<< looks like a duck type, replacing with untyped
# Creates a new Streamer on top of the given IO-ish object.
#
# _@param_ `writable` — the destination IO for the ZIP. Anything that responds to `<<` can be used.
#
# _@param_ `writer` — the object to be used as the writer. Defaults to an instance of ZipKit::ZipWriter, normally you won't need to override it
#
# _@param_ `auto_rename_duplicate_filenames` — whether duplicate filenames, when encountered, should be suffixed with (1), (2) etc. Default value is `false` - if duplicate names are used an exception will be raised
sig { params(writable: T.untyped, writer: ZipKit::ZipWriter, auto_rename_duplicate_filenames: T::Boolean).void }
def initialize(writable, writer: create_writer, auto_rename_duplicate_filenames: false); end
# Writes a part of a zip entry body (actual binary data of the entry) into the output stream.
#
# _@param_ `binary_data` — a String in binary encoding
#
# _@return_ — self
sig { params(binary_data: String).returns(T.untyped) }
def <<(binary_data); end
# Advances the internal IO pointer to keep the offsets of the ZIP file in
# check. Use this if you are going to use accelerated writes to the socket
# (like the `sendfile()` call) after writing the headers, or if you
# just need to figure out the size of the archive.
#
# _@param_ `num_bytes` — how many bytes are going to be written bypassing the Streamer
#
# _@return_ — position in the output stream / ZIP archive
sig { params(num_bytes: Integer).returns(Integer) }
def simulate_write(num_bytes); end
# Writes out the local header for an entry (file in the ZIP) that is using
# the deflated storage model (is compressed). Once this method is called,
# the `<<` method has to be called to write the actual contents of the body.
#
# Note that the deflated body that is going to be written into the output
# has to be _precompressed_ (pre-deflated) before writing it into the
# Streamer, because otherwise it is impossible to know its size upfront.
#
# _@param_ `filename` — the name of the file in the entry
#
# _@param_ `modification_time` — the modification time of the file in the archive
#
# _@param_ `compressed_size` — the size of the compressed entry that is going to be written into the archive
#
# _@param_ `uncompressed_size` — the size of the entry when uncompressed, in bytes
#
# _@param_ `crc32` — the CRC32 checksum of the entry when uncompressed
#
# _@param_ `use_data_descriptor` — whether the entry body will be followed by a data descriptor
#
# _@param_ `unix_permissions` — which UNIX permissions to set, normally the default should be used
#
# _@return_ — the offset the output IO is at after writing the entry header
sig do
params(
filename: String,
modification_time: Time,
compressed_size: Integer,
uncompressed_size: Integer,
crc32: Integer,
unix_permissions: T.nilable(Integer),
use_data_descriptor: T::Boolean
).returns(Integer)
end
def add_deflated_entry(filename:, modification_time: Time.now.utc, compressed_size: 0, uncompressed_size: 0, crc32: 0, unix_permissions: nil, use_data_descriptor: false); end
# Writes out the local header for an entry (file in the ZIP) that is using
# the stored storage model (is stored as-is).
# Once this method is called, the `<<` method has to be called one or more
# times to write the actual contents of the body.
#
# _@param_ `filename` — the name of the file in the entry
#
# _@param_ `modification_time` — the modification time of the file in the archive
#
# _@param_ `size` — the size of the file when uncompressed, in bytes
#
# _@param_ `crc32` — the CRC32 checksum of the entry when uncompressed
#
# _@param_ `use_data_descriptor` — whether the entry body will be followed by a data descriptor
#
# _@param_ `unix_permissions` — which UNIX permissions to set, normally the default should be used
#
# _@return_ — the offset the output IO is at after writing the entry header
sig do
params(
filename: String,
modification_time: Time,
size: Integer,
crc32: Integer,
unix_permissions: T.nilable(Integer),
use_data_descriptor: T::Boolean
).returns(Integer)
end
def add_stored_entry(filename:, modification_time: Time.now.utc, size: 0, crc32: 0, unix_permissions: nil, use_data_descriptor: false); end
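# Since a stored entry without a data descriptor needs `size` and `crc32`
# before the body is written, both can be precomputed with Ruby's stdlib.
# A small sketch - the entry body here is an in-memory string stand-in, and
# the `zip` calls are shown commented out because they need a live Streamer:
#
# ```ruby
# require "zlib"
#
# body = "entry body bytes"   # illustrative stand-in for the real file contents
# crc32 = Zlib.crc32(body)    # an unsigned 32-bit integer, as the header expects
# size = body.bytesize
#
# # These values would then be passed before writing the body itself:
# #   zip.add_stored_entry(filename: "myfile1.bin", size: size, crc32: crc32)
# #   zip << body
# ```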
# Adds an empty directory to the archive with a size of 0 and permissions of 755.
#
# _@param_ `dirname` — the name of the directory in the archive
#
# _@param_ `modification_time` — the modification time of the directory in the archive
#
# _@param_ `unix_permissions` — which UNIX permissions to set, normally the default should be used
#
# _@return_ — the offset the output IO is at after writing the entry header
sig { params(dirname: String, modification_time: Time, unix_permissions: T.nilable(Integer)).returns(Integer) }
def add_empty_directory(dirname:, modification_time: Time.now.utc, unix_permissions: nil); end
# Opens the stream for a file stored in the archive, and yields a writer
# for that file to the block.
# The writer will buffer a small amount of data and see whether compression is
# effective for the data being output. If compression turns out to work well -
# for instance, if the output is mostly text - it is going to create a deflated
# file inside the zip. If the compression benefits are negligible, it will
# create a stored file inside the zip. It will delegate either to `write_deflated_file`
# or to `write_stored_file`.
#
# Using a block, the write will be terminated with a data descriptor outright.
#
# zip.write_file("foo.txt") do |sink|
# IO.copy_stream(source_file, sink)
# end
#
# If deferred writes are desired (for example - to integrate with an API that
# does not support blocks, or to work with non-blocking environments) the method
# has to be called without a block. In that case it returns the sink instead,
# permitting to write to it in a deferred fashion. When `close` is called on
# the sink, any remaining compression output will be flushed and the data
# descriptor is going to be written.
#
# Note that even though it does not have to happen within the same call stack,
# call sequencing still must be observed. It is therefore not possible to do
# this:
#
# writer_for_file1 = zip.write_file("somefile.jpg")
# writer_for_file2 = zip.write_file("another.tif")
# writer_for_file1 << data
# writer_for_file2 << data
#
# because it is likely to result in an invalid ZIP file structure later on.
# So using this facility in async scenarios is certainly possible, but care
# and attention are recommended.
#
# _@param_ `filename` — the name of the file in the archive
#
# _@param_ `modification_time` — the modification time of the file in the archive
#
# _@param_ `unix_permissions` — which UNIX permissions to set, normally the default should be used
#
# _@return_ — without a block - the Writable sink which has to be closed manually
sig do
params(
filename: String,
modification_time: Time,
unix_permissions: T.nilable(Integer),
blk: T.proc.params(sink: ZipKit::Streamer::Writable).void
).returns(ZipKit::Streamer::Writable)
end
def write_file(filename, modification_time: Time.now.utc, unix_permissions: nil, &blk); end
# Opens the stream for a stored file in the archive, and yields a writer
# for that file to the block.
# Once the write completes, a data descriptor will be written with the
# actual compressed/uncompressed sizes and the CRC32 checksum.
#
# Using a block, the write will be terminated with a data descriptor outright.
#
# zip.write_stored_file("foo.txt") do |sink|
# IO.copy_stream(source_file, sink)
# end
#
# If deferred writes are desired (for example - to integrate with an API that
# does not support blocks, or to work with non-blocking environments) the method
# has to be called without a block. In that case it returns the sink instead,
# permitting to write to it in a deferred fashion. When `close` is called on
# the sink, any remaining compression output will be flushed and the data
# descriptor is going to be written.
#
# Note that even though it does not have to happen within the same call stack,
# call sequencing still must be observed. It is therefore not possible to do
# this:
#
# writer_for_file1 = zip.write_stored_file("somefile.jpg")
# writer_for_file2 = zip.write_stored_file("another.tif")
# writer_for_file1 << data
# writer_for_file2 << data
#
# because it is likely to result in an invalid ZIP file structure later on.
# So using this facility in async scenarios is certainly possible, but care
# and attention are recommended.
#
# If an exception is raised inside the block that is passed to the method, a `rollback!` call
# will be performed automatically and the entry just written will be omitted from the ZIP
# central directory. This can be useful if you want to rescue the exception and reattempt
# adding the ZIP file. Note that you will need to call `write_stored_file` again to start a
# new file - you can't keep writing to the one that failed.
#
# _@param_ `filename` — the name of the file in the archive
#
# _@param_ `modification_time` — the modification time of the file in the archive
#
# _@param_ `unix_permissions` — which UNIX permissions to set, normally the default should be used
#
# _@return_ — without a block - the Writable sink which has to be closed manually
sig do
params(
filename: String,
modification_time: Time,
unix_permissions: T.nilable(Integer),
blk: T.proc.params(sink: ZipKit::Streamer::Writable).void
).returns(ZipKit::Streamer::Writable)
end
def write_stored_file(filename, modification_time: Time.now.utc, unix_permissions: nil, &blk); end
# Opens the stream for a deflated file in the archive, and yields a writer
# for that file to the block. Once the write completes, a data descriptor
# will be written with the actual compressed/uncompressed sizes and the
# CRC32 checksum.
#
# Using a block, the write will be terminated with a data descriptor outright.
#
# zip.write_deflated_file("foo.txt") do |sink|
# IO.copy_stream(source_file, sink)
# end
#
# If deferred writes are desired (for example - to integrate with an API that
# does not support blocks, or to work with non-blocking environments) the method
# has to be called without a block. In that case it returns the sink instead,
# permitting to write to it in a deferred fashion. When `close` is called on
# the sink, any remaining compression output will be flushed and the data
# descriptor is going to be written.
#
# Note that even though it does not have to happen within the same call stack,
# call sequencing still must be observed. It is therefore not possible to do
# this:
#
# writer_for_file1 = zip.write_deflated_file("somefile.jpg")
# writer_for_file2 = zip.write_deflated_file("another.tif")
# writer_for_file1 << data
# writer_for_file2 << data
# writer_for_file1.close
# writer_for_file2.close
#
# because it is likely to result in an invalid ZIP file structure later on.
# So using this facility in async scenarios is certainly possible, but care
# and attention are recommended.
#
# If an exception is raised inside the block that is passed to the method, a `rollback!` call
# will be performed automatically and the entry just written will be omitted from the ZIP
# central directory. This can be useful if you want to rescue the exception and reattempt
# adding the ZIP file. Note that you will need to call `write_deflated_file` again to start a
# new file - you can't keep writing to the one that failed.
#
# _@param_ `filename` — the name of the file in the archive
#
# _@param_ `modification_time` — the modification time of the file in the archive
#
# _@param_ `unix_permissions` — which UNIX permissions to set, normally the default should be used
#
# _@return_ — without a block - the Writable sink which has to be closed manually
sig do
params(
filename: String,
modification_time: Time,
unix_permissions: T.nilable(Integer),
blk: T.proc.params(sink: ZipKit::Streamer::Writable).void
).returns(ZipKit::Streamer::Writable)
end
def write_deflated_file(filename, modification_time: Time.now.utc, unix_permissions: nil, &blk); end
# Closes the archive. Writes the central directory, and switches the writer into
# a state where it can no longer be written to.
#
# Once this method is called, the `Streamer` should be discarded (the ZIP archive is complete).
#
# _@return_ — the offset the output IO is at after closing the archive
sig { returns(Integer) }
def close; end
# Sets up the ZipWriter with wrappers if necessary. The method is called once, when the Streamer
# gets instantiated - the Writer then gets reused. This method is primarily there so that you
# can override it.
#
# _@return_ — the writer to perform writes with
sig { returns(ZipKit::ZipWriter) }
def create_writer; end
# Updates the last entry written with the CRC32 checksum and compressed/uncompressed
# sizes. For stored entries, `compressed_size` and `uncompressed_size` are the same.
# After updating the entry will immediately write the data descriptor bytes
# to the output.
#
# _@param_ `crc32` — the CRC32 checksum of the entry when uncompressed
#
# _@param_ `compressed_size` — the size of the compressed segment within the ZIP
#
# _@param_ `uncompressed_size` — the size of the entry once uncompressed
#
# _@return_ — the offset the output IO is at after writing the data descriptor
sig { params(crc32: Integer, compressed_size: Integer, uncompressed_size: Integer).returns(Integer) }
def update_last_entry_and_write_data_descriptor(crc32:, compressed_size:, uncompressed_size:); end
# Removes the buffered local entry for the last file written. This can be used when rescuing from exceptions
# when you want to skip the file that failed writing into the ZIP from getting written out into the
# ZIP central directory. This is useful when, for example, you encounter errors retrieving the file
# that you want to place inside the ZIP from a remote storage location and some network exception
# gets raised. `write_deflated_file` and `write_stored_file` will rollback for you automatically.
# Of course it is not possible to remove the failed entry from the ZIP file entirely, as the data
# is likely already on the wire. However, excluding the entry from the central directory of the ZIP
# file will allow better-behaved ZIP unarchivers to extract the entries which did store correctly,
# provided they read the ZIP from the central directory and not straight-ahead.
#
# _@return_ — position in the output stream / ZIP archive
#
# ```ruby
# zip.add_stored_entry(filename: "data.bin", size: 4.megabytes, crc32: the_crc)
# begin
#   while chunk = remote.read(65 * 2048)
#     zip << chunk
#   end
# rescue Timeout::Error
#   zip.rollback!
#   # and proceed to the next file
# end
# ```
sig { returns(Integer) }
def rollback!; end
# sord omit - no YARD type given for "writable", using untyped
# sord omit - no YARD return type given, using untyped
sig { params(writable: T.untyped, block_to_pass_writable_to: T.untyped).returns(T.untyped) }
def yield_or_return_writable(writable, &block_to_pass_writable_to); end
# sord omit - no YARD return type given, using untyped
sig { returns(T.untyped) }
def verify_offsets!; end
# sord omit - no YARD type given for "filename:", using untyped
# sord omit - no YARD type given for "modification_time:", using untyped
# sord omit - no YARD type given for "crc32:", using untyped
# sord omit - no YARD type given for "storage_mode:", using untyped
# sord omit - no YARD type given for "compressed_size:", using untyped
# sord omit - no YARD type given for "uncompressed_size:", using untyped
# sord omit - no YARD type given for "use_data_descriptor:", using untyped
# sord omit - no YARD type given for "unix_permissions:", using untyped
# sord omit - no YARD return type given, using untyped
sig do
params(
filename: T.untyped,
modification_time: T.untyped,
crc32: T.untyped,
storage_mode: T.untyped,
compressed_size: T.untyped,
uncompressed_size: T.untyped,
use_data_descriptor: T.untyped,
unix_permissions: T.untyped
).returns(T.untyped)
end
def add_file_and_write_local_header(filename:, modification_time:, crc32:, storage_mode:, compressed_size:, uncompressed_size:, use_data_descriptor:, unix_permissions:); end
# sord omit - no YARD type given for "filename", using untyped
# sord omit - no YARD return type given, using untyped
sig { params(filename: T.untyped).returns(T.untyped) }
def remove_backslash(filename); end
# Writes the given data to the output stream. Allows the object to be used as
# a target for `IO.copy_stream(from, to)`
#
# _@param_ `bytes` — the binary string to write (part of the uncompressed file)
#
# _@return_ — the number of bytes written (will always be the bytesize of `bytes`)
sig { params(bytes: String).returns(Integer) }
def write(bytes); end
# Is used internally by Streamer to keep track of entries in the archive during writing.
# Normally you will not have to use this class directly
class Entry < Struct
sig { void }
def initialize; end
# sord omit - no YARD return type given, using untyped
sig { returns(T.untyped) }
def total_bytes_used; end
# sord omit - no YARD return type given, using untyped
# Set the general purpose flags for the entry. What we care about is the EFS
# bit (bit 11), which should be set if the filename is UTF-8. Setting it lets
# the unarchiving application know that the filename in the archive is UTF-8
# encoded, and not some DOS default. For ASCII-only entries it does not matter.
# Additionally, we care about bit 3 which toggles the use of the postfix data descriptor.
sig { returns(T.untyped) }
def gp_flags; end
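# The bit layout described above can be sketched as follows. This is a
# hypothetical helper, not the gem's code - it only illustrates how bit 3
# (data descriptor) and bit 11 (EFS / UTF-8 filename) combine into the flags:
#
# ```ruby
# EFS_BIT = 1 << 11
# DATA_DESCRIPTOR_BIT = 1 << 3
#
# def gp_flags_for(filename, use_data_descriptor:)
#   flags = 0
#   flags |= EFS_BIT unless filename.ascii_only? # ASCII names need no marker
#   flags |= DATA_DESCRIPTOR_BIT if use_data_descriptor
#   flags
# end
#
# gp_flags_for("report.txt", use_data_descriptor: false) # => 0
# gp_flags_for("héllo.txt", use_data_descriptor: true)   # => 2056 (2048 + 8)
# ```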
sig { returns(T::Boolean) }
def filler?; end
# Returns the value of attribute filename
sig { returns(Object) }
attr_accessor :filename
# Returns the value of attribute crc32
sig { returns(Object) }
attr_accessor :crc32
# Returns the value of attribute compressed_size
sig { returns(Object) }
attr_accessor :compressed_size
# Returns the value of attribute uncompressed_size
sig { returns(Object) }
attr_accessor :uncompressed_size
# Returns the value of attribute storage_mode
sig { returns(Object) }
attr_accessor :storage_mode
# Returns the value of attribute mtime
sig { returns(Object) }
attr_accessor :mtime
# Returns the value of attribute use_data_descriptor
sig { returns(Object) }
attr_accessor :use_data_descriptor
# Returns the value of attribute local_header_offset
sig { returns(Object) }
attr_accessor :local_header_offset
# Returns the value of attribute bytes_used_for_local_header
sig { returns(Object) }
attr_accessor :bytes_used_for_local_header
# Returns the value of attribute bytes_used_for_data_descriptor
sig { returns(Object) }
attr_accessor :bytes_used_for_data_descriptor
# Returns the value of attribute unix_permissions
sig { returns(Object) }
attr_accessor :unix_permissions
end
# Is used internally by Streamer to keep track of entries in the archive during writing.
# Normally you will not have to use this class directly
class Filler < Struct
sig { returns(T::Boolean) }
def filler?; end
# Returns the value of attribute total_bytes_used
sig { returns(Object) }
attr_accessor :total_bytes_used
end
# Gets yielded from the writing methods of the Streamer
# and accepts the data being written into the ZIP for deflate
# or stored modes. Can be used as a destination for `IO.copy_stream`
#
# IO.copy_stream(File.open('source.bin', 'rb'), writable)
class Writable
include ZipKit::WriteShovel
# sord omit - no YARD type given for "streamer", using untyped
# sord omit - no YARD type given for "writer", using untyped
# Initializes a new Writable with the object it delegates the writes to.
# Normally you would not need to use this method directly
sig { params(streamer: T.untyped, writer: T.untyped).void }
def initialize(streamer, writer); end
# Writes the given data to the output stream
#
# _@param_ `d` — the binary string to write (part of the uncompressed file)
sig { params(d: String).returns(T.self_type) }
def <<(d); end
# sord omit - no YARD return type given, using untyped
# Flushes the writer and recovers the CRC32/size values. It then calls
# `update_last_entry_and_write_data_descriptor` on the given Streamer.
sig { returns(T.untyped) }
def close; end
# Writes the given data to the output stream. Allows the object to be used as
# a target for `IO.copy_stream(from, to)`
#
# _@param_ `bytes` — the binary string to write (part of the uncompressed file)
#
# _@return_ — the number of bytes written (will always be the bytesize of `bytes`)
sig { params(bytes: String).returns(Integer) }
def write(bytes); end
end
# Will be used to pick whether to store a file in the `stored` or
# `deflated` mode, by compressing the first N bytes of the file and
# comparing the stored and deflated data sizes. If deflate produces
# a sizable compression gain for this data, it will create a deflated
# file inside the ZIP archive. If the file doesn't compress well, it
# will use the "stored" mode for the entry. About 128KB of the
# file will be buffered to pick the appropriate storage mode. The
# Heuristic will call either `write_stored_file` or `write_deflated_file`
# on the Streamer passed into it once it knows which compression
# method should be applied
class Heuristic < ZipKit::Streamer::Writable
BYTES_WRITTEN_THRESHOLD = T.let(128 * 1024, T.untyped)
MINIMUM_VIABLE_COMPRESSION = T.let(0.75, T.untyped)
# sord omit - no YARD type given for "streamer", using untyped
# sord omit - no YARD type given for "filename", using untyped
# sord omit - no YARD type given for "**write_file_options", using untyped
sig { params(streamer: T.untyped, filename: T.untyped, write_file_options: T.untyped).void }
def initialize(streamer, filename, **write_file_options); end
# sord infer - argument name in single @param inferred as "bytes"
sig { params(bytes: String).returns(T.self_type) }
def <<(bytes); end
# sord omit - no YARD return type given, using untyped
sig { returns(T.untyped) }
def close; end
# sord omit - no YARD return type given, using untyped
sig { returns(T.untyped) }
def decide; end
end
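# The stored-versus-deflated decision the Heuristic makes can be sketched with
# the standard library's Zlib. This is a simplified illustration, not the gem's
# exact code; the constants mirror BYTES_WRITTEN_THRESHOLD and
# MINIMUM_VIABLE_COMPRESSION above, and "input.bin" is a hypothetical file:
#
#   require "zlib"
#   sample = File.read("input.bin", 128 * 1024) # buffer up to the first 128KB
#   ratio = Zlib::Deflate.deflate(sample).bytesize / sample.bytesize.to_f
#   use_deflate = ratio <= 0.75 # deflate only if it yields a sizable gain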
# Sends writes to the given `io`, and also registers all the data passing
# through it in a CRC32 checksum calculator. Is made to be completely
# interchangeable with the DeflatedWriter in terms of interface.
class StoredWriter
include ZipKit::WriteShovel
CRC32_BUFFER_SIZE = T.let(64 * 1024, T.untyped)
# sord omit - no YARD type given for "io", using untyped
sig { params(io: T.untyped).void }
def initialize(io); end
# Writes the given data to the contained IO object.
#
# _@param_ `data` — data to be written
#
# _@return_ — self
sig { params(data: String).returns(T.untyped) }
def <<(data); end
# Returns the amount of data written and the CRC32 checksum. The return value
# can be directly used as the argument to {Streamer#update_last_entry_and_write_data_descriptor}
#
# _@return_ — a hash of `{crc32, compressed_size, uncompressed_size}`
sig { returns(T::Hash[T.untyped, T.untyped]) }
def finish; end
# Writes the given data to the output stream. Allows the object to be used as
# a target for `IO.copy_stream(from, to)`
#
# _@param_ `bytes` — the binary string to write (part of the uncompressed file)
#
# _@return_ — the number of bytes written (will always be the bytesize of `bytes`)
sig { params(bytes: String).returns(Integer) }
def write(bytes); end
end
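# A hedged sketch of the writer interface shared by StoredWriter and
# DeflatedWriter (`out_io` stands for any object responding to `<<`;
# the exact keys of the returned hash are documented on `finish`):
#
#   writer = ZipKit::Streamer::StoredWriter.new(out_io)
#   writer << "hello"
#   crc_and_sizes = writer.finish # {crc32:, compressed_size:, uncompressed_size:}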
# Sends writes to the given `io` compressed using a `Zlib::Deflate`. Also
# registers data passing through it in a CRC32 checksum calculator. Is made to be completely
# interchangeable with the StoredWriter in terms of interface.
class DeflatedWriter
include ZipKit::WriteShovel
CRC32_BUFFER_SIZE = T.let(64 * 1024, T.untyped)
# sord omit - no YARD type given for "io", using untyped
sig { params(io: T.untyped).void }
def initialize(io); end
# Writes the given data into the deflater, and flushes the deflater
# after having written more than FLUSH_EVERY_N_BYTES bytes of data
#
# _@param_ `data` — data to be written
#
# _@return_ — self
sig { params(data: String).returns(T.untyped) }
def <<(data); end
# Returns the amount of data received for writing, the amount of
# compressed data written and the CRC32 checksum. The return value
# can be directly used as the argument to {Streamer#update_last_entry_and_write_data_descriptor}
#
# _@return_ — a hash of `{crc32, compressed_size, uncompressed_size}`
sig { returns(T::Hash[T.untyped, T.untyped]) }
def finish; end
# Writes the given data to the output stream. Allows the object to be used as
# a target for `IO.copy_stream(from, to)`
#
# _@param_ `bytes` — the binary string to write (part of the uncompressed file)
#
# _@return_ — the number of bytes written (will always be the bytesize of `bytes`)
sig { params(bytes: String).returns(Integer) }
def write(bytes); end
end
end
# An object that fakes just-enough of an IO to be dangerous
# - or, more precisely, to be useful as a source for the FileReader
# central directory parser. Effectively we substitute an IO object
# for an object that fetches parts of the remote file over HTTP using `Range:`
# headers. The `RemoteIO` acts as an adapter between an object that performs the
# actual fetches over HTTP and an object that expects a handful of IO methods to be
# available.
class RemoteIO
# sord warn - URI wasn't able to be resolved to a constant in this project
# _@param_ `url` — the HTTP/HTTPS URL of the object to be retrieved
sig { params(url: T.any(String, URI)).void }
def initialize(url); end
# sord omit - no YARD return type given, using untyped
# Emulates IO#seek
#
# _@param_ `offset` — absolute offset in the remote resource to seek to
#
# _@param_ `mode` — The seek mode (only SEEK_SET is supported)
sig { params(offset: Integer, mode: Integer).returns(T.untyped) }
def seek(offset, mode = IO::SEEK_SET); end
# Emulates IO#size.
#
# _@return_ — the size of the remote resource
sig { returns(Integer) }
def size; end
# Emulates IO#read, but requires the number of bytes to read.
# The read will be limited to the
# size of the remote resource relative to the current offset in the IO,
# so if you are at offset 0 in an IO of size 10, doing a `read(20)`
# will only return 10 bytes of result, and will not raise any exceptions.
#
# _@param_ `n_bytes` — how many bytes to read, or `nil` to read all the way to the end
#
# _@return_ — the read bytes
sig { params(n_bytes: T.nilable(Integer)).returns(String) }
def read(n_bytes = nil); end
# Returns the current pointer position within the IO
sig { returns(Integer) }
def tell; end
# Only used internally when reading the remote ZIP.
#
# _@param_ `range` — the HTTP range of data to fetch from remote
#
# _@return_ — the response body of the ranged request
sig { params(range: T::Range[T.untyped]).returns(String) }
def request_range(range); end
# For working with S3 it is a better idea to perform a GET request for one byte, since a HEAD
# request requires a different permission - and standard GET presigned URLs are not allowed to perform it
#
# _@return_ — the size of the remote resource, parsed either from Content-Length or Content-Range header
sig { returns(Integer) }
def request_object_size; end
# sord omit - no YARD type given for "a", using untyped
# sord omit - no YARD type given for "b", using untyped
# sord omit - no YARD type given for "c", using untyped
# sord omit - no YARD return type given, using untyped
sig { params(a: T.untyped, b: T.untyped, c: T.untyped).returns(T.untyped) }
def clamp(a, b, c); end
end
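# A hedged usage sketch for RemoteIO (the URL is hypothetical; the method
# names are the ones declared above):
#
#   remote = ZipKit::RemoteIO.new("https://example.com/big.zip")
#   remote.seek(remote.size - 1024, IO::SEEK_SET) # position near the end
#   tail = remote.read(1024) # fetched over HTTP with a Range: header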
# A low-level ZIP file data writer. You can use it to write out various headers and central directory elements
# separately. The class handles the actual encoding of the data according to the ZIP format APPNOTE document.
#
# The primary reason the writer is a separate object is that it is kept stateless. That is, all the data that
# is needed for writing a piece of the ZIP (say, the EOCD record, or a data descriptor) can be written
# without depending on data available elsewhere. This makes the writer very easy to test, since each of
# its methods outputs something that only depends on the method's arguments. For example, we use this
# to test writing Zip64 files which, when tested in a streaming fashion, would need tricky IO stubs
# to wind IO objects back and forth by large offsets. Instead, we can just write out the EOCD record
# with given offsets as arguments.
#
# Since some methods need a lot of data about the entity being written, everything is passed via
# keyword arguments - this way it is much less likely that you can make a mistake writing something.
#
# Another reason for having a separate Writer is that most ZIP libraries attach the methods for
# writing out the file headers to some sort of Entry object, which represents a file within the ZIP.
# However, when you are diagnosing issues with the ZIP files you produce, you actually want to have
# as much as possible of the code responsible for writing the actual encoded bytes available to you on
# one screen. Altering or checking that code then becomes much, much easier. The methods doing the
# writing are also intentionally left very verbose - so that you can follow what is happening at
# all times.
#
# All methods of the writer accept anything that responds to `<<` as `io` argument - you can use
# that to output to String objects, or to output to Arrays that you can later join together.
class ZipWriter
FOUR_BYTE_MAX_UINT = T.let(0xFFFFFFFF, T.untyped)
TWO_BYTE_MAX_UINT = T.let(0xFFFF, T.untyped)
ZIP_KIT_COMMENT = T.let("Written using ZipKit %<version>s" % {version: ZipKit::VERSION}, T.untyped)
VERSION_MADE_BY = T.let(52, T.untyped)
VERSION_NEEDED_TO_EXTRACT = T.let(20, T.untyped)
VERSION_NEEDED_TO_EXTRACT_ZIP64 = T.let(45, T.untyped)
DEFAULT_FILE_UNIX_PERMISSIONS = T.let(0o644, T.untyped)
DEFAULT_DIRECTORY_UNIX_PERMISSIONS = T.let(0o755, T.untyped)
FILE_TYPE_FILE = T.let(0o10, T.untyped)
FILE_TYPE_DIRECTORY = T.let(0o04, T.untyped)
MADE_BY_SIGNATURE = T.let(begin
# A combination of the VERSION_MADE_BY low byte and the OS type high byte
os_type = 3 # UNIX
[VERSION_MADE_BY, os_type].pack("CC")
end, T.untyped)
C_UINT4 = T.let("V", T.untyped)
C_UINT2 = T.let("v", T.untyped)
C_UINT8 = T.let("Q<", T.untyped)
C_CHAR = T.let("C", T.untyped)
C_INT4 = T.let("l<", T.untyped)
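# The C_* constants above are Ruby Array#pack directives for the little-endian
# binary encoding the ZIP format requires, e.g.:
#
#   [0xFFFFFFFF].pack("V") # 4-byte little-endian unsigned int
#   [0xFFFF].pack("v")     # 2-byte little-endian unsigned int
#   [1].pack("Q<")         # 8-byte little-endian unsigned int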
# sord duck - #<< looks like a duck type, replacing with untyped
# Writes the local file header, that precedes the actual file _data_.
#
# _@param_ `io` — the buffer to write the local file header to
#
# _@param_ `filename` — the name of the file in the archive
#
# _@param_ `compressed_size` — The size of the compressed (or stored) data - how much space it uses in the ZIP
#
# _@param_ `uncompressed_size` — The size of the file once extracted
#
# _@param_ `crc32` — The CRC32 checksum of the file
#
# _@param_ `mtime` — the modification time to be recorded in the ZIP
#
# _@param_ `gp_flags` — bit-packed general purpose flags
#
# _@param_ `storage_mode` — 8 for deflated, 0 for stored...
sig do
params(
io: T.untyped,
filename: String,
compressed_size: Integer,
uncompressed_size: Integer,
crc32: Integer,
gp_flags: Integer,
mtime: Time,
storage_mode: Integer
).void
end
def write_local_file_header(io:, filename:, compressed_size:, uncompressed_size:, crc32:, gp_flags:, mtime:, storage_mode:); end
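# A hedged usage sketch for write_local_file_header. All argument values are
# illustrative, and instantiating the writer with plain `.new` is an assumption;
# any object responding to `<<`, such as a String, works as `io`:
#
#   require "zlib"
#   buf = +""
#   ZipKit::ZipWriter.new.write_local_file_header(io: buf,
#     filename: "hello.txt", compressed_size: 5, uncompressed_size: 5,
#     crc32: Zlib.crc32("hello"), gp_flags: 0, mtime: Time.now, storage_mode: 0)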
# sord duck - #<< looks like a duck type, replacing with untyped
# sord omit - no YARD type given for "local_file_header_location:", using untyped
# sord omit - no YARD type given for "storage_mode:", using untyped
# Writes the file header for the central directory, for a particular file in the archive. When writing out this data,
# ensure that the CRC32 and both sizes (compressed/uncompressed) are correct for the entry in question.
#
# _@param_ `io` — the buffer to write the local file header to
#
# _@param_ `filename` — the name of the file in the archive
#
# _@param_ `compressed_size` — The size of the compressed (or stored) data - how much space it uses in the ZIP
#
# _@param_ `uncompressed_size` — The size of the file once extracted
#
# _@param_ `crc32` — The CRC32 checksum of the file
#
# _@param_ `mtime` — the modification time to be recorded in the ZIP
#
# _@param_ `gp_flags` — bit-packed general purpose flags
#
# _@param_ `unix_permissions` — the permissions for the file, or nil for the default to be used
sig do
params(
io: T.untyped,
local_file_header_location: T.untyped,
gp_flags: Integer,
storage_mode: T.untyped,
compressed_size: Integer,
uncompressed_size: Integer,
mtime: Time,
crc32: Integer,
filename: String,
unix_permissions: T.nilable(Integer)
).void
end
def write_central_directory_file_header(io:, local_file_header_location:, gp_flags:, storage_mode:, compressed_size:, uncompressed_size:, mtime:, crc32:, filename:, unix_permissions: nil); end
# sord duck - #<< looks like a duck type, replacing with untyped
# Writes the data descriptor following the file data for a file whose local file header
# was written with general-purpose flag bit 3 set. If one of the sizes exceeds the Zip64 threshold,
# the data descriptor will have the sizes written out as 8-byte values instead of 4-byte values.
#
# _@param_ `io` — the buffer to write the local file header to
#
# _@param_ `crc32` — The CRC32 checksum of the file
#
# _@param_ `compressed_size` — The size of the compressed (or stored) data - how much space it uses in the ZIP
#
# _@param_ `uncompressed_size` — The size of the file once extracted
sig do
params(
io: T.untyped,
compressed_size: Integer,
uncompressed_size: Integer,
crc32: Integer