Decompress with ISA-L #418

Merged: 18 commits, Sep 2, 2024

marcelm (Collaborator) commented Apr 19, 2024

This is a continuation of #269 where I fixed the things I commented on.

I will need to re-measure memory usage. At some point, it was higher than previously, but that seems to be fixed now.

As intended, runtime improves slightly.

Marked as draft because two issues remain:

  • Memory usage increases significantly, probably because of mmap usage, which I don’t think makes much sense here.
  • Multi-block gzips (concatenated gzip files) are not supported and hang the input thread.

marcelm (Collaborator, Author) commented Apr 30, 2024

I found two issues:

  • Memory usage increases significantly, probably because of mmap usage. I don’t think using mmap makes a lot of sense because the input should be streamed instead.
  • Multi-block gzips (concatenated gzip files) are not supported and hang the input thread.

I’ll leave this for now, there are bigger improvements to be made elsewhere.

teepean (Contributor) commented Apr 30, 2024

Htslib uses libdeflate so maybe that would be a better option?

marcelm (Collaborator, Author) commented Apr 30, 2024

The Python isal bindings support multiblock gzips, so it is doable with ISA-L (maybe only a newer version is needed).
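
For reference, a multiblock (concatenated) gzip file can be produced and checked from the shell. This is only a minimal sketch, assuming the standard gzip tool is installed; the file names and contents are hypothetical and it says nothing about how strobealign or ISA-L handle the file internally.

```shell
# Build a two-member gzip file by concatenating two independently compressed parts.
tmpdir=$(mktemp -d)
printf 'read1\n' | gzip -c > "$tmpdir/part1.gz"
printf 'read2\n' | gzip -c > "$tmpdir/part2.gz"
cat "$tmpdir/part1.gz" "$tmpdir/part2.gz" > "$tmpdir/multi.gz"

# A reader that supports concatenated members returns the data of both parts;
# a reader that stops after the first member would return only "read1".
multi_out=$(gzip -dc "$tmpdir/multi.gz")
printf '%s\n' "$multi_out"

# Clean up the temporary files.
rm -rf "$tmpdir"
```

A file like multi.gz is a convenient test input for any reader that is supposed to handle concatenated gzips.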

telmin and others added 14 commits August 27, 2024 11:26
add isa-l to build

fix isal include and library path

add separate class to hold the file.

add IsalIO class to read gzipped file

add RawIO class.

Add double buffering to RawIO

fix github action

Fix when file names do not include file extensions.

fix github action

remove warning

remove inline

fix method name

fix classname typo

fix typo

Switched to using the package manager's version instead of building ISA-L.

fix CI

apply clang-format to iowrap.hpp

fix include guard

fix forgot to add override to RawIO::ReaderName

apply clang-format to iowrap.cpp

remove meaningless line

fix isa-l include

Change class names

move find_package(PkgConfig) up

Adjust buffer size

rename ReaderName to name and add maybe_unused

add maybe_unused

fixed test failure on MacOS

adjust buffer size
The difference is small and maybe not significant, but if anything, this makes it faster.

It is now unused, but we leave it in for now so that we can later enable it as a fallback in case isa-l is not available.

Still not working as only the first block is processed

marcelm marked this pull request as ready for review August 27, 2024 13:14

marcelm (Collaborator, Author) commented Aug 27, 2024

I’ve managed to get the code to work with multiblock gzips (concatenated gzips) and also observed that there’s not actually an increase in memory usage. So from that side, everything seems to be fine.

However, when trying to compile this on our cluster, I noticed that the ISA-L library is not available and so I get an error at build time. I cannot just apt-get install libisal-dev there, so we need a different way to ensure that strobealign can be built.

Also, we’ve earlier changed the build system in such a way that no external code is downloaded during the build because this was requested at some point in time. So the most flexible solution would allow for three possibilities:

  • Use an isa-l available on the system
  • Download isa-l and compile it at build time
  • Don’t use isa-l, fall back to regular gzip

Let’s see whether I can get the above to work somehow.

marcelm force-pushed the add-isal branch 2 times, most recently from 5a4a022 to 462301e, August 30, 2024 11:19

marcelm (Collaborator, Author) commented Aug 30, 2024

Ok, this is now done. Without any options, CMake tries to use a system-installed libisal. If that does not work, an error message will be printed telling the user to try -DISAL=download. With that option, ISA-L is downloaded at build time.

I left out the fallback to regular gzip because that requires more code changes, and I don’t really see the advantage of not using isal. We can still add that if anyone requests it.
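
The choice between the two configure modes could be scripted roughly as follows. This is a hedged sketch, not part of the PR: the helper name isal_cmake_args is hypothetical, the pkg-config module name libisal is an assumption, and the PKG_CONFIG override variable is only a common convention used here to make the helper easy to exercise.

```shell
# Print the extra CMake argument needed for ISA-L, if any.
# Assumption: a system-installed ISA-L is visible to pkg-config as "libisal".
isal_cmake_args() {
    if "${PKG_CONFIG:-pkg-config}" --exists libisal 2>/dev/null; then
        :  # system library found, no extra argument needed
    else
        # Not found (or pkg-config itself is missing): ask CMake to download ISA-L.
        printf '%s\n' '-DISAL=download'
    fi
}
```

It would be used as `cmake -B build $(isal_cmake_args)`, so that the download option is only passed when no system library is detected.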

Performance-wise, it looks really good: With this, running strobealign with 128 cores will fully utilize all cores, see #435 (comment).

ksahlin (Owner) commented Aug 30, 2024

This is great! But I get a more cryptic error message (both as default build and with -DISAL=download).

I built on two MacBook Pros (one Intel, one M1) and got the same error on both.

$ cmake -B build -DCMAKE_C_FLAGS="-march=native" -DCMAKE_CXX_FLAGS="-march=native"
-- The C compiler identification is AppleClang 11.0.3.11030032
-- The CXX compiler identification is AppleClang 11.0.3.11030032
-- Detecting C compiler ABI info
-- Detecting C compiler ABI info - done
-- Check for working C compiler: /Library/Developer/CommandLineTools/usr/bin/cc - skipped
-- Detecting C compile features
-- Detecting C compile features - done
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Check for working CXX compiler: /Library/Developer/CommandLineTools/usr/bin/c++ - skipped
-- Detecting CXX compile features
-- Detecting CXX compile features - done
-- Found ZLIB: /Library/Developer/CommandLineTools/SDKs/MacOSX10.15.sdk/usr/lib/libz.tbd (found version "1.2.11") 
-- Performing Test CMAKE_HAVE_LIBC_PTHREAD
-- Performing Test CMAKE_HAVE_LIBC_PTHREAD - Success
-- Found Threads: TRUE  
-- Could NOT find OpenMP_C (missing: OpenMP_C_FLAGS OpenMP_C_LIB_NAMES) 
-- Could NOT find OpenMP_CXX (missing: OpenMP_CXX_FLAGS OpenMP_CXX_LIB_NAMES) 
-- Could NOT find OpenMP (missing: OpenMP_C_FOUND OpenMP_CXX_FOUND) 
CMake Error at /usr/local/Cellar/cmake/3.24.1/share/cmake/Modules/FindPackageHandleStandardArgs.cmake:230 (message):
  Could NOT find PkgConfig (missing: PKG_CONFIG_EXECUTABLE)
Call Stack (most recent call first):
  /usr/local/Cellar/cmake/3.24.1/share/cmake/Modules/FindPackageHandleStandardArgs.cmake:594 (_FPHSA_FAILURE_MESSAGE)
  /usr/local/Cellar/cmake/3.24.1/share/cmake/Modules/FindPkgConfig.cmake:99 (find_package_handle_standard_args)
  CMakeLists.txt:14 (find_package)


-- Configuring incomplete, errors occurred!
See also "/Users/ksahlin/prefix/source/StrobeAlign/build/CMakeFiles/CMakeOutput.log".
See also "/Users/ksahlin/prefix/source/StrobeAlign/build/CMakeFiles/CMakeError.log".

Printing the CMakeError.log below.

$ cat /Users/ksahlin/prefix/source/StrobeAlign/build/CMakeFiles/CMakeError.log
Compiling the C compiler identification source file "CMakeCCompilerId.c" failed.
Compiler: /Library/Developer/CommandLineTools/usr/bin/cc 
Build flags: -march=native
Id flags:  

The output was:
1
ld: library not found for -lSystem
clang: error: linker command failed with exit code 1 (use -v to see invocation)


Compiling the CXX compiler identification source file "CMakeCXXCompilerId.cpp" failed.
Compiler: /Library/Developer/CommandLineTools/usr/bin/c++ 
Build flags: -march=native
Id flags:  

The output was:
1
ld: library not found for -lc++
clang: error: linker command failed with exit code 1 (use -v to see invocation)



ksahlin (Owner) commented Aug 30, 2024

By the way, does this PR mean that there is no need to do

strobealign -t ${threads} CHM13.fa <(igzip -dc in.1.fastq.gz) <(igzip -dc in.2.fastq.gz) > /dev/null

but instead we could feed the .gz files directly as

strobealign -t ${threads} CHM13.fa in.1.fastq.gz  in.2.fastq.gz > /dev/null

?

ksahlin (Owner) commented Aug 30, 2024

Also, I noticed in #435 (comment) that the runtimes with ISA-L are slower for 64 and 32 cores (6% and 18%, respectively) despite the higher CPU utilization. So I imagine running strobealign with fewer cores (8, 4, 2) would suffer even more?

I still think it is nice to get the speedup for highly parallelised runs as it represents a use case. But it would be good to see what the trade-off will be for the other use cases.

marcelm (Collaborator, Author) commented Aug 30, 2024

> CMake Error at /usr/local/Cellar/cmake/3.24.1/share/cmake/Modules/FindPackageHandleStandardArgs.cmake:230 (message):
> Could NOT find PkgConfig (missing: PKG_CONFIG_EXECUTABLE)

As you can tell, I haven’t tested on Mac ... but since the CI checks run on Mac, I’m sure that it can be made to work somehow. You need pkg-config. Maybe brew install pkg-config or so? (And while you’re at it, do a brew install isa-l.) Do you have brew?

> By the way, does this PR mean that there is no need to do
>
> strobealign -t ${threads} CHM13.fa <(igzip -dc in.1.fastq.gz) <(igzip -dc in.2.fastq.gz) > /dev/null

Yes!

> Also, I noticed in #435 (comment) that the runtimes with ISA-L are slower for 64 and 32 cores (6% and 18%, resp) even though having higher CPU utilization. So I imagine running strobealign with fewer cores (8,4,2) would suffer even more?

I ran these measurements just once on busy nodes (with other jobs running); the aim was not to get accurate runtimes, but to see the CPU usage. I am quite sure the differences are not due to actual runtime differences.

I tested this locally on my machine with 8 cores, and there the isal version is consistently 3% faster. I can do some more measurements to be sure, but I am quite certain that there is no tradeoff, this is just faster even when running on a single core.
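
Repeating a run several times makes such comparisons less sensitive to noise from other jobs. A minimal sketch, assuming whole-second resolution is good enough for runs that take minutes; the helper name time_runs is hypothetical.

```shell
# Run a command N times and print the elapsed wall-clock seconds of each run.
# Usage: time_runs N command [args...]
time_runs() {
    runs=$1
    shift
    i=0
    while [ "$i" -lt "$runs" ]; do
        start=$(date +%s)
        "$@" > /dev/null          # discard the command's standard output
        end=$(date +%s)
        printf '%s\n' "$((end - start))"
        i=$((i + 1))
    done
}
```

For example, `time_runs 5 strobealign -t 8 CHM13.fa in.1.fastq.gz in.2.fastq.gz` would print five timings that can then be compared between the two builds.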

ksahlin (Owner) commented Aug 31, 2024

I did brew install pkg-config and this makes the build produce the intended error message:

CMake Error at CMakeLists.txt:77 (message):
  libisal (ISA-L) could not be found.  Either install the library using your
  package manager or add '-DISAL=download' to the CMake options in order to
  download ISA-L at build time.

Then I added the -DISAL=download flag to the build command:

cmake -B build -DCMAKE_C_FLAGS="-march=native" -DCMAKE_CXX_FLAGS="-march=native" -DISAL=download

as I wanted to try this option (i.e., I never did brew install isa-l), but then I get:

CMake Error at CMakeLists.txt:84 (find_program):
  Could not find NASM_program using the following names: nasm

marcelm (Collaborator, Author) commented Aug 31, 2024

> Could not find NASM_program using the following names: nasm

Hm, yes, you need to do brew install nasm ... It seems the -DISAL=download option is not so useful on macOS because you need to run brew install no matter what. It is still useful in general because it allows compiling strobealign on the PDC cluster Dardel after a module load nasm.

I made the error message when NASM is missing a bit more helpful.
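
The build-time tools mentioned in this thread can be checked up front before running CMake. A small sketch; the helper name missing_tools is hypothetical, and the tool list simply mirrors what came up above (pkg-config, and nasm for the download option).

```shell
# Print the name of every given tool that cannot be found on PATH.
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" > /dev/null 2>&1 || printf '%s\n' "$tool"
    done
}
```

Running `missing_tools pkg-config nasm` prints nothing when both are available; on macOS, anything it lists would be installed with brew install as described above.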

ksahlin (Owner) commented Sep 1, 2024

Thanks, I like the new error messages.

If I do

$ cmake -B build -DCMAKE_C_FLAGS="-march=native" -DCMAKE_CXX_FLAGS="-march=native" -DISAL=download 
$ make -j -C build

I get the error pasted at bottom.

If I instead do

$ cmake -B build -DCMAKE_C_FLAGS="-march=native" -DCMAKE_CXX_FLAGS="-march=native"
$ brew install isa-l
$ make -j -C build

the installation completes without errors. So maybe there is something with the download option that doesn’t work on macOS?

Error

$ make -j -C build
[  2%] Creating directories for 'isal_external'
[  4%] Performing download step (git clone) for 'isal_external'
Setting version to 0.13.0-42-g2c5c008-dirty
[  4%] Built target version
Cloning into 'isal_external'...
HEAD is now at 2df39cf build: Bump revision to 2.30
[  6%] Performing update step for 'isal_external'
[  9%] No patch step for 'isal_external'
[ 11%] No configure step for 'isal_external'
[ 13%] Performing build step for 'isal_external'
  ---> Building erasure_code/ec_base.c  x86_64 
  ---> Building raid/raid_base.c  x86_64 
  ---> Building crc/crc_base.c  x86_64 
  ---> Building crc/crc64_base.c  x86_64 
  ---> Building igzip/igzip.c  x86_64 
  ---> Building igzip/hufftables_c.c  x86_64 
  ---> Building igzip/igzip_base.c  x86_64 
  ---> Building igzip/igzip_icf_base.c  x86_64 
  ---> Building igzip/adler32_base.c  x86_64 
  ---> Building igzip/flatten_ll.c  x86_64 
  ---> Building igzip/encode_df.c  x86_64 
  ---> Building igzip/igzip_icf_body.c  x86_64 
  ---> Building igzip/huff_codes.c  x86_64 
  ---> Building igzip/igzip_inflate.c  x86_64 
  ---> Building mem/mem_zero_detect_base.c  x86_64 
  ---> Building erasure_code/ec_highlevel_func.c  x86_64 
  ---> Building erasure_code/gf_vect_mul_sse.asm  x86_64 
  ---> Building erasure_code/gf_vect_mul_avx.asm  x86_64 
  ---> Building erasure_code/gf_vect_dot_prod_sse.asm  x86_64 
  ---> Building erasure_code/gf_vect_dot_prod_avx.asm  x86_64 
  ---> Building erasure_code/gf_vect_dot_prod_avx2.asm  x86_64 
  ---> Building erasure_code/gf_2vect_dot_prod_sse.asm  x86_64 
  ---> Building erasure_code/gf_3vect_dot_prod_sse.asm  x86_64 
  ---> Building erasure_code/gf_4vect_dot_prod_sse.asm  x86_64 
  ---> Building erasure_code/gf_5vect_dot_prod_sse.asm  x86_64 
  ---> Building erasure_code/gf_6vect_dot_prod_sse.asm  x86_64 
  ---> Building erasure_code/gf_2vect_dot_prod_avx.asm  x86_64 
  ---> Building erasure_code/gf_3vect_dot_prod_avx.asm  x86_64 
  ---> Building erasure_code/gf_4vect_dot_prod_avx.asm  x86_64 
  ---> Building erasure_code/gf_5vect_dot_prod_avx.asm  x86_64 
  ---> Building erasure_code/gf_6vect_dot_prod_avx.asm  x86_64 
  ---> Building erasure_code/gf_2vect_dot_prod_avx2.asm  x86_64 
  ---> Building erasure_code/gf_3vect_dot_prod_avx2.asm  x86_64 
  ---> Building erasure_code/gf_4vect_dot_prod_avx2.asm  x86_64 
  ---> Building erasure_code/gf_5vect_dot_prod_avx2.asm  x86_64 
  ---> Building erasure_code/gf_6vect_dot_prod_avx2.asm  x86_64 
  ---> Building erasure_code/gf_vect_mad_sse.asm  x86_64 
  ---> Building erasure_code/gf_2vect_mad_sse.asm  x86_64 
  ---> Building erasure_code/gf_3vect_mad_sse.asm  x86_64 
  ---> Building erasure_code/gf_4vect_mad_sse.asm  x86_64 
  ---> Building erasure_code/gf_5vect_mad_sse.asm  x86_64 
  ---> Building erasure_code/gf_6vect_mad_sse.asm  x86_64 
  ---> Building erasure_code/gf_vect_mad_avx.asm  x86_64 
  ---> Building erasure_code/gf_2vect_mad_avx.asm  x86_64 
  ---> Building erasure_code/gf_3vect_mad_avx.asm  x86_64 
  ---> Building erasure_code/gf_4vect_mad_avx.asm  x86_64 
  ---> Building erasure_code/gf_5vect_mad_avx.asm  x86_64 
  ---> Building erasure_code/gf_6vect_mad_avx.asm  x86_64 
  ---> Building erasure_code/gf_vect_mad_avx2.asm  x86_64 
  ---> Building erasure_code/gf_2vect_mad_avx2.asm  x86_64 
  ---> Building erasure_code/gf_3vect_mad_avx2.asm  x86_64 
  ---> Building erasure_code/gf_4vect_mad_avx2.asm  x86_64 
  ---> Building erasure_code/gf_5vect_mad_avx2.asm  x86_64 
  ---> Building erasure_code/gf_6vect_mad_avx2.asm  x86_64 
  ---> Building erasure_code/ec_multibinary.asm  x86_64 
  ---> Building erasure_code/gf_vect_dot_prod_avx512.asm  x86_64 
  ---> Building erasure_code/gf_2vect_dot_prod_avx512.asm  x86_64 
  ---> Building erasure_code/gf_3vect_dot_prod_avx512.asm  x86_64 
  ---> Building erasure_code/gf_4vect_dot_prod_avx512.asm  x86_64 
  ---> Building erasure_code/gf_5vect_dot_prod_avx512.asm  x86_64 
  ---> Building erasure_code/gf_6vect_dot_prod_avx512.asm  x86_64 
  ---> Building erasure_code/gf_vect_mad_avx512.asm  x86_64 
  ---> Building erasure_code/gf_2vect_mad_avx512.asm  x86_64 
  ---> Building erasure_code/gf_3vect_mad_avx512.asm  x86_64 
  ---> Building erasure_code/gf_4vect_mad_avx512.asm  x86_64 
  ---> Building erasure_code/gf_5vect_mad_avx512.asm  x86_64 
  ---> Building erasure_code/gf_6vect_mad_avx512.asm  x86_64 
  ---> Building raid/xor_gen_sse.asm  x86_64 
  ---> Building raid/pq_gen_sse.asm  x86_64 
  ---> Building raid/xor_check_sse.asm  x86_64 
  ---> Building raid/pq_check_sse.asm  x86_64 
  ---> Building raid/pq_gen_avx.asm  x86_64 
  ---> Building raid/xor_gen_avx.asm  x86_64 
  ---> Building raid/pq_gen_avx2.asm  x86_64 
  ---> Building raid/xor_gen_avx512.asm  x86_64 
  ---> Building raid/pq_gen_avx512.asm  x86_64 
  ---> Building raid/raid_multibinary.asm  x86_64 
  ---> Building crc/crc16_t10dif_01.asm  x86_64 
  ---> Building crc/crc16_t10dif_by4.asm  x86_64 
  ---> Building crc/crc16_t10dif_02.asm  x86_64 
  ---> Building crc/crc16_t10dif_by16_10.asm  x86_64 
  ---> Building crc/crc16_t10dif_copy_by4.asm  x86_64 
  ---> Building crc/crc16_t10dif_copy_by4_02.asm  x86_64 
  ---> Building crc/crc32_ieee_01.asm  x86_64 
  ---> Building crc/crc32_ieee_02.asm  x86_64 
  ---> Building crc/crc32_ieee_by4.asm  x86_64 
  ---> Building crc/crc32_ieee_by16_10.asm  x86_64 
  ---> Building crc/crc32_iscsi_01.asm  x86_64 
  ---> Building crc/crc32_iscsi_00.asm  x86_64 
  ---> Building crc/crc32_iscsi_by16_10.asm  x86_64 
  ---> Building crc/crc_multibinary.asm  x86_64 
  ---> Building crc/crc64_multibinary.asm  x86_64 
  ---> Building crc/crc64_ecma_refl_by8.asm  x86_64 
  ---> Building crc/crc64_ecma_refl_by16_10.asm  x86_64 
  ---> Building crc/crc64_ecma_norm_by8.asm  x86_64 
  ---> Building crc/crc64_ecma_norm_by16_10.asm  x86_64 
  ---> Building crc/crc64_iso_refl_by8.asm  x86_64 
  ---> Building crc/crc64_iso_refl_by16_10.asm  x86_64 
  ---> Building crc/crc64_iso_norm_by8.asm  x86_64 
  ---> Building crc/crc64_iso_norm_by16_10.asm  x86_64 
  ---> Building crc/crc64_jones_refl_by8.asm  x86_64 
  ---> Building crc/crc64_jones_refl_by16_10.asm  x86_64 
  ---> Building crc/crc64_jones_norm_by8.asm  x86_64 
  ---> Building crc/crc64_jones_norm_by16_10.asm  x86_64 
  ---> Building crc/crc32_gzip_refl_by8.asm  x86_64 
  ---> Building crc/crc32_gzip_refl_by8_02.asm  x86_64 
  ---> Building crc/crc32_gzip_refl_by16_10.asm  x86_64 
  ---> Building igzip/igzip_body.asm  x86_64 
  ---> Building igzip/igzip_finish.asm  x86_64 
  ---> Building igzip/igzip_icf_body_h1_gr_bt.asm  x86_64 
  ---> Building igzip/igzip_icf_finish.asm  x86_64 
  ---> Building igzip/rfc1951_lookup.asm  x86_64 
  ---> Building igzip/adler32_sse.asm  x86_64 
  ---> Building igzip/adler32_avx2_4.asm  x86_64 
  ---> Building igzip/igzip_multibinary.asm  x86_64 
  ---> Building igzip/igzip_update_histogram_01.asm  x86_64 
  ---> Building igzip/igzip_update_histogram_04.asm  x86_64 
  ---> Building igzip/igzip_decode_block_stateless_01.asm  x86_64 
  ---> Building igzip/igzip_decode_block_stateless_04.asm  x86_64 
  ---> Building igzip/igzip_inflate_multibinary.asm  x86_64 
  ---> Building igzip/encode_df_04.asm  x86_64 
  ---> Building igzip/encode_df_06.asm  x86_64 
  ---> Building igzip/proc_heap.asm  x86_64 
  ---> Building igzip/igzip_deflate_hash.asm  x86_64 
  ---> Building igzip/igzip_gen_icf_map_lh1_06.asm  x86_64 
  ---> Building igzip/igzip_gen_icf_map_lh1_04.asm  x86_64 
  ---> Building igzip/igzip_set_long_icf_fg_04.asm  x86_64 
  ---> Building igzip/igzip_set_long_icf_fg_06.asm  x86_64 
  ---> Building mem/mem_zero_detect_avx.asm  x86_64 
  ---> Building mem/mem_zero_detect_sse.asm  x86_64 
  ---> Building mem/mem_multibinary.asm  x86_64 
  ---> Building shared erasure_code/ec_base.c  x86_64 
  ---> Building shared raid/raid_base.c  x86_64 
  ---> Building shared crc/crc_base.c  x86_64 
  ---> Building shared crc/crc64_base.c  x86_64 
  ---> Building shared igzip/igzip.c  x86_64 
  ---> Building shared igzip/hufftables_c.c  x86_64 
  ---> Building shared igzip/igzip_base.c  x86_64 
  ---> Building shared igzip/igzip_icf_base.c  x86_64 
  ---> Building shared igzip/adler32_base.c  x86_64 
  ---> Building shared igzip/flatten_ll.c  x86_64 
  ---> Building shared igzip/encode_df.c  x86_64 
  ---> Building shared igzip/igzip_icf_body.c  x86_64 
  ---> Building shared igzip/huff_codes.c  x86_64 
  ---> Building shared igzip/igzip_inflate.c  x86_64 
  ---> Building shared mem/mem_zero_detect_base.c  x86_64 
  ---> Building shared erasure_code/ec_highlevel_func.c  x86_64 
  ---> Creating Lib bin/isa-l.a
ar: creating archive bin/isa-l.a
  ---> Building Programs programs/igzip DEBUG x86_64 
  ---> Creating Shared Lib bin/libisal.so
[ 16%] Performing install step for 'isal_external'
Building isa-l.h
make[3]: *** [install] Error 1
make[2]: *** [isal_external-prefix/src/isal_external-stamp/isal_external-install] Error 2
make[1]: *** [CMakeFiles/isal_external.dir/all] Error 2
make: *** [all] Error 2

marcelm (Collaborator, Author) commented Sep 1, 2024

Can you try running make -C build VERBOSE=1 instead to maybe get some more helpful output? (Also without -j to ensure that the error is the last thing that happens)
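
When the verbose log gets long, the first failing rule can be located with grep. A minimal sketch, assuming the log was saved to a file and that failures are reported in make's usual "*** [target] Error N" form; the helper name first_make_error and the log file name are hypothetical.

```shell
# Print the first line (with its line number) of a make log that reports a failed rule.
first_make_error() {
    grep -n -m 1 -e '\*\*\* .*Error [0-9]' "$1"
}
```

For example, after `make -C build VERBOSE=1 > build.log 2>&1`, calling `first_make_error build.log` gives the line number of the first failure; the commands printed just before that line are usually the interesting ones.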

ksahlin (Owner) commented Sep 1, 2024

Ok, now I did:

brew uninstall isa-l
rm -rf build/
cmake -B build -DCMAKE_C_FLAGS="-march=native" -DCMAKE_CXX_FLAGS="-march=native" -DISAL=download
make -C build VERBOSE=1

Error

$ make -C build VERBOSE=1
/usr/local/Cellar/cmake/3.24.1/bin/cmake -S/Users/ksahlin/prefix/source/StrobeAlign -B/Users/ksahlin/prefix/source/StrobeAlign/build --check-build-system CMakeFiles/Makefile.cmake 0
/usr/local/Cellar/cmake/3.24.1/bin/cmake -E cmake_progress_start /Users/ksahlin/prefix/source/StrobeAlign/build/CMakeFiles /Users/ksahlin/prefix/source/StrobeAlign/build//CMakeFiles/progress.marks
/Library/Developer/CommandLineTools/usr/bin/make  -f CMakeFiles/Makefile2 all
/Library/Developer/CommandLineTools/usr/bin/make  -f CMakeFiles/isal_external.dir/build.make CMakeFiles/isal_external.dir/depend
cd /Users/ksahlin/prefix/source/StrobeAlign/build && /usr/local/Cellar/cmake/3.24.1/bin/cmake -E cmake_depends "Unix Makefiles" /Users/ksahlin/prefix/source/StrobeAlign /Users/ksahlin/prefix/source/StrobeAlign /Users/ksahlin/prefix/source/StrobeAlign/build /Users/ksahlin/prefix/source/StrobeAlign/build /Users/ksahlin/prefix/source/StrobeAlign/build/CMakeFiles/isal_external.dir/DependInfo.cmake --color=
/Library/Developer/CommandLineTools/usr/bin/make  -f CMakeFiles/isal_external.dir/build.make CMakeFiles/isal_external.dir/build
[  2%] Creating directories for 'isal_external'
/usr/local/Cellar/cmake/3.24.1/bin/cmake -Dcfgdir= -P /Users/ksahlin/prefix/source/StrobeAlign/build/isal_external-prefix/tmp/isal_external-mkdirs.cmake
/usr/local/Cellar/cmake/3.24.1/bin/cmake -E touch /Users/ksahlin/prefix/source/StrobeAlign/build/isal_external-prefix/src/isal_external-stamp/isal_external-mkdir
[  4%] Performing download step (git clone) for 'isal_external'
cd /Users/ksahlin/prefix/source/StrobeAlign/build/isal_external-prefix/src && /usr/local/Cellar/cmake/3.24.1/bin/cmake -P /Users/ksahlin/prefix/source/StrobeAlign/build/isal_external-prefix/tmp/isal_external-gitclone.cmake
Cloning into 'isal_external'...
HEAD is now at 2df39cf build: Bump revision to 2.30
cd /Users/ksahlin/prefix/source/StrobeAlign/build/isal_external-prefix/src && /usr/local/Cellar/cmake/3.24.1/bin/cmake -E touch /Users/ksahlin/prefix/source/StrobeAlign/build/isal_external-prefix/src/isal_external-stamp/isal_external-download
[  6%] Performing update step for 'isal_external'
cd /Users/ksahlin/prefix/source/StrobeAlign/build/isal_external-prefix/src/isal_external && /usr/local/Cellar/cmake/3.24.1/bin/cmake -P /Users/ksahlin/prefix/source/StrobeAlign/build/isal_external-prefix/tmp/isal_external-gitupdate.cmake
[  9%] No patch step for 'isal_external'
/usr/local/Cellar/cmake/3.24.1/bin/cmake -E echo_append
/usr/local/Cellar/cmake/3.24.1/bin/cmake -E touch /Users/ksahlin/prefix/source/StrobeAlign/build/isal_external-prefix/src/isal_external-stamp/isal_external-patch
[ 11%] No configure step for 'isal_external'
cd /Users/ksahlin/prefix/source/StrobeAlign/build/isal_external-prefix/src/isal_external && /usr/local/Cellar/cmake/3.24.1/bin/cmake -E echo_append
cd /Users/ksahlin/prefix/source/StrobeAlign/build/isal_external-prefix/src/isal_external && /usr/local/Cellar/cmake/3.24.1/bin/cmake -E touch /Users/ksahlin/prefix/source/StrobeAlign/build/isal_external-prefix/src/isal_external-stamp/isal_external-configure
[ 13%] Performing build step for 'isal_external'
cd /Users/ksahlin/prefix/source/StrobeAlign/build/isal_external-prefix/src/isal_external && make -f Makefile.unx
mkdir -p bin
  ---> Building erasure_code/ec_base.c  x86_64 
  ---> Building raid/raid_base.c  x86_64 
  ---> Building crc/crc_base.c  x86_64 
  ---> Building crc/crc64_base.c  x86_64 
  ---> Building igzip/igzip.c  x86_64 
  ---> Building igzip/hufftables_c.c  x86_64 
  ---> Building igzip/igzip_base.c  x86_64 
  ---> Building igzip/igzip_icf_base.c  x86_64 
  ---> Building igzip/adler32_base.c  x86_64 
  ---> Building igzip/flatten_ll.c  x86_64 
  ---> Building igzip/encode_df.c  x86_64 
  ---> Building igzip/igzip_icf_body.c  x86_64 
  ---> Building igzip/huff_codes.c  x86_64 
  ---> Building igzip/igzip_inflate.c  x86_64 
  ---> Building mem/mem_zero_detect_base.c  x86_64 
  ---> Building erasure_code/ec_highlevel_func.c  x86_64 
  ---> Building erasure_code/gf_vect_mul_sse.asm  x86_64 
  ---> Building erasure_code/gf_vect_mul_avx.asm  x86_64 
  ---> Building erasure_code/gf_vect_dot_prod_sse.asm  x86_64 
  ---> Building erasure_code/gf_vect_dot_prod_avx.asm  x86_64 
  ---> Building erasure_code/gf_vect_dot_prod_avx2.asm  x86_64 
  ---> Building erasure_code/gf_2vect_dot_prod_sse.asm  x86_64 
  ---> Building erasure_code/gf_3vect_dot_prod_sse.asm  x86_64 
  ---> Building erasure_code/gf_4vect_dot_prod_sse.asm  x86_64 
  ---> Building erasure_code/gf_5vect_dot_prod_sse.asm  x86_64 
  ---> Building erasure_code/gf_6vect_dot_prod_sse.asm  x86_64 
  ---> Building erasure_code/gf_2vect_dot_prod_avx.asm  x86_64 
  ---> Building erasure_code/gf_3vect_dot_prod_avx.asm  x86_64 
  ---> Building erasure_code/gf_4vect_dot_prod_avx.asm  x86_64 
  ---> Building erasure_code/gf_5vect_dot_prod_avx.asm  x86_64 
  ---> Building erasure_code/gf_6vect_dot_prod_avx.asm  x86_64 
  ---> Building erasure_code/gf_2vect_dot_prod_avx2.asm  x86_64 
  ---> Building erasure_code/gf_3vect_dot_prod_avx2.asm  x86_64 
  ---> Building erasure_code/gf_4vect_dot_prod_avx2.asm  x86_64 
  ---> Building erasure_code/gf_5vect_dot_prod_avx2.asm  x86_64 
  ---> Building erasure_code/gf_6vect_dot_prod_avx2.asm  x86_64 
  ---> Building erasure_code/gf_vect_mad_sse.asm  x86_64 
  ---> Building erasure_code/gf_2vect_mad_sse.asm  x86_64 
  ---> Building erasure_code/gf_3vect_mad_sse.asm  x86_64 
  ---> Building erasure_code/gf_4vect_mad_sse.asm  x86_64 
  ---> Building erasure_code/gf_5vect_mad_sse.asm  x86_64 
  ---> Building erasure_code/gf_6vect_mad_sse.asm  x86_64 
  ---> Building erasure_code/gf_vect_mad_avx.asm  x86_64 
  ---> Building erasure_code/gf_2vect_mad_avx.asm  x86_64 
  ---> Building erasure_code/gf_3vect_mad_avx.asm  x86_64 
  ---> Building erasure_code/gf_4vect_mad_avx.asm  x86_64 
  ---> Building erasure_code/gf_5vect_mad_avx.asm  x86_64 
  ---> Building erasure_code/gf_6vect_mad_avx.asm  x86_64 
  ---> Building erasure_code/gf_vect_mad_avx2.asm  x86_64 
  ---> Building erasure_code/gf_2vect_mad_avx2.asm  x86_64 
  ---> Building erasure_code/gf_3vect_mad_avx2.asm  x86_64 
  ---> Building erasure_code/gf_4vect_mad_avx2.asm  x86_64 
  ---> Building erasure_code/gf_5vect_mad_avx2.asm  x86_64 
  ---> Building erasure_code/gf_6vect_mad_avx2.asm  x86_64 
  ---> Building erasure_code/ec_multibinary.asm  x86_64 
  ---> Building erasure_code/gf_vect_dot_prod_avx512.asm  x86_64 
  ---> Building erasure_code/gf_2vect_dot_prod_avx512.asm  x86_64 
  ---> Building erasure_code/gf_3vect_dot_prod_avx512.asm  x86_64 
  ---> Building erasure_code/gf_4vect_dot_prod_avx512.asm  x86_64 
  ---> Building erasure_code/gf_5vect_dot_prod_avx512.asm  x86_64 
  ---> Building erasure_code/gf_6vect_dot_prod_avx512.asm  x86_64 
  ---> Building erasure_code/gf_vect_mad_avx512.asm  x86_64 
  ---> Building erasure_code/gf_2vect_mad_avx512.asm  x86_64 
  ---> Building erasure_code/gf_3vect_mad_avx512.asm  x86_64 
  ---> Building erasure_code/gf_4vect_mad_avx512.asm  x86_64 
  ---> Building erasure_code/gf_5vect_mad_avx512.asm  x86_64 
  ---> Building erasure_code/gf_6vect_mad_avx512.asm  x86_64 
  ---> Building raid/xor_gen_sse.asm  x86_64 
  ---> Building raid/pq_gen_sse.asm  x86_64 
  ---> Building raid/xor_check_sse.asm  x86_64 
  ---> Building raid/pq_check_sse.asm  x86_64 
  ---> Building raid/pq_gen_avx.asm  x86_64 
  ---> Building raid/xor_gen_avx.asm  x86_64 
  ---> Building raid/pq_gen_avx2.asm  x86_64 
  ---> Building raid/xor_gen_avx512.asm  x86_64 
  ---> Building raid/pq_gen_avx512.asm  x86_64 
  ---> Building raid/raid_multibinary.asm  x86_64 
  ---> Building crc/crc16_t10dif_01.asm  x86_64 
  ---> Building crc/crc16_t10dif_by4.asm  x86_64 
  ---> Building crc/crc16_t10dif_02.asm  x86_64 
  ---> Building crc/crc16_t10dif_by16_10.asm  x86_64 
  ---> Building crc/crc16_t10dif_copy_by4.asm  x86_64 
  ---> Building crc/crc16_t10dif_copy_by4_02.asm  x86_64 
  ---> Building crc/crc32_ieee_01.asm  x86_64 
  ---> Building crc/crc32_ieee_02.asm  x86_64 
  ---> Building crc/crc32_ieee_by4.asm  x86_64 
  ---> Building crc/crc32_ieee_by16_10.asm  x86_64 
  ---> Building crc/crc32_iscsi_01.asm  x86_64 
  ---> Building crc/crc32_iscsi_00.asm  x86_64 
  ---> Building crc/crc32_iscsi_by16_10.asm  x86_64 
  ---> Building crc/crc_multibinary.asm  x86_64 
  ---> Building crc/crc64_multibinary.asm  x86_64 
  ---> Building crc/crc64_ecma_refl_by8.asm  x86_64 
  ---> Building crc/crc64_ecma_refl_by16_10.asm  x86_64 
  ---> Building crc/crc64_ecma_norm_by8.asm  x86_64 
  ---> Building crc/crc64_ecma_norm_by16_10.asm  x86_64 
  ---> Building crc/crc64_iso_refl_by8.asm  x86_64 
  ---> Building crc/crc64_iso_refl_by16_10.asm  x86_64 
  ---> Building crc/crc64_iso_norm_by8.asm  x86_64 
  ---> Building crc/crc64_iso_norm_by16_10.asm  x86_64 
  ---> Building crc/crc64_jones_refl_by8.asm  x86_64 
  ---> Building crc/crc64_jones_refl_by16_10.asm  x86_64 
  ---> Building crc/crc64_jones_norm_by8.asm  x86_64 
  ---> Building crc/crc64_jones_norm_by16_10.asm  x86_64 
  ---> Building crc/crc32_gzip_refl_by8.asm  x86_64 
  ---> Building crc/crc32_gzip_refl_by8_02.asm  x86_64 
  ---> Building crc/crc32_gzip_refl_by16_10.asm  x86_64 
  ---> Building igzip/igzip_body.asm  x86_64 
  ---> Building igzip/igzip_finish.asm  x86_64 
  ---> Building igzip/igzip_icf_body_h1_gr_bt.asm  x86_64 
  ---> Building igzip/igzip_icf_finish.asm  x86_64 
  ---> Building igzip/rfc1951_lookup.asm  x86_64 
  ---> Building igzip/adler32_sse.asm  x86_64 
  ---> Building igzip/adler32_avx2_4.asm  x86_64 
  ---> Building igzip/igzip_multibinary.asm  x86_64 
  ---> Building igzip/igzip_update_histogram_01.asm  x86_64 
  ---> Building igzip/igzip_update_histogram_04.asm  x86_64 
  ---> Building igzip/igzip_decode_block_stateless_01.asm  x86_64 
  ---> Building igzip/igzip_decode_block_stateless_04.asm  x86_64 
  ---> Building igzip/igzip_inflate_multibinary.asm  x86_64 
  ---> Building igzip/encode_df_04.asm  x86_64 
  ---> Building igzip/encode_df_06.asm  x86_64 
  ---> Building igzip/proc_heap.asm  x86_64 
  ---> Building igzip/igzip_deflate_hash.asm  x86_64 
  ---> Building igzip/igzip_gen_icf_map_lh1_06.asm  x86_64 
  ---> Building igzip/igzip_gen_icf_map_lh1_04.asm  x86_64 
  ---> Building igzip/igzip_set_long_icf_fg_04.asm  x86_64 
  ---> Building igzip/igzip_set_long_icf_fg_06.asm  x86_64 
  ---> Building mem/mem_zero_detect_avx.asm  x86_64 
  ---> Building mem/mem_zero_detect_sse.asm  x86_64 
  ---> Building mem/mem_multibinary.asm  x86_64 
  ---> Creating Lib bin/isa-l.a
ar: creating archive bin/isa-l.a
  ---> Building shared erasure_code/ec_base.c  x86_64 
  ---> Building shared raid/raid_base.c  x86_64 
  ---> Building shared crc/crc_base.c  x86_64 
  ---> Building shared crc/crc64_base.c  x86_64 
  ---> Building shared igzip/igzip.c  x86_64 
  ---> Building shared igzip/hufftables_c.c  x86_64 
  ---> Building shared igzip/igzip_base.c  x86_64 
  ---> Building shared igzip/igzip_icf_base.c  x86_64 
  ---> Building shared igzip/adler32_base.c  x86_64 
  ---> Building shared igzip/flatten_ll.c  x86_64 
  ---> Building shared igzip/encode_df.c  x86_64 
  ---> Building shared igzip/igzip_icf_body.c  x86_64 
  ---> Building shared igzip/huff_codes.c  x86_64 
  ---> Building shared igzip/igzip_inflate.c  x86_64 
  ---> Building shared mem/mem_zero_detect_base.c  x86_64 
  ---> Building shared erasure_code/ec_highlevel_func.c  x86_64 
  ---> Creating Shared Lib bin/libisal.so
  ---> Building Programs programs/igzip DEBUG x86_64 
cd /Users/ksahlin/prefix/source/StrobeAlign/build/isal_external-prefix/src/isal_external && /usr/local/Cellar/cmake/3.24.1/bin/cmake -E touch /Users/ksahlin/prefix/source/StrobeAlign/build/isal_external-prefix/src/isal_external-stamp/isal_external-build
[ 16%] Performing install step for 'isal_external'
cd /Users/ksahlin/prefix/source/StrobeAlign/build/isal_external-prefix/src/isal_external && make -f Makefile.unx install prefix=/Users/ksahlin/prefix/source/StrobeAlign/build/ISAL
mkdir -p /Users/ksahlin/prefix/source/StrobeAlign/build/ISAL/lib
mkdir -p /Users/ksahlin/prefix/source/StrobeAlign/build/ISAL/include/isa-l
mkdir -p /Users/ksahlin/prefix/source/StrobeAlign/build/ISAL/bin
mkdir -p /Users/ksahlin/prefix/source/StrobeAlign/build/ISAL/share/man/man1
Building isa-l.h
install -m 644 bin/isa-l.a /Users/ksahlin/prefix/source/StrobeAlign/build/ISAL/lib/libisal.a
install -m 644 include/crc.h include/crc64.h include/erasure_code.h include/gf_vect_mul.h include/igzip_lib.h include/mem_routines.h include/raid.h /Users/ksahlin/prefix/source/StrobeAlign/build/ISAL/include/isa-l/.
install -m 664 isa-l.h /Users/ksahlin/prefix/source/StrobeAlign/build/ISAL/include/.
install -m 664 include/types.h /Users/ksahlin/prefix/source/StrobeAlign/build/ISAL/include/isa-l/.
install -m 664 bin/libisal.so /Users/ksahlin/prefix/source/StrobeAlign/build/ISAL/lib/libisal.so.2.30.0
(cd /Users/ksahlin/prefix/source/StrobeAlign/build/ISAL/lib && ln -f -s libisal.so.2.30.0 libisal.so.2 && ln -f -s libisal.so.2.30.0 libisal.so)
(cd /Users/ksahlin/prefix/source/StrobeAlign/build/ISAL/lib && ln -f -s libisal.so.2.30.0 libisal.dylib)
which glibtool && glibtool --mode=finish /Users/ksahlin/prefix/source/StrobeAlign/build/ISAL/lib
make[3]: *** [install] Error 1
make[2]: *** [isal_external-prefix/src/isal_external-stamp/isal_external-install] Error 2
make[1]: *** [CMakeFiles/isal_external.dir/all] Error 2
make: *** [all] Error 2

@marcelm
Collaborator Author

marcelm commented Sep 1, 2024

I think you need to install the GNU version of libtool, which appears to be called glibtool on macOS. Maybe brew install libtool?

@ksahlin
Owner

ksahlin commented Sep 1, 2024

Great, got it to work according to your suggestion, i.e.,

brew install libtool
cmake -B build -DCMAKE_C_FLAGS="-march=native" -DCMAKE_CXX_FLAGS="-march=native" -DISAL=download
make -C build VERBOSE=1

Trusting you that there is no substantial speed trade-off for few cores, I approve the merge.

@marcelm
Collaborator Author

marcelm commented Sep 2, 2024

> Trusting you that there is no substantial speed trade-off for few cores, I approve the merge.

I’ve done a few more tests to convince myself. On my local machine, the version in this PR is consistently ~2% faster (both wall-clock and CPU time) up to 8 cores (which is how many I have).

On the KTH cluster (dardel), the variance is very high, but it appears that

  • for 16 cores, user time is slightly lower and wall-clock time is also lower
  • for 32 cores, user time is maybe a bit higher, but wall-clock time is lower
  • for 64 and 128 cores, user time is maybe 5% higher, wall-clock time is reduced by about half

Some notes for when we want to or need to revisit this:

  • Memory usage with this version can appear to be higher, but I only saw this on dardel. This is probably due to the usage of mmap. I am still not convinced using mmap is the best thing to do here to read in the file, but it works and is an improvement over what we have, so I didn’t want to change it for the moment.
  • The current code spawns a new thread for every decoded chunk of input data. I guess spawning a thread is fast, but this still seems wasteful.
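
To make the second note concrete: one common alternative to spawning a thread per chunk is a single long-lived worker that takes jobs from a queue. The following is only a minimal sketch of that idea. `DecodeWorker` and `submit` are hypothetical names, not code from this PR, and the jobs are plain callables standing in for "decode this chunk".

```cpp
#include <cassert>
#include <condition_variable>
#include <functional>
#include <mutex>
#include <queue>
#include <thread>
#include <utility>

// Hypothetical helper, not part of strobealign: one long-lived thread that
// runs queued jobs in submission order instead of one new thread per job.
class DecodeWorker {
public:
    DecodeWorker() : worker_([this] { run(); }) {}

    // Lets the worker drain the remaining jobs, then joins it.
    ~DecodeWorker() {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            stopping_ = true;
        }
        ready_.notify_one();
        worker_.join();
    }

    DecodeWorker(const DecodeWorker&) = delete;
    DecodeWorker& operator=(const DecodeWorker&) = delete;

    // Queue a job (for example "decode this chunk") for the worker thread.
    void submit(std::function<void()> job) {
        assert(job && "job must be callable");
        {
            std::lock_guard<std::mutex> lock(mutex_);
            jobs_.push(std::move(job));
        }
        ready_.notify_one();
    }

private:
    void run() {
        for (;;) {
            std::function<void()> job;
            {
                std::unique_lock<std::mutex> lock(mutex_);
                ready_.wait(lock, [this] { return stopping_ || !jobs_.empty(); });
                if (jobs_.empty()) {
                    return;  // only reachable once stopping_ has been set
                }
                job = std::move(jobs_.front());
                jobs_.pop();
            }
            job();  // run outside the lock so submit() never waits on a job
        }
    }

    std::mutex mutex_;
    std::condition_variable ready_;
    std::queue<std::function<void()>> jobs_;
    bool stopping_ = false;
    std::thread worker_;  // declared last so the members above exist before run() starts
};
```

Whether this would be measurably faster than the current approach is untested here; it only removes the per-chunk thread creation.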

@marcelm marcelm merged commit 71866c3 into main Sep 2, 2024
11 checks passed
@marcelm marcelm deleted the add-isal branch September 2, 2024 21:27
@teepean
Contributor

teepean commented Sep 2, 2024

Just FYI that the addition of isal breaks Windows builds due to usage of sys/mman.h.
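
For illustration, a portable way to avoid the POSIX-only sys/mman.h header is to read the input in fixed-size chunks with the standard library, which is also closer to the streaming approach mentioned in the notes above. This is a minimal sketch under that assumption. `read_in_chunks` and its callback are hypothetical names, not code from this PR, and no claim is made about its speed relative to mmap.

```cpp
#include <cassert>
#include <cstddef>
#include <fstream>
#include <functional>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical helper: reads `path` in pieces of at most `chunk_size` bytes,
// passes each piece to `consume`, and returns the total number of bytes read.
std::size_t read_in_chunks(
    const std::string& path,
    std::size_t chunk_size,
    const std::function<void(const char*, std::size_t)>& consume) {
    assert(chunk_size > 0);

    std::ifstream in(path, std::ios::binary);
    if (!in) {
        throw std::runtime_error("cannot open " + path);
    }

    std::vector<char> buffer(chunk_size);
    std::size_t total = 0;
    while (in) {
        in.read(buffer.data(), static_cast<std::streamsize>(buffer.size()));
        // gcount() reports how many bytes the last read actually delivered,
        // which is smaller than chunk_size for the final piece of the file.
        const std::size_t got = static_cast<std::size_t>(in.gcount());
        if (got == 0) {
            break;
        }
        consume(buffer.data(), got);
        total += got;
    }
    return total;
}
```

In a decompression setting, the callback would hand each chunk to the inflate routine, so only one buffer of compressed data is resident at a time.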

@marcelm
Collaborator Author

marcelm commented Sep 3, 2024

We don’t support strobealign on Windows at the moment. If it compiled previously, that was a happy accident. If you would like us to support Windows, can you open a separate issue please? Then we can discuss how to achieve that.
