Apple aarch64 support #544

Merged
merged 3 commits into from
Dec 1, 2022

wilko77 (Collaborator) commented Nov 30, 2022

For now, this just uses the slow path in dice. I don't really feel like reading assembly instruction manuals...
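For readers unfamiliar with the metric, a portable "slow path" for the Dice coefficient amounts to counting set bits without architecture-specific instructions. The following is a minimal Python sketch of the metric itself, not anonlink's actual implementation; the function names are hypothetical, and returning 0.0 when both inputs have no set bits is an assumption.

```python
def popcount(data: bytes) -> int:
    """Count the set bits in a byte string using plain, portable Python."""
    return bin(int.from_bytes(data, "big")).count("1")


def dice_coefficient(a: bytes, b: bytes) -> float:
    """Dice coefficient of two equal-length bit arrays: 2*|A & B| / (|A| + |B|)."""
    if len(a) != len(b):
        raise ValueError("bit arrays must have the same length")
    # Bitwise AND keeps only the bits set in both inputs.
    common = popcount(bytes(x & y for x, y in zip(a, b)))
    total = popcount(a) + popcount(b)
    # Assumption: two all-zero arrays are given a similarity of 0.0.
    return 2.0 * common / total if total else 0.0
```

An accelerated path would typically replace `popcount` with a hardware population-count instruction; the arithmetic around it stays the same.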

codecov bot commented Nov 30, 2022

Codecov Report

Merging #544 (bed2715) into main (ff68175) will decrease coverage by 0.02%.
The diff coverage is n/a.

Additional details and impacted files
@@            Coverage Diff             @@
##             main     #544      +/-   ##
==========================================
- Coverage   94.22%   94.19%   -0.03%     
==========================================
  Files          16       16              
  Lines         797      793       -4     
==========================================
- Hits          751      747       -4     
  Misses         46       46              

hardbyte (Collaborator) left a comment

That doesn't seem too stressful. Decent performance?

I wonder if the fast path is still faster on a modern x64 chip with a recent compiler.

wilko77 (Collaborator, Author) commented Dec 1, 2022

I get

$ python -m anonlink.benchmark
Anonlink benchmark -- see README for explanation
------------------------------------------------
using 'greedy_solve_native' as solver and 'dice_coefficient_accelerated' as similarity metric

Threshold: 0.5, All results returned
Size 1 | Size 2 | Comparisons      | Total Time (s)          | Throughput
       |        |        (match %) | (comparisons / matching)|  (1e6 cmp/s)
-------+--------+------------------+-------------------------+-------------
  1000 |   1000 |    1e6  (50.27%) |  0.478  (54.6% / 45.4%) |     3.827
  2000 |   2000 |    4e6  (50.13%) |  2.802  (37.4% / 62.6%) |     3.819
  3000 |   3000 |    9e6  (50.96%) | 10.197  (25.9% / 74.1%) |     3.405
  4000 |   4000 |   16e6  (50.76%) | 15.825  (29.2% / 70.8%) |     3.461

Threshold: 0.5, Top 100 matches per record returned
Size 1 | Size 2 | Comparisons      | Total Time (s)          | Throughput
       |        |        (match %) | (comparisons / matching)|  (1e6 cmp/s)
-------+--------+------------------+-------------------------+-------------
  1000 |   1000 |    1e6  ( 6.91%) |  0.107  (87.3% / 12.7%) |    10.690
  2000 |   2000 |    4e6  ( 3.23%) |  0.235  (86.4% / 13.6%) |    19.696
  3000 |   3000 |    9e6  ( 2.06%) |  0.395  (83.0% / 17.0%) |    27.432
  4000 |   4000 |   16e6  ( 1.49%) |  0.536  (86.8% / 13.2%) |    34.397
  5000 |   5000 |   25e6  ( 1.18%) |  0.701  (87.7% / 12.3%) |    40.683
  6000 |   6000 |   36e6  ( 0.97%) |  0.882  (88.0% / 12.0%) |    46.348
  7000 |   7000 |   49e6  ( 0.81%) |  1.070  (88.0% / 12.0%) |    52.072
  8000 |   8000 |   64e6  ( 0.71%) |  1.434  (80.0% / 20.0%) |    55.786
  9000 |   9000 |   81e6  ( 0.63%) |  1.651  (82.5% / 17.5%) |    59.452
 10000 |  10000 |  100e6  ( 0.56%) |  1.765  (86.9% / 13.1%) |    65.216
 20000 |  20000 |  400e6  ( 0.27%) |  4.491  (89.0% / 11.0%) |   100.082

Threshold: 0.7, All results returned
Size 1 | Size 2 | Comparisons      | Total Time (s)          | Throughput
       |        |        (match %) | (comparisons / matching)|  (1e6 cmp/s)
-------+--------+------------------+-------------------------+-------------
  1000 |   1000 |    1e6  ( 0.01%) |  0.006  (99.2% /  0.8%) |   172.069
  2000 |   2000 |    4e6  ( 0.01%) |  0.020  (99.2% /  0.8%) |   200.716
  3000 |   3000 |    9e6  ( 0.01%) |  0.044  (99.2% /  0.8%) |   207.419
  4000 |   4000 |   16e6  ( 0.01%) |  0.077  (99.1% /  0.9%) |   210.797
  5000 |   5000 |   25e6  ( 0.01%) |  0.117  (99.1% /  0.9%) |   215.640
  6000 |   6000 |   36e6  ( 0.01%) |  0.172  (99.1% /  0.9%) |   210.789
  7000 |   7000 |   49e6  ( 0.01%) |  0.228  (99.1% /  0.9%) |   217.104
  8000 |   8000 |   64e6  ( 0.01%) |  0.297  (99.0% /  1.0%) |   218.047
  9000 |   9000 |   81e6  ( 0.01%) |  0.380  (99.2% /  0.8%) |   214.927
 10000 |  10000 |  100e6  ( 0.01%) |  0.461  (99.0% /  1.0%) |   218.993
 20000 |  20000 |  400e6  ( 0.01%) |  1.809  (98.9% /  1.1%) |   223.652

Threshold: 0.7, Top 100 matches per record returned
Size 1 | Size 2 | Comparisons      | Total Time (s)          | Throughput
       |        |        (match %) | (comparisons / matching)|  (1e6 cmp/s)
-------+--------+------------------+-------------------------+-------------
  1000 |   1000 |    1e6  ( 0.01%) |  0.006  (99.4% /  0.6%) |   177.813
  2000 |   2000 |    4e6  ( 0.01%) |  0.020  (99.2% /  0.8%) |   198.890
  3000 |   3000 |    9e6  ( 0.01%) |  0.044  (99.0% /  1.0%) |   207.484
  4000 |   4000 |   16e6  ( 0.01%) |  0.077  (99.1% /  0.9%) |   209.673
  5000 |   5000 |   25e6  ( 0.01%) |  0.118  (99.1% /  0.9%) |   213.477
  6000 |   6000 |   36e6  ( 0.01%) |  0.173  (98.8% /  1.2%) |   209.933
  7000 |   7000 |   49e6  ( 0.01%) |  0.229  (99.0% /  1.0%) |   215.671
  8000 |   8000 |   64e6  ( 0.01%) |  0.293  (99.1% /  0.9%) |   220.039
  9000 |   9000 |   81e6  ( 0.01%) |  0.377  (99.1% /  0.9%) |   217.029
 10000 |  10000 |  100e6  ( 0.01%) |  0.460  (99.1% /  0.9%) |   219.514
 20000 |  20000 |  400e6  ( 0.01%) |  1.787  (99.1% /  0.9%) |   226.018

on an M1 Pro. I don't know how that compares, but it's still way faster than the native Python fallback... or running optimized Intel code in emulation mode.
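A note on reading the tables: the throughput column appears to be the number of comparisons divided by the time spent in the comparison phase only (total time multiplied by the comparisons share), which matches the printed rows to within rounding. The sketch below reproduces that arithmetic; it is inferred from the printed numbers rather than taken from the benchmark source, and the function name is hypothetical.

```python
def throughput_mcmp_per_s(comparisons: float,
                          total_time_s: float,
                          comparison_share: float) -> float:
    """Millions of comparisons per second, counting only comparison time.

    comparison_share is the first percentage in the
    "(comparisons / matching)" column, expressed as a fraction.
    """
    comparison_time_s = total_time_s * comparison_share
    return comparisons / comparison_time_s / 1e6


# Row "20000 | 20000 | 400e6" from the "Threshold: 0.5, Top 100" table:
# total time 4.491 s, of which 89.0% was spent on comparisons.
estimate = throughput_mcmp_per_s(400e6, 4.491, 0.890)
print(f"{estimate:.1f} million comparisons per second")
```

Because the printed time and percentage are rounded, the result only approximates the reported 100.082.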

wilko77 merged commit 5860ade into main Dec 1, 2022
wilko77 deleted the aarch64_support branch December 1, 2022 04:20

hardbyte (Collaborator) commented Dec 1, 2022

That is faster than my i7-6700K CPU @ 4.00GHz, so I'd say it compares favorably!

My benchmark results
(anonlink) [brian@hardbyte-nzxt anonlink]$ python -m anonlink.benchmark
Anonlink benchmark -- see README for explanation
------------------------------------------------
using 'greedy_solve_native' as solver and 'dice_coefficient_accelerated' as similarity metric

Threshold: 0.5, All results returned
Size 1 | Size 2 | Comparisons      | Total Time (s)          | Throughput
       |        |        (match %) | (comparisons / matching)|  (1e6 cmp/s)
-------+--------+------------------+-------------------------+-------------
  1000 |   1000 |    1e6  (50.85%) |  0.856  (51.0% / 49.0%) |     2.291
  2000 |   2000 |    4e6  (50.09%) |  3.708  (46.8% / 53.2%) |     2.303
  3000 |   3000 |    9e6  (50.14%) |  8.072  (44.3% / 55.7%) |     2.516
  4000 |   4000 |   16e6  (48.99%) | 14.502  (41.7% / 58.3%) |     2.646

Threshold: 0.5, Top 100 matches per record returned
Size 1 | Size 2 | Comparisons      | Total Time (s)          | Throughput
       |        |        (match %) | (comparisons / matching)|  (1e6 cmp/s)
-------+--------+------------------+-------------------------+-------------
  1000 |   1000 |    1e6  ( 6.85%) |  0.202  (85.9% / 14.1%) |     5.777
  2000 |   2000 |    4e6  ( 3.17%) |  0.505  (77.7% / 22.3%) |    10.197
  3000 |   3000 |    9e6  ( 2.05%) |  0.862  (83.8% / 16.2%) |    12.463
  4000 |   4000 |   16e6  ( 1.50%) |  1.115  (80.6% / 19.4%) |    17.817
  5000 |   5000 |   25e6  ( 1.17%) |  1.529  (83.9% / 16.1%) |    19.491
  6000 |   6000 |   36e6  ( 0.94%) |  1.581  (82.1% / 17.9%) |    27.738
  7000 |   7000 |   49e6  ( 0.80%) |  1.860  (82.4% / 17.6%) |    31.979
  8000 |   8000 |   64e6  ( 0.71%) |  2.273  (82.7% / 17.3%) |    34.048
  9000 |   9000 |   81e6  ( 0.63%) |  2.631  (82.4% / 17.6%) |    37.363
 10000 |  10000 |  100e6  ( 0.55%) |  3.034  (83.6% / 16.4%) |    39.406
 20000 |  20000 |  400e6  ( 0.27%) |  8.241  (87.0% / 13.0%) |    55.820

Threshold: 0.7, All results returned
Size 1 | Size 2 | Comparisons      | Total Time (s)          | Throughput
       |        |        (match %) | (comparisons / matching)|  (1e6 cmp/s)
-------+--------+------------------+-------------------------+-------------
  1000 |   1000 |    1e6  ( 0.01%) |  0.014  (99.3% /  0.7%) |    71.277
  2000 |   2000 |    4e6  ( 0.01%) |  0.040  (99.2% /  0.8%) |    99.647
  3000 |   3000 |    9e6  ( 0.01%) |  0.087  (99.2% /  0.8%) |   104.402
  4000 |   4000 |   16e6  ( 0.01%) |  0.137  (99.0% /  1.0%) |   118.190
  5000 |   5000 |   25e6  ( 0.01%) |  0.220  (99.1% /  0.9%) |   114.720
  6000 |   6000 |   36e6  ( 0.01%) |  0.306  (98.3% /  1.7%) |   119.640
  7000 |   7000 |   49e6  ( 0.01%) |  0.475  (99.2% /  0.8%) |   103.886
  8000 |   8000 |   64e6  ( 0.01%) |  0.518  (98.4% /  1.6%) |   125.617
  9000 |   9000 |   81e6  ( 0.01%) |  0.658  (97.8% /  2.2%) |   125.872
 10000 |  10000 |  100e6  ( 0.01%) |  0.842  (99.2% /  0.8%) |   119.747
 20000 |  20000 |  400e6  ( 0.01%) |  3.083  (99.0% /  1.0%) |   131.082

Threshold: 0.7, Top 100 matches per record returned
Size 1 | Size 2 | Comparisons      | Total Time (s)          | Throughput
       |        |        (match %) | (comparisons / matching)|  (1e6 cmp/s)
-------+--------+------------------+-------------------------+-------------
  1000 |   1000 |    1e6  ( 0.01%) |  0.010  (98.4% /  1.6%) |   103.681
  2000 |   2000 |    4e6  ( 0.01%) |  0.039  (98.5% /  1.5%) |   105.286
  3000 |   3000 |    9e6  ( 0.01%) |  0.073  (99.0% /  1.0%) |   123.706
  4000 |   4000 |   16e6  ( 0.01%) |  0.131  (99.1% /  0.9%) |   123.350
  5000 |   5000 |   25e6  ( 0.01%) |  0.200  (99.0% /  1.0%) |   126.244
  6000 |   6000 |   36e6  ( 0.01%) |  0.304  (99.1% /  0.9%) |   119.301
  7000 |   7000 |   49e6  ( 0.01%) |  0.392  (99.1% /  0.9%) |   126.149
  8000 |   8000 |   64e6  ( 0.01%) |  0.491  (99.0% /  1.0%) |   131.709
  9000 |   9000 |   81e6  ( 0.01%) |  0.619  (98.9% /  1.1%) |   132.317
 10000 |  10000 |  100e6  ( 0.01%) |  0.883  (98.9% /  1.1%) |   114.504
 20000 |  20000 |  400e6  ( 0.01%) |  3.145  (98.8% /  1.2%) |   128.737
