Releases: data61/anonlink

Release 0.9.0

14 Aug 03:35
e3cb0a8

This release contains a major overhaul of Anonlink’s API and introduces support for multi-party linkage.

The changes are all additive, so the previous API continues to work. That API has now been deprecated and will be removed in a future release. The deprecation timeline is:

  • v0.9.0: old API deprecated
  • v0.10.0: use of old API raises a warning
  • v0.11.0: remove old API
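The middle step of this timeline (old API still works but warns) is commonly implemented with Python's standard `warnings` module. The sketch below is illustrative only: `old_api_function` and `new_api_function` are hypothetical names, not Anonlink functions.

```python
import warnings


def new_api_function(x):
    """Hypothetical replacement function (not part of Anonlink)."""
    return x * 2


def old_api_function(x):
    """Hypothetical deprecated wrapper that forwards to the new function."""
    warnings.warn(
        "old_api_function is deprecated; use new_api_function instead",
        DeprecationWarning,
        stacklevel=2,  # attribute the warning to the caller, not this wrapper
    )
    return new_api_function(x)
```

Callers keep getting the same result during the deprecation period and can surface the warning with `python -W default` or a test runner's warning filters.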

Major changes

  • Introduce abstract similarity functions. The Sørensen–Dice coefficient is now just one possible similarity function.
    • Implement Hamming similarity as a similarity function.
    • Permit linkage of records other than CLKs (BYO similarity function).
    • Similarity functions now return multiple contiguous arrays instead of a list of tuples.
    • Candidate pairs from similarity functions are now always sorted.
  • Introduce a standard type for storing candidate pairs. This is now used consistently throughout the API.
  • Provide a function for multiparty candidate generation. It takes multiple datasets and compares them against each other using a similarity function.
  • Extend the greedy solver to multiparty problems.
    • The greedy solver also takes the new candidate pairs type.
  • Implement serialisation and deserialisation of candidate pairs.
    • Multiple files with serialised candidate pairs can be merged without loading everything into memory at once.
  • Introduce type annotations in the new API.
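To make the idea of pluggable similarity functions concrete, here is a minimal pure-Python sketch. It does not use Anonlink's API: the function names, the use of Python integers as bit vectors, the three-array return layout, and the descending-similarity sort order are all assumptions made for illustration.

```python
from array import array
from itertools import product


def popcount(x: int) -> int:
    """Number of set bits in a non-negative integer."""
    return bin(x).count("1")


def dice_similarity(a: int, b: int) -> float:
    """Sørensen–Dice coefficient of two bit vectors stored as integers."""
    total = popcount(a) + popcount(b)
    if total == 0:
        return 0.0
    return 2 * popcount(a & b) / total


def hamming_similarity(a: int, b: int, n_bits: int) -> float:
    """Fraction of the n_bits positions where the two bit vectors agree."""
    return 1 - popcount(a ^ b) / n_bits


def candidate_pairs(dataset0, dataset1, similarity, threshold):
    """Compare every record pair and keep those at or above the threshold.

    Returns three parallel arrays (similarities, indices into dataset0,
    indices into dataset1), sorted by descending similarity. The sort
    order is an assumption of this sketch.
    """
    scored = [
        (similarity(rec0, rec1), i, j)
        for (i, rec0), (j, rec1) in product(enumerate(dataset0), enumerate(dataset1))
    ]
    kept = sorted(
        (t for t in scored if t[0] >= threshold),
        key=lambda t: (-t[0], t[1], t[2]),
    )
    sims = array("d", (s for s, _, _ in kept))
    idx0 = array("I", (i for _, i, _ in kept))
    idx1 = array("I", (j for _, _, j in kept))
    return sims, idx0, idx1


sims, idx0, idx1 = candidate_pairs(
    [0b1111, 0b1010], [0b1111, 0b0101], dice_similarity, threshold=0.5
)
```

Any callable with the same shape can be passed as `similarity`, for example `lambda a, b: hamming_similarity(a, b, 4)`, which is the "bring your own similarity function" idea in miniature.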

Minor changes

  • Automatically test on Python 3.7.
  • Remove support for Python 3.5 and below.
  • Update Clkhash dependency to 0.11.
  • Minor documentation and style fixes in anonlink.concurrency.
  • Provide a convenience function for generating valid candidate pairs from a chunk.
  • Change the format of a chunk and move the type definition to anonlink.typechecking.

See the changelog for details.

Release 0.8.2

25 Jul 07:55
cffabb3

Minor updates:

  • Fix discrepancies between Python and C++ versions #102
  • Utility added to anonlink/concurrency.py to help with chunking.
  • Better GitHub status messages posted by Jenkins.

Release 0.8.1

18 May 04:10

Just minor fixes and improvements in this release.

Release 0.8.0

18 Apr 00:32
23a7be7

Fix the greedy solver so that each mapping is set by the first match instead of being repeatedly overwritten. #89
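The intended behaviour can be illustrated with a generic greedy matcher. This is a sketch of the general technique, not Anonlink's solver: the function name and the `(similarity, i, j)` tuple layout are assumptions.

```python
def greedy_solve(candidates):
    """Greedily map left indices to right indices.

    candidates: iterable of (similarity, i, j) tuples. Pairs are visited
    in order of descending similarity, and a record that is already
    matched is never reassigned.
    """
    mapping = {}
    used_right = set()
    for _sim, i, j in sorted(candidates, key=lambda c: -c[0]):
        if i in mapping or j in used_right:
            continue  # first (best) match wins; do not overwrite it
        mapping[i] = j
        used_right.add(j)
    return mapping


result = greedy_solve([(0.9, 0, 0), (0.8, 0, 1), (0.7, 1, 0), (0.6, 1, 1)])
print(result)  # {0: 0, 1: 1}
```

Without the `continue` guard, the later and weaker pair `(0.8, 0, 1)` would replace the stronger `(0.9, 0, 0)` match, which is the kind of overwrite this fix describes.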

Other improvements

  • Order of k and threshold parameters now consistent across library
  • Limit size of k to prevent OOM DoS
  • Fix misaligned pointer handling #77

Install from PyPI:

pip install anonlink==0.8.0

0.7.0

21 Mar 00:18

Introduces support for comparing "arbitrary" length cryptographic linkage keys.

The benchmark is now much more comprehensive and more comparable between releases; see the
readme for an example report.

Other improvements

  • Internal C/C++ cleanup/refactoring and optimization.
  • Expose the native popcount implementation to Python.
  • Bug fix to avoid configuring a logger.
  • Testing now uses py.test and runs on Travis CI.

You can test the release from PyPI:

$ pip install anonlink==0.7.0
$ pip install clkhash
$ python -m anonlink.benchmark

0.6.2

14 Dec 10:50

Available on PyPI:

$ pip install anonlink==0.6.2

To run the benchmarks, first install clkhash:

$ pip install clkhash
$ python -m anonlink.benchmark
100000 x 1024 bit popcounts in 0.016641 seconds
Popcount speed: 733.55 MiB/s
Size 1 | Size 2 | Comparisons  | Compute Time | Million Comparisons per second
  1000 |   1000 |      1000000 |    0.073s    |        13.710
  2000 |   2000 |      4000000 |    0.129s    |        31.024
  3000 |   3000 |      9000000 |    0.247s    |        36.464
  4000 |   4000 |     16000000 |    0.406s    |        39.425
  5000 |   5000 |     25000000 |    0.510s    |        49.067
  6000 |   6000 |     36000000 |    0.533s    |        67.603
  7000 |   7000 |     49000000 |    0.543s    |        90.299
  8000 |   8000 |     64000000 |    0.594s    |       107.682
  9000 |   9000 |     81000000 |    0.627s    |       129.188
 10000 |  10000 |    100000000 |    0.824s    |       121.289
 20000 |  20000 |    400000000 |    2.902s    |       137.815
Single Core:
  5000 |   5000 |     25000000 |    0.243s    |       102.941

(These results are from a high-end laptop.)
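For readers who want to see how the derived columns relate to the raw timings, the arithmetic can be reproduced as follows. The helper names are made up for this sketch, and because the printed times are rounded, recomputed rates only match the table approximately.

```python
def popcount_throughput_mib_s(n_vectors: int, bits: int, seconds: float) -> float:
    """MiB of bit-vector data processed per second."""
    mib = n_vectors * bits / 8 / 2**20
    return mib / seconds


def million_comparisons_per_second(size1: int, size2: int, seconds: float) -> float:
    """All-pairs comparison rate, in millions per second."""
    return size1 * size2 / seconds / 1e6


throughput = popcount_throughput_mib_s(100_000, 1024, 0.016641)
print(f"{throughput:.2f} MiB/s")  # 733.55 MiB/s

rate = million_comparisons_per_second(20_000, 20_000, 2.902)
print(f"{rate:.1f} million comparisons per second")
```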

Notable changes since 0.5.x:

  • Client-side code has been removed.
  • C/C++ performance improvements.
  • Packaging and testing improvements.
  • Testing and benchmarking against clkhash 0.8.