Release 0.9.0

nbgl released this 14 Aug 03:35
e3cb0a8

This release contains a major overhaul of Anonlink’s API and introduces support for multi-party linkage.

The changes are all additive, so the previous API continues to work. That API has now been deprecated and will be removed in a future release. The deprecation timeline is:

  • v0.9.0: old API deprecated
  • v0.10.0: use of old API raises a warning
  • v0.11.0: old API removed

Major changes

  • Introduce abstract similarity functions. The Sørensen–Dice coefficient is now just one possible similarity function.
    • Implement Hamming similarity as a similarity function.
    • Permit linkage of records other than CLKs (bring your own similarity function).
    • Similarity functions now return multiple contiguous arrays instead of a list of tuples.
    • Candidate pairs from similarity functions are now always sorted.
  • Introduce a standard type for storing candidate pairs. This is now used consistently throughout the API.
  • Provide a function for multiparty candidate generation. It takes multiple datasets and compares them against each other using a similarity function.
  • Extend the greedy solver to multiparty problems.
    • The greedy solver also takes the new candidate pairs type.
  • Implement serialisation and deserialisation of candidate pairs.
    • Multiple files with serialised candidate pairs can be merged without loading everything into memory at once.
  • Introduce type annotations in the new API.
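
The ideas in the list above can be sketched in plain Python. This is a rough illustration only and does not use Anonlink's API: every name here (dice, hamming, candidate_pairs, merge_sorted, greedy_solve) is hypothetical, records are assumed to be bit vectors packed into ints, and the solver assumes a simplified rule that a group may hold at most one record per dataset.

```python
import heapq
from itertools import combinations
from typing import Callable, Dict, Iterable, Iterator, List, Sequence, Tuple

# All names below are hypothetical and for illustration only.
RecordId = Tuple[int, int]                    # (dataset index, record index)
Candidate = Tuple[float, RecordId, RecordId]  # (similarity, record, record)
Similarity = Callable[[int, int], float]      # records are bit vectors stored as ints


def popcount(x: int) -> int:
    return bin(x).count("1")


def dice(a: int, b: int) -> float:
    """Sørensen–Dice coefficient of two bit vectors."""
    total = popcount(a) + popcount(b)
    return 2 * popcount(a & b) / total if total else 0.0


def hamming(a: int, b: int, length: int = 8) -> float:
    """Hamming similarity: fraction of the `length` bit positions that agree."""
    return 1 - popcount(a ^ b) / length


def _sort_key(c: Candidate):
    # Highest similarity first; ties broken by record identifiers.
    return (-c[0], c[1], c[2])


def candidate_pairs(datasets: Sequence[Sequence[int]],
                    similarity: Similarity,
                    threshold: float) -> List[Candidate]:
    """Compare every pair of datasets and return sorted candidate pairs."""
    out: List[Candidate] = []
    for (i, ds_i), (j, ds_j) in combinations(enumerate(datasets), 2):
        for p, a in enumerate(ds_i):
            for q, b in enumerate(ds_j):
                score = similarity(a, b)
                if score >= threshold:
                    out.append((score, (i, p), (j, q)))
    out.sort(key=_sort_key)
    return out


def merge_sorted(*streams: Iterable[Candidate]) -> Iterator[Candidate]:
    """Lazily merge already-sorted streams without materialising them."""
    return heapq.merge(*streams, key=_sort_key)


def greedy_solve(candidates: Iterable[Candidate]) -> List[List[RecordId]]:
    """Greedily group records, best pairs first (simplified assumption:
    a group may contain at most one record from each dataset)."""
    group_of: Dict[RecordId, List[RecordId]] = {}
    for _score, a, b in candidates:
        ga = group_of.setdefault(a, [a])
        gb = group_of.setdefault(b, [b])
        if ga is gb:
            continue  # already linked
        datasets_a = {d for d, _ in ga}
        if any(d in datasets_a for d, _ in gb):
            continue  # merging would link two records from one dataset
        ga.extend(gb)
        for r in gb:
            group_of[r] = ga
    groups = {id(g): g for g in group_of.values() if len(g) > 1}
    return sorted(sorted(g) for g in groups.values())


# Three toy datasets of 8-bit records.
datasets = [
    [0b11110000, 0b00001111],
    [0b11110001, 0b00001110],
    [0b11100000],
]
pairs = candidate_pairs(datasets, dice, threshold=0.7)
groups = greedy_solve(pairs)
```

Because the solver only consumes an iterable of sorted candidates, it could equally be fed from merge_sorted over several sorted sources, and swapping dice for hamming changes the scores without touching the rest of the pipeline.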

Minor changes

  • Automatically test on Python 3.7.
  • Remove support for Python 3.5 and below.
  • Update Clkhash dependency to 0.11.
  • Minor documentation and style changes in anonlink.concurrency.
  • Provide a convenience function for generating valid candidate pairs from a chunk.
  • Change the format of a chunk and move the type definition to anonlink.typechecking.

See the changelog for details.