Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add ML-DSA-ipd and ML-KEM-ipd & NIST supplied test vectors #1626

Merged
merged 27 commits into from
Feb 19, 2024
Merged

Conversation

bhess
Copy link
Member

@bhess bhess commented Dec 12, 2023

Pull ML-DSA-ipd / ML-KEM-ipd from pq-crystals "standard" branch.
Adds test cases with NIST supplied test vectors: https://csrc.nist.gov/Projects/post-quantum-cryptography/post-quantum-cryptography-standardization/example-files (see also Note on the intermediate values for ML-KEM: and Note on the intermediate values for ML-DSA:)

The NIST files are rather large (several MB) and we only use parts of the data that can be passed via the public API + randomness. So we process them and store only the data we need in tests/PQC_Intermediate_Values (script is included to reproduce the files)

The PR adds new test code vectors_sig.c and vectors_kem.c that for testing the NIST vectors.

The idea is to keep the current Kyber/Dilithium (round 3) versions for compatibility reasons, and add the ML-* algorithms additionally.

The naming of the algorithms is done following the IETF hackathon: ML-DSA-44-ipd, ML-DSA-65-ipd, ML-DSA-87-ipd, ML-KEM-512-ipd, ML-KEM-768-ipd, ML-KEM-1024-ipd: https://github.com/IETF-Hackathon/pqc-certificates/blob/master/docs/oid_mapping.md

TODOs:

  • Add NIST supplied test vectors and code to test them
  • Compact NIST test vectors: the files are quite large and we need only a small part of the value
  • Pull ML-DSA-ipd from pq-crystals "standard" branch
  • Pull ML-KEM-ipd from pq-crystals "standard" branch
  • add patches for copy_from_upstream (for renaming to ML-KEM-ipd / ML-DSA-ipd)
  • Pull pqclean aarch64 code (may omit for now)
  • Integrate tests with CI
  • Update docs
  • Ensure updating to latest upstream versions (e.g. with Kyberslash fixes)
  • Run constant-time tests on new algorithms & update valgrind files

Closes #1521.

  • Does this PR change the input/output behaviour of a cryptographic algorithm (i.e., does it change known answer test values)? (If so, a version bump will be required from x.y.z to x.(y+1).0.)
  • Does this PR change the list of algorithms available -- either adding, removing, or renaming? Does this PR otherwise change an API? (If so, PRs in fully supported downstream projects dependent on these, i.e., oqs-provider and OQS-OpenSSH will also need to be ready for review and merge by the time this is merged.)

.CMake/alg_support.cmake Outdated Show resolved Hide resolved
src/kem/kem.h Outdated Show resolved Hide resolved
Copy link
Member

@baentsch baentsch left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for adding these algorithms: I very much like how you use our "import machinery" to transform the "old" upstream names into the standard ones. Conceptually this all looks OK -- I'm just not happy with adding version information to the algorithm names: Those may be OK for a hackathon that is not meant to be used for more than that event, but algorithm names with version information included IMO adds an undue burden to downstream integrations: Anyone will have to update their source code as and when the "-ipd" suffix goes away (as opposed to just re-compile or only re-link iff a code change occurs during the transition to final standard).

@bhess
Copy link
Member Author

bhess commented Jan 25, 2024

Thanks for the review and comment @baentsch. Good point regarding naming, including the versioning in the names would indeed be a change from the current approach. I see the point of ease of re-compiling and re-deploying. However, I think there are arguments for including the versioning:
For a downstream users it seems not directly obvious if a compatibility-breaking update happened when the same name is kept (besides looking at the release notes or documentation). They in addition have to make sure that compatibility is kept, e.g. by updating OIDs for certificates. I'd argue that a name-update can help downstream to resolve ambiguities and avoid compatibility-breaking changes.
For the NIST "competition" algorithms, keeping the same name seems to be not much an issue as the experimenting/prototyping users will just update everything to the latest versions.
With the (almost) standard algorithms and practical deployments like the Kyber-r3-hybrid in Chrome it can be useful for interoperability to identify the exact version, and to be able to support different versions at the same time (r3, ipd, final ML-KEM).
A versioning scheme for algorithms is also suggested by the IETF-PQUIP naming convention.

@baentsch
Copy link
Member

For a downstream users it seems not directly obvious if a compatibility-breaking update happened when the same name is kept (besides looking at the release notes or documentation). They in addition have to make sure that compatibility is kept, e.g. by updating OIDs for certificates. I'd argue that a name-update can help downstream to resolve ambiguities and avoid compatibility-breaking changes.

The above has different meanings of the word "user" (of liboqs), so let me be precise:
Let me call "End users" those users not knowing about software development but just interested in deploying PQC -- at most writing nginx config files, but typically just reading documentation, writing papers or toggling algorithm name switches. Big majority by far.
Let me call "PQ integrators" those users writing code to make use of liboqs, e.g., entities changing code and assigning OIDs. No so numerous.

If PQ integrators need name changes to ascertain they also update their OIDs and link the right libraries, they may need to reconsider their job choice.

The argumentation of the IETF

However, the community wants to avoid the situation where we over-eagerly adopt the new names, then see a compatibility-breaking change in a future iteration of the FIPS draft standards, and then have multiple non-compatible algorithms with the same name.

indeed seems to indicate a lack of trust in the capabilities of software developers and interop testers to weed out possibly incompatible implementations using the same name.

Does this mean that if there is a step after "IPD", let's call it "SPD" that everyone would need to update their software again? Good business for IT consultancies paid by the hour, bad choice for the End Users footing the bill.

it can be useful for interoperability to identify the exact version, and to be able to support different versions at the same time (r3, ipd, final ML-KE

This is a good argument. But it is relevant only for comparative testing, i.e., development and test purposes. Should End Users be impacted by this (currently still interesting, but in the long run irrelevant) use case?

@baentsch
Copy link
Member

Re-reading this discussion, what about the idea to introduce a macro "-ipd" now? That way we can satisfy the IETF folks placing no trust in software developers and allow people knowing what they do to minimize their long-term work effort (by taking the "risk" of using the non-ipd variant)?

@dstebila
Copy link
Member

@bhess Stepping back from the discussion about -IPD naming in that one specific file to a more general question: what was your philosophy around when to use the ML-KEM-IPD naming and when to use plain ML-KEM?

If ML-KEM-IPD != ML-KEM, then there may be a time period where we want to have both ML-KEM-IPD and ML-KEM present in the library, similar to how we're currently imagining wanting to have both Kyber and ML-KEM-IPD present in the library. I'm not sure that would be possible with the current naming in the PR...

@bhess
Copy link
Member Author

bhess commented Jan 26, 2024

what about the idea to introduce a macro "-ipd" now?

I like the idea of a macro/alias. What about the other way around, i.e. "ML-KEM-512" is an alias of "ML-KEM-512-ipd"? Should there be something like an "-spd" version, the ML-KEM-512 alias can just become an alias of the -spd version. The same would then be possible for the upcoming on-ramp signatures ("OnrampSig"-R1, "OnrampSig"-R2, ..., and an alias "OnrampSig").

what was your philosophy around when to use the ML-KEM-IPD naming and when to use plain ML-KEM?

The idea was to have a family name "ML-KEM" the individual schemes are called "ML-KEM-512-ipd", "ML-KEM-768-ipd" and "ML-KEM-1024-ipd".

I'm not sure that would be possible with the current naming in the PR...

I think you are right, the import logic builds the internal API names using "name"_"scheme":

-
name: ml_kem
default_implementation: ref
upstream_location: pqcrystals-kyber-standard
schemes:
-
scheme: "512"
pqclean_scheme: ml-kem-512-ipd
pretty_name_full: ML-KEM-512-ipd

If "scheme" is named "512_ipd" instead then it should work.

I'll update the PR next week.

@baentsch
Copy link
Member

I like the idea of a macro/alias. What about the other way around, i.e. "ML-KEM-512" is an alias of "ML-KEM-512-ipd"?

Thanks & fine with me. As long as "ML-KEM-512" (also) becomes available now. Analog for the other "ML..." (sig) algs & strengths.

@bhess
Copy link
Member Author

bhess commented Jan 30, 2024

Updated the PR with:

  1. Consistent API naming ml_*_ipd. This should address the use case when multiple algorithm versions should co-exist for some time.
  2. Added an alias: e.g. "ML-KEM-512" is an alias of "ML-KEM-512-ipd.
    To use aliases, add 'alias_scheme' and 'alias_pretty_name_full' fields to copy_from_upstream.yml.
    All additional code for the aliases is generated by the jinja-templates.
    The algorithm name and alias are usable interchangably, meaning: (1) the string can be provided to OQS_KEM_new, OQS_KEM_alg_is_enabled, (2) e.g. both OQS_KEM_ml_kem_512_ipd_length_public_key and OQS_KEM_ml_kem_512_length_public_key are available to help users using them directly instead of the OQS_KEM structs.
    The alias feature can be used by any algorithms (automated if imported via copy_from_upstream).

README.md Outdated Show resolved Hide resolved
@baentsch
Copy link
Member

baentsch commented Feb 7, 2024

As a follow-up to https://github.com/orgs/open-quantum-safe/discussions/1689 this is to ask whether this PR should also change the designation of what is a "STD" algorithm, at the least dropping Kyber and Dilithium.

@baentsch
Copy link
Member

baentsch commented Feb 8, 2024

Thanks for the updates to the STD algs list, @bhess. Only item left: Could you please trigger downstream testing and thus the ML-* PR on oqs-provider as discussed (by way of adding the statement "[trigger downstream]" to a commit message)?

@bhess
Copy link
Member Author

bhess commented Feb 8, 2024

As a follow-up to https://github.com/orgs/open-quantum-safe/discussions/1689 this is to ask whether this PR should also change the designation of what is a "STD" algorithm, at the least dropping Kyber and Dilithium.

Agreed, the STD algorithms should contain the standards eventually, and until the standards are finalized only the latest revisions.

Thanks for the updates to the STD algs list, @bhess. Only item left: Could you please trigger downstream testing and thus the ML-* PR on oqs-provider as discussed (by way of adding the statement "[trigger downstream]" to a commit message)?

Thanks, added the [trigger downstream] message now (wanted to have a passing build first before triggering downstream tests).

@bhess bhess merged commit 60adf53 into main Feb 19, 2024
61 checks passed
@bhess bhess mentioned this pull request Feb 21, 2024
2 tasks
@bhess bhess deleted the bhe-fips-ipd branch May 30, 2024 15:08
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Update Kyber and Dilithium to FIPS versions
4 participants