"Shall Destroy, Did Not": Recovering ML-DSA Private Keys from wolfSSL's Heap

wolfSSL's ML-DSA signing implementation does not destroy private key material after use, violating FIPS 204 Section 3.6.3. The unzeroed heap block is recoverable via same-process allocation, enabling end-to-end signature forgery.

Summary

wolfSSL’s ML-DSA signing implementation frees a ~50 KB heap block containing private signing material (s1, s2, t0 in NTT form) without clearing it, violating FIPS 204 Section 3.6.3. For a same-process attacker able to allocate and read a same-size heap block, this material is recoverable – demonstrated on glibc across three Linux distributions. Recovery of s1 is sufficient for full signing-key compromise and arbitrary signature forgery, verified end-to-end against the compiled libwolfssl binary.

wolfSSL confirmed the finding, patched it (#10100, #10113), and credited the reporter. It declined to assign a CVE.

Affected: wolfSSL v5.7.2 – v5.9.0-stable, native WOLFSSL_WC_DILITHIUM builds (requires --enable-mldsa or --enable-dilithium; not included in --enable-all). Fix: v5.9.1 (April 8, 2026). Update to v5.9.1 or later.

Off-process recovery primitives (core-dump ingest, cross-process /proc/$pid/mem) are covered in a follow-up post.


How It Works

The attacker is code running in the same process as the ML-DSA signing operation – a plugin, callback handler, co-loaded library, scripting engine, or any component that shares the process address space. The attacker can call malloc and read the returned buffer. No memory-corruption vulnerability, core dump, /proc/pid/mem access, or privilege beyond normal same-process execution is required.

The missed ForceZero

dilithium_sign_with_seed_mu() in wolfcrypt/src/dilithium.c, line 8417:

    XFREE(y, key->heap, DYNAMIC_TYPE_DILITHIUM);  // no ForceZero
    return ret;

wolfSSL already fixed this exact pattern in dilithium keygen (643427040), ed25519 signing (5f7bc0f3a), and ed448 signing (109e765b5). The ML-DSA signing path was missed.
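The missing step is a wipe that the compiler cannot optimize away, performed before the block is released. The following is a minimal sketch of that pattern, not the actual wolfSSL patch: `secure_zero` and `zero_and_free` are hypothetical names standing in for wolfSSL's `ForceZero` followed by `XFREE`, and plain `free` stands in for the library's allocator hook.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for ForceZero(): the stores go through a
 * volatile pointer so they cannot be removed as dead writes even
 * though the buffer is freed immediately afterwards. */
static void secure_zero(void *p, size_t len)
{
    volatile unsigned char *v = (volatile unsigned char *)p;
    while (len--) {
        *v++ = 0;
    }
}

/* Wipe a sensitive scratch block, then hand it back to the allocator. */
static void zero_and_free(void *p, size_t len)
{
    if (p != NULL) {
        secure_zero(p, len);
        free(p);
    }
}
```

A plain `memset` before `free` is not a reliable substitute, because an optimizing compiler is allowed to drop stores to memory that is never read again.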

The freed block is 50,176 bytes for ML-DSA-44. With WC_DILITHIUM_CACHE_PRIV_VECTORS off (the default), it contains the private-key polynomials at fixed offsets:

| Offset | Contents | Note |
|--------|----------|------|
| 21504 | s1 – static secret signing key | NTT-small domain |
| 25600 | s2 | NTT domain |
| 29696 | t0 | NTT domain |

(Offsets are for ML-DSA-44 with default build configuration. ML-DSA-65/87 use a different block size but the same bug.)

Exploit chain

  1. Application signs message M1. wolfSSL allocates the ~50 KB block, writes s1/s2/t0 in NTT form, frees it without zeroing.
  2. Any code in the same process calls malloc(50176). The chunk is too large for glibc’s tcache (~1 KB max) and goes through the unsorted/large-bin path; for a single-threaded process with no intervening same-size allocations, malloc(50176) returns the same chunk with its payload intact. Read s1 from offset 21504.
  3. Forge a signature on a different message M2 using s1 plus the public key.

s1 is the static signing key. One recovery = permanent key compromise for every message under that key.
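Step 2 can be reproduced without wolfSSL at all. The sketch below is an illustration under stated assumptions, not the PoC: it plants a marker where s1 would sit, frees the block without clearing it, and checks whether a same-size allocation still shows the marker. `probe_stale_heap` is a hypothetical helper; the block size and offset are the ML-DSA-44 values quoted above. The result depends on the allocator, and reading a fresh allocation yields unspecified contents by the C standard, which is exactly the behavior being probed.

```c
#include <stddef.h>
#include <stdlib.h>

#define BLOCK_SIZE 50176u  /* ML-DSA-44 block size quoted in the post */
#define S1_OFFSET  21504u  /* offset of s1 quoted in the post */
#define MARK_LEN   16u     /* length of the stand-in "secret" */

/* Returns 1 if a fresh same-size allocation still holds the marker,
 * 0 if it does not, and -1 if an allocation fails. */
static int probe_stale_heap(void)
{
    size_t i;
    int hit = 1;

    /* "Victim": write a marker where s1 would live, free without zeroing.
     * Volatile access keeps the compiler from dropping the stores. */
    volatile unsigned char *victim = malloc(BLOCK_SIZE);
    if (victim == NULL) {
        return -1;
    }
    for (i = 0; i < MARK_LEN; i++) {
        victim[S1_OFFSET + i] = (unsigned char)(0xA5u ^ i);
    }
    free((void *)victim);

    /* "Attacker": request the same size and inspect the stale payload. */
    volatile unsigned char *probe = malloc(BLOCK_SIZE);
    if (probe == NULL) {
        return -1;
    }
    for (i = 0; i < MARK_LEN; i++) {
        if (probe[S1_OFFSET + i] != (unsigned char)(0xA5u ^ i)) {
            hit = 0;
        }
    }
    free((void *)probe);
    return hit;
}
```

The function makes no promise about which value it returns on a given system; hardened allocators and memory-checking tools will typically change or flag the outcome.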

Why the forged signature verifies (perturbation-bound detail)

The forgery uses hint reconstruction; s2 and t0 are not needed. For ML-DSA-44, tau*(2^(d-1) + eta) = 159,822 < 2*gamma2 = 190,464, which guarantees the reconstructed hints are always correct and the forged signature passes wc_dilithium_verify_msg().
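The inequality is easy to check mechanically. The sketch below recomputes both sides from the ML-DSA-44 parameters in FIPS 204 (q = 8380417, d = 13, tau = 39, eta = 2, gamma2 = (q - 1)/88); the function names are hypothetical and the code covers only the arithmetic, not the forgery itself.

```c
#include <stdint.h>

/* ML-DSA-44 parameters as given in FIPS 204. */
enum {
    MLDSA_Q        = 8380417,
    MLDSA_D        = 13,
    MLDSA44_TAU    = 39,
    MLDSA44_ETA    = 2,
    MLDSA44_GAMMA2 = (MLDSA_Q - 1) / 88   /* 95232 */
};

/* Left-hand side: tau * (2^(d-1) + eta). */
static int64_t perturbation_bound(void)
{
    return (int64_t)MLDSA44_TAU * ((1 << (MLDSA_D - 1)) + MLDSA44_ETA);
}

/* Right-hand side: 2 * gamma2. */
static int64_t hint_window(void)
{
    return 2 * (int64_t)MLDSA44_GAMMA2;
}
```

With these parameters, `perturbation_bound()` evaluates to 39 * 4098 = 159,822 and `hint_window()` to 2 * 95,232 = 190,464, matching the figures above.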


PoC

The PoC (poc_heap_forgery_v2.c) includes wolfcrypt/src/dilithium.c directly for access to static NTT/expand functions. A companion verifier (verify_forged.c) links against the compiled libwolfssl binary – not an inlined copy – and independently confirms wc_dilithium_verify_msg() accepts the forged signature:

    $ ./poc_heap_forgery_v2
    ...
    FORGERY OK - wc_dilithium_verify_msg accepted forged sig on m2

    $ ./verify_forged
    VERIFIED - linked libwolfssl accepted the forged signature

Test results (wolfSSL v5.9.0-stable, -O2):

| Platform | Arch | libc | Result |
|----------|------|------|--------|
| Ubuntu 22.04.5 (Azure B2ls_v2) | x86_64 | glibc 2.35 | 10/10 |
| Amazon Linux 2023 | x86_64 | glibc 2.34 | 5/5 |
| Ubuntu 20.04 | x86_64 | glibc 2.31 | 5/5 |

Heap reclamation is allocator-dependent. macOS libmalloc did not return the same chunk on sequential free -> malloc (0/100); the freed block still contains s1 but recovery needs a different primitive (core dump or direct process-memory readback). musl and jemalloc were not tested.


Why It Matters

“implementations of ML-DSA shall ensure that any potentially sensitive intermediate data is destroyed as soon as it is no longer needed.”

– FIPS 204, Section 3.6.3

Under NIST conventions, “shall” is normative. The unzeroed heap block containing s1, s2, and t0 directly violates this requirement. wolfSSL’s active FIPS 140-3 certificate #4718 (wolfCrypt v5.2.1) does not include ML-DSA in its validated boundary, but wolfSSL has a pending FIPS 140-3 submission with ML-DSA in scope; the §3.6.3 violation is relevant to that submission.

wolfSSL confirmed and patched the finding but declined to assign a CVE, classifying it as a bug requiring a second vulnerability to exploit. The follow-up post addresses that objection.


Mitigations

Update to v5.9.1 or later. Process isolation, zero-on-free XMALLOC/XFREE hooks, and WC_DILITHIUM_CACHE_PRIV_VECTORS are partial workarounds only; none is a substitute for the patch.
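For builds that cannot be updated immediately, a zero-on-free hook needs to know each block's size, which a plain free callback does not receive. The sketch below is one possible approach and is not taken from wolfSSL: it stores the size in a header in front of every allocation. All `zf_` names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Header holding the user-visible size; the union pads it so the
 * pointer returned to the caller keeps maximum alignment. */
typedef union {
    size_t      size;
    max_align_t align;
} zf_header;

/* Volatile stores so the wipe survives optimization. */
static void zf_wipe(void *p, size_t n)
{
    volatile unsigned char *v = (volatile unsigned char *)p;
    while (n--) {
        *v++ = 0;
    }
}

static void *zf_malloc(size_t n)
{
    zf_header *h;
    if (n > SIZE_MAX - sizeof(zf_header)) {
        return NULL;                       /* size would overflow */
    }
    h = malloc(sizeof(zf_header) + n);
    if (h == NULL) {
        return NULL;
    }
    h->size = n;
    return h + 1;                          /* user data follows the header */
}

static void zf_free(void *p)
{
    zf_header *h;
    if (p == NULL) {
        return;
    }
    h = (zf_header *)p - 1;
    zf_wipe(p, h->size);                   /* clear before releasing */
    free(h);
}

/* Always moves the data, so the old block is wiped as well. */
static void *zf_realloc(void *p, size_t n)
{
    size_t old;
    void *q;
    if (p == NULL) {
        return zf_malloc(n);
    }
    q = zf_malloc(n);
    if (q == NULL) {
        return NULL;
    }
    old = ((zf_header *)p - 1)->size;
    memcpy(q, p, old < n ? old : n);
    zf_free(p);
    return q;
}
```

In a default wolfSSL build these three functions should be registrable before any other library call with `wolfSSL_SetAllocators(zf_malloc, zf_free, zf_realloc)`; that assumes the standard callback signatures, which differ in debug-memory and static-memory configurations. As noted above, this remains a partial workaround and only covers allocations that go through the hooks.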


Timeline

| Date | Event |
|------|-------|
| 2026-03-28 | Reported to wolfSSL with forgery PoC |
| 2026-03-30 | Confirmed and patched: PRs #10100 and #10113 |
| 2026-04-02 | CVE declined; ticket closed |
| 2026-04-13 | Public disclosure; posted to oss-security 2026-04-14 |
| 2026-04-17 | Follow-up post: core-dump and cross-process /proc/mem recovery |

This post is licensed under CC BY 4.0 by the author.