
OpenSSL's ARM64 SM2 Path Leaks a Private-Key Timing Fingerprint

OpenSSL's optimized SM2 scalar multiplication has data-dependent branches on the private key. Direct measurement of the EC_POINT_mul call used by SM2 decryption shows a correlation of r = -0.9828 between runtime and the key's zero-nibble count, with a slope of -389 ns per zero nibble. This leaks a stable aggregate private-key fingerprint (about 3 bits), and the same non-constant-time branch pattern may expose richer traces to cache-based attacks. Affects ARM64 (measured) and RISC-V (same source, not measured); x86_64 is unaffected. SM2 is required for systems subject to Chinese commercial cryptography regulations.

Summary

OpenSSL’s optimized SM2 decryption path on ARM64 branches on zero nibbles of the long-term private key scalar. Direct measurement of the EC_POINT_mul call used by SM2 decrypt shows r = -0.9828 correlation between runtime and private key structure, with a slope of -389 ns per zero nibble. The x86_64 generic path shows no comparable signal. This demonstrates leakage of a stable aggregate private-key fingerprint — not full key recovery, but a clear violation of the constant-time property that cryptographic implementations are expected to provide.

SM2 is not an obscure algorithm. It is required by Chinese commercial cryptography regulations (GB/T 32918, GM/T 0024) for government information systems, banking networks, telecommunications infrastructure, and IoT devices. The bug affects ARM64 (measured) and RISC-V (same source pattern, unmeasured) — architectures widely used in Chinese cloud and IoT infrastructure. The x86_64 path is clean.

The P-256 implementation in the same codebase already handles this correctly, using constant-time table lookups and unconditional point operations. The SM2 code does not follow that pattern. A previous CVE (CVE-2025-9231) fixed a different timing issue in the same source file but did not modify the scalar multiplication loop.

OpenSSL reviewed this finding and determined it falls outside their threat model for same-physical-system side channels. No CVE was assigned.

Why this matters: the value of a long-term private scalar measurably changes OpenSSL’s runtime on ARM64. The measured leak is small in entropy terms, but cryptographic scalar multiplication is expected not to branch on secret data at all. This is especially relevant because SM2 is used in Chinese commercial-cryptography deployments and the affected path is an optimized, architecture-specific implementation.

Affected: Measured on master commit 5199c5b98a; the same ARM64 optimized scalar-multiplication code pattern is present in OpenSSL 3.2.0 through 3.6.1 (introduced in commit 6399d7856c; absent in 3.1.x). RISC-V compiles the same source path (confirmed by code inspection, not measured). x86_64 is not affected. Check your build with nm libcrypto.so | grep ecp_sm2p256_point_P_mul on an unstripped library: if the symbol is absent, you are on the generic constant-time path.


Why SM2 Matters

Under GB/T 32918 and GM/T 0024, SM2 is required for systems subject to Chinese commercial cryptography compliance:

  • Chinese government information systems
  • Banking and financial networks (PBC, UnionPay infrastructure)
  • Telecommunications infrastructure (TLCP — China’s TLS variant)
  • IoT devices requiring Chinese cryptographic certification
  • Any system subject to commercial cryptography compliance review (密码应用安全性评估, the commercial cryptography application security assessment)

The optimized code path in ecp_sm2p256.c exists specifically because SM2 performance matters on these deployment platforms, many of which run on ARM64 or RISC-V:

  • Alibaba Cloud — Yitian 710 (ARM64, custom Neoverse)
  • Huawei Cloud — Kunpeng 920 (ARM64, HiSilicon)
  • Tencent Cloud — ARM64 instances available
  • RISC-V — T-Head C906/C910 (Alibaba), SpacemiT K1, IoT controllers

OpenSSL’s security policy excludes same-physical-system side channels from its threat model. That policy suits many OpenSSL deployments, but SM2’s primary deployment context is cloud infrastructure, where co-tenancy is the architectural norm rather than the exception.


The Bug

ecp_sm2p256_point_P_mul_by_scalar() (lines 370–414 of ecp_sm2p256.c) processes the 256-bit private key scalar in 4-bit nibble windows. Two branches make execution time depend on the scalar value:

for (i = 64 - 1; i >= 0; --i) {
    index = (k[i / 16] >> (4 * (i % 16))) & mask;  // 4-bit nibble of private key d

    if (init == 0) {
        if (index) {                                  // BRANCH 1: leading zero detection
            memcpy(R, &precomputed[index], sizeof(P256_POINT));
            init = 1;
        }
    } else {
        ecp_sm2p256_point_double(R, R);               // 4x double (always runs)
        ecp_sm2p256_point_double(R, R);
        ecp_sm2p256_point_double(R, R);
        ecp_sm2p256_point_double(R, R);
        if (index)                                    // BRANCH 2: skips point_add when nibble is 0
            ecp_sm2p256_point_add(R, R, &precomputed[index]);
    }
}

When a nibble of the private key d is zero, ecp_sm2p256_point_add is skipped — saving ~389 ns (the cost of 12 field multiplications, 4 squarings, and 7 subtractions in the ARM64 assembly). The init flag creates a second signal: leading zero nibbles also skip the 4x doublings, saving even more time.
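The two signals can be made concrete with a small model. The C sketch below is not OpenSSL code: it replays only the branch structure of the loop above for a given scalar (four 64-bit limbs, least significant first, as assumed from the k[i / 16] indexing) and counts the point operations that would run, without doing any curve arithmetic. The names op_count and count_ops are hypothetical.

```c
#include <stdint.h>

/* Hypothetical cost model: replays only the branch structure of
 * ecp_sm2p256_point_P_mul_by_scalar() and counts the point operations
 * it would perform. k[0] holds the least significant 64 bits. */
typedef struct {
    int doubles;              /* calls to point_double */
    int adds;                 /* calls to point_add */
    int leading_zero_nibbles; /* windows skipped entirely while init == 0 */
} op_count;

static op_count count_ops(const uint64_t k[4])
{
    op_count c = {0, 0, 0};
    int init = 0;

    for (int i = 64 - 1; i >= 0; --i) {
        unsigned index = (unsigned)((k[i / 16] >> (4 * (i % 16))) & 0xf);

        if (init == 0) {
            if (index)
                init = 1;                 /* table copy only: no add, no doubling */
            else
                c.leading_zero_nibbles++; /* BRANCH 1: whole window skipped */
        } else {
            c.doubles += 4;               /* four doublings on every later window */
            if (index)
                c.adds++;                 /* BRANCH 2: skipped for zero nibbles */
        }
    }
    return c;
}
```

For any nonzero scalar the model gives adds = 63 - Z and doubles = 4(63 - L), where Z is the total number of zero nibbles and L the number of leading zero nibbles. These are the two quantities the timing exposes.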

This function is called during SM2 decryption at sm2_crypt.c:360 with the long-term private key d as the scalar. Because d is static across the key’s lifetime, every decryption reinforces the same signal.

A second function, ecp_sm2p256_point_G_mul_by_scalar() (lines 335–364), has the same pattern with 8-bit byte windows for SM2 signing, where it processes the ephemeral nonce k.

P-256 Does This Right. SM2 Doesn’t.

The P-256 equivalent in the same codebase (ecp_nistz256.c) handles this correctly:

| Property | P-256 (ecp_nistz256.c) | SM2 (ecp_sm2p256.c) |
| --- | --- | --- |
| Table lookup | ecp_nistz256_gather_w5/w7 (CT scatter/gather) | precomputed[index] (direct array access) |
| Zero-nibble handling | Unconditional point_add on every iteration | if (index) skips point_add |
| Identity handling | copy_conditional (branchless cmov) | is_zeros() + early-return branch |
| Leading-zero handling | No init flag; operates on all windows | init flag with data-dependent branching |

This is OpenSSL’s own established approach for constant-time scalar multiplication. The SM2 code does not follow it.
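To illustrate the first row, here is a minimal C sketch of a constant-time gather in the spirit of ecp_nistz256_gather_w5 (OpenSSL's own version is written in assembly; this is not it). Every one of the 16 entries is read on every call and the wanted one is kept through a mask, so neither the branch pattern nor the set of cache lines touched depends on the secret index. The point layout (three 4-limb coordinates, 96 bytes) is an assumption chosen to match the 1,536-byte, 16-entry table size given in the cache-attack discussion; point, ct_eq_mask and gather_ct are hypothetical names.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical point layout: Jacobian X, Y, Z, four 64-bit limbs each. */
typedef struct {
    uint64_t X[4], Y[4], Z[4];
} point;

/* 16 entries x 96 bytes = 1,536 bytes, i.e. 24 cache lines of 64 bytes. */
_Static_assert(16 * sizeof(point) == 1536, "unexpected point size");

/* All-ones if a == b, otherwise zero, computed without a branch. */
static uint64_t ct_eq_mask(uint64_t a, uint64_t b)
{
    uint64_t x = a ^ b;                  /* zero iff a == b */
    return ((x | (0 - x)) >> 63) - 1;    /* top bit of x | -x is set iff x != 0 */
}

/* Copy table[index] into *out, reading all 16 entries on every call. */
static void gather_ct(point *out, const point table[16], uint64_t index)
{
    point r = {{0}, {0}, {0}};
    for (uint64_t j = 0; j < 16; j++) {
        uint64_t m = ct_eq_mask(j, index);   /* all-ones only for j == index */
        for (size_t l = 0; l < 4; l++) {
            r.X[l] |= table[j].X[l] & m;
            r.Y[l] |= table[j].Y[l] & m;
            r.Z[l] |= table[j].Z[l] & m;
        }
    }
    *out = r;
}
```

A compiler is in principle free to turn such masking back into branches, so production code usually guards against that with assembly or value barriers.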


Same File as CVE-2025-9231

CVE-2025-9231 (reported by Stanislav Fort, Aisle Research) fixed a timing vulnerability in the modular inversion path (get_affine, field_inv, inv_mod_ord) in the same file, ecp_sm2p256.c. That fix replaced three EC_METHOD vtable entries with constant-time fallbacks. It did not touch the scalar multiplication loop.

All measurements in this post were taken on code that already includes the CVE-2025-9231 fix. The scalar multiplication loop — which processes the private key d through 64 iterations of data-dependent branching — was not modified as part of that fix.


What Leaks

Information Content

For a uniformly random 256-bit key represented as 64 nibbles, the zero-nibble count Z follows a binomial distribution B(64, 1/16) with mean 4. After the attacker learns Z, the remaining keyspace is reduced:

| Z | Remaining keyspace (log₂) | Bits learned |
| --- | --- | --- |
| 0 | 250.0 | 6.0 |
| 2 | 253.2 | 2.8 |
| 4 (most probable) | 253.7 | 2.3 |
| 6 | 252.8 | 3.2 |
| 8 | 250.8 | 5.2 |

Average information leakage: H(Z) ≈ 3 bits per key lifetime. Tail-distribution keys (Z = 0 or Z ≥ 7) leak 4 bits or more: about 4.1 bits at Z = 7, 6.0 at Z = 0, and over 6 for Z ≥ 9.
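The table and the H(Z) figure follow from the binomial model. A minimal C sketch, assuming a uniformly random 64-nibble key and measuring what is learned from an observed count as its self-information, -log2 P(Z = z). The function names are hypothetical, and log2 is hand-rolled only so the example needs no libm.

```c
/* Zero-nibble leakage for a uniformly random 256-bit key (64 nibbles). */
#define NIBBLES 64

/* log2 without <math.h>, so the sketch links without -lm. Requires x > 0.
 * Write x = m * 2^e with m in [1, 2); then
 * ln(m) = 2 * (t + t^3/3 + t^5/5 + ...), t = (m - 1) / (m + 1) <= 1/3. */
static double log2_nolibm(double x)
{
    int e = 0;
    while (x >= 2.0) { x /= 2.0; e++; }
    while (x < 1.0)  { x *= 2.0; e--; }

    double t = (x - 1.0) / (x + 1.0), t2 = t * t, term = t, sum = 0.0;
    for (int n = 1; n < 61; n += 2) {
        sum += term / n;
        term *= t2;
    }
    return e + 2.0 * sum / 0.69314718055994531; /* ln(m) / ln(2) */
}

/* P(Z = z) for Z ~ Binomial(64, 1/16). */
static double pmf_zero_nibbles(int z)
{
    double c = 1.0, p = 1.0;
    for (int i = 0; i < z; i++)
        c = c * (NIBBLES - i) / (i + 1);  /* binomial coefficient C(64, z) */
    for (int i = 0; i < z; i++)
        p /= 16.0;                        /* (1/16)^z */
    for (int i = 0; i < NIBBLES - z; i++)
        p *= 15.0 / 16.0;                 /* (15/16)^(64 - z) */
    return c * p;
}

/* Bits learned from observing Z = z; remaining keyspace is about 256 minus this. */
static double bits_learned(int z)
{
    return -log2_nolibm(pmf_zero_nibbles(z));
}

/* Average leakage H(Z) = -sum_z P(z) log2 P(z). */
static double entropy_bits(void)
{
    double h = 0.0;
    for (int z = 0; z <= NIBBLES; z++) {
        double pz = pmf_zero_nibbles(z);
        h -= pz * log2_nolibm(pz);
    }
    return h;
}
```

Under this model bits_learned(z) for z = 0, 2, 4 and 6 comes to about 5.96, 2.80, 2.31 and 3.24 bits, and entropy_bits() to just under 3 bits.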

The leading-zero init flag provides additional information about the most significant bits for about 6% of keys (the 1 in 16 whose top nibble is zero). For a key with L leading zero nibbles, the attacker also learns that d < 2^(256 - 4L).

What This Does NOT Give the Attacker

Full key recovery from timing alone is not feasible with any published algorithm. The zero-nibble count is a combinatorial constraint — it tells you how many nibbles are zero, not which. This does not map to the Hidden Number Problem (HNP) framework used by lattice attacks (Howgrave-Graham & Smart, Nguyen & Shparlinski), and generic algorithms like Pollard’s rho achieve O(2^128) regardless.

What It Does Give the Attacker

Key fingerprinting. The zero-nibble count Z is a coarse timing fingerprint for a private key. Since Z is only ~3 bits, many unrelated keys share the same value — this is not a unique key identifier. However, an attacker observing SM2 decrypt traffic can use it to cluster instances with compatible key profiles and detect some key-rotation events (a change in Z implies a different key).

Exposure to cache attacks. The same if(index) branch that creates the timing channel also controls whether precomputed[index] is accessed — a 1,536-byte table (24 cache lines) on the stack. This branch pattern may expose the implementation to richer traces from cache-based attacks (FLUSH+RELOAD, PRIME+PROBE) by a co-located attacker, which could reveal which table entries were accessed per iteration rather than just the aggregate count. This post demonstrates only aggregate timing leakage; per-iteration cache tracing and key recovery are not demonstrated, but the non-constant-time control flow is a precondition for both.

A constant-time fix — branchless table lookup and unconditional point operations, as P-256 already implements — would eliminate both channels simultaneously.

SM2 Signing: The Minerva Parallel

The signing path (point_G_mul_by_scalar) processes the ephemeral nonce k in 8-bit byte windows. This is the same class of vulnerability as Minerva (CVE-2019-15809, CVE-2024-13176): non-constant-time processing of a cryptographic nonce. The specific leakage geometry here is less favorable for lattice exploitation than Minerva’s (zero-byte count rather than nonce bit-length), and no published lattice construction converts this leakage type into an HNP attack. However, a novel construction handling scattered Hamming weight constraints — an open research problem — could change this assessment.
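For the signing path, the byte-window analogue of Z is the number of zero bytes in the 32-byte nonce. The sketch below assumes, following the description of point_G_mul_by_scalar above, that a zero byte window skips work the way a zero nibble does in the P_mul loop. It extracts that profile from a big-endian nonce and computes the probability that a uniformly random nonce has any zero byte at all. The names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: zero bytes and leading zero bytes of a big-endian
 * 32-byte nonce, the byte-window analogue of Z and L. */
typedef struct {
    int zero_bytes;
    int leading_zero_bytes;
} byte_profile;

static byte_profile nonce_profile(const uint8_t k[32])
{
    byte_profile bp = {0, 0};
    int leading = 1;                      /* still inside the leading run */
    for (size_t i = 0; i < 32; i++) {
        if (k[i] == 0) {
            bp.zero_bytes++;
            if (leading)
                bp.leading_zero_bytes++;
        } else {
            leading = 0;
        }
    }
    return bp;
}

/* P(at least one zero byte) = 1 - (255/256)^32 for a uniform nonce. */
static double prob_any_zero_byte(void)
{
    double none = 1.0;
    for (int i = 0; i < 32; i++)
        none *= 255.0 / 256.0;
    return 1.0 - none;
}
```

Under that assumption only about 11.8% of nonces contain a zero byte, so most signatures would run the same number of additions.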


Disclosure Timeline

| Date | Event |
| --- | --- |
| 2026-05-02 | Reported to openssl-security@openssl.org with full evidence (correlation data, 4 independent PoCs, negative controls, CT patch) |
| 2026-05-06 | OpenSSL response: decided to handle as a regular bug/hardening issue, no CVE. Asked for a public GitHub issue. |
| 2026-05-09 | Public disclosure (this post) |

OpenSSL’s response was consistent with their stated security policy, which explicitly excludes same-physical-system side channel attacks from their threat model. The policy notes: “Prior to the threat model being included in this policy, CVEs were sometimes issued for these classes of attacks. The existence of a previous CVE does not override this policy going forward.”


Practical Guidance

Check if you’re affected:

nm libcrypto.so | grep ecp_sm2p256_point_P_mul

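If the symbol is present, the non-constant-time optimized path is compiled in. If it is absent (as on x86_64), the generic constant-time EC path is used. This check needs a library that still has its symbol table: on a stripped build nm reports no symbols at all, so an empty result says nothing either way.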

Who should act:

  • Organizations running SM2 decryption services on ARM64 or RISC-V
  • Cloud providers offering SM2-based TLS/TLCP endpoints on ARM64 infrastructure
  • IoT deployments using SM2 on RISC-V with physical attacker exposure

Mitigation: The fix is the same approach P-256 already uses: constant-time table lookup (gather_w5-style scatter/gather), unconditional point operations on every loop iteration, and branchless identity handling via conditional-move. A public GitHub issue will be filed.
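The loop structure the mitigation describes can be shown independently of the curve arithmetic. The C sketch below is a toy, not a patch: SM2 points are replaced by integers modulo a prime under addition (so toy_add plays the role of point_add, toy_double of point_double, and 0 is the identity), purely to show a 4-bit fixed-window loop with no init flag, four doublings and one addition on every window, and a table read that touches all 16 entries. All names and the modulus are hypothetical. In real elliptic-curve code, adding the identity (table entry 0) must itself be handled without branching, which is the point-arithmetic problem described under Patch Status.

```c
#include <stdint.h>

/* TOY group: integers mod a prime under addition, NOT elliptic-curve points.
 * Only the control flow and memory-access pattern matter here; the toy
 * arithmetic (which uses %) is not claimed to be constant-time. */
#define TOY_P 1000000007ULL

static uint64_t toy_add(uint64_t a, uint64_t b) { return (a + b) % TOY_P; }
static uint64_t toy_double(uint64_t a) { return (a + a) % TOY_P; }

/* All-ones if a == b, otherwise zero, without a branch. */
static uint64_t ct_eq_mask(uint64_t a, uint64_t b)
{
    uint64_t x = a ^ b;
    return ((x | (0 - x)) >> 63) - 1;
}

/* Read all 16 entries and keep table[index] via a mask. */
static uint64_t gather16(const uint64_t table[16], uint64_t index)
{
    uint64_t r = 0;
    for (uint64_t j = 0; j < 16; j++)
        r |= table[j] & ct_eq_mask(j, index);
    return r;
}

/* k * P with 4-bit fixed windows; k[0] holds the least significant 64 bits.
 * Requires P < TOY_P. */
static uint64_t toy_mul_fixed_window(const uint64_t k[4], uint64_t P)
{
    uint64_t table[16];
    table[0] = 0;                              /* identity: adding it is a no-op */
    for (int j = 1; j < 16; j++)
        table[j] = toy_add(table[j - 1], P);   /* table[j] = j * P */

    uint64_t R = 0;                            /* start at identity: no init flag */
    for (int i = 64 - 1; i >= 0; --i) {
        uint64_t index = (k[i / 16] >> (4 * (i % 16))) & 0xf;
        R = toy_double(R);                     /* four doublings on every window */
        R = toy_double(R);
        R = toy_double(R);
        R = toy_double(R);
        R = toy_add(R, gather16(table, index)); /* one add on every window */
    }
    return R;
}
```

A call such as toy_mul_fixed_window(k, P) returns k·P mod TOY_P for any 256-bit k while executing exactly 256 doublings, 64 additions and 64 full table scans, whatever the value of k.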


Technical Details

Measurement Setup

All measurements were taken on ARM64 (Neoverse-N1, Azure Standard_D4ps_v5, 4 vCPUs, pinned to core 0, Ubuntu 22.04), using OpenSSL 4.1.0-dev built from commit 5199c5b98a.

Each key was measured using 2,000 SM2 decrypt operations with the EVP_PKEY_CTX created once and reused (eliminating context-creation overhead). 50 warmup calls were discarded. The P20 (20th percentile) of the remaining 1,950 measurements was used as the per-key timing estimate. Measurement order was randomized using Fisher-Yates shuffle to eliminate ordering artifacts.
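The two statistical steps of this procedure can be sketched in C. This is an illustrative reconstruction, not the PoC code: the xorshift64 generator is an arbitrary choice, and the percentile convention (the element at index (m - 1) * 20 / 100 of the m sorted post-warmup samples, no interpolation) is an assumption, since the exact convention is not stated. All names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Small deterministic PRNG (xorshift64); seed must be nonzero. */
static uint64_t xorshift64(uint64_t *s)
{
    uint64_t x = *s;
    x ^= x << 13;
    x ^= x >> 7;
    x ^= x << 17;
    return *s = x;
}

/* Uniform integer in [0, bound) by rejection, avoiding modulo bias. */
static uint64_t uniform_below(uint64_t *s, uint64_t bound)
{
    uint64_t limit = UINT64_MAX - UINT64_MAX % bound; /* multiple of bound */
    uint64_t r;
    do {
        r = xorshift64(s);
    } while (r >= limit);
    return r % bound;
}

/* Fisher-Yates shuffle of the key measurement order. */
static void shuffle_order(size_t *order, size_t n, uint64_t *seed)
{
    for (size_t i = n; i > 1; i--) {
        size_t j = (size_t)uniform_below(seed, i);  /* 0 <= j <= i - 1 */
        size_t tmp = order[i - 1];
        order[i - 1] = order[j];
        order[j] = tmp;
    }
}

static int cmp_u64(const void *a, const void *b)
{
    uint64_t x = *(const uint64_t *)a, y = *(const uint64_t *)b;
    return (x > y) - (x < y);
}

/* P20 of the post-warmup samples; sorts samples[warmup..n-1] in place.
 * Requires n > warmup. */
static uint64_t p20_after_warmup(uint64_t *samples, size_t n, size_t warmup)
{
    size_t m = n - warmup;
    qsort(samples + warmup, m, sizeof *samples, cmp_u64);
    return samples[warmup + (m - 1) * 20 / 100];
}
```

With 2,000 samples and 50 warmup calls this convention picks the 390th-smallest of the 1,950 retained samples. A low percentile is less sensitive than the mean to interrupts and scheduler noise, which can only make a call slower.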

Core Result: Direct EC_POINT_mul Isolation

100 crafted keys spanning 0–49 zero nibbles, no EVP overhead, Fisher-Yates randomized:

  • Pearson r = -0.9828 (t = -52.61)
  • Slope = -389.4 ns/zero-nibble
  • Timing range: 23,114 ns (51,401–74,515 ns)
  • Ratio to EVP measurement: 0.957 — confirming the signal originates in scalar multiplication, not EVP context

This is the cleanest measurement: it isolates the exact EC_POINT_mul call used by SM2 decrypt, with no EVP/KDF/SM3 overhead.
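The core statistics are a Pearson correlation and an ordinary least-squares slope of per-key timing against zero-nibble count. A minimal C sketch of that fit with hypothetical names; it returns r² rather than r so that no square root is needed, with r recovered as the sign of the slope times √r².

```c
#include <stddef.h>

/* Least-squares fit of y (per-key timing, ns) against x (zero-nibble count). */
typedef struct {
    double slope;      /* ns per zero nibble */
    double intercept;  /* ns at x = 0 */
    double r2;         /* squared Pearson correlation */
} linfit;

/* Requires n >= 2 and non-constant x and y. */
static linfit fit_line(const double *x, const double *y, size_t n)
{
    double mx = 0.0, my = 0.0;
    for (size_t i = 0; i < n; i++) {
        mx += x[i];
        my += y[i];
    }
    mx /= n;
    my /= n;

    double sxx = 0.0, syy = 0.0, sxy = 0.0;   /* centered sums of squares */
    for (size_t i = 0; i < n; i++) {
        double dx = x[i] - mx, dy = y[i] - my;
        sxx += dx * dx;
        syy += dy * dy;
        sxy += dx * dy;
    }

    linfit f;
    f.slope = sxy / sxx;
    f.intercept = my - f.slope * mx;
    f.r2 = (sxy * sxy) / (sxx * syy);          /* r = sign(slope) * sqrt(r2) */
    return f;
}
```

As a consistency check on the figures above, r = -0.9828 gives r² ≈ 0.966, and with n = 100 keys the standard statistic t = r·√(n - 2)/√(1 - r²) evaluates to about -52.7, in line with the reported t = -52.61 once rounding of r is taken into account.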

Blind Hidden-Key Inference

The attacker has no access to the private key. Keys are generated, a fixed plaintext is encrypted with the public key, and the attacker infers key structure from EVP_PKEY_decrypt timing alone.

Broad range (40 keys, 0–50 zero nibbles, 4 buckets):

  • Accuracy within ±2 nibbles = 80%, MAE = 1.73

Natural range (20 keys from EC_KEY_generate_key, 0–8 zero nibbles):

  • 20/20 correct within ±2 nibbles (100%), MAE = 0.1
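The post does not spell out the inference rule. One straightforward approach, sketched here as an assumption rather than the PoC's method, is to calibrate an intercept and slope on keys with known structure, invert that line for each hidden key's timing, and score the estimates by mean absolute error and by the fraction within ±2 nibbles. Function names and calibration values are hypothetical.

```c
#include <stddef.h>

/* Invert a calibrated timing model t = intercept + slope * Z (slope < 0)
 * and round to the nearest integer in [0, 64]. */
static int estimate_zero_nibbles(double t_ns, double intercept_ns, double slope_ns)
{
    double z = (t_ns - intercept_ns) / slope_ns;
    if (z < 0.0)
        z = 0.0;
    if (z > 64.0)
        z = 64.0;
    return (int)(z + 0.5);   /* round half up; z is non-negative here */
}

/* Score estimates against ground truth known only to the evaluator. */
static void score(const int *est, const int *truth, size_t n,
                  double *mae, double *within2)
{
    double abs_sum = 0.0;
    size_t ok = 0;
    for (size_t i = 0; i < n; i++) {
        int d = est[i] - truth[i];
        if (d < 0)
            d = -d;
        abs_sum += d;
        if (d <= 2)
            ok++;
    }
    *mae = abs_sum / n;
    *within2 = (double)ok / n;
}
```

At the reported slope of roughly 389 ns per nibble, landing within ±2 nibbles means resolving a key's timing to within about ±0.8 µs.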

Extreme-Key Sanity Check

Using artificially crafted keys at the extremes (0 vs 63 zero nibbles): timing gap of 165.8 µs, Welch t = 2,748. These are not realistic keys. They confirm the direction and scale of the effect at the extremes of the range; a two-point comparison cannot by itself establish linearity, which rests on the 100-key regression above.
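Welch's t, used for the extreme-key comparison here and for the A/B patch comparison under Patch Status, divides the difference of the two sample means by √(s₁²/n₁ + s₂²/n₂), with each group keeping its own sample variance. A minimal C sketch with hypothetical names; the square root uses Newton's method only so the example needs no libm.

```c
#include <stddef.h>

/* Newton iteration for sqrt(x), x >= 0; avoids linking libm in this sketch. */
static double sqrt_newton(double x)
{
    if (x <= 0.0)
        return 0.0;
    double g = x > 1.0 ? x : 1.0;          /* start at or above sqrt(x) */
    for (int i = 0; i < 200; i++)
        g = 0.5 * (g + x / g);
    return g;
}

/* Sample mean and unbiased sample variance; requires n >= 2. */
static void mean_var(const double *v, size_t n, double *mean, double *var)
{
    double m = 0.0, ss = 0.0;
    for (size_t i = 0; i < n; i++)
        m += v[i];
    m /= n;
    for (size_t i = 0; i < n; i++)
        ss += (v[i] - m) * (v[i] - m);
    *mean = m;
    *var = ss / (n - 1);
}

/* Welch's t for two independent samples with possibly unequal variances. */
static double welch_t(const double *a, size_t na, const double *b, size_t nb)
{
    double ma, va, mb, vb;
    mean_var(a, na, &ma, &va);
    mean_var(b, nb, &mb, &vb);
    return (ma - mb) / sqrt_newton(va / na + vb / nb);
}
```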

Negative Controls

x86_64 SM2 (Intel Cascade Lake, same PoC): nm libcrypto.so | grep sm2p256_point_P_mul returns nothing — the optimized function is absent from x86 builds.

| Phase | ARM64 (SM2) | x86_64 (control) |
| --- | --- | --- |
| Correlation r | -0.9828 | 0.152 (no signal) |
| Hidden-key accuracy | 80% | 5% (random chance) |
| Natural-key r | -0.9829 | -0.077 (no signal) |

As an auxiliary harness sanity check, P-256 ECDH on the same ARM64 hardware showed r = 0.096 (no correlation, slope = 0.2 ns/nibble — noise floor). This uses a different API path (ECDH, not decrypt) but confirms the measurement infrastructure does not produce false positives.

Patch Status

An initial constant-time table-lookup patch eliminated the timing gap between a controlled pair of keys (Welch t: 240.98 → -0.29, 2.6% overhead). However, broader testing across 100 keys showed the correlation persisted at r = -0.97, because ecp_sm2p256_point_add() and ecp_sm2p256_point_double() also contain identity-dependent early returns (lines 186, 286, 294, 302) that leak through a separate mechanism. A complete fix must make the point arithmetic functions branchless as well — a fully branchless patch using constant_time_select has been written but not yet measured.
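The identity-dependent early returns follow the usual pattern of returning the other operand when one input is the point at infinity. A branchless alternative always evaluates the general addition and then selects among the three candidates with masks, in the spirit of the constant_time_select approach mentioned above. The C sketch shows only that selection step. It assumes, as is common for Jacobian coordinates, that the point at infinity is encoded with Z = 0, and point, select_point and finish_add_ct are hypothetical names, not functions from ecp_sm2p256.c. The P == Q case, which a general addition formula also cannot handle, is outside this sketch.

```c
#include <stdint.h>

typedef struct {
    uint64_t X[4], Y[4], Z[4];   /* hypothetical Jacobian layout */
} point;

/* All-ones if all four limbs are zero, otherwise zero; no data-dependent branch. */
static uint64_t is_zero_mask(const uint64_t v[4])
{
    uint64_t acc = v[0] | v[1] | v[2] | v[3];
    return ((acc | (0 - acc)) >> 63) - 1;
}

/* r = mask ? a : b, limb by limb; r may alias a or b. */
static void select_point(point *r, uint64_t mask, const point *a, const point *b)
{
    for (int l = 0; l < 4; l++) {
        r->X[l] = (a->X[l] & mask) | (b->X[l] & ~mask);
        r->Y[l] = (a->Y[l] & mask) | (b->Y[l] & ~mask);
        r->Z[l] = (a->Z[l] & mask) | (b->Z[l] & ~mask);
    }
}

/* Replaces early-return checks of the form "if P is infinity, return Q".
 * 'generic' is the sum from the general formula, always computed by the caller. */
static void finish_add_ct(point *r, const point *p, const point *q, const point *generic)
{
    uint64_t p_inf = is_zero_mask(p->Z);
    uint64_t q_inf = is_zero_mask(q->Z);
    point t;
    select_point(&t, q_inf, p, generic);  /* Q at infinity -> P */
    select_point(r, p_inf, q, &t);        /* P at infinity -> Q (takes priority) */
}
```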


Evidence Artifacts

PoC source code, evidence tarballs, and measurement transcripts are available on request.

| Artifact | Description |
| --- | --- |
| poc_sm2_ecmul_direct.c | Direct EC_POINT_mul isolation; proves the signal is in scalar multiplication, not EVP |
| poc_sm2_airtight.c | Full 3-phase PoC: shuffled correlation, blind inference, random keygen |
| poc_sm2_hidden_key.c | Blind hidden-key inference via EVP_PKEY_decrypt |
| ARM64 evidence tarball | Neoverse-N1 measurements, compiler flags, build log |
| x86 evidence tarball | Intel Cascade Lake negative control |
| CT patch evidence tarball | A/B patched vs unpatched comparison |

This post is licensed under CC BY 4.0 by the author.