A plan for developing the RISC-V architectural tests for the Scalar Crypto Extension.
The point of this test plan is to:
- Explain what the RISC-V architectural tests try to achieve, both generally and for the scalar crypto instructions in particular.
- List the kinds of coverage that the architectural tests try to meet, and explain which coverage cases are more or less important for different kinds of instruction.
- Act as a starting point for verification engineers writing verification plans. It describes real-world usage patterns of the instructions, which constrained-random stimulus generation flows can focus on.
Some simple stimulus patterns are described here and referred to later when discussing individual instructions.
- single-bit-1: Each source register input has a single bit set. Test for all bits 0 <= i < XLEN. Likewise, have single-bit-0. These are sometimes also referred to as walking ones or walking zeros.
- uniform-random: Each source register input is a uniform random number, XLEN bits long.
- byte-count: Each source register input is divided into bytes, and each byte is incremented individually, starting at zero. Hence, for RV32, the first two input patterns would be 0x03020100 and 0x07060504.
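The three patterns above could be generated along the following lines. This is a minimal sketch rather than the generator the architectural tests actually use: the function names and the fixed random seed are assumptions made here for illustration.

```python
import random

def single_bit_1(xlen):
    """Walking ones: one value per bit position, with only bit i set."""
    return [1 << i for i in range(xlen)]

def single_bit_0(xlen):
    """Walking zeros: one value per bit position, with only bit i clear."""
    mask = (1 << xlen) - 1
    return [mask ^ (1 << i) for i in range(xlen)]

def uniform_random(xlen, n, seed=0):
    """N uniform random XLEN-bit values.
    ASSUMPTION: a fixed seed is used so that generated tests are reproducible."""
    rng = random.Random(seed)
    return [rng.getrandbits(xlen) for _ in range(n)]

def byte_count(xlen, n):
    """N values whose bytes count upwards from zero, least-significant byte first."""
    nbytes = xlen // 8
    values = []
    for w in range(n):
        value = 0
        for b in range(nbytes):
            value |= ((w * nbytes + b) & 0xFF) << (8 * b)
        values.append(value)
    return values
```

For RV32, `byte_count(32, 2)` reproduces the two example patterns given above, and `byte_count(32, 64)` covers every byte value 0..255 exactly once.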
Note: Where some unknown number of test vectors will be needed to hit coverage, this is usually left as N, which can be tuned later. E.g. "generate N uniform random numbers…".
These are coverage points relevant for single instructions, and act as a bare minimum standard to hit for every instruction.
- Have all values of rd, rs1 and rs2 been covered where applicable?
- Have we seen:
  - rd==rs1, rd!=rs1
  - rd==rs2, rd!=rs2
  - rs1==rs2, rs1!=rs2
- The immediates for all of the scalar crypto instructions are either 2 or 4 bits, so we should aim for complete coverage of these.
- Have we seen every bit set for each register input?
- Have we seen every bit clear for each register input?
- For instructions with an SBox (AES, SM4), do we have complete input coverage for each input to the SBox? For all instructions, this is just 0..255 for each input byte.
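The register-aliasing coverage points can be tracked with a small set of named bins. The sketch below is illustrative only: the bin names and the `(rd, rs1, rs2)` tuple representation of an instruction are assumptions made here, not part of any existing coverage tool.

```python
ALL_ALIAS_BINS = {
    "rd==rs1", "rd!=rs1",
    "rd==rs2", "rd!=rs2",
    "rs1==rs2", "rs1!=rs2",
}

def alias_bins(rd, rs1, rs2):
    """Return the register-aliasing bins hit by one instruction."""
    return {
        "rd==rs1" if rd == rs1 else "rd!=rs1",
        "rd==rs2" if rd == rs2 else "rd!=rs2",
        "rs1==rs2" if rs1 == rs2 else "rs1!=rs2",
    }

def missing_bins(instructions):
    """Given (rd, rs1, rs2) register-index triples, list the bins never seen."""
    seen = set()
    for rd, rs1, rs2 in instructions:
        seen |= alias_bins(rd, rs1, rs2)
    return sorted(ALL_ALIAS_BINS - seen)
```

A test that only ever uses three distinct registers leaves all of the `==` bins unhit, which `missing_bins` reports.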
The RV32 instructions have been put into groups of instructions which are similar from a coverage and stimulus perspective.
    aes32dsi  rt, rs2, bs
    aes32dsmi rt, rs2, bs
    aes32esi  rt, rs2, bs
    aes32esmi rt, rs2, bs
    sm4ed     rt, rs2, bs
    sm4ks     rt, rs2, bs
All of these instructions have the same basic input patterns, and apply
an SBox to a single byte of rs2.
The bs immediate is 2 bits, and is used to select a byte of rs2
for further processing.
Note: The aes32* and sm4* instructions read and write the rt register. It can be thought of as both rs1 and rd.
- Test pattern 1: SBox Testing
  - This uses the byte-count pattern described above.
  - Generate a 256-byte sequence 0..255 and pack the sequence into 32-bit words.
  - Each word in the sequence is the rs2 input. The rs1 input is set to zero so we do not alter the SBox output value.
  - For each input word, generate 4 instructions, with bs=0..3. This will mean that every possible SBox input pattern is tested.
- Test pattern 2: Uniform Random
  - Generate uniform random values for rs1, rs2 and bs.
  - Let the choice of registers be unconstrained: 0..31.
  - Repeat N times for each instruction until sufficient coverage is reached.
- Test pattern 3: Real-world patterns
  - Execute 4 of each instruction adjacently. Each instruction has the same rd and rs1 value, a different rs2 and a different bs value. This mimics how the instructions will appear in real-world code, and tests things like pipeline forwarding.
        li a0, <random>
        li a1, <random>
        li a2, <random>
        li a3, <random>
        li a4, <random>
        aes32* a4, a0, 0 // This is the expected use-case sequence
        aes32* a4, a1, 1 // for these instructions.
        aes32* a4, a2, 2
        aes32* a4, a3, 3
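As a sanity check on SBox test pattern 1 for the aes32* and sm4* instructions, the sketch below builds the 64 byte-count words and the four bs values per word, then confirms that every SBox input 0..255 is selected. It assumes that bs=k selects bits 8k+7..8k of rs2; the function names are hypothetical.

```python
def sbox_stimulus_rv32():
    """Return (rs1, rs2, bs) tuples for SBox test pattern 1."""
    stimulus = []
    for w in range(64):                    # 256 bytes / 4 bytes per word
        rs2 = 0
        for b in range(4):
            rs2 |= (4 * w + b) << (8 * b)  # byte-count pattern
        for bs in range(4):
            stimulus.append((0, rs2, bs))  # rs1 is held at zero
    return stimulus

def selected_byte(rs2, bs):
    """ASSUMPTION: bs=k selects byte k of rs2, counting from the LSB."""
    return (rs2 >> (8 * bs)) & 0xFF

def covers_all_sbox_inputs(stimulus):
    """True if the bytes selected across the stimulus cover 0..255."""
    seen = {selected_byte(rs2, bs) for _, rs2, bs in stimulus}
    return seen == set(range(256))
```

The pattern yields 256 instructions per opcode, one per SBox input value.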
Note: These instructions are unlikely to ever appear interleaved with one another, so this pattern is left out for now. Forwarding between like instructions is much more common.
    sha256sig0 rd, rs1
    sha256sig1 rd, rs1
    sha256sum0 rd, rs1
    sha256sum1 rd, rs1
    sm3p0      rd, rs1
    sm3p1      rd, rs1
These instructions are all designed to accelerate hash functions, and
essentially perform rotations and/or shifts of rs1 by several different
constants, before xor’ing the results together.
- Test pattern 1: Single bit testing
  - For each instruction, generate XLEN inputs with a single bit set.
  - For each instruction, generate XLEN inputs with a single bit clear.
- Test pattern 2: Uniform random
  - For each instruction, generate N XLEN-bit uniform random inputs.
- Test pattern 3: Real-world usage
  - Check forwarding of the result of an add/xor/not/andn instruction into these instructions.
  - Check forwarding of the result of these instructions into add/xor/not/andn instructions.
  - Check load-to-use hazard into these instructions.
  - Check forwarding of these instructions into rs1 of a sw instruction.
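Expected results for the single-bit pattern are easy to predict with a small reference model. The sketch below models the four SHA-256 instructions on RV32 using the rotate and shift constants of the SHA-256 sigma and sum functions in FIPS 180-4, which these instructions are understood to implement; the constants should be checked against the scalar crypto specification before use. The SM3 instructions are omitted, and the helper names are hypothetical.

```python
MASK32 = 0xFFFFFFFF

def ror32(x, n):
    """Rotate the low 32 bits of x right by n (0 < n < 32)."""
    x &= MASK32
    return ((x >> n) | (x << (32 - n))) & MASK32

def sha256sig0(x):
    return ror32(x, 7) ^ ror32(x, 18) ^ ((x & MASK32) >> 3)

def sha256sig1(x):
    return ror32(x, 17) ^ ror32(x, 19) ^ ((x & MASK32) >> 10)

def sha256sum0(x):
    return ror32(x, 2) ^ ror32(x, 13) ^ ror32(x, 22)

def sha256sum1(x):
    return ror32(x, 6) ^ ror32(x, 11) ^ ror32(x, 25)

def expected_single_bit_results(func):
    """(input, expected output) pairs for the single-bit-1 pattern on RV32."""
    return [(1 << i, func(1 << i)) for i in range(32)]
```

Because each function xors three rotated or shifted copies of the input, a single-bit input produces at most three set bits in the result, which makes a failing bit position easy to trace back to one of the three terms.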
    sha512sig0h rd, rs1, rs2
    sha512sig0l rd, rs1, rs2
    sha512sig1h rd, rs1, rs2
    sha512sig1l rd, rs1, rs2
    sha512sum0r rd, rs1, rs2
    sha512sum1r rd, rs1, rs2
These instructions are similar to the SHA2-256 and SM3 instructions.
The rs1 and rs2 operands are shifted left/right by several constants,
then xor’d together.
Note: The plan for these instructions is identical to the one for SHA2-256 and SM3, but with an additional register input to cover.
- Test pattern 1: Single bit testing
  - For each instruction, generate XLEN inputs with a single bit set. Do this for each of rs1 and rs2.
  - For each instruction, generate XLEN inputs with a single bit clear. Do this for each of rs1 and rs2.
- Test pattern 2: Uniform random
  - For each instruction, generate N XLEN-bit uniform random inputs for rs1 and rs2.
- Test pattern 3: Real-world usage
  - Check forwarding of the result of an add/xor/not/andn instruction into these instructions.
  - Check forwarding of the result of these instructions into add/xor/not/andn instructions.
  - Check load-to-use hazard into these instructions.
  - Check forwarding of these instructions into rs1 of a sw instruction.
The RV64 instructions have been put into groups of instructions which are similar from a coverage and stimulus perspective.
    aes64ds  rd, rs1, rs2
    aes64dsm rd, rs1, rs2
    aes64es  rd, rs1, rs2
    aes64esm rd, rs1, rs2
- Test pattern 1: SBox Testing
  - This uses the byte-count pattern described above.
  - Generate a 256-byte sequence 0..255 and pack the sequence into 64-bit words.
  - For each pair of 64-bit words i and j, where j=i+1, execute two of each instruction: one where rs1=i, rs2=j, and one where rs1=j, rs2=i. Store the results of each instruction to the signature.
- Test pattern 2: Uniform Random Testing
  - For rs1 and rs2, generate uniform random values and store the results to the signature.
- Test pattern 3: Real-world usage
  - Execute two adjacent instructions of the same type, with:
    - Different destination registers.
    - The first instruction has rs1=x, rs2=y, and the second instruction has rs1=y, rs2=x.
    - This is the most common usage pattern for the instructions.
  - Forward the result of an xor instruction into the instructions and vice-versa.
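The SBox pattern for the aes64* instructions can be sketched as follows. The text does not say whether the word pairs overlap; this sketch assumes overlapping consecutive pairs (words 0 and 1, then 1 and 2, and so on), and the function names are hypothetical.

```python
def byte_count_words64():
    """Pack the byte sequence 0..255 into 32 little-endian 64-bit words."""
    words = []
    for w in range(32):
        value = 0
        for b in range(8):
            value |= (8 * w + b) << (8 * b)
        words.append(value)
    return words

def aes64_sbox_stimulus():
    """Return (rs1, rs2) value pairs, each consecutive word pair in both orders.
    ASSUMPTION: pairs overlap, i.e. (w0, w1), (w1, w2), ..."""
    words = byte_count_words64()
    stimulus = []
    for i in range(len(words) - 1):
        a, b = words[i], words[i + 1]
        stimulus.append((a, b))  # rs1=i, rs2=j
        stimulus.append((b, a))  # rs1=j, rs2=i
    return stimulus
```

Swapping the operands matters because the instructions treat the two source registers differently, so each byte value should be seen in both rs1 and rs2.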
aes64ks1i rd, rs1, rcon
This instruction applies the AES Forward SBox to the low 32-bits
of rs1, with an optional rotation and xor depending on rcon.
rcon is 4 bits wide, with only values 0 <= rcon <= 0xA permitted.
- Test pattern 1: SBox coverage
  - Uses the byte-count pattern described above.
  - Generate 64 double-word inputs, such that the low 4 bytes of each double-word completely cover the 0..255 SBox input space.
  - Execute one instruction per double-word input to get complete SBox input coverage.
  - The rcon immediate should be set to 0xA for this, to avoid it altering the SBox output value and to make debugging easier.
- Test pattern 2: Uniform Random testing
  - Generate random 64-bit values for rs1 and random 4-bit values for rcon, where 0 <= rcon <= 0xA. Record each result to the signature.
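Both aes64ks1i patterns can be sketched in a few lines. The sketch follows the description above in placing the SBox inputs in the low 4 bytes of rs1. Leaving the high 32 bits at zero and using a fixed seed are assumptions made here; the function names are hypothetical.

```python
import random

def aes64ks1i_sbox_stimulus(rcon=0xA):
    """64 (rs1, rcon) pairs whose low 4 bytes together cover 0..255.
    ASSUMPTION: the high 32 bits of rs1 are left at zero."""
    stimulus = []
    for w in range(64):
        low = 0
        for b in range(4):
            low |= (4 * w + b) << (8 * b)
        stimulus.append((low, rcon))
    return stimulus

def aes64ks1i_random_stimulus(n, seed=0):
    """N (rs1, rcon) pairs with uniform random rs1 and legal rcon values only."""
    rng = random.Random(seed)
    return [(rng.getrandbits(64), rng.randint(0, 0xA)) for _ in range(n)]
```

Constraining rcon inside the generator keeps the random pattern from emitting the reserved encodings 0xB..0xF.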
aes64ks2 rd, rs1, rs2
This instruction simply performs xor operations between high and low
words of rs1 and rs2 to produce a result.
- Test pattern 1: Single bit testing
  - Generate XLEN inputs with a single bit set.
  - Generate XLEN inputs with a single bit clear.
- Test pattern 2: Uniform random
  - Generate N XLEN-bit uniform random inputs.
    sha256sig0 rd, rs1
    sha256sig1 rd, rs1
    sha256sum0 rd, rs1
    sha256sum1 rd, rs1
    sha512sig0 rd, rs1 (RV64 Only)
    sha512sig1 rd, rs1 (RV64 Only)
    sha512sum0 rd, rs1 (RV64 Only)
    sha512sum1 rd, rs1 (RV64 Only)
    sm3p0      rd, rs1
    sm3p1      rd, rs1
    aes64im    rd, rs1 (RV64 Only)
The SHA256 and SM3 instructions listed here are very similar to the SHA and SM3 instructions listed for RV32, but they ignore the high 32 bits of their inputs and produce zero-extended 32-bit outputs.
The SHA512 instructions are similar to the SHA256 instructions, but work across the entire 64-bits of the input.
The aes64im instruction implements the AES Inverse MixColumn transform
on each 32-bit word of rs1.
- Test pattern 1: Single bit testing
  - Generate XLEN inputs with a single bit set.
  - Generate XLEN inputs with a single bit clear.
- Test pattern 2: Uniform random
  - Generate N XLEN-bit uniform random inputs.
- Test pattern 3: Real-world usage - SHA and SM3
  - Check forwarding of the result of an add/xor/not/andn instruction into these instructions.
  - Check forwarding of the result of these instructions into add/xor/not/andn instructions.
  - Check load-to-use hazard into these instructions.
  - Check forwarding of these instructions into rs1 of a sw instruction.
    sm4ed rt, rs2, bs
    sm4ks rt, rs2, bs
Note: These instructions are identical to the RV32 versions, but are also available on RV64. On RV64, they ignore the high 32 bits of their register inputs, and zero-extend the low 32 bits of their outputs. The same test plan may be used, accounting for the wider registers on RV64.
Note: It is worth having a copy of the specification ready for this.
The Entropy Source Extension consists of two machine-mode CSRs, and two pseudo-instructions to access them:
- pollentropy rd: An alias for csrrs rd, mentropy, x0.
- getnoise rd: An alias for csrrs rd, mnoise, x0.
- It must be possible to read and write mnoise in machine mode.
  - If mnoise is not implemented, it must always return zeros.
  - Software can check whether mnoise is implemented by testing whether it can set and clear bit 31 (NOISE_TEST). This is the only architecturally defined bit.
  - Tests must determine if mnoise is implemented first, before checking any other behaviour, and accommodate this case in the test signature.
- Accesses to mnoise in any privilege mode other than machine mode must raise an Illegal Instruction Exception.
Note: It is possible that mnoise will behave differently pre-tapeout or pre-validation than it does after post-silicon validation. This is because it is designed as a validation / certification interface to check that the noise source is functioning correctly. Once the noise source is validated, the interface may be disabled permanently. Tests must account for this in their signature generation.
The following tests must be written specifically for the mentropy
CSR related behaviour.
- This is a machine-mode, read-only CSR. Tests should check that it is accessible only in machine mode.
- Per section 2.1 of the privileged architecture specification: any write to mentropy must raise an Illegal Instruction Exception. Tests must check this for all variants of CSR write instructions.
The following tests must be written to check for behaviour related to
values read from the mentropy CSR.
- If the returned OPST field is not ES16, then the SEED field must be zero. A test may check this by executing pollentropy many times, and setting a bit iff OPST!=ES16 && SEED!=0 is ever seen. Coverage bins should be used to check that pollentropy returned different values of OPST.
- On RV64, the upper 32 bits of the return value must be zero.
- When mnoise.NOISE_TEST=1, pollentropy must always return with OPST=BIST.
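The first two checks on values read from mentropy can be modelled offline against a log of read values. In the sketch below the field positions and the ES16 encoding are passed in as parameters rather than hard-coded, because they must be taken from the specification; the `Layout` type and function name are hypothetical.

```python
from collections import namedtuple

# Field layout of the value returned by pollentropy. All four members must be
# filled in from the specification; nothing here is architecturally defined.
Layout = namedtuple("Layout", "opst_shift opst_mask seed_mask es16")

def check_mentropy_reads(values, layout):
    """Return (seed_violation, upper_violation, opst_seen) for a list of reads."""
    seed_violation = False
    upper_violation = False
    opst_seen = set()
    for value in values:
        opst = (value >> layout.opst_shift) & layout.opst_mask
        seed = value & layout.seed_mask
        opst_seen.add(opst)               # feeds the OPST coverage bins
        if opst != layout.es16 and seed != 0:
            seed_violation = True         # SEED must be zero unless OPST==ES16
        if value >> 32:
            upper_violation = True        # RV64: upper 32 bits must be zero
    return seed_violation, upper_violation, opst_seen
```

A usage example with purely illustrative numbers would be `check_mentropy_reads(log, Layout(30, 0x3, 0xFFFF, 0b01))`; these values are placeholders, not the specified encoding.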
- The wfi instruction must be implemented, and must not raise an Illegal Instruction Exception unless the mstatus.TW bit is set. The wfi instruction may be implemented as a nop. It is sufficient to check that wfi executes without raising an Illegal Instruction Exception when mstatus.TW=0, using something like a contrived timer interrupt.
The scalar crypto ISE places additional constraints on instructions which are present in the base ISA, or Bitmanip standard extension.
    mul    rd, rs1, rs2
    mulh   rd, rs1, rs2
    mulhu  rd, rs1, rs2
    mulhsu rd, rs1, rs2
    mulw   rd, rs1, rs2
    clmul  rd, rs1, rs2
    clmulh rd, rs1, rs2
Per section 3.6 of the scalar crypto extension draft specification, all of these instructions must execute in constant time with respect to their inputs when rs1 <= rs2.
If they do not, they create a (remotely) exploitable timing channel and are insecure from a cryptographic perspective. Common micro-architectural performance optimisations for these instructions include early termination and macro-op fusion.
Note: Do we also need to consider operand memoisation for multiplication? Yes: it does introduce a timing channel. No: that timing channel is very hard to exploit.
- Test pattern 1: Leading Ones
  - For each rs register input, generate a random XLEN input value, and set the most-significant i bits. For the other rs input, pick a random value.
  - Repeat for values 0 <= i <= XLEN. The i value can be stepped by a value greater than 1 to manage the test size.
- Test pattern 2: Leading Zeros
  - Repeat test pattern 1, but clear the top i bits instead.
- Test pattern 3: Trailing Zeros
  - Repeat test pattern 1, but clear the least-significant i bits instead.
- Test pattern 4: Trailing Ones
  - Repeat test pattern 1, but set the least-significant i bits instead.
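The four multiply patterns differ only in which mask is applied to a random value, so one generator can produce all of them. This is a minimal sketch: the function names, the fixed seed, and the ordering of the output are assumptions made for illustration.

```python
import random

def leading_ones(value, i, xlen):
    """Set the most-significant i bits of value."""
    return value | (((1 << i) - 1) << (xlen - i))

def leading_zeros(value, i, xlen):
    """Clear the most-significant i bits of value."""
    return value & ((1 << (xlen - i)) - 1)

def trailing_zeros(value, i, xlen):
    """Clear the least-significant i bits of value."""
    return value & ~((1 << i) - 1) & ((1 << xlen) - 1)

def trailing_ones(value, i, xlen):
    """Set the least-significant i bits of value."""
    return value | ((1 << i) - 1)

def multiply_stimulus(xlen, step=1, seed=0):
    """Return (rs1, rs2) value pairs covering all four patterns on each input."""
    rng = random.Random(seed)
    stimulus = []
    for pattern in (leading_ones, leading_zeros, trailing_zeros, trailing_ones):
        for i in range(0, xlen + 1, step):       # 0 <= i <= XLEN
            for shaped_is_rs1 in (True, False):  # apply to each rs input in turn
                shaped = pattern(rng.getrandbits(xlen), i, xlen)
                other = rng.getrandbits(xlen)    # the other input is random
                stimulus.append((shaped, other) if shaped_is_rs1 else (other, shaped))
    return stimulus
```

With `step=8` on RV32 this gives 4 patterns x 5 values of i x 2 inputs = 40 operand pairs per instruction, which shows how the step trades test size against resolution.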
After executing each test input, the rdcycle instruction is
used to record the amount of time taken to execute the relevant multiply
instruction.
Each execution time is recorded and compared to the previous
measurement.
If the two are not identical, a fail code is recorded to the
test signature, along with the inputs which caused the failure.
It may be more accurate to run several multiplication instructions in
sequence, so as to amortise any overhead introduced by rdcycle.
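The compare-with-previous check described above can be modelled as follows. The real check would run on the target and write to the test signature; this offline sketch, with a hypothetical function name and tuple layout, only illustrates the comparison logic.

```python
def check_constant_time(measurements):
    """measurements: list of (rs1, rs2, cycles) in execution order.
    Returns one (rs1, rs2, cycles, previous_cycles) entry per mismatch."""
    failures = []
    previous = None
    for rs1, rs2, cycles in measurements:
        if previous is not None and cycles != previous:
            # Record the inputs that caused the change in execution time.
            failures.append((rs1, rs2, cycles, previous))
        previous = cycles
    return failures
```

An empty result means no timing difference was observed for the inputs tried; it does not demonstrate that the implementation is constant time for all inputs.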
Caution: Will this give consistent results on modern micro-architectures?
Can we expect rdcycle ordering with respect to the multiplies to
be respected?
Chapter 10 of the user-level ISA spec has a long discussion on how
defining a cycle is hard, and offers no guarantees of portability.
Hence, it becomes much easier to identify when multiplication is not
constant time (and so insecure), but very hard to portably show that
multiplication is constant time.
We do not want to artificially limit the range of possible implementations
due to unnecessarily restrictive compliance tests.
As well as individual instructions, recommended fusion pairs must also be tested. These are:
    mulhu  ra, rs1, rs2 // ra != rs1, rs2
    mul    rb, rs1, rs2 // rb != ra, rs1, rs2
and
    clmulh ra, rs1, rs2 // ra != rs1, rs2
    clmul  rb, rs1, rs2 // rb != ra, rs1, rs2
The same set of test patterns can be used, treating rs1,rs2 as a
single 2*XLEN input.
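Treating the operand pair as one 2*XLEN value can be sketched as below for the leading-ones pattern. The text does not say which register holds the upper half; this sketch assumes rs1 does. For brevity it also starts from an all-zero value rather than a random one. Both function names are hypothetical.

```python
def split_double_width(value, xlen):
    """Split a 2*XLEN-bit pattern into (rs1, rs2).
    ASSUMPTION: rs1 holds the upper XLEN bits and rs2 the lower XLEN bits."""
    mask = (1 << xlen) - 1
    return (value >> xlen) & mask, value & mask

def fused_leading_ones(xlen, step=1):
    """Leading-ones pattern across the combined 2*XLEN input, for 0 <= i <= 2*XLEN."""
    width = 2 * xlen
    stimulus = []
    for i in range(0, width + 1, step):
        pattern = ((1 << i) - 1) << (width - i)  # top i bits set, rest clear
        stimulus.append(split_double_width(pattern, xlen))
    return stimulus
```

The other three patterns follow the same shape, with the mask applied to the 2*XLEN value before it is split.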