Up to [cvs.NetBSD.org] / src / sys / crypto / aes / arch / x86
sys/crypto: Introduce arch/{arm,x86} to share common MD headers. Deduplicates code between aes and chacha. No binary changes.
Add some Intel intrinsics for ChaCha: _mm_load1_ps, _mm_loadu_si128, _mm_movelh_ps, _mm_slli_epi32, _mm_storeu_si128, _mm_unpackhi_epi32, _mm_unpacklo_epi32.
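As a rough illustration of why 32-bit lane shifts such as _mm_slli_epi32 matter for ChaCha, here is a minimal sketch of the quarter-round applied to four columns at once. It is not the NetBSD code: it uses the generic GCC/Clang vector extensions in place of the Intel intrinsics so that it builds without any x86 headers, and the names v4su, sketch_rol32x4 and sketch_quarterround are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Four 32-bit lanes in one 128-bit vector (GCC/Clang vector extension). */
typedef uint32_t v4su __attribute__((vector_size(16)));

/*
 * Rotate every 32-bit lane left by n bits, 0 < n < 32.
 * With the Intel intrinsics this would be a left shift, a right shift
 * and an OR; the hypothetical helper below models the same thing.
 */
static inline v4su
sketch_rol32x4(v4su x, unsigned n)
{
	assert(n > 0 && n < 32);
	v4su l = { n, n, n, n };
	v4su r = { 32 - n, 32 - n, 32 - n, 32 - n };

	return (x << l) | (x >> r);
}

/* One ChaCha quarter-round on four independent columns in parallel. */
static inline void
sketch_quarterround(v4su *a, v4su *b, v4su *c, v4su *d)
{
	*a += *b; *d ^= *a; *d = sketch_rol32x4(*d, 16);
	*c += *d; *b ^= *c; *b = sketch_rol32x4(*b, 12);
	*a += *b; *d ^= *a; *d = sketch_rol32x4(*d, 8);
	*c += *d; *b ^= *c; *b = sketch_rol32x4(*b, 7);
}
```

Each vector holds the same state word from four columns, so one call performs four quarter-rounds. The unpack intrinsics listed above are the kind of operation one would use to rearrange lanes between such layouts, though how the actual ChaCha code uses them is not shown here.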
Fix target attribute on _mm_movehl_ps, fix clang _mm_unpacklo_epi64.
- _mm_movehl_ps is available in SSE2, no need for SSSE3.
- _mm_unpacklo_epi64 operates on v2di, not v4si; fix.
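The v2di versus v4si distinction in the second item can be sketched as follows. This is only a model written with the generic GCC/Clang vector extensions, not the definitions from the NetBSD header; the sketch_* names are hypothetical stand-ins for the intrinsics they mimic.

```c
#include <stdint.h>

typedef int64_t v2di __attribute__((vector_size(16)));	/* 2 x 64-bit lanes */
typedef int32_t v4si __attribute__((vector_size(16)));	/* 4 x 32-bit lanes */

/* Model of _mm_unpacklo_epi64: take the low 64-bit lane of each input. */
static inline v2di
sketch_unpacklo_epi64(v2di a, v2di b)
{
	return (v2di){ a[0], b[0] };
}

/* Model of _mm_unpacklo_epi32: interleave the two low 32-bit lanes. */
static inline v4si
sketch_unpacklo_epi32(v4si a, v4si b)
{
	return (v4si){ a[0], b[0], a[1], b[1] };
}
```

Viewed as 32-bit lanes, inputs {1, 2, 3, 4} and {5, 6, 7, 8} give {1, 2, 5, 6} under the 64-bit unpack but {1, 5, 2, 6} under the 32-bit one, so treating the operands with the wrong element width selects a different operation rather than just a different type.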
Implement AES-CCM with SSSE3.
New permutation-based AES implementation using SSSE3. This covers a lot of CPUs -- particularly lower-end CPUs from the past decade that lack AES-NI. Derived from Mike Hamburg's public domain vpaes software; see <https://crypto.stanford.edu/vpaes/> for details.
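To give a feel for what "permutation-based" means here, the following is a minimal scalar model of the SSSE3 PSHUFB instruction used as a 16-entry parallel table lookup, together with the nibble-splitting trick that lets two such lookups evaluate any GF(2)-linear map on bytes. It is not taken from vpaes or from the NetBSD sources, which handle the nonlinear part of the S-box with further steps not shown here; all sketch_* names are hypothetical, and sketch_xtime (multiplication by 2 in the AES field) is included only as an example of a linear map.

```c
#include <stdint.h>

/*
 * Scalar model of PSHUFB: each output byte is table[idx & 0x0f], or 0
 * if the top bit of the index byte is set.  All 16 lookups are
 * data-independent in hardware, so there is no secret-dependent
 * memory access.
 */
static void
sketch_pshufb(uint8_t out[16], const uint8_t table[16], const uint8_t idx[16])
{
	for (unsigned i = 0; i < 16; i++)
		out[i] = (idx[i] & 0x80) ? 0 : table[idx[i] & 0x0f];
}

/*
 * Evaluate a GF(2)-linear byte map on 16 bytes: split each byte into
 * its low and high nibble, look each one up in a 16-entry table, and
 * XOR the two results.
 */
static void
sketch_nibble_map(uint8_t out[16], const uint8_t in[16],
    const uint8_t lo_tbl[16], const uint8_t hi_tbl[16])
{
	uint8_t lo[16], hi[16], a[16], b[16];

	for (unsigned i = 0; i < 16; i++) {
		lo[i] = in[i] & 0x0f;
		hi[i] = in[i] >> 4;
	}
	sketch_pshufb(a, lo_tbl, lo);
	sketch_pshufb(b, hi_tbl, hi);
	for (unsigned i = 0; i < 16; i++)
		out[i] = a[i] ^ b[i];
}

/* Example linear map: multiplication by 2 in GF(2^8) mod x^8+x^4+x^3+x+1. */
static uint8_t
sketch_xtime(uint8_t x)
{
	return (uint8_t)((x << 1) ^ ((x & 0x80) ? 0x1b : 0));
}
```

With lo_tbl[i] = sketch_xtime(i) and hi_tbl[i] = sketch_xtime(i << 4), sketch_nibble_map reproduces sketch_xtime on every byte, because a linear map of a byte is the XOR of the map applied to its two nibbles.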
New SSE2-based bitsliced AES implementation. This should work on essentially all x86 CPUs of the last two decades, and may improve throughput over the portable C aes_ct implementation from BearSSL by (a) reducing the number of vector operations in sequence, and (b) batching four rather than two blocks in parallel.

Derived from BearSSL's aes_ct64 implementation, adjusted so that where aes_ct64 uses 64-bit q[0],...,q[7], aes_sse2 uses (q[0], q[4]), ..., (q[3], q[7]), each tuple representing a pair of 64-bit quantities stacked in a single 128-bit register. This translation was done very naively, and mostly reduces the cost of ShiftRows and data movement without doing anything to address the S-box or (Inv)MixColumns, which spread all 64-bit quantities across separate registers and ignore the upper halves.

Unfortunately, SSE2 -- which is all that is guaranteed on all amd64 CPUs -- doesn't have PSHUFB, which would help out a lot more. For example, vpaes relies on that. Perhaps there are enough CPUs out there with PSHUFB but not AES-NI to make it worthwhile to import or adapt vpaes too.

Note: This includes local definitions of various Intel compiler intrinsics for gcc and clang in terms of their __builtin_* &c., because the necessary header files are not available during the kernel build. This is a kludge -- we should fix it properly; the present approach is expedient but not ideal.
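The register pairing described above, with q[i] and q[i + 4] sharing one 128-bit register, might be sketched like this. It uses the generic GCC/Clang vector extensions rather than the locally defined intrinsics; the names v2du, sketch_pack and sketch_unpack are hypothetical, and placing q[i] in the low lane is an assumption, since the log message does not say which half holds which word.

```c
#include <stdint.h>

/* Two 64-bit lanes in one 128-bit vector (GCC/Clang vector extension). */
typedef uint64_t v2du __attribute__((vector_size(16)));

/*
 * Pack eight 64-bit bitsliced words into four 128-bit vectors,
 * pairing q[i] with q[i + 4].  Lane order is an assumption.
 */
static inline void
sketch_pack(v2du r[4], const uint64_t q[8])
{
	for (unsigned i = 0; i < 4; i++)
		r[i] = (v2du){ q[i], q[i + 4] };
}

/* Inverse of sketch_pack. */
static inline void
sketch_unpack(uint64_t q[8], const v2du r[4])
{
	for (unsigned i = 0; i < 4; i++) {
		q[i] = r[i][0];
		q[i + 4] = r[i][1];
	}
}
```

With this layout a single 128-bit operation such as r[0] ^= r[1] performs q[0] ^= q[1] and q[4] ^= q[5] together, which is the kind of saving the pairing offers for steps that treat both halves alike.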