The NetBSD Project

CVS log for src/sys/crypto/aes/arch/arm/aes_neon_32.S

[BACK] Up to [cvs.NetBSD.org] / src / sys / crypto / aes / arch / arm

Request diff between arbitrary revisions


Keyword substitution: kv
Default branch: MAIN


Revision 1.11: download - view: text, markup, annotated - select for diffs
Thu Sep 10 11:31:03 2020 UTC (4 years, 4 months ago) by riastradh
Branches: MAIN
CVS tags: thorpej-ifq-base, thorpej-ifq, thorpej-i2c-spi-conf2-base, thorpej-i2c-spi-conf2, thorpej-i2c-spi-conf-base, thorpej-i2c-spi-conf, thorpej-futex2-base, thorpej-futex2, thorpej-futex-base, thorpej-futex, thorpej-cfargs2-base, thorpej-cfargs2, thorpej-cfargs-base, thorpej-cfargs, thorpej-altq-separation-base, thorpej-altq-separation, perseant-exfatfs-base-20240630, perseant-exfatfs-base, perseant-exfatfs, netbsd-10-base, netbsd-10-1-RELEASE, netbsd-10-0-RELEASE, netbsd-10-0-RC6, netbsd-10-0-RC5, netbsd-10-0-RC4, netbsd-10-0-RC3, netbsd-10-0-RC2, netbsd-10-0-RC1, netbsd-10, cjep_sun2x-base1, cjep_sun2x-base, cjep_sun2x, cjep_staticlib_x-base1, cjep_staticlib_x-base, cjep_staticlib_x, bouyer-sunxi-drm-base, bouyer-sunxi-drm, HEAD
Diff to: previous 1.10: preferred, colored
Changes since revision 1.10: +25 -39 lines
aes neon: Gather mc_forward/backward so we can load 256 bits at once.

Revision 1.10: download - view: text, markup, annotated - select for diffs
Thu Sep 10 11:30:28 2020 UTC (4 years, 4 months ago) by riastradh
Branches: MAIN
Diff to: previous 1.9: preferred, colored
Changes since revision 1.9: +10 -5 lines
aes neon: Hoist dsbd/dsbe address calculation out of loop.

Revision 1.9: download - view: text, markup, annotated - select for diffs
Thu Sep 10 11:30:08 2020 UTC (4 years, 4 months ago) by riastradh
Branches: MAIN
Diff to: previous 1.8: preferred, colored
Changes since revision 1.8: +36 -36 lines
aes neon: Tweak register usage.

- Call r12 by its usual name, ip.
- No need for r7 or r11=fp at the moment.

Revision 1.8: download - view: text, markup, annotated - select for diffs
Thu Sep 10 11:29:43 2020 UTC (4 years, 4 months ago) by riastradh
Branches: MAIN
Diff to: previous 1.7: preferred, colored
Changes since revision 1.7: +78 -78 lines
aes neon: Write vtbl with {qN} rather than {d(2N)-d(2N+1)}.

Cosmetic; no functional change.

Revision 1.7: download - view: text, markup, annotated - select for diffs
Thu Sep 10 11:29:02 2020 UTC (4 years, 4 months ago) by riastradh
Branches: MAIN
Diff to: previous 1.6: preferred, colored
Changes since revision 1.6: +89 -150 lines
aes neon: Issue 256-bit loads rather than pairs of 128-bit loads.

Not sure why I didn't realize you could do this before!

Saves some temporary registers that can now be allocated to shave off
a few cycles.

Revision 1.6: download - view: text, markup, annotated - select for diffs
Sun Aug 16 18:02:03 2020 UTC (4 years, 5 months ago) by riastradh
Branches: MAIN
Diff to: previous 1.5: preferred, colored
Changes since revision 1.5: +25 -7 lines
Fix AES NEON code for big-endian softfp ARM.

...which is how the kernel runs.  Switch to using __SOFTFP__ for
consistency with how it gets exposed to C, although I'm not sure how
to get it defined automagically in the toolchain for .S files so
that's set manually in files.aesneon for now.

Revision 1.5: download - view: text, markup, annotated - select for diffs
Sat Aug 8 14:47:01 2020 UTC (4 years, 5 months ago) by riastradh
Branches: MAIN
Diff to: previous 1.4: preferred, colored
Changes since revision 1.4: +37 -37 lines
Fix ARM NEON implementations of AES and ChaCha on big-endian ARM.

New macros such as VQ_N_U32(a,b,c,d) for NEON vector initializers.
Needed because GCC and Clang disagree on the ordering of lanes,
depending on whether it's 64-bit big-endian, 32-bit big-endian, or
little-endian -- and, bizarrely, both of them disagree with the
architectural numbering of lanes.

Experimented with using

static const uint8_t x8[16] = {...};

        uint8x16_t x = vld1q_u8(x8);

which doesn't require knowing anything about the ordering of lanes,
but this generates considerably worse code and apparently confuses
GCC into not recognizing the constant value of x8.

Fix some clang mistakes while here too.

Revision 1.4: download - view: text, markup, annotated - select for diffs
Mon Jul 27 20:57:23 2020 UTC (4 years, 5 months ago) by riastradh
Branches: MAIN
Diff to: previous 1.3: preferred, colored
Changes since revision 1.3: +3 -1 lines
Add RCSIDs to the AES and ChaCha .S sources.

Revision 1.3: download - view: text, markup, annotated - select for diffs
Mon Jul 27 20:53:22 2020 UTC (4 years, 5 months ago) by riastradh
Branches: MAIN
Diff to: previous 1.2: preferred, colored
Changes since revision 1.2: +3 -1 lines
Align critical-path loops in AES and ChaCha.

Revision 1.2: download - view: text, markup, annotated - select for diffs
Mon Jul 27 20:52:10 2020 UTC (4 years, 5 months ago) by riastradh
Branches: MAIN
Diff to: previous 1.1: preferred, colored
Changes since revision 1.1: +49 -29 lines
PIC for aes_neon_32.S.

Without this, tests/sys/crypto/aes/t_aes fails to start on armv7
because of R_ARM_ABS32 relocations in a nonwritable text segment for
a PIE -- which atf quietly ignores in the final report!  Yikes.

Revision 1.1: download - view: text, markup, annotated - select for diffs
Mon Jun 29 23:57:56 2020 UTC (4 years, 6 months ago) by riastradh
Branches: MAIN
Provide hand-written AES NEON assembly for arm32.

gcc does a lousy job at compiling 128-bit NEON intrinsics on arm32;
hand-writing it made it about 12x faster, by avoiding a zillion loads
and stores to spill everything and the kitchen sink onto the stack.
(But gcc does fine on aarch64, presumably because it has twice as
many registers and doesn't have to deal with q2=d4/d5 overlapping.)

Diff request

This form allows you to request diffs between any two revisions of a file. You may select a symbolic revision name using the selection box or you may type in a numeric name using the type-in text box.

Log view options

CVSweb <webmaster@jp.NetBSD.org>