Up to [cvs.NetBSD.org] / src / sys / netipsec
Request diff between arbitrary revisions
Keyword substitution: kv
Default branch: MAIN
Pull up following revision(s) (requested by rin in ticket #740): sys/netipsec/ipsec_input.c: revision 1.79 sys/netipsec/ipsec_output.c: revision 1.86 sys/netipsec/ipsec.c: revision 1.178 sys/netinet6/ip6_output.c: revision 1.232 ipsec: remove unnecessary splsoftnet Because the code of IPsec itself is already MP-safe.
sys: Drop redundant NULL check before m_freem(9) m_freem(9) safely has accepted NULL argument at least since 4.2BSD: https://www.tuhs.org/cgi-bin/utree.pl?file=4.2BSD/usr/src/sys/sys/uipc_mbuf.c Compile-tested on amd64/ALL. Suggested by knakahara@
ipsec: remove unnecessary splsoftnet Because the code of IPsec itself is already MP-safe.
fix various typos in comments and output/log messages.
Mostly merge changes from HEAD upto 20200411
Fix ipsecif(4) IPV6_MINMTU does not work correctly.
Pull up following revision(s) (requested by knakahara in ticket #1385): sys/net/if.c 1.461 sys/net/if.h 1.277 sys/net/if_gif.c 1.149 sys/net/if_gif.h 1.33 sys/net/if_ipsec.c 1.19,1.20,1.24 sys/net/if_ipsec.h 1.5 sys/net/if_l2tp.c 1.33,1.36-1.39 sys/net/if_l2tp.h 1.7,1.8 sys/net/route.c 1.220,1.221 sys/net/route.h 1.125 sys/netinet/in_gif.c 1.95 sys/netinet/in_l2tp.c 1.17 sys/netinet/ip_input.c 1.391,1.392 sys/netinet/wqinput.c 1.6 sys/netinet6/in6_gif.c 1.94 sys/netinet6/in6_l2tp.c 1.18 sys/netinet6/ip6_forward.c 1.97 sys/netinet6/ip6_input.c 1.210,1.211 sys/netipsec/ipsec_output.c 1.82,1.83 (patched) sys/netipsec/ipsecif.c 1.12,1.13,1.15,1.17 (patched) sys/netipsec/key.c 1.259,1.260 ipsecif(4) support input drop packet counter. ipsecif(4) should not increment drop counter by errors not related to if_snd. Pointed out by ozaki-r@n.o, thanks. Remove unnecessary addresses in PF_KEY message. MOBIKE Extensions for PF_KEY draft-schilcher-mobike-pfkey-extension-01.txt says ==================== 5. SPD Update // snip SADB_X_SPDADD: // snip sadb_x_ipsecrequest_reqid: An ID for that SA can be passed to the kernel in the sadb_x_ipsecrequest_reqid field. If tunnel mode is specified, the sadb_x_ipsecrequest structure is followed by two sockaddr structures that define the tunnel endpoint addresses. In the case that transport mode is used, no additional addresses are specified. ==================== see: <a rel="nofollow" href="https://tools.ietf.org/html/draft-schilcher-mobike-pfkey-extension-01">https://tools.ietf.org/html/draft-schilcher-mobike-pfkey-extension-01</a> ipsecif(4) uses transport mode, so it should not add addresses. ipsecif(4) supports multiple peers in the same NAPT. E.g. ipsec0 connects between NetBSD_A and NetBSD_B, ipsec1 connects NetBSD_A and NetBSD_C at the following figure. +----------+ +----| NetBSD_B | +----------+ +------+ | +----------+ | NetBSD_A |--- ... ---| NAPT |---+ +----------+ +------+ | +----------+ +----| NetBSD_C | +----------+ Add ATF later. l2tp(4): fix output bytes counter. Pointed by k-goda@IIJ, thanks. remove a variable which is no longer used. l2tp: initialize mowner variables for MBUFTRACE Avoid having a rtcache directly in a percpu storage percpu(9) has a certain memory storage for each CPU and provides it by the piece to users. If the storages went short, percpu(9) enlarges them by allocating new larger memory areas, replacing old ones with them and destroying the old ones. A percpu storage referenced by a pointer gotten via percpu_getref can be destroyed by the mechanism after a running thread sleeps even if percpu_putref has not been called. Using rtcache, i.e., packet processing, typically involves sleepable operations such as rwlock so we must avoid dereferencing a rtcache that is directly stored in a percpu storage during packet processing. Address this situation by having just a pointer to a rtcache in a percpu storage instead. Reviewed by knakahara@ and yamaguchi@ wqinput: avoid having struct wqinput_worklist directly in a percpu storage percpu(9) has a certain memory storage for each CPU and provides it by the piece to users. If the storages went short, percpu(9) enlarges them by allocating new larger memory areas, replacing old ones with them and destroying the old ones. A percpu storage referenced by a pointer gotten via percpu_getref can be destroyed by the mechanism after a running thread sleeps even if percpu_putref has not been called. Input handlers of wqinput normally involves sleepable operations so we must avoid dereferencing a percpu data (struct wqinput_worklist) after executing an input handler. Address this situation by having just a pointer to the data in a percpu storage instead. Reviewed by knakahara@ and yamaguchi@ Add missing #include <sys/kmem.h> Divide Tx context of l2tp(4) to improve performance. It seems l2tp(4) call path is too long for instruction cache. So, dividing l2tp(4) Tx context improves CPU use efficiency. After this commit, l2tp(4) throughput gains 10% on my machine(Atom C3000). Apply some missing changes lost on the previous commit Avoid having a rtcache directly in a percpu storage for tunnel protocols. percpu(9) has a certain memory storage for each CPU and provides it by the piece to users. If the storages went short, percpu(9) enlarges them by allocating new larger memory areas, replacing old ones with them and destroying the old ones. A percpu storage referenced by a pointer gotten via percpu_getref can be destroyed by the mechanism after a running thread sleeps even if percpu_putref has not been called. Using rtcache, i.e., packet processing, typically involves sleepable operations such as rwlock so we must avoid dereferencing a rtcache that is directly stored in a percpu storage during packet processing. Address this situation by having just a pointer to a rtcache in a percpu storage instead. Reviewed by ozaki-r@ and yamaguchi@ l2tp(4): avoid having struct ifqueue directly in a percpu storage. percpu(9) has a certain memory storage for each CPU and provides it by the piece to users. If the storages went short, percpu(9) enlarges them by allocating new larger memory areas, replacing old ones with them and destroying the old ones. A percpu storage referenced by a pointer gotten via percpu_getref can be destroyed by the mechanism after a running thread sleeps even if percpu_putref has not been called. Tx processing of l2tp(4) uses normally involves sleepable operations so we must avoid dereferencing a percpu data (struct ifqueue) after executing Tx processing. Address this situation by having just a pointer to the data in a percpu storage instead. Reviewed by ozaki-r@ and yamaguchi@
Pull up following revision(s) (requested by ozaki-r in ticket #238): sys/netipsec/ipsec_output.c: revision 1.83 sys/net/route.h: revision 1.125 sys/netinet6/ip6_input.c: revision 1.210 sys/netinet6/ip6_input.c: revision 1.211 sys/net/if.c: revision 1.461 sys/net/if_gif.h: revision 1.33 sys/net/route.c: revision 1.220 sys/net/route.c: revision 1.221 sys/net/if.h: revision 1.277 sys/netinet6/ip6_forward.c: revision 1.97 sys/netinet/wqinput.c: revision 1.6 sys/net/if_ipsec.h: revision 1.5 sys/netinet6/in6_l2tp.c: revision 1.18 sys/netinet6/in6_gif.c: revision 1.94 sys/net/if_l2tp.h: revision 1.7 sys/net/if_gif.c: revision 1.149 sys/net/if_l2tp.h: revision 1.8 sys/netinet/in_gif.c: revision 1.95 sys/netinet/in_l2tp.c: revision 1.17 sys/netipsec/ipsecif.c: revision 1.17 sys/net/if_ipsec.c: revision 1.24 sys/net/if_l2tp.c: revision 1.37 sys/netinet/ip_input.c: revision 1.391 sys/net/if_l2tp.c: revision 1.38 sys/netinet/ip_input.c: revision 1.392 sys/net/if_l2tp.c: revision 1.39 Avoid having a rtcache directly in a percpu storage percpu(9) has a certain memory storage for each CPU and provides it by the piece to users. If the storages went short, percpu(9) enlarges them by allocating new larger memory areas, replacing old ones with them and destroying the old ones. A percpu storage referenced by a pointer gotten via percpu_getref can be destroyed by the mechanism after a running thread sleeps even if percpu_putref has not been called. Using rtcache, i.e., packet processing, typically involves sleepable operations such as rwlock so we must avoid dereferencing a rtcache that is directly stored in a percpu storage during packet processing. Address this situation by having just a pointer to a rtcache in a percpu storage instead. Reviewed by knakahara@ and yamaguchi@ - wqinput: avoid having struct wqinput_worklist directly in a percpu storage percpu(9) has a certain memory storage for each CPU and provides it by the piece to users. If the storages went short, percpu(9) enlarges them by allocating new larger memory areas, replacing old ones with them and destroying the old ones. A percpu storage referenced by a pointer gotten via percpu_getref can be destroyed by the mechanism after a running thread sleeps even if percpu_putref has not been called. Input handlers of wqinput normally involves sleepable operations so we must avoid dereferencing a percpu data (struct wqinput_worklist) after executing an input handler. Address this situation by having just a pointer to the data in a percpu storage instead. Reviewed by knakahara@ and yamaguchi@ - Add missing #include <sys/kmem.h> - Divide Tx context of l2tp(4) to improve performance. It seems l2tp(4) call path is too long for instruction cache. So, dividing l2tp(4) Tx context improves CPU use efficiency. After this commit, l2tp(4) throughput gains 10% on my machine(Atom C3000). - Apply some missing changes lost on the previous commit - Avoid having a rtcache directly in a percpu storage for tunnel protocols. percpu(9) has a certain memory storage for each CPU and provides it by the piece to users. If the storages went short, percpu(9) enlarges them by allocating new larger memory areas, replacing old ones with them and destroying the old ones. A percpu storage referenced by a pointer gotten via percpu_getref can be destroyed by the mechanism after a running thread sleeps even if percpu_putref has not been called. Using rtcache, i.e., packet processing, typically involves sleepable operations such as rwlock so we must avoid dereferencing a rtcache that is directly stored in a percpu storage during packet processing. Address this situation by having just a pointer to a rtcache in a percpu storage instead. Reviewed by ozaki-r@ and yamaguchi@ - l2tp(4): avoid having struct ifqueue directly in a percpu storage. percpu(9) has a certain memory storage for each CPU and provides it by the piece to users. If the storages went short, percpu(9) enlarges them by allocating new larger memory areas, replacing old ones with them and destroying the old ones. A percpu storage referenced by a pointer gotten via percpu_getref can be destroyed by the mechanism after a running thread sleeps even if percpu_putref has not been called. Tx processing of l2tp(4) uses normally involves sleepable operations so we must avoid dereferencing a percpu data (struct ifqueue) after executing Tx processing. Address this situation by having just a pointer to the data in a percpu storage instead. Reviewed by ozaki-r@ and yamaguchi@
Avoid having a rtcache directly in a percpu storage percpu(9) has a certain memory storage for each CPU and provides it by the piece to users. If the storages went short, percpu(9) enlarges them by allocating new larger memory areas, replacing old ones with them and destroying the old ones. A percpu storage referenced by a pointer gotten via percpu_getref can be destroyed by the mechanism after a running thread sleeps even if percpu_putref has not been called. Using rtcache, i.e., packet processing, typically involves sleepable operations such as rwlock so we must avoid dereferencing a rtcache that is directly stored in a percpu storage during packet processing. Address this situation by having just a pointer to a rtcache in a percpu storage instead. Reviewed by knakahara@ and yamaguchi@
Sync with HEAD
Synch with HEAD
ipsecif(4) supports multiple peers in the same NAPT. E.g. ipsec0 connects between NetBSD_A and NetBSD_B, ipsec1 connects NetBSD_A and NetBSD_C at the following figure. +----------+ +----| NetBSD_B | +----------+ +------+ | +----------+ | NetBSD_A |--- ... ---| NAPT |---+ +----------+ +------+ | +----------+ +----| NetBSD_C | +----------+ Add ATF later.
Sync with HEAD, resolve a couple of conflicts
Support IPv6 NAT-T. Implemented by hsuenaga@IIJ and ohishi@IIJ. Add ATF later.
Sync with HEAD
Adapt rev1.75, suggested by Alexander Bluhm. Relax the checks to allow protocols smaller than two bytes (only IPPROTO_NONE). While here style.
Remove support for non-IKE markers in the kernel. Discussed on tech-net@, and now in PR/53334. Basically non-IKE markers come from a deprecated draft, and our kernel code for them has never worked. Setsockopt will now reject UDP_ENCAP_ESPINUDP_NON_IKE. Perhaps we should also add a check in key_handle_natt_info(), to make sure we also reject UDP_ENCAP_ESPINUDP_NON_IKE in the SADB.
Sync with HEAD
Remove a dummy reference to XF_IP4, explain briefly why we don't use ipe4_xformsw, and remove unused includes.
Remove now unused 'isr', 'skip' and 'protoff' arguments from ipip_output.
Remove unused 'mp' argument from all the xf_output functions. Also clean up xform.h a bit.
Pull up following revision(s) (requested by maxv in ticket #799): sys/netipsec/ipsec_output.c: revision 1.75 sys/netipsec/ipsec_output.c: revision 1.67 Strengthen this check, to make sure there is room for an ip6_ext structure. Seems possible to crash m_copydata here (but I didn't test more than that). Fix the checks in compute_ipsec_pos, otherwise m_copydata could crash. I already fixed half of the problem two months ago in rev1.67, back then I thought it was not triggerable because each packet we emit is guaranteed to have correctly formed IPv6 options; but it is actually triggerable via IPv6 forwarding, we emit a packet we just received, and we don't sanitize its options before invoking IPsec. Since it would be wrong to just stop the iteration and continue the IPsec processing, allow compute_ipsec_pos to fail, and when it does, drop the packet entirely.
Pull up following revision(s) (requested by maxv in ticket #1600): sys/netipsec/ipsec_output.c: revision 1.67,1.75 (via patch) Strengthen this check, to make sure there is room for an ip6_ext structure. Seems possible to crash m_copydata here (but I didn't test more than that). Fix the checks in compute_ipsec_pos, otherwise m_copydata could crash. I already fixed half of the problem two months ago in rev1.67, back then I thought it was not triggerable because each packet we emit is guaranteed to have correctly formed IPv6 options; but it is actually triggerable via IPv6 forwarding, we emit a packet we just received, and we don't sanitize its options before invoking IPsec. Since it would be wrong to just stop the iteration and continue the IPsec processing, allow compute_ipsec_pos to fail, and when it does, drop the packet entirely.
Pull up following revision(s) (requested by maxv in ticket #1600): sys/netipsec/ipsec_output.c: revision 1.67,1.75 (via patch) Strengthen this check, to make sure there is room for an ip6_ext structure. Seems possible to crash m_copydata here (but I didn't test more than that). Fix the checks in compute_ipsec_pos, otherwise m_copydata could crash. I already fixed half of the problem two months ago in rev1.67, back then I thought it was not triggerable because each packet we emit is guaranteed to have correctly formed IPv6 options; but it is actually triggerable via IPv6 forwarding, we emit a packet we just received, and we don't sanitize its options before invoking IPsec. Since it would be wrong to just stop the iteration and continue the IPsec processing, allow compute_ipsec_pos to fail, and when it does, drop the packet entirely.
Pull up following revision(s) (requested by maxv in ticket #1600): sys/netipsec/ipsec_output.c: revision 1.67,1.75 (via patch) Strengthen this check, to make sure there is room for an ip6_ext structure. Seems possible to crash m_copydata here (but I didn't test more than that). Fix the checks in compute_ipsec_pos, otherwise m_copydata could crash. I already fixed half of the problem two months ago in rev1.67, back then I thought it was not triggerable because each packet we emit is guaranteed to have correctly formed IPv6 options; but it is actually triggerable via IPv6 forwarding, we emit a packet we just received, and we don't sanitize its options before invoking IPsec. Since it would be wrong to just stop the iteration and continue the IPsec processing, allow compute_ipsec_pos to fail, and when it does, drop the packet entirely.
Pull up following revision(s) (requested by maxv in ticket #1546): sys/netipsec/ipsec_output.c: revision 1.67,1.75 (via patch) Strengthen this check, to make sure there is room for an ip6_ext structure. Seems possible to crash m_copydata here (but I didn't test more than that). Fix the checks in compute_ipsec_pos, otherwise m_copydata could crash. I already fixed half of the problem two months ago in rev1.67, back then I thought it was not triggerable because each packet we emit is guaranteed to have correctly formed IPv6 options; but it is actually triggerable via IPv6 forwarding, we emit a packet we just received, and we don't sanitize its options before invoking IPsec. Since it would be wrong to just stop the iteration and continue the IPsec processing, allow compute_ipsec_pos to fail, and when it does, drop the packet entirely.
Pull up following revision(s) (requested by maxv in ticket #1546): sys/netipsec/ipsec_output.c: revision 1.67,1.75 (via patch) Strengthen this check, to make sure there is room for an ip6_ext structure. Seems possible to crash m_copydata here (but I didn't test more than that). Fix the checks in compute_ipsec_pos, otherwise m_copydata could crash. I already fixed half of the problem two months ago in rev1.67, back then I thought it was not triggerable because each packet we emit is guaranteed to have correctly formed IPv6 options; but it is actually triggerable via IPv6 forwarding, we emit a packet we just received, and we don't sanitize its options before invoking IPsec. Since it would be wrong to just stop the iteration and continue the IPsec processing, allow compute_ipsec_pos to fail, and when it does, drop the packet entirely.
Pull up following revision(s) (requested by maxv in ticket #1546): sys/netipsec/ipsec_output.c: revision 1.67,1.75 (via patch) Strengthen this check, to make sure there is room for an ip6_ext structure. Seems possible to crash m_copydata here (but I didn't test more than that). Fix the checks in compute_ipsec_pos, otherwise m_copydata could crash. I already fixed half of the problem two months ago in rev1.67, back then I thought it was not triggerable because each packet we emit is guaranteed to have correctly formed IPv6 options; but it is actually triggerable via IPv6 forwarding, we emit a packet we just received, and we don't sanitize its options before invoking IPsec. Since it would be wrong to just stop the iteration and continue the IPsec processing, allow compute_ipsec_pos to fail, and when it does, drop the packet entirely.
Synch with HEAD
Fix the checks in compute_ipsec_pos, otherwise m_copydata could crash. I already fixed half of the problem two months ago in rev1.67, back then I thought it was not triggerable because each packet we emit is guaranteed to have correctly formed IPv6 options; but it is actually triggerable via IPv6 forwarding, we emit a packet we just received, and we don't sanitize its options before invoking IPsec. Since it would be wrong to just stop the iteration and continue the IPsec processing, allow compute_ipsec_pos to fail, and when it does, drop the packet entirely.
Remove IPSEC_SPLASSERT_SOFTNET, it has always been a no-op.
Sync with HEAD
Remove extra long file paths from the headers.
style
Call m_pullup earlier, fixes one branch.
Add KASSERTs, we don't want m_nextpkt in ipsec{4/6}_process_packet.
Fix mbuf mistake: we are using ip6 before it is pulled up properly.
Style, no functional change.
Strengthen this check, to make sure there is room for an ip6_ext structure. Seems possible to crash m_copydata here (but I didn't test more than that).
Remove unused net_osdep.h include.
Pull up following revision(s) (requested by ozaki-r in ticket #456): sys/arch/arm/sunxi/sunxi_emac.c: 1.9 sys/dev/ic/dwc_gmac.c: 1.43-1.44 sys/dev/pci/if_iwm.c: 1.75 sys/dev/pci/if_wm.c: 1.543 sys/dev/pci/ixgbe/ixgbe.c: 1.112 sys/dev/pci/ixgbe/ixv.c: 1.74 sys/kern/sys_socket.c: 1.75 sys/net/agr/if_agr.c: 1.43 sys/net/bpf.c: 1.219 sys/net/if.c: 1.397, 1.399, 1.401-1.403, 1.406-1.410, 1.412-1.416 sys/net/if.h: 1.242-1.247, 1.250, 1.252-1.257 sys/net/if_bridge.c: 1.140 via patch, 1.142-1.146 sys/net/if_etherip.c: 1.40 sys/net/if_ethersubr.c: 1.243, 1.246 sys/net/if_faith.c: 1.57 sys/net/if_gif.c: 1.132 sys/net/if_l2tp.c: 1.15, 1.17 sys/net/if_loop.c: 1.98-1.101 sys/net/if_media.c: 1.35 sys/net/if_pppoe.c: 1.131-1.132 sys/net/if_spppsubr.c: 1.176-1.177 sys/net/if_tun.c: 1.142 sys/net/if_vlan.c: 1.107, 1.109, 1.114-1.121 sys/net/npf/npf_ifaddr.c: 1.3 sys/net/npf/npf_os.c: 1.8-1.9 sys/net/rtsock.c: 1.230 sys/netcan/if_canloop.c: 1.3-1.5 sys/netinet/if_arp.c: 1.255 sys/netinet/igmp.c: 1.65 sys/netinet/in.c: 1.210-1.211 sys/netinet/in_pcb.c: 1.180 sys/netinet/ip_carp.c: 1.92, 1.94 sys/netinet/ip_flow.c: 1.81 sys/netinet/ip_input.c: 1.362 sys/netinet/ip_mroute.c: 1.147 sys/netinet/ip_output.c: 1.283, 1.285, 1.287 sys/netinet6/frag6.c: 1.61 sys/netinet6/in6.c: 1.251, 1.255 sys/netinet6/in6_pcb.c: 1.162 sys/netinet6/ip6_flow.c: 1.35 sys/netinet6/ip6_input.c: 1.183 sys/netinet6/ip6_output.c: 1.196 sys/netinet6/mld6.c: 1.90 sys/netinet6/nd6.c: 1.239-1.240 sys/netinet6/nd6_nbr.c: 1.139 sys/netinet6/nd6_rtr.c: 1.136 sys/netipsec/ipsec_output.c: 1.65 sys/rump/net/lib/libnetinet/netinet_component.c: 1.9-1.10 kmem_intr_free kmem_intr_[z]alloced memory the underlying pools are the same but api-wise those should match Unify IFEF_*_MPSAFE into IFEF_MPSAFE There are already two flags for if_output and if_start, however, it seems such MPSAFE flags are eventually needed for all if_XXX operations. Having discrete flags for each operation is wasteful of if_extflags bits. So let's unify the flags into one: IFEF_MPSAFE. Fortunately IFEF_*_MPSAFE flags have never been included in any releases, so we can change them without breaking backward compatibility of the releases (though the kernel version of -current should be bumped). Note that if an interface have both MP-safe and non-MP-safe operations at a time, we have to set the IFEF_MPSAFE flag and let callees of non-MP-safe opeartions take the kernel lock. Proposed on tech-kern@ and tech-net@ Provide macros for softnet_lock and KERNEL_LOCK hiding NET_MPSAFE switch It reduces C&P codes such as "#ifndef NET_MPSAFE KERNEL_LOCK(1, NULL); ..." scattered all over the source code and makes it easy to identify remaining KERNEL_LOCK and/or softnet_lock that are held even if NET_MPSAFE. No functional change Hold KERNEL_LOCK on if_ioctl selectively based on IFEF_MPSAFE If IFEF_MPSAFE is set, hold the lock and otherwise don't hold. This change requires additions of KERNEL_LOCK to subsequence functions from if_ioctl such as ifmedia_ioctl and ifioctl_common to protect non-MP-safe components. Proposed on tech-kern@ and tech-net@ Ensure to hold if_ioctl_lock when calling if_flags_set Fix locking against myself on ifpromisc vlan_unconfig_locked could be called with holding if_ioctl_lock. Ensure to not turn on IFF_RUNNING of an interface until its initialization completes And ensure to turn off it before destruction as per IFF_RUNNING's description "resource allocated". (The description is a bit doubtful though, I believe the change is still proper.) Ensure to hold if_ioctl_lock on if_up and if_down One exception for if_down is if_detach; in the case the lock isn't needed because it's guaranteed that no other one can access ifp at that point. Make if_link_queue MP-safe if IFEF_MPSAFE if_link_queue is a queue to store events of link state changes, which is used to pass events from (typically) an interrupt handler to if_link_state_change softint. The queue was protected by KERNEL_LOCK so far, but if IFEF_MPSAFE is enabled, it becomes unsafe because (perhaps) an interrupt handler of an interface with IFEF_MPSAFE doesn't take KERNEL_LOCK. Protect it by a spin mutex. Additionally with this change KERNEL_LOCK of if_link_state_change softint is omitted if NET_MPSAFE is enabled. Note that the spin mutex is now ifp->if_snd.ifq_lock as well as the case of if_timer (see the comment). Use IFADDR_WRITER_FOREACH instead of IFADDR_READER_FOREACH At that point no other one modifies the list so IFADDR_READER_FOREACH is unnecessary. Use of IFADDR_READER_FOREACH is harmless in general though, if we try to detect contract violations of pserialize, using it violates the contract. So avoid using it makes life easy. Ensure to call if_addr_init with holding if_ioctl_lock Get rid of outdated comments Fix build of kernels without ether By throwing out if_enable_vlan_mtu and if_disable_vlan_mtu that created a unnecessary dependency from if.c to if_ethersubr.c. PR kern/52790 Rename IFNET_LOCK to IFNET_GLOBAL_LOCK IFNET_LOCK will be used in another lock, if_ioctl_lock (might be renamed then). Wrap if_ioctl_lock with IFNET_* macros (NFC) Also if_ioctl_lock perhaps needs to be renamed to something because it's now not just for ioctl... Reorder some destruction routines in if_detach - Destroy if_ioctl_lock at the end of the if_detach because it's used in various destruction routines - Move psref_target_destroy after pr_purgeif because we want to use psref in pr_purgeif (otherwise destruction procedures can be tricky) Ensure to call if_mcast_op with holding IFNET_LOCK Note that CARP doesn't deal with IFNET_LOCK yet. Remove IFNET_GLOBAL_LOCK where it's unnecessary because IFNET_LOCK is held Describe which lock is used to protect each member variable of struct ifnet Requested by skrll@ Write a guideline for converting an interface to IFEF_MPSAFE Requested by skrll@ Note that IFNET_LOCK must not be held in softint Don't set IFEF_MPSAFE unless NET_MPSAFE at this point Because recent investigations show that interfaces with IFEF_MPSAFE need to follow additional restrictions to work with the flag safely. We should enable it on an interface by default only if the interface surely satisfies the restrictions, which are described in if.h. Note that enabling IFEF_MPSAFE solely gains a few benefit on performance because the network stack is still serialized by the big kernel locks by default.
update from HEAD
Provide macros for softnet_lock and KERNEL_LOCK hiding NET_MPSAFE switch It reduces C&P codes such as "#ifndef NET_MPSAFE KERNEL_LOCK(1, NULL); ..." scattered all over the source code and makes it easy to identify remaining KERNEL_LOCK and/or softnet_lock that are held even if NET_MPSAFE. No functional change
Pull up following revision(s) (requested by ozaki-r in ticket #300): crypto/dist/ipsec-tools/src/setkey/parse.y: 1.19 crypto/dist/ipsec-tools/src/setkey/token.l: 1.20 distrib/sets/lists/tests/mi: 1.754, 1.757, 1.759 doc/TODO.smpnet: 1.12-1.13 sys/net/pfkeyv2.h: 1.32 sys/net/raw_cb.c: 1.23-1.24, 1.28 sys/net/raw_cb.h: 1.28 sys/net/raw_usrreq.c: 1.57-1.58 sys/net/rtsock.c: 1.228-1.229 sys/netinet/in_proto.c: 1.125 sys/netinet/ip_input.c: 1.359-1.361 sys/netinet/tcp_input.c: 1.359-1.360 sys/netinet/tcp_output.c: 1.197 sys/netinet/tcp_var.h: 1.178 sys/netinet6/icmp6.c: 1.213 sys/netinet6/in6_proto.c: 1.119 sys/netinet6/ip6_forward.c: 1.88 sys/netinet6/ip6_input.c: 1.181-1.182 sys/netinet6/ip6_output.c: 1.193 sys/netinet6/ip6protosw.h: 1.26 sys/netipsec/ipsec.c: 1.100-1.122 sys/netipsec/ipsec.h: 1.51-1.61 sys/netipsec/ipsec6.h: 1.18-1.20 sys/netipsec/ipsec_input.c: 1.44-1.51 sys/netipsec/ipsec_netbsd.c: 1.41-1.45 sys/netipsec/ipsec_output.c: 1.49-1.64 sys/netipsec/ipsec_private.h: 1.5 sys/netipsec/key.c: 1.164-1.234 sys/netipsec/key.h: 1.20-1.32 sys/netipsec/key_debug.c: 1.18-1.21 sys/netipsec/key_debug.h: 1.9 sys/netipsec/keydb.h: 1.16-1.20 sys/netipsec/keysock.c: 1.59-1.62 sys/netipsec/keysock.h: 1.10 sys/netipsec/xform.h: 1.9-1.12 sys/netipsec/xform_ah.c: 1.55-1.74 sys/netipsec/xform_esp.c: 1.56-1.72 sys/netipsec/xform_ipcomp.c: 1.39-1.53 sys/netipsec/xform_ipip.c: 1.50-1.54 sys/netipsec/xform_tcp.c: 1.12-1.16 sys/rump/librump/rumpkern/Makefile.rumpkern: 1.170 sys/rump/librump/rumpnet/net_stub.c: 1.27 sys/sys/protosw.h: 1.67-1.68 tests/net/carp/t_basic.sh: 1.7 tests/net/if_gif/t_gif.sh: 1.11 tests/net/if_l2tp/t_l2tp.sh: 1.3 tests/net/ipsec/Makefile: 1.7-1.9 tests/net/ipsec/algorithms.sh: 1.5 tests/net/ipsec/common.sh: 1.4-1.6 tests/net/ipsec/t_ipsec_ah_keys.sh: 1.2 tests/net/ipsec/t_ipsec_esp_keys.sh: 1.2 tests/net/ipsec/t_ipsec_gif.sh: 1.6-1.7 tests/net/ipsec/t_ipsec_l2tp.sh: 1.6-1.7 tests/net/ipsec/t_ipsec_misc.sh: 1.8-1.18 tests/net/ipsec/t_ipsec_sockopt.sh: 1.1-1.2 tests/net/ipsec/t_ipsec_tcp.sh: 1.1-1.2 tests/net/ipsec/t_ipsec_transport.sh: 1.5-1.6 tests/net/ipsec/t_ipsec_tunnel.sh: 1.9 tests/net/ipsec/t_ipsec_tunnel_ipcomp.sh: 1.1-1.2 tests/net/ipsec/t_ipsec_tunnel_odd.sh: 1.3 tests/net/mcast/t_mcast.sh: 1.6 tests/net/net/t_ipaddress.sh: 1.11 tests/net/net_common.sh: 1.20 tests/net/npf/t_npf.sh: 1.3 tests/net/route/t_flags.sh: 1.20 tests/net/route/t_flags6.sh: 1.16 usr.bin/netstat/fast_ipsec.c: 1.22 Do m_pullup before mtod It may fix panicks of some tests on anita/sparc and anita/GuruPlug. --- KNF --- Enable DEBUG for babylon5 --- Apply C99-style struct initialization to xformsw --- Tweak outputs of netstat -s for IPsec - Get rid of "Fast" - Use ipsec and ipsec6 for titles to clarify protocol - Indent outputs of sub protocols Original outputs were organized like this: (Fast) IPsec: IPsec ah: IPsec esp: IPsec ipip: IPsec ipcomp: (Fast) IPsec: IPsec ah: IPsec esp: IPsec ipip: IPsec ipcomp: New outputs are organized like this: ipsec: ah: esp: ipip: ipcomp: ipsec6: ah: esp: ipip: ipcomp: --- Add test cases for IPComp --- Simplify IPSEC_OSTAT macro (NFC) --- KNF; replace leading whitespaces with hard tabs --- Introduce and use SADB_SASTATE_USABLE_P --- KNF --- Add update command for testing Updating an SA (SADB_UPDATE) requires that a process issuing SADB_UPDATE is the same as a process issued SADB_ADD (or SADB_GETSPI). This means that update command must be used with add command in a configuration of setkey. This usage is normally meaningless but useful for testing (and debugging) purposes. --- Add test cases for updating SA/SP The tests require newly-added udpate command of setkey. --- PR/52346: Frank Kardel: Fix checksumming for NAT-T See XXX for improvements. --- Remove codes for PACKET_TAG_IPSEC_IN_CRYPTO_DONE It seems that PACKET_TAG_IPSEC_IN_CRYPTO_DONE is for network adapters that have IPsec accelerators; a driver sets the mtag to a packet when its device has already encrypted the packet. Unfortunately no driver implements such offload features for long years and seems unlikely to implement them soon. (Note that neither FreeBSD nor Linux doesn't have such drivers.) Let's remove related (unused) codes and simplify the IPsec code. --- Fix usages of sadb_msg_errno --- Avoid updating sav directly On SADB_UPDATE a target sav was updated directly, which was unsafe. Instead allocate another sav, copy variables of the old sav to the new one and replace the old one with the new one. --- Simplify; we can assume sav->tdb_xform cannot be NULL while it's valid --- Rename key_alloc* functions (NFC) We shouldn't use the term "alloc" for functions that just look up data and actually don't allocate memory. --- Use explicit_memset to surely zero-clear key_auth and key_enc --- Make sure to clear keys on error paths of key_setsaval --- Add missing KEY_FREESAV --- Make sure a sav is inserted to a sah list after its initialization completes --- Remove unnecessary zero-clearing codes from key_setsaval key_setsaval is now used only for a newly-allocated sav. (It was used to reset variables of an existing sav.) --- Correct wrong assumption of sav->refcnt in key_delsah A sav in a list is basically not to be sav->refcnt == 0. And also KEY_FREESAV assumes sav->refcnt > 0. --- Let key_getsavbyspi take a reference of a returning sav --- Use time_mono_to_wall (NFC) --- Separate sending message routine (NFC) --- Simplify; remove unnecessary zero-clears key_freesaval is used only when a target sav is being destroyed. --- Omit NULL checks for sav->lft_c sav->lft_c can be NULL only when initializing or destroying sav. --- Omit unnecessary NULL checks for sav->sah --- Omit unnecessary check of sav->state key_allocsa_policy picks a sav of either MATURE or DYING so we don't need to check its state again. --- Simplify; omit unnecessary saidx passing - ipsec_nextisr returns a saidx but no caller uses it - key_checkrequest is passed a saidx but it can be gotton by another argument (isr) --- Fix splx isn't called on some error paths --- Fix header size calculation of esp where sav is NULL --- Fix header size calculation of ah in the case sav is NULL This fix was also needed for esp. --- Pass sav directly to opencrypto callback In a callback, use a passed sav as-is by default and look up a sav only if the passed sav is dead. --- Avoid examining freshness of sav on packet processing If a sav list is sorted (by lft_c->sadb_lifetime_addtime) in advance, we don't need to examine each sav and also don't need to delete one on the fly and send up a message. Fortunately every sav lists are sorted as we need. Added key_validate_savlist validates that each sav list is surely sorted (run only if DEBUG because it's not cheap). --- Add test cases for SAs with different SPIs --- Prepare to stop using isr->sav isr is a shared resource and using isr->sav as a temporal storage for each packet processing is racy. And also having a reference from isr to sav makes the lifetime of sav non-deterministic; such a reference is removed when a packet is processed and isr->sav is overwritten by new one. Let's have a sav locally for each packet processing instead of using shared isr->sav. However this change doesn't stop using isr->sav yet because there are some users of isr->sav. isr->sav will be removed after the users find a way to not use isr->sav. --- Fix wrong argument handling --- fix printf format. --- Don't validate sav lists of LARVAL or DEAD states We don't sort the lists so the validation will always fail. Fix PR kern/52405 --- Make sure to sort the list when changing the state by key_sa_chgstate --- Rename key_allocsa_policy to key_lookup_sa_bysaidx --- Separate test files --- Calculate ah_max_authsize on initialization as well as esp_max_ivlen --- Remove m_tag_find(PACKET_TAG_IPSEC_PENDING_TDB) because nobody sets the tag --- Restore a comment removed in previous The comment is valid for the below code. --- Make tests more stable sleep command seems to wait longer than expected on anita so use polling to wait for a state change. --- Add tests that explicitly delete SAs instead of waiting for expirations --- Remove invalid M_AUTHIPDGM check on ESP isr->sav M_AUTHIPDGM flag is set to a mbuf in ah_input_cb. An sav of ESP can have AH authentication as sav->tdb_authalgxform. However, in that case esp_input and esp_input_cb are used to do ESP decryption and AH authentication and M_AUTHIPDGM never be set to a mbuf. So checking M_AUTHIPDGM of a mbuf on isr->sav of ESP is meaningless. --- Look up sav instead of relying on unstable sp->req->sav This code is executed only in an error path so an additional lookup doesn't matter. --- Correct a comment --- Don't release sav if calling crypto_dispatch again --- Remove extra KEY_FREESAV from ipsec_process_done It should be done by the caller. --- Don't bother the case of crp->crp_buf == NULL in callbacks --- Hold a reference to an SP during opencrypto processing An SP has a list of isr (ipsecrequest) that represents a sequence of IPsec encryption/authentication processing. One isr corresponds to one opencrypto processing. The lifetime of an isr follows its SP. We pass an isr to a callback function of opencrypto to continue to a next encryption/authentication processing. However nobody guaranteed that the isr wasn't freed, i.e., its SP wasn't destroyed. In order to avoid such unexpected destruction of isr, hold a reference to its SP during opencrypto processing. --- Don't make SAs expired on tests that delete SAs explicitly --- Fix a debug message --- Dedup error paths (NFC) --- Use pool to allocate tdb_crypto For ESP and AH, we need to allocate an extra variable space in addition to struct tdb_crypto. The fixed size of pool items may be larger than an actual requisite size of a buffer, but still the performance improvement by replacing malloc with pool wins. --- Don't use unstable isr->sav for header size calculations We may need to optimize to not look up sav here for users that don't need to know an exact size of headers (e.g., TCP segmemt size caclulation). --- Don't use sp->req->sav when handling NAT-T ESP fragmentation In order to do this we need to look up a sav however an additional look-up degrades performance. A sav is later looked up in ipsec4_process_packet so delay the fragmentation check until then to avoid an extra look-up. --- Don't use key_lookup_sp that depends on unstable sp->req->sav It provided a fast look-up of SP. We will provide an alternative method in the future (after basic MP-ification finishes). --- Stop setting isr->sav on looking up sav in key_checkrequest --- Remove ipsecrequest#sav --- Stop setting mtag of PACKET_TAG_IPSEC_IN_DONE because there is no users anymore --- Skip ipsec_spi_*_*_preferred_new_timeout when running on qemu Probably due to PR 43997 --- Add localcount to rump kernels --- Remove unused macro --- Fix key_getcomb_setlifetime The fix adjusts a soft limit to be 80% of a corresponding hard limit. I'm not sure the fix is really correct though, at least the original code is wrong. A passed comb is zero-cleared before calling key_getcomb_setlifetime, so comb->sadb_comb_soft_addtime = comb->sadb_comb_soft_addtime * 80 / 100; is meaningless. --- Provide and apply key_sp_refcnt (NFC) It simplifies further changes. --- Fix indentation Pointed out by knakahara@ --- Use pslist(9) for sptree --- Don't acquire global locks for IPsec if NET_MPSAFE Note that the change is just to make testing easy and IPsec isn't MP-safe yet. --- Let PF_KEY socks hold their own lock instead of softnet_lock Operations on SAD and SPD are executed via PF_KEY socks. The operations include deletions of SAs and SPs that will use synchronization mechanisms such as pserialize_perform to wait for references to SAs and SPs to be released. It is known that using such mechanisms with holding softnet_lock causes a dead lock. We should avoid the situation. --- Make IPsec SPD MP-safe We use localcount(9), not psref(9), to make the sptree and secpolicy (SP) entries MP-safe because SPs need to be referenced over opencrypto processing that executes a callback in a different context. SPs on sockets aren't managed by the sptree and can be destroyed in softint. localcount_drain cannot be used in softint so we delay the destruction of such SPs to a thread context. To do so, a list to manage such SPs is added (key_socksplist) and key_timehandler_spd deletes dead SPs in the list. For more details please read the locking notes in key.c. Proposed on tech-kern@ and tech-net@ --- Fix updating ipsec_used - key_update_used wasn't called in key_api_spddelete2 and key_api_spdflush - key_update_used wasn't called if an SP had been added/deleted but a reply to userland failed --- Fix updating ipsec_used; turn on when SPs on sockets are added --- Add missing IPsec policy checks to icmp6_rip6_input icmp6_rip6_input is quite similar to rip6_input and the same checks exist in rip6_input. --- Add test cases for setsockopt(IP_IPSEC_POLICY) --- Don't use KEY_NEWSP for dummy SP entries By the change KEY_NEWSP is now not called from softint anymore and we can use kmem_zalloc with KM_SLEEP for KEY_NEWSP. --- Comment out unused functions --- Add test cases that there are SPs but no relevant SAs --- Don't allow sav->lft_c to be NULL lft_c of an sav that was created by SADB_GETSPI could be NULL. --- Clean up clunky eval strings - Remove unnecessary \ at EOL - This allows to omit ; too - Remove unnecessary quotes for arguments of atf_set - Don't expand $DEBUG in eval - We expect it's expanded on execution Suggested by kre@ --- Remove unnecessary KEY_FREESAV in an error path sav should be freed (unreferenced) by the caller. --- Use pslist(9) for sahtree --- Use pslist(9) for sah->savtree --- Rename local variable newsah to sah It may not be new. --- MP-ify SAD slightly - Introduce key_sa_mtx and use it for some list operations - Use pserialize for some list iterations --- Introduce KEY_SA_UNREF and replace KEY_FREESAV with it where sav will never be actually freed in the future KEY_SA_UNREF is still key_freesav so no functional change for now. This change reduces diff of further changes. --- Remove out-of-date log output Pointed out by riastradh@ --- Use KDASSERT instead of KASSERT for mutex_ownable Because mutex_ownable is too heavy to run in a fast path even for DIAGNOSTIC + LOCKDEBUG. Suggested by riastradh@ --- Assemble global lists and related locks into cache lines (NFCI) Also rename variable names from *tree to *list because they are just lists, not trees. Suggested by riastradh@ --- Move locking notes --- Update the locking notes - Add locking order - Add locking notes for misc lists such as reglist - Mention pserialize, key_sp_ref and key_sp_unref on SP operations Requested by riastradh@ --- Describe constraints of key_sp_ref and key_sp_unref Requested by riastradh@ --- Hold key_sad.lock on SAVLIST_WRITER_INSERT_TAIL --- Add __read_mostly to key_psz Suggested by riastradh@ --- Tweak wording (pserialize critical section => pserialize read section) Suggested by riastradh@ --- Add missing mutex_exit --- Fix setkey -D -P outputs The outputs were tweaked (by me), but I forgot updating libipsec in my local ATF environment... --- MP-ify SAD (key_sad.sahlist and sah entries) localcount(9) is used to protect key_sad.sahlist and sah entries as well as SPD (and will be used for SAD sav). Please read the locking notes of SAD for more details. --- Introduce key_sa_refcnt and replace sav->refcnt with it (NFC) --- Destroy sav only in the loop for DEAD sav --- Fix KASSERT(solocked(sb->sb_so)) failure in sbappendaddr that is called eventually from key_sendup_mbuf If key_sendup_mbuf isn't passed a socket, the assertion fails. Originally in this case sb->sb_so was softnet_lock and callers held softnet_lock so the assertion was magically satisfied. Now sb->sb_so is key_so_mtx and also softnet_lock isn't always held by callers so the assertion can fail. Fix it by holding key_so_mtx if key_sendup_mbuf isn't passed a socket. Reported by knakahara@ Tested by knakahara@ and ozaki-r@ --- Fix locking notes of SAD --- Fix deadlock between key_sendup_mbuf called from key_acquire and localcount_drain If we call key_sendup_mbuf from key_acquire that is called on packet processing, a deadlock can happen like this: - At key_acquire, a reference to an SP (and an SA) is held - key_sendup_mbuf will try to take key_so_mtx - Some other thread may try to localcount_drain to the SP with holding key_so_mtx in say key_api_spdflush - In this case localcount_drain never return because key_sendup_mbuf that has stuck on key_so_mtx never release a reference to the SP Fix the deadlock by deferring key_sendup_mbuf to the timer (key_timehandler). --- Fix that prev isn't cleared on retry --- Limit the number of mbufs queued for deferred key_sendup_mbuf It's easy to be queued hundreds of mbufs on the list under heavy network load. --- MP-ify SAD (savlist) localcount(9) is used to protect savlist of sah. The basic design is similar to MP-ifications of SPD and SAD sahlist. Please read the locking notes of SAD for more details. --- Simplify ipsec_reinject_ipstack (NFC) --- Add per-CPU rtcache to ipsec_reinject_ipstack It reduces route lookups and also reduces rtcache lock contentions when NET_MPSAFE is enabled. --- Use pool_cache(9) instead of pool(9) for tdb_crypto objects The change improves network throughput especially on multi-core systems. --- Update ipsec(4), opencrypto(9) and vlan(4) are now MP-safe. --- Write known issues on scalability --- Share a global dummy SP between PCBs It's never be changed so it can be pre-allocated and shared safely between PCBs. --- Fix race condition on the rawcb list shared by rtsock and keysock keysock now protects itself by its own mutex, which means that the rawcb list is protected by two different mutexes (keysock's one and softnet_lock for rtsock), of course it's useless. Fix the situation by having a discrete rawcb list for each. --- Use a dedicated mutex for rt_rawcb instead of softnet_lock if NET_MPSAFE --- fix localcount leak in sav. fixed by ozaki-r@n.o. I commit on behalf of him. --- remove unnecessary comment. --- Fix deadlock between pserialize_perform and localcount_drain A typical ussage of localcount_drain looks like this: mutex_enter(&mtx); item = remove_from_list(); pserialize_perform(psz); localcount_drain(&item->localcount, &cv, &mtx); mutex_exit(&mtx); This sequence can cause a deadlock which happens for example on the following situation: - Thread A calls localcount_drain which calls xc_broadcast after releasing a specified mutex - Thread B enters the sequence and calls pserialize_perform with holding the mutex while pserialize_perform also calls xc_broadcast - Thread C (xc_thread) that calls an xcall callback of localcount_drain tries to hold the mutex xc_broadcast of thread B doesn't start until xc_broadcast of thread A finishes, which is a feature of xcall(9). This means that pserialize_perform never complete until xc_broadcast of thread A finishes. On the other hand, thread C that is a callee of xc_broadcast of thread A sticks on the mutex. Finally the threads block each other (A blocks B, B blocks C and C blocks A). A possible fix is to serialize executions of the above sequence by another mutex, but adding another mutex makes the code complex, so fix the deadlock by another way; the fix is to release the mutex before pserialize_perform and instead use a condvar to prevent pserialize_perform from being called simultaneously. Note that the deadlock has happened only if NET_MPSAFE is enabled. --- Add missing ifdef NET_MPSAFE --- Take softnet_lock on pr_input properly if NET_MPSAFE Currently softnet_lock is taken unnecessarily in some cases, e.g., icmp_input and encap4_input from ip_input, or not taken even if needed, e.g., udp_input and tcp_input from ipsec4_common_input_cb. Fix them. NFC if NET_MPSAFE is disabled (default). --- - sanitize key debugging so that we don't print extra newlines or unassociated debugging messages. - remove unused functions and make internal ones static - print information in one line per message --- humanize printing of ip addresses --- cast reduction, NFC. --- Fix typo in comment --- Pull out ipsec_fill_saidx_bymbuf (NFC) --- Don't abuse key_checkrequest just for looking up sav It does more than expected for example key_acquire. --- Fix SP is broken on transport mode isr->saidx was modified accidentally in ipsec_nextisr. Reported by christos@ Helped investigations by christos@ and knakahara@ --- Constify isr at many places (NFC) --- Include socketvar.h for softnet_lock --- Fix buffer length for ipsec_logsastr
Constify isr at many places (NFC)
Fix SP is broken on transport mode isr->saidx was modified accidentally in ipsec_nextisr. Reported by christos@ Helped investigations by christos@ and knakahara@
Don't abuse key_checkrequest just for looking up sav It does more than expected for example key_acquire.
Pull out ipsec_fill_saidx_bymbuf (NFC)
Sync with HEAD
Add per-CPU rtcache to ipsec_reinject_ipstack It reduces route lookups and also reduces rtcache lock contentions when NET_MPSAFE is enabled.
Simplify ipsec_reinject_ipstack (NFC)
Introduce KEY_SA_UNREF and replace KEY_FREESAV with it where sav will never be actually freed in the future KEY_SA_UNREF is still key_freesav so no functional change for now. This change reduces diff of further changes.
Don't acquire global locks for IPsec if NET_MPSAFE Note that the change is just to make testing easy and IPsec isn't MP-safe yet.
Don't use sp->req->sav when handling NAT-T ESP fragmentation In order to do this we need to look up a sav however an additional look-up degrades performance. A sav is later looked up in ipsec4_process_packet so delay the fragmentation check until then to avoid an extra look-up.
Remove extra KEY_FREESAV from ipsec_process_done It should be done by the caller.
Prepare to stop using isr->sav isr is a shared resource and using isr->sav as a temporal storage for each packet processing is racy. And also having a reference from isr to sav makes the lifetime of sav non-deterministic; such a reference is removed when a packet is processed and isr->sav is overwritten by new one. Let's have a sav locally for each packet processing instead of using shared isr->sav. However this change doesn't stop using isr->sav yet because there are some users of isr->sav. isr->sav will be removed after the users find a way to not use isr->sav.
Fix splx isn't called on some error paths
Simplify; omit unnecessary saidx passing - ipsec_nextisr returns a saidx but no caller uses it - key_checkrequest is passed a saidx but it can be gotton by another argument (isr)
Omit unnecessary NULL checks for sav->sah
Simplify; we can assume sav->tdb_xform cannot be NULL while it's valid
Simplify IPSEC_OSTAT macro (NFC)
Introduce IPSECLOG and replace ipseclog and DPRINTF with it
Resolve conflicts from previous merge (all resulting from $NetBSD keywork expansion)
Make ipsec_address() and ipsec_logsastr() mpsafe.
Sync with HEAD
Omit two arguments of ipsec4_process_packet flags is unused and tunalready is always 0. So NFC.
Sync with HEAD
Sync with HEAD
Retire ipsec_osdep.h We don't need to care other OSes (FreeBSD) anymore. Some macros are alive in ipsec_private.h.
Convert IPSEC_ASSERT to KASSERT or KASSERTMSG IPSEC_ASSERT just discarded specified message...
Remove __FreeBSD__ and __NetBSD__ switches No functional changes (except for a debug printf). Note that there remain some __FreeBSD__ for sysctl knobs which counerparts to NetBSD don't exist. And ipsec_osdep.h isn't touched yet; tidying it up requires actual code changes.
Prepare netipsec for rump-ification - Include "opt_*.h" only if _KERNEL_OPT is defined - Allow encapinit to be called twice (by ifinit and ipe4_attach) - ifinit didn't call encapinit if IPSEC is enabled (ipe4_attach called it instead), however, on a rump kernel ipe4_attach may not be called even if IPSEC is enabled. So we need to allow ifinit to call it anyway - Setup sysctls in ipsec_attach explicitly instead of using SYSCTL_SETUP - Call ip6flow_invalidate_all in key_spdadd only if in6_present - It's possible that a rump kernel loads the ipsec library but not the inet6 library
Sync with HEAD
Tidy up opt_ipsec.h inclusions Some inclusions of opt_ipsec.h were for IPSEC_NAT_T and are now unnecessary. Add inclusions to some C files for IPSEC_DEBUG.
Rebase to HEAD as of a few days ago.
sync with head. for a reference, the tree before this commit was tagged as yamt-pagecache-tag8. this commit was splitted into small chunks to avoid a limitation of cvs. ("Protocol error: too many arguments")
sync with head
- apply some __diagused - remove unused variables - move some variables inside their relevant use #ifdef
resync from head
PR/47886: Dr. Wolfgang Stukenbrock: IPSEC_NAT_T enabled kernels may access outdated pointers and pass ESP data to UPD-sockets. While here, simplify the code and remove the IPSEC_NAT_T option; always compile nat-traversal in so that it does not bitrot.
sync with head
merge to -current.
add patch from Arnaud Degroote to handle IPv6 extended options with (FAST_)IPSEC, tested lightly with a DSTOPTS header consisting of PAD1
NULL does not need a cast
Catchup with rmind-uvmplock merge.
sync with head
catch a case where an ip6 address with scope embedded was compared with one without -- interestingly this didn't break the connection but just caused a useless encapsulation (this code needs to be rearranged to get it clean)
fix tunnel encapsulation in ipsec6_process_packet() -- it is not completely clean yet, but at least a v6-in-v6 tunnel works now
reindent ipsec6_process_packet() - whitespace changes only
remove a limitation that inner and outer IP version must be equal for an ESP tunnel, and add some fixes which make v4-in-v6 work (v6 as inner protocol isn't ready, even v6-in-v6 can never have worked) being here, fix a statistics counter and kill an unused variable
Sync with HEAD.
sync with head
Sync with HEAD
do proper statistics counting for outbound packets, fixes PR kern/30182 by Gilles Roy
Sync with HEAD
in rev.1.192 of ip_output.c the semantics of ip_output() was changed: Before, setting the IP_RAWOUTPUT flag did imply that the ip_id (the fragmentation thing) was used as-is. Now, a new ID is diced unless the new IP_NOIPNEWID flag is set. The ip_id is part of the data which are used to calculate the hash for AH, so set the IP_NOIPNEWID flag to make sure the IP header is not modified behind AH's back. Otherwise, the recipient will detect a checksum mismatch and discard the packet.
-in opencrypto callbacks (which run in a kernel thread), pull softnet_lock everywhere splsoftnet() was used before, to fix MP concurrency problems -pull KERNEL_LOCK where ip(6)_output() is called, as this is what the network stack (unfortunately) expects, in particular to avoid races for packets in the interface send queues From Wolfgang Stukenbrock per PR kern/44418, with the application of KERNEL_LOCK to what I think are the essential points, tested on a dual-core i386.
sync with head
Cosmetic: fix indentation, change some spaces to tabs.
Sync with HEAD.
sync with head.
sync with head.
Fix a stupid typo. In ipsec6_process_packet, reinject the packet in AF_INET6, nor in AF_INET.
Make IPSEC and FAST_IPSEC stats per-cpu. Use <net/net_stats.h> and netstat_sysctl().
Sync with HEAD.
sync with head
sync with HEAD
Catch up to netbsd-4.0 release.
Sync with HEAD
Fix the ipsec processing in case of USE rules with no SA installed. In case where there is no more isr to process, just tag the packet and reinject in the ip{,6} stack. Fix pr/34843
Simplify the FAST_IPSEC output path Only record an IPSEC_OUT_DONE tag when we have finished the processing In ip{,6}_output, check this tag to know if we have already processed this packet. Remove some dead code (IPSEC_PENDING_TDB is not used in NetBSD) Fix pr/36870
Sync with head.
sync with head.
Kill _IP_VHL ifdef (from netinet/ip.h history, it has never been used in NetBSD so ...)
Pull up following revision(s) (requested by adrianp in ticket #11395): sys/netipsec/xform_ah.c: revision 1.19 via patch sys/netipsec/ipsec.c: revision 1.34 via patch sys/netipsec/xform_ipip.c: revision 1.18 via patch sys/netipsec/ipsec_output.c: revision 1.23 via patch sys/netipsec/ipsec_osdep.h: revision 1.21 via patch The function ipsec4_get_ulp assumes that ip_off is in host order. This results in IPsec processing that is dependent on protocol and/or port can be bypassed. Bug report, analysis and initial fix from Karl Knutsson. Final patch and ok from degroote@
Pull up following revision(s) (requested by adrianp in ticket #11395): sys/netipsec/xform_ah.c: revision 1.19 via patch sys/netipsec/ipsec.c: revision 1.34 via patch sys/netipsec/xform_ipip.c: revision 1.18 via patch sys/netipsec/ipsec_output.c: revision 1.23 via patch sys/netipsec/ipsec_osdep.h: revision 1.21 via patch The function ipsec4_get_ulp assumes that ip_off is in host order. This results in IPsec processing that is dependent on protocol and/or port can be bypassed. Bug report, analysis and initial fix from Karl Knutsson. Final patch and ok from degroote@
Pull up following revision(s) (requested by adrianp in ticket #11395): sys/netipsec/xform_ah.c: revision 1.19 via patch sys/netipsec/ipsec.c: revision 1.34 via patch sys/netipsec/xform_ipip.c: revision 1.18 via patch sys/netipsec/ipsec_output.c: revision 1.23 via patch sys/netipsec/ipsec_osdep.h: revision 1.21 via patch The function ipsec4_get_ulp assumes that ip_off is in host order. This results in IPsec processing that is dependent on protocol and/or port can be bypassed. Bug report, analysis and initial fix from Karl Knutsson. Final patch and ok from degroote@
Pull up following revision(s) (requested by adrianp in ticket #1878): sys/netipsec/xform_ah.c: revision 1.19 via patch sys/netipsec/ipsec.c: revision 1.34 via patch sys/netipsec/xform_ipip.c: revision 1.18 via patch sys/netipsec/ipsec_output.c: revision 1.23 via patch sys/netipsec/ipsec_osdep.h: revision 1.21 via patch The function ipsec4_get_ulp assumes that ip_off is in host order. This results in IPsec processing that is dependent on protocol and/or port can be bypassed. Bug report, analysis and initial fix from Karl Knutsson. Final patch and ok from degroote@
Pull up following revision(s) (requested by adrianp in ticket #1878): sys/netipsec/xform_ah.c: revision 1.19 via patch sys/netipsec/ipsec.c: revision 1.34 via patch sys/netipsec/xform_ipip.c: revision 1.18 via patch sys/netipsec/ipsec_output.c: revision 1.23 via patch sys/netipsec/ipsec_osdep.h: revision 1.21 via patch The function ipsec4_get_ulp assumes that ip_off is in host order. This results in IPsec processing that is dependent on protocol and/or port can be bypassed. Bug report, analysis and initial fix from Karl Knutsson. Final patch and ok from degroote@
Pull up following revision(s) (requested by adrianp in ticket #1878): sys/netipsec/xform_ah.c: revision 1.19 via patch sys/netipsec/ipsec.c: revision 1.34 via patch sys/netipsec/xform_ipip.c: revision 1.18 via patch sys/netipsec/ipsec_output.c: revision 1.23 via patch sys/netipsec/ipsec_osdep.h: revision 1.21 via patch The function ipsec4_get_ulp assumes that ip_off is in host order. This results in IPsec processing that is dependent on protocol and/or port can be bypassed. Bug report, analysis and initial fix from Karl Knutsson. Final patch and ok from degroote@
sync with head.
Sync with HEAD
sync with HEAD
Pull up following revision(s) (requested by adrianp in ticket #964): sys/netipsec/xform_ah.c: revision 1.19 sys/netipsec/ipsec.c: revision 1.34 sys/netipsec/xform_ipip.c: revision 1.18 sys/netipsec/ipsec_output.c: revision 1.23 sys/netipsec/ipsec_osdep.h: revision 1.21 The function ipsec4_get_ulp assumes that ip_off is in host order. This results in IPsec processing that is dependent on protocol and/or port can be bypassed. Bug report, analysis and initial fix from Karl Knutsson. Final patch and ok from degroote@
Sync with HEAD.
The function ipsec4_get_ulp assumes that ip_off is in host order. This results in IPsec processing that is dependent on protocol and/or port can be bypassed. Bug report, analysis and initial fix from Karl Knutsson. Final patch and ok from degroote@
sync with head.
Sync with head.
Sync with head.
Add support for options IPSEC_NAT_T (RFC 3947 and 3948) for fast_ipsec(4). No objection on tech-net@
Update to today's netbsd-4.
Pull up following revision(s) (requested by degroote in ticket #667): sys/netinet/tcp_input.c: revision 1.260 sys/netinet/tcp_output.c: revision 1.154 sys/netinet/tcp_subr.c: revision 1.210 sys/netinet6/icmp6.c: revision 1.129 sys/netinet6/in6_proto.c: revision 1.70 sys/netinet6/ip6_forward.c: revision 1.54 sys/netinet6/ip6_input.c: revision 1.94 sys/netinet6/ip6_output.c: revision 1.114 sys/netinet6/raw_ip6.c: revision 1.81 sys/netipsec/ipcomp_var.h: revision 1.4 sys/netipsec/ipsec.c: revision 1.26 via patch,1.31-1.32 sys/netipsec/ipsec6.h: revision 1.5 sys/netipsec/ipsec_input.c: revision 1.14 sys/netipsec/ipsec_netbsd.c: revision 1.18,1.26 sys/netipsec/ipsec_output.c: revision 1.21 via patch sys/netipsec/key.c: revision 1.33,1.44 sys/netipsec/xform_ipcomp.c: revision 1.9 sys/netipsec/xform_ipip.c: revision 1.15 sys/opencrypto/deflate.c: revision 1.8 Commit my SoC work Add ipv6 support for fast_ipsec Note that currently, packet with extensions headers are not correctly supported Change the ipcomp logic Add sysctl tree to modify the fast_ipsec options related to ipv6. Similar to the sysctl kame interface. Choose the good default policy, depending of the adress family of the desired policy Increase the refcount for the default ipv6 policy so nobody can reclaim it Always compute the sp index even if we don't have any sp in spd. It will let us to choose the right default policy (based on the adress family requested). While here, fix an error message Use dynamic array instead of an static array to decompress. It lets us to decompress any data, whatever is the radio decompressed data / compressed data. It fixes the last issues with fast_ipsec and ipcomp. While here, bzero -> memset, bcopy -> memcpy, FREE -> free Reviewed a long time ago by sam@
sync with head.
Commit my SoC work Add ipv6 support for fast_ipsec Note that currently, packet with extensions headers are not correctly supported Change the ipcomp logic
Sync with head.
KNF: bzero -> memset.
Sync with head.
sync with head.
sync with head.
Introduce new helper functions to abstract the route caching. rtcache_init and rtcache_init_noclone lookup ro_dst and store the result in ro_rt, taking care of the reference counting and calling the domain specific route cache. rtcache_free checks if a route was cashed and frees the reference. rtcache_copy copies ro_dst of the given struct route, checking that enough space is available and incrementing the reference count of the cached rtentry if necessary. rtcache_check validates that the cached route is still up. If it isn't, it tries to look it up again. Afterwards ro_rt is either a valid again or NULL. rtcache_copy is used internally. Adjust to callers of rtalloc/rtflush in the tree to check the sanity of ro_dst first (if necessary). If it doesn't fit the expectations, free the cache, otherwise check if the cached route is still valid. After that combination, a single check for ro_rt == NULL is enough to decide whether a new lookup needs to be done with a different ro_dst. Make the route checking in gre stricter by repeating the loop check after revalidation. Remove some unused RADIX_MPATH code in in6_src.c. The logic is slightly changed here to first validate the route and check RTF_GATEWAY afterwards. This is sementically equivalent though. etherip doesn't need sc_route_expire similiar to the gif changes from dyoung@ earlier. Based on the earlier patch from dyoung@, reviewed and discussed with him.
sync with head.
Here are various changes designed to protect against bad IPv4 routing caused by stale route caches (struct route). Route caches are sprinkled throughout PCBs, the IP fast-forwarding table, and IP tunnel interfaces (gre, gif, stf). Stale IPv6 and ISO route caches will be treated by separate patches. Thank you to Christoph Badura for suggesting the general approach to invalidating route caches that I take here. Here are the details: Add hooks to struct domain for tracking and for invalidating each domain's route caches: dom_rtcache, dom_rtflush, and dom_rtflushall. Introduce helper subroutines, rtflush(ro) for invalidating a route cache, rtflushall(family) for invalidating all route caches in a routing domain, and rtcache(ro) for notifying the domain of a new cached route. Chain together all IPv4 route caches where ro_rt != NULL. Provide in_rtcache() for adding a route to the chain. Provide in_rtflush() and in_rtflushall() for invalidating IPv4 route caches. In in_rtflush(), set ro_rt to NULL, and remove the route from the chain. In in_rtflushall(), walk the chain and remove every route cache. In rtrequest1(), call rtflushall() to invalidate route caches when a route is added. In gif(4), discard the workaround for stale caches that involves expiring them every so often. Replace the pattern 'RTFREE(ro->ro_rt); ro->ro_rt = NULL;' with a call to rtflush(ro). Update ipflow_fastforward() and all other users of route caches so that they expect a cached route, ro->ro_rt, to turn to NULL. Take care when moving a 'struct route' to rtflush() the source and to rtcache() the destination. In domain initializers, use .dom_xxx tags. KNF here and there.
fix spelling of accommodate; from Zapher.
Sync with head.
__unused removal on arguments; approved by core.
sync with head
more __unused
merge ktrace-lwp.
Fix the sync with head I botched.
Sync with HEAD.
Sync with HEAD
Pull up revision 1.13 (requested by jonathan in ticket #280): Redo net.inet.* sysctl subtree for fast-ipsec from scratch. Attach FAST-IPSEC statistics with 64-bit counters to new sysctl MIB. Rework netstat to show FAST_IPSEC statistics, via sysctl, for netstat -p ipsec. New kernel files: sys/netipsec/Makefile (new file; install *_var.h includes) sys/netipsec/ipsec_var.h (new 64-bit mib counter struct) Changed kernel files: sys/Makefile (recurse into sys/netipsec/) sys/netinet/in.h (fake IP_PROTO name for fast_ipsec sysctl subtree.) sys/netipsec/ipsec.h (minimal userspace inclusion) sys/netipsec/ipsec_osdep.h (minimal userspace inclusion) sys/netipsec/ipsec_netbsd.c (redo sysctl subtree from scratch) sys/netipsec/key*.c (fix broken net.key subtree) sys/netipsec/ah_var.h (increase all counters to 64 bits) sys/netipsec/esp_var.h (increase all counters to 64 bits) sys/netipsec/ipip_var.h (increase all counters to 64 bits) sys/netipsec/ipcomp_var.h (increase all counters to 64 bits) sys/netipsec/ipsec.c (add #include netipsec/ipsec_var.h) sys/netipsec/ipsec_mbuf.c (add #include netipsec/ipsec_var.h) sys/netipsec/ipsec_output.c (add #include netipsec/ipsec_var.h) sys/netinet/raw_ip.c (add #include netipsec/ipsec_var.h) sys/netinet/tcp_input.c (add #include netipsec/ipsec_var.h) sys/netinet/udp_usrreq.c (add #include netipsec/ipsec_var.h) Changes to usr.bin/netstat to print the new fast-ipsec sysctl tree for "netstat -s -p ipsec": New file: usr.bin/netstat/fast_ipsec.c (print fast-ipsec counters) Changed files: usr.bin/netstat/Makefile (add fast_ipsec.c) usr.bin/netstat/netstat.h (declarations for fast_ipsec.c) usr.bin/netstat/main.c (call KAME-vs-fast-ipsec dispatcher)
file ipsec_output.c was added on branch ktrace-lwp on 2004-08-03 10:55:29 +0000
Redo net.inet.* sysctl subtree for fast-ipsec from scratch. Attach FAST-IPSEC statistics with 64-bit counters to new sysctl MIB. Rework netstat to show FAST_IPSEC statistics, via sysctl, for netstat -p ipsec. New kernel files: sys/netipsec/Makefile (new file; install *_var.h includes) sys/netipsec/ipsec_var.h (new 64-bit mib counter struct) Changed kernel files: sys/Makefile (recurse into sys/netipsec/) sys/netinet/in.h (fake IP_PROTO name for fast_ipsec sysctl subtree.) sys/netipsec/ipsec.h (minimal userspace inclusion) sys/netipsec/ipsec_osdep.h (minimal userspace inclusion) sys/netipsec/ipsec_netbsd.c (redo sysctl subtree from scratch) sys/netipsec/key*.c (fix broken net.key subtree) sys/netipsec/ah_var.h (increase all counters to 64 bits) sys/netipsec/esp_var.h (increase all counters to 64 bits) sys/netipsec/ipip_var.h (increase all counters to 64 bits) sys/netipsec/ipcomp_var.h (increase all counters to 64 bits) sys/netipsec/ipsec.c (add #include netipsec/ipsec_var.h) sys/netipsec/ipsec_mbuf.c (add #include netipsec/ipsec_var.h) sys/netipsec/ipsec_output.c (add #include netipsec/ipsec_var.h) sys/netinet/raw_ip.c (add #include netipsec/ipsec_var.h) sys/netinet/tcp_input.c (add #include netipsec/ipsec_var.h) sys/netinet/udp_usrreq.c (add #include netipsec/ipsec_var.h) Changes to usr.bin/netstat to print the new fast-ipsec sysctl tree for "netstat -s -p ipsec": New file: usr.bin/netstat/fast_ipsec.c (print fast-ipsec counters) Changed files: usr.bin/netstat/Makefile (add fast_ipsec.c) usr.bin/netstat/netstat.h (declarations for fast_ipsec.c) usr.bin/netstat/main.c (call KAME-vs-fast-ipsec dispatcher)
sys/netinet6/ip6_ecn.h is reportedly a FreeBSD-ism; NetBSD has prototypes for the IPv6 ECN ingress/egress functions in sys/netinet/ip_ecn.h, inside an #ifdef INET6 wrapper. So, wrap sys/netipsec ocurrences of #include <netinet6/ip6_ecn.h> in #ifdef __FreeBSD__/#endif, until both camps can agree on this teensy little piece of namespace. Affects: ipsec_output.c xform_ah.c xform_esp.c xform_ipip.c
Delint ntohl() as argument to a "%lx" format in a log message.
#include <net/net_osdep.h>: if INET6 is configured, ipsec_encapsulate() calls ovbcopy(), which is otherwise deprecated.
Add missing copyright notice (FreeBSD rev. 1.3.2.2).
Fix ipip_output() to always set *mp to NULL on failure, even if 'm' is NULL, otherwise ipsec4_process_packet() may try to m_freem() a bad pointer. In ipsec4_process_packet(), don't try to m_freem() 'm' twice; ipip_output() already did it.
Reversion of "netkey merge", part 2 (replacement of removed files in the repository by christos was part 1). netipsec should now be back as it was on 2003-09-11, with some very minor changes: 1) Some residual platform-dependent code was moved from ipsec.h to ipsec_osdep.h; without this, IPSEC_ASSERT() was multiply defined. ipsec.h now includes ipsec_osdep.h 2) itojun's renaming of netipsec/files.ipsec to netipsec/files.netipsec has been left in place (it's arguable which name is less confusing but the rename is pretty harmless). 3) Some #endif TOKEN has been replaced by #endif /* TOKEN */; #endif TOKEN is invalid and GCC 3 won't compile it. An i386 kernel with "options FAST_IPSEC" and "options OPENCRYPTO" now gets through "make depend" but fails to build with errors in ip_input.c. But it's better than it was (thank heaven for small favors).
merge netipsec/key* into netkey/key*. no need for both. change confusing filename
change the additional arg to be passed to ip{,6}_output to struct socket *. this fixes KAME policy lookup which was broken by the previous commit.
opt_inet6.h is FreeBSD-specific, so wrap it with #ifdef __FreeBSD__/#endif.
Fix bug with IP_DF handling which was breaking TCP: on FreeBSD, ip_off is assumed to be in host byteorder during the input(?) path. NetBSD keeps ip_off and ip_len in network order. Add (or remove) byteswaps accordingly. TCP over fast_ipsec now works with PMTU, as well as without.
(fast-ipsec): Add hooks to pass IPv4 IPsec traffic into fast-ipsec, if configured with ``options FAST_IPSEC''. Kernels with KAME IPsec or with no IPsec should work as before. All calls to ip_output() now always pass an additional compulsory argument: the inpcb associated with the packet being sent, or 0 if no inpcb is available. Fast-ipsec tested with ICMP or UDP over ESP. TCP doesn't work, yet.
Initial import of Sam Leffler's `Fast-IPsec' from FreeBSD 4. Fast-IPsec is a rework of the OpenBSD and KAME IPsec code, using the OpenCryptoFramework (and thus hardware crypto accelerators) and numerous detailed performance improvements. This import is (aside from SPL-level names) the FreeBSD source, imported ``as-is'' as a historical snapshot, for future maintenance and comparison against the FreeBSD source. For now, several minor kernel-API differences are hidden by macros a shim file, ipsec_osdep.h, which (aside from SPL names) can be targeted at either NetBSD or FreeBSD.