Keyword substitution: kv
Default branch: MAIN
netinet6: Use _NET_STAT* API instead of direct array access. XXX Exception: ip6flow_addstats_rt _assigns_ one of the `statistics' to the current count of ip6 flows in use, and we don't have anything in the _NET_STAT* API for that. So for now I abuse the abstraction, until we sort out this one exceptional case properly. PR kern/58380
inpcb: integrate data structures of PCB into one Data structures of network protocol control blocks (PCBs), i.e., struct inpcb, in6pcb and inpcb_hdr, are not organized well. Users of the data structures have to handle them separately and thus the code is cluttered and duplicated. The commit integrates the data structures into one, struct inpcb. As a result, users of PCBs only have to handle just one data structure, so the code becomes simple. One drawback is that the data size of PCB for IPv4 increases by 40 bytes (from 248 bytes to 288 bytes).
Fix PR kern/57037: Make it possible to choose whether parameter-change routing messages are sent. When net.inet6.ip6.param_rt_msg=0, parameter-change routing messages are not sent. When net.inet6.ip6.param_rt_msg=1 (the default), they are sent as RTM_NEWADDR.
pktqueue: Re-factor sysctl handling. Provide a new pktq_sysctl_setup() function that attaches standard pktq sysctl nodes below a specified parent node, with either a fixed node ID or CTL_CREATE to dynamically assign node IDs. Make all of the sysctl handlers private to pktqueue.c, and remove the INET- and INET6-specific pktqueue sysctl code from net/if.c.
Sync with HEAD.
- Make ALIGNED_POINTER use __alignof(t) instead of sizeof(t). This is more correct because it works with non-primitive types and provides the ABI alignment for the type the compiler will use. - Remove all the *_HDR_ALIGNMENT macros and asserts - Replace POINTER_ALIGNED_P with ACCESSIBLE_POINTER which is identical to ALIGNED_POINTER, but returns that the pointer is always aligned if the CPU supports unaligned accesses. [ as proposed in tech-kern ]
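As an illustration of why the mask has to come from the type's alignment rather than its size, here is a minimal userland sketch. It assumes a GCC or Clang compiler (for __alignof); the macro name SKETCH_ALIGNED_POINTER, the struct and the buffer are hypothetical stand-ins, not the definitions in NetBSD's headers.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical alignment check: a pointer is aligned for type t when its
 * low bits, masked by the type's ABI alignment minus one, are all zero.
 * The mask trick relies on the alignment being a power of two.
 */
#define SKETCH_ALIGNED_POINTER(p, t) \
	((((uintptr_t)(p)) & (__alignof(t) - 1)) == 0)

/* A 40-byte, header-like struct whose required alignment is only 4 on common ABIs. */
struct sketch_hdr {
	uint32_t flow;
	uint16_t plen;
	uint8_t  nxt;
	uint8_t  hlim;
	uint8_t  src[16];
	uint8_t  dst[16];
};

/* Backing storage forced to 8-byte alignment so the offsets tested are predictable. */
_Alignas(8) unsigned char sketch_buf[64];
```

On common ABIs the 40-byte struct needs only 4-byte alignment, so a pointer 4 bytes into the 8-aligned buffer passes while one 2 bytes in does not. A mask derived from sizeof would be 39 (binary 100111), which is not of the form 2^k - 1 and therefore does not test alignment at all.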
- centralize header align and pullup into a single inline function - use a single macro to align pointers and expose the alignment, instead of hard-coding 3 in half of the macros. - fix an issue in the ipv6 l2tp where it was aligning for ipv4 and pulling for ipv6.
inet6: reduce silent packet discards
inet6: pass rcvif to ip6_forward to avoid extra psref_acquire
ipsec: rename ipsec_ip_input to ipsec_ip_input_checkpolicy Because it just checks if a packet passes security policies.
inet, inet6: count packets dropped by IPsec The counters count packets dropped due to security policy checks.
ip6: Remove __packed attribute from ip6 structures They should naturally align. Add compile-time assertions to ip6_input.c to prove this.
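The idea of dropping __packed and proving the layout at compile time can be sketched with C11 _Static_assert. The struct below is a simplified, hypothetical mirror of the 40-byte fixed IPv6 header; the real struct ip6_hdr is defined differently (it uses unions), and NetBSD code has its own compile-time assertion macros (e.g. __CTASSERT) rather than using _Static_assert directly.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified, hypothetical mirror of the fixed IPv6 header, without __packed. */
struct sketch_ip6_hdr {
	uint32_t vfc_flow;   /* version, traffic class, flow label */
	uint16_t plen;       /* payload length */
	uint8_t  nxt;        /* next header */
	uint8_t  hlim;       /* hop limit */
	uint8_t  src[16];    /* source address */
	uint8_t  dst[16];    /* destination address */
};

/* If the compiler inserted any padding, these would fail the build. */
_Static_assert(sizeof(struct sketch_ip6_hdr) == 40, "IPv6 header must be 40 bytes");
_Static_assert(offsetof(struct sketch_ip6_hdr, nxt) == 6, "next header at byte 6");
_Static_assert(offsetof(struct sketch_ip6_hdr, src) == 8, "source address at byte 8");
_Static_assert(offsetof(struct sketch_ip6_hdr, dst) == 24, "destination address at byte 24");
```

If a later change introduced a member that caused padding, the build would fail instead of silently changing the wire layout.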
localify
Remove in-kernel handling of Router Advertisements This is much better handled by a user-land tool. Proposed on tech-net here: https://mail-index.netbsd.org/tech-net/2020/04/22/msg007766.html Note that the ioctl SIOCGIFINFO_IN6 no longer sets flags. That now needs to be done using the pre-existing SIOCSIFINFO_FLAGS ioctl. Compat is fully provided where it makes sense, but trying to turn on RA handling will obviously throw an error as it no longer exists. Note that if you use IPv6 temporary addresses, this now needs to be turned on in dhcpcd.conf(5) rather than in sysctl.conf(5).
Mostly merge changes from HEAD up to 20200411
Pull up following revision(s) (requested by maxv in ticket #432): sys/netinet6/ip6_input.c: revision 1.215 Add more checks in ip6_pullexthdr, to prevent a panic in m_copydata. The Rip6 entry point could see a garbage Hop6 option. Not a big issue, since it's a clean panic only triggerable if the socket has the IN6P_DSTOPTS/IN6P_RTHDR option.
Add more checks in ip6_pullexthdr, to prevent a panic in m_copydata. The Rip6 entry point could see a garbage Hop6 option. Not a big issue, since it's a clean panic only triggerable if the socket has the IN6P_DSTOPTS/IN6P_RTHDR option. Reported-by: syzbot+3b07b3511b4ceb8bf1e2@syzkaller.appspotmail.com
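The general shape of such a check is to validate an extension header's self-declared length against the bytes actually present before copying it. This is a hedged sketch: the function name and the exact checks are hypothetical, not the code added in revision 1.215. It relies only on the IPv6 rule that byte 1 of an extension header gives its length in 8-octet units, not counting the first 8 octets.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper: given the offset of an extension header inside a
 * packet of pktlen bytes, decide whether the whole header can be copied.
 */
bool
sketch_exthdr_fits(const uint8_t *pkt, size_t pktlen, size_t off)
{
	size_t hlen;

	/* The 2-byte generic prefix (next header, length) must be present. */
	if (off > pktlen || pktlen - off < 2)
		return false;
	/* Total header size in bytes: (length field + 1) * 8. */
	hlen = ((size_t)pkt[off + 1] + 1) << 3;
	/* Reject a garbage length that would run past the end of the data. */
	return hlen <= pktlen - off;
}
```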
Pull up following revision(s) (requested by ozaki-r in ticket #368): sys/netinet6/in6_ifattach.h: revision 1.14 sys/netinet6/ip6_input.c: revision 1.212 sys/netinet6/ip6_input.c: revision 1.213 sys/netinet6/ip6_input.c: revision 1.214 sys/netinet6/in6_var.h: revision 1.101 sys/netinet6/in6_var.h: revision 1.102 sys/netinet6/in6_ifattach.c: revision 1.116 sys/netinet6/in6_ifattach.c: revision 1.117 tests/net/ndp/t_ra.sh: revision 1.33
Reorganize in6_tmpaddrtimer stuff
- Move the related functions to where in6_tmpaddrtimer_ch exists
- Hide global variable in6_tmpaddrtimer_ch
- Rename ip6_init2 to in6_tmpaddrtimer_init
- Reduce callers of callout_reset
- Use callout_schedule
Validate ip6_temp_preferred_lifetime (net.inet6.ip6.temppltime) on a change. ip6_temp_preferred_lifetime is used to calculate the interval at which temporary addresses are regenerated, TEMP_PREFERRED_LIFETIME - REGEN_ADVANCE - DESYNC_FACTOR, as per RFC 3041 section 3.5. So it must be greater than (REGEN_ADVANCE + DESYNC_FACTOR); otherwise the interval becomes negative and things go wrong, for example KASSERT(to_ticks >= 0) in callout_schedule_locked fails.
tests: add tests for the validation of net.inet6.ip6.temppltime
in6: reset the temporary address timer on a change of the interval period
in6: reset the temporary address timer on a change of the interval period
Validate ip6_temp_preferred_lifetime (net.inet6.ip6.temppltime) on a change. ip6_temp_preferred_lifetime is used to calculate the interval at which temporary addresses are regenerated, TEMP_PREFERRED_LIFETIME - REGEN_ADVANCE - DESYNC_FACTOR, as per RFC 3041 section 3.5. So it must be greater than (REGEN_ADVANCE + DESYNC_FACTOR); otherwise the interval becomes negative and things go wrong, for example KASSERT(to_ticks >= 0) in callout_schedule_locked fails.
Reorganize in6_tmpaddrtimer stuff
- Move the related functions to where in6_tmpaddrtimer_ch exists
- Hide global variable in6_tmpaddrtimer_ch
- Rename ip6_init2 to in6_tmpaddrtimer_init
- Reduce callers of callout_reset
- Use callout_schedule
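The constraint can be expressed as two small functions. This is a hedged sketch: the function names are hypothetical, and REGEN_ADVANCE and DESYNC_FACTOR are passed as parameters because the kernel's values are not restated in this log. The test uses RFC 3041's REGEN_ADVANCE of 5 seconds and a desync factor of 600 seconds purely as example numbers.

```c
#include <assert.h>
#include <stdbool.h>

/* Regeneration interval in seconds, per the formula quoted above (RFC 3041, 3.5). */
long
sketch_regen_interval(long temp_pltime, long regen_advance, long desync_factor)
{
	return temp_pltime - regen_advance - desync_factor;
}

/* Accept a new temppltime only if the regeneration interval stays positive. */
bool
sketch_temppltime_valid(long temp_pltime, long regen_advance, long desync_factor)
{
	return temp_pltime > regen_advance + desync_factor;
}
```

With a one-day preferred lifetime (86400 s) the interval is 85795 s; a lifetime of 600 s yields -5 s, the kind of negative value that would trip KASSERT(to_ticks >= 0).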
Pull up following revision(s) (requested by knakahara in ticket #1385): sys/net/if.c 1.461 sys/net/if.h 1.277 sys/net/if_gif.c 1.149 sys/net/if_gif.h 1.33 sys/net/if_ipsec.c 1.19,1.20,1.24 sys/net/if_ipsec.h 1.5 sys/net/if_l2tp.c 1.33,1.36-1.39 sys/net/if_l2tp.h 1.7,1.8 sys/net/route.c 1.220,1.221 sys/net/route.h 1.125 sys/netinet/in_gif.c 1.95 sys/netinet/in_l2tp.c 1.17 sys/netinet/ip_input.c 1.391,1.392 sys/netinet/wqinput.c 1.6 sys/netinet6/in6_gif.c 1.94 sys/netinet6/in6_l2tp.c 1.18 sys/netinet6/ip6_forward.c 1.97 sys/netinet6/ip6_input.c 1.210,1.211 sys/netipsec/ipsec_output.c 1.82,1.83 (patched) sys/netipsec/ipsecif.c 1.12,1.13,1.15,1.17 (patched) sys/netipsec/key.c 1.259,1.260
ipsecif(4): support an input drop packet counter.
ipsecif(4) should not increment the drop counter on errors unrelated to if_snd. Pointed out by ozaki-r@n.o, thanks.
Remove unnecessary addresses in PF_KEY message. MOBIKE Extensions for PF_KEY draft-schilcher-mobike-pfkey-extension-01.txt says:
====================
5. SPD Update
// snip
SADB_X_SPDADD:
// snip
sadb_x_ipsecrequest_reqid:
An ID for that SA can be passed to the kernel in the sadb_x_ipsecrequest_reqid field.
If tunnel mode is specified, the sadb_x_ipsecrequest structure is followed by two sockaddr structures that define the tunnel endpoint addresses. In the case that transport mode is used, no additional addresses are specified.
====================
see: https://tools.ietf.org/html/draft-schilcher-mobike-pfkey-extension-01
ipsecif(4) uses transport mode, so it should not add addresses.
ipsecif(4) supports multiple peers in the same NAPT. E.g. ipsec0 connects NetBSD_A and NetBSD_B, and ipsec1 connects NetBSD_A and NetBSD_C, as in the following figure:
                                       +----------+
                                  +----| NetBSD_B |
+----------+           +------+   |    +----------+
| NetBSD_A |--- ... ---| NAPT |---+
+----------+           +------+   |    +----------+
                                  +----| NetBSD_C |
                                       +----------+
Add ATF later.
l2tp(4): fix output bytes counter. Pointed out by k-goda@IIJ, thanks.
remove a variable which is no longer used.
l2tp: initialize mowner variables for MBUFTRACE
Avoid having a rtcache directly in a percpu storage. percpu(9) has a certain memory storage for each CPU and provides it in pieces to users. If the storage runs short, percpu(9) enlarges it by allocating a new, larger memory area, replacing the old one with it and destroying the old one. A percpu storage referenced by a pointer obtained via percpu_getref can be destroyed by this mechanism after a running thread sleeps, even if percpu_putref has not been called. Using rtcache, i.e., packet processing, typically involves sleepable operations such as rwlock, so we must avoid dereferencing a rtcache that is directly stored in a percpu storage during packet processing. Address this situation by having just a pointer to a rtcache in a percpu storage instead. Reviewed by knakahara@ and yamaguchi@
wqinput: avoid having struct wqinput_worklist directly in a percpu storage. percpu(9) has a certain memory storage for each CPU and provides it in pieces to users. If the storage runs short, percpu(9) enlarges it by allocating a new, larger memory area, replacing the old one with it and destroying the old one. A percpu storage referenced by a pointer obtained via percpu_getref can be destroyed by this mechanism after a running thread sleeps, even if percpu_putref has not been called. Input handlers of wqinput normally involve sleepable operations, so we must avoid dereferencing percpu data (struct wqinput_worklist) after executing an input handler. Address this situation by having just a pointer to the data in a percpu storage instead. Reviewed by knakahara@ and yamaguchi@
Add missing #include <sys/kmem.h>
Divide Tx context of l2tp(4) to improve performance. It seems the l2tp(4) call path is too long for the instruction cache, so dividing the l2tp(4) Tx context improves CPU use efficiency. After this commit, l2tp(4) throughput improves by 10% on my machine (Atom C3000).
Apply some missing changes lost on the previous commit
Avoid having a rtcache directly in a percpu storage for tunnel protocols. percpu(9) has a certain memory storage for each CPU and provides it in pieces to users. If the storage runs short, percpu(9) enlarges it by allocating a new, larger memory area, replacing the old one with it and destroying the old one. A percpu storage referenced by a pointer obtained via percpu_getref can be destroyed by this mechanism after a running thread sleeps, even if percpu_putref has not been called. Using rtcache, i.e., packet processing, typically involves sleepable operations such as rwlock, so we must avoid dereferencing a rtcache that is directly stored in a percpu storage during packet processing. Address this situation by having just a pointer to a rtcache in a percpu storage instead. Reviewed by ozaki-r@ and yamaguchi@
l2tp(4): avoid having struct ifqueue directly in a percpu storage. percpu(9) has a certain memory storage for each CPU and provides it in pieces to users. If the storage runs short, percpu(9) enlarges it by allocating a new, larger memory area, replacing the old one with it and destroying the old one. A percpu storage referenced by a pointer obtained via percpu_getref can be destroyed by this mechanism after a running thread sleeps, even if percpu_putref has not been called. Tx processing of l2tp(4) normally involves sleepable operations, so we must avoid dereferencing percpu data (struct ifqueue) after executing Tx processing. Address this situation by having just a pointer to the data in a percpu storage instead. Reviewed by ozaki-r@ and yamaguchi@
Pull up following revision(s) (requested by ozaki-r in ticket #238): sys/netipsec/ipsec_output.c: revision 1.83 sys/net/route.h: revision 1.125 sys/netinet6/ip6_input.c: revision 1.210 sys/netinet6/ip6_input.c: revision 1.211 sys/net/if.c: revision 1.461 sys/net/if_gif.h: revision 1.33 sys/net/route.c: revision 1.220 sys/net/route.c: revision 1.221 sys/net/if.h: revision 1.277 sys/netinet6/ip6_forward.c: revision 1.97 sys/netinet/wqinput.c: revision 1.6 sys/net/if_ipsec.h: revision 1.5 sys/netinet6/in6_l2tp.c: revision 1.18 sys/netinet6/in6_gif.c: revision 1.94 sys/net/if_l2tp.h: revision 1.7 sys/net/if_gif.c: revision 1.149 sys/net/if_l2tp.h: revision 1.8 sys/netinet/in_gif.c: revision 1.95 sys/netinet/in_l2tp.c: revision 1.17 sys/netipsec/ipsecif.c: revision 1.17 sys/net/if_ipsec.c: revision 1.24 sys/net/if_l2tp.c: revision 1.37 sys/netinet/ip_input.c: revision 1.391 sys/net/if_l2tp.c: revision 1.38 sys/netinet/ip_input.c: revision 1.392 sys/net/if_l2tp.c: revision 1.39
- Avoid having a rtcache directly in a percpu storage. percpu(9) has a certain memory storage for each CPU and provides it in pieces to users. If the storage runs short, percpu(9) enlarges it by allocating a new, larger memory area, replacing the old one with it and destroying the old one. A percpu storage referenced by a pointer obtained via percpu_getref can be destroyed by this mechanism after a running thread sleeps, even if percpu_putref has not been called. Using rtcache, i.e., packet processing, typically involves sleepable operations such as rwlock, so we must avoid dereferencing a rtcache that is directly stored in a percpu storage during packet processing. Address this situation by having just a pointer to a rtcache in a percpu storage instead. Reviewed by knakahara@ and yamaguchi@
- wqinput: avoid having struct wqinput_worklist directly in a percpu storage. percpu(9) has a certain memory storage for each CPU and provides it in pieces to users. If the storage runs short, percpu(9) enlarges it by allocating a new, larger memory area, replacing the old one with it and destroying the old one. A percpu storage referenced by a pointer obtained via percpu_getref can be destroyed by this mechanism after a running thread sleeps, even if percpu_putref has not been called. Input handlers of wqinput normally involve sleepable operations, so we must avoid dereferencing percpu data (struct wqinput_worklist) after executing an input handler. Address this situation by having just a pointer to the data in a percpu storage instead. Reviewed by knakahara@ and yamaguchi@
- Add missing #include <sys/kmem.h>
- Divide Tx context of l2tp(4) to improve performance. It seems the l2tp(4) call path is too long for the instruction cache, so dividing the l2tp(4) Tx context improves CPU use efficiency. After this commit, l2tp(4) throughput improves by 10% on my machine (Atom C3000).
- Apply some missing changes lost on the previous commit
- Avoid having a rtcache directly in a percpu storage for tunnel protocols. percpu(9) has a certain memory storage for each CPU and provides it in pieces to users. If the storage runs short, percpu(9) enlarges it by allocating a new, larger memory area, replacing the old one with it and destroying the old one. A percpu storage referenced by a pointer obtained via percpu_getref can be destroyed by this mechanism after a running thread sleeps, even if percpu_putref has not been called. Using rtcache, i.e., packet processing, typically involves sleepable operations such as rwlock, so we must avoid dereferencing a rtcache that is directly stored in a percpu storage during packet processing. Address this situation by having just a pointer to a rtcache in a percpu storage instead. Reviewed by ozaki-r@ and yamaguchi@
- l2tp(4): avoid having struct ifqueue directly in a percpu storage. percpu(9) has a certain memory storage for each CPU and provides it in pieces to users. If the storage runs short, percpu(9) enlarges it by allocating a new, larger memory area, replacing the old one with it and destroying the old one. A percpu storage referenced by a pointer obtained via percpu_getref can be destroyed by this mechanism after a running thread sleeps, even if percpu_putref has not been called. Tx processing of l2tp(4) normally involves sleepable operations, so we must avoid dereferencing percpu data (struct ifqueue) after executing Tx processing. Address this situation by having just a pointer to the data in a percpu storage instead. Reviewed by ozaki-r@ and yamaguchi@
Apply some missing changes lost on the previous commit
Avoid having a rtcache directly in a percpu storage. percpu(9) has a certain memory storage for each CPU and provides it in pieces to users. If the storage runs short, percpu(9) enlarges it by allocating a new, larger memory area, replacing the old one with it and destroying the old one. A percpu storage referenced by a pointer obtained via percpu_getref can be destroyed by this mechanism after a running thread sleeps, even if percpu_putref has not been called. Using rtcache, i.e., packet processing, typically involves sleepable operations such as rwlock, so we must avoid dereferencing a rtcache that is directly stored in a percpu storage during packet processing. Address this situation by having just a pointer to a rtcache in a percpu storage instead. Reviewed by knakahara@ and yamaguchi@
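The pointer indirection can be modelled in userland. In this hypothetical sketch, sketch_percpu stands in for a percpu(9) storage that may be reallocated and sketch_rtcache for the cached route; none of these names are kernel APIs. Because each slot holds only a pointer, a reference taken before the storage grows still points at a live object afterwards.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for a cached route held per CPU. */
struct sketch_rtcache {
	int generation;
};

/*
 * Hypothetical model of a per-CPU storage area that can be enlarged by
 * allocating a new area, copying the contents and freeing the old area.
 * Each slot stores only a pointer; the rtcache has its own allocation.
 */
struct sketch_percpu {
	struct sketch_rtcache **slots;
	size_t nslots;
};

void
sketch_percpu_destroy(struct sketch_percpu *pc)
{
	for (size_t i = 0; i < pc->nslots; i++)
		free(pc->slots[i]);	/* free(NULL) is a no-op */
	free(pc->slots);
	pc->slots = NULL;
	pc->nslots = 0;
}

int
sketch_percpu_init(struct sketch_percpu *pc, size_t ncpu)
{
	assert(ncpu > 0);
	pc->slots = calloc(ncpu, sizeof(*pc->slots));
	if (pc->slots == NULL)
		return -1;
	pc->nslots = ncpu;
	for (size_t i = 0; i < ncpu; i++) {
		pc->slots[i] = calloc(1, sizeof(*pc->slots[i]));
		if (pc->slots[i] == NULL) {
			sketch_percpu_destroy(pc);
			return -1;
		}
	}
	return 0;
}

/* Enlarge the storage; any pointer INTO the old slot array dangles afterwards. */
int
sketch_percpu_grow(struct sketch_percpu *pc, size_t newn)
{
	struct sketch_rtcache **n;

	assert(newn >= pc->nslots);
	n = calloc(newn, sizeof(*n));
	if (n == NULL)
		return -1;
	memcpy(n, pc->slots, pc->nslots * sizeof(*n));
	free(pc->slots);
	pc->slots = n;		/* slots past the old end start out NULL */
	pc->nslots = newn;
	return 0;
}
```

Had the slots held struct sketch_rtcache by value, a pointer obtained before sketch_percpu_grow() would point into the freed array, which is the situation these commits avoid.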
Pull up following revision(s) (requested by bouyer in ticket #208): sys/netinet6/ip6_input.c: revision 1.209 sys/netinet/ip_input.c: revision 1.390 Packet filters can return an mbuf chain with fragmented headers, so m_pullup() it if needed and remove the KASSERT()s.
Pull up following revision(s) (requested by bouyer in ticket #1378): sys/netinet6/ip6_input.c: revision 1.209 (patch) sys/netinet/ip_input.c: revision 1.390 (patch) Packet filters can return an mbuf chain with fragmented headers, so m_pullup() it if needed and remove the KASSERT()s.
Pull up following revision(s) (requested by bouyer in ticket #1708): sys/netinet6/ip6_input.c: revision 1.209 via patch sys/netinet/ip_input.c: revision 1.390 via patch Packet filters can return an mbuf chain with fragmented headers, so m_pullup() it if needed and remove the KASSERT()s.
Packet filters can return an mbuf chain with fragmented headers, so m_pullup() it if needed and remove the KASSERT()s.
Sync with HEAD
Count packets dropped by pfil
Synch with HEAD
Fix ipsecif(4) being unable to apply input-direction packet filters. Reviewed by ozaki-r@n.o and ryo@n.o. Add ATF later.
Fix bug, should be ip6_protox[].
Sync with HEAD, resolve a couple of conflicts
Remove the 't' argument from m_tag_find().
Sync with HEAD
Remove misleading comment.
Add KASSERTs, related to PR/39794.
Merge ipsec4_input and ipsec6_input into ipsec_ip_input. Make the argument a bool for clarity. Optimize the function: if M_CANFASTFWD is not there (because already removed by the firewall) leave now. Makes it easier to see that M_CANFASTFWD is not removed on IPv6.
Synch with HEAD
Remove now unused net_osdep.h includes, the other BSDs did the same.
Remove unused mbuf argument from sbsavetimestamp.
Move the address checks into one function, ip6_badaddr(). In this function, reinstate the "IPv4-compatible IPv6 addresses" check; these addresses are deprecated by RFC4291 (2006).
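An IPv4-compatible address has the form ::a.b.c.d. A minimal sketch of recognising one follows; the function name is hypothetical, and it follows the usual IN6_IS_ADDR_V4COMPAT semantics of excluding :: and ::1, which is an assumption about how ip6_badaddr() applies the check.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper: true if the 16-byte address is IPv4-compatible,
 * i.e. the first 96 bits are zero and the last 32 bits are neither
 * 0 (::, unspecified) nor 1 (::1, loopback).
 */
bool
sketch_is_v4compat(const uint8_t a[16])
{
	uint32_t v4;

	for (int i = 0; i < 12; i++)
		if (a[i] != 0)
			return false;
	v4 = ((uint32_t)a[12] << 24) | ((uint32_t)a[13] << 16) |
	    ((uint32_t)a[14] << 8) | (uint32_t)a[15];
	return v4 > 1;
}
```

Note that IPv4-mapped addresses (::ffff:a.b.c.d) are not matched, since bytes 10 and 11 are non-zero.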
Sync with HEAD, resolve some conflicts
Remove useless DIAGNOSTIC block, the caller already ensures the assumptions, and here we're not doing anything (it should be a panic rather than a printf).
Introduce an m_verify_packet function that verifies the mbuf chain of a packet to ensure it is not malformed. Call this function at "points of interest", which are the IPv4/IPv6/IPsec entry points. There could be more. We use M_VERIFY_PACKET(m), declared under DIAGNOSTIC only. This function should not be called everywhere, especially not in places that temporarily manipulate (and clobber) the mbuf structure; once they're done they put the mbuf back in a correct format.
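The kind of chain-consistency check described here can be sketched over a hypothetical, minimal mbuf stand-in. The two invariants below (segment lengths summing to the packet-header length, and a bounded chain walk) are assumptions chosen for illustration, not a list of what m_verify_packet actually checks.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, minimal stand-in for an mbuf: a chain of data segments. */
struct sketch_mbuf {
	size_t len;                /* bytes of data in this segment */
	struct sketch_mbuf *next;  /* next segment of the same packet */
};

/*
 * Hypothetical consistency check: the packet-header length must equal the
 * sum of the segment lengths, and the walk must end within maxsegs
 * segments (which also catches an accidental loop in the chain).
 */
bool
sketch_verify_packet(const struct sketch_mbuf *m, size_t pkthdr_len, size_t maxsegs)
{
	size_t total = 0, nsegs = 0;

	for (; m != NULL; m = m->next) {
		if (++nsegs > maxsegs)
			return false;
		total += m->len;
	}
	return total == pkthdr_len;
}
```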
Add comment about IPsec.
Pull up following revision(s) (requested by roy in ticket #724): tests/net/icmp/t_ping.c: revision 1.19 sys/netinet6/raw_ip6.c: revision 1.166 sys/netinet6/ip6_input.c: revision 1.195 sys/net/raw_usrreq.c: revision 1.59 sys/sys/socketvar.h: revision 1.151 sys/kern/uipc_socket2.c: revision 1.128 tests/lib/libc/sys/t_recvmmsg.c: revision 1.2 lib/libc/sys/recv.2: revision 1.38 sys/net/rtsock.c: revision 1.239 sys/netinet/udp_usrreq.c: revision 1.246 sys/netinet6/icmp6.c: revision 1.224 tests/net/icmp/t_ping.c: revision 1.20 sys/netipsec/keysock.c: revision 1.63 sys/netinet/raw_ip.c: revision 1.172 sys/kern/uipc_socket.c: revision 1.260 tests/net/icmp/t_ping.c: revision 1.22 sys/kern/uipc_socket.c: revision 1.261 tests/net/icmp/t_ping.c: revision 1.23 sys/netinet/ip_mroute.c: revision 1.155 sbin/route/route.c: revision 1.159 sys/netinet6/ip6_mroute.c: revision 1.123 sys/netatalk/ddp_input.c: revision 1.31 sys/netcan/can.c: revision 1.3 sys/kern/uipc_usrreq.c: revision 1.184 sys/netinet6/udp6_usrreq.c: revision 1.138 tests/net/icmp/t_ping.c: revision 1.18 socket: report receive buffer overflows Add soroverflow() which increments the overflow counter, sets so_error to ENOBUFS and wakes the receive socket up. Replace all code that manually increments this counter with soroverflow(). Add soroverflow() to raw_input(). This allows userland to detect route(4) overflows so it can re-sync with the current state. socket: clear error even when peeking The error has already been reported and it's pointless requiring another recv(2) call just to clear it. socket: remove now incorrect comment that so_error is only udp As it can be affected by route(4) sockets which are raw. rtsock: log dropped messages that we cannot report to userland Handle ENOBUFS when receiving messages. Don't send messages if the receiver has died. Sprinkle more soroverflow(). Handle ENOBUFS in recv Handle ENOBUFS in sendto Note value received. Harden another sendto for ENOBUFS. 
Handle the routing socket overflowing gracefully. Allow a valid sendto .... duh Handle errors better. Fix test for checking we sent all the data we asked to.
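The soroverflow() behaviour described in this entry (increment the overflow counter, set so_error to ENOBUFS, wake the receiver) and the clear-even-when-peeking change can be modelled as follows. struct sketch_socket and both functions are hypothetical simplifications, not the kernel's struct socket, and they ignore locking entirely.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical, minimal model of the socket state soroverflow() touches. */
struct sketch_socket {
	unsigned long rcv_overflowed;  /* receive-buffer overflow counter */
	int so_error;                  /* pending error reported to recv(2) */
	bool reader_woken;             /* stands in for waking the receiver */
};

/* The three steps listed in the log: count, set ENOBUFS, wake up. */
void
sketch_soroverflow(struct sketch_socket *so)
{
	so->rcv_overflowed++;
	so->so_error = ENOBUFS;
	so->reader_woken = true;
}

/*
 * Report and clear a pending error. Per the log, the error is cleared
 * even when the caller is only peeking, so no extra recv(2) is needed.
 */
int
sketch_take_error(struct sketch_socket *so, bool peek)
{
	int error = so->so_error;

	(void)peek;	/* peeking no longer preserves the error */
	so->so_error = 0;
	return error;
}
```

A route(4) reader that gets ENOBUFS would then re-sync its view of kernel state, as the log describes.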
Synch with HEAD, resolve conflicts
Sprinkle more soroverflow().
Synch with HEAD
Perform the IP (src/dst) checks _before_ calling the packet filter, because if the filter has a "return-icmp" rule it may call icmp6_error with an src field that was not entirely validated.
Pull up following revision(s) (requested by ozaki-r in ticket #588): sys/netinet6/in6.c: revision 1.260 sys/netinet/in.c: revision 1.219 sys/netinet/wqinput.c: revision 1.4 sys/rump/net/lib/libnetinet/netinet_component.c: revision 1.11 sys/netinet/ip_input.c: revision 1.376 sys/netinet6/ip6_input.c: revision 1.193 Avoid a deadlock between softnet_lock and IFNET_LOCK. A deadlock occurs because the lock ordering rule is violated: softnet_lock is held while holding IFNET_LOCK, which violates the rule. To avoid the deadlock, replace softnet_lock in in_control and in6_control with KERNEL_LOCK. We also need to add some KERNEL_LOCKs to reliably protect the network stack. This is required, for example, for PR kern/51356. Fix PR kern/53043
Pull up following revision(s) (requested by maxv in ticket #568): sys/netinet6/ip6_input.c: 1.188 Kick nested fragments.
Pull up following revision(s) (requested by maxv in ticket #1572): sys/netinet6/ip6_input.c: 1.188 via patch Kick nested fragments.
Avoid a deadlock between softnet_lock and IFNET_LOCK. A deadlock occurs because the lock ordering rule is violated: softnet_lock is held while holding IFNET_LOCK, which violates the rule. To avoid the deadlock, replace softnet_lock in in_control and in6_control with KERNEL_LOCK. We also need to add some KERNEL_LOCKs to reliably protect the network stack. This is required, for example, for PR kern/51356. Fix PR kern/53043
Re-make ip6_nexthdr global, it will be used in soon-to-be-added code...
Replace bcopy -> memcpy when it is obvious that the areas don't overlap. Rearrange ip6_splithdr() for clarity.
Remove dead code.
Pull up following revision(s) (requested by maxv in ticket #1523): sys/netinet6/frag6.c: revision 1.65 sys/netinet6/ip6_input.c: revision 1.187 sys/netinet6/ip6_var.h: revision 1.78 sys/netinet6/raw_ip6.c: revision 1.160 (patch) sys/netinet6/ah_input.c: adjust other callers (patch) sys/netinet6/esp_input.c: adjust other callers (patch) sys/netinet6/ipcomp_input.c: adjust other callers (patch) Fix a buffer overflow in ip6_get_prevhdr. Doing mtod(m, char *) + len is wrong: an option is allowed to be located in another mbuf of the chain. If the offset of an option within the chain is bigger than the length of the first mbuf in that chain, we are reading/writing one byte of packet-controlled data beyond the end of the first mbuf. The length of this first mbuf depends on the layout the network driver chose. In the most difficult case, it will allocate a 2KB cluster, which is bigger than the Ethernet MTU. But there is at least one way of exploiting this case: by sending a special combination of nested IPv6 fragments, the packet can control a good bunch of 'len'. By luck, the memory pool containing clusters does not embed the pool header in front of the items, so it is not straightforward to predict what is located at 'mtod(m, char *) + len'. However, by sending offending fragments in a loop, it is possible to crash the kernel; at some point we will hit important data structures. As far as I can tell, PF protects against this difficult case, because it kicks nested fragments. NPF does not protect against this. IPF I don't know. Then there are the easier cases: if the MTU is bigger than a cluster, if the network driver did not allocate a cluster, or perhaps if the fragments are received via a tunnel; I haven't investigated these cases. Change ip6_get_prevhdr so that it returns an offset in the chain, and always use IP6_EXTHDR_GET to get a writable pointer. IP6_EXTHDR_GET leaves M_PKTHDR untouched. This place is still fragile.
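Why mtod(m, char *) + len is unsafe can be shown with a hypothetical, minimal mbuf-chain stand-in: the byte at a given offset must be located by walking the chain. The commit itself takes a different route (returning an offset and using IP6_EXTHDR_GET); this sketch only illustrates the lookup that a bare pointer-plus-offset skips.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, minimal stand-in for an mbuf in a chain. */
struct sketch_mbuf {
	unsigned char *data;       /* what mtod() would return */
	size_t len;                /* bytes valid at data */
	struct sketch_mbuf *next;
};

/*
 * Return a pointer to the byte at offset off within the whole chain, or
 * NULL if the chain is shorter than that. The buggy pattern is equivalent
 * to returning m->data + off unconditionally, which overruns the first
 * segment whenever off >= m->len.
 */
unsigned char *
sketch_chain_ptr(struct sketch_mbuf *m, size_t off)
{
	for (; m != NULL; m = m->next) {
		if (off < m->len)
			return m->data + off;
		off -= m->len;
	}
	return NULL;
}
```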
Pull up following revision(s) (requested by maxv in ticket #1560): sys/netinet6/frag6.c: revision 1.65 sys/netinet6/ip6_input.c: revision 1.187 sys/netinet6/ip6_var.h: revision 1.78 sys/netinet6/raw_ip6.c: revision 1.160 (patch) Fix a buffer overflow in ip6_get_prevhdr. Doing mtod(m, char *) + len is wrong: an option is allowed to be located in another mbuf of the chain. If the offset of an option within the chain is bigger than the length of the first mbuf in that chain, we are reading/writing one byte of packet-controlled data beyond the end of the first mbuf. The length of this first mbuf depends on the layout the network driver chose. In the most difficult case, it will allocate a 2KB cluster, which is bigger than the Ethernet MTU. But there is at least one way of exploiting this case: by sending a special combination of nested IPv6 fragments, the packet can control a good bunch of 'len'. By luck, the memory pool containing clusters does not embed the pool header in front of the items, so it is not straightforward to predict what is located at 'mtod(m, char *) + len'. However, by sending offending fragments in a loop, it is possible to crash the kernel; at some point we will hit important data structures. As far as I can tell, PF protects against this difficult case, because it kicks nested fragments. NPF does not protect against this. IPF I don't know. Then there are the easier cases: if the MTU is bigger than a cluster, if the network driver did not allocate a cluster, or perhaps if the fragments are received via a tunnel; I haven't investigated these cases. Change ip6_get_prevhdr so that it returns an offset in the chain, and always use IP6_EXTHDR_GET to get a writable pointer. IP6_EXTHDR_GET leaves M_PKTHDR untouched. This place is still fragile.
Pull up following revision(s) (requested by maxv in ticket #527): sys/netinet6/frag6.c: revision 1.65 sys/netinet6/ip6_input.c: revision 1.187 sys/netinet6/ip6_var.h: revision 1.78 sys/netinet6/raw_ip6.c: revision 1.160 Fix a buffer overflow in ip6_get_prevhdr. Doing mtod(m, char *) + len is wrong: an option is allowed to be located in another mbuf of the chain. If the offset of an option within the chain is bigger than the length of the first mbuf in that chain, we are reading/writing one byte of packet-controlled data beyond the end of the first mbuf. The length of this first mbuf depends on the layout the network driver chose. In the most difficult case, it will allocate a 2KB cluster, which is bigger than the Ethernet MTU. But there is at least one way of exploiting this case: by sending a special combination of nested IPv6 fragments, the packet can control a good part of 'len'. By luck, the memory pool containing clusters does not embed the pool header in front of the items, so it is not straightforward to predict what is located at 'mtod(m, char *) + len'. However, by sending offending fragments in a loop, it is possible to crash the kernel: at some point we will hit important data structures. As far as I can tell, PF protects against this difficult case, because it kicks nested fragments. NPF does not protect against this. IPF I don't know. Then there are the easier cases: if the MTU is bigger than a cluster, if the network driver did not allocate a cluster, or perhaps if the fragments are received via a tunnel; I haven't investigated these cases. Change ip6_get_prevhdr so that it returns an offset in the chain, and always use IP6_EXTHDR_GET to get a writable pointer. IP6_EXTHDR_GET leaves M_PKTHDR untouched. This place is still fragile.
Style, localify, remove dead code, and fix typos. No functional change.
Kick nested fragments.
Fix a buffer overflow in ip6_get_prevhdr. Doing mtod(m, char *) + len is wrong: an option is allowed to be located in another mbuf of the chain. If the offset of an option within the chain is bigger than the length of the first mbuf in that chain, we are reading/writing one byte of packet-controlled data beyond the end of the first mbuf. The length of this first mbuf depends on the layout the network driver chose. In the most difficult case, it will allocate a 2KB cluster, which is bigger than the Ethernet MTU. But there is at least one way of exploiting this case: by sending a special combination of nested IPv6 fragments, the packet can control a good part of 'len'. By luck, the memory pool containing clusters does not embed the pool header in front of the items, so it is not straightforward to predict what is located at 'mtod(m, char *) + len'. However, by sending offending fragments in a loop, it is possible to crash the kernel: at some point we will hit important data structures. As far as I can tell, PF protects against this difficult case, because it kicks nested fragments. NPF does not protect against this. IPF I don't know. Then there are the easier cases: if the MTU is bigger than a cluster, if the network driver did not allocate a cluster, or perhaps if the fragments are received via a tunnel; I haven't investigated these cases. Change ip6_get_prevhdr so that it returns an offset in the chain, and always use IP6_EXTHDR_GET to get a writable pointer. IP6_EXTHDR_GET leaves M_PKTHDR untouched. This place is still fragile.
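A minimal user-space sketch of the bug class described above, assuming a simplified mbuf model: `struct mbuf_sketch` and `chain_byte_at()` are hypothetical names, not kernel APIs. Instead of computing `mtod(m, char *) + len` on the first mbuf, it walks the chain to find the buffer that actually holds the byte at a chain-wide offset, and returns NULL when the chain is too short. In the kernel the commit relies on `IP6_EXTHDR_GET` for this job; the sketch only illustrates why an offset must be resolved against the whole chain.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a kernel mbuf: one buffer plus a link to the next. */
struct mbuf_sketch {
	struct mbuf_sketch *next;
	char *data;	/* what mtod(m, char *) would return */
	size_t len;	/* what m->m_len would hold */
};

/*
 * Return a pointer to the byte at chain-wide offset 'off', or NULL if the
 * chain holds fewer than off + 1 bytes.  Unlike first->data + off, this
 * never points past the end of an individual buffer.
 */
static char *
chain_byte_at(struct mbuf_sketch *m, size_t off)
{
	for (; m != NULL; m = m->next) {
		assert(m->data != NULL || m->len == 0);
		if (off < m->len)
			return m->data + off;
		off -= m->len;	/* skip this buffer and keep walking */
	}
	return NULL;
}
```

For a chain whose first buffer holds 4 bytes and whose second holds 3, `chain_byte_at(&m1, 5)` points at the second byte of the second buffer, whereas `m1.data + 5` would point past the end of the first buffer; offset 7 yields NULL.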
Start cleaning up ip6_input.c. Several pieces of code have evolved but their neighboring comments were not updated. So update them, and remove code that has been disabled for years (it has no use anyway).
Pull up following revision(s) (requested by ozaki-r in ticket #456): sys/arch/arm/sunxi/sunxi_emac.c: 1.9 sys/dev/ic/dwc_gmac.c: 1.43-1.44 sys/dev/pci/if_iwm.c: 1.75 sys/dev/pci/if_wm.c: 1.543 sys/dev/pci/ixgbe/ixgbe.c: 1.112 sys/dev/pci/ixgbe/ixv.c: 1.74 sys/kern/sys_socket.c: 1.75 sys/net/agr/if_agr.c: 1.43 sys/net/bpf.c: 1.219 sys/net/if.c: 1.397, 1.399, 1.401-1.403, 1.406-1.410, 1.412-1.416 sys/net/if.h: 1.242-1.247, 1.250, 1.252-1.257 sys/net/if_bridge.c: 1.140 via patch, 1.142-1.146 sys/net/if_etherip.c: 1.40 sys/net/if_ethersubr.c: 1.243, 1.246 sys/net/if_faith.c: 1.57 sys/net/if_gif.c: 1.132 sys/net/if_l2tp.c: 1.15, 1.17 sys/net/if_loop.c: 1.98-1.101 sys/net/if_media.c: 1.35 sys/net/if_pppoe.c: 1.131-1.132 sys/net/if_spppsubr.c: 1.176-1.177 sys/net/if_tun.c: 1.142 sys/net/if_vlan.c: 1.107, 1.109, 1.114-1.121 sys/net/npf/npf_ifaddr.c: 1.3 sys/net/npf/npf_os.c: 1.8-1.9 sys/net/rtsock.c: 1.230 sys/netcan/if_canloop.c: 1.3-1.5 sys/netinet/if_arp.c: 1.255 sys/netinet/igmp.c: 1.65 sys/netinet/in.c: 1.210-1.211 sys/netinet/in_pcb.c: 1.180 sys/netinet/ip_carp.c: 1.92, 1.94 sys/netinet/ip_flow.c: 1.81 sys/netinet/ip_input.c: 1.362 sys/netinet/ip_mroute.c: 1.147 sys/netinet/ip_output.c: 1.283, 1.285, 1.287 sys/netinet6/frag6.c: 1.61 sys/netinet6/in6.c: 1.251, 1.255 sys/netinet6/in6_pcb.c: 1.162 sys/netinet6/ip6_flow.c: 1.35 sys/netinet6/ip6_input.c: 1.183 sys/netinet6/ip6_output.c: 1.196 sys/netinet6/mld6.c: 1.90 sys/netinet6/nd6.c: 1.239-1.240 sys/netinet6/nd6_nbr.c: 1.139 sys/netinet6/nd6_rtr.c: 1.136 sys/netipsec/ipsec_output.c: 1.65 sys/rump/net/lib/libnetinet/netinet_component.c: 1.9-1.10 kmem_intr_free kmem_intr_[z]alloced memory. The underlying pools are the same, but API-wise those calls should match. Unify IFEF_*_MPSAFE into IFEF_MPSAFE. There are already two flags for if_output and if_start; however, it seems such MPSAFE flags are eventually needed for all if_XXX operations. Having discrete flags for each operation is wasteful of if_extflags bits.
So let's unify the flags into one: IFEF_MPSAFE. Fortunately, IFEF_*_MPSAFE flags have never been included in any release, so we can change them without breaking backward compatibility of releases (though the kernel version of -current should be bumped). Note that if an interface has both MP-safe and non-MP-safe operations at the same time, we have to set the IFEF_MPSAFE flag and let callees of non-MP-safe operations take the kernel lock. Proposed on tech-kern@ and tech-net@. Provide macros for softnet_lock and KERNEL_LOCK hiding the NET_MPSAFE switch. It reduces copy-and-pasted code such as "#ifndef NET_MPSAFE KERNEL_LOCK(1, NULL); ..." scattered all over the source code, and makes it easy to identify the remaining KERNEL_LOCK and/or softnet_lock uses that are held even with NET_MPSAFE. No functional change. Hold KERNEL_LOCK on if_ioctl selectively based on IFEF_MPSAFE. If IFEF_MPSAFE is set, hold the lock and otherwise don't hold it. This change requires adding KERNEL_LOCK to subsequent functions called from if_ioctl, such as ifmedia_ioctl and ifioctl_common, to protect non-MP-safe components. Proposed on tech-kern@ and tech-net@. Ensure if_ioctl_lock is held when calling if_flags_set. Fix locking against myself on ifpromisc: vlan_unconfig_locked could be called while holding if_ioctl_lock. Ensure IFF_RUNNING of an interface is not turned on until its initialization completes, and ensure it is turned off before destruction, as per IFF_RUNNING's description "resource allocated". (The description is a bit doubtful, though; I believe the change is still proper.) Ensure if_ioctl_lock is held on if_up and if_down. One exception for if_down is if_detach; in that case the lock isn't needed because it's guaranteed that no one else can access ifp at that point. Make if_link_queue MP-safe if IFEF_MPSAFE. if_link_queue is a queue storing link state change events, used to pass events from (typically) an interrupt handler to the if_link_state_change softint.
The queue has been protected by KERNEL_LOCK so far, but if IFEF_MPSAFE is enabled, that becomes unsafe because (perhaps) an interrupt handler of an interface with IFEF_MPSAFE doesn't take KERNEL_LOCK. Protect it with a spin mutex. Additionally, with this change KERNEL_LOCK of the if_link_state_change softint is omitted if NET_MPSAFE is enabled. Note that the spin mutex is now ifp->if_snd.ifq_lock, as is the case for if_timer (see the comment). Use IFADDR_WRITER_FOREACH instead of IFADDR_READER_FOREACH. At that point no one else modifies the list, so IFADDR_READER_FOREACH is unnecessary. Using IFADDR_READER_FOREACH is harmless in general; however, if we try to detect contract violations of pserialize, using it here violates the contract, so avoiding it makes life easier. Ensure if_addr_init is called while holding if_ioctl_lock. Get rid of outdated comments. Fix build of kernels without ether by throwing out if_enable_vlan_mtu and if_disable_vlan_mtu, which created an unnecessary dependency from if.c to if_ethersubr.c. PR kern/52790. Rename IFNET_LOCK to IFNET_GLOBAL_LOCK. IFNET_LOCK will be used for another lock, if_ioctl_lock (which might be renamed then). Wrap if_ioctl_lock with IFNET_* macros (NFC). Also, if_ioctl_lock perhaps needs to be renamed because it's now not just for ioctl... Reorder some destruction routines in if_detach: - Destroy if_ioctl_lock at the end of if_detach because it's used in various destruction routines - Move psref_target_destroy after pr_purgeif because we want to use psref in pr_purgeif (otherwise destruction procedures can be tricky) Ensure if_mcast_op is called while holding IFNET_LOCK. Note that CARP doesn't deal with IFNET_LOCK yet.
Remove IFNET_GLOBAL_LOCK where it's unnecessary because IFNET_LOCK is held. Describe which lock is used to protect each member variable of struct ifnet. Requested by skrll@. Write a guideline for converting an interface to IFEF_MPSAFE. Requested by skrll@. Note that IFNET_LOCK must not be held in softint. Don't set IFEF_MPSAFE unless NET_MPSAFE at this point, because recent investigations show that interfaces with IFEF_MPSAFE need to follow additional restrictions to work with the flag safely. We should enable it on an interface by default only if the interface surely satisfies the restrictions, which are described in if.h. Note that enabling IFEF_MPSAFE alone gains little performance benefit, because the network stack is still serialized by the big kernel locks by default.
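The if_link_queue change above (a producer in interrupt context, a consumer in softint, and a spin mutex in between) can be sketched in user space as follows. This is an illustration under stated assumptions, not the kernel's implementation: the queue layout, the capacity, the drop-when-full policy, and all names (`link_queue_sketch`, `linkq_push`, `linkq_pop`) are hypothetical, and a C11 `atomic_flag` spin lock stands in for the kernel spin mutex (ifp->if_snd.ifq_lock).

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

#define LINKQ_SIZE 8	/* hypothetical capacity */

/* Hypothetical link states, standing in for the kernel's LINK_STATE_* values. */
enum link_state_sketch { LS_UNKNOWN, LS_DOWN, LS_UP };

struct link_queue_sketch {
	atomic_flag lock;	/* user-space stand-in for a spin mutex */
	enum link_state_sketch ev[LINKQ_SIZE];
	unsigned head;		/* index of the oldest event */
	unsigned count;		/* number of queued events */
};

static void
spin_enter(atomic_flag *f)
{
	while (atomic_flag_test_and_set_explicit(f, memory_order_acquire))
		continue;	/* busy-wait, as a spin mutex would */
}

static void
spin_exit(atomic_flag *f)
{
	atomic_flag_clear_explicit(f, memory_order_release);
}

static void
linkq_init(struct link_queue_sketch *q)
{
	atomic_flag_clear(&q->lock);
	q->head = q->count = 0;
}

/* Producer side: what an interrupt handler would call on a link change. */
static bool
linkq_push(struct link_queue_sketch *q, enum link_state_sketch s)
{
	bool ok = false;

	spin_enter(&q->lock);
	assert(q->count <= LINKQ_SIZE);
	if (q->count < LINKQ_SIZE) {
		q->ev[(q->head + q->count) % LINKQ_SIZE] = s;
		q->count++;
		ok = true;
	}
	spin_exit(&q->lock);
	return ok;	/* false: queue full, event not recorded */
}

/* Consumer side: what the softint would call to drain events in order. */
static bool
linkq_pop(struct link_queue_sketch *q, enum link_state_sketch *out)
{
	bool ok = false;

	spin_enter(&q->lock);
	if (q->count > 0) {
		*out = q->ev[q->head];
		q->head = (q->head + 1) % LINKQ_SIZE;
		q->count--;
		ok = true;
	}
	spin_exit(&q->lock);
	return ok;
}
```

The kernel may treat a full queue differently; rejecting the newest event here just keeps the sketch short. The point is that both sides touch the queue only inside the same spin lock, so neither side depends on KERNEL_LOCK.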
Pull up following revision(s) (requested by roy in ticket #390): sys/netinet/ip_input.c: 1.363 sys/netinet6/ip6_input.c: 1.184-1.185 sys/netinet6/ip6_output.c: 1.194-1.195 sys/netinet6/in6_src.c: 1.83-1.84 Allow local communication over DETACHED addresses. Allow binding to DETACHED or TENTATIVE addresses as we deny sending upstream from them anyway. Prefer non DETACHED or TENTATIVE addresses. -- Attempt to restore v6 networking. Not 100% certain that these changes are all that is needed, but they're certainly a big part of it (especially the ip6_input.c change.) -- Treat unvalidated addresses as deprecated in rule 3.
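One way to read "prefer non DETACHED or TENTATIVE addresses" together with "treat unvalidated addresses as deprecated in rule 3" is as a tie-breaker in source address selection, where RFC 6724's Rule 3 is "Avoid deprecated addresses". The sketch below assumes that "unvalidated" means tentative or detached; the flag values and function names are illustrative (the kernel's real flags are the IN6_IFF_* bits), and this is not the code in in6_src.c.

```c
#include <assert.h>

/* Illustrative flag bits; the kernel uses IN6_IFF_* from netinet6/in6_var.h. */
#define ADDRF_TENTATIVE  0x02
#define ADDRF_DETACHED   0x08
#define ADDRF_DEPRECATED 0x10

/* Assumption: an address not yet validated (tentative or detached) counts as deprecated. */
static int
addr_is_effectively_deprecated(unsigned flags)
{
	return (flags & (ADDRF_DEPRECATED | ADDRF_TENTATIVE | ADDRF_DETACHED)) != 0;
}

/*
 * Rule 3 sketch: return 1 if candidate a is preferred, -1 if b is preferred,
 * and 0 if rule 3 does not decide, so later rules must break the tie.
 */
static int
rule3_compare(unsigned a_flags, unsigned b_flags)
{
	int a_dep = addr_is_effectively_deprecated(a_flags);
	int b_dep = addr_is_effectively_deprecated(b_flags);

	if (a_dep != b_dep)
		return a_dep ? -1 : 1;
	return 0;
}
```

Under this reading, a tentative or detached address loses to a validated one at rule 3, while two equally "unvalidated" candidates fall through to the later rules.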
update from HEAD
Attempt to restore v6 networking. Not 100% certain that these changes are all that is needed, but they're certainly a big part of it (especially the ip6_input.c change.)
Allow local communication over DETACHED addresses. Allow binding to DETACHED or TENTATIVE addresses as we deny sending upstream from them anyway. Prefer non DETACHED or TENTATIVE addresses.
Provide macros for softnet_lock and KERNEL_LOCK hiding the NET_MPSAFE switch. It reduces copy-and-pasted code such as "#ifndef NET_MPSAFE KERNEL_LOCK(1, NULL); ..." scattered all over the source code, and makes it easy to identify the remaining KERNEL_LOCK and/or softnet_lock uses that are held even with NET_MPSAFE. No functional change.
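A sketch of the macro pattern described above: the NET_MPSAFE decision is made once, where the macros are defined, so callers no longer carry their own `#ifndef NET_MPSAFE` blocks. The lock is modeled in user space by a hypothetical depth counter, and the macro and function names below are illustrative rather than the actual NetBSD definitions.

```c
#include <assert.h>

/* Hypothetical stand-in for the big kernel lock: a depth counter we can observe. */
static int big_lock_depth;

static void big_lock(void)   { big_lock_depth++; }
static void big_unlock(void) { assert(big_lock_depth > 0); big_lock_depth--; }

/*
 * One definition site for the NET_MPSAFE switch, instead of repeating
 * "#ifndef NET_MPSAFE ... #endif" around every lock and unlock call.
 */
#ifdef NET_MPSAFE
#define BIG_LOCK_UNLESS_NET_MPSAFE()	do { } while (0)
#define BIG_UNLOCK_UNLESS_NET_MPSAFE()	do { } while (0)
#else
#define BIG_LOCK_UNLESS_NET_MPSAFE()	big_lock()
#define BIG_UNLOCK_UNLESS_NET_MPSAFE()	big_unlock()
#endif

/* A caller no longer needs its own #ifndef block. */
static int
process_packet_sketch(void)
{
	int held;

	BIG_LOCK_UNLESS_NET_MPSAFE();
	held = big_lock_depth;	/* 1 without NET_MPSAFE, 0 with it */
	BIG_UNLOCK_UNLESS_NET_MPSAFE();
	return held;
}
```

Compiled without `-DNET_MPSAFE`, `process_packet_sketch()` observes the lock held; with it, both macros expand to empty statements and the call sites stay unchanged.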
Pull up following revision(s) (requested by ozaki-r in ticket #300): crypto/dist/ipsec-tools/src/setkey/parse.y: 1.19 crypto/dist/ipsec-tools/src/setkey/token.l: 1.20 distrib/sets/lists/tests/mi: 1.754, 1.757, 1.759 doc/TODO.smpnet: 1.12-1.13 sys/net/pfkeyv2.h: 1.32 sys/net/raw_cb.c: 1.23-1.24, 1.28 sys/net/raw_cb.h: 1.28 sys/net/raw_usrreq.c: 1.57-1.58 sys/net/rtsock.c: 1.228-1.229 sys/netinet/in_proto.c: 1.125 sys/netinet/ip_input.c: 1.359-1.361 sys/netinet/tcp_input.c: 1.359-1.360 sys/netinet/tcp_output.c: 1.197 sys/netinet/tcp_var.h: 1.178 sys/netinet6/icmp6.c: 1.213 sys/netinet6/in6_proto.c: 1.119 sys/netinet6/ip6_forward.c: 1.88 sys/netinet6/ip6_input.c: 1.181-1.182 sys/netinet6/ip6_output.c: 1.193 sys/netinet6/ip6protosw.h: 1.26 sys/netipsec/ipsec.c: 1.100-1.122 sys/netipsec/ipsec.h: 1.51-1.61 sys/netipsec/ipsec6.h: 1.18-1.20 sys/netipsec/ipsec_input.c: 1.44-1.51 sys/netipsec/ipsec_netbsd.c: 1.41-1.45 sys/netipsec/ipsec_output.c: 1.49-1.64 sys/netipsec/ipsec_private.h: 1.5 sys/netipsec/key.c: 1.164-1.234 sys/netipsec/key.h: 1.20-1.32 sys/netipsec/key_debug.c: 1.18-1.21 sys/netipsec/key_debug.h: 1.9 sys/netipsec/keydb.h: 1.16-1.20 sys/netipsec/keysock.c: 1.59-1.62 sys/netipsec/keysock.h: 1.10 sys/netipsec/xform.h: 1.9-1.12 sys/netipsec/xform_ah.c: 1.55-1.74 sys/netipsec/xform_esp.c: 1.56-1.72 sys/netipsec/xform_ipcomp.c: 1.39-1.53 sys/netipsec/xform_ipip.c: 1.50-1.54 sys/netipsec/xform_tcp.c: 1.12-1.16 sys/rump/librump/rumpkern/Makefile.rumpkern: 1.170 sys/rump/librump/rumpnet/net_stub.c: 1.27 sys/sys/protosw.h: 1.67-1.68 tests/net/carp/t_basic.sh: 1.7 tests/net/if_gif/t_gif.sh: 1.11 tests/net/if_l2tp/t_l2tp.sh: 1.3 tests/net/ipsec/Makefile: 1.7-1.9 tests/net/ipsec/algorithms.sh: 1.5 tests/net/ipsec/common.sh: 1.4-1.6 tests/net/ipsec/t_ipsec_ah_keys.sh: 1.2 tests/net/ipsec/t_ipsec_esp_keys.sh: 1.2 tests/net/ipsec/t_ipsec_gif.sh: 1.6-1.7 tests/net/ipsec/t_ipsec_l2tp.sh: 1.6-1.7 tests/net/ipsec/t_ipsec_misc.sh: 1.8-1.18 tests/net/ipsec/t_ipsec_sockopt.sh: 1.1-1.2 tests/net/ipsec/t_ipsec_tcp.sh: 1.1-1.2 tests/net/ipsec/t_ipsec_transport.sh: 1.5-1.6 tests/net/ipsec/t_ipsec_tunnel.sh: 1.9 tests/net/ipsec/t_ipsec_tunnel_ipcomp.sh: 1.1-1.2 tests/net/ipsec/t_ipsec_tunnel_odd.sh: 1.3 tests/net/mcast/t_mcast.sh: 1.6 tests/net/net/t_ipaddress.sh: 1.11 tests/net/net_common.sh: 1.20 tests/net/npf/t_npf.sh: 1.3 tests/net/route/t_flags.sh: 1.20 tests/net/route/t_flags6.sh: 1.16 usr.bin/netstat/fast_ipsec.c: 1.22 Do m_pullup before mtod. It may fix panics of some tests on anita/sparc and anita/GuruPlug. --- KNF --- Enable DEBUG for babylon5 --- Apply C99-style struct initialization to xformsw --- Tweak outputs of netstat -s for IPsec - Get rid of "Fast" - Use ipsec and ipsec6 for titles to clarify the protocol - Indent outputs of sub-protocols. Original outputs were organized like this:

(Fast) IPsec:
IPsec ah:
IPsec esp:
IPsec ipip:
IPsec ipcomp:
(Fast) IPsec:
IPsec ah:
IPsec esp:
IPsec ipip:
IPsec ipcomp:

New outputs are organized like this:

ipsec:
    ah:
    esp:
    ipip:
    ipcomp:
ipsec6:
    ah:
    esp:
    ipip:
    ipcomp:

--- Add test cases for IPComp --- Simplify IPSEC_OSTAT macro (NFC) --- KNF; replace leading whitespaces with hard tabs --- Introduce and use SADB_SASTATE_USABLE_P --- KNF --- Add update command for testing. Updating an SA (SADB_UPDATE) requires that the process issuing SADB_UPDATE is the same as the process that issued SADB_ADD (or SADB_GETSPI). This means that the update command must be used with the add command in a setkey configuration. This usage is normally meaningless but useful for testing (and debugging) purposes. --- Add test cases for updating SA/SP. The tests require the newly added update command of setkey. --- PR/52346: Frank Kardel: Fix checksumming for NAT-T. See XXX for improvements. --- Remove code for PACKET_TAG_IPSEC_IN_CRYPTO_DONE. It seems that PACKET_TAG_IPSEC_IN_CRYPTO_DONE is for network adapters that have IPsec accelerators; a driver sets the mtag on a packet when its device has already encrypted the packet.
Unfortunately, no driver has implemented such offload features for many years, and none seems likely to implement them soon. (Note that neither FreeBSD nor Linux has such drivers.) Let's remove the related (unused) code and simplify the IPsec code. --- Fix usages of sadb_msg_errno --- Avoid updating sav directly. On SADB_UPDATE a target sav was updated directly, which was unsafe. Instead, allocate another sav, copy the variables of the old sav to the new one, and replace the old one with the new one. --- Simplify; we can assume sav->tdb_xform cannot be NULL while it's valid --- Rename key_alloc* functions (NFC). We shouldn't use the term "alloc" for functions that just look up data and don't actually allocate memory. --- Use explicit_memset to reliably zero-clear key_auth and key_enc --- Make sure to clear keys on error paths of key_setsaval --- Add missing KEY_FREESAV --- Make sure a sav is inserted into a sah list after its initialization completes --- Remove unnecessary zero-clearing code from key_setsaval. key_setsaval is now used only for a newly-allocated sav. (It was used to reset variables of an existing sav.) --- Correct wrong assumption of sav->refcnt in key_delsah. A sav in a list basically never has sav->refcnt == 0, and KEY_FREESAV also assumes sav->refcnt > 0. --- Let key_getsavbyspi take a reference to the returned sav --- Use time_mono_to_wall (NFC) --- Separate sending message routine (NFC) --- Simplify; remove unnecessary zero-clears. key_freesaval is used only when a target sav is being destroyed. --- Omit NULL checks for sav->lft_c. sav->lft_c can be NULL only when initializing or destroying sav. --- Omit unnecessary NULL checks for sav->sah --- Omit unnecessary check of sav->state. key_allocsa_policy picks a sav that is either MATURE or DYING, so we don't need to check its state again.
--- Simplify; omit unnecessary saidx passing - ipsec_nextisr returns a saidx but no caller uses it - key_checkrequest is passed a saidx but it can be obtained from another argument (isr) --- Fix splx not being called on some error paths --- Fix header size calculation of esp where sav is NULL --- Fix header size calculation of ah in the case sav is NULL. This fix was also needed for esp. --- Pass sav directly to opencrypto callback. In a callback, use the passed sav as-is by default and look up a sav only if the passed sav is dead. --- Avoid examining freshness of sav on packet processing. If a sav list is sorted (by lft_c->sadb_lifetime_addtime) in advance, we don't need to examine each sav, nor delete one on the fly and send up a message. Fortunately every sav list is already sorted as we need. The added key_validate_savlist validates that each sav list is indeed sorted (it runs only if DEBUG because it's not cheap). --- Add test cases for SAs with different SPIs --- Prepare to stop using isr->sav. isr is a shared resource, and using isr->sav as temporary storage during packet processing is racy. Also, having a reference from isr to sav makes the lifetime of sav non-deterministic; such a reference is removed when a packet is processed and isr->sav is overwritten by a new one. Let's keep a sav locally for each packet processing instead of using the shared isr->sav. However, this change doesn't stop using isr->sav yet because there are still some users of isr->sav. isr->sav will be removed after those users find a way to not use it. --- Fix wrong argument handling --- fix printf format. --- Don't validate sav lists of LARVAL or DEAD states. We don't sort those lists, so the validation would always fail.
Fix PR kern/52405 --- Make sure to sort the list when changing the state by key_sa_chgstate --- Rename key_allocsa_policy to key_lookup_sa_bysaidx --- Separate test files --- Calculate ah_max_authsize on initialization as well as esp_max_ivlen --- Remove m_tag_find(PACKET_TAG_IPSEC_PENDING_TDB) because nobody sets the tag --- Restore a comment removed in the previous commit. The comment is valid for the code below. --- Make tests more stable. The sleep command seems to wait longer than expected on anita, so use polling to wait for a state change. --- Add tests that explicitly delete SAs instead of waiting for expirations --- Remove invalid M_AUTHIPDGM check on ESP isr->sav. The M_AUTHIPDGM flag is set on an mbuf in ah_input_cb. An ESP sav can have AH authentication as sav->tdb_authalgxform. However, in that case esp_input and esp_input_cb are used to do ESP decryption and AH authentication, and M_AUTHIPDGM is never set on the mbuf. So checking M_AUTHIPDGM of an mbuf on an ESP isr->sav is meaningless. --- Look up sav instead of relying on unstable sp->req->sav. This code is executed only in an error path, so an additional lookup doesn't matter. --- Correct a comment --- Don't release sav if calling crypto_dispatch again --- Remove extra KEY_FREESAV from ipsec_process_done. It should be done by the caller. --- Don't bother handling the case of crp->crp_buf == NULL in callbacks --- Hold a reference to an SP during opencrypto processing. An SP has a list of isr (ipsecrequest) that represents a sequence of IPsec encryption/authentication processing. One isr corresponds to one opencrypto processing. The lifetime of an isr follows its SP. We pass an isr to an opencrypto callback function to continue to the next encryption/authentication processing. However, nobody guaranteed that the isr wasn't freed, i.e., that its SP wasn't destroyed. To avoid such unexpected destruction of an isr, hold a reference to its SP during opencrypto processing.
--- Don't make SAs expire on tests that delete SAs explicitly --- Fix a debug message --- Dedup error paths (NFC) --- Use pool to allocate tdb_crypto. For ESP and AH, we need to allocate extra variable-size space in addition to struct tdb_crypto. The fixed size of pool items may be larger than the size a buffer actually requires, but the performance improvement from replacing malloc with pool still wins. --- Don't use unstable isr->sav for header size calculations. We may need to optimize this to not look up sav for users that don't need the exact size of headers (e.g., TCP segment size calculation). --- Don't use sp->req->sav when handling NAT-T ESP fragmentation. To do this we need to look up a sav, but an additional look-up degrades performance. A sav is looked up later in ipsec4_process_packet, so delay the fragmentation check until then to avoid an extra look-up. --- Don't use key_lookup_sp, which depends on unstable sp->req->sav. It provided a fast look-up of SP. We will provide an alternative method in the future (after basic MP-ification finishes). --- Stop setting isr->sav on looking up sav in key_checkrequest --- Remove ipsecrequest#sav --- Stop setting mtag of PACKET_TAG_IPSEC_IN_DONE because there are no users anymore --- Skip ipsec_spi_*_*_preferred_new_timeout when running on qemu. Probably due to PR 43997 --- Add localcount to rump kernels --- Remove unused macro --- Fix key_getcomb_setlifetime. The fix adjusts a soft limit to be 80% of the corresponding hard limit. I'm not sure the fix is really correct, but at least the original code is wrong: a passed comb is zero-cleared before calling key_getcomb_setlifetime, so comb->sadb_comb_soft_addtime = comb->sadb_comb_soft_addtime * 80 / 100; is meaningless. --- Provide and apply key_sp_refcnt (NFC). It simplifies further changes.
--- Fix indentation. Pointed out by knakahara@ --- Use pslist(9) for sptree --- Don't acquire global locks for IPsec if NET_MPSAFE. Note that the change is just to make testing easy; IPsec isn't MP-safe yet. --- Let PF_KEY socks hold their own lock instead of softnet_lock. Operations on SAD and SPD are executed via PF_KEY socks. The operations include deletions of SAs and SPs that will use synchronization mechanisms such as pserialize_perform to wait for references to SAs and SPs to be released. It is known that using such mechanisms while holding softnet_lock causes a deadlock, and we should avoid that situation. --- Make IPsec SPD MP-safe. We use localcount(9), not psref(9), to make the sptree and secpolicy (SP) entries MP-safe, because SPs need to be referenced across opencrypto processing that executes a callback in a different context. SPs on sockets aren't managed by the sptree and can be destroyed in softint. localcount_drain cannot be used in softint, so we delay the destruction of such SPs to a thread context. To do so, a list to manage such SPs is added (key_socksplist) and key_timehandler_spd deletes dead SPs in the list. For more details please read the locking notes in key.c. Proposed on tech-kern@ and tech-net@ --- Fix updating ipsec_used - key_update_used wasn't called in key_api_spddelete2 and key_api_spdflush - key_update_used wasn't called if an SP had been added/deleted but a reply to userland failed --- Fix updating ipsec_used; turn it on when SPs on sockets are added --- Add missing IPsec policy checks to icmp6_rip6_input. icmp6_rip6_input is quite similar to rip6_input, and the same checks exist in rip6_input. --- Add test cases for setsockopt(IP_IPSEC_POLICY) --- Don't use KEY_NEWSP for dummy SP entries. With this change KEY_NEWSP is no longer called from softint, so we can use kmem_zalloc with KM_SLEEP for KEY_NEWSP.
--- Comment out unused functions --- Add test cases where there are SPs but no relevant SAs --- Don't allow sav->lft_c to be NULL. lft_c of an sav that was created by SADB_GETSPI could be NULL. --- Clean up clunky eval strings - Remove unnecessary \ at EOL - This allows omitting ; too - Remove unnecessary quotes for arguments of atf_set - Don't expand $DEBUG in eval - We expect it to be expanded on execution Suggested by kre@ --- Remove unnecessary KEY_FREESAV in an error path. sav should be freed (unreferenced) by the caller. --- Use pslist(9) for sahtree --- Use pslist(9) for sah->savtree --- Rename local variable newsah to sah. It may not be new. --- MP-ify SAD slightly - Introduce key_sa_mtx and use it for some list operations - Use pserialize for some list iterations --- Introduce KEY_SA_UNREF and replace KEY_FREESAV with it where sav will never actually be freed in the future. KEY_SA_UNREF is still key_freesav, so no functional change for now. This change reduces the diff of further changes. --- Remove out-of-date log output. Pointed out by riastradh@ --- Use KDASSERT instead of KASSERT for mutex_ownable, because mutex_ownable is too heavy to run in a fast path even for DIAGNOSTIC + LOCKDEBUG. Suggested by riastradh@ --- Assemble global lists and related locks into cache lines (NFCI). Also rename variables from *tree to *list because they are just lists, not trees.
Suggested by riastradh@ --- Move locking notes --- Update the locking notes - Add locking order - Add locking notes for misc lists such as reglist - Mention pserialize, key_sp_ref and key_sp_unref on SP operations Requested by riastradh@ --- Describe constraints of key_sp_ref and key_sp_unref. Requested by riastradh@ --- Hold key_sad.lock on SAVLIST_WRITER_INSERT_TAIL --- Add __read_mostly to key_psz. Suggested by riastradh@ --- Tweak wording (pserialize critical section => pserialize read section). Suggested by riastradh@ --- Add missing mutex_exit --- Fix setkey -D -P outputs. The outputs were tweaked (by me), but I forgot to update libipsec in my local ATF environment... --- MP-ify SAD (key_sad.sahlist and sah entries). localcount(9) is used to protect key_sad.sahlist and sah entries, as well as SPD (and will be used for SAD sav). Please read the locking notes of SAD for more details. --- Introduce key_sa_refcnt and replace sav->refcnt with it (NFC) --- Destroy sav only in the loop for DEAD sav --- Fix KASSERT(solocked(sb->sb_so)) failure in sbappendaddr, which is eventually called from key_sendup_mbuf. If key_sendup_mbuf isn't passed a socket, the assertion fails. Originally, in this case sb->sb_so was softnet_lock and callers held softnet_lock, so the assertion was magically satisfied. Now sb->sb_so is key_so_mtx, and softnet_lock isn't always held by callers, so the assertion can fail. Fix it by holding key_so_mtx if key_sendup_mbuf isn't passed a socket.
Reported by knakahara@ Tested by knakahara@ and ozaki-r@ --- Fix locking notes of SAD --- Fix deadlock between key_sendup_mbuf called from key_acquire and localcount_drain. If we call key_sendup_mbuf from key_acquire, which is called during packet processing, a deadlock can happen like this: - At key_acquire, a reference to an SP (and an SA) is held - key_sendup_mbuf will try to take key_so_mtx - Some other thread may try to localcount_drain the SP while holding key_so_mtx, say in key_api_spdflush - In this case localcount_drain never returns, because key_sendup_mbuf, stuck on key_so_mtx, never releases its reference to the SP Fix the deadlock by deferring key_sendup_mbuf to the timer (key_timehandler). --- Fix prev not being cleared on retry --- Limit the number of mbufs queued for deferred key_sendup_mbuf. Hundreds of mbufs can easily be queued on the list under heavy network load. --- MP-ify SAD (savlist). localcount(9) is used to protect the savlist of sah. The basic design is similar to the MP-ification of SPD and SAD sahlist. Please read the locking notes of SAD for more details. --- Simplify ipsec_reinject_ipstack (NFC) --- Add per-CPU rtcache to ipsec_reinject_ipstack. It reduces route lookups and also reduces rtcache lock contention when NET_MPSAFE is enabled. --- Use pool_cache(9) instead of pool(9) for tdb_crypto objects. The change improves network throughput, especially on multi-core systems. --- Update: ipsec(4), opencrypto(9) and vlan(4) are now MP-safe. --- Write known issues on scalability --- Share a global dummy SP between PCBs. It never changes, so it can be pre-allocated and shared safely between PCBs. --- Fix race condition on the rawcb list shared by rtsock and keysock. keysock now protects itself with its own mutex, which means that the rawcb list is protected by two different mutexes (keysock's one and softnet_lock for rtsock), which is of course useless. Fix the situation by having a separate rawcb list for each.
--- Use a dedicated mutex for rt_rawcb instead of softnet_lock if NET_MPSAFE
--- Fix localcount leak in sav. Fixed by ozaki-r@n.o.; I commit on behalf of him.
--- Remove unnecessary comment.
--- Fix deadlock between pserialize_perform and localcount_drain. A typical usage of localcount_drain looks like this: mutex_enter(&mtx); item = remove_from_list(); pserialize_perform(psz); localcount_drain(&item->localcount, &cv, &mtx); mutex_exit(&mtx); This sequence can cause a deadlock, for example in the following situation: - Thread A calls localcount_drain, which calls xc_broadcast after releasing the specified mutex - Thread B enters the sequence and calls pserialize_perform while holding the mutex; pserialize_perform also calls xc_broadcast - Thread C (xc_thread), which runs an xcall callback of localcount_drain, tries to take the mutex. xc_broadcast of thread B doesn't start until xc_broadcast of thread A finishes, which is a feature of xcall(9). This means that pserialize_perform never completes until xc_broadcast of thread A finishes. On the other hand, thread C, which is a callee of xc_broadcast of thread A, is stuck on the mutex. Finally the threads block each other (A blocks B, B blocks C and C blocks A). A possible fix is to serialize executions of the above sequence with another mutex, but adding another mutex makes the code complex, so fix the deadlock another way: release the mutex before pserialize_perform and instead use a condvar to prevent pserialize_perform from being called simultaneously. Note that the deadlock happens only if NET_MPSAFE is enabled.
--- Add missing ifdef NET_MPSAFE
--- Take softnet_lock on pr_input properly if NET_MPSAFE. Currently softnet_lock is taken unnecessarily in some cases, e.g., icmp_input and encap4_input from ip_input, or not taken even if needed, e.g., udp_input and tcp_input from ipsec4_common_input_cb. Fix them. NFC if NET_MPSAFE is disabled (default).
--- - Sanitize key debugging so that we don't print extra newlines or unassociated debugging messages. - Remove unused functions and make internal ones static - Print information in one line per message
--- Humanize printing of IP addresses
--- Cast reduction, NFC.
--- Fix typo in comment
--- Pull out ipsec_fill_saidx_bymbuf (NFC)
--- Don't abuse key_checkrequest just for looking up sav. It does more than expected, for example key_acquire.
--- Fix SP being broken in transport mode. isr->saidx was modified accidentally in ipsec_nextisr. Reported by christos@. Helped investigations by christos@ and knakahara@
--- Constify isr at many places (NFC)
--- Include socketvar.h for softnet_lock
--- Fix buffer length for ipsec_logsastr
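The pserialize_perform/localcount_drain fix described in this entry releases the mutex before pserialize_perform and uses a condvar so that only one thread performs at a time. That pattern can be sketched as a userspace analogue with POSIX threads. This is an illustration under assumptions, not the kernel code. struct psz_gate, gate_init(), gate_perform() and wait_for_readers() are hypothetical names, and wait_for_readers() merely stands in for pserialize_perform(9).

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical analogue: 'lock' protects the list and 'busy',
 * 'busy' marks a perform in progress, 'cv' wakes waiters. */
struct psz_gate {
    pthread_mutex_t lock;
    pthread_cond_t cv;
    bool busy;
    unsigned performs;  /* number of completed performs */
};

static void
gate_init(struct psz_gate *g)
{
    pthread_mutex_init(&g->lock, NULL);
    pthread_cond_init(&g->cv, NULL);
    g->busy = false;
    g->performs = 0;
}

/* Stand-in for pserialize_perform(9); runs WITHOUT g->lock held.
 * Only one thread can be here at a time because of g->busy. */
static void
wait_for_readers(struct psz_gate *g)
{
    g->performs++;
}

/* Caller holds g->lock on entry; it is held again on return. */
static void
gate_perform(struct psz_gate *g)
{
    while (g->busy)                     /* another perform is running */
        pthread_cond_wait(&g->cv, &g->lock);
    g->busy = true;
    pthread_mutex_unlock(&g->lock);     /* don't hold the mutex across it */
    wait_for_readers(g);
    pthread_mutex_lock(&g->lock);
    g->busy = false;
    pthread_cond_broadcast(&g->cv);     /* let the next waiter in */
}
```

A caller would follow the sequence from the entry. It takes the lock, removes the item from the list, and calls gate_perform(). It then drains the item and unlocks. The mutex is no longer held while the stand-in for pserialize_perform runs, so a thread that needs the mutex (like thread C in the entry) is not blocked by it.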
Take softnet_lock on pr_input properly if NET_MPSAFE. Currently softnet_lock is taken unnecessarily in some cases, e.g., icmp_input and encap4_input from ip_input, or not taken even if needed, e.g., udp_input and tcp_input from ipsec4_common_input_cb. Fix them. NFC if NET_MPSAFE is disabled (default).
Sync with HEAD
Don't acquire global locks for IPsec if NET_MPSAFE. Note that the change is just to make testing easy and IPsec isn't MP-safe yet.
remove unnecessary casts; use sizeof(var) instead of sizeof(type).
Merge the two copies of SO_TIMESTAMP/SO_OTIMESTAMP processing into a single function, and add a SOOPT_TIMESTAMP define, reducing compat pollution from 5 places to 1.
remove checks for failure after memory allocation calls that cannot fail: kmem_alloc() with KM_SLEEP, kmem_zalloc() with KM_SLEEP, percpu_alloc(), pserialize_create(), psref_class_create(). All of these paths include an assertion that the allocation has not failed, so callers should not assert that again.
Sync with HEAD
Sync with HEAD
Replace DIAGNOSTIC + panic with KASSERT
Provide in6_multi_group. Use it when checking if we belong to the group, instead of in6_lookup_multi. No functional change.
Stop using useless IN6_*_MULTI macros
Sweep unnecessary malloc.h inclusions
Sync with HEAD
ip6_sprintf -> IN6_PRINT so that we pass the size.
Make ip6_sprintf(), in_fmtaddr(), lla_snprintf() and icmp6_redirect_diag() mpsafe. Reviewed by ozaki-r@
Sync with HEAD. (Note that most of these changes are simply $NetBSD$ tag issues.)
Add rtcache_unref to release points of rtentry stemming from rtcache. In the MP-safe world, an rtentry stemming from an rtcache can be freed at any point. So we need to protect rtentries somehow, say by reference counting or passive references. Regardless of the method, we need to call some release function on an rtentry after using it. The change adds a new function, rtcache_unref, to release an rtentry. At this point, this function does nothing because for now we don't add a reference to an rtentry when we get one from an rtcache. We will add something useful in a further commit. This change is part of the changes for an MP-safe routing table. It is separated out to avoid one big change that would be difficult to debug by bisecting.
Sync with HEAD
Sync with HEAD
Reduce the number of return points. No functional change.
Don't hold global locks if NET_MPSAFE is enabled. If NET_MPSAFE is enabled, don't hold KERNEL_LOCK and softnet_lock in part of the network stack such as IP forwarding paths. The aim of the change is to make it easy to test the network stack without the locks and reduce our local diffs. By default (i.e., if NET_MPSAFE isn't enabled), the locks are held as they used to be. Reviewed by knakahara@
Sync with HEAD
Disallow input to detached addresses because they are not yet valid.
Make ipforward_rt and ip6_forward_rt percpu. Sharing one rtcache between CPUs is just a bad idea. Reviewed by knakahara@
Sync with HEAD
ip6flow refactor like ipflow. - move ip6flow sysctls into ip6_flow.c like ip_flow.c:r1.64 - build ip6_flow.c only if GATEWAY kernel option is enabled
Apply pserialize and psref to struct ifaddr and its variants. This change makes struct ifaddr and its variants (in_ifaddr and in6_ifaddr) MP-safe by using pserialize and psref. At this moment, pserialize_perform and psref_target_destroy are disabled because (1) we don't need them because of softnet_lock, and (2) they cause a deadlock because of softnet_lock. So we'll enable them when we remove softnet_lock in the future.
Sync with HEAD
Switch the address list of interfaces to pslist(9). As usual, we leave the old list to avoid breaking kvm(3) users.
Move in6_ifaddr_list to a more proper place (from ip6_input.c to in6.c). It's a similar place as the IPv4 address list, i.e., in.c. More variables will join together.
Use pslist(9) for the global in6_ifaddr list. psz and psref will be applied in another commit. No functional change intended.
Remove unnecessary NULL checks of ifa->ifa_addr. If it's NULL, it should be a bug. There are many IFADDR_FOREACH loops that don't do a NULL check. If it could be NULL, they should have fired already.
Avoid storing a pointer to an interface in an mbuf. Having a pointer to an interface in an mbuf isn't safe if we remove the big kernel locks; an interface object (ifnet) can be destroyed at any time during packet processing, and accessing such an object via a pointer is racy. Instead we have to get the object from the interface collection (ifindex2ifnet) via an interface index (if_index) that is stored in the mbuf instead of a pointer. The change provides two APIs: m_{get,put}_rcvif_psref, which use psref(9) for sleepable critical sections, and m_{get,put}_rcvif, which use pserialize(9) for other critical sections. The change also adds another API called m_get_rcvif_NOMPSAFE, which is NOT MP-safe and is meant for the transition moratorium, i.e., it is intended to be used in places that are not planned to be MP-ified soon. The change adds some overhead due to psref to performance-sensitive paths; however, the overhead is not serious, 2% down at worst. Proposed on tech-kern and tech-net.
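Here is a minimal sketch of storing an interface index instead of an ifnet pointer in the packet. It assumes a small fixed-size table in place of ifindex2ifnet. All names (struct ifnet_stub, struct mbuf_stub, stub_set_rcvif(), stub_get_rcvif()) are hypothetical. The sketch also omits the psref(9)/pserialize(9) protection that the real m_get_rcvif_psref()/m_get_rcvif() provide around the lookup.

```c
#include <assert.h>
#include <stddef.h>

#define IFINDEX_MAX 8   /* hypothetical table size; index 0 means "none" */

struct ifnet_stub {
    unsigned if_index;
    const char *if_xname;
};

/* The packet records only the index of the receiving interface. */
struct mbuf_stub {
    unsigned rcvif_index;
};

/* Stand-in for ifindex2ifnet: the slot is NULL once an interface is detached. */
static struct ifnet_stub *ifindex_table[IFINDEX_MAX];

static void
stub_set_rcvif(struct mbuf_stub *m, const struct ifnet_stub *ifp)
{
    assert(ifp->if_index < IFINDEX_MAX);
    m->rcvif_index = ifp->if_index;
}

/* Look the interface up again; the caller must cope with NULL.
 * Real code would do this inside a psref or pserialize section. */
static struct ifnet_stub *
stub_get_rcvif(const struct mbuf_stub *m)
{
    if (m->rcvif_index == 0 || m->rcvif_index >= IFINDEX_MAX)
        return NULL;
    return ifindex_table[m->rcvif_index];
}
```

A detached interface simply yields NULL on the next lookup. A stale pointer, by contrast, would be dereferenced after the object was freed.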
Sync with HEAD
Get rcvif once and reuse it. No functional change.
Sync with HEAD
Separate nexthop caches from the routing table. With this change, nexthop caches (IP-MAC address pairs) are no longer stored in the routing table. Instead, nexthop caches are stored in each network interface; we already have the lltable/llentry data structures for this purpose. This change also obsoletes the concept of cloning/cloned routes. Cloned routes no longer exist, while cloning routes still exist but are renamed to connected routes. Noticeable changes are: - Nexthop caches aren't listed in route show/netstat -r - sysctl(NET_RT_DUMP) doesn't return them - If RTF_LLDATA is specified, it returns nexthop caches - Several definitions of routing flags and messages are removed - RTF_CLONING, RTF_XRESOLVE, RTF_LLINFO, RTF_CLONED and RTM_RESOLVE - RTF_CONNECTED is added - It has the same value as RTF_CLONING for backward compatibility - route's -xresolve, -[no]cloned and -llinfo options are removed - -[no]cloning remains because it seems there are users - -[no]connected is introduced and recommended to be used instead of -[no]cloning - route show/netstat -r drops some flags - 'L' and 'c' are not seen anymore - 'C' now indicates a connected route - The gateway value of a route for an interface address is now not an L2 address but "link#N", like a connected (cloning) route - Proxy ARP: "arp -s ... pub" doesn't create a route. Details of the behavior changes can be found in the diffs under tests/. Proposed on tech-net and tech-kern: http://mail-index.netbsd.org/tech-net/2016/03/11/msg005701.html
Refine nd6log. Add __func__ to nd6log itself instead of adding it to callers.
Tidy up nd6_timer initialization
Sync with HEAD
Declare in6_tmpaddrtimer_ch in in6_var.h. Do not declare extern variables in .c files!
eliminate ip_input.c and ip6_input.c dependency on gif(4)
Sync with HEAD (as of 26th Dec)
Hook up the addrctl stuff that's already there.
Sync with HEAD
sprinkle _KERNEL_OPT
Sync with HEAD
Pull out ipsec routines from ip6_input. This change reduces symbol references from netinet6 to netipsec and improves modularity of netipsec. No functional change is intended.
Pull up following revision(s) (requested by pettai in ticket #441): sys/netinet6/ip6_var.h: revision 1.64 sys/netinet6/in6.h: revision 1.82 sys/netinet6/in6_src.c: revision 1.56 sys/netinet6/mld6.c: revision 1.62 sys/netinet6/ip6_input.c: revision 1.150 sys/netinet6/ip6_output.c: revision 1.161 Add net.inet6.ip6.prefer_tempaddr sysctl knob so that we can prefer IPv6 temporary addresses as the source address. Fixes PR kern/47100 based on a patch by Dieter Roelants.
Add net.inet6.ip6.prefer_tempaddr sysctl knob so that we can prefer IPv6 temporary addresses as the source address. Fixes PR kern/47100 based on a patch by Dieter Roelants.
Rebase to HEAD as of a few days ago.
Rebase.
Add 3rd argument to pktq_create to pass sc. It will be used to pass bridge sc for bridge_forward softint. ok rmind@
- Implement pktqueue interface for lockless IP input queue. - Replace ipintrq and ip6intrq with the pktqueue mechanism. - Eliminate kernel-lock from ipintr() and ip6intr(). - Some preparation work to push softnet_lock out of ipintr(). Discussed on tech-net.
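To illustrate what "lockless" means for an input queue, here is a single-producer/single-consumer ring built on C11 atomics. It is a simplified stand-in, not pktqueue itself. pktqueue is built on per-CPU pcq(9) producer/consumer queues, which support multiple producers. The names below are hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

#define RING_SIZE 8  /* must be a power of two */

struct spsc_ring {
    void *slot[RING_SIZE];
    _Atomic size_t head;  /* next slot to read; written only by the consumer */
    _Atomic size_t tail;  /* next slot to write; written only by the producer */
};

static void
ring_init(struct spsc_ring *r)
{
    atomic_init(&r->head, 0);
    atomic_init(&r->tail, 0);
}

/* Producer side (e.g. the receive path). Returns false when full,
 * in which case the caller would drop the packet. */
static bool
ring_put(struct spsc_ring *r, void *item)
{
    size_t tail = atomic_load_explicit(&r->tail, memory_order_relaxed);
    size_t head = atomic_load_explicit(&r->head, memory_order_acquire);

    if (tail - head == RING_SIZE)
        return false;
    r->slot[tail & (RING_SIZE - 1)] = item;
    /* Publish the slot contents before advancing tail. */
    atomic_store_explicit(&r->tail, tail + 1, memory_order_release);
    return true;
}

/* Consumer side (e.g. the softint). Returns NULL when empty. */
static void *
ring_get(struct spsc_ring *r)
{
    size_t head = atomic_load_explicit(&r->head, memory_order_relaxed);
    size_t tail = atomic_load_explicit(&r->tail, memory_order_acquire);

    if (head == tail)
        return NULL;
    void *item = r->slot[head & (RING_SIZE - 1)];
    /* Release the slot back to the producer. */
    atomic_store_explicit(&r->head, head + 1, memory_order_release);
    return item;
}
```

Each index has exactly one writer, so acquire/release ordering is enough and no lock is taken on either side.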
Add IPV6CTL_AUTO_LINKLOCAL and ND6_IFF_AUTO_LINKLOCAL toggles which control the automatic creation of IPv6 link-local addresses when an interface is brought up. Taken from FreeBSD.
Introduce 2 new variables: ipsec_enabled and ipsec_used. ipsec_enabled is controlled by sysctl and determines if IPsec is allowed. ipsec_used is set automatically based on IPsec being enabled and rules existing.
sync with head. For reference, the tree before this commit was tagged as yamt-pagecache-tag8. This commit was split into small chunks to avoid a limitation of cvs. ("Protocol error: too many arguments")
sync with head
Ensure that the top level sysctl nodes (kern, vfs, net, ...) exist before the sysctl link sets are processed, and remove redundancy. Shaves >13kB off of an amd64 GENERIC, not to mention >1k duplicate lines of code.
check result of setscope, from logan.
sync with head
Pull up revisions: src/share/man/man7/sysctl.7 revision 1.73 via patch src/sys/netinet6/icmp6.c revision 1.161 via patch src/sys/netinet6/in6.c revision 1.161 via patch src/sys/netinet6/in6_proto.c revision 1.97 via patch src/sys/netinet6/in6_var.h revision 1.65 via patch src/sys/netinet6/ip6_input.c revision 1.139 via patch src/sys/netinet6/ip6_var.h revision 1.59 via patch src/sys/netinet6/nd6.c revision 1.143 via patch src/sys/netinet6/nd6.h revision 1.57 via patch src/sys/netinet6/nd6_rtr.c revision 1.83 via patch (requested by christos in ticket #905). Patch by Loganaden Velvindron. 4 new sysctls to avoid ipv6 DoS attacks from OpenBSD
- Rewrite parts of pfil(9): use array to store hooks and thus be more cache friendly (there are only few hooks in the system). Make the structures opaque and the interface more strict. - Remove PFIL_HOOKS option by making pfil(9) mandatory.
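Here is a minimal sketch of running hooks stored in a flat array, in the spirit of the pfil(9) rewrite in this entry. The layout and names (struct hook_head, hook_add(), hooks_run(), and the two example hooks) are hypothetical and do not reproduce the actual pfil(9) interface. The point is that iterating a small contiguous array touches fewer cache lines than walking a linked list.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_HOOKS 4

/* A hook returns 0 to let the packet continue, nonzero to consume or drop it. */
typedef int (*hook_fn)(void *pkt, void *arg);

struct hook {
    hook_fn fn;
    void *arg;
};

/* Hooks live in one contiguous array instead of a linked list. */
struct hook_head {
    size_t nhooks;
    struct hook hooks[MAX_HOOKS];
};

static int
hook_add(struct hook_head *h, hook_fn fn, void *arg)
{
    if (h->nhooks == MAX_HOOKS)
        return -1;
    h->hooks[h->nhooks].fn = fn;
    h->hooks[h->nhooks].arg = arg;
    h->nhooks++;
    return 0;
}

/* Run hooks in order; stop at the first one that returns nonzero. */
static int
hooks_run(const struct hook_head *h, void *pkt)
{
    for (size_t i = 0; i < h->nhooks; i++) {
        int error = h->hooks[i].fn(pkt, h->hooks[i].arg);
        if (error != 0)
            return error;
    }
    return 0;
}

/* Example hooks for illustration only. */
static int
count_hook(void *pkt, void *arg)
{
    (void)pkt;
    ++*(int *)arg;
    return 0;
}

static int
drop_hook(void *pkt, void *arg)
{
    (void)pkt;
    (void)arg;
    return 1;
}
```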
resync from head
IPSEC has not come in two speeds for a long time now (IPSEC == kame, FAST_IPSEC). Make everything refer to IPSEC to avoid confusion.
resync with head
sync with (a bit old) head
Add a new sysctl to mark ports as reserved, so that they are not used in the anonymous or reserved port allocation.
sync with head
rename rfc6056 -> portalgo, requested by yamt
4 new sysctls to avoid ipv6 DoS attacks from OpenBSD
PR/46602: Move the rfc6056 port randomization to the IP layer.
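RFC 6056 describes several port randomization algorithms. The sketch below assumes its simplest one, Algorithm 1: start at a random offset in the ephemeral range and probe linearly, wrapping around. It also skips ports marked as reserved, in the spirit of the reserved-port sysctl entry above. This is not NetBSD's portalgo code. struct port_map, port_set(), port_isset() and pick_port() are hypothetical. The caller supplies the random value so the example is deterministic.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* One bit per port number. */
struct port_map {
    uint8_t bits[65536 / 8];
};

static bool
port_isset(const struct port_map *m, uint16_t p)
{
    return (m->bits[p >> 3] >> (p & 7)) & 1;
}

static void
port_set(struct port_map *m, uint16_t p)
{
    m->bits[p >> 3] |= (uint8_t)(1u << (p & 7));
}

/* RFC 6056 Algorithm 1 (simple port randomization), also skipping
 * reserved ports. 'rnd' is a random value supplied by the caller.
 * Returns 0 if every port in [min, max] is reserved or in use. */
static uint16_t
pick_port(const struct port_map *reserved, const struct port_map *inuse,
    uint16_t min, uint16_t max, uint32_t rnd)
{
    assert(min > 0 && min <= max);
    uint32_t num = (uint32_t)max - min + 1;
    uint32_t port = min + rnd % num;

    for (uint32_t count = num; count > 0; count--) {
        if (!port_isset(reserved, (uint16_t)port) &&
            !port_isset(inuse, (uint16_t)port))
            return (uint16_t)port;
        port = (port == max) ? min : port + 1;  /* wrap around */
    }
    return 0;
}
```

For example, with the range 49152-65535, a reserved 49152, a busy 65535, and a random value landing on 65535, the search wraps past 49152 and returns 49153.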
sync with head
sync to latest -current.
remove KAME IPSEC, replaced by FAST_IPSEC
merge to -current.
add patch from Arnaud Degroote to handle IPv6 extended options with (FAST_)IPSEC, tested lightly with a DSTOPTS header consisting of PAD1
- fix offsetof usage, and redundant defines - kill pointer casts to 0
rename the IPSEC in-kernel CPP variable and config(8) option to KAME_IPSEC, and make IPSEC define it so that existing kernel config files work as before. Now the default can easily be changed to FAST_IPSEC just by setting the IPSEC alias to FAST_IPSEC.
First step of random number subsystem rework described in <20111022023242.BA26F14A158@mail.netbsd.org>. This change includes the following: An initial cleanup and minor reorganization of the entropy pool code in sys/dev/rnd.c and sys/dev/rndpool.c. Several bugs are fixed. Some effort is made to accumulate entropy more quickly at boot time. A generic interface, "rndsink", is added, for stream generators to request that they be re-keyed with good quality entropy from the pool as soon as it is available. The arc4random()/arc4randbytes() implementation in libkern is adjusted to use the rndsink interface for rekeying, which helps address the problem of low-quality keys at boot time. An implementation of the FIPS 140-2 statistical tests for random number generator quality is provided (libkern/rngtest.c). This is based on Greg Rose's implementation from Qualcomm. A new random stream generator, nist_ctr_drbg, is provided. It is based on an implementation of the NIST SP800-90 CTR_DRBG by Henric Jungheim. This generator uses AES in a modified counter mode to generate a backtracking-resistant random stream. An abstraction layer, "cprng", is provided for in-kernel consumers of randomness. The arc4random/arc4randbytes API is deprecated for in-kernel use. It is replaced by "cprng_strong". The current cprng_fast implementation wraps the existing arc4random implementation. The current cprng_strong implementation wraps the new CTR_DRBG implementation. Both interfaces are rekeyed from the entropy pool automatically at intervals justifiable from best current cryptographic practice. In some quick tests, cprng_fast() is about the same speed as the old arc4randbytes(), and cprng_strong() is about 20% faster than rnd_extract_data(). Performance is expected to improve. The AES code in src/crypto/rijndael is no longer an optional kernel component, as it is required by cprng_strong, which is not an optional kernel component. 
The entropy pool output is subjected to the rngtest tests at startup time; if it fails, the system will reboot. There is approximately a 3/10000 chance of a false positive from these tests. Entropy pool _input_ from hardware random numbers is subjected to the rngtest tests at attach time, as well as the FIPS continuous-output test, to detect bad or stuck hardware RNGs; if any are detected, they are detached, but the system continues to run. A problem with rndctl(8) is fixed -- datastructures with pointers in arrays are no longer passed to userspace (this was not a security problem, but rather a major issue for compat32). A new kernel will require a new rndctl. The sysctl kern.arandom() and kern.urandom() nodes are hooked up to the new generators, but the /dev/*random pseudodevices are not, yet. Manual pages for the new kernel interfaces are forthcoming.
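This entry mentions the FIPS 140-2 statistical tests in libkern/rngtest.c. Here is a hedged illustration of just one of them, the monobit test. It counts the one bits in a 20000-bit sample and passes if the count lies strictly between 9725 and 10275. Those bounds are as given in FIPS 140-2 and are worth checking against the standard before relying on them. monobit_ok() is a hypothetical name, not the libkern API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MONOBIT_BYTES 2500  /* 20000 bits */

/* FIPS 140-2 monobit test: pass iff 9725 < ones < 10275. */
static bool
monobit_ok(const uint8_t buf[MONOBIT_BYTES])
{
    unsigned ones = 0;

    for (size_t i = 0; i < MONOBIT_BYTES; i++) {
        /* Clear the lowest set bit until none remain. */
        for (uint8_t b = buf[i]; b != 0; b &= (uint8_t)(b - 1))
            ones++;
    }
    return ones > 9725 && ones < 10275;
}
```

An all-zero or all-one sample fails, while the balanced pattern 0x55 (exactly 10000 ones) passes. The test only catches gross bias. The other FIPS tests (poker, runs, long runs) cover different failure modes.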
Catchup with rmind-uvmplock merge.
sync with head
Sync with HEAD.
Don't refer to extern tcbtable here, it is unused.
sync with head
RA flood mitigation via a limit on accepted routes: - introduce a limit for the routes accepted via IPv6 Router Advertisement: a common two-interface client will have 6; the default limit is 100 and can be adjusted via sysctl - report the current number of routes installed via RA via sysctl - count discarded route additions. Note that one RA message is two routes. This is at present only counted across all interfaces, even though per-interface would be more useful, since the per-interface structure complies with RFC2466 - bump kernel version due to the previous change - adjust netstat to use the new value (with netstat -p icmp6)
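Here is a minimal sketch of the accounting described in this entry. It assumes one counter for routes currently installed from RAs, a configurable maximum, and a counter of refused additions. All are global rather than per-interface, as the entry notes. The struct and function names are hypothetical. Since one RA message yields two routes, a caller would call ra_route_admit() once per route, so twice per message.

```c
#include <assert.h>
#include <stdbool.h>

struct ra_route_limit {
    unsigned max;        /* sysctl-adjustable; the entry gives 100 as default */
    unsigned installed;  /* routes currently installed from RAs */
    unsigned discarded;  /* additions refused because of the limit */
};

/* Admit one RA-derived route, or count it as discarded. */
static bool
ra_route_admit(struct ra_route_limit *l)
{
    if (l->installed >= l->max) {
        l->discarded++;
        return false;
    }
    l->installed++;
    return true;
}

/* Called when an RA-derived route is removed (e.g. on expiry). */
static void
ra_route_release(struct ra_route_limit *l)
{
    assert(l->installed > 0);
    l->installed--;
}
```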
Reduces the resources demanded by TCP sessions in TIME_WAIT-state using methods called Vestigial Time-Wait (VTW) and Maximum Segment Lifetime Truncation (MSLT). MSLT and VTW were contributed by Coyote Point Systems, Inc. Even after a TCP session enters the TIME_WAIT state, its corresponding socket and protocol control blocks (PCBs) stick around until the TCP Maximum Segment Lifetime (MSL) expires. On a host whose workload necessarily creates and closes down many TCP sockets, the sockets & PCBs for TCP sessions in TIME_WAIT state amount to many megabytes of dead weight in RAM. Maximum Segment Lifetime Truncation (MSLT) assigns each TCP session to a class based on the nearness of the peer. Corresponding to each class is an MSL, and a session uses the MSL of its class. The classes are loopback (local host equals remote host), local (local host and remote host are on the same link/subnet), and remote (local host and remote host communicate via one or more gateways). Classes corresponding to nearer peers have lower MSLs by default: 2 seconds for loopback, 10 seconds for local, 60 seconds for remote. Loopback and local sessions expire more quickly when MSLT is used. Vestigial Time-Wait (VTW) replaces a TIME_WAIT session's PCB/socket dead weight with a compact representation of the session, called a "vestigial PCB". VTW data structures are designed to be very fast and memory-efficient: for fast insertion and lookup of vestigial PCBs, the PCBs are stored in a hash table that is designed to minimize the number of cacheline visits per lookup/insertion. The memory both for vestigial PCBs and for elements of the PCB hashtable comes from fixed-size pools, and linked data structures exploit this to conserve memory by representing references with a narrow index/offset from the start of a pool instead of a pointer. When space for new vestigial PCBs runs out, VTW makes room by discarding old vestigial PCBs, oldest first. VTW cooperates with MSLT. 
It may help to think of VTW as a "FIN cache" by analogy to the SYN cache. A 2.8-GHz Pentium 4 running a test workload that creates TIME_WAIT sessions as fast as it can is approximately 17% idle when VTW is active versus 0% idle when VTW is inactive. It has 103 megabytes more free RAM when VTW is active (approximately 64k vestigial PCBs are created) than when it is inactive.
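The MSLT class selection described in this entry can be sketched using its default MSLs: 2 s for loopback, 10 s for local, 60 s for remote. The sketch makes several assumptions. Addresses are IPv4 in host byte order. "Local" is decided by comparing the peer against the local subnet mask. TIME_WAIT lasts 2 x MSL, as in classic TCP. None of the names come from NetBSD's implementation.

```c
#include <assert.h>
#include <stdint.h>

enum mslt_class { MSLT_LOOPBACK, MSLT_LOCAL, MSLT_REMOTE };

/* Default MSLs in seconds, per the log entry. */
static const unsigned mslt_secs[] = {
    [MSLT_LOOPBACK] = 2,
    [MSLT_LOCAL] = 10,
    [MSLT_REMOTE] = 60,
};

/* Hypothetical classifier: same host, same subnet, or via a gateway. */
static enum mslt_class
mslt_classify(uint32_t laddr, uint32_t faddr, uint32_t mask)
{
    if (laddr == faddr)
        return MSLT_LOOPBACK;
    if ((laddr & mask) == (faddr & mask))
        return MSLT_LOCAL;
    return MSLT_REMOTE;
}

/* TIME_WAIT duration, assuming the classic 2 * MSL rule. */
static unsigned
mslt_timewait_secs(enum mslt_class c)
{
    return 2 * mslt_secs[c];
}
```

Under these assumptions, a session to a peer on the same /24 would sit in TIME_WAIT for 20 seconds instead of 120.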
sync to netbsd-5
sync with head
Explicitly include opt_gateway.h when depending on GATEWAY.
Replace a large number of link set based sysctl node creations with calls from subsystem constructors. Benefits both future kernel modules and rump. no change to sysctl nodes on i386/MONOLITHIC & build tested i386/ALL
Sync with HEAD. Commit is split, to avoid a "too many arguments" protocol error.
sync with head.
Pull up following revision(s) (requested by martin in ticket #733): sys/netinet6/ip6_input.c: revision 1.127 Add missing parenthesis - from Kurt Lidl in PR port-vax/41316
Add missing parenthesis - from Kurt Lidl in PR port-vax/41316
Sync with HEAD.
Remove extra whitespace added by a stupid tool. XXX: more in src/sys/arch
bcopy -> memcpy
bzero -> memset
Sync with HEAD.
Provide compatibility to the old timeval SCM_TIMESTAMP messages.
Sync with HEAD.
Sync with HEAD.
Sync with wrstuden-revivesa-base-2.
Change KERNEL_LOCK_ONE (wrong name) to KERNEL_LOCK (the right name).
Fix 8-spaces-vs-tab goop.
Make the sysctl routines take out softnet_lock before dealing with any data structures. Change inet6ctlerrmap and zeroin6_addr to const.
Sync with HEAD.
sync with head.
sync with head.
Simplify the interface to netstat_sysctl() and allocate space for the collated counters using kmem_alloc(). PR kern/38577
Merge the socket locking patch: - Socket layer becomes MP safe. - Unix protocols become MP safe. - Allows protocol processing interrupts to safely block on locks. - Fixes a number of race conditions. With much feedback from matt@ and plunky@.
Make IPSEC and FAST_IPSEC stats per-cpu. Use <net/net_stats.h> and netstat_sysctl().
Make ip6 and icmp6 stats per-cpu.
Change IPv6 stats from a structure to an array of uint64_t's. Note: This is ABI-compatible with the old ip6stat structure; old netstat binaries will continue to work properly.
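The stats entries above follow a simple pattern. Stats become an array of uint64_t instead of a struct, they are made per-CPU, and they are collated via netstat_sysctl(). Each CPU increments only its own row, and a reader sums the rows. Here is a minimal sketch with hypothetical indices and a fixed CPU count. In the kernel the row would come from percpu(9) for the current CPU rather than from a parameter.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

enum { STAT_TOTAL, STAT_DELIVERED, STAT_NSTATS };  /* hypothetical indices */
#define NCPU 4

/* One row of counters per CPU; a CPU only ever writes its own row. */
static uint64_t percpu_stats[NCPU][STAT_NSTATS];

static void
stat_inc(unsigned cpu, unsigned idx)
{
    percpu_stats[cpu][idx]++;
}

/* Collate: sum every CPU's row into 'out'. */
static void
stat_collate(uint64_t out[STAT_NSTATS])
{
    memset(out, 0, STAT_NSTATS * sizeof(out[0]));
    for (unsigned c = 0; c < NCPU; c++)
        for (unsigned i = 0; i < STAT_NSTATS; i++)
            out[i] += percpu_stats[c][i];
}
```

Incrementing needs no shared lock. The cost moves to the comparatively rare reader, which may see a slightly stale total while CPUs keep counting.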
Sync with HEAD.
sync with head.
sync with HEAD
sync with head.
Convert to ansi definitions from old-style definitions. Remember that func() is not ansi, func(void) is.
imported Mobile IPv6 code developed by the SHISA project (http://www.mobileip.jp/).
sync with HEAD
Sync with HEAD.
Sync with HEAD.
Sync with head.
sync with head
Use IFNET_FOREACH() and IFADDR_FOREACH().
sync with head.
Sync with HEAD
sync with HEAD
Sync with HEAD.
The IPv6 stack labels incoming packets with an m_tag whose payload is a struct ip6aux. A struct ip6aux used to contain a pointer to an in6_ifaddr, but that pointer could become a dangling reference in the lifetime of the m_tag, because ip6_setdstifaddr() did not increase the in6_ifaddr's reference count. I have removed the pointer from ip6aux. I load it with the interesting fields from the in6_ifaddr (an IPv6 address, a scope ID, and some flags), instead.
sync with head.
Sync with HEAD. Follow the merge of pmap.c on i386 and amd64 and move pmap_init_tmp_pgtbl into arch/x86/x86/pmap.c. Modify the ACPI wakeup code to restore CR4 before jumping back into kernel space as the large page option might cover that.
Replace rote sockaddr_in6 initializations (memset(), set sa6_family, sa6_len, and sa6_add) with sockaddr_in6_init() calls. De-__P(). Constify. KNF. Shorten a staircase. Change bcmp() to memcmp(). Extract subroutine in6_setzoneid() from in6_setscope(), for re-use soon.
Sync with head.
Sync with HEAD.
Sync with somewhat-recent netbsd-4.
Pull up following revision(s) (requested by degroote in ticket #881): sys/netinet/ip_input.c: revision 1.253 sys/netinet6/ip6_input.c: revision 1.110 In some FAST_IPSEC code paths, the spl level is not restored correctly. Fix that. Spotted by Wolfgang Stukenbrock in pr/36800
In some FAST_IPSEC code paths, the spl level is not restored correctly. Fix that. Spotted by Wolfgang Stukenbrock in pr/36800
sync with head.
Sync with HEAD.
Sync with HEAD.
Take steps to hide the radix_node implementation of the forwarding table from the forwarding table's users: Introduce rt_walktree() for walking the routing table and applying a function to each rtentry. Replace most rn_walktree() calls with it. Use rt_getkey()/rt_setkey() to get/set a route's destination. Keep a pointer to the sockaddr key in the rtentry, so that rtentry users do not have to grovel in the radix_node for the key. Add a RTM_GET method to rtrequest. Use that instead of radix_node lookups in, e.g., carp(4). Add sys/net/link_proto.c, which supplies sockaddr routines for link-layer socket addresses (sockaddr_dl). Cosmetic: Constify. KNF. Stop open-coding LIST_FOREACH, TAILQ_FOREACH, et cetera. Use NULL instead of 0 for null pointers. Use __arraycount(). Reduce gratuitous parenthesization. Stop using variadic arguments for rip6_output(), it is unnecessary. Remove the unnecessary rtentry member rt_genmask and the code to maintain it, since nothing actually used it. Make rt_maskedcopy() easier to read by using meaningful variable names. Extract a subroutine intern_netmask() for looking up a netmask in the masks table. Start converting backslash-ridden IPv6 macros in sys/netinet6/in6_var.h into inline subroutines that one can read without special eyeglasses. One functional change: when the kernel serves an RTM_GET, RTM_LOCK, or RTM_CHANGE request, it applies the netmask (if supplied) to a destination before searching for it in the forwarding table. I have changed sys/netinet/ip_carp.c, carp_setroute(), to remove the unlawful radix_node knowledge. Apart from the changes to carp(4), netiso, ATM, and strip(4), I have run the changes on three nodes in my wireless routing testbed, which involves IPv4 + IPv6 dynamic routing acrobatics, and it's working beautifully so far.
file ip6_input.c was added on branch matt-mips64 on 2007-07-19 20:48:57 +0000
Sync with head.
Merge some of the less invasive changes from the vmlocking branch: - kthread, callout, devsw API changes - select()/poll() improvements - miscellaneous MT safety improvements
- ip6_init: fix a mistake in rev.1.98.2.3 which makes callout_softclock jump to NULL. - s/struct callout/callout_t/
Adapt to callout API change.
Sync with head.
Pull up following revision(s) (requested by adrianp in ticket #11330): sys/netinet6/ip6_input.c: revision 1.102 via patch sys/netinet6/route6.c: revision 1.18 via patch sys/netinet6/ip6_var.h: revisions 1.41-1.42 via patch sbin/sysctl/sysctl.8: patch Disable processing of routing header type 0 packets since they can be used for DoS attacks. Provide a sysctl to re-enable them (net.inet6.ip6.rht0). Information from: http://www.secdev.org/conf/IPv6_RH_security-csw07.pdf
Update to today's netbsd-4.
Pull up following revision(s) (requested by degroote in ticket #667): sys/netinet/tcp_input.c: revision 1.260 sys/netinet/tcp_output.c: revision 1.154 sys/netinet/tcp_subr.c: revision 1.210 sys/netinet6/icmp6.c: revision 1.129 sys/netinet6/in6_proto.c: revision 1.70 sys/netinet6/ip6_forward.c: revision 1.54 sys/netinet6/ip6_input.c: revision 1.94 sys/netinet6/ip6_output.c: revision 1.114 sys/netinet6/raw_ip6.c: revision 1.81 sys/netipsec/ipcomp_var.h: revision 1.4 sys/netipsec/ipsec.c: revision 1.26 via patch,1.31-1.32 sys/netipsec/ipsec6.h: revision 1.5 sys/netipsec/ipsec_input.c: revision 1.14 sys/netipsec/ipsec_netbsd.c: revision 1.18,1.26 sys/netipsec/ipsec_output.c: revision 1.21 via patch sys/netipsec/key.c: revision 1.33,1.44 sys/netipsec/xform_ipcomp.c: revision 1.9 sys/netipsec/xform_ipip.c: revision 1.15 sys/opencrypto/deflate.c: revision 1.8 Commit my SoC work. Add IPv6 support for fast_ipsec. Note that currently, packets with extension headers are not correctly supported. Change the ipcomp logic. Add sysctl tree to modify the fast_ipsec options related to IPv6, similar to the sysctl kame interface. Choose the right default policy, depending on the address family of the desired policy. Increase the refcount for the default IPv6 policy so nobody can reclaim it. Always compute the sp index even if we don't have any sp in spd. It lets us choose the right default policy (based on the address family requested). While here, fix an error message. Use a dynamic array instead of a static array to decompress. It lets us decompress any data, whatever the ratio of decompressed to compressed data. It fixes the last issues with fast_ipsec and ipcomp. While here, bzero -> memset, bcopy -> memcpy, FREE -> free. Reviewed a long time ago by sam@
Ansify + add a few comments, from Karl Sjödahl
sync with head.
remove net.inet6.ip6.rht0 sysctl. it's too dangerous compared to its benefit. strongly requested by itojun@. ok'ed by core@.
sync with head.
Use rtcache_lookup2(), and fix cache hit/miss accounting. While I am here, introduce an rtentry pointer, 'rt', and set it equal to ip6_forward.ro_rt. Replace several occurrences of 'ip6_forward.ro_rt' with 'rt'.
from kame: > Revision 1.371 > Thu May 3 22:07:39 2007 UTC (47 hours, 7 minutes ago) by itojun > > drop packets with more than 1 routing headers. > from claudio@openbsd (and increment ifs6_in_hdrerr on ip6s_toomanyhdr.)
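A minimal sketch of the "more than one routing header" rule, written as userland C over a flat byte buffer rather than an mbuf chain. The helper names (`count_routing_headers`, `too_many_routing_headers`) and the restriction to the hop-by-hop, routing, fragment and destination-options headers are assumptions for illustration, not the KAME or NetBSD code. A truncated chain is treated as a drop.

```c
#include <stddef.h>
#include <stdint.h>

/* IPv6 next-header values for the extension headers handled here. */
#define NH_HOPOPTS  0
#define NH_ROUTING  43
#define NH_FRAGMENT 44
#define NH_DSTOPTS  60

/*
 * Hypothetical helper: walk the extension-header chain in 'buf' (the bytes
 * after the fixed 40-byte IPv6 header), starting with next-header 'nxt'.
 * Returns the number of routing headers seen, or -1 if a header would run
 * past the end of the buffer.
 */
static int
count_routing_headers(const uint8_t *buf, size_t len, uint8_t nxt)
{
	size_t off = 0;
	int nrh = 0;

	for (;;) {
		size_t hlen;

		switch (nxt) {
		case NH_HOPOPTS:
		case NH_ROUTING:
		case NH_DSTOPTS:
			if (off + 2 > len)
				return -1;	/* cannot read next/length fields */
			/* Hdr Ext Len is in 8-octet units, excluding the first 8. */
			hlen = ((size_t)buf[off + 1] + 1) * 8;
			break;
		case NH_FRAGMENT:
			hlen = 8;		/* fixed-size header */
			break;
		default:
			return nrh;		/* upper-layer header reached */
		}
		if (off + hlen > len)
			return -1;		/* truncated extension header */
		if (nxt == NH_ROUTING)
			nrh++;
		nxt = buf[off];			/* first byte: next header */
		off += hlen;
	}
}

/* Policy from the log entry: more than one routing header => drop. */
static int
too_many_routing_headers(const uint8_t *buf, size_t len, uint8_t nxt)
{
	int n = count_routing_headers(buf, len, nxt);

	return n < 0 || n > 1;
}
```

Counting while walking keeps the check to a single pass over the chain, and the same bounds test that rejects a second routing header also rejects a truncated one.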
Eliminate address family-specific route caches (struct route, struct route_in6, struct route_iso), replacing all caches with a struct route. The principal benefit of this change is that all of the protocol families can benefit from route cache-invalidation, which is necessary for correct routing. Route-cache invalidation fixes an ancient PR, kern/3508, at long last; it fixes various other PRs, also. Discussions with and ideas from Joerg Sonnenberger influenced this work tremendously. Of course, all design oversights and bugs are mine. DETAILS 1 I added to each address family a pool of sockaddrs. I have introduced routines for allocating, copying, duplicating, and freeing sockaddrs: struct sockaddr *sockaddr_alloc(sa_family_t af, int flags); struct sockaddr *sockaddr_copy(struct sockaddr *dst, const struct sockaddr *src); struct sockaddr *sockaddr_dup(const struct sockaddr *src, int flags); void sockaddr_free(struct sockaddr *sa); sockaddr_alloc() returns either a sockaddr from the pool belonging to the specified family, or NULL if the pool is exhausted. The returned sockaddr has the right size for that family; sa_family and sa_len fields are initialized to the family and sockaddr length---e.g., sa_family = AF_INET and sa_len = sizeof(struct sockaddr_in). sockaddr_free() puts the given sockaddr back into its family's pool. sockaddr_dup() and sockaddr_copy() work analogously to strdup() and strcpy(), respectively. sockaddr_copy() KASSERTs that the family of the destination and source sockaddrs are alike. The 'flags' argument for sockaddr_alloc() and sockaddr_dup() is passed directly to pool_get(9). 2 I added routines for initializing sockaddrs in each address family, sockaddr_in_init(), sockaddr_in6_init(), sockaddr_iso_init(), etc. They are fairly self-explanatory. 3 structs route_in6 and route_iso are no more. All protocol families use struct route. I have changed the route cache, 'struct route', so that it does not contain storage space for a sockaddr.
Instead, struct route points to a sockaddr coming from the pool the sockaddr belongs to. I added a new method to struct route, rtcache_setdst(), for setting the cache destination: int rtcache_setdst(struct route *, const struct sockaddr *); rtcache_setdst() returns 0 on success, or ENOMEM if no memory is available to create the sockaddr storage. It is now possible for rtcache_getdst() to return NULL if, say, rtcache_setdst() failed. I check the return value for NULL everywhere in the kernel. 4 Each routing domain (struct domain) has a list of live route caches, dom_rtcache. rtflushall(sa_family_t af) looks up the domain indicated by 'af', walks the domain's list of route caches and invalidates each one.
Pull up following revision(s) (requested by christos in ticket #587): sys/netinet6/ip6_input.c: revision 1.102 sys/netinet6/route6.c: revision 1.18 sys/netinet6/ip6_var.h: revision 1.41 sys/netinet6/ip6_var.h: revision 1.42 sbin/sysctl/sysctl.8: patch Disable processing of routing header type 0 packets since they can be used for DoS attacks. Provide a sysctl to re-enable them (net.inet6.ip6.rht0). Information from: http://www.secdev.org/conf/IPv6_RH_security-csw07.pdf fix typo.
Pull up following revision(s) (requested by christos in ticket #1766): sys/netinet6/ip6_input.c: revision 1.102 via patch sys/netinet6/route6.c: revision 1.18 via patch sys/netinet6/ip6_var.h: revision 1.41 via patch sys/netinet6/ip6_var.h: revision 1.42 via patch sbin/sysctl/sysctl.8: patch Disable processing of routing header type 0 packets since they can be used for DoS attacks. Provide a sysctl to re-enable them (net.inet6.ip6.rht0). Information from: http://www.secdev.org/conf/IPv6_RH_security-csw07.pdf fix typo.
Disable processing of routing header type 0 packets since they can be used for DoS attacks. Provide a sysctl to re-enable them (net.inet6.ip6.rht0). Information from: http://www.secdev.org/conf/IPv6_RH_security-csw07.pdf
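A hypothetical sketch of the shape of this check: a type 0 routing header is refused unless a knob standing in for net.inet6.ip6.rht0 is set. The variable `ip6_rht0`, the enum and the function name are illustrative only; the sketch says nothing about which ICMPv6 error, if any, the real code sends.

```c
#include <stddef.h>
#include <stdint.h>

#define RTHDR_TYPE_0 0		/* routing header type 0 */

/* Hypothetical stand-in for the net.inet6.ip6.rht0 sysctl (0 = disabled). */
static int ip6_rht0 = 0;

enum rh_verdict { RH_ACCEPT, RH_DROP, RH_TRUNCATED };

/*
 * 'rh' points at a routing header: byte 0 next header, byte 1 header
 * length, byte 2 routing type, byte 3 segments left.
 */
static enum rh_verdict
check_routing_header(const uint8_t *rh, size_t len)
{
	if (len < 4)
		return RH_TRUNCATED;	/* fixed part not present */
	if (rh[2] == RTHDR_TYPE_0 && !ip6_rht0)
		return RH_DROP;		/* RH0 disabled by default */
	return RH_ACCEPT;		/* other types go on to normal processing */
}
```

Defaulting the knob to 0 makes the safe behavior the out-of-the-box one, with re-enabling left as an explicit administrator decision.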
Sync with head.
Pullup to -current
sync with head.
Minor change - be a little more consistent in sysctl handler names
Don't call ip*flow_reap if we're just looking up maxflows
Add a new sysctl net.inet6.ip6.hashsize to control the hash table size. The sysctl handler will ensure this value is a power of 2 ok dyoung@
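A sketch of the kind of validation such a handler performs, with one common reason for the power-of-2 rule: the bucket index can then be computed with a mask instead of a modulo. `validate_hashsize` and `bucket_index` are hypothetical names, and returning EINVAL for a bad value is an assumption of this sketch.

```c
#include <errno.h>
#include <stdbool.h>

/* True if v is a nonzero power of two: exactly one bit set. */
static bool
is_power_of_2(unsigned int v)
{
	return v != 0 && (v & (v - 1)) == 0;
}

/* Hypothetical validation step for a new hash table size. */
static int
validate_hashsize(int newsize)
{
	if (newsize <= 0 || !is_power_of_2((unsigned int)newsize))
		return EINVAL;
	return 0;
}

/* With a power-of-two size, hash % size equals hash & (size - 1). */
static unsigned int
bucket_index(unsigned int hash, unsigned int hashsize)
{
	return hash & (hashsize - 1);
}
```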
Sync with HEAD.
Add IPv6 Fast Forward, the counterpart of the IPv4 fast-forward code: If ip6_forward successfully forwards a packet, a cache entry, in this case an ip6flow struct, will be created. ether_input and friends will then be able to call ip6flow_fastforward with the packet, which will then be passed to if_output (unless an issue is found, in which case the packet is passed back to ip6_input). ok matt@ christos@ dyoung@ and joerg@
Kill caddr_t; there will be some MI fallout, but it will be fixed shortly.
- sync with head. - move sched_changepri back to kern_synch.c as it doesn't know PPQ anymore.
sync with head.
Cosmetic: use __arraycount. In ip6_input, move type of parameter into parentheses.
KNF: de-__P, bzero -> memset, bcmp -> memcmp. Remove extraneous parentheses in return statements. Cosmetic: don't open-code TAILQ_FOREACH(). Cosmetic: change types of variables to avoid oodles of casts: in in6_src.c, avoid casts by changing several route_in6 pointers to struct route pointers. Remove unnecessary casts to caddr_t elsewhere. Pave the way for eliminating address family-specific route caches: soon, struct route will not embed a sockaddr, but it will hold a reference to an external sockaddr, instead. We will set the destination sockaddr using rtcache_setdst(). (I created a stub for it, but it isn't used anywhere, yet.) rtcache_free() will free the sockaddr. I have extracted from rtcache_free() a helper subroutine, rtcache_clear(). rtcache_clear() will "forget" a cached route, but it will not forget the destination by releasing the sockaddr. I use rtcache_clear() instead of rtcache_free() in rtcache_update(), because rtcache_update() is not supposed to forget the destination. Constify: 1 Introduce const accessor for route->ro_dst, rtcache_getdst(). 2 Constify the 'dst' argument to ifnet->if_output(). This led me to constify a lot of code called by output routines. 3 Constify the sockaddr argument to protosw->pr_ctlinput. This led me to constify a lot of code called by ctlinput routines. 4 Introduce const macros for converting from a generic sockaddr to family-specific sockaddrs, e.g., sockaddr_in: satocsin6, satocsin, et cetera.
Commit my SoC work. Add ipv6 support for fast_ipsec. Note that currently, packets with extension headers are not correctly supported. Change the ipcomp logic.
Sync with head.
sync with head.
sync with head.
Introduce new helper functions to abstract the route caching. rtcache_init and rtcache_init_noclone look up ro_dst and store the result in ro_rt, taking care of the reference counting and calling the domain specific route cache. rtcache_free checks if a route was cached and frees the reference. rtcache_copy copies ro_dst of the given struct route, checking that enough space is available and incrementing the reference count of the cached rtentry if necessary. rtcache_check validates that the cached route is still up. If it isn't, it tries to look it up again. Afterwards, ro_rt is either valid again or NULL. rtcache_copy is used internally. Adjust callers of rtalloc/rtflush in the tree to check the sanity of ro_dst first (if necessary). If it doesn't fit the expectations, free the cache, otherwise check if the cached route is still valid. After that combination, a single check for ro_rt == NULL is enough to decide whether a new lookup needs to be done with a different ro_dst. Make the route checking in gre stricter by repeating the loop check after revalidation. Remove some unused RADIX_MPATH code in in6_src.c. The logic is slightly changed here to first validate the route and check RTF_GATEWAY afterwards. This is semantically equivalent though. etherip doesn't need sc_route_expire, similar to the gif changes from dyoung@ earlier. Based on the earlier patch from dyoung@, reviewed and discussed with him.
sync with head.
Here are various changes designed to protect against bad IPv4 routing caused by stale route caches (struct route). Route caches are sprinkled throughout PCBs, the IP fast-forwarding table, and IP tunnel interfaces (gre, gif, stf). Stale IPv6 and ISO route caches will be treated by separate patches. Thank you to Christoph Badura for suggesting the general approach to invalidating route caches that I take here. Here are the details: Add hooks to struct domain for tracking and for invalidating each domain's route caches: dom_rtcache, dom_rtflush, and dom_rtflushall. Introduce helper subroutines, rtflush(ro) for invalidating a route cache, rtflushall(family) for invalidating all route caches in a routing domain, and rtcache(ro) for notifying the domain of a new cached route. Chain together all IPv4 route caches where ro_rt != NULL. Provide in_rtcache() for adding a route to the chain. Provide in_rtflush() and in_rtflushall() for invalidating IPv4 route caches. In in_rtflush(), set ro_rt to NULL, and remove the route from the chain. In in_rtflushall(), walk the chain and remove every route cache. In rtrequest1(), call rtflushall() to invalidate route caches when a route is added. In gif(4), discard the workaround for stale caches that involves expiring them every so often. Replace the pattern 'RTFREE(ro->ro_rt); ro->ro_rt = NULL;' with a call to rtflush(ro). Update ipflow_fastforward() and all other users of route caches so that they expect a cached route, ro->ro_rt, to turn to NULL. Take care when moving a 'struct route' to rtflush() the source and to rtcache() the destination. In domain initializers, use .dom_xxx tags. KNF here and there.
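A simplified, userland model of the chaining and flushing described above, using the `<sys/queue.h>` LIST macros. `struct route` here is stripped down to a route pointer plus list linkage, `struct rtentry` is left opaque, and the `sketch_*` functions are hypothetical stand-ins for in_rtcache(), in_rtflush() and in_rtflushall(); reference counting (RTFREE) is omitted.

```c
#include <stddef.h>
#include <sys/queue.h>

struct rtentry;			/* opaque in this sketch */

/* Stripped-down route cache: the cached route plus chain linkage. */
struct route {
	struct rtentry *ro_rt;
	LIST_ENTRY(route) ro_rtcache_next;
};

/* Per-domain chain of every cache whose ro_rt != NULL. */
static LIST_HEAD(, route) in_rtcache_head =
    LIST_HEAD_INITIALIZER(in_rtcache_head);

/* Invalidate one cache: clear ro_rt and unlink it from the chain. */
static void
sketch_rtflush(struct route *ro)
{
	if (ro->ro_rt == NULL)
		return;			/* not cached, so not on the chain */
	/* The kernel would also drop its reference to the rtentry here. */
	ro->ro_rt = NULL;
	LIST_REMOVE(ro, ro_rtcache_next);
}

/* Record a newly cached route and chain the cache. */
static void
sketch_rtcache(struct route *ro, struct rtentry *rt)
{
	sketch_rtflush(ro);		/* avoid inserting the same cache twice */
	ro->ro_rt = rt;
	LIST_INSERT_HEAD(&in_rtcache_head, ro, ro_rtcache_next);
}

/* Invalidate every cache in the domain, e.g. after a route is added. */
static void
sketch_rtflushall(void)
{
	struct route *ro;

	while ((ro = LIST_FIRST(&in_rtcache_head)) != NULL)
		sketch_rtflush(ro);
}
```

Because every holder of a cache is reachable from the chain, a routing-table change can invalidate all of them at once, and users only need to expect ro_rt to become NULL underneath them.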
Use the queue(3) macros instead of open-coding them. Shorten staircases. Remove unnecessary casts. Where appropriate, s/8/NBBY/. De-__P(). KNF. No functional changes intended.
Sync with head.
__unused removal on arguments; approved by core.
sync with head
- sprinkle __unused on function decls. - fix a couple of unused bugs - no more -Wno-unused for i386
sync with head
sync with head.
sync with head.
Sync with head.
Sync with head.
Repair a patching error from previous revision (ticket #10626)
Make the mbuf writable before calling in6_clearscope(). Based on patch sent by David Young on tech-kern.
Merge 2006-05-24 NetBSD-current into the "peter-altq" branch.
sync with head.
Pull up following revision(s) (requested by rpaulo in ticket #10626): sys/netinet6/ip6_input.c: revision 1.87 In ip6_savecontrol(), ignore IPv4 packets. From JINMEI Tatuya (KAME). Should fix PR 33269.
Pull up following revision(s) (requested by rpaulo in ticket #1338): sys/netinet6/ip6_input.c: revision 1.87 via patch In ip6_savecontrol(), ignore IPv4 packets. From JINMEI Tatuya (KAME). Should fix PR 33269.
In ip6_savecontrol(), ignore IPv4 packets. From JINMEI Tatuya (KAME). Should fix PR 33269.
sync with head
while (1) -> for (;;)
Add support for RFC 3542 Adv. Socket API for IPv6 (which obsoletes 2292). * RFC 3542 isn't binary compatible with RFC 2292. * RFC 2292 support is on by default but can be disabled. * update ping6, telnet and traceroute6 to the new API. From the KAME project (www.kame.net). Reviewed by core.
Sync with head.
sync with head.
Coverity CID 856: m cannot be NULL here. Remove bogus test.
sync with head.
NDP-related improvements: RFC4191 - supports host-side router-preference RFC3542 - if DAD fails on an interface, disables IPv6 operation on the interface - don't advertise MLD report before DAD finishes Others - fixes integer overflow for valid and preferred lifetimes - improves timer granularity for MLD, using callout-timer. - reflects rtadvd's IPv6 host variable information into kernel (router only) - adds a sysctl option to enable/disable pMTUd for multicast packets - performs NUD on PPP/GRE interface by default - Redirect works regardless of ip6_accept_rtadv - removes RFC1885-related code From the KAME project via SUZUKI Shinsuke. Reviewed by core.
ip6_savecontrol(): remove references to in6pcb.
remove in6_pcb.h and include in_pcb.h.
sync with head.
ip6_input: don't embed scope id before running packet filters.
Better support of IPv6 scoped addresses. - most of the kernel code will not care about the actual encoding of scope zone IDs and won't touch "s6_addr16[1]" directly. - similarly, most of the kernel code will not care about link-local scoped addresses as a special case. - scope boundary check will be stricter. For example, the current *BSD code allows a packet with src=::1 and dst=(some global IPv6 address) to be sent outside of the node, if the application does: s = socket(AF_INET6); bind(s, "::1"); sendto(s, some_global_IPv6_addr); This is clearly wrong, since ::1 is only meaningful within a single node, but the current implementation of the *BSD kernel cannot reject this attempt. - and, while there, don't try to remove the ff02::/32 interface route entry in in6_ifdetach() as it's already gone. This also includes some level of support for the standard source address selection algorithm defined in RFC3484, which will be completed in the future. From the KAME project via JINMEI Tatuya. Approved by core@.
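A hedged sketch of the stricter boundary check for the ::1 example above: a packet whose source is ::1 may only be addressed to ::1. The function names are hypothetical, and a real scope check also covers link-local and other zones, which this sketch ignores. `IN6_IS_ADDR_LOOPBACK` and `inet_pton` are the standard macro and function.

```c
#include <stdbool.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>

/* ::1 has node-local meaning, so traffic from it must stay on the node. */
static bool
loopback_scope_ok(const struct in6_addr *src, const struct in6_addr *dst)
{
	if (IN6_IS_ADDR_LOOPBACK(src))
		return IN6_IS_ADDR_LOOPBACK(dst);
	return true;		/* other scopes are not checked here */
}

/* Convenience wrapper: 1 = allowed, 0 = rejected, -1 = unparsable. */
static int
loopback_scope_ok_str(const char *src, const char *dst)
{
	struct in6_addr s, d;

	if (inet_pton(AF_INET6, src, &s) != 1 ||
	    inet_pton(AF_INET6, dst, &d) != 1)
		return -1;
	return loopback_scope_ok(&s, &d) ? 1 : 0;
}
```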
merge ktrace-lwp.
Sync with HEAD. Here we go again...
Implement net.inet6.ip6.stats sysctl. Reviewed by Elad Efrat.
- avoid shadowed variables - sprinkle const.
Sync with HEAD.
Convert lo(4) to a clonable device. This also removes the loif array and changes all code to use the new lo0ifp pointer which points to the lo0 ifnet structure. Approved by christos.
Sync with HEAD.
We don't need to include bpfilter.h
Fix the sync with head I botched.
Sync with HEAD.
Sync with HEAD
there's no use in checking privs on curproc in the input path. jinmei@kame
Pull up revision 1.74 (requested by atatat in ticket #391): Sysctl descriptions under net subtree (net.key not done)
Sysctl descriptions under net subtree (net.key not done)
Pullup rev 1.67 (requested by itojun in ticket #103) Fix endian bug in fragment header scanning.
Tango on sysctl_createv() and flags. The flags have all been renamed, and sysctl_createv() now uses more arguments.
minor KNF
KNF
Dynamic sysctl. Gone are the old kern_sysctl(), cpu_sysctl(), hw_sysctl(), vfs_sysctl(), etc, routines, along with sysctl_int() et al. Now all nodes are registered with the tree, and nodes can be added (or removed) easily, and I/O to and from the tree is handled generically. Since the nodes are registered with the tree, the mapping from name to number (and back again) can now be discovered, instead of having to be hard coded. Adding new nodes to the tree is likewise much simpler -- the new infrastructure handles almost all the work for simple types, and just about anything else can be done with a small helper function. All existing nodes are where they were before (numerically speaking), so all existing consumers of sysctl information should notice no difference. PS - I'm sorry, but there's a distinct lack of documentation at the moment. I'm working on sysctl(3/8/9) right now, and I promise to watch out for buses.
implement net.inet6.ifq
Remove some assigned-to but otherwise unused variables.
Pull up revision 1.67 (requested by itojun in ticket #1525): fix endian bug in fragment header scanning.
fix endian bug in fragment header scanning.
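The log does not describe the bug itself. As an illustration only, the sketch below shows one endian-independent way to pull the fields out of an 8-byte IPv6 fragment header: assemble the 16-bit offset/flags word from its two wire bytes, then mask it in host order. `struct frag_info` and `parse_frag_header` are hypothetical names.

```c
#include <stdint.h>

struct frag_info {
	uint16_t offset;	/* byte offset of this fragment's data */
	int more;		/* M flag: more fragments follow */
	uint32_t ident;		/* fragment identification */
};

/*
 * Fragment header layout: next header (1), reserved (1),
 * offset/res/M (2, network order), identification (4, network order).
 */
static void
parse_frag_header(const uint8_t fh[8], struct frag_info *fi)
{
	/* Build the word in host order first; masks below are host-order. */
	uint16_t offlg = (uint16_t)((fh[2] << 8) | fh[3]);

	/* The 13-bit offset counts 8-octet units in bits 15..3, so masking
	   off the low 3 bits yields the byte offset directly. */
	fi->offset = offlg & 0xfff8;
	fi->more = offlg & 0x0001;
	fi->ident = ((uint32_t)fh[4] << 24) | ((uint32_t)fh[5] << 16) |
	    ((uint32_t)fh[6] << 8) | fh[7];
}
```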
randomize IPv4/v6 fragment ID and IPv6 flowlabel. avoids predictability of these fields. ip_id.c is from openbsd. ip6_id.c was adapted by kame.
Move UCB-licensed code from 4-clause to 3-clause licence. Patches provided by Joel Baker in PR 22364, verified by myself.
avoid ICMPv6 redirect if the packet filter rewrite dst addr to an address on the incoming interface. cedric@openbsd
KNF
do not use m_pulldown() to parse intermediate extension headers (like routing). we don't want to drop packets due to extension header parsing. KAME rev 1.59. (performance may suck, but it is slowpath anyways)
always use PULLDOWN_TEST codepath.
The Double-Semi-Colon Police.
Catch up to -current.
sync kqueue with -current; this includes merge of gehenna-devsw branch, merge of i386 MP branch, and part of autoconf rototil work
Remove breaks after returns, unreachable returns and returns after returns(!).
Catch up to -current.
correct signedness mixup in pointer passing. sync w/kame
sync kqueue branch with HEAD
Catch up to -current.
catch up with -current.
No longer need to pull in lwp.h; proc.h pulls it in for us.
Changes to allow the IPv4 and IPv6 layers to align headers themselves, as necessary: * Implement a new mbuf utility routine, m_copyup(), which is like m_pullup(), except that it always prepends and copies, rather than only doing so if the desired length is larger than m->m_len. m_copyup() also allows an offset into the destination mbuf, which allows space for packet headers, in the forwarding case. * Add *_HDR_ALIGNED_P() macros for IP, IPv6, ICMP, and IGMP. These macros expand to 1 if __NO_STRICT_ALIGNMENT is defined, so that architectures which do not have strict alignment constraints don't pay for the test or visit the new align-if-needed path. * Use the new macros to check if a header needs to be aligned, or to assert that it already is, as appropriate. Note: This code is still somewhat experimental. However, the new code path won't be visited if individual device drivers continue to guarantee that packets are delivered to layer 3 already properly aligned (which are rules that are already in use).
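A minimal sketch of such a macro, assuming C11 `_Alignof` for the required alignment of the header type. `HDR_ALIGNED_P` is a hypothetical name, not one of the real *_HDR_ALIGNED_P() macros, and the real ones may compute alignment differently.

```c
#include <stdint.h>

/*
 * On CPUs that tolerate unaligned access (__NO_STRICT_ALIGNMENT), always
 * report "aligned" so the align-if-needed path is never taken.
 * Otherwise test the low address bits against the type's alignment.
 */
#ifdef __NO_STRICT_ALIGNMENT
#define HDR_ALIGNED_P(p, t)	1
#else
#define HDR_ALIGNED_P(p, t)	\
	((((uintptr_t)(p)) & (_Alignof(t) - 1)) == 0)
#endif
```

The mask test relies on alignments being powers of two, so `p & (align - 1)` is zero exactly when `p` is a multiple of the alignment.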
Curproc->curlwp renaming. Change uses of "curproc->l_proc" back to "curproc", which is more like the original use. Bare uses of "curproc" are now "curlwp". "curproc" is now #defined in proc.h as ((curlwp) ? (curlwp)->l_proc : NULL) so that it is always safe to reference curproc (*de*referencing curproc is another story, but that's always been true).
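A self-contained userland model of the curproc macro with its parentheses balanced. `struct proc`, `struct lwp` and `curlwp_ptr` are minimal stand-ins defined here for illustration; in the kernel, curlwp is supplied by machine-dependent code.

```c
#include <stddef.h>

struct proc { int p_pid; };
struct lwp { struct proc *l_proc; };

/* Hypothetical stand-in for the current-LWP pointer. */
static struct lwp *curlwp_ptr;
#define curlwp	curlwp_ptr

/* Always safe to evaluate, even with no LWP context (yields NULL). */
#define curproc	((curlwp) ? (curlwp)->l_proc : NULL)
```

Evaluating curproc never faults; only dereferencing it without a NULL check can, which matches the caveat in the log entry.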
catch up with -current on kqueue branch
catch up with -current.
Catch up to -current.
whitespace cleanup
sync with latest KAME in6_ifaddr/prefix/default router manipulation. behavior changes: - two ioctls used by ndp(8) are now obsolete (backward compat provided). use sysctl path instead. - lo0 does not get ::1 automatically. it will get ::1 when lo0 comes up.
Catch up with -current.
use arc4random() where possible. XXX is it necessary to do microtime() on tcp syn cache?
limit number of IPv6 fragments (not the fragment queue size) to fight against lots-of-frags DoS attacks. sync w/kame
Spelling fixes, from Sergey Svishchev in kern/16650.
Apply patch (requested by martti): Fix it so that IPFilter handles IPv6 traffic.
Sync kqueue branch with -current.
Catch up to -current.
make it compile even if NGIF=0
move in6_gif_hlim decl to in6_gif.c. sync with kame
reduce white space/cosmetic diffs w/kame.
Catch up to -current.
add RCSIDs
Sync the thorpej-mips-cache branch with -current.
check offset overrun in ip6_nexthdr.
Don't need to include <uvm/uvm_extern.h> just to include <sys/sysctl.h> anymore.
more whitespace sync with kame
Catch up to -current.
more whitespace/comment sync with kame
implement IPV6_V6ONLY socket option from draft-ietf-ipngwg-rfc2553bis-03.txt. IPV6_BINDV6ONLY (netbsd only) is deprecated, but still works just like before.
Merge Aug 24 -current into the kqueue branch.
Catch up with -current.
cache IPsec policy on in6?pcb. most of the lookup operations can be bypassed, especially when it is a connected SOCK_STREAM in6?pcb. sync with kame.
Catch up to -current.
Sync with HEAD
Remove the use of splimp() from the NetBSD kernel. splnet() and only splnet() is allowed for the protection of data structures used by network devices.
Catch up with -current.
Pull up revision 1.39 (via patch, requested by itojun): Record IPsec packet history in m_aux structure. Let ipfilter look at wire-format packet only (not the decapsulated ones), so that VPN setting can work with NAT/ipfilter settings.
enable FAKE_LOOPBACK_IF case by default. now traffic on loopback interface will be presented to bpf as normal wire format packet (without KAME scopeid in s6_addr16[1]). fix KAME PR 250 (host mistakenly accepts packets to fe80::x%lo0). sync with kame.
Sync with HEAD.
do not inject packets to ipfilter, if the packet went through IPsec tunnel. http://www.netbsd.org/Documentation/network/ipsec/#ipf-interaction
drop packets with link-local addresses, if (internally-used) interface ID portion is already filled. sync with kame
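A hedged sketch of that drop test. KAME-derived stacks keep the zone ID in the second 16-bit word (s6_addr16[1]) of a link-local address internally; on the wire that word is zero for fe80::/64 addresses, so a received address with it set is rejected. The function names are hypothetical, and only unicast link-local addresses are checked here.

```c
#include <stdbool.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>

/* True if a link-local address already has bytes 2-3 (the word KAME
   uses for the embedded zone ID) set. */
static bool
ll_embedded_scope_set(const struct in6_addr *a)
{
	return IN6_IS_ADDR_LINKLOCAL(a) &&
	    (a->s6_addr[2] != 0 || a->s6_addr[3] != 0);
}

/* Convenience wrapper: 1 = drop, 0 = pass, -1 = unparsable. */
static int
ll_drop_str(const char *addr)
{
	struct in6_addr a;

	if (inet_pton(AF_INET6, addr, &a) != 1)
		return -1;
	return ll_embedded_scope_set(&a) ? 1 : 0;
}
```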
Be more careful not to dereference curproc when there might not be a process context.
Sync with HEAD.
Pull up revision 1.37 (requested by itojun): Ensure that we enforce inbound IPsec policy on all IP protocols, not just TCP, UDP and ICMP.
Initial commit of scheduler activations and lightweight process support.
make sure to enforce inbound ipsec policy checking, for any protocols on top of ip (check it when final header is visited). sync with kame. XXX kame team will need to re-check policy engine code
C requires that labels be followed by statements.
Sync with HEAD.
to sync with kame better, (1) remove register declaration for variables, (2) sync whitespaces, (3) update comments. (4) bring in some of portability and logging enhancements. no functional changes here.
during ip6/icmp6 inbound packet processing, do not call log() nor printf() in normal operation (/var can get filled up by flooding bogus packets). sysctl net.inet6.icmp6.nd6_debug will turn on diagnostic messages. (#define ND6_DEBUG will turn it on by default) improve stats in ND6 code. lots of synchronization with kame (including comments and cosmetic ones).
Sync with HEAD
Back out the sledgehammer damage applied by wiz while I was out for the holiday.
Back out previous change. It causes NAT to fail, and was CLEARLY NOT TESTED before it was committed.
Slight adjustment to how pfil_head's are registered. Instead of a "key" and a "dlt", use a "type" (PFIL_TYPE_{AF,IFNET} for now) and a val/ptr appropriate for that type. This allows for more future flexibility with the pfil_hook mechanism.
Add ALTQ glue. XXX Temporary until ALTQ is changed to use a pfil hook.
Sync with HEAD.
Update thorpej_scsipi to -current as of a month ago
Restructure the PFIL_HOOKS mechanism a bit: - All packets are passed to PFIL_HOOKS as they come off the wire, i.e. fields in protocol headers in network order, etc. - Allow for multiple hooks to be registered, using a "key" and a "dlt". The "dlt" is a BPF data link type, indicating what type of header is present. - INET and INET6 register with key == AF_INET or AF_INET6, and dlt == DLT_RAW. - PFIL_HOOKS now take an argument for the filter hook, an mbuf **, an ifnet *, and a direction (PFIL_IN or PFIL_OUT), thus making them less IP (really, IP Filter) centric. Maintain compatibility with IP Filter by adding wrapper functions for IP Filter.
make IFA_STATS really work on IPv6.
add missing \n on log(). sync with kame
pullup (approved by releng-1-5) > implement net.inet6.ip6.{anon,low}port{min,max} sysctl variable. > cvs rdiff -r1.67 -r1.68 basesrc/lib/libc/gen/sysctl.3 > cvs rdiff -r1.53 -r1.54 basesrc/sbin/sysctl/sysctl.8 > cvs rdiff -r1.18 -r1.19 syssrc/sys/netinet6/in6.h > cvs rdiff -r1.29 -r1.30 syssrc/sys/netinet6/in6_pcb.c > cvs rdiff -r1.3 -r1.4 syssrc/sys/netinet6/in6_src.c > cvs rdiff -r1.25 -r1.26 syssrc/sys/netinet6/ip6_input.c > cvs rdiff -r1.14 -r1.15 syssrc/sys/netinet6/ip6_var.h
implement net.inet6.ip6.{anon,low}port{min,max} sysctl variable.
- do not use bitfield for router renumbering header. - add protection mechanism against ND cache corruption due to bad NUD hints. - more stats - icmp6 pps limitation. TODO: should implement ppsratecheck(9).
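The entry leaves ppsratecheck(9) as a TODO. Below is a simplified, hypothetical packets-per-second limiter of the same flavor, not the ppsratecheck(9) implementation. It takes the current time in whole seconds as an argument so it can be tested deterministically, and treating a negative `maxpps` as "unlimited" is an assumption of this sketch.

```c
#include <stdbool.h>

/* Hypothetical limiter state: one fixed one-second window. */
struct pps_state {
	long window_start;	/* second in which the current window began */
	int count;		/* events admitted in the current window */
};

/* Returns true if another event (e.g. an ICMPv6 error) may be sent. */
static bool
pps_allow(struct pps_state *st, long now_sec, int maxpps)
{
	if (now_sec != st->window_start) {
		st->window_start = now_sec;	/* new second: reset */
		st->count = 0;
	}
	if (maxpps < 0)
		return true;			/* no limit configured */
	if (st->count >= maxpps)
		return false;			/* budget for this second used */
	st->count++;
	return true;
}
```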
Pull up rev. 1.24: drop packet to tentative/duplicated interface address earlier. sync w/kame
drop packet to tentative/duplicated interface address earlier. sync w/kame
<vm/vm.h> -> <uvm/uvm_extern.h>
Sync w/ netbsd-1-5-base.
do not use cached route if the route becomes !RTF_UP. make the validation for jumbo payload option more strict.
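The entry does not list the new checks. As an illustration, the sketch below applies the rules RFC 2675 gives for the Jumbo Payload option: option type 0xC2 with 4 bytes of option data, an IPv6 Payload Length of 0, a jumbo length greater than 65,535, and no Fragment header. The function name and argument layout are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * 'opt' points at the option: type (1), opt data len (1), and a 4-byte
 * Jumbo Payload Length in network order.
 */
static bool
jumbo_payload_valid(uint16_t ip6_plen, const uint8_t opt[6], bool has_frag)
{
	uint32_t jlen;

	if (opt[0] != 0xc2 || opt[1] != 4)
		return false;		/* not a well-formed jumbo option */
	jlen = ((uint32_t)opt[2] << 24) | ((uint32_t)opt[3] << 16) |
	    ((uint32_t)opt[4] << 8) | opt[5];
	if (ip6_plen != 0)
		return false;		/* header Payload Length must be 0 */
	if (jlen <= 65535)
		return false;		/* small packets must not use it */
	if (has_frag)
		return false;		/* jumbograms cannot be fragmented */
	return true;
}
```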
correct manipulation of link-local scoped address on loopback. now "telnet fe80::1%lo0" should work again. (we have another bug near here - will attack it soon)
revisit in6_ifattach(). - be persistent on initializing interfaces, even if there's manually-assigned linklocal, multicast/whatever initialization is necessary. - do not cache mac addr in the kernel. grab mac addr from existing cards (this is important when you swap ethernet cards back and forth) now ppp6 works just fine! call in6_ifattach() on ATM PVC interface to assign link-local, using hardware MAC address as seed. (the change is in sync with kame tree).
New callout mechanism with two major improvements over the old timeout()/untimeout() API: - Clients supply callout handle storage, thus eliminating problems of resource allocation. - Insertion and removal of callouts is constant time, important as this facility is used quite a lot in the kernel. The old timeout()/untimeout() API has been removed from the kernel.
cleanup AH/policy processing. - parse IPv6 header by using common function, ip6_{last,next}hdr. - fix behavior in multiple AH cases. make strict boundary checks on mbuf chasing. (sync with latest kame)
#if 0'ed too strong sanity check against packets with v4 compatible addresses. we may want to re-enable it whenever mech-xx clarifies router behavior against native IPv6 packet with IPv4 compatible addresses.
pass "struct pfil_head *" to pfil_add_hook and pfil_remove hook rather than "struct protosw *".
Change the use of pfil hooks. There is no longer a single list of all pfil information, instead, struct protosw now contains a structure which contains list heads, etc. The per-protosw pfil struct is passed to pfil_hook_get(), along with an in/out flag to get the head of the relevant filter list. This has been done for only IPv4 and IPv6, at present, with these patches only enabling filtering for IPPROTO_IP and IPPROTO_IPV6, although it is possible to have tcp/udp, etc, dedicated filters now also. The ipfilter code has been updated to only filter IPv4 packets - next major release of ipfilter is required for ipv6.
fix include pathname for better rfc2292 compliance.
be proactive about malicious packets on the wire. we fear that v4-mapped addresses could be used as a tool to hose security filters (like bypassing a "local host only" filter by using ::ffff:127.0.0.1).
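A minimal sketch of such a filter, using the standard `IN6_IS_ADDR_V4MAPPED` macro from `<netinet/in.h>`. The function names are hypothetical, and the real input path may apply the test differently.

```c
#include <stdbool.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>

/* A native IPv6 packet should never carry ::ffff:a.b.c.d addresses. */
static bool
has_v4mapped_addr(const struct in6_addr *src, const struct in6_addr *dst)
{
	return IN6_IS_ADDR_V4MAPPED(src) || IN6_IS_ADDR_V4MAPPED(dst);
}

/* Convenience wrapper: 1 = drop, 0 = pass, -1 = unparsable. */
static int
v4mapped_drop_str(const char *src, const char *dst)
{
	struct in6_addr s, d;

	if (inet_pton(AF_INET6, src, &s) != 1 ||
	    inet_pton(AF_INET6, dst, &d) != 1)
		return -1;
	return has_v4mapped_addr(&s, &d) ? 1 : 0;
}
```

Checking both addresses closes the filter-bypass trick from the log entry, where ::ffff:127.0.0.1 would otherwise reach code that treats it as IPv4 loopback.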
remove extra portability #ifdef (like #ifdef __FreeBSD__) in KAME IPv6/IPsec code, from netbsd-current repository. #ifdef'ed version is always available from ftp.kame.net. XXX please do not make too many diff-unfriendly changes, we'll need to take bunch of diffs on upgrade...
make IPV6_BINDV6ONLY setsockopt available. it controls behavior of AF_INET6 wildcard listening socket. heavily documented in ip6(4). net.inet6.ip6.bindv6only defines default value. default is 1. "options INET6_BINDV6ONLY" removes any code fragment that supports IPV6_BINDV6ONLY == 0 case (not defopt'ed as use of this is rare).
add missing net.inet6.ip6.rr_prune case.
Pull up to last week's -current.
sync IPv6 part with latest KAME tree. IPsec part is left unmodified due to massive changes in KAME side. - IPv6 output goes through nd6_output - faith can capture IPv4 packets as well - you can run IPv4-to-IPv6 translator using heavily modified DNS servers - per-interface statistics (required for IPv6 MIB) - interface autoconfig is revisited - udp input handling has a big change for mapped address support. - introduce in4_cksum() for non-overwriting checksumming - introduce m_pulldown() - neighbor discovery cleanups/improvements - netinet/in.h strictly conforms to RFC2553 (no extra defs visible to userland) - IFA_STATS is fixed a bit (not tested) - and more more more. TODO: - cleanup os-independency #ifdef - avoid rcvif dual use (for IPsec) to help ifdetach (sorry for jumbo commit, I can't separate this any more...)
bring in latest KAME (as of 19991130, KAME/NetBSD141) into kame branch just for reference purposes. This commit includes 1.4 -> 1.4.1 sync for kame branch. The branch does not compile at all (due to the lack of ALTQ and some other source code). Please do not try to modify the branch, this is just for reference purposes. synchronization to latest KAME will take place on HEAD branch soon.
sanity check against truncated extension headers.
remove invalid initialization of in6_iflladdr.
Update from trunk.
sync with recent KAME. - loosen ipsec restriction on packet direction. - revise icmp6 redirect handling on IsRouter bit. - tcp/udp notification processing (link-local address case) - cosmetic fixes (better code share across *BSD).
change unnecessary u_long/long into u_int32_t or something relevant. more fixes should follow.
defopt IPSEC and IPSEC_ESP (both into opt_ipsec.h).
KAME/NetBSD 1.4, SNAP kit 1999/07/05. NOTE: this branch is just for reference purposes (i.e. for taking cvs diff). do not touch anything on the branch. actual work must be done on HEAD branch.
RCS ID police.
Sync w/ -current.
file ip6_input.c was added on branch chs-ubc2 on 1999-07-01 23:48:28 +0000
IPv6 kernel code, based on KAME/NetBSD 1.4, SNAP kit 19990628. (Sorry for a big commit, I can't separate this into several pieces...) Pls check sys/netinet6/TODO and sys/netinet6/IMPLEMENTATION for details. - sys/kern: do not assume single mbuf, accept chained mbuf on passing data from userland to kernel (or other way round). - "midway" ATM card: ATM PVC pseudo device support, like those done in ALTQ package (ftp://ftp.csl.sony.co.jp/pub/kjc/). - sys/netinet/tcp*: IPv4/v6 dual stack tcp support. - sys/netinet/{ip6,icmp6}.h, sys/net/pfkeyv2.h: IETF documents assume those files to be there so we patch it up. - sys/netinet: IPsec additions are here and there. - sys/netinet6/*: most of IPv6 code sits here. - sys/netkey: IPsec key management code - dev/pci/pcidevs: regen In my understanding no code here is subject to export control so it should be safe.
KAME/NetBSD 1.4 SNAP kit, dated 19990628. NOTE: this branch (kame) is used just for reference. this may not compile due to multiple reasons.
file ip6_input.c was initially added on branch kame.