Keyword substitution: kv
Default branch: MAIN
wg(4): Avoid spurious kassert for harmless race in session retry. If we have already transitioned away from INIT_ACTIVE by the time the retry timer has fired, the handshake start time may have been zeroed, but that's harmless. So don't kassert about it until after we've verified we're still in INIT_ACTIVE state. PR kern/58859: KASSERT in wg_task_retry_handshake
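The ordering described above (check the state first, assert second) can be sketched in portable C. This is a minimal illustration only, not the driver's code: the type, field and function names are hypothetical, and plain assert() stands in for KASSERT.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified session state; not the driver's real layout. */
enum sketch_state { SK_UNKNOWN, SK_INIT_ACTIVE, SK_ESTABLISHED };

struct sketch_session {
	enum sketch_state state;
	uint32_t handshake_start;	/* may be zeroed once we leave SK_INIT_ACTIVE */
};

/*
 * Returns true if a handshake retry should be sent.  The assertion about
 * the start time runs only after the state check, so a timer that fires
 * late on an already-transitioned session simply returns.
 */
bool
sketch_retry_handshake(const struct sketch_session *s)
{
	if (s->state != SK_INIT_ACTIVE)
		return false;		/* harmless race: nothing to retry */
	assert(s->handshake_start != 0);
	return true;
}
```

With the check and the assertion in the opposite order, a session that had already left the initiating state with a zeroed start time would trip the assertion even though nothing is wrong.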
Pull up following revision(s) (requested by riastradh in ticket #934): sys/net/if_wg.c: revision 1.117 sys/net/if_wg.c: revision 1.118 sys/net/if_wg.c: revision 1.119 sys/net/if_wg.c: revision 1.80 sys/net/if_wg.c: revision 1.81 tests/net/if_wg/t_misc.sh: revision 1.13 sys/net/if_wg.c: revision 1.82 sys/net/if_wg.c: revision 1.130 tests/net/if_wg/t_misc.sh: revision 1.14 sys/net/if_wg.c: revision 1.83 sys/net/if_wg.c: revision 1.131 tests/net/if_wg/t_misc.sh: revision 1.15 sys/net/if_wg.c: revision 1.84 sys/net/if_wg.c: revision 1.132 tests/net/if_wg/t_misc.sh: revision 1.16 sys/net/if_wg.c: revision 1.85 sys/net/if_wg.c: revision 1.86 tests/net/if_wg/t_basic.sh: revision 1.5 sys/net/if_wg.c: revision 1.87 tests/net/if_wg/t_basic.sh: revision 1.6 sys/net/if_wg.c: revision 1.88 sys/net/if_wg.c: revision 1.89 sys/net/if_wg.c: revision 1.100 sys/net/if_wg.c: revision 1.101 sys/net/if_wg.c: revision 1.102 sys/net/if_wg.c: revision 1.103 sys/net/if_wg.c: revision 1.104 sys/net/if_wg.c: revision 1.105 sys/net/if_wg.c: revision 1.106 sys/net/if_wg.c: revision 1.107 sys/net/if_wg.c: revision 1.108 sys/net/if_wg.c: revision 1.109 sys/net/if_wg.c: revision 1.120 sys/net/if_wg.c: revision 1.121 sys/net/if_wg.c: revision 1.122 sys/net/if_wg.c: revision 1.123 sys/net/if_wg.c: revision 1.124 sys/net/if_wg.c: revision 1.75 sys/net/if_wg.c: revision 1.77 sys/net/if_wg.c: revision 1.125 sys/net/if_wg.c: revision 1.126 sys/net/if_wg.c: revision 1.79 sys/net/if_wg.c: revision 1.127 sys/net/if_wg.c: revision 1.128 sys/net/if_wg.c: revision 1.129 sys/net/if_wg.c: revision 1.90 sys/net/if_wg.c: revision 1.91 sys/net/if_wg.c: revision 1.92 sys/net/if_wg.c: revision 1.93 sys/net/if_wg.c: revision 1.94 sys/net/if_wg.c: revision 1.95 sys/net/if_wg.c: revision 1.96 sys/net/if_wg.c: revision 1.97 sys/net/if_wg.c: revision 1.98 sys/net/if_wg.c: revision 1.99 sys/net/if_wg.c: revision 1.110 sys/net/if_wg.c: revision 1.111 sys/net/if_wg.c: revision 1.112 sys/net/if_wg.c: revision 1.113 
sys/net/if_wg.c: revision 1.114 sys/net/if_wg.c: revision 1.115 sys/net/if_wg.c: revision 1.116

fix simple mis-matched function prototype and definitions. most of these are like, eg
    void foo(int[2]);
with either of these
    void foo(int*) { ... }
    void foo(int[]) { ... }
in some cases (such as stat or utimes* calls found in our header files), we now match standard definition from opengroup. found by GCC 12.

sys: Drop redundant NULL check before m_freem(9)
m_freem(9) safely has accepted NULL argument at least since 4.2BSD:
https://www.tuhs.org/cgi-bin/utree.pl?file=4.2BSD/usr/src/sys/sys/uipc_mbuf.c
Compile-tested on amd64/ALL. Suggested by knakahara@

Add a wg_debug variable to split between debug/trace/dump messages

Add more debugging in packet validation

If any of the WG_DEBUG_XXX symbols happens to be defined (say, from a stray rump Makefile...) then we now must have WG_DEBUG also defined, so if it wasn't, make it so.

While the previous change fixed the broken build, it wasn't the best way, as defining any of the WG_DEBUG_XXX symbols then effectively defined all of them - making them, as separate entities, pointless. So, rearrange the way things are done a little to avoid doing that.

Add packet dump debugging

fix size limit calculation in dump and NULL checks

use hexdump...

Fix 32 bit (32 bit size_t) WG_DEBUG builds - use %zu rather than %lu to print size_t values. There's a new WG_DEBUG_XXX ( XXX==PACKET ) to deal with now. That needs WG_DEBUG defined as well, if set.

Make the debug (WG_DEBUG) func gethexdump() always return a valid pointer, never NULL, so it doesn't need to be tested before being printed, which was being done sometimes, but not always.

Add more debugging from Taylor

wg(4): Allow modunload before any interface creation. The workqueue and pktq are both lazily created, for annoying module initialization order reasons, so they may not have been created by the time of modunload.
PR kern/58470

Limit the size of the packet, and print ... if it is bigger.
(from kre@) wg(4): Rework some details of internal session state machine. This way: - There is a clear transition between when a session is being set up, and when it is exposed to the data rx path (wg_handle_msg_data): atomic_store_release to set wgs->wgs_state to INIT_PASSIVE or ESTABLISHED. (The transition INIT_PASSIVE -> ESTABLISHED is immaterial to the data rx path, so that's just atomic_store_relaxed. Similarly the transition to DESTROYING.) - There is a clear transition between when a session is being set up, and when it is exposed to the data tx path (wg_output): atomic_store_release to set wgp->wgp_session_stable to it. - Every path that reinitializes a session must go through wg_destroy_session via wg_put_index_session first. This avoids races between session reuse and the data rx/tx paths. - Add a log message at the time of every state transition. Prompted by: PR kern/55729: net/if_wg/t_misc:wg_rekey test case fails PR kern/56252: wg(4) state machine has race conditions PR kern/58463: if_wg does not work when idle. wg(4): Fix logic to ensure session initiation is underway. Previously, wg_task_send_init_message would call wg_send_handshake_msg_init if either: (a) the stable session is UNKNOWN, meaning a session has not yet been established, either by us or by the peer (but it could be in progress); or (b) the stable session is not UNKNOWN but the unstable session is _not_ INIT_ACTIVE, meaning there is an established session and we are not currently initiating a new session. If wg_output (or wgintr) found no established session while there was already a session being initiated, we may only enter wg_task_send_init_message after the session is already established, and trigger spurious reinitiation. Instead, create a separate flag to indicate whether it is mandatory to rekey because limits have passed. Then create a session only if: (a) the stable session is not ESTABLISHED, or (b) the mandatory rekey flag is not set, and clear the mandatory rekey flag. 
While here, arrange to do rekey-after-time on tx, not on callout. If there's no data to tx, we shouldn't reinitiate a session -- we should stay quiet on the network. PR kern/55729: net/if_wg/t_misc:wg_rekey test case fails PR kern/56252: wg(4) state machine has race conditions PR kern/58463: if_wg does not work when idle. wg(4): Use callout_halt, not callout_stop. It's possible that callout_stop might work here, but let's simplify reasoning about it -- the timers in question only take the peer intr lock, so it's safe to wait for them while holding the peer lock in the handshake worker thread. We may have to undo the task bit but that will take a bit more analysis to determine. Prompted by (but probably won't fix anything in): PR kern/55729: net/if_wg/t_misc:wg_rekey test case fails PR kern/56252: wg(4) state machine has race conditions PR kern/58463: if_wg does not work when idle. wg(4): Omit needless pserialize_perform on transition to DESTROYING. A session can still be used when it is in the DESTROYING state, so there's no need to wait for users to drain here -- that's the whole point of a separate DESTROYING state. It is only the transition from DESTROYING back to UNKNOWN, after the session has been unpublished so no new users can begin, that requires waiting for all users to drain, and we already do that in wg_destroy_session. Prompted by (but won't fix anything in, because this is just a performance optimization): PR kern/55729: net/if_wg/t_misc:wg_rekey test case fails PR kern/56252: wg(4) state machine has race conditions PR kern/58463: if_wg does not work when idle. wg(4): Expand cookie secret to 32 bytes. This is only relevant for denial of service mitigation, so it's not that big a deal, and the spec doesn't say anything about the size, but let's make it the standard key size. PR kern/58479: experimental wg(4) uses 32-bit cookie secret, not 32-byte cookie secret wg(4): Mark wgp_pending volatile to reflect its usage. 
Prompted by (but won't fix any part of): PR kern/55729: net/if_wg/t_misc:wg_rekey test case fails PR kern/56252: wg(4) state machine has race conditions PR kern/58463: if_wg does not work when idle. wg(4): Fix session destruction. Schedule destruction as soon as the session is created, to ensure key erasure within 2*reject-after-time seconds. Previously, we would schedule destruction of the previous session 1 second after the next one has been established. Combined with a failure to update the state machine on keepalive packets, this led to temporary deadlock scenarios. To keep it simple, there's just one callout which runs every reject-after-time seconds and erases keys in sessions older than reject-after-time, so if a session is established the moment after it runs, the keys might not be erased until (2-eps)*reject-after-time seconds. PR kern/55729: net/if_wg/t_misc:wg_rekey test case fails PR kern/56252: wg(4) state machine has race conditions PR kern/58463: if_wg does not work when idle. wg(4): Reject rx on sessions older than reject-after-time sec. Prompted by (but won't fix anything in): PR kern/55729: net/if_wg/t_misc:wg_rekey test case fails PR kern/56252: wg(4) state machine has race conditions PR kern/58463: if_wg does not work when idle. wg(4): On rx of valid ciphertext, make sure to update state machine. Previously, we also required the plaintext to be a plausible-looking IP packet before updating the state machine. But keepalive packets are empty -- and if the peer initiated the session to rekey after last tx but had no more data to tx, it will send a keepalive to finish session initiation. If we didn't update the state machine in that case, we would stay in INIT_PASSIVE state unable to tx on the session, which would make things hang. So make sure to always update the state machine once we have accepted a packet as genuine, even if it's genuine garbage on the inside. 
PR kern/55729: net/if_wg/t_misc:wg_rekey test case fails PR kern/56252: wg(4) state machine has race conditions PR kern/58463: if_wg does not work when idle. wg(4): Make sure to update endpoint on keepalive packets too. Prompted by: PR kern/55729: net/if_wg/t_misc:wg_rekey test case fails PR kern/56252: wg(4) state machine has race conditions PR kern/58463: if_wg does not work when idle. tests/net/if_wg/t_misc: Tweak timeouts in wg_handshake_timeout. Most of the timers in wg(4) have only 1sec resolution, which might be rounded in either direction, so make sure there's a 2sec buffer on either side of the event we care about (the point at which wg(4) decides to stop retrying handshake). Won't fix any bugs, but might make the tests slightly less flaky. PR kern/55729: net/if_wg/t_misc:wg_rekey test case fails PR kern/56252: wg(4) state machine has race conditions tests/net/if_wg/t_misc: Elaborate in wg_rekey debug messages. Helpful for following the test log when things go wrong. PR kern/55729: net/if_wg/t_misc:wg_rekey test case fails PR kern/56252: wg(4) state machine has race conditions PR kern/58463: if_wg does not work when idle. wg(4): Tests should pass now. PR kern/55729: net/if_wg/t_misc:wg_rekey test case fails PR kern/56252: wg(4) state machine has race conditions PR kern/58463: if_wg does not work when idle. wg(4): Use 32-bit for times handled in rx/tx paths. The rx and tx paths require unlocked access to wgs_time_established (to decide whether it's time to rekey) and wgs_time_last_data_sent (to decide whether we need to reply to incoming data with a keepalive packet), so do it with atomic_load/store_*. On 32-bit platforms, we may not be able to do that on time_t. However, since sessions only last for a few minutes before reject-after-time kicks in and they are erased, 32 bits is plenty to record the durations that we need to record here, so this shouldn't introduce any new bugs even on hosts that exceed 136 years of uptime. 
Prompted by: PR kern/55729: net/if_wg/t_misc:wg_rekey test case fails PR kern/56252: wg(4) state machine has race conditions PR kern/58463: if_wg does not work when idle. wg(4): Make time_uptime32 work in netbsd<=10. This is the low 32 bits of time_uptime. Will simplify pullups to 10 for: PR kern/55729: net/if_wg/t_misc:wg_rekey test case fails PR kern/56252: wg(4) state machine has race conditions PR kern/58463: if_wg does not work when idle. wg(4): Fix quotation in comment. Prompted by: PR kern/55729: net/if_wg/t_misc:wg_rekey test case fails PR kern/56252: wg(4) state machine has race conditions PR kern/58463: if_wg does not work when idle. wg(4): Process all altq'd packets when deleting peer. Can't just drop them because we can only go through all packets on an interface at a time, for all peers -- so we'd either have to drop all peers' packets, or requeue the packets for other peers. Probably not worth the trouble, so let's just wait for all the packets currently queued up to go through first. This requires reordering teardown so that we wg_destroy_all_peers, and thus wg_purge_pending_packets, _before_ we wg_if_detach, because wg_if_detach -> if_detach destroys the lock that IFQ_DEQUEUE uses. PR kern/58477: experimental wg(4) ALTQ support is probably buggy wg(4): Tidy up error branches. No functional change intended, except to add some log messages in failure cases. Cleanup after: PR kern/55729: net/if_wg/t_misc:wg_rekey test case fails PR kern/56252: wg(4) state machine has race conditions PR kern/58463: if_wg does not work when idle. wg(4): Be more consistent about #ifdef INET/INET6. PR kern/58478: experimental wg(4) probably doesn't build with INET6-only wg(4): Parenthesize macro expansions properly. PR kern/58480: experimental wg(4) sliding window logic has oopsie wg(4): Delete temporary hacks to dump keys and packets. 
No longer useful for:
PR kern/55729: net/if_wg/t_misc:wg_rekey test case fails
PR kern/56252: wg(4) state machine has race conditions
PR kern/58463: if_wg does not work when idle.

wg(4): Explain why gethexdump/puthexdump is there, and tidy. This way I will not be tempted to replace it by in-line calls to libkern hexdump.
PR kern/55729: net/if_wg/t_misc:wg_rekey test case fails
PR kern/56252: wg(4) state machine has race conditions
PR kern/58463: if_wg does not work when idle.

wg(4): Put force_rekey state in the session, not the peer. That way, there is a time when one thread has exclusive access to the state, in wg_destroy_session under the peer lock, when we can clear the state without racing against the data tx path. This will work more reliably than the atomic_swap_uint I used before. Noted by kre@.
PR kern/55729: net/if_wg/t_misc:wg_rekey test case fails
PR kern/56252: wg(4) state machine has race conditions
PR kern/58463: if_wg does not work when idle.

wg(4): Sprinkle static on fixed-size array parameters. Let's make the static size declarations useful. No functional change intended.

wg(4): Queue pending packet in FIFO order, not LIFO order. Sometimes the session takes a few seconds to establish, for whatever reason. It is better if the pending packet, which we queue up to send as soon as we get the responder's handshake response, is the most recent packet, rather than the first packet. That way, we don't wind up with a weird multi-second-delayed ping, followed by a bunch of dropped packets, followed by normal ping timings, or wind up sending the first TCP SYN instead of the most recent, or what have you. Senders need to be prepared to retransmit anyway if packets are dropped.
PR kern/58508: experimental wg(4) queues LIFO, not FIFO, pending first handshake

wg(4): Sprinkle comments into wg_swap_sessions. No functional change intended.
Prompted by: PR kern/55729: net/if_wg/t_misc:wg_rekey test case fails PR kern/56252: wg(4) state machine has race conditions PR kern/58463: if_wg does not work when idle. wg(4): No need for atomic access to wgs_time_established in tx/rx. This is stable while the session is visible to the tx/rx paths -- it is initialized before the session is exposed to tx/rx, and doesn't change until the session is no longer used by any tx/rx path and has been recycled. When I sprinkled atomic access to wgs_time_established in if_wg.c rev. 1.104, it was a vestige of an uncommitted draft that did the transition from INIT_PASSIVE to ESTABLISHED in the tx path itself, in an attempt to enable prompter tx on the new session as soon as it is established. This turned out to be unnecessary, so I reverted most of it, but forgot that wgs_time_established no longer needed atomic treatment. We could go back to using time_t and time_uptime, now that there's no need to do atomic loads and stores on these quantities. But there's no point in 64-bit arithmetic when the time differences are all guaranteed bounded by a few minutes, so keeping it 32-bit is probably a slight performance improvement on 32-bit systems. (In contrast, wgs_time_last_data_sent is both written and read in the tx path, which may run in parallel on multiple CPUs, so it still requires the atomic treatment.) Tidying up for: PR kern/55729: net/if_wg/t_misc:wg_rekey test case fails PR kern/56252: wg(4) state machine has race conditions PR kern/58463: if_wg does not work when idle. wg(4): Fix memory ordering in detach. PR kern/58510: experimental wg(4) lacks memory ordering between wg_count_dec and module unload wg(4): Fix typo in comment recently added. Comment added in the service of: PR kern/55729: net/if_wg/t_misc:wg_rekey test case fails PR kern/56252: wg(4) state machine has race conditions PR kern/58463: if_wg does not work when idle. wg(4): Omit needless atomic_load. 
wgs_local_index is only ever written to while only one thread has access to it and it is not in the thmap -- before it is published in wg_get_session_index, and after it is unpublished in wg_destroy_session. So no need for atomic_load -- it is stable if we observe it in thmap_get result. (Of course this is only for an assertion, which if tripped obviously indicates a violation of our assumptions. But if that happens, well, in the worst case we'll see a weird assertion message claiming that the index is not equal to itself, from which we can conclude there must have been a concurrent update, which is good enough to help diagnose that problem without any atomic_load.)
Tidying some of the changes for:
PR kern/55729: net/if_wg/t_misc:wg_rekey test case fails
PR kern/56252: wg(4) state machine has race conditions
PR kern/58463: if_wg does not work when idle.

wg(4): Sprinkle comments on internal sliding window API.
Post-fix tidying for:
PR kern/58480: experimental wg(4) sliding window logic has oopsie

wg(4): Deduplicate session establishment actions. The actions to (a) record the last handshake time, (b) clear some handshake state, (c) transmit first data if queued, or (if initiator) keepalive, and (d) begin destroying the old session, were formerly duplicated between wg_handle_msg_resp (for when we're the initiator) and wg_task_establish_session (for when we're the responder). Instead, let's factor this out into wg_swap_session so there's only one copy of the logic. This requires moving wg_update_endpoint_if_necessary a little earlier in wg_handle_msg_resp -- which should be done anyway so that the endpoint is updated _before_ the session is published for the data tx path to use. Other than moving wg_update_endpoint_if_necessary a little earlier, no functional change intended.
Post-fix tidying for:
PR kern/55729: net/if_wg/t_misc:wg_rekey test case fails
PR kern/56252: wg(4) state machine has race conditions
PR kern/58463: if_wg does not work when idle.
wg(4): Read wgs_state atomically in wg_get_stable_session. As noted in the comment above, it may concurrently transition from ESTABLISHED to DESTROYING. Post-fix tidying for: PR kern/55729: net/if_wg/t_misc:wg_rekey test case fails PR kern/56252: wg(4) state machine has race conditions PR kern/58463: if_wg does not work when idle. wg(4): Force rekey on tx if session is older than reject-after-time. One more corner case for: PR kern/55729: net/if_wg/t_misc:wg_rekey test case fails PR kern/56252: wg(4) state machine has race conditions PR kern/58463: if_wg does not work when idle. wg(4): Add missing barriers around wgp_pending access. PR kern/58520: experimental wg(4) lacks barriers around access to packet pending initiation wg(4): Trigger session initiation in wgintr, not in wg_output. We have to look up the session in wgintr anyway, for wg_send_data_msg. By triggering session initiation in wgintr instead of wg_output, we can skip the stable session lookup and reference in wg_output -- simpler that way. Post-fix tidying for: PR kern/55729: net/if_wg/t_misc:wg_rekey test case fails PR kern/56252: wg(4) state machine has race conditions PR kern/58463: if_wg does not work when idle. wg(4): Queue packet for post-handshake retransmit if limits are hit. PR kern/58521: experimental wg(4) may drop packet after minutes of quiet wg(4): When a session is established, send first packet directly. Like we would do with the keepalive packet, if we had to send that instead -- no need to defer it to the pktq. Keep it simple. Post-fix tidying for: PR kern/55729: net/if_wg/t_misc:wg_rekey test case fails PR kern/56252: wg(4) state machine has race conditions PR kern/58463: if_wg does not work when idle. wg(4): Sprinkle volatile on variables requiring atomic access. No functional change intended, since the relevant access is always done with atomic_* when it might race with concurrent access -- and really this should be _Atomic or something. 
But for now our atomic_ops(9) API is still spelled with volatile, so we'll use that.
Post-fix tidying for:
PR kern/55729: net/if_wg/t_misc:wg_rekey test case fails
PR kern/56252: wg(4) state machine has race conditions
PR kern/58463: if_wg does not work when idle.

wg(4): Make a rule for who wins when both peers send INIT at once. The rule is that the peer with the numerically smaller public key hash, in little-endian, takes priority iff the low order bit of H(peer A pubkey) ^ H(peer B pubkey) ^ H(posix minutes as le64) is 0, and the peer with the lexicographically larger public key takes priority iff the low-order bit is 1.
Another case of:
PR kern/56252: wg(4) state machine has race conditions
PR kern/58463: if_wg does not work when idle.
This one is, as far as I can tell, simply a deadlock in the protocol of the whitepaper -- until both sides give up on the handshake and one of them (but not both) later decides to try sending data again. (But not related to our t_misc:wg_rekey test, as far as I can tell, and I haven't put enough thought into how to reliably trigger this race to write a new automatic test for it.)

wg(4): Add Internet Archive links for the versions cited. No functional change.

tests/net/if_wg/t_misc: Add some diagnostics.
PR kern/55729: net/if_wg/t_misc:wg_rekey test case fails

wg(4): Test truncated UDP input from the network. This triggers double-free in the IPv6 udp6_input path -- but, confusingly, not the IPv4 udp_input path, even though the overudp_cb interface ought to be the same:

    /* udp_input -- no further use of m if return is -1 */
    if ((n = udp4_realinput(&src, &dst, &m, iphlen)) == -1) {
        UDP_STATINC(UDP_STAT_HDROPS);
        return;
    }

    /* udp6_input -- m_freem if return is not 0 */
    if (udp6_realinput(AF_INET6, &src, &dst, &m, off) == 0) {
        ...
    }
    bad:
    m_freem(m);
    return IPPROTO_DONE;

The subroutines udp4_realinput and udp6_realinput pass through the return value of overudp_cb in essentially the same way:

    /* udp4_realinput */
    if (inp->inp_overudp_cb != NULL) {
        int ret;
        ret = inp->inp_overudp_cb(mp, off, inp->inp_socket,
            sintosa(src), inp->inp_overudp_arg);
        switch (ret) {
        case -1: /* Error, m was freed */
            rcvcnt = -1;
            goto bad;
        ...
    bad:
    return rcvcnt;

    /* udp6_realinput */
    if (inp->inp_overudp_cb != NULL) {
        int ret;
        ret = inp->inp_overudp_cb(mp, off, inp->inp_socket,
            sin6tosa(src), inp->inp_overudp_arg);
        switch (ret) {
        case -1: /* Error, m was freed */
            rcvcnt = -1;
            goto bad;
        ...
    bad:
    return rcvcnt;

PR kern/58688: userland panic of kernel via wg(4)

wg(4): Fix wg_overudp_cb drop paths to null out *mp as caller needs.
PR kern/58688: userland panic of kernel via wg(4)
wg(4): Fix wg_overudp_cb drop paths to null out *mp as caller needs. PR kern/58688: userland panic of kernel via wg(4)
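A userland sketch of the contract this fix relies on: when the callback drops the buffer it frees it and stores NULL through the pointer, so a caller that frees on its error path cannot free it twice. All names here are hypothetical stand-ins (sketch_buf for an mbuf, sketch_free for m_freem(9)); this is not the kernel code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for an mbuf; the real type is struct mbuf. */
struct sketch_buf { size_t len; };

/* Like m_freem(9) as described in this log: freeing NULL is a no-op. */
void
sketch_free(struct sketch_buf *b)
{
	free(b);
}

/*
 * Callback: on a drop (-1) free the buffer AND null out *mp, so the
 * caller's pointer no longer refers to freed memory.
 */
int
sketch_cb(struct sketch_buf **mp, size_t minlen)
{
	assert(mp != NULL && *mp != NULL);
	if ((*mp)->len < minlen) {	/* truncated input: drop it */
		sketch_free(*mp);
		*mp = NULL;
		return -1;
	}
	return 0;
}

/*
 * Caller that always releases its buffer before returning, as the
 * udp6_input excerpt does on its error path.  (This sketch also
 * consumes the buffer on success, to keep it short.)
 */
int
sketch_input(struct sketch_buf *m, size_t minlen)
{
	int ret = sketch_cb(&m, minlen);

	sketch_free(m);		/* safe: m is NULL if the callback dropped it */
	return ret;
}
```

If sketch_cb freed the buffer without storing NULL, the sketch_free call in sketch_input would be a double free on the drop path.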
wg(4): Add Internet Archive links for the versions cited. No functional change.
wg(4): Make a rule for who wins when both peers send INIT at once. The rule is that the peer with the numerically smaller public key hash, in little-endian, takes priority iff the low order bit of H(peer A pubkey) ^ H(peer B pubkey) ^ H(posix minutes as le64) is 0, and the peer with the lexicographically larger public key takes priority iff the low-order bit is 1. Another case of: PR kern/56252: wg(4) state machine has race conditions PR kern/58463: if_wg does not work when idle. This one is, as far as I can tell, simply a deadlock in the protocol of the whitepaper -- until both sides give up on the handshake and one of them (but not both) later decides to try sending data again. (But not related to our t_misc:wg_rekey test, as far as I can tell, and I haven't put enough thought into how to reliably trigger this race to write a new automatic test for it.)
wg(4): Sprinkle volatile on variables requiring atomic access. No functional change intended, since the relevant access is always done with atomic_* when it might race with concurrent access -- and really this should be _Atomic or something. But for now our atomic_ops(9) API is still spelled with volatile, so we'll use that. Post-fix tidying for: PR kern/55729: net/if_wg/t_misc:wg_rekey test case fails PR kern/56252: wg(4) state machine has race conditions PR kern/58463: if_wg does not work when idle.
wg(4): When a session is established, send first packet directly. Like we would do with the keepalive packet, if we had to send that instead -- no need to defer it to the pktq. Keep it simple. Post-fix tidying for: PR kern/55729: net/if_wg/t_misc:wg_rekey test case fails PR kern/56252: wg(4) state machine has race conditions PR kern/58463: if_wg does not work when idle.
wg(4): Queue packet for post-handshake retransmit if limits are hit. PR kern/58521: experimental wg(4) may drop packet after minutes of quiet
wg(4): Trigger session initiation in wgintr, not in wg_output. We have to look up the session in wgintr anyway, for wg_send_data_msg. By triggering session initiation in wgintr instead of wg_output, we can skip the stable session lookup and reference in wg_output -- simpler that way. Post-fix tidying for: PR kern/55729: net/if_wg/t_misc:wg_rekey test case fails PR kern/56252: wg(4) state machine has race conditions PR kern/58463: if_wg does not work when idle.
wg(4): Add missing barriers around wgp_pending access. PR kern/58520: experimental wg(4) lacks barriers around access to packet pending initiation
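For illustration, here is roughly what ordered access to a single pending-packet slot looks like when written with C11 atomics. This is an analogue under stated assumptions, not the driver's code: the kernel uses atomic_ops(9) and its own barrier primitives, and the struct and function names below are hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

struct sketch_pkt { int payload; };

/* Hypothetical single-slot holder for a packet awaiting the handshake. */
struct sketch_peer {
	_Atomic(struct sketch_pkt *) pending;
};

/*
 * Publish a packet.  The release half makes the packet's contents visible
 * before the pointer is; the acquire half covers the previous occupant,
 * which is returned so the caller can dispose of it.
 */
struct sketch_pkt *
sketch_put_pending(struct sketch_peer *p, struct sketch_pkt *pkt)
{
	assert(p != NULL);
	return atomic_exchange_explicit(&p->pending, pkt,
	    memory_order_acq_rel);
}

/* Take the packet, if any, leaving the slot empty. */
struct sketch_pkt *
sketch_take_pending(struct sketch_peer *p)
{
	assert(p != NULL);
	return atomic_exchange_explicit(&p->pending,
	    (struct sketch_pkt *)NULL, memory_order_acquire);
}
```

The point of the pairing is that whichever thread takes the pointer also observes everything the publishing thread wrote into the packet before storing it.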
wg(4): Force rekey on tx if session is older than reject-after-time. One more corner case for: PR kern/55729: net/if_wg/t_misc:wg_rekey test case fails PR kern/56252: wg(4) state machine has race conditions PR kern/58463: if_wg does not work when idle.
wg(4): Read wgs_state atomically in wg_get_stable_session. As noted in the comment above, it may concurrently transition from ESTABLISHED to DESTROYING. Post-fix tidying for: PR kern/55729: net/if_wg/t_misc:wg_rekey test case fails PR kern/56252: wg(4) state machine has race conditions PR kern/58463: if_wg does not work when idle.
wg(4): Deduplicate session establishment actions. The actions to (a) record the last handshake time, (b) clear some handshake state, (c) transmit first data if queued, or (if initiator) keepalive, and (d) begin destroying the old session, were formerly duplicated between wg_handle_msg_resp (for when we're the initiator) and wg_task_establish_session (for when we're the responder). Instead, let's factor this out into wg_swap_session so there's only one copy of the logic. This requires moving wg_update_endpoint_if_necessary a little earlier in wg_handle_msg_resp -- which should be done anyway so that the endpoint is updated _before_ the session is published for the data tx path to use. Other than moving wg_update_endpoint_if_necessary a little earlier, no functional change intended. Post-fix tidying for: PR kern/55729: net/if_wg/t_misc:wg_rekey test case fails PR kern/56252: wg(4) state machine has race conditions PR kern/58463: if_wg does not work when idle.
wg(4): Sprinkle comments on internal sliding window API. Post-fix tidying for: PR kern/58480: experimental wg(4) sliding window logic has oopsie
wg(4): Omit needless atomic_load. wgs_local_index is only ever written to while only one thread has access to it and it is not in the thmap -- before it is published in wg_get_session_index, and after it is unpublished in wg_destroy_session. So no need for atomic_load -- it is stable if we observe it in thmap_get result. (Of course this is only for an assertion, which if tripped obviously indicates a violation of our assumptions. But if that happens, well, in the worst case we'll see a weird assertion message claiming that the index is not equal to itself, from which we can conclude there must have been a concurrent update, which is good enough to help diagnose that problem without any atomic_load.) Tidying some of the changes for: PR kern/55729: net/if_wg/t_misc:wg_rekey test case fails PR kern/56252: wg(4) state machine has race conditions PR kern/58463: if_wg does not work when idle.
wg(4): Fix typo in comment recently added. Comment added in the service of: PR kern/55729: net/if_wg/t_misc:wg_rekey test case fails PR kern/56252: wg(4) state machine has race conditions PR kern/58463: if_wg does not work when idle.
wg(4): Fix memory ordering in detach. PR kern/58510: experimental wg(4) lacks memory ordering between wg_count_dec and module unload
wg(4): No need for atomic access to wgs_time_established in tx/rx. This is stable while the session is visible to the tx/rx paths -- it is initialized before the session is exposed to tx/rx, and doesn't change until the session is no longer used by any tx/rx path and has been recycled. When I sprinkled atomic access to wgs_time_established in if_wg.c rev. 1.104, it was a vestige of an uncommitted draft that did the transition from INIT_PASSIVE to ESTABLISHED in the tx path itself, in an attempt to enable prompter tx on the new session as soon as it is established. This turned out to be unnecessary, so I reverted most of it, but forgot that wgs_time_established no longer needed atomic treatment. We could go back to using time_t and time_uptime, now that there's no need to do atomic loads and stores on these quantities. But there's no point in 64-bit arithmetic when the time differences are all guaranteed bounded by a few minutes, so keeping it 32-bit is probably a slight performance improvement on 32-bit systems. (In contrast, wgs_time_last_data_sent is both written and read in the tx path, which may run in parallel on multiple CPUs, so it still requires the atomic treatment.) Tidying up for: PR kern/55729: net/if_wg/t_misc:wg_rekey test case fails PR kern/56252: wg(4) state machine has race conditions PR kern/58463: if_wg does not work when idle.
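The reasoning that 32 bits suffice for short durations can be checked with a small sketch. Unsigned subtraction is modulo 2^32, so an age computed from two 32-bit timestamps stays correct even if the counter wrapped in between, as long as the true age is far below 2^32 seconds. The names and the limit value are hypothetical, chosen for this example only.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical limit in seconds; the driver's actual value is not assumed. */
#define SKETCH_REJECT_AFTER_TIME 180u

/* Keep only the low 32 bits of a 64-bit uptime counter. */
uint32_t
sketch_uptime32(uint64_t uptime)
{
	return (uint32_t)uptime;
}

/*
 * Age of a session in seconds.  The subtraction wraps modulo 2^32, so
 * the result is right even when "now" has wrapped past zero.
 */
uint32_t
sketch_age(uint32_t now, uint32_t established)
{
	return now - established;
}

/* True once the session has reached the (hypothetical) age limit. */
bool
sketch_too_old(uint32_t now, uint32_t established)
{
	return sketch_age(now, established) >= SKETCH_REJECT_AFTER_TIME;
}
```

For example, a session established 5 seconds before the 32-bit counter wraps and inspected 5 seconds after has sketch_age(5, 0xFFFFFFFB) == 10, which is the true age.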
wg(4): Sprinkle comments into wg_swap_sessions. No functional change intended. Prompted by: PR kern/55729: net/if_wg/t_misc:wg_rekey test case fails PR kern/56252: wg(4) state machine has race conditions PR kern/58463: if_wg does not work when idle.
wg(4): Queue pending packet in FIFO order, not LIFO order. Sometimes the session takes a few seconds to establish, for whatever reason. It is better if the pending packet, which we queue up to send as soon as we get the responder's handshake response, is the most recent packet, rather than the first packet. That way, we don't wind up with a weird multi-second-delayed ping, followed by a bunch of dropped packets, followed by normal ping timings, or wind up sending the first TCP SYN instead of the most recent, or what have you. Senders need to be prepared to retransmit anyway if packets are dropped. PR kern/58508: experimental wg(4) queues LIFO, not FIFO, pending first handshake
wg(4): Sprinkle static on fixed-size array parameters. Let's make the static size declarations useful. No functional change intended.
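For readers unfamiliar with the construct, here is a minimal sketch of what `static` in an array parameter declares. It is not code from if_wg.c; `KEY_LEN` and `key_is_zero` are hypothetical names chosen for the example. Since C99, `static N` in a parameter declaration promises that the caller passes a pointer to at least N elements, which gives compilers a chance to warn about arrays that are too short.

```c
#include <stddef.h>
#include <stdint.h>

#define KEY_LEN 32	/* hypothetical key size, for illustration only */

/*
 * `static KEY_LEN' tells the compiler that callers must pass a
 * non-null pointer to at least KEY_LEN bytes.  Without `static',
 * the size in the brackets is ignored entirely.
 */
int
key_is_zero(const uint8_t key[static KEY_LEN])
{
	uint8_t acc = 0;

	/* OR all bytes together; the result is zero only if every byte is. */
	for (size_t i = 0; i < KEY_LEN; i++)
		acc |= key[i];
	return acc == 0;
}
```

Whether a too-short argument is actually diagnosed depends on the compiler and warning flags in use.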
wg(4): Put force_rekey state in the session, not the peer. That way, there is a time when one thread has exclusive access to the state, in wg_destroy_session under the peer lock, when we can clear the state without racing against the data tx path. This will work more reliably than the atomic_swap_uint I used before. Noted by kre@. PR kern/55729: net/if_wg/t_misc:wg_rekey test case fails PR kern/56252: wg(4) state machine has race conditions PR kern/58463: if_wg does not work when idle.
wg(4): Explain why gethexdump/puthexdump is there, and tidy. This way I will not be tempted to replace it by in-line calls to libkern hexdump. PR kern/55729: net/if_wg/t_misc:wg_rekey test case fails PR kern/56252: wg(4) state machine has race conditions PR kern/58463: if_wg does not work when idle.
wg(4): Delete temporary hacks to dump keys and packets. No longer useful for: PR kern/55729: net/if_wg/t_misc:wg_rekey test case fails PR kern/56252: wg(4) state machine has race conditions PR kern/58463: if_wg does not work when idle.
wg(4): Parenthesize macro expansions properly. PR kern/58480: experimental wg(4) sliding window logic has oopsie
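The kind of problem that missing parentheses cause can be sketched with two hypothetical macros (these are not the macros from if_wg.c, which this log does not quote). An unparenthesized expansion is spliced into the surrounding expression and picks up that expression's operator precedence.

```c
/* Hypothetical macros, for illustration only. */
#define BAD_DOUBLE(x)	x + x		/* no parentheses in the expansion */
#define GOOD_DOUBLE(x)	((x) + (x))	/* argument and result parenthesized */

/* Expands to 3 * n + n, i.e. 4n, not the intended 6n. */
int
bad_times_three(int n)
{
	return 3 * BAD_DOUBLE(n);
}

/* Expands to 3 * ((n) + (n)), i.e. 6n as intended. */
int
good_times_three(int n)
{
	return 3 * GOOD_DOUBLE(n);
}
```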
wg(4): Be more consistent about #ifdef INET/INET6. PR kern/58478: experimental wg(4) probably doesn't build with INET6-only
wg(4): Tidy up error branches. No functional change intended, except to add some log messages in failure cases. Cleanup after: PR kern/55729: net/if_wg/t_misc:wg_rekey test case fails PR kern/56252: wg(4) state machine has race conditions PR kern/58463: if_wg does not work when idle.
wg(4): Process all altq'd packets when deleting peer. Can't just drop them because we can only go through all packets on an interface at a time, for all peers -- so we'd either have to drop all peers' packets, or requeue the packets for other peers. Probably not worth the trouble, so let's just wait for all the packets currently queued up to go through first. This requires reordering teardown so that we wg_destroy_all_peers, and thus wg_purge_pending_packets, _before_ we wg_if_detach, because wg_if_detach -> if_detach destroys the lock that IFQ_DEQUEUE uses. PR kern/58477: experimental wg(4) ALTQ support is probably buggy
wg(4): Fix quotation in comment. Prompted by: PR kern/55729: net/if_wg/t_misc:wg_rekey test case fails PR kern/56252: wg(4) state machine has race conditions PR kern/58463: if_wg does not work when idle.
wg(4): Make time_uptime32 work in netbsd<=10. This is the low 32 bits of time_uptime. Will simplify pullups to 10 for: PR kern/55729: net/if_wg/t_misc:wg_rekey test case fails PR kern/56252: wg(4) state machine has race conditions PR kern/58463: if_wg does not work when idle.
wg(4): Use 32-bit for times handled in rx/tx paths. The rx and tx paths require unlocked access to wgs_time_established (to decide whether it's time to rekey) and wgs_time_last_data_sent (to decide whether we need to reply to incoming data with a keepalive packet), so do it with atomic_load/store_*. On 32-bit platforms, we may not be able to do that on time_t. However, since sessions only last for a few minutes before reject-after-time kicks in and they are erased, 32 bits is plenty to record the durations that we need to record here, so this shouldn't introduce any new bugs even on hosts that exceed 136 years of uptime. Prompted by: PR kern/55729: net/if_wg/t_misc:wg_rekey test case fails PR kern/56252: wg(4) state machine has race conditions PR kern/58463: if_wg does not work when idle.
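A small sketch of why truncated timestamps are enough for short durations, assuming the timestamps are the low 32 bits of an uptime counter in seconds. The function names are hypothetical and the atomic loads and stores the driver uses are omitted. Unsigned subtraction wraps modulo 2^32, so the difference is correct as long as the true elapsed time is below 2^32 seconds, even if the counter wrapped in between.

```c
#include <stdint.h>

/*
 * Elapsed seconds between two 32-bit uptime samples.  The subtraction
 * wraps modulo 2^32, so the result is right whenever the real elapsed
 * time is less than 2^32 seconds.
 */
uint32_t
elapsed32(uint32_t now, uint32_t then)
{
	return (uint32_t)(now - then);
}

/* Nonzero if at least `limit' seconds have passed since `established'. */
int
session_expired(uint32_t now, uint32_t established, uint32_t limit)
{
	return elapsed32(now, established) >= limit;
}
```

For example, a sample taken 5 seconds before the counter wraps and another taken 5 seconds after it still yield an elapsed time of 10.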
wg(4): Make sure to update endpoint on keepalive packets too. Prompted by: PR kern/55729: net/if_wg/t_misc:wg_rekey test case fails PR kern/56252: wg(4) state machine has race conditions PR kern/58463: if_wg does not work when idle.
wg(4): On rx of valid ciphertext, make sure to update state machine. Previously, we also required the plaintext to be a plausible-looking IP packet before updating the state machine. But keepalive packets are empty -- and if the peer initiated the session to rekey after last tx but had no more data to tx, it will send a keepalive to finish session initiation. If we didn't update the state machine in that case, we would stay in INIT_PASSIVE state unable to tx on the session, which would make things hang. So make sure to always update the state machine once we have accepted a packet as genuine, even if it's genuine garbage on the inside. PR kern/55729: net/if_wg/t_misc:wg_rekey test case fails PR kern/56252: wg(4) state machine has race conditions PR kern/58463: if_wg does not work when idle.
wg(4): Reject rx on sessions older than reject-after-time sec. Prompted by (but won't fix anything in): PR kern/55729: net/if_wg/t_misc:wg_rekey test case fails PR kern/56252: wg(4) state machine has race conditions PR kern/58463: if_wg does not work when idle.
wg(4): Fix session destruction. Schedule destruction as soon as the session is created, to ensure key erasure within 2*reject-after-time seconds. Previously, we would schedule destruction of the previous session 1 second after the next one has been established. Combined with a failure to update the state machine on keepalive packets, this led to temporary deadlock scenarios. To keep it simple, there's just one callout which runs every reject-after-time seconds and erases keys in sessions older than reject-after-time, so if a session is established the moment after it runs, the keys might not be erased until (2-eps)*reject-after-time seconds. PR kern/55729: net/if_wg/t_misc:wg_rekey test case fails PR kern/56252: wg(4) state machine has race conditions PR kern/58463: if_wg does not work when idle.
wg(4): Mark wgp_pending volatile to reflect its usage. Prompted by (but won't fix any part of): PR kern/55729: net/if_wg/t_misc:wg_rekey test case fails PR kern/56252: wg(4) state machine has race conditions PR kern/58463: if_wg does not work when idle.
wg(4): Expand cookie secret to 32 bytes. This is only relevant for denial of service mitigation, so it's not that big a deal, and the spec doesn't say anything about the size, but let's make it the standard key size. PR kern/58479: experimental wg(4) uses 32-bit cookie secret, not 32-byte cookie secret
wg(4): Omit needless pserialize_perform on transition to DESTROYING. A session can still be used when it is in the DESTROYING state, so there's no need to wait for users to drain here -- that's the whole point of a separate DESTROYING state. It is only the transition from DESTROYING back to UNKNOWN, after the session has been unpublished so no new users can begin, that requires waiting for all users to drain, and we already do that in wg_destroy_session. Prompted by (but won't fix anything in, because this is just a performance optimization): PR kern/55729: net/if_wg/t_misc:wg_rekey test case fails PR kern/56252: wg(4) state machine has race conditions PR kern/58463: if_wg does not work when idle.
wg(4): Use callout_halt, not callout_stop. It's possible that callout_stop might work here, but let's simplify reasoning about it -- the timers in question only take the peer intr lock, so it's safe to wait for them while holding the peer lock in the handshake worker thread. We may have to undo the task bit but that will take a bit more analysis to determine. Prompted by (but probably won't fix anything in): PR kern/55729: net/if_wg/t_misc:wg_rekey test case fails PR kern/56252: wg(4) state machine has race conditions PR kern/58463: if_wg does not work when idle.
wg(4): Fix logic to ensure session initiation is underway. Previously, wg_task_send_init_message would call wg_send_handshake_msg_init if either: (a) the stable session is UNKNOWN, meaning a session has not yet been established, either by us or by the peer (but it could be in progress); or (b) the stable session is not UNKNOWN but the unstable session is _not_ INIT_ACTIVE, meaning there is an established session and we are not currently initiating a new session. If wg_output (or wgintr) found no established session while there was already a session being initiated, we may only enter wg_task_send_init_message after the session is already established, and trigger spurious reinitiation. Instead, create a separate flag to indicate whether it is mandatory to rekey because limits have passed. Then create a session only if: (a) the stable session is not ESTABLISHED, or (b) the mandatory rekey flag is set, and clear the mandatory rekey flag. While here, arrange to do rekey-after-time on tx, not on callout. If there's no data to tx, we shouldn't reinitiate a session -- we should stay quiet on the network. PR kern/55729: net/if_wg/t_misc:wg_rekey test case fails PR kern/56252: wg(4) state machine has race conditions PR kern/58463: if_wg does not work when idle.
wg(4): Rework some details of internal session state machine. This way: - There is a clear transition between when a session is being set up, and when it is exposed to the data rx path (wg_handle_msg_data): atomic_store_release to set wgs->wgs_state to INIT_PASSIVE or ESTABLISHED. (The transition INIT_PASSIVE -> ESTABLISHED is immaterial to the data rx path, so that's just atomic_store_relaxed. Similarly the transition to DESTROYING.) - There is a clear transition between when a session is being set up, and when it is exposed to the data tx path (wg_output): atomic_store_release to set wgp->wgp_session_stable to it. - Every path that reinitializes a session must go through wg_destroy_session via wg_put_index_session first. This avoids races between session reuse and the data rx/tx paths. - Add a log message at the time of every state transition. Prompted by: PR kern/55729: net/if_wg/t_misc:wg_rekey test case fails PR kern/56252: wg(4) state machine has race conditions PR kern/58463: if_wg does not work when idle.
Limit the size of the packet, and print ... if it is bigger. (from kre@)
wg(4): Allow modunload before any interface creation. The workqueue and pktq are both lazily created, for annoying module initialization order reasons, so they may not have been created by the time of modunload. PR kern/58470
consistently use printf instead of aprint_debug and print the tkeys with the packet.
Add more debugging from Taylor
Make the debug (WG_DEBUG) func gethexdump() always return a valid pointer, never NULL, so it doesn't need to be tested before being printed, which was being done sometimes, but not always.
There's a new WG_DEBUG_XXX ( XXX==PACKET ) to deal with now. That needs WG_DEBUG defined as well, if set.
Fix 32 bit (32 bit size_t) WG_DEBUG builds - use %zu rather than %lu to print size_t values.
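A minimal illustration of the format-specifier point, with a hypothetical helper name. `%zu` is the C99 conversion for `size_t` and is correct whatever the width of `size_t`, whereas `%lu` only matches where `size_t` happens to be `unsigned long`.

```c
#include <stddef.h>
#include <stdio.h>

/*
 * Format a length into buf.  %zu always matches size_t; %lu would be
 * a type mismatch on platforms where size_t is not unsigned long.
 */
int
format_len(char *buf, size_t bufsz, size_t len)
{
	return snprintf(buf, bufsz, "len=%zu", len);
}
```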
use hexdump...
fix size limit calculation in dump and NULL checks
Add packet dump debugging
While the previous change fixed the broken build, it wasn't the best way, as defining any of the WG_DEBUG_XXX symbols then effectively defined all of them - making them pointless as separate entities. So, rearrange the way things are done a little to avoid doing that.
If any of the WG_DEBUG_XXX symbols happens to be defined (say, from a stray rump Makefile...) then we now must have WG_DEBUG also defined, so if it wasn't, make it so.
Add more debugging in packet validation
Add a wg_debug variable to split between debug/trace/dump messages
sys: Drop redundant NULL check before m_freem(9) m_freem(9) safely has accepted NULL argument at least since 4.2BSD: https://www.tuhs.org/cgi-bin/utree.pl?file=4.2BSD/usr/src/sys/sys/uipc_mbuf.c Compile-tested on amd64/ALL. Suggested by knakahara@
Pull up following revision(s) (requested by riastradh in ticket #628): sys/net/if_wg.c: revision 1.78 wg(4): Bind to CPU in wg_handle_packet. Required by use of psref there. Assert we're bound up front so we catch mistakes early, rather than later on if we get unlucky in preemption and scheduling. PR bin/58021
wg(4): Bind to CPU in wg_handle_packet. Required by use of psref there. Assert we're bound up front so we catch mistakes early, rather than later on if we get unlucky in preemption and scheduling. PR bin/58021
wg_output(): Use ifq_classify_packet(), and let that function check for ALTQ-enabled. Acquire KERNEL_LOCK before calling ALTQ_ENQUEUE(). XXX The ALTQ integration here is a mess.
Update for the new location of altq_flags (not in if_snd directly).
fix simple mis-matched function prototype and definitions. most of these are like, eg void foo(int[2]); with either of these void foo(int*) { ... } void foo(int[]) { ... } in some cases (such as stat or utimes* calls found in our header files), we now match standard definition from opengroup. found by GCC 12.
Pull up following revision(s) (requested by jakllsch in ticket #228): sys/net/if_wg.c: revision 1.76 Give scope and additional details to wg(4) diagnostic messages.
Give scope and additional details to wg(4) diagnostic messages.
s/termintaed/terminated/ in comment.
Pull up following revision(s) (requested by jakllsch in ticket #49): sys/secmodel/suser/secmodel_suser.c: revision 1.57 sys/sys/kauth.h: revision 1.89 sys/net/if_wg.c: revision 1.72 sys/net/if_wg.c: revision 1.73 sys/net/if_wg.c: revision 1.74 Check for authorization for SIOCSDRVSPEC and SIOCGDRVSPEC ioctls for wg(4). Addresses PR 57161. wg(4): Allow non-root to retrieve information other than the private key and the peer preshared key. Add kauth(9) enums for wg(4) and use them in suser secmodel. Refines fix for PR 57161. centralize the kauth ugliness.
centralize the kauth ugliness.
wg(4): Allow non-root to retrieve information other than the private key and the peer preshared key. Add kauth(9) enums for wg(4) and use them in suser secmodel. Refines fix for PR 57161.
Check for authorization for SIOCSDRVSPEC and SIOCGDRVSPEC ioctls for wg(4). Addresses PR 57161.
inpcb: rename functions to inpcb_* Inspired by rmind-smpnet patches.
Adjust pf, wg, dccp and sctp for struct inpcb integration
Prevent memory corruption from wg_send_handshake_msg_init() on LP64 machines with "MSIZE == 256", sparc64 for example. wg_send_handshake_msg_init() tries to put 148 bytes into a buffer of 144 bytes and overwrites 4 bytes following the mbuf. Check for "sizeof() > MHLEN" and use a cluster in this case. With help from Taylor R Campbell <riastradh@>
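The size check described above can be sketched in plain C as follows. This is a hypothetical stand-alone helper, not the mbuf code itself: `inline_cap` stands in for the space available inside the mbuf (MHLEN in the message), `cluster_cap` for the size of an external cluster, and all names are invented for the example.

```c
#include <stddef.h>

enum buf_kind { BUF_INLINE, BUF_CLUSTER, BUF_TOO_BIG };

/*
 * Decide where a message of `len' bytes can be stored: in the small
 * inline buffer if it fits, otherwise in a cluster, otherwise nowhere.
 * Writing without this check is what overruns the inline buffer.
 */
enum buf_kind
choose_buf(size_t len, size_t inline_cap, size_t cluster_cap)
{
	if (len <= inline_cap)
		return BUF_INLINE;
	if (len <= cluster_cap)
		return BUF_CLUSTER;
	return BUF_TOO_BIG;
}
```

With the figures from the message (148 bytes needed, 144 available inline), the helper picks the cluster as long as the cluster capacity passed in is at least 148.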
wg(4): Limit the size of ifdrv requests. Avoids potential integer overflow or kernel memory exhaustion. Reported by Thomas Leroy a while back.
sys: Use if_init wrapper function. Exception: Not in kern_pmf.c, for the kind of silly reason that it avoids having kern_pmf.c refer to symbols defined only in net; this avoids a pain in the rump.
sys: Use if_stop wrapper function. Exception: Not in kern_pmf.c, for the kind of silly reason that it avoids having kern_pmf.c refer to symbols defined only in net; this avoids a pain in the rump.
Some signedness, casts, and constant sizes. Add module dependencies.
Sync w/ HEAD.
if_attach and if_initialize cannot fail, don't test return value These were originally made failable back in 2017 when if_initialize allocated a softint in every interface for link state changes, so that it could fail gracefully instead of panicking: https://mail-index.NetBSD.org/source-changes/2017/10/23/msg089053.html However, this spawned many seldom- or never-tested error branches, which are risky to have around. And that softint in every interface has since been replaced by a single global workqueue, because link state changes require thread context but not low latency or high throughput: https://mail-index.NetBSD.org/source-changes/2020/02/06/msg113759.html So there is no longer any reason for if_initialize to fail. (The subroutine if_stats_init can't fail because percpu_alloc can't fail either.) There is a snag: the softint_establish in if_percpuq_create could fail, potentially leading to bad consequences later on trying to use the softint. This change doesn't introduce any new bugs because of the snag -- if_percpuq_attach was already broken. However, the snag can be better addressed without spawning error branches, either by using a single softint or making softints less scarce. (Separate commit will change the signatures of if_attach and if_initialize to return void, scheduled to ride whatever is the next convenient kernel bump.) Patch and testing on amd64 and evbmips64-eb by maya@; commit message soliloquy, and compile-testing on evbppc/i386/earmv7hf, by me.
Sync with HEAD.
Sprinkle __noinline to reduce gigantic stack frames in ALL kernels. In principle this might just push a real problem around, but this is unlikely to be a real problem because: 1. The large stack frames are really only in the setup state machine message handlers, which run at the top loop of a thread with a shallow stack anyway. 2. If these are inlined, gcc might create multiple nonoverlapping stack buffers, whereas if not inlined, the stack frames from consecutive or alternative procedure calls would overlap anyway. (I haven't investigated exactly what's going on leading to ~5 KB stack frames, but this shuts gcc up, at least, and the hypotheses sound plausible to me!)
Sync w/ HEAD.
wg: Sprinkle #ifdef INET6. Avoid unconditional use of ip6 structs. Fixes no-INET6 build. Based on patch from Brad Spencer: https://mail-index.NetBSD.org/current-users/2020/11/11/msg039883.html
wg: with no peers, the link status is DOWN, otherwise UP This mirrors the recent changes to gif(4) where the link is UP when a tunnel is set, otherwise DOWN.
wg: Add altq hooks. While here, remove the IFQ_CLASSIFY bottleneck (takes the ifq lock, so it would serialize all transmission to all peers on a single wg(4) interface). altq can be disabled at compile-time or at run-time; even if included at compile-time the run-time impact should be negligible if disabled.
wg: Fix detach logic. Not tested but this should be less of a rake to step on if anyone made an unloadable wg module.
wg: Use RUN_ONCE to defer workqueue_create until after configure. Should really fix workqueue(9) so workqueue_create can be done before CPUs have been detected in configure, but this will serve as a stop-gap measure.
wg: Add missing kpreempt_disable/enable around pktq_enqueue.
wg: Drop wgp_lock while waiting for endpoint psref to drain. - This is safe because wgp_endpoint_changing locks out any attempts to change the endpoint until the draining is complete. - This is necessary to avoid a deadlock where the handshake thread holds a psref and awaits mutex_enter(wgp->wgp_lock). XXX The same deadlock may occur in wg_destroy_session. Not clear that it's safe to just release wgp_lock there; may need to create a new session state, say WGS_STATE_DRAINING, while we wait for psref_target_destroy. But this needs a little more thought; a new state may not be necessary, and would be nice to avoid if not necessary.
wg: Use threadpool(9) and workqueue(9) for asynchronous tasks. - Using threadpool(9) job per interface to receive incoming handshake messages gives the same concurrency for active interfaces but doesn't waste kthreads for inactive ones. => Can't really do this with a global workqueue(9) because there's no bound on the amount of time wg_receive_packets() might run for; we really need separate threads or threadpool jobs in order to avoid having one interface starve all the others. - Using a global workqueue(9) for asynchronous peer tasks avoids creating unnecessary kthreads. => Each task does a more or less bounded amount of work, so it's OK to share a global workqueue -- there's no advantage to adding concurrency for what is almost certainly going to be CPU-bound asymmetric crypto. => This way we don't need a thread per peer or iteration over a list of all peers, so the task mechanism should no longer be a bottleneck to scaling to thousands of peers. XXX This doesn't distribute the load across CPUs -- it keeps it on the same CPU where the packet came in. Should consider doing something to balance the load -- maybe note if the current CPU is loaded, and if so, sort CPUs by queue length or some other measure of load and pick the least loaded one or something.
wg: Use a global pktqueue rather than a per-peer pcq. - Improves scalability -- won't hit limit on softints no matter how many peers there are. - Improves parallelism -- softint was kernel-locked to serialize access to the pcq. - Requires per-peer queue on handshake init to avoid dropping first packet. . Per-peer queue is currently a single packet -- should serve well enough for pings, dns queries, tcp connections, &c.
wg: Fix debug output now that the priority is mixed into it.
wg: Fix non-DIAGNOSTIC build.
wg: Avoid memory leak if socreate fails.
wg: Make it build with WG_DEBUG on 32-bit platforms.
wg: Simplify locking. Summary: Access to a stable established session is still allowed via psref; all other access to peer and session state is now serialized by struct wg_peer::wgp_lock, with no dancing around a per-session lock. This way, the handshake paths are locked, while the data transmission paths are pserialized. - Eliminate struct wg_session::wgs_lock. - Eliminate wg_get_unstable_session -- access to the unstable session is allowed only with struct wg_peer::wgp_lock held. - Push INIT_PASSIVE->ESTABLISHED transition down into a thread task. - Push rekey down into a thread task. - Allocate session indices only on transition from UNKNOWN and free them only on transition back to UNKNOWN. - Be a little more explicit about allowed state transitions, and reject some nonsensical ones. - Sprinkle assertions and comments. - Reduce atomic r/m/w swap operations that can just as well be store-release.
wg: M_NOWAIT -> M_DONTWAIT These happen to be aliases, but M_NOWAIT is part of the legacy malloc API whereas M_DONTWAIT is part of the mbuf API.
wg: wg_sockaddr audit. - Ensure all access to struct wg_peer::wgp_endpoint happens while holding a psref. - Simplify internalize/externalize logic and be more careful about verifying it before printing anything.
wg: On INIT, do DH and decrypt timestamp before locking session. This narrows the window when the session is unlocked. Really there should be no such window, but we'll finish getting rid of it later.
wg: Verify or send cookie challenge before looking up session. This step doesn't depend on the session, so let's avoid touching the session state until we've passed it.
wg: Verify mac1 as the first step on INIT and RESP messages. This avoids the expensive DH computation before the sender has proven knowledge of our public key.
wg: Omit needless variable.
wg: Switch to callout_stop for session destructor timer. Can't release the lock here, and can't sleep waiting for the callout while we hold it without risking deadlock. But not waiting is fine; after we transition out of WGS_STATE_UNKNOWN the timer has no effect.
wg: Fix indentation. No functional change.
wg: Just call callout_halt directly. No functional change, just makes it easier to read where callout_halt happens.
wg: Fix byte order on wire. Give this a chance to work on big-endian systems.
wg: mbuf m_freem audit. 1. wg_handle_msg_data frees m but the other wg_handle_msg_* just take a pointer to the mbuf content and not m itself, so free m in those cases. 2. Can't trivially prove that the pcq is empty by the time wg_destroy_peer runs pcq_destroy, so let's explicitly purge it just in case. 3. If wg_send_udp isn't doing udp_send or udp6_output, it still has to free m in the !INET6 error branch for IPv6 packets. 4. After rumpuser_wg_send_peer or rumpuser_wg_send_user, we still need to free the mbuf.
wg: Use thmap(9) for peer and session lookup. Make sure we also don't trip over our own shoelaces by choosing the same session index twice.
wg: XAEAD doesn't use a counter, so don't pass one.
wg: Count down wg_npeers in wg_destroy_all_peers too. Doesn't actually make a difference -- wg_destroy_all_peers is only used when we're destroying the wg instance altogether -- but let's not leave rakes to step on.
wg: Note lock order.
wg: Remove IFF_POINTOPOINT. Unclear why this was set; setting it seems to have required a kludge in netinet/in.c that broke ipsec tunnels. Clearing it makes wg work again after that kludge was reverted.
wg: Sort includes.
Summary: let wg interfaces carry multicast traffic Once a wg interface is up and running, it is useful to be able to run a routing protocol over it. Marking the interface multicast capable enables this. (One must also use the wgconfig --allowed-ips option to explicitly permit the group one needs, e.g. 224.0.0.5/32 for OSPF.)
wg: Assert MCLBYTES is enough for requested length in wg_get_mbuf.
wg: Make sure all paths into wg_handle_msg_data guarantee enough m_len. Earlier commit moved the m_pullup into wg_validate_msg_header, but wg_overudp_cb doesn't go through that.
wg: Drop invalid message types on the floor faster. Don't even let them reach the thread -- drop them in softint.
wg: KASSERT m_len before mtod. XXX We should really make mtod do this automagically, and use something else for mtod(m, void *).
wg: Use m_pullup to make message header contiguous before processing.
wg: Check mbuf chain length before m_copydata.
Clarify wg(4)'s relation to WireGuard, pending further discussion. Still planning to replace wgconfig(8) and wg-keygen(8) by one wg(8) tool compatible with wireguard-tools; update wg(4) for the minor changes from the 2018-06-30 spec to the 2020-06-01 spec; &c. This just clarifies the current state of affairs as it exists in the development tree for now. Mark the man page EXPERIMENTAL for extra clarity.
Initialize peers early on for error branch.
Use lock rather than 64-bit atomics for platforms without the latter.
Fix sysctl types. - CTLTYPE_QUAD, not CTLTYPE_LONG, for uint64_t - use unsigned rather than time_t -- these are all short durations - clamp timeouts to be safe for conversion to int ticks in callout Should fix 32-bit builds.
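The clamping step can be sketched as below, assuming the timeout is later multiplied by the tick rate and stored in an `int`. `clamp_timeout_secs` and its parameters are hypothetical names, and `hz` is assumed to be nonzero.

```c
#include <limits.h>

/*
 * Clamp a timeout in seconds so that secs * hz still fits in an int.
 * Anything larger is reduced to the largest safe value.
 * Assumes hz > 0.
 */
unsigned
clamp_timeout_secs(unsigned secs, unsigned hz)
{
	unsigned max_secs = (unsigned)INT_MAX / hz;

	return secs > max_secs ? max_secs : secs;
}
```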
Ifdef out fast path that relies on atomic 64-bit load/store. (Really this sliding window business could probably be done with 32-bit sequence numbers and careful detection of wraparound, but that's a little more effort to work out -- let's just unbreak the builds for now.)
Mark KASSERT-only variable as __diagused.
Avoid callout_halt under lock. - We could pass the lock in, except we hold another lock too. - We could halt before taking the other lock, but it's not safe to sleep after getting the session pointer before taking its lock. - We could halt before getting the session pointer, but then there's no point in doing it under the lock. So just halt a little earlier instead.
Sprinkle const.
Use container_of rather than casts via void *.
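As a rough illustration of the idiom, here is a simplified user-space stand-in for container_of built on `offsetof`. The macro and the `struct peer`/`struct timer` types are hypothetical, and the kernel's own macro may do additional type checking. Unlike a cast through `void *`, it keeps working when the embedded member is not the first field of the structure.

```c
#include <stddef.h>

/* Simplified stand-in: recover the enclosing struct from a member pointer. */
#define CONTAINER_OF(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

struct timer {
	int ticks;
};

struct peer {
	int id;
	struct timer t;		/* embedded member, not at offset zero */
};

/* Map a pointer to the embedded timer back to its peer. */
struct peer *
timer_to_peer(struct timer *t)
{
	return CONTAINER_OF(t, struct peer, t);
}
```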
Use be32enc, rather than possibly unaligned uint32_t cast and htonl.
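What a byte-wise encoder of this kind does can be sketched as follows. `put_be32` is a hypothetical user-space equivalent written for the example, not the kernel's be32enc itself. Storing one byte at a time needs no alignment and gives the same result on any host byte order, whereas casting the buffer to `uint32_t *` can fault or misbehave on strict-alignment machines.

```c
#include <stdint.h>

/*
 * Store v at buf in big-endian order, one byte at a time.
 * buf may have any alignment.
 */
void
put_be32(void *buf, uint32_t v)
{
	uint8_t *p = buf;

	p[0] = (uint8_t)(v >> 24);
	p[1] = (uint8_t)(v >> 16);
	p[2] = (uint8_t)(v >> 8);
	p[3] = (uint8_t)v;
}
```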
KNF
Use consttime_memequal, not memcmp, to compare secrets for equality.
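The difference from memcmp can be sketched with a hypothetical helper, `ct_equal`. memcmp may return at the first differing byte, so its running time can leak how long the matching prefix is; the sketch below always reads every byte and only combines the differences at the end. It illustrates the idea only: a real implementation also has to make sure the compiler does not optimize the loop into an early exit.

```c
#include <stddef.h>

/*
 * Return nonzero if the n bytes at a and b are equal, zero otherwise.
 * Every byte is examined regardless of where the first mismatch is.
 */
int
ct_equal(const void *a, const void *b, size_t n)
{
	const unsigned char *x = a, *y = b;
	unsigned char diff = 0;

	for (size_t i = 0; i < n; i++)
		diff |= x[i] ^ y[i];	/* accumulate any differing bits */
	return diff == 0;
}
```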
Take advantage of prop_dictionary_util(3).
Split up wg_process_peer_tasks into bite-size functions.
Fix race in wg_worker kthread destruction. Also allow the thread to migrate between CPUs -- just not while we're in the middle of processing and holding onto things with psrefs.
Update for proplib API changes.
Use SYSCTL_SETUP for net.wireguard subtree.
Fix in-kernel debug build.
Implement sliding window for wireguard replay detection.
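For orientation, a sliding-window replay check can be sketched as below. This is a generic 64-entry bitmap version written for illustration; the window size, the structure layout and all names are assumptions, and it is not the implementation committed to if_wg.c. The receiver remembers the highest counter accepted and a bitmap of recently seen counters below it, and rejects anything already seen or too old to track.

```c
#include <stdbool.h>
#include <stdint.h>

#define WINDOW_BITS 64	/* hypothetical window size */

struct replay_window {
	uint64_t last;		/* highest counter accepted so far */
	uint64_t bitmap;	/* bit i set => counter (last - i) was seen */
	bool any;		/* false until the first packet is accepted */
};

/* Return true and record ctr if it is fresh; false if replayed or too old. */
bool
replay_check_and_update(struct replay_window *w, uint64_t ctr)
{
	if (!w->any) {
		w->any = true;
		w->last = ctr;
		w->bitmap = 1;
		return true;
	}
	if (ctr > w->last) {
		/* Newer than anything seen: slide the window forward. */
		uint64_t shift = ctr - w->last;

		w->bitmap = shift >= WINDOW_BITS ? 0 : w->bitmap << shift;
		w->bitmap |= 1;
		w->last = ctr;
		return true;
	}

	uint64_t age = w->last - ctr;

	if (age >= WINDOW_BITS)
		return false;		/* fell off the back of the window */
	if (w->bitmap & ((uint64_t)1 << age))
		return false;		/* already seen: replay */
	w->bitmap |= (uint64_t)1 << age;
	return true;
}
```

Out-of-order packets inside the window are accepted once; a second copy of the same counter, or a counter more than WINDOW_BITS behind the newest, is rejected.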
Don't falsely assert cpu_softintr_p(). Will fail in the following stack trace: wg_worker (kthread) wg_receive_packets wg_handle_packet wg_handle_msg_data KASSERT(cpu_softintr_p()) Instead, use kpreempt_disable/enable around softint_schedule. XXX Not clear that softint is the right place to do this!
Convert wg(4) to if_stat.
Use cprng_strong, not cprng_fast, for ephemeral key.
[ozaki-r] Fix bugs found by maxv's audits
[ozaki-r] Add wg files