Annotation of src/sys/netinet6/IMPLEMENTATION, Revision 1.14

$NetBSD: IMPLEMENTATION,v 1.13 2000/06/10 08:21:11 itojun Exp $

# NOTE: this is from the original KAME distribution.
# Some portions of this document are not applicable to the code merged into
# NetBSD-current (for example, section 5).  Check sys/netinet6/TODO as well.

                       Implementation Note

                       KAME Project
                       http://www.kame.net/
                       KAME Date: 2000/06/12 09:29:16

1. IPv6

1.1 Conformance

The KAME kit conforms, or tries to conform, to the latest set of IPv6
specifications.  For future reference we list some of the relevant documents
below (NOTE: this is not a complete list; it is too hard to maintain).
For details please refer to the specific chapter in this document, the RFCs,
the manpages that come with KAME, or comments in the source code.

Conformance tests have been performed on past and latest KAME STABLE kits
by the TAHI project.  Results can be viewed at http://www.tahi.org/report/KAME/.
We also attended the Univ. of New Hampshire IOL tests (http://www.iol.unh.edu/)
in the past, with our past snapshots.

RFC1639: FTP Operation Over Big Address Records (FOOBAR)
    * RFC2428 is preferred over RFC1639.  ftp clients will first try RFC2428,
      then fall back to RFC1639 if that fails.
RFC1886: DNS Extensions to support IPv6
RFC1933: Transition Mechanisms for IPv6 Hosts and Routers
    * IPv4 compatible addresses are not supported.
    * Automatic tunneling (4.3) is not supported.
    * The "gif" interface implements IPv[46]-over-IPv[46] tunnels in a generic
      way, and it covers the "configured tunnel" described in the spec.
      See 1.5 in this document for details.
RFC1981: Path MTU Discovery for IPv6
RFC2080: RIPng for IPv6
    * KAME-supplied route6d, bgpd and hroute6d support this.
RFC2283: Multiprotocol Extensions for BGP-4
    * So-called "BGP4+".
    * KAME-supplied bgpd supports this.
RFC2292: Advanced Sockets API for IPv6
    * For supported library functions/kernel APIs, see sys/netinet6/ADVAPI.
RFC2362: Protocol Independent Multicast-Sparse Mode (PIM-SM)
    * RFC2362 defines packet formats for PIM-SM.  draft-ietf-pim-ipv6-01.txt
      is written based on this.
RFC2373: IPv6 Addressing Architecture
    * KAME supports the node required addresses, and conforms to the scope
      requirement.
RFC2374: An IPv6 Aggregatable Global Unicast Address Format
    * KAME supports 64-bit Interface IDs.
RFC2375: IPv6 Multicast Address Assignments
    * Userland applications use the well-known addresses assigned in the RFC.
RFC2428: FTP Extensions for IPv6 and NATs
    * RFC2428 is preferred over RFC1639.  ftp clients will first try RFC2428,
      then fall back to RFC1639 if that fails.
RFC2460: IPv6 specification
RFC2461: Neighbor discovery for IPv6
    * See 1.2 in this document for details.
RFC2462: IPv6 Stateless Address Autoconfiguration
    * See 1.4 in this document for details.
RFC2463: ICMPv6 for IPv6 specification
    * See 1.8 in this document for details.
RFC2464: Transmission of IPv6 Packets over Ethernet Networks
RFC2465: MIB for IPv6: Textual Conventions and General Group
    * Necessary statistics are gathered by the kernel.  Actual IPv6 MIB
      support is provided as a patchkit for ucd-snmp.
RFC2466: MIB for IPv6: ICMPv6 group
    * Necessary statistics are gathered by the kernel.  Actual IPv6 MIB
      support is provided as a patchkit for ucd-snmp.
RFC2467: Transmission of IPv6 Packets over FDDI Networks
RFC2472: IPv6 over PPP
RFC2492: IPv6 over ATM Networks
    * Only PVCs are supported.
RFC2497: Transmission of IPv6 Packets over ARCnet Networks
RFC2545: Use of BGP-4 Multiprotocol Extensions for IPv6 Inter-Domain Routing
RFC2553: Basic Socket Interface Extensions for IPv6
    * IPv4 mapped addresses (3.7) and the special behavior of the IPv6
      wildcard bind socket (3.8) are:
        - supported on KAME/FreeBSD3x,
        - supported on KAME/NetBSD,
        - supported on KAME/BSDI4,
        - not supported on KAME/FreeBSD228, KAME/OpenBSD and KAME/BSDI3.
      See 1.12 in this document for details.
RFC2675: IPv6 Jumbograms
    * See 1.7 in this document for details.
RFC2710: Multicast Listener Discovery for IPv6
RFC2711: IPv6 router alert option
RFC2732: Format for Literal IPv6 Addresses in URL's
    * The spec is implemented in programs that handle URLs
      (like FreeBSD ftpio(3) and fetch(1), or NetBSD ftp(1)).
draft-ietf-ipngwg-router-renum-10: Router renumbering for IPv6
draft-ietf-ipngwg-icmp-name-lookups-05: IPv6 Name Lookups Through ICMP
draft-ietf-pim-ipv6-03.txt: PIM for IPv6
    * pim6dd implements dense mode.  pim6sd implements sparse mode.
draft-ietf-dhc-dhcpv6-15.txt: DHCPv6
draft-ietf-dhc-dhcpv6exts-12.txt: Extensions for DHCPv6
    * kame/dhcp6 has a test implementation, which is not built by
      default.
draft-itojun-ipv6-tcp-to-anycast-00.txt:
       Disconnecting TCP connection toward IPv6 anycast address
draft-ietf-ipngwg-scopedaddr-format-01.txt:
       An Extension of Format for IPv6 Scoped Addresses
draft-ietf-ngtrans-tcpudp-relay-01.txt:
       An IPv6-to-IPv4 transport relay translator
    * The FAITH TCP relay translator (faithd) implements this.  See 3.1 for
      more details.
draft-ietf-ngtrans-6to4-06.txt:
       Connection of IPv6 Domains via IPv4 Clouds without Explicit Tunnels
    * The "stf" interface implements it.  Be sure to read the next item before
      configuring it; there are security issues.
http://playground.iijlab.net/i-d/draft-itojun-ipv6-transition-abuse-00.txt:
       Possible abuse against IPv6 transition technologies
    * KAME does not implement the RFC1933 automatic tunnel.
    * The "stf" interface implements some address filters.  Refer to stf(4)
      for details.  Since there is no way to make a 6to4 interface 100%
      secure, we do not include the "stf" interface in the GENERIC.v6
      kernel configuration.
    * kame/openbsd completely disables IPv4 mapped address support.
    * kame/netbsd turns IPv4 mapped address support off by default.
    * See sections 12.6 and 14 for more details.
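
To make the RFC2428 entries above concrete, here is a minimal sketch (ours,
not code taken from the KAME ftp client) of how a client might format the
EPRT command that RFC2428 defines in place of PORT and RFC1639's LPRT.  The
helper name format_eprt is hypothetical; the command layout
"EPRT |net-prt|net-addr|tcp-port|", with 1 for IPv4 and 2 for IPv6, is from
RFC2428.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>

/*
 * Hypothetical helper: format an RFC2428 EPRT command,
 *     EPRT |<net-prt>|<net-addr>|<tcp-port>|
 * where net-prt is 1 for IPv4 and 2 for IPv6.  'addr' points to a
 * struct in_addr or struct in6_addr matching 'af'.
 * Returns the length produced (as snprintf does), or -1 on error.
 */
static int format_eprt(char *buf, size_t buflen, int af, const void *addr,
                       unsigned int port)
{
    char host[INET6_ADDRSTRLEN];
    int net_prt;

    assert(buf != NULL && addr != NULL);
    if (af == AF_INET)
        net_prt = 1;
    else if (af == AF_INET6)
        net_prt = 2;
    else
        return -1;
    if (port > 65535)
        return -1;                      /* not a valid TCP port */
    if (inet_ntop(af, addr, host, sizeof host) == NULL)
        return -1;
    /* FTP commands are terminated by CRLF. */
    return snprintf(buf, buflen, "EPRT |%d|%s|%u|\r\n", net_prt, host, port);
}
```

RFC2428 allows other delimiter characters when "|" would clash with the
address encoding; this sketch always uses "|".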

1.2 Neighbor Discovery

Neighbor Discovery is fairly stable.  Currently Address Resolution,
Duplicate Address Detection, and Neighbor Unreachability Detection
are supported.  In the near future we will add a command for transmitting
unsolicited Neighbor Advertisements, as an administration tool.

Duplicate Address Detection (DAD) will be performed when an IPv6 address
is assigned to a network interface, or when the network interface is enabled
(ifconfig up).  It is documented in RFC2462 5.4.
If DAD fails, the address will be marked "duplicated" and a message will be
logged to syslog (and usually to the console).  The "duplicated" mark
can be checked with ifconfig.  It is the administrator's responsibility to
check for and recover from DAD failures.  We may try to improve failure
recovery in future KAME code.
The DAD procedure may not be effective on certain network interfaces/drivers.
If a network driver needs a long initialization time (this is common with
wireless network interfaces) and the driver mistakenly raises IFF_RUNNING
before it becomes ready, the DAD code will try to transmit DAD probes to a
not-really-ready network driver and the packets will not go out from the
interface.  In such cases, the network drivers should be corrected.

Some network drivers loop multicast packets back to themselves,
even if instructed not to do so (especially in promiscuous mode).
In such cases DAD may fail, because the DAD engine sees an inbound NS packet
(actually sent by the node itself) and considers it a sign of duplication.
You may want to look at the #if condition marked "heuristics" in
sys/netinet6/nd6_nbr.c:nd6_dad_timer() as a workaround (note that the code
fragment in the "heuristics" section is not spec conformant).

The Neighbor Discovery specification (RFC2461) does not talk about neighbor
cache handling in the following cases:
(1) when there is no neighbor cache entry and the node receives an unsolicited
    RS/NS/NA/redirect packet without a link-layer address
(2) neighbor cache handling on a medium without link-layer addresses
    (we need a neighbor cache entry for the IsRouter bit)
For (1), we implemented a workaround based on discussions on the IETF ipngwg
mailing list.  For more details, see the comments in the source code and the
email thread started by (IPng 7155), dated Feb 6 1999.

The IPv6 on-link determination rule (RFC2461) is quite different from the
assumptions in the BSD IPv4 network code.  To implement the behavior in
RFC2461 section 5.2 (when the default router list is empty), the kernel needs
to know the default outgoing interface.  To configure the default outgoing
interface, use commands like "ndp -I de0" as root.  Note that the spec misuses
the words "host" and "node" in several places in the section.

To avoid possible DoS attacks and infinite loops, the KAME stack accepts
only 10 options per ND packet.  Therefore, if you have 20 prefix options
attached to an RA, only the first 10 prefixes will be recognized.
If this troubles you, please contact the KAME team and/or modify
nd6_maxndopt in sys/netinet6/nd6.c.  If there is high demand we may
provide a sysctl knob for the variable.
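
The option limit can be pictured with the following sketch of an ND option
walker.  It is not the KAME nd6_options() code: walk_nd_options and its
callback type are hypothetical names, and the limit is a parameter standing in
for nd6_maxndopt (10 in the text).  The option layout (8-bit type, 8-bit
length in units of 8 octets) and the rule that an option of length zero
invalidates the whole ND packet come from RFC2461 4.6.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Called once per accepted option; 'olen' is the option length in octets. */
typedef void (*nd_opt_visit_fn)(uint8_t type, const uint8_t *opt, size_t olen);

/*
 * Hypothetical walker over the options that follow an ND message header.
 * Each option is: type (1 octet), length in units of 8 octets (1 octet),
 * then data.  At most 'maxopt' options are passed to 'visit'; the rest are
 * ignored, mirroring the nd6_maxndopt limit.
 * Returns the number of options accepted, or -1 if the option area is
 * malformed, in which case the whole ND packet should be dropped.
 */
static int walk_nd_options(const uint8_t *opt, size_t len, int maxopt,
                           nd_opt_visit_fn visit)
{
    int accepted = 0;

    assert(opt != NULL || len == 0);
    while (len > 0) {
        if (accepted >= maxopt)
            break;                      /* ignore the remaining options */
        if (len < 2)
            return -1;                  /* truncated option header */
        size_t olen = (size_t)opt[1] * 8;
        if (olen == 0 || olen > len)
            return -1;                  /* zero length, or overruns packet */
        if (visit != NULL)
            visit(opt[0], opt, olen);
        accepted++;
        opt += olen;
        len -= olen;
    }
    return accepted;
}
```

With 20 Prefix Information options (type 3, length 4, i.e. 32 octets each)
and a limit of 10, the walker visits the first 10 and stops, which is the
behavior described above.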

Proxy Neighbor Advertisement support is implemented in the kernel.
You can configure it by using the following command:
        # ndp -s fe80:1::1234 0:1:2:3:4:5 proxy
You need to embed the scope index into the address; see 1.3.3.
There are certain limitations, though:
- It does not send unsolicited multicast NA on configuration.  This is a MAY
  behavior in RFC2461.
- It does not add a random delay before transmission of solicited NA.  This
  is a SHOULD behavior in RFC2461.
- We cannot configure proxy NDP for an off-link address.  The target address
  for proxying must be a link-local address, or must be within a prefix
  configured on the node that performs proxy NDP.
- RFC2461 is unclear about whether it is legal for a host to perform proxy
  ND.  We do not prohibit hosts from doing proxy ND, but it will be of very
  limited use.

Starting in mid-March 2000, we support Neighbor Unreachability Detection (NUD)
on p2p interfaces, including tunnel interfaces (gif).  NUD is turned on by
default.  Before March 2000 the KAME stack did not perform NUD on p2p
interfaces.  If the change raises any interoperability issues, you can turn
NUD on or off per interface.  Use "ndp -i interface -nud" to turn it off.
Consult ndp(8) for details.

1.3 Scope Index

IPv6 uses scoped addresses.  It is therefore very important to
specify the scope index (interface index for a link-local address, or
site index for a site-local address) with an IPv6 address.  Without a
scope index, a scoped IPv6 address is ambiguous to the kernel, and
the kernel will not be able to determine the outbound interface for a
packet.  KAME code tries to address the issue in several ways.

Site-local addresses are very vaguely defined in the specs, and both the
specification and KAME code need tons of improvements to enable their actual
use.  For example, it is still very unclear how we define a site, or how we
resolve hostnames in a site.  There is work underway to define the behavior
of routers at site borders; however, we have almost no code for site boundary
node support (neither forwarding nor routing) and we bet almost no one has.
For now we recommend that you use global addresses for experiments;
there are way too many pitfalls if you use site-local addresses.

1.3.1 Kernel internal

In the kernel, the interface index for a link-local scope address is
embedded into the 2nd 16-bit word (the 3rd and 4th bytes) in the IPv6
address.
For example, you may see something like:
        fe80:1::200:f8ff:fe01:6317
in the routing table and interface address structure (struct
in6_ifaddr).  The address above is a link-local unicast address
which belongs to a network interface whose interface index is 1.
The embedded index enables us to identify IPv6 link-local
addresses over multiple interfaces effectively and with only
small code changes.
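
A minimal userland sketch of the embedding described above, assuming only
the byte layout given in the text (scope index in bytes 2 and 3, network byte
order).  The helper names are hypothetical; the KAME kernel uses its own
internal code for this, which is not reproduced here.  Link-local multicast is
treated as embeddable too, matching the ff02:1::1 form mentioned in 1.3.3.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <netinet/in.h>
#include <stdint.h>
#include <string.h>

/* Link-local unicast (fe80::/10) and link-local multicast (ff02::/16)
 * are the scopes whose index appears embedded in the text's examples. */
static int scope_is_embeddable(const struct in6_addr *a)
{
    return IN6_IS_ADDR_LINKLOCAL(a) || IN6_IS_ADDR_MC_LINKLOCAL(a);
}

/* Store the interface index in the 2nd 16-bit word, network byte order. */
static void embed_scope(struct in6_addr *a, uint16_t ifindex)
{
    assert(scope_is_embeddable(a));
    a->s6_addr[2] = (uint8_t)(ifindex >> 8);
    a->s6_addr[3] = (uint8_t)(ifindex & 0xff);
}

/* Read the embedded index back (0 means no index is embedded). */
static uint16_t extract_scope(const struct in6_addr *a)
{
    return (uint16_t)((a->s6_addr[2] << 8) | a->s6_addr[3]);
}

/* Clear the index so the address is in standard form again, as it must
 * be before it is shown to users or handed to portable APIs. */
static void clear_scope(struct in6_addr *a)
{
    a->s6_addr[2] = 0;
    a->s6_addr[3] = 0;
}
```

Embedding index 1 into fe80::200:f8ff:fe01:6317 produces the
fe80:1::200:f8ff:fe01:6317 form shown above; clearing it restores the
standard form.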

1.3.2 Interaction with API

Ordinary userland applications should use the advanced API (RFC2292)
to specify the scope index, or interface index.  For a similar purpose,
the sin6_scope_id member in the sockaddr_in6 structure is defined in
RFC2553.  However, the semantics of sin6_scope_id are rather vague.
If you care about the portability of your application, we suggest that you
use the advanced API rather than sin6_scope_id.

Routing daemons and configuration programs, like route6d and
ifconfig, will need to manipulate the "embedded" scope index.
These programs use routing sockets and ioctls (like SIOCGIFADDR_IN6)
and the kernel API will return IPv6 addresses with the 2nd 16-bit word
filled in.  These APIs are for manipulating kernel internal structures.
Programs that use these APIs have to be prepared for differences
between kernels anyway.

getaddrinfo(3) and getnameinfo(3) are modified to support the extended numeric
IPv6 syntax, as documented in draft-ietf-ipngwg-scopedaddr-format-01.txt.
You can specify the outgoing link by using the name of the outgoing interface,
like "fe80::1%ne0".  This way you will be able to specify a link-local scoped
address without much trouble.
To use this extension in your program, you'll need to use getaddrinfo(3),
and getnameinfo(3) with NI_WITHSCOPEID.
The implementation currently assumes a 1-to-1 relationship between a link and
an interface, which is stronger than what the IPv6 specs say.
Other APIs like inet_pton(3) or getipnodebyname(3) are inherently unfriendly
to scoped addresses, since they are unable to annotate addresses with a
scope identifier.
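
The following sketch shows how an application might hand a scoped literal to
getaddrinfo(3) and read back the scope index stored in sin6_scope_id.  It
assumes a present-day resolver that accepts the "%" suffix; NI_WITHSCOPEID is
the KAME-era getnameinfo(3) flag named above and is not available on every
system, so only the parsing direction is covered.  parse_scoped_literal is a
hypothetical name.

```c
#define _POSIX_C_SOURCE 200112L         /* getaddrinfo(3) under strict C */
#include <assert.h>
#include <netdb.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>

/*
 * Hypothetical helper: parse a numeric IPv6 literal that may carry a
 * scope suffix ("fe80::1%ne0", "fe80::1%1") and copy the resulting
 * sockaddr_in6, including sin6_scope_id, into *out.
 * Returns 0 on success, -1 if the string is not a usable IPv6 literal.
 */
static int parse_scoped_literal(const char *literal, struct sockaddr_in6 *out)
{
    struct addrinfo hints;
    struct addrinfo *res = NULL;

    assert(literal != NULL && out != NULL);
    memset(&hints, 0, sizeof hints);
    hints.ai_family = AF_INET6;
    hints.ai_socktype = SOCK_DGRAM;     /* one result per address */
    hints.ai_flags = AI_NUMERICHOST;    /* literals only, never query DNS */

    if (getaddrinfo(literal, NULL, &hints, &res) != 0 || res == NULL)
        return -1;
    if (res->ai_addrlen > sizeof *out) {
        freeaddrinfo(res);
        return -1;
    }
    memset(out, 0, sizeof *out);
    memcpy(out, res->ai_addr, res->ai_addrlen);
    freeaddrinfo(res);
    return 0;
}
```

With an interface name such as "fe80::1%ne0", the index is looked up on the
resolving host, so the result depends on which interfaces exist there; many
resolvers also accept a numeric suffix like "fe80::1%1".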

1.3.3 Interaction with users (command line)

Some of the userland tools support the extended numeric IPv6 syntax, as
documented in draft-ietf-ipngwg-scopedaddr-format-01.txt.  In this case,
you can specify the outgoing link by using the name of the outgoing
interface, like "fe80::1%ne0".

When you specify a scoped address on the command line, NEVER write the
embedded form (such as ff02:1::1 or fe80:2::fedc).  This is not supposed
to work.  Always use the standard form, like ff02::1 or fe80::fedc, with a
command line option for specifying the interface (like "ping6 -I ne0 ff02::1").
In general, if a command does not have a command line option to specify the
outgoing interface, that command is not ready to accept scoped addresses.
This may seem to run counter to IPv6's premise of supporting the "dentist
office" situation.  We believe that the specifications need some improvements
for this.

The only exception to the above rule is when you configure the routing table
manually with route(8) or ndp(8).  The gateway portion of an IPv6 routing
entry must be a link-local address (otherwise ICMPv6 redirect will not work),
and in this case you'll need to configure it by putting the interface index
into the address:
        # route add -inet6 default fe80:2::9876:5432:1234:5678
        (when the interface index for the outgoing interface = 2)
To avoid configuration mistakes, we suggest running dynamic routing instead
(like route6d(8)).

1.4 Plug and Play

The KAME kit implements most of the IPv6 stateless address
autoconfiguration in the kernel.
Neighbor Discovery functions are implemented in the kernel as a whole.
Router Advertisement (RA) input for hosts is implemented in the
kernel.  Router Solicitation (RS) output for endhosts, RS input
for routers, and RA output for routers are implemented in the
userland.

1.4.1 Assignment of link-local and special addresses

An IPv6 link-local address is generated from the IEEE802 address (Ethernet
MAC address).  Each interface is assigned an IPv6 link-local address
automatically when the interface comes up (IFF_UP).  Also, a direct route
for the link-local address is added to the routing table.

Here is an output of the netstat command:

Internet6:
Destination                   Gateway                   Flags      Netif Expire
fe80::%ed0/64                 link#1                    UC           ed0
fe80::%ep0/64                 link#2                    UC           ep0
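
The link-local addresses above are formed by prefixing fe80::/64 to an
interface identifier derived from the 48-bit MAC address.  The sketch below
assumes the construction described in RFC2464 (insert ff:fe in the middle and
invert the universal/local bit), which is how the fe80::200:f8ff:fe01:6317
example in 1.3.1 relates to MAC address 00:00:f8:01:63:17.  The function name
is hypothetical.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <netinet/in.h>
#include <stdint.h>
#include <string.h>

/*
 * Build fe80::/64 plus an interface identifier derived from a 48-bit
 * IEEE802 (MAC) address: split the MAC in half, insert 0xff 0xfe, and
 * invert the universal/local bit (0x02) of the first octet.
 */
static void linklocal_from_mac(const uint8_t mac[6], struct in6_addr *out)
{
    assert(mac != NULL && out != NULL);
    memset(out, 0, sizeof *out);
    out->s6_addr[0] = 0xfe;             /* fe80::/64 prefix */
    out->s6_addr[1] = 0x80;
    out->s6_addr[8] = (uint8_t)(mac[0] ^ 0x02);
    out->s6_addr[9] = mac[1];
    out->s6_addr[10] = mac[2];
    out->s6_addr[11] = 0xff;
    out->s6_addr[12] = 0xfe;
    out->s6_addr[13] = mac[3];
    out->s6_addr[14] = mac[4];
    out->s6_addr[15] = mac[5];
}
```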

Interfaces that have no IEEE802 address (pseudo interfaces like tunnel
interfaces, or ppp interfaces) will borrow an IEEE802 address from other
interfaces, such as Ethernet interfaces, whenever possible.
If there is no IEEE802 hardware attached, a last-resort pseudorandom value,
derived from MD5(hostname), will be used as the source of the link-local
address.  If this is not suitable for your usage, you will need to configure
the link-local address manually.

If an interface is not capable of handling IPv6 (for example, due to lack of
multicast support), no link-local address will be assigned to that interface.
See section 2 for details.

Each interface joins the solicited-node multicast address and the
link-local all-nodes multicast address (e.g. ff02::1:ff01:6317
and ff02::1, respectively, on the link the interface is attached to).
In addition to a link-local address, the loopback address (::1) will be
assigned to the loopback interface.  Also, ::1/128 and ff01::/32 are
automatically added to the routing table, and the loopback interface joins
the node-local multicast group ff01::1.
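
The solicited-node group an interface joins is computed from its unicast
address: the prefix ff02::1:ff00:0/104 followed by the low-order 24 bits of
the address (RFC2373).  A minimal sketch, with a hypothetical function name:

```c
#include <arpa/inet.h>
#include <assert.h>
#include <netinet/in.h>
#include <string.h>

/*
 * Solicited-node multicast address for a unicast address:
 * ff02::1:ff00:0/104 followed by the low-order 24 bits of 'unicast'.
 * 'out' must not alias 'unicast'.
 */
static void solicited_node(const struct in6_addr *unicast, struct in6_addr *out)
{
    assert(unicast != NULL && out != NULL && unicast != out);
    memset(out, 0, sizeof *out);
    out->s6_addr[0] = 0xff;             /* ff02: link-local scope multicast */
    out->s6_addr[1] = 0x02;
    out->s6_addr[11] = 0x01;            /* ...:0001: */
    out->s6_addr[12] = 0xff;            /* ff, then the low 24 bits */
    memcpy(&out->s6_addr[13], &unicast->s6_addr[13], 3);
}
```

For fe80::200:f8ff:fe01:6317 this yields ff02::1:ff01:6317.  A global address
with the same interface identifier maps to the same group.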

1.4.2 Stateless address autoconfiguration on hosts

In the IPv6 specification, nodes are separated into two categories:
routers and hosts.  Routers forward packets addressed to others; hosts do
not forward such packets.  net.inet6.ip6.forwarding defines whether this
node is a router or a host (router if it is 1, host if it is 0).

It is NOT recommended to change net.inet6.ip6.forwarding while the node
is in operation.  The IPv6 specification defines behavior for "host" and
"router" quite differently, and switching from one to the other can cause
serious trouble.  It is recommended to configure the variable at bootstrap
time only.

The first step in stateless address configuration is Duplicate Address
Detection (DAD).  See 1.2 for more detail on DAD.

When a host hears a Router Advertisement from a router, the host may
autoconfigure itself by stateless address autoconfiguration.
This behavior can be controlled by net.inet6.ip6.accept_rtadv
(the host autoconfigures itself if it is set to 1).
By autoconfiguration, the network address prefix for the receiving interface
(usually a global address prefix) is added.  A default route is also
configured.
Routers periodically generate Router Advertisement packets.  To request
an adjacent router to generate an RA packet, a host can transmit a Router
Solicitation.  To generate an RS packet at any time, use the "rtsol" command.
The "rtsold" daemon is also available.  "rtsold" generates Router Solicitations
whenever necessary, and it works great for nomadic usage (notebooks/laptops).
If one wishes to ignore Router Advertisements, use sysctl to set
net.inet6.ip6.accept_rtadv to 0.

To generate Router Advertisements from a router, use the "rtadvd" daemon.

Note that the IPv6 specification assumes the following, and nonconforming
cases are left unspecified:
- Only hosts will listen to router advertisements
- Hosts have a single network interface (except loopback)
Therefore, it is unwise to enable net.inet6.ip6.accept_rtadv on routers
or multi-interface hosts.  A misconfigured node can behave strangely
(KAME code allows nonconforming configurations, for those who would like
to do some experiments).

To summarize the sysctl knobs:
        accept_rtadv    forwarding      role of the node
        ---             ---             ---
        0               0               host (to be manually configured)
        0               1               router
        1               0               autoconfigured host
                                        (spec assumes that a host has a single
                                        interface only; an autoconfigured host
                                        with multiple interfaces is out of scope)
        1               1               invalid, or experimental
                                        (out of scope of the spec)
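
Read as code, the table above is a two-input decision.  The sketch below only
illustrates the table; the enum and function names are hypothetical, and on a
KAME system the two inputs would be the values of net.inet6.ip6.accept_rtadv
and net.inet6.ip6.forwarding as reported by sysctl(8).

```c
#include <assert.h>
#include <string.h>

/* Node roles from the accept_rtadv/forwarding table. */
enum node_role {
    ROLE_MANUAL_HOST,       /* accept_rtadv=0, forwarding=0 */
    ROLE_ROUTER,            /* accept_rtadv=0, forwarding=1 */
    ROLE_AUTOCONF_HOST,     /* accept_rtadv=1, forwarding=0 */
    ROLE_EXPERIMENTAL       /* accept_rtadv=1, forwarding=1: out of spec */
};

/* Any nonzero value is treated as "on", like a boolean sysctl. */
static enum node_role classify_node(int accept_rtadv, int forwarding)
{
    if (forwarding)
        return accept_rtadv ? ROLE_EXPERIMENTAL : ROLE_ROUTER;
    return accept_rtadv ? ROLE_AUTOCONF_HOST : ROLE_MANUAL_HOST;
}

/* Human-readable labels matching the "role of the node" column. */
static const char *node_role_name(enum node_role r)
{
    switch (r) {
    case ROLE_MANUAL_HOST:   return "host (to be manually configured)";
    case ROLE_ROUTER:        return "router";
    case ROLE_AUTOCONF_HOST: return "autoconfigured host";
    case ROLE_EXPERIMENTAL:  return "invalid, or experimental";
    }
    return "unknown";
}
```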

RFC2462 has a validation rule against incoming RA prefix information options,
in 5.5.3 (e).  This is to protect hosts from malicious (or misconfigured)
routers that advertise a very short prefix lifetime.
There was an update from Jim Bound to the ipngwg mailing list (look
for "(ipng 6712)" in the archive) and KAME implements Jim's update.
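
For reference, the three-case valid lifetime rule of RFC2462 5.5.3 (e), as we
read it, is sketched below.  It follows the RFC text, not necessarily the KAME
code with Jim Bound's update applied, so treat it as an approximation.  The
function name and the authenticated flag are hypothetical; lifetimes are in
seconds, and 'stored' is what RFC2462 calls StoredLifetime.

```c
#include <assert.h>
#include <stdint.h>

#define TWO_HOURS (2u * 60u * 60u)      /* 7200 seconds */

/*
 * New valid lifetime for an autoconfigured address when a later RA carries
 * a Prefix Information option for the same prefix.  'stored' is the
 * address's current lifetime, 'received' the option's Valid Lifetime, and
 * 'authenticated' whether the RA was authenticated (e.g. with IPsec).
 */
static uint32_t update_valid_lifetime(uint32_t stored, uint32_t received,
                                      int authenticated)
{
    /* Case 1: a long lifetime, or one that extends the current one. */
    if (received > TWO_HOURS || received > stored)
        return received;
    /* Case 2: already within two hours; only an authenticated RA counts. */
    if (stored <= TWO_HOURS)
        return authenticated ? received : stored;
    /* Case 3: shorten, but to no less than two hours. */
    return TWO_HOURS;
}
```

The effect is that an unauthenticated RA can never cut an address's valid
lifetime below two hours, which is the protection the paragraph above refers
to.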

See 1.2 in this document for the relationship between DAD and
autoconfiguration.

1.4.3 DHCPv6

We supply a tiny DHCPv6 server/client in kame/dhcp6.  However, the
implementation is very immature (for example, it does NOT
implement address lease/release), and it is not in the default compilation
tree.  If you want to do some experiments, compile it on your own.

DHCPv6 and autoconfiguration also need more work.  The "Managed" and "Other"
bits in RA have no special effect on the stateful autoconfiguration procedure
in the DHCPv6 client program (the "Managed" bit actually prevents stateless
autoconfiguration, but no special action will be taken for the DHCPv6 client).

1.5 Generic tunnel interface

GIF (Generic InterFace) is a pseudo interface for configured tunnels.
Details are described in the gif(4) manpage.
Currently
        v6 in v6
        v6 in v4
        v4 in v6
        v4 in v4
are available.  Use "gifconfig" to assign the physical (outer) source
and destination addresses to gif interfaces.
A configuration that uses the same address family for the inner and outer IP
header (v4 in v4, or v6 in v6) is dangerous.  It is very easy to
configure interfaces and routing tables to perform infinite levels
of tunneling.  Please be warned.

gif can be configured to be ECN-friendly.  See 4.5 for ECN-friendliness
of tunnels, and the gif(4) manpage for how to configure it.

If you would like to configure an IPv4-in-IPv6 tunnel with a gif interface,
read gif(4) carefully.  You may need to remove the IPv6 link-local address
automatically assigned to the gif interface.

1.1       itojun    427: 1.6 Source Address Selection
                    428:
1.13      itojun    429: KAME's source address selection takes the following conditions
                    430: into account:
                    431: - address scope
                    432: - prefix matching against the destination
                    433: - outgoing interface
                    434: - whether an address is deprecated
                    435:
                    436: Roughly speaking, the selection policy is as follows:
                    437: - always use an address that belongs to the same scope zone as the
                    438:   destination.
                    439: - addresses that have equal or larger scope than the scope of the
                    440:   destination are preferred.
                    441: - if multiple addresses have the same scope, the one with the longest
                    442:   prefix match against the destination is preferred.
                    443: - a deprecated address is not used in new communications if an
                    444:   alternate (non-deprecated) address is available and has sufficient
                    445:   scope.
                    446: - if none of the above conditions breaks the tie, addresses assigned
                    447:   on the outgoing interface are preferred.
                    448:
                    449: For instance, ::1 is selected for ff01::1,
                    450: fe80::200:f8ff:fe01:6317%ne0 for fe80::2a0:24ff:feab:839b%ne0.
                    451: To see how longest-matching works, suppose that
1.3       itojun    452: 3ffe:501:808:1:200:f8ff:fe01:6317 and 3ffe:2001:9:124:200:f8ff:fe01:6317
1.13      itojun    453: are assigned to the outgoing interface. Then the former is chosen as the
                    454: source for the destination 3ffe:501:800::1, as it shares 44 leading bits
                    455: with the destination (the latter shares only 18). Note that even if all
                    456: available addresses have smaller scope than the scope of the destination,
                    457: we choose one anyway. For example, if we have link-local and site-local
                    458: addresses only, we choose a site-local address for a global destination.
                    459: If the packet would cross a site boundary, the boundary router will return
                    460: an ICMPv6 destination unreachable error with code 2 (beyond scope of source address).
                    461:
                    462: The precise description of the algorithm is quite complicated. To
                    463: describe the algorithm, we introduce the following notation:
                    464:
                    465: For a given destination D,
                    466:   samescope(D): A set of addresses that have the same scope as D.
                    467:   largerscope(D): A set of addresses that have a larger scope than D.
                    468:   smallerscope(D): A set of addresses that have a smaller scope than D.
                    469:
                    470: For a given set of addresses A,
                    471:   DEP(A): a set of deprecated addresses in A.
                    472:   nonDEP(A): A - DEP(A).
                    473:
                    474: Also, the algorithm assumes that the outgoing interface for the
                    475: destination D is determined. We call the interface "I".
                    476:
                    477: The algorithm is as follows. Selection proceeds step by step as
                    478: described; for example, if an address is selected by item 1, items 2
                    479: and later are not considered at all.
                    480:
                    481:   0. If there is no address in the same scope zone as D, just give up;
                    482:      the packet will not be sent.
                    483:   1. If nonDEP(samescope(D)) is not empty,
                    484:      choose the longest matching address against D. If more than one
                    485:      address is longest matching, choose an arbitrary one, provided that
                    486:      an address on I is always preferred.
                    487:   2. If nonDEP(largerscope(D)) is not empty,
                    488:      choose an address that has the smallest scope. If more than one
                    489:      address has the smallest scope, choose an arbitrary one, provided
                    490:      that an address on I is always preferred.
                    491:   3. If DEP(samescope(D)) is not empty,
                    492:      choose the longest matching address against D. If more than one
                    493:      address is longest matching, choose an arbitrary one, provided that
                    494:      an address on I is always preferred.
                    495:   4. If DEP(largerscope(D)) is not empty,
                    496:      choose an address that has the smallest scope. If more than one
                    497:      address has the smallest scope, choose an arbitrary one, provided
                    498:      that an address on I is always preferred.
                    499:   5. If nonDEP(smallerscope(D)) is not empty,
                    500:      choose an address that has the largest scope. If more than one
                    501:      address has the largest scope, choose an arbitrary one, provided
                    502:      that an address on I is always preferred.
                    503:   6. If DEP(smallerscope(D)) is not empty,
                    504:      choose an address that has the largest scope. If more than one
                    505:      address has the largest scope, choose an arbitrary one, provided
                    506:      that an address on I is always preferred.
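
The steps above can be modelled in a few dozen lines of user-space C.  This is a
sketch under stated assumptions, not the kernel's code: struct cand, the SCOPE_*
values (borrowed from the IPv6 multicast scope field, so that a larger value
means a wider scope) and the in_zone flag (standing in for the zone check of
step 0, decided by the caller) are all hypothetical.  inet_pton() appears only
to spell out sample addresses.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <netinet/in.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical scope values (IPv6 multicast scope field): larger = wider. */
enum { SCOPE_NODE = 0x1, SCOPE_LINK = 0x2, SCOPE_SITE = 0x5, SCOPE_GLOBAL = 0xe };

/* Hypothetical candidate source address. */
struct cand {
    struct in6_addr addr;
    bool deprecated;   /* address is deprecated */
    bool on_outif;     /* address is assigned to the outgoing interface I */
    bool in_zone;      /* usable in D's scope zone (decided by the caller) */
};

static int addr_scope(const struct in6_addr *a)
{
    if (IN6_IS_ADDR_MULTICAST(a))
        return a->s6_addr[1] & 0x0f;
    if (IN6_IS_ADDR_LOOPBACK(a))
        return SCOPE_NODE;
    if (IN6_IS_ADDR_LINKLOCAL(a))
        return SCOPE_LINK;
    if (IN6_IS_ADDR_SITELOCAL(a))
        return SCOPE_SITE;
    return SCOPE_GLOBAL;
}

/* Number of leading bits shared by a and b (0..128). */
static int common_prefix_len(const struct in6_addr *a, const struct in6_addr *b)
{
    int len = 0;
    for (int i = 0; i < 16; i++) {
        unsigned int x = a->s6_addr[i] ^ b->s6_addr[i];
        if (x == 0) {
            len += 8;
            continue;
        }
        while ((x & 0x80) == 0) {   /* matching high bits of the first differing byte */
            len++;
            x <<= 1;
        }
        break;
    }
    return len;
}

enum rel { SAME, LARGER, SMALLER };

/* Steps 1-6: the scope relation and deprecation state each step looks at. */
static const struct { enum rel rel; bool dep; } steps[] = {
    { SAME, false }, { LARGER, false }, { SAME, true },
    { LARGER, true }, { SMALLER, false }, { SMALLER, true },
};

/* Index of the chosen source for dst, or -1 if nothing is usable (step 0). */
static int select_source(const struct in6_addr *dst, const struct cand *c, size_t n)
{
    assert(dst != NULL && (c != NULL || n == 0));
    int dscope = addr_scope(dst);

    for (size_t s = 0; s < sizeof steps / sizeof steps[0]; s++) {
        int best = -1, bestkey = 0;
        for (size_t i = 0; i < n; i++) {
            if (!c[i].in_zone || c[i].deprecated != steps[s].dep)
                continue;
            int sc = addr_scope(&c[i].addr);
            enum rel r = sc == dscope ? SAME : (sc > dscope ? LARGER : SMALLER);
            if (r != steps[s].rel)
                continue;
            /* SAME: longest match; LARGER: smallest scope; SMALLER: largest scope. */
            int key = r == SAME ? common_prefix_len(&c[i].addr, dst)
                                : (r == LARGER ? -sc : sc);
            /* On a tie, an address on the outgoing interface I is preferred. */
            if (best < 0 || key > bestkey ||
                (key == bestkey && c[i].on_outif && !c[best].on_outif)) {
                best = (int)i;
                bestkey = key;
            }
        }
        if (best >= 0)
            return best;   /* later steps are not considered */
    }
    return -1;
}

/* Convenience constructor for examples. */
static struct cand mkcand(const char *text, bool dep, bool on_outif)
{
    struct cand c = { .deprecated = dep, .on_outif = on_outif, .in_zone = true };
    inet_pton(AF_INET6, text, &c.addr);
    return c;
}
```

With the two addresses from the example earlier in this section, select_source()
picks 3ffe:501:808:1:200:f8ff:fe01:6317 for 3ffe:501:800::1, and with only
fe80::1 and fec0::1 available it picks the site-local one for a global
destination.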
                    507:
                    508: There exists a document about source address selection
                    509: (draft-ietf-ipngwg-default-addr-select-xx.txt). KAME's algorithm
                    510: described above takes a similar approach to the document, but there
                    511: are some differences. See the document for more details.
1.1       itojun    512:
                    513: There are some cases where we do not use the above rule.  One
1.13      itojun    514: example is a connected TCP session, for which we use the address kept
                    515: in the TCP protocol control block (tcb) as the source.
1.1       itojun    516: Another example is the source address for Neighbor Advertisement.
                    517: Under the spec (RFC2461 7.2.2), an NA's source should be the target
                    518: address of the corresponding NS.  In this case we follow
                    519: the spec rather than the above longest-match rule.
                    520:
1.12      itojun    521: If you would like to prohibit the use of deprecated addresses for some
                    522: reason, set net.inet6.ip6.use_deprecated to 0.  Issues
                    523: related to deprecated addresses are described in RFC2462 5.5.4 (NOTE:
                    524: there is some debate underway in IETF ipngwg on how to use
1.3       itojun    525: "deprecated" addresses).
                    526:
1.1       itojun    527: 1.7 Jumbo Payload
                    528:
                    529: KAME supports the Jumbo Payload hop-by-hop option used to send IPv6
                    530: packets with payloads longer than 65,535 octets.  But since currently
                    531: KAME does not support any physical interface whose MTU is more than
                    532: 65,535, such payloads can be seen only on the loopback interface (i.e.
                    533: lo0).
                    534:
                    535: If you want to try jumbo payloads, you first have to reconfigure the
                    536: kernel so that the MTU of the loopback interface is more than 65,535
                    537: bytes; add the following to the kernel configuration file:
                    538:        options         "LARGE_LOMTU"           #To test jumbo payload
                    539: and rebuild the kernel.
                    540:
                    541: Then you can test jumbo payloads with the ping6 command and its -b and -s
                    542: options.  The -b option must be specified to enlarge the size of the
                    543: socket buffer, and the -s option specifies the length of the packet,
                    544: which should be more than 65,535.  For example:
                    545:        % ping6 -b 70000 -s 68000 ::1
                    546:
                    547: The IPv6 specification requires that the Jumbo Payload option not
                    548: be used in a packet that carries a fragment header.  If this condition
                    549: is violated, an ICMPv6 Parameter Problem message must be sent to the
                    550: sender.  The KAME kernel follows the specification, but you will rarely
                    551: see an ICMPv6 error caused by this requirement.
                    552:
                    553: When the KAME kernel receives an IPv6 packet, it checks the frame length of
                    554: the packet and compares it to the length specified in the payload
                    555: length field of the IPv6 header or in the value of the Jumbo Payload
                    556: option, if any.  If the former is shorter than the latter, the KAME kernel
                    557: discards the packet and increments the statistics.  You can see the
                    558: statistics in the output of the netstat command with the `-s -p ip6' option:
                    559:        % netstat -s -p ip6
                    560:        ip6:
                    561:                (snip)
                    562:                1 with data size < data length
                    563:
                    564: Thus, the KAME kernel does not send an ICMPv6 error unless the erroneous
                    565: packet is an actual Jumbo Payload, that is, its packet size is more
                    566: than 65,535 bytes.  As described above, the KAME kernel currently does not
                    567: support any physical interface with such a huge MTU, so it rarely returns an
                    568: ICMPv6 error.
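
For reference, the option itself and the input length check can be sketched
in C.  The layout follows RFC 2675 (option type 0xC2, four octets of option
data, a length that counts everything after the IPv6 header including the
hop-by-hop header, which must exceed 65,535, and a Payload Length field of 0).
build_jumbo_hbh(), parse_jumbo_hbh() and ip6_length_ok() are hypothetical
names, and the check is a simplified model of what the kernel does on input.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define JUMBO_OPT_TYPE    0xc2   /* Jumbo Payload option type (RFC 2675) */
#define JUMBO_OPT_DATALEN 4      /* option data is one 32-bit length */
#define IPV6_HDR_LEN      40
#define IPV6_MAX_PLEN     65535  /* largest value of the 16-bit Payload Length */

/*
 * Fill an 8-octet hop-by-hop options header carrying only the Jumbo
 * Payload option.  upper_len counts everything after this header.
 * Returns the Jumbo Payload length, or 0 if the size does not call for
 * (or cannot be expressed as) a jumbogram.  The caller must set the
 * IPv6 header's Payload Length field to 0.
 */
static uint32_t build_jumbo_hbh(uint8_t hbh[8], uint8_t next_hdr, uint32_t upper_len)
{
    uint64_t jlen = 8 + (uint64_t)upper_len;  /* includes the hop-by-hop header */

    assert(hbh != NULL);
    if (jlen <= IPV6_MAX_PLEN || jlen > UINT32_MAX)
        return 0;
    hbh[0] = next_hdr;
    hbh[1] = 0;                  /* Hdr Ext Len: (0 + 1) * 8 octets */
    hbh[2] = JUMBO_OPT_TYPE;     /* option at offset 2 meets its 4n+2 alignment */
    hbh[3] = JUMBO_OPT_DATALEN;
    hbh[4] = (uint8_t)(jlen >> 24);
    hbh[5] = (uint8_t)(jlen >> 16);
    hbh[6] = (uint8_t)(jlen >> 8);
    hbh[7] = (uint8_t)jlen;
    return (uint32_t)jlen;
}

/* Read the Jumbo Payload length back; 0 if the option is not there. */
static uint32_t parse_jumbo_hbh(const uint8_t hbh[8])
{
    if (hbh[2] != JUMBO_OPT_TYPE || hbh[3] != JUMBO_OPT_DATALEN)
        return 0;
    return (uint32_t)hbh[4] << 24 | (uint32_t)hbh[5] << 16 |
           (uint32_t)hbh[6] << 8 | (uint32_t)hbh[7];
}

/*
 * Input length check: plen is the Payload Length field, jumbo the Jumbo
 * Payload value (used only when plen is 0).  Returns 1 to accept the
 * packet, 0 to discard it ("data size < data length").
 */
static int ip6_length_ok(size_t frame_len, uint16_t plen, uint32_t jumbo)
{
    uint64_t claimed = plen != 0 ? plen : jumbo;

    if (frame_len < IPV6_HDR_LEN)
        return 0;
    return (uint64_t)(frame_len - IPV6_HDR_LEN) >= claimed;
}
```

For a 68,000-octet ICMPv6 message (next header 58), build_jumbo_hbh() stores
the value 68,008, since the 8-octet hop-by-hop header is counted as well.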
                    569:
                    570: TCP/UDP over jumbograms is not supported at the moment.  This is because
                    571: we have no medium (other than loopback) to test this.  Contact us if you
                    572: need this.
                    573:
                    574: IPsec does not work on jumbograms.  This is due to some specification twists
1.3       itojun    575: in supporting AH with jumbograms (AH header size influences payload length,
                    576: and this makes it really hard to authenticate an inbound packet carrying
                    577: both the jumbo payload option and AH).
                    578:
                    579: There are fundamental issues in *BSD support for jumbograms.  We would like to
1.12      itojun    580: address those, but we need more time to finalize the task.  To name a few:
                    581: - The mbuf pkthdr.len field is typed as "int" in 4.4BSD, so it cannot hold
1.3       itojun    582:   a jumbogram with len > 2G on 32bit architecture CPUs.  If we would like to
                    583:   support jumbograms properly, the field must be expanded to hold 4G +
                    584:   IPv6 header + link-layer header.  Therefore, it must be expanded to at least
                    585:   int64_t (u_int32_t is NOT enough).
                    586: - We mistakenly use "int" to hold packet length in many places.  We need
1.12      itojun    587:   to convert them into a larger numeric type.  This needs great care, as we may
1.3       itojun    588:   experience overflow during packet length computation.
                    589: - We mistakenly check the ip6_plen field of the IPv6 header for the packet payload
                    590:   length in various places.  We should be checking mbuf pkthdr.len instead.
                    591:   ip6_input() will perform a sanity check on the jumbo payload option on input,
                    592:   and we can safely use mbuf pkthdr.len afterwards.
1.12      itojun    593: - TCP code needs careful updates in a number of places, of course.
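
The arithmetic behind the first item can be checked with a few lines of C.
The helper names are hypothetical, and the 14-octet Ethernet header used in
the example is just one possible link-layer header length.

```c
#include <assert.h>
#include <stdint.h>

#define JUMBO_MAX 0xffffffffu   /* largest 32-bit Jumbo Payload length */

/*
 * Worst-case value a packet-length field must hold for one jumbogram:
 * the Jumbo Payload length, the 40-octet IPv6 header, and a link-layer
 * header of the given length.
 */
static uint64_t max_jumbo_frame(uint32_t linkhdr_len)
{
    return (uint64_t)JUMBO_MAX + 40 + linkhdr_len;
}

static int fits_int32(uint64_t v)  { return v <= INT32_MAX; }
static int fits_uint32(uint64_t v) { return v <= UINT32_MAX; }
```

max_jumbo_frame(14) is 4,294,967,349, which is larger than UINT32_MAX
(4,294,967,295), so even an unsigned 32-bit field cannot hold it, and a
signed "int" already fails for a 2G jumbogram.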
1.1       itojun    594:
                    595: 1.8 Loop prevention in header processing
                    596:
                    597: The IPv6 specification allows an arbitrary number of extension headers to
                    598: be placed onto packets.  If we implemented IPv6 packet processing
                    599: code in the way BSD IPv4 code is implemented, the kernel stack could
1.3       itojun    600: overflow due to a long function call chain.  KAME sys/netinet6 code
1.1       itojun    601: is carefully designed to avoid kernel stack overflow.  Because of
                    602: this, KAME sys/netinet6 code defines its own protocol switch
                    603: structure, "struct ip6protosw" (see netinet6/ip6protosw.h).
                    604: The IPv4 part (sys/netinet) remains untouched for compatibility.
                    605: Because of this, if you receive an IPsec-over-IPv4 packet with a massive
                    606: number of IPsec headers, the kernel stack may blow up.  IPsec-over-IPv6 is okay.
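
The idea of processing the header chain iteratively can be illustrated with a
user-space sketch that walks extension headers in a single loop, so stack
usage stays constant however many headers a packet carries.  In the KAME
code, roughly speaking, the loop lives in ip6_input(), which calls each
protocol's input routine and gets the next header type back as its return
value; the function below is only a hypothetical stand-alone model of that
pattern and handles a subset of headers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Next Header values of the extension headers handled in this sketch. */
enum { NH_HOPOPTS = 0, NH_ROUTING = 43, NH_FRAGMENT = 44, NH_AH = 51, NH_DSTOPTS = 60 };

/*
 * Walk the extension headers that follow a 40-octet IPv6 header.
 * p/len cover the octets after the IPv6 header, nxt is the IPv6 header's
 * Next Header value.  Returns the first protocol not handled here (the
 * upper layer) and stores its offset in *off, or -1 if a header runs past
 * the end of the buffer.
 */
static int walk_exthdrs(const uint8_t *p, size_t len, uint8_t nxt, size_t *off)
{
    size_t o = 0;

    assert(off != NULL);
    for (;;) {                    /* one iteration per header, no recursion */
        size_t hlen;

        switch (nxt) {
        case NH_HOPOPTS:
        case NH_ROUTING:
        case NH_DSTOPTS:
            if (o + 2 > len)
                return -1;
            hlen = ((size_t)p[o + 1] + 1) * 8;   /* Hdr Ext Len, 8-octet units */
            break;
        case NH_FRAGMENT:
            hlen = 8;                            /* fixed size */
            break;
        case NH_AH:
            if (o + 2 > len)
                return -1;
            hlen = ((size_t)p[o + 1] + 2) * 4;   /* AH length, 4-octet units */
            break;
        default:
            *off = o;
            return nxt;
        }
        if (o + hlen > len)
            return -1;                           /* truncated header */
        nxt = p[o];                              /* first octet: Next Header */
        o += hlen;
    }
}
```

Each pass advances by at least 8 octets, so the loop always ends by reaching
an upper-layer header or running off the end of the buffer.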
                    607:
                    608: 1.9 ICMPv6
                    609:
                    610: After RFC2463 was published, IETF ipngwg decided to disallow ICMPv6 error
                    611: packets in response to ICMPv6 redirects, to prevent ICMPv6 storms on a network medium.
                    612: KAME already implements this in the kernel.
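
As a sketch, this decision can be combined with the older RFC 2463 rule that
no ICMPv6 error is sent in response to another ICMPv6 error message (types
0-127).  This models the decision only; it is not KAME's icmp6_error() code,
and the macro names are ours.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define ICMP6_INFOMSG_MASK  0x80   /* types 128-255 are informational */
#define ND_REDIRECT_TYPE    137    /* Redirect message (RFC 2461) */

/* May an ICMPv6 error be generated about an ICMPv6 packet of this type? */
static bool icmp6_error_allowed(uint8_t offending_type)
{
    if ((offending_type & ICMP6_INFOMSG_MASK) == 0)
        return false;   /* never send an error about an error (RFC 2463 2.4) */
    if (offending_type == ND_REDIRECT_TYPE)
        return false;   /* the later ipngwg decision: no errors about redirects */
    return true;
}
```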
                    613:
                    614: 1.10 Applications
                    615:
                    616: For userland programming, we support the IPv6 socket API as specified in
                    617: RFC2553, RFC2292 and upcoming internet drafts.
                    618:
                    619: TCP/UDP over IPv6 is available and quite stable.  You can enjoy "telnet",
                    620: "ftp", "rlogin", "rsh", "ssh", etc.  These applications are protocol
                    621: independent.  That is, they automatically choose IPv4 or IPv6
                    622: according to the results of DNS lookups.
                    623:
                    624: 1.11 Kernel Internals
                    625:
1.3       itojun    626:  (*) TCP/UDP part is handled differently between operating system platforms.
                    627:      See 1.12 for details.
1.1       itojun    628:
                    629: The current KAME departs from the IPv4 netinet logic.  While
                    630: ip_forward() calls ip_output(), ip6_forward() directly calls
                    631: if_output() since routers must not divide IPv6 packets into fragments.
                    632:
                    633: An ICMPv6 error should contain as much of the original packet as fits
                    634: in 1280 octets.  UDP6/IP6 port unreach, for instance, should contain all
                    635: extension headers and the *unchanged* UDP6 and IP6 headers.
                    636: For this reason, no IP6 function except TCP6 converts network byte
                    637: order into host byte order, so that the original packet is preserved.
                    638:
                    639: tcp6_input(), udp6_input() and icmp6_input() cannot assume that the IP6
                    640: header immediately precedes the transport header, as extension headers
                    641: may sit in between.  So, in6_cksum() was implemented to handle packets whose
                    642: IP6 header and transport header are not contiguous.  Neither TCP/IP6 nor
                    643: UDP/IP6 header structures exist for checksum calculation.
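
The checksum technique can be sketched in user space: the pseudo-header
(source, destination, upper-layer length, next header, as in RFC 2460
section 8.1) is summed straight from the addresses, and the upper-layer data
may be spread over several segments, even split at odd offsets, much like an
mbuf chain.  struct seg and cksum6() are hypothetical names; this is not the
kernel's in6_cksum() signature.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* One piece of a possibly non-contiguous packet (think: one mbuf). */
struct seg {
    const uint8_t *p;
    size_t len;
};

/* Add octets to a one's-complement sum; *odd tracks the 16-bit word position. */
static uint64_t sum_bytes(uint64_t sum, const uint8_t *p, size_t len, int *odd)
{
    for (size_t i = 0; i < len; i++) {
        sum += *odd ? (uint64_t)p[i] : (uint64_t)p[i] << 8;   /* high octet first */
        *odd = !*odd;
    }
    return sum;
}

/*
 * Upper-layer checksum over IPv6.  The pseudo-header is built from
 * src/dst/nxt and the total segment length, so the IPv6 header never has
 * to sit next to the transport header in memory.  A trailing odd octet
 * is implicitly padded with zero.
 */
static uint16_t cksum6(const uint8_t src[16], const uint8_t dst[16], uint8_t nxt,
                       const struct seg *segs, size_t nsegs)
{
    uint8_t ph[40];
    uint32_t ulen = 0;
    uint64_t sum = 0;
    int odd = 0;

    assert(segs != NULL || nsegs == 0);
    for (size_t i = 0; i < nsegs; i++)
        ulen += (uint32_t)segs[i].len;

    memcpy(ph, src, 16);                 /* source address */
    memcpy(ph + 16, dst, 16);            /* destination address */
    ph[32] = (uint8_t)(ulen >> 24);      /* upper-layer packet length */
    ph[33] = (uint8_t)(ulen >> 16);
    ph[34] = (uint8_t)(ulen >> 8);
    ph[35] = (uint8_t)ulen;
    ph[36] = ph[37] = ph[38] = 0;        /* zero */
    ph[39] = nxt;                        /* next header */

    sum = sum_bytes(sum, ph, sizeof ph, &odd);
    for (size_t i = 0; i < nsegs; i++)   /* word parity carries across segments */
        sum = sum_bytes(sum, segs[i].p, segs[i].len, &odd);

    while (sum >> 16)                    /* fold carries back in */
        sum = (sum & 0xffff) + (sum >> 16);
    return (uint16_t)~sum;
}
```

Splitting the same UDP datagram into several segments gives the same result
as summing it in one piece, and once the computed value is stored in the
checksum field, recomputing over the datagram yields 0.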
                    644:
                    645: To process the IP6 header, extension headers and transport headers easily,
                    646: KAME requires network drivers to store packets in one internal mbuf or
                    647: one or more external mbufs.  A typical old driver prepares two
                    648: internal mbufs for 100 - 208 bytes of data; KAME's reference
                    649: implementation, however, stores such data in one external mbuf.
                    650:
                    651: "netstat -s -p ip6" tells you whether or not your driver conforms to
                    652: KAME's requirement.  In the following example, "cce0" violates the
                    653: requirement. (For more information, refer to Section 2.)
                    654:
                    655:         Mbuf statistics:
                    656:                 317 one mbuf
                    657:                 two or more mbuf::
                    658:                         lo0 = 8
                    659:                        cce0 = 10
                    660:                 3282 one ext mbuf
                    661:                 0 two or more ext mbuf
                    662:
                    663: Each input function calls IP6_EXTHDR_CHECK at the beginning to check
                    664: whether the region between the IP6 header and its own header is
                    665: contiguous.  IP6_EXTHDR_CHECK calls m_pullup() only if the mbuf has the
                    666: M_LOOP flag, that is, the packet comes from the loopback
                    667: interface.  m_pullup() is never called for packets coming from physical
                    668: network interfaces.
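
The policy can be modelled in user space with a toy buffer chain.  Everything
here is hypothetical (struct pbuf, PB_LOOP, pullup(), exthdr_check()); in
particular, rejecting a short first buffer that did not come from the
loopback is our reading of the driver requirement above, and the real macro
also does bookkeeping that this sketch leaves out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define PB_LOOP 0x1   /* hypothetical flag: packet came in on the loopback */

/* Hypothetical stand-in for an mbuf chain. */
struct pbuf {
    uint8_t *data;       /* first valid byte */
    size_t len;          /* valid bytes in this buffer */
    size_t cap;          /* bytes available from data onward */
    struct pbuf *next;
    int flags;           /* only meaningful on the first buffer */
};

/* Like m_pullup(): move bytes forward until `need` are contiguous in b. */
static bool pullup(struct pbuf *b, size_t need)
{
    if (need > b->cap)
        return false;
    while (b->len < need && b->next != NULL) {
        struct pbuf *n = b->next;
        size_t take = need - b->len;
        if (take > n->len)
            take = n->len;
        memcpy(b->data + b->len, n->data, take);
        b->len += take;
        n->data += take;
        n->len -= take;
        if (n->len == 0)
            b->next = n->next;   /* emptied buffer is unlinked (caller owns it) */
    }
    return b->len >= need;
}

/*
 * Make sure octets [0, off + hlen) are contiguous in the first buffer.
 * Only loopback packets are pulled up; a short first buffer from a
 * physical interface is rejected.
 */
static bool exthdr_check(struct pbuf *b, size_t off, size_t hlen)
{
    size_t need = off + hlen;

    assert(b != NULL);
    if (b->len >= need)
        return true;
    if (b->flags & PB_LOOP)
        return pullup(b, need);
    return false;
}
```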
                    669:
                    670: TCP6 reassembly uses the IP6 header to store reassembly
                    671: information.  Since IP6 is not guaranteed to be just before TCP6, the
                    672: ip6tcpreass structure has a pointer to the TCP6 header.  It also has
                    673: a pointer back to the mbuf, to avoid m_pullup().
                    674:
                    675: Like TCP6, both the IP and IP6 reassembly functions never call m_pullup().
                    676:
                    677: xxx_ctlinput() calls in_mrejoin() on PRC_IFNEWADDR.  We think this is
                    678: one of the 4.4BSD implementation flaws.  Since 4.4BSD keeps ia_multiaddrs
                    679: in in_ifaddr{}, it cannot use the multicast feature if the interface has no
                    680: unicast address.  So, if an application joins a multicast group on an
                    681: interface and then all unicast addresses are removed from it, the application
                    682: cannot send/receive any multicast packets.  Moreover, if a new unicast
                    683: address is assigned to the interface, in_mrejoin() must be called.
                    684: KAME's interfaces, however, ALWAYS have one link-local unicast
                    685: address.  These extensions have thus not been implemented in KAME.
                    686:
                    687: 1.12 IPv4 mapped address and IPv6 wildcard socket
                    688:
                    689: RFC2553 describes the IPv4 mapped address (3.7) and the special behavior
                    690: of the IPv6 wildcard bind socket (3.8).  The spec allows you to:
1.4       itojun    691: - Accept IPv4 connections on an AF_INET6 wildcard bind socket.
1.1       itojun    692: - Transmit IPv4 packets over an AF_INET6 socket by using a special form of
                    693:   the address like ::ffff:10.1.1.1.
1.3       itojun    694: but the spec itself is very complicated and does not specify how the
                    695: socket layer should behave.
1.4       itojun    696: Here we call the former the "listening side" and the latter the "initiating
                    697: side", for reference purposes.
1.1       itojun    698:
1.4       itojun    699: Almost all KAME implementations treat tcp/udp port number space separately
1.6       itojun    700: between IPv4 and IPv6.  You can perform wildcard bind on both of the address
1.4       itojun    701: families, on the same port.
                    702:
                    703: There are some OS-platform differences in KAME code, as we use tcp/udp
                    704: code from different origins.  The following table summarizes the behavior.
                    705:
                    706:                listening side          initiating side
1.6       itojun    707:                (AF_INET6 wildcard      (connection to ::ffff:10.1.1.1)
1.4       itojun    708:                socket gets IPv4 conn.)
                    709:                ---                     ---
                    710: KAME/BSDI3     not supported           not supported
                    711: KAME/FreeBSD228        not supported           not supported
                    712: KAME/FreeBSD3x configurable            supported
                    713:                default: enabled
                    714: KAME/NetBSD    configurable            supported
                    715:                default: disabled
1.12      itojun    716: KAME/BSDI4     enabled                 supported
1.4       itojun    717: KAME/OpenBSD   not supported           not supported
1.1       itojun    718:
1.4       itojun    719: The following sections give more details, including how you can
1.3       itojun    720: configure the behavior.
1.1       itojun    721:
1.4       itojun    722: Comments on listening side:
                    723:
                    724: It seems that RFC2553 says too little on the wildcard bind issue,
1.12      itojun    725: specifically on (1) the port space issue, (2) failure mode, (3) relationship
                    726: between AF_INET/INET6 wildcard binds, such as ordering constraints, and (4) behavior
                    727: when a conflicting socket is opened/closed.  There can be several separate
1.4       itojun    728: interpretations of this RFC which conform to it but behave differently.
                    729: So, to implement a portable application you should assume nothing
                    730: about the behavior in the kernel.  Using getaddrinfo() is the safest way.
                    731: Port number space and wildcard bind issues were discussed in detail
                    732: on the ipv6imp mailing list in mid March 1999, and it seems that there is
                    733: no concrete consensus (that is, it is up to implementers).  You may want to
                    734: check the mailing list archives.
                    735: We supply a tool called "bindtest" that explores the behavior of
                    736: kernel bind(2).  The tool will not be compiled by default.
                    737:
                    738: If a server application would like to accept IPv4 and IPv6 connections,
                    739: it should use an AF_INET and an AF_INET6 socket (you'll need two sockets).
                    740: Use getaddrinfo() with AI_PASSIVE in ai_flags, then socket(2) and bind(2)
                    741: to all the addresses returned.
                    742: By opening multiple sockets, you can accept connections on the socket with
                    743: the proper address family.  IPv4 connections will be accepted by the AF_INET
                    744: socket, and IPv6 connections by the AF_INET6 socket (NOTE: the KAME/BSDI4
                    745: kernel sometimes violates this; we will fix it).
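
A sketch of this server-side recipe in C follows.  It assumes a stack that
provides the RFC 3493 IPV6_V6ONLY socket option, the standardized counterpart
of the IPV6_BINDV6ONLY option discussed later in this section; setting it on
the AF_INET6 socket keeps IPv4 connections on the AF_INET socket regardless of
the platform defaults in the table above.  open_listeners() is a hypothetical
helper.

```c
#include <assert.h>
#include <netdb.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/*
 * Open one listening TCP socket per address that getaddrinfo() returns
 * for a passive (wildcard) lookup of `port`.  Stores up to maxfds
 * descriptors in fds and returns how many were opened.
 */
static int open_listeners(const char *port, int *fds, int maxfds)
{
    struct addrinfo hints, *res, *ai;
    int n = 0;

    assert(port != NULL && fds != NULL);
    memset(&hints, 0, sizeof hints);
    hints.ai_family = AF_UNSPEC;         /* no hardcoded address family */
    hints.ai_socktype = SOCK_STREAM;
    hints.ai_flags = AI_PASSIVE;         /* wildcard addresses */
    if (getaddrinfo(NULL, port, &hints, &res) != 0)
        return 0;

    for (ai = res; ai != NULL && n < maxfds; ai = ai->ai_next) {
        int fd = socket(ai->ai_family, ai->ai_socktype, ai->ai_protocol);
        if (fd < 0)
            continue;                    /* e.g. the family is not supported here */
#ifdef IPV6_V6ONLY
        if (ai->ai_family == AF_INET6) {
            /* Leave IPv4 to the AF_INET socket opened in this same loop. */
            int on = 1;
            (void)setsockopt(fd, IPPROTO_IPV6, IPV6_V6ONLY, &on, sizeof on);
        }
#endif
        if (bind(fd, ai->ai_addr, ai->ai_addrlen) < 0 || listen(fd, 5) < 0) {
            close(fd);
            continue;
        }
        fds[n++] = fd;
    }
    freeaddrinfo(res);
    return n;
}
```

A real server would then wait on all returned descriptors with select(2) or
poll(2).  Passing "0" as the port asks for ephemeral ports, which is only
useful for testing.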
                    746:
                    747: If you try to support IPv6 traffic only and would like to reject IPv4
                    748: traffic, always check the peer address when a connection is made toward
                    749: an AF_INET6 listening socket.  If the address is an IPv4 mapped address,
                    750: you may want to reject the connection.  You can check the condition by using
                    751: the IN6_IS_ADDR_V4MAPPED() macro.  This is one of the reasons the author of
                    752: the section (itojun) dislikes the special behavior of AF_INET6 wildcard bind.
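
A small helper for that check, written against the standard sockaddr_in6
layout, might look like this; peer_is_v4mapped() is a hypothetical name.

```c
#include <assert.h>
#include <netinet/in.h>
#include <stdbool.h>
#include <string.h>
#include <sys/socket.h>

/*
 * True if sa (for example, filled in by accept(2) on an AF_INET6 socket)
 * is an IPv4 peer in IPv4 mapped form, ::ffff:a.b.c.d.  If v4 is not
 * NULL, the embedded IPv4 address is copied out.
 */
static bool peer_is_v4mapped(const struct sockaddr *sa, struct in_addr *v4)
{
    const struct sockaddr_in6 *sin6;

    assert(sa != NULL);
    if (sa->sa_family != AF_INET6)
        return false;
    sin6 = (const struct sockaddr_in6 *)(const void *)sa;
    if (!IN6_IS_ADDR_V4MAPPED(&sin6->sin6_addr))
        return false;
    if (v4 != NULL)
        memcpy(v4, &sin6->sin6_addr.s6_addr[12], sizeof *v4);   /* low 32 bits */
    return true;
}
```

A server that wants IPv6 traffic only would call this on the address returned
by accept(2) and close the connection when it returns true.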
                    753:
                    754: Comments on initiating side:
                    755:
1.1       itojun    756: Advice to application implementers: to implement a portable IPv6 application
                    757: (which works on multiple IPv6 kernels), we believe that the following
                    758: is the key to success:
1.3       itojun    759: - NEVER hardcode AF_INET or AF_INET6.
1.1       itojun    760: - Use getaddrinfo() and getnameinfo() throughout the system.
                    761:   Never use gethostby*(), getaddrby*(), inet_*() or getipnodeby*().
                    762: - If you would like to connect to a destination, use getaddrinfo() and try
                    763:   all the destinations returned, like telnet does.
                    764: - Some IPv6 stacks are shipped with a buggy getaddrinfo().  Ship a minimal
                    765:   working version with your application and use that as a last resort.
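
The connect loop recommended above can be sketched as follows: getaddrinfo()
with AF_UNSPEC, then socket(2) and connect(2) on each result in turn until one
succeeds.  connect_any() is a hypothetical helper; a real client would also
want timeouts, which this blocking version does not have.

```c
#include <assert.h>
#include <netdb.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/*
 * Resolve host/port and try every returned address in order until one
 * connect() succeeds.  No address family is hardcoded.  Returns a
 * connected socket, or -1.
 */
static int connect_any(const char *host, const char *port)
{
    struct addrinfo hints, *res, *ai;
    int fd = -1;

    assert(port != NULL);
    memset(&hints, 0, sizeof hints);
    hints.ai_family = AF_UNSPEC;
    hints.ai_socktype = SOCK_STREAM;
    if (getaddrinfo(host, port, &hints, &res) != 0)
        return -1;

    for (ai = res; ai != NULL; ai = ai->ai_next) {
        fd = socket(ai->ai_family, ai->ai_socktype, ai->ai_protocol);
        if (fd < 0)
            continue;
        if (connect(fd, ai->ai_addr, ai->ai_addrlen) == 0)
            break;                     /* connected */
        close(fd);
        fd = -1;                       /* try the next address */
    }
    freeaddrinfo(res);
    return fd;
}
```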
                    766:
1.4       itojun    767: If you would like to use an AF_INET6 socket for both IPv4 and IPv6 outgoing
                    768: connections, you will need a tweaked implementation of the DNS support libraries,
                    769: as documented in RFC2553 6.1.  KAME libinet6 includes the tweak in
                    770: getipnodebyname().  Note that getipnodebyname() itself is not recommended as
                    771: it does not handle scoped IPv6 addresses at all.  For IPv6 name resolution
                    772: getaddrinfo() is the preferred API.  getaddrinfo() does not implement the
                    773: tweak.
                    774:
                    775: When writing applications that make outgoing connections, the story is much
1.6       itojun    776: simpler if you treat AF_INET and AF_INET6 as totally separate address families.
1.4       itojun    777: {set,get}sockopt issues become simpler, and DNS issues will be made simpler.  We do
                    778: not recommend relying upon IPv4 mapped addresses.
1.3       itojun    779:
                    780: 1.12.1 KAME/BSDI3 and KAME/FreeBSD228
1.1       itojun    781:
1.4       itojun    782: These platforms do not support IPv4 mapped addresses at all (neither on the
                    783: listening side nor on the initiating side).  AF_INET6 and AF_INET sockets are totally separate.
1.1       itojun    784:
1.5       itojun    785: Port number space is totally separate between AF_INET and AF_INET6 sockets.
1.1       itojun    786:
1.4       itojun    787: 1.12.2 KAME/FreeBSD3x
1.1       itojun    788:
1.4       itojun    789: KAME/FreeBSD3x uses shared tcp4/6 code (from sys/netinet/tcp*) and shared
                    790: udp4/6 code (from sys/netinet/udp*).  It uses unified inpcb/in6pcb structure.
1.1       itojun    791:
1.4       itojun    792: 1.12.2.1 KAME/FreeBSD3x, listening side
1.1       itojun    793:
1.4       itojun    794: The platform can be configured to support IPv4 mapped address/special
1.12      itojun    795: AF_INET6 wildcard bind (enabled by default).  There is no kernel compilation
                    796: option to disable it.  You can enable/disable the behavior with sysctl
                    797: (per-node), or setsockopt (per-socket).
1.4       itojun    798:
                    799: A wildcard AF_INET6 socket grabs an IPv4 connection if and only if the following
                    800: conditions are satisfied:
                    801: - there's no AF_INET socket that matches the IPv4 connection
                    802: - the AF_INET6 socket is configured to accept IPv4 traffic, i.e.
                    803:   getsockopt(IPV6_BINDV6ONLY) returns 0.
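
The delivery rule for an incoming IPv4 connection, stated here for
KAME/FreeBSD3x and again for KAME/NetBSD in 1.12.3.1, can be written as a small
decision function.  This is only a model of the rule as described; the enum
and function names are hypothetical, and for KAME/BSDI4 (1.12.4.1) the
inet6_v6only argument would always be false.

```c
#include <assert.h>
#include <stdbool.h>

/* Which socket, if any, receives an incoming IPv4 connection. */
enum v4_owner { OWNER_NONE, OWNER_INET, OWNER_INET6 };

/*
 * have_inet:        an AF_INET socket matches the IPv4 connection
 * have_inet6_wild:  an AF_INET6 wildcard socket is bound to the port
 * inet6_v6only:     that socket's IPV6_BINDV6ONLY value is nonzero
 */
static enum v4_owner ipv4_conn_owner(bool have_inet, bool have_inet6_wild,
                                     bool inet6_v6only)
{
    if (have_inet)
        return OWNER_INET;    /* a matching AF_INET socket always wins */
    if (have_inet6_wild && !inet6_v6only)
        return OWNER_INET6;   /* peer shows up as ::ffff:a.b.c.d */
    return OWNER_NONE;        /* no listener for this connection */
}
```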
                    804:
                    805: (XXX need checking)
                    806:
1.5       itojun    807: 1.12.2.2 KAME/FreeBSD3x, initiating side
1.4       itojun    808:
1.6       itojun    809: KAME/FreeBSD3x supports outgoing connections to IPv4 mapped addresses
1.4       itojun    810: (::ffff:10.1.1.1), if the node is configured to accept IPv4 connections
                    811: by AF_INET6 socket.
1.1       itojun    812:
1.4       itojun    813: (XXX need checking)
1.1       itojun    814:
1.5       itojun    815: 1.12.3 KAME/NetBSD
1.1       itojun    816:
1.3       itojun    817: KAME/NetBSD uses shared tcp4/6 code (from sys/netinet/tcp*) and shared
                    818: udp4/6 code (from sys/netinet/udp*).  The implementation differs
                    819: from KAME/FreeBSD3x.  KAME/NetBSD uses separate inpcb/in6pcb structures,
                    820: while KAME/FreeBSD3x uses merged inpcb structure.
                    821:
1.5       itojun    822: 1.12.3.1 KAME/NetBSD, listening side
1.4       itojun    823:
                    824: The platform can be configured to support IPv4 mapped address/special AF_INET6
                    825: wildcard bind (disabled by default).  Kernel behavior can be summarized as
                    826: follows:
                    827: - default: special support code will be compiled in, but is disabled by
                    828:   default.  It can be controlled by sysctl (net.inet6.ip6.bindv6only),
                    829:   or setsockopt(IPV6_BINDV6ONLY).
                    830: - add "INET6_BINDV6ONLY": No special support code for AF_INET6 wildcard socket
                    831:   will be compiled in.  AF_INET6 sockets and AF_INET sockets are totally
                    832:   separate.  The behavior is similar to what is described in 1.12.1.
                    833:
                    834: The sysctl setting affects per-socket configuration at in6pcb creation time
                    835: only.  In other words, per-socket configuration is copied from the sysctl
                    836: configuration at in6pcb creation time.  To change per-socket behavior, you
                    837: must perform setsockopt or reopen the socket.  A change in sysctl configuration
                    838: will not change the behavior of sockets that are already open.
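
Per-socket configuration from C might look like the following sketch.  It
prefers the RFC 3493 name IPV6_V6ONLY where the headers provide it and falls
back to IPV6_BINDV6ONLY, the name used in this document; set_v6only() and
get_v6only() are hypothetical helpers.

```c
#include <assert.h>
#include <netinet/in.h>
#include <sys/socket.h>

/* Prefer the RFC 3493 name, fall back to the KAME-era one. */
#if defined(IPV6_V6ONLY)
#define V6ONLY_OPT IPV6_V6ONLY
#elif defined(IPV6_BINDV6ONLY)
#define V6ONLY_OPT IPV6_BINDV6ONLY
#endif

/* on != 0: this AF_INET6 socket will not receive IPv4 traffic. */
static int set_v6only(int fd, int on)
{
    assert(fd >= 0);
#ifdef V6ONLY_OPT
    on = (on != 0);
    return setsockopt(fd, IPPROTO_IPV6, V6ONLY_OPT, &on, sizeof on);
#else
    (void)on;
    return -1;                      /* option not available on this system */
#endif
}

/* Current per-socket value (0 or 1), or -1 on error. */
static int get_v6only(int fd)
{
#ifdef V6ONLY_OPT
    int on = 0;
    socklen_t len = sizeof on;

    if (getsockopt(fd, IPPROTO_IPV6, V6ONLY_OPT, &on, &len) < 0)
        return -1;
    return on != 0;
#else
    (void)fd;
    return -1;
#endif
}
```

Since the sysctl value is only copied when the socket is created, calling
set_v6only() right after socket(2), before bind(2), gives a predictable result
whatever the node-wide setting is.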
                    839:
                    840: A wildcard AF_INET6 socket grabs an IPv4 connection if and only if the following
                    841: conditions are satisfied:
                    842: - there's no AF_INET socket that matches the IPv4 connection
                    843: - the AF_INET6 socket is configured to accept IPv4 traffic, i.e.
                    844:   getsockopt(IPV6_BINDV6ONLY) returns 0.
1.12      itojun    845:
                    846: You cannot bind(2) to an IPv4 mapped address.  This is a workaround for
                    847: duplicate port numbers and other twists.
1.4       itojun    848:
1.5       itojun    849: 1.12.3.2 KAME/NetBSD, initiating side
1.4       itojun    850:
                    851: When you initiate a connection, you can always connect to an IPv4 destination
                    852: over an AF_INET6 socket, using an IPv4 mapped address destination (::ffff:10.1.1.1).
                    853: This is independent of the configuration for the listening side, and is
                    854: always enabled.
1.3       itojun    855:
1.5       itojun    856: 1.12.4 KAME/BSDI4
1.4       itojun    857:
                    858: KAME/BSDI4 uses an NRL-based TCP/UDP stack and inpcb source code,
1.3       itojun    859: which was derived from the NRL IPv6/IPsec stack.  I guess it supports IPv4 mapped
                    860: addresses and the special AF_INET6 wildcard bind.  The implementation is, again,
                    861: different from other KAME/*BSDs.
1.4       itojun    862:
1.5       itojun    863: 1.12.4.1 KAME/BSDI4, listening side
1.4       itojun    864:
                    865: The NRL inpcb layer supports the special behavior of AF_INET6 wildcard sockets.
1.12      itojun    866: There is no way to disable the behavior.
                    867:
                    868: A wildcard AF_INET6 socket grabs an IPv4 connection if and only if the following
                    869: condition is satisfied:
                    870: - there's no AF_INET socket that matches the IPv4 connection
1.1       itojun    871:
1.5       itojun    872: 1.12.4.2 KAME/BSDI4, initiating side
1.4       itojun    873:
                    874: KAME/BSDi4 supports connection initiation to IPv4 mapped address
                    875: (like ::ffff:10.1.1.1).
                    876:
1.5       itojun    877: 1.12.5 KAME/OpenBSD
1.1       itojun    878:
1.4       itojun    879: KAME/OpenBSD uses NRL-based TCP/UDP stack and inpcb source code,
                    880: which was derived from NRL IPv6/IPsec stack.
                    881:
1.5       itojun    882: 1.12.5.1 KAME/OpenBSD, listening side
1.4       itojun    883:
                    884: KAME/OpenBSD disables special behavior on AF_INET6 wildcard bind for
                    885: security reasons (if IPv4 traffic toward AF_INET6 wildcard bind is allowed,
                    886: access control will become much harder).  KAME/BSDI4 uses NRL-based TCP/UDP
                    887: stack as well, however, the behavior is different due to OpenBSD's security
                    888: policy.
                    889:
                    890: As a result the behavior of KAME/OpenBSD is similar to KAME/BSDI3 and
                    891: KAME/FreeBSD228 (see 1.12.1 for more detail).
                    892:
1.5       itojun    893: 1.12.5.2 KAME/OpenBSD, initiating side
1.4       itojun    894:
                    895: KAME/OpenBSD does not support connection initiation to IPv4 mapped address
                    896: (like ::ffff:10.1.1.1).

1.12.6 More issues

IPv4 mapped address support adds a big requirement to EVERY userland codebase.
Every userland program should check whether an AF_INET6 sockaddr contains an
IPv4 mapped address or not.  This adds many twists:

- Access control code becomes harder to write.
  For example, if you would like to reject packets from 10.0.0.0/8,
  you need to reject packets to AF_INET sockets from 10.0.0.0/8,
  and to AF_INET6 sockets from ::ffff:10.0.0.0/104.
- If a protocol on top of IPv4 is defined differently from its IPv6
  counterpart, we need to be really careful when we determine which protocol
  to use.
  For example, with the FTP protocol, we cannot simply use sa_family to
  determine the FTP command set.  The following example is incorrect:
       if (sa_family == AF_INET)
               use EPSV/EPRT or PASV/PORT;     /*IPv4*/
       else if (sa_family == AF_INET6)
               use EPSV/EPRT or LPSV/LPRT;     /*IPv6*/
       else
               error;
  In an SIIT environment, the correct code would be:
       if (sa_family == AF_INET)
               use EPSV/EPRT or PASV/PORT;     /*IPv4*/
       else if (sa_family == AF_INET6 && IPv4 mapped address)
               use EPSV/EPRT or PASV/PORT;     /*IPv4 command set on AF_INET6*/
       else if (sa_family == AF_INET6 && !IPv4 mapped address)
               use EPSV/EPRT or LPSV/LPRT;     /*IPv6*/
       else
               error;
  It is too much to ask everybody to be this careful.  Moreover, we are not
  sure that the above code fragment is correct for all situations.
- When kernel support for IPv4 mapped addresses (outgoing direction) is
  enabled, servers running on the kernel can be hosed by a native IPv6 packet
  that has an IPv4 mapped address as its IPv6 header source, and can be made
  to generate unwanted IPv4 packets.
  http://playground.iijlab.net/i-d/draft-itojun-ipv6-transition-abuse-00.txt
  discusses this scenario further.
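The first two twists can be sketched in C as follows. This is not code from KAME userland. peer_ipv4(), from_net10() and ftp_cmds_for() are hypothetical names, and only AF_INET and AF_INET6 sockaddrs are handled. The sketch does not settle the doubt expressed above about whether the SIIT-aware logic is right in every situation.

```c
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>
#include <netinet/in.h>

/*
 * Find the IPv4 address a peer really uses: an AF_INET address, or an
 * IPv4 mapped address inside an AF_INET6 sockaddr.  Stores it in host
 * byte order and returns 0; returns -1 for native IPv6 peers.
 */
static int
peer_ipv4(const struct sockaddr *sa, uint32_t *v4)
{
	const struct in6_addr *a6;
	uint32_t n;

	if (sa->sa_family == AF_INET) {
		*v4 = ntohl(((const struct sockaddr_in *)sa)->sin_addr.s_addr);
		return 0;
	}
	if (sa->sa_family != AF_INET6)
		return -1;
	a6 = &((const struct sockaddr_in6 *)sa)->sin6_addr;
	if (!IN6_IS_ADDR_V4MAPPED(a6))
		return -1;
	memcpy(&n, &a6->s6_addr[12], sizeof(n));
	*v4 = ntohl(n);
	return 0;
}

/* Reject 10.0.0.0/8 on AF_INET and ::ffff:10.0.0.0/104 on AF_INET6. */
static int
from_net10(const struct sockaddr *sa)
{
	uint32_t v4;

	return peer_ipv4(sa, &v4) == 0 && (v4 >> 24) == 10;
}

/* FTP command set, following the SIIT-aware pseudocode above. */
enum ftp_cmds { FTP_CMDS_ERROR, FTP_CMDS_V4, FTP_CMDS_V6 };

static enum ftp_cmds
ftp_cmds_for(const struct sockaddr *sa)
{
	uint32_t v4;

	if (sa->sa_family != AF_INET && sa->sa_family != AF_INET6)
		return FTP_CMDS_ERROR;
	if (peer_ipv4(sa, &v4) == 0)
		return FTP_CMDS_V4;	/* EPSV/EPRT or PASV/PORT */
	return FTP_CMDS_V6;		/* EPSV/EPRT or LPSV/LPRT */
}
```

Routing both checks through one helper that extracts the embedded IPv4 address keeps the mapped-address case from being forgotten in one place but not the other.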

Due to the above twists, some KAME userland programs have restrictions on
the use of IPv4 mapped addresses:
- rshd/rlogind do not accept connections from IPv4 mapped addresses.
  This is to prevent malicious use of IPv4 mapped addresses in native IPv6
  packets to bypass source-address based authentication.
- ftp/ftpd do not support the SIIT environment.  IPv4 mapped addresses are
  decoded in userland and passed to AF_INET sockets
  (an SIIT client should pass IPv4 mapped addresses as is, to AF_INET6 sockets).
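Both userland restrictions can be sketched as follows. These are not the actual rshd or ftpd sources. refuse_mapped_peer() and unmap_v4() are hypothetical helpers, and they assume the peer address came from accept(2) or getpeername(2) on an AF_INET6 socket.

```c
#include <string.h>
#include <sys/socket.h>
#include <netinet/in.h>

/*
 * rshd/rlogind style check: nonzero if the peer must be refused
 * because its AF_INET6 source is an IPv4 mapped address.
 */
static int
refuse_mapped_peer(const struct sockaddr_in6 *peer)
{
	return IN6_IS_ADDR_V4MAPPED(&peer->sin6_addr);
}

/*
 * ftp/ftpd style decoding: turn an IPv4 mapped AF_INET6 address into
 * a plain AF_INET sockaddr for use with AF_INET sockets.
 * Returns 0 on success, -1 if the address is not IPv4 mapped.
 */
static int
unmap_v4(const struct sockaddr_in6 *sin6, struct sockaddr_in *sin)
{
	if (!IN6_IS_ADDR_V4MAPPED(&sin6->sin6_addr))
		return -1;
	memset(sin, 0, sizeof(*sin));
	sin->sin_family = AF_INET;
#ifdef SIN6_LEN
	sin->sin_len = sizeof(*sin);	/* BSD sockaddrs carry a length */
#endif
	sin->sin_port = sin6->sin6_port;
	memcpy(&sin->sin_addr, &sin6->sin6_addr.s6_addr[12],
	    sizeof(sin->sin_addr));
	return 0;
}
```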

1.13 sockaddr_storage

When RFC2553 was about to be finalized, there was discussion on how struct
sockaddr_storage members should be named.  One proposal was to prepend "__"
to the members (like "__ss_len"), as they should not be touched.  The other
proposal was not to prepend it (like "ss_len"), as we need to touch those
members directly.  There was no clear consensus on it.

As a result, RFC2553 defines struct sockaddr_storage as follows:
       struct sockaddr_storage {
               u_char  __ss_len;       /* address length */
               u_char  __ss_family;    /* address family */
               /* and a bunch of padding */
       };
By contrast, the XNET draft defines it as follows:
       struct sockaddr_storage {
               u_char  ss_len;         /* address length */
               u_char  ss_family;      /* address family */
               /* and a bunch of padding */
       };

In December 1999, it was agreed that RFC2553bis should pick the latter (XNET)
definition.

KAME kits prior to December 1999 used the RFC2553 definition.  KAME kits from
December 1999 onward conform to the XNET definition, based on the RFC2553bis
discussion.

If you look at multiple IPv6 implementations, you will be able to see
both definitions.  As a userland programmer, the most portable way of
dealing with it is to:
(1) ensure ss_family and/or ss_len are available on the platform, by using
    GNU autoconf,
(2) have -Dss_family=__ss_family to unify all occurrences (including header
    files) into __ss_family, or
(3) never touch __ss_family; cast to sockaddr * and use sa_family like:
       struct sockaddr_storage ss;
       family = ((struct sockaddr *)&ss)->sa_family;
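Option (3) extends to a small pair of helpers, sketched below. The helper names are hypothetical. They read the family through struct sockaddr and derive a length from the family instead of reading ss_len. Since neither ss_family nor __ss_family is named, the code compiles against either definition.

```c
#include <sys/socket.h>
#include <netinet/in.h>

/* Option (3): read the family through struct sockaddr, never naming
 * ss_family or __ss_family. */
static int
ss_family_of(const struct sockaddr_storage *ss)
{
	return ((const struct sockaddr *)ss)->sa_family;
}

/*
 * A length derived from the family, so that ss_len/__ss_len is not
 * needed either (some platforms have no length member at all).
 * Returns 0 for families this sketch does not know about.
 */
static socklen_t
ss_len_of(const struct sockaddr_storage *ss)
{
	switch (ss_family_of(ss)) {
	case AF_INET:
		return sizeof(struct sockaddr_in);
	case AF_INET6:
		return sizeof(struct sockaddr_in6);
	default:
		return 0;
	}
}
```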

1.14 Invalid addresses on the wire

Some IPv6 transition technologies embed an IPv4 address into an IPv6 address.
These specifications themselves are fine; however, they can enable a certain
set of attacks.  Recent specification documents cover those issues, but there
are already-published RFCs that have no protection against them (like using
a source address of ::ffff:127.0.0.1 to bypass a "reject packets from remote"
filter).

To name a few, these address ranges can be used to hose an IPv6 implementation,
or to bypass security controls:
- IPv4 mapped addresses that embed unspecified/multicast/loopback/broadcast
  IPv4 addresses (if they appear in a native IPv6 packet header, they are
  malicious)
       ::ffff:0.0.0.0/104      ::ffff:127.0.0.0/104
       ::ffff:224.0.0.0/100    ::ffff:255.0.0.0/104
- 6to4 prefixes generated from unspecified/multicast/loopback/broadcast/private
  IPv4 addresses
       2002:0000::/24          2002:7f00::/24          2002:e000::/24
       2002:ff00::/24          2002:0a00::/24          2002:ac10::/28
       2002:c0a8::/32

Also, since KAME does not support RFC1933 auto tunnels, seeing IPv4 compatible
addresses on the wire is very rare.  You should be cautious if you see them.
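A filter along these lines might look like the sketch below. It is illustrative only and is not KAME's kernel check. in6_suspicious() and its helpers are hypothetical. Inside 6to4 addresses it treats the whole 224.0.0.0/4 multicast block as bad, which is a superset of the 2002:e000::/24 entry above. It also flags every IPv4 compatible address.

```c
#include <stdint.h>
#include <netinet/in.h>

/* 0.0.0.0/8, 127.0.0.0/8, 224.0.0.0/4 (multicast) or 255.0.0.0/8 */
static int
v4_bogus(uint32_t a)
{
	uint32_t top = a >> 24;

	return top == 0 || top == 127 || (top & 0xf0) == 0xe0 || top == 255;
}

/* 10.0.0.0/8, 172.16.0.0/12 or 192.168.0.0/16 */
static int
v4_private(uint32_t a)
{
	return (a >> 24) == 10 || (a >> 20) == 0xac1 || (a >> 16) == 0xc0a8;
}

/* Read four bytes in network order as a host-order integer. */
static uint32_t
get32(const uint8_t *p)
{
	return (uint32_t)p[0] << 24 | (uint32_t)p[1] << 16 |
	    (uint32_t)p[2] << 8 | (uint32_t)p[3];
}

/*
 * Nonzero if an address seen in a native IPv6 header falls into one of
 * the ranges listed above, or is IPv4 compatible.
 */
static int
in6_suspicious(const struct in6_addr *a)
{
	const uint8_t *b = a->s6_addr;

	if (IN6_IS_ADDR_V4MAPPED(a))
		return v4_bogus(get32(b + 12));
	if (b[0] == 0x20 && b[1] == 0x02)	/* 6to4, 2002::/16 */
		return v4_bogus(get32(b + 2)) || v4_private(get32(b + 2));
	if (IN6_IS_ADDR_V4COMPAT(a))		/* rare without auto tunnels */
		return 1;
	return 0;
}
```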

KAME code is carefully written to avoid such incidents.  More specifically,
the KAME kernel rejects packets with certain source/destination addresses in
the IPv6 base header or the IPv6 routing header.  Also, the KAME default
configuration files are written carefully to avoid those attacks.

http://playground.iijlab.net/i-d/draft-itojun-ipv6-transition-abuse-00.txt
discusses this further.

1.15 Node's required addresses

RFC2373 section 2.8 talks about required addresses for an IPv6 node.  This
section describes how the KAME stack manages those required addresses.

1.15.1 Host case

The following items are automatically assigned to the node (or the node
automatically joins the group) at bootstrap time:
- Loopback address
- All-nodes multicast address (ff01::1)

The following items are handled automatically when the interface becomes
IFF_UP:
- The link-local address for each interface
- Solicited-node multicast address for link-local addresses
- Link-local all-nodes multicast address (ff02::1)

The following items need to be configured manually by ifconfig(8) or
prefix(8).  Alternatively, they can be configured by stateless address
autoconfiguration.
- Assigned unicast/anycast addresses
- Solicited-node multicast address for assigned unicast addresses
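The solicited-node group in both lists is derived mechanically from the address it belongs to. The sketch below shows the derivation. The function name is hypothetical.

```c
#include <string.h>
#include <netinet/in.h>

/*
 * Solicited-node multicast address for addr: the prefix
 * ff02:0:0:0:0:1:ff00::/104 followed by the low-order 24 bits of addr
 * (RFC2373), i.e. ff02::1:ffXX:XXXX.
 */
static void
solicited_node(const struct in6_addr *addr, struct in6_addr *snm)
{
	memset(snm, 0, sizeof(*snm));
	snm->s6_addr[0] = 0xff;
	snm->s6_addr[1] = 0x02;
	snm->s6_addr[11] = 0x01;
	snm->s6_addr[12] = 0xff;
	memcpy(&snm->s6_addr[13], &addr->s6_addr[13], 3);
}
```

For example, the link-local address fe80::2a0:24ff:fe12:3456 maps to ff02::1:ff12:3456. Addresses that differ only in their upper 104 bits share one solicited-node group.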

Users can join groups by using appropriate system calls like setsockopt(2).
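The sketch below shows how a program might join a group with setsockopt(2), using the RFC2553 IPV6_JOIN_GROUP option and struct ipv6_mreq. The helper name is hypothetical.

```c
#include <string.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <net/if.h>

/*
 * Join multicast group grp on interface ifname with the RFC2553
 * IPV6_JOIN_GROUP socket option.  Returns 0 on success, -1 on error
 * (including an unknown interface name).
 */
static int
join_group6(int s, const struct in6_addr *grp, const char *ifname)
{
	struct ipv6_mreq mreq;

	memset(&mreq, 0, sizeof(mreq));
	mreq.ipv6mr_multiaddr = *grp;
	mreq.ipv6mr_interface = if_nametoindex(ifname);
	if (mreq.ipv6mr_interface == 0)
		return -1;
	return setsockopt(s, IPPROTO_IPV6, IPV6_JOIN_GROUP, &mreq,
	    sizeof(mreq));
}
```

For instance, a daemon could call join_group6(s, &grp, "ne0") on a UDP AF_INET6 socket with grp set to a link-local group. The interface name ne0 is only an example.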

1.15.2 Router case

In addition to the above, routers need to handle the following items.

The following items need to be configured manually by using ifconfig(8).
o The subnet-router anycast addresses for the interfaces it is configured
  to act as a router on (prefix::/64)
o All other anycast addresses with which the router has been configured
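The subnet-router anycast address is the subnet prefix with all remaining bits zero. It can therefore be computed from any address on the subnet before being handed to ifconfig(8). In the sketch below, the helper name is hypothetical and the prefix length is a parameter. The /64 case corresponds to prefix::/64 above.

```c
#include <stdint.h>
#include <netinet/in.h>

/*
 * Subnet-router anycast address for the subnet of addr with prefix
 * length plen (0..128): keep the prefix bits, zero everything else.
 */
static void
subnet_router_anycast(const struct in6_addr *addr, int plen,
    struct in6_addr *any)
{
	int i, bits;

	*any = *addr;
	for (i = 0; i < 16; i++) {
		bits = plen - 8 * i;	/* prefix bits left for byte i */
		if (bits >= 8)
			continue;
		any->s6_addr[i] &= bits <= 0 ? 0 : (uint8_t)(0xff << (8 - bits));
	}
}
```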

The router will join the following multicast group when rtadvd(8) is
available for the interface.
o All-Routers Multicast Address (ff02::2)

Routing daemons will join appropriate multicast groups as necessary,
like ff02::9 for RIPng.

Users can join groups by using appropriate system calls like setsockopt(2).

2. Network Drivers

KAME requires two items to be added into the standard drivers:

(1) mbuf clustering requirement.  In this stable release, we changed
    MINCLSIZE to MHLEN+1 for all the operating systems in order to make
    all the drivers behave as we expect.

(2) multicast.  If "ifmcstat" yields no multicast group for an
    interface, that interface has to be patched.

To avoid trouble, we suggest that you comment out the device drivers
for unsupported/unnecessary cards in the kernel configuration file.
If you accidentally enable unsupported drivers, some of the userland
tools may not work correctly (routing daemons are a typical example).

In the following sections, "official support" means that KAME developers
use that ethernet card/driver frequently.

(NOTE: In the past we required all pcmcia drivers to have a call to
in6_ifattach().  We no longer have such a requirement.)

2.1 FreeBSD 2.2.x-RELEASE

Here is a list of FreeBSD 2.2.x-RELEASE drivers and their conditions:

       driver  mbuf(1)         multicast(2)    official support?
       ---     ---             ---             ---
       (Ethernet)
       ar      looks ok        -               -
       cnw     ok              ok              yes (*)
       ed      ok              ok              yes
       ep      ok              ok              yes
       fe      ok              ok              yes
       sn      looks ok        -               -   (*)
       vx      looks ok        -               -
       wlp     ok              ok              -   (*)
       xl      ok              ok              yes
       zp      ok              ok              -
       (FDDI)
       fpa     looks ok        ?               -
       (ATM)
       en      ok              ok              yes
       (Serial)
       lp      ?               -               does not work
       sl      ?               -               does not work
       sr      looks ok        ok              -   (**)

You may want to add an invocation of "rtsol" to "/etc/pccard_ether"
if you are using a notebook computer with a PCMCIA ethernet card.

(*) These drivers are distributed with PAO (http://www.jp.freebsd.org/PAO/).

(**) There was a report that, if you bring the sr driver up, then down, and
then up again, the kernel may hang.  We have disabled frame-relay support in
the sr driver, and since then it appears to work fine.  If you need
frame-relay support to come back, please contact KAME developers.

2.2 BSD/OS 3.x

The following lists BSD/OS 3.x device drivers and their conditions:

       driver  mbuf(1)         multicast(2)    official support?
       ---     ---             ---             ---
       (Ethernet)
       cnw     ok              ok              yes
       de      ok              ok              -
       df      ok              ok              -
       eb      ok              ok              -
       ef      ok              ok              yes
       exp     ok              ok              -
       mz      ok              ok              yes
       ne      ok              ok              yes
       we      ok              ok              -
       (FDDI)
       fpa     ok              ok              -
       (ATM)
       en      maybe           ok              -
       (Serial)
       ntwo    ok              ok              yes
       sl      ?               -               does not work
       appp    ?               -               does not work

You may want to use the "@insert" directive in /etc/pccard.conf to invoke
the "rtsol" command right after dynamic insertion of PCMCIA ethernet cards.

2.3 NetBSD

The following table lists the network drivers we have tried so far.

       driver          mbuf(1) multicast(2)    official support?
       ---             ---     ---             ---
       (Ethernet)
       awi pcmcia/i386 ok      ok              -
       bah zbus/amiga  NG(*)
       cnw pcmcia/i386 ok      ok              yes
       ep pcmcia/i386  ok      ok              -
       le sbus/sparc   ok      ok              yes
       ne pci/i386     ok      ok              yes
       ne pcmcia/i386  ok      ok              yes
       wi pcmcia/i386  ok      ok              yes
       (ATM)
       en pci/i386     ok      ok              -

(*) This may need some fixes, but I'm not sure what arcnet interfaces assume...

2.4 FreeBSD 3.x-RELEASE

Here is a list of FreeBSD 3.x-RELEASE drivers and their conditions:

       driver  mbuf(1)         multicast(2)    official support?
       ---     ---             ---             ---
       (Ethernet)
       cnw     ok              ok              -(*)
       ed      ?               ok              -
       ep      ok              ok              -
       fe      ok              ok              yes
       fxp     ?(**)
       lnc     ?               ok              -
       sn      ?               ?               -(*)
       wi      ok              ok              yes
       xl      ?               ok              -

(*) These drivers are distributed with PAO as PAO3
    (http://www.jp.freebsd.org/PAO/).
(**) There are trouble reports regarding multicast filter initialization.

Other drivers should simply work on KAME FreeBSD 3.x-RELEASE, but have not
been checked yet.

2.5 OpenBSD 2.x

Here is a list of OpenBSD 2.x drivers and their conditions:

       driver          mbuf(1)         multicast(2)    official support?
       ---             ---             ---             ---
       (Ethernet)
       de pci/i386     ok              ok              yes
       fxp pci/i386    ?(*)
       le sbus/sparc   ok              ok              yes
       ne pci/i386     ok              ok              yes
       ne pcmcia/i386  ok              ok              yes

(*) There seems to be a problem in the driver with multicast filter
configuration.  This happens with certain revisions of the chipset on the
card.  It should be fixed by now by a workaround in sys/net/if.c, but we are
still not sure.

2.6 BSD/OS 4.x

The following lists BSD/OS 4.x device drivers and their conditions:

       driver  mbuf(1)         multicast(2)    official support?
       ---     ---             ---             ---
       (Ethernet)
       de      ok              ok              yes
       exp     (*)

You may want to use the "@insert" directive in /etc/pccard.conf to invoke
the "rtsol" command right after dynamic insertion of PCMCIA ethernet cards.

(*) The exp driver has a serious conflict with the KAME initialization
sequence.  A workaround is committed into sys/i386/pci/if_exp.c, and it
should be okay by now.

3. Translator

We categorize IPv4/IPv6 translators into 4 types.

Translator A --- It is used in the early stage of transition to make
it possible to establish a connection from an IPv6 host in an IPv6
island to an IPv4 host in the IPv4 ocean.

Translator B --- It is used in the early stage of transition to make
it possible to establish a connection from an IPv4 host in the IPv4
ocean to an IPv6 host in an IPv6 island.

Translator C --- It is used in the late stage of transition to make it
possible to establish a connection from an IPv4 host in an IPv4 island
to an IPv6 host in the IPv6 ocean.

Translator D --- It is used in the late stage of transition to make it
possible to establish a connection from an IPv6 host in the IPv6 ocean
to an IPv4 host in an IPv4 island.

KAME provides a TCP relay translator for category A.  This is called
"FAITH".  We also provide an IP header translator for category A.

3.1 FAITH TCP relay translator

The FAITH system uses a TCP relay daemon called "faithd", helped by the KAME
kernel.  FAITH reserves an IPv6 address prefix, and relays TCP connections
toward that prefix to IPv4 destinations.

For example, if the reserved IPv6 prefix is 3ffe:0501:0200:ffff::, and
the IPv6 destination for a TCP connection is
3ffe:0501:0200:ffff::163.221.202.12, the connection will be relayed toward
the IPv4 destination 163.221.202.12.

       destination IPv4 node (163.221.202.12)
         ^
         | IPv4 TCP toward 163.221.202.12
       FAITH-relay dual stack node
         ^
         | IPv6 TCP toward 3ffe:0501:0200:ffff::163.221.202.12
       source IPv6 node

faithd must be invoked on the FAITH-relay dual stack node.
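The address mapping in the example above amounts to taking the low-order 32 bits of the IPv6 destination. The sketch below shows that step. It is not the actual faithd code. faith_v4dst() is a hypothetical name, and comparing only the first 64 bits against the reserved prefix is an assumption of the sketch.

```c
#include <string.h>
#include <netinet/in.h>

/*
 * If dst lies inside the reserved FAITH prefix, copy the IPv4
 * destination embedded in its low-order 32 bits into v4 and return 0;
 * otherwise return -1.  The prefix is compared over its first 64 bits
 * (an assumption of this sketch).
 */
static int
faith_v4dst(const struct in6_addr *dst, const struct in6_addr *prefix,
    struct in_addr *v4)
{
	if (memcmp(dst->s6_addr, prefix->s6_addr, 8) != 0)
		return -1;
	memcpy(&v4->s_addr, &dst->s6_addr[12], sizeof(v4->s_addr));
	return 0;
}
```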

For more details, consult kame/kame/faithd/README and
draft-ietf-ngtrans-tcpudp-relay-01.txt.

3.2 IPv6-to-IPv4 header translator

# removed since it is not imported to NetBSD-current

4. IPsec

IPsec is implemented as the following three components.

(1) Policy Management
(2) Key Management
(3) AH, ESP and IPComp handling in kernel

Note that KAME/OpenBSD does NOT include support for KAME IPsec code,
as the OpenBSD team has its home-brewed IPsec stack and has no plan
to replace it.  IPv6 support for IPsec is, therefore, lacking on KAME/OpenBSD.

4.1 Policy Management

The kernel implements experimental policy management code.  There are two ways
to manage security policy.  One is to configure per-socket policy using
setsockopt(2).  In this case, the policy configuration syntax is described in
ipsec_set_policy(3).  The other is to configure kernel packet filter-based
policy using the PF_KEY interface, via setkey(8).

Policy entries are matched in order, so the order of entries makes a
difference in behavior.
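To illustrate why order matters, the sketch below uses a deliberately simplified, hypothetical policy table. It does not use the KAME SPD structures. Each entry matches only an IPv4 destination prefix, and the first matching entry decides. The bypass default for unmatched traffic is also an assumption of the sketch.

```c
#include <stddef.h>
#include <stdint.h>

enum sp_action { SP_BYPASS, SP_IPSEC_REQUIRE, SP_DISCARD };

struct sp_entry {
	uint32_t dst;		/* destination prefix, host byte order */
	int plen;		/* prefix length, 0..32 */
	enum sp_action action;
};

static int
prefix_match(uint32_t addr, uint32_t pfx, int plen)
{
	uint32_t mask = plen == 0 ? 0 : 0xffffffffU << (32 - plen);

	return (addr & mask) == (pfx & mask);
}

/* Scan entries in order; the first match wins. */
static enum sp_action
sp_lookup(const struct sp_entry *spd, size_t n, uint32_t dst)
{
	size_t i;

	for (i = 0; i < n; i++)
		if (prefix_match(dst, spd[i].dst, spd[i].plen))
			return spd[i].action;
	return SP_BYPASS;	/* no matching entry: assumed default */
}
```

Consider an entry for 10.1.1.0/24 requiring IPsec placed before an entry discarding 10.0.0.0/8. Traffic to 10.1.1.1 is then protected. Swapping the two entries makes the same traffic match the discard entry first.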

4.2 Key Management

The key management code implemented in this kit (sys/netkey) is a
home-brewed PFKEY v2 implementation.  It conforms to RFC2367.

The home-brewed IKE daemon, "racoon", is included in the kit (kame/kame/racoon,
or usr.sbin/racoon).
Basically, you'll need to run racoon as a daemon, then set up a policy
to require keys (like ping -P 'out ipsec esp/transport//use').
The kernel will contact the racoon daemon as necessary to exchange keys.

In the IKE spec, there's ambiguity about the interpretation of a "tunnel"
proposal.  For example, if we would like to propose the use of the following
packet:
       IP AH ESP IP payload
some implementations propose it as "AH transport and ESP tunnel", since
this is more logical from the packet construction point of view.  Other
implementations propose it as "AH tunnel and ESP tunnel".
Racoon follows the former route.
This raises a real interoperability issue.  We hope this will be resolved
quickly.

4.3 AH and ESP handling

The IPsec module is implemented as "hooks" into the standard IPv4/IPv6
processing.  When sending a packet, ip{,6}_output() checks whether ESP/AH
processing is required by checking if a matching SPD (Security Policy
Database) entry is found.  If ESP/AH is needed,
{esp,ah}{4,6}_output() will be called and the mbuf will be updated
accordingly.  When a packet is received, {esp,ah}4_input() will be
called based on the protocol number, i.e. (*inetsw[proto])().
{esp,ah}4_input() will decrypt/check the authenticity of the packet,
and strip off the daisy-chained header and padding for ESP/AH.  It is
safe to strip off the ESP/AH header on packet reception, since we
will never use the received packet in "as is" form.

By using ESP/AH, the effective TCP4/6 data segment size will be affected by
the extra daisy-chained headers inserted by ESP/AH.  Our code takes care of
this case.
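As a worked example of that overhead, the sketch below computes how much TCP payload still fits into one IPv6 packet. It is arithmetic only, not KAME's code. It assumes:
- the RFC2402 (AH) and RFC2406 (ESP) header formats
- transport mode, with no TCP or IPv6 options
- illustrative parameters, such as an 8-octet IV and cipher block with a 12-octet HMAC-96 authenticator

```c
/*
 * Largest TCP payload (no TCP options) that fits in a single IPv6
 * packet of size mtu when AH (RFC2402) is applied in transport mode.
 * icv_len is the authenticator size, e.g. 12 for HMAC-96.
 */
static int
ah6_tcp_payload(int mtu, int icv_len)
{
	int ah_len = 12 + icv_len;	/* next hdr, len, reserved, SPI, seq */

	ah_len = (ah_len + 7) & ~7;	/* IPv6 AH is a multiple of 8 octets */
	return mtu - 40 - ah_len - 20;	/* IPv6 header, AH, TCP header */
}

/*
 * Same for ESP (RFC2406) in transport mode with a block cipher:
 * SPI and sequence number (8 octets), IV, encrypted data padded to the
 * cipher block size (at least 4), pad length and next header octets,
 * and an optional ICV.
 */
static int
esp6_tcp_payload(int mtu, int iv_len, int block, int icv_len)
{
	int room = mtu - 40 - 8 - iv_len - icv_len;
	int align = block < 4 ? 4 : block;

	room -= room % align;	/* ciphertext is a whole number of blocks */
	room -= 2;		/* pad length + next header octets */
	return room - 20;	/* TCP header without options */
}
```

With an MTU of 1500, ah6_tcp_payload(1500, 12) gives 1416 and esp6_tcp_payload(1500, 8, 8, 12) gives 1410. Without IPsec the figure is 1500 - 40 - 20 = 1440.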
1.1       itojun   1334:
                   1335: Basic crypto functions can be found in directory "sys/crypto".  ESP/AH
                   1336: transform are listed in {esp,ah}_core.c with wrapper functions.  If you
                   1337: wish to add some algorithm, add wrapper function in {esp,ah}_core.c, and
                   1338: add your crypto algorithm code into sys/crypto.
                   1339:
1.5       itojun   1340: Tunnel mode works basically fine, but comes with the following restrictions:
                   1341: - You cannot run routing daemon across IPsec tunnel, since we do not model
                   1342:   IPsec tunnel as pseudo interfaces.
1.1       itojun   1343: - Authentication model for AH tunnel must be revisited.  We'll need to
                   1344:   improve the policy management engine, eventually.
1.5       itojun   1345: - Tunnelling for IPv6 IPsec is still incomplete.  This is disabled by default.
                   1346:   If you need to perform experiments, add "options IPSEC_IPV6FWD" into
                   1347:   the kernel configuration file.  Note that path MTU discovery does not work
                   1348:   across IPv6 IPsec tunnel gateway due to insufficient code.

The AH specification does not say much about the "multiple AH on a packet"
case.  We compute AH checksums incrementally, from inside to outside.  Also,
we treat inner AHs as immutable.
For example, if we are to create the following packet:
        IP AH1 AH2 AH3 payload
we do it incrementally.  As a result, we get crypto checksums like below:
        AH3 has checksum against "IP AH3' payload".
                where AH3' = AH3 with checksum field filled with 0.
        AH2 has checksum against "IP AH2' AH3 payload".
        AH1 has checksum against "IP AH1' AH2 AH3 payload".
Also note that AH3 has the smallest sequence number, and AH1 has the largest
sequence number.
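
The inside-to-outside computation above can be modelled with the following
toy C sketch.  Assumptions, all hypothetical: struct toy_ah carries only a
sequence number and a checksum field, the IP header bytes are covered as-is
(real AH zeroes mutable IP fields first), and toy_digest is 32-bit FNV-1a
standing in for the keyed digest (HMAC-MD5 etc.), so it is NOT a
cryptographic checksum.  ah[0] is AH1 (outermost) and ah[n-1] is AH3
(innermost).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Toy stand-in for the keyed digest: 32-bit FNV-1a, NOT cryptographic. */
static uint32_t toy_digest(const uint8_t *buf, size_t len)
{
    uint32_t h = 2166136261u;           /* FNV-1a offset basis */
    for (size_t i = 0; i < len; i++) {
        h ^= buf[i];
        h *= 16777619u;                 /* FNV-1a prime */
    }
    return h;
}

/* Hypothetical, simplified AH: only sequence number and checksum. */
struct toy_ah {
    uint32_t seq;
    uint32_t icv;
};

#define TOY_MAXBUF 256

/* Build "IP AH[k]' AH[k+1] ... AH[n-1] payload" into out.  AH[k]' is AH[k]
 * with its checksum field zeroed; inner AHs are copied unchanged. */
static size_t toy_ah_coverage(uint8_t *out, const uint8_t *ip, size_t iplen,
                              const struct toy_ah *ah, size_t k, size_t n,
                              const uint8_t *pl, size_t pllen)
{
    size_t off = 0;

    assert(iplen + (n - k) * sizeof(struct toy_ah) + pllen <= TOY_MAXBUF);
    memcpy(out, ip, iplen);
    off += iplen;
    for (size_t i = k; i < n; i++) {
        struct toy_ah h = ah[i];
        if (i == k)
            h.icv = 0;                  /* own checksum field filled with 0 */
        memcpy(out + off, &h, sizeof(h));
        off += sizeof(h);
    }
    memcpy(out + off, pl, pllen);
    return off + pllen;
}

/* Output: innermost AH first, so it gets the smallest sequence number and
 * every outer AH covers the already-finished inner ones. */
static void toy_ah_output(const uint8_t *ip, size_t iplen, struct toy_ah *ah,
                          size_t n, const uint8_t *pl, size_t pllen,
                          uint32_t seq)
{
    uint8_t buf[TOY_MAXBUF];

    for (size_t k = n; k-- > 0; ) {
        ah[k].seq = seq++;
        ah[k].icv = 0;
        size_t len = toy_ah_coverage(buf, ip, iplen, ah, k, n, pl, pllen);
        ah[k].icv = toy_digest(buf, len);
    }
}

/* Input: recompute each checksum over the same bytes; 1 if all match. */
static int toy_ah_verify(const uint8_t *ip, size_t iplen,
                         const struct toy_ah *ah, size_t n,
                         const uint8_t *pl, size_t pllen)
{
    uint8_t buf[TOY_MAXBUF];

    for (size_t k = 0; k < n; k++) {
        size_t len = toy_ah_coverage(buf, ip, iplen, ah, k, n, pl, pllen);
        if (toy_digest(buf, len) != ah[k].icv)
            return 0;
    }
    return 1;
}
```

Calling toy_ah_output with three AHs and a starting sequence number of 100
gives AH3 sequence 100, AH2 101 and AH1 102, matching the ordering described
above; flipping any payload bit afterwards makes toy_ah_verify fail.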

4.4 IPComp handling

IPComp stands for IP Payload Compression Protocol.  It is aimed at payload
compression, not header compression like PPP VJ compression.
This may be useful when you are using a slow serial link (say, a cell phone)
with a powerful CPU (well, recent notebook PCs are really powerful...).
The protocol design of IPComp is very similar to IPsec, though it was
defined separately from IPsec itself.

Here are some points to be noted:
- IPComp is treated as part of the IPsec protocol suite, and the SPI and
  CPI spaces are unified.  The spec says that there is no relationship
  between the two, so they are assumed to be separate in the specs.
- The IPComp association (IPCA) is kept in the SAD.
- It is possible to use a well-known CPI (CPI=2 for DEFLATE, for example)
  for outbound/inbound packets, but for indexing purposes one element of
  the SPI/CPI space will be occupied anyway.
- pfkey is modified to support IPComp.  However, there is no official
  SA type number assignment yet.  Portability with other IPComp
  stacks is questionable (anyway, who else implements IPComp on UN*X?).
- The spec says that IPComp output processing must be performed before
  AH/ESP output processing, to achieve a better compression ratio and to
  "stir" the data stream before encryption.  The most meaningful processing
  order is:
  (1) compress payload by IPComp, (2) encrypt payload by ESP, then (3) attach
  authentication data by AH.
  However, with manual SPD settings, you are able to violate the ordering
  (KAME code is too generic, maybe).  Also, it is fine to use IPComp
  alone, without AH/ESP.
- Though the packet size can be significantly decreased by using IPComp, no
  special consideration is made about path MTU (the spec says nothing about
  MTU considerations).  IPComp seems to be designed for serial links, not
  ethernet-like media.
- You can change the compression ratio on outbound packets by changing
  deflate_policy in sys/netinet6/ipcomp_core.c.  You can also change the
  outbound history buffer size by changing deflate_window_out in the same
  source file.  (Should it be sysctl accessible, or per-SAD configurable?)
- Tunnel mode IPComp is not working right.  A KAME box can generate
  tunnelled IPComp packets, but cannot accept tunnelled IPComp packets.
- You can negotiate IPComp associations with the racoon IKE daemon.
- KAME code does not attach an Adler32 checksum to compressed data.
  See the ipsec wg mailing list discussion in Jan 2000 for details.
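
Since KAME itself lets a manual SPD setting violate the recommended
compress, encrypt, authenticate order, a configuration tool could check it
separately.  The following is a hypothetical C sketch of such a check; enum
xform, xform_rank and order_is_recommended are illustrative names, not KAME
code.  The chain lists transforms in the order they are applied to the
outbound payload.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Outbound transforms, in the order they are applied to the payload. */
enum xform { XF_IPCOMP, XF_ESP, XF_AH };

/* Position in the recommended order: compress, encrypt, authenticate. */
static int xform_rank(enum xform x)
{
    switch (x) {
    case XF_IPCOMP: return 0;
    case XF_ESP:    return 1;
    case XF_AH:     return 2;
    }
    assert(!"unknown transform");
    return -1;
}

/* True if no transform is applied after one that should come later,
 * e.g. compressing after encrypting.  IPComp alone is accepted. */
static bool order_is_recommended(const enum xform *chain, size_t n)
{
    for (size_t i = 1; i < n; i++)
        if (xform_rank(chain[i]) < xform_rank(chain[i - 1]))
            return false;
    return true;
}
```

A chain of IPComp, ESP, AH passes, ESP followed by IPComp is flagged
(compressing ciphertext gains nothing), and IPComp alone passes.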

4.5 Conformance to RFCs and IDs

The IPsec code in the kernel conforms (or tries to conform) to the
following standards:
    "old IPsec" specification documented in rfc182[5-9].txt
    "new IPsec" specification documented in rfc240[1-6].txt, rfc241[01].txt,
        rfc2451.txt and draft-mcdonald-simple-ipsec-api-01.txt (draft expired,
        but you can get it from ftp://ftp.kame.net/pub/internet-drafts/).
        (NOTE: IKE specifications, rfc240[7-9].txt, are implemented in
        userland as the "racoon" IKE daemon)
    IPComp:
        RFC2393: IP Payload Compression Protocol (IPComp)

Currently supported algorithms are:
    old IPsec AH
        null crypto checksum (no document, just for debugging)
        keyed MD5 with 128bit crypto checksum (rfc1828.txt)
        keyed SHA1 with 128bit crypto checksum (no document)
        HMAC MD5 with 128bit crypto checksum (rfc2085.txt)
        HMAC SHA1 with 128bit crypto checksum (no document)
    old IPsec ESP
        null encryption (no document, similar to rfc2410.txt)
        DES-CBC mode (rfc1829.txt)
    new IPsec AH
        null crypto checksum (no document, just for debugging)
        keyed MD5 with 96bit crypto checksum (no document)
        keyed SHA1 with 96bit crypto checksum (no document)
        HMAC MD5 with 96bit crypto checksum (rfc2403.txt)
        HMAC SHA1 with 96bit crypto checksum (rfc2404.txt)
    new IPsec ESP
        null encryption (rfc2410.txt)
        DES-CBC with derived IV
                (draft-ietf-ipsec-ciph-des-derived-01.txt, draft expired)
        DES-CBC with explicit IV (rfc2405.txt)
        3DES-CBC with explicit IV (rfc2451.txt)
        BLOWFISH CBC (rfc2451.txt)
        CAST128 CBC (rfc2451.txt)
        RC5 CBC (rfc2451.txt)
        each of the above can be combined with:
            ESP authentication with HMAC-MD5(96bit)
            ESP authentication with HMAC-SHA1(96bit)
    IPComp
        RFC2394: IP Payload Compression Using DEFLATE

The following algorithms are NOT supported:
    old IPsec AH
        HMAC MD5 with 128bit crypto checksum + 64bit replay prevention
                (rfc2085.txt)
        keyed SHA1 with 160bit crypto checksum + 32bit padding (rfc1852.txt)

The key/policy management API is based on the following document, with a
fair number of extensions:
        RFC2367: PF_KEY key management API

4.6 ECN consideration on IPsec tunnels

KAME IPsec implements an ECN-friendly IPsec tunnel, described in
draft-ietf-ipsec-ecn-02.txt.
The normal IPsec tunnel is described in RFC2401.  On encapsulation, the
IPv4 TOS field (or IPv6 traffic class field) is copied from the inner
IP header to the outer IP header.  On decapsulation, the outer IP header
is simply dropped.  The decapsulation rule is not compatible with ECN,
since the ECN bit on the outer IP TOS/traffic class field will be lost.
To make the IPsec tunnel ECN-friendly, we should modify the encapsulation
and decapsulation procedures.  This is described in
draft-ietf-ipsec-ecn-02.txt, chapter 3.3.

The KAME IPsec tunnel implementation can give you three behaviors, selected
by setting net.inet.ipsec.ecn (or net.inet6.ipsec6.ecn) to one of these values:
- RFC2401: no consideration for ECN (sysctl value -1)
- ECN forbidden (sysctl value 0)
- ECN allowed (sysctl value 1)
Note that the behavior is configurable per node, not per SA
(draft-ietf-ipsec-ecn-02 wants per-SA configuration, but it looks like too
much for me).

The behavior is summarized as follows (see source code for more detail):

                encapsulate                     decapsulate
                ---                             ---
RFC2401         copy all TOS bits               drop TOS bits on outer
                from inner to outer.            (use inner TOS bits as is)

ECN forbidden   copy TOS bits except for ECN    drop TOS bits on outer
                (masked with 0xfc) from inner   (use inner TOS bits as is)
                to outer.  set ECN bits to 0.

ECN allowed     copy TOS bits except for ECN    use inner TOS bits with some
                CE (masked with 0xfe) from      change.  if outer ECN CE bit
                inner to outer.                 is 1, enable ECN CE bit on
                set ECN CE bit to 0.            the inner.
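
The table can be restated as two small C functions.  This is a sketch under
stated assumptions, not the KAME source: it assumes the RFC2481 bit layout
implied by the masks above (ECT = 0x02, CE = 0x01 in the TOS byte), and the
names ecn_encap, ecn_decap and enum ecn_mode are hypothetical.  The enum
values match the sysctl values of net.inet.ipsec.ecn.

```c
#include <assert.h>
#include <stdint.h>

/* ECN bits in the TOS byte, assuming the RFC2481 layout implied above. */
#define TOS_ECT 0x02    /* ECN-capable transport */
#define TOS_CE  0x01    /* congestion experienced */

/* Values of net.inet.ipsec.ecn / net.inet6.ipsec6.ecn. */
enum ecn_mode {
    ECN_MODE_RFC2401   = -1,
    ECN_MODE_FORBIDDEN =  0,
    ECN_MODE_ALLOWED   =  1
};

/* Encapsulation: compute the outer TOS byte from the inner one. */
static uint8_t ecn_encap(enum ecn_mode mode, uint8_t inner)
{
    switch (mode) {
    case ECN_MODE_RFC2401:
        return inner;                                  /* copy all bits */
    case ECN_MODE_FORBIDDEN:
        return (uint8_t)(inner & ~(TOS_ECT | TOS_CE)); /* mask 0xfc */
    case ECN_MODE_ALLOWED:
        return (uint8_t)(inner & ~TOS_CE);             /* mask 0xfe */
    }
    assert(!"unknown ECN mode");
    return inner;
}

/* Decapsulation: the outer header is dropped; in "ECN allowed" mode a CE
 * mark set on the outer header is carried over to the inner one. */
static uint8_t ecn_decap(enum ecn_mode mode, uint8_t outer, uint8_t inner)
{
    if (mode == ECN_MODE_ALLOWED && (outer & TOS_CE))
        inner |= TOS_CE;
    return inner;
}
```

For an inner TOS of 0x13, ecn_encap yields 0x13 in RFC2401 mode, 0x10 in
"ECN forbidden" mode and 0x12 in "ECN allowed" mode.  On the way out, only
"ECN allowed" copies a CE mark from the outer header back into the inner one.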

The general strategy for configuration is as follows:
- if both IPsec tunnel endpoints are capable of ECN-friendly behavior,
  you'd better configure both ends to "ECN allowed" (sysctl value 1).
- if the other end is very strict about TOS bits, use "RFC2401"
  (sysctl value -1).
- in other cases, use "ECN forbidden" (sysctl value 0).
The default behavior is "ECN forbidden" (sysctl value 0).

For more information, please refer to:
        draft-ietf-ipsec-ecn-02.txt
        RFC2481 (Explicit Congestion Notification)
        KAME sys/netinet6/{ah,esp}_input.c

(Thanks go to Kenjiro Cho <kjc@csl.sony.co.jp> for the detailed analysis)

4.7 Interoperability

IPsec, IPComp (in kernel) and IKE (in userland as "racoon") have been tested
at several interoperability test events, and are known to interoperate well
with many other implementations.  Also, KAME IPsec has quite wide coverage
of the IPsec crypto algorithms documented in RFCs (we do not cover
algorithms with intellectual property issues, though).

Here are (some of) the platforms with which we have tested IPsec/IKE
interoperability in the past, in no particular order.  Note that both ends
(KAME and others) may have modified their implementations, so use the
following list for reference purposes only.
        Altiga, Ashley-laurent (vpcom.com), Data Fellows (F-Secure),
        BlueSteel, CISCO, Ericsson, ACC, Fitel, FreeS/WAN, HITACHI, IBM
        AIX, IIJ, Intel, Microsoft WinNT, NAI PGPnet,
        NIST (linux IPsec + plutoplus), Netscreen, OpenBSD isakmpd, Radguard,
        RedCreek, Routerware, SSH, Secure Computing, Soliton, Toshiba,
        TIS/NAI Gauntlet, VPNet, Yamaha RT100i

Here are (some of) the platforms with which we have tested IPComp/IKE
interoperability in the past, in no particular order.
        IRE

5. ALTQ

# removed since it has not been imported into NetBSD-current

6. mobile-ip6

# removed since it has not been imported into NetBSD-current

                                                <end of IMPLEMENTATION>
