src/share/doc/papers/pulldown/1.t - annotate

Annotation of src/share/doc/papers/pulldown/1.t, Revision 1.2

1.2     ! simonb      1: .\"    $Id: 1.t,v 1.1 2001/07/04 05:29:25 itojun Exp $
1.1       itojun      2: .\"
                      3: .\".ds RH 4.4BSD incompatibility with IPv6/IPsec packet processing
                      4: .NH 1
                      5: 4.4BSD incompatibility with IPv6/IPsec packet processing
                      6: .PP
                      7: The 4.4BSD network code holds a packet in a chain of ``mbuf'' structures.
                      8: An mbuf comes in one of three flavors:
                      9: .IP \(sq
                     10: non-cluster header mbuf, which holds up to MHLEN bytes
                     11: (100 bytes in a 32-bit architecture installation of 4.4BSD),
                     12: .IP \(sq
                     13: non-cluster data mbuf, which holds up to MLEN bytes (104 bytes), and
                     14: .IP \(sq
                     15: cluster mbuf, which holds up to MCLBYTES bytes (2048 bytes).
                     16: .LP
                     17: We can make a chain of mbuf structures as a linked list.
                     18: Mbuf chains will efficiently hold variable-length packet data.
                     19: Such chains also enable us to insert or remove
                     20: some of the packet data from the chain
                     21: without data copies.
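The linked-list idea above can be sketched in userland C. This is only an illustration: struct toy_mbuf, chain_length and chain_insert_after are hypothetical names, not the kernel's struct mbuf or its API, and the inline buffer size is arbitrary.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for the kernel's struct mbuf. */
struct toy_mbuf {
    struct toy_mbuf *next;  /* next buffer belonging to the same packet */
    size_t len;             /* bytes of valid data held in this buffer */
    unsigned char data[100];/* inline storage; the size is illustrative */
};

/* The packet length is the sum of the per-buffer lengths. */
static size_t
chain_length(const struct toy_mbuf *m)
{
    size_t total = 0;

    for (; m != NULL; m = m->next)
        total += m->len;
    return total;
}

/* Insert buffer n after prev by relinking only; no packet data is copied. */
static void
chain_insert_after(struct toy_mbuf *prev, struct toy_mbuf *n)
{
    assert(prev != NULL && n != NULL);
    n->next = prev->next;
    prev->next = n;
}
```

Inserting or removing data in the middle of a packet touches only the next pointers, which is why the chain avoids data copies.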
                     22: .PP
                     23: When processing inbound packets, 4.4BSD uses a function called
                     24: .I m_pullup
                     25: to ease the manipulation of data content in the mbufs.
                     26: It also uses a deep function call tree for inbound packet processing.
                     27: While these two items work just fine for traditional IPv4 processing,
                     28: they do not work as well with IPv6 and IPsec processing.
                     29: .NH 2
                     30: Restrictions in 4.4BSD m_pullup
                     31: .PP
                     32: For input packet processing,
                     33: the 4.4BSD network stack uses the
                     34: .I m_pullup
                     35: function to ease parsing efforts
                     36: by rearranging the data in the mbufs so that it occupies a continuous memory
                     37: region.
                     38: .I m_pullup
                     39: is defined as follows:
                     40: .DS
                     41: .SM
                     42: \f[CR]struct mbuf *
                     43: m_pullup(m, len)
                     44:        struct mbuf *m;
                     45:        int len;\fP
                     46: .DE
                     47: .NL
                     48: .I m_pullup
                     49: will ensure that the first
                     50: .I len
                     51: bytes in the packet
                     52: are placed in a continuous memory region.
                     53: After a call to
                     54: .I m_pullup,
1.2     ! simonb     55: the caller can safely access the first
1.1       itojun     56: .I len
                     57: bytes of the packet, assuming that they are continuous.
                     58: The caller can, for example, safely use pointer variables into
                     59: the continuous region, as long as they point inside the
                     60: .I len
                     61: boundary.
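The contract described above can be modelled with a small userland sketch. It is not the 4.4BSD implementation: toy_mbuf and toy_pullup are hypothetical names, the data always starts at the beginning of the buffer, and nothing is ever allocated or freed (the kernel routine frees the chain when it fails). TOY_MHLEN reuses the 100-byte MHLEN value quoted earlier.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define TOY_MHLEN 100   /* the MHLEN value quoted in the text */

/* Hypothetical userland stand-in for struct mbuf. */
struct toy_mbuf {
    struct toy_mbuf *next;
    size_t len;                     /* valid bytes in buf */
    unsigned char buf[TOY_MHLEN];
};

/*
 * Make the first len bytes of the chain contiguous in the first buffer.
 * Returns the head on success and NULL on failure.  Unlike the kernel
 * routine, this toy never frees or reallocates anything.
 */
static struct toy_mbuf *
toy_pullup(struct toy_mbuf *m, size_t len)
{
    struct toy_mbuf *n;
    size_t need, take;

    assert(m != NULL);
    if (len > TOY_MHLEN)
        return NULL;                /* request exceeds one header mbuf */
    while (m->len < len) {
        n = m->next;
        if (n == NULL)
            return NULL;            /* chain is shorter than len */
        need = len - m->len;
        take = need < n->len ? need : n->len;
        memcpy(m->buf + m->len, n->buf, take);   /* the data copy */
        m->len += take;
        n->len -= take;
        memmove(n->buf, n->buf + take, n->len);  /* keep the remainder */
        if (n->len == 0)
            m->next = n->next;      /* unlink the drained buffer */
    }
    return m;
}
```

After a successful call, a pointer into the first buffer may be used for the first len bytes; a request larger than TOY_MHLEN fails no matter how the chain is laid out.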
                     62: .PP
                     63: .1C
                     64: .KS
                     65: .PS
                     66: box wid boxwid*1.2 "IPv6 header" "next = routing"
                     67: box same "routing header" "next = auth"
                     68: box same "auth header" "next = TCP"
                     69: box same "TCP header"
                     70: box same "TCP payload"
                     71: .PE
                     72: .ce
                     73: .nr figure +1
                     74: Figure \n[figure]: IPv6 extension header chain
                     75: .KE
                     76: .if t .2C
                     77: .I m_pullup
                     78: makes certain assumptions regarding protocol headers.
                     79: .I m_pullup
                     80: can only take
                     81: .I len
                     82: up to MHLEN.
                     83: If the total packet header length is longer than MHLEN,
                     84: .I m_pullup
                     85: will fail, and the result will be a loss of the packet.
                     86: Under IPv4,
                     87: .[
                     88: RFC791
                     89: .]
                     90: the length assumption worked fine in most cases,
                     91: since for almost every protocol, the total length of the protocol header part
                     92: was less than MHLEN.
                     93: Each packet has only two protocol headers, including the IPv4 header.
                     94: For example, the total length of the protocol header part of a TCP packet
                     95: (up to TCP data payload) is a maximum of 120 bytes.
                     96: Typically, this length is 40 to 48 bytes.
                     97: When an IPv4 option is present, it is stripped off before TCP
                     98: header processing, and the maximum length passed to
                     99: .I m_pullup
                    100: will be 100.
                    101: .IP 1
                    102: The IPv4 header occupies 20 bytes.
                    103: .IP 2
                    104: The IPv4 option occupies 40 bytes maximum.
                    105: It will be stripped off before we parse the TCP header.
                    106: Also note that the use of IPv4 options is very rare.
                    107: .IP 3
                    108: The TCP header length is 20 bytes.
                    109: .IP 4
                    110: The TCP option is 40 bytes maximum.
                    111: In most cases it is 0 to 8 bytes.
                    112: .LP
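The byte counts in the list above can be added up mechanically. The sketch below is plain arithmetic on those figures; the constant and function names are made up for illustration, and MHLEN_32BIT is the 100-byte value quoted earlier.

```c
#include <assert.h>

enum {
    IPV4_HDR_LEN = 20,  /* fixed IPv4 header */
    IPV4_OPT_MAX = 40,  /* maximum IPv4 option length */
    TCP_HDR_LEN  = 20,  /* fixed TCP header */
    TCP_OPT_MAX  = 40,  /* maximum TCP option length */
    MHLEN_32BIT  = 100  /* MHLEN as quoted in the text */
};

/* Total protocol header bytes in front of the TCP payload. */
static int
tcp4_header_bytes(int ip_opt_len, int tcp_opt_len)
{
    assert(ip_opt_len >= 0 && ip_opt_len <= IPV4_OPT_MAX);
    assert(tcp_opt_len >= 0 && tcp_opt_len <= TCP_OPT_MAX);
    return IPV4_HDR_LEN + ip_opt_len + TCP_HDR_LEN + tcp_opt_len;
}
```

With no options the total is 40 bytes, with 8 bytes of TCP options it is 48, and the worst case of 120 bytes is larger than MHLEN_32BIT, which is why it matters that IPv4 options are removed before the TCP header is parsed.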
                    113: .PP
                    114: The IPv6 specification
                    115: .[
                    116: RFC2460
                    117: .]
                    118: and the IPsec specification
                    119: .[
                    120: RFC2401
                    121: .]
                    122: allow more flexible use of protocol headers
                    123: by introducing chained extension headers.
                    124: With chained extension headers, each header has a ``next header'' field in it.
                    125: A chain of headers can be made as shown
                    126: .nr figure +1
                    127: in Figure \n[figure].
                    128: .nr figure -1
                    129: The type of protocol header is determined by
                    130: inspecting the previous protocol header.
                    131: The specifications place no restriction on the number of extension headers.
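To make the chaining concrete, here is a minimal sketch of a walker that follows the next header fields through a packet held in one contiguous array. The function name and interface are hypothetical. It parses only the hop-by-hop, routing, destination options and authentication headers, whose length encodings follow RFC 2460 and RFC 2402; any other next header value, including extension headers this sketch does not parse, simply ends the walk.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* IANA protocol numbers used as Next Header values. */
#define NH_HOPOPTS   0
#define NH_TCP       6
#define NH_ROUTING  43
#define NH_AH       51
#define NH_DSTOPTS  60

#define IPV6_HDR_LEN 40

/*
 * Follow the header chain in pkt.  Returns the first Next Header value
 * this sketch does not parse (normally the upper-layer protocol), or -1
 * if the packet is truncated.  On success *offp is the offset of that
 * header and *countp the number of extension headers skipped.
 */
static int
walk_ext_headers(const uint8_t *pkt, size_t pktlen, size_t *offp, int *countp)
{
    size_t off = IPV6_HDR_LEN, hlen;
    int nh, count = 0;

    assert(pkt != NULL && offp != NULL && countp != NULL);
    if (pktlen < IPV6_HDR_LEN)
        return -1;
    nh = pkt[6];                    /* Next Header field of the IPv6 header */
    for (;;) {
        switch (nh) {
        case NH_HOPOPTS:
        case NH_ROUTING:
        case NH_DSTOPTS:
            if (pktlen - off < 2)
                return -1;
            /* length in 8-octet units, not counting the first 8 octets */
            hlen = ((size_t)pkt[off + 1] + 1) * 8;
            break;
        case NH_AH:
            if (pktlen - off < 2)
                return -1;
            /* length in 4-octet units, minus 2 */
            hlen = ((size_t)pkt[off + 1] + 2) * 4;
            break;
        default:
            *offp = off;
            *countp = count;
            return nh;
        }
        if (hlen > pktlen - off)
            return -1;              /* header runs past the end */
        nh = pkt[off];              /* type of the header that follows */
        off += hlen;
        count++;
    }
}
```

Nothing in the loop bounds the number of iterations except the packet length, which mirrors the absence of a limit in the specifications.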
                    132: .PP
                    133: Because of extension header chains, there is now no upper limit on
                    134: the total length of the protocol headers in a packet.
                    135: The
                    136: .I m_pullup
                    137: function would impose an unnecessary restriction
                    138: on extension header processing.
                    139: In addition,
                    140: with the introduction of IPsec, it is now impossible to strip off extension headers
                    141: during inbound packet processing.
                    142: All of the data in the packet must be retained if it is to be authenticated
                    143: using the Authentication Header.
                    144: .[
                    145: RFC2402
                    146: .]
                    147: Continuing the use of
                    148: .I m_pullup
                    149: will limit the
                    150: number of extension headers allowed on the packet,
                    151: and could jeopardize the possible usefulness of IPv6 extension headers. \**
                    152: .FS
                    153: In IPv4 days, the IPv4 options turned out to be unusable
                    154: due to a lack of implementation.
                    155: This was because most commercial products simply did not support IPv4 options.
                    156: .FE
                    157: .PP
                    158: Another problem related to
                    159: .I m_pullup
                    160: is that it tends to copy the protocol header even
                    161: when it is unnecessary to do so.
                    162: For example, consider the mbuf chain shown
                    163: .nr figure +1
                    164: in Figure \n[figure]:
                    165: .nr figure -1
                    166: .KS
                    167: .PS
                    168: define pointer { box ht boxht*1/4 }
                    169: define payload { box }
                    170: IP: [
                    171:        IPp: pointer
                    172:        IPd: payload with .n at bottom of IPp "IPv4"
                    173: ]
                    174: move
                    175: TCP: [
                    176:        TCPp: pointer
                    177:        TCPd: payload with .n at bottom of TCPp "TCP" "TCP payload"
                    178: ]
                    179: arrow from IP.IPp.center to TCP.TCPp.center
                    180: .PE
                    181: .ce
                    182: .nr figure +1
                    183: .nr beforepullup \n[figure]
                    184: Figure \n[figure]: mbuf chain before \fIm_pullup\fP
                    185: .KE
                    186: Here, the first mbuf contains an IPv4 header in the continuous region,
                    187: and the second mbuf contains a TCP header in the continuous region.
                    188: To examine the contents of the TCP header under 4.4BSD,
                    189: the code looks like the following:
                    190: .DS
                    191: .SM
                    192: \f[CR]struct ip *ip;
                    193: struct tcphdr *th;
                    194: ip = mtod(m, struct ip *);
                    195: /* extra copy with m_pullup */
                    196: m = m_pullup(m, iphdrlen + tcphdrlen);
                    197: /* MUST reinit ip; m may have moved */
                    198: ip = mtod(m, struct ip *);
                    199: th = (struct tcphdr *)(mtod(m, caddr_t) + iphdrlen);\fP
                    200: .NL
                    201: .DE
                    202: As a result, we get the mbuf chain shown in
                    203: .nr figure +1
                    204: Figure \n[figure].
                    205: .nr figure -1
                    206: .KF
                    207: .PS
                    208: define pointer { box ht boxht*1/4 }
                    209: define payload { box }
                    210: IP: [
                    211:        IPp: pointer
                    212:        IPd: payload with .n at bottom of IPp "IPv4" "TCP"
                    213: ]
                    214: move
                    215: TCP: [
                    216:        TCPp: pointer
                    217:        TCPd: payload with .n at bottom of TCPp "TCP payload"
                    218: ]
                    219: arrow from IP.IPp.center to TCP.TCPp.center
                    220: .PE
                    221: .ce
                    222: .nr figure +1
                    223: Figure \n[figure]: mbuf chain in Figure \n[beforepullup] after \fIm_pullup\fP
                    224: .KE
                    225: Because
                    226: .I m_pullup
                    227: is only able to make a continuous
                    228: region starting from the top of the mbuf chain,
                    229: it copies the TCP portion in the second mbuf
                    230: into the first mbuf.
                    231: The copy could be avoided if
                    232: .I m_pullup
                    233: were clever enough
                    234: to handle this case.
                    235: Also, the caller side is required to reinitialize all of
                    236: the pointers that point into the contents of the mbuf chain,
                    237: since after
                    238: .I m_pullup,
                    239: the first mbuf on the chain
                    240: .1C
                    241: .KS
                    242: .PS
                    243: ellipse "\fIip6_input\fP"
                    244: arrow
                    245: ellipse "\fIrthdr6_input\fP"
                    246: arrow
                    247: ellipse "\fIah_input\fP"
                    248: arrow "stack" "overflow"
                    249: ellipse "\fIesp_input\fP"
                    250: arrow
                    251: ellipse "\fItcp_input\fP"
                    252: .PE
                    253: .ce
                    254: Figure 5: an excessively deep call chain can cause kernel stack overflow
                    255: .KE
                    256: .if t .2C
                    257: .LP
                    258: can be reallocated and may then live at
                    259: a different address than before.
                    260: While the
                    261: .I m_pullup
                    262: design has provided simplicity in packet parsing,
                    263: it is disadvantageous for protocols like IPv6.
                    264: .PP
                    265: The problems can be summarized as follows:
                    266: (1)
                    267: .I m_pullup
                    268: imposes too strong a restriction
                    269: on the total length of the packet header (MHLEN);
                    270: (2)
                    271: .I m_pullup
                    272: makes an extra copy even when this can be avoided; and
                    273: (3)
                    274: .I m_pullup
                    275: requires the caller to reinitialize all of the pointers into the mbuf chain.
                    276: .NH 2
                    277: Protocol header processing with a deep function call chain
                    278: .PP
                    279: Under 4.4BSD, protocol header processing will make a chain of function calls.
                    280: For example, if we have an IPv4 TCP packet, the following function call chain will be made
                    281: .nr figure +1
                    282: (see Figure \n[figure]):
                    283: .nr figure -1
                    284: .IP (1)
                    285: .I ipintr
                    286: will be called from the network software interrupt logic,
                    287: .IP (2)
                    288: .I ipintr
                    289: processes the IPv4 header, then calls
                    290: .I tcp_input.
                    291: .\".I ipintr
                    292: .\"can be called
                    293: .\".I ip_input
                    294: .\"from its functionality.
                    295: .IP (3)
                    296: .I tcp_input
                    297: will process the TCP header and pass the data payload
                    298: to the socket queues.
                    299: .LP
                    300: .KF
                    301: .PS
                    302: ellipse "\fIipintr\fP"
                    303: arrow
                    304: ellipse "\fItcp_input\fP"
                    305: .PE
                    306: .ce
                    307: .nr figure +1
                    308: Figure \n[figure]: function call chain in IPv4 inbound packet processing
                    309: .KE
                    310: .PP
                    311: If chained extension headers are handled as described above,
                    312: the kernel stack can overflow because of the deep function call chain, as shown in
                    313: .nr figure +1
                    314: Figure \n[figure].
                    315: .nr figure -1
                    316: .nr figure +1
                    317: The IPv6/IPsec specifications do not define any upper limit
                    318: to the number of extension headers on a packet,
                    319: so a malicious party can transmit a ``legal'' packet with a large number of chained
                    320: headers in order to attack IPv6/IPsec implementations.
                    321: We have experienced kernel stack overflow in IPsec code,
                    322: tunnelled packet processing code, and in several other cases.
                    323: The IPsec processing routines tend to use a large chunk of memory
                    324: on the kernel stack, in order to hold intermediate data and the secret keys
                    325: used for encryption. \**
                    326: .FS
                    327: For example, blowfish encryption processing code typically uses
                    328: an intermediate data region of 4K or more.
                    329: With a typical 4.4BSD installation on the i386 architecture,
                    330: the kernel stack region occupies less than 8K bytes and does not grow on demand.
                    331: .FE
                    332: We cannot put the intermediate data region into a static data region outside of
                    333: the kernel stack,
                    334: because it would hurt performance on multiprocessors
                    335: due to data locking.
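A back-of-the-envelope model shows how quickly nested handlers can exhaust the stack. The sketch assumes that every header's handler calls the next one, so all frames are live at once. KSTACK_BYTES and IPSEC_FRAME are taken from the footnote's figures as round numbers, PLAIN_FRAME is a purely hypothetical frame size for non-IPsec handlers, and the function names are made up.

```c
#include <assert.h>
#include <stddef.h>

#define KSTACK_BYTES 8192   /* footnote: less than 8K; used as an upper bound */
#define IPSEC_FRAME  4096   /* footnote: intermediate region of 4K or more */
#define PLAIN_FRAME   128   /* hypothetical frame size for other handlers */

/*
 * Estimated stack bytes in use when handler i calls handler i + 1 for
 * every header on the packet.  is_ipsec[i] is nonzero for handlers that
 * keep IPsec intermediate data on the stack.
 */
static size_t
nested_stack_bytes(const int *is_ipsec, size_t nheaders)
{
    size_t i, total = 0;

    assert(is_ipsec != NULL || nheaders == 0);
    for (i = 0; i < nheaders; i++)
        total += is_ipsec[i] ? IPSEC_FRAME : PLAIN_FRAME;
    return total;
}

/* Nonzero if the estimate exceeds the assumed kernel stack size. */
static int
would_overflow(const int *is_ipsec, size_t nheaders)
{
    return nested_stack_bytes(is_ipsec, nheaders) > KSTACK_BYTES;
}
```

Under these assumptions the two-function IPv4 chain uses a few hundred bytes, while a five-function chain containing two IPsec handlers already exceeds the assumed stack size.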
                    336: .PP
                    337: Even though the IPv6 specifications do not define any restrictions
                    338: on the number of extension headers, it may be possible
                    339: to impose an additional restriction in an IPv6 implementation for safety.
                    340: In any case, it is not possible to estimate the amount of
                    341: kernel stack that will be used by protocol handlers.
                    342: We need a better calling convention for IPv6/IPsec header processing,
                    343: regardless of the limits in the number of extension headers we may impose.