Annotation of src/share/doc/papers/pulldown/1.t, Revision 1.2
1.2 ! simonb 1: .\" $Id: 1.t,v 1.1 2001/07/04 05:29:25 itojun Exp $
1.1 itojun 2: .\"
3: .\".ds RH 4.4BSD incompatibility with IPv6/IPsec packet processing
4: .NH 1
5: 4.4BSD incompatibility with IPv6/IPsec packet processing
6: .PP
7: The 4.4BSD network code holds a packet in a chain of ``mbuf'' structures.
8: Mbufs come in three flavors:
9: .IP \(sq
10: a non-cluster header mbuf, which holds up to MHLEN bytes
11: (100 bytes in a 32-bit architecture installation of 4.4BSD),
12: .IP \(sq
13: a non-cluster data mbuf, which holds up to MLEN bytes (104 bytes), and
14: .IP \(sq
15: a cluster mbuf, which holds up to MCLBYTES bytes (2048 bytes).
16: .LP
17: We can make a chain of mbuf structures as a linked list.
18: Mbuf chains will efficiently hold variable-length packet data.
19: Such chains also enable us to insert or remove
20: some of the packet data from the chain
21: without data copies.
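.PP
The chain idea can be sketched with a toy model.
The type \fItoy_mbuf\fP and both functions below are hypothetical
simplifications for illustration, not the 4.4BSD definitions:
each link only records where its valid bytes start and how many there are,
so trimming data from the front of a buffer is a pointer adjustment
rather than a copy.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for struct mbuf: one link of a chain. */
struct toy_mbuf {
    struct toy_mbuf *next;      /* next buffer of the same packet, or NULL */
    const unsigned char *data;  /* start of the valid bytes in this buffer */
    size_t len;                 /* number of valid bytes in this buffer */
};

/* Total packet length: the sum of the lengths of all buffers in the chain. */
size_t
toy_chain_length(const struct toy_mbuf *m)
{
    size_t total = 0;

    for (; m != NULL; m = m->next)
        total += m->len;
    return total;
}

/* Drop n leading bytes of one buffer by adjusting its view; nothing is copied. */
void
toy_trim_front(struct toy_mbuf *m, size_t n)
{
    assert(n <= m->len);
    m->data += n;
    m->len -= n;
}
```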
22: .PP
23: When processing inbound packets, 4.4BSD uses a function called
24: .I m_pullup
25: to ease the manipulation of data content in the mbufs.
26: It also uses a deep function call tree for inbound packet processing.
27: While these two items work just fine for traditional IPv4 processing,
28: they do not work as well with IPv6 and IPsec processing.
29: .NH 2
30: Restrictions in 4.4BSD m_pullup
31: .PP
32: For input packet processing,
33: the 4.4BSD network stack uses the
34: .I m_pullup
35: function to ease parsing
36: by rearranging the data in the mbufs so that it occupies a contiguous memory
37: region.
38: .I m_pullup
39: is defined as follows:
40: .DS
41: .SM
42: \f[CR]struct mbuf *
43: m_pullup(m, len)
44: struct mbuf *m;
45: int len;\fP
46: .DE
47: .NL
48: .I m_pullup
49: will ensure that the first
50: .I len
51: bytes in the packet
52: are placed in a contiguous memory region.
53: After a call to
54: .I m_pullup,
1.2 ! simonb 55: the caller can safely access the first
1.1 itojun 56: .I len
57: bytes of the packet, assuming that they are contiguous.
58: The caller can, for example, safely use pointers into
59: the contiguous region, as long as they point inside the
60: .I len
61: boundary.
62: .PP
63: .1C
64: .KS
65: .PS
66: box wid boxwid*1.2 "IPv6 header" "next = routing"
67: box same "routing header" "next = auth"
68: box same "auth header" "next = TCP"
69: box same "TCP header"
70: box same "TCP payload"
71: .PE
72: .ce
73: .nr figure +1
74: Figure \n[figure]: IPv6 extension header chain
75: .KE
76: .if t .2C
77: .I m_pullup
78: makes certain assumptions regarding protocol headers.
79: .I m_pullup
80: can only take
81: .I len
82: up to MHLEN.
83: If the total packet header length is longer than MHLEN,
84: .I m_pullup
85: will fail, and the result will be a loss of the packet.
86: Under IPv4,
87: .[
88: RFC791
89: .]
90: the length assumption worked fine in most cases,
91: since for almost every protocol, the total length of the protocol header part
92: was less than MHLEN.
93: Each packet has only two protocol headers, including the IPv4 header.
94: For example, the total length of the protocol header part of a TCP packet
95: (up to TCP data payload) is a maximum of 120 bytes.
96: Typically, this length is 40 to 48 bytes.
97: When IPv4 options are present, they are stripped off before TCP
98: header processing, so the length passed to
99: .I m_pullup
100: stays within the 100-byte MHLEN limit.
101: .IP 1
102: The IPv4 header occupies 20 bytes.
103: .IP 2
104: The IPv4 option occupies 40 bytes maximum.
105: It will be stripped off before we parse the TCP header.
106: Also note that the use of IPv4 options is very rare.
107: .IP 3
108: The TCP header length is 20 bytes.
109: .IP 4
110: The TCP option is 40 bytes maximum.
111: In most cases it is 0 to 8 bytes.
112: .LP
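.PP
The arithmetic behind the four items above can be written out as a short sketch.
The constant and function names are hypothetical;
the byte counts are the ones listed above,
and TOY_MHLEN is the 100-byte figure quoted earlier for a 32-bit installation.
The worst case with both kinds of options exceeds MHLEN,
while the case with IPv4 options stripped still fits.

```c
#include <assert.h>

enum {
    TOY_MHLEN   = 100,  /* header mbuf capacity quoted in the text */
    IP4_HDR_LEN = 20,   /* fixed IPv4 header */
    IP4_OPT_MAX = 40,   /* largest IPv4 option area */
    TCP_HDR_LEN = 20,   /* fixed TCP header */
    TCP_OPT_MAX = 40    /* largest TCP option area */
};

/* Header bytes in front of the TCP payload for the given option lengths. */
int
tcp4_header_bytes(int ip_optlen, int tcp_optlen)
{
    assert(ip_optlen >= 0 && ip_optlen <= IP4_OPT_MAX);
    assert(tcp_optlen >= 0 && tcp_optlen <= TCP_OPT_MAX);
    return IP4_HDR_LEN + ip_optlen + TCP_HDR_LEN + tcp_optlen;
}

/* Nonzero if a request of len bytes fits in one header mbuf. */
int
fits_in_header_mbuf(int len)
{
    return len <= TOY_MHLEN;
}
```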
113: .PP
114: The IPv6 specification
115: .[
116: RFC2460
117: .]
118: and the IPsec specification
119: .[
120: RFC2401
121: .]
122: allow more flexible use of protocol headers
123: by introducing chained extension headers.
124: With chained extension headers, each header carries a ``next header'' field.
125: A chain of headers can be made as shown
126: .nr figure +1
127: in Figure \n[figure].
128: .nr figure -1
129: The type of protocol header is determined by
130: inspecting the previous protocol header.
131: The specifications place no restriction on the number of extension headers.
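.PP
A minimal sketch of following such a chain is given below.
It assumes the buffer starts immediately after the fixed IPv6 header
and that \fIfirst\fP is the next header value taken from that header.
The function name and the PROTO_ constants are hypothetical labels;
the protocol numbers and length encodings follow RFC 2460 and RFC 2402.
Only hop-by-hop, routing, destination options and authentication headers
are walked; any other value, including fragment and ESP,
is simply reported as the end of the walk.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum {
    PROTO_HOPOPTS = 0,
    PROTO_TCP     = 6,
    PROTO_ROUTING = 43,
    PROTO_AH      = 51,
    PROTO_DSTOPTS = 60
};

/*
 * Walk the extension headers in buf. On success, store the protocol
 * number of the first header that is not walked in *proto and return
 * its byte offset. Return -1 if a header runs past buflen.
 */
long
skip_ext_headers(const uint8_t *buf, size_t buflen, uint8_t first,
    uint8_t *proto)
{
    size_t off = 0;
    uint8_t nxt = first;

    assert(buf != NULL && proto != NULL);
    for (;;) {
        size_t hlen;

        switch (nxt) {
        case PROTO_HOPOPTS:
        case PROTO_ROUTING:
        case PROTO_DSTOPTS:
            if (buflen - off < 2)
                return -1;
            /* length in 8-octet units, not counting the first 8 octets */
            hlen = ((size_t)buf[off + 1] + 1) * 8;
            break;
        case PROTO_AH:
            if (buflen - off < 2)
                return -1;
            /* length in 4-octet units, minus 2 */
            hlen = ((size_t)buf[off + 1] + 2) * 4;
            break;
        default:
            *proto = nxt;
            return (long)off;
        }
        if (hlen > buflen - off)
            return -1;
        nxt = buf[off];     /* the type of the next header lives in this one */
        off += hlen;
    }
}
```

Note that the type of each header is known only after the previous one
has been read, so the total header length cannot be known in advance.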
132: .PP
133: Because of extension header chains, there is now no upper limit on
134: the packet header length.
135: The
136: .I m_pullup
137: function would impose an unnecessary restriction
138: on extension header processing.
139: In addition,
140: with the introduction of IPsec, it is now impossible to strip off extension headers
141: during inbound packet processing.
142: All of the data in the packet must be retained if it is to be authenticated
143: using the Authentication Header.
144: .[
145: RFC2402
146: .]
147: Continuing the use of
148: .I m_pullup
149: will limit the
150: number of extension headers allowed on the packet,
151: and could jeopardize the possible usefulness of IPv6 extension headers. \**
152: .FS
153: In IPv4 days, the IPv4 options turned out to be unusable
154: due to a lack of implementation.
155: This was because most commercial products simply did not support IPv4 options.
156: .FE
157: .PP
158: Another problem related to
159: .I m_pullup
160: is that it tends to copy the protocol header even
161: when it is unnecessary to do so.
162: For example, consider the mbuf chain shown
163: .nr figure +1
164: in Figure \n[figure]:
165: .nr figure -1
166: .KS
167: .PS
168: define pointer { box ht boxht*1/4 }
169: define payload { box }
170: IP: [
171: IPp: pointer
172: IPd: payload with .n at bottom of IPp "IPv4"
173: ]
174: move
175: TCP: [
176: TCPp: pointer
177: TCPd: payload with .n at bottom of TCPp "TCP" "TCP payload"
178: ]
179: arrow from IP.IPp.center to TCP.TCPp.center
180: .PE
181: .ce
182: .nr figure +1
183: .nr beforepullup \n[figure]
184: Figure \n[figure]: mbuf chain before \fIm_pullup\fP
185: .KE
186: Here, the first mbuf contains an IPv4 header in a contiguous region,
187: and the second mbuf contains a TCP header in a contiguous region.
188: To look at the content of the TCP header
189: under 4.4BSD, the code looks like the following:
190: .DS
191: .SM
192: \f[CR]struct ip *ip;
193: struct tcphdr *th;
194: ip = mtod(m, struct ip *);
195: /* extra copy with m_pullup */
196: m = m_pullup(m, iphdrlen + tcphdrlen);
197: /* MUST reinit ip */
198: ip = mtod(m, struct ip *);
199: th = (struct tcphdr *)(mtod(m, caddr_t) + iphdrlen);\fP
200: .DE
201: .NL
202: As a result, we will get the mbuf chain shown in
203: .nr figure +1
204: Figure \n[figure].
205: .nr figure -1
206: .KF
207: .PS
208: define pointer { box ht boxht*1/4 }
209: define payload { box }
210: IP: [
211: IPp: pointer
212: IPd: payload with .n at bottom of IPp "IPv4" "TCP"
213: ]
214: move
215: TCP: [
216: TCPp: pointer
217: TCPd: payload with .n at bottom of TCPp "TCP payload"
218: ]
219: arrow from IP.IPp.center to TCP.TCPp.center
220: .PE
221: .ce
222: .nr figure +1
223: Figure \n[figure]: mbuf chain in figure \n[beforepullup] after \fIm_pullup\fP
224: .KE
225: Because
226: .I m_pullup
227: is only able to make a contiguous
228: region starting from the top of the mbuf chain,
229: it copies the TCP portion in the second mbuf
230: into the first mbuf.
231: The copy could be avoided if
232: .I m_pullup
233: were clever enough
234: to handle this case.
235: Also, the caller side is required to reinitialize all of
236: the pointers that point to the content of mbuf,
237: since after
238: .I m_pullup,
239: the first mbuf on the chain
240: .1C
241: .KS
242: .PS
243: ellipse "\fIip6_input\fP"
244: arrow
245: ellipse "\fIrthdr6_input\fP"
246: arrow
247: ellipse "\fIah_input\fP"
248: arrow "stack" "overflow"
249: ellipse "\fIesp_input\fP"
250: arrow
251: ellipse "\fItcp_input\fP"
252: .PE
253: .ce
254: Figure 5: an excessively deep call chain can cause kernel stack overflow
255: .KE
256: .if t .2C
257: .LP
258: may be reallocated and live at
259: a different address than before.
260: While the
261: .I m_pullup
262: design has provided simplicity in packet parsing,
263: it is disadvantageous for protocols like IPv6.
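.PP
To make the avoidable copy concrete, the sketch below checks whether a byte
range already lies inside a single buffer and, if so, hands back a pointer
to it without touching the chain.
This is an illustration under assumptions, not 4.4BSD code:
\fItoy_mbuf\fP and \fItoy_find_contiguous\fP are hypothetical names.
For the chain shown before the
.I m_pullup
call, the TCP header at offset \fIiphdrlen\fP is such a range.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for struct mbuf. */
struct toy_mbuf {
    struct toy_mbuf *next;
    unsigned char *data;
    size_t len;
};

/*
 * If bytes [off, off + len) of the packet lie inside a single buffer,
 * return a pointer to them; no data is copied and no buffer moves,
 * so pointers the caller already holds stay valid.
 * Return NULL if the range straddles buffers or runs past the chain;
 * a real implementation would have to copy in that case.
 */
unsigned char *
toy_find_contiguous(struct toy_mbuf *m, size_t off, size_t len)
{
    assert(len > 0);
    while (m != NULL && off >= m->len) {
        off -= m->len;      /* skip buffers that end before the range */
        m = m->next;
    }
    if (m == NULL || len > m->len - off)
        return NULL;
    return m->data + off;
}
```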
264: .PP
265: The problems can be summarized as follows:
266: (1)
267: .I m_pullup
268: imposes too strong a restriction
269: on the total length of the packet header (MHLEN);
270: (2)
271: .I m_pullup
272: makes an extra copy even when this can be avoided; and
273: (3)
274: .I m_pullup
275: requires the caller to reinitialize all of the pointers into the mbuf chain.
276: .NH 2
277: Protocol header processing with a deep function call chain
278: .PP
279: Under 4.4BSD, protocol header processing will make a chain of function calls.
280: For example, if we have an IPv4 TCP packet, the following function call chain will be made
281: .nr figure +1
282: (see Figure \n[figure]):
283: .nr figure -1
284: .IP (1)
285: .I ipintr
286: will be called from the network software interrupt logic,
287: .IP (2)
288: .I ipintr
289: processes the IPv4 header, then calls
290: .I tcp_input.
291: .\".I ipintr
292: .\"can be called
293: .\".I ip_input
294: .\"from its functionality.
295: .IP (3)
296: .I tcp_input
297: will process the TCP header and pass the data payload
298: to the socket queues.
299: .LP
300: .KF
301: .PS
302: ellipse "\fIipintr\fP"
303: arrow
304: ellipse "\fItcp_input\fP"
305: .PE
306: .ce
307: .nr figure +1
308: Figure \n[figure]: function call chain in IPv4 inbound packet processing
309: .KE
310: .PP
311: If chained extension headers are handled as described above,
312: the kernel stack can overflow because of the deep function call chain, as shown in
313: .nr figure +1
314: Figure \n[figure].
315: .nr figure -1
316: .nr figure +1
317: The IPv6/IPsec specifications do not define any upper limit
318: to the number of extension headers on a packet,
319: so a malicious party can transmit a ``legal'' packet with a large number of chained
320: headers in order to attack IPv6/IPsec implementations.
321: We have experienced kernel stack overflow in IPsec code,
322: tunnelled packet processing code, and in several other cases.
323: The IPsec processing routines tend to use a large chunk of memory
324: on the kernel stack, in order to hold intermediate data and the secret keys
325: used for encryption. \**
326: .FS
327: For example, blowfish encryption processing code typically uses
328: an intermediate data region of 4K or more.
329: With a typical 4.4BSD installation on the i386 architecture,
330: the kernel stack region occupies less than 8K bytes and does not grow on demand.
331: .FE
332: We cannot move the intermediate data region into a static data region outside of
333: the kernel stack,
334: because the locking this would require would degrade performance
335: on multiprocessors.
336: .PP
337: Even though the IPv6 specifications do not define any restrictions
338: on the number of extension headers, it may be possible
339: to impose an additional restriction in an IPv6 implementation for safety.
340: In any case, it is not possible to estimate the amount of
341: kernel stack that will be used by protocol handlers.
342: We need a better calling convention for IPv6/IPsec header processing,
343: regardless of any limit on the number of extension headers we may impose.
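.PP
The growth of stack usage with the number of headers can be seen in a toy model.
The functions below are hypothetical and deliberately crude:
each header handler is assumed to call the next one before returning,
and every handler is assumed to cost the same number of stack bytes,
which real handlers do not.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Nesting depth reached when the handler for each header calls the
 * handler for the following header before it returns: one live frame
 * per header on the packet.
 */
size_t
toy_nested_depth(size_t nheaders)
{
    if (nheaders == 0)
        return 0;
    return 1 + toy_nested_depth(nheaders - 1);
}

/* Stack bytes needed if every frame costs frame_bytes (an assumption). */
size_t
toy_stack_needed(size_t nheaders, size_t frame_bytes)
{
    size_t depth = toy_nested_depth(nheaders);

    /* guard the multiplication against size_t overflow */
    assert(frame_bytes == 0 || depth <= (size_t)-1 / frame_bytes);
    return depth * frame_bytes;
}

/* Nonzero if the nested calls would not fit in stack_bytes. */
int
toy_overflows(size_t nheaders, size_t frame_bytes, size_t stack_bytes)
{
    return toy_stack_needed(nheaders, frame_bytes) > stack_bytes;
}
```

With frames of 4K bytes and an 8K stack, the figures used in the footnote above,
this model already overflows at three nested handlers.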