Revision 1.2, Fri Apr 23 02:58:27 2004 UTC (18 years, 7 months ago) by simonb
.\"	$Id: 1.t,v 1.2 2004/04/23 02:58:27 simonb Exp $
.\"
.\".ds RH 4.4BSD incompatibility with IPv6/IPsec packet processing
.NH 1
4.4BSD incompatibility with IPv6/IPsec packet processing
.PP
The 4.4BSD network code holds a packet in a chain of ``mbuf'' structures.
Each mbuf comes in one of three flavors:
.IP \(sq
non-cluster header mbuf, which holds MHLEN
(100 bytes on a 32-bit installation of 4.4BSD),
.IP \(sq
non-cluster data mbuf, which holds MLEN (108 bytes), and
.IP \(sq
cluster mbuf, which holds MCLBYTES (2048 bytes).
.LP
Mbuf structures can be chained together as a linked list.
Mbuf chains hold variable-length packet data efficiently,
and they allow part of the packet data to be inserted into
or removed from the chain
without copying data.
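.PP
The chaining can be illustrated with a minimal user-space sketch.
It assumes a hypothetical, simplified \fIstruct mini_mbuf\fP holding only a next pointer, a data pointer and a length;
the real 4.4BSD \fIstruct mbuf\fP carries more fields, such as the type, the flags and the packet header.
Adding or removing a header only relinks nodes, so the payload bytes are never copied.
.DS
.SM
.ft CR
```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for the 4.4BSD struct mbuf. */
struct mini_mbuf {
	struct mini_mbuf *m_next;	/* next mbuf in the chain, or NULL */
	char *m_data;			/* start of valid data */
	size_t m_len;			/* bytes of valid data */
};

/* Packet length is the sum of m_len over the whole chain. */
size_t
chain_len(const struct mini_mbuf *m)
{
	size_t total = 0;

	for (; m != NULL; m = m->m_next)
		total += m->m_len;
	return total;
}

/* Insert a header in front of the chain by linking, not copying. */
struct mini_mbuf *
chain_prepend(struct mini_mbuf *hdr, struct mini_mbuf *m)
{
	hdr->m_next = m;
	return hdr;
}

/* Remove the first mbuf (e.g. a consumed header); nothing is copied. */
struct mini_mbuf *
chain_strip_first(struct mini_mbuf *m)
{
	struct mini_mbuf *rest = m->m_next;

	m->m_next = NULL;
	return rest;
}
```
.ft P
.DE
.NL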
.PP
When processing inbound packets, 4.4BSD uses a function called
.I m_pullup
to ease the manipulation of data content in the mbufs.
It also processes inbound protocol headers through a deep chain of nested function calls.
While these two design choices work well for traditional IPv4 processing,
they do not work as well for IPv6 and IPsec processing.
.NH 2
Restrictions in 4.4BSD m_pullup
.PP
For input packet processing,
the 4.4BSD network stack uses the
.I m_pullup
function to ease parsing,
by rearranging the data in the mbufs so that it lies in a contiguous memory
region.
.I m_pullup
is defined as follows:
.DS
.SM
\f[CR]struct mbuf *
m_pullup(m, len)
	struct mbuf *m;
	int len;\fP
.DE
.NL
.I m_pullup
ensures that the first
.I len
bytes of the packet
are placed in a contiguous memory region.
After a call to
.I m_pullup,
the caller can safely access the first
.I len
bytes of the packet as a contiguous region.
The caller can, for example, safely use pointer variables into
that region, as long as they point inside the
.I len
boundary.
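.PP
The contract can be approximated in user space as follows.
This is a hypothetical, simplified \fImini_pullup\fP, not the 4.4BSD code:
it assumes a capacity \fIMINI_MHLEN\fP of 100 bytes standing in for MHLEN,
always reuses the first mbuf's internal buffer instead of allocating a new mbuf,
and, unlike the real \fIm_pullup\fP, does not free the chain when it fails.
.DS
.SM
.ft CR
```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MINI_MHLEN 100	/* assumed stand-in for MHLEN */

/* Hypothetical, simplified mbuf with internal storage. */
struct mini_mbuf {
	struct mini_mbuf *m_next;	/* next mbuf, or NULL */
	char *m_data;			/* start of valid data */
	size_t m_len;			/* bytes of valid data */
	char m_buf[MINI_MHLEN];		/* internal storage */
};

/*
 * Make the first len bytes of the chain contiguous in the first mbuf.
 * Returns the (possibly rearranged) chain, or NULL on failure.
 */
struct mini_mbuf *
mini_pullup(struct mini_mbuf *m, size_t len)
{
	struct mini_mbuf *n;
	size_t need, cnt;

	if (m->m_len >= len)
		return m;		/* already contiguous */
	if (len > MINI_MHLEN)
		return NULL;		/* too long: m_pullup would drop the packet */

	/* Move existing data to the front of the internal buffer. */
	memmove(m->m_buf, m->m_data, m->m_len);
	m->m_data = m->m_buf;

	/* Copy bytes up from the following mbufs: the extra copy. */
	need = len - m->m_len;
	while (need > 0 && (n = m->m_next) != NULL) {
		cnt = n->m_len < need ? n->m_len : need;
		memcpy(m->m_data + m->m_len, n->m_data, cnt);
		m->m_len += cnt;
		n->m_data += cnt;
		n->m_len -= cnt;
		need -= cnt;
		if (n->m_len == 0)
			m->m_next = n->m_next;	/* unlink emptied mbuf */
	}
	return need == 0 ? m : NULL;	/* NULL: packet shorter than len */
}
```
.ft P
.DE
.NL
.LP
Called on a chain whose first mbuf holds 20 bytes and whose second holds 60,
\fImini_pullup(m, 40)\fP copies 20 bytes forward into the first mbuf,
while \fImini_pullup(m, 120)\fP fails because 120 exceeds the assumed MHLEN.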
.PP
.1C
.KS
.PS
box wid boxwid*1.2 "IPv6 header" "next = routing"
box same "routing header" "next = auth"
box same "auth header" "next = TCP"
box same "TCP header"
box same "TCP payload"
.PE
.ce
.nr figure +1
Figure \n[figure]: IPv6 extension header chain
.KE
.if t .2C
.I m_pullup
makes certain assumptions regarding protocol headers.
.I m_pullup
accepts only a
.I len
of at most MHLEN.
If the total packet header length is longer than MHLEN,
.I m_pullup
fails, and the packet is dropped.
Under IPv4,
.[
RFC791
.]
the length assumption worked fine in most cases,
since for almost every protocol the total length of the protocol header part
was less than MHLEN.
An IPv4 packet normally carries only two protocol headers:
the IPv4 header and the transport protocol header.
For example, the total length of the protocol header part of a TCP packet
(up to the TCP data payload) is at most 120 bytes,
and is typically 40 to 48 bytes.
When IPv4 options are present, they are stripped off before TCP
header processing, so the maximum length passed to
.I m_pullup
is 80 bytes (20 + 20 + 40).
.IP 1
The IPv4 header occupies 20 bytes.
.IP 2
IPv4 options occupy at most 40 bytes.
They are stripped off before the TCP header is parsed.
Note also that IPv4 options are very rarely used.
.IP 3
The TCP header occupies 20 bytes.
.IP 4
TCP options occupy at most 40 bytes;
in most cases they take 0 to 8 bytes.
.LP
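.PP
The arithmetic behind these bounds can be checked with a few lines of C.
The constant names are hypothetical; the sizes are those listed above, and MHLEN is the 32-bit value.
The second function shows that, once the IPv4 options are stripped, the worst case still fits within MHLEN.
.DS
.SM
.ft CR
```c
#include <assert.h>

/* Sizes in bytes, from the list above; names are illustrative. */
enum {
	MHLEN_32 = 100,		/* MHLEN, 32-bit 4.4BSD */
	IP_HDR = 20,		/* fixed IPv4 header */
	IP_OPT_MAX = 40,	/* IPv4 options, stripped before TCP */
	TCP_HDR = 20,		/* fixed TCP header */
	TCP_OPT_MAX = 40	/* TCP options */
};

/* Worst-case IPv4 + TCP header part, all options included. */
int
max_header_part(void)
{
	return IP_HDR + IP_OPT_MAX + TCP_HDR + TCP_OPT_MAX;
}

/* Worst-case length passed to m_pullup once IPv4 options are gone. */
int
max_pullup_len(void)
{
	return IP_HDR + TCP_HDR + TCP_OPT_MAX;
}
```
.ft P
.DE
.NL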
.PP
The IPv6 specification
.[
RFC2460
.]
and the IPsec specification
.[
RFC2401
.]
allow more flexible use of protocol headers
by introducing chained extension headers.
Each header in such a chain carries a ``next header'' field.
A chain of headers can be built as shown
in Figure \n[figure].
The type of each protocol header is determined by
inspecting the next header field of the previous protocol header.
The specifications place no limit on the number of extension headers.
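.PP
Walking such a chain can be sketched over a packet held in a single flat buffer, a simplifying assumption (the kernel sees an mbuf chain).
The sketch uses the length encodings of the specifications:
hop-by-hop, routing and destination options headers give their length in 8-octet units not counting the first 8 octets,
the fragment header is a fixed 8 octets,
and the authentication header gives its length in 4-octet units minus 2.
It stops at the first header it does not treat as an extension header (TCP, or ESP, whose contents are encrypted)
and does not check header ordering rules; \fIwalk_chain\fP and the \fINH_*\fP names are hypothetical.
.DS
.SM
.ft CR
```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* IANA protocol numbers used in the next header field. */
#define NH_HOPOPTS	0
#define NH_TCP		6
#define NH_ROUTING	43
#define NH_FRAGMENT	44
#define NH_AH		51
#define NH_DSTOPTS	60

#define IP6_HDR_LEN	40	/* fixed IPv6 header size */

/*
 * Return the offset of the first non-extension header and store its
 * protocol number in *proto, or return -1 if the chain is truncated.
 * Each header's type is known only from the previous header.
 */
long
walk_chain(const uint8_t *pkt, size_t pktlen, int *proto)
{
	size_t off = IP6_HDR_LEN, hlen;
	int nxt;

	if (pktlen < IP6_HDR_LEN)
		return -1;
	nxt = pkt[6];			/* next header field of IPv6 header */
	for (;;) {
		switch (nxt) {
		case NH_HOPOPTS:
		case NH_ROUTING:
		case NH_DSTOPTS:
			if (off + 2 > pktlen)
				return -1;
			hlen = ((size_t)pkt[off + 1] + 1) * 8;
			break;
		case NH_FRAGMENT:
			hlen = 8;		/* fixed size */
			break;
		case NH_AH:
			if (off + 2 > pktlen)
				return -1;
			hlen = ((size_t)pkt[off + 1] + 2) * 4;
			break;
		default:
			*proto = nxt;		/* upper layer (or ESP) */
			return (long)off;
		}
		if (off + hlen > pktlen)
			return -1;
		nxt = pkt[off];			/* next header is byte 0 */
		off += hlen;
	}
}
```
.ft P
.DE
.NL
.LP
For an IPv6 header followed by an 8-octet routing header, a 24-octet authentication header and TCP,
\fIwalk_chain\fP reports TCP at offset 72.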
.PP
Because extension headers can be chained, there is no longer an upper limit on
the length of the protocol header part of a packet.
The
.I m_pullup
function would therefore impose an unnecessary restriction
on extension header processing.
In addition,
with the introduction of IPsec, it is no longer possible to strip off extension headers
during inbound packet processing.
All of the data in the packet must be retained if it is to be authenticated
using the Authentication Header.
.[
RFC2402
.]
Continuing to use
.I m_pullup
would limit the
number of extension headers allowed on a packet,
and could jeopardize the usefulness of IPv6 extension headers. \**
.FS
In the IPv4 era, IPv4 options turned out to be unusable in practice
because so few implementations supported them;
most commercial products simply did not.
.FE
.PP
Another problem related to
.I m_pullup
is that it tends to copy the protocol header even
when it is unnecessary to do so.
For example, consider the mbuf chain shown
.nr figure +1
in Figure \n[figure]:
.nr figure -1
.KS
.PS
define pointer { box ht boxht*1/4 }
define payload { box }
IP: [
	IPp: pointer
	IPd: payload with .n at bottom of IPp "IPv4"
]
move
TCP: [
	TCPp: pointer
	TCPd: payload with .n at bottom of TCPp "TCP" "TCP payload"
]
arrow from IP.IPp.center to TCP.TCPp.center
.PE
.ce
.nr figure +1
.nr beforepullup \n[figure]
Figure \n[figure]: mbuf chain before \fIm_pullup\fP
.KE
Here, the first mbuf holds the IPv4 header and the second mbuf holds the TCP header,
each in a contiguous region.
To look at the content of the TCP header,
4.4BSD code does the following:
.DS
.SM
\f[CR]struct ip *ip;
struct tcphdr *th;
ip = mtod(m, struct ip *);
/* extra copy with m_pullup */
m = m_pullup(m, iphdrlen + tcphdrlen);
if (m == NULL)
	return;	/* m_pullup has freed the chain */
/* MUST reinit ip: the first mbuf may have moved */
ip = mtod(m, struct ip *);
th = (struct tcphdr *)(mtod(m, caddr_t) + iphdrlen);\fP
.NL
.DE
As a result, we get the mbuf chain shown in
.nr figure +1
Figure \n[figure].
.nr figure -1
.KF
.PS
define pointer { box ht boxht*1/4 }
define payload { box }
IP: [
	IPp: pointer
	IPd: payload with .n at bottom of IPp "IPv4" "TCP"
]
move
TCP: [
	TCPp: pointer
	TCPd: payload with .n at bottom of TCPp "TCP payload"
]
arrow from IP.IPp.center to TCP.TCPp.center
.PE
.ce
.nr figure +1
Figure \n[figure]: mbuf chain of Figure \n[beforepullup] after \fIm_pullup\fP
.KE
Because
.I m_pullup
can only make a contiguous
region starting from the top of the mbuf chain,
it copies the TCP header from the second mbuf
into the first mbuf.
The copy could be avoided if
.I m_pullup
were clever enough
to handle this case.
Also, the caller is required to reinitialize all of
the pointers into the mbuf content,
since after
.I m_pullup,
the first mbuf on the chain
.1C
.KS
.PS
ellipse "\fIip6_input\fP"
arrow
ellipse "\fIrthdr6_input\fP"
arrow
ellipse "\fIah_input\fP"
arrow "stack" "overflow"
ellipse "\fIesp_input\fP"
arrow
ellipse "\fItcp_input\fP"
.PE
.ce
Figure 5: an excessively deep call chain can cause kernel stack overflow
.KE
.if t .2C
.LP
may be reallocated and live at
a different address than before.
While the
.I m_pullup
design has made packet parsing simple,
it is ill-suited to protocols like IPv6.
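.PP
The check that a cleverer routine would need can be sketched as follows,
with a hypothetical, simplified \fIstruct mini_mbuf\fP.
\fIcontig_at\fP (a hypothetical name) walks to the mbuf that contains byte \fIoff\fP
and returns a pointer into it when the requested range already lies there,
so a copy is needed only when the range straddles an mbuf boundary.
Unlike \fIm_pullup\fP, the range need not start at the top of the chain.
.DS
.SM
.ft CR
```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for struct mbuf. */
struct mini_mbuf {
	struct mini_mbuf *m_next;	/* next mbuf, or NULL */
	char *m_data;			/* start of valid data */
	size_t m_len;			/* bytes of valid data */
};

/*
 * Return a pointer to packet bytes [off, off + len) if they already
 * lie inside one mbuf, or NULL if they span an mbuf boundary (the
 * only case where a copy is really needed) or lie past the end.
 */
char *
contig_at(struct mini_mbuf *m, size_t off, size_t len)
{
	while (m != NULL && off >= m->m_len) {
		off -= m->m_len;	/* skip whole mbufs before off */
		m = m->m_next;
	}
	if (m == NULL || len > m->m_len - off)
		return NULL;
	return m->m_data + off;		/* no bytes are moved */
}
```
.ft P
.DE
.NL
.LP
With the layout of Figure \n[beforepullup], asking for 20 bytes at offset 20 yields a pointer into the second mbuf without copying,
and because nothing moves, a pointer to the IPv4 header taken earlier stays valid.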
.PP
The problems can be summarized as follows:
(1)
.I m_pullup
imposes too strict a restriction
on the total length of the packet header (MHLEN);
(2)
.I m_pullup
makes an extra copy even when this could be avoided; and
(3)
.I m_pullup
requires the caller to reinitialize all of the pointers into the mbuf chain.
.NH 2
Protocol header processing with a deep function call chain
.PP
Under 4.4BSD, protocol header processing results in a chain of function calls.
For example, an IPv4 TCP packet produces the following function call chain
.nr figure +1
(see Figure \n[figure]):
.nr figure -1
.IP (1)
.I ipintr
is called from the network software interrupt logic.
.IP (2)
.I ipintr
processes the IPv4 header, then calls
.I tcp_input.
.\".I ipintr
.\"can be called
.\".I ip_input
.\"from its functionality.
.IP (3)
.I tcp_input
processes the TCP header and passes the data payload
to the socket queues.
.LP
.KF
.PS
ellipse "\fIipintr\fP"
arrow
ellipse "\fItcp_input\fP"
.PE
.ce
.nr figure +1
Figure \n[figure]: function call chain in IPv4 inbound packet processing
.KE
.PP
If chained extension headers are processed in this way,
a deep function call chain can overflow the kernel stack, as shown in
.nr figure +1
Figure \n[figure].
.nr figure -1
.nr figure +1
The IPv6/IPsec specifications do not define any upper limit
on the number of extension headers in a packet,
so a malicious party can transmit a ``legal'' packet with a large number of chained
headers in order to attack IPv6/IPsec implementations.
We have experienced kernel stack overflow in IPsec code,
tunnelled packet processing code, and in several other cases.
The IPsec processing routines tend to use a large chunk of memory
on the kernel stack to hold intermediate data and the secret keys
used for encryption. \**
.FS
For example, Blowfish encryption code typically uses
an intermediate data region of 4 KB or more.
In a typical 4.4BSD installation on the i386 architecture,
the kernel stack occupies less than 8 KB and does not grow on demand.
.FE
We cannot move the intermediate data region into static storage outside
the kernel stack,
because on multiprocessors the locking needed to protect that shared data
would degrade performance.
.PP
Although the IPv6 specifications place no restriction
on the number of extension headers, an implementation
could impose an additional limit for safety.
Even so, the amount of kernel stack
that protocol handlers will use cannot be estimated in advance.
We need a better calling convention for IPv6/IPsec header processing,
regardless of any limit we may impose on the number of extension headers.
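.PP
One possible alternative convention can be sketched in user space.
In the hypothetical code below, the depth of handler nesting stands in for kernel stack use:
\fIinput_nested\fP mimics the 4.4BSD style, where each handler calls the next,
while \fIinput_loop\fP lets each handler process one header and return what follows,
leaving a single loop to dispatch the next handler.
The names are illustrative, and one handler serves every header type;
a real dispatcher would select the handler from the returned protocol number.
.DS
.SM
.ft CR
```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical header kinds: an extension header, or final TCP. */
enum proto { P_EXT, P_TCP, P_DONE };

/* Handler nesting depth, a stand-in for kernel stack consumption. */
static int depth, max_depth;

static void
enter(void)
{
	if (++depth > max_depth)
		max_depth = depth;
}

/* 4.4BSD style: each handler directly calls the next one. */
void
input_nested(const enum proto *chain, size_t i)
{
	enter();
	if (chain[i] == P_EXT)
		input_nested(chain, i + 1);	/* one more frame per header */
	depth--;
}

/* Loop style: a handler processes one header, returns what follows. */
static enum proto
handle_one(const enum proto *chain, size_t *i)
{
	enum proto cur;

	enter();
	cur = chain[(*i)++];
	depth--;
	return cur == P_EXT ? chain[*i] : P_DONE;
}

/* The dispatcher loop keeps handler depth constant. */
void
input_loop(const enum proto *chain)
{
	size_t i = 0;
	enum proto nxt = chain[0];

	while (nxt != P_DONE)
		nxt = handle_one(chain, &i);
}
```
.ft P
.DE
.NL
.LP
For a packet with 100 extension headers followed by TCP,
\fIinput_nested\fP reaches a nesting depth of 101,
while \fIinput_loop\fP never exceeds 1, however many headers the packet carries.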