Revision 1.1, Wed Jul 4 05:29:25 2001 UTC by itojun

add Freenix 2000 paper on m_pulldown(9), by itojun.

.\"	$Id: 8.t,v 1.1 2001/07/04 05:29:25 itojun Exp $
.\"
.\".ds RH Comparisons
.NH 1
Comparisons
.PP
This section compares the following three approaches in terms of
their characteristics and actual behavior:
(1) 4.4BSD
.I m_pullup,
(2) NRL
.I m_pullup2,
and (3) KAME
.I m_pulldown.
.LP
.NH 2
Comparison of assumptions
.PP
Table 1 shows the assumptions made by each of the three approaches.
As mentioned earlier,
.I m_pullup
imposes too stringent a requirement on the total length of packet headers.
.I m_pullup2
is workable in most cases, although
this approach imposes more restrictions than the specification requires.
.I m_pulldown
assumes that each single packet header is smaller than MCLBYTES,
but places
no restriction on the total length of packet headers.
With a standard mbuf chain,
this is the best
.I m_pulldown
can do, since there is no way to hold a contiguous region longer than MCLBYTES.
This characteristic can contribute to better specification conformance,
since
.I m_pulldown
imposes fewer additional restrictions arising from
implementation requirements.
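.PP
The assumptions in Table 1 can be restated as simple predicates over the lengths of the headers in a packet.
The C sketch below is an illustrative model, not kernel code:
the names SKETCH_MHLEN and SKETCH_MCLBYTES and the three functions are hypothetical,
the limits are the values printed in Table 1,
and each predicate only answers whether an approach could handle a given header chain.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical limits copied from Table 1; real values depend on the kernel. */
#define SKETCH_MHLEN    100
#define SKETCH_MCLBYTES 2048

/* Sum of all header lengths in the chain. */
static size_t total_len(const size_t *hdr, size_t n)
{
    size_t total = 0;

    assert(hdr != NULL || n == 0);
    for (size_t i = 0; i < n; i++)
        total += hdr[i];
    return total;
}

/* m_pullup model: all headers together must fit in one header mbuf. */
static bool fits_pullup(const size_t *hdr, size_t n)
{
    return total_len(hdr, n) <= SKETCH_MHLEN;
}

/* m_pullup2 model: all headers together must fit in one cluster. */
static bool fits_pullup2(const size_t *hdr, size_t n)
{
    return total_len(hdr, n) <= SKETCH_MCLBYTES;
}

/* m_pulldown model: only each single header must fit in one cluster. */
static bool fits_pulldown(const size_t *hdr, size_t n)
{
    assert(hdr != NULL || n == 0);
    for (size_t i = 0; i < n; i++)
        if (hdr[i] > SKETCH_MCLBYTES)
            return false;
    return true;
}
```

For example, an IPv6 header (40 bytes), a routing header carrying four addresses (72 bytes) and an ICMPv6 header (8 bytes) total 120 bytes, so \fIfits_pullup\fP rejects the chain while the other two predicates accept it.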
.PP
Among the three approaches, only
.I m_pulldown
avoids making unnecessary copies of intermediate header data and
avoids pointer reinitialization after calls to these functions.
These attributes result in smaller overhead during input packet processing.
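.PP
The copy behavior can be illustrated with a deliberately simplified cost model.
In the sketch below (all names are hypothetical), an mbuf chain is described only by its segment lengths, and every request is evaluated against that fixed layout.
It assumes that \fIm_pullup\fP(m, off + len) copies exactly off + len bytes whenever the first mbuf is too short, and that \fIm_pulldown\fP copies only the len requested bytes when the region straddles an mbuf boundary;
the real functions choose among more strategies, so the figures are indicative only.
Whenever the pullup model copies, pointers previously taken into the earlier headers become stale, which is the pointer reinitialization cost noted above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* True if bytes [off, off + len) lie inside a single segment. */
static bool within_one_segment(const size_t *seg, size_t nseg,
    size_t off, size_t len)
{
    size_t start = 0;

    for (size_t i = 0; i < nseg; i++) {
        size_t end = start + seg[i];
        if (off < end)
            return off + len <= end;
        start = end;
    }
    return false;               /* offset lies beyond the chain */
}

/*
 * Simplified m_pullup(m, off + len): everything from the start of the
 * packet up to off + len must end up in the first mbuf, so intermediate
 * headers are copied too, and pointers into them must be recomputed.
 */
static size_t pullup_copy_bytes(const size_t *seg, size_t nseg,
    size_t off, size_t len)
{
    assert(nseg > 0);
    return (off + len <= seg[0]) ? 0 : off + len;
}

/*
 * Simplified m_pulldown(m, off, len, &offp): only the requested region
 * is copied, and only if it straddles a segment boundary.
 */
static size_t pulldown_copy_bytes(const size_t *seg, size_t nseg,
    size_t off, size_t len)
{
    return within_one_segment(seg, nseg, off, len) ? 0 : len;
}
```

With a 128-byte packet split into segments of 100 and 28 bytes, requesting the 72-byte header at offset 40 copies 112 bytes in the pullup model and 72 bytes in the pulldown model; requesting 8 bytes at offset 112 copies 120 bytes and 0 bytes respectively.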
.PP
.nr table +1
At present,
we know of no other 4.4BSD-based IPv6/IPsec stack that addresses kernel
stack overflow issues,
although we are open to
new perspectives and new information.
.NH 2
Performance comparison based on simulated statistics
.PP
To compare the behavior and performance of
.I m_pulldown
against
.I m_pullup
and
.I m_pullup2
using the same traffic and the same
mbuf chains, we gathered simulated statistics for
.I m_pullup
and
.I m_pullup2
inside the
.I m_pulldown
function.
By running a kernel with the modified
.I m_pulldown
function,
we can easily
gather statistics for all three functions against exactly the same traffic.
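.PP
The idea of gathering statistics for all three functions at one call site can be sketched as a set of counters updated on every length check.
In this simplified C model, \fIstruct sim_stats\fP and \fIrecord_check\fP are hypothetical names rather than the KAME code;
each check is described only by the length of the first mbuf and the requested offset and length,
\fIm_pullup\fP is counted as failing when off + len exceeds MHLEN, and \fIm_pullup2\fP when it exceeds MCLBYTES.
Table 2 also separates allocations from copies; this model records only whether a variant would have had to rearrange the chain.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical limits: MHLEN was 96 bytes in the tested installation. */
#define SKETCH_MHLEN    96
#define SKETCH_MCLBYTES 2048

struct sim_stats {
    unsigned long checks;           /* every IP6_EXTHDR_CHECK-style check */
    unsigned long pulldown_calls;   /* first mbuf too short: m_pulldown called */
    unsigned long pullup_rearrange, pullup_fail;
    unsigned long pullup2_rearrange, pullup2_fail;
};

/* Record what each approach would have done for one length check. */
static void record_check(struct sim_stats *st, size_t first_len,
    size_t off, size_t len)
{
    size_t need = off + len;        /* pullup-style calls need [0, off + len) */

    assert(st != NULL);
    st->checks++;
    if (need <= first_len)
        return;                     /* fast path: data already contiguous */
    st->pulldown_calls++;
    if (need > SKETCH_MHLEN)
        st->pullup_fail++;
    else
        st->pullup_rearrange++;
    if (need > SKETCH_MCLBYTES)
        st->pullup2_fail++;
    else
        st->pullup2_rearrange++;
}
```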
.PP
The comparison was made on a computer
(with Celeron 366MHz CPU, 192M bytes of memory)
running NetBSD 1.4.1 with the KAME IPv6/IPsec stack.
Network drivers allocate mbufs in the same way as standard 4.4BSD drivers do.
.I m_pulldown
is called whenever packet data must be made contiguous
during inbound packet processing.
The computer acts as an end node, not as a router.
.PP
To explain the content of Table 2,
we first look at a source code fragment.
.nr figure +1
Figure \n[figure]
.nr figure -1
shows a fragment of our source code.
The code fragment
(1) makes the TCP header on the mbuf chain
.I m
at offset
.I hdrlen
contiguous, and (2) sets the pointer
.I th
to point to that region.
We use a macro named IP6_EXTHDR_CHECK;
the figure shows the code both before and after macro expansion.
.KF
.LD
.ps 6
.vs 7
\f[CR]/* ensure that *th from hdrlen is continuous */
/* before macro expansion... */
struct tcphdr *th;
IP6_EXTHDR_CHECK(th, struct tcphdr *, m,
	hdrlen, sizeof(*th));
if (th == NULL)
    return;	/*m is already freed*/


/* after macro expansion... */
struct tcphdr *th;
int off;
struct mbuf *n;
if (m->m_len < hdrlen + sizeof(*th)) {
    n = m_pulldown(m, hdrlen, sizeof(*th), &off);
    if (n)
	th = (struct tcphdr *)(mtod(n, caddr_t) + off);
    else
	th = NULL;
} else
    th = (struct tcphdr *)(mtod(m, caddr_t) + hdrlen);
if (th == NULL)
    return;\fP
.NL
.DE
.nr figure +1
Figure \n[figure]: code fragment for trimming mbuf chain.
.KE
.PP
In Table 2,
the first column identifies the test case.
The second column shows the number of times
the IP6_EXTHDR_CHECK macro was used,
that is, the number of checks made against
mbuf length.
The remaining columns show, from left to right,
the number of memory allocations and copies performed by each variant;
the m_pulldown group also shows the number of calls,
and the m_pullup group the number of failures.
For
.I m_pullup,
the fail column counts the calls in which
.I len
exceeded MHLEN (96 bytes in this installation).
.\"With
.\".I m_pullup2
.\"and
.\".I m_pulldown,
.\"there were no such failures.
The results suggest
that no packet had a header portion larger than
MCLBYTES (2048 bytes).
.\" The percentage in parentheses is ratio against the number on the first column.
In this evaluation,
.I m_pulldown
was used for IPv6 traffic only.
.1C
.KF
.TS
center box;
l cfI cfI cfI
l c c c.
	m_pullup	m_pullup2	m_pulldown
_
total header length	MHLEN(100)	MCLBYTES(2048)	\(mi
single header length	\(mi	\(mi	MCLBYTES(2048)
_
T{
avoids copy on intermediate headers
T}	no	no	yes
_
T{
avoids pointer reinitialization
T}	no	no	yes
.TE
.ce
Table 1: assumptions in mbuf manipulation approaches.
.KE
.KF
.TS
center box;
c |c |cfI s s |cfI s s |cfI s
c |r |c c c |c c c |c c
r |r |r r r |r r r |r r.
test	len checks	m_pulldown	m_pullup	m_pullup2
		call	alloc	copy	alloc	copy	fail	alloc	copy
_
(1)	204923	1706	1595	1596	165	165	1541	1596	1596
(2)	1063995	23786	22931	23008	1171	1229	22557	22895	22953
(3)	520028	1245	948	957	432	432	813	945	945
(4)	438602	180	6	6	178	178	2	24	24
(5)	5570	2236	206	206	812	812	1424	1424	1424
.TE
.ce
Table 2: number of mbuf allocations/copies against traffic
.KE
.KF
.TS
center box;
c |c c c c |c c c
c |r r r r |r r r.
test	IPv6 input	TCP	UDP	ICMPv6	1 mbuf	2 mbufs	ext mbuf(s)
_
(1)	29334	20892	2699	5739	3624	15632	10078
(2)	313218	215919	15930	80263	38751	172976	101491
(3)	132267	117822	8561	5882	12782	59799	59686
(4)	73160	66512	5249	1343	7475	42053	23632
(5)	1433	148	53	52	103	1203	127
.TE
.ce
Table 3: Traffic characteristics for tests in Table 2
.KE
.if t .2C
.PP
These measurements lead to several interesting observations.
.I m_pullup
actually failed on IPv6 traffic.
If an IPv6 implementation uses
.I m_pullup
for IPv6 input processing,
it must be coded carefully so as to avoid calling
.I m_pullup
with any length longer than MHLEN.
To achieve this, the code must copy the data from the mbuf
chain into a separate buffer, and the cost of these memory copies becomes a penalty.
.PP
Due to the nature of this simulation,
the comparison described above may contain an implicit bias.
Since the IPv6 protocol processing code was written using
.I m_pulldown,
it is somewhat biased toward
.I m_pulldown.
If a programmer had to write the entire IPv6 protocol processing with
.I m_pullup
only, he or she would use
.I m_copydata
to copy intermediate
extension headers buried deep inside the header chains,
thus making it unnecessary to call
.I m_pullup.
In any case, a call to
.I m_copydata
will result in a data copy,
which causes extra overhead.
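.PP
The \fIm_copydata\fP approach can be sketched against a hypothetical userland stand-in for an mbuf chain; \fIstruct seg\fP and \fIcopy_from_chain\fP are illustrative names, not the kernel API.
The function gathers len bytes starting at offset off into a caller-supplied buffer, however the bytes are split across segments, and the header is then parsed from that copy.
Every byte of the header is copied even when it already lies in a single segment, which is the extra overhead mentioned above.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for one mbuf in a chain. */
struct seg {
    const struct seg *next;
    const unsigned char *data;
    size_t len;
};

/*
 * Copy len bytes starting at offset off of the chain into buf, in the
 * spirit of m_copydata(). Returns 0 on success, -1 if the chain is short.
 */
static int copy_from_chain(const struct seg *s, size_t off, size_t len,
    void *buf)
{
    unsigned char *dst = buf;

    assert(buf != NULL || len == 0);
    /* Skip whole segments that lie before off. */
    while (s != NULL && off >= s->len) {
        off -= s->len;
        s = s->next;
    }
    /* Copy piecewise until len bytes have been gathered. */
    while (len > 0) {
        size_t n;

        if (s == NULL)
            return -1;
        n = s->len - off;
        if (n > len)
            n = len;
        memcpy(dst, s->data + off, n);
        dst += n;
        len -= n;
        off = 0;
        s = s->next;
    }
    return 0;
}
```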
.\"The author thinks that this bias toward
.\".I m_pulldown
.\"is therefore negligible.
.PP
In all cases, the number of length checks (second column) exceeds the
number of inbound packets.
This behavior is the same as in the original 4.4BSD stack;
we did not add a significant number of length checks to the code.
This is because
.I m_pulldown
(or
.I m_pullup
in the 4.4BSD case)
is called
as necessary during the parsing of the headers.
For example, to process a TCP-over-IPv6 packet, at least 3
checks would be made against m->m_len;
these checks would be made
to grab the IPv6 header (40 bytes),
to grab the TCP header (20 bytes), and to grab the TCP header
and options (20 to 60 bytes).
The length of the TCP option part is kept inside the TCP header,
so the length needs to be checked twice for the TCP part.
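.PP
The three checks can be written out explicitly.
For clarity, this sketch (the name \fItcp_checks\fP is hypothetical) works on a flat byte buffer and assumes no IPv6 extension headers, whereas a real input path issues each check against the mbuf chain.
It relies on the TCP data offset field, the upper four bits of byte 12 of the TCP header, which gives the header length in 32-bit words;
the last check cannot be issued until the previous one has made that byte accessible.

```c
#include <assert.h>
#include <stddef.h>

#define IP6_HDR_LEN 40
#define TCP_MIN_LEN 20

/*
 * Count the length checks needed to reach the TCP payload of a
 * TCP-over-IPv6 packet held in buf (pktlen bytes). Stores the total
 * header length in *hdrlen. Returns -1 if a check fails.
 */
static int tcp_checks(const unsigned char *buf, size_t pktlen, size_t *hdrlen)
{
    int checks = 0;
    size_t thlen;

    assert(buf != NULL && hdrlen != NULL);
    /* Check 1: the fixed IPv6 header (40 bytes). */
    checks++;
    if (pktlen < IP6_HDR_LEN)
        return -1;
    /* Check 2: the fixed TCP header (20 bytes), which holds the data offset. */
    checks++;
    if (pktlen < IP6_HDR_LEN + TCP_MIN_LEN)
        return -1;
    thlen = (size_t)(buf[IP6_HDR_LEN + 12] >> 4) * 4;
    if (thlen < TCP_MIN_LEN)
        return -1;
    /* Check 3: TCP header plus options (20 to 60 bytes). */
    checks++;
    if (pktlen < IP6_HDR_LEN + thlen)
        return -1;
    *hdrlen = IP6_HDR_LEN + thlen;
    return checks;
}
```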
.\"If the function call overhead is more significant than the actual
.\".I m_pullup
.\"or
.\".I m_pulldown
.\"operation,
.\"we may be able to blindly call
.\".I m_pulldown
.\"with the maximum TCP option length
.\"(60 bytes) in order to reduce the number of function calls.
.KF
.PS
Ao:	box invis ht boxht*2
A:	box at center of Ao "IPv6 header"
Bo:	box invis ht boxht*2
B:	box at center of Bo "TCP header" "(len)"
Co:	box invis ht boxht*2
C:	box at center of Co "TCP options"
D:	box "payload"

arrow from 1/3 of the way between Ao.sw and Ao.se to Ao.sw
arrow from 2/3 of the way between Ao.sw and Ao.se to Ao.se
line invis from Ao.sw to Ao.se "40"
line from Ao.sw to 4/5 of the way between Ao.sw and A.sw
line from Ao.se to 4/5 of the way between Ao.se and A.se

arrow from 1/3 of the way between Bo.nw and Bo.ne to Bo.nw
arrow from 2/3 of the way between Bo.nw and Bo.ne to Bo.ne
line invis from Bo.nw to Bo.ne "20"
line from Bo.nw to 4/5 of the way between Bo.nw and B.nw
line from Bo.ne to 4/5 of the way between Bo.ne and B.ne

arrow from 1/3 of the way between Bo.sw and Co.se to Bo.sw
arrow from 2/3 of the way between Bo.sw and Co.se to Co.se
line invis from Bo.sw to Co.se "20 to 60"
line from Bo.sw to 4/5 of the way between Bo.sw and B.sw
line from Co.se to 4/5 of the way between Co.se and C.se
.PE
.ce
.nr figure +1
Figure \n[figure]: processing a TCP-over-IPv6 packet requires 3 length checks.
.KE
.PP
The results suggest that we call
.I m_pulldown
more frequently in ICMPv6 processing than in the processing of other protocols.
These additional calls are made to parse ICMPv6 headers and neighbor discovery options.
Use of the loopback interface also increases the number of
.I m_pulldown
calls.
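.PP
Neighbor discovery options illustrate why ICMPv6 input needs extra checks.
Each option begins with a one-byte type and a one-byte length counted in units of 8 octets (RFC 2461), so a parser must first check that the two-byte option header is present and then check the whole option.
The sketch below (\fInd_walk\fP is a hypothetical name) walks an option area held in a flat buffer and counts those checks, assuming two checks per option; the actual KAME code may arrange them differently.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Walk neighbor discovery options in buf[0..len) and count the length
 * checks made. Returns the number of options, or -1 on a malformed
 * option. *checks receives the number of checks.
 */
static int nd_walk(const unsigned char *buf, size_t len, int *checks)
{
    size_t off = 0;
    int nopts = 0;

    assert(buf != NULL && checks != NULL);
    *checks = 0;
    while (off < len) {
        size_t optlen;

        /* Check 1: the option type and length bytes. */
        (*checks)++;
        if (len - off < 2)
            return -1;
        optlen = (size_t)buf[off + 1] * 8;
        if (optlen == 0)
            return -1;          /* a zero-length option is invalid */
        /* Check 2: the whole option. */
        (*checks)++;
        if (len - off < optlen)
            return -1;
        off += optlen;
        nopts++;
    }
    return nopts;
}
```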
.PP
In the tests, the number of copies made in the
.I m_pullup2
case is similar to the number made in the
.I m_pulldown
case.
.I m_pulldown
makes fewer copies than
.I m_pullup2
for packets such as the following:
.IP \(sq
The packet is held in multiple mbufs.
With the mbuf allocation policy in
.I m_devget,
a single packet is held in two mbufs
if the packet is larger than MHLEN and smaller than MHLEN + MLEN,
or if the packet is larger than MCLBYTES.
.IP \(sq
The extension headers span multiple mbufs:
the header portion of the packet occupies the first mbuf and
continues into subsequent mbufs.
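.PP
The \fIm_devget\fP allocation policy behind the first condition can be approximated by a function that counts the mbufs used for a packet of a given length.
This is a simplified sketch: the SKETCH_ constants are hypothetical names with assumed 4.4BSD-style values (MHLEN of 100 bytes as in Table 1, MLEN of 108 bytes, and MINCLSIZE taken as MHLEN + MLEN),
a cluster is attached whenever at least MINCLSIZE bytes remain,
and details such as link-level header alignment are ignored.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed 4.4BSD-style sizes; real values depend on the platform. */
#define SKETCH_MHLEN     100
#define SKETCH_MLEN      108
#define SKETCH_MCLBYTES  2048
#define SKETCH_MINCLSIZE (SKETCH_MHLEN + SKETCH_MLEN)

/* Number of mbufs a simplified m_devget() would use for pktlen bytes. */
static int devget_mbufs(size_t pktlen)
{
    size_t remain = pktlen;
    int first = 1, n = 0;

    assert(pktlen > 0);
    while (remain > 0) {
        size_t cap;

        if (remain >= SKETCH_MINCLSIZE)
            cap = SKETCH_MCLBYTES;          /* attach a cluster */
        else
            cap = first ? SKETCH_MHLEN : SKETCH_MLEN;
        remain -= (remain < cap) ? remain : cap;
        first = 0;
        n++;
    }
    return n;
}
```

Under these assumptions, a 128-byte packet such as the routing header test packet occupies two mbufs (100 + 28 bytes), while a 1000-byte packet fits in a single cluster.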
.LP
To demonstrate the difference, we generated an IPv6 packet with a
routing header containing 4 IPv6 addresses.
The test result is presented as the 5th test in Table 2.
.nr figure +1
Figure \n[figure]
.nr figure -1
shows the packet.
The first 112 bytes are occupied by an IPv6 header and a routing header,
and the remaining 16 bytes are used for an ICMPv6 header and payload.
The packet meets the conditions above, and
.I m_pulldown
made fewer copies than
.I m_pullup2.
To process the single incoming ICMPv6 packet shown in the figure,
.I m_pullup2
made 7 copies while
.I m_pulldown
made only 1 copy.
.KF
.LD
.ps 6
.vs 7
\f[CR]node A (source) = 2001:240:0:200:260:97ff:fe07:69ea
node B (destination) = 2001:240:0:200:a00:5aff:fe38:6f86
17:39:43.346078 A > B:
	srcrt (type=0,segleft=4,[0]B,[1]B,[2]B,[3]B):
	icmp6: echo request (len 88, hlim 64)
		 6000 0000 0058 2b40 2001 0240 0000 0200
		 0260 97ff fe07 69ea 2001 0240 0000 0200
		 0a00 5aff fe38 6f86 3a08 0004 0000 0000
		 2001 0240 0000 0200 0a00 5aff fe38 6f86
		 2001 0240 0000 0200 0a00 5aff fe38 6f86
		 2001 0240 0000 0200 0a00 5aff fe38 6f86
		 2001 0240 0000 0200 0a00 5aff fe38 6f86
		 8000 b650 030e 00c8 ce6e fd38 d553 0700\fP
.DE
.ce
.nr figure +1
Figure \n[figure]: Packets with IPv6 routing header.
.KE
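.PP
The byte counts quoted for this test can be cross-checked against the dump.
The C sketch below (\fIrthdr_layout\fP and \fIstruct layout\fP are hypothetical names) holds the first 48 bytes of the captured packet, copied from the figure, and derives the lengths from the IPv6 payload length field and the routing header's Hdr Ext Len field, which counts 8-octet units beyond the first 8 octets.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* First 48 bytes of the captured packet: IPv6 header + start of routing header. */
static const uint8_t pkt_head[48] = {
    0x60, 0x00, 0x00, 0x00, 0x00, 0x58, 0x2b, 0x40,
    0x20, 0x01, 0x02, 0x40, 0x00, 0x00, 0x02, 0x00,
    0x02, 0x60, 0x97, 0xff, 0xfe, 0x07, 0x69, 0xea,
    0x20, 0x01, 0x02, 0x40, 0x00, 0x00, 0x02, 0x00,
    0x0a, 0x00, 0x5a, 0xff, 0xfe, 0x38, 0x6f, 0x86,
    0x3a, 0x08, 0x00, 0x04, 0x00, 0x00, 0x00, 0x00,
};

#define IP6_HDR_LEN 40
#define NH_ROUTING  43
#define NH_ICMPV6   58

struct layout {
    size_t hdr_bytes;    /* IPv6 header + routing header */
    size_t tail_bytes;   /* ICMPv6 header + payload */
    size_t total;        /* whole packet */
    unsigned segleft;    /* routing header Segments Left */
};

static struct layout rthdr_layout(const uint8_t *p)
{
    struct layout l;
    size_t plen = ((size_t)p[4] << 8) | p[5];    /* IPv6 payload length */
    const uint8_t *rh = p + IP6_HDR_LEN;
    size_t rhlen;

    assert(p[6] == NH_ROUTING);                  /* next header: routing */
    assert(rh[0] == NH_ICMPV6);                  /* routing -> ICMPv6 */
    rhlen = ((size_t)rh[1] + 1) * 8;             /* Hdr Ext Len + first 8 */
    l.hdr_bytes = IP6_HDR_LEN + rhlen;
    l.total = IP6_HDR_LEN + plen;
    l.tail_bytes = l.total - l.hdr_bytes;
    l.segleft = rh[3];
    return l;
}
```

For the captured packet it yields 112 header bytes (40 + 72), 16 trailing bytes and 4 segments left, matching the description above and the srcrt line of the dump.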
.PP
During the test, we experienced no kernel stack overflow,
thanks to a new calling sequence between IPv6 protocol handlers.
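.PP
As a hedged illustration of how a calling sequence can bound stack usage, the sketch below dispatches header handlers from a loop instead of letting each handler call the next one directly, so stack depth does not grow with the number of extension headers.
All names are hypothetical; each handler is assumed to advance the offset and return the next header value, and the handlers perform no bounds checks.
It is a toy model and is not presented as the KAME implementation.

```c
#include <assert.h>
#include <stddef.h>

#define NH_DONE    (-1)    /* hypothetical "processing finished" value */
#define NH_ROUTING 43
#define NH_ICMPV6  58
#define NH_DSTOPTS 60

/* A handler consumes one header at *off and returns the next header value. */
typedef int (*handler_fn)(const unsigned char *pkt, size_t *off);

static int depth, max_depth;   /* call nesting, tracked for the demonstration */

static void enter(void) { if (++depth > max_depth) max_depth = depth; }
static void leave(void) { depth--; }

/* Generic extension header: next header in byte 0, length in byte 1. */
static int ext_input(const unsigned char *pkt, size_t *off)
{
    int nxt;

    enter();
    nxt = pkt[*off];
    *off += ((size_t)pkt[*off + 1] + 1) * 8;
    leave();
    return nxt;
}

/* Upper-layer handler: consumes the rest of the packet. */
static int icmp6_input(const unsigned char *pkt, size_t *off)
{
    (void)pkt;
    (void)off;
    enter();
    leave();
    return NH_DONE;
}

static handler_fn lookup(int nxt)
{
    switch (nxt) {
    case NH_DSTOPTS:
    case NH_ROUTING:
        return ext_input;
    case NH_ICMPV6:
        return icmp6_input;
    default:
        return NULL;
    }
}

/* Loop-based dispatch: handlers return instead of calling each other. */
static int dispatch(const unsigned char *pkt, size_t off, int nxt)
{
    int headers = 0;

    while (nxt != NH_DONE) {
        handler_fn fn = lookup(nxt);

        if (fn == NULL)
            return -1;          /* unknown header: drop */
        nxt = fn(pkt, &off);
        headers++;
    }
    return headers;
}
```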
.PP
The numbers of copies and mbuf allocations vary greatly across tests.
We need to investigate the traffic characteristics more carefully,
for example, the average length of the header portion of packets.