Annotation of src/lib/libc/stdlib/jemalloc.3, Revision 1.8
1.1 jruoho 1: .\" $NetBSD $
2: .\"
3: .\" Copyright (c) 1980, 1991, 1993
4: .\" The Regents of the University of California. All rights reserved.
5: .\"
6: .\" This code is derived from software contributed to Berkeley by
7: .\" the American National Standards Committee X3, on Information
8: .\" Processing Systems.
9: .\"
10: .\" Redistribution and use in source and binary forms, with or without
11: .\" modification, are permitted provided that the following conditions
12: .\" are met:
13: .\" 1. Redistributions of source code must retain the above copyright
14: .\" notice, this list of conditions and the following disclaimer.
15: .\" 2. Redistributions in binary form must reproduce the above copyright
16: .\" notice, this list of conditions and the following disclaimer in the
17: .\" documentation and/or other materials provided with the distribution.
18: .\" 3. Neither the name of the University nor the names of its contributors
19: .\" may be used to endorse or promote products derived from this software
20: .\" without specific prior written permission.
21: .\"
22: .\" THIS SOFTWARE IS PROVIDED BY THE REGENTS AND CONTRIBUTORS ``AS IS'' AND
23: .\" ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
24: .\" IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
25: .\" ARE DISCLAIMED. IN NO EVENT SHALL THE REGENTS OR CONTRIBUTORS BE LIABLE
26: .\" FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
27: .\" DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
28: .\" OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
29: .\" HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
30: .\" LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
31: .\" OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
32: .\" SUCH DAMAGE.
33: .\"
34: .\" @(#)malloc.3 8.1 (Berkeley) 6/4/93
35: .\" $FreeBSD: src/lib/libc/stdlib/malloc.3,v 1.73 2007/06/15 22:32:33 jasone Exp $
36: .\"
1.7 jruoho 37: .Dd June 21, 2011
1.6 njoly 38: .Dt JEMALLOC 3
1.1 jruoho 39: .Os
40: .Sh NAME
41: .Nm jemalloc
42: .Nd the default system allocator
1.2 jruoho 43: .Sh LIBRARY
44: .Lb libc
45: .Sh SYNOPSIS
1.3 jruoho 46: .Ft const char *
1.2 jruoho 47: .Va _malloc_options ;
1.1 jruoho 48: .Sh DESCRIPTION
49: The
50: .Nm
51: allocator is a general-purpose concurrent
52: .Xr malloc 3
53: implementation specifically designed to be scalable
54: on modern multi-processor systems.
55: It is the default user space system allocator in
56: .Nx .
1.5 jruoho 57: .Pp
1.1 jruoho 58: When the first call is made to one of the memory allocation
59: routines such as
60: .Fn malloc
61: or
62: .Fn realloc ,
63: various flags that affect the workings of the allocator are set or reset.
64: These are described below.
65: .Pp
66: The
67: .Dq name
68: of the file referenced by the symbolic link named
69: .Pa /etc/malloc.conf ,
70: the value of the environment variable
71: .Ev MALLOC_OPTIONS ,
72: and the string pointed to by the global variable
73: .Va _malloc_options
74: will be interpreted, in that order, character by character as flags.
75: .Pp
76: Most flags are single letters.
77: Uppercase letters indicate that the behavior is set, or on,
78: and lowercase letters mean that the behavior is not set, or off.
79: The following options are available.
80: .Bl -tag -width "A " -offset 3n
81: .It Em A
82: All warnings (except for the warning about unknown
83: flags being set) become fatal.
84: The process will call
85: .Xr abort 3
86: in these cases.
87: .It Em H
88: Use
89: .Xr madvise 2
90: when pages within a chunk are no longer in use, but the chunk as a whole cannot
91: yet be deallocated.
92: Because the
93: .Fn madvise
94: system call has high overhead,
95: this is primarily of use when swapping is a real possibility.
96: .It Em J
97: Each byte of new memory allocated by
98: .Fn malloc
99: .No or Fn realloc
100: will be initialized to 0xa5.
101: All memory released by
102: .Fn free
103: .No or Fn realloc
104: will be initialized to 0x5a.
105: This is intended for debugging and will impact performance negatively.
106: .It Em K
107: Increase/decrease the virtual memory chunk size by a factor of two.
108: The default chunk size is 1 MB.
109: This option can be specified multiple times.
110: .It Em N
111: Increase/decrease the number of arenas by a factor of two.
112: The default number of arenas is four times the number of CPUs, or one if there
113: is a single CPU.
114: This option can be specified multiple times.
115: .It Em P
116: Various statistics are printed at program exit via an
117: .Xr atexit 3
118: function.
119: This has the potential to cause deadlock for a multi-threaded process that exits
120: while one or more threads are executing in the memory allocation functions.
121: Therefore, this option should only be used with care; it is primarily intended
122: as a performance tuning aid during application development.
123: .It Em Q
124: Increase/decrease the size of the allocation quantum by a factor of two.
125: The default quantum is the minimum allowed by the architecture (typically 8 or
126: 16 bytes).
127: This option can be specified multiple times.
128: .It Em S
129: Increase/decrease the size of the maximum size class that is a multiple of the
130: quantum by a factor of two.
131: Above this size, power-of-two spacing is used for size classes.
132: The default value is 512 bytes.
133: This option can be specified multiple times.
134: .It Em U
135: Generate
136: .Dq utrace
137: entries for
138: .Xr ktrace 1 ,
139: for all operations.
140: Consult the source for details on this option.
141: .It Em V
142: Attempting to allocate zero bytes will return a
143: .Dv NULL
144: pointer instead of a valid pointer.
145: (The default behavior is to make a minimal allocation and return a
146: pointer to it.)
147: This option is provided for System V compatibility.
148: This option is incompatible with the
149: .Em X
150: option.
151: .It Em X
152: Rather than return failure for any allocation function,
153: display a diagnostic message on
154: .Dv stderr
155: and cause the program to drop
156: core (using
157: .Xr abort 3 ) .
158: This option should be set at compile time by including the following in
159: the source code:
160: .Bd -literal -offset indent
161: _malloc_options = "X";
162: .Ed
163: .Pp
164: .It Em Z
165: Each byte of new memory allocated by
166: .Fn malloc
167: .No or Fn realloc
168: will be initialized to 0.
169: Note that this initialization only happens once for each byte, so
170: .Fn realloc
171: does not zero memory that was previously allocated.
172: This is intended for debugging and will impact performance negatively.
173: .El
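.Pp
The character-by-character interpretation of option strings described above
can be illustrated with a small model.
The following is only a sketch, not the libc implementation: the structure,
the function names, and the lower bound on the chunk size (which assumes
4 KB pages) are all hypothetical, and only four of the flags are modeled.

```c
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of allocator settings; not the libc data structures. */
struct malloc_opts {
	bool abort_on_warning;	/* A/a */
	bool junk_fill;		/* J/j */
	bool zero_fill;		/* Z/z */
	unsigned chunk_2pow;	/* K/k: log2 of the chunk size */
};

/* Defaults: flags off and 1 MB chunks (2^20 bytes). */
void
opts_init(struct malloc_opts *o)
{
	o->abort_on_warning = false;
	o->junk_fill = false;
	o->zero_fill = false;
	o->chunk_2pow = 20;
}

/* Interpret one option string, one character at a time. */
void
opts_apply(struct malloc_opts *o, const char *s)
{
	if (s == NULL)
		return;
	for (; *s != '\0'; s++) {
		/* Uppercase turns a behavior on, lowercase turns it off. */
		bool on = isupper((unsigned char)*s) != 0;

		switch (toupper((unsigned char)*s)) {
		case 'A':
			o->abort_on_warning = on;
			break;
		case 'J':
			o->junk_fill = on;
			break;
		case 'Z':
			o->zero_fill = on;
			break;
		case 'K':
			/* Repeatable: each K doubles, each k halves. */
			if (on)
				o->chunk_2pow++;
			else if (o->chunk_2pow > 13)	/* assumed minimum: 8 KB */
				o->chunk_2pow--;
			break;
		default:
			/* This sketch silently skips flags it does not model. */
			break;
		}
	}
}
```

Calling opts_apply() once per source, in the documented order (symbolic link
name, environment variable, global variable), lets a later source override an
earlier one: applying "AJ" and then "jKK" leaves A set, J cleared, and the
chunk size at 4 MB.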
174: .Pp
1.7 jruoho 175: Extra care should be taken when enabling
176: any of the options in production environments.
1.1 jruoho 177: The
1.7 jruoho 178: .Em A ,
179: .Em J ,
1.1 jruoho 180: and
181: .Em Z
182: options are intended for testing and debugging.
183: An application which changes its behavior when these options are used
184: is flawed.
185: .Sh IMPLEMENTATION NOTES
186: The
187: .Nm
188: allocator uses multiple arenas in order to reduce lock
189: contention for threaded programs on multi-processor systems.
190: This works well with regard to threading scalability, but incurs some costs.
191: There is a small fixed per-arena overhead, and additionally, arenas manage
192: memory completely independently of each other, which means a small fixed
193: increase in overall memory fragmentation.
194: These overheads are not generally an issue,
195: given the number of arenas normally used.
196: Note that using substantially more arenas than the default is not likely to
197: improve performance, mainly due to reduced cache performance.
198: However, it may make sense to reduce the number of arenas if an application
199: does not make much use of the allocation functions.
200: .Pp
201: Memory is conceptually broken into equal-sized chunks,
202: where the chunk size is a power of two that is greater than the page size.
203: Chunks are always aligned to multiples of the chunk size.
204: This alignment makes it possible to find
205: metadata for user objects very quickly.
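.Pp
To see why chunk alignment makes the metadata lookup cheap, consider the
following sketch.
It assumes the default 1 MB chunk size; the macro and function names are
illustrative and do not appear in the allocator source.

```c
#include <stdint.h>

/* Assumed chunk size: the 1 MB default. Must be a power of two. */
#define CHUNK_BYTES ((uintptr_t)1 << 20)

/* Return the base address of the chunk that contains ptr. */
void *
chunk_base(const void *ptr)
{
	return (void *)((uintptr_t)ptr & ~(CHUNK_BYTES - 1));
}

/* Return the byte offset of ptr within its chunk. */
uintptr_t
chunk_offset(const void *ptr)
{
	return (uintptr_t)ptr & (CHUNK_BYTES - 1);
}
```

Because every chunk starts at a multiple of the chunk size, a single mask
operation on any object address yields the chunk that holds it, without
searching any data structure.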
206: .Pp
207: User objects are broken into three categories according to size:
208: .Bl -enum -offset 3n
209: .It
210: Small objects are smaller than one page.
211: .It
212: Large objects are smaller than the chunk size.
213: .It
214: Huge objects are a multiple of the chunk size.
215: .El
216: .Pp
217: Small and large objects are managed by arenas; huge objects are managed
218: separately in a single data structure that is shared by all threads.
219: Huge objects are used by applications infrequently enough that this single
220: data structure is not a scalability issue.
221: .Pp
222: Each chunk that is managed by an arena tracks its contents in a page map as
223: runs of contiguous pages (unused, backing a set of small objects, or backing
224: one large object).
225: The combination of chunk alignment and chunk page maps makes it possible to
226: determine all metadata regarding small and large allocations in constant time.
227: .Pp
228: Small objects are managed in groups by page runs.
229: Each run maintains a bitmap that tracks which regions are in use.
230: Allocation requests can be grouped as follows.
231: .Pp
232: .Bl -bullet -offset 3n
233: .It
234: Allocation requests that are no more than half the quantum (see the
235: .Em Q
236: option) are rounded up to the nearest power of two (typically 2, 4, or 8).
237: .It
238: Allocation requests that are more than half the quantum, but no more than the
239: maximum quantum-multiple size class (see the
240: .Em S
241: option) are rounded up to the nearest multiple of the quantum.
242: .It
243: Allocation requests that are larger than the maximum quantum-multiple size
244: class, but no larger than one half of a page, are rounded up to the nearest
245: power of two.
246: .It
247: Allocation requests that are larger than half of a page, but small enough to
248: fit in an arena-managed chunk (see the
249: .Em K
250: option), are rounded up to the nearest run size.
251: .It
252: Allocation requests that are too large to fit in an arena-managed chunk are
253: rounded up to the nearest multiple of the chunk size.
254: .El
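.Pp
The rounding rules listed above can be sketched as a single function.
This is a model under stated assumptions, not the allocator's code: it
assumes a 16-byte quantum, the 512-byte default for the S option, 4 KB pages,
1 MB chunks, run sizes that are multiples of the page size, and one page per
chunk reserved for the chunk header.
All names are hypothetical.

```c
#include <stddef.h>

/* Assumed parameters; real values depend on architecture and options. */
#define QUANTUM		((size_t)16)		/* Q option */
#define SMALL_MAX	((size_t)512)		/* S option default */
#define PAGE_BYTES	((size_t)4096)
#define CHUNK_BYTES	((size_t)1 << 20)	/* K option default */
/* Assumption: one page per chunk is reserved for the chunk header. */
#define ARENA_MAX	(CHUNK_BYTES - PAGE_BYTES)

/* Smallest power of two that is >= n, for 1 <= n <= SIZE_MAX / 2. */
size_t
pow2_ceil(size_t n)
{
	size_t p = 1;

	while (p < n)
		p <<= 1;
	return p;
}

/* Round n up to the nearest multiple of m (m > 0). */
size_t
round_up(size_t n, size_t m)
{
	return (n + m - 1) / m * m;
}

/* Map a request size to the size that would actually be reserved. */
size_t
size_class(size_t request)
{
	if (request == 0)
		request = 1;	/* default behavior: minimal allocation */
	if (request <= QUANTUM / 2) {
		size_t p = pow2_ceil(request);

		return p < 2 ? 2 : p;		/* 2, 4, or 8 */
	}
	if (request <= SMALL_MAX)
		return round_up(request, QUANTUM);
	if (request <= PAGE_BYTES / 2)
		return pow2_ceil(request);
	if (request <= ARENA_MAX)
		return round_up(request, PAGE_BYTES);	/* assumed run size */
	return round_up(request, CHUNK_BYTES);
}
```

Under these assumptions a 100-byte request occupies 112 bytes, a 513-byte
request occupies 1024 bytes, and a request one byte larger than a chunk
occupies two chunks.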
255: .Pp
256: Allocations are packed tightly together, which can be an issue for
257: multi-threaded applications.
258: If you need to ensure that allocations do not suffer from cache line sharing,
259: round your allocation requests up to the nearest multiple of the cache line
260: size.
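.Pp
A minimal sketch of such rounding follows.
The 64-byte cache line size is an assumption that must be checked against
the target CPU, and the function name is hypothetical.

```c
#include <stddef.h>

/* Assumed cache line size in bytes; must be a power of two. */
#define CACHELINE ((size_t)64)

/* Round size up to the nearest multiple of the cache line size. */
size_t
cacheline_ceil(size_t size)
{
	return (size + CACHELINE - 1) & ~(CACHELINE - 1);
}
```

A per-thread structure would then be requested with
malloc(cacheline_ceil(sizeof(struct foo))) rather than
malloc(sizeof(struct foo)), where struct foo stands for the
application's own type.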
261: .Sh DEBUGGING
262: The first thing to do is to set the
263: .Em A
264: option.
265: This option forces a core dump (if possible) at the first sign of trouble,
266: rather than the normal policy of trying to continue if at all possible.
267: .Pp
268: It is probably also a good idea to recompile the program with suitable
269: options and symbols for debugger support.
270: .Pp
271: If the program starts to give unusual results, dump core, or generally behave
272: differently without emitting any of the messages described in
273: .Sx DIAGNOSTICS ,
274: it is likely because it depends on the storage being filled with zero bytes.
275: Try running it with the
276: .Em Z
277: option set;
278: if that improves the situation, this diagnosis has been confirmed.
279: If the program still misbehaves,
280: the likely problem is accessing memory outside the allocated area.
281: .Pp
282: Alternatively, if the symptoms are not easy to reproduce, setting the
283: .Em J
284: option may help provoke the problem.
285: In truly difficult cases, the
286: .Em U
287: option, if supported by the kernel, can provide a detailed trace of
288: all calls made to these functions.
289: .Pp
290: Unfortunately,
291: .Nm
292: does not provide much detail about the problems it detects;
293: the performance impact for storing such information would be prohibitive.
294: There are a number of allocator implementations available on the Internet
295: which focus on detecting and pinpointing problems by trading performance for
296: extra sanity checks and detailed diagnostics.
1.8 ! wiz 297: .Sh ENVIRONMENT
! 298: The following environment variables affect the execution of the allocation
! 299: functions:
! 300: .Bl -tag -width ".Ev MALLOC_OPTIONS"
! 301: .It Ev MALLOC_OPTIONS
! 302: If the environment variable
! 303: .Ev MALLOC_OPTIONS
! 304: is set, the characters it contains will be interpreted as flags to the
! 305: allocation functions.
! 306: .El
! 307: .Sh EXAMPLES
! 308: To dump core whenever a problem occurs:
! 309: .Pp
! 310: .Bd -literal -offset indent
! 311: ln -s 'A' /etc/malloc.conf
! 312: .Ed
! 313: .Pp
! 314: To specify in the source that a program does no return value checking
! 315: on calls to these functions:
! 316: .Bd -literal -offset indent
! 317: _malloc_options = "X";
! 318: .Ed
1.5 jruoho 319: .Sh DIAGNOSTICS
1.1 jruoho 320: If any of the memory allocation/deallocation functions detect an error or
321: warning condition, a message will be printed to file descriptor
322: .Dv STDERR_FILENO .
323: Errors will result in the process dumping core.
324: If the
325: .Em A
326: option is set, all warnings are treated as errors.
327: .Pp
1.3 jruoho 328: .\"
329: .\" XXX: The _malloc_message should be documented
330: .\" better in order to be worth mentioning.
331: .\"
1.1 jruoho 332: The
333: .Va _malloc_message
334: variable allows the programmer to override the function which emits
335: the text strings forming the errors and warnings if for some reason
336: the
337: .Dv stderr
338: file descriptor is not suitable for this.
339: Please note that doing anything which tries to allocate memory in
340: this function is likely to result in a crash or deadlock.
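.Pp
The following sketch shows what an allocation-free replacement might look
like.
The four-string signature is an assumption based on the allocator source and
is not documented in this manual page; the function names and the msg_fd
variable are hypothetical.
Consult the source before relying on it.

```c
#include <string.h>
#include <unistd.h>

/* Hypothetical destination descriptor; defaults to standard error. */
int msg_fd = STDERR_FILENO;

/* Write a whole string with write(2); give up silently on error. */
static void
write_all(int fd, const char *s)
{
	size_t left = strlen(s);

	while (left > 0) {
		ssize_t n = write(fd, s, left);

		if (n <= 0)
			return;
		s += n;
		left -= (size_t)n;
	}
}

/*
 * Emit up to four message fragments in order.
 * Nothing here allocates memory: no stdio buffering and no heap use.
 */
void
quiet_message(const char *p1, const char *p2, const char *p3, const char *p4)
{
	const char *parts[4] = { p1, p2, p3, p4 };

	for (int i = 0; i < 4; i++) {
		if (parts[i] != NULL)
			write_all(msg_fd, parts[i]);
	}
}
```

If the assumed signature matches, the program would install the function
early in main() with an assignment such as _malloc_message = quiet_message.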
341: .Pp
342: All messages are prefixed by
343: .Dq Ao Ar progname Ac Ns Li \&: Pq malloc .
344: .Sh SEE ALSO
345: .Xr emalloc 3 ,
346: .Xr malloc 3 ,
347: .Xr memory 3 ,
348: .Xr memoryallocators 9
349: .\"
350: .\" XXX: Add more references that could be worth reading.
351: .\"
352: .Rs
353: .%A Jason Evans
354: .%T "A Scalable Concurrent malloc(3) Implementation for FreeBSD"
355: .%D April 16, 2006
356: .%O BSDCan 2006
357: .%U http://people.freebsd.org/~jasone/jemalloc/bsdcan2006/jemalloc.pdf
358: .Re
359: .Rs
360: .%A Poul-Henning Kamp
361: .%T "Malloc(3) revisited"
362: .%I USENIX Association
363: .%B Proceedings of the FREENIX Track: 1998 USENIX Annual Technical Conference
364: .%D June 15-19, 1998
1.4 wiz 365: .%U http://www.usenix.org/publications/library/proceedings/usenix98/freenix/kamp.pdf
1.1 jruoho 366: .Re
367: .Rs
368: .%A Paul R. Wilson
369: .%A Mark S. Johnstone
370: .%A Michael Neely
371: .%A David Boles
372: .%T "Dynamic Storage Allocation: A Survey and Critical Review"
373: .%D 1995
374: .%I University of Texas at Austin
375: .%U ftp://ftp.cs.utexas.edu/pub/garbage/allocsrv.ps
376: .Re
377: .Sh HISTORY
378: The
379: .Nm
380: allocator became the default system allocator first in
381: .Fx 7.0
382: and then in
383: .Nx 5.0 .
384: In both systems it replaced the older so-called
385: .Dq phkmalloc
386: implementation.
387: .Sh AUTHORS
388: .An Jason Evans Aq jasone@canonware.com