Up to [cvs.NetBSD.org] / src / external / bsd / jemalloc / dist / src
Keyword substitution: kv
Default branch: MAIN
bring back fix that was lost in the merge: use a local variable to modify the allocation alignment instead of modifying globally. Fixes overallocation.
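
The kind of bug this message describes can be illustrated with a minimal sketch. This is not the code from pages.c: `g_os_page`, `map_size_global`, and `map_size_local` are hypothetical names, and the sketch assumes the global in question holds the alignment applied to every mapping request.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical global: the alignment applied to every mapping request. */
static size_t g_os_page = 4096;

/* Round n up to a multiple of align; align must be a power of two. */
static size_t
align_up(size_t n, size_t align)
{
	assert(align != 0 && (align & (align - 1)) == 0);
	return (n + align - 1) & ~(align - 1);
}

/*
 * Problematic pattern: one request with a large alignment raises the
 * global, so every later request is rounded up to that alignment too.
 */
static size_t
map_size_global(size_t size, size_t alignment)
{
	if (alignment > g_os_page)
		g_os_page = alignment;	/* side effect outlives this call */
	return align_up(size, g_os_page);
}

/*
 * Fixed pattern: the larger alignment lives in a local variable, so
 * the global is unchanged and later requests are not inflated.
 */
static size_t
map_size_local(size_t size, size_t alignment)
{
	size_t a = alignment > g_os_page ? alignment : g_os_page;

	return align_up(size, a);
}
```

With the global variant, a 4 KiB request issued after a single 64 KiB-aligned request is rounded up to 64 KiB, which is the overallocation the local variant avoids.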
merge changes for jemalloc-5.3.0
Import jemalloc-5.3.0 (previous was 5.1.0)

* 5.3.0 (May 6, 2022)

  This release contains many speed and space optimizations, from micro
  optimizations on common paths to rework of internal data structures and
  locking schemes, and many more too detailed to list below. Multiple percent
  of system level metric improvements were measured in tested production
  workloads. The release has gone through large-scale production testing.

  New features:
  - Add the thread.idle mallctl which hints that the calling thread will be
    idle for a nontrivial period of time. (@davidtgoldblatt)
  - Allow small size classes to be the maximum size class to cache in the
    thread-specific cache, through the opt.[lg_]tcache_max option. (@interwq,
    @jordalgo)
  - Make the behavior of realloc(ptr, 0) configurable with opt.zero_realloc.
    (@davidtgoldblatt)
  - Add 'make uninstall' support. (@sangshuduo, @Lapenkov)
  - Support C++17 over-aligned allocation. (@marksantaniello)
  - Add the thread.peak mallctl for approximate per-thread peak memory
    tracking. (@davidtgoldblatt)
  - Add interval-based stats output opt.stats_interval. (@interwq)
  - Add prof.prefix to override filename prefixes for dumps. (@zhxchen17)
  - Add high resolution timestamp support for profiling. (@tyroguru)
  - Add the --collapsed flag to jeprof for flamegraph generation.
    (@igorwwwwwwwwwwwwwwwwwwww)
  - Add the --debug-syms-by-id option to jeprof for debug symbols discovery.
    (@DeannaGelbart)
  - Add the opt.prof_leak_error option to exit with error code when leak is
    detected using opt.prof_final. (@yunxuo)
  - Add opt.cache_oblivious as a runtime alternative to
    config.cache_oblivious. (@interwq)
  - Add mallctl interfaces:
    + opt.zero_realloc (@davidtgoldblatt)
    + opt.cache_oblivious (@interwq)
    + opt.prof_leak_error (@yunxuo)
    + opt.stats_interval (@interwq)
    + opt.stats_interval_opts (@interwq)
    + opt.tcache_max (@interwq)
    + opt.trust_madvise (@azat)
    + prof.prefix (@zhxchen17)
    + stats.zero_reallocs (@davidtgoldblatt)
    + thread.idle (@davidtgoldblatt)
    + thread.peak.{read,reset} (@davidtgoldblatt)

  Bug fixes:
  - Fix the synchronization around explicit tcache creation which could cause
    invalid tcache identifiers. This regression was first released in 5.0.0.
    (@yoshinorim, @davidtgoldblatt)
  - Fix a profiling biasing issue which could cause incorrect heap usage and
    object counts. This issue existed in all previous releases with the heap
    profiling feature. (@davidtgoldblatt)
  - Fix the order of stats counter updating on large realloc which could cause
    failed assertions. This regression was first released in 5.0.0. (@azat)
  - Fix the locking on the arena destroy mallctl, which could cause concurrent
    arena creations to fail. This functionality was first introduced in 5.0.0.
    (@interwq)

  Portability improvements:
  - Remove nothrow from system function declarations on macOS and FreeBSD.
    (@davidtgoldblatt, @fredemmott, @leres)
  - Improve overcommit and page alignment settings on NetBSD. (@zoulasc)
  - Improve CPU affinity support on BSD platforms. (@devnexen)
  - Improve utrace detection and support. (@devnexen)
  - Improve QEMU support with MADV_DONTNEED zeroed pages detection. (@azat)
  - Add memcntl support on Solaris / illumos. (@devnexen)
  - Improve CPU_SPINWAIT on ARM. (@AWSjswinney)
  - Improve TSD cleanup on FreeBSD. (@Lapenkov)
  - Disable percpu_arena if the CPU count cannot be reliably detected. (@azat)
  - Add malloc_size(3) override support. (@devnexen)
  - Add mmap VM_MAKE_TAG support. (@devnexen)
  - Add support for MADV_[NO]CORE. (@devnexen)
  - Add support for DragonFlyBSD. (@devnexen)
  - Fix the QUANTUM setting on MIPS64. (@brooksdavis)
  - Add the QUANTUM setting for ARC. (@vineetgarc)
  - Add the QUANTUM setting for LoongArch. (@wangjl-uos)
  - Add QNX support. (@jqian-aurora)
  - Avoid atexit(3) calls unless the relevant profiling features are enabled.
    (@BusyJay, @laiwei-rice, @interwq)
  - Fix unknown option detection when using Clang. (@Lapenkov)
  - Fix symbol conflict with musl libc. (@georgthegreat)
  - Add -Wimplicit-fallthrough checks. (@nickdesaulniers)
  - Add __forceinline support on MSVC. (@santagada)
  - Improve FreeBSD and Windows CI support. (@Lapenkov)
  - Add CI support for PPC64LE architecture. (@ezeeyahoo)

  Incompatible changes:
  - Maximum size class allowed in tcache (opt.[lg_]tcache_max) now has an
    upper bound of 8MiB. (@interwq)

  Optimizations and refactors (@davidtgoldblatt, @Lapenkov, @interwq):
  - Optimize the common cases of the thread cache operations.
  - Optimize internal data structures, including RB tree and pairing heap.
  - Optimize the internal locking on extent management.
  - Extract and refactor the internal page allocator and interface modules.

  Documentation:
  - Fix doc build with --with-install-suffix. (@lawmurray, @interwq)
  - Add PROFILING_INTERNALS.md. (@davidtgoldblatt)
  - Ensure the proper order of doc building and installation. (@Mingli-Yu)

* 5.2.1 (August 5, 2019)

  This release is primarily about Windows. A critical virtual memory leak is
  resolved on all Windows platforms. The regression was present in all
  releases since 5.0.0.

  Bug fixes:
  - Fix a severe virtual memory leak on Windows. This regression was first
    released in 5.0.0. (@Ignition, @j0t, @frederik-h, @davidtgoldblatt,
    @interwq)
  - Fix size 0 handling in posix_memalign(). This regression was first
    released in 5.2.0. (@interwq)
  - Fix the prof_log unit test which may observe unexpected backtraces from
    compiler optimizations. The test was first added in 5.2.0. (@marxin,
    @gnzlbg, @interwq)
  - Fix the declaration of the extent_avail tree. This regression was first
    released in 5.1.0. (@zoulasc)
  - Fix an incorrect reference in jeprof. This functionality was first
    released in 3.0.0. (@prehistoric-penguin)
  - Fix an assertion on the deallocation fast-path. This regression was first
    released in 5.2.0. (@yinan1048576)
  - Fix the TLS_MODEL attribute in headers. This regression was first released
    in 5.0.0. (@zoulasc, @interwq)

  Optimizations and refactors:
  - Implement opt.retain on Windows and enable by default on 64-bit.
    (@interwq, @davidtgoldblatt)
  - Optimize away a branch on the operator delete[] path. (@mgrice)
  - Add format annotation to the format generator function. (@zoulasc)
  - Refactor and improve the size class header generation. (@yinan1048576)
  - Remove best fit. (@djwatson)
  - Avoid blocking on background thread locks for stats. (@oranagra,
    @interwq)

* 5.2.0 (April 2, 2019)

  This release includes a few notable improvements, which are summarized
  below: 1) improved fast-path performance from the optimizations by
  @djwatson; 2) reduced virtual memory fragmentation and metadata usage; and
  3) bug fixes on setting the number of background threads. In addition,
  peak / spike memory usage is improved with certain allocation patterns. As
  usual, the release and prior dev versions have gone through large-scale
  production testing.

  New features:
  - Implement oversize_threshold, which uses a dedicated arena for allocations
    crossing the specified threshold to reduce fragmentation. (@interwq)
  - Add extents usage information to stats. (@tyleretzel)
  - Log time information for sampled allocations. (@tyleretzel)
  - Support 0 size in sdallocx. (@djwatson)
  - Output rate for certain counters in malloc_stats. (@zinoale)
  - Add configure option --enable-readlinkat, which allows the use of
    readlinkat over readlink. (@davidtgoldblatt)
  - Add configure options --{enable,disable}-{static,shared} to allow not
    building unwanted libraries. (@Ericson2314)
  - Add configure option --disable-libdl to enable fully static builds.
    (@interwq)
  - Add mallctl interfaces:
    + opt.oversize_threshold (@interwq)
    + stats.arenas.<i>.extent_avail (@tyleretzel)
    + stats.arenas.<i>.extents.<j>.n{dirty,muzzy,retained} (@tyleretzel)
    + stats.arenas.<i>.extents.<j>.{dirty,muzzy,retained}_bytes
      (@tyleretzel)

  Portability improvements:
  - Update MSVC builds. (@maksqwe, @rustyx)
  - Workaround a compiler optimizer bug on s390x. (@rkmisra)
  - Make use of pthread_set_name_np(3) on FreeBSD. (@trasz)
  - Implement malloc_getcpu() to enable percpu_arena for windows. (@santagada)
  - Link against -pthread instead of -lpthread. (@paravoid)
  - Make background_thread not dependent on libdl. (@interwq)
  - Add stringify to fix a linker directive issue on MSVC. (@daverigby)
  - Detect and fall back when 8-bit atomics are unavailable. (@interwq)
  - Fall back to the default pthread_create if dlsym(3) fails. (@interwq)

  Optimizations and refactors:
  - Refactor the TSD module. (@davidtgoldblatt)
  - Avoid taking extents_muzzy mutex when muzzy is disabled. (@interwq)
  - Avoid taking large_mtx for auto arenas on the tcache flush path.
    (@interwq)
  - Optimize ixalloc by avoiding a size lookup. (@interwq)
  - Implement opt.oversize_threshold which uses a dedicated arena for requests
    crossing the threshold, also eagerly purges the oversize extents. Default
    the threshold to 8 MiB. (@interwq)
  - Clean compilation with -Wextra. (@gnzlbg, @jasone)
  - Refactor the size class module. (@davidtgoldblatt)
  - Refactor the stats emitter. (@tyleretzel)
  - Optimize pow2_ceil. (@rkmisra)
  - Avoid runtime detection of lazy purging on FreeBSD. (@trasz)
  - Optimize mmap(2) alignment handling on FreeBSD. (@trasz)
  - Improve error handling for THP state initialization. (@jsteemann)
  - Rework the malloc() fast path. (@djwatson)
  - Rework the free() fast path. (@djwatson)
  - Refactor and optimize the tcache fill / flush paths. (@djwatson)
  - Optimize sync / lwsync on PowerPC. (@chmeeedalf)
  - Bypass extent_dalloc() when retain is enabled. (@interwq)
  - Optimize the locking on large deallocation. (@interwq)
  - Reduce the number of pages committed from sanity checking in debug build.
    (@trasz, @interwq)
  - Deprecate OSSpinLock. (@interwq)
  - Lower the default number of background threads to 4 (when the feature is
    enabled). (@interwq)
  - Optimize the trylock spin wait. (@djwatson)
  - Use arena index for arena-matching checks. (@interwq)
  - Avoid forced decay on thread termination when using background threads.
    (@interwq)
  - Disable muzzy decay by default. (@djwatson, @interwq)
  - Only initialize libgcc unwinder when profiling is enabled. (@paravoid,
    @interwq)

  Bug fixes (all only relevant to jemalloc 5.x):
  - Fix background thread index issues with max_background_threads.
    (@djwatson, @interwq)
  - Fix stats output for opt.lg_extent_max_active_fit. (@interwq)
  - Fix opt.prof_prefix initialization. (@davidtgoldblatt)
  - Properly trigger decay on tcache destroy. (@interwq, @amosbird)
  - Fix tcache.flush. (@interwq)
  - Detect whether explicit extent zero out is necessary with huge pages or
    custom extent hooks, which may change the purge semantics. (@interwq)
  - Fix a side effect caused by extent_max_active_fit combined with
    decay-based purging, where freed extents can accumulate and not be reused
    for an extended period of time. (@interwq, @mpghf)
  - Fix a missing unlock on extent register error handling. (@zoulasc)

  Testing:
  - Simplify the Travis script output. (@gnzlbg)
  - Update the test scripts for FreeBSD. (@devnexen)
  - Add unit tests for the producer-consumer pattern. (@interwq)
  - Add Cirrus-CI config for FreeBSD builds. (@jasone)
  - Add size-matching sanity checks on tcache flush. (@davidtgoldblatt,
    @interwq)

  Incompatible changes:
  - Remove --with-lg-page-sizes. (@davidtgoldblatt)

  Documentation:
  - Attempt to build docs by default, however skip doc building when xsltproc
    is missing. (@interwq, @cmuellner)
Merge changes from current as of 20200406
Record that NetBSD overcommits memory (from maya)
Sync with HEAD
file pages.c was added on branch phil-wifi on 2019-06-10 21:44:55 +0000
Enforce alignment also if the compiled-in PAGE_SIZE is bigger than getpagesize()
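
When the allocator's compiled-in page size exceeds the page size the kernel reports, mmap(2) only guarantees alignment to the smaller OS page, so the larger alignment has to be enforced explicitly. One common way to do that is to over-allocate and trim, sketched below. This is an illustration under stated assumptions, not the code from pages.c: `map_aligned` is a hypothetical helper, and `align` is assumed to be a power of two.

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE 1	/* expose MAP_ANON on glibc in strict modes */
#endif
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>
#include <unistd.h>

#ifndef MAP_ANON
#define MAP_ANON MAP_ANONYMOUS
#endif

/*
 * Hypothetical helper: map at least `size` bytes at an address aligned
 * to `align` (a power of two). Returns NULL on failure.
 */
static void *
map_aligned(size_t size, size_t align)
{
	size_t os_page = (size_t)sysconf(_SC_PAGESIZE);
	size_t total, lead, trail;
	uintptr_t addr;
	char *base;

	assert(align != 0 && (align & (align - 1)) == 0);

	/* Round the request up to whole OS pages so trimming stays page-aligned. */
	size = (size + os_page - 1) & ~(os_page - 1);
	if (align < os_page)
		align = os_page;

	/* Over-allocate so an aligned block of `size` bytes must fit inside. */
	total = size + align - os_page;
	if (total < size)
		return NULL;	/* size_t overflow */

	base = mmap(NULL, total, PROT_READ | PROT_WRITE,
	    MAP_PRIVATE | MAP_ANON, -1, 0);
	if (base == MAP_FAILED)
		return NULL;

	/* First aligned address inside the mapping. */
	addr = ((uintptr_t)base + align - 1) & ~((uintptr_t)align - 1);
	lead = (size_t)(addr - (uintptr_t)base);
	trail = total - lead - size;

	/* Return the unused head and tail to the kernel. */
	if (lead != 0)
		munmap(base, lead);
	if (trail != 0)
		munmap((char *)addr + size, trail);
	return (void *)addr;
}
```

For example, `map_aligned(4096, 65536)` yields a 64 KiB-aligned address even on a system whose OS page size is 4 KiB.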
Undo previous, it is moving us in the wrong direction.
Allow os_page sizes greater than the built-in page size. This can happen for example for COMPAT_NETBSD32 sparc binaries (4K page size because of MIN_PAGE_SIZE), running on sparc64 (8K pages).
we have MAP_ALIGNED, so use it (although it does not do anything by default)
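
A minimal sketch of passing an alignment request to mmap(2) through MAP_ALIGNED follows. It assumes the NetBSD form of the flag, `MAP_ALIGNED(n)`, where `n` is the base-2 logarithm of the desired boundary; on systems without the macro the sketch falls back to a plain mapping. `lg_pow2` and `map_with_hint` are hypothetical names, not functions from pages.c.

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE 1	/* expose MAP_ANON on glibc in strict modes */
#endif
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>

#ifndef MAP_ANON
#define MAP_ANON MAP_ANONYMOUS
#endif

/* Base-2 logarithm of x, which must be a power of two. */
static unsigned
lg_pow2(size_t x)
{
	unsigned lg = 0;

	assert(x != 0 && (x & (x - 1)) == 0);
	while (x > 1) {
		x >>= 1;
		lg++;
	}
	return lg;
}

/*
 * Hypothetical helper: anonymous mapping that asks the kernel for
 * `align`-byte alignment where MAP_ALIGNED is available.
 */
static void *
map_with_hint(size_t size, size_t align)
{
	int flags = MAP_PRIVATE | MAP_ANON;
	void *p;

#ifdef MAP_ALIGNED
	flags |= MAP_ALIGNED(lg_pow2(align));
#else
	(void)align;	/* no alignment flag on this platform */
#endif
	p = mmap(NULL, size, PROT_READ | PROT_WRITE, flags, -1, 0);
	return p == MAP_FAILED ? NULL : p;
}
```

Callers that need a guaranteed alignment on every platform should still verify the returned address, since the fallback path gives only OS-page alignment.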
import jemalloc-5.1.0
Initial revision