Create a new header lwp_private.h to contain _lwp_getprivate_fast, _lwp_gettcb_fast, _lwp_settcb and remove them from mcontext.h, so that: 1. we don't need special hacks to hide them 2. we can include <lwp.h> where needed to get the necessary prototypes without redefining them locally.
Pull up following revision(s) (requested by riastradh in ticket #1878): lib/libpthread/arch/x86_64/pthread_md.h: revision 1.13 lib/libpthread/pthread_int.h: revision 1.110 lib/libpthread/pthread_int.h: revision 1.111 lib/libpthread/arch/i386/pthread_md.h: revision 1.21 lib/libpthread/arch/arm/pthread_md.h: revision 1.12 lib/libpthread/arch/arm/pthread_md.h: revision 1.13 lib/libpthread/pthread_spin.c: revision 1.11 lib/libpthread/arch/aarch64/pthread_md.h: revision 1.2 libpthread: Use __nothing, not /* nothing */, for empty macros. No functional change intended -- just safer to do it this way in case the macros are used in if branches or comma expressions. PR port-arm/57437 (pthread__smt_pause/wake issue) libpthread: New pthread__smt_wait to put CPU in low power for spin. This is now distinct from pthread__smt_pause, which is for spin lock backoff with no paired wakeup. On Arm, there is a single-bit event register per CPU, and there are two instructions to manage it: - wfe, wait for event -- if event register is clear, enter low power mode and wait until event register is set; then exit low power mode and clear event register - sev, signal event -- sets event register on all CPUs (other circumstances like interrupts also set the event register and cause wfe to wake) These can be used to reduce the power consumption of spinning for a lock, but only if they are actually paired -- if there's no sev, wfe might hang indefinitely. Currently only pthread_spin(3) actually pairs them; the other lock primitives (internal lock, mutex, rwlock) do not -- they have spin lock backoff loops, but no corresponding wakeup to cancel a wfe. It may be worthwhile to teach the other lock primitives to pair wfe/sev, but that requires some performance measurement to verify it's actually worthwhile. So for now, we just make sure not to use wfe when there's no sev, and keep everything else the same -- this should fix severe performance degradation in libpthread on Arm without hurting anything else. No change in the generated code on amd64 and i386. No change in the generated code for pthread_spin.c on arm and aarch64 -- changes only the generated code for pthread_lock.c, pthread_mutex.c, and pthread_rwlock.c, as intended. PR port-arm/57437
Pull up following revision(s) (requested by riastradh in ticket #1700): lib/libpthread/arch/x86_64/pthread_md.h: revision 1.13 lib/libpthread/pthread_int.h: revision 1.110 lib/libpthread/pthread_int.h: revision 1.111 lib/libpthread/arch/i386/pthread_md.h: revision 1.21 lib/libpthread/arch/arm/pthread_md.h: revision 1.12 lib/libpthread/arch/arm/pthread_md.h: revision 1.13 lib/libpthread/pthread_spin.c: revision 1.11 lib/libpthread/arch/aarch64/pthread_md.h: revision 1.2 libpthread: Use __nothing, not /* nothing */, for empty macros. No functional change intended -- just safer to do it this way in case the macros are used in if branches or comma expressions. PR port-arm/57437 (pthread__smt_pause/wake issue) libpthread: New pthread__smt_wait to put CPU in low power for spin. This is now distinct from pthread__smt_pause, which is for spin lock backoff with no paired wakeup. On Arm, there is a single-bit event register per CPU, and there are two instructions to manage it: - wfe, wait for event -- if event register is clear, enter low power mode and wait until event register is set; then exit low power mode and clear event register - sev, signal event -- sets event register on all CPUs (other circumstances like interrupts also set the event register and cause wfe to wake) These can be used to reduce the power consumption of spinning for a lock, but only if they are actually paired -- if there's no sev, wfe might hang indefinitely. Currently only pthread_spin(3) actually pairs them; the other lock primitives (internal lock, mutex, rwlock) do not -- they have spin lock backoff loops, but no corresponding wakeup to cancel a wfe. It may be worthwhile to teach the other lock primitives to pair wfe/sev, but that requires some performance measurement to verify it's actually worthwhile. So for now, we just make sure not to use wfe when there's no sev, and keep everything else the same -- this should fix severe performance degradation in libpthread on Arm without hurting anything else. No change in the generated code on amd64 and i386. No change in the generated code for pthread_spin.c on arm and aarch64 -- changes only the generated code for pthread_lock.c, pthread_mutex.c, and pthread_rwlock.c, as intended. PR port-arm/57437
Pull up following revision(s) (requested by riastradh in ticket #296): lib/libpthread/arch/x86_64/pthread_md.h: revision 1.13 lib/libpthread/pthread_int.h: revision 1.110 lib/libpthread/pthread_int.h: revision 1.111 lib/libpthread/arch/i386/pthread_md.h: revision 1.21 lib/libpthread/arch/arm/pthread_md.h: revision 1.12 lib/libpthread/arch/arm/pthread_md.h: revision 1.13 lib/libpthread/pthread_spin.c: revision 1.11 lib/libpthread/arch/aarch64/pthread_md.h: revision 1.2 libpthread: Use __nothing, not /* nothing */, for empty macros. No functional change intended -- just safer to do it this way in case the macros are used in if branches or comma expressions. PR port-arm/57437 (pthread__smt_pause/wake issue) libpthread: New pthread__smt_wait to put CPU in low power for spin. This is now distinct from pthread__smt_pause, which is for spin lock backoff with no paired wakeup. On Arm, there is a single-bit event register per CPU, and there are two instructions to manage it: - wfe, wait for event -- if event register is clear, enter low power mode and wait until event register is set; then exit low power mode and clear event register - sev, signal event -- sets event register on all CPUs (other circumstances like interrupts also set the event register and cause wfe to wake) These can be used to reduce the power consumption of spinning for a lock, but only if they are actually paired -- if there's no sev, wfe might hang indefinitely. Currently only pthread_spin(3) actually pairs them; the other lock primitives (internal lock, mutex, rwlock) do not -- they have spin lock backoff loops, but no corresponding wakeup to cancel a wfe. It may be worthwhile to teach the other lock primitives to pair wfe/sev, but that requires some performance measurement to verify it's actually worthwhile. So for now, we just make sure not to use wfe when there's no sev, and keep everything else the same -- this should fix severe performance degradation in libpthread on Arm without hurting anything else. No change in the generated code on amd64 and i386. No change in the generated code for pthread_spin.c on arm and aarch64 -- changes only the generated code for pthread_lock.c, pthread_mutex.c, and pthread_rwlock.c, as intended. PR port-arm/57437
libpthread: New pthread__smt_wait to put CPU in low power for spin. This is now distinct from pthread__smt_pause, which is for spin lock backoff with no paired wakeup. On Arm, there is a single-bit event register per CPU, and there are two instructions to manage it: - wfe, wait for event -- if event register is clear, enter low power mode and wait until event register is set; then exit low power mode and clear event register - sev, signal event -- sets event register on all CPUs (other circumstances like interrupts also set the event register and cause wfe to wake) These can be used to reduce the power consumption of spinning for a lock, but only if they are actually paired -- if there's no sev, wfe might hang indefinitely. Currently only pthread_spin(3) actually pairs them; the other lock primitives (internal lock, mutex, rwlock) do not -- they have spin lock backoff loops, but no corresponding wakeup to cancel a wfe. It may be worthwhile to teach the other lock primitives to pair wfe/sev, but that requires some performance measurement to verify it's actually worthwhile. So for now, we just make sure not to use wfe when there's no sev, and keep everything else the same -- this should fix severe performance degradation in libpthread on Arm without hurting anything else. No change in the generated code on amd64 and i386. No change in the generated code for pthread_spin.c on arm and aarch64 -- changes only the generated code for pthread_lock.c, pthread_mutex.c, and pthread_rwlock.c, as intended. PR port-arm/57437 XXX pullup-10
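For illustration, a minimal sketch of the resulting Arm definitions (macro spellings follow the commit message; the exact header contents are an assumption):

    /* Spin-lock backoff with no paired wakeup: must never sleep. */
    #define pthread__smt_pause()    __nothing

    /* Low-power wait; safe only where a matching sev is issued. */
    #define pthread__smt_wait()     __asm __volatile("wfe")

    /* Signal event: sets the event register on all CPUs, waking any wfe. */
    #define pthread__smt_wake()     __asm __volatile("sev")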
libpthread: Use __nothing, not /* nothing */, for empty macros. No functional change intended -- just safer to do it this way in case the macros are used in if branches or comma expressions. PR port-arm/57437 (pthread__smt_pause/wake issue) XXX pullup-10
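A small illustration of the hazard being avoided, assuming __nothing expands to a (void)0-style expression as defined in <sys/cdefs.h> (the surrounding function name is hypothetical):

    #define pthread__smt_wake()     /* nothing */

    /* In a comma expression this expands to "( , unlock())", a syntax
       error; with __nothing it stays a valid (void)0 subexpression. */
    (pthread__smt_wake(), unlock());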
lib: remove CONSTCOND comment Since 2021-01-31, lint doesn't need it anymore for the common pattern of 'do ... while (0)'.
libpthread: Move namespacing include to top of .c files. Stuff like libc's namespace.h, or atomic_op_namespace.h, which does namespacing tricks like `#define atomic_cas_uint _atomic_cas_uint', has to go at the top of each .c file. If it goes in the middle, it might be too late to affect the declarations, and result in compile errors. I tripped over this by including <sys/atomic.h> in mips <machine/lock.h>. (Maybe we should create a new pthread_namespace.h file for the purpose, but this'll do for now.)
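For illustration, a minimal sketch of the namespacing trick in question (identifiers are illustrative, not the exact header contents):

    /* The namespacing header must come before any declaration: */
    #define atomic_cas_uint _atomic_cas_uint

    /* <sys/atomic.h> then declares the renamed, reserved symbol... */
    unsigned int _atomic_cas_uint(volatile unsigned int *, unsigned int,
        unsigned int);

    /* ...and a weak alias exports the standard name: */
    __weak_alias(atomic_cas_uint,_atomic_cas_uint)

If some header has already declared atomic_cas_uint by the time the #define is seen, the rename never takes effect and the two declarations collide, hence the move to the top of each .c file.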
- Make pthread_condvar and pthread_mutex work on the stack rather than in pthread_t, so there's less chance of bad things happening if someone calls (for example) pthread_cond_broadcast() from a signal handler. - Remove all the deferred waiter handling except for the one case that really matters which is transferring waiters from condvar -> mutex on wakeup, and do that by splicing the condvar's waiters onto the mutex. - Remove the mutex waiters bit as it's another complication that's not strictly needed.
Pass down errno when calling pthread__errorfunc after a system call. Allow format arguments for that reason and use (v)snprintf_ss in pthread__errorfunc to avoid race conditions and the like.
In the interests of reliability simplify waiter handling more and redo condvars to manage the list of waiters with atomic ops.
- Try to eliminate a hang in "parked" I've been seeing while stress testing. Centralise wakeup of deferred waiters in pthread__clear_waiters() and use throughout libpthread. Make fewer assumptions. Be more conservative in pthread_mutex when dealing with pending waiters. - Remove the "hint" argument everywhere since the kernel doesn't use it any more.
Merge changes from current as of 20200406
Revert "Enhance the pthread(3) + malloc(3) init model" It is reported to hand on aarch64 with gzip.
Enhance the pthread(3) + malloc(3) init model Separate the pthread_atfork(3) call from pthread_tsd_init() and move it into a distinct function. Call the late TSD initialization routine inside pthread__init(), just after "pthread_atfork(NULL, NULL, pthread__fork_callback);". Document that malloc(3) initialization is now controlled again and called during the first pthread_atfork(3) call. Remove #if 0 code from pthread_mutex.c as we no longer initialize malloc prematurely.
Retire ifdef ERRORCHECK in pthread(3) It has been enabled unconditionally since 2003 and is used only for rwlocks and spinlocks. LLVM sanitizers assume that these checks are always enabled.
- A bit more alignment in __pthread_st especially for the rbtree node. - Use COHERENCY_UNIT from sys/param.h.
pthread_detach(), pthread_join(): go back to using _lwp_detach() and _lwp_wait(), rather than doing it all in userspace. There's less to go wrong. Doesn't seem to be a performance penalty.
Pull up following revision(s) (requested by ad in ticket #647): lib/libpthread/pthread_rwlock.c: revision 1.37 (patch) lib/libpthread/pthread_misc.c: revision 1.16 lib/libpthread/pthread.c: revision 1.154 lib/libpthread/pthread_int.h: revision 1.98 lib/libpthread/pthread_cond.c: revision 1.66 lib/libpthread/pthread_mutex.c: revision 1.66 Rip out some very ambitious optimisations around pthread_mutex that don't buy much. This stuff is hard enough to get right in the kernel let alone userspace, and I don't trust that it's right.
Rip out some very ambitious optimisations around pthread_mutex that don't buy much. This stuff is hard enough to get right in the kernel let alone userspace, and I don't trust that it's right.
Pull up following revision(s) (requested by joerg in ticket #571): lib/libpthread/pthread_int.h: revision 1.97 Bump PTHREAD__UNPARK_MAX to 128 as a bandaid for locking-related hangs.
Bump PTHREAD__UNPARK_MAX to 128 as a bandaid for locking-related hangs.
G/c unused rwlock owner macros copy-pasted from the kernel. They were brought along with the rwlock flags but never used and never even adapted to the new home (the struct member name is different here). I looked at adapting and using them, but they don't really help readability that much and there are cases where we need to deal with "fused" owner values anyway and so can't use them.
Sync with HEAD
Transfer all the keys that were created in the libc stub implementation to the pthread tsd implementation when the main thread is created. This corrects a problem where a process created keys before libpthread was loaded (either from the libc constructor or because libpthread was dlopened later). This fixes a problem with jemalloc which creates keys in the constructor.
Pull up following revision(s) (requested by joerg in ticket #234): sys/arch/amd64/include/vmparam.h: revision 1.43 sys/kern/exec_subr.c: revision 1.79 lib/libpthread/pthread_int.h: revision 1.94 sys/arch/mips/include/vmparam.h: revision 1.58 sys/arch/mips/include/vmparam.h: revision 1.59 lib/libpthread/TODO: revision 1.19 sys/arch/powerpc/include/vmparam.h: revision 1.20 sys/arch/riscv/include/vmparam.h: revision 1.2 sys/arch/riscv/include/vmparam.h: revision 1.3 sys/arch/i386/include/vmparam.h: revision 1.85 tests/lib/libpthread/t_join.c: revision 1.9 sys/uvm/uvm_meter.c: revision 1.66 sys/uvm/uvm_param.h: revision 1.36 sys/kern/exec_subr.c: revision 1.80 sys/uvm/uvm_param.h: revision 1.37 sys/kern/exec_subr.c: revision 1.81 sys/kern/exec_subr.c: revision 1.82 lib/libpthread/pthread_attr_getguardsize.3: revision 1.4 lib/libpthread/pthread.c: revision 1.148 lib/libpthread/pthread_attr.c: revision 1.17 sys/arch/amd64/include/vmparam.h: revision 1.42 Always include a 1MB guard area beyond the end of stack. While ASLR will normally create a guard area as well, this provides a deterministic area for all binaries. Mitigates the rest of CVE-2017-1000374 and CVE-2017-1000375 from Qualys. Revert for the moment, creates problems on i386. Recommit exec_subr.c revision 1.79: Always include a 1MB guard area beyond the end of stack. While ASLR will normally create a guard area as well, this provides a deterministic area for all binaries. Mitigates the rest of CVE-2017-1000374 and CVE-2017-1000375 from Qualys. Additionally, change VM_DEFAULT_ADDRESS_TOPDOWN to include user_stack_guard_size in the size reservation. Update VM_DEFAULT_ADDRESS32_TOPDOWN to include guard area. Export the guard size of the main thread via vm.guard_size. Add a complementary writable sysctl for the initial guard size of threads created via pthread_create. Let the existing attribute accessors do the right thing. Raise the default guard size for threads to 64KB.
Export the guard size of the main thread via vm.guard_size. Add a complementary writable sysctl for the initial guard size of threads created via pthread_create. Let the existing attribute accessors do the right thing. Raise the default guard size for threads to 64KB.
Sync with HEAD
Sync with HEAD
libpthread_dbg(3) deletion from the base distribution libpthread_dbg(3) is a remnant library from the M:N thread model (pre-NetBSD-5.0) API to introspect threads within a process and for use by debuggers. In the current 1:1 model it is used neither by GDB nor by LLDB, and there are no plans to use it. Its current function, reading pthread_t structures, is achievable within any regular debugger capable of introspecting objects within a tracee (GDB, LLDB...). Remaining users of this API can still use this library from pkgsrc/devel/libpthread_dbg. Sponsored by <The NetBSD Foundation>
Pull up following revision(s) (requested by manu in ticket #829): lib/libpthread_dbg/pthread_dbg.c: revision 1.43 (via patch) lib/libpthread/pthread_int.h: revision 1.91-1.92 (via patch) lib/libc/stdlib/jemalloc.c: revision 1.37-1.38 lib/libpthread/pthread_tsd.c: revision 1.12-1.14 (via patch) include/limits.h: revision 1.34 (via patch) lib/libpthread/pthread.c: revision 1.146-1.147 (via patch) lib/libpthread/pthread_key_create.3: revision 1.7 (via patch) libpthread: Make PTHREAD_KEYS_MAX dynamically adjustable NetBSD's PTHREAD_KEYS_MAX is set to 256, which is low compared to other systems like Linux (1024) or MacOS X (512). As a result some setups tested on Linux will exhibit problems on NetBSD because of pthread_keys usage beyond the limit. This happens for instance on Apache with various modules loaded, and in this case no particular developer can be blamed for going beyond the limit, since several modules from different sources contribute to the problem. This patch makes the limit configurable through the PTHREAD_KEYS_MAX environment variable. If undefined, the default remains unchanged (256). In any case, the value cannot be lowered below the POSIX-mandated _POSIX_THREAD_KEYS_MAX (128). While there: - use EXIT_FAILURE instead of 1 when calling err(3) in libpthread. - Reset _POSIX_THREAD_KEYS_MAX to the POSIX-mandated 128, instead of 256. Fix previous: Can't use calloc/malloc before we complete initialization of the thread library, because malloc uses pthread_foo_specific, and it will end up initializing itself incorrectly. Thanks rump for not letting us use even mmap during initialization. libc/jemalloc: Fix non _REENTRANT build. Defer using pthread keys until we are threaded. From Christos, fixes PR port-arm/50087 by allowing malloc calls prior to libpthread initialization.
Fix previous: Can't use calloc/malloc before we complete initialization of the thread library, because malloc uses pthread_foo_specific, and it will end up initializing itself incorrectly.
Make PTHREAD_KEYS_MAX dynamically adjustable NetBSD's PTHREAD_KEYS_MAX is set to 256, which is low compared to other systems like Linux (1024) or MacOS X (512). As a result some setups tested on Linux will exhibit problems on NetBSD because of pthread_keys usage beyond the limit. This happens for instance on Apache with various modules loaded, and in this case no particular developer can be blamed for going beyond the limit, since several modules from different sources contribute to the problem. This patch makes the limit configurable through the PTHREAD_KEYS_MAX environment variable. If undefined, the default remains unchanged (256). In any case, the value cannot be lowered below the POSIX-mandated _POSIX_THREAD_KEYS_MAX (128). While there: - use EXIT_FAILURE instead of 1 when calling err(3) in libpthread. - Reset _POSIX_THREAD_KEYS_MAX to the POSIX-mandated 128, instead of 256.
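For illustration, a minimal sketch of the startup clamp described above (variable names are hypothetical, and the real code must also avoid allocating at this point):

    #include <limits.h>
    #include <stdlib.h>

    int max = PTHREAD_KEYS_MAX;                 /* default: 256 */
    const char *s = getenv("PTHREAD_KEYS_MAX");

    if (s != NULL) {
        max = atoi(s);
        if (max < _POSIX_THREAD_KEYS_MAX)       /* POSIX floor: 128 */
            max = _POSIX_THREAD_KEYS_MAX;
    }
    pthread__keys_max = max;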
Allow for arbitrary MI scheduler implementations. A concrete result is enabling unpatched libpthread to run on the rumprun stacks (e.g. Xen and bare metal) with a non-NetBSD scheduler. Those schedulers hook into the existing _lwp_frobnitz() NetBSD syscall interfaces (well, "syscall" interfaces in that scenario ;) More specifically about the change itself: 1) instead of calling _lwp_makecontext() followed by _lwp_create() and passing the entry point in ucontext_t (MD) through the calls, roll the calls into pthread__makelwp() and allow alternate implementations for that MI interface. 2) allow compile-time overriding of __lwp_gettcb_fast() or __lwp_getprivate_fast, which are inline and leak MD scheduler/thread details into libpthread Additionally, two small nits: I) define LIB=pthread before including mk.conf so that it's possible to test for LIB==pthread in mk.conf II) make it possible to leave out pthread_cancelstub.c. This is required by the current implementation of rumprun-posix (i.e. rumprun on POSIX hosts) due to symbol collisions. It needs to be fixed properly some day, but for now allows an almost-correct libpthread to run. I am sure @justin will be happy to explain the details ;) no change to NetBSD tested: anita+atf
sync with head. for a reference, the tree before this commit was tagged as yamt-pagecache-tag8. this commit was split into small chunks to avoid a limitation of cvs. ("Protocol error: too many arguments")
resync from head
Pull up following revision(s) (requested by manu in ticket #869): lib/libpthread/pthread_rwlock.c: revision 1.33 lib/libc/include/reentrant.h: revision 1.16 lib/libpthread/pthread_cond.c: revision 1.59 lib/libpthread/pthread_misc.c: revision 1.15 lib/libc/thread-stub/thread-stub.c: revision 1.23 lib/libpthread/pthread_cancelstub.c: revision 1.38 lib/libpthread/pthread_specific.c: revision 1.26 lib/libpthread/pthread_mutex.c: revision 1.56 lib/libpthread/pthread_tsd.c: revision 1.11 lib/libpthread/Makefile: revision 1.80 lib/libpthread/pthread.c: revision 1.143 lib/libpthread/pthread_int.h: revision 1.89 - Allow libpthread to be dlopened again, by providing libc stubs to libpthread. - Fail if the dlopened libpthread does pthread_create(). From manu@ - Discussed at length in the mailing lists; approved by core@ - This was chosen as the least intrusive patch that will provide the necessary functionality. XXX: pullup to 6
- Allow libpthread to be dlopened again, by providing libc stubs to libpthread. - Fail if the dlopened libpthread does pthread_create(). From manu@ - Discussed at length in the mailing lists; approved by core@ - This was chosen as the least intrusive patch that will provide the necessary functionality. XXX: pullup to 6
resync with head
sync with (a bit old) head
Back out ticket #724 (libpthread changes) until they can be better understood, as they broke threaded programs on (at least) i386 and amd64.
Pull up following revision(s) (requested by christos in ticket #724): lib/libpthread/pthread_specific.c: revision 1.24 lib/libpthread/pthread_tsd.c: revision 1.10 lib/libpthread/pthread_tsd.c: revision 1.9 lib/libpthread/pthread_int.h: revision 1.88 Replace the simple implementation of the pthread_key_{create,destroy} and pthread_{g,s}etspecific functions with one that invalidates values of keys in other threads when pthread_key_delete() is called. This fixes chromium, which expects pthread_key_delete() to do cleanup in all threads. Don't call the destructor in pthread_key_delete(), following the standard.
Replace the simple implementation of the pthread_key_{create,destroy} and pthread_{g,s}etspecific functions with one that invalidates values of keys in other threads when pthread_key_delete() is called. This fixes chromium, which expects pthread_key_delete() to do cleanup in all threads.
Resync to 2012-11-19 00:00:00 UTC
libpthread: replace the use of obsolete sys/tree.h interface with rbtree(9).
sync with head
Add a pthread__smt_wake and add support for it on arm along with pthread__smt_pause. These are implemented using the ARM instructions SEV (wake) and WFE (pause). These are treated as NOPs on ARM CPUs that don't support them.
sync with head.
Simplify check for TLS definition to not hide code. Drop it in another place as it is redundant.
sync with head
Keep track of the size of the guard area, in case we want to make it modifiable later. Only reuse the stack if it was allocated by libpthread and if the expected thread size matches the current stack size.
Separate pthread_t from thread stack. Drop additional alignment restrictions on the thread stack. Remove remaining parts of stackid.
Introduce __HAVE_NO___THREAD for sun2 and vax to disable the TLS usage. Require __HAVE_TLS_VARIANT_I or __HAVE_TLS_VARIANT_II as well as __lwp_getprivate_fast / __lwp_gettcb_fast to exist for libpthread. Define VAX to use TLS variant I, if it is ever implemented.
Include limits.h to get PTHREAD_KEYS_MAX, and move its definition there.
Use __dead
fix spello in comment
Add __HAVE___LWP_GETTCB_FAST support (for mips and powerpc).
If TLS support is present, use it for pthread__self(). The initialisation order is correct in this case as _lwp_setprivate has been called already by ld.elf_so for dynamic programs or _libc_init for statically linked ones.
Add TLS support infrastructure. For dynamic binaries, ld.elf_so exports _rtld_tls_allocate and _rtld_tls_free. libpthread uses these functions to set up the thread private area of all new threads. ld.elf_so is responsible for setting up the private area for the initial thread. Similar functions are called from _libc_init for static binaries, using dl_iterate_phdr to access the ELF Program Header. Add test cases to exercise the different TLS storage models. Test cases are compiled and installed on all platforms, but are skipped on platforms not marked for TLS support. This material is based upon work partially supported by The NetBSD Foundation under a contract with Joerg Sonnenberger. It is inspired by the TLS support in FreeBSD by Doug Rabson and the clean ups of the DragonFly port of the original FreeBSD modifications.
Sync with HEAD
Back out using the thread register (if present) for now. libgcc_s's __register_frame_info gets called from libc's CSU code before the libc constructors are run. __register_frame_info in turn calls pthread_mutex_lock. libpthread is not initialised at this point and therefore pthread__self() traps when dereferencing the thread register. This worked before because the garbage from pthread__self() is effectively ignored.
Allow storing and receiving the LWP private pointer via ucontext_t on all platforms except VAX and IA64. Add fast access via register for AMD64, i386 and SH3 ports. Use this fast access in libpthread to replace the stack based pthread_self(). Implement skeleton support for Alpha, HPPA, PowerPC, SPARC and SPARC64, but leave it disabled. Ports that support this feature provide __HAVE____LWP_GETPRIVATE_FAST in machine/types.h and a corresponding __lwp_getprivate_fast in machine/mcontext.h. This material is based upon work partially supported by The NetBSD Foundation under a contract with Joerg Sonnenberger.
I've had this patch in my tree for a while and since it only improves the situation, I decided to commit it. There is an inherent problem with ASLR and the way the pthread library is using the thread stack. Our pthread library chooses the stack for each thread strategically so that it can locate the pthread struct for each thread by masking the stack pointer and looking just below the red zone it creates. Unfortunately with ASLR you get many random values for the initial stack, and there are situations where the masked stack base ends up below the base of the stack. (this happens on x86 when the stack base happens to be 0x???02000 for example and your stack mask is 0xffe00000). To fix this, we detect the pathological cases (this happens only in the main thread), allocate more stack, and mprotect it appropriately. Then we stash the main base and the main struct, so that when we look for the pthread struct in pthread__id, we can special case the main thread. Another way to work around the problem is unlimiting stacksize, but the proper way is to use TLS to find the thread structure and not to play games with the thread stacks.
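For illustration, a minimal sketch of the masking scheme being patched up here (macro names are assumptions):

    /* Thread stacks are aligned to their size, so masking the stack
       pointer yields the base of the allocation, and the pthread
       struct sits at a known offset within it: */
    #define pthread__id(sp) \
        ((pthread_t)((uintptr_t)(sp) & pthread__stackmask))

With ASLR the main thread's stack base need not be so aligned; for example a base near 0x???02000 masked with 0xffe00000 lands below the real stack, hence the special case for the main thread.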
- Convert from makecontext() -> _lwp_makecontext(). - Rely on _lwp_makecontext() to set up the thread identity register. This is not currently done (a bug), nor does libpthread use the threadreg yet. I'm doing this so it the code can be used by the person working on TLS to verify that their threadreg code is working.
Remove unused code that's confusing when using cscope/opengrok.
Sync with wrstuden-revivesa-base-2.
Sync with the following revisions (requested by skrll in ticket #1196): gnu/dist/gdb removed gnu/usr.bin/gdb53 removed distrib/cats/instkernel/Makefile 1.14.6.1 gnu/dist/gdb6/bfd/config.bfd 1.3.6.1 gnu/dist/gdb6/bfd/elfxx-sparc.c 1.1.1.2.6.1 gnu/dist/gdb6/bfd/elfxx-sparc.h 1.1.1.2.6.1 gnu/dist/gdb6/gdb/Makefile.in 1.2.2.1.2.2 gnu/dist/gdb6/gdb/alpha-tdep.c 1.1.1.2.6.1 gnu/dist/gdb6/gdb/alpha-tdep.h 1.1.1.2.6.1 gnu/dist/gdb6/gdb/alphabsd-nat.c 1.1.1.2.6.2 gnu/dist/gdb6/gdb/alphabsd-nat.h 1.1.2.1 gnu/dist/gdb6/gdb/alphabsd-tdep.c 1.1.1.2.6.1 gnu/dist/gdb6/gdb/alphabsd-tdep.h 1.1.1.2.6.1 gnu/dist/gdb6/gdb/alphanbsd-nat.c 1.1.2.1 gnu/dist/gdb6/gdb/alphanbsd-tdep.c 1.1.1.2.6.1 gnu/dist/gdb6/gdb/amd64-nat.c 1.1.1.2.6.1 gnu/dist/gdb6/gdb/amd64bsd-nat.c 1.1.1.2.6.1 gnu/dist/gdb6/gdb/amd64nbsd-nat.c 1.1.1.2.6.3 gnu/dist/gdb6/gdb/amd64nbsd-tdep.c 1.1.1.2.6.1 gnu/dist/gdb6/gdb/arm-tdep.h 1.1.1.2.6.1 gnu/dist/gdb6/gdb/armbsd-tdep.c 1.1.2.1 gnu/dist/gdb6/gdb/armnbsd-nat.c 1.1.1.2.6.2 gnu/dist/gdb6/gdb/armnbsd-tdep.c 1.1.1.2.6.1 gnu/dist/gdb6/gdb/configure 1.1.1.2.6.1 gnu/dist/gdb6/gdb/configure.ac 1.1.1.2.6.1 gnu/dist/gdb6/gdb/i386bsd-nat.c 1.1.1.2.6.1 gnu/dist/gdb6/gdb/i386nbsd-tdep.c 1.1.1.2.6.1 gnu/dist/gdb6/gdb/m68kbsd-nat.c 1.1.1.2.6.2 gnu/dist/gdb6/gdb/mipsnbsd-nat.c 1.1.1.2.6.2 gnu/dist/gdb6/gdb/nbsd-thread.c 1.1.2.3 gnu/dist/gdb6/gdb/ppcnbsd-nat.c 1.1.1.2.6.2 gnu/dist/gdb6/gdb/ppcnbsd-tdep.c 1.3.6.1 gnu/dist/gdb6/gdb/sh-tdep.c 1.1.1.2.6.1 gnu/dist/gdb6/gdb/shnbsd-nat.c 1.1.1.2.6.3 gnu/dist/gdb6/gdb/shnbsd-tdep.c 1.1.1.2.6.4 gnu/dist/gdb6/gdb/shnbsd-tdep.h 1.1.1.2.6.1 gnu/dist/gdb6/gdb/sparc-nat.c 1.1.1.2.6.1 gnu/dist/gdb6/gdb/sparc64nbsd-nat.c 1.1.1.2.6.2 gnu/dist/gdb6/gdb/sparcnbsd-nat.c 1.1.1.2.6.2 gnu/dist/gdb6/gdb/tramp-frame.h 1.1.1.2.6.1 gnu/dist/gdb6/gdb/vaxbsd-nat.c 1.1.1.2.6.2 gnu/dist/gdb6/gdb/config/alpha/nbsd.mh 1.1.1.2.6.1 gnu/dist/gdb6/gdb/config/arm/nbsd.mt 1.1.1.1.6.1 gnu/dist/gdb6/gdb/config/arm/nbsdelf.mh 1.1.1.1.6.1 gnu/dist/gdb6/gdb/config/i386/nbsd64.mh 1.1.1.1.6.1 gnu/dist/gdb6/gdb/config/m68k/nbsdelf.mh 1.1.1.1.6.1 gnu/dist/gdb6/gdb/config/mips/nbsd.mh 1.1.1.1.6.1 gnu/dist/gdb6/gdb/config/powerpc/nbsd.mh 1.1.1.2.6.1 gnu/dist/gdb6/gdb/config/sh/nbsd.mh 1.1.1.1.6.2 gnu/dist/gdb6/gdb/config/sh/tm-nbsd.h 1.1.1.1.6.1 gnu/dist/gdb6/gdb/config/sparc/nbsd64.mh 1.1.1.1.6.1 gnu/dist/gdb6/gdb/config/sparc/nbsdelf.mh 1.1.1.1.6.1 gnu/dist/gdb6/gdb/config/vax/nbsdelf.mh 1.1.1.1.6.1 gnu/dist/gdb6/opcodes/configure 1.1.1.2.6.1 gnu/dist/gdb6/opcodes/configure.in 1.1.1.2.6.1 gnu/usr.bin/Makefile 1.126.4.1 gnu/usr.bin/gdb6/arch/alpha/config.h 1.3.4.1 gnu/usr.bin/gdb6/arch/alpha/defs.mk 1.2.6.1 gnu/usr.bin/gdb6/arch/alpha/init.c 1.2.6.1 gnu/usr.bin/gdb6/arch/alpha/nm.h 1.2.6.1 gnu/usr.bin/gdb6/arch/arm/defs.mk 1.2.6.2 gnu/usr.bin/gdb6/arch/arm/init.c 1.1.6.1 gnu/usr.bin/gdb6/arch/armeb/config.h 1.1.6.2 gnu/usr.bin/gdb6/arch/armeb/defs.mk 1.1.6.3 gnu/usr.bin/gdb6/arch/armeb/init.c 1.1.6.2 gnu/usr.bin/gdb6/arch/armeb/tm.h 1.1.6.2 gnu/usr.bin/gdb6/arch/armeb/version.c 1.1.6.2 gnu/usr.bin/gdb6/arch/i386/defs.mk 1.4.4.1 gnu/usr.bin/gdb6/arch/i386/init.c 1.3.6.1 gnu/usr.bin/gdb6/arch/m68000/config.h 1.1.6.2 gnu/usr.bin/gdb6/arch/m68000/defs.mk 1.1.6.2 gnu/usr.bin/gdb6/arch/m68000/init.c 1.1.6.2 gnu/usr.bin/gdb6/arch/m68000/tm.h 1.1.6.2 gnu/usr.bin/gdb6/arch/m68000/version.c 1.1.6.2 gnu/usr.bin/gdb6/arch/m68k/defs.mk 1.1.4.1 gnu/usr.bin/gdb6/arch/m68k/init.c 1.1.4.1 gnu/usr.bin/gdb6/arch/mipseb/config.h 1.3.4.1 gnu/usr.bin/gdb6/arch/mipseb/defs.mk 1.2.6.2 gnu/usr.bin/gdb6/arch/mipseb/init.c 1.2.6.2 
gnu/usr.bin/gdb6/arch/mipsel/config.h 1.2.6.3 gnu/usr.bin/gdb6/arch/mipsel/defs.mk 1.2.6.3 gnu/usr.bin/gdb6/arch/mipsel/init.c 1.2.6.3 gnu/usr.bin/gdb6/arch/mipsel/tm.h 1.2.6.2 gnu/usr.bin/gdb6/arch/mipsel/version.c 1.2.6.2 gnu/usr.bin/gdb6/arch/powerpc/defs.mk 1.3.6.1 gnu/usr.bin/gdb6/arch/powerpc/init.c 1.3.6.1 gnu/usr.bin/gdb6/arch/sh3eb/config.h 1.2.2.2 gnu/usr.bin/gdb6/arch/sh3eb/defs.mk 1.2.8.3 gnu/usr.bin/gdb6/arch/sh3eb/init.c 1.1.8.3 gnu/usr.bin/gdb6/arch/sh3eb/nm.h 1.1.8.2 gnu/usr.bin/gdb6/arch/sh3eb/tm.h 1.1.8.2 gnu/usr.bin/gdb6/arch/sh3eb/version.c 1.1.8.2 gnu/usr.bin/gdb6/arch/sh3el/config.h 1.2.2.2 gnu/usr.bin/gdb6/arch/sh3el/defs.mk 1.2.8.3 gnu/usr.bin/gdb6/arch/sh3el/init.c 1.1.8.3 gnu/usr.bin/gdb6/arch/sh3el/nm.h 1.1.8.2 gnu/usr.bin/gdb6/arch/sh3el/tm.h 1.1.8.2 gnu/usr.bin/gdb6/arch/sh3el/version.c 1.1.8.2 gnu/usr.bin/gdb6/arch/sparc/defs.mk 1.2.6.1 gnu/usr.bin/gdb6/arch/sparc/init.c 1.1.6.1 gnu/usr.bin/gdb6/arch/sparc64/defs.mk 1.2.6.1 gnu/usr.bin/gdb6/arch/sparc64/init.c 1.1.6.1 gnu/usr.bin/gdb6/arch/vax/config.h 1.1.6.2 gnu/usr.bin/gdb6/arch/vax/defs.mk 1.1.6.2 gnu/usr.bin/gdb6/arch/vax/init.c 1.1.6.2 gnu/usr.bin/gdb6/arch/vax/tm.h 1.1.6.2 gnu/usr.bin/gdb6/arch/vax/version.c 1.1.6.2 gnu/usr.bin/gdb6/arch/x86_64/defs.mk 1.2.6.1 gnu/usr.bin/gdb6/arch/x86_64/init.c 1.1.6.1 gnu/usr.bin/gdb6/bfd/arch/armeb/bfd.h 1.1.6.2 gnu/usr.bin/gdb6/bfd/arch/armeb/bfdver.h 1.1.6.2 gnu/usr.bin/gdb6/bfd/arch/armeb/config.h 1.1.6.2 gnu/usr.bin/gdb6/bfd/arch/armeb/defs.mk 1.1.6.2 gnu/usr.bin/gdb6/bfd/arch/m68000/bfd.h 1.1.6.2 gnu/usr.bin/gdb6/bfd/arch/m68000/bfdver.h 1.1.6.2 gnu/usr.bin/gdb6/bfd/arch/m68000/config.h 1.1.6.2 gnu/usr.bin/gdb6/bfd/arch/m68000/defs.mk 1.1.6.2 gnu/usr.bin/gdb6/bfd/arch/mipsel/bfd.h 1.1.6.2 gnu/usr.bin/gdb6/bfd/arch/mipsel/bfdver.h 1.1.6.2 gnu/usr.bin/gdb6/bfd/arch/mipsel/config.h 1.1.6.2 gnu/usr.bin/gdb6/bfd/arch/mipsel/defs.mk 1.1.6.2 gnu/usr.bin/gdb6/bfd/arch/sh3eb/bfd.h 1.1.8.3 gnu/usr.bin/gdb6/bfd/arch/sh3eb/bfdver.h 1.1.8.2 gnu/usr.bin/gdb6/bfd/arch/sh3eb/config.h 1.1.8.2 gnu/usr.bin/gdb6/bfd/arch/sh3eb/defs.mk 1.1.8.3 gnu/usr.bin/gdb6/bfd/arch/sh3el/bfd.h 1.1.8.3 gnu/usr.bin/gdb6/bfd/arch/sh3el/bfdver.h 1.1.8.2 gnu/usr.bin/gdb6/bfd/arch/sh3el/config.h 1.1.8.2 gnu/usr.bin/gdb6/bfd/arch/sh3el/defs.mk 1.1.8.3 gnu/usr.bin/gdb6/bfd/arch/vax/bfd.h 1.1.6.2 gnu/usr.bin/gdb6/bfd/arch/vax/bfdver.h 1.1.6.2 gnu/usr.bin/gdb6/bfd/arch/vax/config.h 1.1.6.2 gnu/usr.bin/gdb6/bfd/arch/vax/defs.mk 1.1.6.2 gnu/usr.bin/gdb6/gdb/Makefile 1.5.2.1.2.2 gnu/usr.bin/gdb6/gdbtui/Makefile 1.2.6.1 gnu/usr.bin/gdb6/libiberty/arch/armeb/config.h 1.1.6.2 gnu/usr.bin/gdb6/libiberty/arch/armeb/defs.mk 1.1.6.2 gnu/usr.bin/gdb6/libiberty/arch/m68000/config.h 1.1.6.2 gnu/usr.bin/gdb6/libiberty/arch/m68000/defs.mk 1.1.6.2 gnu/usr.bin/gdb6/libiberty/arch/mipsel/config.h 1.1.6.2 gnu/usr.bin/gdb6/libiberty/arch/mipsel/defs.mk 1.1.6.2 gnu/usr.bin/gdb6/libiberty/arch/sh3eb/config.h 1.1.8.2 gnu/usr.bin/gdb6/libiberty/arch/sh3eb/defs.mk 1.1.8.2 gnu/usr.bin/gdb6/libiberty/arch/sh3el/config.h 1.1.8.2 gnu/usr.bin/gdb6/libiberty/arch/sh3el/defs.mk 1.1.8.2 gnu/usr.bin/gdb6/libiberty/arch/vax/config.h 1.1.6.2 gnu/usr.bin/gdb6/libiberty/arch/vax/defs.mk 1.1.6.2 gnu/usr.bin/gdb6/opcodes/arch/armeb/config.h 1.1.6.2 gnu/usr.bin/gdb6/opcodes/arch/armeb/defs.mk 1.1.6.2 gnu/usr.bin/gdb6/opcodes/arch/m68000/config.h 1.1.6.2 gnu/usr.bin/gdb6/opcodes/arch/m68000/defs.mk 1.1.6.2 gnu/usr.bin/gdb6/opcodes/arch/mipsel/config.h 1.1.6.2 gnu/usr.bin/gdb6/opcodes/arch/mipsel/defs.mk 1.1.6.2 
gnu/usr.bin/gdb6/opcodes/arch/sh3eb/config.h 1.1.8.2 gnu/usr.bin/gdb6/opcodes/arch/sh3eb/defs.mk 1.1.8.3 gnu/usr.bin/gdb6/opcodes/arch/sh3el/config.h 1.1.8.2 gnu/usr.bin/gdb6/opcodes/arch/sh3el/defs.mk 1.1.8.3 gnu/usr.bin/gdb6/opcodes/arch/vax/config.h 1.1.6.2 gnu/usr.bin/gdb6/opcodes/arch/vax/defs.mk 1.1.6.2 gnu/usr.bin/gdb6/readline/arch/armeb/config.h 1.1.6.2 gnu/usr.bin/gdb6/readline/arch/armeb/defs.mk 1.1.6.2 gnu/usr.bin/gdb6/readline/arch/m68000/config.h 1.1.6.2 gnu/usr.bin/gdb6/readline/arch/m68000/defs.mk 1.1.6.2 gnu/usr.bin/gdb6/readline/arch/mipsel/config.h 1.1.6.2 gnu/usr.bin/gdb6/readline/arch/mipsel/defs.mk 1.1.6.2 gnu/usr.bin/gdb6/readline/arch/sh3eb/config.h 1.1.8.2 gnu/usr.bin/gdb6/readline/arch/sh3eb/defs.mk 1.1.8.2 gnu/usr.bin/gdb6/readline/arch/sh3el/config.h 1.1.8.2 gnu/usr.bin/gdb6/readline/arch/sh3el/defs.mk 1.1.8.2 gnu/usr.bin/gdb6/readline/arch/vax/config.h 1.1.6.2 gnu/usr.bin/gdb6/readline/arch/vax/defs.mk 1.1.6.2 gnu/usr.bin/gdb6/sim/arch/mipseb/cconfig.h 1.1.2.1 gnu/usr.bin/gdb6/sim/arch/mipseb/config.h 1.1.2.1 gnu/usr.bin/gdb6/sim/arch/mipseb/defs.mk 1.1.2.1 gnu/usr.bin/gdb6/sim/arch/mipsel/cconfig.h 1.1.2.1 gnu/usr.bin/gdb6/sim/arch/mipsel/config.h 1.1.2.1 gnu/usr.bin/gdb6/sim/arch/mipsel/defs.mk 1.1.2.1 lib/libkvm/kvm_sparc64.c 1.10.18.2 lib/libpthread/pthread.c 1.48.6.4 lib/libpthread/pthread_barrier.c 1.6.18.1 lib/libpthread/pthread_cond.c 1.18.12.2 lib/libpthread/pthread_debug.h 1.8.18.1 lib/libpthread/pthread_int.h 1.34.4.5 lib/libpthread/pthread_lock.c 1.14.6.1 lib/libpthread/pthread_mutex.c 1.22.4.2 lib/libpthread/pthread_run.c 1.18.12.4 lib/libpthread/pthread_rwlock.c 1.13.6.2 lib/libpthread/pthread_sa.c 1.37.6.5 lib/libpthread/pthread_sig.c 1.47.4.8 lib/libpthread/pthread_sleep.c 1.7.6.2 lib/libpthread/sem.c 1.9.6.2 lib/libpthread/arch/sh3/pthread_md.h 1.3.6.1 regress/lib/libpthread/resolv/Makefile 1.1.12.1 regress/lib/libpthread/sigrunning/Makefile 1.1.2.1 regress/lib/libpthread/sigrunning/sigrunning.c 1.1.2.1 share/mk/bsd.own.mk 1.489.4.3 sys/arch/amd64/amd64/locore.S 1.18.14.1 sys/arch/amd64/amd64/machdep.c 1.44.2.3.2.1 sys/arch/amd64/conf/kern.ldscript 1.1.70.1 sys/arch/cats/conf/Makefile.cats.inc 1.17.30.1 sys/arch/shark/conf/Makefile.shark.inc 1.6.30.1 sys/arch/sparc64/conf/kern.ldscript 1.7.26.2 sys/arch/sparc64/conf/kern32.ldscript 1.6.26.2 sys/arch/sparc64/include/kcore.h 1.4.92.2 sys/arch/sparc64/sparc64/locore.s 1.232.4.4 sys/arch/sparc64/sparc64/machdep.c 1.193.4.3 sys/arch/sparc64/sparc64/pmap.c 1.184.2.1.2.4 sys/conf/newvers.sh 1.42.26.2 sys/kern/kern_sa.c 1.87.4.11 sys/kern/kern_synch.c 1.173.4.2 sys/sys/savar.h 1.20.10.2 tools/gdb/Makefile 1.9.4.1 tools/gdb/mknative-gdb 1.1.6.1 pullup the wrstuden-fixsa CVS branch to netbsd-4: toolchain/35540 - GDB 6 support for pthreads. port-sparc64/37534 - ktrace firefox gives kernel trap 30: data access expection GDB changes: - delete gdb53 - enable gdb6 on all architectures - add support for amd64 crash dumps - add support for sparc64 crash dumps - add support for /proc pid to executable filename for all archs - enable thread support for all architectures - add a note section to kernels to all platforms - support detection/unwinding of signals for most architectures. - Fix PTHREAD_UCONTEXT_TO_REG / PTHREAD_REG_TO_UCONTEXT on sh3. - Apply fix from binutils-current so that sparc gdb can be cross built on a 64bit host. SA/pthread changes: Pre-allocate memory needed for event delivery. Eliminates dropped interrupts under load. 
Deliver intra-process signals to running threads. Eliminate some deadlock scenarios. Fix intra-process signal delivery when delivering to a thread waiting for signals. Makes afs work again!
Now that we have all the scheduling gunk, make these do something useful: pthread_attr_get_np pthread_attr_setschedparam pthread_attr_getschedparam pthread_attr_setschedpolicy pthread_attr_getschedpolicy
file pthread_int.h was added on branch christos-time_t on 2008-06-28 10:29:38 +0000
Now that we have all the scheduling gunk, make these do something useful: pthread_attr_get_np pthread_attr_setschedparam pthread_attr_getschedparam pthread_attr_setschedpolicy pthread_attr_getschedpolicy
Sync w/ -current. 34 merge conflicts to follow.
sync with head
PR lib/38741 priority inversion in libpthread breaks apps that use SCHED_FIFO threads - Change condvar sync so that we never take the condvar's spinlock without first holding the caller-provided mutex. Previously, the spinlock was only taken without the mutex in an error path, but it was enough to trigger the problem described in the PR. - Even with this change, applications calling pthread_cond_signal/broadcast without holding the interlocking mutex are still subject to the problem described in the PR. POSIX discourages this saying that it leads to undefined scheduling behaviour, which seems good enough for the time being. - Elsewhere, use a hash of mutexes instead of per-object spinlocks to synchronize entry/exit from sleep queues. - Simplify how sleep queues are maintained.
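For illustration, a minimal sketch of such a hashed lock pool (pool size, shift, and names are hypothetical):

    #include <pthread.h>
    #include <stdint.h>

    #define NHASHLOCK 64
    static pthread_mutex_t pthread__hashlocks[NHASHLOCK];

    /* Map a synchronization object to one mutex from a fixed pool;
       that mutex serializes entry/exit on the object's sleep queue.
       The low bits are shifted away since alignment keeps them
       mostly constant. */
    static pthread_mutex_t *
    pthread__hashlock(const void *obj)
    {
        return &pthread__hashlocks[((uintptr_t)obj >> 6) % NHASHLOCK];
    }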
sync with head.
Remove clause 3 and 4 from TNF licenses
sync with HEAD
Adjust mutex/rwlock definitions to match reality now that there is only one implementation of each. PR lib/38030.
- Remove libpthread's atomic ops. - Remove the old spinlock-based mutex and rwlock implementations. - Use the atomic ops from libc.
sync with HEAD
add missing static decls.
Fix comment.
- Use pthread__cancelled() in more places. - pthread_join(): assert that pthread_cond_wait() returns zero.
- Fix pthread_rwlock_trywrlock() which was broken. - Add new functions: pthread_mutex_held_np, mutex_owner_np, rwlock_held_np, rwlock_wrheld_np, rwlock_rdheld_np. These match the kernel's locking primitives and can be used when porting kernel code to userspace. - Always create LWPs detached. Do join/exit sync mostly in userland. When looped on a dual core box this seems ~30% quicker than using lwp_wait(). Reduce number of lock acquire/release ops during thread exit.
Remove the debuglog stuff. ktrace is more useful now.
Mutexes: - Play scrooge again and chop more cycles off acquire/release. - Spin while the lock holder is running on another CPU (adaptive mutexes). - Do non-atomic release. Threadreg: - Add the necessary hooks to use a thread register. - Add the code for i386, using %gs. - Leave i386 code disabled until xen and COMPAT_NETBSD32 have the changes.
For PR bin/37347: - Override __libc_thr_init() instead of using our own constructor. - Add pthread__getenv() and use it instead of getenv(). This is used before we are up and running and unfortunately getenv() takes locks. Other changes: - Cache the spinlock vectors in pthread__st. Internal spinlock operations now take 1 function call instead of 3 (i386). - Use pthread__self() internally, not pthread_self(). - Use __attribute__ ((visibility("hidden"))) in some places. - Kill PTHREAD_MAIN_DEBUG.
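For illustration, a lock-free pthread__getenv() could scan environ directly; a minimal sketch under that assumption, not the actual implementation:

    #include <string.h>

    extern char **environ;

    /* Look up name in the environment without calling getenv(3),
       which may take libc locks before libpthread is initialized. */
    static const char *
    pthread__getenv(const char *name)
    {
        size_t len = strlen(name);
        char **p;

        for (p = environ; *p != NULL; p++) {
            if (strncmp(*p, name, len) == 0 && (*p)[len] == '=')
                return *p + len + 1;
        }
        return NULL;
    }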
sync with HEAD
Check in changes to locking behavior. pthread__sched() now takes a parameter indicating if the run queue is already locked. Useful in cases where we already hold pthread__runqueue_lock. pthread__suspend() now requires callers explicitly lock pthread__runqueue_lock so we avoid issues with locking order regarding pt_statelock. Adjust our lock hierarchy. pthread__runqueue_lock is now above pt_statelock, triggering the above adjustments. Adjust a lot of routines as a result. Also move pt_siglock way up in the hierarchy, making pthread__kill() not violate locking. Add a few extra locks to the list. Adjust a botch in how pthread_join() used pthread_spintrylock(). pthread_cancel() now correctly walks up the locks with thread->pt_sleeplock. We can't just lock it, as it points to a lock in the top locking rung. So try locking, and if it fails, unlock and re-lock. Add code to cope with the target thread not being in the expected state (which was on a blocked queue) after we get all the locks. Add comments to describe what's going on in places that I got confused. Now that pt_statelock is lower in the locking order than pthread__runqueue_lock, we can explicitly lock a thread's state before we take it off the run queue. Adjust sched_yield() accordingly and add some locking calls that were commented out before (as they'd have been locking violations). pthread_next(): now that we can lock the state lock while holding the run queue lock, do so. Set a thread's state to PT_STATE_RUNNING before we pull it off the run queue. Since we are always going to switch to it, set pt_vpid and pt_lastlwp while setting the state. pthread_next callers now _don't_ set these values. pthread__kill(): grab pthread__runqueue_lock before target->pt_statelock. If we want to target a thread that is on a blocked queue, do the pthread_spintrylock() dance. Unlock all three locks we're running around with, lock target->pt_sleeplock, then re-lock them all. After we lock, make sure that the thread's still on a blocked queue before proceeding. If it's not, either exit (if we wanted to wake out of sigtimedwait()) or start it all over. If the thread has gone live, it may have blocked our signal and it'd be quite weird to get a signal you'd disabled, just because the signaller had been running before you blocked it.
Work on cleaning up lock ordering. Turns out that there's not too much to do, other than fixing an issue in join and one I introduced. Add a voluminous comment in pthread_int.h describing how I understand the current locking to work. pthread_join() considered pt_flaglock to be a higher-priority lock than pt_join_lock. Life makes more sense if we flip that. To not make a lot of routines messy, pthread__runqueue_lock has to be lower in the lock ordering than pt_statelock. Adapt the changes I made to sched_yield() to this ordering. There still is a wart regarding setting the state of a thread we are taking off of the run (or idle) queue. We can't lock its pt_statelock as we have the runqueue lock held. For now, go back to what the old code did, which was just write over the info. This isn't that bad as the only things that should be changing the state of this thread should be run-queue savvy. I need to check this though....
Note that libpthread_dbg needs to be checked after making changes to libpthread.
... but preserve the linked list, for the debugger only.
Replace the global thread list with a red-black tree. From joerg@.
Rename pt_blockedlwp to pt_lastlwp, and set it whenever we switch to a new pthread. This way we always know on what lwp a given thread is running.
Resurrect the function pointers for lock operations and allow each architecture to provide asm versions of the RAS operations. We do this because relying on the compiler to get the RAS right is not sensible. (It gets alpha wrong and hppa is suboptimal) Provide asm RAS ops for hppa. (A slightly different version) reviewed by Andrew Doran.
Add a per-mutex deferred wakeup flag so that threads doing something like the following do not wake other threads early:

    pthread_mutex_lock(&mutex);
    pthread_cond_broadcast(&cond);
    foo = malloc(100);          /* takes libc mutexes */
    pthread_mutex_unlock(&mutex);
Make the new mutexes faster: - Eliminate mutexattr_private and just set a bit in ptm_owner if the mutex is recursive. This forces the slow path to be taken for recursive mutexes. Overload an unused field in pthread_mutex_t to record whether or not it's an errorcheck mutex. - Streamline pthread_mutex_lock / pthread_mutex_unlock a bit more. As a side effect makes it possible to have assembly stubs for them.
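For illustration, a sketch of the owner-word encoding implied above (bit values and macro names are assumptions):

    /* ptm_owner is a pthread_t with flag bits stored in the low,
       alignment-guaranteed-zero bits: */
    #define MUTEX_RECURSIVE_BIT ((uintptr_t)0x02)
    #define MUTEX_OWNER(o)      ((uintptr_t)(o) & ~MUTEX_RECURSIVE_BIT)

Presumably the recursive bit stays set even while the mutex is unlocked, so the fast-path compare-and-swap from NULL fails for recursive mutexes and execution drops to the slow path, as the message describes.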
Merge nick-csl-alignment.
Sync with HEAD.
Check in first step towards having pthread_kill() kill a thread running on another CPU. This change adds initial support for deferred signal handling. Just before we go to sleep and while we hold &self->pt_statelock, check to see if we have any deferred signals (blocked signals) pending. These are signals that are not masked in our mask and which have been sent to us. We were running when they came in. Further, since they are being handled this way, there's a signal handler defined for them. So unlock, run the signal handler(s), then carry on. For condition variables, we consider this a spurious wakeup, so we just return 0, having not unlocked the mutex. We run the handler with the mutex held. This shouldn't matter, as you aren't supposed to play with mutexes in signal handlers. :-) For nanosleep(), we just process signals, then go to sleep. For all other cases, we are in a loop with some external predicate. So we process the signal then roll around the loop to see if it still applies. In sched_yield(), spin until all deferred signals are gone. Since we hold self->pt_statelock and that lock has to be held before sending a deferred signal, no new deferred signals will come in until we're asleep. While here, be more careful about locking while changing pt_state to PT_STATE_RUNNING. Grab pt_statelock while doing it, and also set next->pt_vpid to self->pt_vpid holding the same lock. Will make the test to determine how to deliver a signal work right (since a thread's vpid will soon matter in the general case). No longer set next->pt_vpid in pthread__next().
- Get rid of self->pt_mutexhint and use pthread__mutex_owned() instead. - Update some comments and fix minor bugs. Minor cosmetic changes. - Replace some spinlocks with mutexes and rwlocks. - Change the process private semaphores to use mutexes and condition variables instead of doing the synchronization directly. Spinlocks are no longer used by the semaphore code.
- Don't take the mutex's spinlock (ptr_interlock) in pthread_cond_wait(). Instead, make the deferred wakeup list a per-thread array and pass down the lwpid_t's that way. - In pthread_cond_wait(), take the mutex before dealing with early wakeup. In this way there should never be contention on the CV's spinlock if the app follows POSIX rules (there should only be contention on the user-provided mutex). - Add a port of the kernel's rwlocks. The rwlock's spinlock is only taken if there is contention. This is enabled where atomic ops are available. Right now that is only i386 and amd64 because I don't have other hardware to test with. It's trivial to add stubs for other architectures as long as they have compare-and-swap. When we have proper atomic ops the old rwlock code can be removed. - Add a new mutex implementation that's similar to the kernel's mutexes, but uses compare-and-swap to maintain the waiters list, so no spinlocks are involved. Same caveats apply as for the rwlocks.
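For illustration, the lock fast path of such a mutex might look like this (a sketch: pthread__atomic_cas_ptr returning the old value, and the helper names, are assumptions):

    pthread_t self = pthread__self();

    /* Uncontended acquire: swing ptm_owner from NULL to self. */
    if (pthread__atomic_cas_ptr(&ptm->ptm_owner, NULL, self) == NULL)
        return 0;

    /* Contended: push self onto the CAS-maintained waiters list and
       park in the kernel; no spinlock is taken on this path. */
    return pthread__mutex_lock_slow(ptm);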
Add: pthread__atomic_cas_ptr, pthread__atomic_swap_ptr, pthread__membar_full This is a stopgap until the thorpej-atomic branch is complete.
Sync with HEAD.
Trim fat off libpthread internal spinlock operations. Makes a measurable improvement across the board.
- Reinitialize the absolute minimum when recycling user thread state. Chops another ~10% off create/join in a loop on i386. - Disable low level debugging as this is stable. Improves benchmarks across the board by a small percentage. Uncontested mutex acquire and release in a loop becomes about 8% quicker. - Minor cleanup.
Remove PT_FIXEDSTACKSIZE_LG.
Cache thread context for creation instead of setting it up every time. Speeds create/join loop by about 10-15% on i386.
Sync with HEAD.
Change the signature of _lwp_park() to accept an lwpid_t and second hint pointer, but do so in a way that remains compatible with older pthread libraries. This can be used to wake another thread before the calling thread goes to sleep, saving at least one syscall + involuntary context switch. This turns out to be a fairly large win on the condvar benchmarks that I have tried.
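For illustration, the pattern this enables (argument lists are assumptions based on the description, not the exact prototypes):

    /* Before: wake the target, then sleep -- two syscalls, often with
       an involuntary context switch in between. */
    _lwp_unpark(target, hint);
    _lwp_park(ts, hint);

    /* After: pass the target lwpid to _lwp_park so the kernel performs
       the wakeup and the sleep in a single syscall. */
    _lwp_park(ts, target, hint, unparkhint);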
Make libpthread_dbg build again.
file pthread_int.h was added on branch matt-mips64 on 2007-08-04 18:54:14 +0000
Make libpthread_dbg build again.
Some significant performance improvements, and a fix for a race with pthread detach/join. - Make mutex acquire spin for a short time, as done with spinlocks. - Make the number of spins controllable with the env var PTHREAD_NSPINS. - Reduce the amount of time that libpthread internal spinlocks are held. - Rely more on the barrier effects of park/unpark to avoid taking spinlocks. - Simplify the locking around pthreads and the global queues. - Align per-thread sync data on a 128 byte boundary. - Offset thread stacks by a small amount to try and reduce cache thrash.
Mirror a fix made to the kernel's condvars: After resuming execution, the thread must check to see if it has been restarted as a result of pthread_cond_signal(). If it has, but cannot take the wakeup (because of eg a pending Unix signal or timeout) then try to ensure that another thread sees it. This is necessary because there may be multiple waiters, and at least one should take the wakeup if possible.
- Test+branch is usually cheaper than making an indirect function call, so avoid making them. - When parking an LWP on a condition variable, point the hint argument at the mutex's waiters queue. Chances are we will be awoken from that later.
- Maintain a per-thread pointer to the last mutex acquired by the app, to be used only as a hint. Clear the pointer when releasing the mutex. - When releasing a mutex, wake all waiters. Makes it possible to transfer waiters from another object to a mutex.
- Simplify the interface to pthread__park() and friends slightly. - If sysctl() fails, complain.
Remove the PTHREAD_SA option. If M:N threading is reimplemented, it's better off done with a separate library.
Build without sys/sa.h present.
Fix bugs with and improve upon previous.
Conditionalised support for 1:1 threads. Needs associated kernel changes and more work to be useful.
remove unused IDLESPINS.
starting the pthread library (ie. calling pthread__start()) before any threads are created turned out to be not such a good idea. there are stronger requirements on what has to work in a forked child while a process is still single-threaded. so take all that stuff back out and fix the problems with single-threaded programs that are linked with libpthread differently, by checking if the library has been started and doing completely different stuff if it hasn't been: - for pthread_rwlock_timedrdlock(), just fail with EDEADLK immediately. - for sem_wait(), the only thing that can unlock the semaphore is a signal handler, so use sigsuspend() to wait for a signal. - for pthread_mutex_lock_slow(), just go into an infinite loop waiting for signals. I also noticed that there's a "sem2" test that has never worked in its single-threaded form. the problem there is that a signal handler tries to take a sem_t interlock which is already held when the signal is received. fix this too, by adding a single-threaded case for sig_trywait() that blocks signals instead of using the userland interlock.
in pthread_mutex_lock_slow(), pthread_rwlock_timedrdlock() and sem_wait(), call pthread__start() if it hasn't already been called. this avoids an internal assertion from the library if these routines are used before any threads are created and they need to sleep. fixes PR 20256, PR 24241, PR 25722, PR 26096.
Keep the kernel updated with signal action signal masks (act.sa_mask) until threads are started, since before that the traditional signal invocation method will be used. Fixes regress/lib/libpthread/sigmask2.
Remove pt_blockuc. If the debugger attempts to muck with the state of a blocked thread, return an error; this should be done through ptrace(2).
Local whitespace police.
Add a flag that indicates that a thread took a signal.
const'ify the arguments to pthread__assertfunc and _errorfunc; needed because the __func__ pseudo-var is strictly "const*" in gcc-3.4
add libpthread part of concurrency support for SA on MP systems - enable concurrency according to environment variable PTHREAD_CONCURRENCY - add idle VP wakeup if there are additional jobs and idle VPs - make reidlequeue per VP - enable spinning for locks - fix race condition in alarm processing - fix race condition in mutex locking - make debugging output line buffered and add VP prefix to debug lines
Rename pt_stacksize, pt_stacksize_lg, and pt_stackmask to pthread_*. External symbols can't start with pt_, as that is in the application namespace.
userland part of no-syscall upcall stack return - add pt_stackinfo to struct __pthread_st - add pthread__stackinfo_offset returning the offset from ss_sp to pt_stackinfo - pass stackinfo_offset to sa_register and set SA_FLAG_STACKINFO to make the kernel use it - call pthread__sa_recycle in pthread__resolve_locks; g/c recycleq and pthread__recycle_bulk - return stack in pthread__sa_recycle by incrementing sasi_stackgen - make pthread__sa_recycle debugging output formatting conditional on pthread__debug_newline
Handle block/unblock for threads in critical section without sa_unblockyield. XXX g/c sa_unblockyield in kernel later
Set the default stack size to the current limit on the stack size as set with the shell's command to change limits. Make the PTHREAD_STACKSIZE environment variable override the default stack size. The old fixed stack size behaviour can be enabled with PT_FIXEDSTACKSIZE_LG when building libpthread.
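For illustration, a minimal sketch of that initialization (names, the fallback value, and the unit of PTHREAD_STACKSIZE are assumptions):

    #include <stdlib.h>
    #include <sys/resource.h>

    struct rlimit rl;
    size_t stacksize = 4 * 1024 * 1024;     /* hypothetical fallback */
    const char *s;

    /* Default to the soft stack rlimit... */
    if (getrlimit(RLIMIT_STACK, &rl) == 0)
        stacksize = (size_t)rl.rlim_cur;

    /* ...unless overridden from the environment. */
    if ((s = getenv("PTHREAD_STACKSIZE")) != NULL)
        stacksize = (size_t)atoi(s) * 1024; /* value taken as KB */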
GC sigcontext<->mcontext code and __HAVE_SIGINFO. All supported archs have siginfo implemented.
Add: int pthread_attr_setcreatesuspend_np(pthread_attr_t *); int pthread_suspend_np(pthread_t); int pthread_resume_np(pthread_t); needed for java. Approved and fixed by cl.
convert to use siginfo/ucontext style of signal delivery instead of sigcontext. Approved by nathanw.
Remove possible race condition in upcall recycling.
Add a lock on the pt_flags field.
pthread.c was getting a bit unwieldy. Move pthread_attr stuff out into a new file, and put the shared private structure definition in pthread_int.h.
Adapt to internal structure name changes. Add a couple of useful flags and symbols.
Introduce a new pointer, pt_trapuc, that stores thread context captured by the kernel. Separating this from pt_uc makes it possible to avoid a race condition in pt_uc management near the STACK_SWITCH part of pthread__switch() and pthread__locked_switch(). Remove pt_sleepuc pointer, which was made obsolete by the previous round of UC juggling but still present in the assembler files.
Tiny bit of infrastructure for ABI-supported thread-ID storage.
Pass lint:
1. add a new pthread__abort() and change pthread__assert(0) calls to it.
2. put the constcond comment in the right place (in the macro).
3. no space after the pthread__assert macro.
Introduce a pthread__error() macro, for detected application errors as opposed to internal errors. The setting of the PTHREAD_ERRORMODE environment variable determines the runtime behavior. Valid settings are "ignore", "abort", and "print". The default is currently "abort".
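A hedged sketch of the mechanism, not the actual implementation (names and structure are illustrative):

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

enum { EM_IGNORE, EM_ABORT, EM_PRINT };
static int errormode = EM_ABORT;	/* default per above */

static void
errormode_init(void)
{
	const char *p = getenv("PTHREAD_ERRORMODE");

	if (p == NULL)
		return;
	if (strcmp(p, "ignore") == 0)
		errormode = EM_IGNORE;
	else if (strcmp(p, "print") == 0)
		errormode = EM_PRINT;
	else if (strcmp(p, "abort") == 0)
		errormode = EM_ABORT;
}

/* Report a detected application error according to the chosen mode. */
static void
application_error(const char *msg, const char *file, int line)
{
	if (errormode == EM_IGNORE)
		return;
	fprintf(stderr, "%s:%d: pthread error: %s\n", file, line, msg);
	if (errormode == EM_ABORT)
		abort();
}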
Use a __predict_true() in the definition of pthread__assert().
Add support for naming a thread, using an API compatible with Tru64 Unix:
* pthread_attr_getname_np()
* pthread_attr_setname_np()
* pthread_getname_np()
* pthread_setname_np()
In addition to being queryable by the application (for log messages, etc.), it is intended that these names can show up in the debugger. Reviewed by nathanw.
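A usage sketch, with the signatures these functions have in NetBSD today (the third pthread_setname_np() argument is an optional argument for a %-style conversion in the name; NULL here):

#include <pthread.h>

static void
name_thread(void)
{
	char name[32];

	(void)pthread_setname_np(pthread_self(), "worker", NULL);
	(void)pthread_getname_np(pthread_self(), name, sizeof(name));
}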
Interpose sigtimedwait() with a thread-aware version, which uses a single proxy thread to do the actual syscall and blocks other threads in userland.
Define a pthread-specific assert function, pthread__assert(), that bails out without trying to flush stdio buffers.
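A hedged sketch of such a function, all names and details illustrative: format into a local buffer, write(2) it, and terminate without going anywhere near the stdio buffers:

#include <signal.h>
#include <stdio.h>
#include <unistd.h>

static void
assert_failed(const char *file, int line, const char *function,
    const char *expr)
{
	char buf[256];
	int len;

	/* snprintf() formats into our buffer; no stdio stream is touched. */
	len = snprintf(buf, sizeof(buf),
	    "assertion \"%s\" failed: file \"%s\", line %d, function \"%s\"\n",
	    expr, file, line, function);
	if (len > 0)
		(void)write(STDERR_FILENO, buf, (size_t)len);
	(void)kill(getpid(), SIGABRT);	/* die without flushing stdio */
	_exit(1);			/* in case SIGABRT is caught */
}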
minor whitespace changes
Add a new internal function, pthread__sched_sleepers(), which iterates over a sleep queue and puts everything on the run queue. This permits the iteration to be inside the acquisition of the run queue spinlock, avoiding repetitive acquire/release cycles.
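A sketch of the shape of that change, with stand-in queue and lock types rather than the real libpthread ones:

#include <pthread.h>

struct thread {
	struct thread *next;
};
struct queue {
	struct thread *head;
};

static pthread_spinlock_t runq_lock;	/* initialized elsewhere */
static struct queue runq;

/* Move every sleeper to the run queue under one lock acquisition. */
static void
sched_sleepers(struct queue *sleepq)
{
	struct thread *t;

	pthread_spin_lock(&runq_lock);
	while ((t = sleepq->head) != NULL) {
		sleepq->head = t->next;
		t->next = runq.head;
		runq.head = t;
	}
	pthread_spin_unlock(&runq_lock);
}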
More signal rearranging:
- Signal handlers now simply continue executing the current thread, rather than trying to put themselves back on the queue that they came from, which was rather fragile. As a result, all callers of pthread__block() must be prepared to handle spurious wakeups (see the sketch after this list).
- When a signal arrives for a thread that is blocked in the kernel, note this in another field in pthread_st and set a flag. Process the signal and set up the trampoline for the handler *after* the thread unblocks, so that both the trampoline and the returned state from the kernel are preserved.
- Factor out some code into a pthread__deliver_signal() routine; the signal-taking code in pthread_sigmask() should be able to use this soon.
This is still gross, and there are still some terrible MP issues lurking here, but progress crawls along.
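The sketch promised above: the caller-side pattern this change requires, with illustrative stand-ins for the internal interfaces:

/* Stand-in for the internal primitive; the real one suspends the thread. */
static void
pthread__block(void)
{
}

static volatile int condition;

static void
wait_for_condition(void)
{
	/*
	 * Re-check after every wakeup: pthread__block() may now return
	 * spuriously, since signal handlers simply continue the thread.
	 */
	while (!condition)
		pthread__block();
}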
de-lint
Merge the nathanw_sa branch.
Back out previous, and fix the problem by only providing the ucontext prototypes if the names are not already #define'd.
Put the #include "pthread_md.h" back where it was so the #define stuff in i386/pthread_md.h works again.
* Move the pthread_sigmask() prototype to <signal.h>.
* Don't include <signal.h> in <pthread.h>.
* Add code to the signal trampoline to convert from the ucontext to a sigcontext, and back again (XXX though, only callee-save regs for _UC_USER contexts). This is necessary in order to support e.g. GCC's libjava, which depends on the traditional Unix semantics of changes made to the sigcontext being visible when the handler returns.
Move debug-only stuff into its own header.
Add support for using RAS lock primitives on uniprocessors where RAS is available.
Add a lock to the alarm structure, and adjust the alarm__fired() function to take a properly typed pointer.
Whitespace.
As pointed out recently on comp.programming.threads, POSIX requires that pthread_detach() and pthread_join() must return ESRCH, not invoke undefined behavior, even if handed completely nonsensical pthread_t values. So, search through the list of threads to see if the pthread_t value is valid.
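A sketch of the validation walk, with an illustrative list layout:

#include <errno.h>
#include <stddef.h>

struct thread {
	struct thread *allq_next;
};

static struct thread *allqueue;	/* every thread ever created */

/* Never dereference an unknown handle; report ESRCH instead. */
static int
thread_valid(const struct thread *target)
{
	const struct thread *t;

	for (t = allqueue; t != NULL; t = t->allq_next)
		if (t == target)
			return 0;
	return ESRCH;
}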
Compact timer magic slightly.
Define a new macro _INITCONTEXT_U() which initializes a ucontext_t to "blank" values cheaply. Include _INITCONTEXT_U_MD() for setting machine-specific registers that must be set to have a valid context.
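A hedged sketch of what such a macro might look like; the exact _UC_* flag set is an assumption, and _INITCONTEXT_U_MD() is the machine-specific hook mentioned above:

#include <signal.h>
#include <ucontext.h>

#ifndef _INITCONTEXT_U_MD
#define	_INITCONTEXT_U_MD(ucp)	/* empty fallback for this sketch */
#endif

/* Make a ucontext_t valid without the cost of getcontext(2). */
#define	_INITCONTEXT_U(ucp) do {					\
	(ucp)->uc_flags = _UC_CPU | _UC_STACK;				\
	(ucp)->uc_link = NULL;						\
	sigemptyset(&(ucp)->uc_sigmask);				\
	_INITCONTEXT_U_MD(ucp);						\
} while (/*CONSTCOND*/0)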
Implement a somewhat crude forced round-robin scheduling option, enabled by setting PTHREAD_RRTIME to the number of milliseconds between timer events. Makes use of CLOCK_VIRTUAL POSIX timers.
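A sketch of the setup, assuming PTHREAD_RRTIME is read once at startup; CLOCK_VIRTUAL counts CPU time consumed by the process, so an idle process is never interrupted:

#include <signal.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>

static int
setup_round_robin(timer_t *tp)
{
	const char *p = getenv("PTHREAD_RRTIME");
	struct sigevent ev;
	struct itimerspec its;
	long ms;

	if (p == NULL || (ms = atol(p)) <= 0)
		return 0;			/* feature disabled */

	memset(&ev, 0, sizeof(ev));
	ev.sigev_notify = SIGEV_SIGNAL;
	ev.sigev_signo = SIGVTALRM;		/* the preemption point */
	if (timer_create(CLOCK_VIRTUAL, &ev, tp) == -1)
		return -1;

	its.it_value.tv_sec = its.it_interval.tv_sec = ms / 1000;
	its.it_value.tv_nsec = its.it_interval.tv_nsec =
	    (ms % 1000) * 1000000;
	return timer_settime(*tp, 0, &its, NULL);
}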
Fix a typo in a comment.
Infrastructure for machine-dependent init code.
Move inclusion of pthread_md.h down to just where it's needed.
Add a flag indicating that the thread should be woken up on a signal.
Make the caller of the alarm functions supply the alarm data structure, instead of malloc()ing it internally. "malloc() inside spinlocks considered harmful." Duh.
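A sketch of the API shape after the change, with hypothetical names and layout:

#include <time.h>

struct alarm_entry {
	struct alarm_entry *next;
	struct timespec expires;
	void (*func)(void *);
	void *arg;
};

static struct alarm_entry *alarm_queue;

/* The caller supplies the storage (typically on its own stack), so
 * nothing allocates while the alarm spinlock is held. */
static void
alarm_add(struct alarm_entry *pa, const struct timespec *ts,
    void (*func)(void *), void *arg)
{
	pa->expires = *ts;
	pa->func = func;
	pa->arg = arg;
	pa->next = alarm_queue;		/* insert under the held lock */
	alarm_queue = pa;
}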
Increase the thread stack size from 64k to 256k. This is necessary for some stack-hogging apps like Apache, and for systems with page sizes larger than 16k.
Many signal improvements:
- Implement sigsuspend().
- Take pending signals that are unblocked in pthread_sigmask().
- Tweak the signal mask passed by sigaction() to permit us to manage our own thread-specific signal masks.
- Don't try to deliver signals to zombie threads.
- Prevent a race between deciding a thread can take a signal and actually taking it.
- Don't put threads that are blocked in a syscall on the run queue.
- Add debug logging.
Track the object being slept on.
Add a numeric ID to the pthread structure. Add a "Running" state, distinct from "Runnable".
Oops. It's slightly less confusing if PT_ALARMTIMER_MAGIC and _PT_MUTEX_MAGIC have different values.
Declare POSIX spinlock interfaces: pthread_spin_*() and pthread_spinlock_t. Move the internal spinlock interface declarations back to the internal header.
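A usage sketch of the newly declared POSIX interfaces:

#include <pthread.h>

static pthread_spinlock_t lock;
static int counter;

static int
setup(void)
{
	return pthread_spin_init(&lock, PTHREAD_PROCESS_PRIVATE);
}

static void
bump(void)
{
	pthread_spin_lock(&lock);
	counter++;
	pthread_spin_unlock(&lock);
}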
Move alarm initialization into pthread_alarms.c.
Cancellation support. This includes implementing pthread_cancel() and pthread_testcancel(), making pthread_join() and pthread_cond_wait() cancellation points, introducing new states to distinguish waiting on a sleep queue from waiting in the kernel, and introducing a locking protocol around changing a thread's run state.
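A usage sketch of the new calls (the worker polls for cancellation explicitly; pthread_join() itself is now a cancellation point):

#include <pthread.h>

static void *
worker(void *arg)
{
	for (;;) {
		/* ... one unit of work ... */
		pthread_testcancel();	/* explicit cancellation point */
	}
	/* NOTREACHED */
	return arg;
}

static void
stop(pthread_t t)
{
	(void)pthread_cancel(t);
	(void)pthread_join(t, NULL);
}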
Rename pt_spin_t to pthread_spin_t for namespace cleanliness. Define cleanup data structures: cleanup queue, cancel state.
Define a "_UC_USER" flag to b eused in ucontext.uc_flags to indicate that only "user" (callee-save) state is stored.
pthread_malloc() and pthread_free() are superfluous.
Add thread-specific data area to thread structure. Declare pthread_init() startup function. Declare pthread__destroy_tsd() utility function. Move spinlock functions to pthread.h.
Oops, that's old_preempt, not new_preempt.
Add a field to the pthread structure to hold the "real" ucontext pointer for threads preempted in the early stage of locked_switch.
Prototype "force" argument to pthread__debuglog_init().
pthread__switch() no longer needs to be implemented in assembler, and doesn't have a lock parameter.
Enable ERRORCHECK. Standardize MAGIC/DEAD values.
Some whitespace cleanup.
The _*context_u functions are now in the pthread library.
Add a mechanism for debugging that is less likely to change scheduling behaviour than using printf (it writes to a shared memory segment instead), and a simple tool for dumping the buffer. Partly inspired by the kernel msgbuf code.
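A hedged sketch of the idea, with illustrative layout and names: map a file shared, append into a ring buffer, and let a separate dump tool read it back:

#include <fcntl.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

struct dbglog {
	volatile size_t head;
	char buf[64 * 1024];
};

static struct dbglog *
dbglog_init(const char *path)
{
	void *p;
	int fd;

	if ((fd = open(path, O_RDWR | O_CREAT, 0600)) == -1)
		return NULL;
	if (ftruncate(fd, sizeof(struct dbglog)) == -1) {
		close(fd);
		return NULL;
	}
	p = mmap(NULL, sizeof(struct dbglog),
	    PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
	close(fd);
	return p == MAP_FAILED ? NULL : p;
}

/* Append without stdio -- cheap enough not to perturb scheduling the
 * way printf(3) would. */
static void
dbglog_puts(struct dbglog *lg, const char *s)
{
	size_t len = strlen(s), i;

	for (i = 0; i < len; i++)	/* simple wrapping ring buffer */
		lg->buf[(lg->head + i) % sizeof(lg->buf)] = s[i];
	lg->head = (lg->head + len) % sizeof(lg->buf);
}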
Create more idle threads. In the current system, we need one for every level of preemption nesting.
Note copyright. Standardize RCS IDs.
Shrink the per-thread stack to a more reasonable size.
Use the new queue type and type headers. Prototype pthread_lockinit(). A little whitespace cleanup.
The beginnings of a scheduler activations-based pthread library.
file pthread_int.h was initially added on branch nathanw_sa.