c++ - Why is thread local storage not implemented with page table mappings?
I was hoping to use the C++11 thread_local keyword for a per-thread boolean flag that is going to be accessed very frequently.
However, most compilers seem to implement thread local storage with a table that maps integer IDs (slots) to the variable's address on the current thread. This lookup would happen inside a performance-critical code path, so I have some concerns about its performance.
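For illustration, the slot-table scheme described above looks roughly like the POSIX key API. This is only a sketch: the helper names (flag_key, make_key, slot_set_flag, slot_get_flag) are made up for this example, and only the pthread_* calls are real API. The boolean is stored in the slot's pointer value itself (non-null means true).

```cpp
#include <pthread.h>

// Hypothetical names; only the pthread_* functions are real POSIX API.
static pthread_key_t flag_key;
static pthread_once_t flag_key_once = PTHREAD_ONCE_INIT;

// Create the key (the "slot") exactly once for the whole process.
static void make_key() { pthread_key_create(&flag_key, nullptr); }

// Store the flag in the calling thread's slot: non-null pointer = true.
void slot_set_flag(bool value) {
    pthread_once(&flag_key_once, make_key);
    pthread_setspecific(flag_key, value ? &flag_key : nullptr);
}

// Look the flag up in the calling thread's slot; a fresh thread sees nullptr.
bool slot_get_flag() {
    pthread_once(&flag_key_once, make_key);
    return pthread_getspecific(flag_key) != nullptr;
}
```

Every read goes through a library call and a per-thread lookup, which is the cost I am worried about.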
The way I would have expected thread local storage to be implemented is by allocating virtual memory ranges that are backed by different physical pages depending on the thread. That way, accessing the flag would have the same cost as any other memory access, since the MMU takes care of the mapping.
Why do none of the mainstream compilers take advantage of page table mappings in this way?
I suppose I can implement my own "thread-specific page" with mmap on Linux and VirtualAlloc on Win32, but this seems like a pretty common use case. If anyone knows of existing or better solutions, please point me to them.
I've also considered storing a std::atomic<std::thread::id> within each object to represent the active thread, but profiling shows that the check for std::this_thread::get_id() == active_thread is quite expensive.
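A minimal sketch of that alternative, assuming a hypothetical class OwnedObject (the class and its member names are invented for this example, not taken from my real code):

```cpp
#include <atomic>
#include <thread>

// Hypothetical class: remembers which thread is currently "active" on it.
class OwnedObject {
public:
    // Mark the calling thread as the active thread.
    void acquire() {
        active_thread_.store(std::this_thread::get_id(),
                             std::memory_order_release);
    }

    // Clear ownership; a default-constructed id represents "no thread".
    void release() {
        active_thread_.store(std::thread::id{}, std::memory_order_release);
    }

    // The check that showed up as expensive in profiling.
    bool owned_by_caller() const {
        return active_thread_.load(std::memory_order_acquire) ==
               std::this_thread::get_id();
    }

private:
    std::atomic<std::thread::id> active_thread_{std::thread::id{}};
};
```

Each call to owned_by_caller() pays for std::this_thread::get_id() plus an atomic load, where a plain per-thread flag would need only a single read.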
On Linux/x86-64, thread local storage is implemented through the special segment register %fs (per the x86-64 ABI, page 21...).
So the following code (I'm using C with the GCC extension __thread syntax, but it is the same as C++11 thread_local)
    __thread int x;
    int f(void) { return x; }
is compiled (with gcc -O -fverbose-asm -S) into:
            .text
    .Ltext0:
            .globl  f
            .type   f, @function
    f:
    .LFB0:
            .file 1 "tl.c"
            .loc 1 3 0
            .cfi_startproc
            .loc 1 3 0
            movl    %fs:x@tpoff, %eax       # x,
            ret
            .cfi_endproc
    .LFE0:
            .size   f, .-f
            .globl  x
            .section        .tbss,"awT",@nobits
            .align 4
            .type   x, @object
            .size   x, 4
    x:
            .zero   4
Therefore, contrary to your fears, access to TLS is really quick on Linux/x86-64. It is not exactly implemented as a table (instead, the kernel and runtime manage the %fs segment register to point to the thread-specific memory zone, and the compiler and linker manage the offset there). However, the old pthread_getspecific did indeed go through a table, but it is nearly useless once you have TLS.
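For the per-thread boolean flag from the question, the C++11 equivalent could look like the sketch below. The names in_critical_path, flag_is_set, set_flag and worker_sees_own_copy are invented for this example. I would expect flag_is_set() to compile to a single %fs-relative load when built into an ordinary executable, but the exact code depends on the TLS model the compiler picks (code in shared libraries may go through a helper call instead), so check the assembly on your own setup.

```cpp
#include <thread>

// Hypothetical per-thread flag: every thread gets its own copy,
// initialized to false when the thread starts.
thread_local bool in_critical_path = false;

// Read the calling thread's copy of the flag.
bool flag_is_set() { return in_critical_path; }

// Write the calling thread's copy of the flag.
void set_flag(bool value) { in_critical_path = value; }

// Start a worker, let it set its own copy, and report whether the
// worker began with a fresh (false) copy and then saw its own write.
bool worker_sees_own_copy() {
    bool seen_before = true;
    bool seen_after = false;
    std::thread worker([&] {
        seen_before = flag_is_set();  // worker's fresh copy
        set_flag(true);               // changes only the worker's copy
        seen_after = flag_is_set();
    });
    worker.join();
    return !seen_before && seen_after;
}
```

After worker_sees_own_copy() returns, the caller's own copy of the flag is unchanged, since the worker only ever touched its own.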
BTW, by definition, all threads in the same process share the same address space in virtual memory, since a process has its own single address space. (See /proc/self/maps etc...; see proc(5) for more about /proc/, and also mmap(2); the C++11 thread library is based on pthreads, which are implemented using clone(2).) So "thread-specific memory mapping" is a contradiction: once a task (the thing being run by the kernel scheduler) has its own address space, it is called a process (not a thread). The defining characteristic of threads in the same process is to share a common address space (and some other entities, like file descriptors).
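The shared address space can be seen from C++ itself. In this small sketch (tls_value and write_through_pointer are made-up names), a second thread writes to the first thread's thread_local variable through a plain pointer. That works because each thread's TLS block is ordinary memory in the one address space of the process, just at a different address per thread.

```cpp
#include <thread>

// Hypothetical thread-local variable; each thread has its own copy.
thread_local int tls_value = 1;

// Let another thread modify the caller's copy through a pointer and
// return the caller's copy afterwards (-1 if the addresses were equal).
int write_through_pointer() {
    int* callers_copy = &tls_value;  // address of the calling thread's copy
    bool distinct = false;
    std::thread worker([&] {
        // Inside the worker, &tls_value names the worker's own copy.
        distinct = (callers_copy != &tls_value);
        // The caller's copy is still reachable: same address space.
        *callers_copy = 42;
    });
    worker.join();  // join() also makes the worker's write visible here
    return distinct ? tls_value : -1;
}
```

With a truly thread-specific page mapping, the pointer handed to the worker would refer to different memory there, and this pattern would break.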
c++ multithreading performance c++11 thread-local-storage