runtime: run safe-point function before entering _Psyscall

Currently, we run a P's safe-point function immediately after entering _Psyscall state. This is unsafe, since as soon as we put the P in _Psyscall, we no longer control the P and another M may claim it. We'll still run the safe-point function only once (because doing so races on an atomic), but the P may no longer be at a safe-point when we do so. In particular, this means that the use of forEachP to dispose all P's gcw caches is unsafe. A P may enter a syscall, run the safe-point function, and dispose the P's gcw cache concurrently with another M claiming the P and attempting to use its gcw cache. If this happens, we may empty the gcw's workbuf after putting it on work.{full,partial}, or add pointers to it after putting it in work.empty. This will cause an assertion failure when we later pop the workbuf from the list and its object count is inconsistent with the list we got it from. Fix this by running the safe-point function just before putting the P in _Psyscall. Related to #11640. This probably fixes this issue, but while I'm able to show that we can enter a bad safe-point state as a result of this, I can't reproduce that specific failure. Change-Id: I6989c8ca7ef2a4a941ae1931e9a0748cbbb59434 Reviewed-on: https://go-review.googlesource.com/12124 Run-TryBot: Austin Clements <austin@google.com> Reviewed-by: Russ Cox <rsc@golang.org>

runtime: run safe-point function before entering _Psyscall
Currently, we run a P's safe-point function immediately after entering _Psyscall state. This is unsafe, since as soon as we put the P in _Psyscall, we no longer control the P and another M may claim it. We'll still run the safe-point function only once (because doing so races on an atomic), but the P may no longer be at a safe-point when we do so. In particular, this means that the use of forEachP to dispose all P's gcw caches is unsafe. A P may enter a syscall, run the safe-point function, and dispose the P's gcw cache concurrently with another M claiming the P and attempting to use its gcw cache. If this happens, we may empty the gcw's workbuf after putting it on work.{full,partial}, or add pointers to it after putting it in work.empty. This will cause an assertion failure when we later pop the workbuf from the list and its object count is inconsistent with the list we got it from. Fix this by running the safe-point function just before putting the P in _Psyscall. Related to #11640. This probably fixes this issue, but while I'm able to show that we can enter a bad safe-point state as a result of this, I can't reproduce that specific failure. Change-Id: I6989c8ca7ef2a4a941ae1931e9a0748cbbb59434 Reviewed-on: https://go-review.googlesource.com/12124 Run-TryBot: Austin Clements <austin@google.com> Reviewed-by: Russ Cox <rsc@golang.org>
edfc9797 · Austin Clements · 64e53337 · edfc9797
Commit edfc9797 authored Jul 13, 2015 by Austin Clements
Hide whitespace changes
Inline Side-by-side

Showing with 5 additions and 4 deletions

proc1.go src/runtime/proc1.go +5 -4

No files found.
--- a/src/runtime/proc1.go
+++ b/src/runtime/proc1.go
@@ -1828,15 +1828,16 @@ func reentersyscall(pc, sp uintptr) {
 		save(pc, sp)
 	}

+	if _g_.m.p.ptr().runSafePointFn != 0 {
+		// runSafePointFn may stack split if run on this stack
+		systemstack(runSafePointFn)
+	}
+
 	_g_.m.syscalltick = _g_.m.p.ptr().syscalltick
 	_g_.sysblocktraced = true
 	_g_.m.mcache = nil
 	_g_.m.p.ptr().m = 0
 	atomicstore(&_g_.m.p.ptr().status, _Psyscall)
-	if _g_.m.p.ptr().runSafePointFn != 0 {
-		// runSafePointFn may stack split if run on this stack
-		systemstack(runSafePointFn)
-	}
 	if sched.gcwaiting != 0 {
 		systemstack(entersyscall_gcwait)
 		save(pc, sp)