Commit f0dd0028 authored by Austin Clements's avatar Austin Clements

runtime: use separate count and note for forEachP

Currently, forEachP reuses the stopwait and stopnote fields from
stopTheWorld to track how many Ps have not responded to the safe-point
request and to sleep until all Ps have responded.

It was assumed this was safe because both stopTheWorld and forEachP
must occur under the worlsema and hence stopwait and stopnote cannot
be used for both purposes simultaneously and callers could always
determine the appropriate use based on sched.gcwaiting (which is only
set by stopTheWorld). However, this is not the case, since it's
possible for there to be a window between when an M observes that
gcwaiting is set and when it checks stopwait during which stopwait
could have changed meanings. When this happens, the M decrements
stopwait and may wakeup stopnote, but does not otherwise participate
in the forEachP protocol. As a result, stopwait is decremented too
many times, so it may reach zero before all Ps have run the safe-point
function, causing forEachP to wake up early. It will then either
observe that some P has not run the safe-point function and panic with
"P did not run fn", or the remaining P (or Ps) will run the safe-point
function before it wakes up and it will observe that stopwait is
negative and panic with "not stopped".

Fix this problem by giving forEachP its own safePointWait and
safePointNote fields.

One known sequence of events that can cause this race is as
follows. It involves three actors:

G1 is running on M1 on P1. P1 has an empty run queue.

G2/M2 is in a blocked syscall and has lost its P. (The details of this
don't matter, it just needs to be in a position where it needs to grab
an idle P.)

GC just started on G3/M3/P3. (These aren't very involved, they just
have to be separate from the other G's, M's, and P's.)

1. GC calls stopTheWorld(), which sets sched.gcwaiting to 1.

Now G1/M1 begins to enter a syscall:

2. G1/M1 invokes reentersyscall, which sets the P1's status to
   _Psyscall.

3. G1/M1's reentersyscall observes gcwaiting != 0 and calls
   entersyscall_gcwait.

4. G1/M1's entersyscall_gcwait blocks acquiring sched.lock.

Back on GC:

5. stopTheWorld cas's P1's status to _Pgcstop, does other stuff, and
   returns.

6. GC does stuff and then calls startTheWorld().

7. startTheWorld() calls procresize(), which sets P1's status to
   _Pidle and puts P1 on the idle list.

Now G2/M2 returns from its syscall and takes over P1:

8. G2/M2 returns from its blocked syscall and gets P1 from the idle
   list.

9. G2/M2 acquires P1, which sets P1's status to _Prunning.

10. G2/M2 starts a new syscall and invokes reentersyscall, which sets
    P1's status to _Psyscall.

Back on G1/M1:

11. G1/M1 finally acquires sched.lock in entersyscall_gcwait.

At this point, G1/M1 still thinks it's running on P1. P1's status is
_Psyscall, which is consistent with what G1/M1 is doing, but it's
_Psyscall because *G2/M2* put it in to _Psyscall, not G1/M1. This is
basically an ABA race on P1's status.

Because forEachP currently shares stopwait with stopTheWorld. G1/M1's
entersyscall_gcwait observes the non-zero stopwait set by forEachP,
but mistakes it for a stopTheWorld. It cas's P1's status from
_Psyscall (set by G2/M2) to _Pgcstop and proceeds to decrement
stopwait one more time than forEachP was expecting.

Fixes #10618. (See the issue for details on why the above race is safe
when forEachP is not involved.)

Prior to this commit, the command
  stress ./runtime.test -test.run TestFutexsleep\|TestGoroutineProfile
would reliably fail after a few hundred runs. With this commit, it
ran for over 2 million runs and never crashed.

Change-Id: I9a91ea20035b34b6e5f07ef135b144115f281f30
Reviewed-on: https://go-review.googlesource.com/10157Reviewed-by: 's avatarRuss Cox <rsc@golang.org>
parent 277acca2
......@@ -790,10 +790,10 @@ func forEachP(fn func(*p)) {
_p_ := getg().m.p.ptr()
lock(&sched.lock)
if sched.stopwait != 0 {
throw("forEachP: sched.stopwait != 0")
if sched.safePointWait != 0 {
throw("forEachP: sched.safePointWait != 0")
}
sched.stopwait = gomaxprocs - 1
sched.safePointWait = gomaxprocs - 1
sched.safePointFn = fn
// Ask all Ps to run the safe point function.
......@@ -813,11 +813,11 @@ func forEachP(fn func(*p)) {
for p := sched.pidle.ptr(); p != nil; p = p.link.ptr() {
if cas(&p.runSafePointFn, 1, 0) {
fn(p)
sched.stopwait--
sched.safePointWait--
}
}
wait := sched.stopwait > 0
wait := sched.safePointWait > 0
unlock(&sched.lock)
// Run fn for the current P.
......@@ -843,15 +843,15 @@ func forEachP(fn func(*p)) {
for {
// Wait for 100us, then try to re-preempt in
// case of any races.
if notetsleep(&sched.stopnote, 100*1000) {
noteclear(&sched.stopnote)
if notetsleep(&sched.safePointNote, 100*1000) {
noteclear(&sched.safePointNote)
break
}
preemptall()
}
}
if sched.stopwait != 0 {
throw("forEachP: not stopped")
if sched.safePointWait != 0 {
throw("forEachP: not done")
}
for i := 0; i < int(gomaxprocs); i++ {
p := allp[i]
......@@ -887,9 +887,9 @@ func runSafePointFn() {
}
sched.safePointFn(p)
lock(&sched.lock)
sched.stopwait--
if sched.stopwait == 0 {
notewakeup(&sched.stopnote)
sched.safePointWait--
if sched.safePointWait == 0 {
notewakeup(&sched.safePointNote)
}
unlock(&sched.lock)
}
......@@ -1262,9 +1262,9 @@ func handoffp(_p_ *p) {
}
if _p_.runSafePointFn != 0 && cas(&_p_.runSafePointFn, 1, 0) {
sched.safePointFn(_p_)
sched.stopwait--
if sched.stopwait == 0 {
notewakeup(&sched.stopnote)
sched.safePointWait--
if sched.safePointWait == 0 {
notewakeup(&sched.safePointNote)
}
}
if sched.runqsize != 0 {
......
......@@ -441,7 +441,9 @@ type schedt struct {
// safepointFn should be called on each P at the next GC
// safepoint if p.runSafePointFn is set.
safePointFn func(*p)
safePointFn func(*p)
safePointWait int32
safePointNote note
profilehz int32 // cpu profiling rate
......
Markdown is supported
0% or
You are about to add 0 people to the discussion. Proceed with caution.
Finish editing this message first!
Please register or to comment