runtime: work around "P has cached GC work" failures

We still don't understand what's causing there to be remaining GC work
when we enter mark termination, but in order to move forward on this
issue, this CL implements a work-around for the problem.

If debugCachedWork is false, this CL does a second check for remaining
GC work as soon as it stops the world for mark termination. If it
finds any work, it starts the world again and re-enters concurrent
mark. This will increase STW time by a small amount proportional to
GOMAXPROCS, but fixes a serious correctness issue.

This works-around #27993.

Change-Id: Ia23b85dd6c792ee8d623428bd1a3115631e387b8
Reviewed-on: https://go-review.googlesource.com/c/156140
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Michael Knyszek <mknyszek@google.com>
Reviewed-by: Rick Hudson <rlh@golang.org>
This commit is contained in:
Austin Clements 2019-01-03 14:48:30 -05:00
parent 9a7278ae47
commit 95a6f112c6

View file

@ -1367,7 +1367,7 @@ var gcMarkDoneFlushed uint32
// termination.
//
// For debugging issue #27993.
const debugCachedWork = true
const debugCachedWork = false
// gcWorkPauseGen is for debugging the mark completion algorithm.
// gcWork put operations spin while gcWork.pauseGen == gcWorkPauseGen.
@ -1525,6 +1525,33 @@ top:
throw("throwOnGCWork")
}
}
} else {
// For unknown reasons (see issue #27993), there is
// sometimes work left over when we enter mark
// termination. Detect this and resume concurrent
// mark. This is obviously unfortunate.
//
// Switch to the system stack to call wbBufFlush1,
// though in this case it doesn't matter because we're
// non-preemptible anyway.
restart := false
systemstack(func() {
for _, p := range allp {
wbBufFlush1(p)
if !p.gcw.empty() {
restart = true
break
}
}
})
if restart {
getg().m.preemptoff = ""
systemstack(func() {
now := startTheWorldWithSema(true)
work.pauseNS += now - work.pauseStart
})
goto top
}
}
// Disable assists and background workers. We must do