Commit 1fe39339 authored by Brad Fitzpatrick's avatar Brad Fitzpatrick

net/http: fix Transport race returning bodyless responses and reusing conns

The Transport had a delicate protocol between its readLoop goroutine
and the goroutine calling RoundTrip. The basic concern is that the
caller's RoundTrip goroutine wants to wait for either a
connection-level error (the conn dying) or the response. But sometimes
both happen: there's a valid response (without a body), but the conn
is also going away. Both goroutines' logic dealing with this had grown
large and complicated with hard-to-follow comments over the years.

Simplify and document. Pull some bits into functions and do all
bodyless stuff in one place (it's special enough), rather than having
a bunch of conditionals scattered everywhere. One test is no longer
even applicable since the race it tested is no longer possible (the
code doesn't exist).

The bug that this fixes is that when the Transport reads a bodyless
response from a server, it was returning that response before
returning the persistent connection to the idle pool. As a result,
~1/1000 of serial requests would end up creating a new connection
rather than re-using the just-used connection due to goroutine
scheduling chance. Instead, this now adds bodyless responses'
connections back to the idle pool first, then sends the response to
the RoundTrip goroutine, but making sure that the RoundTrip goroutine
is outside of its select on the connection dying.

There's a new buffered channel involved now, which is a minor
complication, but it's much more self-contained and well-documented
than the previous complexity. (The alternative of making the
responseAndError channel itself unbuffered is too invasive and risky
at this point; it would require a number of changes to avoid
deadlocked goroutines in error cases)

In any case, flakes look to be gone now. We'll see if trybots agree.

Fixes #13633

Change-Id: I95a22942b2aa334ae7c87331fddd751d4cdfdffc
Reviewed-on: https://go-review.googlesource.com/17890Reviewed-by: 's avatarRuss Cox <rsc@golang.org>
Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org>
parent 1babba2e
......@@ -34,10 +34,9 @@ func init() {
}
var (
SetInstallConnClosedHook = hookSetter(&testHookPersistConnClosedGotRes)
SetEnterRoundTripHook = hookSetter(&testHookEnterRoundTrip)
SetTestHookWaitResLoop = hookSetter(&testHookWaitResLoop)
SetRoundTripRetried = hookSetter(&testHookRoundTripRetried)
SetEnterRoundTripHook = hookSetter(&testHookEnterRoundTrip)
SetTestHookWaitResLoop = hookSetter(&testHookWaitResLoop)
SetRoundTripRetried = hookSetter(&testHookRoundTripRetried)
)
func SetReadLoopBeforeNextReadHook(f func()) {
......
This diff is collapsed.
......@@ -602,6 +602,7 @@ func TestTransportHeadChunkedResponse(t *testing.T) {
tr := &Transport{DisableKeepAlives: false}
c := &Client{Transport: tr}
defer tr.CloseIdleConnections()
// Ensure that we wait for the readLoop to complete before
// calling Head again
......@@ -2661,62 +2662,6 @@ func TestTransportRangeAndGzip(t *testing.T) {
res.Body.Close()
}
// Previously, we used to handle a logical race within RoundTrip by waiting for 100ms
// in the case of an error. Changing the order of the channel operations got rid of this
// race.
//
// In order to test that the channel op reordering works, we install a hook into the
// roundTrip function which gets called if we saw the connection go away and
// we subsequently received a response.
func TestTransportResponseCloseRace(t *testing.T) {
if testing.Short() {
t.Skip("skipping in short mode")
}
defer afterTest(t)
ts := httptest.NewServer(HandlerFunc(func(w ResponseWriter, r *Request) {
}))
defer ts.Close()
sawRace := false
SetInstallConnClosedHook(func() {
sawRace = true
})
defer SetInstallConnClosedHook(nil)
SetTestHookWaitResLoop(func() {
// Make the select race much more likely by blocking before
// the select, so both will be ready by the time the
// select runs.
time.Sleep(50 * time.Millisecond)
})
defer SetTestHookWaitResLoop(nil)
tr := &Transport{
DisableKeepAlives: true,
}
req, err := NewRequest("GET", ts.URL, nil)
if err != nil {
t.Fatal(err)
}
// selects are not deterministic, so do this a bunch
// and see if we handle the logical race at least once.
for i := 0; i < 10000; i++ {
resp, err := tr.RoundTrip(req)
if err != nil {
t.Fatalf("unexpected error: %s", err)
continue
}
resp.Body.Close()
if sawRace {
t.Logf("saw race after %d iterations", i+1)
break
}
}
if !sawRace {
t.Errorf("didn't see response/connection going away race")
}
}
// Test for issue 10474
func TestTransportResponseCancelRace(t *testing.T) {
defer afterTest(t)
......@@ -2953,6 +2898,40 @@ func TestTransportAutomaticHTTP2(t *testing.T) {
}
}
// Issue 13633: there was a race where we returned bodyless responses
// to callers before recycling the persistent connection, which meant
// a client doing two subsequent requests could end up on different
// connections. It's somewhat harmless but enough tests assume it's
// not true in order to test other things that it's worth fixing.
// Plus it's nice to be consistent and not have timing-dependent
// behavior.
func TestTransportReuseConnEmptyResponseBody(t *testing.T) {
defer afterTest(t)
cst := newClientServerTest(t, h1Mode, HandlerFunc(func(w ResponseWriter, r *Request) {
w.Header().Set("X-Addr", r.RemoteAddr)
// Empty response body.
}))
defer cst.close()
n := 100
if testing.Short() {
n = 10
}
var firstAddr string
for i := 0; i < n; i++ {
res, err := cst.c.Get(cst.ts.URL)
if err != nil {
log.Fatal(err)
}
addr := res.Header.Get("X-Addr")
if i == 0 {
firstAddr = addr
} else if addr != firstAddr {
t.Fatalf("On request %d, addr %q != original addr %q", i+1, addr, firstAddr)
}
res.Body.Close()
}
}
func wantBody(res *Response, err error, want string) error {
if err != nil {
return err
......
Markdown is supported
0% or
You are about to add 0 people to the discussion. Proceed with caution.
Finish editing this message first!
Please register or to comment