  1. 08 Oct, 2015 2 commits
  2. 07 Oct, 2015 4 commits
  3. 06 Oct, 2015 16 commits
  4. 05 Oct, 2015 6 commits
  5. 04 Oct, 2015 3 commits
  6. 03 Oct, 2015 6 commits
  7. 02 Oct, 2015 3 commits
    • runtime: use 4 byte writes in amd64p32 memmove/memclr · 9f6df6c9
      Austin Clements authored
      Currently, amd64p32's memmove and memclr use 8-byte writes as much as
      possible and 1-byte writes for the tail of the object. However, if an
      object ends with a 4-byte pointer at an 8-byte-aligned offset, this
      may copy or zero the pointer field one byte at a time, allowing the
      garbage collector to observe a partially copied pointer.
      
      Fix this by using 4-byte writes instead of 8-byte writes.
      
      Updates #12552.
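      
      For illustration, here is a minimal Go sketch of the idea. The real
      routines are written in runtime assembly; memclr4, its alignment
      assumption, and the example buffer are ours, not the runtime's:
      
        package main
        
        import "unsafe"
        
        // memclr4 zeroes n bytes at p using 4-byte stores. On a platform
        // with 4-byte pointers, such as amd64p32, this guarantees that a
        // pointer slot is never written one byte at a time, so the garbage
        // collector can never observe a torn pointer. Illustrative only;
        // p and n are assumed to be 4-byte aligned.
        func memclr4(p unsafe.Pointer, n uintptr) {
            for i := uintptr(0); i < n; i += 4 {
                *(*uint32)(unsafe.Add(p, i)) = 0
            }
        }
        
        func main() {
            buf := make([]uint32, 8)
            memclr4(unsafe.Pointer(&buf[0]), uintptr(len(buf))*4)
        }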
      
      Change-Id: I13324fd05756fb25ae57e812e836f0a975b5595c
      Reviewed-on: https://go-review.googlesource.com/15370
      Run-TryBot: Austin Clements <austin@google.com>
      TryBot-Result: Gobot Gobot <gobot@golang.org>
      Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
      Reviewed-by: Keith Randall <khr@golang.org>
    • runtime: adjust huge page flags only on huge page granularity · 44078a32
      Austin Clements authored
      This fixes an issue where the runtime panics with "out of memory" or
      "cannot allocate memory" even though there is ample memory. It does so
      by reducing the number of memory mappings created by the memory
      allocator.
      
      Commit 7e1b61c7 worked around issue #8832 where Linux's transparent
      huge page support could dramatically increase the RSS of a Go process
      by setting the MADV_NOHUGEPAGE flag on any regions of pages released
      to the OS with MADV_DONTNEED. This had the side effect of also
      increasing the number of VMAs (memory mappings) in a Go address space
      because a separate VMA is needed for every region of the virtual
      address space with different flags. Unfortunately, by default, Linux
      limits the number of VMAs in an address space to 65530, and a large
      heap can quickly reach this limit when the runtime starts scavenging
      memory.
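      
      The 65530 figure is the default value of Linux's vm.max_map_count
      sysctl. As a quick illustration, not part of this change, the limit
      can be read from Go like so:
      
        package main
        
        import (
            "fmt"
            "os"
            "strings"
        )
        
        func main() {
            // vm.max_map_count caps the number of VMAs per process;
            // the Linux default is 65530.
            b, err := os.ReadFile("/proc/sys/vm/max_map_count")
            if err != nil {
                fmt.Println("procfs unavailable:", err)
                return
            }
            fmt.Println("VMA limit:", strings.TrimSpace(string(b)))
        }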
      
      This commit dramatically reduces the number of VMAs. It does this
      primarily by only adjusting the huge page flag at huge page
      granularity. With this change, on amd64, even a pessimal heap that
      alternates between MADV_NOHUGEPAGE and MADV_HUGEPAGE must grow to
      128GB before it hits the VMA limit. Because of this rounding to huge page
      granularity, this change is also careful to leave large used and
      unused regions huge page-enabled.
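      
      The heart of that rounding can be sketched as follows; the constant,
      names, and example addresses are ours, since the runtime operates on
      its own span and arena structures:
      
        package main
        
        import "fmt"
        
        const hugePageSize = 2 << 20 // assuming 2MB transparent huge pages
        
        // hugePageBounds rounds [addr, addr+n) inward to huge page
        // boundaries: only huge pages lying entirely inside the region
        // have their flags adjusted, so partially covered huge pages keep
        // their current state and the kernel never has to split a VMA at
        // sub-huge-page granularity.
        func hugePageBounds(addr, n uintptr) (beg, end uintptr, ok bool) {
            beg = (addr + hugePageSize - 1) &^ (hugePageSize - 1) // round up
            end = (addr + n) &^ (hugePageSize - 1)                // round down
            return beg, end, beg < end
        }
        
        func main() {
            // A 5MB region that straddles boundaries fully covers only two
            // 2MB huge pages; madvise would be applied to those alone.
            if beg, end, ok := hugePageBounds(3<<20+4096, 5<<20); ok {
                fmt.Printf("adjust flags on [%#x, %#x)\n", beg, end)
            }
        }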
      
      This change reduces the maximum number of VMAs during the runtime
      benchmarks with GODEBUG=scavenge=1 from 692 to 49.
      
      Fixes #12233.
      
      Change-Id: Ic397776d042f20d53783a1cacf122e2e2db00584
      Reviewed-on: https://go-review.googlesource.com/15191
      Reviewed-by: Keith Randall <khr@golang.org>
    • runtime: remove sweep wait loop in finishsweep_m · 9a31d38f
      Austin Clements authored
      In general, finishsweep_m must block until any spans that are
      concurrently being swept have been swept. It accomplishes this by
      looping over all spans, which, as in the previous commit, takes about
      1 ms per GB of heap. Unfortunately, we do this during the STW sweep
      termination phase, so multi-gigabyte heaps can push our STW time past
      10ms.
      
      However, there's no need to do this wait if the world is stopped
      because, in effect, stopping the world already had to wait for
      anything that was sweeping (and if it didn't, the wait in
      finishsweep_m would deadlock). Hence, we can simply skip this loop if
      the world is stopped, such as during sweep termination. In fact,
      currently all calls to finishsweep_m are STW, but this hasn't always
      been the case and may not be the case in the future, so we keep the
      logic around.
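      
      The resulting control flow, with illustrative names rather than the
      runtime's actual code, looks roughly like this:
      
        package main
        
        type span struct{ swept bool }
        
        // finishSweep runs the per-span wait loop, which costs about 1 ms
        // per GB of heap, only when the world is not stopped: stopping the
        // world already waited out any in-flight sweepers.
        func finishSweep(worldStopped bool, spans []*span) {
            if !worldStopped {
                for _, s := range spans {
                    for !s.swept {
                        // spin until a concurrent sweeper finishes this span
                    }
                }
            }
            // ... sweep termination proceeds from here ...
        }
        
        func main() {
            spans := []*span{{swept: true}, {swept: true}}
            finishSweep(true, spans)  // STW caller: skips the loop entirely
            finishSweep(false, spans) // concurrent caller: waits per span
        }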
      
      For 24GB heaps, this reduces max pause time by 75% relative to tip and
      by 90% relative to Go 1.5. Notably, all pauses are now well under
      10ms. Here are the results for the garbage benchmark:
      
                     ------------- max pause ------------
      Heap   Procs   after change   before change   1.5.1
      24GB     12        3.8ms          16ms         37ms
      24GB      4        3.7ms          16ms         37ms
       4GB      4        3.7ms           3ms        6.9ms
      
      In the 4GB/4P case, the "before change" run appears to have gotten
      lucky: the max pause went up after the change, but the 99th-percentile
      pause time went down from 3ms to 2.04ms.
      
      Change-Id: Ica22189559f231d408ef2815019c9dbb5f38bf31
      Reviewed-on: https://go-review.googlesource.com/15071
      Reviewed-by: Rick Hudson <rlh@golang.org>
      Run-TryBot: Austin Clements <austin@google.com>
      TryBot-Result: Gobot Gobot <gobot@golang.org>