    runtime: use per-goroutine sequence numbers in tracer · a3703618
    Dmitry Vyukov authored
    Currently the tracer uses a global sequencer, which introduces a
    significant slowdown on parallel machines (up to 10x).
    Replace the global sequencer with a per-goroutine sequencer.
    
    If we assign per-goroutine sequence numbers to only 3 types
    of events (start, unblock and syscall exit), it is enough to
    restore a consistent partial ordering of all events. Even these
    events don't need sequence numbers all the time (if a goroutine
    starts on the same P where it was unblocked, then the start event
    does not need a sequence number).
    The burden of restoring the order is put on the trace parser.
    Details of the algorithm are described in the comments.
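
The ordering idea above can be illustrated with a minimal sketch. This is not the runtime's or the parser's actual code: the `Event` type, the `Order` function, the use of 0 to mean "no sequence number", and the timestamp tie-break are all assumptions made for illustration. It merges per-P event queues and emits a sequenced event only once the previous sequenced event of the same goroutine has been emitted, so a skewed timestamp cannot reorder them.

```go
package main

import (
	"errors"
	"fmt"
)

// Event is a hypothetical, simplified trace event.
type Event struct {
	G    int    // goroutine the event refers to
	Seq  uint64 // per-goroutine sequence number; 0 means "not sequenced" (assumption)
	Ts   int64  // timestamp, used only to choose among ready events
	Name string
}

// Order merges per-P queues (each already in program order) into one
// sequence that respects per-goroutine sequence numbers.
func Order(queues [][]Event) ([]Event, error) {
	last := make(map[int]uint64)      // last sequence number emitted per goroutine
	heads := make([]int, len(queues)) // index of the next unread event per queue
	var out []Event
	for {
		best, remaining := -1, false
		for p, q := range queues {
			if heads[p] >= len(q) {
				continue
			}
			remaining = true
			ev := q[heads[p]]
			if ev.Seq != 0 && ev.Seq != last[ev.G]+1 {
				continue // an earlier event of this goroutine is still pending
			}
			if best == -1 || ev.Ts < queues[best][heads[best]].Ts {
				best = p
			}
		}
		if !remaining {
			return out, nil
		}
		if best == -1 {
			return nil, errors.New("no consistent ordering: broken sequence numbers")
		}
		ev := queues[best][heads[best]]
		heads[best]++
		if ev.Seq != 0 {
			last[ev.G] = ev.Seq
		}
		out = append(out, ev)
	}
}

func main() {
	// The start event carries an earlier (skewed) timestamp than the unblock
	// event, but its sequence number forces it to come second.
	queues := [][]Event{
		{{G: 2, Seq: 1, Ts: 100, Name: "unblock"}},
		{{G: 2, Seq: 2, Ts: 50, Name: "start"}},
	}
	events, err := Order(queues)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	for _, ev := range events {
		fmt.Println(ev.Name)
	}
}
```

In this sketch a gap in the sequence numbers leaves no event ready to emit, which is reported as an error rather than silently producing an inconsistent order.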
    
    On the http benchmark with GOMAXPROCS=48:
    no tracing: 5026 ns/op
    tracing: 27803 ns/op (+453%)
    with this change: 6369 ns/op (+26%, mostly for traceback)
    
    Trace size is also reduced by ~22%. Average event size before: 4.63
    bytes/event, after: 3.62 bytes/event.
    
    Besides running the trace tests, I've also tested with manually broken
    cputicks (random skew for each event, per-P skew and episodic random skew).
    In all cases the broken timestamps were detected and there were no test failures.
    
    Change-Id: I078bde421ccc386a66f6c2051ab207bcd5613efa
    Reviewed-on: https://go-review.googlesource.com/21512
    Run-TryBot: Dmitry Vyukov <dvyukov@google.com>
    Reviewed-by: Austin Clements <austin@google.com>
    TryBot-Result: Gobot Gobot <gobot@golang.org>
trace.go 30.6 KB