11 Best Practices for Low Latency Systems

29 thoughts on “11 Best Practices for Low Latency Systems”

  1. Good article. One beef: Go doesn’t have a sophisticated memory model like Java or C++11. If your system fits the goroutine-and-channels architecture you’re fine; otherwise you’re out of luck. AFAIK you cannot opt out of the runtime scheduler, so there are no native OS threads, and the ability to build your own lock-free data structures (SPSC queues/ring buffers) is also severely lacking.

      1. Benjamin, the Go memory model detailed here: http://golang.org/ref/mem is defined mostly in terms of channels and mutexes. I looked through the packages that you listed, and while the data structures there are “lock free”, they are not equivalent to what one might build in Java/C++11. The sync package, as of now, doesn’t support relaxed atomics or the acquire/release semantics of C++11. Without that support it’s difficult to build SPSC data structures as efficient as the ones possible in C++/Java. The projects that you link use atomic.Add…, which is a sequentially consistent atomic. It’s built with XADD, as it should be – https://github.com/tonnerre/golang/blob/master/src/pkg/sync/atomic/asm_amd64.s

        I am not trying to knock Go down. It takes minimal effort to write async I/O and concurrent code that is sufficiently fast for most people. The standard library, too, is highly tuned for performance. Go also has support for structs, which is missing in Java. But as it stands, I think the simplistic memory model and the goroutine runtime stand in the way of building the kind of systems you are talking about.
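
        To make the ordering distinction concrete, here is a minimal C++11 sketch (names and structure are illustrative only, not from the original discussion of any particular library) of the relaxed and acquire/release operations referred to above, next to the sequentially consistent increment that, as noted, is what atomic.Add compiles to (LOCK XADD on x86):

          #include <atomic>
          #include <cstdint>

          std::atomic<uint64_t> hits{0};      // statistics counter
          std::atomic<bool>     ready{false}; // publication flag
          uint64_t              payload = 0;  // plain data published via `ready`

          void count_event() {
              // Relaxed: an atomic increment with no ordering guarantees attached.
              // The seq-cst equivalent is hits.fetch_add(1), the LOCK XADD case noted above.
              hits.fetch_add(1, std::memory_order_relaxed);
          }

          void publish(uint64_t value) {
              payload = value;                              // plain store
              ready.store(true, std::memory_order_release); // release: payload is visible before the flag
          }

          bool try_consume(uint64_t& out) {
              if (!ready.load(std::memory_order_acquire))   // acquire: pairs with the release store
                  return false;
              out = payload;                                // guaranteed to observe the published value
              return true;
          }

        On x86 the relaxed increment still compiles to LOCK XADD, but the acquire load and release store compile to plain MOVs, which is what makes data structures built on them cheap; a sequentially consistent store needs an XCHG or MFENCE.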

    1. While I’ll fight to use higher-level languages as much as the next guy, I think the only way to achieve the low-latency apps people are looking for is to drop down to a language like C. It seems that the tougher a language is to write in, the faster it executes.

      1. I would strongly recommend you look at the work being done in the projects and blogs that I linked to. The JVM is quickly becoming the hot spot for these types of systems because it provides a strong memory model and garbage collection, which together enable lock-free programming that is nearly or completely impossible with a weak or undefined memory model and reference counting for memory management.

      2. Garbage collection for lock-free programming is a bit of a deus ex machina. MPMC and SPSC queues can both be built without needing GC, and there are plenty of other ways to do lock-free programming without garbage collection; reference counting is not the only alternative. Hazard pointers, RCU, proxy collectors, etc. all provide deferred reclamation, and since they are usually coded in support of a specific algorithm (rather than generically) they are usually much easier to build. Of course, the trade-off is that production-quality GCs have had a lot of work put into them and will help the less experienced programmer write lock-free algorithms (should they be doing this at all?) without coding up deferred-reclamation schemes. Some links on work done in this field: http://www.cs.toronto.edu/~tomhart/papers/tomhart_thesis.pdf and http://queue.acm.org/detail.cfm?id=2488549

        Yes, C/C++ only recently gained a formal memory model, but that doesn’t mean they were completely unsuitable for lock-free code earlier. GCC and other high-quality compilers offered compiler-specific directives for lock-free programming on supported platforms for a very long time – it just wasn’t standardized in the language – and Linux and other platforms have provided these primitives for some time too. Java’s unique position WAS that it provided a formalized memory model guaranteed to work on all supported platforms. Though in principle this is awesome, most server-side developers work on one platform (Linux or Windows), and they already had the tools to build lock-free code for it.

        GC is a great tool but not a necessary one. It has a cost both in performance and in complexity (all the tricks needed to avoid stop-the-world GC pauses). C++11/C11 already support proper memory models. Let’s not forget that JVMs have no obligation to keep supporting the Unsafe API in the future, and Unsafe code is “unsafe”, so you lose the benefits of Java’s safety features. Finally, IMO the Unsafe code used to lay out memory and simulate structs in Java looks a lot uglier than C/C++ structs, where the compiler does that work for you reliably. C and C++ also provide access to all the low-level, platform-specific power tools: the PAUSE instruction, SSE/AVX/NEON, etc. You can even tune your code layout through linker scripts! The power provided by the C/C++ tool chain is really unmatched by the JVM. Java is a great platform nonetheless, but I think its biggest advantage is that ordinary business logic (90% of your code?) can still rely on GC and the safety features while making use of highly tuned and tested libraries written with Unsafe. That is a great trade-off between squeezing out the last 5% of performance and staying productive – a trade-off that makes sense for a lot of people, but a trade-off nonetheless. Writing complicated application code in C/C++ is a nightmare after all.
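
        As a concrete illustration of the SPSC point: a bounded single-producer/single-consumer ring buffer needs nothing more than a preallocated array and C++11 acquire/release atomics. Nothing is allocated or freed after construction, so neither GC nor a deferred-reclamation scheme is involved. A minimal sketch (illustrative only; a production version would at least pad the two indices onto separate cache lines):

          #include <atomic>
          #include <cstddef>

          // Single-producer/single-consumer bounded queue.
          // Capacity must be a power of two; T should be default-constructible and cheap to copy.
          template <typename T, std::size_t Capacity>
          class SpscQueue {
              static_assert((Capacity & (Capacity - 1)) == 0, "Capacity must be a power of two");
          public:
              bool try_push(const T& item) {
                  const std::size_t head = head_.load(std::memory_order_relaxed); // producer-owned index
                  const std::size_t tail = tail_.load(std::memory_order_acquire); // observe consumer progress
                  if (head - tail == Capacity) return false;                      // full
                  buffer_[head & (Capacity - 1)] = item;
                  head_.store(head + 1, std::memory_order_release);               // publish the slot
                  return true;
              }

              bool try_pop(T& out) {
                  const std::size_t tail = tail_.load(std::memory_order_relaxed); // consumer-owned index
                  const std::size_t head = head_.load(std::memory_order_acquire); // observe producer progress
                  if (tail == head) return false;                                 // empty
                  out = buffer_[tail & (Capacity - 1)];
                  tail_.store(tail + 1, std::memory_order_release);               // release the slot for reuse
                  return true;
              }

          private:
              T buffer_[Capacity];               // fixed storage: no allocation, nothing to reclaim
              std::atomic<std::size_t> head_{0}; // written only by the producer
              std::atomic<std::size_t> tail_{0}; // written only by the consumer
          };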

  2. Missing the 12th: do not use garbage-collected languages. GC is a bottleneck in the worst case: it typically halts all threads, it is global, and it keeps the architect from managing one of the most critical resources (CPU-near memory) himself.

    1. Actually, a lot of this work comes directly from Java. To do lock-free programming right you need a clear memory model, which C++ only recently gained. If you know how to work with the GC rather than against it, you can often create low-latency systems with much more ease.

      1. I have to agree with Ben here. There has been a lot of progress in GC parallelism in the last decade or so, with the G1 collector being the latest incarnation. It may take a little time to tune the heap and the various knobs to get the GC to collect with almost no pause, but that pales in comparison to the developer time it takes to do without GC.
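
        As a purely illustrative starting point for that kind of tuning (heap size, pause target and the application jar below are placeholders, not recommendations):

          java -Xms8g -Xmx8g -XX:+UseG1GC -XX:MaxGCPauseMillis=10 -verbose:gc -jar app.jar

        From there it is mostly a matter of checking the GC log against your latency budget and adjusting.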

      2. You can even go one step further and create systems that produce so little garbage that you can easily push your GC outside of your operating window. This is how all of the high-frequency trading shops do it when running on the JVM.

    2. > Do not use garbage-collected languages

      Or, at least, “traditional” garbage-collected languages, because they are different: while Erlang also has a collector, it doesn’t create bottlenecks, because it doesn’t “stop the world” the way Java does while collecting garbage; instead it pauses individual small “micro-threads” on a microsecond scale, so it isn’t noticeable in the large.

      1. Rewrite that to “traditional” garbage collection algorithms. At LMAX we use Azul Zing, and just by using a different JVM with a different approach to garbage collection, we’ve seen huge improvements in performance, because both major and minor GCs are orders of magnitude cheaper.

        There are other costs which offset that, of course: you use a hell of a lot more heap, and Zing isn’t cheap.

  3. Pingback: Quora
  4. Reviving an old thread, but (amazingly) this has to be pointed out:

    1) Higher-level languages (e.g. Java) don’t elicit functionality from the hardware that isn’t available to lower-level languages (e.g. C); to state that something is “completely impossible” in C while readily doable in Java is complete rubbish unless you acknowledge that Java runs on virtual hardware, where the JVM has to synthesize functionality required by Java but not provided by the physical hardware. If a JVM (e.g. one written in C) can synthesize functionality X, then so can a C programmer.

    2) “Lock free” isn’t what people think it is, except almost by coincidence in certain circumstances such as single-core x86; multicore x86 cannot run lock-free without memory barriers, which have complexity and cost similar to regular locking. As per 1) above, if lock-free code works in a given environment, it is because it is supported by the hardware or emulated/synthesized by software in a virtual environment.

    — Julius
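
    The barrier point above can be made concrete with the classic store-buffering litmus test, sketched below in C++11 (illustrative only): each thread stores to one flag and then loads the other; with relaxed ordering (plain MOVs on x86) both threads may read 0, because each store can still be sitting in its core’s store buffer when the other core loads.

      #include <atomic>
      #include <cstdio>
      #include <thread>

      std::atomic<int> x{0}, y{0};
      int r1 = 0, r2 = 0;

      void thread_a() {
          x.store(1, std::memory_order_relaxed);  // plain MOV on x86: no fence emitted
          r1 = y.load(std::memory_order_relaxed); // may still read 0
      }

      void thread_b() {
          y.store(1, std::memory_order_relaxed);
          r2 = x.load(std::memory_order_relaxed);
      }

      int main() {
          std::thread a(thread_a), b(thread_b);
          a.join();
          b.join();
          // r1 == 0 && r2 == 0 is a legal outcome here. Making all four accesses
          // memory_order_seq_cst (the default) forbids it, at the cost of a full
          // barrier (MFENCE or a LOCK-prefixed instruction) on x86, which is the
          // locking-like cost referred to above.
          std::printf("r1=%d r2=%d\n", r1, r2);
      }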

    1. Great points, Julius. The point I was trying to make (maybe unsuccessfully) is that it’s prohibitively difficult to apply many of these patterns in C, since they rely on GC. It goes beyond simply using memory barriers: you have to consider freeing memory as well, which gets particularly difficult when you are dealing with lock-free and wait-free algorithms. This is where GC adds a huge win. That said, I hear Rust has some very interesting ideas around memory ownership that might begin to address some of these issues.
