Scaling

Wednesday 13:20 - 14:05 PDT
Concurrency Kit: Towards accessible non-blocking technology for C
Despite more than 20 years of active research and development, non-blocking technologies remain inaccessible to many students, engineers and open-source projects. This is especially true in the context of an unmanaged language such as C, despite its popularity in highly complex concurrent systems. Even in light of attractive performance properties, small to medium-sized corporations are extremely hesitant to adopt patent-free technology because of the technology lock-down associated with the various interfaces of existing concurrency libraries. To top it off, many engineers who are introduced to this area are overwhelmed by the literature and the sparsity of performance data. This topic will walk the audience through the struggles Samy and his peers have faced in the last couple of years in developing sufficient working knowledge to (efficiently) leverage existing non-blocking data structures, as well as to design, implement and verify new algorithms for use by mission-critical systems. It will highlight the holes found in existing open-source projects tackling the concurrency problem for the C programming language, and in the literature associated with much of the existing technology. The culmination of these frustrations led to the development of Concurrency Kit, a library designed to aid in the design and implementation of high-performance concurrent systems. It is designed to minimize dependencies on operating system-specific interfaces, and most of the interface relies only on a strict subset of the standard library and the more popular compiler extensions. Topic Lead: Samy Bahra <email address hidden> Samy is an engineer focused on developing a leading, real-time, low-latency online advertising platform. Before moving to New York, Samy played a crucial role on the engineering team behind the leading high-performance messaging platform. 
Prior to that, he was an active member of a high-performance computing laboratory and was primarily involved with Unified Parallel C and the performance modeling and analysis of shared-memory multi-processor systems. He has been involved with several open-source projects.

Participants:
attending mathieu-desnoyers (Mathieu Desnoyers)
attending paulmck (Paul McKenney)

Microconfs:
  • Scaling
Nautilus 5
Friday 10:05 - 10:50 PDT
ASMP: Improving performance through dedication of OS tasks to specific processors
These days processors increase performance by adding cores rather than by raising clock speed, so algorithms in general need to be able to work in a distributed way. In the kernel we have tried to move to more fine-grained locking in order to increase performance. However, with that approach locking overhead grows when highly concurrent processing occurs in the kernel. Synchronization becomes expensive. This session investigates how performance is affected if we do the opposite: use coarse-grained locking to perform large chunks of work on a single core, which reduces locking overhead and leaves the processor caches fully available for a significant piece of work. Topic Lead: Christoph Lameter <email address hidden> Christoph has been contributing to various core kernel subsystems over the years and created much of the NUMA infrastructure in the Linux kernel when he worked as a Principal Engineer for Silicon Graphics on adapting Linux for use in supercomputers. Scaling Linux is a focus of his work, both in terms of performance for HPC (High Performance Computing) and low latency for HFT (High Frequency Trading). Christoph maintains the slab allocators and the per-CPU subsystem in the Linux kernel and currently works as an architect for a leading HFT company.

Participants:
attending apm (Antti P Miettinen)
attending mathieu-desnoyers (Mathieu Desnoyers)
attending paulmck (Paul McKenney)

Microconfs:
  • Scaling
Nautilus 3
Friday 11:55 - 12:40 PDT
An RCU-protected scalable trie for user-mode use
In the past year, the RCU lock-free hash table has been polished and made production-ready within the Userspace RCU project. It performs and scales really well for updates, key lookups and traversals in no particular key order, but does not fulfill ordered key traversal use-cases. This talk presents ongoing work on an ordered data structure that supports RCU reads: a cache-efficient, compact, fast, and scalable trie, inspired by Judy arrays. Topic Lead: Mathieu Desnoyers <email address hidden> Mathieu Desnoyers' main contributions are in the area of tracing (monitoring/performance analysis/debugging) and scalability, both at the kernel and user-space levels. He is the maintainer of the LTTng project and the Userspace RCU library. He works in close collaboration with the telecommunication industry, many Linux distributions, and customers developing hardware ranging from small embedded devices to large-deployment servers. He is CEO and Senior Software Architect at EfficiOS.

Participants:
attending mathieu-desnoyers (Mathieu Desnoyers)
attending paulmck (Paul McKenney)

Microconfs:
  • Scaling
Nautilus 5