CERN Accelerating science

Platform

In collaboration with Intel, the CERN openlab Platform Competence Centre (PCC) continues to address crucial fields such as thermal optimisation, multi-core scalability, application tuning and benchmarking. A strong emphasis on teaching and knowledge dissemination allows a broader audience to benefit from the PCC’s work. The last twelve months were another year of such intensive study and development, with tangible results.

Optimisation and benchmarks

A solid frame of reference is indispensable when evaluating computing systems in terms of their performance, thermal and energy consumption characteristics. Such studies enable significant optimisations and are essential when making computing choices at CERN. The PCC therefore maintains and continuously enhances its own set of benchmarks. These are sourced from various domains of High Energy Physics (HEP) as well as from industry-standard suites.

HEPSPEC06, the subset of C++ benchmarks from the SPEC CPU2006 suite, still serves as a fundamental measurement tool. Its resource consumption pattern is representative of typical HEP jobs, so it yields comparable baseline numbers for whole families of systems evaluated by the PCC.

Other benchmarks are extracted with significant effort from large real-life software frameworks with millions of lines of code. Examples include the multi-threaded Geant4 particle simulation prototype with the “FullCMS” detector geometry, the parallel maximum likelihood fitter based on the ROOT analysis package, and the ALICE/CBM particle detector track fitter and track finder. The PCC team customises and develops these handpicked portions so that they can be used directly for throughput, scalability and performance studies of hardware. Extracting a benchmark can go as far as cutting out just a few lines or a few loops with surgical precision. Such “snippets” can later be easily recompiled, distributed and used to demonstrate a specific point, for instance regarding compiler performance or correctness.
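
To illustrate the “snippet” idea, the sketch below wraps a small extracted loop in a timing harness. It repeats the run, reports the best and median times, and returns the computed value so that correctness can be checked alongside speed. The kernel `snippet_kernel` and the harness `benchmark` are hypothetical stand-ins chosen for illustration, not PCC code. The PCC’s real snippets are cut from C++ frameworks and compiled natively.

```python
import time
from statistics import median


def snippet_kernel(n: int) -> float:
    """Hypothetical extracted loop: a simple reduction standing in for
    'a few lines or a few loops' cut out of a large framework."""
    acc = 0.0
    for i in range(n):
        x = i * 0.5
        acc += x * x
    return acc


def benchmark(kernel, arg, repeats: int = 5):
    """Run kernel(arg) `repeats` times; return (best_s, median_s, result).

    The best time is the least noisy estimate of raw speed. The median
    shows run-to-run variation. Returning the result lets the caller check
    correctness, e.g. when comparing compilers or platforms."""
    timings = []
    result = None
    for _ in range(repeats):
        start = time.perf_counter()
        result = kernel(arg)
        timings.append(time.perf_counter() - start)
    return min(timings), median(timings), result


if __name__ == "__main__":
    best, med, value = benchmark(snippet_kernel, 200_000)
    print(f"best {best * 1e3:.2f} ms, median {med * 1e3:.2f} ms, result {value:.6e}")
```

Keeping the kernel free of framework dependencies is what makes such a snippet easy to recompile and hand to a compiler team.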

Myriads of systems

All of the PCC’s benchmarks are run on a wide variety of systems, which are often still at the alpha or beta stage when assessed. All this hardware needs to be deeply understood, along with its wide spectrum of configuration options. This process is made easier by collaboration with Intel engineers, such as Mark Myers, a server platform architect who visited CERN openlab in 2010. The feedback and the numerous reports produced by the PCC cover a vast number of indicators. These range from performance and thermal properties to specific comments about the microarchitecture and its relation to the software that runs on it.

Tested systems are split into several distinct families. The core of these evaluation activities is the dual-socket x86 segment, represented by the Intel Xeon® “EP” family. These systems have long been the main workhorses of the CERN Computing Centre. They are often chosen for their performance-to-cost ratio and predictable performance.

As the LHC enters a territory of potential discoveries, a correct interpretation of the data collected by the experiments is paramount. The degree of belief needed to claim a discovery rests on the probability and statistical concepts deployed in the data analysis. The HEP community therefore makes a common effort to develop software tools for data analysis. CERN openlab team members (Yngve Sneen Lindal and Alfio Lazzaro, pictured) contribute to this activity, in particular by developing efficient code for new computing platforms.

The four-socket segment, with Intel Xeon “EX” processors, is also of importance. These systems allow primarily for advanced scalability studies, with up to 80 hardware threads per tested system, as in the case of the Intel Xeon E7-4870 “Westmere-EX” based machines. The behaviour and characteristics of these many-core systems are closely monitored by the PCC and the Database Competence Centre (DCC), as they are good candidates for database deployment and also indicative of possible future developments in the dual-socket space.
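
The figure of 80 hardware threads follows from the topology of the tested machines. The Xeon E7-4870 has ten cores per socket, and each core runs two hardware threads when Hyper-Threading is enabled (assumed here to be switched on):

```latex
N_{\text{threads}}
  = \underbrace{4}_{\text{sockets}}
  \times \underbrace{10}_{\text{cores/socket}}
  \times \underbrace{2}_{\text{threads/core}}
  = 80
```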

A third group of systems covers additional, non-routine evaluations. One example is the desktop segment, which often makes a specific microarchitecture available before the dual-socket parts are launched. In this context, an Intel Core™ i7-2600K processor based on the “Sandy Bridge” microarchitecture was evaluated. It delivered an average 10% clock-for-clock improvement over its “Westmere”-based predecessors and a 9-19% improvement in performance per watt.

Some existing “Montecito” Itanium servers are still in use by the PCC Compiler Project. Active since 2003, the project evaluates compiler performance on both the Intel64 and IA64 architectures. This year, the AMS (Alpha Magnetic Spectrometer) experiment, which searches for antimatter and dark matter from the International Space Station, requested and was granted access to these Itanium servers for some of its software.
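
The two comparisons quoted above normalise a benchmark score in different ways. “Clock for clock” divides by core frequency, while “performance per watt” divides by measured power. The report does not spell out its exact method, so the helper functions below are a sketch of the usual definitions and are not PCC code. The score could be, for example, a HEPSPEC06 result, and the power figure is assumed to be measured for the whole system.

```python
def clock_for_clock_gain(score_new: float, ghz_new: float,
                         score_old: float, ghz_old: float) -> float:
    """Relative gain in score per GHz (frequency-normalised performance).

    A return value of 0.10 means a 10% improvement at equal clock speed."""
    return (score_new / ghz_new) / (score_old / ghz_old) - 1.0


def perf_per_watt_gain(score_new: float, watts_new: float,
                       score_old: float, watts_old: float) -> float:
    """Relative gain in score per watt of measured power."""
    return (score_new / watts_new) / (score_old / watts_old) - 1.0
```

Separating the two ratios makes clear that a processor can improve per clock without improving equally per watt, and vice versa.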

The PCC receives early versions of several of Intel’s products. Installation, configuration and testing of the new hardware represent a significant part of the work of Julien Leduc (left). As an early adopter, the PCC team proactively sends feedback about the hardware: processors, firmware, platform architectures, power consumption, and peripherals such as SSDs. These evaluations aim to improve the efficiency of CERN’s Tier-0 and of the Worldwide LHC Computing Grid (WLCG).

The CERN Computer Centre facilities are severely limited in terms of both electrical input and cooling output, which makes intelligent power optimisation essential. The continued quest for better performance-to-power ratios therefore prompts the investigation of some non-standard avenues. For example, the PCC notably supported the idea of power-efficient microservers years ago, and this idea is now being tried in an ongoing pilot project with Intel hardware. The PCC’s interest in the low-power Intel Atom architecture has been revived by new parts manufactured on smaller process nodes. The team is eagerly awaiting new Intel offerings that are expected to show further advantages over their predecessors.

Finally, the emergence of the Intel® Many Integrated Core (MIC) architecture and development system has further fuelled the PCC’s keen interest in future technologies and the development of the x86 architecture. This 32-core part has wide 512-bit vectors and four hardware threads per core. It is not just an enticing development vehicle but also an interesting indication of the direction in which the x86 microarchitecture is headed. The PCC maintains a high level of involvement in this project. It was one of the first Intel collaborative teams worldwide to work with the part, and it provided an on-stage testimonial during the official launch of the MIC technology. This activity, like all of the above, directly helps to prepare CERN software for future hardware and gives software developers at CERN a privileged peek into the future of computing.
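
A back-of-the-envelope count shows how much parallelism this part exposes. It uses the figures given above and assumes 32-bit single-precision and 64-bit double-precision IEEE 754 values per vector lane:

```latex
N_{\text{threads}} = 32~\text{cores} \times 4~\text{threads/core} = 128
\qquad
n_{\text{lanes}}^{\text{SP}} = \frac{512~\text{bits}}{32~\text{bits}} = 16,
\quad
n_{\text{lanes}}^{\text{DP}} = \frac{512~\text{bits}}{64~\text{bits}} = 8
```

Software therefore has to be both heavily multi-threaded and well vectorised to use such a chip efficiently.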

Given its broad experience and close relationship with Intel, the PCC was amongst the first external teams worldwide to receive a “Knights Ferry” (Many Integrated Core) accelerator for testing and evaluation. This allowed Sverre Jarp (right) to comment on early experience with the chip when it was announced by Kirk Skaugen (left), Vice President and General Manager of the Intel Data Center Group. The announcement was made at the International Supercomputing Conference in Hamburg in May 2010.

Support software

Advanced performance and scalability studies would not be possible without the proper support software. Intel’s software tools have long been amongst the market leaders in their segments, regularly bringing additional functionality, productivity and performance. The PCC’s long involvement with these tools has led to their general availability at CERN and to an increased, shared interest in their use. The state-of-the-art XE 2011 suite includes, amongst other components, the Intel 12.0 compiler, a next-generation performance tuning tool called VTune Amplifier, and a next-generation correctness and memory checking tool called Inspector. All of these have been exhaustively evaluated by the PCC. A senior CERN developer said of VTune Amplifier: “this is the first tool that actually works out of the box with our software”. This was encouraging feedback for the PCC team, which had been actively participating in testing the product.

2010 was a very satisfying year for Intel compiler studies. Over 50 compiler incidents were filed, many of them concerning the new 12.0 compiler, which was closely examined before release. The new release was made available centrally. It is now being adopted by developers across CERN who wish to use it to optimise the performance of their software.

Performance monitoring is one of the essential areas of the PCC and one of the key working domains of Andrzej Nowak (pictured teaching). CERN openlab co-developed and improved pfmon, a performance monitoring tool for Linux. These efforts evolved into a closer relationship with the developers of Intel’s performance tuning tools, such as PTU and VTune Amplifier XE, including extensive pre-release evaluations. Direct access to Intel experts allows openlab to use and teach these new technologies efficiently as soon as they are available.

Workshops, teaching and dissemination

The combined activities of the PCC regularly produce a large amount of expertise, which is broadly shared with the scientific community. In this spirit, the PCC continues to offer training sessions for both intermediate and advanced programmers. Regular courses cover parallelism (twice a year) and computer performance and architecture (also twice a year). As the slide sets evolve with the technology they describe, some attendees even choose to take the training again. A participant satisfaction rate of over 90% helps to maintain a steady stream of registrations. Since their inception, over 200 students have attended these workshops.

In addition, visits by Intel engineers allowed the PCC to organise special seminars and “expert to expert” training sessions, attended by CERN’s most senior programmers. These engineers work on the tools mentioned in the previous section as well as on other technologies. Amongst these Intel visitors were David Levinthal, a world-renowned x86 performance expert, who taught low-level performance optimisation. Levent Akyil, an experienced engineer driving the software tools effort, demonstrated the new products. Jeff Arnold, a Senior Software Engineer in the Intel compiler team, continues to teach at regular workshops and gave two IT seminars, one on floating point and one on the Intel C++ compiler.

Furthermore, the PCC participates actively in external conferences and symposia. PCC representatives were present at both ACAT 2010 and CHEP 2010, with four papers submitted to and accepted at the latter alone. The Intel Developer Forum 2010 in San Francisco in September was a very effective arena for mutual learning. Teaching sessions were once again held at international computing schools, such as the CERN School of Computing, which has been organised regularly for over 40 years.

The team has recently started to evaluate Intel® Data Center Manager (DCM) on compatible servers in the CERN Computer Centre. Intel DCM is a web service aimed at easing and optimising data centre management by monitoring two crucial metrics: power consumption and inlet temperature. It aggregates these metrics and acts on the nodes to limit their individual power consumption. This makes it possible to maximise node density in the data centre while preserving nominal conditions for all servers.
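
The monitor-aggregate-cap cycle described above can be sketched as a simple policy. This is not DCM’s algorithm or API, which the text does not detail. The function `compute_caps` and the `NodeReading` record are hypothetical. The proportional scaling rule, the fixed derating factor for nodes with high inlet temperature and the minimum-cap floor are likewise illustrative assumptions.

```python
from dataclasses import dataclass


@dataclass
class NodeReading:
    """One monitoring sample for a server (hypothetical structure)."""
    name: str
    power_w: float       # measured power draw, watts
    inlet_temp_c: float  # measured inlet air temperature, deg C


def compute_caps(readings, budget_w, max_inlet_c, hot_derate=0.9, min_cap_w=100.0):
    """Return {node_name: power_cap_w} for the nodes that must be limited.

    Hypothetical policy, not Intel DCM's actual algorithm:
      1. Aggregate the power drawn by all nodes.
      2. If the total exceeds the budget, give each node a share of the
         budget proportional to its current draw.
      3. Derate nodes whose inlet temperature exceeds the limit.
      4. Never cap below min_cap_w. The floor means the total can still
         exceed the budget if it is too high for the number of nodes.
    """
    total = sum(r.power_w for r in readings)
    caps = {}
    for r in readings:
        cap = r.power_w
        if total > budget_w:
            cap = r.power_w * budget_w / total  # proportional share of budget
        if r.inlet_temp_c > max_inlet_c:
            cap *= hot_derate                    # extra margin for hot nodes
        cap = max(cap, min_cap_w)
        if cap < r.power_w:                      # report only real limits
            caps[r.name] = cap
    return caps


if __name__ == "__main__":
    rack = [
        NodeReading("node-a", 300.0, 22.0),
        NodeReading("node-b", 200.0, 29.0),
    ]
    print(compute_caps(rack, budget_w=400.0, max_inlet_c=27.0))
```

Capping each node individually, rather than switching nodes off, is what allows more servers to share a fixed power and cooling envelope.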
