What is a supercomputer?
Although supercomputers are unique, custom-built machines, they fundamentally share the design of the computers you use at home: a processor (i.e., a central processing unit or CPU), small and fast memory (i.e., random-access memory or RAM), storage (e.g., a hard disk drive, CD, or DVD), and a network to communicate with other computers. A typical high-performance computing (HPC) system could be considered a personal computer on a much grander scale, with tens of thousands of processors, terabytes (i.e., trillions of bytes) of memory, and petabytes (i.e., quadrillions of bytes) of storage (see figure 1). High-performance computers can readily fill a large room, if not a whole building, have customized cooling infrastructure, use enough electricity to power a small town, and take an act of Congress to purchase. Such an investment is not made without a great deal of study and thought.
FIGURE 1. A high-performance computer is like a personal computer on a much grander scale—it has tens of thousands of processors, terabytes of memory, and petabytes of storage.
Simulating a supercomputer
Although HPC technology is not unique to NSA, the specialized problems faced by the Agency can necessitate unique customizations. Because NSA's applications and software are often classified, they cannot be shared with the architects and engineers developing supercomputers. At the same time, an investment of this magnitude requires confidence that a proposed system will offer the performance sought.
Currently, benchmarks (simplified, unclassified software that exercises important attributes of a computer system) are developed and used to evaluate the performance of candidate computing system hardware. However, these benchmarks may not paint the complete picture. To better understand this problem, there is substantial value in constructing a model. Architects, engineers, and scientists have a long history of building models to study complex objects, such as buildings, bridges, and aircraft.
A new team, the Modeling, Simulation, and Emulation (MSE) team, has been assembled within the Laboratory for Physical Sciences' Advanced Computing Systems Research Program to address this gap between classified software, which cannot be distributed to vendors, and the vendors' hardware systems, which have not been purchased by NSA. As an additional twist, the proposed hardware may be built from prototype components such as the hybrid memory cube (HMC; see figure 2), a three-dimensional stacked memory device designed by a consortium of industry leaders and researchers. The core objectives of the MSE team include exploration of system architectures, analysis of emerging technologies, and analysis of optimization techniques.
FIGURE 2. NSA collaborated with the University of Maryland and Micron to develop a simulation tool for Micron's Hybrid Memory Cube that is helping to advance supercomputing applications. Micron is now sampling the three-dimensional package that combines logic and memory functions onto a single chip.
Owners of HPC systems desire a computer that is infinitely fast, has infinite memory, takes up no space, and requires no energy. None of these attributes are truly realizable, and when considering a practical HPC system, trade-offs must be considered. When analyzing a prospective HPC system, four primary metrics are customarily considered: financial cost, system resilience, time-to-solution, and energy efficiency. These metrics are interdependent. For example, increasing the speed of an HPC system will increase the amount of power it consumes and ultimately increase the cost necessary to operate it. In order to measure these metrics, one could build the system and test it. However, this would be extremely expensive and difficult to optimize. A model simulating the computer can be developed in far less time, and design parameters can be adjusted in software to achieve the desired balance of power, performance, reliability, and cost.
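The interdependence of these metrics can be illustrated with a deliberately simple sketch. The model below is not drawn from any NSA or vendor tool: the function names, the cubic power scaling, and every constant are assumptions chosen only to show how adjusting one design parameter in software moves time-to-solution, power, and operating cost at the same time.

```python
# Toy model only: the formulas and constants are illustrative assumptions,
# not measurements of any real system.

def evaluate_design(clock_ghz, work_gcycles=3600.0, static_watts=50.0,
                    dynamic_watts_per_ghz3=10.0, dollars_per_kwh=0.10):
    """Return time, power, energy, and energy cost for one design point."""
    seconds = work_gcycles / clock_ghz  # a faster clock shortens the run
    # Assumed: dynamic power grows with the cube of the clock frequency.
    watts = static_watts + dynamic_watts_per_ghz3 * clock_ghz ** 3
    kwh = watts * seconds / 3.6e6       # joules to kilowatt-hours
    return {
        "clock_ghz": clock_ghz,
        "seconds": seconds,
        "watts": watts,
        "kwh": kwh,
        "dollars": kwh * dollars_per_kwh,
    }


def sweep(clocks_ghz):
    """Evaluate every candidate clock frequency with the same workload."""
    return [evaluate_design(clock) for clock in clocks_ghz]


if __name__ == "__main__":
    for point in sweep([1.0, 2.0, 3.0, 4.0]):
        print(point)
```

In this toy model, each step up in clock frequency shortens the run but raises the power draw, which is the kind of tension a real system model has to quantify with far more detailed inputs.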
Any simulation or model of a computer should address the metrics listed above. If any are not addressed, then the model could yield incomplete results because optimizing for fewer than all relevant variables potentially leads to non-global extrema. Many scalar benchmarks, for example the TOP500 and the Graph500, focus exclusively on one characteristic, like time-to-solution, to the neglect of the other parameters of interest. The MSE team is collaborating to evangelize a more balanced approach to multiple facets of HPC system characterization, assuring an optimal solution to the Agency's needs.
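One way to see why a single-metric ranking can mislead is to compare it with a multi-metric view. The sketch below is an assumption-laden illustration, not the MSE team's method: the four candidate systems and all of their numbers are invented, and a Pareto filter is used here simply as one common technique for keeping every design that is not beaten on all metrics at once.

```python
# Hypothetical candidate systems; every number is made up for illustration.
CANDIDATES = [
    {"name": "A", "hours": 6.0, "megawatts": 5.0, "cost_musd": 90.0, "mtbf_hours": 40.0},
    {"name": "B", "hours": 8.0, "megawatts": 3.0, "cost_musd": 60.0, "mtbf_hours": 80.0},
    {"name": "C", "hours": 9.0, "megawatts": 3.5, "cost_musd": 70.0, "mtbf_hours": 70.0},
    {"name": "D", "hours": 12.0, "megawatts": 2.0, "cost_musd": 40.0, "mtbf_hours": 90.0},
]

MINIMIZE = ("hours", "megawatts", "cost_musd")  # time, power, financial cost
MAXIMIZE = ("mtbf_hours",)                      # stand-in for resilience


def dominates(a, b):
    """True if a is no worse than b on every metric and better on at least one."""
    no_worse = (all(a[m] <= b[m] for m in MINIMIZE)
                and all(a[m] >= b[m] for m in MAXIMIZE))
    better = (any(a[m] < b[m] for m in MINIMIZE)
              or any(a[m] > b[m] for m in MAXIMIZE))
    return no_worse and better


def pareto_front(designs):
    """Keep the designs that no other design dominates."""
    return [d for d in designs if not any(dominates(o, d) for o in designs)]


def fastest(designs):
    """Single-metric view: rank by time-to-solution only."""
    return min(designs, key=lambda d: d["hours"])


if __name__ == "__main__":
    print("fastest only:", fastest(CANDIDATES)["name"])
    print("non-dominated:", [d["name"] for d in pareto_front(CANDIDATES)])
```

Ranking by time-to-solution alone selects a single design, while the multi-metric filter keeps several candidates that trade speed against power, cost, and resilience. Choosing among those still requires the purchaser's own priorities.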
The use of benchmarking software allows for a more comprehensive evaluation of a proposed computer architecture's performance. This enables HPC system architects to better target their designs to serve NSA's needs. Simply stated, NSA has to work within budgetary and power (i.e., electricity) constraints, and it is vital to maximize the return on investment of money and time.
While this description is somewhat generic to all HPC system purchasers, NSA is willing to build special-purpose hardware devices and to employ specially developed programming languages if a cost-benefit analysis demonstrates noteworthy benefits. Unlike developers in the scientific community, whose expertise usually does not span science, computer programming, and computer architecture, developers at NSA access and understand the full software and hardware stack: algorithm, source code, processors, memory, network topology, and system architecture. Compute efficiency is often lost in the process of separating these abstraction layers; as a result, NSA makes an effort to comprehend the full solution.
This approach to mission work is reflected in the work of the MSE team. A simulation or model should take a holistic approach, targeting the network, CPU, memory hierarchy, and accelerators (e.g., a graphics processing unit or field-programmable gate array). Multiple levels of detail for a simulation are required to accomplish this. A simulation may be accurate to the compute cycle or only functionally accurate; it may range from an abstract model to a hardware simulation including device physics.
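The difference between levels of detail can be sketched with two toy memory models. Both classes below are hypothetical and greatly simplified: the functional model only returns correct values, while the timed model adds an assumed direct-mapped cache with made-up hit and miss latencies as a stand-in for a cycle-level model.

```python
class FunctionalMemory:
    """Functional model: returns correct values and knows nothing about time."""

    def __init__(self, data):
        self.data = dict(data)

    def load(self, addr):
        return self.data.get(addr, 0)


class TimedMemory(FunctionalMemory):
    """Same results, plus an assumed timing model (tiny direct-mapped cache)."""

    def __init__(self, data, lines=4, hit_cycles=1, miss_cycles=100):
        super().__init__(data)
        self.lines = lines
        self.tags = [None] * lines  # which address each cache line holds
        self.hit_cycles = hit_cycles
        self.miss_cycles = miss_cycles
        self.cycles = 0             # total simulated cycles spent on loads

    def load(self, addr):
        index = addr % self.lines
        if self.tags[index] == addr:
            self.cycles += self.hit_cycles
        else:
            self.cycles += self.miss_cycles
            self.tags[index] = addr  # fill the line on a miss
        return super().load(addr)


def run_sum(memory, addresses):
    """A trivial 'application': sum the values at the given addresses."""
    return sum(memory.load(a) for a in addresses)


if __name__ == "__main__":
    data = {i: i * 10 for i in range(8)}
    trace = [0, 1, 0, 1, 4, 0]
    timed = TimedMemory(data)
    print(run_sum(FunctionalMemory(data), trace), run_sum(timed, trace), timed.cycles)
```

Both models compute the same answer for the same access trace; only the timed model can say anything about performance, and it does so at the price of extra state and extra simulation work.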
To accomplish the objective of enabling HPC system simulation within NSA, the MSE group carried out a survey of existing simulators from academia, industry, and national labs. Although many simulators exist for HPC systems, few attempt to model a complete architecture. There have been previous efforts, such as POEMS from the University of California, Los Angeles, and COTSon from Hewlett Packard, but these projects are no longer actively supported. Two simulation frameworks, the Structural Simulation Toolkit (SST; see figure 3) from Sandia National Laboratories and Manifold from Georgia Institute of Technology, represent today's most promising candidates. Additionally, NSA researchers have been crafting simulation tools that are also being considered for application in the HPC problem space.
FIGURE 3. Sandia National Laboratories' Structural Simulation Toolkit is one of today's most promising high-performance computing system simulators.
Both SST and Manifold use component simulators to construct a larger-scale system. For example, SST can use the gem5 CPU simulator along with the University of Maryland's DRAMSim2 to characterize processor-to-memory latency and bandwidth. Since simulating a full-scale HPC system would require an even larger supercomputer to run in a reasonable time, SST breaks the simulation into two components: SST/micro, a node-level simulation (e.g., CPU, memory), and SST/macro, which handles network communication between nodes. With the emerging HMC memory technology, the MSE team is making plans to employ the SST family of tools to extensively model an HPC system and gain perspective on its potential capabilities. This will place NSA's HPC programs on the leading edge in understanding the application potential for this new technology.
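The idea of assembling a system from component simulators can be sketched with a minimal discrete-event kernel. This is not the SST or Manifold API: the `Simulator`, `Memory`, and `Cpu` classes, the link latency, and the memory access time are all hypothetical, chosen only to show a processor model and a memory model exchanging timed events to measure round-trip latency.

```python
import heapq
import itertools


class Simulator:
    """Minimal discrete-event kernel: a time-ordered queue of callbacks."""

    def __init__(self):
        self.now = 0
        self._queue = []
        self._ids = itertools.count()  # tie-breaker so callbacks are never compared

    def schedule(self, delay, callback, *args):
        heapq.heappush(self._queue, (self.now + delay, next(self._ids), callback, args))

    def run(self):
        while self._queue:
            self.now, _, callback, args = heapq.heappop(self._queue)
            callback(*args)


class Memory:
    """Memory component with an assumed fixed access time."""

    def __init__(self, sim, access_ns):
        self.sim = sim
        self.access_ns = access_ns

    def request(self, addr, reply):
        self.sim.schedule(self.access_ns, reply, addr)


class Cpu:
    """CPU component that issues one load at a time and records its latency."""

    def __init__(self, sim, memory, link_ns, addresses):
        self.sim = sim
        self.memory = memory
        self.link_ns = link_ns
        self.pending = list(addresses)
        self.latencies = []

    def start(self):
        self._issue()

    def _issue(self):
        if not self.pending:
            return
        addr = self.pending.pop(0)
        issued_at = self.sim.now

        def reply(_addr):
            # The response crosses the link back to the CPU.
            self.sim.schedule(self.link_ns, self._done, issued_at)

        # The request crosses the link, then the memory services it.
        self.sim.schedule(self.link_ns, self.memory.request, addr, reply)

    def _done(self, issued_at):
        self.latencies.append(self.sim.now - issued_at)
        self._issue()


if __name__ == "__main__":
    sim = Simulator()
    cpu = Cpu(sim, Memory(sim, access_ns=50), link_ns=5, addresses=[0, 64, 128])
    cpu.start()
    sim.run()
    print(cpu.latencies, sim.now)
```

Because the components interact only through scheduled events, either model could in principle be swapped for a more detailed one without changing the other, which is the property that makes component-based frameworks attractive.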
At this time, SST/micro is capable of simulating the execution of programs in a single processor core and of monitoring the application's use of the simulated CPU and memory. By 2014, the development team at Sandia plans on parallelizing the simulation, enabling multiple processor cores to be simultaneously simulated. This would allow parallel applications (i.e., software designed to simultaneously run on multiple processor cores) to be run in a realistic compute node configuration (i.e., multiple cores concurrently accessing the same memory hierarchy) while potentially reducing the time needed to complete a simulation.
SST/macro, combined with NSA's benchmarking software, has already been used to demonstrate how different network topologies, used to connect an HPC system's processing cores, can affect the time-to-solution metric. SST/macro allowed researchers to specify data-routing algorithms used in the network configuration and to study how a modified network topology serves to optimize the performance of a system. The clear benefit of this research is in the ability to enable application and network codesign to create an optimal and cost-effective architecture.
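A rough feel for why topology matters can be had without a full simulator. The sketch below does not use SST/macro. It assumes two textbook topologies (a ring and a two-dimensional torus) and uses hop count between nodes as a crude proxy for communication time, ignoring routing algorithms, bandwidth, and contention, all of which a real study would model.

```python
from collections import deque


def ring(n):
    """Each node links to its two neighbors on a ring of n nodes."""
    return {i: [(i - 1) % n, (i + 1) % n] for i in range(n)}


def torus_2d(rows, cols):
    """Each node links to four neighbors on a wrap-around 2-D grid."""
    adj = {}
    for r in range(rows):
        for c in range(cols):
            adj[r * cols + c] = [
                ((r - 1) % rows) * cols + c,
                ((r + 1) % rows) * cols + c,
                r * cols + (c - 1) % cols,
                r * cols + (c + 1) % cols,
            ]
    return adj


def hops_from(adj, src):
    """Breadth-first search: minimum hop count from src to every node."""
    dist = {src: 0}
    queue = deque([src])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def average_hops(adj):
    """Mean hop count over all ordered pairs of distinct nodes."""
    total = 0
    pairs = 0
    for src in adj:
        dist = hops_from(adj, src)
        total += sum(dist.values())
        pairs += len(dist) - 1
    return total / pairs


def diameter(adj):
    """Worst-case hop count between any two nodes."""
    return max(max(hops_from(adj, src).values()) for src in adj)


if __name__ == "__main__":
    for name, topo in (("ring", ring(16)), ("torus", torus_2d(4, 4))):
        print(name, diameter(topo), round(average_hops(topo), 3))
```

For the same 16 nodes, the torus needs fewer hops on average than the ring but requires twice as many links per node, a small instance of the cost-versus-performance trade-off that application and network codesign has to weigh.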
SST and its counterpart, Manifold, are being actively developed and are useful for research, but they are not yet ready for use as decision-making tools by NSA. The MSE team is actively collaborating with Sandia and the Georgia Institute of Technology, providing feedback, guidance, and assistance to the simulation framework developers. Multiple other national labs, academic researchers, and vendors are also participating in the effort driven by the MSE team. Other potential applications for simulation techniques include codesign of software before the actual hardware is available, software performance analysis and optimization, and debugging of software.
About the authors
Noel Wheeler is the lead of the Modeling, Simulation, and Emulation (MSE) team in the Advanced Computing Systems group at NSA's Laboratory for Physical Sciences. Ben Payne, PhD, is a physicist working as a postdoctoral researcher for the MSE team.
Klomparens W. "50 Years of research at the Laboratory for Physical Sciences." The Next Wave. 2007;16(1):4-5.
Binkert N, Beckman G, Black G, Reinhardt SK, Saidi A, Basu A, Hestness J, Hower DR, Krishna T, Sardashti S, et al. "The gem5 simulator." ACM SIGARCH Computer Architecture News. 2011;39(2):1-7. doi: 10.1145/2024716.2024718.