Future of Computing Giants
This article is based on a talk given by Professor Jack Dongarra, a world-renowned expert in numerical computing and tools for parallel computing, and the recipient of the 2021 Turing Award.
High-performance computing systems are used in many fields today, from providing cloud services to running complex simulations and verifying difficult mathematical results. The problems and applications they solve differ drastically from one another: the chemical industry uses them to design new molecular structures, for example, while geologists use them to build seismic models and carry out analysis and prediction. These applications require large, complex mathematical computations performed with considerable floating-point precision. At the same time, some of these mathematical models must operate on very large data sets, which high-performance computing systems also have to accommodate.
Let’s elaborate on some of the practical problems of handling big data and the associated constraints in modern high-performance computing systems. Operating these machines continuously requires a great deal of effort and capital, and their hardware typically becomes outdated within a three-year life span. A project built around high-performance computing therefore calls for careful planning of both the physical hardware and the human resources involved.
Several key factors that affect the evolution of high-performance computing systems can be listed as follows:
- Evaluation and ranking of the Top 500 supercomputers in the world through benchmarking.
- Communication in high-performance computing systems is a constraint.
- There is a considerable gap between the theoretical computational capability of these systems and the computational contribution they deliver in the field.
- Future supercomputers should be developed from the ground up for specific applications (application-specific supercomputers).
Professor Dongarra was a founding member of the LINPACK project, a supercomputer benchmarking effort that dates back to the 1970s. The intention of LINPACK was to benchmark supercomputers and rank them by performance and speed. The LINPACK benchmark consists of dense linear-algebra computations, solving a large system of linear equations, and the size of the matrix is increased until performance approaches its asymptotic peak. Over the past 30 years, LINPACK results have been collected from supercomputers built with very different technologies. A modern portable laptop is far more powerful than the supercomputers of those early years, which is the effect of Moore's Law.

The published LINPACK results show that, until 2008, supercomputer performance increased roughly tenfold from one year to the next, but this trend is no longer observable from 2008 onwards. According to Professor Dongarra, the reasons are the financial crisis of 2008 and, at the same time, semiconductor technology slowly approaching its limits. Even so, the most powerful supercomputer at present can reportedly perform a billion billion ($10^{18}$) operations per second. As an analogy, if every person on Earth performed one operation per second, it would take about four years to complete the number of operations such a machine carries out in a single second. That is a huge improvement compared to the beginning of supercomputing.
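To make the benchmarking idea concrete, here is a minimal Python sketch of a LINPACK-style measurement. It is not the actual HPL code: NumPy’s LAPACK-backed solver stands in for the real implementation, the problem sizes are illustrative assumptions, and the standard operation count of $(2/3)n^3 + 2n^2$ is used to turn the solve time into a GFLOP/s rate.

```python
# Minimal LINPACK-style sketch: time a dense linear solve and convert it to GFLOP/s.
# Assumptions: NumPy's LAPACK-backed solver stands in for the real HPL benchmark,
# and the problem sizes below are illustrative only.
import time
import numpy as np

def linpack_gflops(n: int, seed: int = 0) -> float:
    """Solve a dense n-by-n system Ax = b and report the rate in GFLOP/s,
    using the standard operation count of (2/3)*n^3 + 2*n^2."""
    rng = np.random.default_rng(seed)
    a = rng.standard_normal((n, n))
    b = rng.standard_normal(n)

    start = time.perf_counter()
    np.linalg.solve(a, b)                  # LU factorization + triangular solves
    elapsed = time.perf_counter() - start

    flops = (2.0 / 3.0) * n**3 + 2.0 * n**2
    return flops / elapsed / 1e9

# Grow the problem size until the measured rate levels off (the asymptotic point).
for n in (1000, 2000, 4000, 8000):
    print(f"n = {n:5d}: {linpack_gflops(n):8.1f} GFLOP/s")
```

Run on an ordinary laptop, the way the rate climbs and then flattens with n mirrors, at a tiny scale, what the Top 500 ranking measures on full machines.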
Let’s look at these giants from the communication perspective. Data transfer within a supercomputer is very expensive. Most scientific applications today operate on large data sets and produce a great deal of metadata that has to be moved between the different accelerators of the machine. Analysis of the benchmark results indicates that the ratio of floating-point operations to words of data moved, the flops per word, should be at least one. Professor Dongarra’s view is that the communication design of supercomputer architectures should be more application-specific, which would considerably reduce the data transferred through the system. Furthermore, chiplet technology, which can bring more functionality down to the silicon level, should be integrated into these systems. Some applications can also tolerate reduced precision in their computations: machine learning algorithms and AI models work effectively without high-precision floating-point arithmetic (64-bit or 32-bit), so the communication overhead of such applications can be limited by operating with lower-precision floating-point arithmetic (16-bit or 8-bit).
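To illustrate the flops-per-word idea, here is a small back-of-the-envelope sketch. The operation and word counts are the textbook ones for these two kernels; the matrix size and the choice of precision formats are illustrative assumptions rather than figures from the talk.

```python
# Arithmetic intensity back-of-the-envelope: floating-point operations per word
# of data moved. Compute-rich kernels keep the ratio high; memory-bound kernels
# sit near (or below) the flops-per-word floor of one.

def flops_per_word(flops: float, words: float) -> float:
    """Ratio of arithmetic performed to words of data moved."""
    return flops / words

n = 4096
# Dense n-by-n matrix multiply: 2*n^3 flops over roughly 3*n^2 words touched.
print(f"matmul: {flops_per_word(2 * n**3, 3 * n**2):8.1f} flops/word")
# AXPY (y = a*x + y): 2*n flops over roughly 3*n words -- memory bound.
print(f"axpy  : {flops_per_word(2 * n, 3 * n):8.1f} flops/word")

# Lower precision shrinks the bytes behind every word that has to move:
for name, bytes_per_word in (("fp64", 8), ("fp32", 4), ("fp16", 2)):
    print(f"{name}: one {n}x{n} matrix occupies {n * n * bytes_per_word / 1e6:7.1f} MB")
```

Halving the word size halves the traffic of a bandwidth-bound kernel, which is exactly why reduced-precision formats are attractive for AI workloads.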
Another issue caused by not taking application scenarios into account when modeling supercomputer architectures is that most of these machines operate well below their theoretical peak performance. An underutilized supercomputer is like a racecar with a consistent top speed of 200 mph that is kept in the garage and only occasionally driven at under 20 mph. Given a supercomputer’s short lifespan of roughly three years, this wastes a great deal of computing power and energy.
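As a purely hypothetical illustration of that gap (the figures below are invented for the example, not numbers from the talk or the Top 500 list), the ratio of sustained to theoretical peak performance can be read as a utilization figure:

```python
# Hypothetical gap between theoretical peak and sustained application performance;
# both numbers below are invented for illustration only.

def utilization(sustained_pflops: float, peak_pflops: float) -> float:
    """Fraction of the machine's theoretical peak that a workload actually achieves."""
    return sustained_pflops / peak_pflops

peak = 1000.0      # assumed theoretical peak of an exascale-class machine, in PFLOP/s
sustained = 100.0  # assumed rate sustained by a data-movement-heavy application

print(f"Utilization: {utilization(sustained, peak):.0%} of peak")  # -> 10% of peak
```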
Considering all these factors, it is apparent that we need to rethink supercomputers. Specifically, supercomputer hardware designs should be redefined around the application scenarios they are meant to serve. Algorithms that involve heavy data transfer on traditional computer architectures deserve particular attention during the requirement-specification stage of future supercomputer development.
You can find the original talk here.