Case study: The Importance of Open Source for Research in High-Performance Computing


This post is part of the OSSG series “the role of open source in the UK”, where we collect and publish statements from companies and individuals in the UK regarding their experience with Open Source Software. We welcome any submission to this series. If you are interested, please send an email to Dr Julian Kunkel.

by Dr Julian Kunkel, Lecturer, Department of Computer Science, University of Reading

Open source is vital for teaching, for conducting research in computer science, and for enabling reproducible large-scale experiments in computational science that support society. In this post, Julian describes his experience with open source over his career.

The Relevance of Open Source: A Personal Statement

For me, open-source software is the key enabler of productive work and of the training and research environments I provide, for several reasons. Firstly, in my own work environment I rely on Ubuntu as my operating system; it gives me the freedom to conduct research and programming experiments easily on my laptop and to scale them up later to data-center-wide experiments.

Because I have full control over the system and easy means to repair it when it breaks, I haven't lost any data in 20 years of using Linux, even though my work often requires rigorous stress-testing of hardware components. I have high confidence and trust in the software stack because of its openness: there are no hidden transmissions of private data, and proper security schemes are in place to protect my data and research. Another benefit is that key APIs are robust, so software I rely on whose development started 20 years ago can still be used today.

In my capacity as a lecturer in programming and parallel computing, I particularly value the benefits of open-source software: being able to share it with students freely, to show them how the software works internally, and, finally, to let them extend the software as part of coursework. The opportunity to inspect a larger software product can be highly motivating and demonstrates to students that they will be able to build or modify such a product themselves in the future.

In my research on optimizing storage for high-performance computing environments, open-source software provides the basis for observing, understanding, and optimizing existing systems and for quickly implementing new algorithms in existing workflows. By building on existing software, students and professionals in computational science can stand on the shoulders of giants, bootstrap their own projects, and contribute quickly to community efforts. When I was an undergraduate student, I downloaded the software for the parallel file system PVFS2, analyzed its behavior, and contributed optimizations back to the developers. Discussing the changes with an international development team located in the US, and actually benefiting that effort, was very motivating for me as a student.

In the important field of computational science, trust and reproducibility of scientific results are of crucial importance. Open-source software has therefore proven, in my experience, to be a critical factor in the success of research and development processes. While my professional research also involves working with companies and exploring their closed-source software, I often face challenges that delay the research significantly, and I therefore consider closed-source software a high risk in my research activities.

Still, I understand the need, particularly for small companies, to keep certain intellectual property closed for a period of time. My experience shows, though, that a company generally benefits from opening its code and using alternative business models, such as a support model.

Risks of Closed-Source Software

Despite my personal choice to stick to open-source software, I am sometimes forced to use other software products because certain file formats are expected or because a given lecture hall comes with specific software. I find it frustrating to wait for the computer to forcibly install an update when I have to give a presentation in five minutes, or to discover that a patch has changed the software's expected behavior, sometimes even that an important feature has been removed or no longer works.

In my experience, the worldwide consolidation of companies' critical systems into cloud services and onto specific closed-source software poses a long-term risk to every professional: losing control over, and understanding of, technology. I believe that service-level agreements are nice in theory, but in practice only open systems maintained by a company's own professionals can meet the requirements of critical systems. An example: if you are giving a sales presentation for a $10 million product, or simply presenting your PhD work, you want your computer to work properly and not to depend on something over which you have no control. Do you want to risk your sales opportunity or your degree because the cloud provider is down for whatever reason?

This loss of knowledge is already apparent in computer science students: when their computer doesn't work properly, they are often unable to debug it, understand the cause, and repair it.

I believe that the education system and the IT industry must value the fundamental skill sets of practitioners more and preserve the roles of professionals such as administrators of critical systems like email.

And what about critical systems such as control systems for airplanes or nuclear power plants: should these be closed source? Personally, I have a strong opinion here. My dream would be that every public institution and every critical system must use open-source software, to ensure that these systems can be maintained and that data products remain safe and accessible in the future.

Computational Science: A Primer

Let me take a moment to give a quick primer on my research domain. In computational science, modeling and simulating the laws of nature within computer systems offers a well-defined environment for experimental investigation. Models for climate, protein folding, or nanomaterials, for example, can be simulated and manipulated at will without the restrictions of the physical world, and scientists no longer have to conduct time-consuming and error-prone experiments in the real world.

This method leads to new observations and understanding of phenomena that would otherwise be too fast or too slow to comprehend in vitro. The processing of observational data, for example from sensor networks and satellites, and other data-driven workflows is yet another challenge, as it is usually dominated by the input/output of data.

Complex climate and weather simulations can comprise 100,000 to millions of lines of code and must be maintained and developed further for at least a decade. Therefore, scientific software is mostly open source, particularly for large-scale simulations and bleeding-edge research in a scientific domain. Note that there are also independent software vendors that create products to ease experimentation for industry; this is a key enabler for such industries to conduct well-defined experiments.

As computing performance improves, better experiments can be designed and conducted. A thorough understanding of hardware and software design is therefore vital to providing the necessary computing power for scientists. This understanding has developed into its own branch of computer science: High-Performance Computing (HPC), the discipline in which supercomputers are designed, integrated, programmed, and maintained. Supercomputers are the tools used in the natural sciences to analyze scientific questions in silico, and they reshape the practice and concept of science, indeed the philosophy of science. Essentially all supercomputers run open-source software stacks, as required by the rapid progress and development cycles in the field.
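
To give a flavour of how such machines are programmed, here is a minimal sketch of the message-passing model (MPI) on which most supercomputer applications build. It assumes an open-source MPI implementation such as MPICH or Open MPI is installed and is not taken from any specific project.

    /* Minimal sketch: the message-passing model used to program supercomputers.
     * Assumes an open-source MPI implementation (e.g. MPICH or Open MPI).
     * Build: mpicc hello.c -o hello    Run: mpirun -np 4 ./hello             */
    #include <mpi.h>
    #include <stdio.h>

    int main(int argc, char **argv)
    {
        int rank, size;

        MPI_Init(&argc, &argv);                /* start the parallel environment */
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);  /* id of this process             */
        MPI_Comm_size(MPI_COMM_WORLD, &size);  /* total number of processes      */

        printf("Hello from process %d of %d\n", rank, size);

        MPI_Finalize();                        /* shut the environment down      */
        return 0;
    }

The same program runs unchanged on a laptop with four processes or on a cluster with thousands, which is exactly the kind of laptop-to-data-center scaling described above.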

High-performance storage provides the hardware and software technology to query and store the largest data volumes at high input/output velocity while ensuring data consistency. On exascale systems, i.e., systems performing 10^18 floating-point operations per second, workflows will harness hundreds of thousands of processors with millions of threads while producing or reading petabytes to exabytes of data.
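
As an illustrative sketch of what such storage systems have to support, the following example uses MPI-IO, the parallel I/O interface of MPI, to let every process write its own disjoint block of a shared file collectively. The file name "output.dat" and the block size are illustrative choices, not taken from any specific workflow.

    /* Minimal sketch: parallel I/O with MPI-IO, where every process writes its
     * own disjoint block of a shared file. File name and block size are
     * illustrative only.                                                       */
    #include <mpi.h>
    #include <stdlib.h>

    #define BLOCK_DOUBLES 1024                 /* doubles written per process */

    int main(int argc, char **argv)
    {
        MPI_Init(&argc, &argv);

        int rank;
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

        /* each process fills its own buffer */
        double *buf = malloc(BLOCK_DOUBLES * sizeof(double));
        for (int i = 0; i < BLOCK_DOUBLES; i++)
            buf[i] = rank;

        /* all processes open the same file and write at disjoint offsets */
        MPI_File fh;
        MPI_File_open(MPI_COMM_WORLD, "output.dat",
                      MPI_MODE_CREATE | MPI_MODE_WRONLY, MPI_INFO_NULL, &fh);

        MPI_Offset offset = (MPI_Offset)rank * BLOCK_DOUBLES * sizeof(double);
        MPI_File_write_at_all(fh, offset, buf, BLOCK_DOUBLES, MPI_DOUBLE,
                              MPI_STATUS_IGNORE);   /* collective write */

        MPI_File_close(&fh);
        free(buf);
        MPI_Finalize();
        return 0;
    }

Collective calls such as MPI_File_write_at_all allow the MPI library and the underlying parallel file system, for example PVFS2 as mentioned above, to coordinate the requests of many processes into efficient large transfers; being able to read and modify that open-source stack is what makes this kind of I/O research possible.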

Conclusions

Open-source software is crucial for my research and teaching activities. HPC systems provide the compute and storage infrastructure to address questions of relevance to fundamental research, industry, and the most complex social challenges. 

I believe that the education system and the IT industry must value the fundamental skillsets of practitioners more and preserve the roles of professionals such as administrators. By having access to open source, we support the development of these practitioners.

Dr Julian Kunkel has been a Linux enthusiast since his high-school days in 1998. He worked at the German Climate Computing Center, a large-scale data center, for 10 years. Julian is currently a Lecturer in the Department of Computer Science at the University of Reading. His research revolves around high-performance storage (https://hps.vi4io.org/), and he believes that intelligent storage systems need to emerge to meet the needs of future generations. Since his arrival in the UK in 2018, he has served as the BCS Open Source Specialist Group's Membership Secretary and lead for Advocacy.