Open source software for scientific and parallel computing

June 11, 2020 @ 6:30 pm – 8:30 pm
BCS Online Meeting
Julian Kunkel

In public data centers and in computational science, open-source software plays a key role in creating a productive environment for researchers.

Computational science is the modeling and simulation of the laws of nature within computer systems that offer a well-defined environment for experimental investigation. Models for climate, protein folding, or nanomaterials, for example, can be simulated and manipulated at will without being restricted by the laws of nature, and scientists no longer have to conduct time-consuming and error-prone experiments in the real world. This method leads to new observations and understandings of phenomena that would otherwise be too fast or too slow to comprehend in vitro. The processing of observational data from sources like sensor networks and satellites, and of other data-driven workflows, is yet another challenge, as it is usually dominated by the input/output of data.

Complex climate and weather simulations can have 100,000 to a million lines of code and must be maintained and developed further for at least a decade. Therefore, scientific software is mostly open-source, particularly for large-scale simulations and bleeding-edge research in a scientific domain.

This month we’ll be hosting our evening of talks online. You can join remotely from the comfort of your own home to listen to the speakers and chat in real time with the other attendees.

18:15 – Join online meeting to chat with other participants

18:30 – Short introduction (5 min) of the evening by Julian Kunkel — Slides

18:35 – Presentations

20:35 – Close

The talks were live streamed via GoToWebinar and recorded for later posting on YouTube.

The videos are now published on YouTube and slides are linked below.


Open Source Software in High-Performance Computing

Shane Canon, Lawrence Berkeley National Laboratory/NERSC, USA

High-Performance Scientific Computing is heavily dependent on a rich ecosystem of open-source software. The HPC community is both a consumer of and a contributor to the broader open-source community. In this presentation, we will review the evolution of open-source software use in HPC, give examples of how the HPC community has contributed to its growth, and discuss the future direction of open-source in HPC.

Some lessons learned from creating and using the Ceph open-source storage system

Carlos Maltzahn, UC Santa Cruz, USA
In 2005 at UC Santa Cruz, Sage Weil created a fully functional parallel file system we called Ceph. In 2006 we open-sourced Ceph under the LGPLv2 license. After his graduation in 2007, Sage was able to continue working on Ceph, built a community around it, got the Ceph client accepted into the Linux kernel in 2010, achieved production readiness in 2012, founded the startup Inktank in 2012, and sold it to Red Hat in 2014 for $175m. Sage then donated $2m to UC Santa Cruz to create a structure that would help other students have a career at the university similar to his own. In 2015 I kicked off the Center for Research in Open Source Software (CROSS). Meanwhile, starting in 2009, I built a research program around programmable storage systems and decided to use Ceph as the primary research prototyping platform. As Ceph became more widely used, that choice had a positive impact on the job prospects of our students, on the generality of our research results, on the funding prospects of our research, and on the pace of delivering research.
In this talk, I will report on some of the lessons learned, including how our use of Ceph as a research platform evolved and why an open-source storage system like Ceph can fundamentally change the dynamics of research and entire industries. I will end with a quick overview of the SkyhookDM project (
Bio: Dr. Carlos Maltzahn is the founder and director of the UC Santa Cruz Center for Research in Open Source Software (CROSS). Dr. Maltzahn also co-founded the Systems Research Lab, known for its cutting-edge work on programmable storage systems, big data storage & processing, scalable data management, distributed system performance management, and practical reproducible evaluation of computer systems. Carlos joined UC Santa Cruz in 2004, after five years at NetApp working on network intermediaries and storage systems. In 2005 he co-founded and became a key mentor on Sage Weil’s Ceph project. In 2008 Carlos became a member of the computer science faculty at UC Santa Cruz and has graduated nine Ph.D. students since. Carlos graduated with an M.S. and a Ph.D. in Computer Science from the University of Colorado at Boulder.

High Performance Computing in a world of Data Science

Martin Callaghan, University of Leeds, UK

Slides as PDF

Universities and other research organisations have developed and used High Performance Computing (HPC) systems for a number of years to support problem solving across many computational domains, including Computational Fluid Dynamics and Molecular Dynamics. Their design features, such as a batch processing system and a fast interconnect, make them ideal for supporting these often highly parallel tools and applications.

In recent years though, with the increased interest in Data Science across a number of research fields, HPC has found itself in the position of having to support quite different tools and methodologies.

In this talk, I’ll discuss the design journey we have taken for our institutional HPC, some of the Open Source projects, tools and techniques we use with our research colleagues to support Data Science problems and some of our plans for the future.

Martin Callaghan is Research Computing Manager and leads the Research Software Engineering team at the University of Leeds in the UK, which provides High Performance Computing (HPC), Programming and Software Development consultancy across a diverse research community. His role involves Research Software Engineering, training, consultancy and outreach.

He also manages a comprehensive HPC and Research Computing training programme designed to be a ‘zero to hero’ structured introduction to HPC, Cloud and research software development.

Before joining the University of Leeds, he worked as an engineer designing machine tool control systems and as a teacher, and ran his own training and consultancy business. His personal research interests are in text analytics, particularly using neural networks to summarise text at scale.


Note: Please aim to connect by 18:25 at the latest, as the event will start promptly at 18:30.