Sustainable Research Software through Open-Source Communities
A recent Eos article entitled “It’s Time to Shift Emphasis Away from Code Sharing” (Greene and Thirumalai, 2019) describes a reality of research software in the earth sciences. It shares an anecdote that highlights a common challenge for new graduate students: the seemingly simple task of implementing the standard analysis procedures needed for their research. Typically, researchers either share code among themselves or, worse, reimplement commonly used routines. This duplicates effort and compounds existing reproducibility and replicability problems. The authors propose that developing toolkits of functions designed to work together in streamlined workflows addresses some of the fundamental issues facing software sustainability in the earth sciences by enabling community development. We believe the earthquake science community needs more well-verified and well-validated software toolkits that researchers can use and trust. We’ve followed suit and taken this approach on a new software project at SCEC.
In line with this principle of developing well-validated scientific software toolkits, the Collaboratory for the Study of Earthquake Predictability (CSEP) has developed a Python toolkit (dubbed pyCSEP) for evaluating and working with earthquake forecasts. It was developed from the ground up to encourage researchers to contribute code directly to the toolkit while their research is in progress. Contributed code is then immediately available for others to use, test, or further refine. It’s hard to argue against the benefits of this approach: shared analysis routines, confidence that results can be replicated across research efforts, and citable toolkits. But there are practical issues to address in shifting our thinking away from traditional research software development practices.
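To make this concrete, here is a minimal sketch of the kind of shared analysis routine pyCSEP is meant to provide: loading a gridded forecast, fetching the observed earthquake catalog, and running a consistency test. The function and dataset names shown here (load_gridded_forecast, query_comcat, the bundled Helmstetter forecast) follow the pyCSEP tutorials as they stood when this was written and may differ between versions, so treat this as an illustrative sketch rather than a definitive recipe.

```python
# Illustrative sketch of a grid-based forecast evaluation with pyCSEP, following
# the package tutorials; names and signatures may differ between pyCSEP versions.
import csep
from csep.core import poisson_evaluations as poisson
from csep.utils import datasets, time_utils

# Define the evaluation period for the forecast.
start_date = time_utils.strptime_to_utc_datetime('2006-11-12 00:00:00.0')
end_date = time_utils.strptime_to_utc_datetime('2011-11-12 00:00:00.0')

# Load one of the gridded forecasts that ships with pyCSEP.
forecast = csep.load_gridded_forecast(datasets.helmstetter_mainshock_fname,
                                      start_date=start_date,
                                      end_date=end_date,
                                      name='helmstetter_mainshock')

# Download the observed earthquakes from ComCat and keep only events
# inside the forecast's spatial region.
catalog = csep.query_comcat(start_date, end_date,
                            min_magnitude=forecast.min_magnitude)
catalog = catalog.filter_spatial(forecast.region)

# Run the Poisson number test, a consistency test on total event counts.
number_test_result = poisson.number_test(forecast, catalog)
print(number_test_result)
```

Because routines like the number test live in the shared toolkit rather than in individual researchers’ scripts, every group that runs them is using the same community-vetted implementation.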
The end products of our research efforts are usually peer-reviewed publications, not GitHub repositories and software documentation. This places the burden on individual researchers to provide well-written, usable code that reproduces published results. Sometimes they do; often they do not. A shared toolkit provides a built-in incentive to make this additional effort: once the toolkit exists, the effort required to implement new features decreases significantly, and everyone can take advantage of the collective development in progress. If the code is destined for publication with research results, why duplicate it needlessly across different groups?
Toolkit maintainers should be encouraged to lead community building, as well as software development, and to provide well-written documentation. Building sustainable research software requires a community that includes both research software engineers and scientists (Anzt et al., 2021). Research centers are perfectly positioned to support community open-source software development and to assist other researchers in developing the skills needed to contribute to and develop shared toolkits.
In the case of pyCSEP, our goal is to create a research community that uses and develops the software toolkit. To do this, we have hosted workshops and tutorials to introduce researchers to the toolkit and to the development tools and workflows needed to contribute to open-source software. We plan to host an ongoing tutorial series (several times per year) to train interested researchers in best practices for working with tools such as git and GitHub and for developing Python toolkits. We’ve found that the payoff is well worth the time spent. In the short time that pyCSEP has been in development, five different research groups have contributed to the toolkit, and it has supported three publications (Bayona et al., 2020; Savran et al., 2020; Bayona et al., in prep). We encourage anyone interested to email software@scec.org to inquire about the pyCSEP software training schedule.
Sustainable research software requires a community effort that bridges software engineers and scientists. As Greene and Thirumalai suggested in their Eos article, we have seen significant benefits for the CSEP community from operating in this open-source, community development mode. These benefits include community-vetted forecast evaluation tests and performance metrics, new visualisation routines, receiver operating characteristic (ROC) analysis, and new tests based on binary likelihood functions, to name a few. If you take anything away from this article, please think about how your research codes could be organized into an open-source toolkit.
About the Authors
William Savran is a SCEC software engineer at the University of Southern California. He is the lead developer of the Collaboratory for the Study of Earthquake Predictability, and works with researchers around the world to develop and implement methods for unbiased evaluations of earthquake forecasting models in California and beyond.
Max Werner is Associate Professor (Reader) of Geophysics and Natural Hazards at the University of Bristol (UK), where he leads a diverse research group investigating earthquake processes, interactions, predictability and hazards. He leads the SCEC node of the global Collaboratory for the Study of Earthquake Predictability (CSEP), which provides tools, concepts and a software platform for testing earthquake forecasts and predictions.
Acknowledgements
This research was supported by the Southern California Earthquake Center, which is funded by NSF Cooperative Agreement EAR-1600087 and USGS Cooperative Agreement G17AC00047. The research also received funding from the European Union’s Horizon 2020 research and innovation program under Grant Agreement Number 821115, Real-Time Earthquake Risk Reduction for a Resilient Europe (RISE). We thank our international CSEP colleagues for collaborating in this research and software development.
References
- Anzt, H., F. Bach, S. Druskat, F. Löffler, A. Loewe, B. Y. Renard, G. Seemann, A. Struck, E. Achhammer, P. Aggarwal, F. Appel, M. Bader, L. Brusch, C. Busse, G. Chourdakis, P. W. Dabrowski, P. Ebert, B. Flemisch, S. Friedl, B. Fritzsch, M. D. Funk, V. Gast, F. Goth, J.-N. Grad, J. Hegewald, S. Hermann, F. Hohmann, S. Janosch, D. Kutra, J. Linxweiler, T. Muth, W. Peters-Kottig, F. Rack, F. H. C. Raters, S. Rave, G. Reina, M. Reißig, T. Ropinski, J. Schaarschmidt, H. Seibold, J. P. Thiele, B. Uekermann, S. Unger, and R. Weeber (2021). An environment for sustainable research software in Germany and beyond: current state, open challenges, and call for action, F1000Research 9 295.
- Bayona, J. A., W. Savran, A. Strader, S. Hainzl, F. Cotton, and D. Schorlemmer (2020). Two global ensemble seismicity models obtained from the combination of interseismic strain measurements and earthquake-catalogue information, Geophysical Journal International 224 1945-1955.
- Bayona, J. A., W. Savran, D. A. Rhoades, and M. J. Werner (In prep). Prospective evaluation of multiplicative hybrid earthquake forecasting models in California.
- Greene, C. A., and K. Thirumalai (2019). It’s time to shift emphasis away from code sharing, Eos, 100, https://doi.org/10.1029/2019EO116357. Published on 20 February 2019.
- Savran, W. H., M. J. Werner, W. Marzocchi, D. A. Rhoades, D. D. Jackson, K. Milner, E. Field, and A. Michael (2020). Pseudoprospective Evaluation of UCERF3-ETAS Forecasts during the 2019 Ridgecrest Sequence, Bulletin of the Seismological Society of America 110 1799-1817.
- Wessel, P., W. H. Smith, R. Scharroo, J. Luis, and F. Wobbe (2013). Generic Mapping Tools: Improved version released, Eos, Transactions American Geophysical Union 94 409-410.