OSG User School 2018

The OSG User School 2018 was held at the University of Wisconsin–Madison on July 9–13. This year’s event set a new record with 65 participants, up from 56 in 2017. With a record 140 applicants, it was also one of the most selective offerings of the School.

Participants included mostly graduate students, post-doctoral researchers, a couple of advanced undergraduates, some faculty, and some research staff from research institutions in the United States (plus one participant each from Brazil, South Korea, and Uganda). The scholarly domains were diverse, including physics, biology, chemistry, medicine, several branches of engineering, statistics, earth sciences, animal sciences, plant sciences, neuroscience, and economics. Participants were selected for demonstrating a need for large-scale computing and for being in a position to transform their scholarly work through computation. The instructors this year were Brian Lin and Derek Weitzel from the OSG; Bala Desinghu from Rutgers University (and formerly OSG staff); and Christina Koch and Lauren Michael from UW–Madison’s Center for High Throughput Computing.

This year’s curriculum continued the tradition of focusing on hands-on practice with a wide variety of user tools, providing a solid grounding for advanced and theoretical topics later in the School as well as for further learning afterward. Much of the curriculum was carried over from 2017, with minor updates to stay current. This year, though, there was more discussion about accessing different kinds of computing resources, such as graphics-processing units (GPUs), and about expanding resource pools using commercial clouds, such as Amazon EC2. The larger changes reflected both changes in the technologies involved and improved pedagogical approaches based on experiences with past OSG User Schools and other science end-user engagements.

All of the training materials from the School remain online after the event, where they are available to others around the world and serve as reference material. Participants also received several clear options for getting ongoing help with their large-scale computing needs. In addition, every participant left the School with at least two ways to run jobs (an account on a UW–Madison HTCondor submit node and an OSG Connect account), so that there are as few barriers to computing and storage resources as possible.

Participants of the OSG User School 2018.

From formal training evaluations to informal comments and emails, the School was clearly a success. Participants were happy with the program, with how much they learned, and with the new paths that are now open to them. Further, many participants completed a final written assignment after the event, describing a research computing challenge and their plans for applying material from the School to handle the challenge using distributed high throughput computing. From these assignments, it is clear that most participants have concrete, realistic plans to advance their research through computing, and many have already begun doing so.

Because it takes time for the full effect of the School training to be realized (for research and computing plans to be made, for planned work to be performed, and for results to be analyzed and written up), we list here the known publications from 2017 School participants who used OSG:

Patrick Forscher (University of Arkansas) and colleagues investigated whether PI names on NIH R01 grant proposals could induce race or gender bias. The statistical sensitivity analysis for this study used about 20,000 hours of computing on OSG. The first resulting publication is:

  • Forscher, P. S., Cox, W. T. L., Brauer, M., & Devine, P. G. (in press). An experiment manipulating Principal Investigator names finds little to no race or gender bias in the initial reviews of NIH R01 grant proposals. Nature Human Behaviour. https://doi.org/10.31234/osf.io/r2xvb

Ariella Gladstein (University of North Carolina at Chapel Hill) used whole-chromosome simulations to infer the demographic history of the Ashkenazi Jews with Approximate Bayesian Computation and, as part of that work, developed a tool (SimPrily) to perform such simulations and calculate population genetic summary statistics. This work was enabled by using approximately 7 million hours of computing on OSG, XSEDE, University of Arizona, and University of Wisconsin resources. The first two resulting publications are:

  • Gladstein, A. L., & Hammer, M. F. (2018). Substructured population growth in the Ashkenazi Jews inferred with Approximate Bayesian Computation. Manuscript submitted for publication.

  • Gladstein, A. L., Quinto-Cortés, C. D., Pistorius, J. L., Christy, D., Gantner, L., & Joyce, B. L. (2018). SimPrily: A Python framework to simplify high-throughput genomic simulations. SoftwareX, 7, 335–340. https://doi.org/10.1016/j.softx.2018.09.003

Raymond Tsang (Pacific Northwest National Laboratory) generated toy models for evaluating the suitability of various Bayesian priors for radioassay measurement results in projecting sensitivity of low-background experiments. This work was enabled through the use of approximately 80,000 hours of computing on OSG. The first resulting publication is:

  • Tsang, R. H. M., Arnquist, I. J., Hoppe, E. W., Orrell, J. L., & Saldanha, R. (2018). Treatment of material radioassay measurements in projecting sensitivity for low-background experiments. Manuscript submitted for publication. arXiv:1808.05307v2

Sarah Turner (University of Wisconsin–Madison) processed hundreds of images and completed thousands of permutation tests for quantitative trait loci mapping of forty carrot traits to help improve breeding and genetic studies. This work used about 900 hours of computing on OSG, showing that it does not necessarily take a large number of computing hours to make a meaningful difference in research outcomes. The first resulting publication is:

  • Turner, S. D., Ellison, S. L., Senalik, D. A., Simon, P. W., Spalding, E. P., & Miller, N. D. (2018). An automated, high-throughput image analysis pipeline enables genetic studies of shoot and root morphology in carrot (Daucus carota L.). Manuscript submitted for publication. https://doi.org/10.1101/384974

– Tim Cartwright