Tutorial Resonates with People Eager to Know More about a Popular Research Tool
R, the free, open-source software environment for statistical computing and graphics, draws a crowd. So much so, in fact, that a recent online and in-person tutorial on the subject broke training participation records for the event's organizers: the National Institute for Computational Sciences (NICS), the Extreme Science and Engineering Discovery Environment (XSEDE), and the National Institute for Mathematical and Biological Synthesis (NIMBioS).
Approximately 800 people worldwide signed up for the four-hour tutorial, “Using R for HPC [High-Performance Computing],” offered both online and in person. More than 420 logged in to watch online, and 30 participants attended on site at the venue for the event, the NIMBioS presentation room in the Claxton Education Building at the University of Tennessee, Knoxville.
“I think it [the level of participation] shows that people are really excited about R and are very, very interested in performance optimization,” says NICS Research Associate Drew Schmidt, presenter of the tutorial.
"R has a reputation for being slow and inappropriate for big data, and perhaps some of this is well deserved," he says. "However, there are many viable strategies for improving R's performance." The tutorial covered a broad set of content to help researchers scale up their R code, including debugging, profiling, and parallel programming. A live Twitter feed (#learnR) captured the spontaneous comments of attendees during the event.
The tutorial is available for viewing on the NIMBioS YouTube channel.
The Presenter's R Experience
Drew Schmidt [Image credit: Scott Gibson]
R is unique among programming languages in that it was created for the sole purpose of analyzing data; at the time of writing, 6,508 R packages are available on CRAN, the main R package repository.
A limitation of R, however, is that it was designed for the desktop rather than advanced systems capable of taking on big-data challenges. So, Schmidt and colleagues at NICS and Oak Ridge National Laboratory (ORNL) have been focusing on scaling R to supercomputers.
“The best way to become an expert in R is to solve problems that R is really bad at solving, and that’s what I did for two years straight,” says Schmidt, who holds a master’s degree in mathematics from UT Knoxville. “I very quickly understood how the thing worked and how it didn’t work.”
But Schmidt’s experience with R goes well beyond studying hard and finding useful ways to apply R to the most powerful computer systems. He really dug into R to figure out how to pick problems apart. For example, he is a key programmer for the pbdR project, which enables R to use large supercomputing resources, such as those at the Joint Institute for Computational Sciences (JICS).
“He looked at the way the linear model is solved in R and found a way to break that down and spread it across the many nodes of a supercomputer,” says colleague Bob Muenchen, manager of the research support group in the Office of Information Technology at UT. “And he did that in a way that people can load his package and not change anything else about their programming and get a big speed boost. He made the solving of some problems tens to thousands of times faster by loading pbdR, without the users really even having to change their code. Plenty of people tried to do that kind of thing previously, and I don’t think anyone was ever able to pull it off with anywhere near as much success as Drew was able to.”
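The decomposition Muenchen describes rests on the fact that the normal-equations matrices X'X and X'y are sums over row blocks of the data, so each node can compute a local contribution and the pieces can be combined with a reduction. The serial sketch below mimics that structure with four in-memory chunks; it illustrates the principle and is not pbdR's actual implementation:

```r
set.seed(42)
n <- 100
X <- cbind(1, matrix(rnorm(n * 2), n, 2))  # design matrix with intercept
y <- rnorm(n)

# Split the rows into four chunks, standing in for four nodes.
chunks <- split(seq_len(n), rep(1:4, each = n / 4))

# Each "node" computes its local crossproducts; summing them
# reconstructs the global X'X and X'y (the reduction step an
# MPI allreduce would perform in a real distributed run).
XtX <- Reduce(`+`, lapply(chunks, function(i) crossprod(X[i, ])))
Xty <- Reduce(`+`, lapply(chunks, function(i) crossprod(X[i, ], y[i])))
beta_chunked <- solve(XtX, Xty)

# The chunked solution matches an ordinary full-data least-squares fit.
beta_full <- qr.solve(X, y)
stopifnot(all.equal(as.numeric(beta_chunked), as.numeric(beta_full)))
```

pbdR applies this kind of decomposition across MPI ranks, which is how, as Muenchen notes, users can see large speedups without restructuring their own code.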
Even before the pbdR accomplishment, Muenchen had already become familiar with Schmidt’s strong aptitude for problem solving and programming.
“The first time he saw R was when he took one of my workshops at UT. Later he came in with some really interesting questions, and I hired him as a graduate assistant,” says Muenchen, who would soon come to know Schmidt as a person with a lot of ideas and a knack for making programs run faster.
Today, Muenchen operates and develops the content for the website r4stats.com, which helps people learn to use R and statistically tracks the popularity of data analysis software. He also teaches non-credit workshops for UT, mostly on R, and on vacation days works as an R instructor in partnership with Revolution Analytics, RStudio.com, Xerox Learning Services, and DataCamp.com.
Schmidt and Muenchen enjoy running ideas past each other to get feedback about things they’re working on. Muenchen’s projects are focused on desktop applications, which complement and contrast with Schmidt’s HPC explorations. And given the current popularity of R and the rapid growth in its use, the knowledge they have to share appears likely to continue to be of interest to a lot of people in the research community.
Scott Gibson, science writer, NICS, JICS
Article posting date: 13 April 2015
About JICS and NICS: The Joint Institute for Computational Sciences (JICS) was established by the University of Tennessee and Oak Ridge National Laboratory (ORNL) to advance scientific discovery and leading-edge engineering, and to further knowledge of computational modeling and simulation. JICS realizes its vision by taking full advantage of petascale-and-beyond computers housed at ORNL and by educating a new generation of scientists and engineers well-versed in the application of computational modeling and simulation for solving the most challenging scientific and engineering problems. JICS operates the National Institute for Computational Sciences (NICS), which had the distinction of deploying and managing the Kraken supercomputer. NICS is a leading academic supercomputing center and a major partner in the National Science Foundation's Extreme Science and Engineering Discovery Environment (XSEDE). In November 2012, JICS sited the Beacon system, which set a record for power efficiency and captured the number one position on the Green500 list of the most energy-efficient computers.