The National Institute for Computational Sciences

Digital Drug Discovery

Researchers from Tennessee and South Carolina enhance codes to speed up computational drug discovery

by Caitlin Elizabeth Rockett


An important element of computational simulation is the bridge it creates between theory and experimental testing. Perhaps no other field exemplifies the significance of this link quite like pharmaceutical drug design—arguably one of the most costly and universally essential fields of scientific endeavor. With the help of one of the most powerful supercomputers in the world, scientists Yuri Peterson of the Medical University of South Carolina (MUSC) and Bhanu Rekepalli of the National Institute for Computational Sciences (NICS) are working on taking drug simulation to a new level—the petascale.

Petascale refers to computer systems capable of performing a quadrillion calculations per second (petaflops), which is a one followed by 15 zeros. High-performance computing (HPC) systems like Kraken make it possible to sort through vast arrays of compounds that may work as treatments against viruses and diseases. However, having this kind of computational power is only half of the solution; researchers also need codes that can scale to a machine of such proportions. Rekepalli and Peterson have taken on this challenge by improving the speed and scaling of the molecular docking code Dock6.

The researchers are using Kraken, a Cray XT5 system housed at NICS, which has a peak performance of nearly 1.2 petaflops. NICS is funded by the National Science Foundation (NSF) and managed by the University of Tennessee.

Tweaking the process

Contemporary drug discovery starts with a process known as high-throughput screening (HTS), in which huge public libraries of chemical compounds are tested for their effectiveness against a biological target, such as a protein, that is known to play a key role in a disease. These libraries number into the millions and grow every year, making laboratory testing complex to say the least.

“Experimentalists identify 100 or so promising chemical compounds and they order perhaps 50 of these from a company and begin testing them experimentally against their target proteins,” explained Rekepalli. “It’s tedious, time consuming, and very expensive.”

While laboratory and clinical testing (using microorganisms and eventually humans) will always be necessary before a drug can be brought to market, computers can significantly narrow the set of compounds to be tested.

Yet the sheer size of chemical libraries makes even computational HTS difficult, as few scientists have access to systems capable of modeling millions of compounds in a reasonable time frame. Powerful machines like Kraken give academic scientists and their students a means to study drug candidates at the molecular and even atomic level in record time, but only with codes that can make use of such large systems. Providing those codes is precisely what Rekepalli and Peterson have teamed up to do. Their collaboration grew from NICS’ involvement in EPSCoR, an NSF-funded program aimed at strengthening research and education in science and engineering throughout the United States.

“Our EPSCoR project addresses how we can help researchers from various universities in South Carolina and in Tennessee to achieve high-level science with the facilities that we have at NICS,” explained Rekepalli.

Peterson’s group at MUSC was running Dock6 on a small on-campus cluster called CBRC (Computational Biology Resource Center). The group was using the High Performance Docking (HP-D) application to study how 1.3 million compounds interact, or dock, with seven target proteins that have been identified in ovarian cancer. As if this weren’t a large enough endeavor, the team also wanted to change the conformation of the proteins, meaning each compound would be docked to a few variations of each of the seven proteins.

“Assuming their cluster was running 24 hours a day, seven days a week, 365 days a year, it would still take them a few years to achieve that,” said Rekepalli. Rekepalli’s team therefore began optimizing Dock6 to run on Kraken. The researchers decided to split the work between Kraken and CBRC, using Kraken to run flexible docking while the cluster ran rigid docking simulations. Flexible docking is anywhere from three to five times more computationally intensive (requiring many processors) than rigid docking because it allows the compound to change shape during the docking process, as it would in nature.

Kraken was able to dock all 1.3 million compounds to each of the seven proteins in around 40 minutes using only 8,000 of the machine’s 112,896 cores—it took three months to complete this task on the CBRC cluster. The scientists also changed the conformation of the proteins (four conformations for each protein), leading to a total of around 36 million compound-protein interactions, all completed within a matter of days. Since these initial test runs, the researchers have done testing with Dock6 up to 36,000 cores on Kraken and results are looking consistent as they scale higher and higher.
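The figures quoted above can be checked with some quick arithmetic. The short Python sketch below simply multiplies out the numbers given in this article. The variable names are illustrative only, and treating "three months" as 90 days is an assumption, so the resulting ratio is a rough wall-clock comparison rather than a measured benchmark.

```python
# Back-of-the-envelope check of the numbers quoted in the article.
compounds = 1_300_000
proteins = 7
conformations_per_protein = 4

# One docking per compound-protein pair, then per protein conformation.
single_conformation_dockings = compounds * proteins
total_dockings = single_conformation_dockings * conformations_per_protein

# Share of Kraken used for the initial run.
kraken_cores_used = 8_000
kraken_total_cores = 112_896
fraction_of_kraken = kraken_cores_used / kraken_total_cores

# Rough wall-clock ratio between the cluster and Kraken.
kraken_minutes = 40
cluster_minutes = 90 * 24 * 60  # assumption: three months is about 90 days
wall_clock_ratio = cluster_minutes / kraken_minutes

print(f"Dockings, one conformation:   {single_conformation_dockings:,}")
print(f"Dockings, four conformations: {total_dockings:,}")
print(f"Fraction of Kraken used:      {fraction_of_kraken:.1%}")
print(f"Approximate wall-clock ratio: {wall_clock_ratio:,.0f}x")
```

The product of 1.3 million compounds, seven proteins, and four conformations is 36.4 million, which matches the "around 36 million" interactions reported.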

Figure: The docking target preparation workflow for HP-D. A) Identification of the appropriate drug interaction site on the oncoprotein. B) Generation of spheres from showsphere. C) Manual selection of active site spheres. D) Docking grid creation as defined by the inclusion boundary box.

But this is only the beginning of Rekepalli and Peterson’s work. Once the researchers have scaled Dock6 to the entirety of Kraken, they will run different conformations of each protein until they have narrowed the millions of compounds down to a few hundred. This subset will undergo molecular dynamics (MD) simulations to investigate more deeply the atomic interactions taking place during docking. Although MD is very computationally intensive even for a single compound, Rekepalli and Peterson hope to eventually automate the process so that compounds can be run in parallel.
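Narrowing millions of compounds to a few hundred amounts to ranking docked compounds and keeping the best ones. A minimal Python sketch of that idea follows. The compound IDs, the randomly generated scores, the `top_candidates` helper, the cutoff of 300, and the convention that a lower score means a better fit are all assumptions made for illustration; this is not the researchers' actual workflow or real Dock6 output.

```python
import heapq
import random


def top_candidates(scores, keep):
    """Return the `keep` (compound_id, score) pairs with the lowest scores.

    Assumes lower scores indicate better predicted binding.
    """
    return heapq.nsmallest(keep, scores.items(), key=lambda item: item[1])


# Made-up scores standing in for real docking results (hypothetical data).
rng = random.Random(0)
scores = {f"cmpd_{i:07d}": rng.uniform(-80.0, 0.0) for i in range(100_000)}

# Keep a few hundred compounds for follow-up MD simulations.
shortlist = top_candidates(scores, keep=300)

print(len(shortlist))                        # size of the shortlist
print(shortlist[0][1] <= shortlist[-1][1])   # best score comes first
```

Using `heapq.nsmallest` avoids sorting the whole library when only a small shortlist is needed, which matters more as the number of scored compounds grows.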

The Future of HP-D

The future of this research doesn’t stop with Dock6—another molecular docking application called AutoDock will undergo the same optimization and scaling techniques in order to achieve full scaling on Kraken. While Dock6 only allows the compound to be flexible, AutoDock allows malleability of both compound and protein, leading to a more natural representation of the actual process. Again, millions of compounds will be narrowed down to hundreds and then submitted to MD simulations. By comparing results from AutoDock and Dock6, Rekepalli and Peterson hope that this thorough process eliminates statistical errors and leads to a stronger subset of compounds for experimental testing.

Rekepalli and Peterson understand that not every experimentalist will have a system like Kraken at their disposal, which is why another key element of their research involves creating a database containing results from their simulations. They also plan to make the optimized versions of Dock6 and AutoDock available to any who want to use them.

While the research has only begun, Rekepalli and Peterson have made great strides in a short time; even the most epic journeys begin with one step.

About NICS: The National Institute for Computational Sciences (NICS) is a joint effort of the University of Tennessee and Oak Ridge National Laboratory that is funded by the National Science Foundation (NSF). Located on the campus of Oak Ridge National Laboratory, NICS is a major partner in NSF’s Extreme Science and Engineering Discovery Environment (XSEDE).