A Brief History of CCP4, One of Structural Biology’s Original Software Tools
Sit down in front of a newly installed copy of CCP4 today, and you will find approximately 250 computer programs for solving protein structures. The list of programs includes several with catchy names, such as beast (for molecular replacement), dimple (for ligand identification in difference maps), crank (for experimental phasing) and buccaneer (for model building), and some cryptic, such as seqwt and npo. Nearly two dozen applications support file manipulations and format conversions. Still more are riders-on, either deprecated or unsupported.
The seeming mishmash is so by design. "CCP4 has always been a very loose collaboration," says Phil Evans, a structural biologist at the Medical Research Council Laboratory of Molecular Biology in Cambridge, UK, and one of the earliest members of the team that created CCP4 in 1979.
CCP4 stands for Collaborative Computing Project Number 4, the fourth effort in a series of 14 projects originally funded in the late 1970s by the UK's Science Research Council. The CCP projects were designed to support collaboration between researchers developing software for science and to provide shared access to expensive hardware. Today, CCP4 has around 2 million lines of code, dozens of active contributors and thousands of users worldwide, including expert crystallographers, academic researchers interested in solving structures, and scientists doing drug development at pharmaceutical companies.
Out of Necessity, Invention
CCP4 got its start in the 1970s when a number of new macromolecular crystallography groups spawned from the pioneering labs at the Laboratory of Molecular Biology at Cambridge and the biophysics department at Oxford. These two labs had developed a tradition of researchers developing methods, writing programs and sharing code.
But as the number of labs increased, the crystallographers became more far-flung. Being so far apart, they were missing out on collaboration, code sharing, and corrections. "You need someone who'll shout at you and say look, that's a stupid result, you must've done something wrong," says Eleanor Dodson, an early CCP4 member who moved from Oxford to the University of York in 1976.
When the late David Blow, a pioneer of protein X-ray crystallography, moved from Cambridge to London, he found scant computer facilities. He and Sir Tom Blundell, also head of a London lab, realized the crystallography community needed to find a new way to work together. They followed the lead of other groups and established a crystallography CCP.
This initiative couldn't have been more timely. The original CCP4 grant covered the cost of one person to coordinate program sharing between the crystallographers. "People were solving structures, so there were a lot of programs around, but they often didn't talk to each other very well," says Evans. Early discussions involved agreeing on common data formats.
From these talks emerged the philosophy that still underlies CCP4 today. Rather than integrate all of the programs into one tightly coupled system, CCP4 is a collection of free-standing programs that share data through files with standard formats.
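The design described above can be illustrated with a small sketch. It is not CCP4 code: the two "programs" are hypothetical Python functions, and JSON stands in for an agreed file format purely for illustration. The point is only that each step reads a file, writes a file, and knows nothing else about its neighbors.

```python
import json
import tempfile
from pathlib import Path


def scale_program(in_path, out_path, factor=2.0):
    """Hypothetical stand-alone step: read records, scale intensities, write a new file."""
    records = json.loads(Path(in_path).read_text())
    for record in records:
        record["intensity"] *= factor
    Path(out_path).write_text(json.dumps(records))


def filter_program(in_path, out_path, cutoff=5.0):
    """Hypothetical stand-alone step: keep only records at or above the cutoff."""
    records = json.loads(Path(in_path).read_text())
    kept = [record for record in records if record["intensity"] >= cutoff]
    Path(out_path).write_text(json.dumps(kept))


def run_demo():
    """Chain the two steps; the only thing they share is the file format."""
    with tempfile.TemporaryDirectory() as tmp:
        tmp = Path(tmp)
        raw, scaled, final = tmp / "raw.json", tmp / "scaled.json", tmp / "final.json"
        # Made-up input data for the illustration.
        raw.write_text(json.dumps([
            {"hkl": [1, 0, 0], "intensity": 1.5},
            {"hkl": [0, 1, 0], "intensity": 4.0},
        ]))
        scale_program(raw, scaled)
        filter_program(scaled, final)
        return json.loads(final.read_text())


demo_result = run_demo()
```

Because each step depends only on the file format, either one could be swapped for a program written by a different group without touching the other, which is the trade-off the loose collaboration relies on.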
A Structure Emerges
CCP4 is now supported by the Biotechnology and Biological Sciences Research Council (BBSRC). Its aim has not changed, though its purview has widened. Crystallographers from around the world now contribute their programs to CCP4, such as AMoRe from a developer in France and Crank from the Netherlands.
New contributions are reviewed by a panel of academics representing CCP4 users and assessed for technical suitability by the Core Team, which may suggest modifications. "We have become a little bit picky," says Eugene Krissinel, who is CCP4 Core Team Leader and responsible for maintenance and support. "Managing a collection of 250 programs is quite a chore."
When a submission falls through, which happens to about 1 in 10 programs, it typically happens during license negotiations. Though CCP4 only requests non-exclusive rights to distribute, governmental and institutional regulations sometimes raise insurmountable barriers.
CCP4 also has an Executive Committee, a group of twelve researchers who oversee the work of the Core Team, assess progress and provide direction. The CCP4 Core Team operates out of the Science and Technology Facilities Council Rutherford Appleton Laboratory in the UK. The seven-person team manages infrastructure, packaging, maintenance and support.
Recently, a group at CCP4 headquarters launched CCPEM, an independent initiative that will focus on electron microscopy. Free electron laser technology has also captured the group's attention, but an initiative has not yet begun. CCPN, for Nuclear Magnetic Resonance technology, is run independently out of Cambridge University.
More and more often, crystallography is being done not by people who are crystallographers but rather by biochemists, molecular biologists and medicinal chemists who want to solve biological problems. Recognizing this trend, the Executive Committee has prioritized automation and validation to assist those who are not expert crystallographers.
Given this strategic direction, the CCP4 Core Team and funded developers at the University of York are putting a major effort into updating CCP4's aging user interface, which was developed a decade ago. The new interface aims to improve appearance and usability, reduce user intervention, increase automation and improve validation. "Ultimately, the idea is that you've got some data, you press a button, and a structure spits out the other end," says Evans. "The programs will decide what's best and also say, hmmm, something's wrong here."
Even though the process of solving structures is mature — "if you can get a good crystal, we can solve it," says Evans — getting a good crystal is often difficult. "There are pathological cases. Things can go horribly wrong."
Still, users with bad data want to answer their biological questions. This puts pressure on developers to constantly retool their programs to squeeze more meaning from bad data. "This work is always going on in the background," says Krissinel. "It is the essence of CCP4."
Education has also always been a CCP4 priority. The CCP4 community is there to help both new and experienced users with an online forum, workshops held throughout the world, and a Study Weekend held in the UK each January where crystallographers gather to exchange ideas and learn new ways of doing things. "People in our field love methodology," says Dodson. "They really appreciate where things can be improved and like to improve it."
Published December 12, 2012
Schrödinger makes science fun for structural biologists
Schrödinger, a little like a German car, has good looks and power under the hood. The 3D exterior is powered by Maestro, the primary molecular visualization interface in the Schrödinger Suite that integrates all of the other computational tools. It even supports 3D monitors and glasses that embed the user in a 3D viewing experience.
Overkill? Not so, says Woody Sherman, VP of application science at Schrödinger. Aha moments come when scientists view structures in these 3D renderings. “The most important moments come from combining graphics with calculations,” he says. “We gain intuition from looking at a structure, but the calculations shed light on the underlying physics. What's the balance between entropy and enthalpy? What hydrogen bonds are important? The calculations tell us how the physics is driving the properties of the system.”
In addition to Schrödinger's Maestro front end, which is licensed to academic users free of charge, there are many computational back-ends, including applications such as Jaguar (quantum mechanics), Glide (ligand docking), Desmond (molecular dynamics), PrimeX (crystal structure refinement) and nearly two dozen others. It is these tools that help scientists search a library of compounds to find a drug candidate or determine how a candidate molecule can be optimized to interact better with a target.
“We've invested heavily in Maestro's graphical modeling capabilities, but our primary software efforts are in the back-end,” says Sherman. “We want to help users to be the best modelers possible using Maestro so they can answer their scientific questions using the computational back-end.”
Depending on the goal, the back-end may run on a desktop or it may need to be run on a cluster of hundreds of processors. For academics who are becoming entrepreneurs, the shift from using Schrödinger as a research tool to using it for drug discovery can be daunting, so Schrödinger created a partnership program to build entrepreneurial collaborations.
Collaboration with Schrödinger is also possible through the code itself. Most of Schrödinger's features are available via a Python API, and the Schrödinger development team has provided access through Maestro to many independent programs that have Python interfaces themselves. In addition, many of Schrödinger's computational algorithms come from contributing scientists who are members of the Schrödinger Scientific Advisory Board.
The May 2011 release of Schrödinger includes general-purpose graphics processing unit (GPGPU) support for the Core Hopping program. The GPGPU-enabled program combined with algorithmic enhancements allows for the processing of 1017 cores per hour. The new release also supports enhanced sampling methods for macrocycles, an improved physics-based energy model, and a structure-based approach to P450 site of metabolism prediction. For more information on the Schrödinger Suite, entrepreneurial partnerships, and code integration, please visit the Schrödinger website.
– Elizabeth Dougherty
Published June 17, 2011
Harnessing the power of modern software engineering and computing power for structural biology
It used to be that to book a trip you'd first need to call every airline to compare flights. Then you'd need to find good hotel deals. Then you'd have to revisit the flights. And so on. The same trial and error approach also used to hold true for structural biology. Frequent failures made scientists all too familiar with square one.
Now, however, what Orbitz and Expedia have done for travel, Phenix has done for structural biology.
Phenix helps investigators solve X-ray crystal structures using multiple approaches, including molecular replacement and experimental phasing. It makes things easier by streamlining these multi-step procedures programmatically. Experimental data goes in, and a solution comes out.
Phenix Wizards perform the magic. For instance, the autosol program drives experimental phasing by pre-processing experimental data, doing substructure location, actual phasing, phase improvement, and model building. But instead of just running these procedures in a pipeline, the Wizard makes decisions, choosing the best solution and backtracking when it hits a wall. Phenix, says program director Paul Adams, a senior scientist at Lawrence Berkeley Laboratory, thinks like a crystallographer. In some cases, he says, “it may be able to resolve something even an expert would have difficulty with.”
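The difference between a fixed pipeline and a decision-making one can be sketched in a few lines of Python. This is a toy illustration only, not Phenix code: the function names, the scores and the `min_score` cutoff are all hypothetical. Each stage offers several strategies, the best-scoring result is tried first, and if a later stage hits a wall the search backtracks to the next-best earlier choice.

```python
from typing import Callable, List, Optional, Tuple

# A strategy takes the current state and returns (score, new_state), or None on failure.
Strategy = Callable[[dict], Optional[Tuple[float, dict]]]


def run_wizard(stages: List[List[Strategy]], state: dict,
               min_score: float = 0.5) -> Optional[dict]:
    """Depth-first search over stages, trying the best-scoring option first."""
    if not stages:
        return state  # every stage succeeded
    outcomes = []
    for strategy in stages[0]:
        outcome = strategy(state)
        if outcome is not None and outcome[0] >= min_score:
            outcomes.append(outcome)
    outcomes.sort(key=lambda item: item[0], reverse=True)
    for _score, new_state in outcomes:
        final = run_wizard(stages[1:], new_state, min_score)
        if final is not None:
            return final
    return None  # nothing worked here, so the caller backtracks


# Hypothetical strategies: option A scores higher at first but fails downstream.
def find_sites_a(state):
    return 0.9, {**state, "sites": "A"}


def find_sites_b(state):
    return 0.7, {**state, "sites": "B"}


def phase(state):
    if state["sites"] != "B":
        return None  # simulated dead end
    return 0.8, {**state, "phased": True}


wizard_result = run_wizard([[find_sites_a, find_sites_b], [phase]], {})
```

In this toy run the search first follows option A, fails at the phasing step, returns to the first stage and succeeds with option B. A plain pipeline would have stopped at the first failure.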
Adams and his colleagues started developing Phenix in 2000. In the late 1990s, he had wanted to automate the many procedures required for solving a structure. At the same time, Tom Terwilliger, a bioscientist at Los Alamos National Laboratory, wrote Solve, an automated tool for solving structures using Multiwavelength Anomalous Dispersion and Multiple Isomorphous Replacement. “It was natural for us to work together,” says Terwilliger, who now leads one of the four main groups that contribute to Phenix.
The system is now developed and maintained by between 15 and 20 scientists. Key collaborators include Adams, who directs the National Institutes of Health grant that funds the project, Terwilliger, Randy Read, professor of hematology at the University of Cambridge and lead developer of Phaser, and Jane and David Richardson, professors of biochemistry at Duke University who contribute structure validation algorithms.
“Fifteen years ago, people had to try many different software packages before moving on to the next step. They might have many, many false starts before they solve the structure,” says Terwilliger. Today, with Phenix, it's much simpler. “For about seventy to eighty percent of cases, it's straightforward. You put in the data. You press a button. You wait while it does all these things for you.” Those more experienced with Phenix can use its array of tools to try more approaches than feasible manually, allowing them to solve much more difficult structures than ever before.
Computing power has also increased, making Phenix and its team of experts in both structural biology and software development all the more powerful. “When we started, we tested new methods on one or two examples because that's all that was possible,” says Adams. “Now we routinely test new methods on the whole of the protein database.” The result isn't just faster development but also more certainty that a new method works as it should.
Adams and his software team have also built a rapid development system that allows them to deliver new features across the board—in Phenix code and in sub-components developed far away, such as Phaser—literally overnight. “We do this all the time,” says Terwilliger.
Of course, not all features are built in a day, especially not those that incorporate new scientific knowledge. For instance, the team recently integrated a new method for finding a template structure to bootstrap solution of a novel structure. This new method involves a process called molecular modeling and a tool called Rosetta developed by a research team led by David Baker at the University of Washington. Molecular modeling uses the protein's amino acid sequence and the structures of proteins with related amino acid sequences to predict its structure.
“We're taking advantage of these powerful molecular modeling tools and algorithms and putting them together with ours so we can extend the range over which the whole procedure works,” says Terwilliger. The new module that supports this approach is called phenix.mr_rosetta, for molecular replacement plus Rosetta. Details appear in the May 2011 issue of Nature. In the future, Adams hopes to use SBGrid computing resources as a means to find new, unexpected templates for bootstrapping the molecular replacement approach.
The software engineering approach taken by Adams and colleagues makes advancing the tools easier. They selected object-oriented programming languages and the Python scripting language for building Phenix. These tools allow them to build modular components, such as an algorithm or a procedure, that can be accessed by other modular components.
As an example, the autosol wizard contains components developed by three separate groups. Terwilliger developed the density modification routines, Read developed the maximum likelihood single anomalous diffraction phasing components, and Adams' group contributed substructure location and refinement modules. “The integration was possible because it's done in a modular way with Python,” says Adams.
The team is currently working to integrate more of the Richardsons' validation algorithms to not only automatically detect errors during the model building process, but also to automatically fix errors during structure refinement.
For more information about Phenix, please visit the Phenix website, which contains extensive documentation and email addresses for reporting bugs or getting additional help. The Phenix team travels worldwide to train researchers how to use Phenix, including at the annual Cold Spring Harbor macromolecular crystallography course and the RapiData course at Brookhaven National Laboratory. “If a group has an interest in having their own workshop, we're happy to come out and do it,” says Adams.
– Elizabeth Dougherty
Published June 2, 2011
Putting the maximum likelihood expertise of a few into the hands of many structural biologists
Beta-lactamase disarms penicillin, breaking it down before it can do its antibacterial work. But the beta-lactamase inhibitor protein, BLIP, interferes, paving the way for penicillin to do its work.
Exactly how is no longer a mystery. The complex of beta-lactamase and BLIP was solved, painfully, long ago. “It took Natalie Strynadka”—now at the University of British Columbia—“a couple of years to solve,” says Randy Read, professor of hematology at the University of Cambridge and lead developer of Phaser, a structural biology software tool. “Today it's something that can be solved by one of Phaser's tutorials in five minutes.”
One problem with the old, painful methods was signal-to-noise. Researchers had already solved the structures of both molecules and they could find the beta-lactamase in the crystal of the complex. But that protein, being twice as large as its inhibitor, overshadowed the tiny BLIP.
Phaser, however, uses probabilistic methods to keep track of what the beta-lactamase component has already explained in the data. “All we have to do is explain what's unexplained by orienting the BLIP component,” says Read. “It increases the signal-to-noise ratio considerably.”
This concept underlies Phaser, which integrates with both CCP4 and Phenix, and uses statistical methods to select and place models for solving structures by molecular replacement. These methods are more sophisticated than a black-and-white acceptance or rejection of models according to whether they agree well with the data. “We're able to deal in a nice, smooth way with models of different quality,” says Read. As a result, Phaser allows the solution of structures with poorer models than possible with older technology.
Phaser was recently put to work in an SBGrid project that employed its grid computing resource to search the protein database and try thousands of different models to find the best possible starting point. This procedure, published in 2010 in the Proceedings of the National Academy of Sciences, was effective but slow. “It required that the full set of calculations be run on every possible choice in the database even if the first choice gave a clear solution,” says Read. The latest version of Phaser recognizes the first clear solution immediately.
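The saving Read describes comes from the general idea of stopping a search early, which a short Python sketch can show. It does not reflect how Phaser scores models or what it counts as a clear solution: the model names, the scores and the `clear_threshold` value below are all made up for the illustration.

```python
def exhaustive_search(models, score):
    """Score every candidate, then pick the best; always evaluates all of them."""
    scored = [(score(model), model) for model in models]
    _best_score, best_model = max(scored, key=lambda item: item[0])
    return best_model, len(scored)


def early_stop_search(models, score, clear_threshold):
    """Stop as soon as one candidate scores at or above the threshold."""
    best_model, best_score, evaluated = None, float("-inf"), 0
    for model in models:
        current = score(model)
        evaluated += 1
        if current > best_score:
            best_model, best_score = model, current
        if current >= clear_threshold:
            break  # a clear solution; skip the remaining candidates
    return best_model, evaluated


# Hypothetical scores for four candidate models (higher is better).
toy_scores = {"model_a": 3.1, "model_b": 9.4, "model_c": 4.0, "model_d": 5.2}

full_result = exhaustive_search(list(toy_scores), toy_scores.get)
quick_result = early_stop_search(list(toy_scores), toy_scores.get, clear_threshold=8.0)
```

Here both searches settle on the same model, but the early-stopping version evaluates two candidates instead of four. The trade-off is that a still better candidate later in the list would go unseen, so the approach only makes sense when a score above the threshold is decisive.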
There are many approaches to using Phaser and many possible stumbling blocks. For this reason, Read maintains the Phaser Wiki (http://www.phaser.cimr.cam.ac.uk/index.php/Phaser_Crystallographic_Software), which includes a FAQ, troubleshooting information, and a section where users can share their own success stories and strategies for using Phaser.
As an example, Read himself recently used Phaser in a creative way to solve the structure of human angiotensinogen, a protein involved in regulating blood pressure. The human angiotensinogen electron density maps were not good enough to finish a complete model. “We got hold of rat and mouse angiotensinogens and developed some new tricks to use the density from one crystal to solve the structure in another crystal,” says Read. The process, which required sophisticated knowledge of Phaser, is something Read is folding into the program for anyone to use automatically.
This is the beauty of technologies as they mature. “The expertise gets concentrated in the instruments and in the people who make programs for the instruments,” says Read. “That frees up the people who use those techniques to become sophisticated in something else.”
– Elizabeth Dougherty
Published May 19, 2011