It used to be that to book a trip you'd first need to call every airline to compare flights. Then you'd need to find good hotel deals. Then you'd have to revisit the flights. And so on. The same trial-and-error approach also used to hold true for structural biology. Frequent failures made scientists all too familiar with square one.
Now, however, what Orbitz and Expedia have done for travel, Phenix has done for structural biology.
Phenix helps investigators solve X-ray crystal structures using multiple approaches, including molecular replacement and experimental phasing. It makes things easier by streamlining these multi-step procedures programmatically. Experimental data goes in, and a solution comes out.
Phenix Wizards perform the magic. For instance, the autosol program drives experimental phasing by pre-processing the experimental data, locating the substructure, phasing, improving the phases, and building a model. But instead of just running these procedures in a pipeline, the Wizard makes decisions, choosing the best solution and backtracking when it hits a wall. Phenix, says program director Paul Adams, a senior scientist at Lawrence Berkeley Laboratory, thinks like a crystallographer. In some cases, he says, “it may be able to resolve something even an expert would have difficulty with.”
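The difference between a fixed pipeline and a wizard that backtracks can be sketched in a few lines of Python. This is a toy illustration, not Phenix code: the stage functions, the `score` field, and `run_with_backtracking` are hypothetical names invented for the example, and the real Wizards make far richer decisions.

```python
from typing import Callable, Iterable, List, Optional

# Hypothetical interface: a stage takes the current state and returns zero
# or more candidate next states, each carrying a numeric "score".
Stage = Callable[[dict], Iterable[dict]]


def run_with_backtracking(stages: List[Stage], state: dict,
                          depth: int = 0) -> Optional[dict]:
    """Try the best-scoring candidate at each stage; back up on dead ends."""
    if depth == len(stages):
        return state  # every stage succeeded
    candidates = sorted(stages[depth](state),
                        key=lambda c: c["score"], reverse=True)
    for candidate in candidates:
        result = run_with_backtracking(stages, candidate, depth + 1)
        if result is not None:
            return result
    return None  # no candidate led to a solution; the caller tries its next one


# Toy stages standing in for substructure location, phasing, model building.
def locate_substructure(state: dict) -> List[dict]:
    return [
        {**state, "sites": "A", "score": 0.9},
        {**state, "sites": "B", "score": 0.6},
    ]


def phase(state: dict) -> List[dict]:
    # Toy rule: the top-scoring site set turns out to be a dead end.
    if state["sites"] == "A":
        return []
    return [{**state, "phases": "from_" + state["sites"], "score": 0.7}]


def build_model(state: dict) -> List[dict]:
    return [{**state, "model": "built", "score": 0.8}]


solution = run_with_backtracking(
    [locate_substructure, phase, build_model], {"data": "toy"}
)
print(solution)
```

In this toy run the driver first pursues site set A because it scores highest, finds that phasing fails, and backs up to site set B, which goes on to produce a model. A plain pipeline would have stopped at the first failure.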
Adams and his colleagues started developing Phenix in 2000. In the late 1990s, he had wanted to automate the many procedures required for solving a structure. At the same time, Tom Terwilliger, a bioscientist at Los Alamos National Laboratory, wrote Solve, an automated tool for solving structures using multiwavelength anomalous dispersion and multiple isomorphous replacement. “It was natural for us to work together,” says Terwilliger, who now leads one of the four main groups that contribute to Phenix.
The system is now developed and maintained by between 15 and 20 scientists. Key collaborators include Adams, who directs the National Institutes of Health grant that funds the project; Terwilliger; Randy Read, professor of hematology at the University of Cambridge and lead developer of Phaser; and Jane and David Richardson, professors of biochemistry at Duke University who contribute structure validation algorithms.
“Fifteen years ago, people had to try many different software packages before moving on to the next step. They might have many, many false starts before they solve the structure,” says Terwilliger. Today, with Phenix, it's much simpler. “For about seventy to eighty percent of cases, it's straightforward. You put in the data. You press a button. You wait while it does all these things for you.” Those more experienced with Phenix can use its array of tools to try more approaches than would be feasible manually, allowing them to solve much more difficult structures than ever before.
Computing power has also increased, making Phenix and its team of experts in both structural biology and software development all the more powerful. “When we started, we tested new methods on one or two examples because that's all that was possible,” says Adams. “Now we routinely test new methods on the whole of the protein database.” The result isn't just faster development but also more certainty that a new method works as it should.
Adams and his software team have also built a rapid development system that allows them to deliver new features across the board—in Phenix code and in sub-components developed far away, such as Phaser—literally overnight. “We do this all the time,” says Terwilliger.
Of course, not all features are built in a day, especially not those that incorporate new scientific knowledge. For instance, the team recently integrated a new method for finding a template structure to bootstrap solution of a novel structure. This new method involves a process called molecular modeling and a tool called Rosetta developed by a research team led by David Baker at the University of Washington. Molecular modeling uses the protein's amino acid sequence and the structures of proteins with related amino acid sequences to predict its structure.
“We're taking advantage of these powerful molecular modeling tools and algorithms and putting them together with ours so we can extend the range over which the whole procedure works,” says Terwilliger. The new module that supports this approach is called phenix.mr_rosetta, for molecular replacement plus Rosetta. Details appear in the May 2011 issue of Nature. In the future, Adams hopes to use SBGrid computing resources as a means to find new, unexpected templates for bootstrapping the molecular replacement approach.
The software engineering approach taken by Adams and colleagues makes advancing the tools easier. They selected object-oriented programming languages and the Python scripting language for building Phenix. These tools allow them to build modular components, such as an algorithm or a procedure, that other components can call.
As an example, the autosol wizard contains components developed by three separate groups. Terwilliger developed the density modification routines, Read developed the maximum likelihood single-wavelength anomalous diffraction phasing components, and Adams' group contributed substructure location and refinement modules. “The integration was possible because it's done in a modular way with Python,” says Adams.
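One way to picture this kind of modularity is a shared registry in which each group publishes a component under a name and a wizard strings the named components together. The sketch below is only an assumption about how such a design might look in Python. `REGISTRY`, `component`, `run_wizard`, and the toy step functions are hypothetical and do not reflect Phenix's actual internals.

```python
from typing import Callable, Dict, List

# Hypothetical registry mapping a component name to a callable that takes
# a data dictionary and returns an updated copy.
REGISTRY: Dict[str, Callable[[dict], dict]] = {}


def component(name: str):
    """Decorator that publishes a function under a name in the registry."""
    def register(func: Callable[[dict], dict]) -> Callable[[dict], dict]:
        REGISTRY[name] = func
        return func
    return register


# Each function below stands in for code contributed by a different group.
@component("substructure_location")
def find_sites(data: dict) -> dict:
    return {**data, "sites": ["site1", "site2"]}


@component("sad_phasing")
def compute_phases(data: dict) -> dict:
    return {**data, "phases": "phased from %d sites" % len(data["sites"])}


@component("density_modification")
def improve_map(data: dict) -> dict:
    return {**data, "map": "improved"}


def run_wizard(step_names: List[str], data: dict) -> dict:
    """Look up each named component and apply it in order."""
    for name in step_names:
        data = REGISTRY[name](data)
    return data


result = run_wizard(
    ["substructure_location", "sad_phasing", "density_modification"],
    {"reflections": "toy data"},
)
print(result)
```

Because the wizard only knows component names and a common calling convention, any group could swap in an improved routine without touching the others' code, which is the property Adams describes.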
The team is currently working to integrate more of the Richardsons' validation algorithms to not only automatically detect errors during the model building process, but also to automatically fix errors during structure refinement.
For more information about Phenix, please visit the Phenix website, which contains extensive documentation and email addresses for reporting bugs or getting additional help. The Phenix team travels worldwide to train researchers to use Phenix, including at the annual Cold Spring Harbor macromolecular crystallography course and the RapiData course at Brookhaven National Laboratory. “If a group has an interest in having their own workshop, we're happy to come out and do it,” says Adams.
– Elizabeth Dougherty