Developer Tale

The Raw and the Cooked

Graeme Winter

Diamond Light Source, UK

Published January 24, 2014

Graeme Winter, author of the xia2 X-ray crystallography data processing software, got his start programming during a stint as an astrophysics graduate student, writing software to simulate galaxies. He left astrophysics behind, leveraging his newly minted programming skills to land a job as a programmer in crystallography at the Medical Research Council's Laboratory of Molecular Biology (LMB) in Cambridge, UK.

Crystallography stuck. It wasn't the programming that hooked him, but the mathematics. "Each step in the crystallographic process involves several different areas of mathematics, so I quite enjoyed that aspect of it," says Winter, who is now a scientist at the UK's Diamond Light Source.

The idea for xia2 (originally xia, short for Crystallographic Infrastructure for Automation) came to Winter while he was working at LMB on a graphical user interface to guide users through data processing. "I realized people often run the same sequence," he says. "Why not have a program that does it for them?"

In 2002, Winter moved to the Daresbury Laboratory, former home of the UK synchrotron, and began work on xia to automate the invocation of existing data processing programs such as XDS, CCP4, and Mosflm. "I quickly reached a dead end because I'd never designed it," he says. "I'd just started writing scripts."

He started over by taking a big step back. He picked the brains of other crystallographers and processed hundreds of data sets himself to determine the decisions involved in processing raw diffraction data. The result was xia2, this time designed as an expert system to navigate the data processing decisions. "Basically, it understands data processing just about as well as I do," he says.

This research eventually grew into a doctoral thesis: Winter earned his PhD part-time at the University of Manchester while still employed at Daresbury.

By automating what in the past was an arcane, manual process, xia2 makes these crucial early data processing decisions reproducible and traceable. "It's exceedingly hard to scientifically reproduce what someone has done if the data processing was done manually," says Winter, who has made xia2 an open source program so that others can see how it works and, if necessary, alter it.

Winter recently collaborated with colleagues at Diamond Light Source to solve the structure of a histamine receptor, a membrane protein, in complex with an antihistamine drug. The work, published in Nature in 2011, presented a data processing challenge: Winter had to stitch together data from dozens of crystals to get a sufficiently strong signal. He did the data processing manually, but is now working to encode his decisions in xia2. "We're trying to automate the process because the interesting research, like membrane or virus protein structures, involves stitching together data from multiple, possibly hundreds, of crystals," he says.

In addition, Winter is delving into the underlying data processing programs themselves. Although the existing programs work very well, new data collection technologies, such as free electron lasers and microfocus beamlines, are producing data sets with more subtle signals. "To detect those signals, we have to get closer to the raw data and do a much more complete mathematical treatment. You can't make those improvements without getting your hands dirty," says Winter, who is working on this in collaboration with a team at Diamond Light Source and groups at LMB and Lawrence Berkeley National Laboratory. "Throughout the development of xia2, I've been very lucky to have a lot of very experienced people to work with."

-- Elizabeth Dougherty