
Tracking down the specific genes that trigger a disease is more than just searching for individual needles in a "genetic haystack." It requires sorting through massive amounts of data to look for clusters of genes that work together to influence human traits.
To make this formidable task significantly easier, researchers draw upon the latest statistical analysis tools. This calls for the kind of services offered by the Illinois Statistics Office (ISO), which is affiliated with the University of Illinois Department of Statistics.
ISO offers its expertise to the University, government, and industry by helping to design experiments, construct survey plans, analyze data, and develop models. When tackling genetic research specifically, ISO teams up with the Keck Center for Comparative and Functional Genomics.
The need for this service is "demand-driven," says Xuming He, professor of statistics in the College of Liberal Arts and Sciences and former director of ISO. He says that statistical assistance has become especially critical in the wake of new technology, such as microarray platforms.
A microarray is a small chip that can contain more than 10,000 genes. It makes it possible to analyze gene activity much faster than with the more traditional method of gene sequencing. In addition, microarrays add a mind-boggling level of complexity because they can be used to analyze the interaction of tens of thousands of genes, not just individual genes, He says.
They call it "intensity data" for good reason. It's intense.
Because microarray data are so complex and massive, they are considerably more "noisy" than data from traditional gene sequencing, which means the numbers are more subject to error or variation. As a result, it takes statistical savvy to extract meaningful results.
"But that is just what statisticians do," He points out. "Sometimes, people expect magic, but we don't work that way. We tell people that, given the data you have, let's see how much we can separate the signal from the noise."
One of the keys to extracting meaningful numbers from a huge database is to design the experiment properly and efficiently. That is why ISO also offers its know-how to help researchers in designing their experiments.
"There's a whole area of the statistical sciences devoted to the design of experiments," He says. "How you design the experiment and how you conduct it will affect your ability later to extract the signal from the noise."
Because microarrays analyze so many genes at a time, He says there is also a greater risk of "false positives": genes mistakenly identified as influencing a particular trait. But by using the latest statistical methods, researchers can dramatically cut the rate of false positives, for example from 50 percent to only 5 percent.
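To give a rough sense of how a false-positive rate can be controlled when thousands of genes are tested at once, here is a minimal sketch of the Benjamini-Hochberg procedure, one widely used approach. The article does not say which methods ISO applies, so the choice of procedure, the function name `benjamini_hochberg`, and the example p-values are all assumptions made for illustration.

```python
def benjamini_hochberg(p_values, fdr=0.05):
    """Flag which tests remain significant at the given false discovery rate.

    Illustrative sketch only: p_values holds one p-value per gene, and the
    result is a list of booleans in the same order (True = still flagged).
    """
    m = len(p_values)
    if m == 0:
        return []

    # Rank the genes from smallest to largest p-value.
    order = sorted(range(m), key=lambda i: p_values[i])

    # Find the largest rank k whose p-value is at most (k / m) * fdr.
    cutoff_rank = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * fdr:
            cutoff_rank = rank

    # Every gene ranked at or below the cutoff is kept as a finding.
    flagged = [False] * m
    for idx in order[:cutoff_rank]:
        flagged[idx] = True
    return flagged


# Hypothetical p-values for ten genes (made up for this example).
example_p = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205, 0.212, 0.216]
naive_hits = sum(p < 0.05 for p in example_p)
adjusted_hits = sum(benjamini_hochberg(example_p, fdr=0.05))
print(naive_hits, adjusted_hits)
```

With these made-up numbers, a plain "p below 0.05" rule flags five genes, while the adjusted procedure keeps only the two with the strongest evidence. The trade-off is that a stricter rule may also discard some genuine effects.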
"Work in this area is very interdisciplinary," He adds. In addition to collaborating with the Keck Center, ISO can draw upon the expertise of the National Center for Supercomputing Applications and the Department of Computer Science.
As He puts it, when these various forces combine, "what you can do is amazing."