
Some like it hot—very, very hot. Such creatures include the archaea, microbes that have been described as among the strangest forms of life on the planet.
University of Illinois researchers are studying archaea that live near volcanoes in Naples, Italy, trying to unlock the mystery of how some of these microbes can live at temperatures close to 180°F and how they can devour sulfur.
While the biologists tackle the organism itself, Wenxuan Zhong wades into an environment that many people find as formidable and as frightening as a superheated volcanic field—the extreme world of statistics. She is working with biologists to make sense out of the data being generated in the study of archaea, as well as in many other projects across the campus and country.
“Advances in science and technology in the past few decades have led to an explosive growth of high-dimensional data across a variety of areas, such as genetics, molecular biology, cognitive sciences, astrophysics, finance, and Internet commerce,” says Zhong, U of I professor of statistics. “My job is to find out what the data tell us.”
Her task is to pull vital information out of massive data sets, such as which “transcription factors” regulate particular genes. In the case of the archaea, Zhong and the Illinois team are pinpointing transcription factors that regulate genes that give these microbes the ability to survive in extreme environments and consume sulfur. This information may someday be used in efforts to control air pollution or purify petroleum.
Zhong is also performing her statistical magic in a project in which Illinois microbiologists are looking for transcription factors that regulate genes that trigger Alzheimer’s. And in another study, she is working with researchers at Harvard to identify marker genes that could give an early warning of a bioterrorist attack.
Whatever set of data she is grappling with, her method is unique in that it is “model-free.” The traditional method is to apply a model to a large data set to extract information, but a model makes certain assumptions that may or may not be accurate, she says.
As she puts it, “I don’t put on these constraints. I let the data speak.”
By letting the data speak, she is also helping electronic sensors smell. With her assistance, the opto-electronic nose developed by U of I chemistry professor Kenneth Suslick has seen an 18 percent increase in identification accuracy over the traditional data analysis method in some special simulated examples.
Suslick’s system features a sensor array with 36 dots of different-colored dyes. When a particular toxic gas hits the sensor, it changes the colors of some of the dyes, creating a unique pattern or fingerprint of colors. This fingerprint pinpoints which gas the sensor has been exposed to, creating a warning system similar to the radiation badges that physicists wear to alert them to dangerous exposure.
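For readers who want to see the fingerprint idea in concrete terms, here is a minimal sketch in Python. It is not Suslick's software or Zhong's statistical method. It assumes, purely for illustration, that each of the 36 dots yields a single color-change number and that a reading is matched to whichever stored fingerprint lies closest to it. The gas names, the fingerprint values, and the `identify_gas` function are all hypothetical.

```python
import math

N_DOTS = 36  # the sensor array described above has 36 dye dots


def distance(a, b):
    """Euclidean distance between two equal-length color-change patterns."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def identify_gas(reading, library):
    """Return the name of the stored fingerprint closest to the reading."""
    return min(library, key=lambda name: distance(reading, library[name]))


# Hypothetical reference fingerprints: one color-change value per dot.
library = {
    "gas_a": [1.0 if i < 12 else 0.0 for i in range(N_DOTS)],
    "gas_b": [1.0 if 12 <= i < 24 else 0.0 for i in range(N_DOTS)],
    "gas_c": [1.0 if i >= 24 else 0.0 for i in range(N_DOTS)],
}

# A made-up reading: the gas_b pattern plus a small uniform offset,
# standing in for mild interference from the environment.
reading = [value + 0.1 for value in library["gas_b"]]

print(identify_gas(reading, library))  # prints "gas_b"
```

Simple nearest-match lookup of this kind works when interference is small. It is the sort of baseline that can break down once the background grows stronger or less uniform.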
In the real world, however, various environmental factors, such as humidity or the presence of other smells—even innocuous odors from cooking—can interfere with the system. But Zhong has come up with a new statistical classification system that cuts through this “background noise” and improves the sensor’s ability to classify gases correctly.