Computational Biology: Big Data’s Powerful Promise

Massachusetts General
By Robert Tomsho

Research laboratories at Massachusetts General Hospital are in the midst of a revolution, although it is not one that is always easy to see. From day to day, researchers still conduct experiments with traditional beakers, microscopes and test tubes. What is changing is the speed, breadth and precision of what researchers can do with such iconic lab tools.

Computational biology is the catalyst. The field uses high-level math and innovative computer technology to harness the waves of new genetic data spawned by the historic Human Genome Project, completed in 2003. The results are giving researchers new ways to approach and assess their work in the traditional lab, as well as fingertip access to vast banks of data from around the world. Still in its early stages, computational biology is helping researchers avoid blind alleys, zero in on the most promising research targets and track individual diseases back to their biological headwaters.

Medical experts have high expectations for the field’s future. In breast cancer and leukemia, for example, the technology has already helped fuel the development of targeted therapies designed to hit specific cells and treat disease with fewer side effects. Researchers also believe computational biology will help them diagnose a broad range of conditions with greater precision and tailor treatments to the specific needs of individual patients. Its impact also extends beyond research settings: Partners HealthCare recently launched a “whole genome” sequencing and interpretation service for patients at Mass General and other Partners hospitals.

The massive data processing involved requires more sophisticated technological skills than most medical researchers needed in the past. “Twenty years ago, a biological researcher would do an experiment and it would generate data that he or she could bring home in a briefcase, look at for an evening and interpret,” recalls Robert Kingston, PhD, chief of Molecular Biology at Mass General. “Now, people in my group and many groups are generating data sets that they could never process, even if they were willing to go through it by hand for a year.”

As a result, more Mass General biologists and clinicians are collaborating with physicists, engineers, mathematicians and computer scientists. In addition to looking into specific diseases, such teams are also developing innovative software tools to help other scientists mine their own data with greater precision and ease. These computer tools typically involve creating algorithms, or step-by-step mathematical instructions, which direct powerful computers to process medical data in a certain way.

“The laboratory and experimental techniques available to medical researchers are incredible,” says Mark Daly, PhD, chief of Mass General’s Analytic and Translational Genetics Unit (ATGU). “They are beyond the wildest dreams any of us had even a decade ago.”

Dr. Daly’s own Mass General lab reflects some of the changes afoot. Collectively, he and his ATGU colleagues have degrees in medicine, physics, psychiatry, genetics and statistics, among other specialties. With a particular focus on inflammatory and psychiatric disorders, they are exploring the complex genetic underpinnings of disease.

In an airy lounge on the sixth floor of the Richard B. Simches Research Center, Dr. Daly and his ATGU colleagues hold regular coffee hours with other MGH research groups to trade ideas and electronically sift through genetic data. Lately, some of those sessions have made use of xBrowse, a software tool developed by the ATGU to trace the possible genetic causes of certain rare diseases. Rather than simply trying to pinpoint a single telltale mutation in one person, xBrowse is designed to highlight suspect patterns of genetic interplay across groups of people.

One recent ATGU coffee hour included pediatric endocrinologists looking into GnRH deficiency, a disorder that can impede puberty and cause infertility. The researchers had obtained genetic samples from several generations of 17 different families. In images projected onto a lounge wall, xBrowse translated the resulting data into diagrams resembling family trees.

Individual research subjects were represented by geometric shapes, all of them color-coded to reflect genetic mutations. From his laptop, Dr. Daly was able to visually zero in on different families within the research group and track genetic changes from generation to generation. With the software, the genetic patterns found among test subjects can also be compared with those of participants in studies elsewhere.

“In order for us to make durable discoveries about what might be causing a disease or what damaging mutations in a certain gene do, we can’t simply ask that question about one patient,” Dr. Daly explains. “We need to ask it about thousands of patients.”

Gathering such a broad spectrum of genetic data presents a funding challenge. Although the cost of sequencing an individual human’s genome continues to fall, it remains in the $5,000 to $10,000 range, a financial stretch for the researchers and institutions involved. And while computational biology may help spawn a range of significant medical advances, it probably won’t single-handedly produce the sort of headline-making new treatments that quickly attract broad attention and support. “You truly have to find philanthropists with a long-term view,” Dr. Daly says.

In the meantime, the traditional laboratory, sometimes also known as a “wet” lab, isn’t going away. Computers, after all, can’t collect blood samples, observe research subjects or administer experimental drugs. At the end of the day, even a data-driven discovery process still depends upon a human researcher to use deduction, knowledge and scientific principles to form a hypothesis and conduct the experiments to test it. Historically, that tried-and-true scientific method has produced medical advances ranging from antibiotics to X-rays.

But worldwide, failures have always outnumbered successes in medical research. For every 5,000 to 10,000 compounds that enter the medical research and development process, only five make it to human trials, according to a recent study by FasterCures, a nonprofit think tank.

Among other things, computational biology aims to improve such odds by dramatically expanding the amount of information available to researchers before they finalize a hypothesis to test. The field began to gather steam in the wake of the federally funded Human Genome Project, whose first draft of the human genome was announced in 2000. Under the project, teams of scientists around the world spent more than a decade mapping the more than 20,000 genes that make up the human genome and determining the sequence of the more than 3 billion chemical bases that comprise human DNA.

The genome project provided researchers with unprecedented detail about human biology, all of it expressed in short combinations of letters and numbers. At the same time, rapid advances in technology began allowing scientists to organize and manipulate such data to make it more readily useful.

Researchers needed the extra computational firepower. For while the Human Genome Project laid out the component elements of the human genome, researchers still required systems to help them figure out and chart how those parts communicate and work together. Adding to the complexity was the fact that the sequence or order of every individual’s genome varies slightly because of small mutations. Most of those mutations are harmless. Determining which might cause disease and how means comparing data from large numbers of people to ferret out suspect patterns of genetic activity.

“You’re trying to get the computer to give you back the information in a way that says, in all of these billions of bits of information, here is something interesting that is going on,” explains Dr. Kingston, the Mass General molecular biologist. “Then, you can zoom in on it and look at the details.”

Increasingly, researchers are able to simultaneously track and measure the workings of whole biological systems instead of focusing on the role of a single gene, protein or molecule. “Some of these complex diseases may not boil down to one gene,” says Sridhar Ramaswamy, MD, Tucker Gosnell Investigator at the Mass General Cancer Center. “They may involve 50 genes or 100 genes, and it may be those genes working in some combination.”

At his Mass General lab, Dr. Ramaswamy is trying to unravel just such a mystery, one with huge stakes in terms of medical science and human suffering. The project involves an aspect of how cancer metastasizes. Specifically, Dr. Ramaswamy wants to find out how some cancer cells manage to break off from the primary tumor, travel to another part of the body and remain dormant, sometimes for years, before causing another outbreak of the disease. His goal is to determine how those sleeper cells function and find a way to keep them from reactivating, whether by killing them outright or perhaps by devising a treatment that will keep them dormant permanently.

Dr. Ramaswamy is counting on computational biology for help in investigating the sleeper cells’ biological landscape. Computers will be used to organize and distill findings from past cancer research, especially as it relates to metastasis. Combining existing data with the results of their own sampling, he and his colleagues will also use computers to search for patterns in how thousands of different genes behave, or are expressed, in primary tumors compared with metastatic tumors, the secondary tumors that develop when cancer cells travel through the bloodstream.

“You measure every gene that’s expressed in the primary tumors and every gene that’s expressed in the metastatic tumors and you compare the two profiles,” Dr. Ramaswamy explains. “You are going to find a small set of genes that have increased activity here but not there. And that would be a logical starting point for saying that those are the genes that might have caused this process to happen so let’s focus on those.”
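The comparison Dr. Ramaswamy describes can be sketched in a few lines of Python. This is a minimal illustration, not his lab’s actual pipeline: the helper functions, gene names and expression values are all hypothetical, and the simple fold-change cutoff stands in for the statistical tests and multiple-comparison corrections that real studies of thousands of genes rely on.

```python
import math
from statistics import mean


def log2_fold_change(primary, metastatic, pseudocount=1.0):
    """Log2 ratio of mean metastatic to mean primary expression.

    The pseudocount avoids dividing by zero for genes that are silent
    in one group.
    """
    return math.log2((mean(metastatic) + pseudocount) / (mean(primary) + pseudocount))


def differential_genes(primary, metastatic, min_log2_fc=1.0):
    """Return (gene, log2 fold change) pairs for genes whose mean expression
    in metastatic samples is at least 2**min_log2_fc times that in primary
    samples, strongest change first. (Hypothetical helper.)"""
    hits = []
    for gene in primary.keys() & metastatic.keys():
        fc = log2_fold_change(primary[gene], metastatic[gene])
        if fc >= min_log2_fc:
            hits.append((gene, round(fc, 2)))
    return sorted(hits, key=lambda pair: pair[1], reverse=True)


# Hypothetical expression values (arbitrary units), three samples per group.
primary = {
    "GENE_A": [10, 12, 11],
    "GENE_B": [50, 48, 52],
    "GENE_C": [5, 6, 4],
}
metastatic = {
    "GENE_A": [45, 50, 40],
    "GENE_B": [49, 51, 50],
    "GENE_C": [20, 22, 21],
}

print(differential_genes(primary, metastatic))  # [('GENE_A', 1.94), ('GENE_C', 1.87)]
```

The genes that clear the cutoff form the small “starting point” set the quote describes; GENE_B, expressed at similar levels in both groups, drops out.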

Vamsi Mootha, MD, a systems biologist at Mass General’s Department of Molecular Biology, says the hospital is in a unique position to make use of such technology. “Here at MGH, we can generate new large-scale data sets, analyze them and translate back to the patient. We have all the ingredients.”

A MacArthur “Genius” grant recipient, Dr. Mootha is trained in mathematics and medicine. He focuses on mitochondria, often termed the “powerhouses of the cell,” which are crucial for generating the cell’s energy. Dr. Mootha’s team combines large-scale biology and computation to reverse-engineer mitochondrial circuitry in disease, applying algorithms similar to those at the heart of Google’s search engine and Netflix’s recommendation system. They recently took a penetrating new look at a nugget of cancer theory that has been around since Calvin Coolidge occupied the White House.

Known as the “Warburg effect,” the theory holds that cancer cells rewire their mitochondria and consume an extraordinary amount of glucose to support rapid proliferation. For decades, scientists haven’t been able to determine whether the process is unique, critical or perhaps the key to a cure. Glucose, after all, is only one of hundreds of so-called metabolites, small molecules such as nutrients that cells consume and release as they generate energy and grow.

Dr. Mootha and his colleagues developed an innovative experimental method called CORE (short for “consumption and release”) profiling. Factoring in the time it takes cancer cells to divide, CORE allowed the researchers to measure the rates at which cells consume and release metabolites.
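One way to factor cell division into such a rate is sketched below, assuming the cells grow exponentially with a known doubling time. The function name, parameters and example numbers are hypothetical, and this illustrates the general idea rather than the exact calculation used in CORE profiling. The change in the amount of a metabolite in the culture medium is divided by the total “cell-hours” accumulated during the experiment, so a fast-dividing culture is not credited with higher per-cell consumption simply because it contains more cells by the end.

```python
import math


def per_cell_rate(conc_start, conc_end, volume_l, n0_cells, doubling_time_h, duration_h):
    """Estimate a per-cell metabolite exchange rate (hypothetical helper).

    Assumes exponential growth, N(t) = n0_cells * 2 ** (t / doubling_time_h).
    A negative result means net consumption; a positive one, net release.
    Units: (concentration unit * litres) per cell per hour.
    """
    growth_rate = math.log(2) / doubling_time_h  # per hour
    # Integral of N(t) from 0 to duration_h: total cell-hours in the culture.
    cell_hours = n0_cells * (math.exp(growth_rate * duration_h) - 1) / growth_rate
    # Net change in the amount of metabolite in the medium.
    amount_change = (conc_end - conc_start) * volume_l
    return amount_change / cell_hours


# Hypothetical experiment: 1 million cells in 2 mL of medium, doubling every
# 24 hours; the metabolite falls from 400 to 300 micromol/L over 24 hours.
rate_umol = per_cell_rate(400.0, 300.0, 0.002, 1e6, 24.0, 24.0)
rate_fmol = rate_umol * 1e9  # 1 micromol = 1e9 femtomol
print(f"{rate_fmol:.2f} fmol per cell per hour")  # -5.78 fmol per cell per hour
```

In this made-up example, dividing by the starting cell count times the duration (24 million cell-hours) would overstate consumption at about 8.3 fmol per cell per hour, while using the final count (48 million cell-hours) would understate it at about 4.2.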

The team didn’t just look at glucose or one kind of cancer. CORE permitted them to trace, measure and compare the cellular journeys of more than 200 metabolites across 60 different cancer cell lines. And their home-brewed data set was just the beginning. They compared it with the wealth of data available for download on the World Wide Web. One study, for example, measured the expression of each of 20,000 genes across the 60 cell lines. Another database reports the sensitivity of each of the 60 cell lines to 100,000 different drugs.

Dr. Mootha’s team integrated all of these data to achieve a holistic view of cancer cell metabolism. Much to their surprise, they discovered that the fastest-growing cancer cells rely on glycine, a non-essential amino acid. “This wasn’t on anyone’s radar screen,” Dr. Mootha says, “and points to a brand new property of cancer that can be targeted with existing drugs as well as with dietary interventions.”

Dr. Mootha marvels at how dramatically computation is changing the way research is done. “The Internet houses infinitely more data than a single lab could ever produce,” he says. “Big data algorithms allow us to integrate all of this data to achieve holistic views of disease biology.”
