Despite such advances, it is the changing fortunes of
the drug industry that are pushing biology and
computing together. According to the Boston
Consulting Group, the average drug now costs
$880m to develop and takes almost 15 years to
reach the market. With the pipelines of new
drugs under development running dry, and patents
of many blockbuster drugs expiring, the best
hope that drug firms have is to improve the
way they discover and develop new products.

Paradoxically, the biggest gains are to be made from failures.
Three-quarters of the cost of developing a successful
drug goes to paying for all the failed hypotheses
and blind alleys pursued along the way. If drug
makers can kill an unpromising approach sooner,
they can significantly improve their returns.
Simple mathematics shows that reducing the number
of failures by 5% cuts the cost of discovery
by nearly a fifth. By enabling researchers to
find out sooner that their hoped-for compound
is not working out, bioinformatics can steer
them towards more promising candidates. Boston
Consulting believes bioinformatics can cut $150m
from the cost of developing a new drug and a
year off the time taken to bring it to market.
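
The "simple mathematics" can be reproduced with a minimal sketch. It rests on assumptions that are not stated in the article: every attempt costs the same whether it succeeds or fails, so the share of spending lost to failures equals the failure rate (three-quarters), and "5%" is read as five percentage points. The function name and the model itself are illustrative only.

```python
def cost_per_success(failure_rate, cost_per_attempt=1.0):
    """Expected spend per successful drug, assuming every attempt costs the same."""
    success_rate = 1.0 - failure_rate
    # On average 1 / success_rate attempts are needed for each success.
    return cost_per_attempt / success_rate


# Assumption: three-quarters of spending goes on failures, so 75% of attempts fail.
baseline = cost_per_success(0.75)
# Assumption: "reducing failures by 5%" means five percentage points (75% -> 70%).
improved = cost_per_success(0.70)

saving = 1.0 - improved / baseline
print(f"Cost per successful drug falls by {saving:.1%}")
```

Under these assumptions the saving comes to about one-sixth, in line with "nearly a fifth". A different reading of the 5% figure would give a different result.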

That has made drug companies sit up. Throughout the
1990s, they tended to use bioinformatics to
create and cull genetic data. More recently,
they have started using it to make sense of
all those data. Researchers now find themselves swamped
with data. Each time it does an experimental
run, the average microarray spits out some 50
megabytes of data—all of which has to
be stored, managed and made available to researchers.
Today, firms such as Millennium Pharmaceuticals
of Cambridge, Massachusetts, screen hundreds
of thousands of compounds each week, producing
terabytes of data annually.

The data themselves pose a number of tricky problems.
For one thing, most bioinformatics files are
“flat”, meaning they are largely
text-based and intended for browsing by eye.
Meanwhile, sets of data from different bioinformatics
sources are often in different formats, making
it harder to integrate and mine them than in
other industries, such as engineering or finance,
where formal standards for exchanging data exist.

More troubling still, a growing proportion of the
data is proving inaccurate or even false. A
drug firm culls genomic and chemical data from
countless sources, both inside and outside the
company. It may have significant control over
the data produced in its own laboratories, but
little over data garnered from university research
and other sources. Like any other piece of experimental
equipment, the microarrays themselves have varying
degrees of accuracy built into them. “What
people are finding is that the tools are getting
better but the data itself is no good,”
says Peter Loupos of Aventis, a French drug
firm based in Strasbourg.

To help solve this problem, drug firms, computer
makers and research organisations have organised
a standards body called the Interoperable Informatics
Infrastructure Consortium. Their Life Science
Identifier, released in mid-2002, defines a
simple convention for identifying and accessing
biological data stored in multiple formats.
Meanwhile, the Distributed Annotation System,
a standard for describing genome annotation
across sources, is gaining popularity. This
is making it easier to compare different groups'
genome data.
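
To give a flavour of what such an identifier convention looks like, here is a rough sketch of a parser. It assumes the URN layout commonly documented for Life Science Identifiers, `urn:lsid:authority:namespace:object`, with an optional trailing revision. The article does not spell out the format, and the identifier used below is made up for illustration.

```python
from typing import NamedTuple, Optional


class LSID(NamedTuple):
    authority: str           # organisation that issued the identifier
    namespace: str           # collection within that authority
    object_id: str           # the record itself
    revision: Optional[str]  # optional version of the record


def parse_lsid(text: str) -> LSID:
    """Split an identifier of the assumed form urn:lsid:authority:namespace:object[:revision]."""
    parts = text.split(":")
    if len(parts) not in (5, 6):
        raise ValueError(f"not an LSID: {text!r}")
    if parts[0].lower() != "urn" or parts[1].lower() != "lsid":
        raise ValueError(f"not an LSID: {text!r}")
    if any(part == "" for part in parts[2:5]):
        raise ValueError(f"empty component in LSID: {text!r}")
    revision = parts[5] if len(parts) == 6 else None
    return LSID(parts[2], parts[3], parts[4], revision)


# Hypothetical identifier, not a real database record.
example = parse_lsid("urn:lsid:example.org:proteins:P12345:2")
print(example.authority, example.namespace, example.object_id, example.revision)
```

The point of the convention is that the same short string names a record regardless of the file format in which a given source happens to store it.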