Startup Idea: Drug Discovery De-replication Software
I work in the pharmaceutical industry, and my work focuses on the very earliest stage of drug development. We call this stage discovery. From the 1960s through the 1990s, all of the big pharmaceutical companies had expansive drug discovery programs whose goal was to find new molecules in nature with therapeutic potential. This literally involved people digging in the dirt, but it also involved people going to remote rain forests and jungles to harvest exotic plants and microbes.
From the 1990s until recently, many pharmaceutical companies opted for something called "combi-chem", or combinatorial chemistry: they had massive libraries of thousands of compounds synthesized and tested them in various assays to determine whether these new molecules had potential therapeutic properties. Combi-chem has largely run its course and yielded very little.
Now the focus has shifted back to identifying molecules from natural environments. The rationale is two-fold. First, modern analytical chemistry techniques, such as liquid chromatography-mass spectrometry (LC/MS), have undergone dramatic improvements in speed, quality, and sensitivity.
Second, computational power has increased enormously as well, as we are all well aware. So researchers are returning to harvesting exotic soils, plants, and microbes in an attempt to find new therapeutic molecules. The problem is that we researchers now have too much data, and it is difficult to compare new samples to old samples effectively.
Our goal is to limit the number of times we "discover" a molecule that was already discovered 1, 2, 5, 10, or 20 years ago. This process of unraveling the chemical information of these molecules and identifying everything that is in our samples is called "de-replication".
The problem is that the data from every sample we collect is a mixture of three things: true/pure chemical signatures from single molecules, heterogeneous chemical information from more than one molecule (this is confounded and often discarded), and instrument background noise (think static from a radio).
If we had software that sped up the identification of previously discovered molecules (and thus sped up de-replication), we could focus more time and effort on characterizing the truly new molecules.
Software to accomplish this would be worth millions of dollars to the whole industry. I think software that works much like plagiarism-checking software, but aimed at molecules instead of text, would do the trick.
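To make the plagiarism-checker analogy a little more concrete, here is a minimal sketch of how a new sample's mass spectrum might be compared against a library of previously seen spectra. Everything in it is an assumption for illustration, not a description of any existing product: spectra are taken to be simple lists of (m/z, intensity) peaks, peaks are grouped into unit-width m/z bins, anything below 1% of the strongest peak is treated as background noise, and a cosine similarity of 0.8 or more counts as a likely re-discovery. The function names (`bin_spectrum`, `cosine_similarity`, `dereplicate`) are hypothetical.

```python
import math
from collections import defaultdict


def bin_spectrum(peaks, bin_width=1.0, noise_floor=0.01):
    """Group (m/z, intensity) peaks into fixed-width m/z bins.

    Peaks weaker than noise_floor * (strongest peak) are dropped as
    background noise. bin_width and noise_floor are illustrative
    assumptions, not tuned values.
    """
    if not peaks:
        return {}
    base_peak = max(intensity for _, intensity in peaks)
    binned = defaultdict(float)
    for mz, intensity in peaks:
        if intensity >= noise_floor * base_peak:
            binned[round(mz / bin_width)] += intensity
    return dict(binned)


def cosine_similarity(a, b):
    """Cosine similarity between two binned spectra (dicts of bin -> intensity)."""
    dot = sum(value * b.get(key, 0.0) for key, value in a.items())
    norm_a = math.sqrt(sum(value * value for value in a.values()))
    norm_b = math.sqrt(sum(value * value for value in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def dereplicate(sample_peaks, library, threshold=0.8):
    """Return (name, score) for library spectra that resemble the sample.

    library maps a compound name to its list of (m/z, intensity) peaks.
    The threshold is an assumed cutoff; results are sorted best match first.
    """
    sample = bin_spectrum(sample_peaks)
    hits = []
    for name, peaks in library.items():
        score = cosine_similarity(sample, bin_spectrum(peaks))
        if score >= threshold:
            hits.append((name, score))
    return sorted(hits, key=lambda hit: hit[1], reverse=True)


if __name__ == "__main__":
    # Made-up peak lists, for illustration only.
    library = {
        "known_compound_A": [(100.0, 50.0), (150.1, 100.0), (200.2, 30.0)],
        "known_compound_B": [(120.0, 80.0), (310.3, 100.0)],
    }
    new_sample = [(100.02, 48.0), (150.08, 100.0), (200.21, 33.0), (415.0, 0.5)]
    for name, score in dereplicate(new_sample, library):
        print(f"{name}: similarity {score:.3f}")
```

A sample with no hits above the threshold would be a candidate for a truly new molecule. The hard part, which this sketch ignores, is the mixed signal from more than one molecule; a real tool would need to separate those signals before matching.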