-- Creation of an integrated system of databases of biological knowledge. The system should include at least databases of 1. sequences, 2. sequence- and structure-related classifications, 3. metabolic pathways, 4. mutation "hot spots", 5. generalized, site-specific and illegitimate recombination "hot spots" (germ-line recombinations should probably be distinguished from somatic-cell recombinations), 6. properties that allow taxonomic classifications to be defined and redefined, 7. ecosystems and the species occupying them, 8. published and unpublished research pertaining to all of biology, 9. proteins and their putative functions (multiple definitions of "function" included), together with 10. an inference and reformatting tool that allows navigation across the different databases while queries are formulated and reformulated.
-- Integrated (but truly RELIABLE) sequence research software. One of the first problems to be addressed is the UNAMBIGUOUS identification of putative functional domains in nucleotide and protein sequences (the existing ad hoc software is painfully insufficient for domain identification).
-- Logic and methodology of computational experiments. The logical problem in biology is the need to answer relatively simple queries while having a tremendous amount of data to consider. The development of computer-assisted data evaluation and inference tools is INDISPENSABLE for biology. No research has been done in this area so far, although some methods of computational experimentation already exist.
-- Scene simulation games. The formulation of hypotheses and research questions will be assisted by simulating a mechanistic scenario that takes into account all available data and indicates which kinds of data are not available (i.e., need to be supplied by additional research work).
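
As an illustration of the last item, the idea of a scenario that reports which data it lacks can be sketched in a few lines of Python. This is only a minimal sketch under stated assumptions, not a proposed design: the `Step` class, the `evaluate_scenario` function, and the example step and data names are all hypothetical and serve only to show the bookkeeping involved.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List, Set, Tuple


@dataclass(frozen=True)
class Step:
    """One step of a hypothetical mechanistic scenario."""
    name: str
    requires: Tuple[str, ...]  # kinds of data the step depends on


def evaluate_scenario(
    steps: Iterable[Step], available: Set[str]
) -> Tuple[List[str], Dict[str, List[str]]]:
    """Split steps into those fully supported by the available data
    and those with gaps, listing the missing data for each of the latter."""
    supported: List[str] = []
    missing: Dict[str, List[str]] = {}
    for step in steps:
        gaps = [item for item in step.requires if item not in available]
        if gaps:
            missing[step.name] = gaps
        else:
            supported.append(step.name)
    return supported, missing


# Hypothetical scenario and data inventory (illustrative names only).
scenario = [
    Step("enzyme binds substrate", ("protein sequence", "binding-site domain")),
    Step("product enters pathway", ("metabolic pathway map",)),
    Step("mutation alters binding", ("mutation hot spots", "binding-site domain")),
]
available_data = {"protein sequence", "metabolic pathway map", "mutation hot spots"}

supported, missing = evaluate_scenario(scenario, available_data)
print("Supported steps:", supported)
for step_name, gaps in missing.items():
    print(f"Step '{step_name}' needs additional data: {', '.join(gaps)}")
```

In this toy inventory, two of the three steps cannot be evaluated because no domain annotation is available, which is the kind of gap the simulation is meant to point the researcher toward.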