GeneFlow
What is GeneFlow?
GeneFlow is a workflow system for automating many aspects of
genome data analysis in a collaborative research
environment. These data may consist of
genomic DNA sequence, physical mapping hybridization data,
gene expression profiling data, etc. Because data generation and
analysis may take place at multiple facilities, the system is
designed to use the Web to tie these activities together.
The system is currently
being developed with the immediate goal of providing an
informatics infrastructure for the Neurospora crassa and
Pneumocystis carinii genome projects. The system is
evolving rapidly, so this page may be updated frequently to reflect
the current system status as well as the design of components to
be developed in the future.
What does GeneFlow consist of?
GeneFlow consists of a number of software modules, each of which
carries out a particular function or set of functions. These modules are
being developed in the Java and Perl programming languages. The
modules are designed to be integrated into workflows using the
METEOR workflow management system.
Using METEOR, workflows can be created visually with a graphical
"drag and drop" utility. This makes it easy for non-programmers to
create and modify workflows once the application modules have been
developed. Additionally, METEOR is fully web-enabled. Users of
GeneFlow will be able to access the different components of the
application via a browser.
Availability of Software
The application modules will be made freely available for download.
Use of the METEOR workflow management system requires a license
from Infocosm. Alternatively,
workflow applications can be built from the GeneFlow software modules
by hand, by writing scripts.
Demonstration
The following image is a screenshot of the METEOR graphical workflow
building utility showing a number of GeneFlow modules linked together
in a workflow.

Description of Icons in Above Figure
- SETUP
Searches special directories for new cosmid or EST sequence, then
launches instances of the workflow accordingly.
- ASSEMBLE_X
Assembles new sequence reads using Phred and Phrap.
Runs Consed in autofinish mode to design new primers for
sequence finishing. ASSEMBLE_1 and ASSEMBLE_2 operate at different
institutions. As more institutions participate in the sequencing,
this module could be cloned and installed at these new institutions.
- CLUSTER_EXT
Clusters EST data.
- ANALYZE
Runs a plethora of sequence analysis applications on new data,
including BLAST, FASTA, PrositeSearch, etc.
- ANNOTATE
Allows the user to add annotation to a sequence based on analysis data.
- POST_PRIMERS
Primer sequences created in ASSEMBLE_X are put on the web for
the oligonucleotide synthesis personnel to retrieve.
- SUBMIT_GENBANK
Submits sequence to GenBank using Sequin.
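The SETUP step in the list above can be sketched roughly as follows. This is a minimal illustration in Java, one of the two languages the modules are written in, and not GeneFlow's actual code. The class name SetupScanSketch, the WorkflowRequest type, and the choice to remember already-seen files in memory are all assumptions made for the example. A real module would presumably hand each request to the workflow system rather than return a list.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical sketch of a SETUP-style directory scan; all names are illustrative. */
class SetupScanSketch {

    /** The two kinds of input that the SETUP description mentions. */
    enum SequenceKind { COSMID, EST }

    /** One pending workflow instance: the kind of data and the file that triggered it. */
    static final class WorkflowRequest {
        final SequenceKind kind;
        final Path file;

        WorkflowRequest(SequenceKind kind, Path file) {
            this.kind = kind;
            this.file = file;
        }
    }

    /** Files already reported by an earlier scan (assumption: kept in memory only). */
    private final Set<Path> seen = new HashSet<>();

    /**
     * Lists the regular files in dir and returns one request per file
     * that no previous call to scan has reported.
     */
    List<WorkflowRequest> scan(Path dir, SequenceKind kind) throws IOException {
        List<WorkflowRequest> requests = new ArrayList<>();
        try (DirectoryStream<Path> entries = Files.newDirectoryStream(dir)) {
            for (Path entry : entries) {
                Path key = entry.toAbsolutePath().normalize();
                // Set.add returns false when the file was seen before.
                if (Files.isRegularFile(entry) && seen.add(key)) {
                    requests.add(new WorkflowRequest(kind, entry));
                }
            }
        }
        return requests;
    }
}
```

Calling scan once per watched directory (one for cosmid sequence, one for EST sequence) yields the new files for which a workflow instance would be launched. Repeated calls on an unchanged directory return an empty list.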
A series of screenshots
has been set up to illustrate how a user
interacts with the system.
Authors
Currently GeneFlow is being developed by
David Hall and
John Miller with
input from the fungal genomics community.