The four basic steps of running the Crossbow computation. Two scenarios are shown: one using Amazon's EC2 and S3 services, and one using a local cluster. In step 1 (red), short reads are copied to the permanent store. In step 2 (green), the cluster is allocated (which may not be necessary for a local cluster) and the scripts driving the computation are uploaded to the master node. In step 3 (blue), the computation is run: it downloads reads from the permanent store, operates on them, and stores the results in the Hadoop distributed filesystem. In step 4 (orange), the results are copied to the client machine and the job completes. SAN (Storage Area Network) and NAS (Network-Attached Storage) are two common ways of sharing filesystems across a local network.
Langmead et al. Genome Biology 2009 10:R134 doi:10.1186/gb-2009-10-11-r134
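The four steps in the caption can be sketched as a small local simulation. This is only an illustrative sketch, not Crossbow's actual driver code: plain directories stand in for the permanent store, the master node, the Hadoop distributed filesystem, and the client machine. All function and directory names are hypothetical. The "computation" simply counts non-empty lines per read file as a placeholder for the real analysis.

```python
import shutil
import tempfile
from pathlib import Path


def copy_reads_to_store(read_files, store):
    """Step 1: copy short-read files to the (simulated) permanent store."""
    store.mkdir(parents=True, exist_ok=True)
    return [Path(shutil.copy(f, store)) for f in read_files]


def allocate_cluster(root, scripts):
    """Step 2: 'allocate' a cluster and upload driver scripts to the master node.

    The cluster here is just a directory; a real local cluster may need
    no allocation step at all.
    """
    master = root / "master"
    master.mkdir(parents=True, exist_ok=True)
    for name, text in scripts.items():
        (master / name).write_text(text)
    return master


def run_computation(store, dfs):
    """Step 3: read inputs from the store, process them, write results to the stand-in DFS."""
    dfs.mkdir(parents=True, exist_ok=True)
    result = dfs / "results.txt"
    with result.open("w") as out:
        for reads in sorted(store.iterdir()):
            # Placeholder analysis: count non-empty lines in each read file.
            n = sum(1 for line in reads.read_text().splitlines() if line.strip())
            out.write(f"{reads.name}\t{n}\n")
    return result


def fetch_results(result, client):
    """Step 4: copy the results to the (simulated) client machine."""
    client.mkdir(parents=True, exist_ok=True)
    return Path(shutil.copy(result, client))


def run_pipeline(root, read_files):
    """Run the four steps in order and return the path of the client-side results."""
    copy_reads_to_store(read_files, root / "store")
    allocate_cluster(root, {"driver.sh": "# hypothetical driver script\n"})
    result = run_computation(root / "store", root / "dfs")
    return fetch_results(result, root / "client")


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        reads = root / "reads_a.txt"
        reads.write_text("ACGT\nTTGA\nGGCC\n")
        print(run_pipeline(root, [reads]).read_text(), end="")
```

Keeping each step as a separate function mirrors the caption's separation of storage, allocation, computation, and retrieval, so either scenario (cloud services or a local cluster) would change only the bodies of the individual steps.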