[SRA (Sequence Read Archive)](http://www.ncbi.nlm.nih.gov/sra) is an NCBI-defined format for NGS data. All data submitted to NCBI must be in SRA format. The SRA Toolkit provides tools for downloading data, for converting data in various formats into SRA format, and, conversely, for extracting SRA data into other formats.
The SRA Toolkit can convert data from the SRA format to the following formats: `ABI SOLiD native`, `fasta`, `fastq`, `sff`, `sam`, and `Illumina native`. It can also convert data in `fasta`, `fastq`, `AB SOLiD-SRF`, `AB SOLiD-native`, `Illumina SRF`, `Illumina native`, `sff`, and `bam` format into the SRA format.
The SRA Toolkit supports downloading SRA data using the **"prefetch"** command:
{{< highlight bash >}}
$ prefetch <sra_id>
{{< /highlight >}}
where `<sra_id>` is the SRA identifier assigned by NCBI (e.g., `SRR1482462`).
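As a minimal sketch, the download can be wrapped in a small shell function that checks the identifier before calling **prefetch**. The function names `is_run_accession` and `fetch_run` are hypothetical, and the check assumes that run identifiers consist of the prefix SRR, ERR, or DRR followed only by digits.

```shell
# Hypothetical helper (not part of the SRA Toolkit): succeeds only if the
# argument looks like SRR/ERR/DRR followed by one or more digits.
is_run_accession() {
    case "$1" in
        SRR*|ERR*|DRR*) ;;          # accepted prefixes
        *) return 1 ;;
    esac
    digits="${1#???}"               # drop the three-letter prefix
    case "$digits" in
        ''|*[!0-9]*) return 1 ;;    # empty, or contains a non-digit
        *) return 0 ;;
    esac
}

# Hypothetical wrapper: run prefetch only for well-formed identifiers.
fetch_run() {
    if ! is_run_accession "$1"; then
        echo "not a run identifier: $1" >&2
        return 1
    fi
    prefetch "$1"
}
```

With the SRA Toolkit available, `fetch_run SRR1482462` would then behave like `prefetch SRR1482462`, while a mistyped identifier is rejected before any download starts.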
The SRA Toolkit contains multiple **"format"-dump** commands, where **format** is the file format the SRA data is converted to: **abi-dump**, **fastq-dump**, **illumina-dump**, **sam-dump**, **sff-dump**, and **vdb-dump**.
...
...
@@ -16,6 +22,7 @@ One of the most commonly used commands is **fastq-dump**:
{{< highlight bash >}}
$ fastq-dump [options] input_reads.sra
{{< /highlight >}}
This command can be applied to SRA data downloaded with **prefetch**.
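The two steps can be chained, as in the following sketch. The helper name `sra_to_fastq` and the `PREFETCH` and `FASTQ_DUMP` variables are assumptions made for illustration; the sketch also assumes that **fastq-dump** is given the same identifier that was passed to **prefetch**, and it uses the `--outdir` option of **fastq-dump** to choose where the FASTQ files are written.

```shell
# Assumed, overridable tool names (not from the original page), e.g. to
# point at a specific SRA Toolkit installation.
PREFETCH="${PREFETCH:-prefetch}"
FASTQ_DUMP="${FASTQ_DUMP:-fastq-dump}"

# Hypothetical helper: download a run, then convert it to FASTQ.
#   $1 = SRA identifier, $2 = output directory (defaults to the current one)
sra_to_fastq() {
    sra_id="$1"
    outdir="${2:-.}"
    # Stop if the download fails, so fastq-dump never sees a partial file.
    "$PREFETCH" "$sra_id" || return 1
    "$FASTQ_DUMP" --outdir "$outdir" "$sra_id"
}
```

For example, `sra_to_fastq SRR1482462 fastq` would download the run and write the converted reads into the `fastq` directory.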
An example of running **fastq-dump** on Crane to convert an SRA file containing paired-end reads is:
...
...
@@ -30,7 +37,7 @@ An example of running **fastq-dump** on Crane to convert SRA file containing pai
- **prefetch**: downloads SRA, dbGaP, and ADSP data from the command line
- **sra-stat**: generates statistics about SRA data
- **sra-pileup**: generates pileup statistics on aligned SRA data
- **vdb-config**: displays and modifies VDB configuration information
- **vdb-encrypt**: encrypts non-SRA dbGaP data
- **vdb-decrypt**: decrypts non-SRA dbGaP data
- **vdb-validate**: validates the integrity of downloaded SRA data
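As an illustration of how **vdb-validate** might be used after a download, the sketch below checks several runs in a loop. The function name `validate_runs` and the `VDB_VALIDATE` variable are hypothetical, and the sketch assumes that **vdb-validate** returns a non-zero exit status when a run fails validation.

```shell
# Assumed, overridable tool name (not from the original page).
VDB_VALIDATE="${VDB_VALIDATE:-vdb-validate}"

# Hypothetical helper: validate each run given as an argument.
# Returns 0 if every run validated, 1 if at least one failed.
validate_runs() {
    failed=0
    for sra_id in "$@"; do
        # Assumption: a non-zero exit status means validation failed.
        if "$VDB_VALIDATE" "$sra_id"; then
            echo "$sra_id: ok"
        else
            echo "$sra_id: FAILED" >&2
            failed=1
        fi
    done
    return "$failed"
}
```

A call such as `validate_runs SRR1482462 SRR304976` reports each run on its own line, which can help decide whether a download needs to be repeated before conversion.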
{{% notice info %}}
**Prefetch instructions:**
\\
\\
When **prefetch** is used, the files are downloaded to **${HOME}/ncbi/public** by default.
\\
Since the */home* directory (*$HOME*) is not writable from the worker nodes, the files cannot be saved in *${HOME}/ncbi/public* when **prefetch** is run from a SLURM job.
\\
\\
To change the default output directory for **prefetch** to **${WORK}/ncbi/public**, please follow these three steps:
Here, set *"/repository/user/main/public/root"* to *"/work/group/username/ncbi/public"*, where **group** is the name of **your HCC group**, and **username** is **your HCC username**.
where **SRR\|ERR\|DRR** should be one of **SRR**, **ERR**, or **DRR**, and should match the prefix of the target **.sra** file.
More **ascp** options can be seen by using:
{{< highlight bash >}}
$ ascp --help
{{< /highlight >}}
For example, to download the **SRR304976** file from NCBI into the **data/** directory in your $WORK at a download speed of **1000 Mbps**, use the following command: