Natasha Pavlovikj · 54bd0d9a
--- a/content/applications/app_specific/bioinformatics_tools/data_manipulation_tools/sratoolkit.md

+ 9

− 26
+++ b/content/applications/app_specific/bioinformatics_tools/data_manipulation_tools/sratoolkit.md

 @@ -5,10 +5,16 @@ weight = "10"
 +++
-[SRA (Sequence Read Archive)](http://www.ncbi.nlm.nih.gov/sra) is an NCBI-defined format for NGS data. Every data submitted to NCBI needs to be in SRA format. The SRA Toolkit provides tools for converting different formats of data into SRA format, and vice versa, extracting SRA data in other different formats.
+[SRA (Sequence Read Archive)](http://www.ncbi.nlm.nih.gov/sra) is an NCBI-defined format for NGS data. All data submitted to NCBI must be in SRA format. The SRA Toolkit provides tools for downloading data, converting data from other formats into SRA format, and extracting SRA data into other formats.
 The SRA Toolkit allows converting data from the SRA format to the following formats: `ABI SOLiD native`, `fasta`, `fastq`, `sff`, `sam`, and `Illumina native`. Also, the SRA Toolkit allows converting data from `fasta`, `fastq`, `AB SOLiD-SRF`, `AB SOLiD-native`, `Illumina SRF`, `Illumina native`, `sff`, and `bam` format into the SRA format.
+The SRA Toolkit supports downloading SRA data using the **"prefetch"** command:
+{{< highlight bash >}}
+$ prefetch <sra_id>
+{{< /highlight >}}
+where `<sra_id>` is the SRA accession assigned by NCBI (e.g., SRR1482462).
 The SRA Toolkit contains multiple **"format"-dump** commands, where **format** is the file format the SRA data is converted to **abi-dump**, **fastq-dump**, **illumina-dump**, **sam-dump**, **sff-dump**, and **vdb-dump**.
 @@ -16,6 +22,7 @@ One of the most commonly used commands is **fastq-dump**:
 {{< highlight bash >}}
 $ fastq-dump [options] input_reads.sra
 {{< /highlight >}}
+This command can be applied to SRA data downloaded with **"prefetch"**.
 An example of running **fastq-dump** on Crane to convert SRA file containing paired-end reads is:
 @@ -30,7 +37,7 @@ An example of running **fastq-dump** on Crane to convert SRA file containing pai
 #SBATCH --output=SRAtoolkit.%J.out
 #SBATCH --error=SRAtoolkit.%J.err
-module load SRAtoolkit/2.9
+module load SRAtoolkit/2.11
 fastq-dump --split-files input_reads.sra
 {{< /highlight >}}
 @@ -51,33 +58,9 @@ $ bam-load \-o input_reads.sra input_alignments.bam
 Other frequently used SRAtoolkit tools are:
- **prefetch**: allows command-line downloading of SRA, dbGaP, and ADSP data
 - **sra-stat**: generate statistics about SRA data
 - **sra-pileup**: generate pileup statistics on aligned SRA data
 - **vdb-config**: display and modify VDB configuration information
 - **vdb-encrypt**: encrypt non-SRA dbGaP data
 - **vdb-decrypt**: decrypt non-SRA dbGaP data
 - **vdb-validate**: validate the integrity of downloaded SRA data
-{{% notice info %}}
-**Prefetch instructions:**
-\\
-\\
-When **prefetch** is used, the files are downloaded in **${HOME}/ncbi/public** by default.
-\\
-Since the */home* directory (*$HOME*) is not writable from the worker nodes, the file can not be saved in *$(HOME)/ncbi/public* when submitting a SLURM job.
-\\
-\\
-To change the default output directory for **prefetch** to **${WORK}/ncbi/public**, please follow these three steps:
-\\
-**$ wget https://raw.githubusercontent.com/ncbi/ncbi-vdb/master/libs/kfg/default.kfg -P $HOME/.ncbi/**
-\\
-**$ vim $HOME/.ncbi/default.kfg**
-\\
-Here, set *"/repository/user/main/public/root"* to *"/work/group/username/ncbi/public"*, where **group** is the name of **your HCC group**, and **username** is **your HCC username**.
-\\
-**$ export VDB_CONFIG=$HOME/.ncbi/default.kfg**
-\\
-\\
-You need to do these steps only once.
-{{% /notice %}}
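
The download-then-convert workflow that this change documents can be sketched as a small shell helper. This is a minimal illustration, not part of the commit: `sra_commands` is a hypothetical function name, the accession is the example one from the text, and `--split-files` assumes paired-end reads as in the existing job script. It also assumes that `fastq-dump` can resolve an accession that `prefetch` has already downloaded, which depends on the local toolkit configuration. The helper only prints the commands, so it can be inspected before anything is run.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of the SRA Toolkit): print the two commands
# needed to download a run with prefetch and convert it to FASTQ.
sra_commands() {
    local sra_id="$1"
    # Step 1: download the run from NCBI.
    printf 'prefetch %s\n' "$sra_id"
    # Step 2: convert the downloaded run, writing one FASTQ file per read.
    printf 'fastq-dump --split-files %s\n' "$sra_id"
}

# Print the commands for the example accession used in the text.
sra_commands SRR1482462
```

To execute the commands rather than print them, the output could be piped to `bash` after loading the `SRAtoolkit` module.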