+++
title = "BLAT"
description =  "How to run BLAT on HCC resources"
weight = "10"
+++


BLAT is a pairwise alignment tool similar to BLAST. It is more accurate and about 500 times faster than popular existing tools for mRNA/DNA alignments, and about 50 times faster for protein/protein alignments. BLAT accepts both short and long sequences as query and database input.

The basic usage of BLAT is:
{{< highlight bash >}}
$ blat database query output_alignment.txt [options]
{{< /highlight >}}
where **database** is the name of the database used for the alignment, **query** is the name of the input file of sequence data in `fasta/nib/2bit` format, and **output_alignment.txt** is the output alignment file.

Additional parameters for BLAT alignment can be found in the [manual](http://genome.ucsc.edu/FAQ/FAQblat), or by running `blat` without any arguments, which prints its usage message:
{{< highlight bash >}}
$ blat
{{< /highlight >}}

\\
Running BLAT on Tusker with query file `input_reads.fasta` and database `db.fa` is shown below:
{{% panel header="`blat_alignment.submit`"%}}
{{< highlight bash >}}
#!/bin/sh
#SBATCH --job-name=Blat
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=1
#SBATCH --time=168:00:00
#SBATCH --mem=50gb
#SBATCH --output=Blat.%J.out
#SBATCH --error=Blat.%J.err

module load blat/35x1

blat db.fa input_reads.fasta output_alignment.txt
{{< /highlight >}}
{{% /panel %}}
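
Because the job above requests up to 168 hours, a quick sanity check of the input files before submitting can avoid a wasted wait in the queue. The following is a minimal sketch, not part of BLAT or of any HCC tooling: the helper names `count_fasta_records` and `check_blat_inputs` are hypothetical, and it assumes both inputs are plain-text FASTA files (it does not apply to `.nib` or `.2bit` files).

```shell
#!/bin/sh
# Hypothetical helper: count FASTA records (header lines starting with ">").
count_fasta_records() {
    # grep -c exits non-zero when the count is 0; "|| true" hides that exit status
    grep -c '^>' "$1" || true
}

# Hypothetical helper: succeed only if every given file exists, is non-empty,
# and contains at least one FASTA header line.
check_blat_inputs() {
    for f in "$@"; do
        if [ ! -s "$f" ]; then
            echo "missing or empty: $f" >&2
            return 1
        fi
        n=$(count_fasta_records "$f")
        if [ "$n" -eq 0 ]; then
            echo "no FASTA records in: $f" >&2
            return 1
        fi
        echo "$f: $n sequence(s)"
    done
}

# Example use (commented out so that sourcing this file submits nothing):
# check_blat_inputs db.fa input_reads.fasta && sbatch blat_alignment.submit
```

With the helpers defined, the commented line at the end submits the job only when both files pass the check.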

Although BLAT is a single-threaded program (hence `#SBATCH --nodes=1` and `#SBATCH --ntasks-per-node=1`), it is still much faster than many other alignment tools.
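
Since a single BLAT process uses only one core, one possible way to shorten the wall time for a large query file is to split it into pieces and align each piece in its own job. The sketch below is only an illustration under that assumption, not an HCC-recommended workflow: the helper name `split_fasta` and the chunk file naming are hypothetical, and it assumes a plain-text FASTA query file.

```shell
#!/bin/sh
# Hypothetical helper: split a FASTA file into N chunk files, assigning whole
# records round-robin so that no sequence is cut in the middle.
# Usage: split_fasta input.fasta N prefix  ->  prefix.1.fasta ... prefix.N.fasta
split_fasta() {
    awk -v n="$2" -v prefix="$3" '
        # A header line starts a new record: pick the chunk file for it
        /^>/ { out = sprintf("%s.%d.fasta", prefix, (count++ % n) + 1) }
        # Write header and sequence lines to the chunk currently selected
        out != "" { print > out }
    ' "$1"
}

# Example use (commented out so that sourcing this file writes nothing):
# split_fasta input_reads.fasta 4 chunk
# blat db.fa chunk.1.fasta chunk.1.txt
```

Each chunk can then be aligned against the same database by a separate submit script, and the per-chunk output files combined afterwards.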

\\
<span style="color: rgb(0,0,0);font-size: 20.0px;line-height: 1.5;">BLAT Output</span>

BLAT output is a list containing the following information:

- the score of the alignment
- the region of query sequence that matches the database sequence
- the size of the query sequence
- the level of identity as a percentage of the alignment
- the chromosome and position that the query sequence maps to
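
Several of the items listed above can be pulled out of the output file with a short `awk` command. The sketch below assumes the output is in BLAT's default tab-separated PSL format with the standard 21 columns. The helper name `summarize_psl` is hypothetical, and the identity it prints is a simplified estimate (matching bases divided by aligned bases, ignoring gaps), not necessarily the value reported by other BLAT front ends. PSL coordinates are zero-based with an exclusive end.

```shell
#!/bin/sh
# Hypothetical helper: summarize PSL alignment lines read from standard input.
# Columns used (standard PSL layout): 1 matches, 2 misMatches, 3 repMatches,
# 10 qName, 11 qSize, 12 qStart, 13 qEnd, 14 tName, 16 tStart, 17 tEnd.
summarize_psl() {
    awk -F '\t' '
        # Skip PSL header lines: data lines start with a number (the match count)
        $1 !~ /^[0-9]+$/ { next }
        {
            nmatch = $1; nmis = $2; nrep = $3
            aligned = nmatch + nmis + nrep
            # Simplified identity: matching bases (including repeats) over aligned bases
            identity = (aligned > 0) ? 100 * (nmatch + nrep) / aligned : 0
            # query name, matched query region and query size, identity, target position
            printf "%s\t%d-%d of %d\t%.1f%%\t%s:%d-%d\n", $10, $12, $13, $11, identity, $14, $16, $17
        }
    '
}

# Example use (commented out so that sourcing this file reads nothing):
# summarize_psl < output_alignment.txt
```

Each printed line holds the query name, the matched query region with the query size, the estimated identity, and the database sequence with the matched position.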