The profiling of short tandem repeats (STRs) from next-generation sequencing (NGS) data by establishing a whole genome STRs pipeline for forensic bioinformatics
Main Authors: | , |
---|---|
Format: | Article |
Language: | English |
Published: | Malaysian Society of Applied Biology, 2016 |
Online Access: | http://journalarticle.ukm.my/11816/ http://journalarticle.ukm.my/11816/1/45_02_12.pdf |
Summary: | Short tandem repeats (STRs) are repeat regions in DNA sequences whose repeat unit is one to six base pairs (bp) long.
High-throughput next-generation sequencers have made it easier to identify polymorphic STR markers effectively.
In this study, a new whole genome STRs pipeline was established to call and profile STRs from next-generation
sequencing (NGS) data. First, genome sequences of Helicobacter pylori strain CPY1124, and of strain PeCan4 as the reference genome,
were retrieved from the European Nucleotide Archive (ENA) database, and the quality of the sequences was checked using
FastQC. The genome sequences were assembled with the VELVET de novo assembler. The unordered contigs in VELVET’s
output were realigned using multiple genome alignment (MAUVE) to obtain an ordered contig sequence. Lastly, STRs were called
and profiled with Tandem Repeats Finder using the parameters 2 (match), 7 (mismatch) and 7 (indels). These
parameters are the weights for Smith-Waterman style local alignment using wrap-around dynamic programming. As a result, the new
pipeline identified polymorphic and unique STRs, namely GTTTG and AAACCC, in CPY1124. For validation, the pipeline
was compared with other available STR profiling tools, pSTR Finder and the Tandem Repeats Database (TRDB).
The similar output produced by both tools indicates the reliability of this new pipeline for future
use. |
---|
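
To make the definition in the summary concrete, the following is a minimal sketch of what an STR is: a motif of one to six bp repeated in tandem. It is not the algorithm used by Tandem Repeats Finder, which scores approximate repeats by alignment. This sketch only finds exact copies. The function names `find_strs` and `is_primitive`, the `min_copies` threshold, and the toy sequences are assumptions introduced for illustration and do not come from the article.

```python
def is_primitive(motif):
    """True if the motif is not itself a repetition of a shorter unit (e.g. 'ATAT')."""
    return (motif + motif).find(motif, 1) == len(motif)


def find_strs(sequence, min_unit=1, max_unit=6, min_copies=3):
    """Return sorted (start, motif, copies) tuples for exact tandem repeats.

    Hypothetical helper: exact matching only, no mismatches or indels.
    """
    seq = sequence.upper()
    hits = []
    for unit in range(min_unit, max_unit + 1):
        i = 0
        while i + unit * min_copies <= len(seq):
            motif = seq[i:i + unit]
            # Count how many consecutive exact copies of the motif start at i.
            copies = 1
            while seq[i + copies * unit:i + (copies + 1) * unit] == motif:
                copies += 1
            if (copies >= min_copies and is_primitive(motif)
                    and set(motif) <= set("ACGT")):
                hits.append((i, motif, copies))
                i += copies * unit  # jump past the repeat to avoid phase-shifted duplicates
            else:
                i += 1
    return sorted(hits)


# Toy sequence containing two copies of one motif reported in the summary.
print(find_strs("TAAACCCAAACCCG", min_unit=6, min_copies=2))
# → [(1, 'AAACCC', 2)]
```

Restricting `max_unit` to 6 mirrors the one to six bp definition. Real NGS data contain sequencing errors and imperfect repeats, which is why an alignment-based tool with match, mismatch and indel weights is used in the pipeline rather than exact matching.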