CSE 549: Suffix Arrays
[Pages:64]CSE 549: Suffix Arrays
All slides in this lecture not marked with "*" of Ben Langmead.
Suffix array
T$ = abaaba$
As with suffix tree, T is part of index
SA(T) = 6 $ (SA = "Suffix Array") 5 a $
2 aaba$ 3 aba$ 0 abaaba$ 4 ba$ 1 baaba$
m + 1 integers
Suffix array of T is an array of integers in [0, m] specifying the lexicographic order of T$'s suffixes
Another Example Suffix Array
s = cattcat$
? Idea: lexicographically sort
all the suffixes.
? Store the starting indices of
the suffixes in an array.
1 cattcat$ 2 attcat$ 3 ttcat$ 4 tcat$ 5 cat$ 6 at$ 7 t$ 8$
index of suffix
suffix of s
sort the suffixes alphabetically
the indices just "come along for
the ride"
8$ 6 at$ 2 attcat$ 5 cat$ 1 cattcat$ 7 t$ 4 tcat$ 3 ttcat$
slide courtesy of Carl Kingsford
Suffix array
O(m) space, same as suffix tree. Is constant factor smaller? 32-bit integer can distinguish characters in the human genome, so suffix array is ~12 GB, smaller than MUMmer's 47 GB suffix tree.
Relationship Between Suffix Trees & Suffix Arrays
= {$,a,c,t} s = cattcat$
12345678
8
$
6
$ at
t
cat tcat$
$ cat$
tcat$
$ tcat$ 7
4
3
2 5
1
Red #s = starting position of the suffix ending at that leaf
Leaf labels left to right: 86251743 Edges leaving each node are sorted by label (left-to-right).
s = cattcat$
8$ 6 at$ 2 attcat$ 5 cat$ 1 cattcat$ 7 t$ 4 tcat$ 3 ttcat$
*slide courtesy of Carl Kingsford
Puglisi, Smyth,Turpin.A Taxonomy of Suffix Array Construction Algorithms. ACM Computing Surveys, 39(2):4, 2007.
*slide courtesy of Carl Kingsford
Suffix array: querying
Is P a substring of T?
1. For P to be a substring, it must be a prefix of 1 of T's suffixes
2. Suffixes sharing a prefix are consecutive in the suffix array
Use binary search
6$ 5 a$ 2 aaba$ 3 aba$ 0 abaaba$ 4 ba$ 1 baaba$
Suffix array: querying
Is P a substring of T?
1. For P to be a substring, it must be a prefix of 1 of T's suffixes
2. Suffixes sharing a prefix are consecutive in the suffix array
Use binary search
6$ 5 a$ 2 aaba$ 3 aba$ 0 abaaba$ 4 ba$ 1 baaba$
................
................
In order to avoid copyright disputes, this page is only a partial summary.
To fulfill the demand for quickly locating and searching documents.
It is intelligent file search solution for home and business.
Related download
- csc263 week 5
- s python cheat sheet data science free
- hitchhiker s guide to python and arcgis esri
- scientific and mathematical computing using python
- cs331 first list adt lecture notes
- python programing an introduction to computer science
- cheat sheet numpy python copy
- python for data a r r a y m a t h e m a t i c s science
- chapter 2 lists arrays and dictionaries
- lists in python stanford university
Related searches
- 549 powerful coaching questions
- cse project ideas
- medical prefix and suffix chart
- prefix root and suffix dictionary
- prefix and suffix examples
- suffix word lists for kids
- suffix and prefix
- prefix and suffix worksheets
- suffix chart printable
- suffix worksheets 4th grade
- prefix and suffix search engine
- prefix and suffix charts