Algorithms - Princeton University

[Pages:66]Algorithms

ROBERT SEDGEWICK | KEVIN WAYNE

Algorithms FOURTH EDITION

ROBERT SEDGEWICK | KEVIN WAYNE

5.1 STRING SORTS

strings in Java key-indexed counting LSD radix sort MSD radix sort 3-way radix quicksort suffix arrays

Algorithms

ROBERT SEDGEWICK | KEVIN WAYNE

5.1 STRING SORTS

strings in Java key-indexed counting LSD radix sort MSD radix sort 3-way radix quicksort suffix arrays

String processing

String. Sequence of characters.

Important fundamental abstraction.

Genomic sequences. Information processing. Communication systems (e.g., email). Programming systems (e.g., Java programs). ...

" The digital information that underlies biochemistry, cell biology, and development can be represented by a simple string of G's, A's, T's and C's. This string is the root data

0

structure of an organism's biology. " -- M. V. Olson

3

The char data type

C char data type. Typically an 8-bit integer.

Supports

7-bit

ASCII. 6.5 Q

Data

Compression

667

Can represent at most 256 characters.

HexDump a bitncoded characul for reference. use the first hex econd hex digit d the character 31 encodes the J, and so forth. so the first hex umbers starting bers 20 and 7F)

0123456789ABCDEF 0 NUL SOH STX ETX EOT ENQ ACK BEL BS HT LF VT FF CR SO SI 1 DLE DC1 DC2 DC3 DC4 NAK SYN ETB CAN EM SUB ESC FS GS RS US 2 SP ! " # $ % & ` ( ) * + , - . / 30123456789:;? 4@ABCDEFGHIJKLMNO 5PQRSTUVWXYZ[\]^_ 6`abcdefghijklmno 7 p q r s t u v w x y z { | } ~ DEL

Hexadecimal to ASCII conversion table

U+0041 U+00E1 U+2202 U+1D50A some Unicode characters

control charac-

aracters are left over from the days when physical devices

lled by ASCII input; the table highlights a few that you

mple SP isJathveaspcahceacrhadraacttaer,tNyUpLeis.thAe n1u6ll -cbhaitrauctners, LigF ned integer.

e-return. Supports original 16-bit Unicode.

ata compressioSnurepqpuiorerstuss 2to1re-boriitenUt onuicr tohdinekin3g.0abo(autwkwardly).

output to include binary encoding of data. BinaryStdIn

4

I Unicode

U+0041

5

The String data type

String data type in Java. Immutable sequence of characters.

Length. Number of characters. Indexing. Get the ith character. Concatenation. Concatenate one string to the end of another.

s.length()

0 1 2 3 4 5 6 7 8 9 10 11 12

s

A T T A C K A T D A W N

s.charAt(3)

s.substring(7, 11)

Fundamental constant-time String operations

6

The String data type: immutability

Q. Why immutable?

A. All the usual reasons.

Can use as keys in symbol table.

Don't need to defensively copy.

Ensures consistent state. Supports concurrency.

Improves security.

public class FileInputStream {

private String filename; public FileInputStream(String filename) {

if (!allowedToReadFile(filename)) throw new SecurityException();

this.filename = filename; } ... }

attacker could bypass security if string type were mutable

7

The String data type: representation

Representation (Java 7). Immutable char[] array + cache of hash.

operation length

indexing concatenation

Java s.length() s.charAt(i)

s + t

running time

1 1 M + N

8

................
................

In order to avoid copyright disputes, this page is only a partial summary.

Google Online Preview   Download