String Comparison in R

String Comparisons in R

Reuben McCreanor

Motivation

R stringdist

An example

References

String Comparison in R

Reuben McCreanor

Stat 521 - Data Mining and Predictive Modeling

Thursday, September 2, 2015

Motivation: Why would you want to compare strings?

"No one should ever claim to be a data analyst until he or she has done string manipulation" - Gaston Sanchez

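String comparison in base R is largely lexicographic: relational operators such as == and < compare character vectors element by element, using the collating sequence of the current locale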
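
As a rough illustration of the built-in behaviour, the sketch below applies base R's relational operators to a few strings. The fruit names and variable names are made up for this example and do not come from the original slides.

```r
# Relational operators work element-wise on character vectors.
same_case  <- "apple" == "apple"   # exact equality
mixed_case <- "apple" == "Apple"   # comparison is case-sensitive
ordered    <- "apple" < "banana"   # lexicographic ordering

# sort() relies on the same ordering. How upper- and lower-case letters
# are interleaved depends on the locale's collating sequence.
sorted <- sort(c("banana", "apple", "cherry"))

print(c(same_case, mixed_case, ordered))
print(sorted)
```

These operators only answer whether two strings are identical or how they are ordered. They say nothing about how similar two different strings are, which is the gap approximate matching is meant to fill.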

String comparisons can be used for cleaning dirty data, web search, biomedical research, and matching in data frames

R stringdist: How do you compare strings?

stringdist is an R package that calculates distances between strings

Adds functionality to R by allowing approximate string matching

Very flexible - allows the user to set what should be considered a match
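
One way to picture this flexibility is sketched below, assuming the stringdist package is installed. The distance method and the maxDist threshold together decide what counts as a match. The lookup table and the misspelling are invented for illustration.

```r
library(stringdist)

# The distance depends on the chosen method. Swapping two adjacent
# characters costs 2 edits under Levenshtein ("lv") but 1 under
# optimal string alignment ("osa"), which allows transpositions.
d_lv  <- stringdist("ab", "ba", method = "lv")
d_osa <- stringdist("ab", "ba", method = "osa")

# maxDist sets how far apart two strings may be and still match.
fruit  <- c("apple", "banana", "cherry")          # hypothetical lookup table
strict <- amatch("aple", fruit, maxDist = 0)      # exact matches only
loose  <- amatch("aple", fruit, maxDist = 2)      # tolerate up to 2 edits

print(c(lv = d_lv, osa = d_osa))
print(c(strict = strict, loose = loose))
```

With maxDist = 0 the misspelled "aple" finds no match and amatch returns NA. Raising the threshold lets it match "apple", which is one insertion away.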

Key Functions

amatch: returns the position of the closest string match

ain: indicates whether an element approximately matches

stringdist: computes distances between different strings

phonetic: translates text into phonetic codes
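
The four functions can be tried out in a few lines. This is a minimal sketch that assumes the stringdist package is installed. The character names follow the style of the package's own documentation examples, and the variable names are only illustrative.

```r
library(stringdist)

known <- c("uhura", "leela")

# amatch: index of the closest entry in `known` within maxDist
pos <- amatch("leia", known, maxDist = 5)

# ain: is there any entry within maxDist? (an approximate %in%)
near  <- ain("leia", known, maxDist = 5)
exact <- ain("leia", known, maxDist = 1)

# stringdist: the distance itself (default method is "osa")
d <- stringdist("leia", "leela")

# phonetic: Soundex codes, so similar-sounding names share a code
codes <- phonetic(c("Robert", "Rupert"))

print(list(pos = pos, near = near, exact = exact, d = d, codes = codes))
```

Here "leia" is two edits away from "leela", so it is found with maxDist = 5 but not with maxDist = 1.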

An example: Using stringdist to match similar words
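
The example from the original slide is not preserved in this text. As a stand-in, the sketch below shows one way similar words might be matched with stringdist, assuming the package is installed. The word lists and the maxDist value of 2 are hypothetical choices, not the author's.

```r
library(stringdist)

# Hypothetical reference spellings and messy observed values
canonical <- c("apple", "banana", "cherry")
observed  <- c("aple", "bananna", "chery", "kiwi")

# For each observed word, find the closest canonical word within
# 2 edits. Words with no close match get NA.
idx <- amatch(observed, canonical, method = "osa", maxDist = 2)

cleaned <- data.frame(
  observed = observed,
  matched  = canonical[idx],
  stringsAsFactors = FALSE
)
print(cleaned)
```

Each misspelling is one edit away from its intended word and is mapped back to it, while "kiwi" is too far from every entry and is left unmatched.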

References and further reading

Want to know more?

Handling and Processing Strings in R by Gaston Sanchez: Strings_in_R.pdf

References

Relational Operators in R: R-manual/R-devel/library/base/html/Comparison.html

R Tutorial - Characters: r-introduction/basic-data-types/character

Package stringdist: packages/stringdist/stringdist.pdf
