A GENERAL SOLUTION FOR THE PROBLEM OF TNRS: A …



The Taxonomic Name Resolution Service: an online tool for automated standardization of plant names

Brad Boyle1,2*, Nicole Hopkins2,3, Zhenyuan Lu2,4, Juan Antonio Raygoza Garay2,3, Dmitry Mozzherin5, Tony Rees6, Naim Matasci1,2,3, Martha L. Narro2,3, William H. Piel7, Sheldon J. Mckay2,3,4, Sonya Lowry2,3, Chris Freeland8, Robert K. Peet9, Brian J. Enquist1,10

1 Department of Ecology and Evolutionary Biology, University of Arizona Tucson, P.O. Box 210088, Tucson, AZ 85721, USA
2 The iPlant Collaborative, Thomas W. Keating Bioresearch Building, 1657 East Helen Street, Tucson, Arizona 85721, USA
3 BIO5 Institute, 1657 East Helen Street, PO Box 210240, Tucson, Arizona 85721-0240, USA
4 Cold Spring Harbor Laboratory, 1 Bungtown Road, Cold Spring Harbor, NY 11724-2202, USA
5 Center for Library and Informatics, Marine Biological Laboratory, 7 MBL Street, Woods Hole, MA 02543, USA
6 Divisional Data Centre, CSIRO Marine and Atmospheric Research, GPO Box 1538, Hobart, Tasmania 7001, Australia
7 Yale-NUS College, 6 College Avenue East, Singapore 138614
8 Missouri Botanical Garden, 4344 Shaw Blvd., St. Louis, MO 63110, USA
9 Department of Biology, CB 3280, University of North Carolina, Chapel Hill, NC 27599-3280, USA
10 The Santa Fe Institute, 1399 Hyde Park Road, Santa Fe, NM 87501, USA

* Corresponding author: bboyle@email.arizona.edu

Abstract

Background: The digitization of biodiversity data is leading to the widespread application of taxon names that are superfluous, ambiguous or incorrect, resulting in mismatched records and inflated species numbers. The ultimate consequences of misspelled names and bad taxonomy are erroneous scientific conclusions and faulty policy decisions. The lack of tools for correcting this ‘names problem’ has become a fundamental obstacle to integrating disparate data sources and advancing the progress of biodiversity science.
Results: The TNRS, or Taxonomic Name Resolution Service, is an online application for automated and user-supervised standardization of plant scientific names. The TNRS builds upon and extends existing open-source applications for name parsing and fuzzy matching. Names are standardized against multiple reference taxonomies, including that of Missouri Botanical Garden's Tropicos database. Capable of processing thousands of names in a single operation, the TNRS parses and corrects misspelled names and authorities, standardizes variant spellings, and converts nomenclatural synonyms to accepted names. Family names can be included with species to increase match accuracy and resolve many types of homonyms. Partial matching of higher taxa combined with extraction of annotations, accession numbers and morphospecies allows the TNRS to standardize taxonomy across a broad range of active and legacy datasets.

Conclusions: We show how the TNRS can resolve many forms of taxonomic semantic heterogeneity, correct spelling errors and eliminate spurious names. As a result, the TNRS can greatly facilitate the integration of disparate biological datasets. Although the TNRS was developed to aid in standardizing plant names, its underlying algorithms and design can be extended to all organisms and nomenclatural codes. The TNRS is accessible via a web interface and as a RESTful web service and application programming interface. Source code is available online.

Keywords: biodiversity informatics; database integration; taxonomy; plants

Background

The past two decades have seen an explosive growth of biodiversity databases, providing access to millions of specimen records.
The more prominent large databases include compilations of museum records and observations (e.g., GBIF [1], Tropicos [2], REMIB [3], OBIS [4], VertNet [5], MaNIS [6]), fossil occurrence datasets (The Paleobiology Database [7]), ecological inventories (VegBank [8], SALVIAS [9], USFS FIA database [10], Forest Plots Database [11], CTFS [12]; see GIVD [13]), trait measurements (TraitNet [14], TRY [15]), molecular sequences (GenBank
To access GenBank and its related retrieval and analysis services, begin at the NCBI Homepage: ncbi.nlm..", "author" : [ { "dropping-particle" : "", "family" : "Benson", "given" : "Dennis a", "non-dropping-particle" : "", "parse-names" : false, "suffix" : "" }, { "dropping-particle" : "", "family" : "Karsch-Mizrachi", "given" : "Ilene", "non-dropping-particle" : "", "parse-names" : false, "suffix" : "" }, { "dropping-particle" : "", "family" : "Lipman", "given" : "David J", "non-dropping-particle" : "", "parse-names" : false, "suffix" : "" }, { "dropping-particle" : "", "family" : "Ostell", "given" : "James", "non-dropping-particle" : "", "parse-names" : false, "suffix" : "" }, { "dropping-particle" : "", "family" : "Sayers", "given" : "Eric W", "non-dropping-particle" : "", "parse-names" : false, "suffix" : "" } ], "container-title" : "Nucleic acids research", "id" : "ITEM-1", "issue" : "Database issue", "issued" : { "date-parts" : [ [ "2009", "1" ] ] }, "page" : "D26-31", "title" : "GenBank.", "type" : "article-journal", "volume" : "37" }, "uris" : [ "" ] } ], "mendeley" : { "previouslyFormattedCitation" : "[16]" }, "properties" : { "noteIndex" : 0 }, "schema" : "" }[16]) and phylogenies (TreeBase ADDIN CSL_CITATION { "citationItems" : [ { "id" : "ITEM-1", "itemData" : { "URL" : "", "id" : "ITEM-1", "issued" : { "date-parts" : [ [ "0" ] ] }, "title" : "TreeBASE", "type" : "webpage" }, "uris" : [ "" ] } ], "mendeley" : { "previouslyFormattedCitation" : "[17]" }, "properties" : { "noteIndex" : 0 }, "schema" : "" }[17]). 
Collectively, these databases encompass hundreds of thousands of species [18]. This vast and growing information resource is being used to address fundamental questions in ecology, evolution and systematics [17, 18] and to explore patterns in the distribution of organismal form, function and diversity at previously impossible temporal and spatial scales [21-25]. Researchers are only beginning to explore the potential applications of these global biodiversity data sources for agriculture [22], plant products research [23] and conservation biology [24]. Integration of such large, disparate, and heterogeneous datasets has involved overcoming numerous challenges of data exchange, interoperability, and scaling [18, 29]. Despite considerable progress [20], however, one critical challenge remains largely unsolved: the correction and standardization of taxonomic names in scientific data and literature.

Incorrect, ambiguous or
synonymous taxon names present a fundamental problem for the study of comparative biology and biodiversity [26]. Ecological studies that encompass large numbers of species, conservation decisions based on data from many sources, and phylogenetic analyses linking sequence data to phenotypic traits all require accurate matching of species identities among datasets. If uncorrected, lack of standardization of species names can result in mismatched observations and inflated measures of species richness, leading to erroneous scientific conclusions, faulty conservation policy, and an inability to make reliable predictions across space and time [27]. Although progress has been made toward developing an authoritative global taxonomy (Global Names [28], The Plant List [29]), the growing availability of digitized sources of names (International Plant Names Index [32], Global Names [30], Tropicos [2], ZooBank [33], UBio [34], Encyclopedia of Life [35], Integrated Taxonomic Information System [36], Catalogue of Life [37]), identifiers (Global Names [30], UBio [34], ZooBank [33]) and taxonomic opinion (Tropicos [2], The Plant List [31]) has yet to provide a solution to the rapid accumulation of non-standardized names in the scientific literature and data repositories. Recent applications for the automated recognition of taxon names [36] have accelerated the digitization of biodiversity literature [37]. Unfortunately, the inability of these applications to recognize and correct ambiguous or erroneous scientific names means they fall short of meeting the needs of researchers. Combining large datasets from different sources requires careful standardization of hundreds or thousands of taxon names. This task must be performed manually or with ad hoc scripting, resulting in duplication of effort and propagation of error. In short, for much of the scientific community, the lack of automated tools and standardized workflows for correcting taxonomic names is a major impediment to conducting synthetic science with heterogeneous and disparate sources of biodiversity data [38].

How widespread is taxonomic error?
A recent study of New World plant distributions and species richness [39] illustrates the severity of the problem. Compilation of 308,000 geo-referenced plant observations from 51 digitized sources of herbarium specimens and forest inventories resulted in 22,100 unique species names; after correcting misspellings and updating synonymous names, that total was reduced to 12,980 accepted species. Thus, over 41% of the names in the original data were erroneous, obsolete, or otherwise inconsistent with currently accepted names. Uncritical use of the original, uncorrected taxon names would have grossly inflated species richness and led to distorted, possibly biased distributional patterns due to spurious species with artificially small ranges. At best, erroneous taxon names limit the usefulness of the data they mislabel by preventing linkages among observations of the same organism; at worst, they represent an insidious source of error.
Misspelled names are just one component of the larger problem of taxonomic semantic heterogeneity [40]. Such ambiguity can arise for a number of reasons: (1) misspellings, vernacular variants, and lexical variants (different ways of writing the same name); (2) homotypic synonyms (sets of different scientific names based on the same type specimen and representing changes in genus classification or technical changes such as substitute names, that objectively refer to the same taxon); (3) heterotypic synonyms (names that may or may not refer to the same taxon, depending on expert opinion); (4) homonyms (identical names that refer to different taxa); and (5) differing taxonomic concepts (narrower or broader interpretations of taxa represented by the same name and authority [40]). While automated and semi-automated applications can frequently address the first and second sources of error, the third, fourth and fifth present significantly more difficult challenges. For example, resolution of complex, or pro parte, synonyms (for example, a species which was split into two or more species) requires additional information such as when and where the name was used. Disambiguating homonyms requires information on higher taxa such as family or kingdom (although homonyms in the same family can only be distinguished by the authority portions of the scientific names).
Even if a name is correctly resolved to an accepted taxon, the exact circumscription of that taxon can vary from expert to expert; such taxon concepts are not easily or precisely communicated by names alone [40]. While no automated system can perfectly resolve all the kinds of taxonomic problems listed, a service that corrects variant and erroneous spellings, disambiguates homonyms by means of higher taxonomic filtering, and updates simple synonyms with reference to authoritative taxonomic sources would go a long way toward solving the "names problem". Here we present such a solution, the Taxonomic Name Resolution Service.

Implementation

Overview

The Taxonomic Name Resolution Service, or TNRS, is an application for automated and user-supervised correction and standardization of plant taxonomic names.
Developed by the iPlant Collaborative [41] as a collaboration between the iPlant Tree of Life project [42] and the Botanical Information and Ecology Network [43], the TNRS standardizes names according to one or more authoritative taxonomic sources. Capable of processing thousands of names in a single batch operation, the TNRS detects likely misspelled taxon names, transforms names and authorities to a single canonical form, converts synonyms to accepted names, discriminates among many types of homonyms, and detects and flags ambiguous results. The TNRS also handles features peculiar to both ecological data (such as morphospecies and partial identifications) and phylogenetic data (such as embedded accession codes). The TNRS is accessible both as a web service and through a user-friendly web interface.

Four core principles guided the development of the TNRS. First, use existing sources of high-quality, digitized taxonomy that provide information on synonymy in addition to names. Second, build on existing applications whenever possible.
Third, use Open Source tools and adhere to Open Source principles [44], including public release of all source code. Fourth, provide a generalizable solution extendable to other organisms and nomenclatural codes, not just plants.

Taxonomic sources

The TNRS resolves names against a local cache of external taxonomic sources (see The TNRS database, below). Currently, the default taxonomic sources used by the TNRS are the Missouri Botanical Garden's Tropicos database [2], the Global Compositae Checklist [45] and USDA Plants [46] (Table 1); users select one of these sources as the standard against which to standardize their names. Combining sources is also possible, though this should be done with caution due to different spelling conventions and potentially conflicting synonymies. A partial solution to such conflicts is to assign a priority to each source, such that a second source is consulted for a particular name only if that name cannot be matched using the first source (see User options, below). NCBI taxonomy [47] (Table 1) is also provided as an optional source for users wishing to match their names to taxa with molecular sequence data in GenBank [16]. However, due to missing taxa, inconsistent taxonomy and the presence of numerous informal names or "dark taxa" [48], users are cautioned against using NCBI for taxonomic standardization.

Taxonomic sources currently accessed by the TNRS provide nearly complete coverage of land plants (mosses, liverworts, hornworts, ferns, lycophytes, gymnosperms and flowering plants) for the New World (Table 1). With the exception of the flowering plant family Asteraceae, coverage of Old World plant names is less complete.
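As a rough illustration of the source-priority rule described earlier in this section, in which a lower-priority source is consulted for a name only when no higher-priority source matches it, the following Python sketch resolves names against an ordered list of lookup tables. The function name, the dictionary-based tables and the placeholder names ("Aus bus" and so on) are all assumptions made for illustration; the TNRS itself resolves names against its database, not against in-memory dictionaries.

```python
from typing import Dict, List, Optional, Tuple

# Each source is a (label, lookup) pair; lookup maps a submitted name to the
# name that source accepts. All entries below are made-up placeholders.
Source = Tuple[str, Dict[str, str]]


def resolve_by_priority(name: str, sources: List[Source]) -> Optional[Tuple[str, str]]:
    """Return (source label, accepted name) from the first source that matches."""
    for label, lookup in sources:  # sources are ordered from highest priority down
        if name in lookup:
            return label, lookup[name]
    return None  # no source recognizes the name


first_source = ("source_A", {"Aus bus": "Aus bus", "Aus cus": "Aus bus"})
second_source = ("source_B", {"Aus cus": "Xus cus", "Aus dus": "Aus dus"})
ranked = [first_source, second_source]

print(resolve_by_priority("Aus cus", ranked))  # matched in source_A, so source_B is never consulted
print(resolve_by_priority("Aus dus", ranked))  # unmatched in source_A, falls through to source_B
print(resolve_by_priority("Aus eus", ranked))  # unmatched in every source
```

Because the loop stops at the first match, a conflicting synonymy in a lower-priority source never overrides the preferred source, which is why the text calls this only a partial solution: names absent from the first source are still resolved under the second source's taxonomy.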
A central goal of the TNRS is to enable users to resolve names of all organisms governed by the International Code of Nomenclature for algae, fungi and plants (ICN) [49], and we invite curators of high-quality taxonomic databases to help fill gaps in our current taxonomic coverage by exposing their content via the iPlant TNRS. The TNRS website provides information on how to become a data provider to the TNRS, including a description of a simple exchange schema which can be used to expose taxonomic content to automatic validation and ingestion by the TNRS (see ). Alternatively, taxonomic data providers can deploy their own instance of the TNRS using source code available from the iPlant OpenSource repository on GitHub (see ).

Components

The TNRS consists of four main components: (1) the TNRS database, which contains names and synonymy from external taxonomic sources; (2) a name resolution engine consisting of a name parsing application and a fuzzy matching application; (3) a web services layer and application programming interface (API); and (4) a web-based user interface.

1. The TNRS database

The TNRS database is a periodically refreshed local cache of external sources of taxonomy, and consists of two interrelated components: (1) a MySQL core database containing the normalized and indexed names, synonymy and higher classifications, and (2) partially denormalized representations of the same taxonomic content, optimized for use by the fuzzy matching application. Information stored in the core database includes names and authors, an indication of taxonomic rank, a pointer to the immediate parent within the taxonomic hierarchy, and assertions as to the validity of a name (e.g., "accepted", "not accepted") accompanied by a pointer to the accepted name for synonymous names. Parent-child links and synonym-accepted name assertions are stored separately from the names themselves, thus allowing storage of multiple classifications and taxonomic opinions. Retrieval of ancestor and descendant taxa to arbitrary depth is supported by secondary indexing according to a modified preorder tree traversal algorithm [50].
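To make the preorder-traversal indexing concrete, the sketch below assigns left and right indexes to a small parent-pointer hierarchy and then retrieves ancestors and descendants by comparing those indexes, with no recursion at query time. It is a generic nested-set illustration in Python, not the TNRS schema: the `Taxon` class, the field names `left` and `right`, and the function names are assumptions.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Taxon:
    name: str
    parent: Optional[str]  # name of the immediate parent; None for a root
    left: int = 0          # index assigned when the traversal enters the node
    right: int = 0         # index assigned when the traversal leaves the node


def index_tree(taxa: Dict[str, Taxon]) -> None:
    """Assign left/right indexes by a depth-first preorder walk."""
    children: Dict[str, List[str]] = {}
    roots: List[str] = []
    for taxon in taxa.values():
        if taxon.parent is None:
            roots.append(taxon.name)
        else:
            children.setdefault(taxon.parent, []).append(taxon.name)

    counter = 0

    def visit(name: str) -> None:
        nonlocal counter
        counter += 1
        taxa[name].left = counter
        for child in sorted(children.get(name, [])):
            visit(child)
        counter += 1
        taxa[name].right = counter

    for root in sorted(roots):
        visit(root)


def descendants(taxa: Dict[str, Taxon], name: str) -> List[str]:
    """All taxa whose interval lies strictly inside the interval of `name`."""
    node = taxa[name]
    inside = [t for t in taxa.values() if node.left < t.left and t.right < node.right]
    return [t.name for t in sorted(inside, key=lambda t: t.left)]


def ancestors(taxa: Dict[str, Taxon], name: str) -> List[str]:
    """All taxa whose interval strictly encloses the interval of `name`."""
    node = taxa[name]
    around = [t for t in taxa.values() if t.left < node.left and node.right < t.right]
    return [t.name for t in sorted(around, key=lambda t: t.left)]


rows = [
    Taxon("Poaceae", None),
    Taxon("Poa", "Poaceae"),
    Taxon("Zea", "Poaceae"),
    Taxon("Poa annua", "Poa"),
    Taxon("Zea mays", "Zea"),
]
taxa = {t.name: t for t in rows}
index_tree(taxa)

print(descendants(taxa, "Poaceae"))
print(ancestors(taxa, "Zea mays"))
```

In a relational database the same two comparisons become simple range conditions on indexed integer columns, which is what makes retrieval to arbitrary depth possible in a single query. The cost is that the indexes must be rebuilt whenever the hierarchy changes, which fits a cache that is refreshed periodically.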
Two sources (Tropicos, equivalent to the APG III classification [51], and NCBI taxonomy) serve as alternative family classifications; genera, species and infraspecific taxa from all sources are joined to these families by genus.

Taxonomic content is normalized to the source database by loading scripts written in PHP.
The normalization process separates names from classifications and assertions of synonymy, joining new names to alternative family classifications and building foreign keys and indexes. The loading scripts also pre-load the fuzzy matching tables and perform critical validations such as checking for missing or conflicting parent-child links. Taxonomic content can be exposed to the TNRS using an exchange schema based on Simple Darwin Core [52]. For details of the "TNRS Simple Darwin Core Format" see

2. Name resolution engine

Name resolution by the TNRS consists of four steps: pre-processing, name parsing, fuzzy matching and post-processing.

2.1. Pre-processing

Prior to submitting names to the parsing and fuzzy-matching applications, family names pre-pended to species names are removed by searching the initial string of the name for standard family endings (“aceae” and “idae”) and checking against a list of conserved plant family names (Gramineae, Compositae, etc.; although plant-specific, the latter check could be generalized by expanding this list to include conserved family names from all nomenclatural codes).
Indications of uncertain identification such as “cf.” and “aff.” [53] are also removed. For names submitted as all capital letters, case is adjusted by capitalizing the first letter and setting all remaining letters to lower case. This last step is necessary as the name parser uses case to identify name components and cannot correctly parse all-caps names.

The final step in pre-processing is to match the remaining string directly against the core database. Strings matching completely are given an overall match score of 1.0 (see Match score calculation) and removed from further processing. Unmatched names are passed to the name parser (see Name parsing). The results of parsing are matched a second time against the core database before passing the remaining unmatched names to the fuzzy matching application (see Fuzzy matching).

2.2. Name parsing

Separation and classification of name components is performed by the GNA Scientific Name Parser [54], which is distributed as a Ruby gem library, a command line utility and a server script [55]. It is based on the Treetop gem, which implements the Parsing Expression Grammar algorithm [56].
The parser defines the components of a scientific name as a series of recursive regular expressions. It begins by using white space to separate the components of the scientific name and authorship, and then moves to identifying each component as a genus, specific epithet, infraspecific epithet, author, year, etc. The higher level definitions describe how simpler components combine together as a name or a conglomerate of names (hybrids). At first the parser follows the rules of all nomenclatural codes inclusively; if something is allowed in the ICBN (International Code of Botanical Nomenclature) but not allowed in the ICZN (International Code of Zoological Nomenclature), it is allowed by the parser. If the parser fails to atomize a name it moves into ‘relaxed’ mode, where common mistakes in writing names or authorship are taken into account. For example, relaxed mode allows diacritic characters not permitted by zoological or botanical codes, double parentheses surrounding author names, year without an author, square brackets and question marks around years (as in the example ‘[185?]’), etc. Relaxed mode does not perform fuzzy matching. If relaxed mode fails as well, the parser uses ‘salvage’ mode, which tries to extract the canonical form of the name from the string, discarding anything to the right of it. Parsing is case sensitive, which means, for example, that the genus part of a binomial must be capitalized, and the species epithet must be in lower case to be recognized. Scientific names that do not follow a rigid linear structure (for example, hybrid names such as Coeloglossum viride (L.) Hartman × Dactylorhiza majalis (Rchb. f.) P.F. Hunt & Summerhayes ssp. praetermissa (Druce) D.M. Moore & Soó) are also supported as a result of the recursive nature of the algorithm.

In addition to separating the author from the taxon name, the parser detects and separates the genus from specific and infraspecific epithets, and extracts rank indicators such as “var.”, “ssp.”, “subsp.”, etc. For example, “Bromus inermis var. confinis (Nees ex Steud.) Stapf” is separated into genus "Bromus", specific epithet "inermis", infraspecific rank indicator "var.", infraspecific epithet "confinis", basionym author "Nees ex Steud." and combining author "Stapf". The results of parsing are also used to determine the overall taxonomic rank of the name submitted (for example, genus, species, subspecies, variety, etc.). This information is required for flagging partial matches and for constraining matches by higher taxonomy (see User options).

2.3. Fuzzy matching

Fuzzy matching is performed by a modified version of the PHP implementation [57] of Taxamatch [58].
The Taxamatch algorithm speeds matching of taxonomic names by matching higher taxonomic name components first, then searching only for taxa within the best-matching higher taxon (for example, genera, followed by the species within the best-matching genus). Matches to names minus the authority are determined using two separate tests: phonetic similarity and orthographic (spelling) similarity. A name passing either of these tests, or both, is considered a "match" (although see Candidate match selection for additional rules enforced by the TNRS).

Phonetic similarity is assessed using a custom algorithm that substitutes specific characters or character pairs for others, thereby transforming each name to a simplified phonetic equivalent. Although similar to approaches such as Soundex [59] and Phonix [60], the Taxamatch algorithm also takes into account specific lexical conventions of scientific names and incorporates a degree of “stemming” of species epithets, in which a range of possible variant word endings are transformed to a single standardized form (cf. [61]). The stemming (equivalent) in Taxamatch equates -a, -is, -us, -ys, -es, -um, -as and -os when they occur at the end of a species epithet (or infraspecific epithet) by changing them all to -a. Thus (for example) the epithets “nitidus”, “nitidum” and “nitida” will all be considered equivalent following this process.
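The ending substitution just described can be sketched in a few lines of Python. This is a minimal illustration covering only the listed endings; the function name is hypothetical, and the character-substitution rules of the full phonetic transformation are not reproduced here.

```python
# Variant endings equated with "-a" by the stemming step, as listed in the text.
VARIANT_ENDINGS = ("is", "us", "ys", "es", "um", "as", "os")


def stem_epithet(epithet):
    """Replace a listed variant ending with 'a'; leave other epithets unchanged."""
    for ending in VARIANT_ENDINGS:
        if epithet.endswith(ending):
            return epithet[: -len(ending)] + "a"
    return epithet  # already ends in "a", or has an ending not in the list


# All three forms collapse to the same stemmed string.
print({stem_epithet(e) for e in ("nitidus", "nitidum", "nitida")})
```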
Once transformed, names are compared using an exact match; this operation is very fast as reference names are transformed in advance during the loading of each taxonomic source to the TNRS database.

Orthographic similarity for each name component (e.g., genus, species, subspecies; but not author; see below) is calculated using a modified Damerau-Levenshtein Distance [62, 63] with additional corrections for transposed syllables (T. Rees, unpubl.), hereafter referred to as edit distance (ED). “Classic” ED using Levenshtein’s original algorithm [63] is a measure of the minimum number of single-character deletions, insertions, or substitutions required to transform one string into a second string. Thus, “faveolata” and “flaveolata” have an edit distance of 1 by that measure (single character insertion) as do “Ficus” and “Fucus” (single character substitution). The “Damerau-Levenshtein” version of the algorithm also allows single character transpositions (for example, “Nais” vs. “Nias”) at a cost of ED 1, which under “classic” Levenshtein would incur a cost of 2 (two substitutions), since transpositions are not recognised in the original algorithm. The additional modification introduced for Taxamatch, termed Modified Damerau-Levenshtein Distance or MDLD, further permits multi-character transpositions (for example, “vecusilosus” to “vesiculosus”), at a cost of the number of transposed characters only (ED 2 in this case) rather than the more expensive cost (ED 4) that would be incurred if each character were to be substituted individually, as in either of the preceding algorithms.
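The difference between the first two measures can be checked with a short Python sketch. The function below is a textbook dynamic-programming implementation, not the Taxamatch code; with `transpositions=True` it computes the restricted ("optimal string alignment") form of Damerau-Levenshtein distance, which is an assumption about which variant is meant. The multi-character transposition rule of MDLD is unpublished and is not implemented.

```python
def edit_distance(a, b, transpositions=False):
    """Levenshtein distance between a and b.

    With transpositions=True, a swap of two adjacent characters costs 1
    (restricted Damerau-Levenshtein, also called optimal string alignment).
    """
    rows, cols = len(a) + 1, len(b) + 1
    d = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        d[i][0] = i  # delete all of a[:i]
    for j in range(cols):
        d[0][j] = j  # insert all of b[:j]
    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,         # deletion
                d[i][j - 1] + 1,         # insertion
                d[i - 1][j - 1] + cost,  # substitution (or match)
            )
            if (transpositions and i > 1 and j > 1
                    and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]):
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)  # adjacent swap
    return d[-1][-1]


print(edit_distance("faveolata", "flaveolata"))             # one insertion
print(edit_distance("Ficus", "Fucus"))                      # one substitution
print(edit_distance("Nais", "Nias"))                        # classic: two substitutions
print(edit_distance("Nais", "Nias", transpositions=True))   # one transposition
```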
Due to its variable spelling, abbreviation and format, similarity of the author is calculated using the more relaxed n-gram method [64], which produces an author match score (AMS) ranging from 0 to 1. This index is calculated as a blend of 2/3 bigram and 1/3 trigram similarity between the strings, for which known botanical author abbreviations are expanded according to a dictionary of stored abbreviations prior to the comparison.
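A rough Python sketch of such a blended n-gram score is given below. The choice of the Dice coefficient over character n-gram multisets, the lower-casing step and all function names are assumptions made for illustration; the exact similarity measure used by Taxamatch is not specified here, so this sketch should not be expected to reproduce the scores reported in this paper. Abbreviation expansion is omitted.

```python
from collections import Counter


def ngrams(text, n):
    """Multiset of overlapping character n-grams of `text`."""
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))


def dice(a, b, n):
    """Dice similarity (0 to 1) of the n-gram multisets of a and b (assumed measure)."""
    ga, gb = ngrams(a, n), ngrams(b, n)
    total = sum(ga.values()) + sum(gb.values())
    if total == 0:
        return 0.0  # both strings shorter than n characters
    shared = sum((ga & gb).values())  # multiset intersection
    return 2 * shared / total


def author_match_score(a, b):
    """Blend of 2/3 bigram and 1/3 trigram similarity."""
    a, b = a.lower(), b.lower()
    return (2 * dice(a, b, 2) + dice(a, b, 3)) / 3


print(author_match_score("Stapf", "Stapf"))
print(author_match_score("Guedes", "M.Guedes"))
```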
The abbreviations are chiefly a subset of the standard abbreviations found in Brummitt & Powell [65], supplemented with additional abbreviated forms, including some for animal names, as compiled in one of the authors' (TR) Interim Register of Marine and Nonmarine Genera database [66]. The index is calculated twice, once using the original UTF-8 strings and a second time using a plain ASCII version, so as to reduce differences solely due to presence or absence of diacritical marks. The final author similarity score is the unweighted average of the two calculations. For example, consider the authority portions of the two species name strings “Jovetia erecta Guédès” vs. “Jovetia erecta M.Guedes”.
Treating the letters with diacritics (“é” and “è”) as different characters from their non-diacritic equivalents (“e” in both cases) would result in an undesirably low similarity (0.305, or 0.411 if the leading “M.” initial is omitted) whereas treating both as identical to “e” results in arguably too high a similarity (0.795, or 1.0 if the leading “M.” initial is omitted). Therefore, in order to score these variants as similar but not identical, the average value of the two approaches is used (0.550, or 0.705 if the leading “M.” initial is omitted). This example of multiple accented characters in a comparatively short word is somewhat unusual; in most cases the difference between the two approaches will be apparent but less extreme.

Extensions to the original Taxamatch code and schema were made to support matching of family names, trinomials (e.g., Bromus inermis subsp. inermis) and quadrinomials (e.g., Bromus inermis subsp. inermis var. divaricatus). An overall match score based on both name and author similarity scores is calculated during the post-processing stage (see Match score calculation, below).

2.4. Post-processing

After fuzzy matching is complete, the following "post-processing" steps are performed: (1) calculating and scaling the overall match score, (2) applying thresholds to select the candidate best matches, (3) ranking results to select the single best match, and (4) assigning warnings. After these steps are complete, the results are returned as JSON (JavaScript Object Notation) to the web services layer.

2.4.1. Match score calculation

After fuzzy matching is complete, the EDs of each name component (family, genus, species, variety, etc.; see Fuzzy matching, above) are combined and transformed to an Overall Match Score (OMS).
The OMS provides a more intuitive measure of the confidence that a submitted string matches a name, with 0 indicating no confidence in a match (or, possibly, high confidence in a non-match) and 1 indicating certainty that the returned name is the correct match for the submitted name. With the exception of names matching perfectly to the TNRS database, which are automatically assigned an OMS of 1, calculation of the OMS involves the following four steps.

First, each name component (except author; see below) is assigned a partial match score (PMS) based on the ED between it and the closest name in the TNRS database as follows:

PMS = 1 - 2 × (ED / MaxED)    (1)

where MaxED, the maximum possible value of ED, is equal to the length of the longer of the two strings compared. PMS thus ranges between -1 and 1, where 1 is an exact match. A penalty of 0.3 is subtracted if a rank indicator (“var.”, “ssp.”, “subsp.”, etc.) is present in the submitted string but is not the correct one. For example, in the case of Chondrophora nudata var. virgata, if the user submits Chondrophora nudata fo. virgata, the infraspecific taxon will receive a score of 0.7: 1 for the infraspecific epithet, minus 0.3 for the incorrect infraspecific rank indicator ("fo." instead of "var.").

The second and third steps involve calculation of the original and transformed scientific name match scores (SNMS and SNMStr, respectively). SNMS is simply the sum of the PMSs of all name components. SNMStr is a non-linear transformation of SNMS, scaled to provide a more intuitive measure of the confidence that a submitted string matches a name. SNMStr, which ranges from 0 to 1, where 1 is a perfect match, is tolerant of variation in SNMS when the submitted name has a very good match or no match, but sensitive to small differences in the middle of the range of SNMS.
This reflects the intuitive perception that it takes more evidence to change an opinion when one has very high confidence that it is correct than when one is uncertain. SNMStr is an arctangent transformation of SNMS, normalized by the number of name components:

SNMStr = atan((s * SNMS / k)^(2*t+1)) / (2 * atan(s^(2*t+1))) + 0.5    (3)

where k is the number of name components, s > 0 and t ≥ 0 are two parameters that change the shape of the transformation, from a linear relationship (s≈0, t=0), to different forms of logistic (s > 1, t = 0) and double-logistic functions (s > 1, t > 1). The parameter s > 1 can be used to control the steepness of the curve, whereas t > 1 controls the size of the center. The TNRS uses values of s = 2 and t = 1. This configuration divides the curve into five regions: two regions of certainty at the two extremes, a central region of uncertainty and two regions of discrimination that fall in between (Figure 2). In the regions of certainty and in the central region, differences in SNMS produce only small changes in SNMStr, whereas in the regions of discrimination, small differences in SNMS are amplified by the transformation.

The fourth and final step in the calculation of OMS takes into account the authority and unmatched name components, if any. A fixed penalty of 0.1 is subtracted if any text was found that did not match a name, an author, or a standard annotation such as "cf.". If an author was submitted, the OMS is calculated as a weighted average of the SNMStr and the AMS. The TNRS is implemented with 0.8 and 0.2 as the weights for the SNMStr and AMS, respectively. Thus, for a name plus author,

OMS = (0.8 * SNMStr) + (0.2 * AMS) - p    (4)

where p is a penalty which equals 0.1 if unmatched text was found, otherwise 0. If no author was submitted,

OMS = SNMStr - p    (5)

This is the final OMS that is presented to the user.

2.4.2. Candidate match selection

To qualify as a candidate match, a name must pass the maximum ED test and also pass either the phonetic test or the match threshold test. The phonetic test is performed during fuzzy matching (see Fuzzy matching, above). The remaining two tests are performed during post-processing, as described below.

To pass the maximum ED test, the following must be true: ED ≤ 2 * (number of name parts). Rank indicators of infraspecific taxon names ("var.", "subsp.", etc.) are not counted as name parts. Thus, for a variety such as Poa annua var. spuria, the number of name parts is three, and the maximum ED is 6. For a species name, which consists of two parts, the maximum ED is 4.

The match threshold test is based on the EDs of each name component, weighted by the lengths of the strings compared. The following conditions must be satisfied for each name component: (ED / MSL ≤ MaxEDR) AND ((2 ≤ ED < 4 AND the first character matches) OR (ED = 4 AND the first 3 characters match)), where MSL is the minimum length of the two strings being compared and MaxEDR is the maximum edit distance ratio, a constant which takes on one of two values depending on the value of MSL. For MSL < 6, MaxEDR = 0.5; for MSL ≥ 6, MaxEDR = 0.3334. The values of MaxEDR were determined empirically by examining performance for samples of names. Although in general MaxEDR = 1/3 provides intuitively "reasonable" matching of most names (BB, pers. observation), it is increased for strings of five characters or less to compensate for a bias against matching short strings. For example, "Marsilleya", which differs from its target genus "Marsilea" by an ED of 2 (MSL=8), passes the match threshold test (ED/MSL ≤ 0.3334).
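The maximum ED test and the match threshold test can be sketched in Python as follows. The function names are hypothetical, the edit distance is passed in rather than computed, and both strings are assumed to be non-empty. The stated rule gives no prefix condition for ED below 2 or above 4; this sketch assumes that the former passes and the latter fails, which is an assumption rather than documented TNRS behaviour.

```python
def max_edr(msl):
    """Maximum edit distance ratio for a given minimum string length (MSL)."""
    return 0.5 if msl < 6 else 0.3334


def passes_max_ed(total_ed, n_name_parts):
    """Maximum ED test: ED <= 2 * (number of name parts, rank indicators excluded)."""
    return total_ed <= 2 * n_name_parts


def passes_threshold(ed, submitted, target):
    """Match threshold test for a single name component."""
    msl = min(len(submitted), len(target))
    if ed / msl > max_edr(msl):
        return False
    if 2 <= ed < 4:
        return submitted[:1] == target[:1]   # first character must match
    if ed == 4:
        return submitted[:3] == target[:3]   # first three characters must match
    # Assumption: ED < 2 needs no prefix check; ED > 4 is rejected.
    return ed < 2


def is_candidate(total_ed, n_name_parts, phonetic_match, threshold_ok):
    """Pass the maximum ED test and either the phonetic or the threshold test."""
    return passes_max_ed(total_ed, n_name_parts) and (phonetic_match or threshold_ok)


print(passes_threshold(2, "Marsilleya", "Marsilea"))
```

With these definitions, the "Marsilleya" case above evaluates to True, since 2/8 = 0.25 is below 0.3334 and the first characters agree.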
"Ulleya", which also differs from its target "Ulea" by an ED of 2 (MSL=4), passes the match threshold test at the less stringent MaxEDR of 0.5 (ED/MSL ≤ 0.5), but would fail at MaxEDR=0.3334.

2.4.3. Ranking and best match selection

Once candidate matches have been determined by applying the phonetic, match threshold and maximum ED tests, multiple candidate matches to a single submitted name are ranked to select the best match. During name processing, the TNRS performs and stores two alternative sets of rankings, one unconstrained and the other constrained by higher taxonomy. Using the unconstrained algorithm, the TNRS ranks all candidate matches by descending SNMS, OMS and taxonomic status. Taxonomic status is ranked as follows: “accepted” > “synonym” > “no opinion” ("illegitimate" and "invalid" are treated as "synonym" for ranking purposes). The highest ranking candidate match is then presented to the user as the best match. If two or more candidate matches have identical values of SNMS, OMS and taxonomic status, the name with the lowest alphabetical sort order is presented as the best match but flagged as “Ambiguous Match” (see Warnings, below).

The taxonomically-constrained rank calculation is similar to the unconstrained algorithm, except that the calculation is performed separately for each name component, starting with genus (or family, if a family was submitted with the name), then species, then infraspecific taxa, if any. The result of the taxonomically-constrained algorithm is that the best (highest ranked) match for a species name with a misspelled genus but perfectly spelled specific epithet will be the best-matching genus, whereas the best match using the unconstrained algorithm will be the best-matching species. For example, the best overall match for Fucus insipida is the species Ficus insipida (OMS = 0.96) in the Moraceae (fig family), whereas the best genus match is Fucus (OMS = 0.50) in the Fucaceae (brown algae).
Under the default unconstrained ranking algorithm, Ficus insipida will be displayed as the best match and flagged with the warning "Better higher taxonomic match available". Under the taxonomically-constrained ranking algorithm, Fucus will be displayed as the best match and given two warnings: "Partial match" and "Better spelling match in different higher taxon" (see Warnings, below).

Both sets of rankings are calculated and stored during name processing, and the user may switch between taxonomically-constrained and unconstrained matches after name processing is complete by checking or unchecking "Constrain by higher taxonomy" under "Best match settings" (see User options). In addition, a second user setting provides the option of constraining best matches by taxonomic source. This ranking is performed "on the fly" by checking "Constrain by source" under "Best match settings", and causes all candidate matches from the top-ranked source to rank above those of lower-ranked sources. Thus, a candidate match with a low OMS will appear as the best match even if a candidate match with a higher OMS is found for a lower-ranked taxonomic source. Adjusting this setting only has an effect if more than one taxonomic source has been used. This setting is recommended if using multiple sources simultaneously, as it minimizes the effect of spelling and synonymy conflicts between sources.

2.4.4. Warnings

The TNRS issues four types of warnings about names matched. "Partial match" indicates that the name matched is of a higher taxonomic rank than the name submitted by the user. For example, if the user submitted a species name but the TNRS was able to match only the genus, the TNRS would return the genus along with the warning "Partial match". "Ambiguous match" indicates a “tie”, meaning that one or more other candidate matches have identical match scores and taxonomic status. Two additional warnings indicate that the name submitted matches taxa that are not closely related. "Better spelling match in different higher taxon" indicates that another candidate match with a better overall match score is available in a different higher taxon. "Better higher taxonomic match available" indicates that another candidate match with a lower overall match score provides a better match to the higher taxon of the name submitted (see Ranking and best match selection, above).

3. Web services and application programming interface

The TNRS web-services layer acts as an asynchronous job execution and data management server. It controls traffic between the user interface and the TNRS name resolution. These services manage input and output files and schedule jobs submitted for parsing and fuzzy matching. The TNRS web services can be accessed programmatically via a RESTful API, using a GET call with two parameters: retrieve (followed by options requesting the return of all matches or the single best match only) and names (followed by a comma-separated list of URL-encoded taxon names). Results are returned from the web service as JSON. Details of the TNRS API are provided at . As a demonstration of how to use the TNRS in third-party software, Additional file 1 provides an example R script that calls the TNRS API in the context of adding taxon names to a phylogeny.
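As a rough illustration of the GET call described above, the following Python sketch assembles a request URL from the two parameters, retrieve and names. The endpoint shown is a placeholder rather than the real service address, and the option values "best" and "all" are assumptions; the API documentation should be consulted for the actual values. The structure of the JSON response is not assumed here.

```python
import json
from urllib.parse import quote
from urllib.request import urlopen

# Placeholder endpoint: the real service URL is not given in the text.
BASE_URL = "https://tnrs.example.org/matchNames"


def build_url(names, retrieve="best", base_url=BASE_URL):
    """Build a GET URL with retrieve=<option> and names=<comma-separated list>.

    Each taxon name is URL-encoded individually; the commas separating
    names are left as literal commas.
    """
    encoded_names = ",".join(quote(name) for name in names)
    return f"{base_url}?retrieve={quote(retrieve)}&names={encoded_names}"


def resolve(names, retrieve="best", base_url=BASE_URL):
    """Send the request and decode the JSON body returned by the service."""
    with urlopen(build_url(names, retrieve, base_url)) as response:
        return json.loads(response.read().decode("utf-8"))


# Only the URL is built here; no request is sent to the placeholder host.
print(build_url(["Fucus insipida", "Ulleya"]))
```

Encoding each name separately keeps spaces and special characters inside a name from being confused with the commas that delimit the list.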
4. Web interface

The TNRS web interface uses a Rich Internet Application (RIA) [67] front end and is built using the Google Web Toolkit for a high degree of user interactivity within a web browser. The interface is supported by a layer of web services that provide a bridge between the user interface (UI) and the underlying algorithms that perform the matching (see Web services and application programming interface).

The TNRS web interface allows the user to submit names in a text box or by uploading a file (Figure 1). Names submitted via the file load utility may also be preceded by an integer ID separated by a tab. Including a numeric ID provides an alternative way of joining results back to the original database. Name resolution results are displayed below the data entry panel (Figure 1). Only the single best match is displayed by the web interface; however, users may view alternative matches by clicking on the “(+n more)” hyperlink. Additional hyperlinks allow the user to view matched names and accepted names in their original source databases.
Taxonomic status of each name is indicated as “Accepted”, “Synonym” or “No opinion” (some sources further distinguish "Illegitimate" or "Invalid" non-accepted names). For matched names that are not accepted according to the sources consulted, a link to the accepted name is provided. Results can be copied directly from the results display or downloaded as a comma-delimited text file.

4.1. User options

Users can configure two types of options: name processing settings and best match settings. Name processing settings must be adjusted prior to submitting names for processing, and are displayed to the right of the data entry panel (Figure 1; see Table 2 for details). Best match settings affect how the best match is selected by adjusting the algorithm used to rank multiple candidate matches (see Ranking and best match selection). These settings are adjusted "on the fly" after names have been processed by the TNRS, and are displayed in a drop-down menu on the upper left of the results display (Figure 1). Changes in best match settings are reflected immediately in the display and in the downloaded results file.

Results and Discussion

Performance evaluation

1. Comparison with existing name resolution applications

While some of the functionality of the TNRS can be found within existing name resolution applications, none combines all the capabilities of fuzzy matching, synonym correction, partial matching, return of alternative matches, and homonym resolution within both an API and a user-friendly web interface (Table 3).
Web services such as the Tropicos web service [68] and Catalogue of Life [35] are capable of bulk resolution of plant names but do not use fuzzy matching to correct misspelled names or standardize variant spellings. The Tropicos batch name matching utility [69] performs exact matching but does not correct misspelled names.
The Taxamatch implementation used by the Interim Register of Marine and Nonmarine Genera (IRMNG) [66] performs fuzzy matching but does not currently handle infraspecific taxa or perform batch correction of misspelled names. The GRIN Taxonomic Nomenclature Checker (GRIN-TNC) [70], an important early name resolution application, performs fuzzy matching based on Levenshtein EDs and is capable of bulk name resolution; however, resolution must be done in stages by first correcting genera, then resubmitting species.
The GRIN-TNC does not provide confidence scores or alternative matches and is available only via a web user interface.

To our knowledge, Plantminer [71] and the Global Names Resolver (GNResolver) [72] are the only applications in addition to the TNRS that combine batch resolution of plant scientific names, spelling correction via fuzzy matching, and access via both a user interface and web services. Like the TNRS, both these applications convert synonyms to accepted names. Like the TNRS (but unlike Plantminer), the GNResolver can provide alternative matches and also returns a score indicating overall level of confidence in the match.
To compare the name matching abilities of Plantminer and the GNResolver relative to the TNRS, we submitted to each application a list of 1000 uncorrected plant names from a database of ecological inventories (The SALVIAS Project [9]; see Additional file 2). The list contained a variety of errors such as misspelled taxon names, annotations, frame shifts, unconverted extended ASCII codes, morphospecies, etc. For the TNRS, we used Tropicos as the only taxonomic source; all other options were left at the default settings. As Plantminer checks names against both The Plant List and Tropicos, we expected that all names resolved by the TNRS should also be discoverable by Plantminer. Tropicos taxonomy is not available for use by the GNResolver; instead, we selected the International Plant Names Index (IPNI) [30] as the taxonomic source due to the high overlap between the two databases [73]. We scored a name as successfully resolved if the application returned the expected name, as determined by inspection against the Tropicos database.
For the GNResolver, we excluded from error counts any names which failed to resolve because the intended name was not in IPNI. Due to different conventions for spelling and abbreviation of author names between the source databases, we did not require matching of authorities. The TNRS processed the 1000 names in 43 sec, or 0.04 sec/name, successfully correcting 980 names. Of the 20 failed matches, 17 were incomplete (matching to genus only), one was a non-match, and two were incorrect matches (matches to the wrong name). Plantminer processed the names in 10 min 13 sec, or 0.6 sec/name, successfully correcting 881 names. Of the 119 failed matches, 20 were incomplete, 76 were non-matches, and 23 were incorrect matches. The GNResolver processed the names in 5 min 12 sec, or 0.3 sec/name, successfully correcting 745 names. Of the 255 failed matches, 33 were incomplete, 226 were non-matches, and 4 were incorrect matches. Most errors made by the TNRS (13 names, 65% of total errors) were incomplete matches due to badly misspelled names outside the match threshold (Table 4; see Additional file 3 for a complete list of all incorrectly matched names). The remaining failures were caused by numbers in the authority, capitalized specific epithets, and mistaking morphospecies names or non-standard annotations for scientific name components (for example, the second part of the variant annotation "sp. nova" ("sp. nov." or new species) was converted to the specific epithet "nana"). The largest category of name resolution errors by Plantminer (58 names, 48%) was due to failure to recognize the standard annotations "cf." and "aff." and the non-standard but commonly used "indet.". In most cases the presence of these terms resulted in non-matches rather than partial matches. Also common was failing to make a partial match (35 names, 19 to family and 15 to genus) for names accompanied by annotations or morphospecies strings (e.g., "Fabaceae Indet. sp. 21" was correctly matched to "Fabaceae" by the TNRS, but was not matched by Plantminer). By far the largest source of error for the GNResolver (217 names, 85% of total erroneous names) was non-matches caused by names in all capital letters. Even when perfectly spelled, such names resulted in non-matches. Other major causes of error were misspelled names outside the match threshold, failure to recognize some annotations (in particular "aff." and embedded question marks), and parsing errors triggered by special characters such as pipe ("|") in the name submitted. Features of the TNRS that enabled it to achieve a higher rate of success than both Plantminer and the GNResolver included recognition of a larger diversity of botanical annotations and alternative formulations of infraspecific rank indicators ("ssp" instead of "subsp."), the ability to perform partial matches to genus or family when the full name cannot be matched, and reduced sensitivity to case. In addition, the above results suggest that most TNRS match failures are easily remedied by allowing a less strict match threshold than the current default (although at the risk of an increased rate of false positives). Finally, this test compared only the abilities of the three applications to match names. Features such as warning flags, constraining by higher taxonomy, and tools for comparing and selecting alternative matches are unique to the TNRS and cannot be compared to other applications.

2. Improving linkages between taxonomic databases

As a test of the ability of the TNRS to increase linkages among biodiversity datasets, we compared overlap between two major taxonomic databases, pre- and post-standardization with the TNRS.
The databases compared were the Integrated Taxonomic Information System (ITIS [46]) and the National Center for Biotechnology Information taxonomic database (NCBI, the taxonomic component of GenBank [74]). From each database, we extracted all plant names at the rank of species or below (NCBI, Viridiplantae subtree; ITIS, kingdom="Plantae"). From NCBI, we included only formal scientific names, excluding informal names referring to samples or accessions (so-called “dark taxa” sensu R. 
Page [48]). The lists of unique names from both databases combined were then standardized using the TNRS. As both USDA Plants and NCBI can be used as taxonomic sources by the TNRS (USDA species are in theory a subset of those in ITIS), we used only Tropicos and the GCC (Global Compositae Checklist) as taxonomic sources. All other options were left at their default settings. Prior to standardization, 4,412 names out of a combined total of 141,814 (roughly 3%) were shared between the two databases. After standardization and matching of names by the TNRS, plus conversion of synonyms, total names dropped to 114,497 and the overlap between the two databases increased to 20,670, or 18% (Table 5). Interestingly, much of the gain in overlap (or conversely, loss of superfluous or incorrect names) occurred during matching (a 350% increase), rather than conversion of synonyms (an additional 16% increase; see Table 5). Overall, at least 27,317 names in the combined databases, or 19.2%, were erroneous or redundant entries due to spelling errors, variant spellings or synonymy.

The most important outcome of name resolution was the nearly five-fold increase in taxonomic overlap between the two taxonomic databases.
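The overlap figures quoted above follow from four counts, and the arithmetic can be retraced with a short Python sketch. Only the counts are taken from the text; the variable names are illustrative.

```python
# Counts as reported in the text.
shared_before, total_before = 4_412, 141_814    # prior to standardization
shared_after, total_after = 20_670, 114_497     # after matching and synonym conversion

# Share of names common to both databases, before and after.
pct_before = 100 * shared_before / total_before
pct_after = 100 * shared_after / total_after

# Relative growth of the shared set ("nearly five-fold").
fold_increase = shared_after / shared_before

# Names eliminated as erroneous or redundant, and their share of the original total.
names_removed = total_before - total_after
pct_removed = 100 * names_removed / total_before

print(f"overlap before: {pct_before:.1f}%")
print(f"overlap after:  {pct_after:.1f}%")
print(f"fold increase:  {fold_increase:.2f}")
print(f"names removed:  {names_removed} ({pct_removed:.1f}% of original total)")
```

Note that the two overlap percentages use different denominators, since the combined list of names shrinks during standardization.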
This result highlights the potential of taxonomic resolution as a general tool for integrating and building linkages between biodiversity databases.

Future Directions

One of the primary strengths of the TNRS is to provide a repeatable and efficient workflow for accessing existing, best available taxonomic sources. The ease with which new sources are added to the TNRS database suggests that future efforts should be directed to encouraging providers of high-quality taxonomy to make their information available via the TNRS. Although the TNRS provides a way to resolve many common forms of taxonomic semantic heterogeneity (in particular ambiguities due to misspellings and lexical variants, nomenclatural synonyms, and many forms of homonyms), major challenges remain. In particular, divergent taxonomic concepts can translate to differences in traits and geographic distributions; yet such differences are not reflected by differences in taxon names and authorities [40]. For example, depending on the concept used, Abies lasiocarpa (Hook.) Nutt. 
(subalpine fir) is either (a) widely distributed throughout the Pacific Northwest and the interior Rocky Mountains of North America or (b) restricted to the coastal ranges of British Columbia and Alaska. A more complex example is provided by the grass Andropogon virginicus, where the name has at least 5 meanings that overlap with 17 different taxon concepts that are variously given 27 scientific names [75].
Unfortunately, at the present time, resolving such ambiguity due to differing taxon concepts requires information on usage not communicated by the name alone, and rarely provided by most current taxonomic sources (but see [75]). As such information becomes available, future efforts should be directed toward the resolution not simply of names but of biologically more meaningful taxon concepts. Although the TNRS was developed to resolve plant names, relatively minor changes are needed to extend coverage to other organisms and nomenclatural codes.
Such improvements are beyond the scope of the current project, but we encourage others in the community to adapt the TNRS to their needs by accessing the source code at our publicly available repository.

Conclusions

The increasing availability of large, digitized biological datasets, while clearly a boon for biodiversity research, is also leading to an accumulation of incorrect, ambiguous or outdated taxon names, with negative consequences for comparative biological science, policy making, and data discovery. In an effort to provide a way forward we have developed the Taxonomic Name Resolution Service or TNRS, an application for correcting and standardizing taxonomic names with reference to existing sources of high-quality taxonomy.

The TNRS combines, within a single application, automated name parsing and correction with tools for inspection and resolution of ambiguous results. The TNRS provides a labor-saving and repeatable workflow for standardizing taxonomic names across an array of legacy and contemporary biodiversity data. A web interface makes the TNRS accessible to non-specialist users, while web services support programmatic access by expert users in need of automated name resolution. Tests demonstrate the potential of the TNRS for reducing error and increasing integration among major organismal databases.

Availability and requirements

Project name: Taxonomic Name Resolution Service
Project home page:
Operating systems: Linux based
Programming languages: PHP, MySQL, Ruby, Java
Other requirements: Java JDK 1.7.0 or higher, Git 1.7.4 or higher, MySQL 5.0.95 or higher, PHP 5.3.3 or higher (including mysql and mbstring extensions), Maven 2.2.1 or higher, Apache Tomcat 7.0.33 or higher, Apache HTTP Server 2.2.3 or higher, Apache JK Modules 1.2.31 or higher, YAML 0.1.4, Ruby 1.9.3 or higher, Rubygems 1.8.23 or higher. The setup has been tested on CentOS 5.8. 
Details are available at .
License: The TNRS was built on two existing open-source projects, each of which retains its original licensing. The SilverBiology PHP port of Taxamatch [60] uses the Apache 2.0 license, and GNI's name parser uses a BSD-style license. All other code is licensed using a standard BSD license [76].
Any restrictions to use by non-academics: None
Access to source code: The TNRS user interface is freely accessible via the TNRS website at . Instructions for accessing the TNRS matchNames web service can be found at . Developers wishing to modify the TNRS for their own needs can download source code from the iPlant GitHub repository at . A virtual machine image of the TNRS pre-loaded with an example database can be launched from within iPlant's Atmosphere cloud computing environment (; requires iPlant credentials).

List of Abbreviations

AMS: Author match score
API: Application programming interface
ED: Edit distance
MaxEDR: Maximum edit distance ratio
MSL: Minimum length of the two strings being compared
OMS: Overall match score
PMS: Partial match score
SNMS: Scientific name match score
SNMStr: Transformed scientific name match score

Competing interests

The authors declare that they have no competing interests.

Authors' contributions

Initial concept for the TNRS was developed by BB and BJE, with later suggestions from WHP, RKP, ZL, JARG, CF, SJM, NM, MN and NH. BB, CF, WP and RKP participated in the preliminary TNRS planning meeting at the Missouri Botanical Garden. The GNI Parser was designed by DM. Taxamatch was conceived and developed as a standalone product by TR, who assisted with advice regarding its implementation within the TNRS. 
The core database and database loading scripts were developed by BB, with assistance from ZL and TR. Pre- and post-processing code and Taxamatch extensions were written by ZL. User interface and web services layer were coded by JARG. Scoring and ranking algorithms were developed by TR, DM, ZL, BB and NM. Project direction was provided by NH, MN and SL. The initial draft of the paper was written by BB. All authors read, participated in revision, and agreed with the final manuscript.

Acknowledgements

We thank the Missouri Botanical Garden for sharing data from Tropicos, for modifying their web service API to enable improved access to these data and for their collaboration throughout this project. We thank Michael Giddens (Silver Biology) for access to his PHP implementation of Taxamatch. David Shorthouse collaborated with DM in developing the algorithm upon which the TNRS score transformation is based. Several iPlant developers made important contributions: Andrew Muir and Sriram Srinivasan developed the framework for the user interface, Hariolf Haefele provided database support, John Wregglesworth assisted with the job execution framework, and members of the Core Software team (Edwin Skidmore, Sangeeta Kuchimanchi, Steven Gregory and Andy Edmonds) deployed services and provide ongoing support. Bob Magill, Charles Miller, Paul Morris, Peter Jørgensen, Jay Paige, Alan Paton, Cam Webb, and Amy Zanne attended the preliminary TNRS planning meeting and provided many valuable suggestions. Finally, we would like to acknowledge Shannon Oliver for her invaluable contribution to the website design and documentation. BJE was supported by NSF grant DBI 0850373 and TR by CSIRO Marine and Atmospheric Research, Australia. BB and BJE acknowledge early financial support from Conservation International and TEAM, who funded the development of early prototypes of taxonomic name resolution. 
The iPlant Collaborative is funded by a grant from the National Science Foundation (#DBI-0735191).

Table 1. Details of taxonomic sources used by the TNRS. Total names includes higher taxa and infraspecific taxa in addition to species. Taxonomic scope refers to the subset of the database used for the TNRS (for example, NCBI Taxonomy covers the entire tree of life, not just embryophytes). "Embryophytes" are flowering plants, conifers, ferns, mosses, hornworts and liverworts.
Name | Total names | Taxonomic scope | Geographic scope | Primary URL
Tropicos | 1,250,897 | Embryophytes | Comprehensive coverage of North, Central and South America; partial coverage of Old World, especially Madagascar, East Africa and China |
USDA Plants | 93,307 | Embryophytes and lichens | U.S. and its territories, Canada, Greenland |
Global Compositae Checklist | 123,551 | Asteraceae | Global |
NCBI Taxonomy | 210,214 | Embryophytes | Global |

Table 2. Name processing settings. These user options must be set prior to submitting names for processing. Best match settings (not listed; see User options) are adjusted after processing is complete.
Setting | Description | Options
Processing mode | Determines whether the name is parsed and resolved (corrected) or parsed only | Full name resolution (default); Parse names only
Match accuracy | Adjusts the minimum OMS required to return a name as a candidate match | Slider from lowest (default) to highest (perfect match, OMS = 1.0)
Allow partial matches | If enabled, the TNRS will match a higher taxonomic component of a name if it cannot match the name at the rank submitted | Enabled (default); Not enabled
Sources | Taxonomic sources used to resolve names. Higher-ranked sources applied first if Best match setting "Constrain by source" enabled (see text) | Select; Deselect; Rank by dragging/dropping
Family classification | Source of family classification for matched and accepted names | Tropicos / APG III (default); NCBI (similar to APG III, with recent changes)

Table 3. Comparison of features of name resolution applications.
Application | Batch processing | Fuzzy matching | Corrects synonyms | Provides confidence score | Returns alternative matches | API | User interface | Handles infraspecific taxa
TNRS | x | x | x | x | x | x | x | x
Tropicos web service: xxxx
Catalogue of Life | x | ? | x | ? | ? | ? | x | x
Tropicos name matching utility: xxx
Taxamatch (IRMNG): xxxxx
GNResolver | x | x | x | x | x | x | x | x
GRIN Taxonomic Nomenclature Checker: xxxxx
Plantminer: xxxxxx

Table 4. Types of errors made during resolution of 1000 names by Plantminer, GNResolver and the TNRS.
Most likely cause of error | Plantminer | GNResolver | TNRS
Annotation not recognized: 58213
Name all caps: 217
Capitalized specific epithet: 11
Failed to match family or genus: 34
Infraspecific rank indicator not recognized: 3
Morphospecies treated as taxon: 151
Name submitted matches to >1 name: 84
Failed fuzzy match, outside threshold: 913
Parsing error caused by number in authority: 2
Parsing error caused by special character in name: 2
Unknown: 11
Total: 11925520

Table 5. Total names within two plant taxonomic databases before and after name resolution using the TNRS. Totals after matching include the original name if no match was found by the TNRS. Totals after matching and synonym conversion use accepted names in place of synonymous matched names.
Name source | Original names | After matching by TNRS | After matching & synonym conversion by TNRS
NCBI | 99743 | 97734 | 90142
ITIS | 46483 | 45960 | 45025
NCBI+ITIS (shared names) | 4412 | 19935 | 20670
NCBI+ITIS (total unique names) | 141814 | 123759 | 114497

Figure legends

Figure 1
Screenshot of the main TNRS user interface. Up to 5000 names, one per line, may be entered manually or pasted into the "Enter list" text box. Larger lists are uploaded using the "Upload and Submit List" tab. Name processing settings are adjusted prior to submitting the names using the controls in the upper left box. Best match settings, on the upper left of the results display, are set after results are returned, and affect how multiple results for the same name are ranked and therefore how the single best match is selected. The "(+n more)" link allows the user to view and select any alternative matches found. The "Details" hyperlink displays the results and match scores for each name component (genus, species, author, etc.). The remaining hyperlinks link to entries in the original source databases. "Download settings" displays a report of all settings used to resolve the current batch of names.
The "Download results" button displays options for downloading results as a plain text file.

Figure 2
Transformed scientific name match score (SNMStr) versus original, untransformed score (SNMS) of a submitted binomial, showing the differing degrees of certainty defined by the transformation function. In the two regions of certainty, small score differences have a smaller impact on the outcome: either there is a mismatch (SNMS = -2) or a perfect match (SNMS = 2). Similarly, in the region of uncertainty, small score differences do not help to distinguish between matches and mismatches. In the regions of discrimination, by contrast, there is already a preference towards matches or mismatches, and small differences can help tip the balance.

Additional files

Additional file 1 – addlFile1_tnrsExample.R
Example R script which uses the TNRS API to correct names on a phylogeny.

Additional file 2 – addlFile2_1000TestNames.csv
Taxonomic names used to compare performance of TNRS, Plantminer and GNResolver.

Additional file 3 – addlFile3_matchingErrors.xlsx
List of submitted names, expected targets, and descriptions of errors for names which failed matching by one or more name resolution applications.
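
The bottom row of Table 5 follows from the other three rows by inclusion-exclusion: the number of unique names across NCBI and ITIS is the sum of the two lists minus the names they share. The short Python sketch below is an illustration only and is not part of the TNRS code base; it copies the counts from Table 5, and the variable and function names are hypothetical.

```python
# Name counts copied from Table 5. Each tuple holds, in order:
# original names, after TNRS matching, after matching and synonym conversion.
ncbi = (99743, 97734, 90142)
itis = (46483, 45960, 45025)
shared = (4412, 19935, 20670)

STAGES = ("original", "matched", "matched + synonyms converted")


def total_unique(a, b, both):
    """Inclusion-exclusion: names in either list, counting shared names once."""
    return a + b - both


# Recompute the "total unique names" row for each processing stage.
unique = tuple(total_unique(a, b, s) for a, b, s in zip(ncbi, itis, shared))

for stage, n_shared, n_unique in zip(STAGES, shared, unique):
    pct_shared = 100 * n_shared / n_unique
    print(f"{stage}: {n_shared} shared, {n_unique} unique ({pct_shared:.1f}% shared)")

# Overall shrinkage of the combined name list after full resolution.
reduction = 100 * (unique[0] - unique[-1]) / unique[0]
print(f"Reduction in total unique names: {reduction:.1f}%")
```

The recomputed totals are 141814, 123759 and 114497, matching the last row of Table 5. The combined list shrinks by roughly 19% after matching and synonym conversion, while the number of shared names grows at each stage.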