Combined study on digital sequence information in public ...



CBD
Distr. General
CBD/DSI/AHTEG/2020/1/4
31 January 2020
ENGLISH ONLY
Ad Hoc Technical Expert Group on Digital Sequence Information on Genetic Resources
Montreal, Canada, 17-20 March 2020
Combined study on digital sequence information in public and private databases and traceability
Note by the Executive Secretary
At its fourteenth meeting, the Conference of the Parties to the Convention on Biological Diversity requested the Executive Secretary “to commission a peer-reviewed study on ongoing developments in the field of traceability of digital information, including how traceability is addressed by databases, and how these could inform discussions on digital sequence information on genetic resources” (decision 14/20, para. 11 (c)), and “to commission a peer-reviewed study on public and, to the extent possible, private databases of digital sequence information on genetic resources, including the terms and conditions on which access is granted or controlled, the biological scope and the size of the databases, numbers of accessions and their origin, governing policies, and the providers and users of the digital sequence information on genetic resources and encourages the owners of private databases to provide the necessary information” (decision 14/20, para. 11 (d)).
Accordingly, and with financial support from Norway and the European Union, the Executive Secretary commissioned a research team to carry out the studies in a combined manner, taking into account the conceptual linkages between the two studies, and also partly for practical reasons.
A draft of the combined study was made available online for peer review from 22 October to 22 November 2019. The comments received in response have been made available online. The research team revised the study in the light of the comments received and prepared, in consultation with the Secretariat, the final version as presented herein.
Any views expressed in the study are those of the authors or the sources cited in the study and do not necessarily reflect the views of the Secretariat.
It should also be noted that this study is distinct from but complementary to two studies that the Executive Secretary was requested to commission pursuant to decision 14/20, paragraphs 11(b) and (e) and the synthesis of views prepared pursuant to decision 14/20, paragraph 11(a).
The executive summary of the study is presented below, and the full text of the combined study is contained in the annex. The study is presented in the form and language in which it was received by the Secretariat.
EXECUTIVE SUMMARY
Study mandate and terminology. In decision 14/20, paragraph 11, the Conference of the Parties requested four studies. This is a combined study on digital sequence information (DSI) in databases and DSI traceability (decision 14/20, paras. 11(c) and (d)).
“DSI” is widely acknowledged as a placeholder term for which no consensus on a replacement exists to date. To fulfil the study mandate within the less than three months allotted for a first draft, and without bias for future discussions or the parallel commissioned study on the concept and scope of DSI (para. 11(b)), the present study focuses on a defined and tractable data type, “Nucleotide Sequence Data” (NSD), which is also the term used by the core database infrastructure, the International Nucleotide Sequence Database Collaboration (INSDC, discussed below). A secondary term, “Subsidiary Information” (SI), is employed when datasets extend beyond NSD.
Study scope and limitations. Public NSD databases have a 40-plus-year history that stretches back to the late 1970s and runs parallel to the technological developments and growth of DNA sequencing. We analysed more than 1,600 biological databases listed in the annual database issue of Nucleic Acids Research (Figure 1, section 3.2) to understand the NSD database landscape and structure.
The goal of the inventory was to determine when and where NSD enters the public sphere, in other words, when NSD first enters an NSD database. We found that 95% (705 out of 743) of NSD databases directly link to or download NSD from the INSDC. The remaining 5% of NSD databases allow direct NSD submissions but require the use of unique identifiers, Accession Numbers (ANs), which are generated by the INSDC and so are inherently connected to the infrastructure. Simply put, NSD databases rely on the INSDC and use ANs to enable traceability through the database landscape.
By narrowing our analysis to NSD databases, we were able to perform a standardized, quantitative, peer-review-based, transparent analysis. This would not have been possible in the limited time available if we had expanded our analysis to include subsidiary information, which has a heterogeneous and ambiguous nature. As NSD is often used to predict protein sequences and the technological format of protein sequences and their databases is, in many ways, similar to the INSDC system, observations on NSD and NSD databases may be extendable to this particular type of subsidiary information. However, beyond nucleotide and protein sequence data, other types of subsidiary information are likely to be more difficult to understand, define and trace. Accordingly, this study provides a useful starting point for further analysis of databases and traceability issues associated with subsidiary information.
INSDC is the core database infrastructure for publicly available NSD. The INSDC is an international collaboration between GenBank in the United States of America, the European Nucleotide Archive (ENA) in the United Kingdom of Great Britain and Northern Ireland, and, as of the early 1980s, the DNA Data Bank of Japan (DDBJ). These three databases provide the scientific community around the world with a complete, high-quality, reliable, open, and free infrastructure for NSD.
The three INSDC partners “mirror” (exchange) all NSD in their databases every 24 hours to maintain an up-to-date copy of all published NSD for global use (Figure 2, section 3.2).
The INSDC enables scientists to submit their NSD and receive an AN, which is, in turn, required by the vast majority of life science journals when a scientist (from any country) reports on NSD-based results. The requirement to publish NSD is intended to enable scientific reproducibility and uphold scientific integrity. This practice was codified in 1996, during the Human Genome Project, by the Bermuda Principles, in 2003 by the Fort Lauderdale Agreement, and in 2009 by the Toronto Agreement. In parallel, Good Scientific Practice codes, growing societal pressure for transparency and ethics in scientific discovery, and open-access requirements by funding agencies led to the now near-universal scientific practice of submitting newly generated NSD to the INSDC.
In 2002, the INSDC published its use policy of “free and unrestricted access” with “no use restrictions” and said that data would be “permanently accessible”. In 2016, it reaffirmed this, stating that “the core of the INSDC policy is maintaining public access to the global archives of nucleotide data generated in publicly funded experiments. A key instrument for this is submission as a prerequisite for publication in scholarly journals…”. In addition, INSDC provides training, technical assistance, free software tools, and tutorials. The combined costs across all three INSDC databases are estimated at US$ 50-60 million annually. The more than 700 public NSD databases that use and download NSD from INSDC agree to and depend on the INSDC’s use policy.
What is actually stored in the INSDC databases? Since 1982, the number of bases in GenBank has doubled every 18 months, with a current average of 3,700 new submissions per week.
The April 2019 release of GenBank contained over 212 million NSD entries consisting of over 321 billion bases (in the case of DNA, the nucleotides represented by the letters A, C, G, T), which included:
Human NSD, which is out of scope of the CBD (Article 15), accounts for 12% of entries;
Model organism NSD represents at least 12% of entries. (Model organisms are in-bred organisms used over decades (i.e., accessed prior to 1992) to study biological processes in a standardized manner but which can also occur in the wild. See section 3.4.);
The remaining 76% of NSD are from (in decreasing order of abundance) animals, plants, bacteria/microorganisms, fungi, and viruses, as shown in Figure 3;
The size of a single NSD entry varies by ten orders of magnitude from 1 base to 10^9 bases;
About 85% of NSD entries are <1,000 bases long. The remaining 15% of entries store 95% of the total bases stored in INSDC;
There are huge variations in the size, significance, and biological content of NSD entries. In recent years, larger entries have become more common as whole genome NSD production has increased.
Who uses INSDC? Users within every sovereign State in the world. The 10 to 15 million users of INSDC are found in every country, both developed and developing (Figure 5a-b). The greatest volume of users is in the United States (23%) and China (15%). These two countries also provide the greatest amount of NSD to INSDC (see below) and have large populations. Approximately half of INSDC users are outside the countries that fund the INSDC. Use of INSDC also occurs via FTP download (partial or complete download of the entire INSDC dataset) to 140 countries. In terms of data volume, FTP use accounts for approximately 50 times more data than web page access. Germany, the United States and China lead in FTP usage, although FTP access is often automated rather than user-originated.
How does the existing traceability system of NSD work?
Figure 6 (section 3.2) provides a simplified, schematic overview of how NSD is generated, analysed, published in INSDC, imported into other databases, linked to publications, and used by public and private research. There are two key informatics tools for NSD traceability within the scientific ecosystem that emerged through scientific collaboration and innovation over the decades: accession numbers (ANs) and digital object identifiers (DOIs).
Accession numbers are the cornerstone of NSD traceability. Over decades of international partnership and iterative discussions among INSDC members, sequencing experts and the scientific community, the modern-day seamless exchange and traceability of NSD within INSDC and with thousands of biological databases was established via a unique identifier system. ANs are generated by INSDC databases following NSD submission and are linked to every individual NSD entry in the INSDC. ANs are also used for NSD metadata, such as information on the country of origin, experimental design information, sequencing centre, etc. ANs are at the centre of a web of internal and external traceability supported by a complex database schema in the background. DOIs are used by journals and literature databases and provide a link between submitted NSD and the respective publication(s). ANs and DOIs enable traceability once NSD leaves the INSDC databases and enters other databases.
Can you trace NSD to the GR? Only if the GR is deposited in a collection and the submitter reports it. There are three categories of metadata that enable a scientist to submit NSD and establish a link to a publicly available GR (i.e., from a museum, culture collection or botanical garden). INSDC provides information on best practices regarding required syntax, and the collections provide the unique identifier. About 6% of the INSDC entries have a link to publicly available GR.
There are additional metadata fields that can enable a connection to privately held GR.
Can you trace NSD to the country of origin? Yes, if it is relevant and the submitter reports it. The INSDC databases provide a metadata tag “/country” that enables scientists to label the country of origin of the NSD. Not all categories of NSD can be labelled with a country tag (e.g., human, model organism, synthetic NSD) and the definitions of “country of origin” and “/country” are not identical (see section 2.2). Furthermore, the country tag came into existence in 1998 and became a required field for environmental samples in 2011, so the total percentage should be understood within these constraints. Importantly, the country tag does not indicate where the genetic resource was sequenced. We manually inspected a subset of data and found no false country reporting or reporting of a sequencing centre location instead of the country of origin. Figure 8a shows the geographic distribution of NSD with a country tag:
16% of all INSDC entries have a country of origin listed in the metadata.
Over one third of these entries (35%) come from either China (18%) or the United States (17%).
Every country in the world has NSD in INSDC (Figure 8a).
Over half of the country-tagged NSD come from four countries (United States, China, Canada and Japan).
Our observations suggest that most publicly available NSD currently come from countries that are also major users of genetic resources in the context of the Convention on Biological Diversity. We checked entries with no country tag and found that, for 44% of these entries, the country was not reported in the metadata although it was reported in the associated scientific publication. The missing-country-tag NSD followed similar country-of-origin ratios as the country-tagged NSD. The reporting of country-of-origin information is increasing over time (Figure 9). In 2018, over 40% of the NSD entries submitted reported a country of origin.
These data suggest that the combined effect of the required field and user awareness of the importance of country of origin has led to better reporting and thus improved traceability. The country tag is accurate and increasingly used, but scientists need to improve reporting of country information.
Is it possible to trace NSD to the access permits of the underlying GR? Theoretically yes. Technically, the AN of an NSD entry could be linked to a stable link where access permits (e.g., PIC/MAT) are published. The only system we know of in which this is practically possible is the unique identifier and link generated by the ABS Clearing-House when an Internationally Recognized Certificate of Compliance (IRCC) is published. If a user submitted NSD to INSDC and provided the link from their IRCC, traceability could be established. However, we could not find an example of this linkage, perhaps due to the relative novelty of the IRCC. Importantly, this would not be possible with other forms of access permits (e.g. PDFs) that do not have stable links.
What about NSD in private databases? Private databases can be categorized into two general subgroups: “in-house databases” that contain NSD used internally by a company and “commercial databases”, which are available to any paying member of the public and contain curated NSD and SI. All companies interviewed use combinations of a downloaded copy of all or parts of INSDC as well as internally generated NSD and SI. Companies are able to trace their internal NSD to the original GR, but they noted that there is limited country-of-origin information on older NSD found in INSDC. They submit NSD to INSDC as part of the patent application disclosure process and publish NSD and SI, e.g. for scientific publications with collaborators. They use commercial databases that collect and curate information on patent-disclosed NSD to check for existing patents at the start of R&D projects.
Expert interviews suggest commercial databases exist for scientific specialty areas other than patent NSD, but we could not find any verifiable examples of commercial NSD databases. This is probably because almost all NSD is openly accessible, so charging fees for access to NSD is economically unrewarding.
Can NSD listed on a patent application be traced? It depends. About 20% of GenBank entries consist of NSD submitted along with a patent application as part of patent disclosure requirements (e.g., required by national or international patent law). Country-of-origin information was not found to be associated with NSD disclosed as part of patent applications. Although country of origin is required in some patent jurisdictions under material requirements, this information does not appear to be transferred when NSD relevant to patents is submitted. Patent NSD per se is not “patented” but is submitted as part of patent law to enable a “practitioner with average skill in the art to practice the invention”. INSDC members either receive direct submissions from their respective patent offices or these patent jurisdictions allow patent applicants to provide the AN on the patent application. It is important to note that NSD uploaded to fulfil requirements for patent applications often receives a new AN, even if the same NSD already exists in the database. As such, the patent NSD contains large amounts of redundant NSD entries.
Technological developments in information traceability. Blockchain technology is being developed and applied for human patient NSD and accompanying patient health information, enabling patients to control access to their private genetic data. Technically, this could be applied to non-human NSD if developments in the field of blockchain continue. However, it could only work for newly generated NSD, as it would need to establish a private, standalone system outside of the INSDC and the public databases.
It would also need intensive financial investments and upkeep, and it is debatable whether the benefits could surpass the costs. Other restricted access models from the publishing or media world (e.g., Spotify or Netflix) target only passive access of the user (e.g. listening) rather than the interactive “hands-on” use required by scientists using NSD.
Challenges for NSD traceability. The traceability of NSD described in this study is mainly focused on traceability through databases, i.e., the digital realm, for scientific purposes. However, regulatory traceability would need to account for potential challenges (section 6.1): (a) not all the NSD in INSDC is relevant to the Convention on Biological Diversity and its provisions on access and benefit-sharing; (b) NSD flows into, and is transformed, parsed and exchanged among, more than 1,600 downstream databases that create added value to NSD and require frictionless data flow; (c) NSD generation is growing exponentially and a regulatory system for NSD, or DSI more broadly, would need to be prepared for big data and “future-proofing” scenarios; (d) biology is highly repetitive and NSD is often not (uniquely) attributable to a sovereign State; (e) offline traceability (outside of databases) is nearly impossible.
What do these findings imply? There is an existing traceability system for NSD that took INSDC decades to develop in close partnership with the scientific community. It represents a significant technical, scientific, and financial investment in both public and private databases and should be taken into consideration by the Parties in evaluating how to address DSI. The sheer volume and complexity of the public NSD dataset may imply that any measures adopted to address DSI would need to integrate with or align with this existing infrastructure in order to be effective, particularly given its widespread adoption and use.
In Section 6 below, the broader implications of this study are discussed by sector.
Scientists could improve traceability during the process of NSD submission to INSDC by improving reporting on GR availability and country of origin, and the scientific and database communities could work on relevant awareness-raising. INSDC could stringently enforce country-of-origin requirements on new NSD submissions, improve metadata fields in order to enable a stable link to IRCCs and information on when GR was accessed from the country of origin and, where feasible, manually curate country-tag information based on information provided in the scientific literature or other reliable sources (which would require screening thousands of articles per year). Parties to the Convention on Biological Diversity could commit to generating IRCCs exclusively for users when granting access to GR, instead of generating PDF or paper access permits. Furthermore, given the central role of INSDC in NSD provisioning, the Parties could more closely involve INSDC in the process under the Convention on Biological Diversity. Patent NSD submissions could disclose (if applicable) the original AN if public NSD from INSDC was used in a patent application and the country of origin if it was disclosed in the patent application. This information could also be listed in the NSD submission to INSDC.
Limitations. Finally, due to the scope and time limitations of this study, the Parties may wish to explore the technical feasibility of traceability of SI (beyond NSD) or NSD outside the database system and examine the structure of data flows and country data that are (or are not) associated with these data types.
Annex
Combined study on Digital Sequence Information (DSI) in public and private databases and traceability
Lead authors: Fabian Rohden (1,*), Sixing Huang (1), Gabriele Dröge (2), Amber Hartman Scholz (1,*,+)
Contributing authors (alphabetical): Katharine Barker (3), Walter G. Berendsohn (2), Jonathan A. Coddington (3), Manuela da Silva (4), Jörg Overmann (1), Ole Seberg (5), Michelle van der Bank (6), Xun Xu (7)
Author affiliations:
1. Leibniz Institute DSMZ German Collection of Microorganisms and Cell Cultures, Inhoffenstrasse 7B, 38124, Braunschweig, Germany
2. Botanic Garden and Botanical Museum Berlin, Freie Universität Berlin, Königin-Luise-Straße 6-8, Berlin, Germany
3. The National Museum of Natural History, Smithsonian Institution, Washington, D.C., 20560, USA
4. Fiocruz - Oswaldo Cruz Foundation, Avenida Brasil, 4365, CEP: 21040-900, Rio de Janeiro, Brazil
5. Botanic Garden, Natural History Museum of Denmark, Oster Farimagsgade 2B, 1353, Copenhagen, Denmark
6. The African Centre for DNA Barcoding, University of Johannesburg, Auckland Park, Gauteng, South Africa
7. China National GeneBank, BGI-Shenzhen, Shenzhen, Guangdong 518083, China
* These authors contributed equally to this work.
+ Corresponding author: amber.h.scholz@dsmz.de
Table of Contents
1. EXECUTIVE SUMMARY
Table of Contents
List of figures and tables
List of abbreviations
2. Introduction
  2.1 Terminology
    Nucleotide Sequence Data (NSD)
    Subsidiary Information:
  2.2 Technical Scope
3. Public and private databases
  3.1 Brief history of the core public NSD infrastructure & data sharing
  3.2 Analysis of the public NSD database landscape
    NSD submission to public databases (aka How important is INSDC really?)
    Public databases operating outside the INSDC system
    Access and use policies of non-INSDC biological databases
    GISAID and other biological databases outside of the NAR dataset
    Conclusions on the public database analysis
  3.3 The INSDC
    How are the INSDC databases governed?
    INSDC access and use policies
    Financing of the INSDC
  3.4 What NSD is publicly available in the INSDC?
    Biological scope
    Conclusions on publicly available NSD in the INSDC and NAR database issue
  3.5 INSDC Users
    Limitations of the user data set
    Conclusions on users of NSD
  3.6 Private databases
    In-house databases
    Commercial databases
    Case studies on private in-house databases
    Conclusions on private databases
  3.7 Restricting and controlling access to NSD
4. Traceability of NSD
  4.1 Overview of NSD flow through the scientific landscape
    Sequencing
    Scientific analysis: public research & workbench databases
    Accession Numbers (ANs)
    ANs for metadata
    Traceability of GR from public collections
    Traceability of GR from the environment
    Traceability after INSDC submission to publications
    Traceability to other databases and data layers
    Private sphere
    Traceability to GR accessed under the Nagoya Protocol
    Evolving technologies in biodiversity traceability
    Conclusions on existing NSD traceability mechanisms
  4.2 Traceability to country of origin of underlying GR
    Analysis on the use of the country tag
    The country tag over time
    Another geographical traceability option: GPS coordinates
    Conclusions on the geographical origin of NSD
  4.3 Traceability to patents & beyond
    Patent NSD in the INSDC
    New NSD reporting change in WIPO will improve traceability
    Conclusions on patent traceability
    Non-patent-based innovations
  4.4 When does traceability “break down”?
5. Additional technological options for traceability
  5.1 Tracking users of NSD
  5.2 Blockchain
    Technical background
    Blockchain for Genetic Resources
    A putative example: Earth Bank of Codes
    Conclusions on blockchain
  5.3 Data mining and cloud genomics
  5.4 Other models for digital content
6. Implications for future discussions on DSI
  6.1 Challenges for NSD traceability
  6.2 Practical observations about NSD & DSI
  6.3 Extension of lessons learned from NSD to DSI
Acknowledgements
7. References
8. Technical Methods
  8.1 Analysis of the public database inventory
  8.2 Analysis of GenBank dataset
  8.3 User data from GenBank
  8.4 Private database case studies
    Case study 1: Novozymes A/S [106]
    Case study 2: Company X
    Case study 3: Company Y
    Case study 4: TraitGenetics [107]
    Case study 5: BASF SE [108]
    Case study 6: Company Z
  8.5 Analysis of GenBank NSD entries
    Analysis of entries with country tag
    Analysis of entries without country tag
  8.6 World maps
  8.7 Similarity of short nucleotide sequences
List of figures and tables
Figure 1: Public database inventory
Figure 2: Representation of INSDC and connected instruments
Figure 3: What is the biological scope of the NSD available in GenBank?
Figure 4: How long are the sequences in GenBank?
Figure 5a: Where are the users of DSI?
Figure 5b: Where do requests to GenBank come from?
Figure 5c: Users normalized by population
Figure 5d: Where do ftp requests come from?
Figure 5e: What is the volume of data requested via ftp?
Figure 6: How does NSD flow through the databases, users, and into research?
Figure 7: How do NSD traceability elements overlap?
Figure 8a: What is the country of origin for non-human NSD?
Figure 8b: How does the user number compare to provided sequences?
Figure 8c: How does database usage compare to provided sequences?
Figure 9: How many sequences have a country tag?
Table 1: Overview of case studies
Table 2: Check of random samples with country tag
Table 3: Check of random samples without country tag
Table 4: Probability of a random sequence to appear within different datasets
List of abbreviations
AHTEG      Ad Hoc Technical Expert Group
AN(s)      Accession Number(s)
CBD        Convention on Biological Diversity
COP        Conference of the Parties
DDBJ       DNA Data Bank of Japan
DOI(s)     Digital Object Identifier(s)
DSI        Digital Sequence Information
EBI        European Bioinformatics Institute
EMBL       European Molecular Biology Laboratory
ENA        European Nucleotide Archive
FTP        File Transfer Protocol
GISAID     Global Initiative on Sharing All Influenza Data
GMRLN      Global Measles and Rubella Laboratory Network
GR         Genetic Resource
INSDC      International Nucleotide Sequence Database Collaboration
IRCC       Internationally Recognized Certificate of Compliance
MAT        Mutually Agreed Terms
MEXT       Japanese Ministry of Education, Culture, Sports, Science and Technology
NAR        Nucleic Acids Research
NCBI       National Center for Biotechnology Information
NIG        National Institute of Genetics
NSD        Nucleotide Sequence Data
PIC        Prior Informed Consent
PubMed ID  PubMed Identifier
R&D        Research and Development
SBSTTA     Subsidiary Body on Scientific, Technical and Technological Advice
SI         Subsidiary Information
SRA        Sequence Read Archive
TAIR       The Arabidopsis Information Resource
UKRI       United Kingdom Research and Innovation
WIPO       World Intellectual Property Organization
2. Introduction
In November 2018, at the fourteenth meeting of the Conference of the Parties (COP) to the Convention on Biological Diversity (CBD), the Parties requested the commissioning of four different studies on Digital Sequence Information on Genetic Resources [1].
This is a combined study covering the topics requested in paragraphs 11 (d) and 11 (c) of that decision, respectively:
“public and, to the extent possible, private databases of digital sequence information on genetic resources, including the terms and conditions on which access is granted or controlled, the biological scope and the size of the databases, numbers of accessions and their origin, governing policies, and the providers and users of the digital sequence information on genetic resources and encourages the owners of private databases to provide the necessary information”
“ongoing developments in the field of traceability of digital information, including how traceability is addressed by databases, and how these could inform discussions on digital sequence information on genetic resources”
In 2018, the fact-finding and scoping study on Digital Sequence Information on Genetic Resources in the context of the Convention on Biological Diversity and the Nagoya Protocol was published [2].
The present study builds on the outcomes of that study and cites specific sections of it where appropriate.

A technical note: to assist readers who would like to focus on the take-home messages, we have employed bold text throughout the body of the text to highlight important statistics and conclusions.

2.1 Terminology

The term Digital Sequence Information (DSI) is as yet undefined. The 2018 Ad Hoc Technical Expert Group (AHTEG) on Digital Sequence Information on Genetic Resources [3] noted that the term DSI is a placeholder and generated a list of what could potentially fall under the definition:

“(a) The nucleic acid sequence reads and the associated data
(b) Information on the sequence assembly, its annotation and genetic mapping. This information may describe whole genomes, individual genes or fragments thereof, barcodes, organelle genomes or single nucleotide polymorphisms
(c) Information on gene expression
(d) Data on macromolecules and cellular metabolites
(e) Information on ecological relationships, and abiotic factors of the environment
(f) Function, such as behavioural data
(g) Structure, including morphological data and phenotype
(h) Information related to taxonomy
(i) Modalities of use”

However, this list was not unanimously agreed by the members of the AHTEG, was not taken up by either the 2018 Subsidiary Body on Scientific, Technical and Technological Advice (SBSTTA 22) or COP 14, is not legally binding and has not been agreed upon by the Parties to the CBD.

Understanding traceability and databases for all items (a)-(i) in this list would be an extremely heterogeneous, challenging and time-consuming task. This is due to the highly variable kinds of information and the often weak or non-existent connection of the information to a genetic resource (GR) (e.g., (e) “abiotic factors of the environment”). For this reason, we divide the term DSI into two subcategories:

Nucleotide Sequence Data (NSD):

For the purposes of this study, which has a mandate for sequence databases and traceability, and without bias towards upcoming CBD discussions, we use the term “nucleotide sequence data” (NSD) to indicate that we are describing “(a) nucleic acid sequence reads and the associated data” and, inevitably, “(b) Information on the sequence assembly, its annotation and genetic mapping”, because most of the NSD in the sequence databases was already assembled and annotated by scientists before being submitted. NSD are the direct outcome of nucleotide (DNA or RNA) sequencing of a genetic resource, and this direct linkage between the GR and the NSD fosters traceability.
Furthermore, NSD is found directly in the name of the core public nucleotide sequence data infrastructure, the International Nucleotide Sequence Data Collaboration (INSDC).

In parallel to the writing of this study, Study 1 [1] on the concept of DSI is taking place and, on the advice of the CBD Secretariat, we have attempted to draw a line between these two commissioned studies (Study 1 and Study 2/3). As a result, we will not further discuss the meaning, use or possible definition of DSI. Furthermore, because our analysis focuses on (a) and (b) from the AHTEG list above, we will more often use the term NSD.

Subsidiary Information:

When necessary, the term “subsidiary information” (SI) will be used to cover categories (c) to (i) from the AHTEG list (see above). Here, traceability is generally more difficult or less standardized than with NSD. Indeed, the information is usually derived in follow-up studies or through research that is independent of NSD and the GR. Furthermore, storage, distribution and analysis of subsidiary information may be done independently of any known NSD. This makes identification, database analysis and classification difficult or impossible.
For example, the 3D structure of proteins and information on ecological behaviour constitute two completely different sets of information, neither of which requires NSD.

The term “NSD+SI” will be used when discussing general aspects that are similar for both NSD and SI. Thus, NSD+SI is intended to roughly mirror the placeholder term DSI.

2.2 Technical Scope

Since CBD Decision II/11 paragraph 2 excludes human genetic resources from the framework of the CBD, this study aims to avoid analysis of human NSD. Whenever possible, human NSD is excluded from the data sets, except when displaying the overall biological scope (Figure 3) and, for technical reasons, in the user data. However, current policies and systems in place for dealing with human patient NSD are included at the end of this study to provide general insights into tracking and tracing options for NSD.

Because of time limitations (<3 months to produce a first draft) and the size of the NSD landscape (>700) and associated databases (around 1,000 databases), the detailed analysis undertaken in this study focuses predominantly on peer-reviewed public NSD databases, especially the core infrastructure databases of the INSDC, and covers biological scope, country of origin information, user demographics, traceability mechanisms, and access and governance policies.

In coherence with the CBD, the term “country of origin” is used in this study to describe the country of origin of the genetic resource underlying the NSD.
Article 15 paragraph 3 of the CBD states:

“For the purpose of this Convention, the genetic resources being provided by a Contracting Party, as referred to in this Article and Articles 16 and 19, are only those that are provided by Contracting Parties that are countries of origin of such resources or by the Parties that have acquired the genetic resources in accordance with this Convention.” (Italics added.)

For the purposes of this study, we use the term “country of origin” to indicate the non-italicized text, as the italicized text is beyond the technical ability of this study. In other words, the analyses conducted here when assessing “country of origin” reflect countries providing the underlying genetic resources of the NSD in the INSDC system and not countries that have acquired the genetic resources in accordance with the CBD. For example, if a genetic resource comes from country A, is stored in a collection in country B and the NSD is generated by sequencing the material in country C, then within this study the term “country of origin” refers only to country A and not to country B. Furthermore, we note that the INSDC “/country” field (see Section 4.2) generally, but not completely, reflects the term “country of origin”. For the bioinformatics analyses in Section 4, we use the “/country” tag as a proxy for “country of origin” as described directly above, although there is an imperfect legal match between these two terms.

Due to the inter-connectivity between databases and NSD traceability, Studies 2 and 3 (paragraphs c and d of Decision 14/20) are not presented in separate sections but in an intertwined way. The focus of this study is on the traceability, storage (mainly in databases, both public and private) and downstream use (publications, research, downstream biological databases, patents) of NSD.
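
To make the use of the “/country” tag as a proxy more concrete, the following is a minimal Python sketch. It is not the pipeline used in this study. It assumes that a record is available as GenBank-style flat-file text and that the qualifier value follows the pattern “country: region, locality”. The example record, the organism name and the function name country_of_origin_proxy are hypothetical and serve illustration only.

```python
import re
from typing import Optional

# Hypothetical, abbreviated GenBank-style feature table, for illustration only;
# it is not a real INSDC record.
EXAMPLE_RECORD = '''\
FEATURES             Location/Qualifiers
     source          1..1200
                     /organism="Examplea hypothetica"
                     /mol_type="genomic DNA"
                     /country="Brazil: Amazonas, near Manaus"
'''

# Matches the quoted value of a /country qualifier, including values that
# wrap over several lines of the flat file.
COUNTRY_QUALIFIER = re.compile(r'/country="([^"]*)"')


def country_of_origin_proxy(flat_file_text: str) -> Optional[str]:
    """Return the country part of the first /country qualifier, or None."""
    match = COUNTRY_QUALIFIER.search(flat_file_text)
    if match is None:
        # The qualifier may be absent; no proxy can be derived in that case.
        return None
    # Collapse line wraps and repeated spaces inside the quoted value.
    value = " ".join(match.group(1).split())
    # Assumed value pattern: "country: region, locality". Keep the country only.
    return value.split(":", 1)[0].strip()


if __name__ == "__main__":
    print(country_of_origin_proxy(EXAMPLE_RECORD))
```

In this sketch a record without the qualifier yields None rather than a guessed country, so such records would have to be counted separately in any tally. The returned value remains a proxy only: it says nothing about where the material was stored or sequenced (countries B and C in the example above).
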
Contrary to NSD, SI is highly diverse and does not necessarily have a common component and, for the reasons explained above, largely falls outside the scope of this study.

The complete technical methods used in this study are laid out in detail in Section 8. The majority of our data and technical analysis below is based on the infrastructure in place at GenBank because of our institutional familiarity with that technical platform, but it is representative of all three INSDC databases (further described in Section 3.3), as these share identical NSD content and have similar or identical standards and procedures.

3. Public and private databases

3.1 Brief history of the core public NSD infrastructure & data sharing

As will be discussed at length in Study 1 (paragraph a, Decision 14/20), over the last thirty years DNA sequencing has gone from being a novel, cutting-edge technology to a standard routine. It is now an essential tool rooted in the daily work of biologists, whether they seek to understand individual genes, complete organismal genomes or even whole ecosystems. Nowadays scientists from all branches of the life sciences regularly generate, submit, access and use NSD from public databases.

NSD first began to be generated in the 1970s, and the volume continued to grow over the following decade. By the 1980s, as technical abilities improved, the lengths of the generated sequences began to increase and the technology became more widespread.
With this growth, the existing practice of publishing NSD in the publication itself (in tabular form using the nucleotide base letters, ACGT/U, along with minimal annotation such as gene name, length and function [1]) was deemed impractical and unsustainable, and the scientific community instead called for databases to host the NSD, which was growing in both quantity and length.
What started as small, distributed databases grew into large, inter-linked databases and, ultimately, into a core database infrastructure called the International Nucleotide Sequence Data Collaboration (INSDC), created by the tight, automated integration of three large NSD databases: DDBJ, EMBL-EBI and NCBI (see Section 3.3).

Beginning in the 1990s, NSD doubled every few years, leading to the trillions of bases now deposited in the databases [5]. As the INSDC infrastructure became well known and integral to research use of NSD, scientific journals began to require accession numbers supplied by the INSDC for sequences contained or referred to in publications [6]. The submission of NSD to the INSDC is now a near-universal precondition for publishing work involving NSD in a scientific journal.
These accession numbers (ANs) serve as proof that the scientist deposited the NSD in an INSDC database and that the data are publicly available. This practice was underscored in a series of meetings on data sharing during the Human Genome Project (HGP). In 1996, scientists involved in the HGP agreed to the Bermuda Principles [7], which committed them to the following three principles:

- Automatic release of sequence assemblies larger than 1 kb (preferably within 24 hours).
- Immediate publication of finished annotated sequences.
- Aim to make the entire sequence freely available in the public domain for both research and development in order to maximize benefits to society.

In 2003, the Fort Lauderdale agreement [8] and the 2009 Toronto agreement extended the concept beyond the HGP to include pre-publication access to, respectively, all genome projects and all pre-publication NSD and other –omics and biological datasets, again emphasizing societal good over personal or monetary gain. Although these agreements have no legal status, as they were not signed by States, they are internationally recognized and widely adopted, and they emphasize the practice of open access publication of NSD
[9], which in turn has dramatically influenced scientific culture.

In parallel, in the late 1990s, codes establishing best scientific practice
[10, 11] for scientists stressed their societal responsibilities, including the importance of open access to data for the scientific community so that results can be replicated and validated. Similarly, funding agencies also began requiring “open access” publication of NSD [12]. Open access is an important term in science and policy circles that has been defined by UNESCO. It includes the practice of open access publication of data, including NSD, and stems from societal demands for accountability, scientific integrity, and the prevention of fraud and misconduct, as well as for the acceleration of discovery and innovation through the free and rapid exchange of information.

3.2 Analysis of the public NSD database landscape

As practicing biologists, we know intuitively that there are hundreds of diverse biological databases that contain NSD and serve many different disciplines and purposes. In order to perform a standardized, quantitative, peer-reviewed and transparent analysis for the purposes of this study, we looked for a documented dataset that described the vast majority of known biological databases.
For that purpose, and as was briefly summarized in the Laird & Wynberg study (p. 28) [2], we turned to the annual database issue of the journal Nucleic Acids Research (NAR) [13], which provides a yearly update of peer-reviewed biological databases, categorized and evaluated by their peers. NAR is a scientific journal focusing on nucleic acid (RNA and DNA) research. In 1991, it started publishing an overview of molecular biology databases, and for nearly 30 years this has been the most comprehensive collection of such databases. From 2001 to 2016, 50 to 100 new databases entered the issue each year. During that same period, 3.8% of the databases declined per year as a result of insufficient funding and maintenance.
As noted by Laird and Wynberg, there were 1,500 biological databases at the time of the writing of that study; as of the writing of this study, there are more than 1,600.

In general, public NSD databases have two basic functions, and many of them fulfil both:

1. Knowledge hub: The database sums up knowledge on a specialized topic by collecting and rearranging information from other databases and existing scientific publications.
2. Bioinformatic tools: The database provides a tool for researchers to analyse or process either their own research data or the data and information provided by the (knowledge hub) database.

The first type corresponds to the traditional perception of a database. The second is not a conventional database but an algorithm for predictions. For example, the database RAID (RNA Interactome Database) [14] offers the tool PRIdictor (“Protein RNA Interaction Predictor”) [15] on its webpage, which predicts the interaction between RNA molecules and proteins. Here, a researcher can submit the NSD of an RNA molecule and the amino acid sequence of a protein and get a prediction of how these two molecules will bind to each other. When PRIdictor is used, no NSD or SI is accessed directly by the user, nor is any NSD submitted into the database. Instead, the prediction function of such a tool is based on the real-life observations reported in scientific publications. Many such tools enable the user to type in the Accession Number (AN, a type of unique identifier, see Section 4.1) of an NSD entry instead of the raw NSD itself.

Many databases consist of a knowledge hub, where NSD and SI on a specific topic can be accessed, as well as one or more bioinformatic tools. In the case of RAID, the database has collected and stored information on the binding of RNA molecules. It offers tools for interaction predictions based on that information, e.g.
PRIdictor for RNA-protein interactions, and also a tool that extracts information on RNA binding from other publications by using the PubMed ID (explained in Section 4.1) to access and machine-read the publication.

Public databases are normally created by researchers and public institutions and reflect their respective fields of specialization. The creation or significant update of such a database results in a scientific publication, which is an additional incentive for the setup and improvement of such databases. The majority of such databases, once established during the project funding phase, are minimally maintained, if at all: webpages are infrequently updated, functions become defunct, and new data and bioinformatics tools are not added. The main reason is that the researcher or institution has to maintain the database alongside their usual academic business, and the short-term public funding for such a database often lasts for just a few years. The academic system is cyclical and is publication- and result-based rather than designed to build long-term infrastructure.

In order to be self-sustaining, databases sometimes try to switch from open access to subscription models. One example is the TAIR database (The Arabidopsis Information Resource). It was created with public funding and now charges subscription fees based on the user's amount of usage in the previous year (e.g., for academic institutions, costs range from USD 1,000 to 8,000 per year). However, the subscription is only required to use and access all information and tools provided by TAIR. NSD entries themselves can still be browsed without an account or a subscription, and NSD entries are published with their ANs, linking them to the original NSD entries at the INSDC.
The value of the subscription in this case is not the NSD per se but rather the value added by the TAIR website.

NSD submission to public databases (aka How important is INSDC really?)

At the outset of this study, based on our scientific experience, our hypothesis was that public NSD databases “revolve around” or “sit on top” of the core public infrastructure provided by the INSDC. If correct, this would mean that understanding the number, size, and purpose of these biological databases is less important than understanding the structure of the database landscape (Figure 6). To test this hypothesis, we undertook an analysis of the broader biological database landscape to determine how central the INSDC actually is. This analysis also informs the question of NSD traceability because we simultaneously determined how many databases besides the three INSDC databases allow direct user submission of NSD, how such databases use any form of NSD identifiers, and how they are connected with the INSDC.

Submission of NSD to a database is the first step in making it available to the public and thus an important initial point of NSD traceability. The current NAR database issue (described above) contains 1,613 biological databases, which are divided into 15 biological subcategories. Since these categories overlap, databases can be listed more than once, resulting in 1,778 total entries (Figure 1).

The first analysis we performed was to ask how many of these 1,778 database entries focus on NSD (see scope discussion above; Figure 1). We analysed the 15 subcategories and manually reviewed their contents to screen out databases that did not deal with NSD (805, 45.3%) as well as duplicate entries (165, 9.3%). This left us with 808 databases (45.4%) that potentially deal with NSD in some way. The excluded non-NSD databases deal with the AHTEG categories (c) and (d) as well as protein data.
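
The arithmetic of this first screening step can be checked with a few lines of Python. This is only a minimal sketch: the counts are those reported in the text, while the function and variable names are illustrative assumptions and not part of the study's method.

```python
# Counts reported in the text for the NAR database issue.
TOTAL_ENTRIES = 1778  # category entries, including repeat listings
NON_NSD = 805         # entries screened out as not dealing with NSD
DUPLICATES = 165      # repeat listings caused by overlapping categories


def screen_entries(total, non_nsd, duplicates):
    """Return the remaining count and each group's share of the total.

    Shares are percentages rounded to one decimal place.
    (Hypothetical helper, named here for illustration only.)
    """
    remaining = total - non_nsd - duplicates
    shares = {
        "non_nsd": round(100 * non_nsd / total, 1),
        "duplicates": round(100 * duplicates / total, 1),
        "potential_nsd": round(100 * remaining / total, 1),
    }
    return remaining, shares


remaining, shares = screen_entries(TOTAL_ENTRIES, NON_NSD, DUPLICATES)
print(remaining, shares)
```

The three groups add up to the 1,778 entries, and the rounded shares match the percentages given in the text.
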
We then further excluded databases dealing exclusively with human NSD, leaving us with 743 databases.

Figure 1. Public database inventory. The inverted pyramid represents the analysis process applied to the Nucleic Acids Research annual summary of biological databases. Moving from top to bottom, each row shows the result of one analysis step: how many databases likely contain NSD; how many of the NSD databases contain non-human NSD; how many of the non-human NSD databases enable the user to directly upload NSD to the database; and, finally, how many of these non-human NSD databases operate outside the INSDC system of ANs and PubMed IDs.

The next and most important question for this study was how many of these 743 non-human NSD databases allow users to submit their NSD to the database. In total, only 38 databases allow the submission of (non-human) NSD by users. The reason behind this low number is that public databases are not meant to be collectors and storage bins of NSD, but knowledge hubs and bioinformatics tools for scientific research (see above discussion). Public databases are predominantly created by researchers and public institutions. For those databases to be accurate and of value to the scientific community, their underlying information sources must be as accurate and validated as possible, hence their reliance on the INSDC as the central infrastructure for NSD. Using high-quality scientific databases (in this case, the INSDC) as a data source for NSD is more attractive for a database operator than relying on user uploads of non-trusted information. In other words, standardized, verified, and comprehensive NSD is more useful to a database operator than a user upload function where variability is much higher. Additionally, publications (which describe the NSD or the database itself) are peer-reviewed and thus considered reliable. Finally, the NAR database issue also lists databases which are “pure” bioinformatic tools.
As these do not “allow upload” of NSD, they were also excluded in this analysis step.

Public databases operating outside the INSDC system

The interconnectivity of the remaining 38 databases and their submission requirements were then analysed, focusing on their traceability options. Seven of these databases are part of the INSDC, meaning that they are run by either GenBank or EBI and thus assign ANs (see Section 4.1). Another 12 of the 38 databases obligatorily use INSDC-generated ANs, either as a prerequisite for submission or because the database is synchronized with the INSDC, meaning that all submitted NSD will be synchronized with the INSDC and receive an AN if they do not already have one. Nineteen of the 38 databases require information on publications connected to the submitted NSD, in the form of a PubMed ID (see Section 4) for an existing publication and/or a reference to a publication in progress.

As mentioned in the Laird and Wynberg study (their p. 28), databases can be classified as “Primary”, containing raw data, and “Secondary”, containing curated data. However, these categories are a classification scheme created and used by the INSDC, which contains several primary and secondary databases (see Section 3.3); importantly, all of these databases fall under the governance structures and access and usage policies of the INSDC (see below). Within the database issue, there was only one database independent of the “INSDC complex” that allows upload of NSD without using identifiers. This is Xenbase [16], a database focusing on Xenopus laevis, a frog species serving as a model organism for scientific research. It allows upload of raw sequence data (without AN, not synchronized with INSDC), because it also serves as a “workbench” database for researchers (see section 4.1). Since Xenbase is funded by the National Institutes of Health (USA), which also funds GenBank, it should be viewed as an upstream workbench database tailor-made for research on the model organism Xenopus laevis.

Two non-human NSD databases based in China were discovered during the peer-review process. These are the BIG Data Center for Life and Health and Genome Warehouse, which allow for primary upload of NSD and use ANs, and whose NSD is available under open access. At least part of the NSD is synchronized with or downloaded from the INSDC, but possibly not all of it. Discussions are underway between the INSDC and the Chinese databases to enable more complete NSD synchronization in the future and to address slow data transfer rates between China and other countries. In conclusion, there are three databases (0.16%) in the NAR issue that allow for primary upload and incompletely synchronize NSD with the INSDC.

Access and use policies of non-INSDC biological databases

Biological databases often have minimal long-term stability or funding.
As a result, personnel and financial capacities are directed towards optimizing the database itself rather than developing governance or use and access policies, so most databases simply have no formal policies. In general, databases manage access and use in one of two ways:

1. Access and use are without restriction. Anybody can visit the website and access everything.
2. Users must register with an institutional or personal email address.

In the second case, complete access is normally given directly after registration. In some cases, registrations may need to be accepted by the owner of the database, usually to give the owner an overview of the database's users but, to our knowledge, generally not as a method to limit or restrict access.

Building on the biological database inventory described above, we manually reviewed the access policies of the 38 databases that enable NSD upload. Since these databases need to be well maintained and stable to deal with uploads, they are the databases most likely to have formalized use and access policies listed on their websites. Of these 38 databases, only one stated any terms or conditions of use.
S/MARt DB [17], a database of the University of Göttingen, Germany, states the following:

“The S/MAR transaction database is free for users from non-profit organizations only. Users from commercial enterprises have to license, please contact marketing@biobase.de for details.”

However, access to this database does not require registration, so the policy relies on the willingness of commercial users to comply.

Some individual datasets within a database may have their own access policy. For example, the European golden eagle genome has a Data Use Policy reserving the right for authors to publish first. However, this policy also relies on the willingness of users to comply, as there is no formal agreement in place.

The remaining 37 databases do not indicate any access policy. Quite the contrary, some databases explicitly remind submitters that all their submitted data will be under open access (see UNESCO definition above).
This confirms the strong trend towards informal, open access of public NSD databases.

GISAID and other biological databases outside of the NAR dataset

As described above, the goal of our analysis of the inventory of 1,778 peer-reviewed biological database entries curated annually by the NAR was to perform an objective, quantitative assessment of the landscape of biological databases with a particular focus on NSD. There are, of course, biological databases that have not submitted themselves for peer review in this journal that might also prove informative for DSI discussions.

An alternative policy approach for managing NSD has been implemented by GISAID (Global Initiative on Sharing All Influenza Data) and its database EpiFlu. Its policy approach and focus on NSD may provide insights for the discussion on DSI. Launched at the 61st World Health Assembly in 2008, GISAID aims to foster the rapid sharing of influenza virus NSD. It does so by addressing three major obstacles to influenza data sharing:

1. Scientists do not want to publish raw data upfront (fear of being “scooped”).
2. Countries do not want to be connected with outbreaks but also want to safeguard IP rights.
3. Funding, coordination, legitimacy and international leadership are needed.

The first point is a general problem in science. A researcher creates and collects raw data in order to analyze it, which can take years, and finally to generate a scientific publication. A researcher who publishes raw data upfront risks that another researcher will use the data and generate a publication faster (also called getting “scooped”). In the case of NSD, workbench databases are often used until the data is curated and the publication is ready (Section 4.1). Scientific publications and their impact are the main parameter determining a researcher’s success.
This explains why researchers are very reluctant to publish raw data upfront or give detailed information on their research, although they later report exhaustively on both in their scientific publications. Because influenza outbreaks are potentially lethal, the immediate sharing of influenza data is of major importance for monitoring outbreaks and developing vaccines. GISAID requires users to acknowledge the originating and submitting laboratories of NSD in their publications and to make best efforts to collaborate with them, thus removing the threat of “scooping” and making data sharing beneficial for the submitter.

The second point contains several different issues. Countries can be reluctant to share influenza data to avoid the bad publicity of being seen as the center of an outbreak. Low- and middle-income countries in particular also fear that publication of the data will lead to IP-protected vaccine development, which might then be too expensive for their own populations. GISAID recognizes the submitters of data and their rights. It also acts as an informal platform for conflict resolution and trust building between countries. GISAID is part of the PIP Framework (Pandemic Influenza Preparedness) of the WHO, which is the body that reconciles interests between member countries, industry and other stakeholders.

As already mentioned (section 3.2), lack of continuous funding is a general problem for most scientific databases. GISAID was started with private and corporate philanthropy (millions of USD) and is now continuously funded by the Federal Republic of Germany. The private person who started GISAID also used much of his personal time to act as an informal communicator and mediator between stakeholders, facilitating the unique political success of GISAID.

GISAID requires all users to create an account and identify themselves and their affiliation. Accounts are visible to all other registered users (necessary for communication).
Users agree not to share data from GISAID with third parties that are not registered users of GISAID. Uploaded data contains the NSD and additional information, primarily the laboratory or researcher from which the physical influenza sample originated and the laboratory that did the sequencing and data submission. Users who want to utilize NSD entries in GISAID are required to inform these providers and, to the extent possible, collaborate with them in research projects and publications (with the minimum condition that the providing laboratories are mentioned in the publication). If a breach of policy is reported and verified, the respective user is banned from GISAID.

GISAID is an interesting model, but there are important scientific and technical considerations that should be taken into account in the context of the CBD. First, although users agree to the terms of use, the system provides no traceability once data is accessed or downloaded (similar to the general problem of traceability outside of databases, see section 4.1). More importantly, for biologists, GISAID has created a split in the broader viral and flu dataset and created a silo of data. GISAID is only for pandemic flu strains, and it serves a useful purpose for NSD sharing in this unique space. However, this model has limitations for biological research that needs to compare pandemic flu NSD with non-pandemic flu strains (which are very closely related genetically), with more distant viral relatives, or with NSD from other organisms. This data silo means that data does not flow easily within the biological database landscape and may hamper efficient research practices (see Section 6.3 for mention of “island” databases). In 2016, GISAID had over 6,500 registered users and hosted over 650,000 sequence entries.
By comparison, GenBank had 5,848,882 users and hosted 231,211,621 sequence entries in 2018.

Within the Global Measles and Rubella Laboratory Network (GMRLN), the WHO hosts two databases called Measles Nucleotide Surveillance (MeaNS) and Rubella Nucleotide Surveillance (RubeNS). Measles and rubella are both viruses for which the WHO runs surveillance and vaccination programs (primarily through GMRLN). These databases require registration (approved by the respective national laboratory of GMRLN), but their focus is primarily on collecting NSD and not on providing a framework for upfront NSD upload. They enable submitters to automatically submit their NSD entry to the INSDC and then update themselves with the new AN generated by the INSDC.

Conclusions on the public database analysis

Based on the results of the public database inventory above, the INSDC is the core database structure for research on NSD. The vast majority of all NSD used within the public sphere is held within the INSDC, which is the first port of call and the core infrastructure for obtaining large amounts of NSD. Within the NAR database issue, only three databases outside the INSDC complex could be found that enable user upload of primary NSD. Therefore, our subsequent analysis of database access and use policy is focused on the INSDC. In the rare cases (38 out of 743) where the upload of secondary, curated NSD is possible, this is done via ANs and PubMed IDs (see Section 4 on traceability).
Submitting the underlying NSD of a scientific research project to the INSDC is also a standard requirement of scientific journals for publication.

3.3 The INSDC

The International Nucleotide Sequence Database Collaboration (INSDC) is a long-standing cooperation for the permanent storage of NSD consisting of three large databases:

1. The National Center for Biotechnology Information (NCBI) with GenBank in the USA;
2. The European Nucleotide Archive (ENA) maintained at the EMBL-European Bioinformatics Institute (EMBL-EBI) in the United Kingdom under the auspices of the European Molecular Biology Laboratory (EMBL) in Germany;
3. The DNA Data Bank of Japan (DDBJ) at the National Institute for Genetics (NIG) in Japan.

Together these databases are the “core” repositories of public NSD for the scientific community, with millions of sequences submitted to them each year. For a more complete history of the INSDC, see Stevens’ “Globalizing Genomics: The Origins of the International Nucleotide Sequence Database Collaboration” [18].

All three databases (GenBank, ENA and DDBJ) automatically exchange (“mirror”) all NSD with each other on a daily basis. This mirroring supports data integrity and security and means that accessing or analysing the data from one database is equivalent to doing so from all three. Although the datasets are identical, the databases offer different user platforms, tools, and analyses, which can lead to preferences for one database over another for practical reasons depending on the scientific analysis needed.
For example, the metagenome database MGnify [19], run by EMBL-EBI, contains a subset of the NSD dataset exchanged by the INSDC but is a specialty database for the comparison and analysis of metagenomes. This concept of using only a subset of the entire NSD dataset available in the INSDC and making it more accessible and interpretable for the needs of a specific community is very common and is further illustrated in Figure 2.

Figure 2: Representation of INSDC and its connected instruments. The top of each column names one of the three institutions behind the INSDC; the columns show some examples of databases, platforms, and tools listed and linked on the webpages of the respective institution. The three nucleotide databases (GenBank, ENA, DDBJ) are synchronized daily and thus have identical content.
The three points at the end of each column indicate that there are many more databases and resources than can be listed here. Entities that appear in the list are connected to, and sometimes also hosted by, the respective INSDC member institutions. They therefore directly use shared infrastructures and adhere to many of the same institutional policies and governance but are not necessarily owned by that same institution (e.g. UniProt).

There are a number of larger databases that are tightly integrated with the INSDC and exchange NSD and SI with it in various ways. These databases are often funded by the same public sources and have members of the INSDC or connected institutions on their steering boards. One example is WormBase [20], a database and research consortium for the model organism Caenorhabditis elegans and other nematodes (roundworms).
Such databases rearrange and curate NSD from the INSDC, connect it with information obtained from scientific publications, and make it publicly available.

In conclusion, the major advantages of the INSDC for the scientific community are the management, curation, standardization, and interoperability of large quantities of information (NSD, SI and scientific knowledge), as well as the different tools and archives provided. All NSD and SI hosted by the INSDC partner institutes is openly accessible without login or registration and is viewed or downloaded by research institutions and companies hundreds of thousands of times a day.

There are several public databases [22-24] outside of INSDC that operate in languages other than English (Chinese and Arabic) in addition to English. They provide intermediary data publication services, though on a smaller scale. These public databases help users to receive an AN through a different portal than the three main INSDC databases, although this “alternative AN” has limited acceptability by scientific journals.

How are the INSDC databases governed?

GenBank is part of the United States federal government and falls within the U.S. Department of Health and Human Services, under the U.S. National Institutes of Health (NIH), in the National Library of Medicine [25]. The U.S.
government and its representatives can make governing decisions, although historically these decisions have largely reflected the needs of the scientific community.

EMBL-EBI is part of the European Molecular Biology Laboratory (EMBL), an intergovernmental organization with 20 member states (not to be confused with European Union Member States, although there is significant overlap) and two non-European associate member states [26, 27].
EMBL-EBI is funded by the EMBL member states as well as the European Commission, the U.S. National Institutes of Health, Wellcome, UK Research and Innovation (UKRI) and a number of private foundations [28]. EMBL-EBI is accountable to its board of directors, the EMBL Council, and not directly to a single government.

DDBJ is part of the National Institute of Genetics, which was reorganized in 2004 into one of four institutes within the non-governmental Research Organization of Information and Systems [29]. The largest funder of DDBJ is the Japanese Ministry of Education, Culture, Sports, Science and Technology (MEXT), and DDBJ is governed by an international advisory board [30]. As with EBI, NIH also plays a supporting funding role.

INSDC access and use policies

Because of the central role of the INSDC in the scientific use and analysis of NSD (discussed above and in Section 3.2), it is critical to understand the INSDC’s use policy, first published in 2002 [31]:

“The INSD has a uniform policy of free and unrestricted access to all of the data records their databases contain. Scientists worldwide can access these records to plan experiments or publish any analysis or critique.
Appropriate credit is given by citing the original submission, following the practices of scientists utilizing published scientific literature.

The INSD will not attach statements to records that restrict access to the data, limit the use of the information in these records, or prohibit certain types of publications based on these records. Specifically, no use restrictions or licensing requirements will be included in any sequence data records, and no restrictions or licensing fees will be placed on the redistribution or use of the database by any party.

All database records submitted to the INSD will remain permanently accessible as part of the scientific record. Corrections of errors and update of the records by authors are welcome and erroneous records may be removed from the next database release, but all will remain permanently accessible by accession number.

Submitters are advised that the information displayed on the Web sites maintained by the INSD is fully disclosed to the public. It is the responsibility of the submitters to ascertain that they have the right to submit the data.

Beyond limited editorial control and some internal integrity checks (for example, proper use of INSD formats and translation of coding regions specified in CDS entries are verified), the quality and accuracy of the record are the responsibility of the submitting author, not of the database. The databases will work with submitters and users of the database to achieve the best quality resource possible.”

The policy clearly outlines that access to NSD from these databases is free, unrestricted and permanent. The INSDC reaffirmed this policy 14 years later in a 2016 publication [32], noting:

“The core of the INSDC policy is maintaining public access to the global archives of nucleotide data generated in publicly funded experiments. A key instrument for this is submission as a pre-requisite for publication in scholarly journals, a convention in which INSDC partners and publishers work together to ensure timely and smooth flow of data into repositories for release before, or at the time of, literature publication. The primary benefit of this is that scientists all over the world can access these records at any time to plan experiments, analyse published findings or support their critique. It also ensures that the author of the work receives the appropriate credit, and that this narrative context remains linked to underlying data that remain in perpetuity. All database records submitted to the INSDC remain permanently accessible as part of the scientific record.”

The extent to which this policy could or would be altered in the future largely depends on the governance structures discussed above.

However, GenBank does have a data usage disclaimer on its website [33]:

“The GenBank database is designed to provide and encourage access within the scientific community to the most up-to-date and comprehensive DNA sequence information. Therefore, NCBI places no restrictions on the use or distribution of the GenBank data.
However, some submitters may claim patent, copyright, or other intellectual property rights in all or a portion of the data they have submitted. NCBI is not in a position to assess the validity of such claims, and therefore cannot provide comment or unrestricted permission concerning the use, copying, or distribution of the information contained in GenBank.”

EMBL-EBI specifically mentions benefit-sharing in its terms of use [34] (emphasis added):

“The original data may be subject to rights claimed by third parties, including but not limited to, patent, copyright, other intellectual property rights, biodiversity-related access and benefit-sharing rights. For the specific case of the EGA database and human data consented for biomedical research, these rights may be formalised in Data Access Agreements. It is the responsibility of users of EMBL-EBI services to ensure that their exploitation of the data does not infringe any of the rights of such third parties.”

Because other biological databases rely on NSD downloaded from, or linked to, the INSDC, they also agree to the INSDC use conditions described above.
In this way, the INSDC policy has a “ripple effect” on the more than 1,600 databases that link to its NSD or SI.

In terms of access, all the data in the INSDC databases are freely available globally to any user with internet access (no registration or login is required, although users can establish accounts for easier analysis or submission of NSD). In addition, all three INSDC databases offer free training [35] in the use of NSD and bioinformatics, and develop freely available software and analysis tools.

The INSDC also has a very active help desk function and responds relatively quickly to user-originated changes or corrections. However, there is no “wiki” (user-editing) function for changing, editing or improving the NSD or SI held in the INSDC. All change requests must go through the help desk or through specific tools provided by INSDC partners.

Financing of the INSDC

Large databases of any kind, including NSD databases, are cost-intensive and require a continuous source of funding to maintain permanent staff, hardware and software infrastructure. Scientific projects, on the other hand, which are often needed to create original databases, are usually limited to a period of several years. Even large, successful databases with thousands of users that were built during a project phase often face a difficult or impossible transition to permanent database status.
Some NSD databases examined during the public database inventory (see Section 4) appear to be defunct, likely because of a lack of adequate funding and staff. The temporary nature of many databases increases the need for, and reliance upon, the core infrastructure provided by the INSDC.

The annual operating budget of NCBI [37] is estimated at 394 million USD, of which 34 million USD is dedicated to the operation and maintenance of GenBank. The annual operating budget of EBI is estimated at around 50 million USD [38], although exact figures could not be derived from the total EBI budget.
The DDBJ budget appears to be somewhat smaller, at around 10 million USD per year [39]. Thus, a conservative estimate would be that at least 50 million USD is spent annually on the INSDC. Despite these significant investments, none of the INSDC databases has ever charged user fees. The availability of the NSD to the broader public has been unconditionally free.

3.4 What NSD is publicly available in the INSDC?

The study mandate calls for an analysis of the “biological scope and size” of public sequence databases. Given the above public database inventory analysis, we will focus on the INSDC and GenBank in particular. (As shown in Section 3.3, the NSD content of all three INSDC databases is identical.) There are additionally protein (not nucleotide) sequence databases that extend beyond the scope of this study (see above notes on study scope). It is worth noting that many protein sequence databases (e.g., UniProt) are hosted by INSDC members (e.g., EMBL-EBI), closely follow the same technical infrastructure (e.g., use of ANs) and often connect back to the NSD databases, because protein sequences can be derived from nucleotide sequences (although this connection is complex and the focus of an entire scientific field of study). Therefore, the information listed below is based on NSD but is likely broadly representative of protein sequence databases, since the two are often closely linked.
Nevertheless, this study has not analysed protein sequence databases, and the information below is limited to NSD databases only.

Biological scope

On April 17, 2019, GenBank consisted of 212,775,414 sequence entries, made up of 321,680,566,570 bases [5]. GenBank notes, “From 1982 to the present, the number of bases in GenBank has doubled approximately every 18 months.” Indeed, between our analysis in May and the writing of this study in early July, another 8,154,715,800 bases or 608,344 sequences were added to the database. On average, there are about 3,700 new submissions per week. In particular, whole genome sequences are growing hyper-exponentially as sequencing costs fall and throughput continues to grow, and the WGS (whole genome sequence) database is roughly 5x the size of INSDC. These larger databases, including WGS and SRA, are in the realm of “big data” science and create technical problems for large-scale, comprehensive analyses. For example, before we could attempt a preliminary analysis of the WGS, it took several weeks to download the entire dataset correctly and completely. Although genomes create more and bigger NSD entries, they use the same data structure and traceability options as all other “normal” NSD entries.

Figure 3. What is the biological scope of the NSD available in GenBank?
The pie charts show the distribution by taxonomy of entries (left) and bases (center) within GenBank. The difference between the two charts results from the fact that entries can differ greatly in length (number of bases). Model organisms are not a single taxonomic group, but were subtracted from the taxa they come from. The category Other/Synthetic refers to entries of artificial NSD. The category unidentified contains NSD from environmental samples whose taxon was not identified (primarily microorganisms, as judged by sample names).

We next set out to determine the biological scope of the NSD in GenBank. Human genetic resources, which account for 12% of GenBank entries and 6% of the bases (Figure 3), are out of scope and are excluded from the subsequent analysis. Furthermore, the vast majority of lab organisms and/or “model organisms” are generally considered to be out of scope because they represent very old inbred lines or lab strains that have been used around the world for decades or even centuries, long before the entry into force of the CBD in 1993. Unfortunately, there is no clear definition of what a model organism is, and no legal definition either, since it is a term of art employed by the biological community.
Instead, we used the NCBI Taxonomy Browser list of 20 commonly used species (excluding human) [40] as a proxy for “model organism”. The list contains, for example, wheat, cow, and the common lab bacterium Escherichia coli. Beyond this “common model organism” list, the community uses many more model organisms; Wikipedia lists over 100 [41], including the 20 listed by NCBI. Because all model organisms at some point came from the natural environment, it is always possible that some NSD in the databases came from environmental rather than lab-based sources. In other words, our use of the NCBI “commonly used” list likely underestimates model organisms (because it only assesses 20 out of more than 100 model organisms).
However, some of the NSD assessed as model organism in the pie chart above could have been sourced from the environment rather than the lab. Our methods would not detect this, so the model organism share could be overestimated in some cases. Taken together, we hope these over- and underestimates roughly cancel each other out and conclude that “model” organism NSD represents around 12% of GenBank entries and 9% of the bases.

With that caveat, around 76% of the NSD in GenBank is conceivably relevant in the context of the CBD (Article 15 on access to genetic resources) and its specialized instruments. This percentage does not account for geopolitical differences, such as the vast amount of NSD that was sampled from the USA (an Observer to the CBD, estimated at 23% of the INSDC, see Figure 8b) or from countries that have granted free access to genetic resources. Another variable that we could not easily assess with the dataset is temporal scope: whether all NSD entries would fall under a potential regulation or just those entries that were added after a certain date. The database metadata structure evolves as the data is updated and does not reflect the legal temporal scope, but presumably large portions of the sequence databases could fall out of temporal scope.

The size of the individual entries in GenBank varies over ten orders of magnitude, from 1 base (474 entries have a single nucleotide) to 2,030,161,756 bases (on the order of 10^9) (Figure 4). While the vast majority (85%) of the entries have an average size of 1,000 bases (i.e. roughly the size of an average bacterial gene), 95% of the total bases in GenBank come from the 15% of entries with the largest size (Figure 4). Most of these are either whole bacterial genomes or eukaryotic chromosomes. The 18 largest entries come from a model amphibian, the axolotl, and from wheat chromosomes.
Generally speaking, the richness of the biological information in these 15% of entries (genomic information) is, of course, much higher than in the entries of individual genes that are taken out of biological context.

Conclusions on publicly available NSD in the INSDC and NAR database issue

There are large parts of NSD which may be out of scope of any future CBD decision on DSI, such as NSD from humans and model organisms, from the USA, or from lab environments. However, differentiation on each of these levels is technically complex.

The most meaningful quantitative parameters for NSD are entries and bases (length). The first is the primary unit of interaction in both the digital and the scientific sphere, whereas the latter reflects the total information content.

Figure 4. How long are the sequences in GenBank? What amount of the total bases does this represent? All entries within GenBank were ordered in ten categories according to their sequence length in bases. The left side shows the number of entries in each category, whilst the right side shows the total number of bases of all entries in that category. The majority of sequence entries have a length between 100 and 999 bases, but the majority of total bases come from the fewer entries with higher sequence lengths.

3.5 INSDC Users

There are over 5.8 million users of GenBank alone, and they are located in every country in the world (Figure 5a).
For ENA, data from EBI were previously published in part in its annual scientific report [42, 43], and data from Japan were cited in the submission of the Government of Japan for the 2017-18 inter-sessional period [44]; both indicate a similar trend.
Unlike virtually all of the other data presented in this study, data on the users of GenBank are not publicly available and were requested from and provided by GenBank. The user data show individual users (Figure 5a) and total requests (Figure 5b) to GenBank in 2018. Requests are computer requests for data, usually submitted by automated scripts (for example, a database programmed to request data as a regular routine).

The data (Figures 5a-5b) indicate that the heaviest NSD usage occurs in the USA and China, which strongly correlates with their status as major contributors of NSD (see Section 4.2). Nevertheless, every country around the world, developed and developing alike, has users of the INSDC and the NSD that it makes publicly available. As the number of database users in a country likely correlates with its population, we normalized the user data by dividing the number of users in each country by its total population (Figure 5c). The normalized data show more homogeneous use across the globe than Figures 5a-5b, demonstrating that the overall strong usage in the USA and China is, in part, due to their population size. The normalized data also show that developed countries still tend to have more users per inhabitant than developing countries, for example when Western Europe is compared with Central Africa.

Figure 5a. Where are the users of DSI? This world map shows the total number of users of GenBank in 2018 on a logarithmic color scale (one color grade darker indicates an increase in user number by a factor of 10). The left chart lists the ten countries with the highest user numbers and shows each as a percentage of total user numbers.

Each INSDC database has unique users, so these GenBank data could be extrapolated to 8-12 million users of the INSDC databases worldwide, and growing (although GenBank probably has the highest number of users for historical reasons).
Furthermore, these data only represent usage of GenBank, not of all of NCBI and its associated tools and platforms, nor of EMBL-EBI and DDBJ and their other databases and tools (see Figure 2). Those numbers are estimated at 100 times more users globally [45] for each of the three INSDC databases, suggesting perhaps more than 500 million users worldwide.

Finally, these data represent website use of GenBank. The complete INSDC NSD dataset can also be accessed via ftp (file transfer protocol) download, a service offered free of charge by the INSDC. This means that instead of visiting individual web pages, a user or an automated program can access the ftp site and download all or parts of GenBank as individual files directly onto their computer or server. This is a very common way for both public and private NSD databases (see Section 3) to access NSD. The geographical distribution of usage via ftp access is somewhat similar (Figure 5d; top 5: Germany, USA, China, Switzerland, Japan); however, users from only 140 countries used ftp download in 2018. Users in the remaining countries only used the GenBank website. This reduced number of countries may be due to a lack of the data support and technical infrastructure necessary to use a locally downloaded copy of GenBank.

Figure 5b. Where do requests to GenBank come from?
This world map shows the total number of requests (a proxy for volume of use) to GenBank in 2018 on a logarithmic color scale (one color grade darker indicates an increase in the number of requests by a factor of 10). The left chart lists the ten countries with the highest numbers of requests and shows each as a percentage of total requests.

Figure 5c. Users normalized by population. This world map shows the number of users from Figure 5a divided by each country’s population (on a logarithmic scale; please note the negative sign). The left chart lists the ten countries with the highest number of users per inhabitant.

Figure 5d. Where do ftp requests come from? This map shows the number of requests for ftp downloads from GenBank in 2018 by country and lists the ten countries from which the most ftp requests come.

Figure 5e. What is the volume of data requested via ftp? This map shows the volume of ftp downloads from GenBank in 2018 by country and lists the ten countries with the highest volumes of ftp downloads. Please note that the scale of the world map is in bytes, while the values in the top 10 list were converted to gigabytes (GB) to make them more readable.

To understand website vs. ftp usage, it can be useful to compare the total data accessed via the two methods: website usage of GenBank amounts to 1.3 terabytes per month, whereas ftp usage amounts to 53 terabytes per month. In other words, ftp downloads represent roughly 40 times more usage (in terms of data transfer) than the website usage presented above. However, these numbers are much higher in part because automated downloads by computers are much more frequent than user interactions via the website. Scientific institutes often run an automated program that downloads all of GenBank every week or every month (the same data over and over), which inflates the data usage statistics.
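The website-versus-ftp comparison above can be reproduced as a short calculation. The following is a minimal sketch, not part of the study's own analysis: it uses the two monthly volumes reported in this section (1.3 TB via the website, 53 TB via ftp), while the function names and the assumed number of repeated downloads per month are hypothetical and serve only to illustrate how re-downloading the same data inflates the ftp figure.

```python
# Monthly GenBank data transfer as reported in this section (terabytes).
WEBSITE_TB_PER_MONTH = 1.3
FTP_TB_PER_MONTH = 53.0


def ftp_to_website_ratio(ftp_tb: float, website_tb: float) -> float:
    """Return how many times more data is transferred via ftp than via the website."""
    return ftp_tb / website_tb


def deduplicated_volume(ftp_tb: float, copies_per_month: float) -> float:
    """Estimate the unique data volume if every ftp user re-downloaded the same
    data `copies_per_month` times (a hypothetical, simplifying assumption)."""
    return ftp_tb / copies_per_month


if __name__ == "__main__":
    ratio = ftp_to_website_ratio(FTP_TB_PER_MONTH, WEBSITE_TB_PER_MONTH)
    print(f"ftp / website transfer ratio: {ratio:.1f}")
    # Hypothetical scenario: all ftp users refresh weekly, about 4 copies per month.
    unique_tb = deduplicated_volume(FTP_TB_PER_MONTH, 4)
    print(f"unique ftp volume at 4 copies per month: {unique_tb:.2f} TB")
```

Under the weekly-refresh assumption the unique ftp volume would still be about ten times the website volume, so repeated downloads explain only part of the gap.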
Limitations of the user data set

For this study, a further breakdown of the user data, e.g. into academic and commercial sectors, would be very useful. However, as described above, all three INSDC members provide open access to these databases (no login is required, and thus there is no account information). This means that only the access location at the level of country is collected. Further information such as nationality, gender, or affiliation is not tracked. Furthermore, GenBank could not provide a finer geographical resolution than the country level because of privacy concerns and data protection laws. Due to technical constraints, more detailed information on what type of NSD was accessed, or any further breakdown of usage patterns, is not available. For the same technical reasons, the user data presented here cover the entire GenBank database and, as such, cannot exclude access to human NSD. It is worth noting that, to our knowledge, this is the first publication of user data of this type, and it represents open collaboration on the part of the INSDC.

Unfortunately, it is not possible to know what happens to NSD after it has been downloaded, via ftp or web page, from an INSDC member or any other database. This is a point at which traceability of NSD can break down if downstream users (using NSD locally on their private computers or servers) do not maintain the AN system of traceability (see Section 4.1). It is therefore not possible to obtain information on subsequent usage and sharing of data. Even for this study, we downloaded all of GenBank for our bioinformatics analyses, so our own NSD usage, for example, is not represented in Figures 5da-b.
Even though the total number of webpage accesses is over a hundredfold higher than the total number of ftp download requests, the subsequent access to and utilization of the downloaded data is most probably far higher (as a download can contain all NSD in the INSDC).

A more systemic bias results from the frequency of synchronization with the INSDC. For example, database A might synchronize itself with the INSDC every two weeks, whilst database B does so only every six months. Database A thus produces roughly 12 times as many requests and downloaded bytes as database B. This only represents a higher update rate and does not necessarily represent a higher degree of utilization. Furthermore, if the data is downloaded by another public database, it can in turn be downloaded by third parties from that database. Similarly, if data is downloaded by an international company, it can be accessed and used by all departments of that company around the world if they share the same IT infrastructure.

Another challenge is that there are also “mirror” ftp download sites. These are used, for example, by large universities with hundreds or thousands of users to avoid overburdening the download bandwidth of a university where many labs use the same dataset. These mirror sites are not reflected in this user dataset, and thus these usage statistics certainly underestimate total usage. The two figures on ftp downloads show large differences between countries. This seems to be due primarily to differences in the availability of and investment in IT infrastructure, as well as in the number of biotech companies and research institutions. Institutions and companies that run public or private databases require large servers and incur maintenance costs. As these databases synchronize regularly, their download requests and volumes should by far outweigh all others, e.g. downloads for individual projects.
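The synchronization bias described above can be illustrated with a short calculation. The sketch below is an illustration under stated assumptions rather than an INSDC statistic: the function name is hypothetical, and the length of "six months" in weeks is an assumption (24 weeks if a month is counted as four weeks, about 26 weeks for a calendar half-year).

```python
def sync_ratio(fast_interval_weeks: float, slow_interval_weeks: float) -> float:
    """Return how many times more often the faster database downloads the full dataset."""
    if fast_interval_weeks <= 0 or slow_interval_weeks <= 0:
        raise ValueError("intervals must be positive")
    return slow_interval_weeks / fast_interval_weeks


# Database A synchronizes every 2 weeks, database B every 6 months.
# Both end up holding the same data; only the request counts differ.
for label, six_months_in_weeks in (("4-week months", 24), ("calendar half-year", 26)):
    ratio = sync_ratio(2, six_months_in_weeks)
    print(f"{label}: A issues {ratio:.0f}x the full downloads of B")
```

Depending on how the six-month interval is counted, database A generates about 12 to 13 times the requests of database B while holding the same data, which is why request counts alone are a weak proxy for utilization.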
The country where the download happens is determined by the location of the servers, which is not necessarily the country where a company or organisation is headquartered. The emergence of cloud genomics (Section 5.3) may greatly reinforce this trend in the future.

These user data are all based on IP addresses. It is possible for a technically sophisticated user to camouflage their IP address, and many tools exist on the internet to do this. We are not aware of any reason why an INSDC user would be interested in doing this, but it is a technical possibility that, if employed, would also bias or alter the dataset reported here.

Conclusions on users of NSD

Users of the INSDC can be found in every country in the world (Figure 5a).

The USA has the highest number of users (23%), while China has the highest number of requests (automated computer contacts, 23%). Germany has the highest number of ftp downloads (1.1 million).

Once normalized by population, users are distributed more homogeneously (Figure 5c as compared to Figures 5a-5b), although some differences between developed and developing countries remain.

Ftp downloads show the largest differences between the developed and developing world, likely due to differences in the IT infrastructure required for download and maintenance.

3.6 Private databases

The study mandate was also to address private databases “to the extent possible”. As there is no definition of private databases, for the purposes of this study we consider them to be privately held databases that contain, either completely or partially, stored NSD that is not publicly accessible. We conducted case studies (Section 8) with private companies, which are used as the basis for the analysis below.
However, it is interesting to note that other entities, such as governments and especially regulatory agencies, also maintain and run private NSD databases in order to maintain a regulatory or security advantage in critical areas such as food safety and food sourcing, wildlife trade, and pathogen detection and quarantine.

In general, private databases can be subdivided into two categories. The first category is restricted databases, which we call “in-house databases” below; these privately store generated or acquired NSD for internal use only. The second category we call “commercial databases”; here, members of the public can access the stored NSD and SI, but access is coupled to financial compensation such as fees.

In-house databases

In-house databases are used by companies to store and process NSD and SI related to their business. The internal NSD or SI is generated by the company itself and/or obtained from external sources such as the public databases. For example, a company involved in plant breeding might sequence all of its newly generated plants (in-house generated NSD) to see which genetic traits the plants have obtained relative to the parental plant line (e.g. molecular markers, see also 8.4, Case study 4). Additionally, the plant and soil material may be sequenced to check for contamination by pathogenic viruses or bacteria. All this newly generated NSD can also be used for R&D and is best understood when compared to the comprehensive INSDC dataset. To continue with the example, if that same company also develops plant protection products, sequencing the plant and the soil will show how well its potential applications work. Section 3.2 of the Laird and Wynberg study gives a broad overview of the different industrial sectors using DSI.

A very common practice for in-house databases is to download the entire NSD dataset available at the INSDC at a regular interval, e.g. weekly or monthly, into the private database(s). This has certain advantages for the company:

Public and in-house generated NSD can be combined and analysed together;

Running analyses is often limited by the speed of computational processing. Analysis of downloaded data is easier and faster than analysis of online data (see also Section 5.4 on cloud genomics);

Internal analyses are protected by the company’s own IT security system. Any online analysis can be the target of hacking or industrial espionage.

It is important to note that the downloaded version of GenBank likely continues to be based on traceability via the AN (see Section 4). Private databases largely adopt the data structure and traceability system provided by the INSDC because they continue to download the dataset periodically and need to reference sequences internally using the INSDC-originated AN (see Figure 6). Thus, any lack of metadata (e.g. country of origin) in public database NSD will be carried over into the private databases (see the case studies section below).

Commercial databases

Commercial databases use, process, and analyse NSD in order to create more curated (value-added) SI, and they develop bioinformatic tools to use, process, and analyse the NSD+SI. In a scientific sense they are thus essentially like public databases, with the key difference that access is not open but bound to financial compensation such as fees. In contrast to in-house databases, curation of the NSD is not done primarily for internal research projects but to offer the curated NSD+SI as the final product. In addition to providing curated NSD+SI, commercial databases often offer bioinformatic analyses and other services to customers.
Another model for commercial databases is to offer a private version of the “workbench” databases used in the public sphere (see Figure 2) where private NSD is integrated into an online privately accessible platform/workbench and the newly generated SI can be fed back into the company’s internal databases. Cloud genomics (Section 5.3) would fall under this category.To find examples of commercial databases beyond patent commercial databases and cloud genomics (where the NSD is generated by the purchaser of the platform), we searched for databases in commercially important topics like biofuel/biodiesel and natural products and found them to be public and open access. The NSD is submitted to the INSDC and INSDC identifiers are also used in their databases PEVuZE5vdGU+PENpdGU+PEF1dGhvcj5Mb21iYXJkPC9BdXRob3I+PFllYXI+MjAxNDwvWWVhcj48

UmVjTnVtPjEzMDwvUmVjTnVtPjxEaXNwbGF5VGV4dD5bNDYtNDldPC9EaXNwbGF5VGV4dD48cmVj

b3JkPjxyZWMtbnVtYmVyPjEzMDwvcmVjLW51bWJlcj48Zm9yZWlnbi1rZXlzPjxrZXkgYXBwPSJF

TiIgZGItaWQ9InJhdnhmMDlwczkycDBiZXN2czc1ZHN3ejUwZmFwMHBheHgyeCIgdGltZXN0YW1w

PSIxNTY4Nzg0NTA2Ij4xMzA8L2tleT48L2ZvcmVpZ24ta2V5cz48cmVmLXR5cGUgbmFtZT0iV2Vi

IFBhZ2UiPjEyPC9yZWYtdHlwZT48Y29udHJpYnV0b3JzPjxhdXRob3JzPjxhdXRob3I+TG9tYmFy

ZCwgVi48L2F1dGhvcj48YXV0aG9yPkdvbGFjb25kYSBSYW11bHUsIEguPC9hdXRob3I+PGF1dGhv

cj5EcnVsYSwgRS48L2F1dGhvcj48YXV0aG9yPkNvdXRpbmhvLCBQLiBNLjwvYXV0aG9yPjxhdXRo

b3I+SGVucmlzc2F0LCBCLjwvYXV0aG9yPjwvYXV0aG9ycz48L2NvbnRyaWJ1dG9ycz48dGl0bGVz

Pjx0aXRsZT5UaGUgY2FyYm9oeWRyYXRlLWFjdGl2ZSBlbnp5bWVzIGRhdGFiYXNlIChDQVp5KSBp

biAyMDEzPC90aXRsZT48L3RpdGxlcz48dm9sdW1lPjIwMTk8L3ZvbHVtZT48bnVtYmVyPlNlcCAx

ODwvbnVtYmVyPjxkYXRlcz48eWVhcj4yMDE0PC95ZWFyPjwvZGF0ZXM+PHB1Yi1sb2NhdGlvbj5O

dWNsZWljIEFjaWRzIFJlc2VhcmNoPC9wdWItbG9jYXRpb24+PHVybHM+PHJlbGF0ZWQtdXJscz48

dXJsPmh0dHA6Ly93d3cuY2F6eS5vcmcvPC91cmw+PC9yZWxhdGVkLXVybHM+PC91cmxzPjxlbGVj

dHJvbmljLXJlc291cmNlLW51bT4xMC4xMDkzL25hci9na3QxMTc4PC9lbGVjdHJvbmljLXJlc291

cmNlLW51bT48L3JlY29yZD48L0NpdGU+PENpdGU+PEF1dGhvcj5LdXJvdGFuaTwvQXV0aG9yPjxZ

ZWFyPjIwMTc8L1llYXI+PFJlY051bT4xMzE8L1JlY051bT48cmVjb3JkPjxyZWMtbnVtYmVyPjEz

MTwvcmVjLW51bWJlcj48Zm9yZWlnbi1rZXlzPjxrZXkgYXBwPSJFTiIgZGItaWQ9InJhdnhmMDlw

czkycDBiZXN2czc1ZHN3ejUwZmFwMHBheHgyeCIgdGltZXN0YW1wPSIxNTY4Nzg0NjQ4Ij4xMzE8

L2tleT48L2ZvcmVpZ24ta2V5cz48cmVmLXR5cGUgbmFtZT0iV2ViIFBhZ2UiPjEyPC9yZWYtdHlw

ZT48Y29udHJpYnV0b3JzPjxhdXRob3JzPjxhdXRob3I+S3Vyb3RhbmksIEEuPC9hdXRob3I+PGF1

dGhvcj5ZYW1hZGEsIFkuPC9hdXRob3I+PGF1dGhvcj5TYWt1cmFpLCBULjwvYXV0aG9yPjwvYXV0

aG9ycz48L2NvbnRyaWJ1dG9ycz48dGl0bGVzPjx0aXRsZT5BbGdhLVByQVMgKEFsZ2FsIFByb3Rl

aW4gQW5ub3RhdGlvbiBTdWl0ZSk6IEEgRGF0YWJhc2Ugb2YgQ29tcHJlaGVuc2l2ZSBBbm5vdGF0

aW9uIGluIEFsZ2FsIFByb3Rlb21lczwvdGl0bGU+PC90aXRsZXM+PHZvbHVtZT4yMDE5PC92b2x1

bWU+PG51bWJlcj5TZXAgMTg8L251bWJlcj48ZGF0ZXM+PHllYXI+MjAxNzwveWVhcj48L2RhdGVz

PjxwdWItbG9jYXRpb24+UGxhbnQgQ2VsbCBQaHlzaW9sPC9wdWItbG9jYXRpb24+PHVybHM+PHJl

bGF0ZWQtdXJscz48dXJsPmh0dHA6Ly9hbGdhLXByYXMucmlrZW4uanAvPC91cmw+PC9yZWxhdGVk

LXVybHM+PC91cmxzPjxlbGVjdHJvbmljLXJlc291cmNlLW51bT4xMC4xMDkzL3BjcC9wY3cyMTI8

L2VsZWN0cm9uaWMtcmVzb3VyY2UtbnVtPjwvcmVjb3JkPjwvQ2l0ZT48Q2l0ZT48QXV0aG9yPkRv

bmc8L0F1dGhvcj48WWVhcj4yMDA0PC9ZZWFyPjxSZWNOdW0+MTM2PC9SZWNOdW0+PHJlY29yZD48

cmVjLW51bWJlcj4xMzY8L3JlYy1udW1iZXI+PGZvcmVpZ24ta2V5cz48a2V5IGFwcD0iRU4iIGRi

LWlkPSJyYXZ4ZjA5cHM5MnAwYmVzdnM3NWRzd3o1MGZhcDBwYXh4MngiIHRpbWVzdGFtcD0iMTU2

ODc4NTIyOCI+MTM2PC9rZXk+PC9mb3JlaWduLWtleXM+PHJlZi10eXBlIG5hbWU9IldlYiBQYWdl

Ij4xMjwvcmVmLXR5cGU+PGNvbnRyaWJ1dG9ycz48YXV0aG9ycz48YXV0aG9yPkRvbmcsIFEuPC9h

dXRob3I+PGF1dGhvcj5TY2hsdWV0ZXIsIFMuIEQuPC9hdXRob3I+PGF1dGhvcj5CcmVuZGVsLCBW

LjwvYXV0aG9yPjwvYXV0aG9ycz48L2NvbnRyaWJ1dG9ycz48dGl0bGVzPjx0aXRsZT5QbGFudEdE

QiwgcGxhbnQgZ2Vub21lIGRhdGFiYXNlIGFuZCBhbmFseXNpcyB0b29sczwvdGl0bGU+PC90aXRs

ZXM+PHZvbHVtZT4yMDE5PC92b2x1bWU+PG51bWJlcj5TZXAgMTg8L251bWJlcj48ZGF0ZXM+PHll

YXI+MjAwNDwveWVhcj48L2RhdGVzPjxwdWItbG9jYXRpb24+TnVjbGVpYyBBY2lkcyBSZXNlYXJj

aDwvcHViLWxvY2F0aW9uPjx1cmxzPjxyZWxhdGVkLXVybHM+PHVybD5odHRwOi8vd3d3LnBsYW50

Z2RiLm9yZy88L3VybD48L3JlbGF0ZWQtdXJscz48L3VybHM+PGVsZWN0cm9uaWMtcmVzb3VyY2Ut

bnVtPjEwLjEwOTMvbmFyL2draDA0NjwvZWxlY3Ryb25pYy1yZXNvdXJjZS1udW0+PC9yZWNvcmQ+

PC9DaXRlPjxDaXRlPjxBdXRob3I+VmFuZGVwb2VsZTwvQXV0aG9yPjxZZWFyPjIwMTM8L1llYXI+

PFJlY051bT4xMzg8L1JlY051bT48cmVjb3JkPjxyZWMtbnVtYmVyPjEzODwvcmVjLW51bWJlcj48

Zm9yZWlnbi1rZXlzPjxrZXkgYXBwPSJFTiIgZGItaWQ9InJhdnhmMDlwczkycDBiZXN2czc1ZHN3

ejUwZmFwMHBheHgyeCIgdGltZXN0YW1wPSIxNTY4Nzg1MzM1Ij4xMzg8L2tleT48L2ZvcmVpZ24t

a2V5cz48cmVmLXR5cGUgbmFtZT0iV2ViIFBhZ2UiPjEyPC9yZWYtdHlwZT48Y29udHJpYnV0b3Jz

PjxhdXRob3JzPjxhdXRob3I+VmFuZGVwb2VsZSwgSy48L2F1dGhvcj48YXV0aG9yPlZhbiBCZWws

IE0uPC9hdXRob3I+PGF1dGhvcj5SaWNoYXJkLCBHLjwvYXV0aG9yPjxhdXRob3I+VmFuIExhbmRl

Z2hlbSwgUy48L2F1dGhvcj48YXV0aG9yPlZlcmhlbHN0LCBCLjwvYXV0aG9yPjxhdXRob3I+TW9y

ZWF1LCBILjwvYXV0aG9yPjxhdXRob3I+VmFuIGRlIFBlZXIsIFkuPC9hdXRob3I+PGF1dGhvcj5H

cmltc2xleSwgTi48L2F1dGhvcj48YXV0aG9yPlBpZ2FuZWF1LCBHLjwvYXV0aG9yPjwvYXV0aG9y

cz48L2NvbnRyaWJ1dG9ycz48dGl0bGVzPjx0aXRsZT5waWNvLVBMQVpBLCBhIGdlbm9tZSBkYXRh

YmFzZSBvZiBtaWNyb2JpYWwgcGhvdG9zeW50aGV0aWMgZXVrYXJ5b3RlczwvdGl0bGU+PC90aXRs

ZXM+PHZvbHVtZT4yMDE5PC92b2x1bWU+PG51bWJlcj5TZXAgMTg8L251bWJlcj48ZGF0ZXM+PHll

YXI+MjAxMzwveWVhcj48L2RhdGVzPjxwdWItbG9jYXRpb24+RW52aXJvbiBNaWNyb2Jpb2xvZ3k8

L3B1Yi1sb2NhdGlvbj48dXJscz48cmVsYXRlZC11cmxzPjx1cmw+aHR0cHM6Ly9iaW9pbmZvcm1h

dGljcy5wc2IudWdlbnQuYmUvcGxhemEvdmVyc2lvbnMvcGljby1wbGF6YS88L3VybD48L3JlbGF0

ZWQtdXJscz48L3VybHM+PGVsZWN0cm9uaWMtcmVzb3VyY2UtbnVtPjEwLjExMTEvMTQ2Mi0yOTIw

LjEyMTc0PC9lbGVjdHJvbmljLXJlc291cmNlLW51bT48L3JlY29yZD48L0NpdGU+PC9FbmROb3Rl

Pn==

ADDIN EN.CITE PEVuZE5vdGU+PENpdGU+PEF1dGhvcj5Mb21iYXJkPC9BdXRob3I+PFllYXI+MjAxNDwvWWVhcj48

UmVjTnVtPjEzMDwvUmVjTnVtPjxEaXNwbGF5VGV4dD5bNDYtNDldPC9EaXNwbGF5VGV4dD48cmVj

b3JkPjxyZWMtbnVtYmVyPjEzMDwvcmVjLW51bWJlcj48Zm9yZWlnbi1rZXlzPjxrZXkgYXBwPSJF

TiIgZGItaWQ9InJhdnhmMDlwczkycDBiZXN2czc1ZHN3ejUwZmFwMHBheHgyeCIgdGltZXN0YW1w

PSIxNTY4Nzg0NTA2Ij4xMzA8L2tleT48L2ZvcmVpZ24ta2V5cz48cmVmLXR5cGUgbmFtZT0iV2Vi

IFBhZ2UiPjEyPC9yZWYtdHlwZT48Y29udHJpYnV0b3JzPjxhdXRob3JzPjxhdXRob3I+TG9tYmFy

ZCwgVi48L2F1dGhvcj48YXV0aG9yPkdvbGFjb25kYSBSYW11bHUsIEguPC9hdXRob3I+PGF1dGhv

cj5EcnVsYSwgRS48L2F1dGhvcj48YXV0aG9yPkNvdXRpbmhvLCBQLiBNLjwvYXV0aG9yPjxhdXRo

b3I+SGVucmlzc2F0LCBCLjwvYXV0aG9yPjwvYXV0aG9ycz48L2NvbnRyaWJ1dG9ycz48dGl0bGVz

Pjx0aXRsZT5UaGUgY2FyYm9oeWRyYXRlLWFjdGl2ZSBlbnp5bWVzIGRhdGFiYXNlIChDQVp5KSBp

biAyMDEzPC90aXRsZT48L3RpdGxlcz48dm9sdW1lPjIwMTk8L3ZvbHVtZT48bnVtYmVyPlNlcCAx

ODwvbnVtYmVyPjxkYXRlcz48eWVhcj4yMDE0PC95ZWFyPjwvZGF0ZXM+PHB1Yi1sb2NhdGlvbj5O

dWNsZWljIEFjaWRzIFJlc2VhcmNoPC9wdWItbG9jYXRpb24+PHVybHM+PHJlbGF0ZWQtdXJscz48

dXJsPmh0dHA6Ly93d3cuY2F6eS5vcmcvPC91cmw+PC9yZWxhdGVkLXVybHM+PC91cmxzPjxlbGVj

dHJvbmljLXJlc291cmNlLW51bT4xMC4xMDkzL25hci9na3QxMTc4PC9lbGVjdHJvbmljLXJlc291

cmNlLW51bT48L3JlY29yZD48L0NpdGU+PENpdGU+PEF1dGhvcj5LdXJvdGFuaTwvQXV0aG9yPjxZ

ZWFyPjIwMTc8L1llYXI+PFJlY051bT4xMzE8L1JlY051bT48cmVjb3JkPjxyZWMtbnVtYmVyPjEz

MTwvcmVjLW51bWJlcj48Zm9yZWlnbi1rZXlzPjxrZXkgYXBwPSJFTiIgZGItaWQ9InJhdnhmMDlw

czkycDBiZXN2czc1ZHN3ejUwZmFwMHBheHgyeCIgdGltZXN0YW1wPSIxNTY4Nzg0NjQ4Ij4xMzE8

L2tleT48L2ZvcmVpZ24ta2V5cz48cmVmLXR5cGUgbmFtZT0iV2ViIFBhZ2UiPjEyPC9yZWYtdHlw

ZT48Y29udHJpYnV0b3JzPjxhdXRob3JzPjxhdXRob3I+S3Vyb3RhbmksIEEuPC9hdXRob3I+PGF1

dGhvcj5ZYW1hZGEsIFkuPC9hdXRob3I+PGF1dGhvcj5TYWt1cmFpLCBULjwvYXV0aG9yPjwvYXV0

aG9ycz48L2NvbnRyaWJ1dG9ycz48dGl0bGVzPjx0aXRsZT5BbGdhLVByQVMgKEFsZ2FsIFByb3Rl

aW4gQW5ub3RhdGlvbiBTdWl0ZSk6IEEgRGF0YWJhc2Ugb2YgQ29tcHJlaGVuc2l2ZSBBbm5vdGF0

aW9uIGluIEFsZ2FsIFByb3Rlb21lczwvdGl0bGU+PC90aXRsZXM+PHZvbHVtZT4yMDE5PC92b2x1

bWU+PG51bWJlcj5TZXAgMTg8L251bWJlcj48ZGF0ZXM+PHllYXI+MjAxNzwveWVhcj48L2RhdGVz

PjxwdWItbG9jYXRpb24+UGxhbnQgQ2VsbCBQaHlzaW9sPC9wdWItbG9jYXRpb24+PHVybHM+PHJl

bGF0ZWQtdXJscz48dXJsPmh0dHA6Ly9hbGdhLXByYXMucmlrZW4uanAvPC91cmw+PC9yZWxhdGVk

LXVybHM+PC91cmxzPjxlbGVjdHJvbmljLXJlc291cmNlLW51bT4xMC4xMDkzL3BjcC9wY3cyMTI8

L2VsZWN0cm9uaWMtcmVzb3VyY2UtbnVtPjwvcmVjb3JkPjwvQ2l0ZT48Q2l0ZT48QXV0aG9yPkRv

bmc8L0F1dGhvcj48WWVhcj4yMDA0PC9ZZWFyPjxSZWNOdW0+MTM2PC9SZWNOdW0+PHJlY29yZD48

cmVjLW51bWJlcj4xMzY8L3JlYy1udW1iZXI+PGZvcmVpZ24ta2V5cz48a2V5IGFwcD0iRU4iIGRi

LWlkPSJyYXZ4ZjA5cHM5MnAwYmVzdnM3NWRzd3o1MGZhcDBwYXh4MngiIHRpbWVzdGFtcD0iMTU2

ODc4NTIyOCI+MTM2PC9rZXk+PC9mb3JlaWduLWtleXM+PHJlZi10eXBlIG5hbWU9IldlYiBQYWdl

[46-49]. In addition, many papers describing those databases are published as open access (mainly in Nucleic Acids Research), which usually requires that the described database be publicly available. We could not find any evidence of restricted access to NSD in searches of Google, Google Scholar, and PubMed. The one exception to these findings is commercial patent NSD databases, which are very frequently used by companies. These commercial databases attempt to collect (manually) and curate all publicly available NSD mentioned in patents around the world (see Section 4.3). Companies use them to check whether patents are already in place on the NSD they might want to utilize for their own R&D activities. For example, GQ Life Sciences states that it has over 400 million sequences from patents, suggesting it has roughly twice the total number of NSD entries available in GenBank and ten-fold the 4.5 million patent sequences available at GenBank [50]. The necessity of such databases derives from the fact that there are many patent offices around the world, each with different standards for storing the NSD mentioned in patents. This makes the collection of such information very labor-intensive.
At the same time, such information is not of primary interest to public research, so few public databases try to collect all of it. It is also important to keep in mind that the value of these patent sequence databases does not come from the NSD itself (i.e. not from the nucleotides themselves), since a good deal of that NSD is already in public databases under open access. Rather, the value lies in the comprehensive collection and curation of patent NSD along with the patent metadata: without these commercial patent databases, a company could waste R&D effort on an innovation that is already patented.

During the analysis for this study, other than commercial patent databases, no databases could be found that would fall under the category of commercial NSD database, meaning a database that requires some form of payment in order to access the NSD+SI. Some public databases are voluntarily supported by companies, others are hosted by companies, and others offer additional commercial services (e.g. selling chemicals/enzymes/microbial strains, or workbench/cloud genomics offerings). Commercial NSD databases probably do exist in highly specialized fields that we were unable to identify during the study period. However, in discussions with colleagues, it was often mentioned that NSD databases that have tried to commercialize have often over-valued their NSD and underestimated the costs of maintaining the infrastructure and personnel. In other words, the business model does not seem to work out, which could be due to an economic mismatch between NSD and commercialization.

Case studies on private in-house databases

The Laird and Wynberg study provided a useful overview of how DSI is used in different industrial sectors. By definition, information on private databases is generally not publicly available.
Therefore, we conducted interviews with companies to draft case studies that exemplify the content and usage of in-house databases. The complete results can be found in the technical methods (Section 8.4). The case studies show that the DSI stored in the private sector is very diverse and that databases are often distributed internally according to the uses and types of data stored. There are in-house databases for NSD and others for SI, especially on proteins. In general, it seems that at least half of the biological data used comes from public databases. However, these are rather coarse estimates by the interviewees. As there is no exact definition of DSI, these numbers can vary considerably depending on what is included in the definition (interviewees also used different terms and categories, so this section contains a mixture of the terms DSI and NSD).

Finding the country of origin of NSD in private databases is usually possible for the database holders, except for some public and/or historical NSD, typically obtained through third parties. Here, the weaker the link between the original GR, the NSD and the SI, the harder it is to trace the country of origin, and doing so may be impossible (e.g. the public databases rely on the submitters to give the correct country of origin).

Patent sequence databases (commercial databases) are commonly used. All companies except one stated that they use patent sequence databases. The one exception provides services for public and private entities, so it receives material and NSD from third parties and requires them to have cleared all potential patent rights (and requirements under the Nagoya Protocol) beforehand.
Other than commercial patent databases, none of the companies mentioned using any other commercial database, nor could they name examples of such databases.

Privately generated NSD+SI can also be fed back from the private sphere into the public sphere (see Figure 2), primarily in the form of publications or the registration of patents. There are large quantities of unpublished private NSD that do not become part of patents and would not necessarily need to be kept private. However, companies have few incentives to publish this NSD. Finally, and perhaps obviously, unlike the NSD in the INSDC, the NSD in private databases cannot be analysed to quantify total use, users and biological scope.

Table 1. Overview of private database case studies. Listed are the number of employees; the company's focus with regard to the use of NSD+SI (companies may have other foci that do not involve biology); the percentage of public NSD+SI within their in-house databases (rough estimates, as definitions for SI are unclear); whether they submit internally generated NSD+SI to public databases; whether they engage in public-private partnerships; and whether they use commercial patent NSD databases.

Conclusions on private databases

Companies use the public NSD available from the INSDC and integrate it into their in-house databases.
Given the size of the biotech industry, there are likely thousands of private NSD databases of widely varying sizes and uses.
Some private NSD is eventually published, especially within collaborations with public institutions.
Backtracking to the original GR by the company itself generally works for NSD generated in-house, but not for all NSD obtained from the public databases.
Patent NSD databases (commercial databases) are frequently used to check for already existing patents.
Commercial databases on NSD (except patent NSD databases) seem to be uncommon, which perhaps suggests this is a challenging business model, as NSD is freely available at the INSDC and many downstream NSD and SI databases.

3.7 Restricting and controlling access to NSD

As illustrated in Figure 6, access to NSD databases and other platforms exists in a wide variety of forms. Restricted or controlled access means that formal requirements must be fulfilled in order to get access from the hosts of the database, but it does not necessarily imply financial costs or commercial interests. By definition, all private databases fall under the category of restricted access. A handful of public databases also restrict access, although the restrictions vary. For example, a user might need to enter a name and an email address and can then automatically use the features of a public database, which would be a very low level of restricted access. Restricting and controlling access is not traceability per se. The owners of a database can decide who gets access and to which parts of the database. However, that does not mean they track every single accession or every single user.

A good example of restricted access to NSD is the treatment of human NSD, which is of major importance for health-related research (commercial and non-commercial).
Inside the INSDC, the majority of patient NSD is stored in the Database of Genotypes and Phenotypes (dbGaP) [51], run by GenBank; the European Genome-phenome Archive (EGA) [52], run by EBI; and JGA, run by DDBJ. dbGaP and EGA restrict access but do not track or trace NSD usage or downloads once access is granted. The point of this infrastructure is to protect patient privacy, not to trace access and usage. There are also private companies offering similar services that enable patients to choose up front the terms and conditions from a set of commercial use options. Additionally, companies are exploring the possibility of a system that could enable patients to grant permission if a company wants to use their NSD for the development of a drug or therapy (see Section 5.3).

4. Traceability of NSD

4.1 Overview of NSD flow through the scientific landscape

For non-biologists, the flow of NSD from the laboratory bench to the INSDC and on to biological databases, publications, and possible utilization can at first seem very complex. Figure 6 provides a simplified overview of the data flow, the technical infrastructure, and the existing traceability system surrounding NSD and its scientific use, which have developed over the past four decades.

Figure 6: How does NSD flow through the databases, users, and into research? The INSDC is the core infrastructure in the movement of NSD. Orange boxes indicate use of accession numbers (ANs), generated by the INSDC. Blue boxes indicate either external or pre-INSDC analysis. Green boxes represent actors/sectors through which data and information flow. Note that both public and private researchers are responsible for generating NSD (“facilitate” arrows). Commercial databases that download NSD from the INSDC use ANs, but any additional NSD or SI they add would not be associated with an AN, hence the orange-blue color scheme. Double-headed arrows indicate bi-directional data flow and single-headed arrows indicate uni-directional data flow. Publication identifiers (DOIs assigned by the publisher and, primarily, PubMed IDs) are connected with NSD entries in public databases.

Sequencing

The process begins at the far left of Figure 6 (GR, blue box), where DNA/RNA is extracted from any kind of biological material, including environmental samples (e.g., soil or water), or in CBD terms a GR. This DNA/RNA is then processed in a variety of ways depending on the sequencing technology, and the sequence of the extracted nucleotides is determined, yielding “raw reads” of nucleotide sequences. These raw reads are further processed (trimmed, quality controlled, assembled, annotated, etc.) and then analysed in a manner determined by the goal of the research.
Depending on the size and governance of the project, the NSD is submitted to an INSDC database either early in the project (to an archive such as the Sequence Read Archive (SRA), also part of the INSDC), mid-project (as in large genome projects), or at the end of the project, at the latest before publication. (We note that Study 1 covers this topic in greater detail.)

Scientific analysis: public research & workbench databases

NSD is usually produced in order to answer a scientific question. To that end, NSD is edited, examined, and analysed during scientific research to test hypotheses. In this process, new insights may be gained and eventually published in a peer-reviewed journal (green box, public research, Figure 6). The ways in which NSD is analysed vary among institutes, lab groups, and even individuals. The publication of NSD in public databases enables not only scientific reproducibility but also secondary analysis, which can lead to discoveries that differ from the original intent of the sequencing effort. Molecular biological research is a collaborative, international, and often interdisciplinary process, and NSD is usually shared freely within groups of collaborators. The lag time between NSD production and deposition in INSDC databases, which is strictly required for scientific publication, can range from immediate to several years, and some NSD is never published. The mechanisms by which scientists share pre-publication NSD are very diverse.
They range from email attachments to shared spaces on the Internet or in the cloud, to ad hoc databases that may, or may not, have a web presence.The next level of sophistication in pre-publication NSD analysis are so-called “workbench” databases (lower left blue box, Figure 6) that operate upstream of INSDC that are shared among different research groups and collaborators with a common interest and range from fully open/free to semi-public or invite-only as they are set up and used by groups of scientists working on pre-publication analysis. Many of these activities are planned to be semi-permanent and their overarching purpose is to further the scientific process by sharing, analysing, and discussing prior to the conclusion of analysis and joint publication via submission to INSDC. Depending on the database, the NSD within these “workbench” databases can be mostly unique (not yet found in INSDC) or non-unique (i.e., found already almost entirely in INSDC). For example, the GOLD database has published 99% of its NSD to INSDC with only low quality or incomplete projects not submitted. Other workbench databases, such as BOLD have perhaps more unique (non-published) NSD, although they often allow users to deposit NSD directly in the INSDC. Some workbench databases, such as the World Collection of Microorganisms PEVuZE5vdGU+PENpdGU+PEF1dGhvcj5XdTwvQXV0aG9yPjxZZWFyPjIwMTg8L1llYXI+PFJlY051

bT43OTwvUmVjTnVtPjxEaXNwbGF5VGV4dD5bNTNdPC9EaXNwbGF5VGV4dD48cmVjb3JkPjxyZWMt

bnVtYmVyPjc5PC9yZWMtbnVtYmVyPjxmb3JlaWduLWtleXM+PGtleSBhcHA9IkVOIiBkYi1pZD0i

cmF2eGYwOXBzOTJwMGJlc3ZzNzVkc3d6NTBmYXAwcGF4eDJ4IiB0aW1lc3RhbXA9IjE1NjUwOTUw

OTAiPjc5PC9rZXk+PC9mb3JlaWduLWtleXM+PHJlZi10eXBlIG5hbWU9IkpvdXJuYWwgQXJ0aWNs

ZSI+MTc8L3JlZi10eXBlPjxjb250cmlidXRvcnM+PGF1dGhvcnM+PGF1dGhvcj5XdSwgTC48L2F1

dGhvcj48YXV0aG9yPk1jQ2x1c2tleSwgSy48L2F1dGhvcj48YXV0aG9yPkRlc21ldGgsIFAuPC9h

dXRob3I+PGF1dGhvcj5MaXUsIFMuPC9hdXRob3I+PGF1dGhvcj5IaWRlYWtpLCBTLjwvYXV0aG9y

PjxhdXRob3I+WWluLCBZLjwvYXV0aG9yPjxhdXRob3I+TW9yaXlhLCBPLjwvYXV0aG9yPjxhdXRo

b3I+SXRvaCwgVC48L2F1dGhvcj48YXV0aG9yPktpbSwgQy4gWS48L2F1dGhvcj48YXV0aG9yPkxl

ZSwgSi4gUy48L2F1dGhvcj48YXV0aG9yPlpob3UsIFkuPC9hdXRob3I+PGF1dGhvcj5LYXdhc2Fr

aSwgSC48L2F1dGhvcj48YXV0aG9yPkhhemJvbiwgTS4gSC48L2F1dGhvcj48YXV0aG9yPlJvYmVy

dCwgVi48L2F1dGhvcj48YXV0aG9yPkJvZWtob3V0LCBULjwvYXV0aG9yPjxhdXRob3I+TGltYSwg

Ti48L2F1dGhvcj48YXV0aG9yPkV2dHVzaGVua28sIEwuPC9hdXRob3I+PGF1dGhvcj5Cb3VuZHkt

TWlsbHMsIEsuPC9hdXRob3I+PGF1dGhvcj5CdW5rLCBCLjwvYXV0aG9yPjxhdXRob3I+TW9vcmUs

IEUuIFIuIEIuPC9hdXRob3I+PGF1dGhvcj5FdXJ3aWxhaWNoaXRyLCBMLjwvYXV0aG9yPjxhdXRo

b3I+SW5nc3Jpc3dhbmcsIFMuPC9hdXRob3I+PGF1dGhvcj5TaGFoLCBILjwvYXV0aG9yPjxhdXRo

b3I+WWFvLCBTLjwvYXV0aG9yPjxhdXRob3I+SmluLCBULjwvYXV0aG9yPjxhdXRob3I+SHVhbmcs

IEouPC9hdXRob3I+PGF1dGhvcj5TaGksIFcuPC9hdXRob3I+PGF1dGhvcj5TdW4sIFEuPC9hdXRo

b3I+PGF1dGhvcj5GYW4sIEcuPC9hdXRob3I+PGF1dGhvcj5MaSwgVy48L2F1dGhvcj48YXV0aG9y

PkxpLCBYLjwvYXV0aG9yPjxhdXRob3I+S3VydGJva2UsIEkuPC9hdXRob3I+PGF1dGhvcj5NYSwg

Si48L2F1dGhvcj48L2F1dGhvcnM+PC9jb250cmlidXRvcnM+PGF1dGgtYWRkcmVzcz5NaWNyb2Jp

YWwgUmVzb3VyY2UgYW5kIEJpZyBEYXRhIENlbnRlciwgSW5zdGl0dXRlIG9mIE1pY3JvYmlvbG9n

eSwgQ2hpbmVzZSBBY2FkZW15IG9mIFNjaWVuY2VzLCBCZWlqaW5nIDEwMDEwMSwgQ2hpbmEuJiN4

RDtTdGF0ZSBLZXkgTGFib3JhdG9yeSBvZiBNaWNyb2JpYWwgUmVzb3VyY2VzLCBJbnN0aXR1dGUg

b2YgTWljcm9iaW9sb2d5LCBDaGluZXNlIEFjYWRlbXkgb2YgU2NpZW5jZXMsIEJlaWppbmcgMTAw

MTAxLCBDaGluYS4mI3hEO1dGQ0MtTUlSQ0VOIFdvcmxkIERhdGEgQ2VudGVyIGZvciBNaWNyb29y

Z2FuaXNtcywgQmVpamluZyAxMDAxMDEsIENoaW5hLiYjeEQ7V29ybGQgRmVkZXJhdGlvbiBvZiBD

dWx0dXJlIENvbGxlY3Rpb25zIChXRkNDKS4mI3hEO0Z1bmdhbCBHZW5ldGljcyBTdG9jayBDZW50

ZXIsIEthbnNhcyBTdGF0ZSBVbml2ZXJzaXR5LCBNYW5oYXR0YW4sIEtTIDY2NTA2LCBVU0EuJiN4

RDtCZWxnaWFuIENvb3JkaW5hdGVkIENvbGxlY3Rpb25zIG9mIE1pY3JvLW9yZ2FuaXNtcyBQcm9n

cmFtLCBCZWxnaWFuIFNjaWVuY2UgUG9saWN5IE9mZmljZSwgQnJ1c3NlbHMgMjMxIDEwNTAsIEJl

bGdpdW0uJiN4RDtOYXRpb25hbCBJbnN0aXR1dGUgb2YgR2VuZXRpY3MsIFlhdGEsIE1pc2hpbWEg

NDExLTg1NDAsIEphcGFuLiYjeEQ7QkdJIEdlbm9taWNzLCBCR0ktU2hlbnpoZW4sIFNoZW56aGVu

IDUxODA4MywgQ2hpbmEuJiN4RDtKYXBhbiBDb2xsZWN0aW9uIG9mIE1pY3Jvb3JnYW5pc21zL01p

Y3JvYmUgRGl2aXNpb24sIFJJS0VOIEJpb1Jlc291cmNlIENlbnRlciwgS295YWRhaSAzLTEtMSwg

VHN1a3ViYSwgSWJhcmFraSAzMDUtMDA3NCwgSmFwYW4uJiN4RDtLb3JlYW4gQ29sbGVjdGlvbiBm

b3IgVHlwZSBDdWx0dXJlcywgS29yZWEgUmVzZWFyY2ggSW5zdGl0dXRlIG9mIEJpb3NjaWVuY2Ug

YW5kIEJpb3RlY2hub2xvZ3kgKEtSSUJCKSwgMTgxIElwc2luLWdpbCwgSmVvbmdldXAtc2ksIEpl

b2xsYWJ1ay1kbywgNTYyMTIsIFJlcHVibGljIG9mIEtvcmVhLiYjeEQ7Q2hpbmEgR2VuZXJhbCBN

aWNyb2Jpb2xvZ2ljYWwgQ3VsdHVyZSBDb2xsZWN0aW9uIENlbnRlciwgSW5zdGl0dXRlIG9mIE1p

Y3JvYmlvbG9neSwgQ2hpbmVzZSBBY2FkZW15IG9mIFNjaWVuY2VzLCBCZWlqaW5nMTAwMTAsIENo

aW5hLiYjeEQ7TklURSBCaW9sb2dpY2FsIFJlc291cmNlIENlbnRlciwgTmF0aW9uYWwgSW5zdGl0

dXRlIG9mIFRlY2hub2xvZ3kgYW5kIEV2YWx1YXRpb24sIDItNS04IEthenVzYWthbWF0YXJpLCBL

aXNhcmF6dSwgQ2hpYmEgMjkyLTA4MTgsIEphcGFuLiYjeEQ7QW1lcmljYW4gVHlwZSBDdWx0dXJl

IENvbGxlY3Rpb24sIDEwODAxIFVuaXZlcnNpdHkgQm91bGV2YXJkLCBNYW5hc3NhcywgVkEgMjAx

MTAsIFVTQS4mI3hEO1dlc3RlcmRpamsgRnVuZ2FsIEJpb2RpdmVyc2l0eSBJbnN0aXR1ZSwgVXRy

ZWNodCAzNTM0Q1QsIE5ldGhlcmxhbmRzLiYjeEQ7SW5zdGl0dXRlIG9mIEJpb2RpdmVyc2l0eSBh

bmQgRWNvc3lzdGVtIER5bmFtaWNzLCBVbml2ZXJzaXR5IG9mIEFtc3RlcmRhbSwgU3B1aSAyMSAx

MDEyIFdYIEFtc3RlcmRhbSwgTmV0aGVybGFuZHMuJiN4RDtTaGFuZ2hhaSBLZXkgTGFib3JhdG9y

eSBvZiBNb2xlY3VsYXIgTWVkaWNhbCBNeWNvbG9neSwgU2hhbmdoYWkgSW5zdGl0dXRlIG9mIE15

Y29sb2d5LCBTaGFuZ2hhaSBDaGFuZ3poZW5nIEhvc3BpdGFsLCBTaGFuZ2hhaSAyMDAwMDMsIENo

aW5hLiYjeEQ7TWljb3RlY2EgZGEgVW5pdmVyc2lkYWRlIGRvIE1pbmhvLCBCaW9sb2dpY2FsIEVu

Z2luZWVyaW5nIENlbnRyZSwgNDcxMC0wNTcgQnJhZ2EsIFBvcnR1Z2FsLiYjeEQ7QWxsLVJ1c3Np

YW4gQ29sbGVjdGlvbiBvZiBNaWNyb29yZ2FuaXNtcywgR0sgU2tyeWFiaW4gSW5zdGl0dXRlIG9m

IEJpb2NoZW1pc3RyeSBhbmQgUGh5c2lvbG9neSBvZiBNaWNyb29yZ2FuaXNtcyBSQVMsIFB1c2hj

aGlubywgTW9zY293IFJlZ2lvbiAxNDIyOTAsIFJ1c3NpYS4mI3hEO1BoYWZmIFllYXN0IEN1bHR1

cmUgQ29sbGVjdGlvbiwgRm9vZCBTY2llbmNlIGFuZCBUZWNobm9sb2d5IERlcGFydG1lbnQsIFVu

aXZlcnNpdHkgb2YgQ2FsaWZvcm5pYSBEYXZpcywgMSBTaGllbGRzIEF2ZW51ZSwgRGF2aXMsIENB

IDk1NjE2LTg1OTgsIFVTQS4mI3hEO0xlaWJuaXotSW5zdGl0dXRlIERTTVogLSBHZXJtYW4gQ29s

bGVjdGlvbiBvZiBNaWNyb29yZ2FuaXNtcyBhbmQgQ2VsbCBDdWx0dXJlcywgRC0zODEyNCBCcmF1

bnNjaHdlaWcsIEdlcm1hbnkuJiN4RDtDdWx0dXJlIENvbGxlY3Rpb24gVW5pdmVyc2l0eSBvZiBH

b3RoZW5idXJnIChDQ1VHKSwgU2FobGdyZW5za2EgQWNhZGVteSBvZiB0aGUgVW5pdmVyc2l0eSBv

ZiBHb3RoZW5idXJnLCBTRS00MTM0NiBHb3RoZW5idXJnLCBTd2VkZW4uJiN4RDtCaW9yZXNvdXJj

ZXMgVGVjaG5vbG9neSBVbml0LCBUaGFpbGFuZCBCaW9yZXNvdXJjZSBSZXNlYXJjaCBDZW50ZXIs

IE5hdGlvbmFsIENlbnRlciBmb3IgR2VuZXRpYyBFbmdpbmVlcmluZyBhbmQgQmlvdGVjaG5vbG9n

eSwgQmFuZ2tvayBOYXRpb25hbCBTY2llbmNlIGFuZCBUZWNobm9sb2d5IERldmVsb3BtZW50IEFn

ZW5jeSwgMTEzLCBUaGFpbGFuZC4mI3hEO05hdGlvbmFsIENvbGxlY3Rpb24gb2YgVHlwZSBDdWx0

dXJlcywgUHVibGljIEhlYWx0aCBFbmdsYW5kLCBQb3J0b24gRG93biwgU2FsaXNidXJ5LCBXaWx0

c2hpcmUgU1A0IDBKRywgVUsuJiN4RDtDaGluYSBDZW50ZXIgb2YgSW5kdXN0cmlhbCBDdWx0dXJl

IENvbGxlY3Rpb24sIEJlaWppbmcgMTAwMDE1LCBDaGluYS4mI3hEO0ZhY3VsdHkgb2YgU2NpZW5j

ZSwgSGVhbHRoLCBFZHVjYXRpb24gYW5kIEVuZ2luZWVyaW5nLCBVbml2ZXJzaXR5IG9mIHRoZSBT

dW5zaGluZSBDb2FzdCwgTWFyb29jaHlkb3JlLCBRdWVlbnNsYW5kIDQ1NTgsIEF1c3RyYWxpYS48

L2F1dGgtYWRkcmVzcz48dGl0bGVzPjx0aXRsZT5UaGUgZ2xvYmFsIGNhdGFsb2d1ZSBvZiBtaWNy

b29yZ2FuaXNtcyAxMEsgdHlwZSBzdHJhaW4gc2VxdWVuY2luZyBwcm9qZWN0OiBjbG9zaW5nIHRo

ZSBnZW5vbWljIGdhcHMgZm9yIHRoZSB2YWxpZGx5IHB1Ymxpc2hlZCBwcm9rYXJ5b3RpYyBhbmQg

ZnVuZ2kgc3BlY2llczwvdGl0bGU+PHNlY29uZGFyeS10aXRsZT5HaWdhc2NpZW5jZTwvc2Vjb25k

YXJ5LXRpdGxlPjxhbHQtdGl0bGU+R2lnYVNjaWVuY2U8L2FsdC10aXRsZT48L3RpdGxlcz48cGVy

aW9kaWNhbD48ZnVsbC10aXRsZT5HaWdhc2NpZW5jZTwvZnVsbC10aXRsZT48YWJici0xPkdpZ2FT

Y2llbmNlPC9hYmJyLTE+PC9wZXJpb2RpY2FsPjxhbHQtcGVyaW9kaWNhbD48ZnVsbC10aXRsZT5H

aWdhc2NpZW5jZTwvZnVsbC10aXRsZT48YWJici0xPkdpZ2FTY2llbmNlPC9hYmJyLTE+PC9hbHQt

cGVyaW9kaWNhbD48dm9sdW1lPjc8L3ZvbHVtZT48bnVtYmVyPjU8L251bWJlcj48a2V5d29yZHM+

PGtleXdvcmQ+QmFjdGVyaWEvKmdlbmV0aWNzPC9rZXl3b3JkPjxrZXl3b3JkPkZ1bmdpLypnZW5l

dGljczwva2V5d29yZD48a2V5d29yZD5HZW5vbWljcy8qbWV0aG9kczwva2V5d29yZD48a2V5d29y

ZD5Qcm9rYXJ5b3RpYyBDZWxscy8qbWV0YWJvbGlzbTwva2V5d29yZD48a2V5d29yZD5SZXByb2R1

Y2liaWxpdHkgb2YgUmVzdWx0czwva2V5d29yZD48a2V5d29yZD5TZXF1ZW5jZSBBbmFseXNpcywg

RE5BLyptZXRob2RzPC9rZXl3b3JkPjwva2V5d29yZHM+PGRhdGVzPjx5ZWFyPjIwMTg8L3llYXI+

PHB1Yi1kYXRlcz48ZGF0ZT5NYXkgMTwvZGF0ZT48L3B1Yi1kYXRlcz48L2RhdGVzPjxpc2JuPjIw

NDctMjE3WCAoRWxlY3Ryb25pYykmI3hEOzIwNDctMjE3WCAoTGlua2luZyk8L2lzYm4+PGFjY2Vz

c2lvbi1udW0+Mjk3MTgyMDI8L2FjY2Vzc2lvbi1udW0+PHVybHM+PHJlbGF0ZWQtdXJscz48dXJs

Pmh0dHA6Ly93d3cubmNiaS5ubG0ubmloLmdvdi9wdWJtZWQvMjk3MTgyMDI8L3VybD48dXJsPmh0

dHBzOi8vd3d3Lm5jYmkubmxtLm5paC5nb3YvcG1jL2FydGljbGVzL1BNQzU5NDExMzYvcGRmL2dp

eTAyNi5wZGY8L3VybD48L3JlbGF0ZWQtdXJscz48L3VybHM+PGN1c3RvbTI+NTk0MTEzNjwvY3Vz

dG9tMj48ZWxlY3Ryb25pYy1yZXNvdXJjZS1udW0+MTAuMTA5My9naWdhc2NpZW5jZS9naXkwMjY8

L2VsZWN0cm9uaWMtcmVzb3VyY2UtbnVtPjxsYW5ndWFnZT4zNTwvbGFuZ3VhZ2U+PC9yZWNvcmQ+

PC9DaXRlPjwvRW5kTm90ZT5=

ADDIN EN.CITE PEVuZE5vdGU+PENpdGU+PEF1dGhvcj5XdTwvQXV0aG9yPjxZZWFyPjIwMTg8L1llYXI+PFJlY051

bT43OTwvUmVjTnVtPjxEaXNwbGF5VGV4dD5bNTNdPC9EaXNwbGF5VGV4dD48cmVjb3JkPjxyZWMt

bnVtYmVyPjc5PC9yZWMtbnVtYmVyPjxmb3JlaWduLWtleXM+PGtleSBhcHA9IkVOIiBkYi1pZD0i

cmF2eGYwOXBzOTJwMGJlc3ZzNzVkc3d6NTBmYXAwcGF4eDJ4IiB0aW1lc3RhbXA9IjE1NjUwOTUw

OTAiPjc5PC9rZXk+PC9mb3JlaWduLWtleXM+PHJlZi10eXBlIG5hbWU9IkpvdXJuYWwgQXJ0aWNs

ZSI+MTc8L3JlZi10eXBlPjxjb250cmlidXRvcnM+PGF1dGhvcnM+PGF1dGhvcj5XdSwgTC48L2F1

dGhvcj48YXV0aG9yPk1jQ2x1c2tleSwgSy48L2F1dGhvcj48YXV0aG9yPkRlc21ldGgsIFAuPC9h

dXRob3I+PGF1dGhvcj5MaXUsIFMuPC9hdXRob3I+PGF1dGhvcj5IaWRlYWtpLCBTLjwvYXV0aG9y

PjxhdXRob3I+WWluLCBZLjwvYXV0aG9yPjxhdXRob3I+TW9yaXlhLCBPLjwvYXV0aG9yPjxhdXRo

b3I+SXRvaCwgVC48L2F1dGhvcj48YXV0aG9yPktpbSwgQy4gWS48L2F1dGhvcj48YXV0aG9yPkxl

ZSwgSi4gUy48L2F1dGhvcj48YXV0aG9yPlpob3UsIFkuPC9hdXRob3I+PGF1dGhvcj5LYXdhc2Fr

aSwgSC48L2F1dGhvcj48YXV0aG9yPkhhemJvbiwgTS4gSC48L2F1dGhvcj48YXV0aG9yPlJvYmVy

dCwgVi48L2F1dGhvcj48YXV0aG9yPkJvZWtob3V0LCBULjwvYXV0aG9yPjxhdXRob3I+TGltYSwg

Ti48L2F1dGhvcj48YXV0aG9yPkV2dHVzaGVua28sIEwuPC9hdXRob3I+PGF1dGhvcj5Cb3VuZHkt

TWlsbHMsIEsuPC9hdXRob3I+PGF1dGhvcj5CdW5rLCBCLjwvYXV0aG9yPjxhdXRob3I+TW9vcmUs

IEUuIFIuIEIuPC9hdXRob3I+PGF1dGhvcj5FdXJ3aWxhaWNoaXRyLCBMLjwvYXV0aG9yPjxhdXRo

b3I+SW5nc3Jpc3dhbmcsIFMuPC9hdXRob3I+PGF1dGhvcj5TaGFoLCBILjwvYXV0aG9yPjxhdXRo

b3I+WWFvLCBTLjwvYXV0aG9yPjxhdXRob3I+SmluLCBULjwvYXV0aG9yPjxhdXRob3I+SHVhbmcs

IEouPC9hdXRob3I+PGF1dGhvcj5TaGksIFcuPC9hdXRob3I+PGF1dGhvcj5TdW4sIFEuPC9hdXRo

b3I+PGF1dGhvcj5GYW4sIEcuPC9hdXRob3I+PGF1dGhvcj5MaSwgVy48L2F1dGhvcj48YXV0aG9y

PkxpLCBYLjwvYXV0aG9yPjxhdXRob3I+S3VydGJva2UsIEkuPC9hdXRob3I+PGF1dGhvcj5NYSwg

Si48L2F1dGhvcj48L2F1dGhvcnM+PC9jb250cmlidXRvcnM+PGF1dGgtYWRkcmVzcz5NaWNyb2Jp

YWwgUmVzb3VyY2UgYW5kIEJpZyBEYXRhIENlbnRlciwgSW5zdGl0dXRlIG9mIE1pY3JvYmlvbG9n

eSwgQ2hpbmVzZSBBY2FkZW15IG9mIFNjaWVuY2VzLCBCZWlqaW5nIDEwMDEwMSwgQ2hpbmEuJiN4

RDtTdGF0ZSBLZXkgTGFib3JhdG9yeSBvZiBNaWNyb2JpYWwgUmVzb3VyY2VzLCBJbnN0aXR1dGUg

b2YgTWljcm9iaW9sb2d5LCBDaGluZXNlIEFjYWRlbXkgb2YgU2NpZW5jZXMsIEJlaWppbmcgMTAw

MTAxLCBDaGluYS4mI3hEO1dGQ0MtTUlSQ0VOIFdvcmxkIERhdGEgQ2VudGVyIGZvciBNaWNyb29y

Z2FuaXNtcywgQmVpamluZyAxMDAxMDEsIENoaW5hLiYjeEQ7V29ybGQgRmVkZXJhdGlvbiBvZiBD

dWx0dXJlIENvbGxlY3Rpb25zIChXRkNDKS4mI3hEO0Z1bmdhbCBHZW5ldGljcyBTdG9jayBDZW50

ZXIsIEthbnNhcyBTdGF0ZSBVbml2ZXJzaXR5LCBNYW5oYXR0YW4sIEtTIDY2NTA2LCBVU0EuJiN4

RDtCZWxnaWFuIENvb3JkaW5hdGVkIENvbGxlY3Rpb25zIG9mIE1pY3JvLW9yZ2FuaXNtcyBQcm9n

cmFtLCBCZWxnaWFuIFNjaWVuY2UgUG9saWN5IE9mZmljZSwgQnJ1c3NlbHMgMjMxIDEwNTAsIEJl

bGdpdW0uJiN4RDtOYXRpb25hbCBJbnN0aXR1dGUgb2YgR2VuZXRpY3MsIFlhdGEsIE1pc2hpbWEg

NDExLTg1NDAsIEphcGFuLiYjeEQ7QkdJIEdlbm9taWNzLCBCR0ktU2hlbnpoZW4sIFNoZW56aGVu

IDUxODA4MywgQ2hpbmEuJiN4RDtKYXBhbiBDb2xsZWN0aW9uIG9mIE1pY3Jvb3JnYW5pc21zL01p

Y3JvYmUgRGl2aXNpb24sIFJJS0VOIEJpb1Jlc291cmNlIENlbnRlciwgS295YWRhaSAzLTEtMSwg

VHN1a3ViYSwgSWJhcmFraSAzMDUtMDA3NCwgSmFwYW4uJiN4RDtLb3JlYW4gQ29sbGVjdGlvbiBm

b3IgVHlwZSBDdWx0dXJlcywgS29yZWEgUmVzZWFyY2ggSW5zdGl0dXRlIG9mIEJpb3NjaWVuY2Ug

YW5kIEJpb3RlY2hub2xvZ3kgKEtSSUJCKSwgMTgxIElwc2luLWdpbCwgSmVvbmdldXAtc2ksIEpl

b2xsYWJ1ay1kbywgNTYyMTIsIFJlcHVibGljIG9mIEtvcmVhLiYjeEQ7Q2hpbmEgR2VuZXJhbCBN

aWNyb2Jpb2xvZ2ljYWwgQ3VsdHVyZSBDb2xsZWN0aW9uIENlbnRlciwgSW5zdGl0dXRlIG9mIE1p

Y3JvYmlvbG9neSwgQ2hpbmVzZSBBY2FkZW15IG9mIFNjaWVuY2VzLCBCZWlqaW5nMTAwMTAsIENo

aW5hLiYjeEQ7TklURSBCaW9sb2dpY2FsIFJlc291cmNlIENlbnRlciwgTmF0aW9uYWwgSW5zdGl0

dXRlIG9mIFRlY2hub2xvZ3kgYW5kIEV2YWx1YXRpb24sIDItNS04IEthenVzYWthbWF0YXJpLCBL

aXNhcmF6dSwgQ2hpYmEgMjkyLTA4MTgsIEphcGFuLiYjeEQ7QW1lcmljYW4gVHlwZSBDdWx0dXJl

IENvbGxlY3Rpb24sIDEwODAxIFVuaXZlcnNpdHkgQm91bGV2YXJkLCBNYW5hc3NhcywgVkEgMjAx

MTAsIFVTQS4mI3hEO1dlc3RlcmRpamsgRnVuZ2FsIEJpb2RpdmVyc2l0eSBJbnN0aXR1ZSwgVXRy

ZWNodCAzNTM0Q1QsIE5ldGhlcmxhbmRzLiYjeEQ7SW5zdGl0dXRlIG9mIEJpb2RpdmVyc2l0eSBh

bmQgRWNvc3lzdGVtIER5bmFtaWNzLCBVbml2ZXJzaXR5IG9mIEFtc3RlcmRhbSwgU3B1aSAyMSAx

MDEyIFdYIEFtc3RlcmRhbSwgTmV0aGVybGFuZHMuJiN4RDtTaGFuZ2hhaSBLZXkgTGFib3JhdG9y

eSBvZiBNb2xlY3VsYXIgTWVkaWNhbCBNeWNvbG9neSwgU2hhbmdoYWkgSW5zdGl0dXRlIG9mIE15

Y29sb2d5LCBTaGFuZ2hhaSBDaGFuZ3poZW5nIEhvc3BpdGFsLCBTaGFuZ2hhaSAyMDAwMDMsIENo

aW5hLiYjeEQ7TWljb3RlY2EgZGEgVW5pdmVyc2lkYWRlIGRvIE1pbmhvLCBCaW9sb2dpY2FsIEVu

Z2luZWVyaW5nIENlbnRyZSwgNDcxMC0wNTcgQnJhZ2EsIFBvcnR1Z2FsLiYjeEQ7QWxsLVJ1c3Np

YW4gQ29sbGVjdGlvbiBvZiBNaWNyb29yZ2FuaXNtcywgR0sgU2tyeWFiaW4gSW5zdGl0dXRlIG9m

IEJpb2NoZW1pc3RyeSBhbmQgUGh5c2lvbG9neSBvZiBNaWNyb29yZ2FuaXNtcyBSQVMsIFB1c2hj

aGlubywgTW9zY293IFJlZ2lvbiAxNDIyOTAsIFJ1c3NpYS4mI3hEO1BoYWZmIFllYXN0IEN1bHR1

cmUgQ29sbGVjdGlvbiwgRm9vZCBTY2llbmNlIGFuZCBUZWNobm9sb2d5IERlcGFydG1lbnQsIFVu

aXZlcnNpdHkgb2YgQ2FsaWZvcm5pYSBEYXZpcywgMSBTaGllbGRzIEF2ZW51ZSwgRGF2aXMsIENB

IDk1NjE2LTg1OTgsIFVTQS4mI3hEO0xlaWJuaXotSW5zdGl0dXRlIERTTVogLSBHZXJtYW4gQ29s

bGVjdGlvbiBvZiBNaWNyb29yZ2FuaXNtcyBhbmQgQ2VsbCBDdWx0dXJlcywgRC0zODEyNCBCcmF1

bnNjaHdlaWcsIEdlcm1hbnkuJiN4RDtDdWx0dXJlIENvbGxlY3Rpb24gVW5pdmVyc2l0eSBvZiBH

b3RoZW5idXJnIChDQ1VHKSwgU2FobGdyZW5za2EgQWNhZGVteSBvZiB0aGUgVW5pdmVyc2l0eSBv

ZiBHb3RoZW5idXJnLCBTRS00MTM0NiBHb3RoZW5idXJnLCBTd2VkZW4uJiN4RDtCaW9yZXNvdXJj

ZXMgVGVjaG5vbG9neSBVbml0LCBUaGFpbGFuZCBCaW9yZXNvdXJjZSBSZXNlYXJjaCBDZW50ZXIs

IE5hdGlvbmFsIENlbnRlciBmb3IgR2VuZXRpYyBFbmdpbmVlcmluZyBhbmQgQmlvdGVjaG5vbG9n

eSwgQmFuZ2tvayBOYXRpb25hbCBTY2llbmNlIGFuZCBUZWNobm9sb2d5IERldmVsb3BtZW50IEFn

ZW5jeSwgMTEzLCBUaGFpbGFuZC4mI3hEO05hdGlvbmFsIENvbGxlY3Rpb24gb2YgVHlwZSBDdWx0

dXJlcywgUHVibGljIEhlYWx0aCBFbmdsYW5kLCBQb3J0b24gRG93biwgU2FsaXNidXJ5LCBXaWx0

c2hpcmUgU1A0IDBKRywgVUsuJiN4RDtDaGluYSBDZW50ZXIgb2YgSW5kdXN0cmlhbCBDdWx0dXJl

IENvbGxlY3Rpb24sIEJlaWppbmcgMTAwMDE1LCBDaGluYS4mI3hEO0ZhY3VsdHkgb2YgU2NpZW5j

ZSwgSGVhbHRoLCBFZHVjYXRpb24gYW5kIEVuZ2luZWVyaW5nLCBVbml2ZXJzaXR5IG9mIHRoZSBT

dW5zaGluZSBDb2FzdCwgTWFyb29jaHlkb3JlLCBRdWVlbnNsYW5kIDQ1NTgsIEF1c3RyYWxpYS48

L2F1dGgtYWRkcmVzcz48dGl0bGVzPjx0aXRsZT5UaGUgZ2xvYmFsIGNhdGFsb2d1ZSBvZiBtaWNy

b29yZ2FuaXNtcyAxMEsgdHlwZSBzdHJhaW4gc2VxdWVuY2luZyBwcm9qZWN0OiBjbG9zaW5nIHRo

ZSBnZW5vbWljIGdhcHMgZm9yIHRoZSB2YWxpZGx5IHB1Ymxpc2hlZCBwcm9rYXJ5b3RpYyBhbmQg

ZnVuZ2kgc3BlY2llczwvdGl0bGU+PHNlY29uZGFyeS10aXRsZT5HaWdhc2NpZW5jZTwvc2Vjb25k

YXJ5LXRpdGxlPjxhbHQtdGl0bGU+R2lnYVNjaWVuY2U8L2FsdC10aXRsZT48L3RpdGxlcz48cGVy

aW9kaWNhbD48ZnVsbC10aXRsZT5HaWdhc2NpZW5jZTwvZnVsbC10aXRsZT48YWJici0xPkdpZ2FT

Y2llbmNlPC9hYmJyLTE+PC9wZXJpb2RpY2FsPjxhbHQtcGVyaW9kaWNhbD48ZnVsbC10aXRsZT5H

aWdhc2NpZW5jZTwvZnVsbC10aXRsZT48YWJici0xPkdpZ2FTY2llbmNlPC9hYmJyLTE+PC9hbHQt

cGVyaW9kaWNhbD48dm9sdW1lPjc8L3ZvbHVtZT48bnVtYmVyPjU8L251bWJlcj48a2V5d29yZHM+

PGtleXdvcmQ+QmFjdGVyaWEvKmdlbmV0aWNzPC9rZXl3b3JkPjxrZXl3b3JkPkZ1bmdpLypnZW5l

dGljczwva2V5d29yZD48a2V5d29yZD5HZW5vbWljcy8qbWV0aG9kczwva2V5d29yZD48a2V5d29y

ZD5Qcm9rYXJ5b3RpYyBDZWxscy8qbWV0YWJvbGlzbTwva2V5d29yZD48a2V5d29yZD5SZXByb2R1

Y2liaWxpdHkgb2YgUmVzdWx0czwva2V5d29yZD48a2V5d29yZD5TZXF1ZW5jZSBBbmFseXNpcywg

RE5BLyptZXRob2RzPC9rZXl3b3JkPjwva2V5d29yZHM+PGRhdGVzPjx5ZWFyPjIwMTg8L3llYXI+

PHB1Yi1kYXRlcz48ZGF0ZT5NYXkgMTwvZGF0ZT48L3B1Yi1kYXRlcz48L2RhdGVzPjxpc2JuPjIw

NDctMjE3WCAoRWxlY3Ryb25pYykmI3hEOzIwNDctMjE3WCAoTGlua2luZyk8L2lzYm4+PGFjY2Vz

c2lvbi1udW0+Mjk3MTgyMDI8L2FjY2Vzc2lvbi1udW0+PHVybHM+PHJlbGF0ZWQtdXJscz48dXJs

Pmh0dHA6Ly93d3cubmNiaS5ubG0ubmloLmdvdi9wdWJtZWQvMjk3MTgyMDI8L3VybD48dXJsPmh0

dHBzOi8vd3d3Lm5jYmkubmxtLm5paC5nb3YvcG1jL2FydGljbGVzL1BNQzU5NDExMzYvcGRmL2dp

eTAyNi5wZGY8L3VybD48L3JlbGF0ZWQtdXJscz48L3VybHM+PGN1c3RvbTI+NTk0MTEzNjwvY3Vz

dG9tMj48ZWxlY3Ryb25pYy1yZXNvdXJjZS1udW0+MTAuMTA5My9naWdhc2NpZW5jZS9naXkwMjY8

L2VsZWN0cm9uaWMtcmVzb3VyY2UtbnVtPjxsYW5ndWFnZT4zNTwvbGFuZ3VhZ2U+PC9yZWNvcmQ+

PC9DaXRlPjwvRW5kTm90ZT5=

[53] have NSD publication rules, e.g., NSD will be released to INSDC within two years or upon publication, whichever occurs first. Importantly, these “workbench” databases widely offer direct INSDC submissions.

Accession Numbers (ANs)

As mentioned briefly above (Section 3.1), the submission of NSD into INSDC generates an Accession Number (AN). The AN serves two purposes: 1) it enables the chain of traceability and 2) it demonstrates to scientific journal editors that free (to the users), unrestricted access (often known as “open access”) has been granted for the NSD. This open access availability of NSD is a standard prerequisite for publication by scientific journals as well as, in many cases, a reporting requirement of the funding agency that funded the initial research. Indeed, for the overwhelming majority of journals that the authors of this study and surveyed colleagues are familiar with, the submission of NSD into an INSDC database is required in the journal’s data policy [54-56]. In other words, without an AN, a scientist will not be able to publish their NSD-based scientific results. Of course, errors and oversights happen [57], but there is strong pressure from journals, funders, peers, and society to release NSD and other scientific data to the scientific community.

Accession Numbers for NSD typically start with one to six capital letters followed by five to nine digits and are easily recognized in the community when listed in publications (see also Section 3). Updates or versions of a sequence are marked by a numeric suffix: the first version is “identifier.1”, the next is “identifier.2”, and so on. One known disadvantage of ANs in publications is that they are not resolvable or clickable for machines or humans; the AN needs to be copied and pasted into the INSDC manually to retrieve the NSD entry.
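Because ANs follow a regular format, they can be recognized in running text by simple pattern matching. The following Python sketch illustrates the idea only and is not the routine used by INSDC. It assumes the simplified pattern described above (one to six capital letters, five to nine digits, an optional “.version” suffix); the function names and the sample identifiers are hypothetical placeholders.

```python
import re
from typing import List, Optional, Tuple

# Simplified pattern taken from the description in the text (an assumption,
# not the full INSDC specification): 1-6 capital letters, 5-9 digits,
# and an optional ".N" version suffix.
AN_PATTERN = re.compile(r"\b[A-Z]{1,6}\d{5,9}(?:\.\d+)?\b")


def find_accession_numbers(text: str) -> List[str]:
    """Return all AN-like strings in the order they appear in the text."""
    return AN_PATTERN.findall(text)


def split_version(an: str) -> Tuple[str, Optional[int]]:
    """Split 'identifier.2' into ('identifier', 2); version is None if absent."""
    base, dot, version = an.partition(".")
    return base, int(version) if dot else None


if __name__ == "__main__":
    # Placeholder identifiers chosen for illustration only.
    sample = "Sequences were deposited under accession numbers AB123456.2 and X12345."
    for an in find_accession_numbers(sample):
        print(an, split_version(an))
```

A pattern this loose will also match unrelated strings of the same shape (for example, some catalogue or grant numbers), so any real pipeline would need to confirm each candidate against the database.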
This manual step is an inconvenience and inefficiency compared with digital object identifiers (DOIs) and HTTP uniform resource identifiers (URIs) used for articles, scientific publications, and GR (see the section below, “Traceability after INSDC submission to publications”). However, the INSDC has created automated routines that detect published ANs via text and data mining, set those NSD records to “public”, and create a link to the PubMed ID if applicable [21].

ANs for metadata

In certain cases, an additional AN is generated for the metadata (e.g., author, institute, sequencing method) associated with the NSD. At the beginning of the INSDC NSD submission process, the submitter is guided through a series of questions to determine what information will be required for the submission. For example, for sequences that come directly out of a natural environment (non-model organism, non-human, non-synthetic), scientists must submit the metadata through the BioSample portal on GenBank (and equivalent portals on EBI or DDBJ). This yields an AN for the metadata in addition to an AN for the NSD. It also enables scientists to fill out one “form” for an entire project and apply this metadata to hundreds or even thousands of sequences.

Within the BioSample metadata structure (as well as other formats) there are additional linkages that can be made to the NSD.
For example, links to the original biological objects from museums, culture collections, germplasm collections, biological material of all kinds, cell lines and strains can be noted in the metadata and are manually checked by GenBank staff for conformity. This enables, where appropriate, direct linkages between GR and the ensuing NSD. Furthermore, the BioSample [58] interface requires the submitter to fill in the country of origin and/or GPS coordinates, establishing a link back to the country of origin of the GR.

Traceability of GR from public collections

This traceable connection to GR is technically enabled by three specific metadata tags: bio_material, culture_collection and specimen_voucher. Approximately 14.2 million NSD entries (6%) in the INSDC have a connection to publicly available GR, i.e., GR available from a culture collection, museum, botanical garden, etc. (Figure 7).
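To illustrate how such a collection tag can be read by a machine, the Python sketch below splits a tag value into its parts. It assumes a colon-separated layout of the form institution-code:collection-code:identifier, in which the leading codes may be missing; this is our reading of the INSDC feature table definition and should be treated as an assumption rather than a complete grammar. The type and function names are hypothetical, and the example values are illustrative.

```python
from typing import NamedTuple, Optional


class CollectionRef(NamedTuple):
    """Parts of a bio_material / culture_collection / specimen_voucher value."""
    institution: Optional[str]
    collection: Optional[str]
    identifier: str


def parse_collection_tag(value: str) -> CollectionRef:
    """Split 'INST:COLL:ID', 'INST:ID' or a bare 'ID' into its components.

    Assumed layout (not verified against every INSDC rule): up to two
    colon-separated codes followed by the specimen or culture identifier.
    """
    parts = [p.strip() for p in value.split(":", 2)]
    if len(parts) == 3:
        return CollectionRef(parts[0], parts[1], parts[2])
    if len(parts) == 2:
        return CollectionRef(parts[0], None, parts[1])
    return CollectionRef(None, None, parts[0])


def has_institution_code(ref: CollectionRef) -> bool:
    """True if the tag names a holding institution and could be linked to it."""
    return ref.institution is not None


if __name__ == "__main__":
    for raw in ("UAM:Mamm:52179", "AMCC:101706", "99-SRNP-2003"):
        ref = parse_collection_tag(raw)
        print(raw, "->", ref, "| institution given:", has_institution_code(ref))
```

Only values that carry an institution code can be matched to a collection-holding institution automatically; a bare identifier remains a free-text note.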
None of these tags is mandatory, but INSDC provides best practice guidance on how to use a standardized syntax [59]. However, submitters are still not accustomed to citing GR properly, and many GR are not yet deposited in publicly available collections. The figure of 6% appears rather low and probably reflects under-reporting on the part of the scientists. However, in our experience, the vast majority of NSD is indeed generated from privately held GR and would therefore not use these metadata tags.

GenBank has created the BioCollections database [60] to store general information on biological collections. This database contains collection acronyms that should be added whenever collection tags for NSD are used, and it creates links to the related BioCollection entry. Out of 14.2 million NSD entries with filled collection tag(s), only 3.7 million (26% of tagged NSD, 1.7% of all NSD) have a standardized connection to a collection-holding institution. This low number is mainly caused by submissions from untrained researchers, who often work at universities rather than in collections and are therefore not familiar with specimen identifiers. Another reason, again, is that many researchers do not deposit their material in collections.

Traceability of GR from the environment

If GR does not come from a public collection, it often comes directly from the environment.
Environmental samples can be classified in two groups: 1) abiotic environmental samples such as water, soil or ice samples; and 2) biotic environmental samples such as plant or animal tissue, wood or fecal samples. Both sample groups can contain the genetic sequences of many different organisms (nothing in the wild is sterile), and the heterogeneity and amount of NSD is generally much higher than in other samples. Typically, researchers will want to focus on certain organism groups from the sample, e.g., viruses, bacteria or fungi.

The three INSDC databases have created the BioSamples database to better structure and document metadata of environmental samples or cell lines [61]. Here, thousands of NSD entries can be associated with one single BioSample, and submissions of any NSD must start with the registration of a BioProject to describe the study. BioSamples and NSD must be linked to these BioProjects. The registration of BioSamples is mandatory for environmental samples, but not for single organism samples (although it is recommended by GenBank). Metadata associated with a BioSample must be provided using standardized tags such as sample name, collection date, depth, environment or medium. Customized tags are also possible upon request.
The Genomic Standards Consortium (GSC) [62] has created a suite of standards to describe any kind of environmental sample. These standards have been supported by INSDC for over a decade and are today used in several hundred thousand BioSample records. In total, 9.95 million BioSample records are available in the database.

Traceability after INSDC submission to publications

Once the NSD has been submitted to the INSDC, the traceability system via the AN begins. The resulting scientific publication will receive a digital object identifier (DOI), and the ANs will be listed within the publication. The majority of publishers use DOIs [63] to trace publications once they are published. DOIs are stable links to online content that are issued by a DOI registration agency and do not break when content moves around the internet.
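The stability of a DOI comes from resolving it through a central resolver rather than linking to the publisher's page directly. The short Python sketch below shows how a DOI string can be turned into a resolvable link. It assumes the public resolver at https://doi.org/; the function name and the example DOI are hypothetical placeholders, and the validity check is deliberately minimal.

```python
from urllib.parse import quote

# Public DOI resolver; a DOI appended to this prefix redirects to wherever
# the content currently lives.
DOI_RESOLVER = "https://doi.org/"


def doi_to_url(doi: str) -> str:
    """Build a resolvable link for a DOI such as '10.1000/example-doi'.

    Only a minimal plausibility check is applied (prefix '10.' and a '/').
    """
    doi = doi.strip()
    if not doi.startswith("10.") or "/" not in doi:
        raise ValueError(f"not a DOI: {doi!r}")
    # Percent-encode characters that are not safe in a URL path.
    return DOI_RESOLVER + quote(doi, safe="/")


if __name__ == "__main__":
    # Placeholder DOI used for illustration only.
    print(doi_to_url("10.1000/example-doi"))
```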
Recent estimates indicate that around 90% of publications in the natural sciences have a DOI [64]. DOIs can also be linked to additional unique identifiers such as the PubMed ID (PMID) used by NCBI, which links publications in PubMed (a literature search tool) to DOIs and ANs.

The AN is often listed as text in the publication. The publishing scientist then reports the publication back to the INSDC, which updates the NSD entry with the DOI from the publication. If the scientist forgets to report, INSDC employs some automated methods to scan new open access publications for ANs (since they have a standard format) and to link publications (via the DOI) to the respective NSD entries. This information is then pulled out of the INSDC and into biological databases (Section 3.2 and Figure 6), where links to both the original NSD and the original publication can be found.

Traceability to other databases and data layers

Once published in INSDC, the ANs and DOIs/PMIDs are jointly used by hundreds of other databases, and potentially thousands of downstream publications (through citations), to exchange NSD and to generate new layers of SI (e.g., protein sequences or gene expression studies) that add scientific understanding, context and meaning to the original NSD. This knowledge generation and addition of scientific value occurs in the area titled “public sphere” in Figure 3.
The green boxes for both private and public research access the public sphere during the research phase and contribute back to it at the conclusion of a research project with new NSD and possibly new SI.

The PubMed database, established in 1996 by NCBI, is another database collaborating with the INSDC to provide metadata about publications (e.g., authors’ names, abstracts) and full texts (if copyright permits) in the life sciences and biomedical area [65]. For each record, a PubMed ID (a unique integer value starting at 1) is created in addition to any existing DOI created by the publisher. These PubMed IDs are used to set up a connection between NSD and publications.
Today, 2,751 life science journals are deposited in PubMed and 5,246 in MEDLINE, which is the biomedical section of this database [66]. Thirty-nine percent (39%) of all non-human NSD records are associated with a PubMed ID, so that traceability from NSD to publications is achieved for these records. The vast majority of the remaining NSD records are also published in articles, but are not connected to the PubMed database. The article information is available, but often there is no dynamic linkage between NSD and DOI. Traceability is then possible only through manual steps, and could be improved if INSDC supported DOIs in addition to PubMed IDs.

Private sphere

Sections 3.6 and 4.3 discuss the flow of NSD between private databases, private research, patent NSD disclosure and submission to the INSDC. In short, private research (upper green box, Figure 6) also generates NSD from GR and submits it to in-house databases that are not publicly accessible. Private research also downloads NSD from the INSDC and compares this data with its in-house NSD. If NSD becomes relevant to a patent application during the course of R&D, the NSD must be disclosed and submitted to the patent office as well as, in some cases, to the INSDC (see Section 4.3).
Here again, ANs from INSDC are in widespread use, although the internal generation of private NSD does not generate an AN, since only NSD that has passed through the INSDC receives an AN.

Traceability to GR accessed under the Nagoya Protocol

Another aspect related to GR is whether prior informed consent (PIC) and/or mutually agreed terms (MAT) documentation that could be associated with GR is available in the associated NSD entry in the INSDC. We could find no evidence of this. There are probably two reasons: 1) many Parties to the CBD generate paper/PDF files when issuing PIC and MAT, and these files are not technically linkable to INSDC entries; and 2) there is no dedicated PIC/MAT metadata field in the INSDC submission form (although free text fields are available).

However, it is possible to attach a stable, traceable link to an NSD submission. A Unique Identifier is generated by the ABS Clearinghouse when an internationally recognized certificate of compliance (IRCC) is published. An IRCC is a special (globally available) form of PIC/MAT. In other words, if a user submitted NSD to the INSDC and provided the Unique Identifier and link from their IRCC published on the ABS Clearinghouse, the traceability link could easily be established. Although there are ongoing discussions within the INSDC on creating an IRCC metadata field, there is not yet a specific field for this. Instead, in the current schema, a user could hypothetically add the Unique Identifier and the link using a free-text metadata field. Alternatively, this information can be provided together with the metadata about the underlying GR deposited in collections by using established biodiversity data standards


YWxpZm9ybmlhIGF0IEJlcmtlbGV5LCBCZXJrZWxleSwgQ0EgOTQ3MjAsIFVTQS4mI3hEO0dsb2Jh

bCBCaW9kaXZlcnNpdHkgSW5mb3JtYXRpb24gRmFjaWxpdHkgU2VjcmV0YXJpYXQsIFVuaXZlcnNp

dGV0c3BhcmtlbiAxNSwgQ29wZW5oYWdlbiBESy0yMTAwLCBEZW5tYXJrLiYjeEQ7QXVzdHJhbGlh

biBNdXNldW0sIFN5ZG5leSAyMDEwLCBOU1csIEF1c3RyYWxpYS4mI3hEO1N5c3RlbWF0aWMgQm90

YW55LCBKdXN0dXMgTGllYmlnIFVuaXZlcnNpdHksIEdpZXNzZW4gMzUzOTIsIEdlcm1hbnkuJiN4

RDtEZXBhcnRtZW50IG9mIExpZmUgU2NpZW5jZXMgJmFtcDsgQ2hlbWlzdHJ5LCBKYWNvYnMgVW5p

dmVyc2l0eSBCcmVtZW4gZ0dtYkgsIENhbXB1cyBSaW5nIDEsIEJyZW1lbiAyODc1OSwgR2VybWFu

eS4mI3hEO01pY3JvYmlhbCBHZW5vbWljcyBhbmQgQmlvaW5mb3JtYXRpY3MgUmVzZWFyY2ggR3Jv

dXAsIE1heCBQbGFuY2sgSW5zdGl0dXRlIGZvciBNYXJpbmUgTWljcm9iaW9sb2d5LCBDZWxzaXVz

c3RyYXNzZSAxLCBCcmVtZW4gMjgzNTksIEdlcm1hbnkuJiN4RDtBUkMtTmV0IEFwcGxpZWQgUmVz

ZWFyY2ggb24gQ2FuY2VyIENlbnRyZSwgRGVwYXJ0bWVudCBvZiBQYXRob2xvZ3kgYW5kIERpYWdu

b3N0aWNzLCBVbml2ZXJzaXR5IG9mIFZlcm9uYSwgVmVyb25hIDM3MTM0LCBJdGFseS4mI3hEO05h

dHVyYWwgSGlzdG9yeSBNdXNldW0sIENyb213ZWxsIFJvYWQsIExvbmRvbiBTVzcgNUJELCBVSy4m

I3hEO0RlcGFydG1lbnQgb2YgTWVkaWNhbCBJbmZvcm1hdGljcyBhbmQgVU1HIEJpb2JhbmssIFVu

aXZlcnNpdHkgTWVkaWNhbCBDZW50ZXIgR290dGluZ2VuLCBSb2JlcnQtS29jaC1TdHIuIDQwLCBH

b3R0aW5nZW4gMzcwNzUsIEdlcm1hbnkuJiN4RDtNdXNldW0gb2YgVmVydGVicmF0ZSBab29sb2d5

LCBVbml2ZXJzaXR5IG9mIENhbGlmb3JuaWEgYXQgQmVya2VsZXksIEJlcmtlbGV5LCBDQSA5NDcy

MCwgVVNBLiYjeEQ7SnVsaXVzIEt1ZWhuLUluc3RpdHV0ZSAoSktJKSwgRmVkZXJhbCBSZXNlYXJj

aCBDZW50cmUgZm9yIEN1bHRpdmF0ZWQgUGxhbnRzLCBJbnN0aXR1dGUgZm9yIFJlc2lzdGFuY2Ug

UmVzZWFyY2ggYW5kIFN0cmVzcyBUb2xlcmFuY2UsIEVyd2luLUJhdXItU3RyLiAyNywgUXVlZGxp

bmJ1cmcgMDY0ODQsIEdlcm1hbnkuJiN4RDtDaGluYSBOYXRpb25hbCBHZW5lQmFuaywgQkdJLVNo

ZW56aGVuLCBTaGVuemhlbiwgR3Vhbmdkb25nIDUxODA4MywgQ2hpbmEuPC9hdXRoLWFkZHJlc3M+

PHRpdGxlcz48dGl0bGU+VGhlIEdsb2JhbCBHZW5vbWUgQmlvZGl2ZXJzaXR5IE5ldHdvcmsgKEdH

Qk4pIERhdGEgU3RhbmRhcmQgc3BlY2lmaWNhdGlvbjwvdGl0bGU+PHNlY29uZGFyeS10aXRsZT5E

YXRhYmFzZSAoT3hmb3JkKTwvc2Vjb25kYXJ5LXRpdGxlPjxhbHQtdGl0bGU+RGF0YWJhc2UgOiB0

aGUgam91cm5hbCBvZiBiaW9sb2dpY2FsIGRhdGFiYXNlcyBhbmQgY3VyYXRpb248L2FsdC10aXRs

ZT48L3RpdGxlcz48cGVyaW9kaWNhbD48ZnVsbC10aXRsZT5EYXRhYmFzZSAoT3hmb3JkKTwvZnVs

bC10aXRsZT48YWJici0xPkRhdGFiYXNlIDogdGhlIGpvdXJuYWwgb2YgYmlvbG9naWNhbCBkYXRh

YmFzZXMgYW5kIGN1cmF0aW9uPC9hYmJyLTE+PC9wZXJpb2RpY2FsPjxhbHQtcGVyaW9kaWNhbD48

ZnVsbC10aXRsZT5EYXRhYmFzZSAoT3hmb3JkKTwvZnVsbC10aXRsZT48YWJici0xPkRhdGFiYXNl

IDogdGhlIGpvdXJuYWwgb2YgYmlvbG9naWNhbCBkYXRhYmFzZXMgYW5kIGN1cmF0aW9uPC9hYmJy

LTE+PC9hbHQtcGVyaW9kaWNhbD48dm9sdW1lPjIwMTY8L3ZvbHVtZT48a2V5d29yZHM+PGtleXdv

cmQ+KkJpb2RpdmVyc2l0eTwva2V5d29yZD48a2V5d29yZD4qRGF0YWJhc2VzLCBOdWNsZWljIEFj

aWQ8L2tleXdvcmQ+PGtleXdvcmQ+Kkdlbm9tZTwva2V5d29yZD48L2tleXdvcmRzPjxkYXRlcz48

eWVhcj4yMDE2PC95ZWFyPjwvZGF0ZXM+PGlzYm4+MTc1OC0wNDYzIChFbGVjdHJvbmljKSYjeEQ7

MTc1OC0wNDYzIChMaW5raW5nKTwvaXNibj48YWNjZXNzaW9uLW51bT4yNzY5NDIwNjwvYWNjZXNz

aW9uLW51bT48dXJscz48cmVsYXRlZC11cmxzPjx1cmw+aHR0cDovL3d3dy5uY2JpLm5sbS5uaWgu

Z292L3B1Ym1lZC8yNzY5NDIwNjwvdXJsPjx1cmw+aHR0cHM6Ly93d3cubmNiaS5ubG0ubmloLmdv

di9wbWMvYXJ0aWNsZXMvUE1DNTA0NTg1OS9wZGYvYmF3MTI1LnBkZjwvdXJsPjwvcmVsYXRlZC11

cmxzPjwvdXJscz48Y3VzdG9tMj41MDQ1ODU5PC9jdXN0b20yPjxlbGVjdHJvbmljLXJlc291cmNl

LW51bT4xMC4xMDkzL2RhdGFiYXNlL2JhdzEyNTwvZWxlY3Ryb25pYy1yZXNvdXJjZS1udW0+PGxh

bmd1YWdlPjcxPC9sYW5ndWFnZT48L3JlY29yZD48L0NpdGU+PC9FbmROb3RlPn==

[67].

The temporal scope of GR utilization is also important in the context of the Nagoya Protocol and national ABS laws. The date of sampling/primary access, the date of sequencing/utilization, and other temporal information are not always available in an NSD entry. If the INSDC were to enable new metadata fields to transmit this information, it would greatly help users to understand their legal situation.

Evolving technologies in biodiversity traceability

Traceability of digital information is crucial not only for biodiversity and molecular data, but for all kinds of data. Other technologies that enable information traceability include the hypertext transfer protocol (HTTP), uniform resource identifiers (URIs), uniform resource locators (URLs), and globally unique identifiers (GUIDs). The biodiversity informatics community that deals with primary biodiversity data has been working on global infrastructures for more than three decades. Most importantly, the Global Biodiversity Information Facility (GBIF) [68] and the Biodiversity Information Standards initiative (for historical reasons called TDWG) [69] are together driving forward the creation of globally unique identifiers for, and the standardization of, biological data. Inspired by the International Plant Exchange Network (IPEN), several initiatives are currently trying to establish a traceability system that works across all natural history collections (e.g., SYNTHESYS+ [70], CETAF [71], GGBN [72]). It is based on a shared code of conduct, which aims to enable all signatories to be treated as one legal entity. Collaborations between stakeholders such as INSDC, GGBN and GBIF have been established to work on best practices with respect to traceability of NSD and the underlying biological material.

The GBIF infrastructure and data portal provides standardized and open access to more than 1.3 billion biodiversity occurrence records, that is, records of where biological species have been located or observed around the globe. Those records are mainly based on observations, biological collection objects (both fossils and genetic resources) and the metadata retrieved from NSD entries from the INSDC and “workbench databases” like BOLD. The goal is to establish stable identifiers for every occurrence record, including the genetic resources housed in natural history and culture collections. GBIF creates DOIs for every data set from which occurrence records were obtained. In addition, natural history collections are working on mechanisms to create stable identifiers for their collection objects, too, by using HTTP URLs
[73]. Both humans and machines can use those identifiers to retrieve the metadata about a given biological collection object.

Today, more and more institutions worldwide are working on implementing stable identifiers for the data on their collection objects. Using those identifiers in publications, or as metadata accompanying NSD, could fulfil an important part of the traceability of GR. Many publishers support the use of such identifiers today [74]. Additionally, the establishment of institutional DNA and tissue banks over the last decades has enabled collection holders to better track the use of their GR by researchers worldwide.

Today, 160 million of the occurrence records provided to GBIF are based on specimen data. Of those, only 1.6 million records (1%) have information on associated sequences (i.e., have an AN) [75], although metagenomics has led to considerable growth in this area in recent months. This number is generally higher for microbial culture collections, where it is estimated at 10% and growing
[76, 77]. One reason for this low number is that only a limited portion of the world’s known biodiversity has been subjected to molecular analyses. In many cases, specimens housed in natural history collections are also unsuitable for molecular research because of past preservation techniques and/or their age. Furthermore, not all researchers report ANs back to the collection-holding institutions. Many institutions have integrated this requirement into their Material Transfer Agreements to overcome this problem.

Conclusions on existing NSD traceability mechanisms

To provide an overview of the elements of traceability discussed above, the Venn diagram (Figure 7) shows the overlap and relative amounts of the different aspects of traceability: 53% of NSD entries have at least one of these traceability elements, with 39% having a PubMed ID, 16% having a country tag (see Section 4.2), and 6% having a link to publicly available GR.

Figure 7. How do NSD traceability elements overlap? These are the relative amounts of sequences with a country tag, a PubMed ID and a reference to the original GR, with the respective overlap between each. 1,834,859 entries have all three traceability elements, 13,753,437 entries have two elements, and 107,961,046 entries (53% of total) have a single traceability element.

The above Section (4.1), although complex for readers new to the field, is actually only a very basic overview of a large, complex data infrastructure. Standardizing, harmonizing, and enabling usability and traceability of complex datasets for millions of users is a challenging, iterative lesson in patience. These critical technical realities should not be overlooked during the policy process.

The existing traceability of NSD depends on submitter diligence. Even though INSDC has established required data fields for sequence submissions, the sheer volume of NSD entries makes human error and inaccuracy a statistical reality.
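
As a cross-check on the counts given in the Figure 7 caption, the following Python sketch sums the three reported counts to obtain the number of entries with at least one traceability element, and also counts element memberships (an entry with two elements counts twice, one with three elements counts three times). The total number of NSD entries is not stated in this section, so `implied_total` is an assumption: it treats the reported 53% as the share of all entries that have at least one element.

```python
# Counts reported in the Figure 7 caption.
ALL_THREE = 1_834_859
EXACTLY_TWO = 13_753_437
EXACTLY_ONE = 107_961_046


def at_least_one(one: int, two: int, three: int) -> int:
    """Entries carrying one or more traceability elements."""
    return one + two + three


def tag_memberships(one: int, two: int, three: int) -> int:
    """Total element assignments: each entry counts once per element it carries."""
    return one + 2 * two + 3 * three


union = at_least_one(EXACTLY_ONE, EXACTLY_TWO, ALL_THREE)
memberships = tag_memberships(EXACTLY_ONE, EXACTLY_TWO, ALL_THREE)

# Assumption (not stated in the text): 53% is the share of all NSD entries
# that have at least one traceability element.
implied_total = union / 0.53

print(f"entries with >= 1 element: {union:,}")
print(f"element memberships:       {memberships:,}")
print(f"implied total entries:     {implied_total:,.0f}")
print(f"memberships / total:       {memberships / implied_total:.1%}")
```

Under that assumption, the final ratio comes out at roughly 60%, which is in line with the sum of the individually reported shares (39% + 16% + 6% = 61%). Read this way, the 53% would describe entries with at least one element; the caption's wording leaves this open.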
Additionally, database fields and required information have evolved over time. Thus, older database entries did not have full access to the traceability links that are now possible.

4.2 Traceability to country of origin of underlying GR

Since 1998, a metadata field displayed as “/country” in the database submission form has existed for NSD submissions to INSDC, enabling submitters to indicate the country of origin. Its definition reads as follows:

“locality of isolation of the sequenced organism indicated in terms of political names for nations, oceans or seas, followed by regions and localities” [78] (emphasis added)

“Isolation” in the above definition is intended to mean the country where the scientists physically removed the biological specimen; it does not mean where the sequencing or any cultivation (for microbial specimens) took place. The country tag is filled in by the person submitting the NSD and is not verified, although a list of standardized country names is provided. This is because, practically speaking, it is impossible to check whether the country of origin of the GR is “correct”, as geographic ranges of organisms are not static. For example, many microorganisms and some animals (e.g., migratory birds) and plants are cosmopolitan (i.e., found everywhere), and thus there are many potential locations where they could be found.
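
To illustrate how a submitter-supplied country tag might be handled downstream, the following Python sketch splits a value into a country part and an optional locality part and checks the country against a controlled list. The `Country: region, locality` layout, the function and type names, and the short list of names are assumptions made for this example; the actual standardized list is much longer, and a match says nothing about whether the stated origin is correct.

```python
from typing import NamedTuple, Optional

# Hypothetical subset of a standardized country-name list (illustration only).
KNOWN_COUNTRIES = {"China", "USA", "Canada", "Japan"}


class CountryTag(NamedTuple):
    country: str
    locality: Optional[str]
    in_list: bool  # True if the country part matches the controlled list


def parse_country_tag(value: str) -> CountryTag:
    """Split 'Country: region, locality' into its parts.

    The check is purely lexical: it cannot confirm where the specimen
    was actually collected.
    """
    country, _, rest = value.partition(":")
    country = country.strip()
    locality = rest.strip() or None
    return CountryTag(country, locality, country in KNOWN_COUNTRIES)


if __name__ == "__main__":
    for raw in ["Canada: British Columbia, Vancouver", "Japan", "Atlantis"]:
        print(parse_country_tag(raw))
```

A check of this kind can flag misspelled or non-standard names, but it cannot detect a wrong or deliberately misstated country.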
Put another way, non-human life does not recognize national borders or international law. Hence, GenBank cannot confirm or deny country information associated with NSD. Furthermore, it is possible to enter the wrong country of origin, by error or intentionally, or to omit the information if samples came from multiple locations.

Where does GR that yielded the NSD in the INSDC originally come from? Figure 8a displays the geographical origin of non-human NSD with a country tag in GenBank. In terms of number of sequences, China is the leader in NSD origination (18%), followed closely by the USA (17%). The first four countries (China, USA, Canada and Japan) provide over 50% of publicly available NSD with a country tag. Assuming this is representative (see the discussion below, in which we report that, although only 16% of NSD entries have a country tag, missing country data appear to follow a distribution similar to that of publicly available NSD with a country tag), it suggests that the vast majority of publicly available NSD does not come from so-called “net provider countries” of GR as understood in the CBD context. This could suggest that the so-called “net user countries”, within their scientific research, more typically sample and use their own national GR rather than going abroad. This is a logical outcome of the higher expense of international sampling campaigns, the interests of funding agencies that are accountable to domestic taxpayers, the greater availability of sequencing technology, the less restrictive ABS laws, as well as the wealth of biodiversity these countries have. Biology also plays a role. For example, microorganisms (bacteria, archaea, viruses) do not follow the same (if any) patterns of megadiversity that fueled CBD discussions and so, understandably, the patterns of their sourcing will not reflect the provider/user country dichotomy
[79].

Sixteen percent of all NSD entries have a country of origin tag. However, not all categories of NSD have an applicable country of origin (e.g., human NSD and NSD from model organisms, domesticated plant crops, hybrid lines or cell lines). The requirement to submit a country of origin became mandatory for environmental samples in 2011, so earlier entries mostly do not have a country tag. Also, 20% of all entries constitute redundant entries on NSD appearing in patents; almost none of these have a filled-in country tag, but the original entries might have one and/or come from humans or model organisms (see also Sections 3.6 and 4.3). The total percentage should be understood within these constraints.

Figure 8a. What is the country of origin for non-human NSD? This world map shows the number of non-human GenBank entries with a country tag per country on a logarithmic scale. The chart on the left shows the ten biggest providers of non-human GenBank entries and their percentage of the total sum of entries with a country tag.

Figure 8b. How do INSDC users compare with provided sequences? Here the ratio between GenBank users (Figure 5a) and NSD production (Figure 8a) is shown. The table on the left lists the ten countries with the highest ratio. For example, the ratio of Ukraine means that there are 4.91 users of GenBank from Ukraine for every GenBank entry that lists Ukraine as country of origin.

Figure 8c. How does INSDC usage compare to provided sequences? Here the ratio between requests to GenBank (Figure 5b) and NSD production (Figure 8a) is shown, and the table on the left lists the ten countries with the highest ratio.
For example, the ratio of Lebanon means that there are 27.68 server requests to GenBank from Lebanon for every GenBank entry that lists Lebanon as country of origin.

The data on NSD country of origin (Figure 8a) can be compared with the user data (Figure 5a) and the number of requests (Figure 5b) to gain insights into the ratio of utilization vs. contribution of NSD per country. The ratios displayed in Figures 8b and 8c were calculated by dividing user data (numerator) by country of origin data (denominator). This ratio of NSD use relative to NSD provisioning shows a more even distribution around the globe than observed in Figure 8a, suggesting NSD use and provisioning often go hand in hand. Furthermore, the patterns in Figures 8b and 8c do not follow the patterns of earlier figures (Figures 5a, 5b, 8a): the USA, China and most of Western Europe fall out of the top 10, and peaks appear in countries from Arabia, North Africa, and Eastern Europe. Whereas China and the USA were leaders both in terms of users and amount of NSD provisioned, they are both now in the “middle of the pack” (i.e., their use and provisioning of NSD are similar) at positions 97 and 71, respectively.

The top 10 countries can be understood as countries that are actively using the open access system of the INSDC but do not necessarily provide NSD at a very high rate. One interpretation of these graphs could be that some countries benefit from the open access model of the INSDC and use more NSD than their countries contribute. On the other end of the spectrum (i.e.,
the very bottom of the list in Figure 8c, data not shown), the sovereign states or regions that appear to contribute more NSD than they use include unique environments such as Antarctica, Greenland or Svalbard, with relatively few researchers/users on their territory.

Analysis on the use of the country tag

To check how accurate the country of origin information was, we examined a random set of 150 non-human NSD entries with a country tag. The country of origin tag could be positively verified for 86 samples, constituting 57% of all samples. For the other 43% of the samples, it was not possible to either verify or falsify the information because the publication was not available through our institutes’ journal subscriptions. No NSD entry with a false country of origin was identified, suggesting that if the country tag is filled out, it is usually correct.

Our next test was to see whether a sequence without a country tag might nevertheless have country of origin information obtainable through the associated publication. Therefore, 282 random NSD entries that had no country tag but did have a publication accessible with our institutes’ subscriptions were analysed. For 44% of samples, the country of origin could be obtained from information given in the publication even though this information was not submitted by the submitting scientist to the INSDC. The “missing country of origin” showed similar patterns of origin as in Figure 8a. Entries missing a country of origin do not come primarily from developing countries, i.e., no pattern of deception could be inferred from the missing information. For example, the USA was the most common “missing” country of origin. This is not surprising, since it is also among the largest providers of NSD entries (Figure 8a).
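The figures from the first sampling test can be restated as a short calculation. This is an illustrative sketch only: the upper bound on the error rate is not reported in the study, and it assumes that the verified entries behave like a simple random sample of tagged entries.

```python
# Figures reported for the first test.
sampled = 150   # random non-human entries with a country tag
verified = 86   # entries whose tag could be confirmed via the publication

share_verified = verified / sampled
not_checkable = sampled - verified  # publication not accessible

# No wrong tag was found among the verified entries. With zero errors in n
# checks, the largest error rate p still compatible with the data at the
# 95% level solves (1 - p) ** n = 0.05.
# ASSUMPTION: the verified entries are an independent random sample.
upper_bound_error_rate = 1 - 0.05 ** (1 / verified)

print(f"verified: {share_verified:.0%} of the sample")
print(f"not checkable: {not_checkable} entries")
print(f"95% upper bound on wrong tags: {upper_bound_error_rate:.1%}")
```

Read this way, finding no wrong tag among 86 verified entries is compatible with a true error rate of up to a few percent, which fits the wording "usually correct" rather than "always correct".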
Based on these sampling results, we hypothesize that the country tag is not intentionally left empty to camouflage the origin of NSD and avoid potential ABS obligations, but rather because of an oversight on the part of the submitter. INSDC members do enable NSD submitters to alter their NSD entries and metadata upon request. So, theoretically, this information could be added post hoc, although this would require a proactive request.

The country tag over time

The country metadata tag became available in 1998 in the middle of the HIV epidemic, and the GPS coordinates metadata field in 2005. With the introduction of the BioSample metadata schema in 2011, the country tag became mandatory for environmental samples, and a strong increase in samples with country tags can be seen in the following years (Figure 9).

Figure 9. How many sequences have a country tag? This graph shows the percentage (vertical axis) of total submitted NSD entries per year (horizontal axis) that had a filled-in country of origin tag. *The country tag became mandatory for environmental samples in 2011.

Figure 9 shows that a clear upward trend in the reporting of country of origin began in 2011, with 2018 data climbing above 40% of all NSD submissions during that year and the first quarter of 2019 already showing 50%. Considering that human and model organism sequences (they make up at least 24% of total entries each), as well as artificial sequences (1%), should predominantly not have a country tag filled in, this shows a very encouraging trend and growing awareness. The total share of NSD entries with a country tag is just 16%, but the graph clearly shows that newly submitted sequences report the country of origin at a much higher rate. Thus, the percentage of NSD with a country tag will steadily increase if this trend prevails.

Another geographical traceability option: GPS coordinates

NSD entries can also contain GPS coordinates of the location where the respective sample was taken.
Entries with GPS coordinates generally also have a country tag. However, a quick analysis showed that in 5% of the entries the country tag and the coordinates did not match, meaning that two different countries were indicated. These mismatches were again analysed via the information in their respective publications. This analysis showed that the 5% mismatch between country of origin and GPS was caused by errors in the GPS coordinates or technical issues; the country tag was always correct. Most samples with wrong coordinates were taken close to borders and had inexact GPS coordinates. Other samples had wrong (nonsensical) coordinates, probably due to human error (e.g., inversion of the coordinates that led to a tree sample with a location in the high seas). For roughly 25% of these samples, the problem was just a terminology mismatch: both GPS and country tag referred to the same location, but the exact wording was different, e.g. “Serbia” and “Republic of Serbia”, or territories with different names, e.g. “Israel”, “Westbank” and “Palestine”.

Since basically all entries with GPS coordinates also have a country tag, and that country tag proved to be highly accurate, the country tag would be the more reliable data source for geographical information. Nonetheless, the GPS coordinates, although sometimes wrong, are essential for the long-term accuracy of NSD entries. Whenever countries and borders change, the GPS coordinates are helpful to determine the new country tag
[21, 80].

Conclusions on the geographical origin of NSD

The three largest providers of NSD are China, USA and Canada. Together with Japan, they make up over 50% of all NSD with a country tag inside the INSDC (Figure 8a). The most common providers of NSD are not the “net-provider countries” of GR but the “net-user countries” mentioned above, suggesting that access to sequencing and a strong research infrastructure may be a more important factor for NSD provisioning than the amount of biodiversity in a country.

The country tag appears to be highly accurate, as no entry that was tested had a wrong country tag.

For 44% of NSD entries that have no country tag, the country of origin could be obtained from the publication. The main reasons for missing country tags appear to be the automated upload of large amounts of sequences with incomplete metadata and insufficient awareness among many scientists/NSD submitters. Both issues could be addressed, and the data show an increasing trend in the use of the country tag in NSD submissions (Figure 9).

4.3 Traceability to patents & beyond

INSDC databases also contain NSD that is part of patent applications. There are approximately 45 million patent NSD entries in GenBank, accounting for roughly 20% of all NSD entries. Importantly, NSD is not per se “patented” but is disclosed as required by patent law if knowledge of the NSD will enable a “practitioner with average skill in the art to practice the invention”. Whether additional information/metadata about the NSD, such as country of origin, is disclosed in the patent application or is subject to some form of ABS compliance evaluation depends on jurisdictional rules.

Patent NSD in the INSDC

GenBank receives patent NSD from the United States Patent and Trademark Office upon patent registration.
A similar process is in place between the European Patent Office (EPO) and EBI, as well as between the Japanese and South Korean patent offices and DDBJ. In other words, for at least these four government patent offices there is direct traceability between the patent application and the public availability of the NSD in the INSDC databases, and the AN is again the unique identifier that is used [81].

We found that only one patent NSD entry has a country of origin listed (USA, found under AN GN358820.1). The reason that country of origin information is not listed in patent NSD entries appears to be that this information is not transferred, where relevant, from the patent application into the INSDC (using the system described directly above). This lack of transfer is largely due to resource constraints and incompatible data formats. However, as mentioned above (Section 4.2), not all patent-associated NSD will have a country of origin (e.g., human, model organism or synthetic sequences). Additionally, country of origin patent disclosure requirements differ across jurisdictions.
However, it appears that the data connectivity or traceability of country of origin information is weak: country of origin information in the patent application is apparently not transferred along with the NSD into the INSDC. (This is speculative, based upon a brief analysis of these NSD entries and informal conversations with patent attorneys. A deeper analysis of these procedures within the World Intellectual Property Organization (WIPO) community would be beneficial.)

Furthermore, if a patent applicant files a patent using NSD from the public database, the EPO (and presumably other patent offices) accepts the original AN [82] (rather than requiring the generation of a new AN). Conversely, if public NSD was used in a patent, there is currently no requirement to cite the original AN, i.e., the patent applicant could either use the original AN or re-submit the NSD and generate a new AN.
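Whether a re-submitted sequence duplicates an existing entry is, in the exact-identity case, a simple comparison of sequence content. The sketch below illustrates this with a hash-based lookup. It is a deliberately simplified stand-in, not BLAST and not an INSDC tool, and it would miss near-identical or reverse-complemented sequences. The accession numbers and sequences are invented for illustration.

```python
import hashlib


def normalise(sequence: str) -> str:
    """Remove whitespace and upper-case the sequence so layout differences are ignored."""
    return "".join(sequence.split()).upper()


def fingerprint(sequence: str) -> str:
    """Return a SHA-256 digest of the normalised sequence."""
    return hashlib.sha256(normalise(sequence).encode("ascii")).hexdigest()


def build_index(entries: dict[str, str]) -> dict[str, list[str]]:
    """Map each sequence fingerprint to the accession numbers that carry it."""
    index: dict[str, list[str]] = {}
    for accession, sequence in entries.items():
        index.setdefault(fingerprint(sequence), []).append(accession)
    return index


def find_identical(index: dict[str, list[str]], query: str) -> list[str]:
    """Return accession numbers whose sequence is identical to the query."""
    return index.get(fingerprint(query), [])


# HYPOTHETICAL public entries (accession numbers and sequences are invented).
public_entries = {
    "EX000001.1": "ATGGCCATTGTAATGGGCCGC",
    "EX000002.1": "ATGAAACGCATTAGCACCACC",
}
index = build_index(public_entries)

# A patent sequence listing that repeats the first entry with a different layout.
patent_sequence = "atggccattg taatgggccg c"
matches = find_identical(index, patent_sequence)
print(matches)
```

A lookup of this kind only answers the yes/no question of exact identity; linking a patent entry to its original AN would still depend on the metadata practices described in this section.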
However, by using a BLAST [83] search it would be technically trivial to determine whether or not a patent sequence is identical to a previously existing sequence in the INSDC.

New NSD reporting change in WIPO will improve traceability

The World Intellectual Property Organization (WIPO), an umbrella organization governing the worldwide implementation of intellectual property, has initiated some important technological processes relevant to NSD over the last few years.
First, there is a migration of the standard format for reporting NSD from the ten-year-old Standard 25 [84] to the new Standard 26 [85]. The new standard is expected to be formally revised by the end of 2019 and phased into full global use by 2022 and mainly standardizes the reporting requirements for NSD.
However, the standard change is important because it will be coupled with the roll-out of a new software application, WIPO Sequence, which will enable applicants to directly submit NSD and simultaneously send them to any patent office/jurisdiction in the world. There, the NSD will subsequently be converted into whatever format local patent examiners require and, in most cases, will eventually land in the INSDC. The key to this technological development was to standardize the database structure (XML system) to ensure harmonization across multiple jurisdictions. These changes will make patent NSD searchable by innovators around the world, which will be a significant (and costly) achievement.

WIPO engaged the INSDC as a partner in these new developments and is building a system, based on direct consultation with the INSDC, that will directly integrate with the existing AN traceability system discussed above. This WIPO-based model of cooperation and engagement with the INSDC could be an important lesson for the CBD as it seeks to understand the scientific and technical structure behind the large NSD databases. Such a partnership would seem reasonable in terms of non-duplication, efficiency, and resource effectiveness.

Conclusions on patent traceability

NSD used around patents is sent to the INSDC by the patent offices of the USA, Europe, Japan and South Korea, and every NSD entry gets an AN enabling traceability.
Other patent offices could adopt the same requirement, which would make their patent NSD traceable.
In cases where NSD was used from the INSDC in a patent application, the patent submission could use the “old” AN or establish a link to it rather than generating a new AN.

Non-patent-based innovations

Although patents are one primary mechanism for protecting intellectual property, it is important to note that other “legal tools” exist to enable and protect innovation.
In particular, trade secrets or copyright protection could theoretically be employed to profit in some way from NSD. It is our understanding that traceability of NSD within these legal frameworks would likely be exceedingly challenging since there are no disclosure requirements.

4.4 When does traceability “break down”?

In the above sections, we have outlined how the traceability of NSD works in the scientific world and, in particular, how it works in the public databases. This system, established through decades of international cooperation within the INSDC, came from and is predominantly used by the scientific (private and public) community. However, there are challenges that should be considered. The existing traceability system is not a security or banking system meant to keep track of minute-by-minute transactions. The AN system is analogous to a bar code or a radio frequency ID (RFID) on a new consumer item. It is extremely useful for tracking data flow, linking different data types to each other, and for standardizing and enabling NSD usage in a practical, technical, and transparent sense.

The system was built to enable scientific integrity and transparency with a primary focus on publication and scientific exchange. The system works only as long as the users of the system conform with the AN system and (meta)data structures for reporting, exchanging, downloading, interfacing with other databases and publications, and re-using NSD. The established NSD traceability system is highly flexible and is in use in all known NSD database settings, both public and private.

However, for “bad actors” that deliberately wish to deceive or cheat, there are opportunities to do so.
A “cheater” could theoretically use the entire NSD dataset from the INSDC and at some point make a profitable discovery on an NSD entry from a CBD Party. Although aware of ABS obligations, the country of origin and conceivably even access permits, this individual could, in the process of patenting, lie about the AN associated with this piece of NSD or lie about the presence of ABS obligations. To employ a metaphor, if a shoplifter came into a store and stole a bag of chips and left the store, it would be possible to uniquely identify the “bag of chips” using the associated barcode (metaphorical AN). However, if the shoplifter threw away the bag containing the chips, the chips themselves would be difficult to distinguish from other chips. The AN (unique code) is essential for traceability.

This problem is not unique to access and benefit sharing. For example, malicious misuse of pathogen NSD is a major biosecurity concern (also known as dual use research of concern). Despite the promulgation of laws, policies, handbooks, etc., the truth remains that malicious actors could weaponize the knowledge that this open system has generated. Ultimately, society must weigh and balance these competing pressures and decide where and when an open system enables the greater societal good to prosper despite the risks.

5. Additional technological options for traceability

Beyond the already established traceability system of ANs and DOIs, there are other methods for data traceability and related applications. This section gives a broad overview of such methods and how they are already applied or could be applied to NSD. Additionally, this section will list issues that are especially relevant or challenging with regard to NSD traceability.

5.1 Tracking users of NSD

Tracking and tracing is commonly done on the Internet. The most used method is the tracking of IP (Internet Protocol) addresses.
This address is given to every device that is connected to the Internet and enables data flow towards this device. IP address tracking enables the identification of the location of a user. For example, it can be used to identify the country where the device is located. Many media services (e.g. YouTube and Netflix) and some governments do this to provide country-specific content or deny access to content. The user data shown in Section 3.5 was determined by GenBank via this method.

In principle, IP tracking is well established and can also be used for monitoring users of NSD. However, there are several limitations. As with other systems, once NSD is downloaded and analysed or manipulated locally, it leaves the system and cannot be followed anymore. IP tracking can identify the address of a user download, but it cannot follow usage that happens afterwards. An IP address identifies a device and its location but not the user and the user’s offline activities. As IP address tracking is used extensively around the world, there is a constant arms race of developing counter-measures and more sophisticated tracking methods. Aside from technical limitations, there will also be legal limitations in many jurisdictions (including the EU) that may have stricter personal data protection and privacy laws than others. For example, the usage data on a country level (Figures 5a-c) was obtainable from GenBank, but not from EMBL-EBI for exactly this reason. More detailed data on users (e.g. affiliation, which sequences were accessed), which may be desired for advanced tracking of NSD usage, may fall under this category.
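As a rough illustration of the country-level counting described in this section, the following Python sketch maps client IP addresses to a country label and tallies download requests. It is not how GenBank or any other database implements this: the prefix table, the country labels, the access log and the accession strings are all hypothetical, and the address ranges used are those reserved for documentation purposes.

```python
import ipaddress
from collections import Counter

# Hypothetical prefix-to-country table (assumption for illustration only).
# Real services rely on dedicated geolocation databases; the prefixes below
# are address ranges reserved for documentation and the labels are invented.
PREFIX_TO_COUNTRY = {
    ipaddress.ip_network("192.0.2.0/24"): "Country A",
    ipaddress.ip_network("198.51.100.0/24"): "Country B",
    ipaddress.ip_network("203.0.113.0/24"): "Country C",
}


def country_of(ip: str) -> str:
    """Return the country label for an IP address, or 'unknown'."""
    address = ipaddress.ip_address(ip)
    for network, country in PREFIX_TO_COUNTRY.items():
        if address in network:
            return country
    return "unknown"


def downloads_per_country(access_log):
    """Count download requests per country from (ip, accession) pairs."""
    return Counter(country_of(ip) for ip, _accession in access_log)


# Hypothetical access log: (client IP, accession number requested).
ACCESS_LOG = [
    ("192.0.2.10", "AN000001"),
    ("192.0.2.77", "AN000002"),
    ("198.51.100.5", "AN000001"),
    ("10.1.2.3", "AN000003"),  # private address: no country can be assigned
]

if __name__ == "__main__":
    for country, count in downloads_per_country(ACCESS_LOG).items():
        print(country, count)
```

The sketch also shows the limitation noted above: the log records which address requested which accession, but nothing about what happens to the NSD after the download.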
5.2 Blockchain

The concept of blockchain emerged in the 1990s [86]. Since its application to Bitcoin [87], blockchain technology has received significant attention and is seen as a disruptive technology able to diffuse into many areas of application. Bitcoins, as a real case in point, change hands without any trusted third party (e.g. a bank) and yet every transaction can be accurately traced and verified.
Given the level of interest in this particular technology, its unique features, technical complexity, and commercial examples in human health genomics, this section will offer a more extensive analysis than other sections.

Technical background

A blockchain is a public, decentralized transaction ledger shared by many network participants, so-called nodes. Each node contributes its own computational power to the system. Every node within a blockchain system can at all times access the blockchain, where all transaction records are stored, examine its content and conduct an audit (something an institution like a bank does, except that with blockchain you do not have to trust that institution and its employees). A blockchain can theoretically be created around any digital data where there is an interest in keeping a ledger on its possession, e.g., a bitcoin or a nucleotide sequence.

Aside from controlling and tracking data usage, blockchain can also be used to store the conditions of use for that data, an approach known as smart contracts. For example, the details of a material transfer agreement (MTA) on the underlying GR could be stored in the blockchain of a specific NSD. Everyone that accessed the NSD via the blockchain system would automatically be required to accept the conditions of the MTA. Like a contract, this does not prohibit transgressions, but it gives a stored record that the conditions of the MTA were known and accepted by the user.

The data is stored in a block, and information on each following transaction is then stored in a new block attached to the already existing chain of blocks. Whenever a new transaction happens, a cryptographic puzzle gets sent to all nodes in the system. Every node then starts to solve the cryptographic puzzle, for which it needs computational power. The puzzle is eventually solved by one of the nodes and that node sends the blockchain with the new block to the other nodes in the system.
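The block-and-puzzle mechanism described here can be made concrete with a minimal Python sketch. It is a toy model under stated assumptions, not the Bitcoin protocol: the puzzle is taken to be "find a number (nonce) such that the SHA-256 hash of the block starts with a fixed number of zeros", the difficulty value is arbitrary, there is only one node, and the transaction strings and function names are hypothetical.

```python
import hashlib
import json

DIFFICULTY = 3  # assumed puzzle difficulty: required leading zero hex digits
GENESIS_LINK = "0" * 64  # placeholder "previous hash" for the first block


def block_hash(block: dict) -> str:
    """SHA-256 hash of a block's content (serialized deterministically)."""
    payload = json.dumps(block, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


def solve_puzzle(block: dict) -> dict:
    """Try nonces until the block's hash meets the difficulty target."""
    nonce = 0
    while True:
        candidate = dict(block, nonce=nonce)
        if block_hash(candidate).startswith("0" * DIFFICULTY):
            return candidate
        nonce += 1


def add_block(chain: list, transaction: str) -> None:
    """Append a new block that is linked to the hash of the previous one."""
    previous = block_hash(chain[-1]) if chain else GENESIS_LINK
    block = {
        "index": len(chain),
        "transaction": transaction,
        "previous_hash": previous,
    }
    chain.append(solve_puzzle(block))


def is_valid(chain: list) -> bool:
    """Audit the chain: every puzzle solved and every link intact."""
    for i, block in enumerate(chain):
        if not block_hash(block).startswith("0" * DIFFICULTY):
            return False
        expected = block_hash(chain[i - 1]) if i > 0 else GENESIS_LINK
        if block["previous_hash"] != expected:
            return False
    return True


if __name__ == "__main__":
    chain = []
    add_block(chain, "user X accessed sequence AN000001")  # hypothetical
    add_block(chain, "user Y accessed sequence AN000001")  # hypothetical
    print(is_valid(chain))
    chain[0]["transaction"] = "nothing happened"  # tamper with history
    print(is_valid(chain))
```

Because each block stores the hash of its predecessor, changing an earlier record invalidates the blocks that follow it unless their puzzles are solved again, which is why any participant can audit the ledger.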
The solving of the cryptographic puzzle is also called “proof-of-work” and demands a lot of computational power. Therefore, the longest blockchain is the one with the most computation invested in it. If there are two competing blockchains at the same time, the nodes will consider the longer one as the correct one and ignore the other. In other words, verification is based on the concept that the whole system of nodes has more computational power than any attacker or single corrupted node. An attacker would need to control more than 50% of the total computation power within the system in order to succeed.

Since “proof-of-work” is energy and computation intensive, many other methods are currently being developed. One method used for some cryptocurrencies is “proof-of-stake”. Here, every holder of the cryptocurrency is allowed to verify a percentage of transactions equal to the percentage of all coins she holds. If someone holds 10% of all coins, she is allowed to process 10% of all transactions. This greatly reduces the necessary computation power, as there is no competition for prolonging a blockchain. Leaving technical difficulties aside, all such alternatives to “proof-of-work” require a higher amount of trust in certain actors (here, the coin holders) and their specific rights and duties. In summary, traditional blockchain overcomes trust completely by “maximal” computation power, while traditional institutions (e.g. banks) need no computation power but “maximal” trust. In between, hybrid versions that balance trust and computation power exist or are being developed.

An incentive is needed to get external stakeholders to give their computational power to the system.
It is estimated that bitcoin currently consumes 72.57 terawatt hours annually, comparable to the energy consumption of Austria, which costs 3.628 billion USD annually [88]. The high energy costs result from the fact that several nodes try to solve the same cryptographic puzzle, but only the fastest node succeeds in doing so and thus prolongs the blockchain, whilst the work of the other nodes gets discarded. At the moment, the worth of bitcoins paid to these stakeholders, called bitcoin miners, is higher than the energy cost they invest. For a blockchain outside of a cryptocurrency application, other financial incentives for these computing costs need to be found or created, which is why to date very few blockchain applications exist. Research is being conducted to tackle the disadvantages of blockchain mentioned above. However, the reliance on computational power is essential to the technology, so it may be optimized but never completely eliminated.

Blockchain for Genetic Resources

A blockchain basically overcomes the necessity of trust (trusting actors to self-report usage) by instead employing computational power to prove trustworthiness. Instead of having a secure and closed environment as in a banking system, the transactions are made public via a blockchain and are continuously updated and verified via computation.
In order to cheat the system by creating a wrong transaction, e.g. a money transfer to the attacker, the attacker would need to have more computation power (>50%) than the rest of the system.

Computational power is the limiting factor for scaling up the blockchain system. Bitcoin is able to conduct a maximum of seven transactions per second [89]. In 2018, a total of 105,754,418 requests were sent to GenBank’s Nucleotide database, resulting in 3.35 requests per second on average. However, this is just for the Nucleotide NSD on GenBank. ENA and DDBJ will have requests of a similar magnitude.
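The average request rate quoted here can be checked with a short calculation. The sketch below uses only the two figures given in the text and assumes, for the sake of comparison, that each database request would correspond to one blockchain transaction; the variable and function names are illustrative.

```python
SECONDS_PER_YEAR = 365 * 24 * 60 * 60  # 2018 was not a leap year

GENBANK_NUCLEOTIDE_REQUESTS_2018 = 105_754_418  # figure quoted in the text
BITCOIN_MAX_TX_PER_SECOND = 7  # figure quoted in the text [89]


def average_rate(total_requests: int, seconds: int = SECONDS_PER_YEAR) -> float:
    """Average number of requests per second over the given period."""
    return total_requests / seconds


genbank_rate = average_rate(GENBANK_NUCLEOTIDE_REQUESTS_2018)

# Assumption: one request maps to one transaction on the ledger.
share_of_bitcoin_maximum = genbank_rate / BITCOIN_MAX_TX_PER_SECOND

if __name__ == "__main__":
    print(f"{genbank_rate:.2f} requests per second on average")
    print(f"{share_of_bitcoin_maximum:.0%} of the quoted Bitcoin maximum")
```

Under that assumption, the average load from this single database already amounts to roughly half of the quoted Bitcoin maximum, and an annual average says nothing about peak load.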
The total number of requests to EBI alone, which includes NSD and some SI, is 100-fold higher than the NSD requests from GenBank (i.e., >300 requests per second on average). Furthermore, the theoretical blocks that would be needed for large NSD entries (billions of nucleotides) would require much greater computational power than the relatively small data sizes of bitcoin. Taken together with the discussion above, the computation power needed depends on the type of blockchain system (“level of trust”) and the amount of data stored in the blockchain.

A blockchain system for genetic resources would need to be created de novo and also be maintained. It therefore requires high upfront costs and permanent maintenance costs. Both of these costs depend on the demands made on the system, which would need to be “future proofed” to anticipate the currently exponential growth in NSD generation. Furthermore, a blockchain for GR would also need to anticipate users. Blockchain does not have a user interface, and a majority of the INSDC investment cost is for user interactions. Thus, the costs are not only for maintenance and computing but also for user interfaces and tools.

One specialized case for using blockchain on NSD is the individual human genome in the health sector. The complete human genome has been openly available since the human genome project (see Section 3.1), but individual mutations (genetic differences) can play an important role in the research and development of drugs and therapies. There are several companies and startups that give individuals control over the use of their own genome and associated health information, e.g.
DNAtix, Nebula Genomics and LunaDNA [90-92], Encrypgen and Longenesis, with most but not all of them exploring the use of blockchain. The basic idea is always that individuals can get their genome uploaded to the respective company and then decide who can access it or what kind of research can be conducted with it. These individuals must also contribute information on their health/disease status, which greatly increases the value of their genome as it enables large-scale comparisons and correlations of health factors with possible genetic diseases.

The terms of access can be set in two principal ways.
Customers can select predefined terms of use upfront or can accept/decline each request via their account. The second option provides maximum control for the customer/patient, but is problematic for large-scale analysis (e.g. when thousands of genomes are to be analysed but consent must be obtained individually for every genome). A mixture of both ways could minimize the disadvantages of both systems, e.g. allowing access for non-commercial users automatically and deciding individually on commercial requests. Terms of use or requests can include payments to the customer by the company/institution wanting to access their genomic data.

There are three different ways in which the data transfer or, respectively, the encryption can work:

The NSD itself can be stored and transferred inside the blockchain and passed along to different users, software platforms, etc. OR
Only the request and answer are stored inside the blockchain, and the analysis/processing is conducted within the servers of the company running the blockchain. OR
Only accessions are stored in the blockchain.

The first option is the standard. It has the limitation that larger sequences like genomes are hard to put into a block, because this results in the use of massive storage space and computational power. This means not only an increase in costs, but also that the analysis takes longer and that the transaction limit (the number of transactions the system is able to conduct per second) decreases. With the human genome, the data can be significantly compressed to fit into a blockchain, because only the points of mutation/difference between the individual and the human reference genome are stored and of importance, which would not be the case for novel NSD.

The second option overcomes the problems mentioned above by transferring the needed computational power from the blockchain system towards the company.
In that case, the company needs to run large server farms to process all requests and analyses itself, work that is normally conducted by the requester/researcher. The final results are then returned to the requester, who is not able to obtain or access any underlying data. This has the advantage that even the company paying for the access does not see the NSD itself, but only the results of the analysis. However, this also means that the company running the blockchain must have the capacity to enable or perform all potential scientific ways of analyzing the data [93]. Finally, in discussion with experts, it was often noted that the utilization of human genomes raises a lot of privacy issues, like personalized advertisement or identification via genomes of relatives, which do not apply to non-human genomes. These issues are not yet resolved.

The third option is the easiest to accomplish, but also provides the least security. The blockchain only counts the accessions and where they come from. This is very similar to tracking traffic at webpages requiring logins. Users need to be somehow identified, and traceability is lost once a user has accessed NSD. The user can copy, reuse and spread it without being monitored. This option is basically tracking access, as mentioned in the other sections on traceability, making it only a technical alternative to other systems (e.g. webpage logins), with similar advantages and disadvantages.
Therefore, this option is not explicitly discussed in the summary.

A putative example: Earth Bank of Codes

In 2018, the Earth Biogenome Project [94] was launched, a global effort to sequence all so-called higher species of the planet. The idea behind it is similar to the human genome project [95], which was a global effort to completely sequence the human genome. Since this project implies that GR from around the world will be accessed and sequenced, effective compliance with the different national ABS legislation is a key issue in that project


YTwvYWJici0xPjwvcGVyaW9kaWNhbD48YWx0LXBlcmlvZGljYWw+PGZ1bGwtdGl0bGU+UHJvYyBO

YXRsIEFjYWQgU2NpIFUgUyBBPC9mdWxsLXRpdGxlPjxhYmJyLTE+UHJvY2VlZGluZ3Mgb2YgdGhl

IE5hdGlvbmFsIEFjYWRlbXkgb2YgU2NpZW5jZXMgb2YgdGhlIFVuaXRlZCBTdGF0ZXMgb2YgQW1l

cmljYTwvYWJici0xPjwvYWx0LXBlcmlvZGljYWw+PHBhZ2VzPjQzMjUtNDMzMzwvcGFnZXM+PHZv

bHVtZT4xMTU8L3ZvbHVtZT48bnVtYmVyPjE3PC9udW1iZXI+PGtleXdvcmRzPjxrZXl3b3JkPipC

aW9kaXZlcnNpdHk8L2tleXdvcmQ+PGtleXdvcmQ+RWFydGggKFBsYW5ldCk8L2tleXdvcmQ+PGtl

eXdvcmQ+KkVuZGFuZ2VyZWQgU3BlY2llczwva2V5d29yZD48a2V5d29yZD4qR2Vub21lPC9rZXl3

b3JkPjxrZXl3b3JkPipIaWdoLVRocm91Z2hwdXQgTnVjbGVvdGlkZSBTZXF1ZW5jaW5nPC9rZXl3

b3JkPjwva2V5d29yZHM+PGRhdGVzPjx5ZWFyPjIwMTg8L3llYXI+PHB1Yi1kYXRlcz48ZGF0ZT5B

cHIgMjQ8L2RhdGU+PC9wdWItZGF0ZXM+PC9kYXRlcz48aXNibj4xMDkxLTY0OTAgKEVsZWN0cm9u

aWMpJiN4RDswMDI3LTg0MjQgKExpbmtpbmcpPC9pc2JuPjxhY2Nlc3Npb24tbnVtPjI5Njg2MDY1

PC9hY2Nlc3Npb24tbnVtPjx1cmxzPjxyZWxhdGVkLXVybHM+PHVybD5odHRwOi8vd3d3Lm5jYmku

bmxtLm5paC5nb3YvcHVibWVkLzI5Njg2MDY1PC91cmw+PHVybD5odHRwczovL3d3dy5uY2JpLm5s

bS5uaWguZ292L3BtYy9hcnRpY2xlcy9QTUM1OTI0OTEwL3BkZi9wbmFzLjIwMTcyMDExNS5wZGY8

L3VybD48L3JlbGF0ZWQtdXJscz48L3VybHM+PGN1c3RvbTI+NTkyNDkxMDwvY3VzdG9tMj48ZWxl

Y3Ryb25pYy1yZXNvdXJjZS1udW0+MTAuMTA3My9wbmFzLjE3MjAxMTUxMTU8L2VsZWN0cm9uaWMt

cmVzb3VyY2UtbnVtPjwvcmVjb3JkPjwvQ2l0ZT48L0VuZE5vdGU+AG==

[96].

To foster financing for the project and to enable benefit sharing from its results, the Earth Bank of Codes was created. In theory, it plans to store NSD and SI obtained from the Earth Biogenome Project in a blockchain to enable traceability and the sharing of benefits. According to the website, the Amazon basin and its biodiversity will be used as a first pilot project for the system of the Earth Bank of Codes, which is sometimes also referred to as the Amazonas Bank of Codes [97].

Although we could not obtain current information on the Earth Bank of Codes or the Amazonas Bank of Codes, the United Kingdom’s Darwin Tree of Life (DToL) project, which will feed into the Earth Biogenome Project, plans to sequence 66,000 UK species with project costs of around 100 million GBP. However, as the project is financed by the government of the UK, the NSD from DToL is intended to be deposited in the INSDC, without using blockchain. This is an example of a “net user country” self-providing NSD rather than relying on NSD from a “net provider country” (see Section 3.5).
As the UK has no ABS access obligations, the NSD generated will be available as open access via the INSDC.

Since the whole project is at a rather early stage, no concrete information is obtainable on how exactly, or whether at all, the blockchain system will be used, how it is going to work, what the costs might be, and who will pay for them.

Conclusions on blockchain

In summary, blockchain is a technical option that becomes more applicable the more fully the following conditions are met:

- There is a willingness to pay a high up-front investment for the setup of the system and permanent infrastructure costs for its upkeep.
- The information inside different blocks is defined and similar from block to block, and the processes/analyses that can be conducted with the information are clearly defined.
- The technical limitations of blockchain scale with the amount of information within each block.

Human genomes with accompanying health information in the health sector fulfill all these criteria. The information is NSD of homogeneous length and similar characteristics, and the analysis methods and procedures are defined. The financial benefit for companies is high enough to make them pay for access. A major factor is that human genomes are of rather similar economic value to one another (in contrast to non-human sequences), and that this value is on average much higher than for the average non-human sequence.

With regard to DSI under the CBD, there are some important considerations:

- DSI would have to be defined and limited to a machine-readable, highly standardized data format.
- The kinds of analysis that could be done on the DSI would need to be defined and agreed on by all parties a priori. This could be challenging and may lead to scientific restrictions.
- Biodiversity NSD is extremely heterogeneous, often poorly understood, and of predominantly unknown or limited economic value relative to human NSD combined with patient health information. At the same time, the decision to “take” certain NSD under a blockchain must be made prior to any research exploring its potential value.

A major problem for blockchain’s applicability to NSD traceability is the possibility of NSD circulating outside the system. NSD can easily be downloaded, shared online, sent via email and manipulated. Bitcoin, if taken outside of the blockchain, is worthless, which strongly motivates users to stay in the blockchain. NSD outside of a blockchain-based sequence system is still NSD and loses no value. In other words, users stay in the Bitcoin blockchain because otherwise all value is lost; this motivation would not exist for NSD.

5.3 Data mining and cloud genomics

The volume of NSD, together with its level of curation and availability, favors large-scale meta-analysis. In a meta-analysis no experimental data are created; instead, the information from many studies/experiments is collected and analysed (so-called “big data” analysis). Many new bioinformatics tools and biological databases are built by developing new algorithms and scientific approaches and subsequently mining public databases for large amounts of relevant NSD. The additional value stems from collecting and combining already existing knowledge, as well as from performing new bioinformatic analyses. As storage space and computational power are limiting factors here, cloud genomics is emerging.
Cloud genomics means that a third party, like Google Genomics [98] or Amazon Web Services (AWS) [99], rents storage space and computing power to scientific institutions and companies. This is basically a normal cloud service with tailor-made applications for genetic research. The provider offers a private workbench through which all the people engaged in a project can access the cloud. The major advantage is that the whole data set needs to be stored only once, on the cloud, and never downloaded, and that each analysis needs to be done just once, instead of spending computational power and storage on redundant information/tasks.

The use of cloud genomics limits users to a secure system where the analyses and operations available on the hosting platform are fixed by the cloud host. Such a system might not be able to connect with the open public INSDC infrastructure (e.g. its analytical tools), including the more than 1,600 public databases. The tools may therefore have to be provided by the cloud host (however, the NSD is still available at the INSDC and can easily be fed into any other system or database).

To make the concept of cloud genomics more tangible, we have invented a theoretical example. Let us assume there are 10 major research institutions working on cats, located in five different countries, e.g. USA, China, France, South Africa and Chile. These institutions want to work together in a large research project that aims to use all biological information currently available on all cats. They collect all NSD, SI and publications, reaching a total of 1 petabyte. If every institution stored this dataset, the total storage space required would be 10 petabytes. Instead, they pay a cloud service to store their dataset in a cloud, accessible to every researcher participating in the project. Additionally, they can now perform every analysis in the cloud. Thus, the analysis and its results do not need to be sent to the other institutions, and every researcher can always see what has been done so far. They can also rent the vast computational power of the cloud service whenever they need it, and they all have access to the same software platforms and analysis tools.
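
The storage arithmetic in this hypothetical example can be sketched in a few lines of Python. The figures (10 institutions, a 1 petabyte dataset) are taken from the example above; the function names are illustrative only and do not correspond to any cloud provider's interface.

```python
def replicated_storage_pb(dataset_pb: float, institutions: int) -> float:
    """Total storage if every institution keeps its own full copy."""
    return dataset_pb * institutions


def shared_storage_pb(dataset_pb: float) -> float:
    """Total storage if a single copy is held once in a shared cloud."""
    return dataset_pb


# Figures from the hypothetical cat-research example in the text.
DATASET_PB = 1.0
INSTITUTIONS = 10

local_total = replicated_storage_pb(DATASET_PB, INSTITUTIONS)
cloud_total = shared_storage_pb(DATASET_PB)

print(f"Every institution stores a copy: {local_total:g} PB")
print(f"One shared cloud copy:           {cloud_total:g} PB")
print(f"Redundant storage avoided:       {local_total - cloud_total:g} PB")
```

The sketch deliberately ignores backups and internal redundancy within the cloud service, which a real provider would add on top of the single logical copy.
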
The cloud model thus gives individual researchers and institutions the option to conduct large-scale data analysis without the need to buy and maintain large servers that they would not otherwise need.

5.4 Other models for digital content

Digital versions of art, like music and movies, enjoy wide use around the world, and tech giants like Spotify [100] or Netflix [101] enable users to access and consume content without being able to download or extract it from the provider platform. At first glance, it is appealing to imagine a CBD-relevant NSD dataset in a Spotify-like bundle where subscription fees support use.
However, the main difference with regard to NSD is that, in order to be useful for research, NSD needs to be manipulated and used: there is no “passive use” of NSD as there is with media consumption (even the simplest analysis of NSD, such as the use of BLAST, requires the user to actively select a sequence, adjust parameters and define cut-off values). Simply put, a user cannot “read” the ACGTs of NSD and come away from this type of passive interaction informed or content. NSD is analysed via bioinformatics tools, compared with other NSD, modified and used. It is digitally “hands on” analysis (see also sections 4.3 and 5.2-3 for further explanation). The download, transfer and manipulation of NSD are a necessary pre-condition for the generation of new SI. Any technological solution for NSD modeled on Spotify and Netflix would need to have a near-endless number of bioinformatic tools available and integrated in order to be of any value to users. This would be more like the business model of Apple Inc. [102], which provides not only a content system (e.g. iTunes) but also all related software and hardware in order to keep a closed system. Such a system would have high development and maintenance costs, likely far beyond the annual USD 50 million of the INSDC databases, and, contrary to Apple, Netflix, etc., no broad market of billions of users.

6. Implications for future discussions on DSI

6.1 Challenges for NSD traceability

A primary obstacle to a new system of NSD traceability (Section 5) is that a significant amount of NSD is and will be freely available via the open traceability system offered by the INSDC (Sections 3 and 4). This openness has revolutionized the life sciences and remains the default assumption for users of NSD. Since the majority of NSD is statistically unlikely to have ABS obligations (see Section 4.2), the INSDC is likely to continue regardless of decisions made within the CBD. If submitters and users of CBD-relevant NSD were forced into an alternative NSD database outside of the INSDC, they would be at a significant disadvantage in terms of scientific utilization, scientific interest, functionality, tools, and the ability to publish, collaborate, and work openly. Furthermore, any new system is also likely to be costly.

The biological and scientific nature of NSD has unique characteristics that do not directly correlate with other fields, e.g. cryptocurrencies. For one, NSD has little value without context and comparison. Knowledge generation through NSD analysis is almost always done by comparative, iterative analysis: sequences are compared in large quantities, and the application of insights gained from scientific research continuously builds upon itself. The value of NSD comes primarily from additional information generated by scientific work and hypothesis testing, which is enabled and complemented by unfettered access to NSD. The scientific interest in newly generated NSD stems from comparing it with the entire body of publicly known NSD. Without context and comparison to other NSD, single entries or small amounts of NSD are just letters in a row: millions of As, Cs, Gs and Ts without relevance or orientation. If NSD is partially or totally isolated from the public sphere, these separate NSD may be of limited value.
In other words, isolated or new systems outside of the INSDC would greatly diminish the value of the NSD they contain, because the isolated NSD could not be put into context with the billions of NSD entries and publications already in public databases. The public NSD available via the INSDC would also suffer from this separation because the dataset would be less complete. This nature of NSD could suggest a holistic regulatory approach instead of differentiating between singular NSD entries and their specific parameters.

In order to trace something, the unit of traceability must be defined. If NSD were to become a legally defined, traceable object, size thresholds that guarantee sequence uniqueness would have to be set, since identical short sequences can be found in every organism. NSD would likely need a minimum length of at least 30 bp in order to be distinguishable from randomly similar sequences (see Section 8.7 for calculations). However, this calculation assumes independent nucleotides at every position in the sequence, which is not biologically accurate. If the sequence codes for an enzyme found in many related organisms (where evolution has led to high sequence similarity between organisms), this sequence will have many very similar counterparts. Here, a sequence in organism A can be identical to a sequence in organism B, either by natural selection or by biotechnological methods. In such cases, the nucleotide sequence alone will not be sufficient for tracing, and a much longer sequence would be needed to establish uniqueness. These biological definitions with legal implications will become extremely important as the policy process develops.
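
The reasoning behind a minimum-length threshold can be illustrated with a short calculation. The Python sketch below is not the calculation from Section 8.7. It is a simplified illustration that assumes independent, equally likely nucleotides at every position (the very assumption criticised above) and a hypothetical corpus of 10^13 sequence positions. Under those assumptions, a given sequence of length n is expected to occur by chance about N / 4^n times among N positions.

```python
def expected_chance_matches(length: int, corpus_positions: int) -> float:
    """Expected number of exact chance occurrences of one fixed sequence.

    Assumes each position is independently A, C, G or T with probability 1/4,
    so a specific sequence of `length` bases has probability 4**-length.
    """
    return corpus_positions / 4 ** length


def min_unique_length(corpus_positions: int, max_expected: float) -> int:
    """Smallest length whose expected chance matches do not exceed the limit."""
    length = 1
    while expected_chance_matches(length, corpus_positions) > max_expected:
        length += 1
    return length


# Hypothetical corpus size, chosen only for illustration (an assumption,
# not a figure from the study).
ASSUMED_CORPUS_POSITIONS = 10 ** 13

for n in (10, 20, 30):
    matches = expected_chance_matches(n, ASSUMED_CORPUS_POSITIONS)
    print(f"length {n:2d} bp: about {matches:.3g} expected chance matches")

threshold = min_unique_length(ASSUMED_CORPUS_POSITIONS, max_expected=1e-3)
print(f"shortest length with at most 0.001 expected matches: {threshold} bp")
```

With these assumed numbers, chance matches are common at 10 bp and become very unlikely by 30 bp. Because related organisms share conserved sequences, identical sequences occur far more often in practice than this independence model predicts, which is why length alone cannot establish uniqueness.
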
The INSDC AN system of traceability avoids this biological challenge, since uniqueness is created through the identifier and not the sequence itself.

Another consideration for discussions on traceability is that individuals and companies value their privacy (and in many democracies have a legal right to it), so any tracking mechanisms that involve user data could face significant hurdles from other legal sectors or ministries. Furthermore, tracking can also slow down the speed of data accession and analysis.

Finally, as noted above, the INSDC-originated AN-based traceability system is largely intended for scientific purposes and not for regulatory purposes. There are several important issues to consider if CBD Parties were to pursue a traceable regulatory path for NSD that builds on the existing system:

1. The NSD in the public databases are heterogeneous, and our estimates suggest that at least half of the entire public NSD dataset is out of CBD scope (human, model organisms, biodiversity from non-Party or free-access countries). This means that there is no “blanket” solution for traceability of the entire INSDC database, and that significant intellectual, technological, and regulatory effort would be needed to address this heterogeneity.
2. There are more than 1,600 biological databases that are inter-connected and exchange data daily. Behind the scenes, different types of data are converted, transformed, and exchanged in multiple directions. This downstream database infrastructure is built on automated data flows and is technical in nature. A CBD solution to the DSI problem that did not account for the integration of NSD+SI into this downstream infrastructure, or did not enable CBD-relevant NSD to remain integrated in it, would dramatically decrease the scientific value and utilization potential of these NSD.
3. The volume of NSD is growing exponentially. This means that, in addition to points 1 and 2 above, any long-term regulatory scheme would need to be prepared for “big data” interventions and the accompanying IT investments.
4. While the AN system is helpful because every entry has a unique identifier, biology itself is more complicated. There are millions of repetitive NSD entries or parts of entries in the databases, making it difficult to attribute all entries to a specific sovereign state. Finally, as mentioned above (Sections 3.5, 4.4, 5.2), with current technology, traceability outside of and beyond the databases is nearly impossible or technologically mismatched to current practices.

6.2 Practical observations about NSD & DSI

Our analyses in Sections 3-4 uncovered technical aspects of the existing traceability system that could be improved and which could increase legal certainty for both provider countries and scientists that use NSD. These observations were collected during the analyses carried out for this study as we attempted to understand the existing NSD traceability system and should be understood as practical “lessons learned”.

On the NSD generation side, scientists could:

- Create a better link to the original Genetic Resource. Our analysis shows that 6% of sequences in GenBank have a link to the original publicly available GR. As our tests show, this number is too low and could be improved if scientists were more accurate with their sequence submissions and followed citation guidelines for collection objects.
- Improve traceability to the country of origin. As our control tests show, 44% of NSD entries that did not report a country of origin could and should have reported one. Reporting is improving over time but still has room for improvement.
Scientists should be encouraged to become more diligent and should receive appropriate training when submitting NSD.

On the NSD infrastructure side, the INSDC could:

- Enforce country of origin requirements on new NSD submissions and increase user awareness. Since 2011, sequence submissions have been required to use the “/country” metadata tag provided in the submission form but, as our control tests show, there is clearly room for improvement. As a result, there are thousands or even millions of sequences in the INSDC that could have country information associated with them but do not (Figure 7). Country information can be irrelevant or even inappropriate, for example when submitting NSD from humans, model organisms, or information on threatened and endangered species. However, for the majority of environment-originated NSD submissions, country information and GPS coordinates would add significant scientific and legal value.
- Create a new metadata field for IRCCs and access date information. To further support transparency and legal compliance, it would be useful to offer a metadata field for an IRCC unique identifier and its link, if available, and a metadata field for the date of first accession (in some cases already provided) to help downstream users infer any possible CBD or Nagoya Protocol implications for a given NSD entry. This information is not available at present in the metadata and is often very difficult, if not impossible, to infer from the associated publication. Other temporal information that could also be recorded includes the start date of sequencing projects.

In the international policy process, the Parties to the CBD could:

- Simplify traceability of NSD by relying exclusively on internationally recognized certificates of compliance (IRCC) via the ABS Clearing House. An IRCC posted on the Clearing House produces a unique identifier and stable link that can be linked to a sequence entry in the INSDC.
The INSDC is considering a metadata change to create a standardized field for an IRCC identifier. If Parties increasingly used IRCCs, there would be an even stronger motivation for the INSDC to do so. Access permits in PDF format are not technologically linkable to an NSD entry unless they are available under a stable online URL with a unique identifier.

Engage the INSDC in DSI discussions. Because the INSDC is the central sequence database portal (Section 3.2 and Figure 6) in the public NSD database landscape, any effort to link DSI to ABS must necessarily work closely with its three member databases. Given that GenBank is operated by a governmental agency in the U.S.A., an Observer to the CBD, any change to database policy would likely be driven by scientific cooperation rather than political negotiation. EMBL-EBI is an inter-governmental organization and DDBJ is a non-governmental institution, so while policy decisions are perhaps somewhat less complicated than with a non-Observer, any change would probably be most effective if made in close collaboration with the relevant stakeholders.

During the patent process, patent applicants could:

Disclose information to the INSDC that is already in the patent application. If patent NSD submissions provided more complete information already listed in the patent application, this would support better NSD traceability.
Two specific types of existing information from the patent application could be listed in the patent-originated NSD entry: 1) if relevant, the original AN if public NSD from the INSDC was used in a patent application (rather than the generation of a new AN) and 2) the country of origin, if it was previously disclosed in the patent application, could be noted in the /country tag in the NSD submission.

While detailed in nature and surely not exhaustive, these observations could give both providers and users of GR increased transparency and legal certainty and could be incorporated in the existing INSDC system.

6.3 Extension of lessons learned from NSD to DSI

The discussions above demonstrate that public exchange of NSD is governed by a system of traceability within the INSDC that is widely used by both academic and commercial researchers. This traceability system is in use across the research and development spectrum from initial GR to patent disclosure. However, if we return to the initial scope of this study (DSI rather than simply NSD) and to the context of active discussions within the CBD, our findings here have further implications. Before considering these broader implications, we first acknowledge that the narrow focus on NSD is a limitation of this study and that further analysis is required to better understand the databases and traceability issues associated with SI that may potentially constitute DSI.

DSI is not yet defined but this will be a crucial decision. Where does DSI start and stop? NSD is often used to predict protein sequences, and the technological format of protein sequences and protein sequence databases is, in many ways, quite similar to the NSD/INSDC system, although it has unique bioinformatics conversions/properties not discussed here. Indeed, in some databases NSD and protein sequences are even directly linkable, although this is not universal.
So, the lessons learned above from NSD are likely extendable to this secondary data type. But beyond protein sequence data, other types of SI are unlikely to be so easy to understand, define, and trace and, as Study 1 outlines, a continuous spectrum exists across the data type landscape. Although the tracing of NSD is technically challenging and requires user awareness and compliance, it is technically feasible from the sequencing of the underlying genetic resource to the upload to a public database and to related scientific publications. However, traceability breaks down when NSD (and presumably also SI) leaves a public database. Although we did not directly assess this, SI is likely to be less traceable than NSD: at best, it would be traceable in some data formats under some conditions, i.e., there would be many different technical and scientific contingencies. This would mean that future policy or regulatory decisions would likely face an administrative patchwork of different data types, databases, contingencies and rules that would almost certainly lead to high transaction costs. Furthermore, because SI often has a much more limited or even non-existent connection to GR, the relationship between GR, NSD, and SI would quickly become indistinguishable or even lost.

Going from NSD to protein sequences and metabolites and beyond to other forms of SI, traceability to the underlying genetic resource becomes more difficult. This confronts ABS policymakers with a dilemma: the broader the definition of DSI, the less traceability will be possible. This can result in high administrative and compliance burdens for accessions that are non-commercial and/or non-relevant to ABS regulations, whilst relevant accessions may have enough loopholes to potentially evade ABS obligations. At the same time, a narrow definition of DSI may facilitate traceability, although transaction costs could conceivably remain high.
A potential way to avoid this dilemma could be the establishment of a system that does not rely on traceability. This could, for example, take the form of a multilateral system with a general payment mechanism that is decoupled from access and use of specific DSI.

If the decision-making process moves towards a new technology or a new database separate from the existing scientific infrastructure and the INSDC, the INSDC will continue with non-CBD-relevant NSD. The >1,600 biological databases that build on the INSDC, and the publications and journals that rely on ANs, will also likely maintain their connection to the INSDC. This could lead to unfortunate unintended consequences such as a “lonely island” NSD/SI system for CBD-relevant NSD. This could, amongst other consequences, create challenges for scientists to publish their results if their data is not public (see Section 6.1) and result in underuse or even avoidance of such NSD. Furthermore, as discussed in Section 5, there are doubts about the economic feasibility of new NSD databases that should be better analysed.

Finally, as noted in Section 3.4, the INSDC doubles in size every 18 months and raw NSD (sequence reads) is growing even faster. This has important implications for policy decisions in terms of the speed at which new policies would affect the entire dataset. For example, if new policies on NSD were set tomorrow, these policies, at the current data growth rate, would affect 75% of NSD database entries within three years: two doublings in 36 months quadruple the dataset, so three of every four entries would have been submitted after the policies took effect. This could suggest that retroactive attempts to update old NSD are less critical than effective and timely management of new NSD.

Acknowledgements

We are very grateful for the support of Dr. Lorenz Reimer and student assistants from the Technical University of Braunschweig: Tom Luthe, Lisa Abendroth, and Chris Zaydowiczw.
They were instrumental in the manual analyses described in 8.1, the public database inventory, and 8.4, the country of origin and GPS checks, as well as the checking and harmonization of references (TL). We are also grateful to INSDC members including the head of GenBank, Dr. Ilene Karsch Mizrachi, and colleagues Dr. Eric Sayers and Dr. Kim Pruitt, as well as the head of ENA, Dr. Guy Cochrane, and DDBJ’s Director Masanori Arita and Dr. Yasukazu Nakamura, for their responses to technical questions and provisioning of user data. We also thank the interviewees who participated in the private database case studies.

7. References

1.Parties to the Convention on Biological Diversity 2018. Decision 14/20. Digital sequence information on genetic resources. Sharm El-Sheikh, Egypt: United Nations.2.Laird, S.A. and Wynberg, R.P. 2018. A Fact Finding and Scoping Study on Digital Sequence Information on Genetic Resources in the Context of the Convention on Biological Diversity and the Nagoya Protocol. Montreal, Canada: United Nations.3.Ad Hoc Technical Expert Group on Digital Sequence Information on Genetic Resources 2018. Report of the Ad Hoc Technical Expert Group on Digital Sequence Information on Genetic Resources. Montreal, Canada: United Nations.4.Secretariat of the Convention on Biological Diversity 2002. Bonn Guidelines on Access to Genetic Resources and Fair and Equitable Sharing of the Benefits Arising out of their Utilization United Nations.5.National Center for Biotechnology Information. GenBank and WGS Statistics. [accessed 2019 Jul 26]; Available: Research. Reporting standards and availability of data, materials, code and protocols. [accessed 2019 Aug 06]; Available: Principles. The Bermuda Principles Story. [accessed 2019 Jul 24]; Available: Wellcome Trust 2003. Sharing Data from Large-scale Biological Research Projects: A System of Tripartite Responsibility. 
Fort Lauderdale, USA.9.Amann, R.I., Baichoo, S., Blencowe, B.J., Bork, P., Borodovsky, M., Brooksbank, C., Chain, P.S.G., Colwell, R.R., Daffonchio, D.G., Danchin, A., de Lorenzo, V., Dorrestein, P.C., Finn, R.D., Fraser, C.M., Gilbert, J.A., Hallam, S.J., Hugenholtz, P., Ioannidis, J.P.A., Jansson, J.K., Kim, J.F., Klenk, H.P., Klotz, M.G., Knight, R., Konstantinidis, K.T., Kyrpides, N.C., Mason, C.E., McHardy, A.C., Meyer, F., Ouzounis, C.A., Patrinos, A.A.N., Podar, M., Pollard, K.S., Ravel, J., Munoz, A.R., Roberts, R.J., Rossello-Mora, R., Sansone, S.A., Schloss, P.D., Schriml, L.M., Setubal, J.C., Sorek, R., Stevens, R.L., Tiedje, J.M., Turjanski, A., Tyson, G.W., Ussery, D.W., Weinstock, G.M., White, O., Whitman, W.B., and Xenarios, I., Toward unrestricted use of public genomic data. Science, 2019. 363(6425): p. 350-352, DOI: 10.1126/science.aaw1280.10.Andersen, D., Guidelines for good scientific practice. Dan Med Bull, 1999. 46(1): p. 60-1, 11.DFG Deutsche Forschungsgemeinschaft, Safeguarding Good Scientific Practice. 2013: Wiley-VCH.12.National Institutes of Health. Final NIH statement on sharing research data. 2003 [accessed 2019 Aug 06]; Available: , D.J. and Fernandez, X.M., The 26th annual Nucleic Acids Research database issue and Molecular Biology Database Collection. Nucleic Acids Res, 2019. 47(D1): p. D1-D7, DOI: 10.1093/nar/gky1267.14.Yi, Y., Zhao, Y., Li, C., Zhang, L., Huang, H., Li, Y., Liu, L., Hou, P., Cui, T., Tan, P., Hu, Y., Zhang, T., Huang, Y., Li, X., Yu, J., and Wang, D. RAID v2.0: an updated resource of RNA-associated interactions across organisms. 2017 [accessed 2019 Sep 18]; Available: , Y., Zhao, Y., Li, C., Zhang, L., Huang, H., Li, Y., Liu, L., Hou, P., Cui, T., Tan, P., Hu, Y., Zhang, T., Huang, Y., Li, X., Yu, J., and Wang, D. PRIdictor - Protein-RNA Interaction Predictor. 
[accessed 2019 Sep 18]; Available: , K., Fortriede, J.D., Lotay, V.S., Burns, K.A., Wang, D.Z., Fisher, M.E., Pells, T.J., James-Zorn, C., Wang, Y., Ponferrada, V.G., Chu, S., Chaturvedi, P., Zorn, A.M., and Vize, P.D. Xenbase: a genomic, epigenomic and transcriptomic model organism database. 2018 [accessed 2019 Sep 18]; Available: , I., Bode, J., Frisch, M., and Wingender, E. S/MARt DB: a database on scaffold/matrix attached regions. 2002 [accessed 2019 Sep 18]; Available: , H., Globalizing Genomics: The Origins of the International Nucleotide Sequence Database Collaboration. J Hist Biol, 2018. 51(4): p. 657-691, DOI: 10.1007/s10739-017-9490-y.19.Mitchell, A.L., Scheremetjew, M., Denise, H., Potter, S., Tarkowska, A., Qureshi, M., Salazar, G.A., Pesseat, S., Boland, M.A., Hunter, F.M.I., Ten Hoopen, P., Alako, B., Amid, C., Wilkinson, D.J., Curtis, T.P., Cochrane, G., and Finn, R.D. EBI Metagenomics in 2017: enriching the analysis of microbial communities, from sequence reads to assemblies. 2018 [accessed 2019 Sep 18]; Available: , L., Sternberg, P., Durbin, R., Thierry-Mieg, J., and Spieth, J. WormBase: network access to the genome and biology of Caenorhabditis elegans. 2001 [accessed 2019 Sep 18]; Available: . Ilene Mizrachi (GenBank) 2019.22.Beijing Institute of Genomics. Homepage National Genomics Data Center & BIG Data Center. [accessed 2019 Aug 06]; Available: for Arab Genomic Studies. Homepage Centre for Arab Genomic Studies. [accessed 2019 Aug 06]; Available: National GeneBank. Homepage China National GeneBank. [accessed 2019 Aug 06]; Available: Center for Biotechnology Information. About NCBI. [accessed 2019 Jul 25]; Available: Bioinformatics Institute. Leadership. [accessed 2019 Jul 25]; Available: Molecular Biology Laboratory. Member States. [accessed 2019 Jul 25]; Available: . How we are funded. [accessed 2019 Jul 25]; Available: Institute of Genetics. Support Us. 
[accessed 2019 Jul 25]; Available: , Y., Imanishi, T., Miyazaki, S., Fukami-Kobayashi, K., Saitou, N., Sugawara, H., and Gojobori, T., DNA Data Bank of Japan (DDBJ) for genome scale research in life science. Nucleic Acids Res, 2002. 30(1): p. 27-30, DOI: 10.1093/nar/30.1.27.31.Brunak, S., Danchin, A., Hattori, M., Nakamura, H., Shinozaki, K., Matise, T., and Preuss, D., Nucleotide sequence database policies. Science, 2002. 298(5597): p. 1333, DOI: 10.1126/science.298.5597.1333b.32.Cochrane, G., Karsch-Mizrachi, I., Takagi, T., and International Nucleotide Sequence Database Collaboration, The International Nucleotide Sequence Database Collaboration. Nucleic Acids Res, 2016. 44(D1): p. D48-50, DOI: 10.1093/nar/gkv1323.33.National Center for Biotechnology Information. GenBank. [accessed 2019 Jul 25]; Available: Bioinformatics Institute. Terms of Use. [accessed 2019 Sep 26]; Available: . Training. [accessed 2019 Aug 06]; Available: Institutes of Health. H3Africa Program Resources. [accessed 2019 Aug 06]; Available: of Health and Human Services, National Institutes of Health, and National Library of Medicine (NLM) 2018. Congressional Justification FY 2018 Budget.38.European Molecular Biology Laboratory 2018. Annual Report.39.DNA Databank of Japan. DDBJ Annual Reports. [accessed 2019 Aug 06]; Available: Center for Biotechnology Information. Taxonomy Browser. [accessed 2019 Jul 26]; Available: . List of model organisms. [accessed 2019 Sep 26]; Available: Bioinformatics Institute 2017. Scientific Report 2017.43.Dr. Johanna Kleine (EMBL-EBI) 2019.ernment of Japan 2017. Current state of the use of digital sequence information on genetic resources in the biodiversity field.45.Guy Cochrane (Head of ENA) 2019.46.Lombard, V., Golaconda Ramulu, H., Drula, E., Coutinho, P.M., and Henrissat, B. The carbohydrate-active enzymes database (CAZy) in 2013. 2014 [accessed 2019 Sep 18]; Available: , A., Yamada, Y., and Sakurai, T. 
Alga-PrAS (Algal Protein Annotation Suite): A Database of Comprehensive Annotation in Algal Proteomes. 2017 [accessed 2019 Sep 18]; Available: , Q., Schlueter, S.D., and Brendel, V. PlantGDB, plant genome database and analysis tools. 2004 [accessed 2019 Sep 18]; Available: , K., Van Bel, M., Richard, G., Van Landeghem, S., Verhelst, B., Moreau, H., Van de Peer, Y., Grimsley, N., and Piganeau, G. pico-PLAZA, a genome database of microbial photosynthetic eukaryotes. 2013 [accessed 2019 Sep 18]; Available: Life Sciences. Genome Quest Homepage. [accessed 2019 Jul 29]; Available: Center for Biotechnology Information. Database of Genotypes and Phenotypes. [accessed 2019 Jul 29]; Available: . European Genome-phenome Archive. [accessed 2019 Jul 29]; Available: , L., McCluskey, K., Desmeth, P., Liu, S., Hideaki, S., Yin, Y., Moriya, O., Itoh, T., Kim, C.Y., Lee, J.S., Zhou, Y., Kawasaki, H., Hazbon, M.H., Robert, V., Boekhout, T., Lima, N., Evtushenko, L., Boundy-Mills, K., Bunk, B., Moore, E.R.B., Eurwilaichitr, L., Ingsriswang, S., Shah, H., Yao, S., Jin, T., Huang, J., Shi, W., Sun, Q., Fan, G., Li, W., Li, X., Kurtboke, I., and Ma, J., The global catalogue of microorganisms 10K type strain sequencing project: closing the genomic gaps for the validly published prokaryotic and fungi species. Gigascience, 2018. 7(5), DOI: 10.1093/gigascience/giy026.54.Springer Nature. Research data policies. [accessed 2019 Jul 26]; Available: . Sharing research data. [accessed 2019 Jul 26]; Available: . Database Linking. [accessed 2019 Jul 26]; Available: , M.A.F., Zimmerman, K.J., and Teeter, K.C., Data Sharing: How Much Doesn't Get Submitted to GenBank? PLoS Biol, 2006. 4(7): p. e228, DOI: 10.1371/journal.pbio.0040228.58.National Center for Biotechnology Information. BioSample. [accessed 2019 Jul 26]; Available: Nucleotide Sequence Database Collaboration. The DDBJ/ENA/GenBank Feature Table Definition. 
[accessed 2019 Aug 06]; Available: , S., Ciufo, S., Starchenko, E., Darji, D., Chlumsky, L., Karsch-Mizrachi, I., and Schoch, C.L., The NCBI BioCollections Database. Database (Oxford), 2018. 2018, DOI: 10.1093/database/bay006.61.National Center for Biotechnology Information. BioSample Documentation. [accessed 2019 Aug 08]; Available: Standards Consortium. Homepage Genomic Standards Consortium. [accessed 2019 Aug 08]; Available: DOI Foundation. The DOI system. [accessed 2019 Jul 26]; Available: , J., Melero-Fuentes, D., Gumpenberger, C., and Valderrama-Zurian, J.C., Availability of digital object identifiers (DOIs) in Web of Science and Scopus. Journal of Informetrics, 2016. 10(1): p. 98-109, DOI: 10.1016/j.joi.2015.11.008.65.Canese, K. PubMed Celebrates its 10th Anniversary! NLM Tech Bull., 2006. 352:e5.66.National Center for Biotechnology Information. NLM Catalog: Journals referenced in the NCBI Databases. [accessed 2019 Aug 08]; Available: , G., Barker, K., Seberg, O., Coddington, J., Benson, E., Berendsohn, W.G., Bunk, B., Butler, C., Cawsey, E.M., Deck, J., Doring, M., Flemons, P., Gemeinholzer, B., Guntsch, A., Hollowell, T., Kelbert, P., Kostadinov, I., Kottmann, R., Lawlor, R.T., Lyal, C., Mackenzie-Dodds, J., Meyer, C., Mulcahy, D., Nussbeck, S.Y., O'Tuama, E., Orrell, T., Petersen, G., Robertson, T., Sohngen, C., Whitacre, J., Wieczorek, J., Yilmaz, P., Zetzsche, H., Zhang, Y., and Zhou, X., The Global Genome Biodiversity Network (GGBN) Data Standard specification. Database (Oxford), 2016. 2016, DOI: 10.1093/database/baw125.68.Global Biodiversity Information Facility. Homepage GBIF. [accessed 2019 Aug 06]; Available: Information Standards. Homepage TDWG. [accessed 2019 Aug 06]; Available: . Synthesis of Systematic Resources Hompage. [accessed 2019 Sep 26]; Available: . Consortium of European Taxonomic Facilities Homepage. [accessed 2019 Sep 26]; Available: . Global Genome Diversity Network Homepage. 
[accessed 2019 Sep 26]; Available: , A., Hyam, R., Hagedorn, G., Chagnoux, S., Ropert, D., Casino, A., Droege, G., Glockler, F., Godderz, K., Groom, Q., Hoffmann, J., Holleman, A., Kempa, M., Koivula, H., Marhold, K., Nicolson, N., Smith, V.S., and Triebel, D., Actionable, long-term stable and semantic web compatible identifiers for access to biological collection objects. Database (Oxford), 2017. 2017(1), DOI: 10.1093/database/bax003.74.Penev, L., Mietchen, D., Chavan, V., Hagedorn, G., Remsen, D., Smith, V., and Shotton, D. Pensoft Data Publishing Policies and Guidelines for Biodiversity Data. 2011.75.Zenodo. Custom GBIF Occurrence Download. 2019 Jun 18 [accessed 2019 Aug 06]; Available: , L. and Ma, J., The Global Catalogue of Microorganisms (GCM) 10K type strain sequencing project: providing services to taxonomists for standard genome sequencing and annotation. Int J Syst Evol Microbiol, 2019. 69(4): p. 895-898, DOI: 10.1099/ijsem.0.003276.77.U. S. Department of Energy Joint Genome Institute. Phylogenetic Diversity. [accessed 2019 Jul 29]; Available: Center for Biotechnology Information. The /country qualifier. [accessed 2019 Jul 29]; Available: , J. and Scholz, A.H., Microbiological Research Under the Nagoya Protocol: Facts and Fiction. Trends Microbiol, 2017. 25(2): p. 85-88, DOI: 10.1016/j.tim.2016.11.001.80.Barrett, T., Clark, K., Gevorgyan, R., Gorelenkov, V., Gribov, E., Karsch-Mizrachi, I., Kimelman, M., Pruitt, K.D., Resenchuk, S., Tatusova, T., Yaschenko, E., and Ostell, J., BioProject and BioSample databases at NCBI: facilitating capture and organization of metadata. Nucleic Acids Research, 2012. 40(D1): p. D57-D63, DOI: 10.1093/nar/gkr1163.81.Jefferson, O.A., Köllhofer, D., Ajjikuttira, P., and Jefferson, R.A., Public disclosure of biological sequences in global patent practice. World Patent Information, 2015. 43: p. 12-24, DOI: 10.1016/j.wpi.2015.08.005.82.European Patent Office. 
Guidelines for Examination: Reference to sequences disclosed in a database. [accessed 2019 Jul 29]; Available: Center for Biotechnology Information. Basic Local Alignment Search Tool. [accessed 2019 Aug 08]; Available: Intellectual Property Organization 2009. Handbook on Industrial Property Information and Documentation. Standard for the Presentation of Nucleotide and Amino Acid Sequence Listings in Patent Applications - Standard ST.25.85.World Intellectual Property Organization 2019. Handbook on Industrial Property Information and Documentation. Recommended Standard for the Presentation of Nucleotide and Amino Acid Sequence Listings Using XML (Extensible Markup Language) - Standard ST.26.86.Haber, S. and Stornetta, W.S., How to Time-Stamp a Digital Document. J. Cryptology, 1991. 3(2): p. 99-111, DOI: 10.1007/BF00196791.87.Nakamoto, S. Bitcoin: A Peer-to-Peer Electronic Cash System. 2008.88.Digiconomist. Bitcoin Energy Consumption Index. [accessed 2019 Jul 29]; Available: 89.Yli-Huumo, J., Ko, D., Choi, S., Park, S., and Smolander, K., Where Is Current Research on Blockchain Technology?-A Systematic Review. Plos One, 2016. 11(10): p. e0163477, DOI: 10.1371/journal.pone.0163477.90.DNAtix. DNAtix Hompage. [accessed 2019 Sep 18]; Available: Genomics. Nebula Hompage. [accessed 2019 Sep 18]; Available: Inc. LunaDNA Homepage. [accessed 2019 Sep 18]; Available: Lidsky (CEO DNAtix) 2019.94.Earth Biogenome Project. Homepage. [accessed 2019 Sep 26]; Available: 95.National Human Genome Research Institute. The Human Genome Project. [accessed 2019 Sep 26]; Available: 96.Lewin, H.A., Robinson, G.E., Kress, W.J., Baker, W.J., Coddington, J., Crandall, K.A., Durbin, R., Edwards, S.V., Forest, F., Gilbert, M.T.P., Goldstein, M.M., Grigoriev, I.V., Hackett, K.J., Haussler, D., Jarvis, E.D., Johnson, W.E., Patrinos, A., Richards, S., Castilla-Rubio, J.C., van Sluys, M.A., Soltis, P.S., Xu, X., Yang, H., and Zhang, G., Earth BioGenome Project: Sequencing life for the future of life. 
Proc Natl Acad Sci U S A, 2018. 115(17): p. 4325-4333, DOI: 10.1073/pnas.1720115115.97.Earth Bank of Codes. Homepage. [accessed 2019 Sep 26]; Available: 98.Google Cloud. Google Genomics Homepage. [accessed 2019 Sep 26]; Available: Web Services. High Performance Computing. [accessed 2019 Sep 26]; Available: AB. Spotify Homepage. [accessed 2019 Jul 29]; Available: flix International B.V. Netflix Homepage. [accessed 2019 Jul 29]; Available: Inc. Apple Homepage. [accessed 2019 Jul 29]; Available: Academic. NAR Database Summary Paper Category List. [accessed 2019 Aug 08]; Available: Center for Biotechnology Information. Genetic Sequence Data Bank Distribution Release Notes. 2019 [accessed 2019 Aug 06]; Available: . Country Codes Alpha-2 & Alpha-3. [accessed 2019 Aug 08]; Available: A/S. Novozymes Homepage. [accessed 2019 Jul 26]; Available: GmbH. TraitGenetics Homepage. [accessed 2019 Jul 26]; Available: SE. BASF Homepage. [accessed 2019 Jul 26]; Available: Nations, Department of Economic and Social Affairs, and Population Division. World Population Prospects. 2019 [accessed 2019 Aug 08]; Available:

8. Technical Methods

Below we provide a section-by-section explanation of the approaches employed.

8.1 Analysis of the public database inventory

This section describes the methods used within Section 3.2.

The NAR Database Issue divides the list of 1,778 database entries, constituting 1,613 different databases, into 15 categories [103].
The first two categories, named “Nucleotide Sequence Databases” and “RNA sequence databases”, focus on NSD. There was no description or definition of the categories, and it could not be excluded that databases within other categories would also allow the upload of NSD. Therefore, the categories “Genomics Databases (non-vertebrate)”, “Human and other Vertebrate Genomes” and “Plant databases” were also included in the analysis.

These categories contained 808 entries, which were analysed by hand. Information was obtained both from the texts available on the webpages of the databases and from the publications on the databases, where these existed. The first selection step was to sort out the database entries that covered human NSD only, leaving 743 entries. The second step was to select only those entries that potentially allow the upload of NSD.

This list of 743 entries was then analysed in depth (see Acknowledgements), leading to 38 databases that allow user submission of NSD. The detailed analysis excluded entries for several reasons. The upload function or the entire database could have been shut down. This happens often, as public databases are primarily created by research groups that have to use their own researchers and staff members to administer the database, which takes working time away from other projects. Two entries could link to the same database, as the NAR database issue lists publications: when a database update is published, both the old and the new publication can be found in the NAR database issue. Similarly, many entries referred to updates and new features of databases from GenBank and EBI. Many databases contain a section named “data submission” or “submit data”. This can refer either to uploading data into the database or to using a bioinformatic tool that only processes the input data and returns a result. In the latter case, no data is uploaded into the database.
Additionally, it often only became apparent on closer examination that a database did not allow the upload of NSD or covered human NSD only.

The final step was to determine whether the uploaded NSD is linked to the INSDC. Several different criteria indicate a link to the INSDC. The database could state that it submits its NSD regularly to the INSDC, or that it requires either PubMed IDs or ANs for an upload, indicating that the data has to be at the INSDC already. In many cases more than one of the criteria were fulfilled.

8.2 Analysis of GenBank dataset

This section describes the underlying dataset relevant to Sections 3.4, 3.5 and 4.2.

The analyses of the NSD currently stored in public databases presented in this study were done using bioinformatic queries of a downloaded copy of GenBank [104], an official release dated April 15, 2019. All information published here is publicly available and can also be queried using the GenBank browser. However, since new data is added to the GenBank browser every day, we chose to work with a local copy so that all analyses were standardized to a single point in time.
Additionally, the GenBank server can be very slow during peak use times and, since our inquiries covered the entire Nucleotide database, the local copy provided greater efficiency and response speeds. Our analyses focused on key properties (e.g., taxonomic distribution, size) of the stored NSD, as well as tracking and tracing information such as documentation of traceability to GR and country of origin. Randomly sampled NSD entries were checked for the validity of the stated country of origin, or the validity of the absence of a country of origin, as determined by the associated scientific publication.

GenBank entries contain a metadata field providing a taxonomic identification number (TaxID). In Figure 3, the GenBank entries were sorted by their taxonomic identity. The count for model organisms was obtained by adding together all the sequences of model organisms, which were then subtracted from their respective taxa. For example, all the sequences of mouse (Mus musculus) were added to “model organisms” and this count was subtracted from the total sum of “animals”, in order to avoid double counts. In a second step, the total bases of the NSD in each category were counted.

8.3 User data from GenBank

This section describes the methods used within Section 3.5.

User data was requested from GenBank. We received an Excel sheet containing the web activity and the user numbers for the GenBank database and tools. The data is divided by countries and years (from 2014 to 2018; only the 2018 data was used). Web activity is the count of how many times a web page of GenBank was accessed. However, there is no defined standard for web activity or for what counts as accessing a webpage. Analytical tools different from those used by GenBank might thus lead to different results. Users were counted via an approach that counts unique combinations of IP addresses and web browser cookies.
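The two measures described above can be illustrated with a minimal sketch. It assumes a hypothetical request log of (country code, IP address, cookie identifier) records; the record layout, the function name and the sample values are illustrative only and do not reflect GenBank's actual analytics tooling, which was not available to us.

```python
from collections import defaultdict


def count_activity_and_users(requests):
    """Return {country: (web_activity, unique_users)} for a request log.

    web_activity counts every request; unique_users counts distinct
    (IP address, cookie) combinations, as described in the text.
    """
    hits = defaultdict(int)
    users = defaultdict(set)
    for country, ip, cookie in requests:
        hits[country] += 1
        users[country].add((ip, cookie))
    return {country: (hits[country], len(users[country])) for country in hits}


# Hypothetical records: (ISO 3166 alpha-2 code, IP address, cookie ID)
sample_log = [
    ("DE", "192.0.2.1", "cookie-a"),
    ("DE", "192.0.2.1", "cookie-a"),     # repeat request, same combination
    ("DE", "192.0.2.1", "cookie-b"),     # same IP, different browser cookie
    ("BR", "198.51.100.7", "cookie-c"),
]

result = count_activity_and_users(sample_log)
print(result)
```

In this toy log, "DE" has three requests but two unique combinations, which shows how web activity and user numbers can diverge for the same country.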
Counting such combinations is a more accurate approach than counting IP addresses alone, since a computer can have more than one IP address, which would arbitrarily increase the counts.

The countries are divided by the alpha-2 country code (ISO 3166) [105], which includes overseas territories. Some overseas territories were not listed, because they are uninhabited and/or have no internet access, whilst others were listed but had no requests/users. Both cases were treated as having zero requests/users and excluded from the list (this only becomes relevant for method Section 8.6).

8.4 Private database case studies

This section describes the methods used within Section 3.6.

For information on private databases, companies aware of the DSI topic were asked to participate in an interview. From these interviews, short case studies were created to exemplify the content of private databases and their usage. Due to the short time frame, interview requests were focused on direct contacts of the persons and institutes conducting this study and on those representatives from industry who are active within the CBD process. These contacts were also asked to establish contact with, or forward our request to, persons/companies in their network whom they knew to have background knowledge of DSI discussions. This approach led to direct contact with 20 companies and indirect contact with an unknown number (an estimated 10-20 additional companies).
The interviews were semi-structured and planned for 45 minutes, with 15 minutes up front for explanations and questions. The case studies were drafted based upon written notes of the interview and modified until both parties agreed with the resulting summary. Six interviews were conducted and six case studies obtained from them. Three case studies are anonymized, either at the request of the company or because the process of getting approval to state the name exceeded the available time. The option of allowing anonymous submissions was approved by the CBD Secretariat.

The number of companies reached indirectly is unknown because we were only notified if other companies were interested, not how many companies/persons were asked.

Afterwards, Table 1 was created to summarize the results.
The table was sent to every interviewed company so that potential gaps could be filled in (thus, some of the information in the table may not be found in the case studies).

Case study 1: Novozymes A/S [106]

Novozymes A/S is an international biotech company headquartered in Denmark, with over 6,000 employees globally. It focuses on the development and production of enzymes. Novozymes has a main research database and additional databases, which control the flow of DSI in the product development pipeline. The NSD+SI primarily originates from microorganisms, with roughly 50% of the DSI coming from public databases. However, this share is likely shifting towards internal DSI, as the amount of internally generated, highly curated DSI is growing faster than the amount in the public sphere. Novozymes currently stores ca. 500 million protein sequences, coming from both public and private sources. Novozymes undertakes bioprospecting projects around the world, both on its own and together with public institutions. If a project is completely funded by Novozymes and involves no public collaborations, the DSI is by default not published but kept in the internal database. Novozymes always aims to refer to the country of origin of the NSD and other metadata in patenting activities.

Case study 2: Company X

Company X is an international corporation headquartered in Europe, with over 20,000 employees. It is active in the fields of health, nutrition and materials.
All these fields include biotechnological R&D. Due to its different fields of research, Company X has many scattered databases, storing very different types of NSD+SI. The data is obtained from public databases and then curated and integrated into internal databases. Besides these large databases, smaller ones also exist for the microbial strain collections of Company X and for licensed or patented NSD. The ratio of public to private NSD+SI inside the databases is not known exactly, but the total amount of private NSD+SI is likely less than 0.1% of the amount of DSI stored in public databases. Bioprospecting, and thus internal NSD+SI, is limited solely to microorganisms. However, for enzyme discovery, NSD+SI of higher organisms is accessed through public databases. There are many interconnections between Company X and other companies with regard to NSD+SI. Bioprospecting is conducted both for internal reasons and as a service for other companies. Company X uses commercial patent sequence databases.

Case study 3: Company Y

Company Y is an international corporation headquartered in Europe, with over 2,000 employees. Much of the company's DSI-related activity involves agricultural plant breeding and seed production for farmers. The DSI databases of Company Y are divided vertically according to the type of NSD used (raw sequence, annotation, 3D structures, etc.) and, in some cases, horizontally according to different kinds of crops. Sequencing of genetic material is done both in-house and externally. The databases are focused on plants, but may also include information on plant pathogens. Company Y uses patent NSD databases.

The underlying genetic material comes from internal breeding programs plus collaborations with public and private partners around the world. If sequence information is produced within a publicly funded project, the information is normally published and submitted to a public database.
Some NSD+SI collaborations involve only contract services to and from other companies. The percentage of NSD+SI in the internal databases that comes from public databases depends on the crop. In general, the more public research has been done on a crop, the more NSD+SI is usually available in public databases. For example, for most cereal crops, the percentage of public NSD+SI used is estimated to be in the vicinity of 50%, while for most dicotyledonous crops it is lower. When accessing NSD+SI through the INSDC, Company Y uses an INSDC service (presumably ftp) that ensures that no third parties can track the exact sequences accessed; thus, the service prevents competitors from identifying actual research projects that are ongoing at Company Y.

Case study 4: TraitGenetics [107]

TraitGenetics is a company with around 20 employees located in Germany. Since 2018, it has been part of the multinational company SGS, headquartered in Switzerland. TraitGenetics develops molecular markers and performs genotyping as a service for customers in plant breeding companies and plant research institutions. Molecular markers are used in breeding to identify traits and characteristics of individual plants. As TraitGenetics is the only part of SGS working primarily on molecular markers, it holds its own DSI database. This database focuses solely on NSD+SI on sequence polymorphisms in plants.
The information it contains comes from both public databases and private sources, the latter being either NSD+SI provided by customers or NSD+SI from internal genome sequencing projects. The databases are not used for patent applications. Customers are companies and academic institutions, including CGIAR institutions, involved in agriculture and breeding all around the world. As TraitGenetics only receives DNA or genetic material through customers/partners, it expects that all material received from the customers is compliant with the Nagoya Protocol, the CBD and the national legislation of the provider country.

Case study 5: BASF SE [108]

BASF SE is an international multi-sectoral company headquartered in Germany, with over 122,000 employees worldwide. It has R&D programs in almost every industry, including agricultural biotech applications in its biological sections and industrial biotech applications in its chemical sections. One example is reducing the carbon footprint of chemical products. Due to the diversity of R&D activities, the databases used by different sections to manage information are diverse in size, structure and content, and in the manner in which processes are run. The databases contain a mixture of sequence data from the public domain and sequence data generated in-house. A large part of the nucleotide and protein sequences generated will eventually be shared with public databases via publications of all kinds.
It is estimated that the average percentage of public sequence data in the databases used within the biological sections of BASF SE is between 50% and 90%, with the total storage exceeding one terabyte in size. The content of the INSDC is downloaded on a regular basis. The reason for this is not only to have the data ready at hand, but also to allow browsing of the data without giving potential competitors the chance to track the browsing profile and thus get an indication of projects currently running. Collaborations with public and private partners have occurred over many decades all around the world and are still an important part of the R&D. The country of origin can be obtained for all nucleotide sequences, with two exceptions: 1) the country of origin of the sequence data from public databases does not exist or is not provided; or 2) the sequence data comes from a third party and the country of origin can no longer be obtained.

Case study 6: Company Z

Company Z is a biotech company based in the USA with around 350 employees, which produces and supplies recombinant enzymes for the life sciences and is focused on enzymes for DNA handling. The NSD+SI used, accessed and generated by Company Z is solely focused on enzymes and the microorganisms that produce the enzymes. The NSD+SI stored is in the order of magnitude of one terabyte. However, the main interest is the enzymology and, in particular, interaction information of the enzymes, which uses more storage space than mere nucleotide sequences. The majority of NSD+SI is either derived from public databases or submitted to public databases, as the policy of Company Z is to publish as much of its own NSD+SI as possible. For this reason, Company Z runs two public databases where NSD+SI is submitted and made publicly available. One database contains NSD+SI on restriction enzymes and the other NSD+SI on polymerases. A large part of the company's private NSD+SI is on genetic constructs.
These are artificially generated plasmids for the development and production of the enzymes. Company Z collaborates with public institutions worldwide in research projects which lead to the generation of new NSD+SI. For newly generated NSD, the origin is always retrievable. However, the country of origin of NSD is not always available, for two reasons: 1) NSD that stems from public databases often lacks a country of origin; and 2) for NSD which predates the CBD or the Nagoya Protocol and came from external sources (e.g., a researcher at a university), the country of origin is no longer retrievable.

8.5 Analysis of GenBank NSD entries

This section describes the methods used within Section 4.2.

Analysis of entries with country tag

A random set of 150 non-human NSD entries with country tag was extracted from the GenBank dataset (Section 8.2). Both the entry and connected publications were checked for information on the country of origin. This information could often be found in descriptions of geographical origin in either the GenBank entries or the publications. For 108 of the 150 random samples a cited publication was found, constituting 72% of all samples. The publication was not always directly linked via a PubMed ID, but sometimes just indicated as a reference that could be found via internet search. However, for only 94 of the 108 samples with cited publications was a publication accessible (using the academic accounts of the authors).

| Number of samples | Incorrect country tag | Correct country tag | No information |
|---|---|---|---|
| 150 | 0 | 86 (57%) | 64 (43%) |

Table 2. Check of random samples with country tag. Total numbers and percentages of the sample set and the subgroups for which the country tag was verifiable (correct country tag), falsifiable (incorrect country tag) or for which no information was available.

Analysis of entries without country tag

A random set of 660 non-human NSD entries without country tags was extracted from the GenBank dataset.
In total, 310 entries could be linked to a publication, but only 282 could be accessed using the academic account of the authors.

As the sample number was rather high and the samples were randomly selected, large sequencing projects appeared within the set with more than one entry. For example, of the 660 samples, 140 belonged to just 38 different publications. This includes all 9 environmental entries from Ecuador and all 7 environmental entries from Finland, each group coming from a single publication. Another important aspect is that the 375 samples for which no country of origin was obtainable may include artificial and synthetic NSD, for which a country of origin is not applicable.

For two additional entries without publication, the country of origin could be obtained from the GenBank entry itself. In these cases, the country tag was not filled out, but the country information was given in other metadata fields. To reduce complexity, these two entries were ignored in the analysis.

| Number of samples | Publication accessible | Country could have been reported? | Origin of these 124 entries |
|---|---|---|---|
| 660 | 282 (43%) | Yes: 124 (44%); No: 158 (56%) | 48% other; 46% environment; 6% human microbial |

Table 3. Check of random samples without country tag. Shown are the total numbers and percentages of the sample set and the subgroups for which a publication was accessible and for which the country of origin could be obtained. Additionally, the origin of the entries was identified. The category "other" could be model organisms, domesticated/in-bred crops, and other GR that did not fall clearly into one of the other categories.

Of the 660 entries, 282 had a publication accessible with our institute's available subscriptions and could be used for this analysis. For 124 entries the country information could be obtained from the accessed publications, constituting 44% of all entries with an accessible publication.
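The percentages in Table 3 use three different denominators (all 660 samples, the 282 entries with an accessible publication, and the 124 entries with a reportable country), which is easy to misread. The following is a minimal Python sketch of that arithmetic, not part of the original analysis; the variable names are illustrative, and the origin counts (57, 60 and 7) are those given in the text discussing Table 3.

```python
def percent(part: int, whole: int) -> int:
    """Return part/whole as a percentage rounded to the nearest integer."""
    return round(100 * part / whole)

# Counts as reported for the sample without country tag.
TOTAL_SAMPLES = 660           # random entries without country tag
PUBLICATION_ACCESSIBLE = 282  # entries with an accessible publication
COUNTRY_REPORTABLE = 124      # country found in the publication
COUNTRY_NOT_REPORTABLE = 158  # no country found in the publication
ORIGIN_COUNTS = {"environment": 57, "other": 60, "human microbial": 7}

table3 = {
    # Denominator: all 660 samples.
    "publication accessible": percent(PUBLICATION_ACCESSIBLE, TOTAL_SAMPLES),
    # Denominator: the 282 entries with an accessible publication.
    "country reportable": percent(COUNTRY_REPORTABLE, PUBLICATION_ACCESSIBLE),
    "country not reportable": percent(COUNTRY_NOT_REPORTABLE, PUBLICATION_ACCESSIBLE),
}
# Denominator: the 124 entries with a reportable country.
origin_shares = {name: percent(count, COUNTRY_REPORTABLE)
                 for name, count in ORIGIN_COUNTS.items()}

print(table3)
print(origin_shares)
```

Under these counts, the three origin categories add up to the 124 entries for which a country could have been reported.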
Of the 124 entries with an identified country of origin, 57 (46%) came from the environment, e.g. wildlife or environmental samples. Of these 57 entries, 16 were from the USA, 9 from Ecuador, and 7 each from China and Finland. The remaining 18 entries belonged to several other countries with 1 or 2 entries each.

Seven entries (6%) were microorganisms and viruses isolated from human hosts, in which case the location of the humans at sampling was interpreted as the country of origin. The remaining 60 entries (48%) are from "other" categories, such as cultivations, for example microorganisms grown in a laboratory environment or domesticated crops. However, such cultivations could also originate from an environmental sample. A deeper analysis of the origin of cultivation entries was beyond the time available.

The 158 entries with a publication for which no country could be reported include entries for which the country of origin may not be applicable or defined. For example, at least 28 of these 158 entries (18%) constitute NSD from artificial constructs, which result from laboratory research. Here, no underlying GR may have been used in the creation of this NSD.

When the information on the country of origin was obtainable from a related publication, the submitter should have been able to fill out the country tag. At least for the environmental samples, the country of origin should be evident to the submitter. In the case of cultivations, isolations, etc., the submitter may be unsure and thus simply prefer to leave this field unfilled.

8.6 World maps

This section describes the methods used in Sections 3.5 and 4.2, including Figures 5a and 5b.

Figures 5a-c and 8a-c were constructed using a final dataset/Excel sheet, which was compiled in the following way:

User data existed for all sovereign countries, except for the states of Kiribati and Tuvalu. Similarly, for several overseas territories no user data existed.
This may be because these territories are uninhabited or have no official internet connection. When this was the case, the sequence entries of the respective overseas territory were added to the sovereign country, e.g. sequence entries from the U.S. Minor Outlying Islands were added to the count of the USA. As a result, in our calculations some overseas territories have user and sequence data and some do not. However, as the numbers of users, usage and sequences of overseas territories were several orders of magnitude smaller than those of their sovereign countries, adding or leaving out their numbers does not change the overall results.

The population data for Figure 5c was obtained from the UN Population Division [109]. Here, overseas territories and small island states did not have individual population numbers, but clustered ones, e.g. "small pacific islands". Thus, their population data is missing in the data set (those territories and states are not visible in Figure 5c). We removed the Faroe Islands (rank 2) and Puerto Rico (rank 8) from the top 10 ranking, as they are not sovereign states.

The country of origin data shown in Figure 8a was obtained from the GenBank dataset (Section 8.2).
It lists the number of entries for each country tag, which had to be processed manually, as several names stood for the same country. For example, there were different spellings of Cote d'Ivoire (Ivory Coast), which each had their own count and were thus manually added up. Some entries could not be mapped to a country/territory and are thus not shown/integrated in the world maps. This applies to all entries that had an ocean as country tag, e.g. Atlantic Ocean, as well as some entries which could not be mapped unambiguously to a single political country. The latter consist of entries from two geographic regions that contain more than one country, Borneo and Korea, and entries from former states that have since split into several countries, such as the Soviet Union or Czechoslovakia. However, the total number of all such entries mentioned in this paragraph (except entries from oceans) is far less than 0.1% of the total number of entries and can be considered negligible for this analysis.

In Figures 8b+c, the usage and the users per country were divided by the number of GenBank entries with a country tag from the respective country. From the top 10 lists, we removed all non-sovereign territories and city states.

In Figures 9a+b, as well as Figure 10, there was no manual adaptation of the data obtained from GenBank (e.g. no addition of downloads from the U.S. Minor Outlying Islands to the USA). This data again makes up far less than 0.1% of the total number of downloads and can therefore be neglected.

8.7 Similarity of short nucleotide sequences

The table below shows the theoretical probabilities of randomly having two identical sequences of the same length within a given set of sequences. It is important to note that this percentage depends not only on the length of the sequence itself, but also on the size of the total data set. For example, the probability of finding an identical sequence within the human genome is lower than that of finding the same sequence in GenBank.
Therefore, with continuously increasing amounts of NSD entries, the probability of identical sequences increases.

| Data set | 10 bp sequence | 20 bp sequence | 25 bp sequence | 30 bp sequence |
|---|---|---|---|---|
| Human Genome ($3 \times 10^{9}$ bp) | 100% | 0.3% | ~0% | ~0% |
| GenBank ($1.65 \times 10^{12}$ bp) | 100% | 77.7% | 0.15% | ~0% |
| GenBank x10 | 100% | 99.9% | 1.5% | ~0% |
| GenBank x100 | 100% | 100% | 13.6% | ~0% |

Table 4. Probability of a random sequence appearing by chance within different datasets, dependent on its length. This is a purely mathematical calculation, not taking into account that nucleotide sequences are not completely independent of biology (see Section 5.5). The formula is given by $1-(1-1/4^{k})^{N}$, where $k$ is the length of the sequence and $N$ the length/size of the data set. (~0% indicates that the number is more than 25 positions behind the decimal point.)
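The formula given with Table 4 can be evaluated directly. The following is a minimal Python sketch under stated assumptions, not the authors' original calculation: the function name `chance_match_probability` and the dictionary `DATASETS` are illustrative, and the data-set sizes are taken from the table's row labels. Because $1/4^{k}$ is far smaller than floating-point precision relative to 1 for long sequences, the sketch uses `math.log1p` and `math.expm1` instead of evaluating the power naively.

```python
import math

def chance_match_probability(k: int, n: float) -> float:
    """Evaluate 1 - (1 - 1/4**k)**n in a numerically stable way.

    k: length of the query sequence in base pairs.
    n: size of the data set in base pairs.
    """
    per_position = 4.0 ** -k  # chance that one random k-mer matches
    # (1 - p)**n == exp(n * log1p(-p)); expm1 keeps precision near zero.
    return -math.expm1(n * math.log1p(-per_position))

# Data-set sizes in base pairs, as labelled in the table (assumed inputs).
DATASETS = {
    "Human Genome": 3e9,
    "GenBank": 1.65e12,
    "GenBank x10": 1.65e13,
    "GenBank x100": 1.65e14,
}

for name, size in DATASETS.items():
    row = [f"{100 * chance_match_probability(k, size):.4g}%"
           for k in (10, 20, 25, 30)]
    print(f"{name:>13}: " + "  ".join(row))
```

Under these assumptions, the 20 bp entry for GenBank evaluates to roughly 77.7%, in line with the table. The stable form matters mainly for the 30 bp column, where a naive evaluation of `(1 - 4**-30)` rounds to exactly 1 and returns a probability of zero.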