Introduction
While there is a significant literature on biodiversity loss, efforts to survey biodiversity using high-throughput data acquisition technologies remain limited. Today, scientists recognise the important roles that genetic and genomic data (e.g. reference genomes, DNA/RNA barcoding approaches, metagenomics and metabarcoding) can play in biodiversity discovery, assessment, monitoring, conservation, and restoration, and their impact on policy and decision-making processes.
These research activities present unique data management challenges, especially in terms of complexity, data integration, and the need for interoperability across diverse datasets. Some of these challenges include:
- Managing biological resources (samples, associated specimens or genetic resources): this requires compliance with national and international frameworks and adherence to data sharing principles (FAIR principles[1]) and ethical principles (CARE principles[2]).
- Metadata in biodiversity research: metadata must go beyond basic descriptions to include detailed context, such as geographical locations, temporal data, methods, and environmental conditions, ideally using standard vocabularies (terminology) and ontology terms. Together, these help ensure that the data can be more easily discovered and effectively reused by both humans and machines.
- Data management and integration systems: current systems often still fail to maintain critical links between the data, the metadata, the physical specimens, and the taxonomic information (e.g., correct scientific names) from which the data originates. Moreover, integrating molecular data (e.g., genomic sequences) with ecological, behavioural, or morphological datasets can be complex due to differences in formats, scales, and metadata structures.
- Taxonomic harmonisation: taxonomy is an evolving field, and linking data to and through taxonomy is challenging due to the fragmentation of available resources and often unharmonised practices.
- Handling and processing large-scale biodiversity data: handling molecular data (e.g., genomics or transcriptomics data) requires bioinformatics pipelines that are computationally intensive and must scale to large datasets.
- Tracking provenance: ensuring proper tracking of the provenance, updates, and transformations of datasets is crucial, especially in collaborative biodiversity projects involving multiple stakeholders.
Addressing these challenges requires collaborative solutions involving better metadata practices, advanced bioinformatics tools, improved data integration platforms, and standardization efforts across the global biodiversity research community.
In addition to the specific data management challenges faced in biodiversity research, there is a need to move towards Linked Open Data in biodiversity and a linked Biodiversity Knowledge Graph (see machine actionability for further information on knowledge graphs).
This shift could fundamentally transform the landscape of biodiversity research by enabling more efficient data integration, discovery, and re-use. Moving towards a Biodiversity Knowledge Graph represents an ambitious but essential step in modernizing biodiversity research. While significant challenges remain, the potential for improved data integration, accessibility, and re-use across multiple disciplines could transform the field and open up new avenues for understanding and conserving biodiversity.
Biological resource management and compliance
Description
Before starting your data collection, you need to plan how you will manage the biological resources (samples, associated specimens or genetic resources) and the data related to your experiment. Managing biological resources - such as samples, associated specimens, and genetic materials - requires strict compliance with national and international frameworks like the Nagoya Protocol and adherence to ethical principles such as the CARE (Collective Benefit, Authority to Control, Responsibility, and Ethics) principles[2]. While these frameworks are essential for promoting equitable benefit-sharing and responsible stewardship, they introduce significant data management challenges. Tracking the origin, consent, ownership, and permitted use of biological materials demands accurate and well-maintained metadata, persistent identifiers, and clear documentation of legal and ethical constraints. These requirements often vary across jurisdictions and can evolve over time, making it difficult to standardise processes across institutions and platforms.
Here, we highlight key considerations to keep in mind when managing biological resource data, and we suggest relevant documents, standards, and frameworks to guide compliance and responsible data stewardship.
Considerations
- Do you have the required national and international permits for sample collection?
- Will your sampling include Indigenous locations and/or be associated with traditional knowledge?
- What biological material should be biobanked and where (samples, specimens, genetic resources)?
- How will the data be preserved and shared?
Solutions
- Ensure that due diligence is carried out regarding Nagoya Protocol compliance, possibly with the help of your national focal point or institutional help desk. The Global Genome Biodiversity Network (GGBN) has developed an ABS FactSheet and answer page to help their network of biobanks comply with the Nagoya Protocol. More information is available on the Compliance monitoring & measurement page. The main steps are summarised below:
- Check whether the countries where the samples are to be collected have signed the protocol, and identify the national entities that are issuing the permits for sample collection through the Access and Benefit Sharing Clearing House (ABSCH)
- If required, obtain the Prior Informed Consent (PIC) and Mutually Agreed Terms (MAT) and the final national permit of collection from the relevant national focal point(s).
- Submit a Due Diligence declaration to the European Web-portal DECLARE (for European researchers) or to the ABSCH to get your Internationally Recognized Certificate of Compliance (IRCC).
- Ensure that the CARE principles are taken into account. If the samples are related to Indigenous Peoples and/or Local Communities, engage with the relevant communities during the planning phase of the experiment[3]. Mc Cartney et al. offer a framework, grounded in environmental justice and the CARE principles, for biodiversity genomic researchers, projects, and initiatives to support and promote the building of trustworthy and sustainable partnerships with Indigenous Peoples & Local Communities. Among other topics, engagement should consider:
- How the community will be able to access the specimens/samples collected and the data generated.
- How the data will be labelled, accessed, and reused, including by the Indigenous communities for their own purposes, using, for example, Local Contexts Labels and Notices.
- Ensure that your Data Management Plan includes a section on specimen or biomaterial management, or develop a specific Specimen management plan (see Bentley A. et al.[4] for proposed guidelines). According to these guidelines, the plan should include:
- The definition of the collections repository/biobank where the biomaterial will be stored.
- The type and anticipated number of specimens and/or samples, and the metadata and associated data to be collected.
- Plans for collection and preservation of the biomaterial that should be in line with established best practices for the relevant organisms (including the expectations of specimen curation and care).
- Plans for making the specimens/biomaterial available to the research community and for metadata publication and linkage to the data (also see recommendations for the digital extended specimen [5]).
- You can find more information on Biobanking and data management in Alkhatib R. et al.[6].
Biological material: metadata collection and publication
Description
Collecting and organizing rich metadata for biological materials, such as those stored in biobanks or specimen collections, is essential for enabling data interpretation, reuse, and integration across biodiversity studies. These materials are often linked to other types of data, including genomic sequences, images, phenotypic traits, and environmental context, making consistent metadata especially critical. However, a major challenge lies in identifying and applying suitable metadata standards, as multiple frameworks exist with varying levels of compatibility, coverage, and specificity. This complexity can hinder the accurate and interoperable description of samples, their origin, and associated information. In addition, managing persistent identifiers and maintaining consistent metadata across biological materials and their associated data is crucial for ensuring long-term traceability and interoperability.
Here, we suggest commonly used metadata standards, as well as repositories and registries for metadata publication, to support effective and FAIR-compliant biobanking and biodiversity data management.
Considerations
Certain core metadata—such as collection date, biome, and geographical location—should always be collected to ensure basic contextualization and future usability. Beyond these essentials, the specific research context and use case will determine which additional metadata are most valuable. Key considerations include:
- Type of biological material: What kind of sample is being collected and analyzed (e.g., tissue, DNA, environmental sample)?
- Type and intended use of data: What kind of data will be generated (e.g., genomic, phenotypic, environmental), and how is it expected to be used or reused?
- Storage context: Will the biological material be preserved in a biobank, museum, or other long-term repository?
- Data submission destination: Which database, repository, or registry will host the resulting data and metadata?
Collecting extensive (or “long-tail”) metadata can greatly enhance data reuse and interoperability, but it also requires time and expertise. Therefore, metadata collection should be prioritized based on its anticipated value to future research and balanced against available human and technical resources.
Solutions
- Consult your institution's or project's Data Management Plan to obtain information about the standard metadata that should be collected (also see Documentation and metadata), the procedures that should be used, and the best practices for storing and sharing your data and metadata. Alternatively, you can develop a Data Management Plan specific to your study following the guidelines available on the Data Management Plan page.
- The metadata collected should be made available following widespread community standards to promote interoperability. The most relevant standards include the Minimum Information about any (x) Sequence (MIxS)[7] for genomics data, Ecological Metadata Language (EML)[8] for general biodiversity data, and Darwin Core (DwC)[9], mainly for taxa and their occurrences. For DNA/RNA barcoding data, the Barcode of Life Data System (BOLD)[10] is developing the Barcoding Data Model (BCDM). The core Darwin Core (DwC) terms have been aligned with the Minimum Information about any (x) Sequence (MIxS) terms[11], with further work underway to align many DwC extension terms to increase interoperability (see the sketch below for an example record combining terms from both standards).
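As a concrete illustration, the following minimal sketch builds a single tab-separated record that combines Darwin Core occurrence terms with MIxS-style environmental fields. The term names come from the respective standards, but all values, ontology identifiers, and the output filename are illustrative placeholders, not taken from any real dataset.

```python
import csv
import uuid

# Illustrative only: a single record combining Darwin Core (DwC) occurrence terms
# with MIxS-style environmental fields. Term names follow the standards; all values,
# ENVO identifiers and the output filename are placeholders to be replaced with real,
# checked values.
record = {
    # Darwin Core terms (https://dwc.tdwg.org/terms/)
    "occurrenceID": f"urn:uuid:{uuid.uuid4()}",
    "basisOfRecord": "MaterialSample",
    "scientificName": "Gadus morhua",
    "eventDate": "2024-06-15",
    "decimalLatitude": "60.39",
    "decimalLongitude": "5.32",
    "countryCode": "NO",
    # MIxS-style environmental context fields (verify ENVO terms before use)
    "env_broad_scale": "marine biome [ENVO:00000447]",
    "env_local_scale": "coastal sea water [ENVO:00002150]",
    "env_medium": "sea water [ENVO:00002149]",
}

# Write the record as a simple tab-separated file that downstream tools can ingest.
with open("occurrence_example.tsv", "w", newline="") as handle:
    writer = csv.DictWriter(handle, fieldnames=list(record), delimiter="\t")
    writer.writeheader()
    writer.writerow(record)
```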
Sample metadata and checklists
- Assess the type of sample/organism being analysed and the type of data being produced to identify the relevant metadata to collect. Take into consideration that richer metadata collection enables greater reuse of biodiversity information.
- Identify the appropriate sample metadata checklist for submission to a public repository (a minimal checklist-validation sketch follows this list). There are several checklists available for different types of samples and analyses:
- The European Nucleotide Archive (ENA) has several sample metadata checklists for various sequence data types, some of which are associated with biodiversity-related data, covering different types of samples and research purposes.
- For reference barcodes, Barcode of Life Data System (BOLD) is using the Barcoding Data Model (BCDM).
- For reference genomes there are specific guidelines regarding sample metadata collection. For example, the ERGA (European Reference Genomes Atlas) initiative developed the ERGA sample manifest that aligns with current standards and the Tree of Life Checklist at European Nucleotide Archive (ENA).
- When the sample metadata references information held in other repositories, such as taxonomic information (also see the section Biological material: Taxonomic information), specimen collections, or sampling protocols, always use persistent identifiers.
- Also see this page on documentation and metadata.
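The sketch below shows one way to check sample records against a checklist of mandatory fields before submission. The field names are illustrative (loosely modelled on ENA-style checklist attributes) rather than taken from any specific checklist; always validate against the actual checklist required by your target repository, for example an ENA checklist or the ERGA sample manifest.

```python
# A minimal sketch of checking sample records against a metadata checklist before
# submission. The mandatory field names below are illustrative; consult the actual
# checklist of your target repository for the authoritative list.
MANDATORY_FIELDS = [
    "organism",
    "collection date",
    "geographic location (country and/or sea)",
]

samples = [
    {"organism": "Lutra lutra", "collection date": "2023-09-01",
     "geographic location (country and/or sea)": "Portugal"},
    {"organism": "Lutra lutra", "collection date": ""},  # incomplete record
]

def missing_fields(sample: dict) -> list[str]:
    """Return the mandatory fields that are absent or empty in a sample record."""
    return [f for f in MANDATORY_FIELDS if not str(sample.get(f, "")).strip()]

for i, sample in enumerate(samples, start=1):
    problems = missing_fields(sample)
    status = "OK" if not problems else f"missing: {', '.join(problems)}"
    print(f"sample {i}: {status}")
```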
Biobanking and specimen linking
- If possible, keep vouchers and your tissue and DNA samples in a biobank/collection with relevant metadata (see Biological resource management and compliance).
- The Global Genome Biodiversity Network (GGBN) is a global network of curated collections of genomic samples, working together to make DNA and tissue collections discoverable for biodiversity research. GGBN is actively developing resources and recommendations for data management in relation to biobanking and specimen collections with its network of repositories, including the GGBN data standard.
- The Consortium of European Taxonomic Facilities (CETAF), a European network of biological and geological collections, generates specimen identifiers for specimens in CETAF collections.
- The European research infrastructure DISSCO (Distributed System of Scientific Collections) is working on developing a Digital Specimen Repository where DOIs will be provided for digital specimens.
- In the sample metadata in BioSamples, reference the specimen or tissue/DNA samples using available persistent identifiers or the Darwin Core (DwC) 'triplet' (institution code, collection code, and specimen catalogue number); see also the recommendations in Agosti D. et al.[12] and the minimal sketch after this list.
- For collections of genetic resources that also cover intraspecific diversity, some recommendations have been developed within the framework of FAO activities and are complemented by consortia of researchers supported by initiatives such as the Research Data Alliance.
- You can also find relevant information by consulting other domain pages related to Biodiversity.
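As an illustration of the 'triplet' approach, the sketch below assembles sample attributes that reference a voucher specimen both through the Darwin Core institution/collection/catalogue-number fields and through a resolvable identifier. All codes, accessions, and URLs are placeholders, not real records.

```python
# Illustrative sketch of sample attributes referencing a voucher specimen via the
# Darwin Core "triplet" (institution code, collection code, catalogue number)
# alongside a persistent identifier. Codes, accessions and URLs are placeholders.
sample_attributes = {
    "institutionCode": "NHMUK",        # Darwin Core: owning institution (placeholder)
    "collectionCode": "ZOO",           # Darwin Core: collection within it (placeholder)
    "catalogNumber": "2023.123",       # Darwin Core: specimen catalogue number (placeholder)
    # INSDC-style voucher string assembled from the triplet
    "specimen_voucher": "NHMUK:ZOO:2023.123",
    # Persistent identifier for the physical/digital specimen, when one exists
    "specimen_id": "https://example.org/specimen/abc123",
}

def triplet(attrs: dict) -> str:
    """Assemble the institution:collection:catalogNumber triplet from DwC fields."""
    return ":".join(attrs[k] for k in ("institutionCode", "collectionCode", "catalogNumber"))

assert sample_attributes["specimen_voucher"] == triplet(sample_attributes)
print(triplet(sample_attributes))
```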
Sample metadata sharing and publication
- Identify the repository where the sample metadata will be stored:
- For samples associated with genomic data, the BioSamples database is recommended, as a persistent identifier will be assigned and linked with the data.
- You can also reach out to data brokers or use brokering tools that can help manage metadata and data submission to public repositories, for example:
- COPO (Collaborative OPen Omics) is a data broker that is involved in metadata and data submission for the ERGA initiative.
- MADBOT (Metadata And Data Brokering Online Tool) is a web application that provides a dashboard for managing research data and metadata.
- Galaxy Ecology, a biodiversity-oriented Galaxy instance, provides data brokering functionalities in a common platform. Examples of the tools available include:
- the ENA upload tool, to publish in the European Nucleotide Archive (ENA),
- Ecological Metadata Language (EML)-oriented tools for creating EML metadata,
- data packages that can be used to share data through international repositories such as DataONE, which accepts raw data files related to Earth observation, or the Global Biodiversity Information Facility (GBIF), The Ocean Biodiversity Information System (OBIS) and EMODnet, which accept Darwin Core Archives related mainly to taxon occurrences (a minimal sketch of packaging a Darwin Core Archive follows this list).
- Modern technology such as mobile phone apps can streamline registration and collection of standardised metadata and field measurements. Automation such as this can reduce the burden on the sample collectors and increase the metadata quality. A great example is the NMDC Field Notes.
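For orientation, the sketch below packages a couple of occurrence records as a minimal Darwin Core Archive: a zip containing a tab-separated core file and a meta.xml descriptor. The meta.xml shown here is deliberately simplified and the records are invented; real archives should be validated against the Darwin Core text guidelines or produced with dedicated tools such as the GBIF IPT.

```python
import zipfile

# A minimal sketch of packaging occurrence data as a Darwin Core Archive (DwC-A):
# a zip holding a tab-separated core data file plus a meta.xml descriptor.
# The descriptor below is simplified; validate real archives against the Darwin Core
# text guidelines or generate them with a tool such as the GBIF IPT.

occurrence_rows = [
    ("occ-001", "Quercus robur", "2024-05-10", "51.50", "-0.12"),
    ("occ-002", "Quercus petraea", "2024-05-11", "51.52", "-0.10"),
]
occurrence_txt = "\n".join("\t".join(row) for row in occurrence_rows) + "\n"

meta_xml = """<?xml version="1.0" encoding="UTF-8"?>
<archive xmlns="http://rs.tdwg.org/dwc/text/">
  <core rowType="http://rs.tdwg.org/dwc/terms/Occurrence" encoding="UTF-8"
        fieldsTerminatedBy="\\t" linesTerminatedBy="\\n" ignoreHeaderLines="0">
    <files><location>occurrence.txt</location></files>
    <id index="0"/>
    <field index="1" term="http://rs.tdwg.org/dwc/terms/scientificName"/>
    <field index="2" term="http://rs.tdwg.org/dwc/terms/eventDate"/>
    <field index="3" term="http://rs.tdwg.org/dwc/terms/decimalLatitude"/>
    <field index="4" term="http://rs.tdwg.org/dwc/terms/decimalLongitude"/>
  </core>
</archive>
"""

with zipfile.ZipFile("dwca_example.zip", "w", zipfile.ZIP_DEFLATED) as archive:
    archive.writestr("occurrence.txt", occurrence_txt)
    archive.writestr("meta.xml", meta_xml)

print("Wrote dwca_example.zip")
```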
Biological material: Taxonomic information
Description
Taxonomy is an evolving field and harmonisation is challenging. Multiple classifications exist for the same group of organisms, and these are reported in many taxonomy databases with varying taxonomic and geographic coverage and quality. For example, WoRMS, the World Register of Marine Species, holds extensive records of marine species, and AlgaeBase is a global database of algae including taxonomy, nomenclature and distributional information.
Data repositories may be associated with or link to different taxonomic databases, so it is not always straightforward to understand which classification should be used in which situation. In addition, advances in reference library production and in environmental DNA analysis are leading to the discovery and identification of new taxa; submitting data for undescribed taxa or from environmental samples may also be challenging.
This complex taxonomic foundation makes the reporting and linking of data to taxonomy challenging. The different public databases are used by researchers, but the taxonomic references in publications do not usually include persistent identifiers for taxon names or information regarding the taxonomy database used. Moreover, in some national biodiversity studies it is necessary to use other taxonomic reference systems that are more widely used by public monitoring agencies and that may not have persistent identifiers.
Here we present some considerations regarding ongoing initiatives to facilitate taxonomy mapping, clustering and linking, and provide some guidance for the submission and reporting of taxonomic information.
Considerations
There are multiple taxonomic checklists covering different geographic and taxonomic ranges that are included in multiple databases. These checklists follow rules in taxonomic naming that are guided by nomenclature codes according to the type of organism. Most taxon names are published in scientific literature, with a description (taxonomic treatment).
In addition to published taxonomy, some taxonomic databases also include placeholders for undescribed species and/or clustering mechanisms that identify novel taxonomic units. A classification for environmental samples is also available in NCBI Taxonomy.
Most of the main taxonomic databases and checklist services provide persistent identifiers in different forms for taxon names, which should be used when referring to a species in a publication (see Agosti D. et al.[12]).
Therefore, key considerations include:
- Does the work involve organisms or environmental samples?
- Is the organism being analysed already published in a taxonomic journal?
- Are there taxonomic treatments (descriptions of the taxa in publications) available for the species?
- Which taxonomy checklist/backbone is being used?
- Is there a persistent identifier available for the taxon name?
Solutions
- If you have questions regarding taxonomic nomenclature you can consult the nomenclature codes, such as the ICZN and the IAPT.
- You can also find information on Taxonomic treatments, i.e. detailed descriptions of a specific group of organisms (a taxon) within a scientific publication, and taxonomic citations on TreatmentBank, which also provides persistent identifiers for the taxonomic names annotated in nomenclature sections of publications.
- Choose the reference taxonomic backbone and reference it in the publication including a version if available.
- For sequence data, the most used taxonomic databases are NCBI Taxonomy, used by the International Nucleotide Sequence Database Collaboration (INSDC), and the Barcode of Life Data System (BOLD) taxonomy, used by the International Barcode of Life (iBOL).
- NCBI Taxonomy, in addition to the published taxon names, also allows for placeholder names for undescribed or novel species. These can then be updated once the taxon is published.
- If you are using an unpublished name, use the available processes for requesting the databases to mint an identifier for a placeholder name (for sequence data and NCBI Taxonomy see Blaxter M. et al.[13]).
- If working with environmental samples, you can use NCBI Taxonomy (environmental biome level taxonomy).
- There are also sequence clustering frameworks, used in some databases and management systems, that identify novel taxonomic units (Operational Taxonomic Units, OTUs) and assign them identifiers:
- Barcode of Life Data System (BOLD) processes barcode sequences through an online framework that clusters the sequences into units and generates Barcode Identification Numbers (BINs [14]).
- UNITE, a database that targets the nuclear ribosomal internal transcribed spacer (ITS) region and is used for molecular identification primarily of fungi, holds a pipeline that clusters ITS sequences into units, the UNITE Species Hypotheses (SHs) to which a unique DOI is assigned[15].
- The Catalogue of Life (CoL) is a global collaboration between taxonomists and bioinformaticians aiming to gather up-to-date listings of all the world's known species. In collaboration with the Global Biodiversity Information Facility (GBIF), it provides a global list of accepted names by integrating existing checklists, both from large-scale and national initiatives. The GBIF taxonomic backbone derives from CoL and merges additional names from authoritative nomenclatural and taxonomic datasets, including identifiers such as BINs from the Barcode of Life Data System (BOLD) and SHs from UNITE.
- Use the available mapping tools to facilitate the discovery of taxon names and persistent identifiers (a minimal name-matching sketch follows this list), such as:
- Taxize and taxonomyCleanr R packages
- ChecklistBank, a repository of taxonomic datasets, allows mapping of taxonomic names and identifiers between the different taxonomies/checklists in CoL.
- Use persistent identifiers for the taxon name in the data publication if available (e.g. Catalogue of Life (CoL), NCBI Taxonomy, Barcode of Life Data System (BOLD), UNITE).
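As an example of such a lookup, the sketch below resolves a scientific name against the GBIF backbone taxonomy via the public GBIF species match API and prints the persistent usage key of the best match. The endpoint and fields used are part of the documented GBIF API; the example name is arbitrary, and equivalent lookups can be done with taxize or ChecklistBank.

```python
import json
import urllib.parse
import urllib.request

# Resolve a scientific name against the GBIF backbone taxonomy using the public
# GBIF species match API (https://api.gbif.org/v1/species/match).
def match_name(scientific_name: str) -> dict:
    """Query the GBIF backbone for the best match to a scientific name."""
    query = urllib.parse.urlencode({"name": scientific_name})
    url = f"https://api.gbif.org/v1/species/match?{query}"
    with urllib.request.urlopen(url, timeout=30) as response:
        return json.load(response)

result = match_name("Puma concolor")
# usageKey is the persistent GBIF backbone identifier for the matched name.
print(result.get("usageKey"), result.get("scientificName"), result.get("status"))
```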
Reference libraries: meta(data) collection and publication
Description
Reference genomes (DNA sequences of an organism's genome) are crucial for scientists to study genetic diversity and evolutionary relationships within and between populations and species. They also enable population and functional genomics analyses that facilitate monitoring of ecosystems. DNA barcoding, in turn, consists of the analysis of short, standardised DNA sequences to identify organisms at the species level. This fast and cost-effective approach has helped revolutionise species detection in biodiversity research through metabarcoding and eDNA methods. However, the accuracy of metabarcoding methods relies on well-curated DNA barcoding reference libraries, as they provide the essential benchmarks needed for correctly identifying and assigning DNA sequences to the right taxa.
The scaling up of both barcoding and reference genome production for documenting biodiversity raises some data management challenges. These include the application of sequencing, barcoding, and genome analysis standards, the availability and documentation of analysis tools and computing platforms, and the sharing of reference sequence libraries. Collecting rich metadata for reference libraries is essential for ensuring the utility, traceability, and interoperability of genomic data. Proper metadata facilitates comparisons, reproducibility, and downstream analyses.
Here, we highlight key considerations to keep in mind when annotating and sharing reference libraries, and we suggest relevant standards and tools.
Considerations
- Is the relevant metadata about the specimens being collected, following the recommendations in the section Biological material: metadata collection and publication?
- Consider collecting additional data, such as specimen images, that can be linked to the specimen.
- Are the current standards for barcoding or reference genome production and publication being followed?
- Are public bioinformatics pipelines being used, or are the bioinformatics pipelines and workflows being recorded in public repositories?
- Consider making the data produced publicly available in an appropriate repository.
Solutions
Reference barcodes (meta)data collection and publication
- Standards for barcoding are being developed by the Barcode of Life Data System (BOLD) as the Barcoding Data Model (BCDM); these include metadata for sample collection and processing as well as for sequence information and primers.
- When additional data is collected, such as specimen image data, this should also be made available in a public repository. Some museum collections store and make this information available. The Barcode of Life Data System (BOLD) stores image data linked with the specimen information. There are also specific databases, such as the BioImageArchive, where images are linked with the sample metadata in BioSamples. For more information on bioimaging see the BioImaging domain.
- The details of the laboratory processes, bioinformatic pipelines (including taxonomic assignment pipelines) and workflows used to produce the reference barcodes should be captured and made available by registering them in GitHub or WorkflowHub[16] and linking to them from the data.
- Publish the samples, raw data (if high-throughput sequencing is used) and barcode sequences in the relevant public repositories (a minimal linkage-check sketch follows this list):
- Samples can be submitted to Barcode of Life Data System (BOLD) and to BioSamples
- Raw data and curated barcodes can be submitted to the International Nucleotide Sequence Database Collaboration (INSDC) (e.g. in Europe to European Nucleotide Archive (ENA))
- Curated barcodes can be submitted to Barcode of Life Data System (BOLD)
- Use the appropriate reference libraries for your taxonomic area(s) of interest.
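Before publication, it is worth checking that every barcode sequence remains linked to its sample record. The sketch below illustrates such a check; the record structure, column names, and accessions are invented, so adapt them to what your target repositories (e.g. BOLD, ENA, BioSamples) actually expect.

```python
# A minimal sketch of checking that every barcode sequence to be published is linked
# to a sample accession. Field names and accessions are illustrative placeholders.
barcode_records = [
    {"sequence_id": "BC001", "sample_accession": "SAMEA0000001", "marker": "COI-5P"},
    {"sequence_id": "BC002", "sample_accession": "", "marker": "COI-5P"},  # missing link
]

unlinked = [r["sequence_id"] for r in barcode_records if not r["sample_accession"]]

if unlinked:
    print("Barcodes without a sample accession:", ", ".join(unlinked))
else:
    print("All barcode sequences are linked to a sample accession.")
```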
Reference genomes (meta)data collection and publication
- Standards for reference genome production are published by the Earth BioGenome Project; these include reports covering standards from sample collection and processing to genome analysis and annotation.
- Reference genomes and raw sequence data should be submitted to the International Nucleotide Sequence Database Collaboration (INSDC), through the European Nucleotide Archive (ENA) (its European node), the National Center for Biotechnology Information (NCBI) or the DNA Data Bank of Japan (DDBJ), where all the data will be linked to the BioSample and BioProject information (a minimal query sketch follows this list).
- When additional data is collected, such as specimen image data, this should also be made available in a public repository. Some museum collections also store and make this information available. There are also specific databases, such as the BioImageArchive, where images are linked with the sample metadata in BioSamples. For more information on bioimaging see the BioImaging domain.
- Analysis tools and pipelines should be published and linked to the data. There are repositories available for capturing information on bioinformatic tools and databases (e.g. GitHub, bio.tools) and on pipelines and workflows (e.g. WorkflowHub).
- The European Reference Genome Atlas (ERGA) initiative has a WorkflowHub space where genome assembly, annotation and curation pipelines are available.
- GoaT (Genomes on a Tree) is a powerful data aggregator and portal for reference genomes.
- You can also use data brokering services or brokering tools that can help manage metadata and data submission to the public repositories (see section on biological material: metadata collection and publication).
- The BioProject accession should be included in the publication.
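To illustrate how submitted assemblies can be found and reported, the sketch below queries the ENA Portal API search endpoint for public assemblies under a taxon. The endpoint and query syntax are part of the documented Portal API, but the requested result fields should be verified against the API's field listing, and the taxon ID used here (40674, Mammalia) is only an example.

```python
import urllib.parse
import urllib.request

# Find public genome assemblies for a taxon through the ENA Portal API search endpoint.
# Check the requested fields against the API's field listing before relying on them.
params = urllib.parse.urlencode({
    "result": "assembly",
    "query": "tax_tree(40674)",  # 40674 = Mammalia, used here only as an example
    "fields": "accession,assembly_name,scientific_name",
    "format": "tsv",
    "limit": 5,
})
url = f"https://www.ebi.ac.uk/ena/portal/api/search?{params}"

with urllib.request.urlopen(url, timeout=60) as response:
    print(response.read().decode("utf-8"))
```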
Bibliography
- Wilkinson, M. D. et al. The FAIR Guiding Principles for scientific data management and stewardship. Sci. Data 3, 160018 (2016).
- Carroll, S. R. et al. The CARE principles for indigenous data governance. Data Sci. J. 19, (2020).
- Mc Cartney, A. M. et al. Indigenous peoples and local communities as partners in the sequencing of global eukaryotic biodiversity. NPJ Biodivers. 2, 8 (2023).
- Bentley, A. et al. Community action: Planning for specimen management in funding proposals. Bioscience 74, 435–439 (2024).
- Hardisty, A. R. et al. Digital extended specimens: Enabling an extensible network of biodiversity data records as integrated digital objects on the Internet. Bioscience 72, 978–987 (2022).
- Alkhatib, R. & Gaede, K. I. Data management in biobanking: Strategies, challenges, and future directions. BioTech (Basel) 13, 34 (2024).
- Field, D. et al. The Genomic Standards Consortium. PLoS Biol. 9, e1001088 (2011).
- Michener, W. K., Brunt, J. W., Helly, J. J., Kirchner, T. B. & Stafford, S. G. Nongeospatial metadata for the ecological sciences. Ecol. Appl. 7, 330–342 (1997).
- Wieczorek, J. et al. Darwin Core: an evolving community-developed biodiversity data standard. PLoS One 7, e29715 (2012).
- Ratnasingham, S. & Hebert, P. D. N. bold: The Barcode of Life Data System (http://www.barcodinglife.org). Mol. Ecol. Notes 7, 355–364 (2007).
- Meyer, R. et al. Aligning standards communities for omics biodiversity data: Sustainable Darwin Core-MIxS interoperability. Biodivers. Data J. 11, e112420 (2023).
- Agosti, D. et al. Recommendations for use of annotations and persistent identifiers in taxonomy and biodiversity publishing. Res. Ideas Outcomes 8, (2022).
- Blaxter, M., Pauperio, J., Schoch, C. & Howe, K. Taxonomy Identifiers (TaxId) for Biodiversity Genomics: a guide to getting TaxId for submission of data to public databases. Wellcome Open Res. 9, 591 (2024).
- Ratnasingham, S. & Hebert, P. D. N. A DNA-based registry for all animal species: the barcode index number (BIN) system. PLoS One 8, e66213 (2013).
- Abarenkov, K. et al. The UNITE database for molecular identification and taxonomic communication of fungi and other eukaryotes: sequences, taxa and classifications reconsidered. Nucleic Acids Res. 52, D791–D797 (2024).
- Gustafsson, O. J. R. et al. WorkflowHub: a registry for computational workflows. (2024) doi:10.48550/arXiv.2410.06941.
Tools and resources on this page
Tool or resource | Description | Related pages | Registry |
---|---|---|---|
Barcode of Life Data System (BOLD) | Barcode of Life Data System (BOLD) is an online platform that manages and provides access to DNA barcoding data, facilitating species identification and biodiversity research. | | Standards/Databases |
bio.tools | Essential scientific and technical information about software tools, databases and services for bioinformatics and the life sciences. | Human pathogen genomics Data analysis | Tool info Standards/Databases Training |
BioImageArchive | The BioImage Archive stores and distributes biological images that are useful to life-science researchers. | Bioimaging data Data publication | Standards/Databases |
BioSamples | BioSamples stores and supplies descriptions and metadata about biological samples used in research and development by academia and industry. | Plant Genomics Plant sciences Virology | Tool info Standards/Databases Training |
Catalogue of Life (CoL) | Catalogue of Life is a collaboration that brings together contributions of taxonomists and informaticians. | | Standards/Databases |
CETAF | CETAF is a European network of biological and geological collections. | | Standards/Databases |
ChecklistBank | ChecklistBank is a repository and management system for taxonomic datasets. | | Standards/Databases |
COPO | Collaborative OPen Omics (COPO) is a portal for scientists to describe, store and retrieve data more easily, using community standards and public repositories that enable the open sharing of results. The COPO project is one of several projects supported by the [Earlham Institute](https://www.earlham.ac.uk/research-project/collaborative-open-omics-copo) (EI), in Norwich, United Kingdom. | Plant Phenomics Plant sciences Data discoverability Documentation and meta... | Tool info Standards/Databases |
Darwin Core (DwC) | Darwin Core (DwC) is a vocabulary that includes a glossary of terms defined to facilitate the sharing of information about biological diversity. | | Standards/Databases |
DISSCO | Distributed System of Scientific Collections is a project that aims to enhance the accessibility and interoperability of scientific collection data, facilitating research and collaboration across various biodiversity collections and institutions. | | Standards/Databases |
DNA Data Bank of Japan (DDBJ) | A database of DNA sequences | Microbial biotechnology | Tool info Standards/Databases |
Ecological Metadata Language (EML) | The Ecological Metadata Language (EML) metadata standard was originally developed for the earth, environmental and ecological sciences. It is based on prior work done by the Ecological Society of America and associated efforts. It has been developed to document any research data, and as such can be used outside of these original subject areas. EML is implemented as a series of XML document types that can be used in a modular and extensible manner to document research data. Each EML module is designed to describe one logical part of the total metadata that should be included with any dataset. | | Standards/Databases |
ENA upload tool | The program submits experimental data and respective metadata to the European Nucleotide Archive (ENA). | Data brokering | Training |
European Nucleotide Archive (ENA) | A record of sequence information scaling from raw sequencing reads to assemblies and functional annotation. | Galaxy Plant Genomics Human pathogen genomics Microbial biotechnology Single-cell sequencing Virology Data brokering Data publication Project data managemen... | Tool info Standards/Databases Training |
Galaxy Ecology | A Galaxy initiative for biodiversity-related domains. Two main Galaxy Ecology instances are currently available, at ecology.usegalaxy.eu and ecology.usegalaxy.fr. Many tools exist for data/metadata management, notably for working with the Ecological Metadata Language (EML). | | Tool info |
GitHub | Versioning system, used for sharing code, as well as for sharing of small data | Machine learning Creating a data-flow d... Data organisation Documentation and meta... | Standards/Databases Training |
Global Biodiversity Information Facility (GBIF) | Global Biodiversity Information Facility (GBIF) is an international network and data platform that provides open access to biodiversity data from sources worldwide to support research and conservation. | | Tool info Standards/Databases |
Global Genome Biodiversity Network (GGBN) | Global Genome Biodiversity Network (GGBN) is an international network that preserves and provides access to high-quality genomic samples from across the world's biodiversity for research and conservation. | | Standards/Databases |
GoaT | Genomes on a Tree (GoaT) is a powerful data aggregator and portal to explore and report underlying data for the eukaryotic tree of life. It indexes publicly available genomic metadata for all eukaryotic species and interpolates missing values through phylogenetic comparison. | | Tool info |
IAPT | The International Association for Plant Taxonomy (IAPT) publishes the International Code of Nomenclature for algae, fungi, and plants, the set of rules and recommendations that govern the scientific naming of all organisms traditionally treated as algae, fungi, or plants. | | |
ICZN | ICZN is the International Code of Zoological Nomenclature. | | Standards/Databases |
International Nucleotide Sequence Database Collaboration (INSDC) | The International Nucleotide Sequence Database Collaboration (INSDC) is a long-standing foundational initiative that operates between DDBJ, EMBL-EBI and NCBI. INSDC covers the spectrum of data from raw reads, through alignments and assemblies, to functional annotation, enriched with contextual information relating to samples and experimental configurations. | Galaxy Microbial biotechnology Plant sciences Data publication | Tool info Training |
MADBOT | MADBOT is a web application that provides a dashboard for managing research data and metadata. | | |
Minimum Information about any (x) Sequence (MIxS) | An overarching framework of standard metadata that includes sequence-type and technology-specific checklists. | Human pathogen genomics Marine metagenomics Virology | Standards/Databases |
National Center for Biotechnology Information (NCBI) | Online database hosting a vast amount of biotechnological information including nucleic acids, proteins, genomes and publications. Also boasts integrated tools for analysis. | Identifiers | Training |
NCBI Taxonomy | NCBI's taxonomy browser is a database of biodiversity information | Human pathogen genomics Microbial biotechnology | Standards/Databases |
NMDC Field Notes | NMDC Field Notes is a mobile app that streamlines collection of sample information with automated syncing with the NMDC Submission Portal. | | Standards/Databases |
Taxize | Taxize is a tool that interacts with several application programming interfaces (API) for taxonomic tasks, such as getting database-specific taxonomic identifiers and verifying species names, among other tasks. | | Tool info |
taxonomyCleanr | The taxonomyCleanr is a user-friendly workflow and collection of functions to help process taxonomic information. | | |
The Ocean Biodiversity Information System (OBIS) | The Ocean Biodiversity Information System (OBIS) is a global data platform that provides access to marine biodiversity information to support research, conservation, and sustainable ocean management. | | Standards/Databases |
TreatmentBank | TreatmentBank is a service that extracts and makes available taxonomic treatments, treatment citations and material citations, among other data, from scholarly publications. | | Standards/Databases |
UNITE | UNITE is a database and sequence management platform of eukaryotic sequences of the nuclear ribosomal ITS region. | | Tool info Standards/Databases |
WorkflowHub | WorkflowHub is a registry for describing, sharing and publishing scientific computational workflows. | Galaxy Data analysis Data provenance | Tool info Standards/Databases Training |
WORMS | WoRMS, the World Register of Marine Species, is an authoritative classification and catalogue of marine names. | | Tool info Standards/Databases |