European Nucleotide Archive (ENA)

ENA features

How to submit data

ENA allows submissions via three routes, each of which is appropriate for a different set of submission types. You may be required to use more than one in the process of submitting your data.

Interactive Submissions.
This is often the most accessible submission route. It requires filling out web forms directly in your browser or alternatively downloading spreadsheets that can be completed off-line and uploaded to ENA.
Command Line Submissions (Webin-CLI).
This process requires command line skills.
Programmatic Submissions.
This process requires knowledge of the XML format, the HTTPS protocol and RFC 1867 (form-based file upload).
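To illustrate what RFC 1867 involves for programmatic submissions, the following minimal sketch encodes XML documents as a multipart/form-data request body using only the Python standard library. The field name SUBMISSION, the file name and the placeholder XML are assumptions made for illustration; the actual field names and endpoint should be taken from the ENA programmatic submission documentation. Nothing is sent over the network here.

```python
import uuid


def encode_multipart(files, boundary=None):
    """Encode {field_name: (filename, content_bytes)} as multipart/form-data.

    Returns the request body and the matching Content-Type header value.
    """
    boundary = boundary or uuid.uuid4().hex
    lines = []
    for field, (filename, content) in files.items():
        # Each part starts with the boundary, then its own headers.
        lines.append(f"--{boundary}".encode())
        lines.append(
            f'Content-Disposition: form-data; name="{field}"; '
            f'filename="{filename}"'.encode()
        )
        lines.append(b"Content-Type: application/xml")
        lines.append(b"")  # blank line separates headers from content
        lines.append(content)
    # The closing boundary carries two extra trailing dashes.
    lines.append(f"--{boundary}--".encode())
    lines.append(b"")
    body = b"\r\n".join(lines)
    content_type = f"multipart/form-data; boundary={boundary}"
    return body, content_type


# Hypothetical field name and placeholder XML, for illustration only.
body, content_type = encode_multipart(
    {"SUBMISSION": ("submission.xml", b"<SUBMISSION/>")},
    boundary="example-boundary",
)
print(content_type)
```

The body and header value could then be passed to any HTTPS client, for example `urllib.request`, together with your Webin credentials.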

Non-Personal vs Personal data

Non-personal data only.

Access to data

Open, Public.

Embargo

Possible (confidential phase).

Data licence

Not specified. However, no use restrictions or licensing requirements will be included in any sequence data records, and no restrictions or licensing fees will be placed on the redistribution or use of the database by any party (ENA Policies).

Data/Experiments types

Study. A study (project) groups together data submitted to the archive, so it is the first step towards submitting your data to ENA.
Sample. Each sample in ENA represents biomaterial that a sequencing library was produced from. A sample contains information about the sequenced source material, so it is important to first register your biological samples with ENA. Samples are typically associated with checklists, which define the metadata fields used to annotate the samples.

Submit:

Raw reads. Within ENA, raw reads are represented as the following submission objects:
- Experiment: contains information (metadata) that describes the methods used to sequence the sample, including library and instrument details.
- Run: is part of an experiment and refers to the data files containing the raw reads generated in a sequencing run.
Genome, transcriptome, metagenome or metatranscriptome assemblies. All assemblies are submitted as “analysis” submission objects (secondary analysis results derived from sequence reads). Further information about metagenomes can be found here and here.
Targeted sequences. Short assembled and annotated sequences representing interesting features or gene regions. All targeted sequences are submitted as “analysis” submission objects.
Other analysis. Any secondary analyses that are not Assemblies or Targeted sequences.
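The relationships between the objects described above can be sketched as a small data model. This is only an illustration of how a study groups samples and experiments, and how runs belong to an experiment. The class and field names are hypothetical and do not mirror the ENA schema.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Run:
    # Data files containing the raw reads from one sequencing run.
    files: List[str]


@dataclass
class Experiment:
    # Metadata describing how a sample was sequenced.
    sample_alias: str
    instrument: str
    library_strategy: str
    runs: List[Run] = field(default_factory=list)


@dataclass
class Sample:
    # The biomaterial a sequencing library was produced from.
    alias: str
    checklist: str


@dataclass
class Study:
    # A study groups together everything submitted for one project.
    name: str
    samples: List[Sample] = field(default_factory=list)
    experiments: List[Experiment] = field(default_factory=list)

    def read_files(self) -> List[str]:
        """Collect every raw read file across all experiments and runs."""
        return [f for e in self.experiments for r in e.runs for f in r.files]


# Hypothetical values, for illustration only.
study = Study(name="demo_study")
study.samples.append(Sample(alias="sample_1", checklist="default"))
study.experiments.append(
    Experiment(
        sample_alias="sample_1",
        instrument="example instrument",
        library_strategy="WGS",
        runs=[Run(files=["reads_1.fastq.gz", "reads_2.fastq.gz"])],
    )
)
print(study.read_files())
```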

Metadata

Study: Short name, Descriptive title, Abstract.
Sample: see which sample checklist best describes your samples. If you choose “Register Samples Interactively”, the sample checklist can be downloaded in a spreadsheet format. This allows you to more easily register multiple samples in a single submission and is more durable than a web form.
Raw reads: see metadata fields for interactive and Webin-CLI submission of raw reads.
Assemblies: see here the metadata required for each type of assembly.
Targeted sequences: many types of sequence can be submitted using a checklist. Checklist submission allows you to avoid having to create the flatfile record manually. Therefore, please check the list of available sequence checklists to determine whether one of them meets the needs of your submission.
Other analysis: see here the metadata required for each type of analysis.
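For programmatic submissions, the three study fields listed above (short name, descriptive title, abstract) end up in an XML document. The sketch below builds such a document with the Python standard library. The element names (PROJECT_SET, PROJECT, NAME, TITLE, DESCRIPTION, SUBMISSION_PROJECT, SEQUENCING_PROJECT) and the alias attribute are assumptions about the layout and should be checked against the current ENA schema before use.

```python
import xml.etree.ElementTree as ET


def build_study_xml(alias, name, title, abstract):
    """Return a study (project) XML string from the three metadata fields.

    Element names are assumed, not taken from the source document.
    """
    root = ET.Element("PROJECT_SET")
    project = ET.SubElement(root, "PROJECT", alias=alias)
    ET.SubElement(project, "NAME").text = name          # short name
    ET.SubElement(project, "TITLE").text = title        # descriptive title
    ET.SubElement(project, "DESCRIPTION").text = abstract
    submission = ET.SubElement(project, "SUBMISSION_PROJECT")
    ET.SubElement(submission, "SEQUENCING_PROJECT")
    # ElementTree escapes characters such as "&" and "<" automatically.
    return ET.tostring(root, encoding="unicode")


# Hypothetical values, for illustration only.
study_xml = build_study_xml(
    alias="demo_study_alias",
    name="demo_study",
    title="Example study title for soil & water samples",
    abstract="A short abstract describing the aims of the study.",
)
print(study_xml)
```

Building the document through an XML library rather than string concatenation avoids malformed output when titles or abstracts contain reserved characters.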

Ontology

Repository-developed ontology or controlled vocabulary.

Samples: check the sample checklists to find the permitted terms in each metadata field of a sample checklist.
Raw reads: see here the permitted values in each metadata field required for describing raw reads.
Assemblies: see here the permitted terms for each type of assembly.
Targeted sequences: check the submission options to find the permitted terms in each metadata field of a sequence checklist.
Other analysis: see here the permitted terms for each type of analysis.
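Because each metadata field accepts only permitted terms, it can help to check values locally before submitting. The sketch below shows one way to do this. The field names and the term sets are a small illustrative subset chosen for this example, not the authoritative ENA vocabulary, which should be taken from the pages referenced above.

```python
# Illustrative subset only; the real permitted terms come from ENA.
PERMITTED = {
    "library_strategy": {"WGS", "RNA-Seq", "AMPLICON"},
    "library_source": {"GENOMIC", "TRANSCRIPTOMIC", "METAGENOMIC"},
}


def find_invalid_terms(record, permitted=PERMITTED):
    """Return a list of problems found in one metadata record."""
    errors = []
    for field_name, allowed in permitted.items():
        value = record.get(field_name)
        if value is None:
            errors.append(f"{field_name}: missing")
        elif value not in allowed:
            # Matching is case-sensitive, so "wgs" is rejected.
            errors.append(f"{field_name}: '{value}' is not a permitted term")
    return errors


good = {"library_strategy": "WGS", "library_source": "GENOMIC"}
bad = {"library_strategy": "wgs"}
print(find_invalid_terms(good))
print(find_invalid_terms(bad))
```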

Data documentation

ENA does not allow the upload of README files. All relevant information about the data (metadata) needs to be provided in the designated fields. See the “Metadata” section above.

File format(s)

Information about file formats accepted by ENA can be found here. Accepted file formats for:

Data volume and costs

No limit on data volume. No costs. See the ENA Fair Use Policy.

Data quality

Beyond limited editorial control and some internal integrity checks, the quality and accuracy of the record are the responsibility of the submitting author, not of the database. The databases will work with submitters and users of the database to achieve the best quality resource possible (adapted from ENA Policies).

Identifiers

All database records submitted to the ENA will remain permanently accessible as part of the scientific record. Corrections of errors and updates to records by authors are welcome, and erroneous records may be removed from the next database release, but all will remain permanently accessible by accession number (adapted from ENA Policies).

The top-level Project accession should be cited, together with a link to where the data can be found in the browser, as shown in the example here.
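A citation sentence of this kind can be generated consistently with a small helper. In the sketch below, the accession pattern, the browser URL prefix and the wording of the sentence are all assumptions made for illustration, and the accession shown is a made-up placeholder. Check the ENA documentation for the authoritative accession formats and recommended citation text.

```python
import re

# Assumed pattern for top-level project accessions (for example "PRJEB...").
PROJECT_ACCESSION = re.compile(r"PRJ[EDN][A-Z][0-9]+")

# Assumed prefix for the ENA browser view of a record.
BROWSER_BASE = "https://www.ebi.ac.uk/ena/browser/view/"


def data_availability(accession):
    """Return a citation sentence with the accession and its browser link."""
    if not PROJECT_ACCESSION.fullmatch(accession):
        raise ValueError(f"not a project accession: {accession!r}")
    return (
        "The data are available in the European Nucleotide Archive (ENA) "
        f"under accession number {accession} ({BROWSER_BASE}{accession})."
    )


# Placeholder accession, not a real record.
statement = data_availability("PRJEB00000")
print(statement)
```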

Tips for data submission

SARS-CoV-2 data

ELIXIR Belgium developed and compiled Galaxy tools and workflows necessary to clean, assemble and submit SARS-CoV-2 sequences to the European Nucleotide Archive (ENA). Read more in the Covid-19 section.