DNA barcoder

Classification

How to use this page

On this page, sequences can be classified using a reference dataset and similarity cutoffs. Similarity cutoffs can be calculated on the cutoff page. The inputs needed for classification are explained below.

Unidentified Sequences Input

In this section, provide the sequences that need to be identified. You can either upload a FASTA file or paste the sequences into the text area. In both cases the input must follow the FASTA format.
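As a rough illustration of what "follows the FASTA format" means, the sketch below parses pasted text into records and rejects the most common mistakes. The function name `parse_fasta` and its error messages are hypothetical and are not part of DNA barcoder; the checks the page itself performs may differ.

```python
def parse_fasta(text):
    """Parse FASTA text into a list of (header, sequence) tuples.

    Hypothetical helper for illustration only: each record starts with a
    '>' header line, followed by one or more lines of sequence.
    """
    records = []
    header = None
    chunks = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header = line[1:].strip()
            chunks = []
        else:
            if header is None:
                raise ValueError("sequence data found before the first '>' header line")
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))

    for name, sequence in records:
        if not name:
            raise ValueError("a record has an empty header line")
        if not sequence:
            raise ValueError(f"record {name!r} has no sequence")
    return records


if __name__ == "__main__":
    example = ">seq1\nACGT\nTTGA\n>seq2\nGGCC\n"
    for name, sequence in parse_fasta(example):
        print(name, len(sequence))
```

Sequence lines belonging to one record are joined, so a sequence wrapped over several lines is read as a single sequence.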

Reference Data Input

The reference dataset can be uploaded as a file in FASTA format. Alternatively, one of the standard reference files can be chosen:

  • CBS_ITS.fasta: fungal full length ITS sequences produced at the Westerdijk Institute [source]
  • CBS_ITS2.fasta: fungal ITS2 sequences produced at the Westerdijk Institute [source]
When using your own dataset, make sure the description lines of the FASTA file have the following format:
>ID_NAME k__kingdom;p__phylum;c__class;o__order;f__family;g__genus;s__species
Example:
>MH854570 k__Fungi;p__Ascomycota;c__Sordariomycetes;o__Hypocreales;f__Nectriaceae;g__Fusarium;s__Fusarium_equiseti
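To check that a description line matches this layout before uploading a dataset, a small parser along the following lines can help. This is a minimal sketch: `parse_description` and `RANK_PREFIXES` are hypothetical names, and the rank prefixes are taken from the example line above rather than from the tool's source code.

```python
# Rank prefixes as they appear in the example description line.
RANK_PREFIXES = {
    "k": "kingdom",
    "p": "phylum",
    "c": "class",
    "o": "order",
    "f": "family",
    "g": "genus",
    "s": "species",
}


def parse_description(line):
    """Split a description line into (sequence_id, taxonomy dict).

    Hypothetical helper: assumes the ID and the lineage are separated by
    a single space and the lineage fields by semicolons.
    """
    if not line.startswith(">"):
        raise ValueError("a description line must start with '>'")
    sequence_id, _, lineage = line[1:].strip().partition(" ")
    taxonomy = {}
    for field in lineage.split(";"):
        prefix, separator, name = field.strip().partition("__")
        if not separator or prefix not in RANK_PREFIXES:
            raise ValueError(f"unrecognised taxonomy field: {field!r}")
        taxonomy[RANK_PREFIXES[prefix]] = name
    return sequence_id, taxonomy


if __name__ == "__main__":
    header = (
        ">MH854570 k__Fungi;p__Ascomycota;c__Sordariomycetes;"
        "o__Hypocreales;f__Nectriaceae;g__Fusarium;s__Fusarium_equiseti"
    )
    sequence_id, taxonomy = parse_description(header)
    print(sequence_id, taxonomy["genus"], taxonomy["species"])
```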

Similarity Cutoff

The similarity cutoff is the minimum percentage of similarity that an unidentified sequence and a reference sequence must share for the reference to be used for classification. It can be given as a single global value or as a set of local values. A global similarity cutoff applies to the whole dataset, whereas local similarity cutoffs are given per taxonomic group. Local similarity cutoffs will generally give more accurate classification results (see the about page). Similarity cutoffs can be calculated on the cutoff calculation page.
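The difference between a global and a local cutoff can be sketched as a simple lookup. The code below is an assumption about how such a lookup could work, not a description of how DNA barcoder implements it: the names `pick_cutoff` and `passes_cutoff` are hypothetical, the cutoff values are made-up illustrative numbers, and the rule "use the most specific taxonomic group that has a local cutoff, otherwise fall back to the global value" is only one plausible choice.

```python
# Ranks from most to least specific (assumed lookup order).
RANK_ORDER = ["species", "genus", "family", "order", "class", "phylum", "kingdom"]


def pick_cutoff(taxonomy, global_cutoff, local_cutoffs=None):
    """Return the cutoff (in percent) to apply for a reference sequence.

    taxonomy      -- dict mapping rank name to taxon name
    global_cutoff -- single value used for the whole dataset
    local_cutoffs -- optional dict mapping taxon name to its own cutoff
    """
    local_cutoffs = local_cutoffs or {}
    for rank in RANK_ORDER:
        taxon = taxonomy.get(rank)
        if taxon is not None and taxon in local_cutoffs:
            return local_cutoffs[taxon]
    return global_cutoff


def passes_cutoff(similarity, cutoff):
    """True if the similarity (in percent) reaches the cutoff."""
    return similarity >= cutoff


if __name__ == "__main__":
    taxonomy = {"family": "Nectriaceae", "genus": "Fusarium"}
    # Illustrative numbers only; real cutoffs come from the cutoff calculation page.
    local = {"Fusarium": 98.5, "Nectriaceae": 96.0}
    cutoff = pick_cutoff(taxonomy, global_cutoff=97.0, local_cutoffs=local)
    print(cutoff, passes_cutoff(98.0, cutoff))
```

With these illustrative numbers, a hit with 98.0% similarity would pass the global cutoff but fail the stricter local cutoff for its genus, which shows how the two modes can lead to different classification results.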

Settings

Note on the Minimum sequence alignment length setting:
The default value for this setting is 400. It is recommended to lower this value when working with shorter barcodes.
For ITS2 sequences, a value of 50 is recommended.
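The effect of this setting can be pictured as a length filter on alignments. The sketch below is an assumption for illustration: the function `long_enough` is hypothetical, the 180-base alignment is an invented example value, and whether the tool compares with "greater than or equal" or strictly "greater than" is not stated on this page.

```python
DEFAULT_MIN_ALIGNMENT_LENGTH = 400  # default stated on this page
ITS2_MIN_ALIGNMENT_LENGTH = 50      # value recommended on this page for ITS2


def long_enough(alignment_length, min_alignment_length=DEFAULT_MIN_ALIGNMENT_LENGTH):
    """True if an alignment is long enough to be considered (assumed >= comparison)."""
    return alignment_length >= min_alignment_length


if __name__ == "__main__":
    alignment_length = 180  # invented example length for a short barcode
    print(long_enough(alignment_length))
    print(long_enough(alignment_length, ITS2_MIN_ALIGNMENT_LENGTH))
```

Under these assumptions, a short alignment is discarded with the default of 400 but kept once the minimum is lowered to 50, which is why the setting should be adjusted for shorter barcodes.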