On this page, sequences can be classified using a reference dataset and similarity cutoffs. Similarity cutoffs can be calculated on the cutoff page. The inputs to use for classification are explained below.
In this section, enter the sequences that need to be identified, either by uploading a FASTA file or by pasting the sequences into the text area. Pasted sequences must also follow the FASTA format.
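As a rough illustration of what "following the FASTA format" means, the sketch below reads FASTA text into (header, sequence) pairs and rejects input that has sequence data before the first `>` header. The function name `parse_fasta` and the example sequences are hypothetical and only serve this illustration; this is not the parser used by the page itself.

```python
def parse_fasta(text):
    """Return a list of (header, sequence) pairs from FASTA-formatted text."""
    records = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # blank lines are skipped
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        elif header is None:
            raise ValueError("sequence data found before the first '>' header")
        else:
            chunks.append(line)  # sequences may span several lines
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


# Hypothetical input: two records, the first split over two lines.
example = ">seq1\nACGT\nTTGA\n>seq2\nGGCC\n"
records = parse_fasta(example)
```

Each record starts with a `>` line holding the identifier, followed by one or more lines of sequence.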
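The reference dataset can be uploaded as a file in FASTA format, or a standard reference file can be chosen. Sequence headers in the reference file are expected to carry the taxonomy in the following format, shown first as a template and then as an example: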
>ID_NAME k__kingdom;p__phylum;c__class;o__order;f__family;g__genus;s__species
>MH854570 k__Fungi;p__Ascomycota;c__Sordariomycetes;o__Hypocreales;f__Nectriaceae;g__Fusarium;s__Fusarium_equiseti
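The header layout above can be split into a sequence ID and a taxonomy with a few string operations. The following is a minimal sketch, assuming the ID and the lineage are separated by a single space and the ranks by semicolons, as in the example header; `parse_header` and `RANKS` are hypothetical names introduced here.

```python
# Rank prefixes as they appear in the header format above.
RANKS = {
    "k": "kingdom", "p": "phylum", "c": "class", "o": "order",
    "f": "family", "g": "genus", "s": "species",
}


def parse_header(header):
    """Split a reference header into (sequence_id, {rank: name})."""
    header = header.lstrip(">")
    seq_id, _, lineage = header.partition(" ")
    taxonomy = {}
    for field in lineage.split(";"):
        prefix, sep, name = field.partition("__")
        # Fields without a known rank prefix are ignored.
        if sep and prefix in RANKS:
            taxonomy[RANKS[prefix]] = name
    return seq_id, taxonomy


seq_id, taxonomy = parse_header(
    ">MH854570 k__Fungi;p__Ascomycota;c__Sordariomycetes;"
    "o__Hypocreales;f__Nectriaceae;g__Fusarium;s__Fusarium_equiseti"
)
```

For the example header, `seq_id` is `MH854570` and `taxonomy["genus"]` is `Fusarium`.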
The similarity cutoff is the minimum percentage to which an unidentified sequence and a reference sequence must match. It can be given as a single global value or as local values. A global similarity cutoff applies to the whole dataset, whereas local similarity cutoffs are given per taxonomic group. Local similarity cutoffs generally give more accurate classification results (see the about page). Similarity cutoffs can be calculated on the cutoff calculation page.
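The difference between global and local cutoffs can be sketched as a simple lookup: use the local cutoff of the most specific taxonomic group that has one, and fall back to the global cutoff otherwise. This is only an illustration of the idea. The lookup order, the function name `choose_cutoff`, and all cutoff values below are assumptions made for the example, not values or behavior taken from the tool.

```python
# Most specific rank first (assumed lookup order).
RANK_ORDER = ["species", "genus", "family", "order", "class", "phylum", "kingdom"]


def choose_cutoff(taxonomy, local_cutoffs, global_cutoff):
    """Return the local cutoff of the most specific group that has one,
    or the global cutoff if no group in the lineage has a local value."""
    for rank in RANK_ORDER:
        name = taxonomy.get(rank)
        if name in local_cutoffs:
            return local_cutoffs[name]
    return global_cutoff


# Hypothetical cutoffs in percent, for illustration only.
local_cutoffs = {"Fusarium": 99.0, "Hypocreales": 97.5}
global_cutoff = 97.0

fusarium = {"order": "Hypocreales", "genus": "Fusarium"}
other_genus = {"order": "Hypocreales", "genus": "Trichoderma"}
unrelated = {"order": "Agaricales", "genus": "Amanita"}

cutoff_a = choose_cutoff(fusarium, local_cutoffs, global_cutoff)     # genus-level value
cutoff_b = choose_cutoff(other_genus, local_cutoffs, global_cutoff)  # order-level value
cutoff_c = choose_cutoff(unrelated, local_cutoffs, global_cutoff)    # global fallback
```

With a single global cutoff, all three lineages would be judged against the same value; with local cutoffs, each group can be held to its own threshold.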
Note for the Minimum sequence alignment length setting:
The default for this setting is 400. It is, however, recommended to lower this value when working with shorter barcodes. For ITS2 sequences, a value of 50 is recommended.
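To see why the default can be too strict for short barcodes, consider the sketch below. It assumes that a match is only accepted when both the similarity cutoff and the minimum alignment length are met; how the tool combines the two internally is not stated on this page, so this is an assumption, and the function name and numbers are hypothetical.

```python
def passes_filters(similarity, alignment_length, cutoff, min_alignment_length=400):
    """Accept a match only if it is long enough and similar enough (assumed rule)."""
    return alignment_length >= min_alignment_length and similarity >= cutoff


# Hypothetical short-barcode match: 98.5% similar over a 160 bp alignment.
with_default = passes_filters(98.5, 160, cutoff=97.0)
with_lowered = passes_filters(98.5, 160, cutoff=97.0, min_alignment_length=50)
```

Under this assumption, the 160 bp alignment is rejected with the default of 400 but accepted once the minimum is lowered to 50, even though the similarity is above the cutoff in both cases.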