DNA barcoder

Help

Five results files will be created:
  • .bestmatch file

    This file is the result of a preprocessing step in the classification. For each sequence, it shows the best-matching reference sequence.

  • .classification file

    The classification file shows the classification for each best match. It shows the following information: ID, reference ID, taxonomy, identification rank, cutoff and confidence. This file also shows which sequences were not classified.

  • .classified file

    This file shows the classification for each ID. It shows the following information: ID, predicted name, full classification, identification rank, cutoff, confidence, reference ID, BLAST score, BLAST sim and BLAST coverage. This file also shows which sequences were not classified.

  • .krona.html file

    This is a Krona chart which shows the taxonomy distribution of the classification.

  • .krona.report file

    This file contains the same information as the Krona chart, but in text format.

A table of only the classified sequences is shown at the bottom of the results page. This table can also be downloaded.

The following results files are created:
  • .cutoffs.json file

    This file contains the cutoff calculation results. The first level of the file shows the identification rank. The second level lists the higher rank groups. For each higher rank group, the cutoff, confidence and other extra information are given.
    This is the file that must be used as input for classification.

  • .cutoffs.json.txt file

    This file contains the same information as the .cutoffs.json file, but written as a tab-delimited table instead.

  • .best.json file

    The best cutoffs are calculated from the .cutoffs.json file. This file contains the best cutoffs for the taxonomic groups. If the cutoff for a taxonomic group has a low F-measure, the cutoff of a higher taxonomic group with a higher F-measure is chosen instead.

  • .best.json.txt file

    This file contains the same information as the .best.json file, but written as a tab-delimited table instead.

  • .lowerconfidence.immediatehigherlevel.txt and .lowerconfidence.txt files

    These files contain information about how the cutoffs changed from the original cutoffs in the .cutoffs.json file to the best cutoffs in the .best.json file.

  • .predicted file

    This file lists the calculated F-measure for each possible cutoff. It does not contain the final results, but can be used for verification purposes.
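
The relation between the two-level .cutoffs.json layout and its tab-delimited counterpart can be sketched as follows. This is only an illustration under assumptions: the field names "cutoff" and "confidence", the group names and all values in the example data are hypothetical, and the real file may use different keys and contain extra fields.

```python
import json

def flatten_cutoffs(cutoffs):
    """Turn {rank: {group: {field: value}}} into a list of flat row dicts."""
    rows = []
    for rank, groups in cutoffs.items():          # first level: identification rank
        for group, info in groups.items():        # second level: higher rank group
            row = {"rank": rank, "group": group}
            row.update(info)                      # cutoff, confidence, extra fields
            rows.append(row)
    return rows

def to_tsv(rows):
    """Render the rows as a tab-delimited table with a header line."""
    columns = []
    for row in rows:
        for key in row:
            if key not in columns:                # keep first-seen column order
                columns.append(key)
    lines = ["\t".join(columns)]
    for row in rows:
        lines.append("\t".join(str(row.get(c, "")) for c in columns))
    return "\n".join(lines)

# Hypothetical example data; the real field names and values may differ.
example = json.loads("""
{
  "species": {
    "GenusA": {"cutoff": 0.98, "confidence": 0.95},
    "GenusB": {"cutoff": 0.97, "confidence": 0.90}
  },
  "genus": {
    "FamilyA": {"cutoff": 0.93, "confidence": 0.88}
  }
}
""")

rows = flatten_cutoffs(example)
print(to_tsv(rows))
```

The sketch does not depend on specific field names, so any extra information stored per group simply becomes an additional column.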

At the bottom of the page results figures are shown.

Five results files will be created:
  • .krona.html file

    This is a Krona chart which shows the taxonomy distribution of the submitted reference dataset.

  • .krona.report file

    This file contains the same information as the Krona chart, but in text format.

  • .length.txt file

    A tab-delimited file which shows the number of sequences per length interval of the submitted dataset.

  • .distribution.txt file

    A tab-delimited file which contains the taxonomic distribution of the submitted reference dataset. The first column shows the taxon name and the second column the number of sequences with that taxon name. The last column shows the sequences with that taxon name as a percentage of the total number of sequences in the dataset.

  • .sim file

    This is a similarity matrix file. This file consists of three columns. The first two show the IDs of the sequences that are being compared. The third column shows the percentage similarity between these sequences.
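
As a rough illustration of the three columns described for the .distribution.txt file (taxon name, count, percentage), the following sketch computes them from a list of taxon names. The function name and the example names are assumptions made for illustration, and how DNA barcoder orders or rounds the rows is not specified here.

```python
from collections import Counter

def taxon_distribution(taxon_names):
    """Return (taxon name, count, percentage of total) for each distinct name."""
    counts = Counter(taxon_names)
    total = sum(counts.values())
    # most_common() sorts by descending count; the loop is skipped for empty input
    return [(name, n, 100.0 * n / total) for name, n in counts.most_common()]

# Hypothetical taxon names, one per sequence in the dataset
names = ["Aspergillus", "Penicillium", "Aspergillus", "Candida"]

for name, count, percent in taxon_distribution(names):
    print(f"{name}\t{count}\t{percent:.2f}")
```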

At the bottom of the page results figures are shown.

This interface currently has three analysis options:
  • Length distribution

    A sequence length distribution of a FASTA file can be generated. The interval for this distribution defaults to 100, but can be changed.

  • Taxonomic distribution

    The taxonomic distribution analysis generates the distribution of taxon names in the given dataset. This distribution is shown per taxonomic rank. The ranks for which this is done can be changed in the settings.

  • Similarity matrix

    This analysis generates a similarity matrix (.sim) file of the given dataset. This file consists of three columns. The first two show the IDs of the sequences that are being compared. The third column shows the percentage similarity between these sequences.
    This file can be used as input for the cutoff calculation and visualization. If no .sim file is given, one is created automatically for these calculations.
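
To make the interval setting of the length distribution concrete, here is a minimal sketch that counts sequences per length interval. It assumes that intervals start at 0 and that each interval includes its lower bound; the function names and the tiny FASTA example are hypothetical, and the tool itself may place the interval boundaries differently.

```python
from collections import Counter

def read_fasta_lengths(lines):
    """Yield the length of each sequence in FASTA-formatted lines."""
    length = None
    for line in lines:
        line = line.strip()
        if line.startswith(">"):          # header line starts a new sequence
            if length is not None:
                yield length
            length = 0
        elif line and length is not None: # sequence line, possibly wrapped
            length += len(line)
    if length is not None:
        yield length

def length_distribution(lengths, interval=100):
    """Count sequences per interval; keys are the lower bound of each interval."""
    bins = Counter((n // interval) * interval for n in lengths)
    return sorted(bins.items())

# Hypothetical FASTA content with sequences of length 50, 120 and 180
fasta = [
    ">seq1", "A" * 50,
    ">seq2", "C" * 60, "G" * 60,
    ">seq3", "T" * 180,
]

lengths = list(read_fasta_lengths(fasta))
for start, count in length_distribution(lengths, interval=100):
    print(f"{start}-{start + 99}\t{count}")
```

Passing a different value for interval changes the width of the intervals in the same way as the interval setting described above.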

The command line version of DNA barcoder also has analysis options for a general overview of the submitted file and for the variation of the sequences. These options may be added to this interface in the future.

Two results files will be created:
  • .coord file

    This file contains the coordinates of each point of the visualization.

  • .sim file

    A similarity matrix file. This file consists of three columns. The first two show the IDs of the sequences that are being compared. The third column shows the percentage similarity between these sequences. This file will only be created if no similarity matrix file is given as input.
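
The three-column layout of the .sim file can be read with a few lines of code. The sketch below assumes that the columns are separated by whitespace (tabs or spaces) and that the third column is numeric; the delimiter, the sequence IDs and the values are assumptions for illustration, not a specification of the file.

```python
def read_sim(lines):
    """Map (id1, id2) pairs to their similarity value from three-column lines."""
    sims = {}
    for line in lines:
        parts = line.split()              # assumed: whitespace-separated columns
        if len(parts) != 3:
            continue                      # skip blank or malformed lines
        id1, id2, value = parts
        try:
            sims[(id1, id2)] = float(value)
        except ValueError:
            continue                      # skip lines with a non-numeric third column
    return sims

# Hypothetical .sim content
example_lines = [
    "seqA seqB 98.5",
    "seqA\tseqC\t91.0",
    "not a valid line here at all",
    "seqB seqC high",
]

sims = read_sim(example_lines)
for (id1, id2), value in sims.items():
    print(id1, id2, value)
```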

At the bottom of the page the visualization is shown.