Skip to content

Preparing for a real run: fetch or upload, and index

Fetch

After installation, one needs to populate the database directory. By default this will be /data/virmet_databases and will occupy about 60 GB. In order to populate this, use the subcommand fetch, for example as follows

virmet fetch --viral n  # this downloads viral sequences, nucleotide only
virmet fetch --human
virmet fetch --bact
virmet fetch --fungal
virmet fetch --bovine

This might take long. If it's taking too long, you might want to include the above commands in a down.sh file and run them overnight.

Only the option --viral takes an argument: n for nucleotide and p for protein viral database. Currently only nucleotide sequences are used, while the protein ones are foreseen as useful in discovery of novel viral sequences (in a future version).

Update

More and more sequences are uploaded to NCBI database every month.

VirMet provides a simple way to update the viral database without the need to download all the genomes again. This can be done with the subcommand update as in the example

virmet update --viral n

Similarly, new bacterial sequences can be added as well as fungal. Human and bovine database, since they consist of a single organism, don't need to be updated so often.

Don't forget to index the database again once it has been updated.

Adding sequences manually

By adding the switch --picked file_with_ids users can add sequences by writing their ids in a file, one per line.

Index

After dowloading or updating the databases, it's always needed to index them. The subcommand virmet index is used for that, and it can take multiple arguments, so you can run

virmet index --viral n --human --bact --fungal --bovine

and wait for the indexing to finish.