Modifying the Database¶
Databases can be modified in various ways using the
Appending New Data¶
To add new samples to a database, run the steps in Running the Pipeline on Your Data on only the new FASTA/FASTQ or AIRR files. Effort has been made to reduce the amount of information that must be recomputed when samples are added. However, after new samples are added, all affected subjects will be entirely re-collapsed and their clones will be recalculated.
Changing Metadata¶
Metadata specified when initially populating ImmuneDB via importing or identification can be updated in two steps. First, export the metadata currently in the database with:
$ immunedb_export PATH_TO_CONFIG samples --for-update
This will generate a samples.tsv file which can be modified. Headers and values can be changed, deleted, or added.
Note that changing the subject of any sample will require all pipeline steps after and including immunedb_collapse to be re-run.
After modifying the metadata, update the database with:
$ immunedb_modify PATH_TO_CONFIG update-metadata samples.tsv
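The edit between the export and update commands can also be scripted rather than done by hand. The following is a minimal sketch using only Python's standard library. The helper `update_metadata_rows` and the column names (`sample`, `tissue`, `timepoint`) are hypothetical and are not taken from ImmuneDB, so check the headers in your own exported samples.tsv before adapting it.

```python
import csv
import io


def update_metadata_rows(tsv_text, match_column, match_value, updates):
    """Apply `updates` to every row where match_column equals match_value.

    Keys in `updates` that are not existing headers are appended as new
    columns; rows that do not match get an empty value for them.
    """
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    fieldnames = list(reader.fieldnames)
    for key in updates:
        if key not in fieldnames:
            fieldnames.append(key)

    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=fieldnames, delimiter="\t",
                            restval="", lineterminator="\n")
    writer.writeheader()
    for row in reader:
        if row[match_column] == match_value:
            row.update(updates)
        writer.writerow(row)
    return out.getvalue()


# Hypothetical export contents; real headers come from your own samples.tsv.
example = "sample\ttissue\nS1\tblood\nS2\tspleen\n"
updated = update_metadata_rows(example, "sample", "S2",
                               {"tissue": "lymph node", "timepoint": "day7"})
print(updated)
```

In practice the text would be read from and written back to samples.tsv, and the resulting file passed to the update-metadata command shown above.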
You cannot collapse samples from multiple subjects. If that functionality
is desired, first modify the metadata to set the subject for each sample to
be the same with
update-metadata, and then run
One assumption ImmuneDB makes is that each sample is a biological replicate, in that no one cell has its BCR/TCR sequence in more than one sample. If you have technical replicates, that is, multiple independent sequencing runs of the same biological replicate, they should be combined into a single ImmuneDB sample per biological replicate. To do so, add a metadata field to the database as described in Changing Metadata, giving all technical replicates from the same biological replicate the same value.
For example, if we have the following samples where each sample has two technical replicates:
You would update the metadata to be:
And then run:
$ immunedb_modify PATH_TO_CONFIG combine-samples collapse
This will result in the four replicates being collapsed into two, using the
collapse field as the new name for each:
Note that the header collapse can be any name you want so long as it is passed to immunedb_modify. Further, the values in that column can be arbitrary but will be used as the new names of the samples after collapsing.
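When technical replicates follow a naming pattern, the grouping column can be filled in programmatically. The sketch below assumes a hypothetical convention in which replicate names end in `_rep<N>`, and a hypothetical `sample` header holding those names; neither comes from ImmuneDB. The field name `collapse` simply matches the one passed to combine-samples above.

```python
import csv
import io
import re

# Hypothetical naming convention: a technical replicate ends in "_rep<N>".
REPLICATE_SUFFIX = re.compile(r"_rep\d+$")


def add_collapse_column(tsv_text, name_column="sample", field="collapse"):
    """Add a grouping column whose value is the sample name minus its
    replicate suffix, so replicates of one biological sample share a value."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    fieldnames = list(reader.fieldnames)
    if field not in fieldnames:
        fieldnames.append(field)

    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=fieldnames, delimiter="\t",
                            lineterminator="\n")
    writer.writeheader()
    for row in reader:
        row[field] = REPLICATE_SUFFIX.sub("", row[name_column])
        writer.writerow(row)
    return out.getvalue()


# Hypothetical metadata: two biological samples, two replicates each,
# all from the same subject.
example = ("sample\tsubject\n"
           "donor1_blood_rep1\tdonor1\n"
           "donor1_blood_rep2\tdonor1\n"
           "donor1_spleen_rep1\tdonor1\n"
           "donor1_spleen_rep2\tdonor1\n")
combined = add_collapse_column(example)
print(combined)
```

Here the four hypothetical replicates receive two distinct values in the new column, which would become the names of the two combined samples.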
The following command can be used to delete samples by ID:
$ immunedb_modify PATH_TO_CONFIG delete-samples [sample_ids]
Note that deleting samples will require the subject to be re-analyzed by
running all pipeline steps after and including