.. _modifying:

Modifying the Database
======================
Databases can be modified in various ways using the ``immunedb_modify``
command.

Appending New Data
------------------
To add new samples to a database, run the steps in :ref:`pipeline_full` on
just the new FASTA/FASTQ or AIRR files.  Effort has been made to reduce the
amount of information that needs to be recomputed when samples are added.
However, after new samples are added, all affected subjects will be entirely
re-collapsed and their clones will be recalculated.


Changing Metadata
-----------------
Metadata specified when initially populating ImmuneDB via importing or
identification can be updated in two steps.  First, export the metadata
currently in the database with:

.. code-block:: bash

    $ immunedb_export PATH_TO_CONFIG samples --for-update

This will generate a ``samples.tsv`` file which can be modified.  Headers and
values can be changed, deleted, or added.

.. note::

    Changing the subject of any sample will require all pipeline steps from
    ``immunedb_collapse`` onward to be re-run.

After modifying the metadata, update the database with:

.. code-block:: bash

    $ immunedb_modify PATH_TO_CONFIG update-metadata samples.tsv

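Because ``samples.tsv`` is edited by hand, it can be worth checking that every
row still has as many tab-separated fields as the header before running
``update-metadata``.  The following is a minimal sketch of such a check;
``check_tsv_columns`` is a hypothetical helper, not part of ImmuneDB, and it
assumes the file is tab-separated with a single header row.

.. code-block:: bash

    # Hypothetical helper (not an ImmuneDB command): report rows whose
    # number of tab-separated fields differs from the header row.
    # Returns non-zero if any such row is found.
    check_tsv_columns() {
        awk -F'\t' '
            NR == 1 { expected = NF; next }
            NF != expected {
                printf "line %d: expected %d fields, found %d\n", \
                    NR, expected, NF > "/dev/stderr"
                bad = 1
            }
            END { exit (bad ? 1 : 0) }
        ' "$1"
    }

    # Usage: check_tsv_columns samples.tsv

Blank lines are reported as mismatches too, so remove any trailing empty lines
from the file first.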

Combining Samples
-----------------
.. warning::

    You cannot combine samples from multiple subjects.  If that functionality
    is desired, first use ``update-metadata`` to set the subject of each
    sample to the same value, and then run ``combine-samples``.

One assumption ImmuneDB makes is that each sample is a *biological replicate*,
in that no one cell has its BCR/TCR sequence in more than one sample.  If you
have *technical replicates*, that is, multiple independent sequencing runs of
the same biological replicate, they should be combined into a single ImmuneDB
sample per biological replicate.  To do so, add a metadata field to the
database as described in :ref:`Changing Metadata` where all technical
replicates from the same biological replicate have the same value.

For example, if we have the following samples, where each biological replicate
has two technical replicates:

================    =======
sample              subject
================    =======
biorep1_techrep1    S1
biorep1_techrep2    S1
biorep2_techrep1    S1
biorep2_techrep2    S1
================    =======

You would update the metadata to be:

================    =======     ========
sample              subject     collapse
================    =======     ========
biorep1_techrep1    S1          first_sample
biorep1_techrep2    S1          first_sample
biorep2_techrep1    S1          second_sample
biorep2_techrep2    S1          second_sample
================    =======     ========

And then run:

.. code-block:: bash

    $ immunedb_modify PATH_TO_CONFIG combine-samples collapse

This will result in the four replicates being collapsed into two, using the
``collapse`` field as the new name for each:

================    =======
sample              subject
================    =======
first_sample        S1
second_sample       S1
================    =======

The header ``collapse`` can be any name you like, so long as the same name is
passed to ``immunedb_modify``.  The values in that column are also arbitrary,
but they will be used as the names of the samples after combining.
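
As an illustration, the ``collapse`` column could be added to the exported file
with a small shell helper rather than by hand.  This is only a sketch:
``add_collapse_column`` is a hypothetical function, not part of ImmuneDB.  It
assumes that the file is tab-separated with a header row, that the sample-name
column is called ``sample`` as in the tables above (pass the actual header as
the first argument if it differs), and that technical replicates are marked
with a ``_techrepN`` suffix.

.. code-block:: bash

    # Hypothetical helper (not an ImmuneDB command): read a TSV on stdin and
    # append a "collapse" column whose value is the sample name with any
    # trailing "_techrepN" suffix removed.
    # $1: header of the sample-name column (assumed to be "sample").
    add_collapse_column() {
        awk -F'\t' -v OFS='\t' -v name_col="${1:-sample}" '
            NR == 1 {
                # Locate the sample-name column in the header row.
                for (i = 1; i <= NF; i++) if ($i == name_col) col = i
                if (!col) {
                    print "column not found: " name_col > "/dev/stderr"
                    exit 1
                }
                print $0, "collapse"
                next
            }
            {
                value = $col
                sub(/_techrep[0-9]+$/, "", value)
                print $0, value
            }
        '
    }

    # Usage: add_collapse_column < samples.tsv > samples_updated.tsv

The resulting file can then be passed to ``update-metadata`` before running
``combine-samples``.  With this naming rule the combined samples would be
called ``biorep1`` and ``biorep2`` rather than ``first_sample`` and
``second_sample``.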

Deleting Samples
----------------
The following command can be used to delete samples by ID:


.. code-block:: bash

    $ immunedb_modify PATH_TO_CONFIG delete-samples [sample_ids]

Note that deleting samples will require the affected subjects to be
re-analyzed by running all pipeline steps from ``immunedb_collapse`` onward.