Exporting Data to Files

You can use the immunedb_export command to export your data in a variety of formats.

Exporting Samples

To export sample statistics, run:

$ immunedb_export PATH_TO_CONFIG samples

When the command completes, it writes a TSV file, samples.tsv, with one line per sample and the following columns:

Field                     Description
------------------------  ----------------------------------------------
id                        Unique numeric sample identifier
name                      Name given to the sample
subject                   Subject from which the sample originated
input_sequences           Reads input into ImmuneDB
identified                Reads successfully annotated
in_frame                  Reads in-frame
stops                     Reads with stop codons
functional                Functional reads (in-frame and no stop codons)
avg_clone_cdr3_num_nts    Average clonal CDR3 length in nucleotides
avg_clone_v_identity      Average clonal V-region identity
clones                    Total number of clones
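As a minimal sketch of how samples.tsv might be consumed downstream, the snippet below computes the fraction of identified reads that are functional for each sample. The rows are made up for illustration, and the sketch assumes the count columns hold plain integers; the function name functional_fraction is hypothetical and not part of ImmuneDB.

```python
import csv
import io

# Made-up rows for illustration only; a real script would open samples.tsv.
SAMPLES_TSV = (
    "id\tname\tsubject\tinput_sequences\tidentified\tin_frame\tstops\tfunctional\n"
    "1\tsample1\tS1\t1000\t800\t700\t100\t650\n"
    "2\tsample2\tS1\t500\t400\t300\t50\t280\n"
)


def functional_fraction(handle):
    """Map each sample name to functional reads / identified reads."""
    fractions = {}
    for row in csv.DictReader(handle, delimiter="\t"):
        identified = int(row["identified"])  # assumes integer counts
        functional = int(row["functional"])
        # Guard against samples in which no reads were annotated.
        fractions[row["name"]] = functional / identified if identified else 0.0
    return fractions


fractions = functional_fraction(io.StringIO(SAMPLES_TSV))
for name, fraction in fractions.items():
    print(f"{name}: {fraction:.1%} of identified reads are functional")
```

To run this against a real export, replace the io.StringIO object with open("samples.tsv", newline="").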

Exporting Clones

In its most basic form, the command to export clones is:

$ immunedb_export PATH_TO_CONFIG clones

This will generate one file per sample, each with one line per clone and the fields below. Note that instances, copies, avg_v_identity, and top_copy_seq describe the clone in the context of that sample. That is, those fields may vary for the same clone in different samples.

Field           Description
--------------  --------------------------------------------------------------
clone_id        Database-wide unique clone identifier. This number can be used to track clones across samples.
subject         Subject in which the clone was found
v_gene          V-gene of the clone
j_gene          J-gene of the clone
functional      Whether the clone is in-frame and its consensus contains no stop codon (T or F)
insertions      Insertions in the clone (deprecated)
deletions       Deletions in the clone (deprecated)
cdr3_nt         CDR3 nucleotide sequence
cdr3_num_nts    CDR3 nucleotide sequence length
cdr3_aa         CDR3 amino-acid sequence
uniques         Unique sequences in the clone overall
instances       Sequence instances in the clone in the associated sample
copies          Copies in the clone in the associated sample
germline        Clonal germline sequence
parent_id       Parent ID (deprecated)
avg_v_identity  Average V-gene identity to germline
top_copy_seq    Nucleotide sequence of top-copy sequence
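Because clone_id is unique across the database, it can serve as a join key between per-sample files. The sketch below finds clones present in two samples and compares their copies. The file contents are made up, only a few columns are shown, and the sketch assumes the clone files are tab-delimited with a header row; copies_by_clone is a hypothetical helper, not an ImmuneDB function.

```python
import csv
import io

# Hypothetical contents of two per-sample clone files (subset of columns).
SAMPLE1 = "clone_id\tsubject\tinstances\tcopies\n10\tS1\t3\t12\n11\tS1\t1\t1\n"
SAMPLE2 = "clone_id\tsubject\tinstances\tcopies\n10\tS1\t5\t40\n12\tS1\t2\t2\n"


def copies_by_clone(handle):
    """Map clone_id to the clone's copies within one sample's file."""
    reader = csv.DictReader(handle, delimiter="\t")  # assumes tab-delimited
    return {row["clone_id"]: int(row["copies"]) for row in reader}


first = copies_by_clone(io.StringIO(SAMPLE1))
second = copies_by_clone(io.StringIO(SAMPLE2))

# Clones seen in both samples, with their (sample 1, sample 2) copies.
shared = {cid: (first[cid], second[cid]) for cid in first.keys() & second.keys()}
print(shared)
```

The same pattern extends to instances or avg_v_identity, which are also reported per sample.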

The --pool-on parameter can be used to change how data is aggregated. By default it takes the value sample (as described above), but it also accepts subject or any custom metadata field(s).

For the purposes of illustration, assume we have samples with the associated metadata below.

sample    subject   tissue    subset
--------  --------  --------  --------
sample1   S1        blood     naive
sample2   S1        spleen    naive
sample3   S1        spleen    mature
sample4   S3        blood     native

Passing --pool-on subject will generate one file per subject, with the clone information aggregated across all samples in that subject. Alternatively, passing --pool-on tissue will generate one file per subject/tissue combination. You can also pass multiple metadata fields to the --pool-on parameter. For example, --pool-on tissue subset will generate one file per subject/tissue/subset combination.
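The grouping described above can be modeled in a few lines. This is only an illustration of which samples end up pooled together for the example metadata, not ImmuneDB's implementation; pool_groups is a hypothetical helper, and it assumes the subject is always part of the grouping key, as the subject/tissue wording suggests.

```python
from collections import defaultdict

# The example metadata from the table above, copied verbatim.
SAMPLES = [
    {"sample": "sample1", "subject": "S1", "tissue": "blood", "subset": "naive"},
    {"sample": "sample2", "subject": "S1", "tissue": "spleen", "subset": "naive"},
    {"sample": "sample3", "subject": "S1", "tissue": "spleen", "subset": "mature"},
    {"sample": "sample4", "subject": "S3", "tissue": "blood", "subset": "native"},
]


def pool_groups(samples, pool_on):
    """Group sample names by subject plus the requested pooling fields."""
    # Assumption: pooling never crosses subjects, so subject leads every key.
    fields = ["subject"] + [field for field in pool_on if field != "subject"]
    groups = defaultdict(list)
    for sample in samples:
        key = tuple(sample[field] for field in fields)
        groups[key].append(sample["sample"])
    return dict(groups)


for pool_on in (["sample"], ["subject"], ["tissue"], ["tissue", "subset"]):
    groups = pool_groups(SAMPLES, pool_on)
    print(f"--pool-on {' '.join(pool_on)}: {len(groups)} group(s)")
```

For the four example samples this gives 4, 2, 3, and 4 groups respectively, matching one output file per group.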

Two other common parameters are --sample-ids, which restricts which samples are included in the export, and --format, which accepts immunedb (the default) or vdjtools for interoperability with the VDJtools suite.

Exporting Sequences

Sequences can be exported in Change-O and AIRR formats.

The basic command is:

$ immunedb_export PATH_TO_CONFIG sequences

This will generate one file per sample in Change-O format. To use AIRR format, specify --format airr. You can filter out sequences that were not assigned to a clone with the --clones-only flag.

Exporting Selection Pressure

If selection pressure was calculated with the immunedb_clone_pressure command, the results can be exported in TSV format, one row per clone/sample combination. Additionally, unless --filter samples is passed, there will be one additional row per clone with an All Samples value in the sample field, which indicates the overall selection pressure on the clone.

For more information on interpreting the values, see Uduman et al., 2011 and Yaari et al., 2012.

Field                 Value
--------------------  ----------------------------------------------------------------
clone_id              Clone ID
subject               Subject to which the clone belongs
sample                Sample within which the selection pressure was calculated. If All Samples, the row gives the overall selection pressure for the clone.
threshold             The threshold at which the selection pressure was calculated
expected_REGION_TYPE  The expected number of TYPE (r or s) mutations in REGION (cdr or fwr)
observed_REGION_TYPE  The observed number of TYPE (r or s) mutations in REGION (cdr or fwr)
sigma_REGION          The selection pressure in REGION
sigma_REGION_cilower  The lower bound of the confidence interval of selection in REGION
sigma_REGION_ciupper  The upper bound of the confidence interval of selection in REGION
sigma_p_REGION        The P-value of the selection in REGION
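The REGION and TYPE placeholders stand for several concrete columns. The sketch below expands them into names such as expected_cdr_r and sigma_p_fwr. It assumes the placeholders are substituted in lowercase exactly as written in the table, and the order shown is not necessarily the column order in the exported file; pressure_columns is a hypothetical helper.

```python
from itertools import product

REGIONS = ("cdr", "fwr")  # values of REGION listed in the table
TYPES = ("r", "s")        # values of TYPE listed in the table


def pressure_columns():
    """Expand the templated field names into concrete column names."""
    columns = ["clone_id", "subject", "sample", "threshold"]
    # expected_/observed_ columns exist for every region and mutation type.
    for prefix in ("expected", "observed"):
        columns += [
            f"{prefix}_{region}_{mtype}" for region, mtype in product(REGIONS, TYPES)
        ]
    # sigma columns exist once per region.
    for region in REGIONS:
        columns += [
            f"sigma_{region}",
            f"sigma_{region}_cilower",
            f"sigma_{region}_ciupper",
            f"sigma_p_{region}",
        ]
    return columns


for name in pressure_columns():
    print(name)
```

Comparing this list with the header of an actual export is a quick way to confirm the naming before relying on it in analysis code.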

Exporting MySQL Data

The final method of exporting data is to dump the entire MySQL database to a file. This is meant as a backup method rather than a source for downstream analysis.

To back up, run:

$ immunedb_admin backup PATH_TO_CONFIG BACKUP_PATH

To restore a backup, run:

$ immunedb_admin restore PATH_TO_CONFIG BACKUP_PATH