Exporting Data to Files

You can use the immunedb_export command to export your data in a variety of formats.

Exporting Samples

To export sample statistics, run the command:

$ immunedb_export PATH_TO_CONFIG samples

After completion, a TSV file samples.tsv will be written with the following headers, one line per sample:

Field	Description
`id`	Unique numeric sample identifier
`name`	Name given to the sample
`subject`	Subject from which the sample originated
`input_sequences`	Reads input into ImmuneDB
`identified`	Reads successfully annotated
`in_frame`	Reads in-frame
`stops`	Reads with stop codons
`functional`	Functional reads (in-frame and no stop codons)
`avg_clone_cdr3_num_nts`	Average clonal CDR3 length in nucleotides
`avg_clone_v_identity`	Average clonal V-region identity
`clones`	Total number of clones
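
As a minimal sketch of downstream use, the Python below reads a samples export with the standard-library csv module and computes, for each sample, the fraction of identified reads that are functional. The two data rows are invented for illustration, only a subset of the columns is shown, and the tab-delimited layout with the header names above is taken from the table rather than from inspecting a real export.

```python
import csv
import io

# Invented stand-in for samples.tsv; a real file would be opened with
# open("samples.tsv", newline="") instead of io.StringIO.
SAMPLES_TSV = (
    "id\tname\tsubject\tinput_sequences\tidentified\tin_frame\tstops\tfunctional\n"
    "1\tsample1\tS1\t1000\t800\t700\t100\t600\n"
    "2\tsample2\tS1\t500\t0\t0\t0\t0\n"
)


def functional_fraction(handle):
    """Map sample name to functional / identified (None if nothing identified)."""
    fractions = {}
    for row in csv.DictReader(handle, delimiter="\t"):
        identified = int(row["identified"])
        functional = int(row["functional"])
        fractions[row["name"]] = functional / identified if identified else None
    return fractions


fractions = functional_fraction(io.StringIO(SAMPLES_TSV))
for name, fraction in fractions.items():
    print(name, fraction)
```

Returning None for samples with no identified reads avoids a division by zero and keeps such samples visible in the result.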

Exporting Clones

In its most basic form, the command to export clones is:

$ immunedb_export PATH_TO_CONFIG clones

This will generate one file per sample, each with one line per clone and the fields below. Note that instances, copies, avg_v_identity, and top_copy_seq describe the clone in the context of that sample, so those fields may vary for the same clone across samples.

Field	Description
`clone_id`	Database-wide unique clone identifier. This number can be used to track clones across samples.
`subject`	Subject in which the clone was found
`v_gene`	V-gene of the clone
`j_gene`	J-gene of the clone
`functional`	Whether the clone is in-frame and its consensus contains no stop codon (`T` or `F`)
`insertions`	Insertions in the clone (deprecated)
`deletions`	Deletions in the clone (deprecated)
`cdr3_nt`	CDR3 nucleotide sequence
`cdr3_num_nts`	CDR3 nucleotide sequence length
`cdr3_aa`	CDR3 amino-acid sequence
`uniques`	Unique sequences in the clone overall
`instances`	Sequence instances in the clone in the associated sample
`copies`	Copies in the clone in the associated sample
`germline`	Clonal germline sequence
`parent_id`	Parent ID (deprecated)
`avg_v_identity`	Average V-gene identity to germline
`top_copy_seq`	Nucleotide sequence of top-copy sequence
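
Because clone_id is unique across the database, the per-sample files can be joined on it. The Python sketch below collects the copies value of each clone from two per-sample exports. The sample labels, the reduced set of columns, and all data values are hypothetical; real exports would be read from the generated files rather than from strings.

```python
import csv
import io
from collections import defaultdict

# Hypothetical excerpts of two per-sample clone exports (only three of the
# documented columns are kept so the example stays short).
EXPORTS = {
    "sample1": "clone_id\tcdr3_aa\tcopies\n10\tCARDYW\t5\n11\tCARGGW\t2\n",
    "sample2": "clone_id\tcdr3_aa\tcopies\n10\tCARDYW\t7\n",
}


def copies_by_clone(exports):
    """Return {clone_id: {sample: copies}} across all per-sample exports."""
    table = defaultdict(dict)
    for sample, text in exports.items():
        for row in csv.DictReader(io.StringIO(text), delimiter="\t"):
            table[int(row["clone_id"])][sample] = int(row["copies"])
    return dict(table)


tracked = copies_by_clone(EXPORTS)
# Clones that appear in more than one sample.
shared = sorted(cid for cid, per_sample in tracked.items() if len(per_sample) > 1)
print(tracked)
print(shared)
```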

The --pool-on parameter changes how data is aggregated. By default it takes the value sample (as described above), but it also accepts subject or any custom metadata field(s).

For the purposes of illustration, assume we have samples with the associated metadata below.

sample	subject	tissue	subset
sample1	S1	blood	naive
sample2	S1	spleen	naive
sample3	S1	spleen	mature
sample4	S3	blood	naive

Passing --pool-on subject will generate one file per subject, with the clone information aggregated across all samples in that subject. Alternatively, passing --pool-on tissue will generate one file per subject/tissue combination. You can also pass multiple metadata fields to the --pool-on parameter. For example, --pool-on tissue subset will generate one file per subject/tissue/subset combination.
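
To make the grouping concrete, the Python sketch below models which samples would be pooled together for each --pool-on value, using the metadata table above (with sample4's subset taken to be naive). It is an illustration of the grouping rule as described in the text, not ImmuneDB's implementation, and the function name pooled_groups is hypothetical.

```python
# Sample metadata from the table above (sample4's subset assumed to be "naive").
SAMPLES = [
    {"sample": "sample1", "subject": "S1", "tissue": "blood", "subset": "naive"},
    {"sample": "sample2", "subject": "S1", "tissue": "spleen", "subset": "naive"},
    {"sample": "sample3", "subject": "S1", "tissue": "spleen", "subset": "mature"},
    {"sample": "sample4", "subject": "S3", "tissue": "blood", "subset": "naive"},
]


def pooled_groups(samples, pool_on):
    """Group sample names by subject plus the given --pool-on fields."""
    # Per the description above, the subject is always part of the grouping
    # key; it is not duplicated when pool_on is ["subject"].
    fields = ["subject"] + [f for f in pool_on if f != "subject"]
    groups = {}
    for entry in samples:
        key = tuple(entry[f] for f in fields)
        groups.setdefault(key, []).append(entry["sample"])
    return groups


for pool_on in (["sample"], ["subject"], ["tissue"], ["tissue", "subset"]):
    groups = pooled_groups(SAMPLES, pool_on)
    print(" ".join(pool_on), "->", len(groups), "file(s)")
```

Under this model, each key of the returned dictionary corresponds to one output file, and its value lists the samples aggregated into that file.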

Two other common parameters are --sample-ids, which restricts which samples to include in the export, and --format, which accepts immunedb (the default) or vdjtools for interoperability with the VDJtools suite.

Exporting Sequences

Sequences can be exported in Change-O and AIRR formats.

The basic command is:

$ immunedb_export PATH_TO_CONFIG sequences

This will generate one file per sample in Change-O format. To use AIRR format, specify --format airr. You can filter out sequences that were not assigned to a clone with the --clones-only flag.
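
If a file was exported without --clones-only, a similar filter can be applied afterwards. The Python sketch below assumes an AIRR-format export that carries the AIRR Rearrangement schema's sequence_id and clone_id columns and leaves clone_id empty for unassigned sequences. Neither assumption has been checked against actual ImmuneDB output, and the rows are invented.

```python
import csv
import io

# Invented AIRR-style rows; seq2 has an empty clone_id (assumed to mean
# "not assigned to a clone").
AIRR_TSV = (
    "sequence_id\tclone_id\tv_call\n"
    "seq1\t10\tIGHV1-2\n"
    "seq2\t\tIGHV3-23\n"
    "seq3\t11\tIGHV4-34\n"
)


def assigned_only(handle):
    """Keep only rows whose clone_id column is non-empty."""
    reader = csv.DictReader(handle, delimiter="\t")
    return [row for row in reader if row["clone_id"].strip()]


kept = assigned_only(io.StringIO(AIRR_TSV))
print([row["sequence_id"] for row in kept])
```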

Exporting Selection Pressure

If selection pressure was calculated with the immunedb_clone_pressure command, the results can be exported in TSV format, with one row per clone/sample combination. Additionally, unless --filter samples is passed, there will be one extra row per clone with the value All Samples in the sample field, which gives the overall selection pressure on the clone.

For more information on interpreting the values, see Uduman et al., 2011 and Yaari et al., 2012.

Field	Value
`clone_id`	Clone ID
`subject`	Subject to which the clone belongs
`sample`	Sample within which the selection pressure was calculated. If `All Samples`, the row gives the overall selection pressure for the clone.
`threshold`	The threshold at which the selection pressure was calculated
`expected_REGION_TYPE`	The expected number of `TYPE` (`r` or `s`) mutations in `REGION` (`cdr` or `fwr`)
`observed_REGION_TYPE`	The observed number of `TYPE` (`r` or `s`) mutations in `REGION` (`cdr` or `fwr`)
`sigma_REGION`	The selection pressure in `REGION`
`sigma_REGION_cilower`	The lower bound of the confidence interval of selection in `REGION`
`sigma_REGION_ciupper`	The upper bound of the confidence interval of selection in `REGION`
`sigma_p_REGION`	The P-value of the selection in `REGION`
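
The REGION and TYPE placeholders in the table expand to one column per combination. The Python sketch below enumerates the resulting names, assuming the lowercase values shown in the table (`cdr`/`fwr` and `r`/`s`) are substituted literally. The column order is an assumption made for illustration and may differ from the exported file.

```python
from itertools import product

REGIONS = ("cdr", "fwr")
TYPES = ("r", "s")


def pressure_columns():
    """Expand the REGION/TYPE templates from the table into column names."""
    columns = ["clone_id", "subject", "sample", "threshold"]
    # expected_REGION_TYPE and observed_REGION_TYPE: one per region/type pair.
    for prefix in ("expected", "observed"):
        columns += [
            f"{prefix}_{region}_{kind}" for region, kind in product(REGIONS, TYPES)
        ]
    # The sigma columns depend on the region only.
    for region in REGIONS:
        columns += [
            f"sigma_{region}",
            f"sigma_{region}_cilower",
            f"sigma_{region}_ciupper",
            f"sigma_p_{region}",
        ]
    return columns


columns = pressure_columns()
print(len(columns))
print(columns)
```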

Exporting MySQL Data

The final method of exporting data is to dump the entire MySQL database to a file. This is meant as a backup method rather than for downstream analysis.

To back up, run:

$ immunedb_admin backup PATH_TO_CONFIG BACKUP_PATH

To restore a backup, run:

$ immunedb_admin restore PATH_TO_CONFIG BACKUP_PATH