Commandline Use

Running abstar from the command line is reasonably simple, even for users with minimal experience with command-line applications. In the most basic case, with a single input file of human antibody sequences:

$ abstar -i /path/to/mydata.fasta -t /path/to/temp/ -o /path/to/output/

abstar will process all sequences contained in mydata.fasta and the results will be written to /path/to/output/mydata.json. If either (or both) of /path/to/temp/ or /path/to/output/ don’t exist, they will be created.

If you have a directory of FASTA/Q-formatted files for abstar to process, you can pass a directory via -i and all files in the directory will be processed:

$ abstar -i /path/to/input/ -t /path/to/temp/ -o /path/to/output/

For input directories that contain paired FASTQ files that need to be merged prior to processing, passing the -m flag instructs abstar to merge paired files with PANDAseq:

$ abstar -i /path/to/input/ -t /path/to/temp/ -o /path/to/output/ -m

The merged reads will be deposited into a merged directory located in the parent directory of the input directory. By default, abstar will use PANDAseq’s simple_bayesian merging algorithm, although alternate merging algorithms can be selected with --pandaseq-algo.

For data generated with Illumina sequencers, abstar can directly interface with BaseSpace to download raw sequencing data. In order for abstar to connect to BaseSpace, you need BaseSpace access token. The easiest way to do this is to set up a BaseSpace developer account following these instructions. Once you have your credentials, you can generate a BaseSpace credentials file by running:

$ make_basespace_credfile

and following the instructions.

When downloading data from BaseSpace, you obviously don’t have an input directory of data for abstar to process (since that data hasn’t been downloaded yet). Instead of providing input, output and temp directories, you can just pass abstar a project directory using -p and abstar will create all of the necessary subdirectories within the project directory. Running abstar with the -b option indicates that input data should be downloaded from BaseSpace:

$ abstar -p /path/to/project_dir/ -b

A list of available BaseSpace projects will be displayed and you can select the appropriate project. If downloading data from BaseSpace, -m is assumed and paired-end reads will be merged.

abstar uses a human germline database by default, but germline databases are also provided for macaque, mouse and rabbit. To process macaque antibody sequences (from BaseSpace):

$ abstar -p /path/to/project_dir/ -b -s macaque