CLI Reference

abstar provides two main commands: run for annotating sequences and build_germline_database for creating custom germline databases.

abstar run

Annotate antibody or TCR sequences.

Usage:

abstar run INPUT_PATH PROJECT_PATH [OPTIONS]

Arguments:

INPUT_PATH

Path to a FASTA or FASTQ file, or to a directory of such files. Gzip-compressed files (.gz) are supported.

PROJECT_PATH

Directory for output files, logs, and temporary data. Created if it does not exist.
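To check how many records a FASTA input holds before starting a run, a small shell helper can count header lines in plain or gzip-compressed files. This is a sketch and not part of abstar: the name count_fasta_records is hypothetical, and it assumes that a .gz suffix marks compressed input.

```shell
# Hypothetical helper (not part of abstar): count FASTA records in a
# plain or gzip-compressed file by counting header lines.
count_fasta_records() {
    case $1 in
        *.gz) gzip -dc "$1" ;;   # decompress to stdout
        *)    cat "$1" ;;        # plain text, pass through
    esac | grep -c '^>'
}
```

For example, count_fasta_records sequences.fasta.gz prints the number of header lines in the file. FASTQ input would need a different count, since each record spans four lines and quality strings may also start with unexpected characters.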

Basic Examples:

# Single file
abstar run sequences.fasta output_dir/

# Directory of files
abstar run fastq_directory/ output_dir/

# Mouse sequences
abstar run sequences.fasta output_dir/ --germline_database mouse

# TCR sequences
abstar run tcr.fasta output_dir/ --receptor tcr

Germline and Receptor Options

--germline_database TEXT

Name of the germline database for assignment.

Built-in options: human (default), mouse, macaque, humouse

Custom databases in ~/.abstar/germline_dbs/ are also available.

abstar run seqs.fasta out/ --germline_database mouse

--receptor [bcr|tcr]

Receptor type. Default: bcr

abstar run tcr_seqs.fasta out/ --receptor tcr

Output Options

-o, --output_format [airr|parquet]

Output format. Can be specified multiple times for multiple formats. Default: airr

# Parquet only
abstar run seqs.fasta out/ -o parquet

# Both formats
abstar run seqs.fasta out/ -o airr -o parquet

--copy-inputs/--no-copy-inputs

Copy input files to the project directory. Default: --copy-inputs

UMI Options

--umi_pattern TEXT

Pattern for UMI extraction. Use [UMI] as a placeholder for the UMI sequence and include the surrounding conserved sequences to anchor the match.

Built-in patterns: smartseq-human-bcr
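One way to read a custom pattern such as "[UMI]TCAGCGGGAAGACATT" combined with a UMI length of 12: the conserved sequence anchors the match, and the UMI is the 12 bases immediately upstream of it. The shell sketch below applies that reading to a single read. It is an assumption about the semantics made for illustration, not abstar's implementation, and umi_before_anchor is a hypothetical name.

```shell
# Hypothetical helper (not part of abstar): print the umi_len bases
# that sit immediately upstream of the first occurrence of an anchor.
umi_before_anchor() {
    read_seq=$1
    anchor=$2
    umi_len=$3
    case $read_seq in
        *"$anchor"*) ;;          # anchor present, continue
        *) return 1 ;;           # no anchor, so no UMI can be located
    esac
    # Strip the first anchor occurrence and everything after it.
    upstream=${read_seq%%"$anchor"*}
    # Require enough upstream bases to hold a full-length UMI.
    [ "${#upstream}" -ge "$umi_len" ] || return 1
    printf '%s' "$upstream" | tail -c "$umi_len"
    echo
}
```

Called as umi_before_anchor NNACGTACGTACGTTCAGCGGGAAGACATTGGGG TCAGCGGGAAGACATT 12, the helper prints ACGTACGTACGT. A read without the anchor makes it return a non-zero status and print nothing.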

# Built-in pattern
abstar run seqs.fasta out/ --umi_pattern smartseq-human-bcr

# Custom pattern
abstar run seqs.fasta out/ --umi_pattern "[UMI]TCAGCGGGAAGACATT" --umi_length 12

--umi_length INT

Length of the UMI sequence to extract.

  • Positive value: UMI at the 5' end of the sequence

  • Negative value: UMI at the 3' end of the sequence

# 12bp UMI at 5' end
abstar run seqs.fasta out/ --umi_length 12

# 8bp UMI at 3' end
abstar run seqs.fasta out/ --umi_length -8
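The sign convention for --umi_length can be pictured as slicing from either end of the read. The shell sketch below mimics that convention on a single sequence; umi_by_length is a hypothetical helper written for illustration and is not an abstar command.

```shell
# Hypothetical helper (not part of abstar): take a UMI from the 5' end
# for a positive length, or from the 3' end for a negative length.
umi_by_length() {
    sequence=$1
    umi_len=$2
    case $umi_len in
        -*) # negative: keep the last N bases (N = length without the sign)
            printf '%s' "$sequence" | tail -c "${umi_len#-}"
            echo ;;
        *)  # positive: keep the first N bases
            printf '%s\n' "$sequence" | cut -c "1-$umi_len" ;;
    esac
}
```

With the 24-base read AAAACCCCGGGGTTTTACGTACGT, a length of 12 prints AAAACCCCGGGG and a length of -8 prints ACGTACGT.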

Read Merging Options

-m, --merge

Merge paired-end FASTQ files with fastp before annotation.

abstar run paired_reads/ out/ --merge

--merge_kwargs TEXT

Additional arguments for the merge function. Format: key1=val1,key2=val2
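The value is a single comma-separated string of key=value pairs, so it should be quoted on the command line. The shell sketch below splits such a string to show how it breaks down. It only illustrates the format and does not reproduce how abstar parses the option; split_merge_kwargs is a hypothetical name.

```shell
# Hypothetical helper (not part of abstar): break a "key1=val1,key2=val2"
# string into one "key -> value" line per pair.
split_merge_kwargs() {
    printf '%s\n' "$1" | tr ',' '\n' | while IFS='=' read -r key value; do
        printf '%s -> %s\n' "$key" "$value"
    done
}
```

Passing "minimum_overlap=20,quality_cutoff=25" prints two lines, minimum_overlap -> 20 and quality_cutoff -> 25.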

abstar run reads/ out/ --merge --merge_kwargs "minimum_overlap=20,quality_cutoff=25"

--interleaved_fastq

Input FASTQ files are interleaved (R1 and R2 reads alternate within a single file). Implies --merge.
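To make the interleaved layout concrete, the shell sketch below splits an interleaved file back into separate R1 and R2 files, which can help when inspecting mates by hand. abstar does not need this step when --interleaved_fastq is set. The function name deinterleave_fastq is hypothetical, and the sketch assumes strict four-line FASTQ records.

```shell
# Hypothetical helper (not part of abstar): write alternating 4-line
# FASTQ records to two files (first of each pair to R1, second to R2).
# Usage: deinterleave_fastq INTERLEAVED R1_OUT R2_OUT
deinterleave_fastq() {
    awk -v r1="$2" -v r2="$3" '
        {
            record = int((NR - 1) / 4)   # 0-based FASTQ record index
            if (record % 2 == 0) { print > r1 } else { print > r2 }
        }
    ' "$1"
}
```

Even-indexed records (the first, third, and so on) land in the R1 file and odd-indexed records in the R2 file, so both outputs keep the original pair order.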

Performance Options

-c, --chunksize INT

Number of sequences per annotation batch. Default: 500

--mmseqs_chunksize INT

Number of sequences per MMseqs2 search batch. Default: 1000000

--mmseqs_threads INT

Number of threads for MMseqs2 searches. Default: auto-detected

--n_processes INT

Number of parallel annotation workers. Default: number of CPU cores
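When choosing a value for --chunksize or --mmseqs_chunksize, it can help to estimate how many batches an input will produce. The shell sketch below does the arithmetic under the assumption that sequences are split into fixed-size batches with a smaller final batch; abstar's actual partitioning may differ, and n_batches is a hypothetical helper.

```shell
# Hypothetical helper (not part of abstar): number of batches needed
# for n_seqs sequences at a given chunk size (ceiling division).
n_batches() {
    n_seqs=$1
    chunksize=$2
    echo $(( ($n_seqs + $chunksize - 1) / $chunksize ))
}
```

For instance, n_batches 1200 500 prints 3: two full batches of 500 and one batch of 200.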

Logging Options

--verbose/--quiet

Enable or disable verbose output. Default: --verbose

--debug

Retain temporary files and enable detailed logging. Useful for troubleshooting.

abstar build_germline_database

Build a custom germline database from FASTA or JSON files.

Usage:

abstar build_germline_database NAME [OPTIONS]

Arguments:

NAME

Name for the new germline database.

Options:

-f, --fasta PATH

FASTA file containing gapped germline gene sequences. Can be specified multiple times.

-j, --json PATH

JSON file containing germline gene sequences. Can be specified multiple times.

-c, --constant PATH

FASTA file containing constant region sequences. Can be specified multiple times.

-m, --manifest PATH

Plain text file with database metadata (origin, date, etc.).

-r, --receptor [bcr|tcr]

Receptor type. Default: bcr

-l, --location PATH

Custom location for the database. Default: ~/.abstar/germline_dbs/

--reference TEXT

Reference species for adding IMGT gaps to ungapped sequences. Default: human

--include_species_in_name/--exclude_species_from_name

Include species in sequence names (e.g., IGHV1-2*02__homo_sapiens). Default: --include_species_in_name
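Downstream scripts that consume these names may need to separate the gene from the species. The shell sketch below does so with parameter expansion, assuming that a double underscore is the separator, as the example name suggests. split_germline_name is a hypothetical helper, not an abstar command.

```shell
# Hypothetical helper (not part of abstar): print "gene<TAB>species"
# for a name such as IGHV1-2*02__homo_sapiens. Assumes "__" separates
# the gene name from the species; the species field is empty otherwise.
split_germline_name() {
    name=$1
    case $name in
        *__*) printf '%s\t%s\n' "${name%%__*}" "${name#*__}" ;;
        *)    printf '%s\t\n' "$name" ;;
    esac
}
```

Quote the name when calling the helper so the shell does not expand the asterisk: split_germline_name 'IGHV1-2*02__homo_sapiens' prints IGHV1-2*02 and homo_sapiens separated by a tab.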

Example:

abstar build_germline_database my_custom_db \
    -f v_genes.fasta \
    -f d_genes.fasta \
    -f j_genes.fasta \
    -c constant_genes.fasta \
    --receptor bcr