Description of file formats used in D-Genies

PAF (Pairwise mApping Format)

Default output format of minimap2.

It is a tab-delimited text file. Its columns are described below.

Col  Type    Description
1    string  Query sequence name
2    int     Query sequence length
3    int     Query start coordinate (0-based)
4    int     Query end coordinate (0-based)
5    char    ‘+’ if query/target on the same strand; ‘-’ if opposite
6    string  Target sequence name
7    int     Target sequence length
8    int     Target start on original strand (0-based)
9    int     Target end on original strand (0-based)
10   int     Number of matching bases in the mapping
11   int     Number of bases, including gaps, in the mapping
12   int     Mapping quality (0-255; 255 for missing)

Column 11 gives the total number of sequence matches, mismatches and gaps in the alignment; column 10 divided by column 11 gives the BLAST-like alignment identity.
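As a rough illustration of the column layout above, the following Python sketch parses the twelve mandatory fields of one PAF line and derives the BLAST-like identity (column 10 divided by column 11). The names PafRecord and parse_paf_line are hypothetical helpers chosen for this example, not part of D-Genies or minimap2, and the sample line is made up.

```python
from typing import NamedTuple


class PafRecord(NamedTuple):
    """The 12 mandatory PAF columns (hypothetical helper type)."""
    query_name: str
    query_length: int
    query_start: int
    query_end: int
    strand: str
    target_name: str
    target_length: int
    target_start: int
    target_end: int
    matches: int        # column 10
    block_length: int   # column 11
    mapq: int

    @property
    def identity(self) -> float:
        """BLAST-like identity: column 10 divided by column 11."""
        return self.matches / self.block_length if self.block_length else 0.0


def parse_paf_line(line: str) -> PafRecord:
    """Parse the mandatory columns of one tab-separated PAF line."""
    f = line.rstrip("\n").split("\t")
    if len(f) < 12:
        raise ValueError(f"expected at least 12 columns, got {len(f)}")
    return PafRecord(
        f[0], int(f[1]), int(f[2]), int(f[3]), f[4],
        f[5], int(f[6]), int(f[7]), int(f[8]),
        int(f[9]), int(f[10]), int(f[11]),
    )


# Made-up example line, for illustration only.
example = "read1\t1000\t10\t910\t+\tchr1\t5000\t100\t1000\t810\t900\t60"
record = parse_paf_line(example)
print(record.query_name, record.strand, record.identity)
```

Any fields beyond the twelfth are ignored here; they hold the optional tags.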

PAF may optionally have additional fields in the SAM-like typed key-value format. Minimap2 may output the following tags:

Tag  Type  Description
tp   A     Type of aln: P/primary, S/secondary and I,i/inversion
cm   i     Number of minimizers on the chain
s1   i     Chaining score
s2   i     Chaining score of the best secondary chain
NM   i     Total number of mismatches and gaps in the alignment
AS   i     DP alignment score
ms   i     DP score of the max scoring segment in the alignment
nn   i     Number of ambiguous bases in the alignment

Source: minimap2 documentation.
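
The optional fields of a PAF line follow the SAM-like TAG:TYPE:VALUE layout, so they can be read with a few lines of Python. The sketch below is only an illustration: parse_paf_tags is a hypothetical helper name, the tag values are made up, and only the 'i' (integer) and 'f' (float) type codes are converted, with every other type kept as text.

```python
def parse_paf_tags(fields):
    """Convert SAM-like TAG:TYPE:VALUE strings into a dict."""
    tags = {}
    for field in fields:
        # Split at most twice so that values containing ':' stay intact.
        tag, type_code, value = field.split(":", 2)
        if type_code == "i":
            tags[tag] = int(value)
        elif type_code == "f":
            tags[tag] = float(value)
        else:
            # 'A' (single character), 'Z' (string) and others kept as text.
            tags[tag] = value
    return tags


# Made-up optional fields, i.e. what follows column 12 on a PAF line.
optional = ["tp:A:P", "cm:i:120", "s1:i:850", "NM:i:90"]
tags = parse_paf_tags(optional)
print(tags)
```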

MAF (Multiple Alignment Format)

Description of the format is available here.

Index file

Index files used in D-Genies are built as follows.

The first line contains the name of the sample. The following lines describe the contigs of the sample, one per line, in two tab-separated columns: the first is the name of the contig, the second is its size in bases.

Example:

Homo sapiens  
chr1    248956422  
chr2    242193529  
chr3    198295559
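
A minimal Python sketch of reading such an index, assuming the layout described above (sample name on the first line, then tab-separated name and size). The function name read_index is hypothetical and is not taken from the D-Genies code base.

```python
import io


def read_index(handle):
    """Return (sample_name, [(contig_name, size), ...]) from an index file."""
    sample_name = handle.readline().strip()
    contigs = []
    for line in handle:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines
        name, size = line.split("\t")
        contigs.append((name, int(size)))
    return sample_name, contigs


# Same content as the example above, held in memory instead of a file.
example = io.StringIO(
    "Homo sapiens\n"
    "chr1\t248956422\n"
    "chr2\t242193529\n"
    "chr3\t198295559\n"
)
sample, contigs = read_index(example)
total = sum(size for _, size in contigs)
print(sample, len(contigs), total)
```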

Backup file

Backup file is a TAR archive. It contains three files:

  • The alignment file, in PAF format, named map.paf.
  • The target index, named target.idx.
  • The query index, named query.idx.

These file names must be kept exactly as given. Otherwise, the backup file will not be accepted by the run form.
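
One possible way to assemble such an archive by hand is Python's standard tarfile module, as sketched below. The function make_backup and the input file names are hypothetical. The sketch writes an uncompressed TAR, which is an assumption, since the description above does not say whether a compressed archive is expected.

```python
import os
import tarfile
import tempfile


def make_backup(paf_path, target_idx_path, query_idx_path, out_path):
    """Bundle the three files into a TAR archive under the required names."""
    members = {
        "map.paf": paf_path,
        "target.idx": target_idx_path,
        "query.idx": query_idx_path,
    }
    # Mode "w" writes an uncompressed archive (assumption, see above).
    with tarfile.open(out_path, "w") as tar:
        for arcname, path in members.items():
            # arcname sets the name stored in the archive,
            # whatever the file is called on disk.
            tar.add(path, arcname=arcname)
    return out_path


# Demo with throwaway placeholder files in a temporary directory.
workdir = tempfile.mkdtemp()
sources = []
for name in ("my_alignment.paf", "my_target.idx", "my_query.idx"):
    path = os.path.join(workdir, name)
    with open(path, "w") as fh:
        fh.write("placeholder\n")
    sources.append(path)

backup = make_backup(*sources, os.path.join(workdir, "backup.tar"))
with tarfile.open(backup) as tar:
    names = sorted(tar.getnames())
print(names)
```

Passing arcname lets the source files keep any name on disk while the archive members carry the names the run form expects.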