biopython v1.71.0 Bio.AlignIO.MauveIO
Bio.AlignIO support for “xmfa” output from Mauve/ProgressiveMauve.
You are expected to use this module via the Bio.AlignIO functions (or the Bio.SeqIO functions if you want to work directly with the gapped sequences).
For example, consider a progressiveMauve alignment file containing the following::
> 1:0-0 + a.fa
--------------------------------------------------------------------------------
--------------------------------------------------------------------------------
--------------------------------------------------------------------------------
> 2:5417-5968 + b.fa
TTTAAACATCCCTCGGCCCGTCGCCCTTTTATAATAGCAGTACGTGAGAGGAGCGCCCTAAGCTTTGGGAAATTCAAGC-
--------------------------------------------------------------------------------
CTGGAACGTACTTGCTGGTTTCGCTACTATTTCAAACAAGTTAGAGGCCGTTACCTCGGGCGAACGTATAAACCATTCTG
> 3:9476-10076 - c.fa
TTTAAACACCTTTTTGGATG--GCCCAGTTCGTTCAGTTGTG-GGGAGGAGATCGCCCCAAACGTATGGTGAGTCGGGCG
TTTCCTATAGCTATAGGACCAATCCACTTACCATACGCCCGGCGTCGCCCAGTCCGGTTCGGTACCCTCCATGACCCACG
---------------------------------------------------------AAATGAGGGCCCAGGGTATGCTT
=
> 2:5969-6015 + b.fa
-----------------------
GGGCGAACGTATAAACCATTCTG
> 3:9429-9476 - c.fa
TTCGGTACCCTCCATGACCCACG
AAATGAGGGCCCAGGGTATGCTT
This is a multiple sequence alignment with multiple aligned sections, so you would probably load this using the Bio.AlignIO.parse() function:
>>> from Bio import AlignIO
>>> align = AlignIO.parse("Mauve/simple.xmfa", "mauve")
>>> alignments = list(align)
>>> for aln in alignments:
... print(align)
SingleLetterAlphabet() alignment with 3 rows and 240 columns
--------------------------------------------...--- 1
TTTAAACATCCCTCGGCCCGTCGCCCTTTTATAATAGCAGTACG...CTG 2
TTTAAACACCTTTTTGGATG--GCCCAGTTCGTTCAGTTGTG-G...CTT 3
SingleLetterAlphabet() alignment with 3 rows and 46 columns
---------------------------------------------- 1
-----------------------GGGCGAACGTATAAACCATTCTG 2
TTCGGTACCCTCCATGACCCACGAAATGAGGGCCCAGGGTATGCTT 3
Additional information is extracted from the XMFA file and available through the annotation attribute of each record::
>>> for record in alignments[0]:
... print record.id, len(record), record.annotations
1 240 {'start': 0, 'end': 0, 'strand': 1}
2 240 {'start': 5417, 'end': 5968, 'strand': 1}
3 240 {'start': 9476, 'end': 10076, 'strand': -1}
Link to this section Summary
Functions
Return (name, start, end) string tuple from an identifier (PRIVATE)
Link to this section Functions
Link to this function
_identifier_split()
Return (name, start, end) string tuple from an identifier (PRIVATE).