biopython v1.71.0 Bio.SeqIO.FastaIO

Bio.SeqIO support for the “fasta” (aka FastA or Pearson) file format.

You are expected to use this module via the Bio.SeqIO functions.

Summary

Functions

FastaIterator()

Generator function to iterate over FASTA records (as SeqRecord objects).

SimpleFastaParser()

Generator function to iterate over FASTA records (as string tuples).

Functions

FastaIterator()

Generator function to iterate over Fasta records (as SeqRecord objects).

Arguments:

handle - input file
alphabet - optional alphabet
title2ids - A function that, when given the title line of a FASTA record (without the leading >), returns the id, name and description (in that order) for the record as a tuple of strings. If this is not given, the entire title line is used as the description, and its first word as both the id and the name.

By default this will act like calling Bio.SeqIO.parse(handle, "fasta") with no custom handling of the title lines:

 >>> from Bio.SeqIO.FastaIO import FastaIterator
 >>> with open("Fasta/dups.fasta") as handle:
 ...     for record in FastaIterator(handle):
 ...         print(record.id)
 ...
 alpha
 beta
 gamma
 alpha
 delta

However, you can supply a title2ids function to alter this:

 >>> def take_upper(title):
 ...     return title.split(None, 1)[0].upper(), "", title
 >>> with open("Fasta/dups.fasta") as handle:
 ...     for record in FastaIterator(handle, title2ids=take_upper):
 ...         print(record.id)
 ...
 ALPHA
 BETA
 GAMMA
 ALPHA
 DELTA
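
A title2ids callback is an ordinary function that returns an (id, name, description) tuple, so it can be written and checked on its own before being handed to the parser. The following is a minimal sketch, assuming pipe-delimited titles such as "gi|12345|ref|NM_000001.1| example gene". The function name pipe_title2ids, the choice of the last field as the id, and the sample title are all hypothetical and not part of Biopython.

```python
def pipe_title2ids(title):
    """Hypothetical title2ids callback for pipe-delimited FASTA titles.

    Returns (id, name, description), the order title2ids must use.
    """
    # The first whitespace-delimited word holds the pipe-delimited fields.
    words = title.split(None, 1)
    first_word = words[0] if words else ""
    fields = [field for field in first_word.split("|") if field]
    # Assumption: the last non-empty field is the accession wanted as the id.
    record_id = fields[-1] if fields else first_word
    # Use the same string for id and name, and keep the full title
    # as the description.
    return record_id, record_id, title


# Made-up title line (without the leading ">").
print(pipe_title2ids("gi|12345|ref|NM_000001.1| example gene"))
```

Such a function would be supplied in the same way as take_upper above, that is as FastaIterator(handle, title2ids=pipe_title2ids).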

SimpleFastaParser()

Generator function to iterate over FASTA records (as string tuples).

For each record a tuple of two strings is returned: the FASTA title line (without the leading ‘>’ character) and the sequence (with any whitespace removed). The title line is not split into an identifier (the first word) and a comment or description.

 >>> from Bio.SeqIO.FastaIO import SimpleFastaParser
 >>> with open("Fasta/dups.fasta") as handle:
 ...     for values in SimpleFastaParser(handle):
 ...         print(values)
 ...
 ('alpha', 'ACGTA')
 ('beta', 'CGTC')
 ('gamma', 'CCGCC')
 ('alpha (again - this is a duplicate entry to test the indexing code)', 'ACGTA')
 ('delta', 'CGCGC')
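
To make the (title, sequence) contract concrete, here is a simplified pure-Python sketch of the behaviour described above. It is an illustrative re-implementation only, not Biopython's own code. The name simple_fasta_records and the in-memory sample data are hypothetical, and skipping any text before the first ">" line is an assumption of this sketch.

```python
from io import StringIO


def simple_fasta_records(handle):
    """Yield (title, sequence) string tuples from a FASTA handle.

    Illustrative sketch only, not Biopython's implementation.
    """
    title = None
    chunks = []
    for line in handle:
        if line.startswith(">"):
            # A new title line closes the previous record, if any.
            if title is not None:
                yield title, "".join(chunks)
            # Drop the leading ">" and any trailing whitespace or newline.
            title = line[1:].rstrip()
            chunks = []
        elif title is not None:
            # Strip all whitespace from sequence lines.
            chunks.append("".join(line.split()))
        # Lines before the first ">" are skipped in this sketch.
    if title is not None:
        yield title, "".join(chunks)


# Made-up two-record input held in memory.
example = StringIO(">alpha\nACG\nTA\n>beta extra words\nCG TC\n")
records = list(simple_fasta_records(example))
print(records)
```

Note that the whole title line, "beta extra words", comes back as a single string. Splitting it into an id and a description is left to the caller.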