biopython v1.71.0 Bio.SeqIO.FastaIO
Bio.SeqIO support for the “fasta” (aka FastA or Pearson) file format.
You are expected to use this module via the Bio.SeqIO functions.
Summary
Functions
FastaIterator: Generator function to iterate over FASTA records (as SeqRecord objects)
SimpleFastaParser: Generator function to iterate over FASTA records (as string tuples)
Functions
FastaIterator
Generator function to iterate over FASTA records (as SeqRecord objects).
Arguments:
- handle - input file handle
- alphabet - optional alphabet
- title2ids - a function that, given the title line of a FASTA record (without the leading >), returns the id, name and description (in that order) for the record as a tuple of strings. If it is not given, the entire title line is used as the description, and the first word as both the id and the name.
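The default title handling described for title2ids can be sketched as a stand-alone function with the same (id, name, description) return shape. The name `default_title2ids` is hypothetical and is not part of Bio.SeqIO.FastaIO; returning empty strings for an empty title is an assumption of this sketch.

```python
from typing import Tuple


def default_title2ids(title: str) -> Tuple[str, str, str]:
    """Hypothetical helper mirroring the documented default title handling.

    The first whitespace-delimited word becomes both the id and the name,
    and the whole title line (without the leading '>') is the description.
    """
    words = title.split(None, 1)
    # Assumption: an empty title yields an empty id and name.
    first_word = words[0] if words else ""
    return first_word, first_word, title


print(default_title2ids("alpha (again - this is a duplicate entry to test the indexing code)"))
```

Writing the default as an ordinary function makes the contract explicit: any callable you pass as title2ids must accept the title string and return three strings in this order.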
By default this behaves like calling Bio.SeqIO.parse(handle, "fasta"), with no custom handling of the title lines:
>>> with open("Fasta/dups.fasta") as handle:
...     for record in FastaIterator(handle):
...         print(record.id)
...
alpha
beta
gamma
alpha
delta
However, you can supply a title2ids function to alter this:
>>> def take_upper(title):
...     return title.split(None, 1)[0].upper(), "", title
>>> with open("Fasta/dups.fasta") as handle:
...     for record in FastaIterator(handle, title2ids=take_upper):
...         print(record.id)
...
ALPHA
BETA
GAMMA
ALPHA
DELTA
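As a further illustration of the title2ids contract, the following sketch handles titles whose first word holds '|'-separated fields. The header layout (for example "sp|P69905|HBA_HUMAN ...") is an assumed UniProt-style format, and `pipe_title2ids` is a hypothetical name; adapt the field positions to whatever your files actually contain.

```python
from typing import Tuple


def pipe_title2ids(title: str) -> Tuple[str, str, str]:
    """Hypothetical title2ids callback for titles like 'sp|P69905|HBA_HUMAN Hemoglobin'.

    Assumes the first word holds '|'-separated fields: the second field is
    used as the id, the third (if present) as the name, and the full title
    is kept as the description.
    """
    words = title.split(None, 1)
    first_word = words[0] if words else ""
    fields = first_word.split("|")
    # Fall back to the whole first word when there are no '|' separators.
    record_id = fields[1] if len(fields) > 1 else first_word
    name = fields[2] if len(fields) > 2 else record_id
    return record_id, name, title


print(pipe_title2ids("sp|P69905|HBA_HUMAN Hemoglobin subunit alpha"))
```

Such a function would be passed in the same way as take_upper, for example FastaIterator(handle, title2ids=pipe_title2ids).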
SimpleFastaParser
Generator function to iterate over FASTA records (as string tuples).
For each record, a tuple of two strings is returned: the FASTA title line (without the leading '>' character) and the sequence (with any whitespace removed). The title line is not split into an identifier (the first word) and a comment or description.
>>> with open("Fasta/dups.fasta") as handle:
...     for values in SimpleFastaParser(handle):
...         print(values)
...
('alpha', 'ACGTA')
('beta', 'CGTC')
('gamma', 'CCGCC')
('alpha (again - this is a duplicate entry to test the indexing code)', 'ACGTA')
('delta', 'CGCGC')
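To show what tuple-based parsing amounts to, here is a minimal stdlib-only sketch of the behaviour described for SimpleFastaParser. It is not Biopython's implementation: the name `simple_fasta_pairs` is hypothetical, and skipping any text before the first '>' line is an assumption of this sketch.

```python
import io
from typing import Iterator, List, Optional, TextIO, Tuple


def simple_fasta_pairs(handle: TextIO) -> Iterator[Tuple[str, str]]:
    """Yield (title, sequence) tuples from FASTA text.

    Illustrative re-implementation: the title excludes the leading '>',
    and all whitespace is removed from the sequence. Lines before the
    first '>' are skipped (an assumption of this sketch).
    """
    title: Optional[str] = None
    chunks: List[str] = []
    for line in handle:
        if line.startswith(">"):
            if title is not None:
                yield title, "".join(chunks)
            title = line[1:].rstrip()
            chunks = []
        elif title is not None:
            # str.split() with no argument drops spaces, tabs and line endings.
            chunks.append("".join(line.split()))
    if title is not None:
        yield title, "".join(chunks)


example = io.StringIO(">alpha\nACG\nTA\n>beta\nCG TC\n")
for values in simple_fasta_pairs(example):
    print(values)
```

Because the title is returned whole, splitting it into an identifier and description is left to the caller, which is what keeps this style of parser lightweight compared with building SeqRecord objects.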