biopython v1.71.0 Bio.Seq
Provide objects to represent biological sequences with alphabets.
See also the Seq wiki and the chapter in our tutorial:
- Seq wiki: http://biopython.org/wiki/Seq
- HTML Tutorial: http://biopython.org/DIST/docs/tutorial/Tutorial.html
- PDF Tutorial: http://biopython.org/DIST/docs/tutorial/Tutorial.pdf
Summary
Functions
- _maketrans: Make a python string translation table (PRIVATE)
- _test: Run the Bio.Seq module’s doctests (PRIVATE)
- _translate_str: Translate nucleotide string into a protein string (PRIVATE)
- back_transcribe: Return the RNA sequence back-transcribed into DNA
- complement: Return the complement sequence of a nucleotide string
- reverse_complement: Return the reverse complement sequence of a nucleotide string
- transcribe: Transcribe a DNA sequence into RNA
- translate: Translate a nucleotide sequence into amino acids
Functions
Make a python string translation table (PRIVATE).
Arguments:
- complement_mapping - a dictionary such as ambiguous_dna_complement and ambiguous_rna_complement from Data.IUPACData.
Returns a translation table (a string of length 256) for use with the Python string’s translate method when computing a (reverse) complement.
Compatible with lower case and upper case sequences.
For internal use only.
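As a rough illustration of the idea, a case-preserving complement table can be built from a mapping dictionary as sketched below. This is not Biopython's internal helper: the function name `make_complement_table` and the hand-written `DNA_COMPLEMENT` dictionary are assumptions made for the example, and the sketch uses Python 3's `str.maketrans`, which returns a mapping rather than a 256-character string.

```python
# Illustrative sketch only: not Biopython's internal helper.
# Hypothetical, hand-written subset of the IUPAC ambiguous DNA complement mapping.
DNA_COMPLEMENT = {
    "A": "T", "C": "G", "G": "C", "T": "A",
    "M": "K", "K": "M", "R": "Y", "Y": "R",
    "W": "W", "S": "S", "B": "V", "V": "B",
    "D": "H", "H": "D", "N": "N",
}


def make_complement_table(complement_mapping):
    """Build a str.translate table covering upper and lower case letters."""
    before = "".join(complement_mapping.keys())
    after = "".join(complement_mapping.values())
    # Append the lower-case versions so mixed-case input keeps its case.
    return str.maketrans(before + before.lower(), after + after.lower())


table = make_complement_table(DNA_COMPLEMENT)
# Characters missing from the table, such as the gap "-", pass through unchanged.
print("ACTG-NH".translate(table))
print("acTG".translate(table))
```

Building the table once and reusing it keeps each complement call to a single `str.translate` pass over the sequence.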
Run the Bio.Seq module’s doctests (PRIVATE).
Translate nucleotide string into a protein string (PRIVATE).
Arguments:
- sequence - a string
- table - a CodonTable object (NOT a table name or id number)
- stop_symbol - a single character string, what to use for terminators.
- to_stop - boolean, should translation terminate at the first in frame stop codon? If there is no in-frame stop codon then translation continues to the end.
- pos_stop - a single character string for a possible stop codon (e.g. TAN or NNN)
- cds - Boolean, indicates this is a complete CDS. If True, this checks the sequence starts with a valid alternative start codon (which will be translated as methionine, M), that the sequence length is a multiple of three, and that there is a single in frame stop codon at the end (this will be excluded from the protein sequence, regardless of the to_stop option). If these tests fail, an exception is raised.
- gap - Single character string to denote symbol used for gaps. Defaults to None.
Returns a string.
e.g.
>>> from Bio.Data import CodonTable
>>> table = CodonTable.ambiguous_dna_by_id[1]
>>> _translate_str("AAA", table)
'K'
>>> _translate_str("TAR", table)
'*'
>>> _translate_str("TAN", table)
'X'
>>> _translate_str("TAN", table, pos_stop="@")
'@'
>>> _translate_str("TA?", table)
Traceback (most recent call last):
...
TranslationError: Codon 'TA?' is invalid
Unlike older versions of Biopython, partial codons are now always regarded as an error (previously they were only checked if cds=True) and will trigger a warning, which is likely to become an exception in a future release.
If cds=True, the start and stop codons are checked, and the start codon will be translated as methionine. The sequence must be a whole number of codons.
>>> _translate_str("ATGCCCTAG", table, cds=True)
'MP'
>>> _translate_str("AAACCCTAG", table, cds=True)
Traceback (most recent call last):
...
TranslationError: First codon 'AAA' is not a start codon
>>> _translate_str("ATGCCCTAGCCCTAG", table, cds=True)
Traceback (most recent call last):
...
TranslationError: Extra in frame stop codon found.
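The three cds=True checks described above (valid start codon, length a multiple of three, exactly one in-frame stop codon at the end) can be sketched in plain Python. This is a simplified stand-in, not Biopython's code: `check_cds` is a hypothetical name, it raises `ValueError` rather than `TranslationError`, it hard-codes the start and stop codons of the NCBI standard table, and it ignores ambiguous codons and RNA input.

```python
# Illustrative sketch only: a simplified stand-in for the cds=True checks.
# Start and stop codons of the NCBI standard table (table 1), written out by hand.
START_CODONS = {"TTG", "CTG", "ATG"}
STOP_CODONS = {"TAA", "TAG", "TGA"}


def check_cds(sequence):
    """Return True for a complete CDS, otherwise raise ValueError."""
    sequence = sequence.upper()
    if len(sequence) % 3 != 0:
        raise ValueError("Sequence length %d is not a multiple of three" % len(sequence))
    # Split the sequence into consecutive, non-overlapping codons.
    codons = [sequence[i:i + 3] for i in range(0, len(sequence), 3)]
    if not codons or codons[0] not in START_CODONS:
        raise ValueError("First codon is not a start codon")
    if codons[-1] not in STOP_CODONS:
        raise ValueError("Final codon is not a stop codon")
    if any(codon in STOP_CODONS for codon in codons[:-1]):
        raise ValueError("Extra in frame stop codon found")
    return True


print(check_cds("ATGCCCTAG"))
```

With these rules, "AAACCCTAG" fails the start codon check and "ATGCCCTAGCCCTAG" fails because of the extra in-frame stop, mirroring the two error cases in the doctests.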
Return the RNA sequence back-transcribed into DNA.
If given a string, returns a new string object.
Given a Seq or MutableSeq, returns a new Seq object with a DNA alphabet.
Trying to back-transcribe a protein or DNA sequence raises an exception.
e.g.
>>> back_transcribe("ACUGN")
'ACTGN'
Return the complement sequence of a nucleotide string.
If given a string, returns a new string object. Given a Seq or a MutableSeq, returns a new Seq object with the same alphabet.
Supports unambiguous and ambiguous nucleotide sequences.
e.g.
>>> complement("ACTG-NH")
'TGAC-ND'
Return the reverse complement sequence of a nucleotide string.
If given a string, returns a new string object. Given a Seq or a MutableSeq, returns a new Seq object with the same alphabet.
Supports unambiguous and ambiguous nucleotide sequences.
e.g.
>>> reverse_complement("ACTG-NH")
'DN-CAGT'
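For plain strings, the relationship between the two functions above can be sketched as "complement, then reverse". The helper names `complement_str` and `reverse_complement_str` are hypothetical, and the table covers only the letters used in the examples. It is an illustration under those assumptions, not Biopython's implementation.

```python
# Illustrative sketch only: not Biopython's implementation.
# Hand-written subset of IUPAC complement pairs (A/T, C/G, N/N, H/D).
_LETTERS = "ACGTNHD"
_PARTNERS = "TGCANDH"
_TABLE = str.maketrans(_LETTERS + _LETTERS.lower(), _PARTNERS + _PARTNERS.lower())


def complement_str(sequence):
    """Complement each letter, leaving unknown characters (e.g. "-") as they are."""
    return sequence.translate(_TABLE)


def reverse_complement_str(sequence):
    """Complement the sequence, then reverse it."""
    return complement_str(sequence)[::-1]


print(complement_str("ACTG-NH"))
print(reverse_complement_str("ACTG-NH"))
```

Applying `reverse_complement_str` twice returns the original string, which makes a convenient sanity check for any complement table.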
Transcribe a DNA sequence into RNA.
If given a string, returns a new string object.
Given a Seq or MutableSeq, returns a new Seq object with an RNA alphabet.
Trying to transcribe a protein or RNA sequence raises an exception.
e.g.
>>> transcribe("ACTGN")
'ACUGN'
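On plain strings, transcription and back-transcription amount to swapping T and U while preserving case. The sketch below shows that idea with hypothetical helper names (`transcribe_str`, `back_transcribe_str`). It performs no alphabet checks, so it will not raise for protein input the way the Seq-based functions are described as doing.

```python
# Illustrative sketch only: string-level T <-> U substitution, no alphabet checks.
def transcribe_str(dna):
    """Replace thymine with uracil, preserving case."""
    return dna.replace("T", "U").replace("t", "u")


def back_transcribe_str(rna):
    """Replace uracil with thymine, preserving case."""
    return rna.replace("U", "T").replace("u", "t")


print(transcribe_str("ACTGN"))
print(back_transcribe_str("ACUGN"))
```

The two helpers are inverses of each other for inputs that contain only DNA or only RNA letters.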
Translate a nucleotide sequence into amino acids.
If given a string, returns a new string object. Given a Seq or MutableSeq, returns a Seq object with a protein alphabet.
Arguments:
- table - Which codon table to use? This can be either a name (string), an NCBI identifier (integer), or a CodonTable object (useful for non-standard genetic codes). Defaults to the “Standard” table.
- stop_symbol - Single character string, what to use for any terminators, defaults to the asterisk, “*”.
- to_stop - Boolean, defaults to False meaning do a full translation continuing on past any stop codons (translated as the specified stop_symbol). If True, translation is terminated at the first in frame stop codon (and the stop_symbol is not appended to the returned protein sequence).
- cds - Boolean, indicates this is a complete CDS. If True, this checks the sequence starts with a valid alternative start codon (which will be translated as methionine, M), that the sequence length is a multiple of three, and that there is a single in frame stop codon at the end (this will be excluded from the protein sequence, regardless of the to_stop option). If these tests fail, an exception is raised.
- gap - Single character string to denote symbol used for gaps. Defaults to None.
A simple string example using the default (standard) genetic code:
>>> coding_dna = "GTGGCCATTGTAATGGGCCGCTGAAAGGGTGCCCGATAG"
>>> translate(coding_dna)
'VAIVMGR*KGAR*'
>>> translate(coding_dna, stop_symbol="@")
'VAIVMGR@KGAR@'
>>> translate(coding_dna, to_stop=True)
'VAIVMGR'
Now using NCBI table 2, where TGA is not a stop codon:
>>> translate(coding_dna, table=2)
'VAIVMGRWKGAR*'
>>> translate(coding_dna, table=2, to_stop=True)
'VAIVMGRWKGAR'
In fact, this example begins with GTG, an alternative start codon valid under NCBI table 2. It is therefore a complete, valid CDS, and its translation should really start with methionine (not valine):
>>> translate(coding_dna, table=2, cds=True)
'MAIVMGRWKGAR'
Note that if the sequence has no in-frame stop codon, then the to_stop argument has no effect:
>>> coding_dna2 = "GTGGCCATTGTAATGGGCCGC"
>>> translate(coding_dna2)
'VAIVMGR'
>>> translate(coding_dna2, to_stop=True)
'VAIVMGR'
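For plain-string results, the effect of to_stop=True in the examples above can be mimicked by cutting the full translation at the first stop symbol. The helper `truncate_at_stop` is hypothetical, and the sketch assumes the stop symbol never stands for anything other than a terminator and that cds=True is not in use.

```python
# Illustrative sketch only: mimics to_stop=True on an already translated string.
def truncate_at_stop(protein, stop_symbol="*"):
    """Keep everything before the first stop symbol; unchanged if there is none."""
    return protein.split(stop_symbol, 1)[0]


print(truncate_at_stop("VAIVMGR*KGAR*"))
print(truncate_at_stop("VAIVMGR"))
```

The second call shows why to_stop has no effect when the sequence contains no in-frame stop codon: there is nothing to cut.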
NOTE - Ambiguous codons like “TAN” or “NNN” could be an amino acid or a stop codon. These are translated as “X”. Any invalid codon (e.g. “TA?” or “T-A”) will throw a TranslationError.
The function will, however, translate either DNA or RNA.
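To see why "TAR" is a definite stop while "TAN" is only a possible one, an ambiguous codon can be expanded into every concrete codon it might represent. The sketch below does this for DNA letters. The name `classify_codon`, the reduced ambiguity table, and the use of `ValueError` are assumptions for illustration, not Biopython's approach, and only the stop codons of the NCBI standard table are considered.

```python
from itertools import product

# Illustrative sketch only. Hand-written subset of IUPAC ambiguity codes (DNA).
AMBIGUOUS_DNA = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "N": "ACGT",
}
STOP_CODONS = {"TAA", "TAG", "TGA"}  # NCBI standard table


def classify_codon(codon):
    """Return "stop", "possible stop" or "coding" for a (possibly ambiguous) codon."""
    try:
        choices = [AMBIGUOUS_DNA[letter] for letter in codon.upper()]
    except KeyError:
        raise ValueError("Codon %r is invalid" % codon)
    # Every concrete codon the ambiguous codon could stand for.
    expansions = {"".join(bases) for bases in product(*choices)}
    stops = expansions & STOP_CODONS
    if stops == expansions:
        return "stop"
    if stops:
        return "possible stop"
    return "coding"


for codon in ("AAA", "TAR", "TAN", "NNN"):
    print(codon, classify_codon(codon))
```

Codons containing characters outside the table, such as "TA?" or "T-A", are rejected, which parallels the TranslationError described in the note.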