biopython v1.71.0 Bio.SeqFeature.FeatureLocation
Specify the location of a feature along a sequence.
The FeatureLocation is used for simple continuous features, which can be described as running from a start position to an end position (optionally with a strand and reference information). More complex locations made up from several non-continuous parts (e.g. a coding sequence made up of several exons) are described using a SeqFeature with a CompoundLocation.
Note that the start and end location numbering follows Python’s scheme, thus a GenBank entry of 123..150 (one based counting) becomes a location of [122:150] (zero based counting).
>>> from Bio.SeqFeature import FeatureLocation
>>> f = FeatureLocation(122, 150)
>>> print(f)
[122:150]
>>> print(f.start)
122
>>> print(f.end)
150
>>> print(f.strand)
None
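The conversion between the two numbering schemes can be sketched in plain Python. The helper names below are hypothetical and not part of Biopython; they only illustrate the arithmetic for exact positions.

```python
def genbank_to_python(first, last):
    """Convert an inclusive one-based range (first..last) to a Python slice pair."""
    # Only the start shifts; the inclusive one-based end equals the
    # exclusive zero-based end.
    return first - 1, last


def python_to_genbank(start, end):
    """Convert a zero-based half-open range [start:end] to one-based inclusive."""
    return start + 1, end


start, end = genbank_to_python(123, 150)
print(start, end)   # 122 150
print(end - start)  # 28, the number of residues covered by 123..150
```

Because the end is exclusive, the length of the region is simply end minus start, with no off-by-one correction.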
Note the strand defaults to None. If you are working with nucleotide sequences you’d want to be explicit if it is the forward strand:
>>> from Bio.SeqFeature import FeatureLocation
>>> f = FeatureLocation(122, 150, strand=+1)
>>> print(f)
[122:150](+)
>>> print(f.strand)
1
Note that for a parent sequence of length n, the FeatureLocation start and end must satisfy the inequality 0 <= start <= end <= n. This means even for features on the reverse strand of a nucleotide sequence, we expect the ‘start’ coordinate to be less than the ‘end’.
>>> from Bio.SeqFeature import FeatureLocation
>>> r = FeatureLocation(122, 150, strand=-1)
>>> print(r)
[122:150](-)
>>> print(r.start)
122
>>> print(r.end)
150
>>> print(r.strand)
-1
That is, rather than thinking of the ‘start’ and ‘end’ biologically in a strand-aware manner, think of them as the ‘left most’ or ‘minimum’ boundary, and the ‘right most’ or ‘maximum’ boundary of the region being described. This is particularly important with compound locations describing non-continuous regions.
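To see why strand-independent boundaries are convenient, here is a minimal plain-Python sketch of pulling out a region by slicing first and applying the strand afterwards. The function extract_region and the COMPLEMENT table are hypothetical illustrations that assume unambiguous DNA letters; they are not Biopython code.

```python
# Complement table for unambiguous DNA letters only (an assumption of this sketch).
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def extract_region(parent, start, end, strand=None):
    """Slice parent[start:end], reverse-complementing it if strand is -1."""
    # The coordinates are always left-most / right-most, whatever the strand.
    if not 0 <= start <= end <= len(parent):
        raise ValueError("expected 0 <= start <= end <= len(parent)")
    region = parent[start:end]
    if strand == -1:
        # Complement each base, then reverse so the result reads 5' to 3'
        # on the opposite strand.
        region = region.translate(COMPLEMENT)[::-1]
    return region


parent = "AAAACCGGTTTT"
print(extract_region(parent, 2, 6, strand=+1))  # AACC
print(extract_region(parent, 2, 6, strand=-1))  # GGTT
```

The same slice is taken in both cases; only the final orientation step depends on the strand.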
In the examples above we have used standard exact positions, but there are also specialised position objects used to represent fuzzy positions. For example, a GenBank location like complement(<123..150) would use a BeforePosition object for the start.
Summary
Functions
Combine location with another FeatureLocation object, or shift it
Check if an integer position is within the FeatureLocation object
Implement equality by comparing all the location attributes
Initialize the class
Iterate over the parent positions within the FeatureLocation object
Return the length of the region described by the FeatureLocation object
Implement the not-equal operator
Return True regardless of the length of the feature
Add a feature location to another FeatureLocation object to the left
Represent the FeatureLocation object as a string for debugging
Return a representation of the FeatureLocation object (with python counting)
Return a copy of the location after the parent is reversed (PRIVATE)
Get function for the strand property (PRIVATE)
Set function for the strand property (PRIVATE)
Return a copy of the FeatureLocation shifted by an offset (PRIVATE)
End location - right most (maximum) value, regardless of strand
Extract the sequence from supplied parent sequence using the FeatureLocation object
End position (integer, approximated if fuzzy, read only) (OBSOLETE)
Start position (integer, approximated if fuzzy, read only) (OBSOLETE)
Read only list of sections (always one, the FeatureLocation object)
Start location - left most (minimum) value, regardless of strand
Functions
Combine location with another FeatureLocation object, or shift it.
You can add two feature locations to make a join CompoundLocation:
>>> from Bio.SeqFeature import FeatureLocation
>>> f1 = FeatureLocation(5, 10)
>>> f2 = FeatureLocation(20, 30)
>>> combined = f1 + f2
>>> print(combined)
join{[5:10], [20:30]}
This is thus equivalent to:
>>> from Bio.SeqFeature import CompoundLocation
>>> join = CompoundLocation([f1, f2])
>>> print(join)
join{[5:10], [20:30]}
You can also use sum(…) in this way:
>>> join = sum([f1, f2])
>>> print(join)
join{[5:10], [20:30]}
Furthermore, you can combine a FeatureLocation with a CompoundLocation in this way.
Separately, adding an integer will give a new FeatureLocation with its start and end offset by that amount. For example:
>>> print(f1)
[5:10]
>>> print(f1 + 100)
[105:110]
>>> print(200 + f1)
[205:210]
This can be useful when editing annotation.
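As an illustration of why shifting helps when editing annotation, the plain-Python sketch below inserts bases in front of a feature and moves its coordinates by the same amount. The shift helper and the (start, end) tuples are hypothetical stand-ins for FeatureLocation objects, used so the example runs without Biopython.

```python
def shift(region, offset):
    """Return a (start, end) pair moved by offset, like FeatureLocation + int."""
    start, end = region
    return start + offset, end + offset


parent = "TTTTTGATTACATTTTT"
feature = (5, 12)   # covers "GATTACA"
prefix = "CCC"      # bases inserted before the feature

edited = prefix + parent
moved = shift(feature, len(prefix))

print(parent[feature[0]:feature[1]])  # GATTACA
print(moved)                          # (8, 15)
print(edited[moved[0]:moved[1]])      # GATTACA
```

After the shift, the moved coordinates select the same residues in the edited sequence as the original coordinates did in the original sequence.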
Check if an integer position is within the FeatureLocation object.
Note that extra care may be needed for fuzzy locations, e.g.
>>> from Bio.SeqFeature import FeatureLocation
>>> from Bio.SeqFeature import BeforePosition, AfterPosition
>>> loc = FeatureLocation(BeforePosition(5), AfterPosition(10))
>>> len(loc)
5
>>> [i for i in range(15) if i in loc]
[5, 6, 7, 8, 9]
Implement equality by comparing all the location attributes.
Initialize the class.
The start and end arguments specify the values where the feature begins and ends. These can either be any of the *Position objects that inherit from AbstractPosition, or can just be integers specifying the position. In the case of integers, the values are assumed to be exact and are converted to ExactPosition objects. This is meant to make it easy to deal with non-fuzzy ends.
i.e. Short form:
>>> from Bio.SeqFeature import FeatureLocation
>>> loc = FeatureLocation(5, 10, strand=-1)
>>> print(loc)
[5:10](-)
Explicit form:
>>> from Bio.SeqFeature import FeatureLocation, ExactPosition
>>> loc = FeatureLocation(ExactPosition(5), ExactPosition(10), strand=-1)
>>> print(loc)
[5:10](-)
Other fuzzy positions are used similarly:
>>> from Bio.SeqFeature import FeatureLocation
>>> from Bio.SeqFeature import BeforePosition, AfterPosition
>>> loc2 = FeatureLocation(BeforePosition(5), AfterPosition(10), strand=-1)
>>> print(loc2)
[<5:>10](-)
For nucleotide features you will also want to specify the strand: use 1 for the forward (plus) strand, -1 for the reverse (minus) strand, 0 for stranded but strand unknown (? in GFF3), or None when the strand does not apply (dot in GFF3), e.g. for features on proteins.
>>> loc = FeatureLocation(5, 10, strand=+1)
>>> print(loc)
[5:10](+)
>>> print(loc.strand)
1
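The strand conventions listed above can be summarised as a lookup table. This is a small sketch; the dictionary names are illustrative and not part of Biopython.

```python
# Mapping described in the text: strand value -> GFF3 strand column symbol.
STRAND_TO_GFF3 = {+1: "+", -1: "-", 0: "?", None: "."}

# Inverse mapping, for turning a GFF3 strand column back into a strand value.
GFF3_TO_STRAND = {symbol: strand for strand, symbol in STRAND_TO_GFF3.items()}

for strand in (+1, -1, 0, None):
    print(repr(strand), "->", STRAND_TO_GFF3[strand])
```

Note that 0 and None are distinct: 0 means the feature is stranded but the strand is unknown, while None means strand is not applicable.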
Normally feature locations are given relative to the parent sequence you are working with, but an explicit accession can be given with the optional ref and db_ref strings:
>>> loc = FeatureLocation(105172, 108462, ref="AL391218.9", strand=1)
>>> print(loc)
AL391218.9[105172:108462](+)
>>> print(loc.ref)
AL391218.9
Iterate over the parent positions within the FeatureLocation object.
>>> from Bio.SeqFeature import FeatureLocation
>>> from Bio.SeqFeature import BeforePosition, AfterPosition
>>> loc = FeatureLocation(BeforePosition(5), AfterPosition(10))
>>> len(loc)
5
>>> for i in loc: print(i)
5
6
7
8
9
>>> list(loc)
[5, 6, 7, 8, 9]
>>> [i for i in range(15) if i in loc]
[5, 6, 7, 8, 9]
Note this is strand aware:
>>> loc = FeatureLocation(BeforePosition(5), AfterPosition(10), strand=-1)
>>> list(loc)
[9, 8, 7, 6, 5]
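The strand-aware ordering shown above corresponds to walking a range forwards or backwards. A minimal plain-Python sketch follows, using a hypothetical iter_positions helper that takes integer boundaries rather than a FeatureLocation:

```python
def iter_positions(start, end, strand=None):
    """Yield the parent positions in [start, end), descending if strand is -1."""
    if strand == -1:
        # Walk from the right-most covered position down to the left-most.
        yield from range(end - 1, start - 1, -1)
    else:
        yield from range(start, end)


print(list(iter_positions(5, 10)))             # [5, 6, 7, 8, 9]
print(list(iter_positions(5, 10, strand=-1)))  # [9, 8, 7, 6, 5]
```

Both orderings visit the same set of positions, so membership tests and length are unaffected by the strand.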
Return the length of the region described by the FeatureLocation object.
Note that extra care may be needed for fuzzy locations, e.g.
>>> from Bio.SeqFeature import FeatureLocation
>>> from Bio.SeqFeature import BeforePosition, AfterPosition
>>> loc = FeatureLocation(BeforePosition(5), AfterPosition(10))
>>> len(loc)
5
Implement the not-equal operator.
Return True regardless of the length of the feature.
This behaviour is for backwards compatibility, since until the __len__ method was added, a FeatureLocation always evaluated as True.
Note that in comparison, Seq objects, strings, lists, etc, will all evaluate to False if they have length zero.
WARNING: The FeatureLocation may in future evaluate to False when its length is zero (in order to better match normal Python behaviour)!
Add a feature location to another FeatureLocation object to the left.
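Given this warning, code that needs to detect an empty location is safer testing the length explicitly rather than relying on truthiness. The sketch below uses a hypothetical stand-in class, Region, that mimics the rule described above so the example runs without Biopython.

```python
class Region:
    """Hypothetical stand-in that mimics the truthiness rule described above."""

    def __init__(self, start, end):
        self.start, self.end = start, end

    def __len__(self):
        return self.end - self.start

    def __bool__(self):
        # Always true, even for an empty region (the backwards-compatible rule).
        return True


empty = Region(7, 7)
print(len(empty))       # 0
print(bool(empty))      # True
print(len(empty) == 0)  # True, an explicit emptiness test
```

An explicit length comparison gives the same answer whether or not the truthiness rule changes in a later release.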
Represent the FeatureLocation object as a string for debugging.
Return a representation of the FeatureLocation object (with python counting).
For the simple case this uses the Python slicing syntax, [122:150] (zero based counting) which GenBank would call 123..150 (one based counting).
Return a copy of the location after the parent is reversed (PRIVATE).
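The coordinate arithmetic behind such a flip can be sketched in plain Python. The sketch assumes exact positions and that the parent is replaced by its reverse complement; flip and reverse_complement are hypothetical helpers written for illustration, not the private Biopython method itself.

```python
# Complement table for unambiguous upper-case DNA (an assumption of this sketch).
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an upper-case DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def flip(start, end, strand, length):
    """Map [start:end] on a parent of the given length onto its reverse complement."""
    # Distances are now measured from the other end, and +1/-1 strands swap.
    new_strand = -strand if strand in (1, -1) else strand
    return length - end, length - start, new_strand


parent = "ATGGCCATTGTA"
start, end, strand = 3, 8, +1

flipped_parent = reverse_complement(parent)
new_start, new_end, new_strand = flip(start, end, strand, len(parent))

print(parent[start:end])                  # GCCAT
print((new_start, new_end, new_strand))   # (4, 9, -1)
print(flipped_parent[new_start:new_end])  # ATGGC
```

The slice of the flipped parent is the reverse complement of the original slice, which is why the strand changes sign: reading the flipped region on the -1 strand recovers GCCAT.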
Get function for the strand property (PRIVATE).
Set function for the strand property (PRIVATE).
Return a copy of the FeatureLocation shifted by an offset (PRIVATE).
End location - right most (maximum) value, regardless of strand.
Read only, returns an integer-like position object, possibly a fuzzy position.
Extract the sequence from supplied parent sequence using the FeatureLocation object.
The parent_sequence can be a Seq-like object or a string, and the method will generally return an object of the same type. The exception is a MutableSeq parent sequence, for which a Seq object is returned.
>>> from Bio.Seq import Seq
>>> from Bio.Alphabet import generic_protein
>>> from Bio.SeqFeature import FeatureLocation
>>> seq = Seq("MKQHKAMIVALIVICITAVVAAL", generic_protein)
>>> feature_loc = FeatureLocation(8, 15)
>>> feature_loc.extract(seq)
Seq('VALIVIC', ProteinAlphabet())
End position (integer, approximated if fuzzy, read only) (OBSOLETE).
This is now an alias for int(feature.end), which should be used in preference unless you are trying to support old versions of Biopython.
Start position (integer, approximated if fuzzy, read only) (OBSOLETE).
This is now an alias for int(feature.start), which should be used in preference unless you are trying to support old versions of Biopython.
Read only list of sections (always one, the FeatureLocation object).
This is a convenience property allowing you to write code handling both simple FeatureLocation objects (with one part) and more complex CompoundLocation objects (with multiple parts) interchangeably.
Start location - left most (minimum) value, regardless of strand.
Read only, returns an integer-like position object, possibly a fuzzy position.