biopython v1.71.0 Bio.AlignIO.MafIO.MafIndex

Index for a MAF file.

The index is a sqlite3 database. It is built when the object is created, if no existing index is found, and it is queried by the search() and get_spliced() methods.
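As a rough illustration of the idea, the sketch below stores one (bin, start, end, offset) row per alignment block in an in-memory sqlite3 table and looks up the file offsets of blocks that overlap a query range. The table and column names, and the functions build_index and overlapping_offsets, are hypothetical and do not reproduce MafIndex's actual schema. Stored coordinates are 0-based inclusive, and the query range is half-open. The bin column is kept but not used by this simplified query.

```python
import sqlite3


def build_index(rows):
    """Create a hypothetical in-memory index from (bin, start, end, offset) tuples."""
    con = sqlite3.connect(":memory:")
    # Illustrative schema only; not the layout MafIndex actually writes.
    con.execute(
        "CREATE TABLE offset_data "
        "(bin INTEGER, block_start INTEGER, block_end INTEGER, file_offset INTEGER)"
    )
    con.executemany("INSERT INTO offset_data VALUES (?, ?, ?, ?)", rows)
    con.execute("CREATE INDEX idx_coords ON offset_data (block_start, block_end)")
    con.commit()
    return con


def overlapping_offsets(con, start, end):
    """Return file offsets of blocks overlapping the half-open range [start, end)."""
    # An inclusive block [s, e] overlaps [start, end) iff s <= end - 1 and e >= start.
    cur = con.execute(
        "SELECT file_offset FROM offset_data "
        "WHERE block_start <= ? AND block_end >= ? "
        "ORDER BY block_start, block_end, file_offset",
        (end - 1, start),
    )
    return [row[0] for row in cur]
```

For example, overlapping_offsets(build_index(rows), 0, 100) returns the offsets of every block touching the first 100 reference positions.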

Summary

Functions

__check_existing_db()

Perform basic sanity checks upon loading an existing index (PRIVATE)

__init__()

Indexes or loads the index of a MAF file

__len__()

Return the number of records in the index

__maf_indexer()

Return index information for each bundle (PRIVATE)

__make_new_index()

Read MAF file and generate SQLite index (PRIVATE)

_get_record()

Retrieve a single MAF record located at the offset provided (PRIVATE)

_region2bin()

Find bins that a region may belong to (PRIVATE)

_ucscbin()

Return the smallest bin a given region will fit into (PRIVATE)

get_spliced()

Return a multiple alignment of the exact sequence range provided

search()

Search index database for MAF records overlapping ranges provided

Functions

__check_existing_db()

Perform basic sanity checks upon loading an existing index (PRIVATE).

__init__()

Indexes or loads the index of a MAF file.

__len__()

Return the number of records in the index.

__maf_indexer()

Return index information for each bundle (PRIVATE).

Yields index information for each bundle in the form of (bin, start, end, offset) tuples where start and end are 0-based inclusive coordinates.
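The sketch below shows how a MAF "s" line could be turned into 0-based inclusive coordinates on the forward strand, following the MAF format specification. Under that specification, "-" strand starts are counted from the end of the source sequence. The function name s_line_to_interval is hypothetical, and whether MafIndex performs exactly this conversion is an assumption.

```python
def s_line_to_interval(line):
    """Convert a MAF 's' line to a 0-based inclusive (start, end) on the + strand.

    Hypothetical helper; fields are: s src start size strand srcSize text.
    """
    fields = line.split()
    if fields[0] != "s":
        raise ValueError("not a MAF sequence ('s') line")
    start, size = int(fields[2]), int(fields[3])
    strand, src_size = fields[4], int(fields[5])
    if strand == "-":
        # MAF stores '-' strand starts relative to the reverse-complemented source.
        start = src_size - start - size
    # The inclusive end is the last covered base, hence the - 1.
    return start, start + size - 1
```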

__make_new_index()

Read MAF file and generate SQLite index (PRIVATE).

_get_record()

Retrieve a single MAF record located at the offset provided (PRIVATE).
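Retrieving a record by offset amounts to seeking to a stored byte position and reading one alignment block. The sketch below assumes the file is opened in binary mode and that blocks are separated by blank lines, as in the MAF format. It returns raw lines rather than a MultipleSeqAlignment, and the function name read_block is hypothetical.

```python
import io


def read_block(handle, offset):
    """Return the text lines of the MAF block starting at byte `offset`.

    `handle` is a binary file object, e.g. open(path, "rb") or io.BytesIO.
    """
    handle.seek(offset)
    lines = []
    for raw in handle:
        line = raw.decode("ascii").rstrip("\r\n")
        if not line.strip():
            if lines:
                break  # a blank line ends the current block
            continue  # skip blank lines before the block starts
        lines.append(line)
    return lines


if __name__ == "__main__":
    sample = io.BytesIO(b"a score=1\ns ref.chr1 0 4 + 100 ACGT\n\n")
    print(read_block(sample, 0))
```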

_region2bin()

Find bins that a region may belong to (PRIVATE).

Converts a region to a list of bins that it may belong to, including largest and smallest bins.
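A minimal sketch of the bin lookup, assuming the standard five-level UCSC scheme described on the genomewiki page cited under _ucscbin: level offsets 585, 73, 9, 1 and 0, with bin sizes of 2**17 up to 2**29 bases. Any feature overlapping the region must sit in one of the returned bins. The function name region_bins is hypothetical, and Biopython's exact code may differ.

```python
# (bin offset, shift) per level of the standard UCSC scheme, smallest bins first.
BIN_LEVELS = [(585, 17), (73, 20), (9, 23), (1, 26), (0, 29)]


def region_bins(start, end):
    """Return every bin that could hold a feature overlapping [start, end)."""
    bins = set()
    for offset, shift in BIN_LEVELS:
        first = offset + (start >> shift)
        last = offset + ((end - 1) >> shift)  # end is exclusive
        bins.update(range(first, last + 1))
    return bins
```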

_ucscbin()

Return the smallest bin a given region will fit into (PRIVATE).

Adapted from http://genomewiki.ucsc.edu/index.php/Bin_indexing_system
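The sketch below follows the bin-from-range procedure on that genomewiki page. It starts at the smallest (128 kb) bins and moves up one level, eight times larger, until the start and end of the region fall in the same bin. It assumes a half-open [start, end) range, and the name ucsc_bin is hypothetical.

```python
BIN_OFFSETS = [512 + 64 + 8 + 1, 64 + 8 + 1, 8 + 1, 1, 0]
BIN_FIRST_SHIFT = 17  # smallest bins span 2**17 = 128 kb
BIN_NEXT_SHIFT = 3    # each level up is 8 times larger


def ucsc_bin(start, end):
    """Return the smallest bin that fully contains the half-open range [start, end)."""
    start_bin = start >> BIN_FIRST_SHIFT
    end_bin = (end - 1) >> BIN_FIRST_SHIFT
    for offset in BIN_OFFSETS:
        if start_bin == end_bin:
            return offset + start_bin
        start_bin >>= BIN_NEXT_SHIFT
        end_bin >>= BIN_NEXT_SHIFT
    raise ValueError("range exceeds the 512 Mb standard binning scheme")
```

A region that crosses a 128 kb boundary is pushed up to a larger bin. For instance, [0, 131073) lands in bin 73 rather than 585.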

get_spliced()

Return a multiple alignment of the exact sequence range provided.

Accepts two lists of start and end positions on the reference sequence named by target_seqname, representing exons to be spliced in silico. Returns a MultipleSeqAlignment of the desired sequences spliced together.

starts should be a list of 0-based start coordinates of segments in the reference. ends should be the list of the corresponding segment ends (in the half-open UCSC convention: http://genome.ucsc.edu/blog/the-ucsc-genome-browser-coordinate-counting-systems/).

To ask for the alignment portion corresponding to the first 100 nucleotides of the reference sequence, you would use get_spliced([0], [100]).
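To illustrate splicing with half-open reference coordinates, here is a simplified sketch over plain gapped strings rather than MAF records. It maps each reference position to its alignment column, takes the columns spanning each [start, end) segment, and concatenates them for every row. Columns where the reference has a gap are kept only when they fall inside a segment. This is an assumption, and the real method's handling of gaps, strands and missing species is not reproduced. The function name splice_columns and its arguments are hypothetical.

```python
def splice_columns(rows, ref_name, ref_start, starts, ends):
    """Splice alignment columns covering reference segments [starts[i], ends[i]).

    rows: dict mapping sequence name -> gapped aligned string (all the same length).
    ref_start: 0-based reference coordinate of the first non-gap reference base.
    """
    ref_row = rows[ref_name]
    # Map each reference coordinate to its alignment column, skipping gap columns.
    col_of = {}
    pos = ref_start
    for col, char in enumerate(ref_row):
        if char != "-":
            col_of[pos] = col
            pos += 1
    columns = []
    for start, end in zip(starts, ends):
        if start >= end:
            raise ValueError("each segment must satisfy start < end")
        # end is exclusive, so the last reference base in the segment is end - 1.
        columns.extend(range(col_of[start], col_of[end - 1] + 1))
    return {name: "".join(seq[c] for c in columns) for name, seq in rows.items()}
```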

search()

Search index database for MAF records overlapping ranges provided.

Returns MultipleSeqAlignment results in order by start, then end, then internal offset field.
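The ordering rule can be mimicked in pure Python. The sketch below works on hypothetical (start, end, offset) tuples instead of alignments and checks each stored block against every half-open query range. Sorting the tuples orders them by start, then end, then offset. It also assumes that a block hit by several query ranges is reported only once, which the text above does not state. The function name search_records is hypothetical.

```python
def search_records(records, starts, ends):
    """Return records overlapping any half-open range [starts[i], ends[i]).

    records: list of (start, end, offset) tuples with 0-based inclusive start/end.
    Each matching record appears once, ordered by start, then end, then offset.
    """
    hits = set()
    for q_start, q_end in zip(starts, ends):
        for rec in records:
            rec_start, rec_end, _offset = rec
            # Inclusive [rec_start, rec_end] overlaps [q_start, q_end) iff both hold.
            if rec_start < q_end and rec_end >= q_start:
                hits.add(rec)
    # Tuple ordering compares start first, then end, then offset.
    return sorted(hits)
```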