biopython v1.71.0 Bio.AlignIO.MafIO.MafIndex

Index for a MAF file.

The index is a SQLite (sqlite3) database. It is built when the object is created, if it does not already exist, and is queried whenever the search or get_spliced methods are called.
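The build-if-necessary behaviour can be illustrated with the standard sqlite3 module. This is a minimal sketch, not Biopython's implementation: the function names `open_index` and `count_records`, the table name `records`, and its column names are all hypothetical and chosen only for this example.

```python
import os
import sqlite3
import tempfile


def open_index(db_path, rows):
    """Open an index database, building it only if the table is missing.

    The table and column names are assumptions made for this sketch.
    """
    con = sqlite3.connect(db_path)
    found = con.execute(
        "SELECT 1 FROM sqlite_master WHERE type = 'table' AND name = 'records'"
    ).fetchone()
    if found is None:
        # No existing index: create the table and store one row per record.
        con.execute(
            "CREATE TABLE records "
            "(bin INTEGER, seq_start INTEGER, seq_end INTEGER, file_offset INTEGER)"
        )
        con.executemany("INSERT INTO records VALUES (?, ?, ?, ?)", rows)
        con.commit()
    return con


def count_records(con):
    """Return the number of indexed records."""
    return con.execute("SELECT COUNT(*) FROM records").fetchone()[0]


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "example.mafindex")
    # First call builds the index from the supplied rows.
    first = open_index(path, [(585, 0, 4, 16), (586, 200000, 200003, 83)])
    built = count_records(first)
    first.close()
    # Second call finds the existing table and ignores the (empty) rows.
    second = open_index(path, [])
    loaded = count_records(second)
    second.close()

print(built, loaded)
```

The second call receives no rows yet reports the same record count, which shows that an existing index is loaded rather than rebuilt.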

Summary

Functions

Perform basic sanity checks upon loading an existing index (PRIVATE)

Indexes or loads the index of a MAF file

Return the number of records in the index

Return index information for each bundle (PRIVATE)

Read MAF file and generate SQLite index (PRIVATE)

Retrieve a single MAF record located at the offset provided (PRIVATE)

Find bins that a region may belong to (PRIVATE)

Return the smallest bin a given region will fit into (PRIVATE)

Return a multiple alignment of the exact sequence range provided

Search index database for MAF records overlapping ranges provided

Functions

__check_existing_db()

Perform basic sanity checks upon loading an existing index (PRIVATE).

__init__()

Indexes or loads the index of a MAF file.

__len__()

Return the number of records in the index.

__maf_indexer()

Return index information for each bundle (PRIVATE).

Yields index information for each bundle in the form of (bin, start, end, offset) tuples where start and end are 0-based inclusive coordinates.
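As an illustration of how such tuples might be derived, the sketch below scans MAF text held in memory and yields `(start, end, offset)` for each alignment block that contains the target sequence. It is a simplified assumption-laden example, not Biopython's parser: the function name `iter_index_rows` and the sample data are hypothetical, the bin value is left out for brevity, and the target is assumed to lie on the + strand.

```python
import io

# Hypothetical two-block MAF file used only for this example.
MAF = (
    b"##maf version=1\n"
    b"a score=10\n"
    b"s ref.chr1 0 5 + 1000000 ACGTA\n"
    b"s other.chr2 10 5 + 500 ACGTT\n"
    b"\n"
    b"a score=20\n"
    b"s ref.chr1 200000 4 + 1000000 ACGT\n"
    b"s other.chr2 30 4 + 500 ACGA\n"
)


def iter_index_rows(handle, target_seqname):
    """Yield (start, end, offset) for blocks containing target_seqname.

    start and end are 0-based inclusive; offset is the byte position of
    the block's "a" line. Assumes the target is on the + strand.
    """
    target = target_seqname.encode()
    offset = None
    while True:
        position = handle.tell()
        line = handle.readline()
        if not line:
            break
        fields = line.split()
        if not fields:
            continue
        if fields[0] == b"a":
            # Remember where the current alignment block begins.
            offset = position
        elif fields[0] == b"s" and fields[1] == target:
            start = int(fields[2])
            size = int(fields[3])
            # MAF gives start and size; the inclusive end is start + size - 1.
            yield start, start + size - 1, offset


rows = list(iter_index_rows(io.BytesIO(MAF), "ref.chr1"))
for row in rows:
    print(row)
```

Seeking to a stored offset and reading from there returns the block starting at its "a" line, which is what makes retrieval by offset possible.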

__make_new_index()

Read MAF file and generate SQLite index (PRIVATE).

_get_record()

Retrieve a single MAF record located at the offset provided (PRIVATE).

_region2bin()

Find bins that a region may belong to (PRIVATE).

Converts a region to a list of bins that it may belong to, including largest and smallest bins.
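The idea can be sketched with the standard UCSC binning constants (five levels, a first shift of 17 bits and 3 bits per further level). This is an illustrative sketch assuming a half-open region `[start, end)`; the function name `candidate_bins` is hypothetical and the code is not taken from Biopython.

```python
# Standard UCSC binning constants: offsets of the first bin at each level,
# from the smallest bins (128 kb) up to the single largest bin.
BIN_OFFSETS = (512 + 64 + 8 + 1, 64 + 8 + 1, 8 + 1, 1, 0)
FIRST_SHIFT = 17  # 2**17 = 128 kb, the size of the smallest bins
NEXT_SHIFT = 3    # each level up is 8 times larger


def candidate_bins(start, end):
    """Return every bin that could hold a record overlapping [start, end)."""
    bins = set()
    start_bin = start >> FIRST_SHIFT
    end_bin = (end - 1) >> FIRST_SHIFT
    for offset in BIN_OFFSETS:
        # All bins at this level touched by the region.
        bins.update(range(offset + start_bin, offset + end_bin + 1))
        start_bin >>= NEXT_SHIFT
        end_bin >>= NEXT_SHIFT
    return sorted(bins)


small_region = candidate_bins(0, 100)
spanning_region = candidate_bins(0, 131073)
print(small_region)
print(spanning_region)
```

A region that crosses a 128 kb boundary touches two of the smallest bins, so its candidate list is one entry longer than that of a region contained in a single bin.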

_ucscbin()

Return the smallest bin a given region will fit into (PRIVATE).

Adapted from http://genomewiki.ucsc.edu/index.php/Bin_indexing_system
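A minimal sketch of the calculation described on that page follows. It assumes a half-open region `[start, end)` and the standard five-level scheme; the function name `smallest_bin` is hypothetical, and the code is an illustration rather than Biopython's own source.

```python
# Offsets of the first bin at each level, smallest bins first.
BIN_OFFSETS = (512 + 64 + 8 + 1, 64 + 8 + 1, 8 + 1, 1, 0)
FIRST_SHIFT = 17  # smallest bins cover 2**17 = 131072 bases
NEXT_SHIFT = 3    # each higher level covers 8 times as many bases


def smallest_bin(start, end):
    """Return the smallest bin that fully contains [start, end)."""
    start_bin = start >> FIRST_SHIFT
    end_bin = (end - 1) >> FIRST_SHIFT
    for offset in BIN_OFFSETS:
        if start_bin == end_bin:
            # Both ends fall in the same bin at this level.
            return offset + start_bin
        start_bin >>= NEXT_SHIFT
        end_bin >>= NEXT_SHIFT
    # Regions too large for the scheme fall back to the top-level bin.
    return 0


within_first_bin = smallest_bin(0, 100)
crossing_boundary = smallest_bin(0, 131073)
within_second_bin = smallest_bin(131072, 131073)
print(within_first_bin, crossing_boundary, within_second_bin)
```

A region that straddles the boundary between two of the smallest bins is promoted to the next level up, where a single bin covers both sides.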

get_spliced()

Return a multiple alignment of the exact sequence range provided.

Accepts two lists of start and end positions on target_seqname, representing exons to be spliced in silico. Returns a MultipleSeqAlignment of the desired sequences spliced together.

starts should be a list of 0-based start coordinates of segments in the reference. ends should be the list of the corresponding segment ends (in the half-open UCSC convention: http://genome.ucsc.edu/blog/the-ucsc-genome-browser-coordinate-counting-systems/).

To ask for the alignment portion corresponding to the first 100 nucleotides of the reference sequence, you would use get_spliced([0], [100]).
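The half-open convention matches Python slicing, which the following sketch uses to splice segments out of a plain, ungapped string. It only illustrates the coordinate arithmetic: the function name `splice` and the toy reference are hypothetical, and no alignment or Biopython call is involved.

```python
def splice(sequence, starts, ends):
    """Join the half-open segments [start, end) of sequence, in order."""
    if len(starts) != len(ends):
        raise ValueError("starts and ends must have the same length")
    return "".join(sequence[start:end] for start, end in zip(starts, ends))


# Toy 200-nucleotide reference made of a repeated motif.
reference = "ACGTACGTAC" * 20

# First 100 nucleotides: start 0, end 100 (the end position is excluded).
first_100 = splice(reference, [0], [100])

# Two "exons" of four nucleotides each, spliced together.
exons = splice(reference, [0, 10], [4, 14])

print(len(first_100), exons)
```

Because the end coordinate is exclusive, the length of each segment is simply `end - start`, and adjacent segments can share a boundary without duplicating a base.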

search()

Search index database for MAF records overlapping ranges provided.

Returns MultipleSeqAlignment results in order by start, then end, then internal offset field.

starts should be a list of 0-based start coordinates of segments in the reference. ends should be the list of the corresponding segment ends (in the half-open UCSC convention: http://genome.ucsc.edu/blog/the-ucsc-genome-browser-coordinate-counting-systems/).
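The overlap test and result ordering can be sketched in plain Python. The example assumes stored records use 0-based inclusive coordinates while queries are half-open; the function name `overlapping` and the sample records are hypothetical, and reporting each record only once is a choice made for this sketch rather than a documented guarantee.

```python
def overlapping(records, starts, ends):
    """Return records overlapping any half-open query range [start, end).

    records holds (start, end, offset) tuples with inclusive end positions.
    Results are ordered by start, then end, then offset.
    """
    hits = set()
    for query_start, query_end in zip(starts, ends):
        for rec_start, rec_end, offset in records:
            # Inclusive record [rec_start, rec_end] against half-open query.
            if rec_start < query_end and rec_end >= query_start:
                hits.add((rec_start, rec_end, offset))
    # Tuple sorting gives the start, end, offset ordering directly.
    return sorted(hits)


records = [(100, 149, 900), (0, 99, 10), (50, 120, 400)]

first_hundred = overlapping(records, [0], [100])
single_base = overlapping(records, [100], [101])
print(first_hundred)
print(single_base)
```

The record starting at position 100 is excluded from the first query because the query end of 100 is exclusive, and it is the only record starting at or after 100 that the single-base query picks up.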