biopython v1.71.0 Bio.GenBank.Record.Record

Hold GenBank information in a format similar to the original record.

The Record class is meant to make the data easy to access when you just want to look at GenBank data.

Attributes:

locus - The name specified after the LOCUS keyword in the GenBank record. This may be the accession number, a clone ID, or something else.
size - The size of the record (the sequence length given on the LOCUS line).
residue_type - The type of residues making up the sequence in this record. Normally something like RNA, DNA or PROTEIN, but may be as esoteric as ‘ss-RNA circular’.
data_file_division - The division this record is stored under in GenBank (e.g. PLN -> plants; PRI -> humans, primates; BCT -> bacteria…)
date - The date of submission of the record, in a form like ‘28-JUL-1998’
accession - A list of all accession numbers for the sequence.
nid - Nucleotide identifier number.
pid - Protein identifier number.
version - The accession number + version (e.g. AB01234.2)
db_source - Information about the database the record came from
gi - The NCBI gi identifier for the record.
keywords - A list of keywords related to the record.
segment - If the record is one of a series, this is information about which segment this record is (something like ‘1 of 6’).
source - The source material the sequence came from.
organism - The genus and species of the organism (e.g. ‘Homo sapiens’)
taxonomy - A listing of the taxonomic classification of the organism, ordered from most general to most specific.
references - A list of Reference objects.
comment - Text with any kind of comment about the record.
features - A listing of Features making up the feature table.
base_counts - A string with the counts of bases for the sequence.
origin - A string with information about the origin of the sequence.
sequence - A string with the sequence itself.
contig - A string of location information for a CONTIG in a RefSeq file.
project - The genome sequencing project numbers (will be replaced by the dblink cross-references in 2009).
dblinks - The genome sequencing project number(s) and other links. (will replace the project information in 2009).
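Several of the attributes above (locus, size, residue_type, data_file_division, date) all come from the single LOCUS line. The sketch below shows how such a line might map onto those attribute names. `parse_locus_line` is a hypothetical helper that simply splits on whitespace, assuming the token order given in its docstring; it is not Biopython's parser, which handles more LOCUS variants than this.

```python
from typing import Dict


def parse_locus_line(line: str) -> Dict[str, str]:
    """Map a whitespace-separated LOCUS line onto Record attribute names.

    Hypothetical helper: a simplified sketch, not Biopython's parser.
    Assumed token order:
        LOCUS <locus> <size> <bp|aa> <residue type...> <division> <date>
    """
    tokens = line.split()
    if len(tokens) < 7 or tokens[0] != "LOCUS":
        raise ValueError("unexpected LOCUS line: %r" % line)
    return {
        "locus": tokens[1],
        "size": tokens[2],
        # Everything between the unit and the division, so multi-word
        # values such as "ss-RNA circular" are kept together.
        "residue_type": " ".join(tokens[4:-2]),
        "data_file_division": tokens[-2],
        "date": tokens[-1],
    }


# An illustrative LOCUS line.
record = parse_locus_line(
    "LOCUS       SCU49845     5028 bp    DNA             PLN       21-JUN-1999"
)
print(record["locus"], record["size"], record["data_file_division"])
```

Joining the middle tokens is what lets an esoteric residue type such as ‘ss-RNA circular’ survive as one value.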

Summary

Functions

__init__()

Initialize

__str__()

Provide a GenBank formatted output option for a Record

_accession_line()

Output for the ACCESSION line

_base_count_line()

Output for the BASE COUNT line with base information

_comment_line()

Output for the COMMENT lines

_contig_line()

Output for CONTIG location information from RefSeq

_db_source_line()

Output for DBSOURCE line

_definition_line()

Provide output for the DEFINITION line

_features_line()

Output for the FEATURES line

_keywords_line()

Output for the KEYWORDS line

_locus_line()

Provide the output string for the LOCUS line

_nid_line()

Output for the NID line. Use of NID is obsolete in GenBank files

_organism_line()

Output for ORGANISM line with taxonomy info

_origin_line()

Output for the ORIGIN line

_pid_line()

Output for the PID line. Presumably, PID usage is also obsolete

_segment_line()

Output for the SEGMENT line

_sequence_line()

Output for all of the sequence

_source_line()

Output for the SOURCE line, describing where the sample came from

_version_line()

Output for the VERSION line

Functions

__init__()

Initialize.

__str__()

Provide a GenBank formatted output option for a Record.

The goal is to provide an easy way to read in a GenBank record, modify it, and then write it back out in GenBank format. We are striving to make a parsed Record that is output using this function look exactly like the original record.

Much of the output follows the format description at:

ftp://ncbi.nlm.nih.gov/genbank/gbrel.txt
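Most GenBank lines share one layout: a keyword at the left margin, data starting in a fixed column, and long text wrapped onto continuation lines indented to that same column. A minimal sketch of that pattern follows, assuming a 12-character indent and a 79-character maximum line width (both are assumptions here, not values read from the Biopython source). `format_keyword_line` is a hypothetical helper, not a method of Record.

```python
import textwrap

# Assumed layout constants (illustrative, not taken from Biopython).
INDENT = 12       # data begins in column 13
MAX_WIDTH = 79    # longest line written


def format_keyword_line(keyword: str, text: str) -> str:
    """Write one keyword entry, wrapping long text onto indented lines."""
    wrapped = textwrap.wrap(text, width=MAX_WIDTH - INDENT) or [""]
    first = keyword.ljust(INDENT) + wrapped[0]
    continuation = [" " * INDENT + part for part in wrapped[1:]]
    return "\n".join([first] + continuation) + "\n"


print(format_keyword_line("DEFINITION", "Saccharomyces cerevisiae TCP1-beta gene."), end="")
```

Each of the private `_..._line()` methods listed here produces one such block, so concatenating them in record order is what reassembles the full GenBank text.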

_accession_line()

Output for the ACCESSION line.

_base_count_line()

Output for the BASE COUNT line with base information.
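In a Record, base_counts is already stored as a string. For illustration only, the sketch below derives a BASE COUNT line from a raw sequence, assuming each count is right-aligned in 7 characters before its base letter. `format_base_count_line` is hypothetical, and it counts only a, c, g and t (any other character is ignored).

```python
from collections import Counter


def format_base_count_line(sequence: str) -> str:
    """Build a BASE COUNT line from a raw sequence (illustrative only).

    Only a, c, g and t are counted; other characters are ignored.
    """
    counts = Counter(sequence.lower())
    # Assumed layout: each count right-aligned in 7 characters, then the base.
    parts = "".join("%7d %s" % (counts[base], base) for base in "acgt")
    return "BASE COUNT".ljust(12) + parts + "\n"


print(format_base_count_line("AACGTT"), end="")
```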

_comment_line()

Output for the COMMENT lines.

_contig_line()

Output for CONTIG location information from RefSeq.

_db_source_line()

Output for DBSOURCE line.

_definition_line()

Provide output for the DEFINITION line.

_features_line()

Output for the FEATURES line.

_keywords_line()

Output for the KEYWORDS line.
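In GenBank files the KEYWORDS value is a single string: keywords separated by "; " and ended with a period, with a lone "." when there are no keywords. A minimal sketch of turning the keywords list into that line, assuming a 12-column indent and a 79-column width; `format_keywords_line` is a hypothetical helper.

```python
import textwrap
from typing import List


def format_keywords_line(keywords: List[str], indent: int = 12, width: int = 79) -> str:
    """Write a KEYWORDS entry: items joined by '; ' and ended with '.'."""
    # An empty list yields just ".", the usual GenBank placeholder.
    text = "; ".join(keywords) + "."
    # break_on_hyphens=False keeps hyphenated keywords intact when wrapping.
    wrapped = textwrap.wrap(text, width=width - indent, break_on_hyphens=False)
    lines = ["KEYWORDS".ljust(indent) + wrapped[0]]
    lines += [" " * indent + part for part in wrapped[1:]]
    return "\n".join(lines) + "\n"


print(format_keywords_line([]), end="")
print(format_keywords_line(["TCP1-beta", "chaperonin"]), end="")
```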

_locus_line()

Provide the output string for the LOCUS line.

_nid_line()

Output for the NID line. Use of NID is obsolete in GenBank files.

_organism_line()

Output for ORGANISM line with taxonomy info.
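The ORGANISM line is indented as a sub-entry of SOURCE and is followed by the taxonomy list, joined by "; " and ending in a period, on indented continuation lines. A sketch under the same assumed 12-column indent and 79-column width; `format_organism_lines` is a hypothetical helper, not Biopython's implementation.

```python
import textwrap
from typing import List


def format_organism_lines(organism: str, taxonomy: List[str],
                          indent: int = 12, width: int = 79) -> str:
    """Write the ORGANISM sub-entry followed by its taxonomy lines."""
    lines = ["  ORGANISM".ljust(indent) + organism]
    # Taxonomy runs from most general to most specific, '; '-separated, ending in '.'.
    lineage = "; ".join(taxonomy) + "."
    for part in textwrap.wrap(lineage, width=width - indent, break_on_hyphens=False):
        lines.append(" " * indent + part)
    return "\n".join(lines) + "\n"


print(format_organism_lines("Saccharomyces cerevisiae",
                            ["Eukaryota", "Fungi", "Saccharomyces"]), end="")
```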

_origin_line()

Output for the ORIGIN line.

_pid_line()

Output for the PID line. Presumably, PID usage is also obsolete.

_segment_line()

Output for the SEGMENT line.

_sequence_line()

Output for all of the sequence.
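The sequence block after ORIGIN uses a fixed layout: 60 bases per line, split into blocks of 10, with each line prefixed by the 1-based position of its first base, right-aligned. A sketch of producing that layout from the Record's sequence string; `format_sequence_lines` is hypothetical, and the 9-character position width is an assumption based on typical GenBank files.

```python
def format_sequence_lines(sequence: str, per_line: int = 60, block: int = 10) -> str:
    """Lay out a sequence as in the block after ORIGIN (illustrative only)."""
    seq = sequence.lower()
    lines = []
    for start in range(0, len(seq), per_line):
        chunk = seq[start:start + per_line]
        blocks = [chunk[i:i + block] for i in range(0, len(chunk), block)]
        # 1-based position of the first base on the line, right-aligned in 9 columns.
        lines.append("%9d %s" % (start + 1, " ".join(blocks)))
    return "".join(line + "\n" for line in lines)


print(format_sequence_lines("acgt" * 20), end="")
```

An empty sequence produces an empty string rather than a stray blank line.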

_source_line()

Output for the SOURCE line, describing where the sample came from.

_version_line()

Output for the VERSION line.
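The VERSION line carries the accession.version string and, in older records, a GI number. A minimal sketch, assuming two spaces separate the version from a "GI:" suffix; `format_version_line` is a hypothetical helper and the GI value below is a placeholder, not real data.

```python
def format_version_line(version: str, gi: str = "") -> str:
    """Write a VERSION line; append a GI number when one is given."""
    line = "VERSION".ljust(12) + version
    if gi:
        # Assumed spacing: two spaces before the GI number, as in older records.
        line += "  GI:" + gi
    return line + "\n"


# The accession and version number can be recovered by splitting on the last dot.
accession, number = "AB01234.2".rsplit(".", 1)
print(accession, number)
print(format_version_line("AB01234.2", "123456"), end="")
```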