biopython v1.71.0 Bio.GenBank.Record.Record

Hold GenBank information in a format similar to the original record.

The Record class is meant to make data easy to get to when you are just interested in looking at GenBank data.

Attributes:

  • locus - The name specified after the LOCUS keyword in the GenBank record. This may be the accession number, or a clone id or something else.
  • size - The size of the record.
  • residue_type - The type of residues making up the sequence in this record. Normally something like RNA, DNA or PROTEIN, but may be as esoteric as ‘ss-RNA circular’.
  • data_file_division - The division this record is stored under in GenBank (e.g. PLN -> plants; PRI -> humans, primates; BCT -> bacteria…)
  • date - The date of submission of the record, in a form like ‘28-JUL-1998’
  • accession - list of all accession numbers for the sequence.
  • nid - Nucleotide identifier number.
  • pid - Protein identifier number.
  • version - The accession number + version (e.g. AB01234.2)
  • db_source - Information about the database the record came from
  • gi - The NCBI gi identifier for the record.
  • keywords - A list of keywords related to the record.
  • segment - If the record is one of a series, this is info about which segment this record is (something like ‘1 of 6’).
  • source - The source of material where the sequence came from.
  • organism - The genus and species of the organism (e.g. ‘Homo sapiens’)
  • taxonomy - A listing of the taxonomic classification of the organism, starting general and getting more specific.
  • references - A list of Reference objects.
  • comment - Text with any kind of comment about the record.
  • features - A listing of Features making up the feature table.
  • base_counts - A string with the counts of bases for the sequence.
  • origin - A string specifying info about the origin of the sequence.
  • sequence - A string with the sequence itself.
  • contig - A string of location information for a CONTIG in a RefSeq file
  • project - The genome sequencing project numbers (will be replaced by the dblink cross-references in 2009).
  • dblinks - The genome sequencing project number(s) and other links. (will replace the project information in 2009).
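
As a rough illustration of the attributes listed above, the sketch below builds a Record by hand and reads a few fields back. It assumes Biopython is installed and that the class is importable as `Bio.GenBank.Record.Record`; every value (locus name, date, taxonomy) is a placeholder invented for the example, not real GenBank data.

```python
from Bio.GenBank.Record import Record

# A freshly created Record starts with empty attributes.
rec = Record()

# String-valued attributes are assigned directly (placeholder values).
rec.locus = "AB012345"
rec.size = "120"
rec.residue_type = "DNA"
rec.data_file_division = "PRI"
rec.date = "28-JUL-1998"
rec.organism = "Homo sapiens"

# List-valued attributes can be appended to or replaced.
rec.accession.append("AB012345")
rec.keywords.extend(["example", "placeholder"])
rec.taxonomy = ["Eukaryota", "Metazoa", "Chordata"]

# Reading the data back is plain attribute access.
summary = "%s (%s %s, %s)" % (rec.locus, rec.size, rec.residue_type, rec.organism)
print(summary)
print(rec.accession, rec.keywords)
```

In practice a Record is more often produced by a parser than built by hand; the attribute access shown here should be the same either way.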

Summary

Functions

Initialize

Provide a GenBank formatted output option for a Record

Output for the ACCESSION line

Output for the BASE COUNT line with base information

Output for the COMMENT lines

Output for CONTIG location information from RefSeq

Output for DBSOURCE line

Provide output for the DEFINITION line

Output for the FEATURES line

Output for the KEYWORDS line

Provide the output string for the LOCUS line

Output for the NID line. Use of NID is obsolete in GenBank files

Output for ORGANISM line with taxonomy info

Output for the ORIGIN line

Output for PID line. Presumably, PID usage is also obsolete

Output for the SEGMENT line

Output for all of the sequence

Output for SOURCE line on where the sample came from

Output for the VERSION line

Functions

Initialize.

Provide a GenBank formatted output option for a Record.

The goal is to provide an easy way to read in a GenBank record, modify it, and then write it back out in GenBank format. The aim is for a parsed Record that is output with this function to look exactly like the original record.

Much of the output is based on format description info at:

ftp://ncbi.nlm.nih.gov/genbank/gbrel.txt
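
The modify-then-output workflow described above might look like the following minimal sketch. It assumes the GenBank formatted output is obtained by calling `str()` on the Record, and it builds the record by hand with placeholder values rather than parsing a real file; exact column spacing in the result may differ between Biopython versions.

```python
from Bio.GenBank.Record import Record

# Build a small record by hand (all values are illustrative placeholders).
rec = Record()
rec.locus = "AB012345"
rec.size = "12"
rec.residue_type = "DNA"
rec.data_file_division = "PRI"
rec.date = "28-JUL-1998"
rec.definition = "Placeholder record for illustration."
rec.accession = ["AB012345"]
rec.source = "human"
rec.organism = "Homo sapiens"
rec.taxonomy = ["Eukaryota", "Metazoa"]
rec.sequence = "acgtacgtacgt"

# Modify the record, then ask for the GenBank formatted text.
rec.keywords = ["example"]
text = str(rec)
print(text)
```

Sections whose attributes are left empty (for example COMMENT or FEATURES here) are expected to be skipped in the output, and the text ends with the `//` record terminator.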

_accession_line()

Output for the ACCESSION line.
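
The per-line helpers can also be called on their own, which may be handy when checking how a single field will be rendered. The sketch below assumes `_accession_line()` returns the formatted line as a string and returns an empty string when no accession numbers are set; since the leading underscore marks it as an internal helper, treat this as an inspection aid rather than a stable interface. The accession values are placeholders.

```python
from Bio.GenBank.Record import Record

rec = Record()

# With no accession numbers set, the helper should contribute nothing.
empty_line = rec._accession_line()

# With accession numbers set, it renders one ACCESSION line.
rec.accession = ["AB012345", "AB012346"]
line = rec._accession_line()
print(repr(line))
```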

_base_count_line()

Output for the BASE COUNT line with base information.

_comment_line()

Output for the COMMENT lines.

Output for CONTIG location information from RefSeq.

_db_source_line()

Output for DBSOURCE line.

_definition_line()

Provide output for the DEFINITION line.

_features_line()

Output for the FEATURES line.

_keywords_line()

Output for the KEYWORDS line.

Provide the output string for the LOCUS line.

Output for the NID line. Use of NID is obsolete in GenBank files.

_organism_line()

Output for ORGANISM line with taxonomy info.

Output for the ORIGIN line.

Output for PID line. Presumably, PID usage is also obsolete.

_segment_line()

Output for the SEGMENT line.

_sequence_line()

Output for all of the sequence.

Output for SOURCE line on where the sample came from.

_version_line()

Output for the VERSION line.