Unpacking S4 Objects in R

Modern statistics was invented by a doctor, whose income from curing people was just not enough. To make more money on the side from gambling, he came up with the earliest versions of the rules of probability.

Fast forward a few centuries, and we have a discipline that survives by making things more confusing for doctors, biologists and everyone else! If you found R's formula syntax confusing, you have not yet seen statisticians doing object-oriented programming.

R has at least two ways to define classes: S3 and S4. They are so ugly that they remind me of Cinderella's stepsisters. Unfortunately, you cannot avoid them if Bioconductor is part of your standard workflow. The following functions will help you make sense of S4 objects.
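To see how the two systems differ in practice, here is a minimal sketch using only base R and the `methods` package. The classes `ToyS3` and `ToyS4` are hypothetical, made up purely for illustration: an S3 object is an ordinary list with a "class" attribute attached, while an S4 class declares typed slots up front with `setClass()` and is instantiated with `new()`.

```r
library(methods)

# S3: a class is just a "class" attribute stuck onto an ordinary list.
# "ToyS3" is a hypothetical class name used only for illustration.
s3_obj <- structure(list(counts = matrix(1:4, nrow = 2)), class = "ToyS3")

# S4: the class and the types of its slots are declared up front.
# "ToyS4" is likewise hypothetical.
setClass("ToyS4", slots = c(counts = "matrix", condition = "factor"))
s4_obj <- new("ToyS4",
              counts = matrix(1:4, nrow = 2),
              condition = factor(c("untreated", "treated")))

typeof(s3_obj)   # "list": an S3 object keeps its underlying base type
typeof(s4_obj)   # "S4"
s3_obj$counts    # S3 components are reached with $
s4_obj@counts    # S4 slots are reached with @
```

Because S4 slots are typed, `new("ToyS4", counts = "oops")` would fail with an error, whereas S3 happily accepts anything you put in the list.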

Here I create an object “dds” with the function DESeqDataSetFromMatrix from the DESeq2 package.

library(DESeq2)
dds <- DESeqDataSetFromMatrix(countData = count_table, colData = design_table, design = ~condition)

Here “dds” is an S4 object.

typeof(dds)

# [1] "S4"
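`typeof()` only reports the low-level storage type. A few other base functions tell you more about an S4 object. The sketch below assumes a hypothetical toy class `ToyDataSet` as a stand-in for a Bioconductor class like DESeqDataSet, so it runs without installing DESeq2.

```r
library(methods)

# Hypothetical toy class standing in for a Bioconductor class such as DESeqDataSet
setClass("ToyDataSet", slots = c(design = "formula", colData = "data.frame"))

toy <- new("ToyDataSet",
           design = ~condition,
           colData = data.frame(condition = factor(c("untreated", "treated"))))

typeof(toy)   # "S4": the low-level storage type
isS4(toy)     # TRUE
class(toy)    # the class name, carrying a "package" attribute
is(toy)       # the class plus any classes it inherits from
```

On the real object, `class(dds)` would report "DESeqDataSet", and `is(dds)` would also list the classes it extends.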

To find out more about an S4 object, use the function “attributes”.

attributes(dds)

# $design
# ~condition
# 
# $dispersionFunction
# function () 
# NULL
# <bytecode: 0x0000000022406e10>
# 
# $rowRanges
# GRangesList object of length 14599:
# $FBgn0000003 
# GRanges object with 0 ranges and 0 metadata columns:
#    seqnames    ranges strand
#       <Rle> <IRanges>  <Rle>
# 
# $FBgn0000008 
# GRanges object with 0 ranges and 0 metadata columns:
#      seqnames ranges strand
# 
# $FBgn0000014 
# GRanges object with 0 ranges and 0 metadata columns:
#      seqnames ranges strand
# 
# ...
# <14596 more elements>
# -------
# seqinfo: no sequences
# 
# $colData
# DataFrame with 7 rows and 2 columns
#                 names condition
#              <factor>  <factor>
# untreated1 untreated1 untreated
# untreated2 untreated2 untreated
# untreated3 untreated3 untreated
# untreated4 untreated4 untreated
# treated1     treated1   treated
# treated2     treated2   treated
# treated3     treated3   treated
# 
# $assays
# Reference class object of class "ShallowSimpleListAssays"
# Field "data":
# List of length 1
# names(1): counts
# 
# $NAMES
# `\001NULL\001`
# 
# $elementMetadata
# DataFrame with 14599 rows and 0 columns
# 
# $metadata
# $metadata$version
# [1] ‘1.22.2’
# 
# 
# $class
# [1] "DESeqDataSet"
# attr(,"package")
# [1] "DESeq2"
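`attributes()` works here because R stores each slot of an S4 object as an attribute, plus one extra "class" attribute. The sketch below checks this on the same kind of hypothetical toy class (`ToyDataSet` is made up for illustration, not part of DESeq2).

```r
library(methods)

# Hypothetical toy class; the slot names loosely echo those of DESeqDataSet
setClass("ToyDataSet", slots = c(design = "formula", colData = "data.frame"))
toy <- new("ToyDataSet",
           design = ~condition,
           colData = data.frame(condition = factor(c("untreated", "treated"))))

# Every slot shows up as an attribute, alongside "class"
names(attributes(toy))

# Reading the attribute directly gives the same value as reading the slot
identical(attr(toy, "colData"), toy@colData)
```

This also explains the odd `$NAMES` entry in the output above: an attribute cannot hold NULL, so the methods package stores a NULL slot as a special placeholder symbol, which prints as `\001NULL\001`.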

As you can see, the S4 object “dds” packs many different types of data. At this point, you probably want to know its fields (called slots in S4) and the tables stored in them.

To list the slots, use the function “slotNames”.

slotNames(dds)
# [1] "design"             "dispersionFunction" "rowRanges"         
# [4] "colData"            "assays"             "NAMES"             
# [7] "elementMetadata"    "metadata"          
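`slotNames()` accepts either an object or a class name, and the related function `getSlots()` also tells you the declared class of each slot. Here is a minimal sketch on a hypothetical toy class (`ToyDataSet` is invented for illustration); on the real object you would call `getSlots("DESeqDataSet")` after loading DESeq2.

```r
library(methods)

# Hypothetical toy class used only for illustration
setClass("ToyDataSet", slots = c(design = "formula", colData = "data.frame",
                                 metadata = "list"))
toy <- new("ToyDataSet", design = ~condition,
           colData = data.frame(condition = factor(c("untreated", "treated"))))

slotNames(toy)            # works on an object ...
slotNames("ToyDataSet")   # ... or directly on a class name
getSlots("ToyDataSet")    # named character vector: slot name -> declared class
```

The `metadata` slot was not supplied to `new()`, so it holds the default empty list; it is still listed because slots belong to the class, not to the values you pass in.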

To get the value stored in a slot, type the S4 object's name, the “@” sign, and then the slot name. Here are two examples.

dds@design
# ~condition

dds@colData
# DataFrame with 7 rows and 2 columns
#                 names condition
#              <factor>  <factor>
# untreated1 untreated1 untreated
# untreated2 untreated2 untreated
# untreated3 untreated3 untreated
# untreated4 untreated4 untreated
# treated1     treated1   treated
# treated2     treated2   treated
# treated3     treated3   treated
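Besides `@`, the function `slot()` reads a slot whose name is stored in a variable, which makes it easy to walk over every slot at once. The sketch below uses a hypothetical toy class, and `slot_classes()` is a hypothetical helper written for this example, not a function from any package. Note that Bioconductor classes usually also ship accessor functions (for a DESeqDataSet, `colData(dds)`, `design(dds)` and `counts(dds)`), which are the documented way in; `@` reaches into the raw slot.

```r
library(methods)

# Hypothetical toy class used only for illustration
setClass("ToyDataSet", slots = c(design = "formula", colData = "data.frame"))
toy <- new("ToyDataSet", design = ~condition,
           colData = data.frame(condition = factor(c("untreated", "treated")),
                                row.names = c("untreated1", "treated1")))

# Equivalent ways to read a slot
toy@colData
slot(toy, "colData")
slot_name <- "colData"
slot(toy, slot_name)   # slot() also accepts a name held in a variable

# Hypothetical helper: report the (first) class of every slot in one go
slot_classes <- function(obj) {
  vapply(slotNames(obj), function(s) class(slot(obj, s))[1], character(1))
}
slot_classes(toy)
```

Running `slot_classes(dds)` on the real object would give a quick map of which slot holds a DataFrame, which holds the assays, and so on, before you start pulling tables out.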

I hope the above information helps you unpack the various S4 objects in Bioconductor and extract individual tables from them.
