Tryst between Marilyn Monroe and Albert Einstein


You may have heard part of this story, in which Marilyn Monroe said to Einstein, “Would it not be wonderful if we had a child with your brains and my beauty?” Einstein replied promptly, “Yes, but imagine a child with my beauty and your brains!”

The rest of the story is not well-publicized (for obvious reasons). Monroe and Einstein met secretly and produced not one but two children. One of them got Einstein’s brain and Monroe’s beauty, whereas the other one inherited the opposite characteristics.

Both kids had an even bigger problem to deal with. Their parents were famous, and therefore they could not live under their real names. Instead they chose the nicknames “Bioconductor” and “tidyverse”.

Let me explain where I am going with all this. My biology students love tidyverse, and especially dplyr. The functions are clean and easy to learn. Bioconductor, on the other hand, is ugly as hell, but that is what they are stuck with to analyze NGS data.

Switching between the libraries is not easy, because they prefer different data formats. Here again the tidyverse data syntax is logical, whereas Bioconductor seems to be designed by Monroe’s brain and Einstein’s beauty.

The following commands will help you switch quickly between the two formats. Let us create a toy data frame of gene expression data. In real life, you will probably load your Kallisto or Salmon counts as a data frame instead.

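library(tidyverse)  # provides %>%, select() and rownames_to_column()

# Toy expression values: 10 genes, 3 tissues, 2 replicates per tissue
gene = c('gene1', 'gene2', 'gene3', 'gene4', 'gene5', 'gene6', 'gene7', 'gene8', 'gene9', 'gene10')
heart1=c(10,3,4,5,8,9,1,2,4,5)
kidney1=c(3,4,5,8,9,1,2,4,5,10)
brain1=c(4,5,8,9,1,2,4,5,3,2)
heart2=c(2,5,1,9,1,2,4,5,1,12)
kidney2=c(10,3,4,5,1,2,4,5,4,9)
brain2=c(8,2,7,2,1,2,4,5,3,2)

# One row per gene, one column per sample, gene names in an ordinary column
expt=data.frame(gene, heart1,kidney1,brain1,heart2, kidney2, brain2)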

Going from Tidyverse Style to Bioconductor Style

Tidyverse likes the above style, with the gene names in an ordinary column. Bioconductor instead wants a numeric matrix with the gene names as the row names.

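# Drop the gene column and convert the counts to a numeric matrix
# (dplyr:: guards against select() being masked by a Bioconductor package)
expt_bioc=expt %>% dplyr::select(-gene) %>% as.matrix
# Attach the gene names as row names
row.names(expt_bioc)=expt$gene
expt_bioc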
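
If you would rather do the conversion in a single pipe, here is a minimal sketch using column_to_rownames from tibble, the mirror image of rownames_to_column. The data frame "toy" and its values are made up for illustration and stand in for the expt table above. The sketch also runs the two-step version so that you can compare the results yourself.

```r
library(dplyr)   # provides %>% and select()
library(tibble)  # provides column_to_rownames()

# Hypothetical stand-in for the expression table (values are made up)
toy = data.frame(gene = c("gene1", "gene2", "gene3"),
                 heart1 = c(10, 3, 4),
                 kidney1 = c(3, 4, 5))

# Two-step version, as in the post
m1 = toy %>% dplyr::select(-gene) %>% as.matrix
row.names(m1) = toy$gene

# One-pipe version: move the gene column into the row names, then convert
m2 = toy %>% column_to_rownames("gene") %>% as.matrix

is.numeric(m2)      # the matrix should hold numbers only, no gene strings
m2["gene3", ]       # rows can now be looked up by gene name
all.equal(m1, m2)   # compare the two routes
```

Either route should be fine. The one-pipe version simply saves you the separate row.names line.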

Going back from Bioconductor Style to Tidyverse Style

If you stick to the Bioconductor format and apply tidyverse functions, your gene names can disappear, because row names are not treated as data (and tibbles do not keep them at all). Therefore, you need to turn them into a column first.

expt=expt_bioc %>% as.data.frame %>% rownames_to_column("gene")
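
To see why the gene column is worth having, here is a small sketch of the trip back followed by one typical dplyr step. The matrix "m", its values and the cutoff of 4 are made up for illustration.

```r
library(dplyr)   # provides %>% and filter()
library(tibble)  # provides rownames_to_column()

# Hypothetical Bioconductor-style matrix: genes as row names, samples as columns
m = matrix(c(10, 3, 4, 3, 4, 5), nrow = 3,
           dimnames = list(c("gene1", "gene2", "gene3"),
                           c("heart1", "kidney1")))

# Gene names become an ordinary column, so dplyr verbs carry them along
df = m %>% as.data.frame %>% rownames_to_column("gene")

# Example tidyverse step: keep genes with at least 4 counts in heart1
kept = df %>% dplyr::filter(heart1 >= 4)
kept$gene
```

Once the gene names sit in a column, they travel with every row through filtering, sorting and joining.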

Adding Row Number as Another Column

There are times you may also want to add the row number as a column. That task is simple: the data frame now has the default row names (1, 2, 3, ...), so you can apply "rownames_to_column" again.

expt=expt %>% rownames_to_column("id")
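
One thing to watch out for: row names are text, so the new "id" column comes out as character, not as a number. If you need to sort or compare by it, a base R conversion should do the job. Here is a minimal sketch on a made-up data frame called "toy".

```r
library(dplyr)   # provides %>%
library(tibble)  # provides rownames_to_column()

# Hypothetical stand-in for the expression table (values are made up)
toy = data.frame(gene = c("gene1", "gene2", "gene3"), heart1 = c(10, 3, 4))

toy = toy %>% rownames_to_column("id")
class(toy$id)   # row names are stored as text

# Convert to integer so that sorting by id is numeric rather than alphabetical
toy$id = as.integer(toy$id)
class(toy$id)
```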

Please do not tell me that tidyverse (dplyr) has another function to accomplish the latter task. I am trying to memorize the smallest possible number of functions to survive in the R world.



Written by M. //