R is the Most Powerful Language, but not for Bioinformatics

We often make fun of bold claims about “the best programming language for bioinformatics” based on benchmarking, but Paul Graham explained in a convincing blog post that programming languages do differ in power. R was modeled on the kind of “most powerful language” he describes. In Graham’s words:

I’ll begin with a shockingly controversial statement: programming languages vary in power.

Few would dispute, at least, that high level languages are more powerful than machine language. Most programmers today would agree that you do not, ordinarily, want to program in machine language. Instead, you should program in a high-level language, and have a compiler translate it into machine language for you. This idea is even built into the hardware now: since the 1980s, instruction sets have been designed for compilers rather than human programmers.

Everyone knows it’s a mistake to write your whole program by hand in machine language. What’s less often understood is that there is a more general principle here: that if you have a choice of several languages, it is, all other things being equal, a mistake to program in anything but the most powerful one. [3]

Last year, I was reading “Advanced R” by Hadley Wickham and discovered that R has features similar to those of “the most powerful language” described by Graham (namely LISP). That got me curious, and I looked into the history of R. I found that R was developed with Scheme, one of three major LISP dialects, as a model.

Functional programming is gaining popularity in the programming world. If you enjoy functional programming, you will find many R features attractive. As a general rule, the functional style discourages explicit “if” and “for” statements. Instead, the same tasks can be performed in R using vectors and logical comparisons. Here is how you can check whether a number N (=2669) is prime.

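library(magrittr)                  # provides the %>% pipe
N = 2669
Remainders = N %% (2:(N-1))        # remainder of N for every candidate divisor; assumes N > 2
((Remainders == 0) %>% sum) == 0   # TRUE only if no candidate divides N evenly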
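The same idea can be wrapped in a reusable function and applied to many numbers at once. The following is only a sketch: the name `is_prime` is made up for illustration, and stopping at the square root of `n` is an optimization that the snippet above does not use. It relies on base R only, so no pipe package is needed.

```r
# Hypothetical helper: TRUE if n is prime, with no explicit "if" or "for".
# seq_len(k)[-1] yields 2..k, or an empty vector when k < 2, so the
# small cases n = 2 and n = 3 are handled without special-casing.
is_prime <- function(n) {
  n > 1 && all(n %% seq_len(floor(sqrt(n)))[-1] != 0)
}

# Filter() is base R's functional replacement for a loop with a condition
primes <- Filter(is_prime, 1:30)
primes            # 2 3 5 7 11 13 17 19 23 29

is_prime(2669)    # FALSE, because 2669 = 17 * 157
```

`Filter()` takes a predicate and a vector and keeps the elements for which the predicate is true, which is the usual functional substitute for a loop with an `if` inside it.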

The most powerful feature of LISP-like languages is metaprogramming, in which a program treats code as data and can modify itself. Metaprogramming lets you adapt a language to different problem domains by creating domain-specific languages (DSLs). You can read about the metaprogramming aspects of R here.
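To make the idea concrete, here is a minimal sketch in base R of code being captured, rewritten, and evaluated. The `where` function, the `genes` data frame, and its column names are invented for illustration and do not come from any package.

```r
# Capture an expression without evaluating it
expr <- quote(a + b * 2)

# The captured code is an ordinary data structure that can be modified:
# replace the outermost function `+` with `-`
expr[[1]] <- as.name("-")
rewritten <- deparse(expr)                 # "a - b * 2"

# Evaluate the rewritten code with chosen values for a and b
result <- eval(expr, list(a = 10, b = 3))  # 10 - 3 * 2 = 4

# A hypothetical one-function DSL: the condition is written with bare
# column names and evaluated inside the data frame
where <- function(df, cond) {
  cond_expr <- substitute(cond)            # the caller's code, unevaluated
  keep <- eval(cond_expr, df, parent.frame())
  df[keep, , drop = FALSE]
}

genes <- data.frame(name = c("g1", "g2", "g3"),
                    len  = c(900, 2500, 1200),
                    stringsAsFactors = FALSE)
long_genes <- where(genes, len > 1000)     # keeps rows g2 and g3
```

The caller of `where` never quotes anything or writes `genes$len`. The function captures the condition with `substitute()` and decides where to evaluate it, which is the mechanism that lets R packages define their own mini-languages.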

In an ironic twist, when R got its first big opportunity to create a “domain-specific language”, it instead made a complete U-turn into the object-oriented paradigm. S4 vectors were introduced in 2001 as part of the Bioconductor development. We will discuss that change and its ramifications in a later commentary.
