Tutorials


Note: These tutorials are incomplete. More complete versions are being made available for our members. Sign up for free.

Programming Languages

Why do so many programming languages exist? Why do some programming languages fail? Why do some obsolete languages like LISP get revived many years later?

A computer, or rather the ALU of the computer, speaks only one language, consisting of 0s and 1s. The closest human-readable abstraction of that language looks like the following code:

<textarea height=70>
.L3:
movl (%r12,%rsi), %ecx
movzbl %cl, %eax
movzbl BitReverseTable256(%rax), %edx
movl %ecx, %eax
shrl $24, %eax
mov %eax, %eax
movzbl BitReverseTable256(%rax), %eax
sall $24, %edx
orl %eax, %edx
movzbl %ch, %eax
shrl $16, %ecx
movzbl BitReverseTable256(%rax), %eax
movzbl %cl, %ecx
sall $16, %eax
orl %eax, %edx
movzbl BitReverseTable256(%rcx), %eax
sall $8, %eax
orl %eax, %edx
movl %edx, (%r13,%rsi)
addq $4, %rsi
cmpq $400000000, %rsi
jne .L3
</textarea>

The above code fragment moves numbers between RAM and the processor's registers, and performs various arithmetic and logical operations on the numbers held in those registers. All other code, from C++ to Python, needs to be reduced to instructions of the above form (shown here as assembly language) before the ALU can understand and execute it.

A few points:

(a) The number of ‘opcodes’ (such as movl, movzbl, mov, cmpq, addq, etc.) known by an ALU is very small. All the fancy code we write in higher-level languages finally gets reduced to that tiny set.

(b) If a piece of code appears elegant in a high-level language but is inefficient in assembly language, it is inefficient. No ifs and buts.
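Point (a) can be observed, by analogy, from inside Python. The sketch below uses the standard `dis` module to list the distinct instructions a small function is reduced to. Please note the assumption: these are bytecode instructions for the Python virtual machine, not ALU opcodes, so this only illustrates the idea of a small instruction set. The function `gc_count` is a hypothetical example, and the exact instruction names vary between Python versions.

```python
import dis


def gc_count(seq):
    """A small high-level function: count G and C bases in a sequence."""
    total = 0
    for base in seq:
        if base in "GC":
            total += 1
    return total


# Collect the distinct low-level (bytecode) instructions the function uses.
# Assumption: Python VM bytecode stands in for real machine opcodes here.
used_opcodes = sorted({ins.opname for ins in dis.get_instructions(gc_count)})

print(len(used_opcodes), "distinct instructions out of", len(dis.opmap))
print(used_opcodes)
```

Even with a loop, a condition and an addition, the function uses only a handful of the available instructions.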

With the above introduction, let us revisit the solutions proposed for bit reversal.

Point 1. You can see how much thought was given in the first response to accommodating the flow of data between memory, cache and registers. The relative efficiency of the two proposed solutions (bit twiddling and memory lookup) will vary depending on how much cache the processor has. Older processors with no cache will keep the lookup table in RAM.

Point 2. The bit twiddling solutions are elegant, because they try to solve the problem by using the ALU and registers as much as possible. At that level, how efficiently a processor handles branching has a large say in whether a program runs efficiently. We rarely think about those differences in our programming and hope that the compilers will take care of processor-related differences.

Point 3. If Intel ever adds another opcode called ‘bit reverse’ or ‘bre’ to the set of codes for its ALU, all discussion of bit reversal would become redundant. Only one line of code would be needed:

bre %rsi

That is how efficient a hardware-based solution can be, but, unfortunately, not all problems can be solved by hardware.
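To make the two approaches concrete, here is a minimal sketch in Python, chosen only for readability. The table name mirrors `BitReverseTable256` from the assembly listing, and the function names are hypothetical. The twiddling version is the common swap-neighbouring-groups method, and it is an assumption that it matches the solution discussed in the original responses.

```python
def reverse_byte(b):
    """Reverse the 8 bits of a single byte, one bit at a time."""
    result = 0
    for _ in range(8):
        result = (result << 1) | (b & 1)
        b >>= 1
    return result


# Memory lookup: precompute the reversal of every possible byte once.
BIT_REVERSE_TABLE_256 = [reverse_byte(b) for b in range(256)]


def reverse32_lookup(v):
    """Reverse a 32-bit word with four table lookups, as in the assembly loop."""
    return ((BIT_REVERSE_TABLE_256[v & 0xFF] << 24)
            | (BIT_REVERSE_TABLE_256[(v >> 8) & 0xFF] << 16)
            | (BIT_REVERSE_TABLE_256[(v >> 16) & 0xFF] << 8)
            | BIT_REVERSE_TABLE_256[(v >> 24) & 0xFF])


def reverse32_twiddle(v):
    """Reverse a 32-bit word using only shifts and masks (no memory access)."""
    v = ((v >> 1) & 0x55555555) | ((v & 0x55555555) << 1)   # swap adjacent bits
    v = ((v >> 2) & 0x33333333) | ((v & 0x33333333) << 2)   # swap bit pairs
    v = ((v >> 4) & 0x0F0F0F0F) | ((v & 0x0F0F0F0F) << 4)   # swap nibbles
    v = ((v >> 8) & 0x00FF00FF) | ((v & 0x00FF00FF) << 8)   # swap bytes
    v = ((v >> 16) | (v << 16)) & 0xFFFFFFFF                # swap 16-bit halves
    return v


print(hex(reverse32_lookup(0x12345678)))
print(hex(reverse32_twiddle(0x12345678)))
```

Both functions return the same value for any 32-bit input. Which one is faster on real hardware depends on the cache and branching behaviour described in Points 1 and 2, and timing them in Python would say little about that.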

How to deal with large numbers?

Multiple spots.

Software-based solutions to bioinformatics problems focus on three aspects: algorithm, data structure and programming language. We will consider algorithms in section 2, and here we discuss language and data structure.

Why Do so Many Programming Languages Exist?

If any language can be used to solve any computing problem, why do we have so many? The reason for the proliferation of programming languages is not too different from the reason for the multitude of human languages. Each language was first adopted by a small community to satisfy specific computing needs. Over time, some of those languages evolved to find widespread use, while many others did not survive the race to be among the fittest.

In this context, it is important to note that the only ‘programming language’ a computer understands contains two ‘commands’: 0 and 1. That language happens to be the hardest one for human programmers, so the need arose for human-like languages. However, any human-like code ultimately needs to be translated to 0s and 1s, and therefore a programming language providing the closest abstraction of the hardware can be converted most efficiently. The proliferation of languages over the last 50 years was related to balancing those two aspects. In the next subsection, we will look into how C, C++, PERL, Python and R came into existence.

Data structure: at the hardware level only one is possible, a large memory block. The rest are all abstractions.
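As a minimal sketch of that idea, the Python snippet below treats a `bytearray` as the ‘large memory block’ and builds a two-dimensional table on top of it with nothing but index arithmetic. The names `N_ROWS`, `N_COLS`, `offset`, `store` and `load` are hypothetical, and row-major layout is an assumption made for the example.

```python
N_ROWS, N_COLS = 3, 4

# The only "real" structure: one flat block of 12 bytes.
memory = bytearray(N_ROWS * N_COLS)


def offset(row, col):
    """Translate a (row, col) pair into a position in the flat block (row-major)."""
    return row * N_COLS + col


def store(row, col, value):
    """Write one byte-sized value into the abstract 2-D table."""
    memory[offset(row, col)] = value


def load(row, col):
    """Read one value back from the abstract 2-D table."""
    return memory[offset(row, col)]


store(1, 2, 42)
print(load(1, 2))      # value read through the 2-D abstraction
print(list(memory))    # the same value seen in the flat block
```

The ‘table’ exists only in the `offset` function. Lists, trees and hash tables are built in the same way, by agreeing on how positions in the block are interpreted.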

Lisp, C, C++ and Java

Lisp, created in 1958, is one of the earliest surviving programming languages. Lisp syntax was modeled on human logic. However, the C language, developed in the early 1970s at AT&T Bell Labs, rapidly overtook Lisp in popularity. The success of C came from its being the best abstraction of hardware architecture. C code could be translated to various hardware platforms very efficiently, and it was used to develop the Unix operating system, another lasting contribution of the same Bell Labs group.

C++, developed a decade later, closely followed the syntax of C, but made several improvements. By that time, C was in widespread use, and code maintenance and reuse had started to become big problems. Computer hardware had also improved over the decade, leaving room for a more human-like programming language. C++ introduced several constructs, such as classes, to make code more reusable.

Java, designed in the mid-1990s, closely followed the syntax of C and C++, but removed all C features tied to specific hardware architectures. That way, Java programmers did not need to worry about which type of computer their program ran on.

PERL, Python and R

All programs written in C, C++ and Java need to be ‘compiled’ so that the human-readable form (English-like text) gets translated to a form understood by computers (0s and 1s). Usually, the compilation step is done separately, prior to running the programs on real data. Moreover, programs written in C, C++ and Java have to follow a very rigid structure so that the compilation process produces the most efficient code. That sequence of steps was helpful when computers were used for solving large problems, but was quite inefficient when using computers for some quick and dirty work (such as removing the first column of a large table).

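Thanks to Moore’s law, computers had become quite powerful by the early 1990s, making room for a little bit of sloppiness in programming. PERL (first released in 1987) grew out of UNIX shell programming. Python (started in 1989, first released in 1991) has its reference implementation written in C. Programs in these languages do not need a separate compilation step; they are interpreted.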
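The quick and dirty task mentioned above, removing the first column of a large table, shows why interpreted languages caught on. Here is a minimal sketch in Python, assuming a tab-separated table. The function name `drop_first_column` and the sample data are hypothetical, and an in-memory `io.StringIO` stands in for a real file.

```python
import io


def drop_first_column(lines, sep="\t"):
    """Yield each row of a delimited table with its first field removed."""
    for line in lines:
        fields = line.rstrip("\n").split(sep)
        yield sep.join(fields[1:])


# A tiny made-up table; a real run would pass an open file object instead.
table = io.StringIO("id\tgene\tcount\n1\tBRCA1\t10\n2\tTP53\t7\n")

result = list(drop_first_column(table))
for row in result:
    print(row)
```

There is no compilation step and no type declaration, and because rows are processed one at a time, the same few lines work on a table too large to fit in memory.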

Computers became even faster, making room for another level of abstraction: R (1993).

Haskell and Clojure

The only ‘programming language’ a computer understands has two ‘commands’: 0 and 1. That language also happens to be the one that human programmers have the most difficulty with, and so the need arose for human-like languages. However, it could never be forgotten that all code developed in human-like abstract languages had to be translated into 0s and 1s at the end of the day. The proliferation of programming languages over the last 50 years was related to balancing those two aspects.

At first, programs were written in assembly language, which was nothing but Anglicized machine code. When computers became larger and programmers could afford to be a bit more ‘wasteful’, languages like C and PASCAL came in. It took another 30 years for computers to become large enough to allow scripting languages (PERL, Python, Ruby, PHP) to dominate over C and C++.

The computing hardware world went through another major transformation over the last five years, one that supports another upgrade in programming paradigm: the introduction of many cores. Older languages like C or C++ cannot seamlessly take advantage of many cores. That is where Haskell, a functional language, can help. Please note that we are not saying that multi-core machines cannot be programmed in C/C++, or cannot be programmed efficiently in C/C++. Rather, Haskell makes it easy to write scalable code without significant loss in speed. The transformation we are suggesting is similar to the one from assembly language to C. Most programmers knew that assembly code was efficient, but those who continued to write assembly language code did not manage to build very complex applications.

Here we add links to a few Haskell resources, if you are interested in getting started.

A. SO Comment: The best place to start is to read the first comment in this stackoverflow exchange.
B. Book ‘Learn You a Haskell’: freely available online
C. Book ‘Real World Haskell’: freely available online
D. Haskell 99 problems: link
E. Project Euler problems: link (I am starting to think that Euler was a smart kid given how he managed to impress both bioinformaticians and Haskell programmers. Was he from Harvard?)
F. GHC compiler: Download
G. Haskell online manual: Link
H. A set of 13 video lectures on functional programming: Link

An introduction to Monads. Enjoy!

Here is a comparison of Python vs Haskell. Funniest response is shown below.

Speed comparison of Haskell and Python is here -

Based on my understanding, Haskell programs run slightly slower than C/C++, but often faster than PERL/Python (without PyPy) on a single CPU. The biggest advantage is in how easy it is to modify code to take advantage of multiple cores.
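The reason functional code is easy to spread over several cores can be hinted at even outside Haskell. The sketch below is in Python, not Haskell, and only shows the shape of the idea: when the work is a pure function mapped over independent inputs, switching from serial to pooled execution means swapping the `map` function and nothing else. The names `gc_content` and `gc_all` are hypothetical. A thread pool is used purely as a stand-in, because in standard CPython threads do not speed up CPU-bound work, so this demonstrates structure rather than performance.

```python
from concurrent.futures import ThreadPoolExecutor


def gc_content(seq):
    """Pure function: the result depends only on the argument, no shared state."""
    if not seq:
        return 0.0
    return (seq.count("G") + seq.count("C")) / len(seq)


def gc_all(seqs, map_fn=map):
    """Apply gc_content to every sequence using whichever map is supplied."""
    return list(map_fn(gc_content, seqs))


reads = ["GGCC", "ATAT", "GATC", ""]

serial = gc_all(reads)                        # ordinary one-at-a-time map

with ThreadPoolExecutor(max_workers=2) as pool:
    pooled = gc_all(reads, map_fn=pool.map)   # same code, work handed to a pool

print(serial)
print(pooled)
```

Because `gc_content` touches no shared state, the order in which the inputs are processed cannot change the results. That is the property a functional language enforces by default.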