Does Gangolf Jobb's Treefinder Program Have Any "Value"?

We read a blog post from Keith Robison describing his miserable day and thought: “Doesn’t Treefinder solve his problem?” Excerpts from Robison’s post follow.

Miserable day today - spent my entire day wrestling with bad formats and flaky tools and trying to bull my way past them, leading to many a mad expostulation. The whole day down in the pit, with the pendulum of multiple deadlines swinging just over my head.


Let’s go back to today’s nightmare. I’m in the middle of trying to generate some pretty phylogenetic trees marked up based on metadata for the sequences and with confidence information on the tree topology. Doing this well often involves a cycle of aligning the data, marking up the tree and then discovering some glitch in the input data or the metadata.

Since this is what a lot of folks do, there should be great tools out there, right? Perhaps lying around in plain sight? Perhaps, but that’s not my experience.

First, there’s a plethora of programs for each stage of the process. Multiple aligners for protein? Well, there’s Clustal Omega, MUSCLE, MAFFT and probably a few dozen more. Each offers a different array of possible alignment outputs. Then a wealth of tree generation programs, with again a raft of formats.

Phylogenetic formats: – I feel immured by them. There’s Newick, named after New Hampshire seafood restaurant (which is why it is sometimes called New Hampshire format). There’s an extended version of Newick. There’s Nexus format. Two different XML standards: PhyloXML and NeXML. The venerable PHYLIP format. And that’s the tip of the iceberg.

I’ve seen things you people would not believe. The first problem, beyond the sheer cacophony of different formats, is that different programs support different ones – and often badly. For example, the Mr.Bayes software for estimating tree confidence (for large trees, in geologic time, unless it crashes), will write in Nexus format – and then refuse to read its own output! Perl’s Bio::TreeIO happily generates XML files that many other programs won’t read, complaining about tags that don’t belong – somebody is just plain wrong here! Ditto the various tree viewers / editors that refused to consume the XML generated by upstream programs. And at least one of these packages insists that everything after the angle bracket in a FASTA file is part of the unique identifier, which it then has the temerity to complain contains spaces!
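The FASTA complaint at the end of Robison’s quote is easy to illustrate. By common convention, the identifier on a FASTA header line is the text between the angle bracket and the first whitespace, and whatever follows is a free-text description. Here is a minimal Python sketch of that convention. The function name and the sample header are made up for illustration; this is not how any particular package parses its input.

```python
def split_fasta_header(line):
    """Split a FASTA header line into (identifier, description).

    Assumes the common convention: the identifier runs from '>' to the
    first whitespace, and the remainder is a free-text description.
    """
    if not line.startswith(">"):
        raise ValueError("not a FASTA header line")
    # Drop the '>' and split once on the first run of whitespace.
    parts = line[1:].strip().split(None, 1)
    identifier = parts[0] if parts else ""
    description = parts[1] if len(parts) > 1 else ""
    return identifier, description


# Hypothetical header, invented for this example.
header = ">seq42 hypothetical protein [Example organism]"
ident, desc = split_fasta_header(header)
print(ident)  # the space-free identifier
print(desc)   # everything after the first whitespace
```

A tool that treats the whole line as the identifier ends up with spaces in its IDs, which is exactly the error Robison describes.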

We bring up Treefinder here, because our reader Oli wrote an interesting comment in response to Gangolf Jobb’s interview -

It’s nice when people are able to do what Gangolf has done. It proves that we live in a democracy, which is ironic, considering why he’s doing it!

In an alternative world, Gangolf could be rich, if only he had grasped the concept of capitalism. He has created software which has a very narrow range of users, but those people must use it as part of their work. Using the same licencing concept that has made him a bad person in the eyes of academics, he could have charged a fee for every document that included records created with his software. In fact, he could have given them the software for free, and only charged when it was published. That fee could have been $100 per report, and the report funders would have paid it!

He could have been very rich by today, but was too stupid and blinkered to realise that the world is a complex flow of a billion different actions and variables, and nobody actually controls it, or influences it or anything as simplistic.

There are people who believe they can make things happen. When enough other people believe in them, things happen.

To which Jobb replied -

Oli, democracy means that a state does what people want, not that a state ignores what people want but at least does not punish the critics. We definitely do not have genuine democracy in Germany, nor in the EU, nor in any western country. Except, maybe, in Switzerland.

Let us calculate how stupid it really was not to charge for Treefinder: 100 Euros (I am in Europe) per published report, multiplied by some 1000 publications so far, is some 100000 Euros. Divided by more than a decade of hard work, this is per year much less than social welfare. Telling from the way you calculate, you cannot be rich yourself, so better do not advise others how to become rich. The professors who refused to pay me decently for my work earn 100000 Euros in one and a half years. The immigrant programmers they hired in maybe two or three years.

The reason why scientific work is usually funded by tax money is that one normally cannot sell the results well.
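Jobb’s back-of-the-envelope arithmetic can be checked in a few lines of Python. The figures are the ones he quotes in his reply. The only assumption is that “more than a decade” is taken as exactly 10 years, which makes the per-year figure an upper bound.

```python
# Figures taken from Jobb's reply.
FEE_PER_REPORT_EUR = 100        # fee per published report
PUBLICATIONS = 1000             # "some 1000 publications so far"
PROFESSOR_EARNINGS_EUR = 100_000
PROFESSOR_YEARS = 1.5           # "in one and a half years"

# Assumption: "more than a decade" is taken as exactly 10 years,
# so the per-year income below is an upper bound.
YEARS_OF_WORK = 10

total_eur = FEE_PER_REPORT_EUR * PUBLICATIONS
per_year_eur = total_eur / YEARS_OF_WORK
professor_per_year_eur = PROFESSOR_EARNINGS_EUR / PROFESSOR_YEARS

print(f"Total licence income: {total_eur:,} EUR")
print(f"Per year of work:     {per_year_eur:,.0f} EUR (at most)")
print(f"Professor, per year:  {professor_per_year_eur:,.0f} EUR")
```

Under these assumptions the fee scheme would have brought in at most 10,000 Euros a year, against roughly 66,667 Euros a year for the professors in Jobb’s comparison.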

Based on Keith Robison’s post, there is clearly a need for a user-friendly phylogeny program. Is Treefinder a commercially viable solution, as Oli suggests? Are you willing to pay a hard-working independent bioinformatician developing programs useful for the community, or will you only choose grant-funded free programs for your work?

Also, speaking of software related to evolutionary biology, readers may find this new paper (“The State of Software in Evolutionary Biology”) informative.

Written by M.