Bioinformatics


Training Approach in Evo and Evo2

In the earlier posts of this series (here, here, here and here), we covered the mathematical and biological aspects of Evo and Evo2. One important topic that we have not yet covered is how the models were trained.

Massively Parameterized Statistics

In this article, I will argue that Multi Parameter Statistics, or even better, Massively Parameterized Statistics (MPS), better describes the application of AI models in biology and medicine. I will also introduce you to a new preprint on DNA sequence modeling that claims to match Evo.

Biological Aspects of Evo and Evo2 - Semantic Mining

In the last three posts of this series (here, here and here), we covered the mathematical aspects of Evo and Evo2. Let us now discuss the biological findings from these models. It will take multiple posts to go over these topics.

StripedHyena in Evo and Evo2

In the first two posts of this series (here and here), we covered the AI-related mathematical concepts applied to Evo and Evo2. Before moving on to the biological side, here is one last post on the model.

Evo and Evo2 - Math and Algorithm

In the first post of this series, we covered the basic technical terms of the Evo and Evo2 papers. We also mentioned the key technological innovation that made their work possible. That led to the question - if they were using the fast Fourier transform (FFT), were they using a convolutional neural network (CNN)? The answer is no. The computer science work done by the Stanford group is quite groundbreaking. Let me go over that in detail.

Discussing the Evo and Evo2 Papers

Two recent papers applying AI-related large language models to DNA sequences are gaining a lot of attention and a bit of controversy. The first paper, titled Sequence Modeling and Design from Molecular to Genome Scale with Evo, states -

Trained on 2.7M prokaryotic and phage genomes, Evo can generalize across the three fundamental modalities of the central dogma of molecular biology to perform zero-shot function prediction that is competitive with, or outperforms, leading domain-specific language models. Evo also excels at multi-element generation tasks, which we demonstrate by generating synthetic CRISPR-Cas molecular complexes and entire transposable systems for the first time. Using information learned over whole genomes, Evo can also predict gene essentiality at nucleotide resolution and can generate coding-rich sequences up to 650 kb in length, orders of magnitude longer than previous methods.

Rules of the Genomes

What are the rules of the genomes? What patterns do the genome sequences follow? What biochemical and evolutionary mechanisms are behind these patterns? Are newly published genomes and pangenomes displaying many exceptions to the rules, or do they all confirm the expected patterns?

Did They Fake Their Entire NGS Experiment?

In NGS experiments, when researchers encounter issues with genome assembly or analysis, they go back to the raw data composed of sequencing reads. In a recent preprint submitted to Zenodo, Steven C. Quay did exactly that for a seminal paper and concluded - “The alternative conclusion is that this sample was not a fecal specimen but was contrived. The data cannot, however, distinguish between a non-fecal specimen that came from true field work on the one hand and a specimen created de novo in the laboratory on the other hand.” This is no simple matter, because the entire world had been running around like a headless chicken for the last two years relying on the genome assembly submitted in the paper.

Another Unusual Connection Between Covid and AIDS

In early 2020, Prashant Pradhan and collaborators posted a preprint titled “Uncanny similarity of unique inserts in the 2019-nCoV spike protein to HIV-1 gp120 and Gag” on bioRxiv. Based on the emails released by NIH under FOIA, we now know that this article and its coverage in ZeroHedge upset Fauci so much that he immediately convened an urgent meeting of virologists and several health bureaucrats from the US, the UK and Europe. All details of this meeting had been redacted, but the virologists present in the meeting fast-tracked a Nature Medicine paper claiming the virus definitely came from animals even though they described it as lab-engineered in their private emails. This paper was then used for over a year to censor all counter-arguments. In particular, bioRxiv retracted the preprint due to intense pressure and thus destroyed its reputation as a preprint server for good.

Leaky Vaccines, Freaky Mutants and Viral Quasiparticle Swarms

Dishonest Trevor Bedford Wins Howard Hughes and MacArthur Awards

US establishment biologists are so tone-deaf that they gave Trevor Bedford both Howard Hughes and MacArthur awards. These same people also scream at the top of their lungs - “Trust the experts”. Here is what I got by trusting “experts” like Trevor Bedford.

Is It Time to Retract All Papers from Zhengli Shi and Peter Daszak?

This DEFUSE Grant Proposal is the Scariest Document I Have Ever Read

Yesterday, an explosive set of leaked documents on the origin of the SARS-CoV-2 virus was released by DRASTIC. People following the topic are describing them as “worse than the Chernobyl in the biology field”. In my opinion, this release changed the entire understanding of the origin of the pandemic and exposed a group of people as extremely wicked, shockingly evil and vile (sorry to borrow the movie name). Let me explain why.

Three Major Breakthroughs on the Origin of Covid

The scientists looking for the true origin of the Wuhan flu are excited about three major developments over the last few weeks. Let me list them here.

Pushback against Using PCA, tSNE and UMAP in Biology

Shotgun Development Biology

In an earlier post, I wrote about five open problems in bioinformatics. In the next several posts, I will take up each of them and discuss it in some detail. The current post is on shotgun development biology experiments and the related challenges.

Top Five Open Problems in Bioinformatics (2021)

On Twitter, a number of researchers are discussing the open problems in bioinformatics. Therefore, I wanted to share a set of unsolved problems I am curious about. Please tweet your suggestions in reply to this tweet, and I will add them below with your name.

Genome Assembly Experts - Was RaTG13 Fraudulently Constructed?

I would like to make our readers aware of the Chinese publications in which the claim that the virus came from bats originated. The key sequence for understanding the origin of Covid is RaTG13, which you can download from here. You can also download the raw data files from NCBI SRA (SRX7724752 and SRX8357956).

Covid Coverup Party (CCP) Invades Zenodo

Over the last eighteen months, biologists funded by the NIH participated in a massive coverup of the origin of the covid virus. Now that they are exposed, these people are acting rather strangely, reminding me of cockroaches running away from a shining light. We would like to provide our readers with a detailed overview so that they can be entertained by the actions of these lowly creatures. I have not been so amused since Dan Graur vanquished the ENCODE team in 2013 (check “On the Immortality of Television Sets: “Function” in the Human Genome According to the Evolution-Free Gospel of ENCODE”), but ENCODE, to its credit, did not get anyone killed (apart from science itself).

Important Covid-related Datasets Disappeared from NCBI SRA

The origin of the SARS coronavirus causing the pandemic is still a mystery due to a paucity of early data. This is puzzling because Wuhan, where the pandemic started, is equipped with world-class virology labs. A recent finding by Jesse Bloom, a virologist from Fred Hutch, suggests that we are likely being deliberately misled. By checking internet caches, he recovered an entire set of early measurements deleted from the NCBI SRA database in March 2020, possibly based on an order from the Chinese government. Incorporating these early measurements points to a progenitor of SARS-CoV-2 different from the commonly accepted one.
