StripedHyena in Evo and Evo2

In the first two posts of this series (here and here), we covered the AI-related mathematical concepts applied in Evo and Evo2. Before moving on to the biological side, here is one last post on the model.

Yesterday’s post mentioned that the Stanford group came up with an efficient algorithm for building an AI system that stores information from long collections of text. Such systems can be used to guess the next word or token, just as familiar GPT-based tools do.

Before proceeding, I want to add that the mathematical work of this group was motivated by the 2019 PhD thesis Dynamical Systems in Spiking Neuromorphic Hardware by Aaron Voelker from the University of Waterloo. Those interested in going deeper into the math should check out “Legendre Memory Units: Continuous-Time Representation in Recurrent Neural Networks” in addition to the papers mentioned yesterday.

Today I will talk about a bit of engineering done to make the model scale to large volumes of DNA sequence. The basic unit of their architecture is described in Hyena Hierarchy: Towards Larger Convolutional Language Models, and it is made of “a long convolution and element-wise multiplicative gating”. Using it, the researchers first modeled the human genome and reported the results in HyenaDNA: Long-Range Genomic Sequence Modeling at Single Nucleotide Resolution. Where does the animal name hyena come from? I suppose it is just an upgrade from HiPPO in their previous papers, because this fortunate animal also has a name starting with “H”.
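To make “a long convolution and element-wise multiplicative gating” a little more concrete, here is a minimal sketch in Python/NumPy. It only shows the skeleton, not the authors' implementation: in the real Hyena operator the gate is an input-dependent projection and the long filter is implicitly parametrized by a small neural network, whereas here both are arbitrary toy arrays.

```python
import numpy as np

def hyena_like_block(x, gate, filt):
    """A toy Hyena-style operator: a long causal convolution followed by
    element-wise multiplicative gating."""
    # Long (causal) convolution: each output position mixes the current
    # token with everything before it, weighted by the filter.
    conv = np.convolve(x, filt, mode="full")[: len(x)]
    # Element-wise multiplicative gating.
    return gate * conv

# Toy usage with random data.
seq_len = 16
x = np.random.randn(seq_len)
gate = np.random.randn(seq_len)            # input-dependent in the real model
filt = np.exp(-0.1 * np.arange(seq_len))   # a decaying filter as a stand-in
y = hyena_like_block(x, gate, filt)
print(y.shape)  # (16,)
```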

The technical term you will come across in their later papers is “StripedHyena”. StripedHyena combines hyena modules (their creation) with attention modules (GPT style), which is why they call it a “hybrid architecture”. The specific form of attention they use is “multi-head attention equipped with rotary position embeddings (RoPE)”, which came from a 2021-2024 paper by a group in Shenzhen, China. The paper took a while to get published, but thanks to arXiv, you have early access. The Evo paper used StripedHyena to model a large collection of bacterial and archaeal genomes.
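For readers curious about what rotary position embeddings actually do, here is a minimal sketch, assuming the common rotate-half formulation: pairs of feature dimensions are rotated by an angle that grows with token position, so the dot product between a query and a key ends up depending on their relative distance. This is an illustration of the idea, not the Evo code.

```python
import numpy as np

def rotary_embed(x, base=10000.0):
    """Apply rotary position embeddings to x of shape (seq_len, dim),
    with dim even. The first and second halves of the feature vector
    are rotated together by position-dependent angles."""
    seq_len, dim = x.shape
    half = dim // 2
    # One frequency per rotated pair, as in the RoFormer paper.
    freqs = base ** (-2.0 * np.arange(half) / dim)
    angles = np.outer(np.arange(seq_len), freqs)   # (seq_len, half)
    cos, sin = np.cos(angles), np.sin(angles)
    x1, x2 = x[:, :half], x[:, half:]
    return np.concatenate([x1 * cos - x2 * sin, x1 * sin + x2 * cos], axis=-1)

q = np.random.randn(8, 16)    # toy queries: 8 tokens, 16 dimensions
print(rotary_embed(q).shape)  # (8, 16)
```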

The last bit of engineering needed to handle Evo2 came with StripedHyena 2. It is described in Systems and Algorithms for Convolutional Multi-Hybrid Language Models at Scale. What is the technical problem they are solving? It is described in this paragraph from the paper.

Despite being a promising alternative, hybrids based on operators such as linear attention or state-space models have struggled to replace baseline Transformers as the de facto standard for language modeling due to a variety of reasons. One limitation is that these fixed-state operators realize efficiency gains only when applied to very long sequences, which is also where they drastically underperform full self-attention (Arora et al., 2023; Jelassi et al., 2024). Compared to Transformers, these methods are generally slower at the usual pretraining regime: shorter contexts with larger and wider models. Furthermore, most of these approaches have been developed with the explicit goal of matching self-attention performance on in-context recall over longer sequences, but have only been successfully deployed in hybrids due to quality gaps. This has introduced redundancy in architectures, as multiple operators are optimized for the same capability: in-context recall.

Here is what they do. They combined three different kinds of hyena operators, namely Hyena-LI, Hyena-SE, and Hyena-MR. The paper by Ku et al. mentioned above presents all the details, but briefly:

StripedHyena 2 is based on three different types of input-dependent convolutional operators: (i.) short, explicitly-parametrized hyena operators that maximize hardware utilization, specializing in local multi-token recall, (ii.) medium-length, regularized hyena operators tailored to efficient modeling across hundreds of tokens, and (iii.) long, implicit hyena operators that aggregate information over the entire sequence.
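As a rough mental model (not the authors' code), the three operator types differ mainly in how long their convolution filters are and how those filters are parametrized. The toy NumPy sketch below just contrasts the three regimes; the specific filter shapes are made up for illustration.

```python
import numpy as np

def causal_conv(x, filt):
    """Causal 1D convolution: the output at position t only sees x[:t+1]."""
    return np.convolve(x, filt, mode="full")[: len(x)]

seq_len = 1024
x = np.random.randn(seq_len)

# (i)  Short, explicit: a handful of directly stored taps for local recall.
short_filt = np.array([0.5, 0.3, 0.2])

# (ii) Medium-length, regularized: reaches across hundreds of tokens;
#      a smooth decaying window stands in for the regularized parametrization.
medium_filt = np.exp(-np.arange(256) / 64.0)

# (iii) Long, implicit: defined as a function of position and evaluated over
#       the entire sequence, rather than stored tap by tap.
pos = np.arange(seq_len)
long_filt = np.exp(-pos / 512.0) * np.cos(pos / 32.0)

for name, filt in [("short", short_filt), ("medium", medium_filt), ("long", long_filt)]:
    print(name, "filter length:", len(filt), "output shape:", causal_conv(x, filt).shape)
```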

Well, that is more math and technology than you ever asked for. Let us explore the biological side next week.


Written by M. //