# Category: geometry

“A subway is just a hole in the ground, and that hole is a maze.”

“The map is the last vestige of the old system. If you can’t read the map, you can’t use the subway.”

Eddie Jabbour in Can he get there from here? (NYT)

Sometimes, lines between adjacent stations can be uni-directional (as in the Paris Metro map below in the right upper corner, 7bis). So, it is best to view a subway map as a directed graph, with vertices the different stations, and directed arrows when there’s a service connecting two adjacent stations.

Aha! But, directed graphs form a presheaf topos. So, each and every subway in the world comes with its own logic, its own bi-Heyting algebra!

Come again…?

Let’s say Wally (or Waldo, or Charlie) is somewhere in the Paris metro, and we want to find him. One can make statements like:

$P$ = “Wally is on line 3bis from Gambetta to Porte des Lilas.”, or

$Q$ = “Wally is traveling along line 11.”

Each sentence pinpoints Wally’s location to some directed subgraph of the full Paris metro digraph; let’s call this subgraph the ‘scope’ of the sentence.

We can connect such sentences with logical connectives $\vee$ or $\wedge$ and the scope will then be the union or intersection of the respective scopes.

The scope of $P \vee Q$ is the directed subgraph of line 11 (in both directions) together with the directed subgraph of line 3bis from Gambetta to Porte des Lilas.

The scope of $P \wedge Q$ is just the vertex corresponding to Porte des Lilas.

The scope of the negation $\neg R$ of a sentence $R$ is the subgraph complement of the scope of $R$, so it is the full metro graph minus all vertices and directed edges in $R$-scope, together with all directed edges starting or ending in one of the deleted vertices.

For example, the scope of $\neg P$ does not contain directed edges along 3bis in the reverse direction, nor the edges connecting Gambetta to Pere Lachaise, and so on.

In the Paris metro logic the law of double negation does not hold.

$\neg \neg P \not= P$ as both statements have different scopes. For example, the reverse direction of line 3bis is part of the scope of $\neg \neg P$, but not of $P$.

So, although the scope of $P \wedge \neg P$ is empty, that of $P \vee \neg P$ is not the full digraph.

The logical operations $\vee$, $\wedge$ and $\neg$ do not turn the partially ordered set of all directed subgraphs of the Paris metro into a Boolean algebra structure, but rather a Heyting algebra.

Perhaps we were too drastic in removing all “problematic edges” from the scope of $\neg R$ (those with a source or target station belonging to the scope of $R$)?

We might have kept all problematic edges, and added the missing source and/or target stations to get the scope of another negation of $R$: $\sim R$.

Whereas the scope of $\neg \neg R$ always contains that of $R$, the scope of $\sim \sim R$ is contained in $R$’s scope.

The scope of $R \vee \sim R$ will indeed be the whole graph, but now $R \wedge \sim R$ no longer has to be empty. For example, $P \wedge \sim P$ has as its scope all stations on line 3bis.

In general $R \wedge \sim R$ will be called the ‘boundary’ $\partial(R)$ of $R$. It consists of all stations within $R$’s scope that are connected to the outside of $R$’s scope.

The logical operations $\vee$, $\wedge$, $\neg$ and $\sim$ make the partially ordered set of all directed subgraphs of the Paris metro into a bi-Heyting algebra.
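To make the two negations concrete, here is a minimal Python sketch on a made-up three-station network (not the actual Paris metro). The function names `subgraph`, `join`, `meet`, `neg` and `co_neg` are my own choices for illustration; they simply encode the descriptions of the scopes given above.

```python
from itertools import chain

def subgraph(vertices, edges):
    """A subgraph as (vertex set, edge set); edge endpoints are always included."""
    vs = set(vertices)
    for edge in edges:
        vs.update(edge)
    return frozenset(vs), frozenset(edges)

def join(r, s):
    """Scope of 'R or S': union of the scopes."""
    return r[0] | s[0], r[1] | s[1]

def meet(r, s):
    """Scope of 'R and S': intersection of the scopes."""
    return r[0] & s[0], r[1] & s[1]

def neg(g, r):
    """Scope of 'not R': drop R's stations and every edge touching one of them."""
    vs = g[0] - r[0]
    return vs, frozenset((s, t) for (s, t) in g[1] if s in vs and t in vs)

def co_neg(g, r):
    """Scope of '~R': keep every edge outside R and put back its end stations."""
    es = g[1] - r[1]
    return (g[0] - r[0]) | frozenset(chain.from_iterable(es)), es

# Hypothetical toy network: A <-> B in both directions, and B -> C one way.
G = subgraph("ABC", [("A", "B"), ("B", "A"), ("B", "C")])
P = subgraph("AB", [("A", "B")])          # "Wally travels from A to B"

print(neg(G, neg(G, P)) == P)             # False: the reverse edge B -> A sneaks in
print(join(P, neg(G, P)) == G)            # False: excluded middle fails for 'not'
print(join(P, co_neg(G, P)) == G)         # True
print(meet(P, co_neg(G, P)))              # the boundary: stations A and B, no edges
```

In this toy case $\sim \sim P$ happens to equal $P$; in general it is only contained in it.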

There’s plenty more to say about all of this (and I may come back to it later). For the impatient, there’s the paper by Reyes and Zolfaghari: Bi-Heyting Algebras, Toposes and Modalities.

Right now, I’m more into exploring whether this setting can be used to revive an old project of mine: Heyting Smullyanesque problems (btw. the algebra in that post is not Heyting, oops!).

A Conway musical sequence is an infinite word in $L$ and $S$, containing no two consecutive $S$’s nor three consecutive $L$’s, such that all its inflations remain musical sequences.

We’ve seen that such musical sequences encode an aperiodic tiling of the line in short ($S$) and long ($L$) intervals, and that such tilings are all finite locally isomorphic.

But, apart from the middle $C$-sequences (the one-dimensional cartwheel tilings) we gave no examples of such tilings (or musical sequences). Let’s remedy this!

Take any real number $c$ as long as it is not an integral combination of $1$ and $\tfrac{1}{\tau}$ (with $\tau$ the golden ratio) and assign to any integer $a \in \mathbb{Z}$ a tile
$P_c(a) = \begin{cases} S \\ L \end{cases} ~\text{iff}~\lceil c+(a+1)\frac{1}{\tau} \rceil - \lceil c+a \frac{1}{\tau} \rceil = \begin{cases} 0 \\ 1 \end{cases}$
(instead of ceilings we might have taken floors, because of the restriction on $c$).

With a little bit of work we see that the deflated word determined by $P_c$ is again of this type, more precisely $def(P_c) = P_{-(c-\lfloor c \rfloor)\frac{1}{\tau}}$. But then it also follows that inflated words are of this type, meaning that all $P_c$ define a musical sequence.

Let’s just check that these sequences satisfy the gluing restrictions. If there is no integer between $c+a\tfrac{1}{\tau}$ and $c+(a+1)\tfrac{1}{\tau}$, then, because $2 \tfrac{1}{\tau} \approx 1.236 > 1$, there must be an integer in the preceding and in the following $\tfrac{1}{\tau}$-interval, showing that an $S$ in the sequence has an $L$ on its left and right, so there are no two consecutive $S$’s in the sequences.

Similarly, if two consecutive $\tfrac{1}{\tau}$-intervals have an integer in them, the next interval cannot contain an integer as $3 \tfrac{1}{\tau} \approx 1.854 < 2$.
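For those who prefer to see a few letters, here is a small Python sketch of the ceiling rule defining $P_c$. The helper names `tile` and `word` and the sample value $c = 0.3$ are just my choices for illustration; any $c$ that is not an integral combination of $1$ and $\tfrac{1}{\tau}$ would do.

```python
import math

TAU = (1 + math.sqrt(5)) / 2  # the golden ratio

def tile(c, a):
    """Return 'L' or 'S' for P_c(a): 'L' if the ceiling jumps by 1, 'S' if it stays put."""
    step = math.ceil(c + (a + 1) / TAU) - math.ceil(c + a / TAU)
    return "L" if step == 1 else "S"

def word(c, start, stop):
    """The finite piece P_c(start) ... P_c(stop - 1) of the sequence."""
    return "".join(tile(c, a) for a in range(start, stop))

w = word(0.3, -30, 30)
print(w)
print("SS" in w, "LLL" in w)   # both False: the gluing restrictions hold
```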

Now we come to the essential point: these sequences can be obtained by the cut-and-project method.

Take the line $L$ through the origin with slope $\tfrac{1}{\tau}$ and $L^{\perp}$ the line perpendicular to it.

Consider the unit square $H$ and $H_{\vec{\gamma}}=H + \vec{\gamma}$ its translation under a shift vector $\vec{\gamma}=(\gamma_x,\gamma_y)$ and let $\pi$ (or $\pi^{\perp}$) be the orthogonal projection of the plane onto $L$ (or onto $L^{\perp}$). One quickly computes that
$\pi(a,b) = (\frac{\tau^2 a + \tau b}{1+\tau^2},\frac{\tau a + b}{1+\tau^2}) \quad \text{and} \quad \pi^{\perp}(a,b) = (\frac{a-\tau b}{1+\tau^2},\frac{\tau^2b-\tau a}{1+\tau^2})$
In the picture, we take $\vec{\gamma}=(c,-\tau c)$.

The window $W$ will be the strip, parallel with $L$ with basis $\pi^{\perp}(H_{\vec{\gamma}})$.

We cut the standard lattice $\mathbb{Z}^2$, of all points with integer coordinates in the plane, by restricting to the window $\mathcal{P}=\mathbb{Z}^2 \cap W$.

Next, we project $\mathcal{P}$ onto the line $L$, and we get a set of endpoints of intervals which divide the line $L$ into short intervals of length $\tfrac{1}{\sqrt{1+\tau^2}}$ and long intervals of length $\tfrac{\tau}{\sqrt{1+\tau^2}}$.

For $(a,b) \in W$, the interval will be short if $(a,b+1) \in W$ and long if $(a+1,b) \in W$.

Because these intervals differ by a factor $\tau$ in length, we get a tiling of the line by short intervals $S$ and long intervals $L$. It is easy to see that they satisfy the gluing restrictions (remember, no two consecutive short intervals and no three consecutive long intervals): the horizontal width of the window $W$ is $1+\tau \approx 2.618$ (so there cannot be three consecutive long intervals in the projection) and the vertical width of the window $W$ is $1+\tfrac{1}{\tau} = \tau \approx 1.618$ so there cannot be two consecutive short intervals in the projection.

The sequence obtained from projecting $\mathcal{P}$ is equal to the sequence $P_{(1+\tau^2)c}$. So, we get all musical sequences of this form from the cut-and-project method!

On $L^{\perp}$ the two end-points of the window are
$\begin{cases} \pi^{\perp}(c+1,-\tau c) = (\frac{(1+\tau^2)c+1}{1+\tau^2},- \tau \frac{(1+\tau^2)c+1}{1+\tau^2}) \\ \pi^{\perp}(c,-\tau c+1) = (\frac{(1+\tau^2)c-\tau}{1+\tau^2},-\tau \frac{(1+\tau^2)c-\tau}{1+\tau^2}) \end{cases}$
Therefore, a point $(a,b) \in \mathbb{Z}^2$ lies in the window $W$ if and only if
$(1+\tau^2)c-\tau < a-\tau b < (1+\tau^2)c+1$
or equivalently, if
$(1+\tau^2)c+(b-1)\tau < a < (1+\tau^2)c+b \tau + 1$
Observe that
$\lceil (1+\tau^2)c + b\tau \rceil - \lceil (1+\tau^2)c+(b-1)\tau \rceil = P_{(1+\tau^2)c}(b-1) + 1 \in \{ 1,2 \}$
We separate the two cases:

(1) : If $\lceil (1+\tau^2)c + (b+1)\tau \rceil - \lceil (1+\tau^2)c+b \tau \rceil =1$, then there must be an integer $a$ such that $(1+\tau^2)c +(b+1) \tau -1 < a < (1+\tau^2)c + b\tau+1$, and this forces $\lceil (1+\tau^2)c + (b+2)\tau \rceil - \lceil (1+\tau^2)c+(b+1)\tau \rceil =2$. With $b_i = (1+\tau^2)c+(b+i)\tau$ and $d_i = b_i+1$ we have the situation

and from the inequalities above this implies that both $(a+1,b+1)$ and $(a+1,b+2)$ are in $W$, giving a short interval $S$ in the projection.

(2) : If $\lceil (1+\tau^2)c + (b+1)\tau \rceil - \lceil (1+\tau^2)c+b \tau \rceil =2$, then there must be an integer $a$ such that $(1+\tau^2)c+b \tau < a < (1+\tau^2)c + (b+1)\tau -1$, giving the situation

and from the inequalities it follows that both $(a+1,b+1)$ and $(a+2,b+1)$ are in $W$, giving a long interval $L$ in the projection. This finishes the proof.
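The cut-and-project recipe is easy to try out numerically. Below is a rough Python sketch under my own conventions: `window_points` collects the lattice points satisfying the window inequality $(1+\tau^2)c-\tau < a-\tau b < (1+\tau^2)c+1$ for a finite range of $b$, and `project` returns the signed position of $\pi(a,b)$ along $L$. The function names, the value $c=0.3$ and the range of $b$ are arbitrary choices, not part of the argument above.

```python
import math

TAU = (1 + math.sqrt(5)) / 2
NORM = math.sqrt(1 + TAU**2)
SHORT, LONG = 1 / NORM, TAU / NORM   # the two interval lengths on L

def window_points(c, b_min, b_max):
    """Lattice points (a, b) in the window W with b_min <= b <= b_max."""
    x = (1 + TAU**2) * c
    pts = []
    for b in range(b_min, b_max + 1):
        lo, hi = x + (b - 1) * TAU, x + b * TAU + 1
        # all integers a with lo < a < hi
        pts.extend((a, b) for a in range(math.floor(lo) + 1, math.ceil(hi)))
    return pts

def project(p):
    """Signed position of pi(a, b) along the line L of slope 1/tau."""
    a, b = p
    return (TAU * a + b) / NORM

def tiling(c, b_min=-10, b_max=10):
    """Sort the projected points and read off the word in S and L from the gaps."""
    pos = sorted(project(p) for p in window_points(c, b_min, b_max))
    gaps = [q - p for p, q in zip(pos, pos[1:])]
    letters = "".join("S" if abs(g - SHORT) < 1e-9 else "L" for g in gaps)
    return letters, gaps

letters, gaps = tiling(0.3)
print(letters)
```

Every gap should come out as one of the two lengths $\tfrac{1}{\sqrt{1+\tau^2}}$ or $\tfrac{\tau}{\sqrt{1+\tau^2}}$, and the resulting word should contain neither $SS$ nor $LLL$.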

Before we come to applications of quasicrystals to viruses it is perhaps useful to illustrate essential topics such as deflation, inflation, aperiodicity, local isomorphism and the cut-and-project method in the simplest of cases, that of $1$-dimensional tilings.

We want to tile the line $\mathbb{R}^1$ with two kinds of tiles, short ($S$) and long ($L$) intervals, differing by a golden ratio factor $\tau=\tfrac{1}{2}(1+\sqrt{5}) \approx 1.618$.

Clearly, no two tiles may overlap and we impose a gluing restriction: there can be no two consecutive $S$-intervals in the tiling, and no three consecutive $L$-intervals.

The code of a tiling is a doubly infinite word in $S$ and $L$ such that there are no two consecutive $S$’s nor three consecutive $L$’s. For example
$\sigma = \dots LSLLSLS\underline{L}LSLLSL \dots$
We underline one tile to distinguish the sequence from shifts of it.

Conway’s musical sequences will be special codes (or tilings), allowing for the inverse operations of inflation and deflation, terms coined by John Conway in relation to Penrose tilings. The musical sequences are important to understand Conway’s worms (sometimes called “wormholes”) which are strings of Long and Short bow ties in a Penrose tiling, and to measure the distances between Ammann bars. In fact, many of the properties of Penrose tilings and $3$-dimensional quasicrystals (for example, local isomorphism) have their counterparts for tilings having a musical sequence as code.

Conway’s investigations of Penrose tiles held up work on the ATLAS-project and caused some problems at home:

“In pursuing his investigations, he usurped some of his wife Eileen’s territory, covering the dining table with an infinite nuisance of tiles. He cut them out himself, causing his right hand to hurt with cramps for days. To Eileen’s dismay, he studied the dining table mosaic for a year, relegating family meals to the kitchen and prohibiting dinner parties.”

From “Genius at Play – The curious mind of John Horton Conway” by Siobhan Roberts

Let’s investigate inflation and deflation of these tilings.

The point of the golden factor $\tau$ is to allow for deflation. That is, we can replace a tiling by another one with tiles $S$ and $L$ both a factor $\tfrac{1}{\tau}=\tau-1 \approx 0.618$ smaller than the original tiles. If the original $S$-tile has length $a$ (and the $L$-tile length $\tau a$), then the new tile $S$ will have length $\tfrac{1}{\tau}a$ and the new $L$-tile length $a$.
We do this by replacing each old $S$-tile by a new $L$-tile, and by breaking up each old $L$-tile into a new $L$-tile and a new $S$-tile, as $\tau a = a + \tfrac{1}{\tau}a$ (note that $\tau^2=\tau+1$).

To get the code of the deflated tiling we replace each letter $S$ by a letter $L$ and each $L$ by $LS$. The underlined letter will be the first letter of the deflated underlined letter in the original sequence. The deflated sequence of the one above is
$def(\sigma) = \dots LSLLSLSLLSL\underline{L}SLSLLSLSLLS \dots$
and it is easy to see that the deflated tiling satisfies again the gluing condition.
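The substitution rule is short enough to check by machine. Here is a minimal Python sketch of deflation on a finite piece of code, keeping track of the underlined letter; the function name `deflate` and the convention of passing the underlined position as a 0-based index are my own.

```python
def deflate(word, marked):
    """Apply S -> L and L -> LS; return the new word and the new underlined index.

    The underlined letter becomes the first letter of the image of the old one.
    """
    rule = {"S": "L", "L": "LS"}
    new_marked = sum(len(rule[ch]) for ch in word[:marked])
    return "".join(rule[ch] for ch in word), new_marked

# The finite piece of sigma shown above, with the underlined L at index 7.
sigma, marked = "LSLLSLSLLSLLSL", 7
d, m = deflate(sigma, marked)
print(d[:m] + "[" + d[m] + "]" + d[m + 1:])
# prints LSLLSLSLLSL[L]SLSLLSLSLLS
```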

Certain of these tilings (not all!) allow for an inverse to deflation, called inflation, increasing the size of the tiles by a factor $\tau$.

Starting from a tiling we divide each $L$-tile in half and these mid-points will be end-points of the tiles in the new tiling, erasing all endpoints of the original one. The inflated tiling will have two sorts of tiles, a new short one $S$ of length $\tau a$ obtained from the end-half of an original $L$-tile, followed by the start-half of an original $L$-tile, and a new long tile $L$ of length $\tau^2 a = (\tau+1) a$, made of the end-half of an original $L$, followed by an original $S$, followed by the start-half of an original $L$.

We get the code of the inflated tiling by replacing first each $L$ by $ll$ and subsequently replace each word $lSl$ by a letter $L$ and each $ll$ by $S$. An example,
$\sigma = \dots LSLLSLSLLSL\underline{L}SLSLLSLSLLS \dots \\ \dots llSllllSllSllllSlll\underline{l}SllSllllSllSllllS \dots \\ inf(\sigma) = \dots (l)LSLLSLS\underline{L}LSLLS(lS) \dots$

But, the inflated tiling may no longer satisfy the gluing condition. An example
$\dots LSLSLSL \dots \mapsto \dots llSllSllSll \dots \mapsto \dots (l)LLL(l) \dots$
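Both examples can be reproduced with a few lines of Python. This is only a sketch for finite pieces of code containing at least one $L$: the function `inflate` (my name) starts at the first mid-point of an $L$-tile and returns the unmatched half tiles at both ends separately, corresponding to the bracketed letters above.

```python
def inflate(word):
    """Inflate a finite code: L -> ll, then lSl -> L and ll -> S.

    Returns (left, inflated, right), where left and right are the leftover
    pieces at the two ends that cannot be completed to a new tile.
    """
    halves = word.replace("L", "ll")
    start = halves.index("l") + 1        # first mid-point of an L-tile
    out, i = [], start
    while True:
        if halves.startswith("lSl", i):  # half L, S, half L -> new long tile
            out.append("L")
            i += 3
        elif halves.startswith("ll", i): # half L, half L -> new short tile
            out.append("S")
            i += 2
        else:
            break
    return halves[:start], "".join(out), halves[i:]

print(inflate("LSLLSLSLLSLLSLSLLSLSLLS"))  # ('l', 'LSLLSLSLLSLLS', 'lS')
print(inflate("LSLSLSL"))                  # ('l', 'LLL', 'l'): gluing condition fails
```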

A Conway musical sequence is the code of a tiling $\sigma$ such that all its consecutive inflations $inf^n(\sigma)$ satisfy the gluing condition. For the corresponding Conway tiling $\sigma$ we have that
$def(inf(\sigma))=\sigma=inf(def(\sigma))$

Let’s construct at least two such Conway tilings (later we’ll see that there are uncountably many). Take $C_n=def^n(LS\underline{L})$ and write it in a special form to highlight symmetries.
\begin{eqnarray*}
C_0 =& (L.S)\underline{L} \\
C_1 =& L(S.L)\underline{L}S \\
C_2 =& LSL(L.S)\underline{L}SL \\
C_3 =& LSLLSL(S.L)\underline{L}SLLS \\
C_4 =& LSLLSLSLLSL(L.S)\underline{L}SLLSLSL
\end{eqnarray*}

The even terms have middle-part $(L.S)$ and the odd ones $(S.L)$. The remaining left and right parts are each other’s reflection (or part of it). This is easily seen by induction, as are the inclusions
$C_0 \subset C_2 \subset C_4 \subset \dots \subset C_{even} \quad \text{and} \quad C_1 \subset C_3 \subset C_5 \subset \dots \subset C_{odd}$
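If you don’t trust the induction, the first few inclusions are quickly verified in Python. In this sketch `middle_c` computes $C_n$ by iterating the deflation rule on $LS\underline{L}$, and `contains_aligned` tests whether one word sits inside another with the underlined letters on top of each other. Both names, and the 0-based index for the underlined letter, are my own conventions.

```python
def deflate(word, marked):
    """S -> L, L -> LS, together with the new index of the underlined letter."""
    rule = {"S": "L", "L": "LS"}
    return "".join(rule[ch] for ch in word), sum(len(rule[ch]) for ch in word[:marked])

def middle_c(n):
    """C_n = def^n(LSL) with the last letter of LSL underlined."""
    word, marked = "LSL", 2
    for _ in range(n):
        word, marked = deflate(word, marked)
    return word, marked

def contains_aligned(big, small):
    """True if small occurs in big with both underlined letters in the same place."""
    (bw, bm), (sw, sm) = big, small
    start = bm - sm
    return start >= 0 and bw[start:start + len(sw)] == sw

for n in range(5):
    print(n, *middle_c(n))
print(all(contains_aligned(middle_c(n + 2), middle_c(n)) for n in range(8)))  # True
```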

$C_{even}$ and $C_{odd}$ are special Conway musical sequences, called the middle $C$-sequences, and are each other’s inflation and deflation. If you are familiar with Penrose tilings, these are the $1$-dimensional counterparts of the cartwheel Penrose tiling (here with the $10$ Conway worms emanating from the center, and with the borders of the first few cartwheels drawn).

A direct consequence of inflation on Conway’s musical sequences is that the corresponding tiling is aperiodic, that is, it has no translation symmetry.

For, inflation only depends on the local configuration of tiles, so if translation by $R$ is a symmetry of a musical sequence $\sigma$ then it is also a symmetry of $inf(\sigma)$, and so also of $inf^n(\sigma)$. But for large $n$ we will have that $R < \tau^n a$ (with $a$ the size of the tiles in $\sigma$). But then a tile in $inf^n(\sigma)$ and its translation by $R$ must overlap which is impossible if $+R$ is a translation symmetry of $inf^n(\sigma)$. Done!

Returning to the middle C-sequences, what was the point of starting with $C_0 = LSL$? Well, it follows directly from the gluing restrictions that any letter in a musical sequence is part of a subword $LSL$ of $\sigma$. But then, every finite subword $W$ of $\sigma$ is also a subword of $C_{2n}$ for some large $n$.

For, let $d$ be the length of the interval corresponding to $W$ and choose $n$ such that $d < \tau^{2n} a$; then the interval of the line corresponding to $W$ is contained in a single tile in $inf^{2n}(\sigma)$ and this tile belongs to a subword $LSL$ of $inf^{2n}(\sigma)$. But then $W$ will be a subword of the $2n$-th deflation of that interval $LSL \subset inf^{2n}(\sigma)$, which is $C_{2n}$.

Or, as Conway would phrase it with respect to Penrose tilings (quote again from Siobhan Roberts’ book)

“Every point is in the cartwheel somewhere. If you jab your finger anywhere, on any point anywhere on the pattern, you are part of a cartwheel. The whole thing is overlapping cartwheels.”

An immediate consequence is the local isomorphism theorem: Every subword of a musical sequence $\sigma$ appears infinitely many times as subword of any other musical sequence. That is, one cannot distinguish two tilings of the line with musical sequence codes from each other by looking at finite intervals!

The argument is similar to the one above. The finite interval corresponding to the subword lies in a unique tile of $inf^n(\sigma)$ for $n$ large enough. Now, take another musical sequence $\mu$ and consider any of the infinitely many tiles of the same type in $inf^n(\mu)$, then $def^n$ of such a tile will contain the subword in $\mu$.

Another time, we’ll see that musical sequences can be produced by the ‘cut-and-project’-method (what I called the ‘windows’-method before).
This time we will project parts of the standard $2$-dimensional lattice $\mathbb{Z}^2$ onto the line, which is a lot easier to visualise than de Bruijn’s projection from $\mathbb{R}^5$ to produce Penrose tilings or the projection from six dimensional space to harvest quasicrystals.

If you look around for mathematical theories of the structure of viruses, you quickly end up with the work of Reidun Twarock and her group at the University of York.

We’ve seen her proposal to extend the Caspar-Klug classification of viruses. Her novel idea to distribute proteins on the viral capsid along Penrose-like tilings shouldn’t be taken too literally. The inherent aperiodic nature of Penrose tiles doesn’t go together well with perfect tilings of the sphere.

Instead, the observation that these capsid tilings resemble somewhat Penrose tilings is a side-effect of another great idea of the York group. Recently, they borrowed techniques from the theory of quasicrystals to gain insight in the inner structure of viruses, in particular on the interaction of the capsid with the genome.

By the crystallographic restriction theorem no $3$-dimensional lattice can have icosahedral symmetry. But, we can construct aperiodic structures (quasicrystals) which have local icosahedral structure, much like Penrose tilings have local $D_5$-symmetry.

This is best explained by de Bruijn’s theory of pentagrids (more on that another time). Here I’ll just mention the representation-theoretic idea.

The isometry group of the standard $5$-dimensional lattice $\mathbb{Z}^5$ is the group of all signed permutation $5 \times 5$ matrices $B_5$ (Young’s hyperoctahedral group). There are two distinct conjugacy classes of subgroups in $B_5$ isomorphic to $D_5$, one such subgroup generated by the permutation matrices
$x= \begin{bmatrix} 0 & 1 & 0 & 0 & 0 \\ 0 & 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 0 & 1 \\ 1 & 0 & 0 & 0 & 0 \end{bmatrix} \qquad \text{and} \qquad y = \begin{bmatrix} 1 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 1 \\ 0 & 0 & 0 & 1 & 0 \\ 0 & 0 & 1 & 0 & 0 \\ 0 & 1 & 0 & 0 & 0 \end{bmatrix}$
The traces of $x,x^2$ and $y$, together with the character table of $D_5$ tell us that this $5$-dimensional $D_5$-representation splits as the direct sum of the trivial representation and of the two irreducible $2$-dimensional representations.
$\mathbb{R}^5 = A \simeq T \oplus W_1 \oplus W_2$
with $T = \mathbb{R} d$, $W_1 = \mathbb{R} u_1 + \mathbb{R} u_2$ and $W_2 = \mathbb{R} w_1 + \mathbb{R} w_2$ where
$\begin{cases} (1,1,1,1,1)=d \\ (1,c_1,c_2,c_3,c_4)= u_1 \\ (0,s_1,s_2,s_3,s_4) = u_2 \\ (1,c_2,c_4,c_1,c_3)= w_1 \\ (0,s_2,s_4,s_1,s_3)= w_2 \end{cases}$
with $c_j=\cos(2\pi j/5)$ and $s_j=\sin(2 \pi j/5)$. We have a $D_5$-projection
$\pi : A \rightarrow W_1 \quad (y_0,\dots,y_4) \mapsto \sum_{i=0}^4 y_i(c_i u_1+s_i u_2)$
The projection maps the vertices of the $5$-dimensional hypercube to a planar configuration with $D_5$-symmetry.
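The splitting $T \oplus W_1 \oplus W_2$ can be double-checked with the usual character inner product. The Python sketch below takes the two permutation matrices $x$ and $y$ from above, computes the traces on the four conjugacy classes of $D_5$ (of sizes $1,2,2,5$) and pairs them with the standard character table of $D_5$. The helper names are mine, and the label `sign` for the non-trivial one-dimensional representation is an assumption of this sketch, as it is not named in the text.

```python
import math

def perm_matrix(images):
    """Matrix with a 1 in row i, column images[i], and 0 elsewhere."""
    n = len(images)
    return [[1 if j == images[i] else 0 for j in range(n)] for i in range(n)]

def matmul(a, b):
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)] for i in range(n)]

def trace(m):
    return sum(m[i][i] for i in range(len(m)))

x = perm_matrix([1, 2, 3, 4, 0])   # the 5-cycle x from the text
y = perm_matrix([0, 4, 3, 2, 1])   # the reflection y from the text

# Class representatives: identity, x, x^2, y; class sizes 1, 2, 2, 5.
sizes = [1, 2, 2, 5]
chi = [5, trace(x), trace(matmul(x, x)), trace(y)]

c1, c2 = 2 * math.cos(2 * math.pi / 5), 2 * math.cos(4 * math.pi / 5)
table = {
    "trivial": [1, 1, 1, 1],
    "sign":    [1, 1, 1, -1],
    "W1":      [2, c1, c2, 0],
    "W2":      [2, c2, c1, 0],
}
multiplicity = {
    name: round(sum(s * a * b for s, a, b in zip(sizes, chi, row)) / 10)
    for name, row in table.items()
}
print(chi)            # [5, 0, 0, 1]
print(multiplicity)   # {'trivial': 1, 'sign': 0, 'W1': 1, 'W2': 1}
```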

de Bruijn’s results say that if we take suitable ‘windows’ of lattice-points in $\mathbb{Z}^5$ and project them via the $D_5$-equivariant map $\pi$ onto the plane, then the images of these lattice points become the vertices of a rhombic Penrose tiling (and we get all such tilings by choosing our window carefully).

This explains why Penrose tilings have a local $D_5$-symmetry. I’ll try to come back to de Bruijn’s papers in future posts.

But, let’s go back to viruses and the work of Twarock’s group using methods from quasicrystals. Such aperiodic structures with a local icosahedral symmetry can be constructed along similar lines. This time one starts with the standard $6$-dimensional lattice $\mathbb{Z}^6$ with isometry group $B_6$ (signed $6 \times 6$ permutation matrices).

This group has three conjugacy classes of subgroups isomorphic to $A_5$, but for only one of them this $6$-dimensional representation decomposes as the direct sum of the two irreducible $3$-dimensional representations of $A_5$ (the decompositions in the two other cases contain an irreducible of dimension $4$ or $5$ together with trivial factor(s)). A representative of the crystallographically relevant case is given by the signed permutation matrices
$x= \begin{bmatrix} 0 & 1 & 0 & 0 & 0 & 0 \\ 1 & 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & -1 & 0 & 0 \\ 0 & 0 & -1 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & -1 & 0 \\ 0 & 0 & 0 & 0 & 0 & -1 \end{bmatrix} \qquad \text{and} \qquad y= \begin{bmatrix} 0 & 0 & 1 & 0 & 0 & 0 \\ 1 & 0 & 0 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & -1 & 0 \\ 0 & 0 & 0 & 0 & 0 & 1 \\ 0 & 0 & 0 & -1 & 0 & 0 \end{bmatrix}$

Again, using suitable windows of $\mathbb{Z}^6$-lattice points and using the $A_5$-equivariant projection to one of the two $3$-dimensional components, one obtains quasicrystals with local $A_5$-symmetry.
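As a sanity check on the two signed permutation matrices $x$ and $y$ above, one can let Python generate the group they span. The sketch below (plain tuples, no libraries; all function names are mine) confirms that $x^2=y^3=(xy)^5=1$, that the group has $60$ elements, and that the character $\chi$ of the $6$-dimensional representation satisfies $\langle \chi,\chi \rangle = 2$ and $\langle \chi, 1 \rangle = 0$. As the irreducible representations of $A_5$ have dimensions $1,3,3,4,5$, this leaves only the sum of the two $3$-dimensional ones.

```python
X = ((0, 1, 0, 0, 0, 0), (1, 0, 0, 0, 0, 0), (0, 0, 0, -1, 0, 0),
     (0, 0, -1, 0, 0, 0), (0, 0, 0, 0, -1, 0), (0, 0, 0, 0, 0, -1))
Y = ((0, 0, 1, 0, 0, 0), (1, 0, 0, 0, 0, 0), (0, 1, 0, 0, 0, 0),
     (0, 0, 0, 0, -1, 0), (0, 0, 0, 0, 0, 1), (0, 0, 0, -1, 0, 0))
IDENTITY = tuple(tuple(int(i == j) for j in range(6)) for i in range(6))

def matmul(a, b):
    n = len(a)
    return tuple(tuple(sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n))
                 for i in range(n))

def trace(m):
    return sum(m[i][i] for i in range(len(m)))

def order(m):
    """Smallest k >= 1 with m^k the identity."""
    k, p = 1, m
    while p != IDENTITY:
        p, k = matmul(p, m), k + 1
    return k

def generated_group(gens):
    """All products of the generators (enough for a finite group)."""
    group, frontier = {IDENTITY}, [IDENTITY]
    while frontier:
        g = frontier.pop()
        for s in gens:
            h = matmul(g, s)
            if h not in group:
                group.add(h)
                frontier.append(h)
    return group

group = generated_group([X, Y])
print(order(X), order(Y), order(matmul(X, Y)))        # 2 3 5
print(len(group))                                     # 60
print(sum(trace(g) ** 2 for g in group) // 60)        # <chi, chi> = 2
print(sum(trace(g) for g in group) // 60)             # <chi, trivial> = 0
```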

In this $3$-dimensional case the replacements of the thick and thin rhombi are these four parallelepipeds, known as the Ammann blocks

which must be stacked together obeying the gluing condition that dots of the same colour must be adjacent.

Has anyone looked at a possible connection between the four Ammann blocks (which come in pairs) and the four (paired) nucleotides in DNA? Just an idle thought…

These blocks grow into quasicrystals with local icosahedral symmetry.

The faces on the boundary of such a sphere-like quasicrystal then look a lot like a Penrose tiling.

How can we connect these group and representation-theoretic ideas to the structure of viruses? Here’s another thought-provoking proposal coming from the York group.

Take the $A_5$ subgroup of the hyperoctahedral group in six dimensions, $B_6$, generated by the above two matrices (giving a good $A_5$-equivariant projection $\pi$ to three dimensional space) and consider an intermediate group
$A_5 \subsetneq G \subseteq B_6$
Take a point in $\mathbb{R}^6$ and look at its orbit under the isometries of $G$, then all these points have the same distance from the origin in $\mathbb{R}^6$. Now, project this orbit under $\pi$ to get a collection of points in $\mathbb{R}^3$.

As $\pi$ is only $A_5$-equivariant (and not $G$-equivariant) the image points may lie in different shells from the origin. We can try to relate these shells of points to observational data on the inner structures of viruses.

Here’s a pretty convincing instance of such a correlation, taken from the thesis by Emilio Zappa “New group theoretical methods for applications in virology and quasicrystals”.

This is the inner structure of the Hepatitis B virus, showing the envelope (purple), capsid protein (cream) and genome (light blue). The coloured dots are the image points in the different shells around the origin.

Do viruses invade us from the sixth dimension??

As you may have guessed from the symmetries of Covid-19 post, I did spend some time lately catching up with the literature on the geometric structure and symmetries of viruses. It may be fun to run a little series on this.

A virus is a parasite, so it cannot reproduce on its own and needs to invade a host cell to replicate. All information needed for this replication process is stored in a fragile DNA or RNA string, the viral genome.

This genome needs to be protected by a coating made of proteins, the viral capsid. Most viruses have an additional fatty protection layer, the envelope, decorated by virus (glyco)proteins (such as the ‘spikes’ needed to infiltrate the host cell).

Most viruses are extremely small (between 20 and 200nm); our friend the corona-virus measures between 80 and 120nm. So, its genome is also pretty small (the corona genome has around 30,000 base pairs). To maximise its information, the volume of the protective capsid must be as large as possible, and it must be formed by just a few different proteins (to free as much space in the code of the genome for other operations), clusters of which are distributed over the polyhedral capsid as symmetrically as possible.

This insight led Watson and Crick, the discoverers of the structure of DNA, to the ‘genetic economy’-proposal that most sphere-like viruses will have an icosahedral capsid because the icosahedron is the Platonic solid with the largest volume and rotational symmetry group. They argued that the capsid is most likely constructed from a single subunit (capsomere), which is repeated many times to form the protein shell.

Little is known about capsid formation, that is the process in which the capsid proteins self-assemble into an icosahedral shape, nor about the precise interplay between the genome and the capsid proteins. If we would understand these two things better it might open new possibilities for anti-viral drugs, by either blocking the self-assembly process or by breaking the genome-capsid interaction.

A first proposal for the capsid structure was put forward by Caspar and Klug. Their quasi-equivalence principle asserts that each of the 20 triangular faces of the icosahedron is subdivided into 3 subunits, each consisting of at least one protein.

Most viruses have many more than 60 proteins in their capsid, so Caspar and Klug introduced their $T$-number giving the number of proteins per subunit. One superimposes the triangulation of the icosahedron with the hexagonal plane lattice, then $T$ is the number of sub-triangles of these hexagons contained in each subunit. For $T = 7$ we have the following situation

Folding back the triangulation to form the icosahedron one then obtains a tiling consisting of hexagons (the green regions) and pentagons (the blue regions)
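For the record, the possible $T$-numbers are usually written as $T = h^2 + hk + k^2$ for non-negative integers $h,k$ describing a step in the hexagonal lattice. This formula is not derived in this post; I add it here as a well-known fact about the Caspar-Klug construction. A small Python sketch (function names mine) lists the resulting counts, with $60T$ proteins arranged in $12$ pentagons and $10(T-1)$ hexagons.

```python
def t_number(h, k):
    """Caspar-Klug triangulation number for the lattice step (h, k)."""
    return h * h + h * k + k * k

def capsid_counts(h, k):
    """Protein and capsomere counts of the corresponding icosahedral capsid."""
    t = t_number(h, k)
    return {"T": t, "proteins": 60 * t, "pentagons": 12, "hexagons": 10 * (t - 1)}

print(capsid_counts(2, 1))
# {'T': 7, 'proteins': 420, 'pentagons': 12, 'hexagons': 60}
print(sorted({t_number(h, k) for h in range(4) for k in range(4)} - {0}))
# [1, 3, 4, 7, 9, 12, 13, 19, 27]
```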

It turned out that many viruses with icosahedral symmetry consist of subunits having a different number of proteins, such as dimers (2 proteins), trimers (3 proteins), or pentamers (5 proteins) and these self-organise around a 2, 3, or 5-fold rotational axis of the icosahedron.

This led Reidun Twarock around 2000 to propose her virus tiling theory. This is a generalisation of the Caspar-Klug theory in which one superimposes the triangulation of the icosahedron with other tilings of the plane, consisting of two or more non-congruent tiles. Here’s an example which looks a bit like the aperiodic Penrose tilings of the plane.

Here’s a recent Quanta-Magazine article on Twarock’s work and potential consequences: The illuminating geometry of viruses.

And here’s an LMS Popular Lecture, from 2008, by Reidun Twarock herself: “Know your enemy – viruses under the mathematical microscope”.