BVBbc*BVBdd->CBVBda->DBVBc_->E
Defining a Language
A programming language is a formal system for specifying rule-governed behavior. You may be familiar with popular programming languages such as Python, C, or JavaScript. Here, I introduce a new programming language called AND.
AND is not intended to be a practical programming language for general purpose computing. It is neither easy to read, nor easy to write, and limited in what it can express.
The goal of this language is two-fold. First, to provide a programming language whose structure captures some key aspects of gene regulation; both the way it works, and how it changes. Second, to explore what such a language is capable of achieving by embedding it in simple models of both single- and multi-cellular behaviour.
The language is primarily a thinking tool—a means to reason about role that DNA and its evolution can play in the bottom-up control of multicellular development.
Programs
Here is an example AND program.
This is written in the simplest compact format. Adding spaces and syntax coloring to the renders the structure more clearly (we will learn what the colors indicate below).
BVBbc*BVBdd->C BVBda->D BVBc_->E
Now we see three distinct expressions. Each expression encodes an update rule that modifies the program state over time. The program state is analogous to working memory in a traditional program. In AND, the program state consists of a set of binary registers (containing 0 or 1) named from A to Z. Each update rule reads from, and writes to, these registers.
Understanding Expressions
We will come back to the program state. For now, let’s focus on the structure of the expressions (see Table 1).
| Name | Example(s) |
|---|---|
| Expression | BNBab*XZXcd->E |
| Activation Condition | BVBab*XZX_d |
| Binding Term | BVB_b |
| Binding Operator | BNB, UNU, XZX |
| Composition Operator | *, + |
| Target Register (read) | a, b, _ |
| Source Register (write) | A, B |
Each expression consists of an activation condition and a target register separated by an arrow: ->. The activation condition has one or more binding terms, separated by a composition operator. A binding term starts with a three-letter binding operator, followed by two source registers.
Both upper and lower-case letters address the same registers. When we read from register, we use a lower-case letter, and when we write to registers, we use an upper-case letter. There is also special, read-only, null register, _, that always contains the value 0.
So how do we evaluate an expression? Both the composition and binding operator encode Boolean operations. There are two composition operators: * for logical AND, or + for logical OR. The binding operators capture all sixteen possible binary logical operations on two inputs (shown in Table 2).
| Op | Common name | Compact DNF |
|---|---|---|
| XZX | FALSE | ∅ |
| BNB | AND | αβ |
| BNU | α AND NOT β | α¬β |
| BNX | α | α |
| UNB | β AND NOT α | ¬αβ |
| XNB | β | β |
| QZQ | XOR | α¬β + ¬αβ |
| BVB | OR | α + β |
| UNU | NOR | ¬(α + β) |
| QNQ | EQ | αβ + ¬α¬β |
| XNU | NOT β | ¬β |
| BVU | α OR NOT β | α + ¬β |
| UNX | NOT α | ¬α |
| UVB | NOT α OR β | ¬α + β |
| BZB | NAND | ¬(αβ) |
| XNX | TRUE | ∞ |
BNBαβ. Both the command name, and the compact DNF form is shown (see below).
To see how theses operators work together, consider the first expression in the program above: BVBbc*BVBdd->C. We can translate this AND expression into something more familiar by replacing the syntax with logical operators.
\[ \begin{array}[cccc] \enspace\texttt{BVBbc} & * & \texttt{BVBdd} & \texttt{->B} \\ \downarrow & \downarrow & \downarrow & \downarrow \\ (\mathtt{b} \lor \mathtt{c}) & \land & (\mathtt{d} \lor \mathtt{d}) & \Rightarrow \mathtt{C} \\ \end{array} \]
These expressions can became long and complex when dealing with larger AND programs. Two further steps make the resulting translation shorter and easier to read. First, we convert the equation to Disjunctive Normal Form (DNF). This gathers any common factors together, into a series of ORed terms. Then, we write it in a compact syntax, dropping the logical operators, using juxtaposition to indicate AND, and + to indicate OR. I call this compact DNF.
In the case above:
\[ (\mathtt{b} \lor \mathtt{c}) \land \mathtt{d} \Rightarrow \mathtt{B} \] becomes: \[ (\mathtt{b}\mathtt{d}) + (\mathtt{c}\mathtt{d}) \Rightarrow \mathtt{C} \]
Translating the other two expressions from the program, results in:
\[ \begin{align} (\mathtt{b} \mathtt{d}) + (\mathtt{c}\mathtt{d}) &\Rightarrow \mathtt{C}\\ \mathtt{a} + \mathtt{d} &\Rightarrow \mathtt{D}\\ \mathtt{c} &\Rightarrow \mathtt{E} \end{align} \]
For the purpose of updating the program state, these three expressions are equivalent to the original AND program.
You may be thinking: Why all this palaver just to represent logical expressions? Why not just write the program using compact DNF? The syntax of AND is designed to resemble gene regulation, both in how it works, and how it changes over time. We will see later how the somewhat arcane syntax of AND contributes to these goals.
Running a Program
We can now look at how an AND program runs. The program above consists of the three expressions shown above. To run these as update rules, we do the following for each time step:
- Read current values from registers indicated in lower-case.
- Evaluate each expression using these values.
- Then, for each expression, write the result (either 0 or 1) to the target register.
- Move to the next time step.
Importantly, the target registers are written only after all expressions have been evaluated. This ensures that each expression uses the same values when it is evaluated at particular time step.
We can see this in action by running the program above. Here is one outcome, given a particular input:
| Running a Program | ||||
| Input: A then B | ||||
| t | Input | Mem. | Output | Expressions |
|---|---|---|---|---|
| 0 | AB | CD | E | (bd)+(cd) ↦ C a+d ↦ D c ↦ E |
| 1 | AB | CD | E | (bd)+(cd) ↦ C a+d ↦ D c ↦ E |
| 2 | AB | CD | E | (bd)+(cd) ↦ C a+d ↦ D c ↦ E |
| 3 | AB | CD | E | (bd)+(cd) ↦ C a+d ↦ D c ↦ E |
| 4 | AB | CD | E | (bd)+(cd) ↦ C a+d ↦ D c ↦ E |
| 5 | AB | CD | E | (bd)+(cd) ↦ C a+d ↦ D c ↦ E |
The program has run for 6 time steps (from 0 to 5) The three columns on the left show the status of the registers. When a register is active, the register name is colored, otherwise it is shown in gray.
The column on the rights shows the expressions we translated above. If the expression is highlighted, then it evaluates to 1, and the target register of the expression will be set in the next time-step. For example, in time-step 0, we see A is present. This activates the second expression (highlighted), and we see the D register is set in the next time step (time-step 1).
Above, you can see columns containing the registers are grouped into three types: input, memory, and output, each with a different colour.
- Input registers can be read by the expressions, but not written to.
- Memory registers are those that the program can both read from and write to.
- Output registers are those that the program can write to but not read from.
Each expression writes to either a memory register or output register. Input registers provide a way to interact with an AND program. Input can be supplied at any time step. Above, we set register A at time-step 0, and then set register B at time-step 2. If we do the reverse, the program behaves differently:
| Running a Program | ||||
| Input: B then A | ||||
| t | Input | Mem. | Output | Expressions |
|---|---|---|---|---|
| 0 | AB | CD | E | (bd)+(cd) ↦ C a+d ↦ D c ↦ E |
| 1 | AB | CD | E | (bd)+(cd) ↦ C a+d ↦ D c ↦ E |
| 2 | AB | CD | E | (bd)+(cd) ↦ C a+d ↦ D c ↦ E |
| 3 | AB | CD | E | (bd)+(cd) ↦ C a+d ↦ D c ↦ E |
| 4 | AB | CD | E | (bd)+(cd) ↦ C a+d ↦ D c ↦ E |
| 5 | AB | CD | E | (bd)+(cd) ↦ C a+d ↦ D c ↦ E |
Some kind of conclusion
TODO: complete this
As with most programming languages, different inputs change how the program behaves. When it receives A followed by B, the output register E remains on. If the input order is reversed, however, E remains off.
The idea that a program can compute a boolean function is not surprising. One thing the program can do is compute more complex functions by combining expressions. This kind of compositionality is a key feature of programming languages. But what is interesting is the temporal aspect of the computation. People have made a good deal about. Feed-forward networks. The program state changes over time, and the timing of inputs matters.
The simple AND program above is sensitive to the ordering of its input.