In this chapter we will step down to a lower level to get an idea of how computers actually process data. Most of the time we don't need to work at this low level and many people have had careers where they have not had to look at instruction sets or assembly language a single time. However, knowledge is power and knowing how the computer might represent higher-level constructs is useful in understanding optimizations and gives you the pre-requisite information in case you run into one of the few situations where you might need to step down to assembly to solve a specific problem.
However, if you really are not interested, out of all of these chapters this one could be considered skippable. This chapter is pretty information-dense, and thus a fairly heavy read. But I encourage a healthy amount of curiosity.
Small, precise instructions
We have already established that programming is about creating lists of instructions for the computer to follow. Declaring variables, looping, branching, modifying values. However, we have done so in a higher-level language, where our instructions can be quite elaborate and can easily deal with various different types of data, including pretty complex constructions like lists, strings and structured data.
At the low level, computers follow a similar pattern of lists of instructions, but the difference comes in how much, or rather, how little we can express per instruction. And the concept of types disappears almost entirely: computers deal with all data in the form of numbers.
We humans tend to deal with numbers in the decimal, or base 10, system. Computers deal with numbers in binary, or base 2. At the level of the hardware, there are only values of 1 or 0. A high value and a low value. However, even at the level we are going to be talking about today, we don't really deal with individual binary values (bits) that much. Instead, we bundle a number of bits to form actually useful numbers. 8 bits combined together is called a "byte". Values consisting of more bits than that are typically called "words".
Computers typically have a particular length of bits that they are able to process easily. This native word length is then usually referred to when we talk about that computer's "bitness". Modern Intel, AMD and ARM processors tend to be 64-bit, meaning that they are natively capable of computing values of 64 ones and zeros, giving them a range of values from 0 up to 2^64 (18 quintillion). However, often computers are able to process smaller values as well, since often the full length is not needed. It's also possible for a computer to work with representations of numbers larger than the native word length, but often times this involves a more complex series of instructions.
Computers also don't have a concept of variables. Instead the computer memory is just a series of addressable, numbered slots. You can think of it as just a plain list of numbers, which we can index into. Since the index is also a number, the native word length of the computer typically determines how much memory can be directly addressed. A memory index (also known as a pointer) of 16 bits can address 65536 memory positions, modern processors have pointer lengths closer to 64, leading us closer to that 18 quintillion words of memory. This memory is where all of the various temporary values of the program go. And typically the program's instructions are also values that are stored in the same memory. There is nothing unusual about them, the computer's instructions are represented as numbers, same as any other value we might want to represent.
So, since we are only dealing with numbers and a linear, flat array of memory, the computer's instructions operate roughly at this basic level. Individual instructions do things like move values around in memory, direct the program execution to continue at a specific point in the memory, add or subtract values, compare values and so on. Different types of computers might offer different sets of instructions. The instructions that are supplied for the computer form the instruction set architecture.
Instruction set architecture
Not all computers supply the same instructions and how instructions are encoded also varies. Different computer architectures may vary in the amount and complexity of the instructions provided to the programmer. A more elaborate set of more complex instructions might allow a program to use fewer instructions to express a particular computation. A simpler set of fewer instructions on the other hand might reduce hardware complexity, allowing computers to be made with better energy efficiency.
The somewhat blurry line here represents our first taxonomy of instruction sets. Complex instruction sets are referred to as CISC, complex instruction set computers, whereas computers with "reduced" instructions are referred to as RISC, reduced instruction set computers. This line is blurry in that interpretations vary somewhat on what "reduced" means in practice. Some argue that the number of instructions doesn't matter as much and that RISC is about each individual instruction being simple in nature, doing fewer things. One typical characteristic of RISC architecture is to separate instructions that deal with memory from those that do other types of computation, creating a "load-store" architecture where values are first loaded from memory, computations performed on them and results then stored back in memory, all as separate instructions.
The second categorization we will use refers to what it means for data to be "loaded from memory". All computers need some kind of a place where values currently operated on are temporarily stored in order to manipulate them. Technically this could be just the main memory, however for efficiency reasons computers typically have a small amount of memory that is way faster than the main memory. And it's important to remember that to CPUs accessing memory is actually really slow. Hundreds of instructions can happen in the time span of a single memory access, although techniques like multi-level caching help mitigate this somewhat.
In typical computer architectures, this area of special memory is referred to as the "register file". It is a small number of slots called registers into which values can be loaded from the memory. The number of registers can vary from as few as one to dozens. Having more allows more data to be kept in the register file and therefore more complex calculations to be done without reaching into the main memory. However, the instruction set will need to represent how to access each of the registers by encoding which registers are being worked on into the respective instructions.
An alternative approach is a stack machine, where most of the instructions operate on a small number of values organized into 2 or more stacks. A stack is a simple data structure, where new values are placed on top and the top-most values are the ones that are readable first. Values are consumed from the top of the stacks and any new values are then placed back onto the stack. Stack machines allow instructions to be significantly simpler, since instructions will always operate on the tops of the stacks instead of shuffling data in and out of registers. However, managing the stacks might require specialized hardware and require some work on the part of the programmer in order to ensure data is properly and efficiently organized on the stack for computation. Stack machine architecture is popular for virtual machines, such as those used by several programming languages for their cross-platform compatible output.
As mentioned before, in all cases the instructions are effectively just numbers and they encode information such as what the instruction is supposed to do, what registers might be involved and if there are operands such as memory addresses or immediate values that should be used. In some architectures all of this information is encoded into a fixed length, meaning that for example each instruction is 32-bits long. Others might use variable length instructions, where sometimes an instruction might be a single byte and others might be several bytes.
F7 - an example computer
While it would be possible to use a popular real-world architecture as a teaching tool, this poses some problems. Architectures like Intel/AMD X86-64, ARM64 or even RISC-V are all comparatively complex and contain dozens upon dozens of instructions. This is because they are intended to be practical and performant. But unfortunately for us it would require writing a book to comprehensively cover any one of them.
So, instead we are going to use our own instruction set architecture loosely based on DAK Engineering F4 architecture. The F4 architecture is classified as "MISC", a minimal instruction set computer. You could theoretically go even more minimal, down to a one-instruction computer, but the F4 is already quite spartan, to such an extent that I decided that I could afford to add a few convenience features on top of it, bringing us to an F7 architecture (7 instructions instead of 4, with two additional variant instructions).
The F7 instruction set uses 32-bit words and 32-bit fixed length instructions consisting of the opcode and a single 16-bit operand. It is a register machine, although it only provides 3 registers: A and
- The A register, also known as the accumulator, is used for all
computations, whereas the B register is used to temporarily store a value that might be needed next and the third register PC is used to keep track of which instruction is being executed. PC stands for "program counter".
The operand part of the instruction is used to pass immediate values or memory addresses to the instructions. For our purposes, the F7 computer is equipped with 4096 words of memory, but up to 65536 words could be used.
The F7 is capable of doing the following instructions:
- Add values to A register directly, from memory or indirectly
- Subtract values from A register directly, from memory or indirectly
- Branch to a memory address if A register is zero
- Load a value to A register directly, from memory, indirectly or from PC
- Store the value of A register to memory, indirect memory address or to PC
- Copy the value of A register to B
- Swap the values of A and B registers
Our F7 system doesn't provide any instructions to perform input or output, but if we wanted to, we could implement this by using memory-mapped I/O, where specific memory addresses allow us to write to and read from any I/O devices we would want to.
Because of the small number of instructions and variants, we can encode them with a single bit in the top 16 bits of each instruction, making decoding instructions easy.
Building our computer simulator
We're going to build a simulator for our computer using Lua. Often other languages that compile to native code are used for constructing virtual machines like this, but we are not concerned with speed and Lua provides all of the mechanisms we need to represent the various aspects of our F7 system.
Let's start by laying out a structure for all of the state our computer needs to keep track of in a structured format and creating functions that allow us to construct out computer.
function makeMemory()
local memory = {}
for i = 1, 4096 do
table.insert(memory, 0)
end
return memory
end
function makeComputer()
local computer = {
a = 0,
b = 0,
pc = 1,
zero = false,
memory = makeMemory()
}
return computer
end
As you can see, our computer has the necessary registers, and memory, which
is a list of 4096 elements, which we create with a function. a, b and pc
we have already gone through, but the zero value might be unclear. This is
what we would call a "flag register". It cannot be modified directly, but is
modified when the value of a is changed. In our instruction set when we
perform branching, we look at the zero flag to determine whether to branch
or not.
Next, we need to create the encodings of our instruction set and for that we are going to need to learn a few bit-level operations. Each of the instructions will occupy a single bit in the top 16 bits of each 32-bit instruction, so if we want to create instructions we will somehow need to be able to set those bits. One way to do it is to get the value 1, which means just setting the lowest bit of a word to 1, and then performing a bit-shift to the desired position. Shifting left once would set the second bit, shifting twice would set the third bit and so on.
In Lua we can do bit shifts with the << and >> operations, the direction
of the arrows indicates whether it's a left shift or right shift.
Let's create each of our opcode sections of instructions as constants, so we can use them to create and read instructions:
ADDI = 1
ADDM = 1 << 1
ADDPC = 1 << 2
SUBI = 1 << 3
SUBM = 1 << 4
SUBPC = 1 << 4
BZ = 1 << 5
LDAI = 1 << 6
LDAM = 1 << 7
LDAMM = 1 << 8
LDAPC = 1 << 9
STAM = 1 << 10
STAMM = 1 << 11
STAPC = 1 << 12
COPY = 1 << 13
SWAP = 1 << 14As you can see, we are using fairly short names for each instruction. These mnemonics might seem difficult at first to understand, but they follow basic rules. The start of the mnemonic stands for the type of instruction: addition, subtraction, branch, load, store, copy or swap. The ending refers to the data we are referring to. "I" stands for "immediate", meaning we are operating on an immediate value in the operand part of the instruction. "M" refers to "memory", meaning the operand is a memory address. "MM" means an indirect address, where the address of our data needs to be loaded from the memory address pointd to by our operand, so a pointer to a pointer.
Now that we have the opcodes listed out, we can very easily create valid instructions for our computer. We simply take the opcode we want and add an operand to it, taking care to first move the opcode into the top 16 bits of the instruction.
function makeInstruction(opcode, operand)
return (opcode << 16) + operand
end
With that function and our list of constants, we can create a
rudimentary "assembler" to construct programs. We simply need
to place the results of makeInstruction() into the computer's
memory one after another to create a program:
local computer = makeComputer()
computer.memory[1] = makeInstruction(ADDI, 50)
computer.memory[2] = makeInstruction(SUBI, 50)However, our computer doesn't yet understand what to do with those instructions, so we will need to implement a way to decode instructions as well. We need to just separate out the opcode and the operand and return them individually, so we can operate on them.
Here's a function that does that.
function getOpcodeAndOperand(instruction)
local instruction = instruction << 32 >> 32
local opcode = instruction >> 16
local operand = instruction - (opcode << 16)
return { opcode = opcode, operand = operand }
endWhen given an instruction, it will shift the values around to extract the top and bottom 16-bit words out of the 32-bit instruction. The first part simply clears the top bits of the instruction, since numbers in Lua are ordinarily 64-bit, but we are not interested anything besides the bottom 32-bits, so we just shift out those values to clear them to zeros as a sanity check.
Next we will have to make our computer actually execute instructions. What we want to do is the following:
- Load the next instruction from memory pointed by the PC register.
- Increment the PC register.
- Decode the instruction to opcode and operand.
- Depending on the opcode, execute different actions.
- Set the zero flag based on whether A register is zero or not.
- Repeat as needed.
We will implement a function that takes a computer and executes one cycle the program. This is good for two reasons: it separates out any looping behavior and makes the program a bit less complex and also allows us to run programs step by step to see how the computer's state changes.
function runCycle(computer)
local instruction = getOpcodeAndOperand(computer.memory[computer.pc])
computer.pc = computer.pc + 1
if instruction.opcode == ADDI then
computer.a = computer.a + instruction.operand
elseif instruction.opcode == ADDM then
computer.a = computer.a + computer.memory[instruction.operand]
elseif instruction.opcode == ADDPC then
computer.a = computer.a + computer.pc
elseif instruction.opcode == SUBI then
computer.a = computer.a - instruction.operand
elseif instruction.opcode == SUBM then
computer.a = computer.a - computer.memory[instruction.operand]
elseif instruction.opcode == SUBPC then
computer.a = computer.a - computer.pc
elseif instruction.opcode == BZ then
if computer.a == 0 then
computer.pc = instruction.operand
end
elseif instruction.opcode == LDAI then
computer.a = instruction.operand
elseif instruction.opcode == LDAM then
computer.a = computer.memory[instruction.operand]
elseif instruction.opcode == LDAMM then
computer.a = computer.memory[computer.memory[instruction.operand]]
elseif instruction.opcode == LDAPC then
computer.a = computer.pc
elseif instruction.opcode == STAM then
computer.memory[instruction.operand] = computer.a
elseif instruction.opcode == STAMM then
computer.memory[computer.memory[instruction.operand]] = computer.a
elseif instruction.opcode == STAPC then
computer.pc = computer.a
elseif instruction.opcode == COPY then
computer.b = computer.a
elseif instruction.opcode == SWAP then
local temp = computer.a
computer.a = computer.b
computer.b = temp
end
computer.zero = computer.a == 0
endEach branch simply implements logic for one instruction. None of the individual instructions are very complex, so in most cases we get away with just a single expression per branch. This is quite a long snippet of code though, so feel free to read through it with care to understand it.
So, with all of the parts combined we will end up with the following program:
ADDI = 1
ADDM = 1 << 1
ADDPC = 1 << 2
SUBI = 1 << 3
SUBM = 1 << 4
SUBPC = 1 << 4
BZ = 1 << 5
LDAI = 1 << 6
LDAM = 1 << 7
LDAMM = 1 << 8
LDAPC = 1 << 9
STAM = 1 << 10
STAMM = 1 << 11
STAPC = 1 << 12
COPY = 1 << 13
SWAP = 1 << 14
function makeMemory()
local memory = {}
for i = 1, 4096 do
table.insert(memory, 0)
end
return memory
end
function makeComputer()
local computer = {
a = 0,
b = 0,
pc = 1,
zero = false,
memory = makeMemory()
}
return computer
end
function makeInstruction(opcode, operand)
return (opcode << 16) + operand
end
function getOpcodeAndOperand(instruction)
local instruction = instruction << 32 >> 32
local opcode = instruction >> 16
local operand = instruction - (opcode << 16)
return { opcode = opcode, operand = operand }
end
function runCycle(computer)
local instruction = getOpcodeAndOperand(computer.memory[computer.pc])
computer.pc = computer.pc + 1
if instruction.opcode == ADDI then
computer.a = computer.a + instruction.operand
elseif instruction.opcode == ADDM then
computer.a = computer.a + computer.memory[instruction.operand]
elseif instruction.opcode == ADDPC then
computer.a = computer.a + computer.pc
elseif instruction.opcode == SUBI then
computer.a = computer.a - instruction.operand
elseif instruction.opcode == SUBM then
computer.a = computer.a - computer.memory[instruction.operand]
elseif instruction.opcode == SUBPC then
computer.a = computer.a - computer.pc
elseif instruction.opcode == BZ then
if computer.a == 0 then
computer.pc = instruction.operand
end
elseif instruction.opcode == LDAI then
computer.a = instruction.operand
elseif instruction.opcode == LDAM then
computer.a = computer.memory[instruction.operand]
elseif instruction.opcode == LDAMM then
computer.a = computer.memory[computer.memory[instruction.operand]]
elseif instruction.opcode == LDAPC then
computer.a = computer.pc
elseif instruction.opcode == STAM then
computer.memory[instruction.operand] = computer.a
elseif instruction.opcode == STAMM then
computer.memory[computer.memory[instruction.operand]] = computer.a
elseif instruction.opcode == STAPC then
computer.pc = computer.a
elseif instruction.opcode == COPY then
computer.b = computer.a
elseif instruction.opcode == SWAP then
local temp = computer.a
computer.a = computer.b
computer.b = temp
end
computer.zero = computer.a == 0
endNow we can test out the computer. Let's create a basic program that just adds and subtracts some immediate values and check how the computer behaves:
<<f7>> -- Assume the F7 code goes here
local computer = makeComputer()
computer.memory[1] = makeInstruction(ADDI, 50)
computer.memory[2] = makeInstruction(SUBI, 50)
print("A:", computer.a)
runCycle(computer)
print("A: ", computer.a)
runCycle(computer)
print("A:", computer.a)A: 0 A: 50 A: 0
We have our immediate computation instructions working. Can we branch?
<<f7>> -- Assume the F7 code goes here
local computer = makeComputer()
computer.memory[1] = makeInstruction(LDAI, 0)
computer.memory[2] = makeInstruction(BZ, 50)
runCycle(computer)
runCycle(computer)
print("PC:", computer.pc)PC: 50
Yup, loading zero into the A register allows us to branch our code and jump to address 50.
We should also be able to load and store data from memory.
<<f7>> -- Assume the F7 code goes here
local computer = makeComputer()
computer.memory[50] = 42
computer.memory[51] = 52
computer.memory[52] = 200
computer.memory[1] = makeInstruction(LDAM, 50)
computer.memory[2] = makeInstruction(LDAMM, 51)
computer.memory[3] = makeInstruction(STAM, 55)
runCycle(computer)
print("A:", computer.a)
runCycle(computer)
print("A:", computer.a)
runCycle(computer)
print("Memory address 55: ", computer.memory[55])A: 42 A: 200 Memory address 55: 200
And we can operate on the B register with COPY and SWAP:
<<f7>> -- Assume the F7 code goes here
local computer = makeComputer()
computer.memory[1] = makeInstruction(LDAI, 42)
computer.memory[2] = makeInstruction(COPY, 0)
computer.memory[3] = makeInstruction(LDAI, 7)
computer.memory[4] = makeInstruction(SWAP, 0)
runCycle(computer)
runCycle(computer)
print("A:", computer.b)
print("B:", computer.b)
runCycle(computer)
runCycle(computer)
print("A:", computer.a)
print("B:", computer.b)A: 42 B: 42 A: 42 B: 7
So, now it would be possible for us to create more complex programs. The main difficulty is keeping track of all the places we are storing our program instructions and data, since we have to constantly keep in mind which address they go to. We could implement a proper assembler that will automatically place instructions in correct memory locations and allow us to label the memory, so that we can refer to labels instead of raw memory addresses. Then we could write programs in the following style:
LDAM data
ADDI 8
STAM data
data: DW 42 ;; DW stands for "data-word", meaning the following data should be placed into memory as-is
However, we will leave that as an exercise to the reader. For now
constructing instructions with makeInstruction() should allow
you to play around with at least short programs to see how they
might be represented as low-level computer instructions.
If you'd like some ideas what to try, you could for example write a program that sums up a list of numbers or implement a program to perform multiplication.
Conclusion
So, now you should have a slightly better understanding of how computers perform computation at the instruction level and have a basic ability to read and write in assembly language, although of course the specific details are specific to each architecture.
It's worth keeping in mind that the F7 system we implemented is intentionally very bare-bones. It lacks a lot of comfort features and implementing some programs with it might be annoyingly verbose and inefficient. If you wanted, you could expand it to a more complete system with more instructions or you could take the learnings from here to implement an entirely different instruction set. Or you could go and learn the assembly language for your real-world computer.
A lot more could also be said about computer architecture. We didn't really cover things like stack and heap memory, nor did we consider more modern computer features like virtual memory, protection rings or caches. But my hope is that if you want to learn more, this chapter has been a decent starting point from which you can expand out to areas you want to explore further.
Since our programs are getting longer and more complex, next time we will look into software testing to allow us to make sure our programs are working as they should. See you then!