Yul is a low-level language that can be used inline in Solidity via an assembly block, as a standalone language, and as a compilation target. Currently, the default dialect of Yul is the EVM dialect, so to harness its power you must first gain a deep understanding of how the EVM works and, second, master the layout and encoding standards that Solidity imposes on top of it.
Since the EVM is a stack-based virtual machine, it operates through a set of instructions that can be grouped into the following categories:
1- Stack Instructions: pop (drops a value from the stack), pushN, dupN, swapN, jump, and jumpi. Yul only exposes pop directly; the compiler emits the push, dup, swap, and jump instructions for you.
2- Arithmetic Instructions: add, div, mul, and mod.
3- Comparison Instructions: lt, gt, eq, and iszero.
4- Bitwise Instructions: and, or, xor, and not.
5- Memory Instructions: mstore, mload, and mstore8.
6- Read Context Instructions: caller, sload, and chainid.
7- Write Context Instructions: call, create, and sstore.
You can find a list of all opcodes available in Yul here.
Note: we will switch between EVM instructions and Solidity's layout rules a lot in this article.
As per the Solidity documentation, there are five standard layouts that every developer must be aware of: storage, custom errors, memory, calldata, and events. The crucial aspects of each layout are:
Storage persists between function calls, and reading from and writing to storage are the most expensive data operations in terms of gas.
Contract storage is simply a key-value map: it maps a 32-byte key, which represents the position (slot) of a variable in storage, to a 32-byte value at that position. Values are written with sstore(key, value) and read with sload(key).
1-1. Layout of Statically-Sized Variables in Storage:
contract FixedSizeVariables {
    uint256 private value1; // value1 = 1 in slot 0
    uint256[2] private value2; // value2[0] = 2 & value2[1] = 3 in slots 1 & 2
    uint128 private value3; // value3 = 4 in slot 3
    uint128 private value4; // value4 = 5 in slot 3
    uint8 private value5; // value5 = 6 in slot 4
    uint8 private value6; // value6 = 7 in slot 4
}
// Storage Layout:
// 0x00: 0x0000000000000000000000000000000000000000000000000000000000000001
// 0x01: 0x0000000000000000000000000000000000000000000000000000000000000002
// 0x02: 0x0000000000000000000000000000000000000000000000000000000000000003
// 0x03: 0x0000000000000000000000000000000500000000000000000000000000000004
// 0x04: 0x0000000000000000000000000000000000000000000000000000000000000706
Let's assume each variable holds the value stated in its comment above:
- value1 is 1. Since every storage slot is a 32-byte word, 1 is left-padded with zeros to 32 bytes and occupies slot 0.
- value2, a uint256[2], occupies 2 slots: slot 1 and slot 2.
- value3 and value4 are both of type uint128 (16 bytes each), so the compiler packs them into one slot, slot 3. The first declared variable, value3 = 4, is stored in the right-most (lower-order) 16 bytes, and value4 = 5 is placed to its left. value5 and value6 are packed into slot 4 the same way.
The value type sizes are:
- uint256: 32 bytes.
- uint128: 16 bytes.
- uint64: 8 bytes.
- uint32: 4 bytes.
- uint16: 2 bytes.
- uint8: 1 byte.
- bytes32: 32 bytes.
- address: 20 bytes.
- bool: 1 byte.
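The packing rule can be modeled in a few lines of Python. This is a simplified sketch, not the compiler's actual algorithm: it assumes value types only, takes (size in bytes, value) pairs in declaration order, places each value in the lowest-order free bytes of the current slot, and opens a new slot when the value does not fit. The uint256[2] is expanded into two uint256 items, each filling a whole slot.

```python
def pack_storage(items):
    """Pack (size_in_bytes, value) pairs into 32-byte slots, lower-order first."""
    slots = []   # one 256-bit integer per storage slot
    used = 32    # bytes used in the current slot; 32 forces a fresh slot first
    for size, value in items:
        assert value < (1 << (8 * size)), "value does not fit its type"
        if used + size > 32:             # does not fit: move to the next slot
            slots.append(0)
            used = 0
        slots[-1] |= value << (8 * used)  # place at the lowest free bytes
        used += size
    return slots


# FixedSizeVariables with the values from the comments above
layout = pack_storage([
    (32, 1),            # uint256 value1
    (32, 2), (32, 3),   # uint256[2] value2: two full slots
    (16, 4), (16, 5),   # uint128 value3, value4
    (1, 6), (1, 7),     # uint8 value5, value6
])

for slot, word in enumerate(layout):
    print(f"0x{slot:02x}: 0x{word:064x}")
```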
1-2. Layout of Dynamically-Sized Variables in Storage:
Reserving consecutive slots doesn't work for dynamically-sized arrays and mappings, because there is no way of knowing how many slots to reserve. Instead, each such variable still takes one slot at its declaration position, and its contents are stored at locations computed with keccak256:
contract DynamicSizeVariables {
    mapping(address => uint256) private _balances; // account -> balance, slot 0
    uint256[] private _values; // slot 1
    string private _name; // slot 2
}
// Storage Layout:
// 0x00: 0x0000000000000000000000000000000000000000000000000000000000000000
// 0x01: 0x0000000000000000000000000000000000000000000000000000000000000002
// 0x02: 0x4a65726f6d65000000000000000000000000000000000000000000000000000c
// mapping elements:
// 0x3ddcac31351e0705625963ec259851464733fec321375bc6bada6a59752ea7c4: 0x00000000000000000000000000000000000000000000000000000000000004b0
// 0xbabeeff9e42c6a75123df37ff2f874914fb38fdf5076178f847844476f22232a: 0x0000000000000000000000000000000000000000000000000000000000000171
// array elements [50, 60]:
// 0xb10e2d527612073b26eecdfd717e6a320cf44b4afac2b0732d9fcbe2b7fa0cf6: 0x0000000000000000000000000000000000000000000000000000000000000032
// 0xb10e2d527612073b26eecdfd717e6a320cf44b4afac2b0732d9fcbe2b7fa0cf7: 0x000000000000000000000000000000000000000000000000000000000000003c
Mapping in Slot 0:
_balances is the first state variable, so it occupies slot 0. The slot itself stays empty (all zero bytes), because the mapping's entries are stored elsewhere:
0x00: 0x0000000000000000000000000000000000000000000000000000000000000000
The key of the first entry in the _balances mapping is `0x266626BC2bb7C645ce958DA731E2C3F4705E8d87`. An address occupies 20 bytes, so we pad it to 32 bytes by adding 12 zero bytes (24 zero hex digits) on the left-most side. The address is written in lowercase here; the mixed-case EIP-55 checksum only matters in its textual form:
000000000000000000000000266626bc2bb7c645ce958da731e2c3f4705e8d87
The mapping's declaration slot, slot 0, padded to 32 bytes:
0000000000000000000000000000000000000000000000000000000000000000
Concatenating the padded key with the padded slot index:
000000000000000000000000266626bc2bb7c645ce958da731e2c3f4705e8d870000000000000000000000000000000000000000000000000000000000000000
Hashing the concatenation with keccak256 gives the storage slot of this entry:
3ddcac31351e0705625963ec259851464733fec321375bc6bada6a59752ea7c4
The balance of address `0x266626BC2bb7C645ce958DA731E2C3F4705E8d87` in the _balances mapping is 1200 (0x4b0), which padded to bytes32 is:
00000000000000000000000000000000000000000000000000000000000004b0
To sum up all the steps for the second entry in the _balances mapping:
// address of the second account is
0x266626bc2bb7c645cc958cc731e2c34705e7f87
// pad the address to 32 bytes, without the 0x prefix
000000000000000000000000266626bc2bb7c645cc958cc731e2c34705e7f87
// index of the mapping's declaration slot, which is slot 0
0000000000000000000000000000000000000000000000000000000000000000
// concatenate the padded key with the padded slot index
000000000000000000000000266626bc2bb7c645cc958cc731e2c34705e7f870000000000000000000000000000000000000000000000000000000000000000
// keccak256 of the concatenation is:
babeeff9e42c6a75123df37ff2f874914fb38fdf5076178f847844476f22232a
// the balance of the address is 369 (0x171), as bytes32
0000000000000000000000000000000000000000000000000000000000000171
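The mapping steps above can be reproduced with a short Python sketch. Python's hashlib.sha3_256 implements NIST SHA3-256, which pads its input differently from the original Keccak used by the EVM and gives different hashes, so the sketch includes a compact pure-Python port of the Keccak-f[1600] reference permutation. The mapping_slot helper is a hypothetical name, and the sketch assumes a value-type key such as an address, which is left-padded to 32 bytes.

```python
# Compact pure-Python Keccak-256 (Ethereum's original Keccak padding, not NIST SHA3-256)
MASK64 = (1 << 64) - 1

def _rol(v: int, n: int) -> int:
    """Rotate a 64-bit lane left by n bits."""
    return ((v << n) | (v >> (64 - n))) & MASK64

def _keccak_f1600(a: list) -> None:
    """In-place Keccak-f[1600] permutation; lane (x, y) is a[x + 5 * y]."""
    lfsr = 1
    for _ in range(24):
        c = [a[x] ^ a[x + 5] ^ a[x + 10] ^ a[x + 15] ^ a[x + 20] for x in range(5)]
        for x in range(5):  # theta
            d = c[(x + 4) % 5] ^ _rol(c[(x + 1) % 5], 1)
            for y in range(5):
                a[x + 5 * y] ^= d
        x, y, cur = 1, 0, a[1]  # rho and pi
        for t in range(24):
            x, y = y, (2 * x + 3 * y) % 5
            cur, a[x + 5 * y] = a[x + 5 * y], _rol(cur, ((t + 1) * (t + 2) // 2) % 64)
        for y in range(5):  # chi
            row = a[5 * y:5 * y + 5]
            for x in range(5):
                a[x + 5 * y] = row[x] ^ (~row[(x + 1) % 5] & row[(x + 2) % 5])
        for j in range(7):  # iota: round-constant bits come from an LFSR
            if lfsr & 1:
                a[0] ^= 1 << ((1 << j) - 1)
            lfsr = ((lfsr << 1) ^ 0x71) & 0xFF if lfsr & 0x80 else lfsr << 1

def keccak256(data: bytes) -> bytes:
    """keccak256 as used by the EVM and Solidity."""
    rate = 136                        # bytes absorbed per permutation for a 256-bit output
    buf = bytearray(data) + b"\x01"   # Keccak pad10*1 with domain byte 0x01
    buf += bytes(-len(buf) % rate)
    buf[-1] |= 0x80
    a = [0] * 25
    for off in range(0, len(buf), rate):
        for i in range(rate // 8):
            a[i] ^= int.from_bytes(buf[off + 8 * i:off + 8 * i + 8], "little")
        _keccak_f1600(a)
    return b"".join(lane.to_bytes(8, "little") for lane in a[:4])

def mapping_slot(key: int, declaration_slot: int) -> bytes:
    """Slot of mapping[key] for a value-type key: keccak256(pad32(key) ++ pad32(slot))."""
    preimage = key.to_bytes(32, "big") + declaration_slot.to_bytes(32, "big")
    return keccak256(preimage)

account = int("266626BC2bb7C645ce958DA731E2C3F4705E8d87", 16)  # first key from the example
print(mapping_slot(account, 0).hex())
```

The same helper works for any mapping declared at another slot; only the second 32-byte half of the preimage changes.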
Array in Slot 1:
The _values array [50, 60] has a length of 2, and it is declared in slot 1, so its declaration slot stores the array's length, right-aligned:
// declared in slot 1 with 2 elements in length
0x01: 0x0000000000000000000000000000000000000000000000000000000000000002
Keccak256(0000000000000000000000000000000000000000000000000000000000000001) = b10e2d527612073b26eecdfd717e6a320cf44b4afac2b0732d9fcbe2b7fa0cf6
This is the slot where the element at index 0 is stored. Now it's time to store the element itself, which has the value 50:
0000000000000000000000000000000000000000000000000000000000000032
The storage location of the element at index 0 was:
b10e2d527612073b26eecdfd717e6a320cf44b4afac2b0732d9fcbe2b7fa0cf6
The element at index 1 is stored one slot later, at the hash incremented by 1:
b10e2d527612073b26eecdfd717e6a320cf44b4afac2b0732d9fcbe2b7fa0cf7
and the bytes32 representation of 60 is:
000000000000000000000000000000000000000000000000000000000000003c
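A similar sketch derives the element slots of _values: element i lives at keccak256(pad32(declaration slot)) + i. It repeats the compact Keccak-256 port so the block runs on its own, and it assumes one element per slot, which holds for uint256; smaller element types would be packed several per slot. The array_element_slot name is hypothetical.

```python
# Compact pure-Python Keccak-256 (Ethereum's original Keccak padding, not NIST SHA3-256)
MASK64 = (1 << 64) - 1

def _rol(v: int, n: int) -> int:
    return ((v << n) | (v >> (64 - n))) & MASK64

def _keccak_f1600(a: list) -> None:
    """In-place Keccak-f[1600] permutation; lane (x, y) is a[x + 5 * y]."""
    lfsr = 1
    for _ in range(24):
        c = [a[x] ^ a[x + 5] ^ a[x + 10] ^ a[x + 15] ^ a[x + 20] for x in range(5)]
        for x in range(5):  # theta
            d = c[(x + 4) % 5] ^ _rol(c[(x + 1) % 5], 1)
            for y in range(5):
                a[x + 5 * y] ^= d
        x, y, cur = 1, 0, a[1]  # rho and pi
        for t in range(24):
            x, y = y, (2 * x + 3 * y) % 5
            cur, a[x + 5 * y] = a[x + 5 * y], _rol(cur, ((t + 1) * (t + 2) // 2) % 64)
        for y in range(5):  # chi
            row = a[5 * y:5 * y + 5]
            for x in range(5):
                a[x + 5 * y] = row[x] ^ (~row[(x + 1) % 5] & row[(x + 2) % 5])
        for j in range(7):  # iota
            if lfsr & 1:
                a[0] ^= 1 << ((1 << j) - 1)
            lfsr = ((lfsr << 1) ^ 0x71) & 0xFF if lfsr & 0x80 else lfsr << 1

def keccak256(data: bytes) -> bytes:
    rate = 136
    buf = bytearray(data) + b"\x01"   # Keccak padding, domain byte 0x01
    buf += bytes(-len(buf) % rate)
    buf[-1] |= 0x80
    a = [0] * 25
    for off in range(0, len(buf), rate):
        for i in range(rate // 8):
            a[i] ^= int.from_bytes(buf[off + 8 * i:off + 8 * i + 8], "little")
        _keccak_f1600(a)
    return b"".join(lane.to_bytes(8, "little") for lane in a[:4])

def array_element_slot(declaration_slot: int, index: int) -> int:
    """Slot of element `index` of a uint256[] declared at `declaration_slot`."""
    base = int.from_bytes(keccak256(declaration_slot.to_bytes(32, "big")), "big")
    return (base + index) % (1 << 256)  # slot arithmetic wraps modulo 2**256

storage = {1: 2}  # the declaration slot 1 holds the length of _values
for i, value in enumerate([50, 60]):
    storage[array_element_slot(1, i)] = value
for slot, value in storage.items():
    print(f"0x{slot:064x}: 0x{value:064x}")
```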
String in Slot 2:
The UTF-8 bytes of _name, "Jerome", left-aligned in 32 bytes are:
4a65726f6d650000000000000000000000000000000000000000000000000000
Because the string is 31 bytes or shorter, its data and length share a single slot: the length of 6 characters is multiplied by 2, giving 12 (0x0c), which is stored in the right-most byte of slot 2:
0x4a65726f6d65000000000000000000000000000000000000000000000000000c
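The short-string rule is easy to check with a pair of hypothetical helpers. This sketch only handles strings of at most 31 bytes; per Solidity's layout rules, longer strings instead store length * 2 + 1 in the declaration slot and keep their data starting at keccak256 of the slot, which the sketch does not model.

```python
def encode_short_string(text: str) -> int:
    """Storage word for a string of at most 31 bytes: data left-aligned, length * 2 in the last byte."""
    data = text.encode("utf-8")
    if len(data) > 31:
        raise ValueError("long strings store length * 2 + 1 and keep their data at keccak256(slot)")
    return int.from_bytes(data.ljust(32, b"\x00"), "big") | (len(data) * 2)


def decode_short_string(word: int) -> str:
    """Inverse of encode_short_string for the short (single-slot) layout."""
    flag = word & 0xFF
    assert flag % 2 == 0, "an odd low byte marks the long-string layout"
    length = flag // 2
    return word.to_bytes(32, "big")[:length].decode("utf-8")


slot2 = encode_short_string("Jerome")
print(f"0x{slot2:064x}")
print(decode_short_string(slot2))
```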
1-3. Layout of Inherited State Variables in Storage:
contract First {
    uint256 private x; // x = 0
}
contract Second {
    uint256 private y; // y = 1
}
contract Third is First, Second {
    uint256 private z; // z = 2
}
// Storage Layout:
// 0x00: 0x0000000000000000000000000000000000000000000000000000000000000000
// 0x01: 0x0000000000000000000000000000000000000000000000000000000000000001
// 0x02: 0x0000000000000000000000000000000000000000000000000000000000000002
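Solidity's documentation states that, with inheritance, state variables are ordered by the C3-linearized order of contracts, starting with the most base-ward contract. The following sketch (hypothetical helper names; it assumes one uint256 per slot and no packing, as in the example) linearizes Third and assigns slots in that order. Solidity lists direct bases from most base-like to most derived, the reverse of Python's convention, so the list is reversed before merging.

```python
def linearize(contract, bases):
    """Solidity-style C3 linearization; bases[c] lists direct bases as written after `is`."""
    direct = list(reversed(bases[contract]))  # the right-most base is the most derived
    seqs = [linearize(b, bases) for b in direct] + [direct]
    result = [contract]
    while any(seqs):
        seqs = [s for s in seqs if s]
        for s in seqs:
            head = s[0]
            if not any(head in other[1:] for other in seqs):  # head not in any tail
                break
        else:
            raise ValueError("inheritance graph cannot be linearized")
        result.append(head)
        seqs = [s[1:] if s[0] == head else s for s in seqs]
    return result


def storage_slots(contract, bases, state_vars):
    """Assign one slot per uint256 variable, most base-ward contract first."""
    layout, slot = {}, 0
    for c in reversed(linearize(contract, bases)):
        for var in state_vars[c]:
            layout[var] = slot
            slot += 1
    return layout


bases = {"First": [], "Second": [], "Third": ["First", "Second"]}
state_vars = {"First": ["x"], "Second": ["y"], "Third": ["z"]}
print(linearize("Third", bases))
print(storage_slots("Third", bases, state_vars))
```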
Solidity has built-in errors, Error(string) and Panic(uint256), and since v0.8.4 it also lets developers define custom errors by name and argument types. Like a function call, a custom error is encoded as its selector, the first 4 bytes of the keccak256 hash of the error signature, followed by the ABI-encoded error data, if any.
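contract Token {
    // assumed for this example: balances are kept in a mapping at slot 0
    mapping(address => uint256) private _balances;

    // bytes4(keccak256("InsufficientBalance(uint256,uint256)"))
    bytes32 constant insufficientBalanceSelector = 0xcf47918100000000000000000000000000000000000000000000000000000000;
    // bytes4(keccak256("UnauthorizedCaller()"))
    bytes32 constant unauthorizedCallerSelector = 0x5c427cd900000000000000000000000000000000000000000000000000000000;

    error InsufficientBalance(uint256 available, uint256 required);
    error UnauthorizedCaller();

    // view: this excerpt only reads state (sload) and the caller
    function transfer(address to, uint256 amount) public view {
        assembly {
            if eq(caller(), to) {
                mstore(0x00, unauthorizedCallerSelector)
                revert(0x00, 0x04)
            }
            // mapping slot = keccak256(caller ++ slot of _balances), hashed in scratch space
            mstore(0x00, caller())
            mstore(0x20, _balances.slot)
            let callerBalance := sload(keccak256(0x00, 0x40))
            if lt(callerBalance, amount) {
                mstore(0x00, insufficientBalanceSelector)
                mstore(0x04, callerBalance) // available
                mstore(0x24, amount)        // required; overwrites part of 0x40, fine since we revert
                revert(0x00, 0x44)
            }
        }
    }
}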
The keccak256 hash of InsufficientBalance(uint256,uint256) is
cf4791818fba6e019216eb4864093b4947f674afada5d305e57d598b641dad1d, so its selector is 0xcf479181.
The keccak256 hash of UnauthorizedCaller() is
5c427cd9530cc2f15c24eb9ab95a0c7157bdefd597f18e0b4b4ed82a60681983, so its selector is 0x5c427cd9.
If eq(caller(), to) is true, we store the error selector at memory offset 0x00 with mstore(0x00, unauthorizedCallerSelector) and revert the function execution with those first 4 bytes, which is the error displayed to the end user.
If the caller's balance is lower than amount, we revert with InsufficientBalance and return the available and required amounts to the end user as error data.
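The revert data for InsufficientBalance can be assembled and decoded in Python. This sketch hardcodes the selector quoted in the article rather than hashing the signature, and the amounts 100 and 250 are hypothetical. It mirrors an mstore(0x00, selector), mstore(0x04, available), mstore(0x24, required), revert(0x00, 0x44) sequence, so the data is 4 + 2 * 32 = 68 bytes.

```python
# Selector value quoted in the article for InsufficientBalance(uint256,uint256)
INSUFFICIENT_BALANCE = bytes.fromhex("cf479181")


def encode_error(selector: bytes, *args: int) -> bytes:
    """Revert data for a custom error whose arguments are all static uint256 values."""
    return selector + b"".join(arg.to_bytes(32, "big") for arg in args)


def decode_uint_error(data: bytes):
    """Split revert data back into its selector and 32-byte uint256 arguments."""
    selector, body = data[:4], data[4:]
    args = [int.from_bytes(body[i:i + 32], "big") for i in range(0, len(body), 32)]
    return selector, args


# Hypothetical amounts: the caller has 100 but tries to send 250
revert_data = encode_error(INSUFFICIENT_BALANCE, 100, 250)
print(revert_data.hex())
print(decode_uint_error(revert_data))
```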
While reading from and writing to memory is cheaper than storage, you still have to consider the cost carefully: memory is paid for as it expands, and the expansion cost grows quadratically with the highest offset accessed. You can read more about gas in this guide.
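The Ethereum Yellow Paper defines the cost of a memory of a 32-byte words as 3 * a + a**2 // 512, and an instruction that expands memory pays the difference between the new and the old cost. A minimal Python sketch of that formula (hypothetical helper names) shows how the quadratic term takes over:

```python
G_MEMORY = 3  # gas per 32-byte word (Yellow Paper constant)


def memory_cost(size_in_bytes: int) -> int:
    """Total cost of a memory of the given size: 3 * words + words**2 // 512."""
    words = (size_in_bytes + 31) // 32
    return G_MEMORY * words + words * words // 512


def expansion_cost(old_size: int, new_size: int) -> int:
    """Gas charged when an access grows memory from old_size to new_size bytes."""
    return max(0, memory_cost(new_size) - memory_cost(old_size))


for kib in (1, 32, 1024):
    size = kib * 1024
    print(f"{kib:>5} KiB -> {memory_cost(size):>10} gas")
```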
Memory reads (mload) are always 256 bits wide, while writes can be either 8 bits (mstore8) or 256 bits (mstore) wide. Solidity reserves the first four 32-byte slots of memory as follows:
- 0x00 (32 bytes): scratch space
- 0x20 (32 bytes): scratch space
- 0x40 (32 bytes): free memory pointer (initially 0x80)
- 0x60 (32 bytes): zero slot
The 64 bytes of scratch space are used by hashing methods. Inline assembly may use them for short-lived values, such as the inputs to keccak256, but the compiler may overwrite them between statements, so nothing should be kept there. When coding in inline assembly, new data should always be written starting at the address held by the free memory pointer, past the reserved slots, which is why we read that pointer with mload(0x40).
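To make the reserved slots and the free memory pointer concrete, here is a toy Python model of memory. The Memory class and its allocate method are hypothetical names, not an EVM or Solidity API. It reserves the first 0x80 bytes, initializes the pointer at 0x40 to 0x80 as Solidity does, and bumps it on every allocation.

```python
class Memory:
    """Toy model of EVM memory with Solidity's reserved first 0x80 bytes."""

    def __init__(self) -> None:
        self.data = bytearray(0x80)   # 0x00-0x3f scratch, 0x40 free pointer, 0x60 zero slot
        self.mstore(0x40, 0x80)       # Solidity starts the free memory pointer at 0x80

    def _grow(self, end: int) -> None:
        if end > len(self.data):      # memory expands in whole 32-byte words
            self.data.extend(bytes(-(-end // 32) * 32 - len(self.data)))

    def mstore(self, offset: int, value: int) -> None:
        self._grow(offset + 32)
        self.data[offset:offset + 32] = value.to_bytes(32, "big")  # 256-bit write

    def mstore8(self, offset: int, value: int) -> None:
        self._grow(offset + 1)
        self.data[offset] = value & 0xFF                           # 8-bit write

    def mload(self, offset: int) -> int:
        self._grow(offset + 32)
        return int.from_bytes(self.data[offset:offset + 32], "big")  # always 256 bits

    def allocate(self, size: int) -> int:
        """Return the current free pointer and move it past `size` bytes."""
        ptr = self.mload(0x40)
        self.mstore(0x40, ptr + size)
        return ptr


mem = Memory()
ptr = mem.allocate(0x40)          # reserve two words for our own data
mem.mstore(ptr, 0xDEADBEEF)       # write into the freshly allocated region
mem.mstore8(ptr + 0x20, 0xAB)     # single-byte write into the second word
print(hex(ptr), hex(mem.mload(0x40)))
```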
It is worth noting that variables are stored differently in memory than in storage. For example:
uint8[4] public ids;
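In storage: the above array occupies 1 slot, since its 4 elements take only 4 * 1 byte = 4 bytes and are packed together.
In memory: the same array occupies 4 slots, since every element is padded to 32 bytes (4 * 32 = 128 bytes).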
struct Person {
uint256 amount;
uint256 id;
uint8 rank;
uint8 deposit;
}
In storage: 1 slot for each uint256 and 1 shared slot for the two uint8 values, 3 slots in total.
In memory: 1 slot for each member, 4 slots in total.
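The slot counts above follow directly from the packing rule. This hypothetical sketch counts storage slots with greedy lower-order packing and memory slots as one 32-byte word per member; it only handles the value types used in the two examples.

```python
SIZES = {"uint256": 32, "uint8": 1}  # byte sizes of the types used below


def storage_slots_used(types):
    """Slots used in storage when members are packed, lower-order first."""
    slots, used = 0, 32
    for t in types:
        size = SIZES[t]
        if used + size > 32:   # does not fit in the current slot: open a new one
            slots += 1
            used = 0
        used += size
    return slots


def memory_slots_used(types):
    """In memory, each member of a static array or struct is padded to a full word."""
    return len(types)


ids = ["uint8"] * 4
person = ["uint256", "uint256", "uint8", "uint8"]
for name, members in (("uint8[4] ids", ids), ("Person", person)):
    print(f"{name}: {storage_slots_used(members)} slot(s) in storage, "
          f"{memory_slots_used(members)} slot(s) in memory")
```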
As per the ABI specification, calldata starts with the function selector: the first four bytes of the Keccak-256 hash of the function signature, which is the function name followed by the parenthesized list of parameter types. The return type of a function is not part of this signature. The encoded arguments follow the selector.
Parameter types are separated by a single comma with no spaces, and each argument is encoded in 32-byte words. If an argument is of dynamic size, its 32-byte head slot holds an offset, measured from the start of the arguments, pointing to where the dynamic value is encoded.
Solidity supports all the ABI types under the same names, with the exception of tuples. On the other hand, some Solidity types are not supported by the ABI and are represented with alternative types as follows:
- address payable: address
- contract: address
- enum: uint8
- user-defined value types: their underlying value type
- struct: tuple
How to Encode Different Argument Types and Hash the Function Selector
function baz(uint32 x, bool y) public pure returns (bool r) {
r = x > 32 || y;
}
keccak256('baz(uint32,bool)') equals
0xcdcd77c0992ec5bbfc459984220f8c45084cc24d9b6efed1fae540db8de801d2
Taking the four left-most bytes gives the function selector, or ID: 0xcdcd77c0
To call baz with the arguments 69 and true, each argument is padded to 32 bytes:
69 (0x45) becomes
0x0000000000000000000000000000000000000000000000000000000000000045
true, which is always encoded as 1, becomes
0x0000000000000000000000000000000000000000000000000000000000000001
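The selector and argument encoding for baz can be reproduced in Python. The sketch repeats the compact Keccak-256 port (hashlib.sha3_256 would give a different, NIST SHA-3 hash) and only handles static integer-like arguments, where bool is encoded as 0 or 1. The selector and encode_static names are hypothetical.

```python
# Compact pure-Python Keccak-256 (Ethereum's original Keccak padding, not NIST SHA3-256)
MASK64 = (1 << 64) - 1

def _rol(v: int, n: int) -> int:
    return ((v << n) | (v >> (64 - n))) & MASK64

def _keccak_f1600(a: list) -> None:
    """In-place Keccak-f[1600] permutation; lane (x, y) is a[x + 5 * y]."""
    lfsr = 1
    for _ in range(24):
        c = [a[x] ^ a[x + 5] ^ a[x + 10] ^ a[x + 15] ^ a[x + 20] for x in range(5)]
        for x in range(5):  # theta
            d = c[(x + 4) % 5] ^ _rol(c[(x + 1) % 5], 1)
            for y in range(5):
                a[x + 5 * y] ^= d
        x, y, cur = 1, 0, a[1]  # rho and pi
        for t in range(24):
            x, y = y, (2 * x + 3 * y) % 5
            cur, a[x + 5 * y] = a[x + 5 * y], _rol(cur, ((t + 1) * (t + 2) // 2) % 64)
        for y in range(5):  # chi
            row = a[5 * y:5 * y + 5]
            for x in range(5):
                a[x + 5 * y] = row[x] ^ (~row[(x + 1) % 5] & row[(x + 2) % 5])
        for j in range(7):  # iota
            if lfsr & 1:
                a[0] ^= 1 << ((1 << j) - 1)
            lfsr = ((lfsr << 1) ^ 0x71) & 0xFF if lfsr & 0x80 else lfsr << 1

def keccak256(data: bytes) -> bytes:
    rate = 136
    buf = bytearray(data) + b"\x01"   # Keccak padding, domain byte 0x01
    buf += bytes(-len(buf) % rate)
    buf[-1] |= 0x80
    a = [0] * 25
    for off in range(0, len(buf), rate):
        for i in range(rate // 8):
            a[i] ^= int.from_bytes(buf[off + 8 * i:off + 8 * i + 8], "little")
        _keccak_f1600(a)
    return b"".join(lane.to_bytes(8, "little") for lane in a[:4])

def selector(signature: str) -> bytes:
    """First 4 bytes of keccak256 of the canonical signature (no spaces, no return type)."""
    return keccak256(signature.encode("ascii"))[:4]

def encode_static(*words: int) -> bytes:
    """ABI-encode static integer-like arguments (uint, bool) as left-padded 32-byte words."""
    return b"".join(w.to_bytes(32, "big") for w in words)

calldata = selector("baz(uint32,bool)") + encode_static(69, int(True))
print(calldata.hex())
```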
function bar(bytes3[2] memory) public pure {}
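If we wanted to call bar with the argument ["abc", "def"]: keccak256('bar(bytes3[2])') is fce353f601a3db60cb33e4b6ef4f91e4465eaf93c292b64fcde1bf4ba6819b6a, so the function selector is 0xfce353f6.
Since bytes3[2] is a fixed-size array of static elements, its elements are encoded in place, one after the other, each left-aligned and right-padded to 32 bytes:
"abc" is encoded as
0x6162630000000000000000000000000000000000000000000000000000000000
"def" is encoded as
0x6465660000000000000000000000000000000000000000000000000000000000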
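For bar, no offsets are needed because bytes3[2] is a static type. This sketch hardcodes the selector quoted above instead of hashing the signature, and the encode_bytes3_array name is hypothetical; each bytes3 value is left-aligned and right-padded with zeros.

```python
BAR_SELECTOR = bytes.fromhex("fce353f6")  # selector value quoted above


def encode_bytes3_array(values) -> bytes:
    """ABI-encode a bytes3[N] argument: each element in place, right-padded to 32 bytes."""
    out = b""
    for v in values:
        if len(v) != 3:
            raise ValueError("bytes3 elements must be exactly 3 bytes")
        out += v.ljust(32, b"\x00")  # fixed-size bytesN values are left-aligned
    return out


calldata = BAR_SELECTOR + encode_bytes3_array([b"abc", b"def"])
for i in range(4, len(calldata), 32):
    print("0x" + calldata[i:i + 32].hex())
```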
function sam(bytes memory, bool, uint[] memory) public pure {}
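If we wanted to call sam with the arguments "dave", false, and [1,2,3]:
keccak256('sam(bytes,bool,uint256[])') is
0xa5643bf27e2786816613d3eeb0b62650200b5a98766dfcfd4428f296fb56d043
noting that type uint[] is encoded as type uint256[]. The function selector is 0xa5643bf2, and the arguments are encoded as follows:
- 0x0000000000000000000000000000000000000000000000000000000000000060: the offset of the data of the first (dynamic) parameter, measured in bytes from the start of the arguments block
- 0x0000000000000000000000000000000000000000000000000000000000000000: the second parameter, boolean false
- 0x00000000000000000000000000000000000000000000000000000000000000a0: the offset of the data of the third (dynamic) parameter
- 0x0000000000000000000000000000000000000000000000000000000000000004: the data of the first argument starts with the length of "dave", which is 4
- 0x6461766500000000000000000000000000000000000000000000000000000000: the contents of "dave", right-padded to 32 bytes
- 0x0000000000000000000000000000000000000000000000000000000000000003: the data of the third argument starts with the length of the array, which is 3
- 0x0000000000000000000000000000000000000000000000000000000000000001: 1
- 0x0000000000000000000000000000000000000000000000000000000000000002: 2
- 0x0000000000000000000000000000000000000000000000000000000000000003: 3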
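The head/tail scheme behind these words can be written out explicitly. In this sketch (hardcoded selector; encode_sam and the other helpers are hypothetical names), static arguments go in the head, while each dynamic argument leaves an offset in the head and appends its length-prefixed data to the tail.

```python
SAM_SELECTOR = bytes.fromhex("a5643bf2")  # selector value quoted above


def word(n: int) -> bytes:
    return n.to_bytes(32, "big")


def encode_bytes(data: bytes) -> bytes:
    """Tail of a `bytes` value: length word, then data right-padded to a multiple of 32."""
    padded_len = -(-len(data) // 32) * 32
    return word(len(data)) + data.ljust(padded_len, b"\x00")


def encode_uint_array(values) -> bytes:
    """Tail of a `uint256[]` value: length word, then one word per element."""
    return word(len(values)) + b"".join(word(v) for v in values)


def encode_sam(name: bytes, flag: bool, values) -> bytes:
    head_size = 3 * 32                      # one head word per argument
    tail_name = encode_bytes(name)
    tail_values = encode_uint_array(values)
    head = (
        word(head_size)                     # offset of the `name` tail
        + word(int(flag))                   # bool is static, so it sits in the head
        + word(head_size + len(tail_name))  # offset of the `values` tail
    )
    return SAM_SELECTOR + head + tail_name + tail_values


calldata = encode_sam(b"dave", False, [1, 2, 3])
for i in range(4, len(calldata), 32):
    print("0x" + calldata[i:i + 32].hex())
```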
As per the ABI specification, events are stored in log entries, which include the contract's address, a series of topics, and some arbitrary binary data. Note that the address of the contract is provided by the EVM and needs no manual encoding.
An event has a name and a series of event parameters; indexed parameters are stored as topics, and non-indexed parameters form the data.
A log entry can have up to four topics. For a non-anonymous event, the first topic is the keccak256 hash of the event signature, and the rest are the indexed event parameters.
Non-indexed parameters are ABI-encoded and stored in memory, and the log instruction is given a pointer to the start of that data and the length of the data.
event Transfer(address indexed sender, address indexed receiver, uint256 amount);
function transfer(address to, uint256 amount) public returns (bool) {
    _transfer(msg.sender, to, amount);
    emit Transfer(msg.sender, to, amount);
    return true;
}
That's the transfer function from ERC20 in Solidity. To emit the same event in inline assembly, we can write it as follows:
event Transfer(address indexed sender, address indexed receiver, uint256 amount);
function transfer(address to, uint256 amount) public returns (bool) {
    _transfer(msg.sender, to, amount);
    // hash of the event signature (topic 0)
    bytes32 transferHash = keccak256("Transfer(address,address,uint256)");
    assembly {
        // amount is non-indexed, so it is stored in memory as the log data
        mstore(0x00, amount)
        // log3 takes 3 topics: the signature hash plus the 2 indexed parameters
        // `0x00` is the memory pointer to the data
        // `0x20` is the 32-byte length of amount
        log3(0x00, 0x20, transferHash, caller(), to)
    }
    return true;
}
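For indexed parameters of dynamic type, the topic is not the value itself but the keccak256 hash of a special encoding: bytes and string encodings are just the contents, without any padding or length prefix, while a struct encoding is the concatenation of its members, each always padded to a multiple of 32 bytes, even bytes and string.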
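Putting the event rules together, here is a Python sketch that computes what log3 receives for the Transfer event: topic 0 is the signature hash, topics 1 and 2 are the indexed addresses left-padded to 32 bytes, and the data is the ABI-encoded amount. It also shows the indexed-string rule from the paragraph above. The addresses and helper names are hypothetical, and the block repeats the compact Keccak-256 port so it runs on its own.

```python
# Compact pure-Python Keccak-256 (Ethereum's original Keccak padding, not NIST SHA3-256)
MASK64 = (1 << 64) - 1

def _rol(v: int, n: int) -> int:
    return ((v << n) | (v >> (64 - n))) & MASK64

def _keccak_f1600(a: list) -> None:
    """In-place Keccak-f[1600] permutation; lane (x, y) is a[x + 5 * y]."""
    lfsr = 1
    for _ in range(24):
        c = [a[x] ^ a[x + 5] ^ a[x + 10] ^ a[x + 15] ^ a[x + 20] for x in range(5)]
        for x in range(5):  # theta
            d = c[(x + 4) % 5] ^ _rol(c[(x + 1) % 5], 1)
            for y in range(5):
                a[x + 5 * y] ^= d
        x, y, cur = 1, 0, a[1]  # rho and pi
        for t in range(24):
            x, y = y, (2 * x + 3 * y) % 5
            cur, a[x + 5 * y] = a[x + 5 * y], _rol(cur, ((t + 1) * (t + 2) // 2) % 64)
        for y in range(5):  # chi
            row = a[5 * y:5 * y + 5]
            for x in range(5):
                a[x + 5 * y] = row[x] ^ (~row[(x + 1) % 5] & row[(x + 2) % 5])
        for j in range(7):  # iota
            if lfsr & 1:
                a[0] ^= 1 << ((1 << j) - 1)
            lfsr = ((lfsr << 1) ^ 0x71) & 0xFF if lfsr & 0x80 else lfsr << 1

def keccak256(data: bytes) -> bytes:
    rate = 136
    buf = bytearray(data) + b"\x01"   # Keccak padding, domain byte 0x01
    buf += bytes(-len(buf) % rate)
    buf[-1] |= 0x80
    a = [0] * 25
    for off in range(0, len(buf), rate):
        for i in range(rate // 8):
            a[i] ^= int.from_bytes(buf[off + 8 * i:off + 8 * i + 8], "little")
        _keccak_f1600(a)
    return b"".join(lane.to_bytes(8, "little") for lane in a[:4])

def pad32(address: str) -> bytes:
    """An address topic: the 20-byte value left-padded to 32 bytes."""
    return int(address, 16).to_bytes(32, "big")

def transfer_log(sender: str, receiver: str, amount: int):
    """What log3 receives for Transfer(address indexed, address indexed, uint256)."""
    topic0 = keccak256(b"Transfer(address,address,uint256)")  # event signature hash
    topics = [topic0, pad32(sender), pad32(receiver)]          # 3 topics -> log3
    data = amount.to_bytes(32, "big")                          # non-indexed amount
    return topics, data

def indexed_string_topic(value: str) -> bytes:
    """An indexed string topic is keccak256 of the raw bytes: no length, no padding."""
    return keccak256(value.encode("utf-8"))

sender = "0x" + "11" * 20      # hypothetical addresses
receiver = "0x" + "22" * 20
topics, data = transfer_log(sender, receiver, 1200)
for topic in topics:
    print("topic:", topic.hex())
print("data: ", data.hex())
```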
Now let's review everything we have learned so far in the access storage contract (link to the source code on GitHub):
- MAX_DONATION is a constant (compile-time) variable and doesn't occupy any slot.
- slot 0 => a mapping of donors' addresses to balances.
- _owner is an immutable (deployment-time) variable and doesn't occupy any slot.
- slot 1 => a static variable of the total donations received.
- slot 2 => a dynamic array of donors' addresses.
- slot 3 => a static variable of the total number of donors.
- slot 4 => a dynamic string variable that stores the donation cause.
1- Constructor
Remember that here we are reading from memory with mload and writing to storage with sstore, so both the memory and the storage layout standards apply.
- nameData holds the actual data of whatever string value is passed as input. It is assigned the 32-byte word (the EVM operates on 32 bytes) loaded from memory at purpose plus an offset of 0x20, the size of one slot, because that is where the string's bytes start in memory.
- length is the length of the string.
- shl shifts length to the left by 1 bit, which effectively multiplies it by 2.
- sstore writes to slot 0x04 valueToStore, the combination of the actual data nameData and the length times 2.
- _owner is an immutable variable, which the assembly block can't access, so we set it as we would in Solidity.
We applied the memory layout when we loaded the length, used the string pointer, and loaded the actual data nameData.
We applied the storage layout of a string of 31 bytes or less, which is packed into one slot whose right-most byte holds the length multiplied by 2.
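The constructor's memory-to-storage step can be modeled as follows. This is a sketch, not the contract's code: the cause text "Clean water", the pointer 0x80, and the helper names are assumptions. A string in memory is a length word followed by its bytes, so the length is read at purpose and the data at purpose + 0x20; because the low byte of that data word is zero for strings of 31 bytes or less, or-ing or adding length * 2 gives the same result.

```python
def abi_string_in_memory(text: str, ptr: int = 0x80) -> bytearray:
    """Memory image of a Solidity `string memory` at ptr: length word, then padded data."""
    data = text.encode("utf-8")
    memory = bytearray(ptr)                                 # everything below ptr
    memory += len(data).to_bytes(32, "big")                 # length slot
    memory += data.ljust(-(-len(data) // 32) * 32, b"\x00")  # data, right-padded
    return memory


def mload(memory: bytes, offset: int) -> int:
    return int.from_bytes(memory[offset:offset + 32].ljust(32, b"\x00"), "big")


def pack_cause(memory: bytes, purpose: int) -> int:
    length = mload(memory, purpose)             # first word: the string length
    if length > 31:
        raise ValueError("only strings of 31 bytes or less fit in a single slot")
    name_data = mload(memory, purpose + 0x20)   # next word: the left-aligned bytes
    return name_data | (length << 1)            # shl(1, length) == length * 2


purpose = 0x80                                  # hypothetical location of the argument
memory = abi_string_in_memory("Clean water", purpose)
storage = {0x04: pack_cause(memory, purpose)}   # sstore(0x04, valueToStore)
print(f"0x04: 0x{storage[0x04]:064x}")
```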
2- Read Function getCause()
- ptr := mload(0x40) loads the free memory pointer.
- mstore(ptr, 0x20) stores the string pointer (the offset 0x20 at which the string encoding starts) in the first slot at ptr.
- storedCause := sload(0x04) loads the string from slot 4.
- length := and(storedCause, 0xFFFF) uses the bitwise operator and with the mask 0xFFFF to extract the two right-most bytes of storedCause.
- mstore(add(ptr, 0x20), length) stores length right after the string pointer.
- mstore(add(ptr, 0x40), sub(storedCause, length)) stores the actual data of the string right after the string pointer and length slots. The data is storedCause minus length: remember that storedCause holds the actual data together with the length times 2 in its right-most bytes, so the subtraction leaves only the data.
- return(ptr, 0x60) returns the 3 slots starting at ptr.
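The getCause() return layout can be checked with a similar sketch (hypothetical helper names and cause text). One caveat when reproducing it: the right-most byte of the stored word holds the length times 2, while the ABI length word must hold the byte length, so this sketch masks the low byte and shifts it right by one before writing the length.

```python
def get_cause_return_data(stored_cause: int) -> bytes:
    """ABI return data for a short string kept in one slot: offset, length, data."""
    flag = stored_cause & 0xFF      # right-most byte holds length * 2
    length = flag >> 1              # shr(1, ...) recovers the byte length
    data = stored_cause - flag      # clear the length byte, keep the left-aligned text
    return (0x20).to_bytes(32, "big") + length.to_bytes(32, "big") + data.to_bytes(32, "big")


def decode_string_return(ret: bytes) -> str:
    """Decode ABI-encoded return data holding a single string."""
    offset = int.from_bytes(ret[0:32], "big")
    length = int.from_bytes(ret[offset:offset + 32], "big")
    return ret[offset + 32:offset + 32 + length].decode("utf-8")


stored = int.from_bytes(b"Clean water".ljust(32, b"\x00"), "big") | (11 * 2)
ret = get_cause_return_data(stored)
print(ret.hex())
print(decode_string_return(ret))
```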
3- Write Function donate()
- The first if statement checks whether caller() is the zero address (iszero); if so, we store the AddressZero() error selector and revert the function with the first 4 bytes of the error to the frontend.
- The second if statement checks whether callvalue(), which is equal to msg.value in Solidity, is gt (greater than) MAX_DONATION; if so, we store the ExceedsDonations() error selector and revert the function with the first 4 bytes of the error to the frontend.
- The third if statement checks whether callvalue() is zero; if so, we follow the same steps as in the first two statements.
- ptr := mload(0x40) loads the free memory pointer.
- mstore(ptr, caller()) stores the caller at the free memory location.
- mstore(add(ptr, 0x20), 0x00) stores the declaration slot 0x00 right after the caller, at offset 0x20 from ptr.
- callerBalanceSlot := keccak256(ptr, 0x40) hashes those 2 slots (64 bytes) to get the storage slot of the caller's address key, where its balance value is stored.
- callerBalance := sload(callerBalanceSlot) loads the caller's balance from storage.
- sstore(callerBalanceSlot, add(callerBalance, callvalue())): if the caller donated before, we add the new donation to their existing balance in the slot that corresponds to the hash of their address and the declaration slot.
_totalDonations in slot 1: we store in slot 0x01 the sum of what already exists in the slot, sload(0x01), and the donation amount.
_donors in slot 2 and _totalDonors in slot 3: here we want to increment the total number of donors only if the donor's address doesn't already exist in the array. Remember that false is always zero and true is one.
- donorsArray := sload(0x02) loads slot 2. Per the dynamic-array layout in section 1-2, this slot holds the length of _donors, while the addresses themselves are stored starting at keccak256(0x02).
- arrayLength := sload(0x03) loads _totalDonors from slot 3.
- found := 0 sets found to false.
- We use a for loop to iterate over donorsArray and check whether the caller already exists; if so, we exit the loop and do nothing. If the caller doesn't exist, we add their address to the array and increment the total number of donors.
- The log has 2 topics, so: mstore(0x00, callvalue()) stores the non-indexed parameter in memory, and log2(0x00, 0x20, donationHash, caller()) logs the 32 bytes of data starting at memory offset 0x00 (the donation stored just before), followed by the 2 topics taken from the stack.
4- Read Function totalDonations()
We load what is in slot 1 with sload(0x01), store it in memory, and return the 32 bytes from memory at offset zero. The same goes for the next read function, totalDonors.
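To tie steps 3 and 4 together, here is a hypothetical Python model of the storage writes donate() performs and of the value totalDonations() reads back from slot 1. It is not the contract's code: it follows Solidity's standard layout (balances at keccak256(key ++ 0), array length in slot 2, elements from keccak256(2)), the MAX_DONATION value is an assumed parameter, and the reverts are plain exceptions. The compact Keccak-256 port is repeated so the block runs on its own.

```python
MASK64 = (1 << 64) - 1  # compact Keccak-256 (Ethereum padding, not NIST SHA3-256)

def _rol(v, n):
    return ((v << n) | (v >> (64 - n))) & MASK64

def _keccak_f1600(a):
    lfsr = 1
    for _ in range(24):
        c = [a[x] ^ a[x + 5] ^ a[x + 10] ^ a[x + 15] ^ a[x + 20] for x in range(5)]
        for x in range(5):  # theta
            d = c[(x + 4) % 5] ^ _rol(c[(x + 1) % 5], 1)
            for y in range(5):
                a[x + 5 * y] ^= d
        x, y, cur = 1, 0, a[1]  # rho and pi
        for t in range(24):
            x, y = y, (2 * x + 3 * y) % 5
            cur, a[x + 5 * y] = a[x + 5 * y], _rol(cur, ((t + 1) * (t + 2) // 2) % 64)
        for y in range(5):  # chi
            row = a[5 * y:5 * y + 5]
            for x in range(5):
                a[x + 5 * y] = row[x] ^ (~row[(x + 1) % 5] & row[(x + 2) % 5])
        for j in range(7):  # iota
            if lfsr & 1:
                a[0] ^= 1 << ((1 << j) - 1)
            lfsr = ((lfsr << 1) ^ 0x71) & 0xFF if lfsr & 0x80 else lfsr << 1

def keccak256(data):
    buf = bytearray(data) + b"\x01"
    buf += bytes(-len(buf) % 136)
    buf[-1] |= 0x80
    a = [0] * 25
    for off in range(0, len(buf), 136):
        for i in range(17):
            a[i] ^= int.from_bytes(buf[off + 8 * i:off + 8 * i + 8], "little")
        _keccak_f1600(a)
    return b"".join(lane.to_bytes(8, "little") for lane in a[:4])

def word(n):
    return n.to_bytes(32, "big")

class DonationStorage:
    """Slots: 0 _balances, 1 _totalDonations, 2 _donors (length), 3 _totalDonors."""

    def __init__(self, max_donation):
        self.max_donation = max_donation  # stands in for MAX_DONATION (value assumed)
        self.slots = {}

    def sload(self, slot):
        return self.slots.get(slot, 0)

    def sstore(self, slot, value):
        self.slots[slot] = value

    def balance_slot(self, donor):  # keccak256 over [donor][0x00]
        return int.from_bytes(keccak256(word(donor) + word(0)), "big")

    def donor_slot(self, index):    # element i of the array declared in slot 2
        return int.from_bytes(keccak256(word(2)), "big") + index

    def donate(self, caller, value):
        if caller == 0:
            raise ValueError("AddressZero()")
        if value > self.max_donation:
            raise ValueError("ExceedsDonations()")
        if value == 0:
            raise ValueError("zero donation")  # error name not given in the article
        slot = self.balance_slot(caller)
        self.sstore(slot, self.sload(slot) + value)  # _balances[caller] += value
        self.sstore(1, self.sload(1) + value)        # _totalDonations += value
        count = self.sload(3)                        # _totalDonors
        if not any(self.sload(self.donor_slot(i)) == caller for i in range(count)):
            self.sstore(self.donor_slot(count), caller)  # append a new donor
            self.sstore(2, count + 1)                    # array length in slot 2
            self.sstore(3, count + 1)

store = DonationStorage(max_donation=1000)
alice, bob = 0xA11CE, 0xB0B  # hypothetical donor addresses
for donor, amount in ((alice, 100), (bob, 50), (alice, 25)):
    store.donate(donor, amount)
print("totalDonations():", store.sload(1))
print("totalDonors():", store.sload(3))
```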
5- Read Function owner()
It returns the deployer's address, but since _owner is an immutable (deployment-time) variable, we can't use assembly code to read it.
6- Read Function getAllDonors()
It reads all the addresses in the _donors array from storage.
- Outside the assembly code, we declare 2 local variables to hold the copied array and its length.
- sload(0x03) loads _totalDonors from slot 3, as it truly reflects how many addresses are in the _donors array in slot 2, since we prevent duplicates.
- mload(0x40) allocates memory for the local array allDonors at the free memory pointer.
- mstore(allDonors, arrayLength) stores the array's length at the start of the local array in memory. I didn't forget about the free memory pointer, which I update further down in the code; according to the Solidity docs:
Scratch space can be used between statements (i.e. within inline assembly). The zero slot is used as the initial value for dynamic memory arrays and should never be written to.
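- donorAddress is loaded from the _donors array using sload(add(sload(0x02), mul(0x20, i))), where sload(0x02) is meant to give the starting location of the _donors array in storage. Be aware that, per the dynamic-array layout in section 1-2, slot 2 itself holds the array's length; element i is stored at slot keccak256(0x02) + i, and consecutive storage slots differ by 1, not by 0x20.
- donorAddress is then stored in the appropriate slot within the allDonors array in memory using mstore(add(allDonors, mul(0x20, add(i, 1))), donorAddress).
- The loop continues until all donor addresses are copied to the new memory array.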
- mul(0x20, add(arrayLength, 1)) calculates the total size in bytes needed to store the entire allDonors array, including the length value at the beginning. Each element of the array occupies 32 bytes (0x20 in hexadecimal), and the length also takes 32 bytes, so while mul(0x20, arrayLength) calculates the space needed for the addresses alone, mul(0x20, add(arrayLength, 1)) adds an additional 32 bytes for the length value.
- mstore(0x40, add(allDonors, mul(0x20, add(arrayLength, 1)))) updates the free memory pointer at 0x40 to the first byte after the end of the allDonors array in memory, by adding the total size calculated in the previous step to the array's starting address.
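The memory bookkeeping in getAllDonors() can be traced with a small sketch (hypothetical helper name and donor values): the array starts at the free memory pointer with its length word, element i follows at 0x20 * (i + 1), and the pointer is finally moved past the whole array.

```python
def build_memory_array(memory: bytearray, values) -> int:
    """Copy values into a new memory array at the free pointer, then bump the pointer."""

    def mload(offset: int) -> int:
        return int.from_bytes(memory[offset:offset + 32], "big")

    def mstore(offset: int, value: int) -> None:
        if offset + 32 > len(memory):              # grow memory as needed
            memory.extend(bytes(offset + 32 - len(memory)))
        memory[offset:offset + 32] = value.to_bytes(32, "big")

    all_donors = mload(0x40)                        # allDonors := mload(0x40)
    mstore(all_donors, len(values))                 # mstore(allDonors, arrayLength)
    for i, value in enumerate(values):              # element i after the length word
        mstore(all_donors + 0x20 * (i + 1), value)
    # mstore(0x40, add(allDonors, mul(0x20, add(arrayLength, 1))))
    mstore(0x40, all_donors + 0x20 * (len(values) + 1))
    return all_donors


memory = bytearray(0x80)
memory[0x40:0x60] = (0x80).to_bytes(32, "big")     # free memory pointer starts at 0x80
donors = [0xA11CE, 0xB0B, 0xCAFE]                   # hypothetical donor addresses
ptr = build_memory_array(memory, donors)
print(hex(ptr), hex(int.from_bytes(memory[0x40:0x60], "big")))
```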
7- Read Function donorAmout()
(Pardon me for the typo in the function name; I didn't notice it until I was testing the contract.)
_balances is a mapping in slot 0, so the balance value for an address key is stored at the keccak256 hash of the concatenation of the key and the slot index.
- mstore(0x00, account) stores the account in memory at offset 0x00.
- mstore(0x20, 0x00) stores the declaration slot index 0 in the next 32 bytes, at offset 0x20.
- keccak256(0x00, 0x40): now that the key and the slot index are in memory, we hash the 64 bytes (0x40) starting from offset 0x00.
- donorBalance := sload(hash): we have located the slot, so it's time to load the balance stored in it.
- mstore(0x00, donorBalance) stores the donorBalance we just loaded in memory at offset 0x00, and the function returns those 32 bytes as donorBalance.