Database pruning for Ethereum Geth validator. Freeing up space on SSD.

For POS mining (staking) of the Ethereum cryptocurrency, special software is used under the general name Validator. The Validator, in turn, consists of two components: the Execution Layer and the Consensus Layer. The Consensus Layer handles the POS staking itself, while the Execution Layer is the continuation of the former POW Ethereum blockchain that operated from 2015 until the Merge in 2022. All data accumulated during that time was carried over to the new POS network.

Since there are several options for validator software, this guide applies specifically to the GETH client, the most popular one.

Each layer uses its own database. The Execution Layer's database grows by about 14GB every week, and its total size is already close to 2TB. For an archive node (used by services such as Etherscan and beaconcha.in), the SSD space requirements are even higher, exceeding 14TB when using the GETH client.

To reduce the SSD size requirements for a full node (standard validator), developers devised a database pruning procedure that removes old state data and keeps only the state for the most recent 128 blocks.


Theoretically, this procedure allows the use of a 1TB SSD, but then pruning has to be done every month. It is therefore advisable to buy a 2TB SSD for the validator and prune the database about once a year. The procedure takes about 6-7 hours, during which your validator will be offline and may incur penalties.
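The sizing advice above comes down to simple arithmetic, which can be sketched as follows. The function name weeks_until_full and the 700GB figure are assumptions chosen for illustration; the 14GB-per-week growth rate is the one quoted earlier. This is a rough estimate only, since real growth is not perfectly constant.

```shell
# Rough estimate of how many whole weeks remain before a disk fills up.
# Usage: weeks_until_full <free_gb> <growth_gb_per_week>
weeks_until_full() {
    # Integer division: partial weeks are rounded down.
    echo $(( $1 / $2 ))
}

# Example: 700GB free, database growing by 14GB per week.
weeks_until_full 700 14
```

With these example numbers the call prints 50, so roughly a year of headroom.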

The Consensus Layer also has its own database, but it grows much more slowly than the Execution Layer's database. Hence, the main focus is on reducing the database size for the Execution Layer.

Important: From version 1.13.0, the GETH client supports a new type of database called Pebble. If you have already switched to Pebble, or plan to install the ETH validator with the new database type, there is no need to prune the database. (Strictly speaking, what removes the need for pruning is the path-based state scheme, flag --state.scheme path, which arrived in the same release and is normally used together with Pebble.) Clients that were upgraded to version 1.13.0 or higher from an older version keep using the old database format. To switch to the new database, you need to completely remove the GETH client's database (with the command geth removedb), set the flag --db.engine pebble in the GETH settings, and wait for the node to synchronize from scratch, which can take up to 15 days, provided you have a fast SSD, a powerful modern CPU, and fast internet.

Now let's move on to the instructions on how to perform this database pruning for the GETH client.

Database Pruning for GETH

Important: You can only prune the database if there is at least 40GB of free space on the SSD. If you have less, you can free up space by shrinking the SWAP file or expanding the logical disk. Alternatively, temporarily move the consensus layer's database to another disk or, as a last resort, perform a new synchronization from scratch, this time with the Pebble database.

To check free space on the disk in Linux, use the command:

df -h
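If you prefer an automatic check of the 40GB requirement over reading the df output by eye, a minimal sketch could look like this. The function names free_gb and enough_space_to_prune are made up for illustration, and /var/lib/geth is only an example path; substitute your own database location.

```shell
# Print the free space, in whole gigabytes (rounded down), on the
# filesystem that holds the given path. Uses the portable "df -Pk" format,
# where column 4 of the second line is the available space in 1K blocks.
free_gb() {
    df -Pk "$1" | awk 'NR == 2 { print int($4 / 1024 / 1024) }'
}

# Succeed only if at least <min_gb> gigabytes are free (40 by default).
# Usage: enough_space_to_prune <path> [min_gb]
enough_space_to_prune() {
    [ "$(free_gb "$1")" -ge "${2:-40}" ]
}

# Hypothetical usage; replace /var/lib/geth with your own --datadir path:
#   enough_space_to_prune /var/lib/geth && echo "OK to prune"
```

Rounding down keeps the check conservative: a disk with 39.9GB free is reported as 39 and fails the test.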

Another requirement: your GETH client must be fully synchronized and must have been running for at least 35 minutes since its last start, so that it can gather the data needed for pruning.
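One way to confirm that the node is synchronized is to ask the running client through its console: eth.syncing prints false once GETH is no longer syncing. The sketch below only interprets that output. The function name is_synced, the geth user, and the IPC path /var/lib/geth/geth.ipc are assumptions that depend on your setup. Note that false is also printed by a node that has not yet found peers, so treat this as a hint rather than proof.

```shell
# Interpret the value printed by:
#   geth attach --exec "eth.syncing" <path to geth.ipc>
# The console prints "false" when the node is not currently syncing;
# otherwise it prints an object describing the sync progress.
is_synced() {
    [ "$1" = "false" ]
}

# Hypothetical usage (user name and IPC path depend on your installation):
#   status=$(sudo -u geth geth attach --exec "eth.syncing" /var/lib/geth/geth.ipc)
#   is_synced "$status" && echo "GETH reports it is not syncing"
```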

The first step is to stop the GETH process:

sudo systemctl stop geth

You can also stop the consensus layer. For the PRYSM client, the stop commands are:

sudo systemctl stop prysmvalidator
sudo systemctl stop prysmbeacon

Wait for 3 minutes before proceeding to the next step.
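If you would rather confirm that the service has really stopped than rely on a fixed pause, a small polling helper is one option. This is only a sketch: wait_until is a hypothetical helper name, and the 180-second timeout simply mirrors the 3 minutes mentioned above.

```shell
# Poll once per second until the given command succeeds, or give up
# after <timeout> seconds. Note: uses the global variable "remaining".
# Usage: wait_until <timeout_seconds> <command> [args...]
wait_until() {
    remaining=$1
    shift
    while ! "$@"; do
        [ "$remaining" -le 0 ] && return 1
        sleep 1
        remaining=$(( remaining - 1 ))
    done
    return 0
}

# Hypothetical usage: wait up to 180 seconds for the geth service to be inactive.
#   wait_until 180 sh -c '! systemctl is-active --quiet geth' && echo "geth stopped"
```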

Before further actions, it's recommended to start a terminal multiplexer, such as TMUX, to ensure that the pruning process won't be interrupted if you lose access to the remote server terminal.

tmux

If the connection does drop, log back in and run tmux attach to return to the session where pruning is running.

Now, you can start reducing the database size.

Command to prune the database:

geth snapshot prune-state

If you installed the validator following Somer Esat's instructions on Medium, GETH is launched by systemd under a dedicated user, so the command is different:

sudo -u <user> geth --datadir <path> snapshot prune-state

For example:

sudo -u geth geth --datadir /var/lib/geth snapshot prune-state

or

sudo -u goeth geth --datadir /var/lib/goethereum snapshot prune-state

The choice of command depends on where your GETH database is stored on the server. You can find the database path in the geth.service file:

sudo nano /etc/systemd/system/geth.service

The path to the database is the value of the --datadir flag, and the user name to pass to sudo -u is given in the User= line.
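Instead of reading the service file by eye, you can pull the --datadir value out of it with standard text tools. The following is a minimal sketch, not an official GETH tool: extract_datadir and prune_command are hypothetical helper names, and the pattern assumes the path contains no spaces or quotes.

```shell
# Print the value of the first --datadir flag found on standard input.
# Handles both "--datadir /path" and "--datadir=/path".
extract_datadir() {
    grep -oE -- '--datadir[= ]+[^ \\]+' | head -n 1 | sed -E 's/^--datadir[= ]+//'
}

# Print (but do not run) the prune command for a given user and data directory.
# Usage: prune_command <user> <datadir>
prune_command() {
    echo "sudo -u $1 geth --datadir $2 snapshot prune-state"
}

# Hypothetical usage:
#   datadir=$(systemctl cat geth | extract_datadir)
#   prune_command geth "$datadir"
```

Printing the command first, rather than running it straight away, gives you a chance to check the user and path before starting a procedure that takes several hours.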

If everything is done correctly, the database reduction process will begin. It is divided into 3 stages:

The first stage is building the Bloom filter, which took just over 2 hours on my computer.

In the log output, elapsed shows how much time has passed and eta shows the estimated time until the end of the process.

The second stage is the actual pruning of the database, which takes about 4 more hours.

The third stage is compaction of the database, noted in the console as "Compacting database." This stage may take about one more hour.

Upon successful completion of the Ethereum validator database reduction, the console will show the message "State pruning successful."
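If you save the pruning output to a file, you can check for the success message afterwards instead of scrolling back through the TMUX window. This is a sketch under assumptions: the helper name pruning_succeeded and the log file name prune.log are made up for illustration, and the user and path in the usage comment are the example values from above.

```shell
# Succeed if the given log file contains the success message.
# Usage: pruning_succeeded <logfile>
pruning_succeeded() {
    grep -q "State pruning successful" "$1"
}

# Hypothetical usage, capturing the output (GETH logs to stderr) in prune.log:
#   sudo -u geth geth --datadir /var/lib/geth snapshot prune-state 2>&1 | tee prune.log
#   pruning_succeeded prune.log && echo "Pruning finished successfully"
```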

Turn off TMUX (note that this command ends all TMUX sessions on the server):

tmux kill-server

Next, start the GETH client and, if you stopped them as well, the consensus layer services:

sudo systemctl start geth

sudo systemctl start prysmbeacon

sudo systemctl start prysmvalidator
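After restarting, it is worth confirming that all services actually came up. The command systemctl is-active prints one state per line for each unit you name, and the sketch below checks that every line reads active. The helper name all_active is an assumption for illustration.

```shell
# Read service states from standard input, one per line, and succeed only
# if there is at least one line and every line is exactly "active".
all_active() {
    awk '
        { seen = 1; if ($0 != "active") bad = 1 }
        END { if (seen && !bad) exit 0; exit 1 }
    '
}

# Hypothetical usage with the service names from this guide:
#   systemctl is-active geth prysmbeacon prysmvalidator | all_active \
#       && echo "All services are running"
```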

Error: "Snapshot not old enough yet: need 128 more blocks"

This error occurs if GETH had not been running long enough before you started pruning. It is recommended that GETH runs for at least 35 minutes before pruning.

The same error also appears if you chose the wrong command for pruning the GETH client's database. Specify the correct path or use the other command format.
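The 128 blocks in the error message and the 35-minute recommendation are related by simple arithmetic. Since the Merge, Ethereum produces at most one block every 12 seconds, so 128 blocks take at least about 26 minutes; the extra margin presumably covers missed slots and startup time. The helper name minutes_for_blocks below is made up for this sketch.

```shell
# Minutes (rounded up) needed to produce a number of blocks.
# Usage: minutes_for_blocks <blocks> <seconds_per_block>
minutes_for_blocks() {
    # Adding 59 before dividing by 60 rounds up to the next whole minute.
    echo $(( ($1 * $2 + 59) / 60 ))
}

# 128 blocks at 12 seconds per block.
minutes_for_blocks 128 12
```

The call prints 26, which is why waiting noticeably less than half an hour after a restart triggers the error.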

Another way to start GETH Ethereum Database Pruning

With the GETH service stopped, edit the existing geth.service file:

sudo nano /etc/systemd/system/geth.service

Append the following to the end of the ExecStart command:

snapshot prune-state

All other parameters should remain unchanged.


Reload systemd:

sudo systemctl daemon-reload

Start the GETH service:

sudo systemctl start geth

You can monitor the progress of the database pruning with the command:

journalctl -fu geth

After completing the GETH database pruning, remove the line snapshot prune-state from the geth.service file:

sudo nano /etc/systemd/system/geth.service

Reload systemd:

sudo systemctl daemon-reload

Start the GETH service:

sudo systemctl start geth

The entire process of reducing the GETH client's database took about 7 hours on a server with a Ryzen 1700X processor, 32GB RAM, and a 2TB ADATA XPG GAMMIX S11 Pro SSD.

If your server configuration is significantly weaker, the overall time spent on reducing the database will be longer. Conversely, on a more powerful computer, pruning may take less time.

Conclusion: Pruning the database of an ETH validator running on the GETH client is a very useful function, as it allows the use of a much smaller SSD than would otherwise be required. It also helps if your 2TB SSD is already running out of space for the Ethereum validator. In just a few hours, you can reduce the database and avoid the roughly two-week full GETH resynchronization needed to move to Pebble, where database reduction occurs automatically.

If you decide not to switch to Pebble, remember to run the snapshot prune-state command periodically, while there is still at least 40GB of free space on your SSD.