Well, few days ago I saw databse size of few terabytes which never was VACUUM
-ed. Or, at least, not for last years.
And one more fun fact, project database
projectname
was absolutely empty. All data in this instance was somehow stored in systempostgres
database.
In Postgres, there’s a vacuum feature and autovacuum process, which executes it from time to time. Last one is a mystery separate process, which acts on your data, produces IO spikes and does its own process stuff.
Do you need it? Yes, of course, it saves you database from uncontrolled growing in size.
Let’s say, you have row A
with some data, e.g. foo
.
Then you update this row and set it contents to bar
.
PostgreSQL supports concurrenct reads and writes which are often happen in transactions. So it needs to save old value to provide transactions started before this update with correct data.
New value called tuple and old value called dead tuple.
When a dead tuple is not needed for any running transaction, it’s safe to remove and that’s when autovacuum comes in: to clean things up.
Yeah, pretty like generic garbage collection process.
So this stale data could be stored forever, increasing database size and make DB less perfomant.
Now, here is really useful article for fine-tuning autovacuum process: 2ndquadrant.com/autovacuum-tuning-basics
And here you will find more detailed description on how VACUUM
works and best practices: