Sep 13, 2020

Vacuum Me This, Vacuum Me That

I was reading up a bit on postgres vacuum commands. It caught my interest because we had this issue where some operation that should just be a fast search on an index was taking a lot longer than expected, causing performance problems with some of our deletes. Something with the whole story was off, and that's when it sometimes pays off to look at the details. Unfortunately I didn't really get an answer to this weird behaviour, but I kind of (re)learned some things about vacuuming that is worth knowing.

There are three commands that I believe are relevant to the sanity of a postgres database that have some distinction:

Let's go through them one by one. First I just want to mention though that this whole topic becomes interesting because postgres performs "soft" deletes rather than hard ones when you delete or update data. This is part of the MVCC (Multi Version Concurrency Control) scheme, which is what allows postgres to be high performant even when multiple transactions are operating around the same data.

VACUUM

This command goes into a table and makes sure to remove those soft deleted rows, clearing up this space and allows it to be claimed by new data. The information of where this new available space is available is given to the FSM (Free Space Map). If this is not performed, your DB will just swell and swell, even if you delete rows in your DB.

VACUUM FULL

This does the same as the above command, but also rebuilds the table to make it compact. It is kind of like running Vacuum in combination with a defragmentation

ANALYZE

This has nothing to do with Vacuum, but is also highly relevant for performance. This analyzes the data in a table to store histograms of possible values that are available. This statistical data is stored in the DB table pg_statistic. This data is then used by the query planner when trying to figure out what the most efficient way is to execute a query. Often you will see people writing the command as VACUUM ANALYZE, or ANALYZE EXPLAIN. These are actually then performing two commands. In the first case it first performs a vacuum and then an analyze on the table, in the second example first an analyze and then it explains a query.