Checking for changes to db before backing up

I have a cronjob that backs up my db daily. The problem I am having is that most days I just access the data in the db and don’t make material changes such as adding / altering / removing passwords or entries. So why would I want to make a backup of the db (and all the other stuff the wiki recommends) if the only changes in the db are things like how many devices are connected or 2FA entries?

I was just using diff, which is not fine-grained enough. Now I am looking at sqldiff, but the only thing that appears to do what I want is the --table option.

So a couple of questions -

  1. Is my assumption correct that changes to tables like ‘users’ and ‘devices’ alone do not warrant a backup?
  2. It is a batch script that I am running, but I would prefer not to have tens of diff statements to check all the tables that should trigger a backup. I worry that would be brittle, and I would prefer to err on the side of too many backups rather than not enough. What I was thinking was having the script select a list of tables in the db, filter out some tables (like ‘users’), and then feed that list into a loop that sqldiffs each table (see the rough sketch after this list). Is there an easier way that I am missing?
  3. What are the tables that you think should be excluded when checking for changes to the db that warrant another backup?
  4. Are changes to the “attachments” and “sends” folders reflected in the db? i.e. do I need to be diffing those folders when considering a backup or is there always a corresponding change to a table in the db?
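Roughly what I have in mind for question 2 is something like the following untested sketch (the paths and the skip list are just placeholders, and it assumes sqlite3 and sqldiff are installed):

OLD=./previous/db.sqlite3
NEW=./data/db.sqlite3
skip="users devices twofactor_incomplete"
for t in $(sqlite3 "$NEW" .tables); do
    # skip the tables whose changes should not trigger a backup
    case " $skip " in *" $t "*) continue ;; esac
    if [ -n "$(sqldiff --table "$t" "$OLD" "$NEW")" ]; then
        echo "Table $t changed - a new backup is warranted"
        break
    fi
done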

Thanks,
Dennis

Excluding any table could cause issues in the future, also with restoring. What if there is an important change that you miss because you excluded the table it was in?

In my opinion a backup should back up everything from a specific application. The only thing that could be excluded is logs, and along those lines the events table could maybe be skipped.

I’m not sure what your concern is with the backup that makes you want to exclude things. The database probably isn’t that big.

I suggest looking into incremental backups if storage is your concern.
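For example (just an illustration, the paths are made up), rsync with --link-dest gives you space-efficient incremental snapshots by hard-linking files that did not change against the previous copy:

# each dated snapshot only uses extra space for files that actually changed
rsync -a --link-dest=/backups/vaultwarden/latest \
    /srv/vaultwarden/data/ "/backups/vaultwarden/$(date +%Y%m%d)/"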

I myself have a Nextcloud environment to which the whole data folder is synced every 15 minutes, and I create backups of the database every 5 hours. Because I use SQLite, I have that file synced every 15 minutes as well.

That Nextcloud server then creates incremental backups.

I’d rather have a bit too much than too little in my backup.

To be clear, I am wondering what table changes should elicit a backup of all the tables. I am not talking about backing up only certain tables.

And you are saying that you “sync” to Nextcloud as a backup. Does that mean you are replacing the backup file each time? My solution will be copying each new dump to Amazon Glacier, and I would prefer not to clutter it up with hundreds of copies of the db that have no meaningful differences.

Thanks

Only the tables twofactor_incomplete, events and devices come to mind to exclude, when they are the only tables that were changed. All the rest are needed, unless you want to start looking at the column level, but that would be tricky in my opinion.

My 5-hour backups have the date and time in the file name, and I keep a maximum of 7 days of those. Nextcloud also removes them during sync, so there is no clutter.
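If you are not using Nextcloud for the cleanup, something along these lines (the path and pattern are placeholders) does the same pruning:

# remove dumps older than 7 days
find /backups/vaultwarden -name "db_*.sqlite3" -type f -mtime +7 -delete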

Do you know what the “twofactor” table is for? Is it tracking login activity or doing something more?

And do you have any thoughts on number 4 above?

It is used for storing the two-factor token data. So, if you add, change or remove an MFA method, that table changes. Without that table MFA is lost, which would not be a huge usability issue, because you can still log in afterwards, just without 2FA anymore.

In theory those tables are reflected in the folders. But sends could also change without the sends directory changing, because sends can also be text-only items.

If I am understanding you correctly, changes to sends would always be reflected in a change to the db, but not necessarily in the sends folder itself.

I will post a bash script here in case others have a similar thought or want to fix / improve my script.

Thanks for your help in this and for the work you do on this project. It is appreciated.

#!/bin/bash
#https://github.com/dani-garcia/vaultwarden/wiki/Backing-up-your-vault
#https://vaultwarden.discourse.group/t/checking-for-changes-to-db-before-backing-up/2432/3
# Authored by Dennis Marwood on March 6 2023. Author assumes no liability whatsoever. Use at your own risk or not at all.
# I want to create a backup of the vaultwarden db each day only if it has meaningful changes.
# Changes such as someone logging in are not a reason to make another backup of the entire db.
# You should manually set the tables that are to be ignored here.
skip=("__diesel_schema_migrations" "twofactor_incomplete" "event" "devices")
# Not tested, but the script also checks for schema changes (as opposed to data changes), which should trigger a backup.
# Example cron entry:
# @daily /.../bash_scripts/backup_vw_db.sh >> /.../vaultwarden/logs/backup.log
# Requires sqlite3.
# TODO -
# Tie this into fail2ban to generate an email when it fails or performs a backup.

# Working directory should be the parent directory of your data folder.
cd /.../vaultwarden || exit 1

create_vw_backup () {
    #create folder to hold all files needed for db restoration
    new_dmp="./vw_db_backup_$(date +%Y%m%d_%H_%M_%S)"
    mkdir "$new_dmp"

    #Add files you want backed up other than the db here
    cp -r /.../vaultwarden/data/attachments "$new_dmp"
    cp -r /.../vaultwarden/data/sends "$new_dmp"

    #Create a backup of the db and store it in the newly created folder
    sqlite3 /.../vaultwarden/data/db.sqlite3 "VACUUM INTO '$new_dmp/db.sqlite3';"
}

#Where is the new backup going to be stored? I am using rclone, alter as needed for your application
copy_backup () {
    #Rclone will not overwrite existing backups.
    if [ -d "$new_dmp" ]; then
        target_dmp=$new_dmp
        printf "Copying %s...\n" "$target_dmp"
        /usr/local/bin/docker-compose -f /.../docker-compose.yml run rclone copy --ignore-existing --create-empty-src-dirs -v /vw/$target_dmp /.../vw_db_backups/$target_dmp 2>&1
    else
        printf "Warning! - The backup of the vw db did not complete.\n"
    fi
    exit
}

# On the initial run assume a backup is needed.
if [ -z "$(find . -maxdepth 1 -name 'vw_db_backup_*' -type d)" ]; then
    create_vw_backup
    copy_backup
fi

#The existing dmp contains the last backed up file.
existing_dmp=$(find . -name "vw_db_backup_*" -type d)

# Create the new dmp to compare against the old.
create_vw_backup

#Any time there is a schema change perform a backup
if [ "$(sqldiff --schema "$new_dmp/db.sqlite3" "$existing_dmp/db.sqlite3")" ]; then 
    printf "Found column name and/or table differences in the schema."
    rm -r $existing_dmp
    copy_backup
fi

all_tables=($(sudo sqlite3 $new_dmp/db.sqlite3 .tables))

#Loop over every table in the new dump
printf "Looking for changes to tables...\nSkipping table(s): "
for w in "${all_tables[@]}"; do
    #Skip the tables that should be ignored
    if [[ "${skip[*]}" =~ "${w}" ]]; then
        printf "%s " "$w"
        continue
    fi

    #Check whether any relevant table has changed
    if [ -n "$(sqldiff --table "$w" "$existing_dmp/db.sqlite3" "$new_dmp/db.sqlite3")" ]; then
        printf "\nChanges were found between:\n%s\n%s\nRemoving the outdated db dump: %s\n" "$existing_dmp" "$new_dmp" "$existing_dmp"
        rm -r "$existing_dmp"
        copy_backup
        break
    fi
done

if [ -d "$new_dmp" ]; then
    printf "\nNo changes found between:\n%s\n%s\nNo backup required.\nRemoving: %s\n" $existing_dmp $new_dmp $new_dmp
    rm -r $new_dmp
fi