Backup/restore enhancements
This patchset introduces the framework that all OSH-based database
systems can use to back up and restore their databases. The framework
is refactored from the PostgreSQL backup and restore logic. This will
prevent a lot of code duplication in the backup/restore scripts across
each cluster.

In the process, some improvements needed to be made:
1) Removing the need for 2 separate containers to do the backup
   and restore work to a remote gateway. This simplifies the design
   and enables a higher level of robustness.
2) Adding separate "days to keep" config value for remote backup files,
   as there may be different requirements for the remote files than the
   local backup files.
3) Adding capability to send Storage_Policy when creating the remote
   RGW swift container.
4) Making coding style improvements for readability and maintainability.
5) Fixing a deployment bug that occurs when remote backup is disabled.

Change-Id: I3a3482ad67320e89f04305b17da79abf7ad6eb45
Parsons, Cliff (cp769u) committed May 13, 2020
1 parent e53d287 commit 5a2babd
Showing 8 changed files with 877 additions and 468 deletions.
367 changes: 367 additions & 0 deletions helm-toolkit/templates/scripts/db-backup-restore/_backup_main.sh.tpl
@@ -0,0 +1,367 @@
{{- define "helm-toolkit.scripts.db-backup-restore.backup_main" }}
#!/bin/bash

# This file contains a database backup framework which database scripts
# can use to perform a backup. The idea here is that the database-specific
# functions will be implemented by the various databases using this script
# (like mariadb, postgresql or etcd for example). The database-specific
# script will need to first "source" this file like this:
# source /tmp/backup_main.sh
#
# Then the script should call the main backup function (backup_databases):
# backup_databases
#
# No arguments required. However, the framework will require the
# following variables to be exported:
#
# export DB_NAMESPACE Namespace where the database(s) reside
# export DB_NAME Name of the database system
# export LOCAL_DAYS_TO_KEEP Number of days to keep the local backups
# export REMOTE_DAYS_TO_KEEP Number of days to keep the remote backups
# export ARCHIVE_DIR Local location where the backup tarballs should
# be stored. (full directory path)
# export REMOTE_BACKUP_ENABLED "true" if remote backup enabled; false
# otherwise
# export CONTAINER_NAME Name of the container on the RGW to store
# the backup tarball.
# export STORAGE_POLICY Name of the storage policy defined on the
# RGW which is intended to store backups.
# RGW access variables:
# export OS_REGION_NAME Name of the region the RGW resides in
# export OS_AUTH_URL Keystone URL associated with the RGW
# export OS_PROJECT_NAME Name of the project associated with the
# keystone user
# export OS_USERNAME Name of the keystone user
# export OS_PASSWORD Password of the keystone user
# export OS_USER_DOMAIN_NAME Keystone domain the user belongs to
# export OS_PROJECT_DOMAIN_NAME Keystone domain the project belongs to
# export OS_IDENTITY_API_VERSION Keystone API version to use
#
# The following variables are optional:
# export RGW_TIMEOUT Number of seconds to wait for the
# connection to the RGW to be available
# when sending a backup to the RGW. Default
# is 1800 (30 minutes).
#
# The database-specific functions that need to be implemented are:
# dump_databases_to_directory <directory> <err_logfile>
# where:
# <directory> is the full directory path to dump the database files
# into. This is a temporary directory for this backup only.
# <err_logfile> is the full path of the file where error logs are to be
# written by the application.
# returns: 0 if no errors; 1 if any errors occurred
#
# This function is expected to dump the database file(s) to the specified
# directory path. If this function completes successfully (returns 0), the
# framework will automatically tar/gzip the files in that directory and
# name the tarball according to the convention
# <DB_NAME>.<DB_NAMESPACE>.all.<date-time>.tar.gz.
#
# The functions in this file will take care of:
# 1) Calling "dump_databases_to_directory" and then compressing the files,
# naming the tarball properly, and then storing it locally at the specified
# local directory.
# 2) Sending the tarball built to the remote gateway, to be stored in the
# container configured to store database backups.
# 3) Removing local backup tarballs which are older than the number of days
# specified by the "LOCAL_DAYS_TO_KEEP" variable.
# 4) Removing remote backup tarballs (from the remote gateway) which are older
# than the number of days specified by the "REMOTE_DAYS_TO_KEEP" variable.
#
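#
# For illustration only, a minimal database-specific backup script
# (hypothetical names and paths; not part of this framework) could look
# like the following. The RGW-related variables are only used when remote
# backup is enabled.
#
#   #!/bin/bash
#   set -x
#   source /tmp/backup_main.sh
#
#   export DB_NAMESPACE=openstack
#   export DB_NAME=mydb
#   export LOCAL_DAYS_TO_KEEP=3
#   export REMOTE_DAYS_TO_KEEP=0
#   export ARCHIVE_DIR=/var/backup/archive
#   export REMOTE_BACKUP_ENABLED=false
#
#   dump_databases_to_directory() {
#     DUMP_DIR=$1
#     LOG_FILE=$2
#     # Dump everything into the directory provided by the framework; the
#     # framework compresses the contents and names the tarball itself.
#     mydb-dump --all > ${DUMP_DIR}/mydb.all.sql 2>>${LOG_FILE}
#     return $?
#   }
#
#   backup_databases
#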

# Note: not using set -e in this script because more elaborate error handling
# is needed.
set -x

log_backup_error_exit() {
MSG=$1
ERRCODE=$2
log ERROR "${DB_NAME}_backup" "${MSG}"
rm -f $ERR_LOG_FILE
rm -rf $TMP_DIR
exit $ERRCODE
}

log() {
#Log message to a file or stdout
#TODO: This could be converted into a mail alert or an alert sent to a monitoring system
#Params: $1 log level
#Params: $2 service
#Params: $3 message
#Params: $4 Destination
LEVEL=$1
SERVICE=$2
MSG=$3
DEST=$4
DATE=$(date +"%m-%d-%y %H:%M:%S")
if [[ -z "$DEST" ]]; then
echo "${DATE} ${LEVEL}: $(hostname) ${SERVICE}: ${MSG}"
else
echo "${DATE} ${LEVEL}: $(hostname) ${SERVICE}: ${MSG}" >>$DEST
fi
}
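# Example call (the message goes to stdout since no destination is given):
#   log INFO "${DB_NAME}_backup" "Databases dumped successfully."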

#Get the difference in seconds between the current time and the given archive date
seconds_difference() {
ARCHIVE_DATE=$( date --date="$1" +%s )
if [[ $? -ne 0 ]]; then
# The date could not be parsed; report no difference so the file is kept.
echo 0
return
fi
CURRENT_DATE=$( date +%s )
SECOND_DELTA=$(($CURRENT_DATE-$ARCHIVE_DATE))
if [[ "$SECOND_DELTA" -lt 0 ]]; then
SECOND_DELTA=0
fi
echo $SECOND_DELTA
}
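# Example (illustrative): "seconds_difference 2020-05-13T00:00:00Z" prints the
# number of seconds elapsed since that timestamp (never negative).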

# Send the specified tarball file at the specified filepath to the
# remote gateway.
send_to_remote_server() {
FILEPATH=$1
FILE=$2
# Grab the list of containers on the remote site
RESULT=$(openstack container list 2>&1)
if [[ $? -eq 0 ]]; then
echo $RESULT | grep $CONTAINER_NAME
if [[ $? -ne 0 ]]; then
# Find the swift URL from the keystone endpoint list
SWIFT_URL=$(openstack catalog list -f value | grep -A5 swift | grep public | awk '{print $2}')
# Get a token from keystone
TOKEN=$(openstack token issue -f value -c id)
# Create the container
RES_FILE=$(mktemp -p /tmp)
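# Capture the response headers (curl -i) so the HTTP status can be checked below.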
curl -g -i -X PUT ${SWIFT_URL}/${CONTAINER_NAME} \
-H "X-Auth-Token: ${TOKEN}" \
-H "X-Storage-Policy: ${STORAGE_POLICY}" 2>&1 > $RES_FILE

if [[ $? -ne 0 || $(grep "HTTP" $RES_FILE | awk '{print $2}') -ge 400 ]]; then
log ERROR "${DB_NAME}_backup" "Error creating container ${CONTAINER_NAME}"
cat $RES_FILE
rm -f $RES_FILE
return 1
fi
rm -f $RES_FILE

swift stat $CONTAINER_NAME
if [[ $? -ne 0 ]]; then
log ERROR "${DB_NAME}_backup" "Error retrieving container ${CONTAINER_NAME} details after creation."
return 1
fi
fi
else
echo $RESULT | grep "HTTP 401"
if [[ $? -eq 0 ]]; then
log ERROR "${DB_NAME}_backup" "Access denied by keystone: ${RESULT}"
return 1
else
echo $RESULT | grep -E "ConnectionError|Failed to discover available identity versions"
if [[ $? -eq 0 ]]; then
log ERROR "${DB_NAME}_backup" "Could not reach the RGW: ${RESULT}"
# In this case, keystone or the site/node may be temporarily down.
# Return slightly different error code so the calling code can retry
return 2
else
log ERROR "${DB_NAME}_backup" "Could not get container list: ${RESULT}"
return 1
fi
fi
fi

# Create an object to store the file
openstack object create --name $FILE $CONTAINER_NAME $FILEPATH/$FILE || log ERROR "${DB_NAME}_backup" "Cannot create container object ${FILE}!"
openstack object show $CONTAINER_NAME $FILE
if [[ $? -ne 0 ]]; then
log ERROR "${DB_NAME}_backup" "Error retrieving container object $FILE after creation."
return 1
fi

log INFO "${DB_NAME}_backup" "Created file $FILE in container $CONTAINER_NAME successfully."
return 0
}

# This function attempts to store the built tarball to the remote gateway,
# with built-in logic to handle error cases like:
# 1) Network connectivity issues - retries for a specific amount of time
# 2) Authorization errors - immediately logs an ERROR and exits
store_backup_remotely() {
FILEPATH=$1
FILE=$2
# If the RGW_TIMEOUT has already been set, use that value, otherwise give it
# a default value.
if [[ -z $RGW_TIMEOUT ]]; then
RGW_TIMEOUT=1800
fi
ERROR_SEEN=false
DONE=false
TIMEOUT_EXP=$(( $(date +%s) + $RGW_TIMEOUT ))
while [[ $DONE == "false" ]]; do
# Store the new archive to the remote backup storage facility.
send_to_remote_server $FILEPATH $FILE
# Check if successful (capture the return code before the tests below overwrite $?)
RC=$?
if [[ $RC -eq 0 ]]; then
log INFO "${DB_NAME}_backup" "Backup file ${FILE} successfully sent to RGW."
DONE=true
elif [[ $RC -eq 2 ]]; then
# Temporary failure occurred. We need to retry if we have not timed out
log WARN "${DB_NAME}_backup" "Backup file ${FILE} could not be sent to RGW due to connection issue."
DELTA=$(( TIMEOUT_EXP - $(date +%s) ))
if [[ $DELTA -lt 0 ]]; then
DONE=true
log ERROR "${DB_NAME}_backup" "Timed out waiting for RGW to become available."
ERROR_SEEN=true
else
log INFO "${DB_NAME}_backup" "Sleeping 30 seconds waiting for RGW to become available..."
sleep 30
log INFO "${DB_NAME}_backup" "Retrying..."
fi
else
log ERROR "${DB_NAME}_backup" "Backup file ${FILE} could not be sent to the RGW."
ERROR_SEEN=true
DONE=true
fi
done
if [[ $ERROR_SEEN == "true" ]]; then
log ERROR "${DB_NAME}_backup" "Errors encountered. Exiting."
return 1
fi
return 0
}

remove_old_local_archives() {
log INFO "${DB_NAME}_backup" "Deleting backups older than ${LOCAL_DAYS_TO_KEEP} days"
if [[ -d $ARCHIVE_DIR ]]; then
for ARCHIVE_FILE in $(ls -1 $ARCHIVE_DIR/*.gz); do
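# The backup date is the fourth dot-delimited field of the tarball name
# (format: <DB_NAME>.<DB_NAMESPACE>.all.<date>.tar.gz).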
ARCHIVE_DATE=$( echo $ARCHIVE_FILE | awk -F/ '{print $NF}' | cut -d'.' -f 4)
if [[ "$(seconds_difference $ARCHIVE_DATE)" -gt "$(($LOCAL_DAYS_TO_KEEP*86400))" ]]; then
log INFO "${DB_NAME}_backup" "Deleting file $ARCHIVE_FILE."
rm -rf $ARCHIVE_FILE
if [[ $? -ne 0 ]]; then
# Log error but don't exit so we can finish the script
# because at this point we haven't sent backup to RGW yet
log ERROR "${DB_NAME}_backup" "Cannot remove ${ARCHIVE_FILE}"
fi
else
log INFO "${DB_NAME}_backup" "Keeping file ${ARCHIVE_FILE}."
fi
done
fi
}

remove_old_remote_archives() {
log INFO "${DB_NAME}_backup" "Deleting backups older than ${REMOTE_DAYS_TO_KEEP} days"
BACKUP_FILES=$(mktemp -p /tmp)
DB_BACKUP_FILES=$(mktemp -p /tmp)
openstack object list $CONTAINER_NAME > $BACKUP_FILES
if [[ $? -ne 0 ]]; then
log_backup_error_exit "Could not obtain a list of current backup files in the RGW" 1
fi
# Filter out other types of backup files
cat $BACKUP_FILES | grep $DB_NAME | grep $DB_NAMESPACE | awk '{print $2}' > $DB_BACKUP_FILES
for ARCHIVE_FILE in $(cat $DB_BACKUP_FILES); do
ARCHIVE_DATE=$( echo $ARCHIVE_FILE | awk -F/ '{print $NF}' | cut -d'.' -f 4)
if [[ "$(seconds_difference ${ARCHIVE_DATE})" -gt "$((${REMOTE_DAYS_TO_KEEP}*86400))" ]]; then
log INFO "${DB_NAME}_backup" "Deleting file ${ARCHIVE_FILE} from the RGW"
openstack object delete $CONTAINER_NAME $ARCHIVE_FILE || log_backup_error_exit "Cannot delete container object ${ARCHIVE_FILE}!" 1
fi
done
# Cleanup now that we're done.
rm -f $BACKUP_FILES $DB_BACKUP_FILES
}

# Main function to back up the databases. The calling script must export
# ARCHIVE_DIR (the directory where the final compressed backup is kept).
# A temporary directory for the database dump files is created automatically
# and deleted after the backup is done.
backup_databases() {
# Create necessary directories if they do not exist.
mkdir -p $ARCHIVE_DIR || log_backup_error_exit "Cannot create directory ${ARCHIVE_DIR}!"
export TMP_DIR=$(mktemp -d) || log_backup_error_exit "Cannot create temp directory!"
# Create temporary log file
export ERR_LOG_FILE=$(mktemp -p /tmp) || log_backup_error_exit "Cannot create log file!"
# It is expected that this function will dump the database files to the $TMP_DIR
dump_databases_to_directory $TMP_DIR $ERR_LOG_FILE
# If successful, there should be at least one file in the TMP_DIR
if [[ $? -ne 0 || $(ls $TMP_DIR | wc -w) -eq 0 ]]; then
cat $ERR_LOG_FILE
log_backup_error_exit "Backup of the ${DB_NAME} database failed and needs attention."
fi
log INFO "${DB_NAME}_backup" "Databases dumped successfully. Creating tarball..."
NOW=$(date +"%Y-%m-%dT%H:%M:%SZ")
TARBALL_FILE="${DB_NAME}.${DB_NAMESPACE}.all.${NOW}.tar.gz"
cd $TMP_DIR || log_backup_error_exit "Cannot change to directory $TMP_DIR"
#Archive the current database files
tar zcvf $ARCHIVE_DIR/$TARBALL_FILE *
if [[ $? -ne 0 ]]; then
log_backup_error_exit "Backup tarball could not be created."
fi
# Get the size of the file
ARCHIVE_SIZE=$(ls -l $ARCHIVE_DIR/$TARBALL_FILE | awk '{print $5}')

log INFO "${DB_NAME}_backup" "Tarball $TARBALL_FILE created successfully."

cd $ARCHIVE_DIR

# Remove the temporary directory and files as they are no longer needed.
rm -rf $TMP_DIR
rm -f $ERR_LOG_FILE

#Only delete the old archive after a successful archive
if [[ "$LOCAL_DAYS_TO_KEEP" -gt 0 ]]; then
remove_old_local_archives
fi

if [[ "$REMOTE_BACKUP_ENABLED" == "true" ]]; then
store_backup_remotely $ARCHIVE_DIR $TARBALL_FILE
if [[ $? -ne 0 ]]; then
log_backup_error_exit "Backup could not be sent to remote RGW."
fi

#Only delete the old archive after a successful archive
if [[ "$REMOTE_DAYS_TO_KEEP" -gt 0 ]]; then
remove_old_remote_archives
fi

# Turn off trace just for a clearer printout of backup status - for manual backups, mainly.
set +x
echo "=================================================================="
echo "Local backup and backup to remote RGW successful!"
echo "Backup archive name: $TARBALL_FILE"
echo "Backup archive size: $ARCHIVE_SIZE"
echo "=================================================================="
set -x
else
# Remote backup is not enabled. This is ok; at least we have a local backup.
log INFO "${DB_NAME}_backup" "Skipping remote backup, as it is not enabled."

# Turn off trace just for a clearer printout of backup status - for manual backups, mainly.
set +x
echo "=================================================================="
echo "Local backup successful!"
echo "Backup archive name: $TARBALL_FILE"
echo "Backup archive size: $ARCHIVE_SIZE"
echo "=================================================================="
set -x
fi
}
{{- end }}