Cross-Account Cross-Region Backups for Disaster Recovery

Bhanu Reddy
Jul 3, 2023 · 7 min read
Architectural Diagram

OBJECTIVE :

The objective of this blog is to outline the process of creating backups in source accounts and securely storing copies of those backups in a dedicated backup account for disaster recovery and data protection. By implementing cross-account backups, we ensure that in the event of a disaster, account deletion, or resource deletion, the backups in the designated backup account remain available. Access to the backup account is restricted to authorized individuals to maintain data security and confidentiality.

OVERVIEW :

In this backup strategy, we use two types of backups: automated backups taken with the AWS Backup service, and native backups implemented as Python scripts running in AWS Lambda. The Lambda-based backups are scheduled to run daily with Amazon EventBridge and are stored in an Amazon S3 bucket. AWS Backup covers the services it supports natively, while the Lambda scripts cover services such as ElastiCache (Redis), Route 53, Secrets Manager, EMR, and DMS.

PROCEDURE :

STEP: 1 Create backup vaults in the AWS backup account and the source accounts.

  • Do not use the default backup vault or the default KMS key to encrypt the data.
  • Create an individual backup vault for each AWS service.
  • Create and use a KMS CMK (customer managed key) to encrypt the backup vaults.
  • Apply a backup vault lock in governance mode.
  • Apply an access policy to each backup vault so that only specific users can access it (see the boto3 sketch below).
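
The following is a minimal boto3 sketch of these vault settings. The vault name, account ID, and admin role ARN are placeholders for illustration, not values from the original setup; adapt them to your own accounts and services.

import boto3
import json

region = 'ap-south-1'
kms = boto3.client('kms', region_name=region)
backup = boto3.client('backup', region_name=region)

# Create a customer managed key (CMK) to encrypt the vault
key_arn = kms.create_key(Description='CMK for AWS Backup vaults')['KeyMetadata']['Arn']

# Create a service-specific vault encrypted with the CMK (name is a placeholder)
vault_name = 'rds-backup-vault'
backup.create_backup_vault(BackupVaultName=vault_name, EncryptionKeyArn=key_arn)

# Vault lock: omitting ChangeableForDays keeps the lock in governance mode
backup.put_backup_vault_lock_configuration(
    BackupVaultName=vault_name,
    MinRetentionDays=3,
    MaxRetentionDays=30
)

# Example access policy: deny recovery-point deletion to everyone except a backup-admin role
policy = {
    "Version": "2012-10-17",
    "Statement": [{
        "Sid": "RestrictVaultAccess",
        "Effect": "Deny",
        "Principal": "*",
        "Action": "backup:DeleteRecoveryPoint",
        "Resource": "*",
        "Condition": {
            "StringNotLike": {"aws:PrincipalArn": "arn:aws:iam::111111111111:role/backup-admin"}
        }
    }]
}
backup.put_backup_vault_access_policy(BackupVaultName=vault_name, Policy=json.dumps(policy))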

STEP: 2 Create backup plans for each AWS service.

  • In the backup plan options, use the "Build a new plan" option.
  • Tag the backup plan with the proper backup-type and backup-component tags.
  • Use a service-specific naming convention for the plan and choose the service-specific backup vault to store the backups.
  • Choose a backup window during non-peak hours, starting at 2:00 AM IST, with a 1-hour start window and a 5-hour completion window from the start of the backup.
  • Set transition to cold storage to "never" and the retention period to "3 days" in the source account.
  • Set the copy destination to "ap-south-1" and copy to the other account's service-specific vault using its ARN.
  • In the destination backup vault, set the retention period to "7 days" (a boto3 sketch of such a plan follows the note below).

NOTE: A copy that is both cross-account and cross-Region in a single copy action is not supported yet; it may arrive as a future feature.
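
Below is a minimal boto3 sketch of such a backup plan. The plan name, vault names, destination account ID, and tag values are placeholders; the cron expression corresponds to 2:00 AM IST (20:30 UTC).

import boto3

backup = boto3.client('backup', region_name='ap-south-1')

# ARN of the service-specific vault in the backup account (placeholder)
destination_vault_arn = 'arn:aws:backup:ap-south-1:222222222222:backup-vault:rds-backup-vault'

plan = backup.create_backup_plan(
    BackupPlan={
        'BackupPlanName': 'rds-daily-backup-plan',   # service-specific name
        'Rules': [{
            'RuleName': 'daily-0200-ist',
            'TargetBackupVaultName': 'rds-backup-vault',
            'ScheduleExpression': 'cron(30 20 * * ? *)',   # 2:00 AM IST
            'StartWindowMinutes': 60,          # start within 1 hour
            'CompletionWindowMinutes': 300,    # complete within 5 hours
            'Lifecycle': {'DeleteAfterDays': 3},   # no transition to cold storage
            'CopyActions': [{
                'DestinationBackupVaultArn': destination_vault_arn,
                'Lifecycle': {'DeleteAfterDays': 7}
            }]
        }]
    },
    BackupPlanTags={'backup-type': 'automated', 'backup-component': 'rds'}
)
print(plan['BackupPlanId'])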

STEP: 3 Assign the resources to the backup plan.

  • Give an AWS service-specific name to "Resource assignment name".
  • In the resource selection, select the specific AWS service related to this plan.
  • Choose the required resources in that AWS service to back up.
  • Create the resource assignment and monitor the backup jobs and copy jobs (see the boto3 sketch below).
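
As a sketch, the same resource assignment can be created with boto3 as shown below; the plan ID, IAM role ARN, and resource ARN are placeholders for illustration.

import boto3

backup = boto3.client('backup', region_name='ap-south-1')

backup.create_backup_selection(
    BackupPlanId='<backup-plan-id>',   # ID returned by create_backup_plan
    BackupSelection={
        'SelectionName': 'rds-resource-assignment',   # service-specific name
        'IamRoleArn': 'arn:aws:iam::111111111111:role/aws-backup-service-role',
        'Resources': ['arn:aws:rds:ap-south-1:111111111111:db:my-rds-instance']
    }
)

# Spot-check failed backup and copy jobs
for job in backup.list_backup_jobs(ByState='FAILED')['BackupJobs']:
    print(job['BackupJobId'], job.get('StatusMessage'))
for job in backup.list_copy_jobs(ByState='FAILED')['CopyJobs']:
    print(job['CopyJobId'], job.get('StatusMessage'))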

STEP: 4 Create native backups for resources that AWS Backup does not support.

  • Create an S3 bucket with AWS service-specific folders in the source account.
  • Create Lambda functions for each native service and trigger them daily using Amazon EventBridge (see the sketch after this list).
  • Make sure to back up this S3 bucket with the AWS Backup service and store that backup in the backup account.
  • Apply a lifecycle configuration to the backup S3 bucket in the source account to reduce storage cost.
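
The sketch below shows one way to wire the daily EventBridge trigger and the lifecycle rule with boto3. The rule name, function ARN, account ID, and expiration period are placeholders.

import boto3

region = 'ap-south-1'
events = boto3.client('events', region_name=region)
lambda_client = boto3.client('lambda', region_name=region)
s3 = boto3.client('s3', region_name=region)

rule_name = 'daily-native-backups'
function_arn = 'arn:aws:lambda:ap-south-1:111111111111:function:redis-backup'
bucket_name = 'backup-bucket-name'

# Daily schedule at 2:00 AM IST (20:30 UTC)
events.put_rule(Name=rule_name, ScheduleExpression='cron(30 20 * * ? *)', State='ENABLED')
events.put_targets(Rule=rule_name, Targets=[{'Id': 'redis-backup', 'Arn': function_arn}])

# Allow EventBridge to invoke the Lambda function
lambda_client.add_permission(
    FunctionName=function_arn,
    StatementId='allow-eventbridge-daily-backups',
    Action='lambda:InvokeFunction',
    Principal='events.amazonaws.com',
    SourceArn=f'arn:aws:events:{region}:111111111111:rule/{rule_name}'
)

# Lifecycle rule: expire native backup objects after 7 days to reduce storage cost
s3.put_bucket_lifecycle_configuration(
    Bucket=bucket_name,
    LifecycleConfiguration={'Rules': [{
        'ID': 'expire-native-backups',
        'Filter': {'Prefix': ''},
        'Status': 'Enabled',
        'Expiration': {'Days': 7}
    }]}
)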

Python boto3 scripts :

1. To take the backup of Redis (ElastiCache) snapshots and dump them in the S3 bucket

import boto3
from datetime import date

def lambda_handler(event, context):
    # Specify the region
    region = 'ap-south-1'

    # Specify the S3 bucket name
    bucket_name = 'backup-bucket-name'

    # Create an ElastiCache client
    elasticache_client = boto3.client('elasticache', region_name=region)

    # Retrieve the list of automated snapshots
    response = elasticache_client.describe_snapshots(SnapshotSource='automated')

    # Get the current date
    current_date = date.today().strftime("%Y-%m-%d")

    # Keep only the snapshots whose name contains the current day's date
    snapshots = response['Snapshots']
    current_day_snapshots = [
        snapshot for snapshot in snapshots
        if current_date in snapshot.get('SnapshotName', '')
    ]

    if current_day_snapshots:
        print("Automated Snapshots with Current Day's Date:")
        for snapshot in current_day_snapshots:
            snapshot_name = snapshot['SnapshotName']
            target_folder = f"redis/{current_date}"
            target_snapshot_name = f"{target_folder}/{snapshot_name}"
            try:
                # Export the snapshot to the S3 bucket under a date-wise prefix
                elasticache_client.copy_snapshot(
                    TargetBucket=bucket_name,
                    SourceSnapshotName=snapshot_name,
                    TargetSnapshotName=target_snapshot_name
                )
                print(f"Snapshot '{snapshot_name}' copied to S3 bucket '{bucket_name}' in '{target_folder}' folder")
            except Exception as e:
                print(f"Error copying snapshot '{snapshot_name}' to S3 bucket: {str(e)}")
    else:
        print("No automated snapshots found with the current day's date.")

    return {
        'statusCode': 200,
        'body': 'Automated snapshots copied to S3 bucket successfully'
    }

2. To take the backup of Route 53 hosted zones and dump them in the S3 bucket

import boto3
import datetime
import json

def lambda_handler(event, context):
    # Create a timestamp for the backup folder and file
    timestamp = datetime.datetime.now().strftime("%Y-%m-%d")

    # Initialize the Route 53 client
    route53_client = boto3.client('route53')

    # Retrieve the Route 53 hosted zones
    hosted_zones = route53_client.list_hosted_zones()['HostedZones']

    # Iterate over each hosted zone and back up its configuration and record sets
    for zone in hosted_zones:
        zone_id = zone['Id']
        zone_name = zone['Name']

        # Retrieve the zone's configuration
        zone_config = route53_client.get_hosted_zone(Id=zone_id)['HostedZone']

        # Retrieve the zone's record sets
        record_sets = route53_client.list_resource_record_sets(HostedZoneId=zone_id)['ResourceRecordSets']

        # Create a dictionary to store the zone's configuration and record sets
        backup_content = {
            'ZoneId': zone_id,
            'ZoneName': zone_name,
            'Configuration': zone_config,
            'RecordSets': record_sets
        }

        # Create the backup folder and file names
        backup_folder_name = f"route53/{timestamp}"
        backup_file_name = f"route53-backup-{zone_name.replace('.', '-')}-{timestamp}.json"

        # Save the backup file to an S3 bucket in the date-based folder
        s3_client = boto3.client('s3')
        s3_key = f"{backup_folder_name}/{backup_file_name}"
        s3_client.put_object(
            Body=json.dumps(backup_content),
            Bucket='backup-bucket-name',
            Key=s3_key
        )

    # Return a success message
    return {
        'statusCode': 200,
        'body': f"Route 53 backup completed. Backup files saved in the '{backup_folder_name}' folder of 'backup-bucket-name'."
    }

3. To take the backup of secrets from Secrets Manager and dump them in the S3 bucket

import boto3
import datetime
import json

def lambda_handler(event, context):
    # Create a timestamp for the backup folder and files
    timestamp = datetime.datetime.now().strftime("%Y-%m-%d")

    # Initialize the Secrets Manager client
    secrets_manager_client = boto3.client('secretsmanager')

    # Retrieve a list of all secrets
    secrets = secrets_manager_client.list_secrets()['SecretList']

    # Iterate over each secret and back up its content
    for secret in secrets:
        secret_name = secret['Name']

        # Retrieve the secret value
        secret_value = secrets_manager_client.get_secret_value(SecretId=secret_name)['SecretString']

        # Create a dictionary to store the secret's content
        backup_content = {
            'SecretName': secret_name,
            'SecretValue': secret_value
        }

        # Create the backup folder and file names
        backup_folder_name = f"secrets/{timestamp}"
        backup_file_name = f"secrets-backup-{secret_name.replace('/', '-')}-{timestamp}.json"

        # Save the backup file to an S3 bucket in the date-based folder
        s3_client = boto3.client('s3')
        s3_key = f"{backup_folder_name}/{backup_file_name}"
        s3_client.put_object(
            Body=json.dumps(backup_content),
            Bucket='backup-bucket-name',
            Key=s3_key
        )

    # Return a success message
    return {
        'statusCode': 200,
        'body': f"Secrets backup completed. Backup files saved in the '{backup_folder_name}' folder of 'backup-bucket-name'."
    }

4. To take the backup of EMR cluster configurations and dump them in the S3 bucket

import boto3
import datetime
import json

def lambda_handler(event, context):
    # Define the EMR client
    emr_client = boto3.client('emr')

    try:
        # List all EMR clusters
        response = emr_client.list_clusters()
        clusters = response['Clusters']

        # Iterate over each cluster and take the backup
        for cluster in clusters:
            # Get the cluster ID
            cluster_id = cluster['Id']

            # Get the cluster configuration
            cluster_config = emr_client.describe_cluster(ClusterId=cluster_id)

            # Store the cluster configuration in an S3 bucket with date-wise folders
            backup_bucket = 'backup-bucket-name'
            backup_folder = 'emr'
            backup_date = datetime.datetime.now().strftime('%Y-%m-%d')
            backup_filename = f'{backup_folder}/{backup_date}/{cluster_id}_config.json'

            # Save the cluster configuration to S3 as JSON
            # (default=str converts datetime values that json cannot serialize)
            s3_client = boto3.client('s3')
            s3_client.put_object(
                Body=json.dumps(cluster_config, default=str),
                Bucket=backup_bucket,
                Key=backup_filename
            )

            # Success message
            backup_location = f's3://{backup_bucket}/{backup_filename}'
            print(f'EMR cluster configuration backup completed for cluster {cluster_id}. Backup file: {backup_location}')

        return {
            'statusCode': 200,
            'body': 'EMR cluster configuration backup completed successfully.'
        }

    except Exception as e:
        # Error handling
        print(f'Error occurred: {str(e)}')

        return {
            'statusCode': 500,
            'body': 'Error occurred during EMR cluster configuration backup.'
        }

5. To take the backup of DMS replication tasks and dump them in the S3 bucket

import boto3
import json
import datetime

def convert_datetime_to_string(obj):
    # json.dumps default handler: render datetime objects as strings
    if isinstance(obj, datetime.datetime):
        return obj.__str__()

def create_folder(s3_client, bucket_name, folder_path):
    # Create a zero-byte object ending in '/' so the prefix appears as a folder
    s3_client.put_object(Bucket=bucket_name, Key=(folder_path + '/'))

def lambda_handler(event, context):
    # Create a DMS client
    dms = boto3.client('dms')

    # List all DMS tasks
    response = dms.describe_replication_tasks(MaxRecords=100)

    # Convert datetime objects to strings
    response = json.loads(json.dumps(response, default=convert_datetime_to_string))

    # Get the current date
    current_date = datetime.datetime.now().strftime('%Y-%m-%d')

    # Define the S3 bucket name and folder path
    bucket_name = 'backup-bucket-name'
    folder_path = 'dms/' + current_date + '/'
    file_name = f'dms_replication_tasks_{current_date}.json'
    s3_key = folder_path + file_name

    # Create an S3 client
    s3 = boto3.client('s3')

    # Check if the folder exists, create it if it doesn't
    try:
        s3.head_object(Bucket=bucket_name, Key=folder_path)
    except s3.exceptions.ClientError:
        create_folder(s3, bucket_name, folder_path)

    # Store the response in S3
    s3.put_object(
        Body=json.dumps(response),
        Bucket=bucket_name,
        Key=s3_key
    )

    print(f"Replication tasks description stored in S3: s3://{bucket_name}/{s3_key}")

    # Print all replication tasks
    print("Replication Tasks:")
    for task in response['ReplicationTasks']:
        print(f"Task ID: {task['ReplicationTaskIdentifier']}")
        print(f"Status: {task['Status']}")
        print("---")

CONCLUSION :

To ensure a comprehensive backup strategy, set up the proper permissions for the backup role, monitor backup and copy jobs regularly, and address any failures by identifying and resolving the underlying errors. Validate permissions for the KMS CMK and the backup vaults in the destination account to maintain data security. Additionally, create a daily report of failed backup and copy jobs and store it in an S3 bucket for easy access and review; a sketch of such a report follows. Implementing these measures enhances backup operations, ensures data integrity, and enables efficient error management in your backup solution.
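
A minimal Lambda sketch for such a daily failed-jobs report is shown below; the bucket name and report prefix are placeholders.

import boto3
import json
import datetime

backup = boto3.client('backup', region_name='ap-south-1')
s3 = boto3.client('s3', region_name='ap-south-1')

def lambda_handler(event, context):
    # Look back over the last 24 hours
    since = datetime.datetime.utcnow() - datetime.timedelta(days=1)
    today = datetime.date.today().strftime('%Y-%m-%d')

    failed_backups = backup.list_backup_jobs(ByState='FAILED', ByCreatedAfter=since)['BackupJobs']
    failed_copies = backup.list_copy_jobs(ByState='FAILED', ByCreatedAfter=since)['CopyJobs']

    report = {
        'Date': today,
        'FailedBackupJobs': [
            {'JobId': j['BackupJobId'],
             'ResourceArn': j.get('ResourceArn'),
             'StatusMessage': j.get('StatusMessage')}
            for j in failed_backups
        ],
        'FailedCopyJobs': [
            {'JobId': j['CopyJobId'],
             'ResourceArn': j.get('ResourceArn'),
             'StatusMessage': j.get('StatusMessage')}
            for j in failed_copies
        ]
    }

    # Store the report in a date-wise folder of the backup bucket (placeholder names)
    s3.put_object(
        Bucket='backup-bucket-name',
        Key=f'reports/{today}/failed-jobs-report.json',
        Body=json.dumps(report, default=str)
    )

    return {'statusCode': 200, 'body': f'Failed-jobs report written for {today}'}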
