Using an Azure Function to back up Azure Blob Storage

There is no built-in backup solution for Azure Storage containers, so I decided to build my own using Azure Functions.

I’ll do this in Visual Studio 2017. You can also find a description of how to create an Azure Function in Visual Studio here: https://docs.microsoft.com/en-us/azure/azure-functions/functions-develop-vs.

PS: There is a command-line tool called AzCopy.exe that is very good, but it isn’t available as a library or NuGet package. You can find more information about it here.

Prerequisites

Visual Studio 2017 version 15.3 (or later) with the Azure Development workload

My requirements

To do the backup, there were some decisions I made beforehand:

  • I’ll be using Azure Functions, but another option is to use a WebJob.
  • It should run at regular intervals (i.e. TimerTrigger).
    • Another option is to trigger the function every time there is a new blob. I didn’t go for this option, since I may want to copy the same source blob several times.
  • I want to skip blobs that are already in the destination.
    • I will first get all blob names from the destination, and use them to filter the source blobs.
      • Note: This means that I will not discover if a source blob is updated (a new source blob with the same name).
  • I want it to report the total number of blobs currently in both the source and the destination storage (using the built-in TraceWriter log).
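
The skip-existing requirement boils down to a set difference on blob names. Here is a minimal sketch of that idea, using plain strings instead of the Storage SDK types. BlobFilter and NamesToCopy are hypothetical names used only for illustration; they are not part of the project described in this post.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class BlobFilter
{
    // Returns the source names that have no counterpart in the destination.
    // Names are compared case-sensitively (ordinal), as blob names are case-sensitive.
    public static List<string> NamesToCopy(
        IEnumerable<string> sourceNames,
        IEnumerable<string> destinationNames)
    {
        var existing = new HashSet<string>(destinationNames, StringComparer.Ordinal);
        return sourceNames.Where(name => !existing.Contains(name)).ToList();
    }
}
```

With source names a.txt, b.txt, c.txt and destination name b.txt, only a.txt and c.txt are selected. Because only names are compared, a source blob that was overwritten under the same name is never selected again, which is the limitation noted in the list above.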

Create a new Azure Functions project

Create a new Azure Functions project:

In Solution Explorer, right-click on your project node and select Add > New Item. Select Azure Function, type a Name for the class, and click Add.

Choose TimerTrigger:

The value for the Schedule is a CRON expression with 6 fields separated by spaces:

{second} {minute} {hour} {day} {month} {day-of-week}

A value of */5 in the minute field means “every 5 minutes”. It’s shorthand for listing every matching minute value (0, 5, 10, and so on up to 55). In addition to / and comma, you can also use a hyphen (-) to define a range.

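I want it to run every 5 minutes on weekdays between 6am and 6pm, so change the schedule to the expression below (the hour range 6-18 is inclusive, so the last run of the day is at 18:55; use 6-17 if it should stop before 6pm):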

"0 */5 6-18 * * MON-FRI"

The default time zone used with CRON expressions is UTC. You might want to set the app setting WEBSITE_TIME_ZONE to your local time zone. You can find more information about how to set this value here: https://docs.microsoft.com/en-us/azure/azure-functions/functions-bindings-timer
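
To get a feel for what the schedule matches, here is a small hand-written sketch that mirrors the expression field by field. It is not how the Functions runtime evaluates CRON expressions (the runtime uses its own parser); ScheduleSketch, Matches and CountRunsOn are hypothetical names for illustration only.

```csharp
using System;

public static class ScheduleSketch
{
    // Mirrors "0 */5 6-18 * * MON-FRI" by hand, one condition per field.
    public static bool Matches(DateTime t)
    {
        bool weekday = t.DayOfWeek != DayOfWeek.Saturday
                    && t.DayOfWeek != DayOfWeek.Sunday;
        return t.Second == 0               // {second} = 0
            && t.Minute % 5 == 0           // {minute} = */5
            && t.Hour >= 6 && t.Hour <= 18 // {hour}   = 6-18 (inclusive)
            && weekday;                    // {day-of-week} = MON-FRI
    }

    // Counts how many times the schedule fires on the given calendar day.
    public static int CountRunsOn(DateTime day)
    {
        int runs = 0;
        for (var t = day.Date; t < day.Date.AddDays(1); t = t.AddMinutes(1))
        {
            if (Matches(t)) runs++;
        }
        return runs;
    }
}
```

Since the hour range is inclusive, a weekday gets 13 hours of 12 runs each, and 18:55 still matches while 19:00 does not.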

You should now have a file called Function1.cs that looks like this:

using System;
using Microsoft.Azure.WebJobs;
using Microsoft.Azure.WebJobs.Host;

namespace FunctionApp1
{
    public static class Function1
    {
        [FunctionName("Function1")]
        public static void Run([TimerTrigger("0 */5 6-18 * * MON-FRI")]TimerInfo myTimer, TraceWriter log)
        {
            log.Info($"C# Timer trigger function executed at: {DateTime.Now}");
        }
    }
}

Configuration files

You also have two configuration files that we are going to update:

  • host.json
  • local.settings.json

host.json

This file lets you configure the Functions host. These settings apply both when running locally and in Azure. For more information, see the host.json reference article.

We are only going to set the tracing values. Change it so it looks like this:

{
  "tracing": {
    "consoleLevel": "verbose",
    "fileLoggingMode": "debugOnly"
  }
}

local.settings.json

This file maintains settings used when running functions locally. These settings are not used by Azure; they are used by the Azure Functions Core Tools. Use this file to specify settings, such as connection strings to other Azure services. Add a new key to the Values collection for each connection required by functions in your project. For more information, see Local settings file in the Azure Functions Core Tools topic.

Set the connection strings and container names for the source and destination. It should look something like this:

{
  "IsEncrypted": false,
  "Values": {
    "AzureWebJobsStorage": "********",
    "AzureWebJobsDashboard": "********",
    "SourceContainerName": "source",
    "DestinationContainerName": "destination"
  },
  "ConnectionStrings": {
    "SourceConnectionString": "********",
    "DestinationConnectionString": "********"
  }
}

Azure Storage SDK

We then need to add the NuGet packages for working with Azure Storage:

  • Microsoft.WindowsAzure.ConfigurationManager (I used version 3.2.3)
  • WindowsAzure.Storage (I used version 8.4.0)

We create a new class to handle all the work with Azure Storage. Add a new file called StorageHelper.cs, and replace the code with this:

using System;
using System.Collections.Generic;
using System.Linq;
using Microsoft.WindowsAzure.Storage;
using Microsoft.WindowsAzure.Storage.Blob;
using Microsoft.WindowsAzure.Storage.RetryPolicies;
using System.Threading.Tasks;
using Microsoft.Azure.WebJobs.Host;

namespace FunctionApp1
{
    public class StorageHelper
    {
        private const int RetryDeltaBackOff = 4;
        private const int RetryMaxAttemts = 5;
        private const int RetryMaxExecutionTime = 30;

        private readonly TraceWriter _logger;
        private readonly string _sourceConnectionString;
        private readonly string _destinationConnectionString;
        private readonly string _sourceContainerName;
        private readonly string _destinationContainerName;

        public StorageHelper(
            TraceWriter logger,
            string sourceConnectionString, string sourceContainerName,
            string destinationConnectionString, string destinationContainerName)
        {
            _logger = logger;
            _sourceConnectionString = sourceConnectionString;
            _destinationConnectionString = destinationConnectionString;
            _sourceContainerName = sourceContainerName;
            _destinationContainerName = destinationContainerName;
        }

        public void BackupNewFiles()
        {
            BackupNewFilesAsync().GetAwaiter().GetResult();
        }

        private async Task BackupNewFilesAsync()
        {
            var containers = GetContainers();
            var newBlobs = GetBlobsToCopy(containers);

            var sourceSas = GetSourceSas(containers.source);
            foreach (var sourceBlob in newBlobs)
            {
                try
                {
                    var destinationBlob = containers.destination.GetBlockBlobReference(sourceBlob.Name);
                    await CopyBlockBlobAsync(sourceSas, sourceBlob, destinationBlob);
                }
                catch (Exception ex)
                {
                    _logger.Error(ex.Message, ex);
                    throw;
                }
            }
        }

        private (CloudBlobContainer source, CloudBlobContainer destination) GetContainers()
        {
            var source = GetContainer(_sourceConnectionString, _sourceContainerName);
            var destination = GetContainer(_destinationConnectionString, _destinationContainerName);
            return (source, destination);
        }

        private static CloudBlobContainer GetContainer(
            string connectionString,
            string containerName)
        {
            CloudStorageAccount storageAccount = CloudStorageAccount.Parse(connectionString);
            CloudBlobClient blobClient = storageAccount.CreateCloudBlobClient();
            blobClient.DefaultRequestOptions.RetryPolicy = new ExponentialRetry(TimeSpan.FromSeconds(RetryDeltaBackOff), RetryMaxAttemts);
            blobClient.DefaultRequestOptions.MaximumExecutionTime = TimeSpan.FromSeconds(RetryMaxExecutionTime);
            CloudBlobContainer container = blobClient.GetContainerReference(containerName);
            container.CreateIfNotExists();

            return container;
        }

        private List<CloudBlockBlob> GetBlobsToCopy((CloudBlobContainer source, CloudBlobContainer destination) containers)
        {
            // Flat listing returns every blob in the container; OfType skips anything that is not a block blob
            // (a direct cast would throw on page or append blobs).
            var sourceBlobs = containers.source.ListBlobs(null, true).OfType<CloudBlockBlob>().ToList();
            _logger.Info($"{sourceBlobs.Count} blobs in source container.");
            var destinationBlobs = containers.destination.ListBlobs(null, true).OfType<CloudBlockBlob>().ToList();
            _logger.Info($"{destinationBlobs.Count} blobs in destination container.");

            // Compare by name only, using a set so the lookup stays fast for large containers.
            var destinationNames = new HashSet<string>(destinationBlobs.Select(b => b.Name));
            var newBlobs = sourceBlobs.Where(b => !destinationNames.Contains(b.Name)).ToList();
            _logger.Info($"{newBlobs.Count} blobs to back up.");
            return newBlobs;
        }

        private static string GetSourceSas(CloudBlobContainer sourceBlobContainer)
        {
            var sourceSas = sourceBlobContainer.GetSharedAccessSignature(new SharedAccessBlobPolicy()
            {
                SharedAccessStartTime = DateTime.UtcNow.AddMinutes(-15),
                SharedAccessExpiryTime = DateTime.UtcNow.AddDays(1),
                Permissions = SharedAccessBlobPermissions.Read
            });
            return sourceSas;
        }

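        private async Task CopyBlockBlobAsync(
            string sourceSas,
            CloudBlockBlob sourceBlob,
            CloudBlockBlob destinationBlob)
        {
            _logger.Verbose($"Start copy: '{sourceBlob.Name}'...");
            await destinationBlob.StartCopyAsync(new Uri($"{sourceBlob.Uri}{sourceSas}"));
            // The copy runs server-side; poll until it leaves the Pending state.
            while (destinationBlob.CopyState.Status == CopyStatus.Pending)
            {
                await Task.Delay(500);
                await destinationBlob.FetchAttributesAsync();
            }
            // Anything other than Success (e.g. Failed or Aborted) should not pass silently.
            if (destinationBlob.CopyState.Status != CopyStatus.Success)
            {
                throw new InvalidOperationException(
                    $"Copy of '{sourceBlob.Name}' ended with status {destinationBlob.CopyState.Status}.");
            }
            _logger.Verbose($"End copy: '{sourceBlob.Name}'.");
        }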
    }
}

I will not go through this code in detail, but note the following:

  • I’m using the new C# 7 tuple type (e.g. the return type of the method GetContainers).
  • The code only handles block blobs. Change the code if you want to handle other types, such as append blobs.
  • We append a SAS token to the Uri in the call to destinationBlob.StartCopyAsync. Since my container’s access policy is Private, I would get a “not found” exception without the SAS token.
  • GetBlobsToCopy compares source blobs with destination blobs, so we only copy blobs that don’t already exist in the destination.
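
One more thing worth noting is that the polling loop in CopyBlockBlobAsync keeps waiting for as long as the copy is pending. If you would rather give up after a while, a generic wait-with-timeout helper could look roughly like the sketch below. Polling and WaitUntilAsync are hypothetical names that are not part of the Storage SDK, and the helper is deliberately independent of the blob types.

```csharp
using System;
using System.Threading.Tasks;

public static class Polling
{
    // Calls isDone repeatedly until it returns true or the timeout elapses.
    // Returns true if the condition was met, false if the wait timed out.
    public static async Task<bool> WaitUntilAsync(
        Func<Task<bool>> isDone, TimeSpan interval, TimeSpan timeout)
    {
        var deadline = DateTime.UtcNow + timeout;
        while (!await isDone())
        {
            if (DateTime.UtcNow >= deadline)
            {
                return false;
            }
            await Task.Delay(interval);
        }
        return true;
    }
}
```

In the helper class, the isDone callback would refresh the destination blob’s attributes and report whether its copy status is no longer Pending. What to do on a timeout (log, abort the copy, retry on the next run) is a design decision I have left open here.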

Azure Function code

We now have a helper class to handle the Azure Storage work. Update the Function1.cs code, so it looks like this:

using System;
using System.Configuration;
using Microsoft.Azure;
using Microsoft.Azure.WebJobs;
using Microsoft.Azure.WebJobs.Host;

namespace FunctionApp1
{
    public static class Function1
    {
        [FunctionName("Function1")]
        public static void Run([TimerTrigger("0 */5 6-18 * * MON-FRI")]TimerInfo myTimer, TraceWriter log)
        {
            log.Info($"Function1 function executed at: {DateTime.Now}");

            string sourceConnectionString = GetConnectionString("SourceConnectionString", log);
            string destinationConnectionString = GetConnectionString("DestinationConnectionString", log);
            string sourceContainerName = GetSetting("SourceContainerName", log);
            string destinationContainerName = GetSetting("DestinationContainerName", log);

            var storageHelper = new StorageHelper(
                log,
                sourceConnectionString,
                sourceContainerName,
                destinationConnectionString,
                destinationContainerName);

            storageHelper.BackupNewFiles();

            log.Info($"Function1 function finished at: {DateTime.Now}");
        }

        private static string GetConnectionString(string key, TraceWriter log)
        {
            try
            {
                string connectionString = ConfigurationManager.ConnectionStrings?[key]?.ConnectionString;
                if (!string.IsNullOrEmpty(connectionString))
                {
                    log.Info($"Got value for '{key}'");
                    return connectionString;
                }
                throw new ArgumentException($"Connection string for '{key}' is not set.");
            }
            catch (Exception ex)
            {
                var msg = ex.ToString();
                log.Error(msg);
                throw;
            }
        }

        private static string GetSetting(string key, TraceWriter log)
        {
            try
            {
                string setting = CloudConfigurationManager.GetSetting(key, true);
                if (!string.IsNullOrEmpty(setting))
                {
                    log.Info($"Setting '{key}' to '{setting}'");
                    return setting;
                }
                throw new ArgumentException($"No value for '{key}'.");
            }
            catch (Exception ex)
            {
                var msg = ex.ToString();
                log.Error(msg);
                throw;
            }
        }
    }
}

Notice that we do some error checking when getting connection strings and application settings. If you don’t, the errors you get when you forget to set these values can be very difficult to understand.
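
The same fail-fast idea can be written without the configuration libraries. Application settings are also exposed to a function as environment variables, so a small helper along these lines should work as an alternative for the plain settings. This is a sketch under that assumption; Settings and Require are hypothetical names, and the lookup delegate is there mainly so the logic can be tried without touching real environment variables.

```csharp
using System;

public static class Settings
{
    // Looks up a required setting and fails with a message that names the missing key.
    public static string Require(string key, Func<string, string> lookup)
    {
        string value = lookup(key);
        if (string.IsNullOrEmpty(value))
        {
            throw new InvalidOperationException($"Required setting '{key}' is not set.");
        }
        return value;
    }

    // Convenience overload that reads from the process environment.
    public static string Require(string key)
    {
        return Require(key, Environment.GetEnvironmentVariable);
    }
}
```

Calling Settings.Require("SourceContainerName") then either returns the value or throws an exception whose message tells you exactly which key to add.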

Run Locally

To test that it’s working, I put a blob in my source container, and ran the application locally.

A cmd window pops up with a nice logo for Azure Functions, and some other information. When the function got triggered, it logged the following to the cmd window:

Note that you might have to wait a few minutes, since it runs at exactly 5 past, 10 past, etc.

If you get an error similar to “can’t debug class library project”, you don’t have Azure Functions installed. Read this.

Publishing to Azure

Right-click your project, and choose Publish…

You should see this window:

Click Publish

You should then get the “Create App Service” window:

Fill in your values, and click Create

Wait until you get a line saying “Publish Succeeded.” in the Output window:

Find your function in the Azure Portal, click Monitor, and if it has already run, you should get an error:

We have to set the application settings in Azure. These are the same values as we set in the local.settings.json file.

Click on your function app, then Application settings:

Fill in your two connection strings, and your two settings:

Scroll all the way up, and click Save

You should now start seeing green runs:

2 Comments on “Using Azure function to backup Azure Blob Storage”

  1. There’s a typo at line 75 in StorageHelper.cs “RetryMaxAttemts” which throws a compiler exception. Also, this is very nice, but I noticed some of the module references are outdated. Do you have a newer/updated version you could share? Thank you!

    • Hi
      This was done as part of a work assignment I had last year. Unfortunately I don’t work with this anymore, so they will not be updated 🙁

      PS: Didn’t expect anyone to read these posts. I made them in case I got a similar assignment again. But thank you for the comment.

      Arve