After building out our cloud infrastructure exclusively on AWS for several years, our company recently started delving into Azure. Besides being a great learning experience, it’s always good to understand how the basic building blocks of the cloud (compute, storage, permissions, observability) are implemented across providers.

Even if you build your product to run on an industry-standard platform, the state of the cloud is such that differences in implementation can have a significant impact on migration efforts. Infrastructure-as-code solutions are not cloud-agnostic; a complete rewrite of all your module and environment code is usually necessary. In short, it’s more than just running sed -i 's/aws/azure/g' on your code base.

With that said, there are obvious advantages to aiming for standards compliance – even if the code is different, the same deployment tools and practices apply whether you’re deploying to AWS, Azure, or GCP. By factoring the infrastructure stuff out of the business logic, development teams can focus on writing good, secure code without worrying too much about how or where (or even whether) it will run.

The Stack

I’m going to focus here on the fourth building block – observability – and how I built a log delivery pipeline on Azure to complement the one we had already built with AWS.

Both cloud providers offer a variety of managed load balancer solutions. On AWS, we’re using Application Load Balancer to front our workload running on EKS. The load balancers log all traffic to an S3 storage bucket. Those log writes trigger a Lambda function which parses and forwards the logs to Datadog.

In Azure land, Azure Application Gateway forwards traffic to our workload running on AKS. The Application Gateway is configured through a diagnostic setting to write logs to Azure blob storage. An Azure Function App is triggered by the logs and, once again, forwards the logs to Datadog. The end result, for either cloud, is the same. But more importantly, both of these pipelines are deployed using the same tools, from the same VCS repository.

Thankfully, Datadog provides serverless code for both AWS and Azure to do the log forwarding. But the documentation is a bit thin on how to actually go about deploying it.

Inspired by some similar write-ups, I sought out a way to do exactly that.

Azure Functions

A lot has been written about serverless architectures, but one of the biggest pitfalls I’ve encountered with every single serverless platform I’ve had the misfortune of touching is the amount of voodoo necessary to actually get serverless code running.

Also, in every case, that voodoo looks completely different.

It might simply be because serverless hasn’t had time to coalesce around a standard way of doing things the way other, more mature orchestration frameworks have. But as of early 2021, every time I encounter something serverless, I can’t shake the feeling that the whole thing is held together with duct tape and crossed fingers.

But anyway.

I followed the Azure Quickstart for Azure Functions to get a feel for how to create a new serverless deployment using their CLI tool. Creating a new JavaScript function generates a function.json file which, among other things, defines how that function is triggered. Datadog provides a sample function.json with their Azure serverless code, but I had to do a lot of extra reading to understand what’s going on behind the scenes.
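
For reference, the quickstart boils down to a couple of Azure Functions Core Tools commands, something like this (the project and function names here are placeholders, and the exact template name may differ between Core Tools versions):

func init datadog-forwarder --worker-runtime node
cd datadog-forwarder
func new --name datadog_logs_monitoring --template "Azure Blob Storage trigger"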

One thing that took me a while to understand was Azure Functions pricing tiers. It turns out that when you create an Azure Function App through the UI (or through the CLI with the --consumption-plan-location argument), you’re implicitly creating a dynamic (i.e. serverless) App Service plan to host it. This is Azure’s pay-as-you-go function hosting tier, and the one that will seem most familiar to folks coming from other serverless platforms. For some unknown reason, Azure calls this the Consumption plan. I wish I knew why.

Another confusing bit is the bindings.connection entry in the function.json manifest. It turns out that if you leave this field blank, the runtime falls back to the storage account named by the AzureWebJobsStorage application setting – so as long as you’ve configured your application settings properly (which we’ll do in the deployment section below), your blob storage-triggered function will just work. Magic voodoo.
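
To make that a bit more concrete, a blob storage-triggered function.json has roughly this shape (a sketch rather than Datadog’s exact file; the path assumes the App Gateway access log category, which diagnostic settings write to a container named insights-logs-applicationgatewayaccesslog):

{
  "bindings": [
    {
      "name": "blobContent",
      "type": "blobTrigger",
      "direction": "in",
      "path": "insights-logs-applicationgatewayaccesslog/{name}",
      "connection": ""
    }
  ]
}

The name field just has to match the parameter your index.js handler receives; the empty connection is what triggers the AzureWebJobsStorage fallback described above.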


Once everything is set up properly, you should have a set of files that looks like this:

.
├── function
│   ├── datadog_logs_monitoring
│   │   ├── function.json
│   │   └── index.js
│   ├── host.json
│   └── local.settings.json
└── ...

The first two files are from the Datadog repo linked above. host.json is the app-level manifest for the multi-function app, generated by the Azure quickstart. local.settings.json is optional here; it gets ignored on deploy, as you’ll see soon.

From the function directory, you can actually go ahead and deploy the function to Azure using the CLI tool – no, not az, but func, the Azure Functions Core Tools (func azure functionapp publish <app name>) – as long as you’ve set up the Azure App Service infrastructure already.

I guess one of the appeals of the serverless bandwagon is cutting out the ops middle-man by allowing devs to deploy directly from their workstations to production with a simple CLI command. But we’re going to make things complicated again and do it in Terraform.

Deployment

The rest of this post is heavily borrowed from Adrian Hall’s excellent write-up on deploying an Azure Function App with Terraform. I think it says something about the Azure docs that a personal blog post by a Microsoft employee was my primary reference for how to do this, but I digress.

The basic idea is this: use Terraform to set up the Azure Function App infrastructure along with all of the requisite parts. Then, using the archive provider, package the code into a zip file, upload it to blob storage, and … duct tape and crossed fingers.

Input variables are as follows (a sketch of the corresponding variable declarations comes right after the list):

  • prefix: resource name prefix
  • tags: a map of tags
  • resource_group_name: name of the resource group where your Application Gateway is deployed
  • azurerm_application_gateway_id: ID of the App Gateway
  • storage_account_name: name of a storage account where App Gateway logs and function code will be uploaded to
  • datadog_api_key: Datadog API key
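
Declared as Terraform variables, these would look something like the following (a sketch; the types and descriptions are mine, and sensitive on the API key assumes Terraform 0.14 or newer):

variable "prefix" {
  type        = string
  description = "Resource name prefix"
}

variable "tags" {
  type        = map(string)
  description = "Tags applied to resources and forwarded to Datadog as DD_TAGS"
}

variable "resource_group_name" {
  type        = string
  description = "Resource group where the Application Gateway is deployed"
}

variable "azurerm_application_gateway_id" {
  type        = string
  description = "ID of the App Gateway"
}

variable "storage_account_name" {
  type        = string
  description = "Storage account for App Gateway logs and function code"
}

variable "datadog_api_key" {
  type        = string
  description = "Datadog API key"
  sensitive   = true
}

The module itself: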
data "azurerm_monitor_diagnostic_categories" "app_gw" {
  resource_id = var.azurerm_application_gateway_id
}

data "azurerm_storage_account" "storage" {
  name                = var.storage_account_name
  resource_group_name = var.resource_group_name
}

# Diagnostic setting to push App Gateway access logs to storage
resource "azurerm_monitor_diagnostic_setting" "app_gw_logs" {
  name               = "${var.prefix}-app-gw-logs"
  target_resource_id = var.azurerm_application_gateway_id
  storage_account_id = data.azurerm_storage_account.storage.id

  dynamic "log" {
    for_each = data.azurerm_monitor_diagnostic_categories.app_gw.logs
    content {
      category = log.key
      retention_policy {
        enabled = true
        days    = 7
      }
    }
  }

  dynamic "metric" {
    for_each = data.azurerm_monitor_diagnostic_categories.app_gw.metrics
    content {
      category = metric.key
      retention_policy {
        enabled = true
        days    = 7
      }
    }
  }
}

resource "time_rotating" "sas" {
  rotation_days = 365
}

# Obtain Shared Access Signature token for our function to read from our storage account
data "azurerm_storage_account_sas" "sas" {
  connection_string = data.azurerm_storage_account.storage.primary_connection_string
  https_only        = true
  start             = formatdate("YYYY-MM-DD", time_rotating.sas.id)
  expiry            = formatdate("YYYY-MM-DD", timeadd(time_rotating.sas.id, "${time_rotating.sas.rotation_days * 24}h"))
  resource_types {
    object    = true
    container = false
    service   = false
  }
  services {
    blob  = true
    queue = false
    table = false
    file  = false
  }
  permissions {
    read    = true
    write   = false
    delete  = false
    list    = false
    add     = false
    create  = false
    update  = false
    process = false
  }
}

# Storage container where our function code will be uploaded
resource "azurerm_storage_container" "function_releases" {
  name                  = "function-releases"
  storage_account_name  = data.azurerm_storage_account.storage.name
  container_access_type = "private"
}

# Create our function zip archive
data "archive_file" "function" {
  type        = "zip"
  source_dir  = "${path.module}/function"
  output_path = "${path.module}/function.zip"
  # exclude paths are matched relative to source_dir
  excludes = [
    "local.settings.json",
    ".gitignore"
  ]
}

data "azurerm_resource_group" "rg" {
  name = var.resource_group_name
}

# Actually upload it to storage
resource "azurerm_storage_blob" "datadog_logs_sender" {
  type                   = "Block"
  storage_account_name   = data.azurerm_storage_account.storage.name
  storage_container_name = azurerm_storage_container.function_releases.name
  name                   = format("function_%s.zip", data.archive_file.function.output_sha)
  source                 = data.archive_file.function.output_path
}

# The 'implicit' app service plan discussed earlier
resource "azurerm_app_service_plan" "datadog_logs_sender" {
  name                = format("%s-%s", "datadog-logs-sender", data.azurerm_resource_group.rg.name)
  location            = data.azurerm_resource_group.rg.location
  resource_group_name = data.azurerm_resource_group.rg.name
  kind                = "functionapp"
  reserved            = true

  sku {
    tier = "Dynamic" # we're using serverless mode
    size = "Y1"
  }
}

resource "azurerm_function_app" "datadog_logs_sender" {
  name                       = format("%s-%s", "datadog-logs-sender", data.azurerm_resource_group.rg.name)
  location                   = data.azurerm_resource_group.rg.location
  resource_group_name        = data.azurerm_resource_group.rg.name
  app_service_plan_id        = azurerm_app_service_plan.datadog_logs_sender.id
  storage_account_name       = data.azurerm_storage_account.storage.name
  storage_account_access_key = data.azurerm_storage_account.storage.primary_access_key
  os_type                    = "linux"

  # Environment variables for our function runtime along with some voodoo
  # to tell Azure where to find our function code
  app_settings = {
    DD_SOURCE                    = "azure-application-gateway"
    DD_SERVICE                   = var.azurerm_application_gateway_id
    DD_SOURCE_CATEGORY           = "azure"
    DD_SITE                      = "datadoghq.com"
    DD_TAGS                      = join(",", [for k, v in var.tags : "${k}:${v}"])
    DD_API_KEY                   = var.datadog_api_key
    FUNCTIONS_WORKER_RUNTIME     = "node"
    WEBSITE_NODE_DEFAULT_VERSION = "~12"
    WEBSITE_RUN_FROM_PACKAGE = format(
      "https://%s.blob.core.windows.net/%s/%s%s",
      data.azurerm_storage_account.storage.name,
      azurerm_storage_container.function_releases.name,
      azurerm_storage_blob.datadog_logs_sender.name,
      data.azurerm_storage_account_sas.sas.sas
    )
  }
}
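
Calling the module from an environment then looks something like this (the module path and values are placeholders for illustration):

module "datadog_log_forwarder" {
  source = "./modules/datadog-log-forwarder"

  prefix                         = "prod"
  tags                           = { env = "prod", team = "platform" }
  resource_group_name            = "prod-network-rg"
  azurerm_application_gateway_id = azurerm_application_gateway.main.id
  storage_account_name           = "prodappgwlogs"
  datadog_api_key                = var.datadog_api_key
}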

The only essential part that remains is to configure Datadog to properly grok the timestamp field from the logs. This can be done by defining a new log pipeline with a filter like source:azure-application-gateway, and adding a date remapper to pull the timestamp from the timeStamp field. There are a couple of other handy built-in processors available in Datadog to parse things like HTTP status codes and user agent strings that I recommend using as well.
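
If you happen to manage Datadog itself with Terraform, the same pipeline can be sketched with the Datadog provider along these lines (a sketch; the pipeline and processor names are arbitrary, and the extra status code and user agent processors are left out):

resource "datadog_logs_custom_pipeline" "app_gateway" {
  name       = "Azure Application Gateway"
  is_enabled = true

  filter {
    query = "source:azure-application-gateway"
  }

  processor {
    date_remapper {
      name       = "Use timeStamp as the official log date"
      is_enabled = true
      sources    = ["timeStamp"]
    }
  }
}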


Resources

  1. https://github.com/DataDog/datadog-serverless-functions
  2. https://itnext.io/datadog-azure-api-management-logs-60f9d45d667a
  3. https://adrianhall.github.io/typescript/2019/10/23/terraform-functions/
  4. https://docs.microsoft.com/en-us/azure/azure-functions/
  5. https://docs.microsoft.com/en-us/azure/azure-functions/consumption-plan
  6. https://docs.microsoft.com/en-us/azure/azure-functions/functions-run-local
  7. https://docs.microsoft.com/en-us/azure/azure-functions/create-first-function-cli-node?tabs=azure-cli%2Cbrowser
  8. https://docs.microsoft.com/en-us/azure/azure-functions/functions-bindings-storage-blob-trigger?tabs=javascript#configuration