2023-07-29

Using Vault with Go

Today we're going to use Vault to keep an application's configuration in memory only. This makes debugging harder (the config is in memory, not on disk), but a bit more secure (an attacker who gets in has to read process memory to learn the credentials).

The flow looks something like this:

1. Set up the Vault service in a separate directory (vault-server/Dockerfile):

FROM hashicorp/vault

RUN apk add --no-cache bash jq

COPY reseller1-policy.hcl /vault/config/reseller1-policy.hcl
COPY terraform-policy.hcl /vault/config/terraform-policy.hcl
COPY init_vault.sh /init_vault.sh

EXPOSE 8200

ENTRYPOINT [ "/init_vault.sh" ]

HEALTHCHECK \
    --start-period=5s \
    --interval=1s \
    --timeout=1s \
    --retries=30 \
        CMD [ "/bin/sh", "-c", "[ -f /tmp/healthy ]" ]

2. Define the reseller1 policy (the "user" for the app) and the terraform policy (terraform is just a name; we don't use Terraform here, it could be any tool that provisions or deploys the app, e.g. any CD pipeline), something like this:

# terraform-policy.hcl
path "auth/approle/role/dummy_role/secret-id" {
  capabilities = ["update"]
}

path "secret/data/dummy_config_yaml/*" {
  capabilities = ["create","update","read","patch","delete"]
}

path "secret/dummy_config_yaml/*" { # v1
  capabilities = ["create","update","read","patch","delete"]
}

path "secret/metadata/dummy_config_yaml/*" {
  capabilities = ["list"]
}

# reseller1-policy.hcl
path "secret/data/dummy_config_yaml/reseller1/*" {
  capabilities = ["read"]
}

path "secret/dummy_config_yaml/reseller1/*" { # v1
  capabilities = ["read"]
}
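
Each policy lists two path forms because a KV version 2 engine puts data/ (or metadata/) right after the mount name in its API paths, while the lines marked v1 cover the plain form without that prefix. Here is a minimal Go sketch of that mapping; kvV2DataPath and kvV2MetadataPath are hypothetical helper names, not part of the Vault client library:

```go
package main

import (
	"fmt"
	"strings"
)

// kvV2DataPath returns the API path that holds the secret payload for a key
// in a KV version 2 engine mounted at mount.
// Hypothetical helper for illustration only.
func kvV2DataPath(mount, key string) string {
	return strings.Trim(mount, "/") + "/data/" + strings.TrimLeft(key, "/")
}

// kvV2MetadataPath returns the API path used to list keys or read metadata.
// Hypothetical helper for illustration only.
func kvV2MetadataPath(mount, key string) string {
	return strings.Trim(mount, "/") + "/metadata/" + strings.TrimLeft(key, "/")
}

func main() {
	key := "dummy_config_yaml/reseller1/region99"
	fmt.Println(kvV2DataPath("secret", key))
	fmt.Println(kvV2MetadataPath("secret", key))
}
```

The first printed path is the one the reseller1 policy grants read on, and the second falls under the path the terraform policy is allowed to list.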

3. Then we need an init script for Docker (init_vault.sh) that does the required setup when the container starts (insert policies, create the AppRole, reset the token for the provisioner), something like this:

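#!/bin/sh
set -e

export VAULT_ADDR='http://127.0.0.1:8200'
export VAULT_FORMAT='json'

# this script replaces the image's default entrypoint,
# so the dev server has to be started here before logging in
vault server -dev -dev-listen-address="0.0.0.0:8200" &
sleep 1s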
vault login -no-print "${VAULT_DEV_ROOT_TOKEN_ID}"
vault policy write terraform-policy /vault/config/terraform-policy.hcl
vault policy write reseller1-policy /vault/config/reseller1-policy.hcl
vault auth enable approle

# configure AppRole
vault write auth/approle/role/dummy_role \
    token_policies=reseller1-policy \
    token_num_uses=0 \
    secret_id_ttl="32d" \
    token_ttl="32d" \
    token_max_ttl="32d"

# overwrite token for provisioner
vault token create \
    -id="${TERRAFORM_TOKEN}" \
    -policy=terraform-policy \
    -ttl="32d"

# mark the container healthy (the Dockerfile HEALTHCHECK tests for this file)
touch /tmp/healthy

# keep container alive
tail -f /dev/null & trap 'kill %1' TERM ; wait

5. Now that everything is set up, we can create a Docker Compose file (docker-compose.yaml) to start it all with the proper environment variables injected, something like this:

version: '3.3'
services:
  testvaultserver1:
    build: ./vault-server/
    cap_add:
      - IPC_LOCK
    environment:
      VAULT_DEV_ROOT_TOKEN_ID: root
      APPROLE_ROLE_ID:         dummy_app
      TERRAFORM_TOKEN:         dummyTerraformToken
    ports:
      - "8200:8200"

# run with: docker compose up 

6. Now that the Vault server is up, we can run a script (it should be run by the provisioner/CD) that retrieves an AppRole secret ID and writes it to /tmp/secret, then writes our app configuration (config.yaml) to Vault under the key dummy_config_yaml/reseller1/region99, something like this:

TERRAFORM_TOKEN=$(grep TERRAFORM_TOKEN docker-compose.yaml | cut -d':' -f2 | xargs echo -n)
VAULT_ADDRESS="127.0.0.1:8200"

# retrieve secret for appsecret so dummy app can load the /tmp/secret
curl \
   --request POST \
   --header "X-Vault-Token: ${TERRAFORM_TOKEN}" \
      "${VAULT_ADDRESS}/v1/auth/approle/role/dummy_role/secret-id" > /tmp/debug

cat /tmp/debug | jq -r '.data.secret_id' > /tmp/secret

# check appsecret exists
cat /tmp/debug
cat /tmp/secret

VAULT_DOCKER=`docker ps| grep vault | cut -d' ' -f 1`

echo 'put secret'
cat config.yaml | docker exec -i $VAULT_DOCKER vault -v kv put -address=http://127.0.0.1:8200 -mount=secret dummy_config_yaml/reseller1/region99 raw=-

echo 'check secret length'
docker exec -i $VAULT_DOCKER vault -v kv get -address=http://127.0.0.1:8200 -mount=secret dummy_config_yaml/reseller1/region99 | wc -l

7. Next, we just need to create an application that reads the secret ID (/tmp/secret) and retrieves the application config from the Vault path secret/data/dummy_config_yaml/reseller1/region99, something like this:

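// imports: context, os, strings,
//   vault "github.com/hashicorp/vault/api"
//   "github.com/hashicorp/vault/api/auth/approle"
// error and ok checks are omitted for brevity
secretId, err := os.ReadFile(`/tmp/secret`) // written by the provisioner script
config := vault.DefaultConfig()
config.Address = address
client, err := vault.NewClient(config)
appRoleAuth, err := approle.NewAppRoleAuth(
    AppRoleID, // injected at compile time = `dummy_app`
    &approle.SecretID{FromString: strings.TrimSpace(string(secretId))})
_, err = client.Auth().Login(context.Background(), appRoleAuth) // log in with the AppRole
const configPath = `secret/data/dummy_config_yaml/reseller1/region99`
secret, err := client.Logical().Read(configPath)
data := secret.Data[`data`] // KV v2 wraps the payload in "data"
m, ok := data.(map[string]interface{})
raw, ok := m[`raw`]
rawStr, ok := raw.(string)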

The content of rawStr read from Vault will be exactly the same as config.yaml.

This way, a hacker who has already gotten into the system/OS/Docker can only learn the secretId; to learn the AppRoleID and the content of config.yaml, they have to analyze process memory. Full source code can be found here.
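
The chain of type assertions in step 7 is easy to get wrong when a key is missing, so here is a small sketch of the same unwrapping with every check spelled out. It assumes only the shape used above (a "data" map containing a "raw" string); extractRaw and the sample YAML are hypothetical names and values, not taken from the full source:

```go
package main

import (
	"errors"
	"fmt"
)

// extractRaw pulls the "raw" string out of a KV v2 read result,
// i.e. the value the snippet above reaches via secret.Data["data"]["raw"].
// Hypothetical helper for illustration only.
func extractRaw(secretData map[string]interface{}) (string, error) {
	data, ok := secretData["data"]
	if !ok {
		return "", errors.New(`secret has no "data" field`)
	}
	m, ok := data.(map[string]interface{})
	if !ok {
		return "", errors.New(`"data" is not a map`)
	}
	raw, ok := m["raw"]
	if !ok {
		return "", errors.New(`secret has no "raw" key`)
	}
	rawStr, ok := raw.(string)
	if !ok {
		return "", errors.New(`"raw" is not a string`)
	}
	return rawStr, nil
}

func main() {
	// Made-up stand-in for secret.Data as returned by client.Logical().Read.
	secretData := map[string]interface{}{
		"data": map[string]interface{}{"raw": "db:\n  host: example\n"},
	}
	rawStr, err := extractRaw(secretData)
	if err != nil {
		panic(err)
	}
	fmt.Print(rawStr)
}
```

From here rawStr can be handed to whatever YAML parser the app already uses, without ever touching the disk.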

2023-07-02

KEDA Kubernetes Event-Driven Autoscaling

Autoscaling is mostly useless if the number of hosts/nodes/hypervisors is limited, e.g. we only have N nodes and try to autoscale the services inside them, so by default we already waste a lot of unused resources (especially if the billing is not like Jelastic's: you pay for whatever you allocate, not whatever you utilize). Autoscaling is also quite useless if your problem is I/O-bound rather than CPU-bound, for example if you don't use an autoscaled database (or whatever the I/O bottleneck is). CPU is rarely the bottleneck in my past experience. But today we're going to try KEDA to autoscale a Kubernetes service. First we need a local cluster, and the fastest one to get running is minikube:

# install minikube for local kubernetes cluster
curl -LO https://storage.googleapis.com/minikube/releases/latest/minikube-linux-amd64 
sudo install minikube-linux-amd64 /usr/local/bin/minikube
minikube start # --driver=kvm2 or --driver=virtualbox
minikube kubectl
alias k='minikube kubectl --'
k get pods --all-namespaces

# if not using --driver=docker (default)
minikube addons configure registry-creds
Do you want to enable AWS Elastic Container Registry? [y/n]: n
Do you want to enable Google Container Registry? [y/n]: n
Do you want to enable Docker Registry? [y/n]: y
-- Enter docker registry server url: https://hub.docker.com/
-- Enter docker registry username: kokizzu
-- Enter docker registry password:
Do you want to enable Azure Container Registry? [y/n]: n
✅  registry-creds was successfully configured

 

Next we need to build a dummy container image for the pod that we want to autoscale:

# build example docker image
docker login -u kokizzu # replace with your docker hub username
docker build -t pf1 .
docker image ls pf1
# REPOSITORY   TAG       IMAGE ID       CREATED         SIZE
# pf1          latest    204670ee86bd   2 minutes ago   89.3MB

# run locally for testing
docker run -it -p 3000:3000 pf1

# tag and upload
docker image tag pf1 kokizzu/pf1:v0001
docker image push kokizzu/pf1:v0001


Create the deployment Terraform file, something like this:

# main.tf
terraform {
  required_version = ">= 1.3.0"
  required_providers {
    kubernetes = {
      source  = "hashicorp/kubernetes"
      version = "= 2.20.0"
    }
  }
  backend "local" {
    path = "/tmp/pf1.tfstate"
  }
}
provider "kubernetes" {
  config_path    = "~/.kube/config"
  # from k config view | grep -A 3 minikube | grep server:
  host           = "https://240.1.0.2:8443"
  config_context = "minikube"
}
provider "helm" {
  kubernetes {
    config_path    = "~/.kube/config"
    config_context = "minikube"
  }
}
resource "kubernetes_namespace_v1" "pf1ns" {
  metadata {
    name        = "pf1ns"
    annotations = {
      name = "deployment namespace"
    }
  }
}
resource "kubernetes_deployment_v1" "promfiberdeploy" {
  metadata {
    name      = "promfiberdeploy"
    namespace = kubernetes_namespace_v1.pf1ns.metadata.0.name
  }
  spec {
    selector {
      match_labels = {
        app = "promfiber"
      }
    }
    replicas = "1"
    template {
      metadata {
        labels = {
          app = "promfiber"
        }
        annotations = {
          "prometheus.io/path"   = "/metrics"
          "prometheus.io/scrape" = "true"
          "prometheus.io/port"   = 3000
        }
      }
      spec {
        container {
          name  = "pf1"
          image = "kokizzu/pf1:v0001" # from promfiber.go
          port {
            container_port = 3000
          }
        }
      }
    }
  }
}
resource "kubernetes_service_v1" "pf1svc" {
  metadata {
    name      = "pf1svc"
    namespace = kubernetes_namespace_v1.pf1ns.metadata.0.name
  }
  spec {
    selector = {
      app = kubernetes_deployment_v1.promfiberdeploy.spec.0.template.0.metadata.0.labels.app
    }
    port {
      port        = 33000 # no effect in minikube, will be forwarded to a random port anyway
      target_port = kubernetes_deployment_v1.promfiberdeploy.spec.0.template.0.spec.0.container.0.port.0.container_port
    }
    type = "NodePort"
  }
}
resource "kubernetes_ingress_v1" "pf1ingress" {
  metadata {
    name        = "pf1ingress"
    namespace   = kubernetes_namespace_v1.pf1ns.metadata.0.name
    annotations = {
      "kubernetes.io/ingress.class" = "nginx"
    }
  }
  spec {
    rule {
      host = "pf1svc.pf1ns.svc.cluster.local"
      http {
        path {
          path = "/"
          backend {
            service {
              name = kubernetes_service_v1.pf1svc.metadata.0.name
              port {
                number = kubernetes_service_v1.pf1svc.spec.0.port.0.port
              }
            }
          }
        }
      }
    }
  }
}
resource "kubernetes_config_map_v1" "prom1conf" {
  metadata {
    name      = "prom1conf"
    namespace = kubernetes_namespace_v1.pf1ns.metadata.0.name
  }
  data = {
    # from https://github.com/techiescamp/kubernetes-prometheus/blob/master/config-map.yaml
    "prometheus.yml" : <<EOF
global:
  scrape_interval: 15s
  evaluation_interval: 15s
alerting:
  alertmanagers:
    - static_configs:
        - targets:
          # - alertmanager:9093
rule_files:
  #- /etc/prometheus/prometheus.rules
scrape_configs:
  - job_name: "prometheus"
    static_configs:
      - targets: ["localhost:9090"]
  - job_name: "pf1"
    static_configs:
      - targets: [
          "${kubernetes_ingress_v1.pf1ingress.spec.0.rule.0.host}:${kubernetes_service_v1.pf1svc.spec.0.port.0.port}"
        ]
EOF
    # if this changes after terraform apply, delete the stateful set
    # or run: kubectl rollout restart statefulset prom1stateful -n pf1ns
    # because a statefulset pod is not restarted automatically when a configmap
    # used as env or as a config file changes
  }
}
resource "kubernetes_persistent_volume_v1" "prom1datavol" {
  metadata {
    name = "prom1datavol"
  }
  spec {
    access_modes = ["ReadWriteOnce"]
    capacity     = {
      storage = "1Gi"
    }
    # do not add storage_class_name or it would get stuck
    persistent_volume_source {
      host_path {
        path = "/tmp/prom1data" # mkdir first?
      }
    }
  }
}
resource "kubernetes_persistent_volume_claim_v1" "prom1dataclaim" {
  metadata {
    name      = "prom1dataclaim"
    namespace = kubernetes_namespace_v1.pf1ns.metadata.0.name
  }
  spec {
    # do not add storage_class_name or it would get stuck
    access_modes = ["ReadWriteOnce"]
    resources {
      requests = {
        storage = "1Gi"
      }
    }
  }
}
resource "kubernetes_stateful_set_v1" "prom1stateful" {
  metadata {
    name      = "prom1stateful"
    namespace = kubernetes_namespace_v1.pf1ns.metadata.0.name
    labels    = {
      app = "prom1"
    }
  }
  spec {
    selector {
      match_labels = {
        app = "prom1"
      }
    }
    template {
      metadata {
        labels = {
          app = "prom1"
        }
      }
      # example: https://github.com/mateothegreat/terraform-kubernetes-monitoring-prometheus/blob/main/deployment.tf
      spec {
        container {
          name  = "prometheus"
          image = "prom/prometheus:latest"
          args  = [
            "--config.file=/etc/prometheus/prometheus.yml",
            "--storage.tsdb.path=/prometheus/",
            "--web.console.libraries=/etc/prometheus/console_libraries",
            "--web.console.templates=/etc/prometheus/consoles",
            "--web.enable-lifecycle",
            "--web.enable-admin-api",
            "--web.listen-address=:10902"
          ]
          port {
            name           = "http1"
            container_port = 10902
          }
          volume_mount {
            name       = kubernetes_config_map_v1.prom1conf.metadata.0.name
            mount_path = "/etc/prometheus/"
          }
          volume_mount {
            name       = "prom1datastorage"
            mount_path = "/prometheus/"
          }
          #security_context {
          #  run_as_group = "1000" # because /tmp/prom1data is owned by 1000
          #}
        }
        volume {
          name = kubernetes_config_map_v1.prom1conf.metadata.0.name
          config_map {
            default_mode = "0666"
            name         = kubernetes_config_map_v1.prom1conf.metadata.0.name
          }
        }
        volume {
          name = "prom1datastorage"
          persistent_volume_claim {
            claim_name = kubernetes_persistent_volume_claim_v1.prom1dataclaim.metadata.0.name
          }
        }
      }
    }
    service_name = ""
  }
}
resource "kubernetes_service_v1" "prom1svc" {
  metadata {
    name      = "prom1svc"
    namespace = kubernetes_namespace_v1.pf1ns.metadata.0.name
  }
  spec {
    selector = {
      app = kubernetes_stateful_set_v1.prom1stateful.spec.0.template.0.metadata.0.labels.app
    }
    port {
      port        = 10902 # no effect in minikube, will be forwarded to a random port anyway
      target_port = kubernetes_stateful_set_v1.prom1stateful.spec.0.template.0.spec.0.container.0.port.0.container_port
    }
    type = "NodePort"
  }
}
resource "helm_release" "pf1keda" {
  name       = "pf1keda"
  repository = "https://kedacore.github.io/charts"
  chart      = "keda"
  namespace  = kubernetes_namespace_v1.pf1ns.metadata.0.name
  # uninstall: https://keda.sh/docs/2.11/deploy/#helm
}
# apply with the block below commented out first, then uncomment it and apply again
## from: https://www.youtube.com/watch?v=1kEKrhYMf_g
#resource "kubernetes_manifest" "scaled_object" {
#  manifest = {
#    "apiVersion" = "keda.sh/v1alpha1"
#    "kind"       = "ScaledObject"
#    "metadata"   = {
#      "name"      = "pf1scaledobject"
#      "namespace" = kubernetes_namespace_v1.pf1ns.metadata.0.name
#    }
#    "spec" = {
#      "scaleTargetRef" = {
#        "apiVersion" = "apps/v1"
#        "name"       = kubernetes_deployment_v1.promfiberdeploy.metadata.0.name
#        "kind"       = "Deployment"
#      }
#      "minReplicaCount" = 1
#      "maxReplicaCount" = 5
#      "triggers"        = [
#        {
#          "type"     = "prometheus"
#          "metadata" = {
#            "serverAddress" = "http://prom1svc.pf1ns.svc.cluster.local:10902"
#            "threshold"     = "100"
#            "query"         = "sum(irate(http_requests_total[1m]))"
#            # with or without {service=\"promfiber\"} is the same since 1 service 1 pod in our case
#          }
#        }
#      ]
#    }
#  }
#}


terraform init # download dependencies
terraform plan # check changes
terraform apply # deploy
terraform apply # run again after uncommenting the scaled_object part

k get pods --all-namespaces -w # check deployment
NAMESPACE NAME                          READY STATUS  RESTARTS AGE
keda-admission-webhooks-xzkp4          1/1    Running 0        2m2s
keda-operator-r6hsh                    1/1    Running 1        2m2s
keda-operator-metrics-apiserver-xjp4d  1/1    Running 0        2m2s
promfiberdeploy-868697d555-8jh6r       1/1    Running 0        3m40s
prom1stateful-0                        1/1    Running 0        22s

k get services --all-namespaces
NAMESP NAME     TYPE     CLUSTER-IP     EXTERNAL-IP PORT(S) AGE
pf1ns  pf1svc   NodePort 10.111.141.44  <none>      33000:30308/TCP 2s
pf1ns  prom1svc NodePort 10.109.131.196 <none>      10902:30423/TCP 6s

minikube service list
|------------|--------------|-------------|------------------------|
|  NAMESPACE |    NAME      | TARGET PORT |          URL           |
|------------|--------------|-------------|------------------------|
| pf1ns      | pf1svc       |       33000 | http://240.1.0.2:30308 |
| pf1ns      | prom1service |       10902 | http://240.1.0.2:30423 |
|------------|--------------|-------------|------------------------|

 

To debug if something goes wrong, you can use something like this:


# debug inside container, replace with pod name
k exec -it pf1deploy-77bf69d7b6-cqqwq -n pf1ns -- bash

# or use dedicated debug pod
k apply -f debug-pod.yml
k exec -it debug-pod -n pf1ns -- bash
# delete if done using
k delete pod debug-pod -n pf1ns


# install debugging tools
apt update
apt install curl iputils-ping dnsutils net-tools # dig and netstat

 

To check the metrics that we want to use for autoscaling, we can look in multiple places:

# check metrics inside pod
curl http://pf1svc.pf1ns.svc.cluster.local:33000/metrics

# check metrics from outside
curl http://240.1.0.2:30308/metrics

# or open from prometheus UI:
http://240.1.0.2:30423

# get metrics
k get --raw "/apis/external.metrics.k8s.io/v1beta1"
{"kind":"APIResourceList","apiVersion":"v1","groupVersion":"external.metrics.k8s.io/v1beta1","resources":[{"name":"externalmetrics","singularName":"","namespaced":true,"kind":"ExternalMetricValueList","verbs":["get"]}]}

# get scaled object
k get scaledobject pf1keda -n pf1ns
NAME      SCALETARGETKIND      SCALETARGETNAME   MIN   MAX   TRIGGERS     AUTHENTICATION   READY   ACTIVE   FALLBACK   PAUSED    AGE
pf1keda   apps/v1.Deployment   pf1deploy         1     5     prometheus                    True    False    False      Unknown   3d20h

# get metric name
k get scaledobject pf1keda -n pf1ns -o 'jsonpath={.status.externalMetricNames}'
["s0-prometheus-prometheus"]

 

Next we can run a load test while watching the pods:

# do loadtest
hey -c 100 -n 100000 http://240.1.0.2:30308

# check with kubectl get pods -w -n pf1ns, it would spawn:
promfiberdeploy-96qq9  0/1     Pending             0  0s
promfiberdeploy-j5qw9  0/1     Pending             0  0s
promfiberdeploy-96qq9  0/1     Pending             0  0s
promfiberdeploy-76pvt  0/1     Pending             0  0s
promfiberdeploy-76pvt  0/1     Pending             0  0s
promfiberdeploy-j5qw9  0/1     Pending             0  0s
promfiberdeploy-96qq9  0/1     ContainerCreating   0  0s
promfiberdeploy-76pvt  0/1     ContainerCreating   0  0s
promfiberdeploy-j5qw9  0/1     ContainerCreating   0  0s
promfiberdeploy-96qq9  1/1     Running             0  1s
promfiberdeploy-j5qw9  1/1     Running             0  1s
promfiberdeploy-76pvt  1/1     Running             0  1s
...
promfiberdeploy-j5qw9  1/1     Terminating         0  5m45s
promfiberdeploy-96qq9  1/1     Terminating         0  5m45s
promfiberdeploy-gt2h5  1/1     Terminating         0  5m30s
promfiberdeploy-76pvt  1/1     Terminating         0  5m45s

# all events, including the scale-up events
k get events -n pf1ns -w
21m    Normal ScalingReplicaSet deployment/promfiberdeploy Scaled up replica set promfiberdeploy-868697d555 to 1
9m20s  Normal ScalingReplicaSet deployment/promfiberdeploy Scaled up replica set promfiberdeploy-868697d555 to 4 from 1
9m5s   Normal ScalingReplicaSet deployment/promfiberdeploy Scaled up replica set promfiberdeploy-868697d555 to 5 from 4
3m35s  Normal ScalingReplicaSet deployment/promfiberdeploy Scaled down replica set promfiberdeploy-868697d555 to 1 from 5

That's it, that's how you use KEDA and terraform to autoscale deployments. The key parts in the .tf file are:

  • terraform - needed so terraform knows which plugins to download on terraform init
  • kubernetes and helm - needed to know which config is used and which cluster is contacted
  • kubernetes_namespace_v1 - to create a namespace (e.g. per tenant)
  • kubernetes_deployment_v1 - to set which pod is used and which docker image it runs
  • kubernetes_service_v1 - to expose a port on the node (in this case only NodePort) and to load-balance between pods
  • kubernetes_ingress_v1 - should be used to route requests to the proper service, but since we only have 1 service and minikube has its own forwarding, it is not used in our case
  • kubernetes_config_map_v1 - used to bind a config file (volume) for the prometheus deployment; this sets where to scrape the service. This is NOT the proper way to do it; the proper way is in the latest commit of that repository, using PodMonitor from prometheus-operator:
    • kubernetes_service_v1 - to expose the global prometheus (the one that monitors the whole kubernetes cluster, not per namespace)
    • kubernetes_service_account_v1 - creates a service account so the prometheus in the namespace can retrieve the pod list
    • kubernetes_cluster_role_v1 - role that allows listing pods
    • kubernetes_cluster_role_binding_v1 - binds the service account to the role above
    • kubernetes_manifest - creates the podmonitor kubernetes manifest; these are the rules generated for the prometheus in the namespace to match specific pods
    • kubernetes_manifest - creates the prometheus manifest that deploys prometheus in a specific namespace
  • kubernetes_persistent_volume_v1 and kubernetes_persistent_volume_claim_v1 - used to bind a data directory (volume) to the prometheus deployment
  • kubernetes_stateful_set_v1 - to deploy prometheus; since it's not a stateless service, we have to bind a data volume to prevent data loss
  • kubernetes_service_v1 - to expose the prometheus port to the outside
  • helm_release - to deploy keda
  • kubernetes_manifest - to create a custom manifest, since the scaled object is not supported by the kubernetes terraform provider; this configures which deployment can be autoscaled

If you need the source code, take a look at the terraform1 repo; the latest commit uses PodMonitor.