Autoscaling is mostly useless if the number of hosts/nodes/hypervisors is limited. For example, if we only have N nodes and we autoscale the services inside them, then by default we already waste a lot of unused resources (especially if the billing is not like Jelastic: you pay for whatever you allocate, not whatever you utilize). Autoscaling is also quite useless if your problem is I/O-bound rather than CPU-bound, for example when the database (or whatever the I/O bottleneck is) is not autoscaled. In my past experience, CPU is rarely the bottleneck. But today we're gonna try KEDA to autoscale a Kubernetes service. First we need a local Kubernetes cluster, in this case minikube:
# install minikube for local kubernetes cluster
curl -LO https://storage.googleapis.com/minikube/releases/latest/minikube-linux-amd64
sudo install minikube-linux-amd64 /usr/local/bin/minikube
minikube start # --driver=kvm2 or --driver=virtualbox
minikube kubectl
alias k='minikube kubectl --'
k get pods --all-namespaces
# if not using --driver=docker (default)
minikube addons configure registry-creds
Do you want to enable AWS Elastic Container Registry? [y/n]: n
Do you want to enable Google Container Registry? [y/n]: n
Do you want to enable Docker Registry? [y/n]: y
-- Enter docker registry server url: https://hub.docker.com/
-- Enter docker registry username: kokizzu
-- Enter docker registry password:
Do you want to enable Azure Container Registry? [y/n]: n
✅ registry-creds was successfully configured
Next we need to build a dummy container image for the pod that we want to autoscale:
# build example docker image
docker login -u kokizzu # replace with your docker hub username
docker build -t pf1 .
docker image ls pf1
# REPOSITORY TAG IMAGE ID CREATED SIZE
# pf1 latest 204670ee86bd 2 minutes ago 89.3MB
# run locally for testing
docker run -it -p 3000:3000 pf1 # options must come before the image name
# tag and upload
docker image tag pf1 kokizzu/pf1:v0001
docker image push kokizzu/pf1:v0001
Create the deployment Terraform file, something like this:
# main.tf
terraform {
required_version = ">= 1.3.0"
required_providers {
kubernetes = {
source = "hashicorp/kubernetes"
version = "= 2.20.0"
}
}
backend "local" {
path = "/tmp/pf1.tfstate"
}
}
provider "kubernetes" {
config_path = "~/.kube/config"
# from k config view | grep -A 3 minikube | grep server:
host = "https://240.1.0.2:8443"
config_context = "minikube"
}
provider "helm" {
kubernetes {
config_path = "~/.kube/config"
config_context = "minikube"
}
}
resource "kubernetes_namespace_v1" "pf1ns" {
metadata {
name = "pf1ns"
annotations = {
name = "deployment namespace"
}
}
}
resource "kubernetes_deployment_v1" "promfiberdeploy" {
metadata {
name = "promfiberdeploy"
namespace = kubernetes_namespace_v1.pf1ns.metadata.0.name
}
spec {
selector {
match_labels = {
app = "promfiber"
}
}
replicas = "1"
template {
metadata {
labels = {
app = "promfiber"
}
annotations = {
"prometheus.io/path" = "/metrics"
"prometheus.io/scrape" = "true"
"prometheus.io/port" = 3000
}
}
spec {
container {
name = "pf1"
image = "kokizzu/pf1:v0001" # from promfiber.go
port {
container_port = 3000
}
}
}
}
}
}
resource "kubernetes_service_v1" "pf1svc" {
metadata {
name = "pf1svc"
namespace = kubernetes_namespace_v1.pf1ns.metadata.0.name
}
spec {
selector = {
app = kubernetes_deployment_v1.promfiberdeploy.spec.0.template.0.metadata.0.labels.app
}
port {
port = 33000 # no effect in minikube, will be forwarded to a random port anyway
target_port = kubernetes_deployment_v1.promfiberdeploy.spec.0.template.0.spec.0.container.0.port.0.container_port
}
type = "NodePort"
}
}
resource "kubernetes_ingress_v1" "pf1ingress" {
metadata {
name = "pf1ingress"
namespace = kubernetes_namespace_v1.pf1ns.metadata.0.name
annotations = {
"kubernetes.io/ingress.class" = "nginx"
}
}
spec {
rule {
host = "pf1svc.pf1ns.svc.cluster.local"
http {
path {
path = "/"
backend {
service {
name = kubernetes_service_v1.pf1svc.metadata.0.name
port {
number = kubernetes_service_v1.pf1svc.spec.0.port.0.port
}
}
}
}
}
}
}
}
resource "kubernetes_config_map_v1" "prom1conf" {
metadata {
name = "prom1conf"
namespace = kubernetes_namespace_v1.pf1ns.metadata.0.name
}
data = {
# from https://github.com/techiescamp/kubernetes-prometheus/blob/master/config-map.yaml
"prometheus.yml" : <<EOF
global:
scrape_interval: 15s
evaluation_interval: 15s
alerting:
alertmanagers:
- static_configs:
- targets:
# - alertmanager:9093
rule_files:
#- /etc/prometheus/prometheus.rules
scrape_configs:
- job_name: "prometheus"
static_configs:
- targets: ["localhost:9090"]
- job_name: "pf1"
static_configs:
- targets: [
"${kubernetes_ingress_v1.pf1ingress.spec.0.rule.0.host}:${kubernetes_service_v1.pf1svc.spec.0.port.0.port}"
]
EOF
# if this changes after terraform apply, delete the stateful set
# or run: kubectl rollout restart statefulset prom1stateful -n pf1ns
# because the statefulset pod is not restarted automatically
# when a configmap used as env or config file changes
}
}
resource "kubernetes_persistent_volume_v1" "prom1datavol" {
metadata {
name = "prom1datavol"
}
spec {
access_modes = ["ReadWriteOnce"]
capacity = {
storage = "1Gi"
}
# do not add storage_class_name or it would get stuck
persistent_volume_source {
host_path {
path = "/tmp/prom1data" # mkdir first?
}
}
}
}
resource "kubernetes_persistent_volume_claim_v1" "prom1dataclaim" {
metadata {
name = "prom1dataclaim"
namespace = kubernetes_namespace_v1.pf1ns.metadata.0.name
}
spec {
# do not add storage_class_name or it would get stuck
access_modes = ["ReadWriteOnce"]
resources {
requests = {
storage = "1Gi"
}
}
}
}
resource "kubernetes_stateful_set_v1" "prom1stateful" {
metadata {
name = "prom1stateful"
namespace = kubernetes_namespace_v1.pf1ns.metadata.0.name
labels = {
app = "prom1"
}
}
spec {
selector {
match_labels = {
app = "prom1"
}
}
template {
metadata {
labels = {
app = "prom1"
}
}
# example: https://github.com/mateothegreat/terraform-kubernetes-monitoring-prometheus/blob/main/deployment.tf
spec {
container {
name = "prometheus"
image = "prom/prometheus:latest"
args = [
"--config.file=/etc/prometheus/prometheus.yml",
"--storage.tsdb.path=/prometheus/",
"--web.console.libraries=/etc/prometheus/console_libraries",
"--web.console.templates=/etc/prometheus/consoles",
"--web.enable-lifecycle",
"--web.enable-admin-api",
"--web.listen-address=:10902"
]
port {
name = "http1"
container_port = 10902
}
volume_mount {
name = kubernetes_config_map_v1.prom1conf.metadata.0.name
mount_path = "/etc/prometheus/"
}
volume_mount {
name = "prom1datastorage"
mount_path = "/prometheus/"
}
#security_context {
# run_as_group = "1000" # because /tmp/prom1data is owned by 1000
#}
}
volume {
name = kubernetes_config_map_v1.prom1conf.metadata.0.name
config_map {
default_mode = "0666"
name = kubernetes_config_map_v1.prom1conf.metadata.0.name
}
}
volume {
name = "prom1datastorage"
persistent_volume_claim {
claim_name = kubernetes_persistent_volume_claim_v1.prom1dataclaim.metadata.0.name
}
}
}
}
service_name = ""
}
}
resource "kubernetes_service_v1" "prom1svc" {
metadata {
name = "prom1svc"
namespace = kubernetes_namespace_v1.pf1ns.metadata.0.name
}
spec {
selector = {
app = kubernetes_stateful_set_v1.prom1stateful.spec.0.template.0.metadata.0.labels.app
}
port {
port = 10902 # no effect in minikube, will be forwarded to a random port anyway
target_port = kubernetes_stateful_set_v1.prom1stateful.spec.0.template.0.spec.0.container.0.port.0.container_port
}
type = "NodePort"
}
}
resource "helm_release" "pf1keda" {
name = "pf1keda"
repository = "https://kedacore.github.io/charts"
chart = "keda"
namespace = kubernetes_namespace_v1.pf1ns.metadata.0.name
# uninstall: https://keda.sh/docs/2.11/deploy/#helm
}
# run with this commented first, then uncomment
## from: https://www.youtube.com/watch?v=1kEKrhYMf_g
#resource "kubernetes_manifest" "scaled_object" {
# manifest = {
# "apiVersion" = "keda.sh/v1alpha1"
# "kind" = "ScaledObject"
# "metadata" = {
# "name" = "pf1scaledobject"
# "namespace" = kubernetes_namespace_v1.pf1ns.metadata.0.name
# }
# "spec" = {
# "scaleTargetRef" = {
# "apiVersion" = "apps/v1"
# "name" = kubernetes_deployment_v1.promfiberdeploy.metadata.0.name
# "kind" = "Deployment"
# }
# "minReplicaCount" = 1
# "maxReplicaCount" = 5
# "triggers" = [
# {
# "type" = "prometheus"
# "metadata" = {
# "serverAddress" = "http://prom1svc.pf1ns.svc.cluster.local:10902"
# "threshold" = "100"
# "query" = "sum(irate(http_requests_total[1m]))"
# # with or without {service=\"promfiber\"} is the same since 1 service 1 pod in our case
# }
# }
# ]
# }
# }
#}
terraform init # download dependencies
terraform plan # check changes
terraform apply # deploy
terraform apply # run again after uncommenting the scaled_object part
k get pods --all-namespaces -w # check deployment
NAMESPACE NAME READY STATUS RESTARTS AGE
keda-admission-webhooks-xzkp4 1/1 Running 0 2m2s
keda-operator-r6hsh 1/1 Running 1 2m2s
keda-operator-metrics-apiserver-xjp4d 1/1 Running 0 2m2s
promfiberdeploy-868697d555-8jh6r 1/1 Running 0 3m40s
prom1stateful-0 1/1 Running 0 22s
k get services --all-namespaces
NAMESP NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
pf1ns pf1svc NodePort 10.111.141.44 <none> 33000:30308/TCP 2s
pf1ns prom1svc NodePort 10.109.131.196 <none> 10902:30423/TCP 6s
minikube service list
|------------|--------------|-------------|------------------------|
| NAMESPACE | NAME | TARGET PORT | URL |
|------------|--------------|-------------|------------------------|
| pf1ns | pf1svc | 33000 | http://240.1.0.2:30308 |
| pf1ns | prom1service | 10902 | http://240.1.0.2:30423 |
|------------|--------------|-------------|------------------------|
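The comment-then-uncomment step in main.tf is needed because, as far as I understand, kubernetes_manifest checks the resource against the cluster at plan time, so the ScaledObject CRD has to exist before Terraform can plan it. An alternative sketch keeps the scaled_object block uncommented and uses Terraform's -target flag to install KEDA first. The apply_in_two_phases name is made up for illustration, and this is not verified against every provider version:

```shell
# Hypothetical helper: apply KEDA first, then everything else.
apply_in_two_phases() {
  # phase 1: only the helm release (plus what it depends on, i.e. the namespace),
  # which installs the KEDA CRDs into the cluster
  terraform apply -target=helm_release.pf1keda || return 1
  # phase 2: the full configuration, including the ScaledObject manifest
  terraform apply
}
```

Call apply_in_two_phases after terraform init. Terraform itself warns that -target is meant for exceptional cases, so treat this as a convenience for local experiments.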
To debug if something goes wrong, you can use something like this:
# debug inside container, replace with pod name
k exec -it pf1deploy-77bf69d7b6-cqqwq -n pf1ns -- bash
# or use dedicated debug pod
k apply -f debug-pod.yml
k exec -it debug-pod -n pf1ns -- bash
# delete when done
k delete pod debug-pod -n pf1ns
# install debugging tools
apt update
apt install curl iputils-ping dnsutils net-tools # dig and netstat
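The debug-pod.yml above can be any pod that stays running long enough to exec into. A minimal sketch, where only the pod name and namespace come from the commands above; the ubuntu:22.04 image and the sleep command are assumptions (any image with apt and a shell should do):

```shell
# write a minimal debug pod manifest (sketch; image and command are assumptions)
cat > debug-pod.yml <<'EOF'
apiVersion: v1
kind: Pod
metadata:
  name: debug-pod
  namespace: pf1ns
spec:
  containers:
    - name: debug
      image: ubuntu:22.04             # assumed: any Debian/Ubuntu image that has apt
      command: ["sleep", "infinity"]  # keep the pod alive so we can exec into it
EOF
```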
To check the metrics that we want to use for autoscaling, we can look in multiple places:
# check metrics inside pod
curl http://pf1svc.pf1ns.svc.cluster.local:33000/metrics
# check metrics from outside
curl http://240.1.0.2:30308/metrics
# or open from prometheus UI: http://240.1.0.2:30423
# get metrics
k get --raw "/apis/external.metrics.k8s.io/v1beta1"
{"kind":"APIResourceList","apiVersion":"v1","groupVersion":"external.metrics.k8s.io/v1beta1","resources":[{"name":"externalmetrics","singularName":"","namespaced":true,"kind":"ExternalMetricValueList","verbs":["get"]}]}
# get scaled object
k get scaledobject pf1keda -n pf1ns
NAME SCALETARGETKIND SCALETARGETNAME MIN MAX TRIGGERS AUTHENTICATION READY ACTIVE FALLBACK PAUSED AGE
pf1keda apps/v1.Deployment pf1deploy 1 5 prometheus True False False Unknown 3d20h
# get metric name
k get scaledobject pf1keda -n pf1ns -o 'jsonpath={.status.externalMetricNames}'
["s0-prometheus-prometheus"]
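With the metric name in hand, it can help to compare what Prometheus returns for the query with what KEDA exposes to the autoscaler. A sketch of both lookups: the function names prom_query and external_metric_path are made up for illustration, the first uses the Prometheus HTTP query endpoint, and the second builds the external metrics path with the scaledobject.keda.sh/name label selector. Use whatever ScaledObject name k get scaledobject shows in your cluster:

```shell
# Hypothetical helper: run an instant PromQL query against Prometheus.
# Args: base URL of Prometheus, PromQL expression.
prom_query() {
  local base_url="$1" query="$2"
  # -G sends the url-encoded data as a GET query string
  curl -sG "${base_url}/api/v1/query" --data-urlencode "query=${query}"
}

# Hypothetical helper: build the external metrics API path for one metric.
# Args: namespace, metric name, ScaledObject name.
external_metric_path() {
  local ns="$1" metric="$2" scaledobject="$3"
  # %2F and %3D are the URL-encoded "/" and "=" of the label selector
  # scaledobject.keda.sh/name=<ScaledObject name>
  printf '/apis/external.metrics.k8s.io/v1beta1/namespaces/%s/%s?labelSelector=scaledobject.keda.sh%%2Fname%%3D%s\n' \
    "$ns" "$metric" "$scaledobject"
}

# usage (needs the running cluster from this post):
# prom_query http://240.1.0.2:30423 'sum(irate(http_requests_total[1m]))'
# k get --raw "$(external_metric_path pf1ns s0-prometheus-prometheus pf1keda)"
```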
Next we can run a load test while watching the pods:
# do loadtest
hey -c 100 -n 100000 http://240.1.0.2:30308
# check with kubectl get pods -w -n pf1ns, it would spawn:
promfiberdeploy-96qq9 0/1 Pending 0 0s
promfiberdeploy-j5qw9 0/1 Pending 0 0s
promfiberdeploy-96qq9 0/1 Pending 0 0s
promfiberdeploy-76pvt 0/1 Pending 0 0s
promfiberdeploy-76pvt 0/1 Pending 0 0s
promfiberdeploy-j5qw9 0/1 Pending 0 0s
promfiberdeploy-96qq9 0/1 ContainerCreating 0 0s
promfiberdeploy-76pvt 0/1 ContainerCreating 0 0s
promfiberdeploy-j5qw9 0/1 ContainerCreating 0 0s
promfiberdeploy-96qq9 1/1 Running 0 1s
promfiberdeploy-j5qw9 1/1 Running 0 1s
promfiberdeploy-76pvt 1/1 Running 0 1s
...
promfiberdeploy-j5qw9 1/1 Terminating 0 5m45s
promfiberdeploy-96qq9 1/1 Terminating 0 5m45s
promfiberdeploy-gt2h5 1/1 Terminating 0 5m30s
promfiberdeploy-76pvt 1/1 Terminating 0 5m45s
# all events, including the scale up events
k get events -n pf1ns -w
21m Normal ScalingReplicaSet deployment/promfiberdeploy Scaled up replica set promfiberdeploy-868697d555 to 1
9m20s Normal ScalingReplicaSet deployment/promfiberdeploy Scaled up replica set promfiberdeploy-868697d555 to 4 from 1
9m5s Normal ScalingReplicaSet deployment/promfiberdeploy Scaled up replica set promfiberdeploy-868697d555 to 5 from 4
3m35s Normal ScalingReplicaSet deployment/promfiberdeploy Scaled down replica set promfiberdeploy-868697d555 to 1 from 5
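The jump to 4 and then 5 replicas in the events above is consistent with how I understand the HPA that KEDA manages behind the scenes: for an average-value target it divides the query result by the threshold, rounds up, and clamps the result between minReplicaCount and maxReplicaCount. A rough sketch of that arithmetic, where the desired_replicas function and the sample request rates are made up for illustration (the real controller also applies a tolerance and stabilization windows):

```shell
# desired_replicas TOTAL THRESHOLD MIN MAX
# Hypothetical helper, not part of KEDA: approximates
# ceil(TOTAL / THRESHOLD) clamped to [MIN, MAX].
desired_replicas() {
  awk -v total="$1" -v threshold="$2" -v lo="$3" -v hi="$4" 'BEGIN {
    n = int(total / threshold)          # whole multiples of the threshold
    if (n * threshold < total) n++      # round up any remainder
    if (n < lo) n = lo                  # never below minReplicaCount
    if (n > hi) n = hi                  # never above maxReplicaCount
    print n
  }'
}

desired_replicas 50 100 1 5     # prints 1: below threshold, stays at the minimum
desired_replicas 350 100 1 5    # prints 4: 3.5 rounds up
desired_replicas 2000 100 1 5   # prints 5: capped at the maximum
```

With threshold = 100 and maxReplicaCount = 5 as in the scaled object, any sustained rate above 400 requests per second would already ask for the maximum of 5 pods.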
That's it, that's how you use KEDA and Terraform to autoscale deployments. The key parts of the .tf file are:
- terraform - lets Terraform know which plugins to download on terraform init
- kubernetes and helm - set which config is used and which cluster is contacted
- kubernetes_namespace_v1 - creates a namespace (eg. per tenant)
- kubernetes_deployment_v1 - sets which pod is deployed and which Docker image it runs
- kubernetes_service_v1 - exposes a port on the node (in this case only NodePort) and load balances between pods
- kubernetes_ingress_v1 - should be used to route requests to the proper service, but since we only have 1 service and minikube has its own forwarding, it is not used in our case
- kubernetes_config_map_v1 - binds a config file (volume) to the Prometheus deployment; this sets where to scrape the service. This is NOT the proper way to do it; the proper way, in the latest commit of the terraform1 repository, uses PodMonitor from prometheus-operator:
  - kubernetes_service_v1 - exposes the global Prometheus (the one that monitors the whole Kubernetes cluster, not a single namespace)
  - kubernetes_service_account_v1 - creates a service account so the Prometheus in the namespace can retrieve the pod list
  - kubernetes_cluster_role_v1 - the role that allows listing pods
  - kubernetes_cluster_role_binding_v1 - binds the service account to the role above
  - kubernetes_manifest - creates the PodMonitor manifest; these are the rules that tell the Prometheus in the namespace which pods to match
  - kubernetes_manifest - creates the Prometheus manifest that deploys Prometheus in a specific namespace
- kubernetes_persistent_volume_v1 and kubernetes_persistent_volume_claim_v1 - bind a data directory (volume) to the Prometheus deployment
- kubernetes_stateful_set_v1 - deploys Prometheus; since it is not a stateless service, we have to bind a data volume to prevent data loss
- kubernetes_service_v1 - exposes the Prometheus port to the outside
- helm_release - deploys KEDA
- kubernetes_manifest - creates a custom manifest, since ScaledObject is not supported natively by the Kubernetes Terraform provider; this configures which service can be autoscaled
If you need the source code, take a look at the terraform1 repo; the latest version uses PodMonitor.