Showing posts with label kubernetes. Show all posts

2024-03-23

Clickhouse in Kubernetes / Minikube using Bitnami Helm Charts

Normally for my projects I use docker-compose + a normal bind-mounted persistence directory to avoid data loss for the database (and keep it fast and simple to maintain), because it's simpler to upgrade versions than a normal installation (looking at that popular database that is annoying to upgrade across major versions, requiring a lot of copying) and the network/disk/cpu overhead is also unnoticeable. Today we're gonna try to set up Clickhouse using minikube. Assuming you already installed minikube, all you need to do is install helm (the k8s package manager):

curl https://raw.githubusercontent.com/kubernetes/helm/master/scripts/get-helm-3 > install_helm.sh && bash install_helm.sh
helm install my-release oci://registry-1.docker.io/bitnamicharts/clickhouse

# to check again the deploy
helm status my-release

# to check configurable options
helm show all oci://registry-1.docker.io/bitnamicharts/clickhouse

# to uninstall
helm uninstall my-release

by default it would create 2 shards with 3 replicas each (including the pv and pvc for all pods), and too bad it's still using zookeeper instead of clickhouse-keeper.

my-release-clickhouse-shard0-0   1/1  Running   0 
my-release-clickhouse-shard0-1   1/1  Running   0 
my-release-clickhouse-shard0-2   1/1  Running   0 
my-release-clickhouse-shard1-0   1/1  Running   0 
my-release-clickhouse-shard1-1   1/1  Running   0 
my-release-clickhouse-shard1-2   1/1  Running   0 
my-release-zookeeper-0           1/1  Running   0 
my-release-zookeeper-1           1/1  Running   0 
my-release-zookeeper-2           1/1  Running   0 


to access it locally you just need to port-forward and connect using the clickhouse-client program:

kubectl port-forward --namespace default svc/my-release-clickhouse 9001:9000 &
clickhouse-client --host localhost --port 9001 --user default --password $(kubectl get secret --namespace default my-release-clickhouse -o jsonpath="{.data.admin-password}" | base64 -d)

alternatively you can just exec into the pod and use clickhouse-client inside the pod:

kubectl exec -it my-release-clickhouse-shard1-2 -- sh
clickhouse-client # use password from above
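
once connected, you can already create a replicated table plus a Distributed table on top of it; a minimal sketch (the events table and its columns are made up, and I'm assuming the chart's default cluster name is "default" and that it sets the {shard}/{replica} macros -- check system.clusters and system.macros if unsure):

clickhouse-client --host localhost --port 9001 --user default \
  --password "$(kubectl get secret --namespace default my-release-clickhouse -o jsonpath="{.data.admin-password}" | base64 -d)" \
  --multiquery <<'SQL'
-- replicated table, created on every shard/replica
CREATE TABLE default.events_local ON CLUSTER 'default' (
  id         UInt64,
  created_at DateTime,
  payload    String
) ENGINE = ReplicatedMergeTree('/clickhouse/tables/{shard}/events_local', '{replica}')
ORDER BY (created_at, id);

-- distributed table that fans out queries/inserts to all shards
CREATE TABLE default.events ON CLUSTER 'default' AS default.events_local
ENGINE = Distributed('default', 'default', 'events_local', rand());
SQL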

that's basically all you need to start creating distributed tables and doing stuff with them. For more information about bitnami charts you can read it here. If you want to customize your setup (not 2 shards 3 replicas; I always prefer no shard but multiple replicas, unless your data is so massive that it's hard not to shard across multiple machines), you can clone the chart, modify the configuration, then deploy it, something like this:

git clone --depth 1 git@github.com:bitnami/charts.git
vim charts/bitnami/clickhouse/values.yaml
# ^ change shards=1 keeper.enabled=true zookeeper.enabled=false
helm dependency build charts/bitnami/clickhouse
helm install ch1 charts/bitnami/clickhouse

# update
helm upgrade ch1 charts/bitnami/clickhouse

it would create something like this:

ch1-clickhouse-shard0-0         1/1  Running   0
ch1-clickhouse-shard0-1         1/1  Running   0
ch1-clickhouse-shard0-2         1/1  Running   0
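
alternatively, if you only need to flip a few values, you could skip cloning and pass them with --set directly to the OCI chart; a sketch, assuming the value keys are the ones referenced in the comment above (shards, keeper.enabled, zookeeper.enabled):

helm install ch1 oci://registry-1.docker.io/bitnamicharts/clickhouse \
  --set shards=1 \
  --set keeper.enabled=true \
  --set zookeeper.enabled=false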

That's it for now, I'm not sure yet how to run this in a real kubernetes cluster (not minikube), especially how to make the pv/pvc spread across multiple nodes.

2023-07-02

KEDA Kubernetes Event-Driven Autoscaling

Autoscaling is mostly useless if the number of hosts/nodes/hypervisors is limited, eg. we only have N nodes and we try to autoscale the services inside of them, so by default we already waste a lot of unused resources (especially if the billing is not like Jelastic's: you pay for whatever you allocate, not whatever you utilize). Autoscaling is also quite useless if your problem is I/O-bound, not CPU-bound, for example if you don't use an autoscaled database (or whatever the I/O bottleneck is). CPU is rarely the bottleneck in my past experience. But today we're gonna try to use KEDA to autoscale a kubernetes service. First we need to install minikube, the fastest way to get a local kube:

# install minikube for local kubernetes cluster
curl -LO https://storage.googleapis.com/minikube/releases/latest/minikube-linux-amd64 
sudo install minikube-linux-amd64 /usr/local/bin/minikube
minikube start # --driver=kvm2 or --driver=virtualbox
minikube kubectl
alias k='minikube kubectl --'
k get pods --all-namespaces

# if not using --driver=docker (default)
minikube addons configure registry-creds
Do you want to enable AWS Elastic Container Registry? [y/n]: n
Do you want to enable Google Container Registry? [y/n]: n
Do you want to enable Docker Registry? [y/n]: y
-- Enter docker registry server url: https://hub.docker.com/
-- Enter docker registry username: kokizzu
-- Enter docker registry password:
Do you want to enable Azure Container Registry? [y/n]: n
✅  registry-creds was successfully configured

 

Next we need to create a dummy container image for the pod that we want to be autoscaled:
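
the Dockerfile itself is not shown here; the app (promfiber.go) is just a small Go Fiber service exposing /metrics on port 3000, so a minimal sketch of a Dockerfile for it could look like this (the Go version and file layout are assumptions):

cat > Dockerfile <<'EOF'
FROM golang:1.20-alpine AS build
WORKDIR /src
COPY . .
RUN go build -o /pf1 .
FROM alpine:3.18
COPY --from=build /pf1 /pf1
EXPOSE 3000
ENTRYPOINT ["/pf1"]
EOF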

# build example docker image
docker login -u kokizzu # replace with your docker hub username
docker build -t pf1 .
docker image ls pf1
# REPOSITORY   TAG       IMAGE ID       CREATED         SIZE
# pf1          latest    204670ee86bd   2 minutes ago   89.3MB

# run locally for testing
docker run -it -p 3000:3000 pf1

# tag and upload
docker image tag pf1 kokizzu/pf1:v0001
docker image push kokizzu/pf1:v0001


Create a deployment terraform file, something like this:

# main.tf
terraform {
  required_version = ">= 1.3.0"
  required_providers {
    kubernetes = {
      source  = "hashicorp/kubernetes"
      version = "= 2.20.0"
    }
  }
  backend "local" {
    path = "/tmp/pf1.tfstate"
  }
}
provider "kubernetes" {
  config_path    = "~/.kube/config"
  # from k config view | grep -A 3 minikube | grep server:
  host           = "https://240.1.0.2:8443"
  config_context = "minikube"
}
provider "helm" {
  kubernetes {
    config_path    = "~/.kube/config"
    config_context = "minikube"
  }
}
resource "kubernetes_namespace_v1" "pf1ns" {
  metadata {
    name        = "pf1ns"
    annotations = {
      name = "deployment namespace"
    }
  }
}
resource "kubernetes_deployment_v1" "promfiberdeploy" {
  metadata {
    name      = "promfiberdeploy"
    namespace = kubernetes_namespace_v1.pf1ns.metadata.0.name
  }
  spec {
    selector {
      match_labels = {
        app = "promfiber"
      }
    }
    replicas = "1"
    template {
      metadata {
        labels = {
          app = "promfiber"
        }
        annotations = {
          "prometheus.io/path"   = "/metrics"
          "prometheus.io/scrape" = "true"
          "prometheus.io/port"   = 3000
        }
      }
      spec {
        container {
          name  = "pf1"
          image = "kokizzu/pf1:v0001" # from promfiber.go
          port {
            container_port = 3000
          }
        }
      }
    }
  }
}
resource "kubernetes_service_v1" "pf1svc" {
  metadata {
    name      = "pf1svc"
    namespace = kubernetes_namespace_v1.pf1ns.metadata.0.name
  }
  spec {
    selector = {
      app = kubernetes_deployment_v1.promfiberdeploy.spec.0.template.0.metadata.0.labels.app
    }
    port {
      port        = 33000 # no effect in minikube, will be forwarded to a random port anyway
      target_port = kubernetes_deployment_v1.promfiberdeploy.spec.0.template.0.spec.0.container.0.port.0.container_port
    }
    type = "NodePort"
  }
}
resource "kubernetes_ingress_v1" "pf1ingress" {
  metadata {
    name        = "pf1ingress"
    namespace   = kubernetes_namespace_v1.pf1ns.metadata.0.name
    annotations = {
      "kubernetes.io/ingress.class" = "nginx"
    }
  }
  spec {
    rule {
      host = "pf1svc.pf1ns.svc.cluster.local"
      http {
        path {
          path = "/"
          backend {
            service {
              name = kubernetes_service_v1.pf1svc.metadata.0.name
              port {
                number = kubernetes_service_v1.pf1svc.spec.0.port.0.port
              }
            }
          }
        }
      }
    }
  }
}
resource "kubernetes_config_map_v1" "prom1conf" {
  metadata {
    name      = "prom1conf"
    namespace = kubernetes_namespace_v1.pf1ns.metadata.0.name
  }
  data = {
    # from https://github.com/techiescamp/kubernetes-prometheus/blob/master/config-map.yaml
    "prometheus.yml" : <<EOF
global:
  scrape_interval: 15s
  evaluation_interval: 15s
alerting:
  alertmanagers:
    - static_configs:
        - targets:
          # - alertmanager:9093
rule_files:
  #- /etc/prometheus/prometheus.rules
scrape_configs:
  - job_name: "prometheus"
    static_configs:
      - targets: ["localhost:9090"]
  - job_name: "pf1"
    static_configs:
      - targets: [
          "${kubernetes_ingress_v1.pf1ingress.spec.0.rule.0.host}:${kubernetes_service_v1.pf1svc.spec.0.port.0.port}"
        ]
EOF
    # need to delete stateful set if this changed after terraform apply
    # or kubectl rollout restart statefulset prom1stateful -n pf1ns
    # because statefulset pod not restarted automatically when changed
    # if configmap set as env or config file
  }
}
resource "kubernetes_persistent_volume_v1" "prom1datavol" {
  metadata {
    name = "prom1datavol"
  }
  spec {
    access_modes = ["ReadWriteOnce"]
    capacity     = {
      storage = "1Gi"
    }
    # do not add storage_class_name or it would stuck
    persistent_volume_source {
      host_path {
        path = "/tmp/prom1data" # mkdir first?
      }
    }
  }
}
resource "kubernetes_persistent_volume_claim_v1" "prom1dataclaim" {
  metadata {
    name      = "prom1dataclaim"
    namespace = kubernetes_namespace_v1.pf1ns.metadata.0.name
  }
  spec {
    # do not add storage_class_name or it would stuck
    access_modes = ["ReadWriteOnce"]
    resources {
      requests = {
        storage = "1Gi"
      }
    }
  }
}
resource "kubernetes_stateful_set_v1" "prom1stateful" {
  metadata {
    name      = "prom1stateful"
    namespace = kubernetes_namespace_v1.pf1ns.metadata.0.name
    labels    = {
      app = "prom1"
    }
  }
  spec {
    selector {
      match_labels = {
        app = "prom1"
      }
    }
    template {
      metadata {
        labels = {
          app = "prom1"
        }
      }
      # example: https://github.com/mateothegreat/terraform-kubernetes-monitoring-prometheus/blob/main/deployment.tf
      spec {
        container {
          name  = "prometheus"
          image = "prom/prometheus:latest"
          args  = [
            "--config.file=/etc/prometheus/prometheus.yml",
            "--storage.tsdb.path=/prometheus/",
            "--web.console.libraries=/etc/prometheus/console_libraries",
            "--web.console.templates=/etc/prometheus/consoles",
            "--web.enable-lifecycle",
            "--web.enable-admin-api",
            "--web.listen-address=:10902"
          ]
          port {
            name           = "http1"
            container_port = 10902
          }
          volume_mount {
            name       = kubernetes_config_map_v1.prom1conf.metadata.0.name
            mount_path = "/etc/prometheus/"
          }
          volume_mount {
            name       = "prom1datastorage"
            mount_path = "/prometheus/"
          }
          #security_context {
          #  run_as_group = "1000" # because /tmp/prom1data is owned by 1000
          #}
        }
        volume {
          name = kubernetes_config_map_v1.prom1conf.metadata.0.name
          config_map {
            default_mode = "0666"
            name         = kubernetes_config_map_v1.prom1conf.metadata.0.name
          }
        }
        volume {
          name = "prom1datastorage"
          persistent_volume_claim {
            claim_name = kubernetes_persistent_volume_claim_v1.prom1dataclaim.metadata.0.name
          }
        }
      }
    }
    service_name = ""
  }
}
resource "kubernetes_service_v1" "prom1svc" {
  metadata {
    name      = "prom1svc"
    namespace = kubernetes_namespace_v1.pf1ns.metadata.0.name
  }
  spec {
    selector = {
      app = kubernetes_stateful_set_v1.prom1stateful.spec.0.template.0.metadata.0.labels.app
    }
    port {
    port        = 10902 # no effect in minikube, will be forwarded to a random port anyway
      target_port = kubernetes_stateful_set_v1.prom1stateful.spec.0.template.0.spec.0.container.0.port.0.container_port
    }
    type = "NodePort"
  }
}
resource "helm_release" "pf1keda" {
  name       = "pf1keda"
  repository = "https://kedacore.github.io/charts"
  chart      = "keda"
  namespace  = kubernetes_namespace_v1.pf1ns.metadata.0.name
  # uninstall: https://keda.sh/docs/2.11/deploy/#helm
}
# run with this commented first, then uncomment
## from: https://www.youtube.com/watch?v=1kEKrhYMf_g
#resource "kubernetes_manifest" "scaled_object" {
#  manifest = {
#    "apiVersion" = "keda.sh/v1alpha1"
#    "kind"       = "ScaledObject"
#    "metadata"   = {
#      "name"      = "pf1scaledobject"
#      "namespace" = kubernetes_namespace_v1.pf1ns.metadata.0.name
#    }
#    "spec" = {
#      "scaleTargetRef" = {
#        "apiVersion" = "apps/v1"
#        "name"       = kubernetes_deployment_v1.promfiberdeploy.metadata.0.name
#        "kind"       = "Deployment"
#      }
#      "minReplicaCount" = 1
#      "maxReplicaCount" = 5
#      "triggers"        = [
#        {
#          "type"     = "prometheus"
#          "metadata" = {
#            "serverAddress" = "http://prom1svc.pf1ns.svc.cluster.local:10902"
#            "threshold"     = "100"
#            "query"         = "sum(irate(http_requests_total[1m]))"
#            # with or without {service=\"promfiber\"} is the same since 1 service 1 pod in our case
#          }
#        }
#      ]
#    }
#  }
#}


terraform init # download dependencies
terraform plan # check changes
terraform apply # deploy
terraform apply # apply again after uncommenting the scaled_object part above
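
KEDA works by creating an HPA under the hood for the scale target, so besides watching the pods you can also check the generated HPA once the ScaledObject is applied (as far as I know it's named keda-hpa-<scaledobject name> by default):

k get hpa -n pf1ns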

k get pods --all-namespaces -w # check deployment
NAMESPACE NAME                          READY STATUS  RESTARTS AGE
keda-admission-webhooks-xzkp4          1/1    Running 0        2m2s
keda-operator-r6hsh                    1/1    Running 1        2m2s
keda-operator-metrics-apiserver-xjp4d  1/1    Running 0        2m2s
promfiberdeploy-868697d555-8jh6r       1/1    Running 0        3m40s
prom1stateful-0                        1/1    Running 0        22s

k get services --all-namespaces
NAMESP NAME     TYPE     CLUSTER-IP     EXTERNAL-IP PORT(S) AGE
pf1ns  pf1svc   NodePort 10.111.141.44  <none>      33000:30308/TCP 2s
pf1ns  prom1svc NodePort 10.109.131.196 <none>      10902:30423/TCP 6s

minikube service list
|------------|--------------|-------------|------------------------|
|  NAMESPACE |    NAME      | TARGET PORT |          URL           |
|------------|--------------|-------------|------------------------|
| pf1ns      | pf1svc       |       33000 | http://240.1.0.2:30308 |
| pf1ns      | prom1service |       10902 | http://240.1.0.2:30423 |
|------------|--------------|-------------|------------------------|

 

To debug if something goes wrong, you can use something like this:


# debug inside container, replace with pod name
k exec -it pf1deploy-77bf69d7b6-cqqwq -n pf1ns -- bash

# or use dedicated debug pod
k apply -f debug-pod.yml
k exec -it debug-pod -n pf1ns -- bash
# delete if done using
k delete pod debug-pod -n pf1ns
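
the debug-pod.yml referenced above is not shown anywhere, but it can be as simple as a pod that just sleeps forever; a minimal sketch (the image choice is an assumption, any distro with a shell and a package manager works):

cat > debug-pod.yml <<'EOF'
apiVersion: v1
kind: Pod
metadata:
  name: debug-pod
  namespace: pf1ns
spec:
  containers:
    - name: debug
      image: ubuntu:22.04
      command: ["sleep", "infinity"]
EOF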


# install debugging tools
apt update
apt install curl iputils-ping dnsutils net-tools # dig and netstat

 

To check the metrics that we want to use as the autoscaler trigger, we can check from multiple places:

# check metrics inside pod
curl http://pf1svc.pf1ns.svc.cluster.local:33000/metrics

# check metrics from outside
curl http://240.1.0.2:30308/metrics

# or open from prometheus UI:
http://240.1.0.2:30423

# get metrics
k get --raw "/apis/external.metrics.k8s.io/v1beta1"
{"kind":"APIResourceList","apiVersion":"v1","groupVersion":"external.metrics.k8s.io/v1beta1","resources":[{"name":"externalmetrics","singularName":"","namespaced":true,"kind":"ExternalMetricValueList","verbs":["get"]}]}

# get scaled object
k get scaledobject pf1keda -n pf1ns
NAME      SCALETARGETKIND      SCALETARGETNAME   MIN   MAX   TRIGGERS     AUTHENTICATION   READY   ACTIVE   FALLBACK   PAUSED    AGE
pf1keda   apps/v1.Deployment   pf1deploy         1     5     prometheus                    True    False    False      Unknown   3d20h

# get metric name
k get scaledobject pf1keda -n pf1ns -o 'jsonpath={.status.externalMetricNames}'
["s0-prometheus-prometheus"]
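
you can also query that external metric through the raw API the same way the HPA controller does; a sketch (I believe KEDA requires filtering by the scaledobject.keda.sh/name label, and the metric is namespaced):

k get --raw "/apis/external.metrics.k8s.io/v1beta1/namespaces/pf1ns/s0-prometheus-prometheus?labelSelector=scaledobject.keda.sh%2Fname%3Dpf1keda"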

 

Next we can do a loadtest while watching the pods:
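
hey here is rakyll's HTTP load generator written in Go; if you don't have it installed yet, it can be grabbed with go install:

go install github.com/rakyll/hey@latest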

# do loadtest
hey -c 100 -n 100000 http://240.1.0.2:30308

# check with kubectl get pods -w -n pf1ns, it would spawn:
promfiberdeploy-96qq9  0/1     Pending             0  0s
promfiberdeploy-j5qw9  0/1     Pending             0  0s
promfiberdeploy-96qq9  0/1     Pending             0  0s
promfiberdeploy-76pvt  0/1     Pending             0  0s
promfiberdeploy-76pvt  0/1     Pending             0  0s
promfiberdeploy-j5qw9  0/1     Pending             0  0s
promfiberdeploy-96qq9  0/1     ContainerCreating   0  0s
promfiberdeploy-76pvt  0/1     ContainerCreating   0  0s
promfiberdeploy-j5qw9  0/1     ContainerCreating   0  0s
promfiberdeploy-96qq9  1/1     Running             0  1s
promfiberdeploy-j5qw9  1/1     Running             0  1s
promfiberdeploy-76pvt  1/1     Running             0  1s
...
promfiberdeploy-j5qw9  1/1     Terminating         0  5m45s
promfiberdeploy-96qq9  1/1     Terminating         0  5m45s
promfiberdeploy-gt2h5  1/1     Terminating         0  5m30s
promfiberdeploy-76pvt  1/1     Terminating         0  5m45s

# all events, including the scale up/down events
k get events -n pf1ns -w
21m    Normal ScalingReplicaSet deployment/promfiberdeploy Scaled up replica set promfiberdeploy-868697d555 to 1
9m20s  Normal ScalingReplicaSet deployment/promfiberdeploy Scaled up replica set promfiberdeploy-868697d555 to 4 from 1
9m5s   Normal ScalingReplicaSet deployment/promfiberdeploy Scaled up replica set promfiberdeploy-868697d555 to 5 from 4
3m35s  Normal ScalingReplicaSet deployment/promfiberdeploy Scaled down replica set promfiberdeploy-868697d555 to 1 from 5

That's it, that's how you use KEDA and terraform to autoscale deployments. The key parts in the .tf file are:

  • terraform - needed to let terraform know which plugins are being used on terraform init
  • kubernetes and helm - needed to know which kubeconfig is being used, and which cluster is being contacted
  • kubernetes_namespace_v1 - to create a namespace (eg. per tenant)
  • kubernetes_deployment_v1 - to set which pods are created and which docker image is being used
  • kubernetes_service_v1 - to expose port on the node (in this case only NodePort), to loadbalance between pods
  • kubernetes_ingress_v1 - should be used to redirect requests to the proper services, but since we only have 1 service and we use minikube (which has its own forwarding), this one is not used in our case
  • kubernetes_config_map_v1 - used to bind a config file (volume) for the prometheus deployment, this sets where to scrape the service; this is NOT a proper way to do this, the proper way is in the latest commit on that repository, using PodMonitor from prometheus-operator:
    • kubernetes_service_v1 - to expose the global prometheus (that monitors the whole kubernetes cluster, not per namespace)
    • kubernetes_service_account_v1 - creates a service account so prometheus in the namespace can retrieve the pods list
    • kubernetes_cluster_role_v1 - role to allow listing pods
    • kubernetes_cluster_role_binding_v1 - binds the service account with the role above
    • kubernetes_manifest - creates the podmonitor kubernetes manifest, these are the generated rules for the prometheus in the namespace to match specific pods
    • kubernetes_manifest - creates the prometheus manifest that deploys prometheus in that specific namespace
  • kubernetes_persistent_volume_v1 and kubernetes_persistent_volume_claim_v1 - used to bind the data directory (volume) to the prometheus deployment
  • kubernetes_stateful_set_v1 - to deploy the prometheus, since it's not a stateless service we have to bind a data volume to prevent data loss
  • kubernetes_service_v1 - to expose the prometheus port to the outside
  • helm_release - to deploy keda
  • kubernetes_manifest - to create a custom manifest, since ScaledObject has no dedicated resource in the kubernetes terraform provider; this configures which deployment is able to be autoscaled

If you need the source code, you can take a look at the terraform1 repo; the latest version uses PodMonitor.


2022-04-05

Start/restart Golang or any other binary program automatically on boot/crash

There are some alternatives to make a program start on boot on Linux, the usual ways are:

1. SystemD, it can ensure that dependencies are started before your service, and can also limit your CPU/RAM usage. Generate a template using this website or use kardianos/service

2. PM2 (requires NodeJS), or PMG

3. docker-compose (requires docker, but you can skip the build part, just copy the binary directly in the Dockerfile command (the binary itself can be deployed using rsync), just set the restart property on docker-compose and it would restart when the computer boots) -- bad part: you cannot limit cpu/ram unless using docker swarm. But you can use docker directly to limit resources and use the --restart flag (see the sketch after this list).

4. lxc/lxd or multipass or other vm/lightweight vm (but you still need systemd inside it XD, at least it won't ruin your host), you can rsync directly to the container to redeploy, for example using overseer or tableflip; you must add a reverse proxy or nat or proper routing/ip forwarding though if you want it to be accessed from outside

5. supervisord (python) or ochinchina/supervisord (golang), tutorial here

6. create one daemon manager with systemd/docker-compose, then spawn the other services using goproc or pioz/god

7. monit, it can monitor and ensure a program is started/not dead

8. nomad (actually this one is a deployment tool, but it can also manage workloads)

9. kubernetes XD overkill

10. immortal.run, a supervisor, this one actually uses systemd

11. other containerization/VM workload orchestrators/managers that are usually already provided by the hoster/PaaS provider (Amazon ECS/Beanstalk/Fargate, Google AppEngine, Heroku, Jelastic, etc)
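
for the docker option above (number 3), a minimal sketch of running a prebuilt binary image with auto-restart and cpu/ram limits (the image name and limit values are made up):

docker run -d --name xxx \
  --restart=unless-stopped \
  --cpus=2 --memory=512m \
  yourrepo/xxx:latest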


This is the systemd script that I usually use (you need to create a user named "web" and install "unbuffer"):

$ cat /usr/lib/systemd/system/xxx.service
[Unit]
Description=xxx
After=network-online.target postgresql.service
Wants=network-online.target

[Service]
Type=simple
Restart=on-failure
User=web
Group=users
WorkingDirectory=/home/web/xxx
ExecStart=/home/web/xxx/run_production.sh
ExecStop=/usr/bin/killall xxx
LimitNOFILE=2097152
LimitNPROC=65536
ProtectSystem=full
NoNewPrivileges=true

[Install]
WantedBy=multi-user.target

$ cat /home/web/xxx/run_production.sh
#!/usr/bin/env bash

mkdir -p `pwd`/logs
ofile=`pwd`/logs/access_`date +%F_%H%M%S`.log
echo Logging into: $ofile
unbuffer time ./xxx | tee $ofile
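
after creating the unit file, reload systemd and enable it so it starts on boot and restarts on failure (standard systemctl commands, service name assumed to be xxx as above):

$ sudo systemctl daemon-reload
$ sudo systemctl enable --now xxx
$ systemctl status xxx
$ journalctl -u xxx -f   # follow the logs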



2022-04-04

Automatic Load Balancer Registration/Deregistration with NATS or FabioLB

Today we're gonna test 2 alternatives for automatic load balancing. Previously I always used Caddy or NginX with manual reverse proxy configuration (because most of my projects are single server -- the bottleneck is always the database, not the backend/compute part), but today we're gonna test 2 possible high-availability load balancing strategies (without kubernetes of course): the first using NATS, the second using a standard load balancer, in this case FabioLB.

To use NATS, we're gonna use this strategy:
the first thing we deploy is our custom reverse proxy, which should be able to convert any query string, form body with any kind of content-type, and any header if needed; we can use any serialization format (json, msgpack, protobuf, etc), but in this case we're just gonna use a normal string. We call this service "apiproxy". The apiproxy will send the serialized payload (from a map/object) into NATS using the request-reply mechanism. The other service is our backend "worker"/handler, which could be anything, but in this case it's our real handler that contains our business logic, so it needs to subscribe and return a reply to the apiproxy, which deserializes it back to the client with any serialization format and protocol (gRPC/Websocket/HTTP-REST/JSONP/etc). Here's the benchmark result of normal Fiber without any proxy, and apiproxy-nats-worker with a single nats vs multiple nats instances:

# no proxy
go run main.go apiserver
hey -n 1000000 -c 255 http://127.0.0.1:3000
  Average:      0.0011 secs
  Requests/sec: 232449.1716

# single nats
go run main.go apiproxy
go run main.go # worker
hey -n 1000000 -c 255 http://127.0.0.1:3000
  Average:      0.0025 secs
  Requests/sec: 100461.5866

# 2 worker
  Average:      0.0033 secs
  Requests/sec: 76130.4079

# 4 worker
  Average:      0.0051 secs
  Requests/sec: 50140.6288

# limit the apiserver CPU
GOMAXPROCS=2 go run main.go apiserver
  Average:      0.0014 secs
  Requests/sec: 184234.0106

# apiproxy 2 core
# 1 worker 2 core each
  Average:      0.0025 secs
  Requests/sec: 103007.4516

# 2 worker 2 core each
  Average:      0.0029 secs
  Requests/sec: 87522.6801

# 4 worker 2 core each
  Average:      0.0037 secs
  Requests/sec: 67714.5851

# seems that the bottleneck is spawning the producer's NATS
# spawning 8 connections using round-robin

# 1 worker 2 core each
  Average:      0.0021 secs
  Requests/sec: 121883.4324

# 4 worker 2 core each
  Average:      0.0030 secs
  Requests/sec: 84289.4330

# seems also the apiproxy is hogging all the CPU cores
# limiting to 8 core for apiproxy
# now synchronous handler changed into async/callback version
GOMAXPROCS=8 go run main.go apiserver

# 1 worker 2 core each
  Average:      0.0017 secs
  Requests/sec: 148298.8623

# 2 worker 2 core each
  Average:      0.0017 secs
  Requests/sec: 143958.4056

# 4 worker 2 core each
  Average:      0.0029 secs
  Requests/sec: 88447.5352

# limiting the NATS to 4 core using go run on the source
# 1 worker 2 core each
  Average:      0.0013 secs
  Requests/sec: 194787.6327

# 2 worker 2 core each
  Average:      0.0014 secs
  Requests/sec: 176702.0119

# 4 worker 2 core each
  Average:      0.0022 secs
  Requests/sec: 116926.5218

# same nats core count, increase worker core count
# 1 worker 4 core each
  Average:      0.0013 secs
  Requests/sec: 196075.4366

# 2 worker 4 core each
  Average:      0.0014 secs
  Requests/sec: 174912.7629

# 4 worker 4 core each
  Average:      0.0021 secs
  Requests/sec: 121911.4473 --> see update below


Could be better if it was tested on multiple servers, but it seems the bottleneck is on the NATS connection when there are many subscribers; they could not scale linearly (16-66% overhead for a single API proxy) IT'S A BUG ON MY SIDE, SEE UPDATE BELOW. Next we're gonna try FabioLB with Consul; Consul is used as the service registry (it's a synchronous-consistent "database" like Zookeeper or Etcd). To install all of it use these commands:

# setup:
curl -fsSL https://apt.releases.hashicorp.com/gpg | sudo apt-key add -
sudo apt-add-repository "deb [arch=amd64] https://apt.releases.hashicorp.com $(lsb_release -cs) main"
sudo apt install consul
go install github.com/fabiolb/fabio@latest

# start:
sudo consul agent -dev --data-dir=/tmp/consul
fabio
go run main.go -addr 172.17.0.1:5000 -name svc-a -prefix /foo -consul 127.0.0.1:8500
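
the -prefix /foo flag makes the example service register itself in Consul with a urlprefix-/foo tag, which is what fabio watches to build its routing table; by default fabio should listen on :9999 for traffic and :9998 for the UI (I'm assuming the default ports here), so you can sanity check the route with curl:

curl -v http://127.0.0.1:9999/foo
# fabio's routing table UI should be on http://127.0.0.1:9998/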

# benchmark:
# without fabio
  Average:      0.0013 secs
  Requests/sec: 197047.9124

# with fabio 1 backend
  Average:      0.0038 secs
  Requests/sec: 65764.9021

# with fabio 2 backend
go run main.go -addr 172.17.0.1:5001 -name svc-a -prefix /foo -consul 127.0.0.1:8500

# the bottleneck might be the cores, so we limit the cores to 2 for each worker
# with fabio 1 backend 2 core each
  Average:      0.0045 secs
  Requests/sec: 56339.5518

# with fabio 2 backend 2 core each
  Average:      0.0042 secs
  Requests/sec: 60296.9714

# what if we limit also the fabio
GOMAXPROCS=8 fabio

# with fabio 8 core, 1 backend 2 core each
  Average:      0.0042 secs
  Requests/sec: 59969.5206

# with fabio 8 core, 2 backend 2 core each
  Average:      0.0041 secs
  Requests/sec: 62169.2256

# with fabio 8 core, 4 backend 2 core each
  Average:      0.0039 secs
  Requests/sec: 64703.8253

All CPU cores were utilized around 50% on a 32-core, 128GB RAM server; I can't find which part is the bottleneck for now, but for sure both strategies have around 16% vs 67% overhead compared to no proxy (which makes sense, because adding more layers will add more transport and more things to copy/transfer and transform/serialize-deserialize). The code used in this benchmark is here, in the 2022mid directory, and the code for fabio-consul registration was copied from ebay's github repository.

Why would we even need to do this? If we're using the api gateway pattern (one of the patterns used in my past company, but with Kubernetes on the worker part), we could deploy independently and communicate between services using the gateway (proxy) without knowing the IP address or domain name of the service itself; as long as it has the proper route and payload, it can be handled wherever the service is deployed. What if you want to do canary or blue-green deployment? You can just register a handler in nats or consul with a different route name (especially for communication between services, not public to service), and wait for all traffic to be moved there before killing the previous deployment.

So what should you choose? Both strategies require 3 moving parts (apiproxy-nats-worker, fabio-consul-worker), but the NATS strategy is simpler in development and can give better performance (especially if you make the apiproxy as flexible as possible), though it needs better serialization, since in this benchmark the serialization is not measured; if you need better performance on serialization you must use codegen, which may require you to deploy 2 times (once for apiproxy, once for worker, unless you split the raw response meta with jsonparser or use a map only for the apiproxy). The FabioLB strategy has more features, and you can also use consul for service discovery (contacting other services directly by name without having to go thru FabioLB). The NATS strategy has some benefit in terms of security: the NATS cluster can be inside the DMZ, and workers can be on different subnets without the ability to connect to each other and it would still work, whereas if you use consul to connect directly to another service, they must have a route or connection to access each other. The bad part about NATS is that you should not use it for file upload, or it would hog a lot of resources, so uploads should be handled by the apiproxy directly, then the reference to the uploaded file should be forwarded as payload to NATS. You can check NATS traffic statistics using nats-top.

What's next? Maybe we can try traefik, a load balancer that also does service discovery in one binary (it can also use consul).

UPDATE: by changing the code from Subscribe (broadcast/fan-out) to QueueSubscribe (load balance), it has similar performance with 1/2/4 subscribers, so we can use NATS for high availability/fault tolerance in the api gateway pattern with the cost of 16% overhead.

TL;DR

no LB: 232K rps
-> LB with NATS request-reply: 196K rps (16% overhead)
no LB: 197K rps
-> LB with Fabio+Consul: 65K rps (67% overhead)

 



2021-12-18

Coolest PaaS/IaaS I've ever use: Jelastic

So, I'm looking for the simplest deployment strategy for my next side project. I don't want to use Kubernetes since I'm all alone XD; I've been learning Nomad, WayPoint, Swarm, and other popular tools that try to make it easy, like Portainer, but why don't they make it just as simple as Vercel or Fly.io? I also don't want to use the big cloud providers (GCP, AWS, Azure, etc), where the UI quite sucks, like everything was developed by a different team with a lack of communication, and you have to do a lot of setup hassle just to deploy simple things. Then I found a really cool product called Jelastic, that fits my needs:
  1. Can autoscale out (like AWS ELB/ECS, GCR, ACS, etc) and auto-cluster (as easy as CloudSQL or AWS RDS/Aurora, but can be automatic)
  2. Can autoscale up '__') without downtime, it only took 1 second to scale up from 1 core 640MB to 16 cores 32GB (seems like they only change the container's resource quota limit) and you can see the changes directly without a restart
  3. Can deploy a VPS on the same cluster/network (for my databases, since I don't use "standard/popular" databases) and it's super cheap (it only takes $3.9 per month to deploy a VPS with 1 static IP, and it can autoscale up), you only need to pay for what you utilize (CPU and RAM usage), not charged 100% while the server is up unlike other VPS providers
  4. The UI doesn't suck XD you can WebSSH, normal SSH (as long as you have a real IP), easy SSL setup, super easy to change config; the lacking part about Jelastic is probably configfile/gitops-based setup (for working with multiple members in the future), at least there's an API and CLI to create and modify environments, not sure if there's auditing available (haven't checked yet).
  5. Can also deploy automatically from git (checked every N minutes) or a CI pipeline or using the CLI.
  6. Easy to move (live migration) to different providers, change ownership of a cluster, or if that's not enabled, at least there's no vendor lock-in; you can also manually export and import environments (for example copying a staging setup to production that has a similar architecture, just a different deployment branch and scaling strategy).


Other cool things that I won't use: deploying any container/docker-based app with easy steps, deploying kubernetes, a bunch of stacks provided in the marketplace (may vary on different providers).

For $3.9 per month (if you utilize only 1%) for a 16 core, 32GB RAM, 200 GB NVMe VPS with 1 static IP (provider: ToggleBox), you can get the greenest (highest on average) result among all VPS I've ever tried:


You can see the raw benchmark result here and recap here.

What's the catch?
  1. It's quite expensive if you utilize 100% (around 339$ if you use ToggleBox for the specs above), for comparison:
    1. Contabo's cheapest highest-spec VPS (9 core, 60GB RAM, 1.5TB SSD) with unmetered bandwidth only costs $55-ish per month (not apple-to-apple since it's a different spec and performance, also this is what you pay per month regardless of your utilization)
    2. a similar spec GCE n1-custom-16-32768 (16 core, 32GB, 200GB SSD) non-committed costs $525 excluding bandwidth
    3. a similar spec AWS EC2 a1.xlarge (16 core, 32GB RAM, 200GB gp2 SSD) on-demand only costs $317 excluding bandwidth
    4. a similar spec Azure F16s (16 core, 32GB RAM, 256GB SSD) pay-as-you-go costs $634 excluding bandwidth
    5. the cheapest OVH in SG (8 core, 64GB RAM, 400GB SSD) only costs $135 with unmetered 200Mbps bandwidth
    but still, this is way cheaper for minimum usage; if you use GCR you will be billed around ~$10 per month for an idle instance, or ~$37 for a standby instance (for 1 VCPU, 1 GB RAM, not including bandwidth which is quite pricey at $0.085)
  2. Some providers have a different "free" tier, for example ToggleBox gives free 2GB bandwidth per hour (GCR only gives free 1GB per month XD), some other providers give a free static IP, some other providers give free 10GB disk usage per hour, etc.
  3. The license might be pricey if you install it on your own cluster instead of using the ones already provided (eg. DewaCloud or CloudKilat for the Indonesia region, ToggleBox for the US region, etc), but they have a profit sharing model if you are a reseller (have your own VPS and rent it out).
  4. The billing is hourly (so you will always be billed at minimum 1 cloudlet -- the specs of 1 cloudlet can vary per provider), compared to for example GCR which uses seconds as the minimum billing resolution (VCPU, GB RAM, Requests, and Bandwidth).


That's it for now, I'll create a new post if I find something better.

2021-03-30

Kubernetes IDE/GUI

There are various GUIs for Kubernetes that I've found.
For now my recommendation would be Kontena Lens, you'll get a bird's-eye view of your cluster.
For shell autocomplete, I recommend kube-prompt (or another shell-specific autocomplete).

If you only need a docker GUI, you can try Dry.
If you prefer a web-based GUI, you can try Portainer (it can manage much more than just kubernetes: also local docker and docker swarm), it's quite a bit better than Rancher.

2021-01-26

GOPS: Trace your Golang service with ease

GOPS is an alternative (also made by Google, other than pprof) to measure, trace, or diagnose the performance and memory usage of your Go-powered service/long-lived program. The usage is very easy, you just need to import it and add 3 lines in your main (so the gops command line can communicate with your program):

package main

import (
  "log"

  "github.com/google/gops/agent"
)

func main() {
  if err := agent.Listen(agent.Options{}); err != nil {
    log.Fatal(err)
  }
  // remaining of your long-lived program logic
}

If you don't put those lines, you can still use gops, limited to getting the list of Go-made programs running on your computer/server with limited statistics information, using these commands:

$ go get -u -v github.com/google/gops

$ gops  # show the list of running golang program
1248    1       dnscrypt-proxy  go1.13.4  /usr/bin/dnscrypt-proxy
1259    1       containerd      go1.13.15 /usr/bin/containerd
18220   1       dockerd         go1.13.15 /usr/bin/dockerd
1342132 1306434 docker          go1.13.15 /usr/bin/docker

$ gops tree # show running process in tree

$ gops PID # check the stats and whether the program have GOPS agent

#########################################################
# these commands below only available
# if the binary compiled with GOPS agent
# PID can be replaced with GOPS host:port of that program

$ gops stack PID # get current stack trace of running PID

$ gops memstats PID # get memory statistics of running PID

$ gops gc PID # force garbage collection

$ gops setgc PID X # set GC percentage

$ gops pprof-cpu PID # get cpu profile graph
$ gops pprof-heap PID # get memory usage profile graph
profile saved at /tmp/heap_profile070676630
$ gops trace PID # get 5 sec execution trace

# you can install graphviz to visualize the cpu/memory profile
$ sudo apt install graphviz

# visualize the cpu/memory profile graph on the web browser
$ go tool pprof /tmp/heap_profile070676630
> web 

The next step is to analyze the call graph for memory leaks (which are mostly just wrongly handled/forgotten defer of body/sql rows, holding a slice reference to a huge buffer, or a certain framework's cache trashing) or slow functions, whichever your mission is.

What if the golang service you need to trace is inside a Kubernetes pod and the GOPS address (host:port) is not exposed to the outside world? Kubernetes is a popular solution for companies that manage a bunch of servers/microservices or clouds alike (GKE, AKS, Amazon EKS, ACK, DOKS, etc), but obviously an overkill solution for small companies that don't need to scale elastically (or have fewer than 10 servers or don't use a microservice architecture).

First, you must compile gops statically so it can run inside an alpine container (which is mostly what people use):

$ cd $GOPATH/src/github.com/google/gops
$ export CGO_ENABLED=0
$ go mod vendor
$ go build

# copy gops to your kubernetes pod
$ export POD_NAME=blabla
$ kubectl cp ./gops $POD_NAME:/bin

# ssh/exec to your pod
$ kubectl exec -it $POD_NAME -- sh
$ gops

# for example you want to check heap profile for PID=1
$ gops pprof-heap 1
$ exit

# copy back trace file to local, then you can analyze the dump
kubectl cp $POD:/tmp/heap_profile070676630 out.dump

But if the agent's address and port are exposed, you can use gops directly from your computer to the pod, or create a tunnel inside the pod if it doesn't have a public IP, for example using ngrok.
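
another option, since gops accepts host:port in place of a PID (as mentioned above): if you start the agent with an explicit address, eg. agent.Options{Addr: ":9999"}, you can just port-forward to it instead of copying the gops binary into the pod; a sketch, assuming that port:

$ kubectl port-forward $POD_NAME 9999:9999 &
$ gops stack 127.0.0.1:9999
$ gops memstats 127.0.0.1:9999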

Btw if you know any companies migrating from/to a certain language (especially Go), framework, or database, you can contribute here.