2022-01-28

Getting started with Kubernetes

Since containerization became more popular, Kubernetes has gained more traction than deploying on just a single VM. In a previous post I explained why and when you don't need Kubernetes, and when you do. From a deployment perspective we can categorize the options into 5 types (based on ownership, initial cost, granularity of the recurring cost, and the need for capacity planning):

1. On-Premise Dedicated Server: your own server in your own rack, or placed in a colocation facility. We own the hardware, we have to replace it when it breaks, and we have to maintain the network gear (switches, routers). Usually this is the best choice for internal services (software used only by internal staff), from both the security and bandwidth perspective.

2. VM: we rent "cloud" infrastructure; this can be considered IaaS (Infrastructure as a Service). We rent a virtual machine/server, sometimes called a Virtual Private/Dedicated Server, and pay while the server is turned on (or per contract, X per month/year), sometimes plus bandwidth (unless the provider has unmetered billing). Notable products in this category: Google Compute Engine, Amazon EC2, Azure VM, Contabo VPS/VDS, etc. Usually this is the best choice for databases (unless you are using a managed database service) or other stateful applications, or when the maximum number of users is limited and can be estimated with capacity planning (not the whole world will be accessing this VM). It is a bit insane to move/replicate data automatically under high load, so manually triggered or scheduled scale up/out/down/in is still the better option for stateful applications/databases (eg. scale up/out before Black Friday, scale down/in when it ends).

3. Kubernetes: we rent managed Kubernetes, or install Kubernetes on top of our own on-premise dedicated servers. Usually the company rents 3 huge servers (64 cores, 256GB RAM, very large disks) and lets developers deploy containers/pods inside Kubernetes themselves, split by team or service namespace. This has a constant cost (those 3 huge VMs/on-prem machines, plus the managed Kubernetes service's fee); some providers also offer automatic node scale-out (the Kubernetes nodes/VMs [where the pods are scheduled] can be spawned/deleted based on load). Notable products in this category: GKE, Amazon EKS, AKS, DOKS, Jelastic Kubernetes Cluster, etc. This one is best if you have a large number of services. For a truly self-managed alternative: Mesos, Docker Swarm, or Nomad combined with other services can work (since they only cover a fraction of Kubernetes' features, eg. Consul for service discovery, FabioLB for load balancing, Vault for secrets management, etc). The difference between this and number 4 below is that here you still mostly have to provision the node VMs/hardware manually (which can be automated with scripts as long as the provider supports it, eg. via Pulumi or the IaaS provider's provisioning/autoscaling group API).

4. Container Engine: in this type of deployment we use the infrastructure provider's platform, so we only need to supply a container without having to rent a VM manually. Some providers deploy the container inside a single VM, others deploy it on a shared dedicated server/VM. All of them support automatic scale-out, but only some support automatic scale-up. Notable products in this category: Google AppEngine, Amazon ECS/Beanstalk/Fargate, Azure App Service, Jelastic Cloud, Heroku, etc. Usually this is the best choice for most cases, both budget-wise and scalability-wise, especially if you have a small number of services or a monolith; it can also be great for a large number of services if the billing is granular (pay only for the resources [CPU, RAM, disk] you actually utilize, not for the time the server is turned on), like in Jelastic.

5. Serverless/FaaS: we only need to supply a function (mostly based on a specific template) that runs on a specific event (eg. at a specific time like CRON, or when the load balancer receives a request, like old CGI). Usually the function is put inside a container and kept as a standby instance, so scale-out only happens when it receives high load. If the function requires a database as a dependency, it's recommended to use a managed database that supports a high number of connections/connect-disconnects, or to offload writes to an MQ/PubSub service. Notable products in this category: Google CloudRun, AWS Lambda, Azure Functions, OpenFaaS, Netlify, Vercel, Cloudflare Workers, etc. We usually pay for this kind of service based on CPU duration, number of calls, total RAM usage, bandwidth, and other metrics, so it is very cheap when the number of function calls is small, but it can get really costly if you write an inefficient function or have a large number of calls. Usually lambdas are only used for handling spikes or as atomic CRON jobs.

Because of the hype, or because it fits their use case (a bunch of teams that want to do independent service deployments), or for the possibility of avoiding vendor lock-in, a company might decide to use Kubernetes. Most companies could survive without following the hype by using just a managed database (or deploying the database on a VM, or even using docker-compose with volume binding) plus a container engine (for the scale-out strategy), without having to train everyone to learn Kubernetes and without a dedicated team/person to set up and manage it (security, policies, etc).

But today we're gonna try one of the fastest local Kubernetes options for development use (not for production): minikube.

curl -LO https://storage.googleapis.com/minikube/releases/latest/minikube-linux-amd64
sudo install minikube-linux-amd64 /usr/local/bin/minikube
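
You can sanity-check the install before starting the cluster:

# verify minikube is on the PATH and runnable
minikube version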

minikube start

# use --driver=kvm2 or --driver=virtualbox if the docker driver cannot connect to the internet
#sudo apt install virtualbox
#sudo apt install qemu-kvm libvirt-daemon-system libvirt-clients bridge-utils virt-manager
#sudo adduser `id -un` libvirt
#sudo adduser `id -un` kvm

alias kubectl='minikube kubectl --'
alias k=kubectl

# will download kubectl if it's the first time
k

# get pods from all namespaces
k get po -A

# open dashboard and authenticate
minikube dashboard 

# destroy minikube cluster
minikube ssh # then inside the node: sudo poweroff
minikube delete
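
If you only want to free up resources without losing the cluster state, minikube also has a stop subcommand:

# stop the node but keep its state for the next minikube start
minikube stop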


Create a Dockerfile for the app you want to deploy to the Kubernetes cluster; if it's just a simple single-binary Go project, building locally and putting the binary into an alpine image (instead of building cleanly inside Docker), then pushing it to an image registry, will work just fine:

# build binary
CGO_ENABLED=0 GOOS=linux go build -o ./bla.exe

# create Dockerfile
echo '
FROM alpine:latest
WORKDIR /
COPY bla.exe .
# use ENTRYPOINT instead of CMD if you do not want docker run arguments to replace the command
CMD ./bla.exe
' > Dockerfile
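
Alternatively, if you do want the clean build inside Docker mentioned above (no Go toolchain needed on the machine running docker build), a minimal multi-stage sketch could replace the Dockerfile above; golang:1.17-alpine here is just an example builder image, match it to your Go version:

echo '
FROM golang:1.17-alpine AS builder
WORKDIR /src
COPY . .
RUN CGO_ENABLED=0 go build -o /src/bla.exe

FROM alpine:latest
WORKDIR /
COPY --from=builder /src/bla.exe .
CMD ./bla.exe
' > Dockerfile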

# build docker image; VERSION is a calendar tag: v1.<month+12*(year-2021)><day>.<hour><minute>
VERSION=$(ruby -e 't = Time.now; print "v1.#{t.month+(t.year-2021)*12}%02d.#{t.hour}%02d" % [t.day, t.min]')
COMMIT=$(git rev-parse --verify HEAD)
APPNAME=local-bla
docker image build -f ./Dockerfile . \
  --build-arg "app_name=$APPNAME" \
  -t "$APPNAME:latest" \
  -t "$APPNAME:$COMMIT" \
  -t "$APPNAME:$VERSION"

# example of how to test locally without kubernetes (127.1 is shorthand for 127.0.0.1)
docker image build -f ./Dockerfile -t docktest1 .
docker run --network=host \
  --env 'DBMAST=postgres://usr:pwd@127.1:5432/db1?sslmode=disable' \
  --env 'DBREPL=postgres://usr:pwd@127.1:5432/db1?sslmode=disable' \
  --env DEBUG=true \
  --env PORT=8082 \
  docktest1

# load the local image into minikube's container runtime (no registry push needed)
minikube image load $APPNAME
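
To confirm the image actually arrived inside the cluster, minikube can list the images known to the node's container runtime:

# verify the image is present inside the minikube node
minikube image ls | grep bla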

# create deployment config
echo '
apiVersion: v1
kind: Pod
metadata:
  name: bla-pod
spec:
  containers:
    - name: bla
      image: local-bla
      imagePullPolicy: Never
      env:
      - name: BLA_ENV
        value: "ENV_VALUE_TO_INJECT"
      # if you need access to a docker-compose service outside the kube cluster,
      # use minikube ssh, route -n, check the IP of the gateway,
      # and use that IP in the connection string;
      # it should work as long as the port is forwarded
  restartPolicy: Never
' > bla-pod.yaml
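
The gateway trick from the comment above, in command form (as an aside, recent minikube versions also expose the host under the host.minikube.internal DNS name):

# print the node's routing table; the default route's gateway is the host
minikube ssh -- route -n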

# deploy
kubectl apply -f bla-pod.yaml

# check
k get pods
k logs bla-pod
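
A few more standard kubectl subcommands that help when the pod misbehaves:

# show events and the reason a pod is stuck (eg. ErrImageNeverPull)
k describe pod bla-pod

# open a shell inside the running container
k exec -it bla-pod -- sh

# forward a local port to the pod, assuming the app listens on port 8082
k port-forward bla-pod 8082:8082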

# delete deployment
kubectl delete pod bla-pod

If you need NewRelic log forwarding, it's as easy as adding a helm chart (it will automatically attach to new pods' logs and send them to NewRelic):

curl https://raw.githubusercontent.com/helm/helm/main/scripts/get-helm-3 | bash
helm repo add newrelic https://helm-charts.newrelic.com
helm search repo newrelic/
helm install newrelic-logging newrelic/newrelic-logging --set licenseKey=eu01xx2xxxxxxxxxxxxxxxRAL
kubectl get daemonset -o wide -w --namespace default newrelic-logging

The next step should be adding a load balancer or ingress so the pod can receive HTTP requests. As an alternative for a faster development workflow (auto-sync to the pod after each recompile), you can try Tilt.
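
As a sketch of that next step, a NodePort Service can expose the pod; this assumes you add an app=bla label to the pod (the pod above has none) and that the app listens on port 8082:

# label the existing pod so the service selector can match it
kubectl label pod bla-pod app=bla

# create and apply the service
echo '
apiVersion: v1
kind: Service
metadata:
  name: bla-svc
spec:
  type: NodePort
  selector:
    app: bla
  ports:
    - port: 8082
      targetPort: 8082
' > bla-svc.yaml
kubectl apply -f bla-svc.yaml

# ask minikube for a URL that reaches the NodePort
minikube service bla-svc --url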