2021-08-06

Database Patterns of Microservices

When do you need microservices? When you have multiple business domains (not DNS domains) that are best split, managed, and deployed separately. If your business domain is small and the team is small, it's better to use a modular monolith instead, since microservices add a lot of operational complexity (especially if you are using Kubernetes).

These are database patterns I got from Kindson's presentation.

  1. Private database per service
    this is the most common pattern: every domain/service has its own database. Pros and cons:
    + developers won't be tempted to JOIN across domains, which would make the codebase hard to refactor if it someday needs to be split into a microservice/modular approach
    + easier for a new developer joining the team, since they don't need to know the whole ER diagram, just the small segment related to the service they manage
    - more complicated for analytics use cases because you can't JOIN across databases, but this can be solved using a distributed SQL query engine like Trino
    + each database can scale and migrate independently (no downtime, especially when you are using a database that requires locking on migration, like MySQL)
    - accessing data from a different domain becomes a problem, which can be solved by:
       * api gateway (sync): a service must hit other services through the API gateway
       * event hub/pubsub (async/push): a service must subscribe to other services' events to retrieve the data, which introduces consistency problems
       * service mesh (sync): a service must hit other services through a sidecar
       * directly reading a replica (async/pull)
  2. Shared database
    all or some microservices access the same database. Pros and cons:
    + simpler for the developers, since they can do JOINs and transactions
    - the worst kind of performance when the bottleneck is the database, especially during migration and scaling
    + no consistency problem
  3. SAGA (sequence of local transactions)
    pros and cons:
    + splits an atomic transaction into multiple local steps; when one of the steps fails, a compensating action must reconcile/undo the completed ones (see the first sketch after this list)
    - more complex than a normal database transaction
  4. API Composition (join inside the API service, the pattern used in Trino)
    pros and cons:
    + can do joins across services and datasources
    - must hit multiple services (slower than a normal join) if the count is large
    - can be bad if the other service calls yet another service (cascading N+1 queries), eg. A hits B, B hits C, but this can be solved if B and C have batch APIs (usually using WHERE IN instead of a single-record API; see the batch-fetch sketch after this list)
  5. CQRS (Command Query Responsibility Segregation)
    a pattern created because old databases were usually single-master with multiple replicas; it has benefits and drawbacks:
    + simpler scaling: you can scale the write side and the read side independently
    - possible inconsistency when a transaction doesn't read from the master, which adds development complexity (deciding which queries must read the master and which can read a read-only replica)
  6. Domain Events
    the service must publish events; pros and cons:
    + decoupling: no more service mesh/hitting other services, we just need to subscribe to the events
    - eventual consistency
    - must store and publish events that may never be consumed (though a consumer can read that service's event database directly to work around this); this can also be a benefit, since events help with auditing
  7. Event Sourcing
    this pattern reconstructs the final state from a series of events, using snapshots to speed up the replay (see the replay sketch after this list); pros and cons:
    + can be used to reliably publish an event whenever state changes
    + good for auditing
    + theoretically easier to track when business logic changed (but we must build the full DFA/NFA state graph to reliably cover the edge cases)
    - difficult to query, since the data is a series of events, unless you prioritize the snapshot
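
To make SAGA (3) concrete, here is a minimal sketch in Go; the step names (reserveStock, chargePayment) are hypothetical, just to show the shape of a sequence of local transactions with compensating actions:

package main

import "fmt"

// Step pairs a local transaction with its compensating (undo) action.
type Step struct {
	Name       string
	Do         func() error
	Compensate func()
}

// RunSaga executes the steps in order; when one fails, it runs the
// compensations of all previously completed steps in reverse order.
func RunSaga(steps []Step) error {
	done := []Step{}
	for _, s := range steps {
		if err := s.Do(); err != nil {
			for i := len(done) - 1; i >= 0; i-- {
				done[i].Compensate()
			}
			return fmt.Errorf("saga failed at %s: %w", s.Name, err)
		}
		done = append(done, s)
	}
	return nil
}

func main() {
	err := RunSaga([]Step{
		{"reserveStock",
			func() error { fmt.Println("stock reserved"); return nil },
			func() { fmt.Println("stock released") }},
		{"chargePayment",
			func() error { return fmt.Errorf("card declined") },
			func() { fmt.Println("payment refunded") }},
	})
	fmt.Println(err) // prints "stock released" first, then the saga error
}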
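For API Composition (4), a sketch of the batch-API idea that avoids cascading N+1 calls; the user-service URL and the User type are made up for illustration:

package main

import (
	"encoding/json"
	"fmt"
	"net/http"
	"strings"
)

// User is a hypothetical payload returned by the user service.
type User struct {
	ID   string `json:"id"`
	Name string `json:"name"`
}

// fetchUsersBatch hits a hypothetical batch endpoint once, instead of
// calling GET /users/{id} N times; server-side this usually maps to a
// single SELECT ... WHERE id IN (...) query.
func fetchUsersBatch(ids []string) (map[string]User, error) {
	resp, err := http.Get("http://user-service/users?ids=" + strings.Join(ids, ","))
	if err != nil {
		return nil, err
	}
	defer resp.Body.Close()
	var users []User
	if err := json.NewDecoder(resp.Body).Decode(&users); err != nil {
		return nil, err
	}
	byID := make(map[string]User, len(users))
	for _, u := range users {
		byID[u.ID] = u // the composing service joins on this map in memory
	}
	return byID, nil
}

func main() {
	users, err := fetchUsersBatch([]string{"1", "2", "3"})
	if err != nil {
		fmt.Println(err)
		return
	}
	fmt.Println(users) // joined inside the API service, not in the database
}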
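And for Event Sourcing (7), a minimal sketch of replaying events on top of a snapshot, assuming a toy account domain; the event kinds are hypothetical:

package main

import "fmt"

// Event is a hypothetical account event; a real system would persist these.
type Event struct {
	Kind   string // "deposited" or "withdrawn"
	Amount int
}

// State is the final state reconstructed from the event series.
type State struct{ Balance int }

// Apply folds a single event into the state.
func (s State) Apply(e Event) State {
	switch e.Kind {
	case "deposited":
		s.Balance += e.Amount
	case "withdrawn":
		s.Balance -= e.Amount
	}
	return s
}

// Replay rebuilds state from a snapshot plus only the events after it,
// so queries don't have to fold the entire history every time.
func Replay(snapshot State, since []Event) State {
	for _, e := range since {
		snapshot = snapshot.Apply(e)
	}
	return snapshot
}

func main() {
	// snapshot taken earlier at balance 100; only newer events are replayed
	s := Replay(State{Balance: 100}, []Event{
		{"deposited", 50},
		{"withdrawn", 30},
	})
	fmt.Println(s.Balance) // 120
}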

Engineering is about making decisions and prioritizing: which kind of simplicity to prioritize, whether maintainability, raw performance, ease of scaling, or another metric; you'll need to define your own "best". And there's no silver bullet: each solution is best only for a specific use case.

But.. if you had to create an ultimate/general-purpose service, which patterns would you use?
Based on experience, I'd use:
  1. CQRS (1 writer, 1-N readers)
    readers must cache the reads
    the writer must log every change (Domain Events)
    the writer can be sync or async (for slow computations, or those that depend on another service/SAGA) and updates the snapshot each time
  2. Domain Events
    this log of events can be tailed (async/pull) by another service if it needs the data
    consumers must record their own bookmark (which events have already been tailed/consumed/ack'ed, and which haven't); see the sketch after this list
    EDIT 2021-08-16: this guy has the same idea, except that I believe there should be no circular dependency (eg, his order-inventory system should be composed from order, inventory, and delivery services)
  3. API Composition
    but we can use JOIN inside the domain
    especially for statistics; for example, we must collect each service's statistics
    APIs should have 2 versions: a consistent version (reads from the master) and an eventually consistent version (reads from a read-only replica); see the read-routing sketch after this list
    APIs must have batch/paged versions in addition to the standard CRUD
    an API must declare whether it depends on another service's APIs
  4. Private database is a must
    since the bottleneck is almost always the database, it's better to split databases by domain from the beginning (but no need to create a read-only replica until the read side becomes the bottleneck)
    if writes exceed 200K-600K rps, I prefer manual partitioning over sharding (unless the database I use supports sharding, automatic rebalancing, and super-easy addition of new nodes, with Tarantool-like performance)
    What if you need joins for analytics? You can use Trino/Presto/BigQuery/etc, or just delegate the statistics responsibility to each service, then collect/aggregate them in a statistics/collector service.
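
A minimal sketch of the tail-with-bookmark idea from point 2, assuming the producer's event log is exposed as an ordered, pull-able sequence; all names here are hypothetical:

package main

import "fmt"

// Event is a hypothetical domain event from another service's event log.
type Event struct {
	Seq     uint64 // monotonically increasing position in the log
	Payload string
}

// Consumer tails another service's event log (async/pull) and records
// its own bookmark: the last sequence number it has processed.
type Consumer struct {
	bookmark uint64 // persisted in the consumer's own storage in a real system
}

// Poll fetches events after the bookmark, processes them, and advances it.
// fetchAfter stands in for a read against the producer's event store/replica.
func (c *Consumer) Poll(fetchAfter func(seq uint64) []Event) {
	for _, e := range fetchAfter(c.bookmark) {
		fmt.Println("consumed:", e.Payload) // apply to local state here
		c.bookmark = e.Seq                  // ack: persist the new bookmark
	}
}

func main() {
	log := []Event{{1, "order created"}, {2, "order paid"}}
	fetch := func(seq uint64) []Event {
		var out []Event
		for _, e := range log {
			if e.Seq > seq {
				out = append(out, e)
			}
		}
		return out
	}
	c := &Consumer{}
	c.Poll(fetch) // consumes both events
	c.Poll(fetch) // nothing new: bookmark is already at 2
}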
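And a sketch of the consistent/eventual read split from point 3; the Store type and queries are made up, with DB standing in for two real *sql.DB handles:

package main

import "fmt"

// DB stands in for a SQL connection to either the master or a replica.
type DB struct{ name string }

func (d DB) Query(q string) { fmt.Println(d.name, "<-", q) }

// Store routes reads according to the consistent/eventual split:
// transactional reads go to the master, everything else to a replica.
type Store struct {
	master  DB
	replica DB
}

func (s Store) Read(q string, consistent bool) {
	if consistent {
		s.master.Query(q) // eg. reading a balance inside a transaction
		return
	}
	s.replica.Query(q) // eg. rendering a product listing
}

func main() {
	s := Store{DB{"master"}, DB{"replica"}}
	s.Read("SELECT balance FROM accounts WHERE id=1", true)
	s.Read("SELECT * FROM products LIMIT 20", false)
}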


2021-08-05

You don't need Kubernetes

Everyone is using Kubernetes, even when they don't have to, which adds complexity. In this article we'll discuss when you need Kubernetes, when you don't, and the alternatives.

When do you need Kubernetes?
  1. If you have a lot of teams (>3; not 3 people, but at least 3 teams) and services, it's better to use Kubernetes or another orchestration service to utilize your servers
    Why? Because that way each team can deploy their services independently, and service mesh management is far easier with orchestration.
    Alternative? Use Nomad, or standard rsync with an autoreloader (eg. overseer if you are using Golang). If you are using dynamic languages (or old CGI/FastCGI) that don't run their own web server, you usually don't need an autoreloader, since the source code is on the server anyway and will be reloaded by the web server.
  2. If you have more than 3 powerful servers, it's better to use Kubernetes or another orchestration service
    Why? With an orchestration service we can bin-pack and fully utilize those servers better than by creating a bunch of VMs on them
    Alternative? Use Ansible to set up those servers, Nomad to orchestrate, or create a script to rsync to multiple servers
  3. If you need canary or A/B deployments
    Why? Creating canary-like deployments manually (setting up a load balancer to send a percentage of requests to specific nodes) is a bit of a chore
    Alternative? Use Nomad with the Consul service registry
  4. When you are using a slow programming language implementation, so the first bottleneck is not the database but your API/web request handling
    Why? Normally, if you code in a compiled programming language, the bottleneck is not the network/number of hits but the database, which is usually solved with caching for reads or a CDN for static files. But when you are using slow programming languages (eg. Ruby, Python, PHP5, etc), you will mostly hit the bottleneck on the API/web request side before the database, so you'll need to spawn additional pods to handle those requests.
    Alternative? Amazon AutoScaling, Google Cloud AutoScaling Group, Azure AutoScale, or just do capacity planning and scale up/out manually before the peak season (eg. Black Friday, Christmas, discount day, etc) and scale down/in after the peak season ends.
When don't you need Kubernetes?
  1. When you only have a few (<=3) servers, or a bunch of low-powered servers that are already well utilized.
    Seriously, you don't need it; a simple rsync script (either manual or in a CI/CD pipeline) or an Ansible script would be sufficient
  2. When you are alone or in a single small team (no other teams), writing a modular monolith is fine until it becomes painful to build, test, and deploy; that's the moment you'll need to split into multiple services.
  3. When you only have production and test environments, without A/B or canary deployments.
    Again, Kubernetes would be totally overkill in this case. Instead:
    1. always cache reads
    2. prepare the nth server when load is already at 70% (you would need to provision a new node anyway if Kubernetes hit that load), either manually or using your cloud provider's autoscaler
    3. put slow tasks/computations in background jobs, so web services can always respond fast. Even when you have an autoscaler for APIs/web services, it would be totally useless if the bottleneck is the database or I/O (which is more likely than the processor/computation).
  4. When you don't want Kubernetes to eat ~30% of your servers as overhead, use a lightweight orchestrator and service discovery instead, like Nomad with Consul.
For the most part, you don't need Kubernetes; don't be eaten by the hype. Especially don't deploy databases on pods: it's unstable AF, and you don't need to automatically scale databases out/in anyway (usually you scale up, since scaling out is quite a hassle for most database products); it would be crazy to move gigabytes of data around because of automatic scaling. I hope this article helps people who are starting to build their startup to provision with proper capacity planning and releasing, instead of prematurely optimizing their architecture and infrastructure with Kubernetes.

FAQ

So, what are your qualifications for saying this?
I have worked for small companies (<7 engineers) and huge companies (700+ engineers) that both use Kubernetes, and I realized the difference: why for one it is a must, and for the other it was just a decision based on hype.

EDIT

I found something even simpler than Nomad/Waypoint/Kubernetes: Jelastic! See the demo video here.

2021-08-04

Dockerfile Template (React, Express, Vue, Nest, Angular, GoFiber, Svelte, Django, Laravel, ASP.NET Core, Kotlin, Deno)

These are Dockerfile templates for deploying common applications (using Kubernetes, Nomad, or locally with docker-compose). This post is mostly copied from the ScalableScripts YouTube channel and the Docker docs; the gist for the nginx config is here.


ReactJS

FROM node:15.4 as build1
WORKDIR /app1
COPY package*.json .
RUN npm install
COPY . .
RUN npm run build

FROM nginx:1.19
COPY ./nginx.conf /etc/nginx/nginx.conf
COPY --from=build1 /app1/build /usr/share/nginx/html

To build it, use docker build -t react1 .
To run it, use docker run -p 8001:80 react1


ExpressJS

FROM node:15.4 
WORKDIR /app
COPY package*.json .
RUN npm install
COPY . .
CMD node index.js


VueJS

FROM node:15.4 as build1
WORKDIR /app1
COPY package*.json .
RUN npm install
COPY . .
RUN npm run build

FROM nginx:1.19
COPY ./nginx.conf /etc/nginx/nginx.conf
COPY --from=build1 /app1/dist /usr/share/nginx/html

The only difference from React is that the build directory is dist/ instead of build/.


NestJS 

FROM node:15.4 as build1
WORKDIR /app1
COPY package*.json .
RUN npm install
COPY . .
RUN npm run build

FROM node:15.4
WORKDIR /app
COPY package.json .
RUN npm install --only=production
COPY --from=build1 /app1/dist ./dist
CMD npm run start:prod


Angular

FROM node:15.4 as build1
WORKDIR /app1
COPY package*.json .
RUN npm install
COPY . .
RUN npm run build -- --prod

FROM nginx:1.19
COPY ./nginx.conf /etc/nginx/nginx.conf
COPY --from=build1 /app1/dist/PROJECT_NAME /usr/share/nginx/html


Fiber (Golang)

FROM golang:1.16-alpine as build1
WORKDIR /app1
COPY go.mod .
COPY go.sum .
RUN go mod download
COPY . .
RUN CGO_ENABLED=0 GOOS=linux go build -o app1.exe

FROM alpine:latest
RUN apk --no-cache add ca-certificates
WORKDIR /
COPY --from=build1 /app1/app1.exe .
CMD ./app1.exe

You don't need the COPY go.mod and go mod download steps if you have a vendor/ directory, or you can cache /go/pkg/mod and reuse it instead of redownloading the whole set of dependencies (this can really speed things up in the CI/CD pipeline, especially if you live in a 3rd world country). The ca-certificates package is only needed if you need to hit HTTPS endpoints; if you don't, you can skip that step.


Svelte

FROM node:15.4 as build1
WORKDIR /app1
COPY package*.json .
RUN npm install
COPY . .
RUN npm run build

FROM nginx:1.19
COPY ./nginx.conf /etc/nginx/nginx.conf
COPY --from=build1 /app1/public /usr/share/nginx/html


Django

FROM python:3.9-alpine as build1
ENV PYTHONUNBUFFERED 1
WORKDIR /app1
COPY requirements.txt .
RUN pip install -r requirements.txt
COPY . .
CMD python manage.py runserver 0.0.0.0:80


Laravel

FROM php:7.4-fpm
RUN apt-get update && apt-get install -y git curl libpng-dev libonig-dev libxml2-dev zip unzip
RUN curl -sS https://getcomposer.org/installer | php -- --install-dir=/usr/local/bin --filename=composer
RUN docker-php-ext-install pdo_mysql mbstring
WORKDIR /app1
COPY composer.json .
RUN composer install --no-scripts
COPY . .
CMD php artisan serve --host=0.0.0.0 --port=80


ASP.NET Core

FROM mcr.microsoft.com/dotnet/sdk:5.0 as build1
WORKDIR /app1
COPY *.csproj .
RUN dotnet restore
COPY . .
RUN dotnet publish -c Release -o out

FROM mcr.microsoft.com/dotnet/aspnet:5.0
WORKDIR /app
COPY --from=build1 /app1/out .
ENTRYPOINT ["dotnet", "PROJECT_NAME.dll"]


Kotlin

FROM gradle:7-jdk8 as build1
WORKDIR /app1
COPY . .
RUN ./gradlew build --stacktrace

FROM openjdk
WORKDIR /app
EXPOSE 80
COPY --from=build1 /app1/build/libs/PROJECT_NAME-VERSION-SNAPSHOT.jar .
CMD java -jar PROJECT_NAME-VERSION-SNAPSHOT.jar


Deno

FROM denoland/deno:1.11.0
WORKDIR /app1
COPY . .
RUN ["--run","--allow-net","app.ts"]


Deployment

For deployment you can use AWS (Elastic Container Registry, then Elastic Container Service on EC2 container instances or with Fargate), Azure (Azure Container Registry and Azure Container Instances), Google Cloud (upload to Container Registry and run on Cloud Run), or just push the image to a Docker registry and pull it on the server.