2022-12-24

Map to Struct and Struct to Map Golang Benchmark 2022 Edition

Sometimes we want to convert from map to struct or struct to map (dictionary in other languages), or even struct to struct. There are some libraries that can help us do this, for example structs, mapstructure, copier, or smapping. We could also utilize serialization and deserialization libraries to do this, with the caveat that some serialization formats (eg. JSON) cannot represent integers larger than 2^53.
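
For example, a quick way to see that caveat in Go (a standalone sketch; 9007199254740993 is 2^53+1):

package main

import (
	"encoding/json"
	"fmt"
)

func main() {
	b, _ := json.Marshal(map[string]int64{"v": 9007199254740993}) // 2^53 + 1
	var m map[string]any
	_ = json.Unmarshal(b, &m) // JSON numbers decode into float64 by default
	fmt.Println(m["v"])       // 9.007199254740992e+15, the last digit is lost
}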

Here's a benchmark that I ran this morning:

map to struct                                    total   ns/op     B/op   allocs/op
M2S_GoccyGoJson_MarshalUnmarshal-32          6,661,932     517       80           3
M2S_JsonIteratorGo_MarshalUnmarshal-32       4,892,611     724      196           8
M2S_VmihailencoMspackV5_MarhsalUnmarshal-32  4,572,597     741      188           5
M2S_FxamackerCbor_MarshalUnmarshal-32        4,418,558     799      120           8
M2S_SurrealdbCork_EncodeDecode-32            3,080,282   1,080    1,217           6
M2S_GopkgInMgoV2Bson_MarshalUnmarshal-32     3,227,905   1,092      232          13
M2S_ShamatonMsgpackV2_MarshalUnmarshal-32    3,062,677   1,161      956          15
M2S_MitchellhMapstructure_Decode-32          2,487,428   1,395      720          18
M2S_MongoDriverBson_MarshalUnmarshal-32      2,477,983   1,459      414          14
M2S_KokizzuJson5b_MarshalUnmarshal-32        1,987,240   1,711      632          16
M2S_EncodingJson_MarshalUnmarshal-32         2,056,944   1,780      600          16
M2S_EtNikBinngo_MarshalUnmarshal-32          1,985,595   1,857      425          39
M2S_PquernaFfjson_MarshalUnmarshal-32        1,739,968   1,986      609          16
M2S_UngorjiGocodec_BincEncodeDecode-32       1,401,453   2,582    4,340          23
M2S_UngorjiGoCodec_CborEncodeDecode-32       1,304,828   2,636    4,340          23
M2S_PelletierGoTomlV2_MarshalUnmarshal-32    1,284,037   2,787    1,600          27
M2S_UngorjiGocodec_SimpleEncodeDecode-32     1,295,926   2,810    4,340          23
M2S_UngorjiGocodec_JsonEncodeDecode-32       1,000,000   3,028    4,956          25
M2S_IchibanTnetstrings_MarshalUnmarshal-32     749,947   5,056    9,329          48
M2S_BurntSushiToml_EncodeUnmarshal-32          425,335   8,065    7,958          71
M2S_HjsonHjsonGoV4_MarshalUnmarshal-32         355,784  10,870    3,936          78
M2S_GopkgInYamlV3_MarshalUnmarshal-32          271,190  13,524   14,112          80
M2S_DONUTSLz4Msgpack_MarshalUnmarshal-32       240,619  15,498    1,264          16
M2S_GoccyGoYaml_MarshalUnmarshal-32            214,776  16,192    7,821         214
M2S_GhodssYaml_MarshalUnmarshal-32             156,412  23,347   21,378         161
M2S_NaoinaToml_MarshalUnmarshal-32              57,607  58,331  398,544          77

struct to map                                    total   ns/op     B/op   allocs/op
S2M_MitchellhMapstructure_Decode-32          5,055,402     716      536          12
S2M_GoccyGoJson_MarshalUnmarshal-32          4,660,224     747      522          12
S2M_JsonIteratorGo_MarshalUnmarshal-32       4,283,262     835      505          14
S2M_VmihailencoMspackV5_MarhsalUnmarshal-32  4,009,863     908      607          12
S2M_FxamackerCbor_MarshalUnmarshal-32        3,562,352   1,023      452          11
S2M_ShamatonMsgpackV2_MarshalUnmarshal-32    3,180,010   1,089      556          15
S2M_GopkgInMgoV2Bson_MarshalUnmarshal-32     3,047,396   1,145      528          15
S2M_SurrealdbCork_EncodeDecode-32            2,976,328   1,196    1,611          12
S2M_EncodingJson_MarshalUnmarshal-32         1,914,165   1,782      688          18
S2M_PquernaFfjson_MarshalUnmarshal-32        1,911,950   1,845      697          18
S2M_EtNikBinngo_MarshalUnmarshal-32          1,948,802   1,859      768          45
S2M_KokizzuJson5b_MarshalUnmarshal-32        1,888,774   1,884      960          20
S2M_MongoDriverBson_MarshalUnmarshal-32      1,857,649   1,995      759          18
S2M_PelletierGoTomlV2_MarshalUnmarshal-32    1,244,012   2,864    1,800          31
S2M_UngorjiGocodec_BincEncodeDecode-32       1,000,000   3,234    4,888          34
S2M_UngorjiGoCodec_CborEncodeDecode-32         989,671   3,358    4,888          34
S2M_UngorjiGocodec_SimpleEncodeDecode-32     1,000,000   3,400    4,888          34
S2M_UngorjiGocodec_JsonEncodeDecode-32         912,512   3,639    5,504          36
S2M_IchibanTnetstrings_MarshalUnmarshal-32     776,796   4,744    9,561          46
S2M_BurntSushiToml_EncodeUnmarshal-32          447,216   8,538    8,231          73
S2M_HjsonHjsonGoV4_MarshalUnmarshal-32         389,476   9,416    3,868          66
S2M_GopkgInYamlV3_MarshalUnmarshal-32          315,939  13,338   14,400          81
S2M_DONUTSLz4Msgpack_MarshalUnmarshal-32       242,330  14,298      744          16
S2M_GoccyGoYaml_MarshalUnmarshal-32            230,042  14,919    7,580         202
S2M_GhodssYaml_MarshalUnmarshal-32             151,023  22,682   21,441         161
S2M_NaoinaToml_MarshalUnmarshal-32              60,916  52,047  398,112          80

struct to struct                                 total   ns/op     B/op   allocs/op
S2S_GoccyGoJson_MarshalUnmarshal-32         12,046,497     317      112           4
S2S_ShamatonMsgpackV2_MarshalUnmarshal-32    7,897,488     458      148           6
S2S_JsonIteratorGo_MarshalUnmarshal-32       7,853,592     494       92           6
S2S_FxamackerCbor_MarshalUnmarshal-32        7,038,808     511       80           5
S2S_GopkgInMgoV2Bson_MarshalUnmarshal-32     5,105,343     715      144           9
S2S_VmihailencoMspackV5_MarhsalUnmarshal-32  4,549,700     818      213           6
S2S_MongoDriverBson_MarshalUnmarshal-32      3,560,946   1,019      321           8
S2S_EncodingJson_MarshalUnmarshal-32         2,731,051   1,313      304           9
S2S_PquernaFfjson_MarshalUnmarshal-32        2,734,357   1,330      304           9
S2S_KokizzuJson5b_MarshalUnmarshal-32        2,594,728   1,343      504           9
S2S_SurrealdbCork_EncodeDecode-32            2,555,745   1,397    1,241           7
S2S_EtNikBinngo_MarshalUnmarshal-32          1,995,468   1,840      400          41
S2S_PelletierGoTomlV2_MarshalUnmarshal-32    1,460,683   2,459    1,440          23
S2S_UngorjiGocodec_SimpleEncodeDecode-32     1,205,648   2,919    4,364          24
S2S_UngorjiGoCodec_CborEncodeDecode-32       1,290,734   2,920    4,364          24
S2S_UngorjiGocodec_BincEncodeDecode-32       1,207,327   3,007    4,364          24
S2S_UngorjiGocodec_JsonEncodeDecode-32       1,000,000   3,223    4,980          26
S2S_IchibanTnetstrings_MarshalUnmarshal-32     722,493   4,950    9,289          47
S2S_BurntSushiToml_EncodeUnmarshal-32          398,366   8,458    7,918          72
S2S_HjsonHjsonGoV4_MarshalUnmarshal-32         304,369  11,189    4,578          79
S2S_DONUTSLz4Msgpack_MarshalUnmarshal-32       282,505  12,215      237           7
S2S_GopkgInYamlV3_MarshalUnmarshal-32          279,332  12,541   14,016          76
S2S_GoccyGoYaml_MarshalUnmarshal-32            211,952  15,542    7,982         208
S2S_GhodssYaml_MarshalUnmarshal-32             160,660  23,148   21,073         154
S2S_NaoinaToml_MarshalUnmarshal-32              64,468  59,672  399,065          83

The repository and the always-updated results are here, feel free to add your own serialization/deserialization library. As we can see, goccy-gojson is the fastest of all, but too bad that if you store an int64 larger than 2^53 it gives a wrong result. So it's better to use the second best and all-rounder vmihailenco-msgpack, or, for the specific struct to map/struct use case, mapstructure.

Here's the top ranking:

Ser/Deser              M2S  S2M  S2S
GoccyGoJson             1    2    2
JsonIteratorGo          2    3    4
MitchellhMapstructure   8    1    1
VmihailencoMspackV5     3    4    7
FxamackerCbor           4    5    5
ShamatonMsgpackV2       7    6    3
SurrealdbCork           5    8   12
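
For illustration, here's roughly how the two recommended approaches are called (a sketch; the struct and field names are made up, not taken from the benchmark repo):

package main

import (
	"fmt"

	"github.com/mitchellh/mapstructure"
	"github.com/vmihailenco/msgpack/v5"
)

type User struct {
	Name string
	Age  int
}

func main() {
	// map to struct with mapstructure
	m := map[string]any{"Name": "alice", "Age": 30}
	var u User
	if err := mapstructure.Decode(m, &u); err != nil {
		panic(err)
	}
	fmt.Printf("%+v\n", u) // {Name:alice Age:30}

	// struct to map via a msgpack round-trip (binary format, so no 2^53 problem)
	b, err := msgpack.Marshal(u)
	if err != nil {
		panic(err)
	}
	var back map[string]any
	if err := msgpack.Unmarshal(b, &back); err != nil {
		panic(err)
	}
	fmt.Println(back) // map[Age:30 Name:alice]
}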


2022-09-08

Getting started with InfluxDB

Today we're gonna learn about InfluxDB, a time series database that has already become a standard for open source metric storage, part of the TICK stack (Telegraf -- a metric collector like datadog-agent/fluentd, InfluxDB -- the database, Chronograf -- a visualizer like Grafana, Kapacitor -- an alert manager like Prometheus alert-manager, also usable for ETL). To install InfluxDB, use the steps from this link (there's also docker). What are the cons of InfluxDB compared to other solutions? Datadog is (very very) expensive, Prometheus seems too kube-oriented, and InfluxDB open source doesn't support HA and clustering; that's why I would rather use ClickHouse for time series / log collection (as a data sink), aggregate those logs as metrics (materialized views), and copy them periodically to a faster database like Tarantool. InfluxDB has 2 query syntaxes: an SQL-like one called InfluxQL, and a JavaScript+FP-like one called Flux (it has the |> pipe like most FP languages). Here's an example of the docker setup and the query language:

docker run -d -e INFLUXDB_ADMIN_USER=user1 -e INFLUXDB_ADMIN_PASSWORD=pass1 --name influxdb1 influxdb

docker exec -it influxdb1 influx -host 127.0.0.1 -port 8086 -username user1 -password pass1

show databases
create database db1
use db1

-- show all tables (=measurements)

show measurements

-- tags/indexes are always strings; fields can be float, int, bool, etc

insert table1,tag1=a,tag2=b field1=1,field2=2

-- if not set, the default time is the insert time, in nanoseconds

select * from "table1"
select * from table1 where tag1='a'

-- describe columns from all table:

show tag keys
show field keys from "table1"

-- select distinct tag from table1
show tag values from "table1" with key in ("tag1")

-- automatically delete after N days, default 0 = never delete
-- shard by N hours, default 168 hours (a week)
show retention policies

-- like partition in other database
show shards
show shard groups
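
The queries above are InfluxQL; a simple filtered select in Flux would look something like this (a sketch, assuming db1 is exposed as a bucket -- bucket naming differs per InfluxDB version):

from(bucket: "db1")
  |> range(start: -1h)
  |> filter(fn: (r) => r._measurement == "table1" and r.tag1 == "a")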

I guess that's all you need to know to get started with InfluxDB. If you're looking for one comparable with ClickHouse, you can check TDengine or TiFlash.

2022-07-28

Techempower Framework Benchmark Round 21

The results for Techempower framework benchmark round 21 are out; as usual the most important benchmarks are the update and multi-query ones:

The top rankers are Rust, JS (cheating), Java, C++, C#, PHP, C, Scala, Kotlin, and Go. For multiple queries:

Top rankers are Rust, Java, JS (cheating), Scala, Kotlin, C++, C#, and Go. These benchmarks show how efficient each language's database driver is (which is usually the biggest bottleneck), and how much overhead each language's framework has (including serialization, alloc/GC, async-I/O, etc).

For memory usage and CPU utilization you can check here https://ajdust.github.io/tfbvis/?testrun=Citrine_started2022-06-30_edd8ab2e-018b-4041-92ce-03e5317d35ea&testtype=update&round=false

2022-06-07

How to profile your Golang Fiber server

Usually you need to load test your webserver to find where the memory leaks or the bottlenecks are, and Golang already provides a tool to do that, called pprof. What you need to do depends on the framework you use, but it's all similar; most frameworks already have a middleware that you can import and use. For example, in Fiber there's the pprof middleware; to use it:

// import
  "github.com/gofiber/fiber/v2/middleware/pprof"

// use
  app.Use(pprof.New())
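
For context, a minimal complete wiring would look something like this (a sketch, assuming Fiber v2; the port is arbitrary):

package main

import (
	"log"

	"github.com/gofiber/fiber/v2"
	"github.com/gofiber/fiber/v2/middleware/pprof"
)

func main() {
	app := fiber.New()
	app.Use(pprof.New()) // registers the /debug/pprof routes
	log.Fatal(app.Listen(":3000"))
}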

It would create a route called /debug/pprof that you can use: just start the server, then open that path. To profile CPU or check the heap you just need to click the profile/heap link; it will wait for around 10 seconds, and while it waits you must hit other endpoints to generate traffic/function calls. After 10 seconds, it will show a download dialog to save your CPU or heap profile. From that file, you can run a command similar to gops, for example to generate an svg or a web view that shows your profiling:

pprof -web /tmp/profile # or: go tool pprof -web /tmp/profile
pprof -svg /tmp/profile # /tmp/profile = the file that you just downloaded

It would generate something like this:

So you can find out which function took most of the CPU time (or, if it's a heap profile, which function allocates the most memory). In my case the bottleneck was the default built-in pretty logger: it limited the number of requests the server could handle to ~9K rps at concurrency 255 on a database write benchmark; if we remove the built-in logging and replace it with zerolog, it can handle ~57K rps on the same benchmark.

2022-05-31

Getting started with SaltStack Configuration Management

SaltStack is one of the less popular automation and configuration management tools, similar to Ansible, Chef, and Puppet (and also stuff that is not a CM but an IaC tool, which, if you are skilled enough, can be used as a CM: Terraform and Pulumi). Unlike Ansible, it has an agent (the minion) that needs to be installed on the target servers. Unlike Chef and Puppet, it uses Python. Like Ansible, SaltStack can be used for infrastructure lifecycle management too. To install Salt, you need to run these commands:

sudo add-apt-repository ppa:saltstack/salt
sudo apt install salt-master salt-minion salt-cloud

# for minion:
sudo vim /etc/salt/minion
# change line "#master: salt" into:
# master: 127.0.0.1 # should be master's IP
# id: thisMachineId
sudo systemctl restart salt-minion


There's a simpler way to install and bootstrap the master and minion: using the bootstrap script:

curl -L https://bootstrap.saltstack.com -o install_salt.sh
sudo sh install_salt.sh -M # for master

sudo sh install_salt.sh    # for minion
# don't forget to change /etc/salt/minion as above, then restart salt-minion


Then the minion will send its key to the master; to list all known minion keys, run:

sudo salt-key -L

To accept/add minion, you can use:

sudo salt-key --accept=FROM_LIST_ABOVE
sudo salt-key -A # accept all unaccepted, -R reject all

# test connection to all minion
sudo salt '*' test.ping

# run a command 
sudo salt '*' cmd.run 'uptime'

If you don't want to open a port on the minions, you can also use salt-ssh, so it would work like Ansible:

pip install salt-ssh
# create /etc/salt/roster containing:
ID1:
  host: minionIp1
  user: root
  sudo: True
ID2_usually_hostname:
  host: minionIp2
  user: root
  sudo: True

To execute a command on all roster entries you can use salt-ssh:

salt-ssh '*' cmd.run 'hostname'

On SaltStack, there are 5 things that you need to know:
  1. Master (the controller)
  2. Minion (the servers/machines being controlled)
  3. States (the desired state of the servers)
  4. Modules (same as Ansible modules)
    1. Grains (facts/properties of the machines; to gather facts call: salt-call grains.items)
    2. Execution (execute an action on machines: salt-call moduleX.functionX, for example: cmd.run 'cat /etc/hosts | grep 127. ; uptime ; lsblk', or user.list_users, or grains.ls -- to list all available grain properties, or grains.get ipv4_interfaces, or grains.get ipv4_interfaces:docker0)
    3. States (idempotent, multiplatform modules for CM)
    4. Renderers (modules that transform any format into a Python dictionary)
    5. Pillars (user configuration properties)
  5. salt-call (the primary command)
To run locally, just add --local. To run on every host we can use salt '*' modulename.functionname. The wildcard can be replaced with a compound filtering argument (more detail and examples here), as shown below.
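
For example, compound matching (-C) combines grain matches (G@) with minion-id globs; the minion IDs here are just illustrative:

sudo salt -C 'G@os:Ubuntu and web*' test.ping # grain + minion-id glob
sudo salt -G 'os:Ubuntu' cmd.run 'uptime'     # grain-only targeting
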
To start using Salt in file mode, create a directory called salt, and create a top.sls file (it's just YAML combined with Jinja) which contains the list of hosts, filters, and state modules you want to call; it's usually saved in the /srv/ directory, containing:

base:
  '*': # every machine
    - requirements # state module to be called
    - statemodule0
dev1:
  'osrelease:*22.04*': # only machine with specific os version
     - match: grain
     - statemodule1
dev2:
  'os:MacOS': # only run on mac
     - match: grain
     - statemodule2/somesubmodule1
prod:
  'os:Pop and host:*.domain1': # only run on PopOS with tld domain1
     - match: grain
     - statemodule3
     - statemodule4
  'os:Pop': # this too will run if the machine match
     - match: grain
     - statemodule5

Then create a directory for each of those items containing an init.sls, or create a file for each of those items with the .sls extension. For example, requirements.sls:

essential-packages: # the ID describes what this state does
  pkg.installed: # module name, function name
    - pkgs:
      - bash
      - build-essential
      - git
      - tmux
      - byobu
      - zsh
      - curl
      - htop
      - python-software-properties
      - software-properties-common
      - apache2-utils

# only one function per module may appear under a single ID,
# so each file.* call below gets its own ID
copy-file-b:
  file.managed:
    - name: /tmp/a/b
    - makedirs: True
    - user: root
    - group: wheel
    - mode: 644
    - source: salt://files/b # will copy ./files/b to the machine

create-file-c:
  file.managed:
    - name: /tmp/a/c
    - contents: # this will create a file with these specific lines
      - line 1
      - line 2

myservice1:
  service.running:
    - name: myservice1
    - watch: # will restart the service if these change
      - file: /etc/myservice.conf # watched files must be managed by a file state
      - file: /tmp/a/b

append-to-file-c:
  file.append:
    - name: /tmp/a/c
    - text: 'some line' # will append this line to that file

run-some-commands:
  cmd.run:
    - name: /bin/someCmd1 param1; /bin/cmd2 --someflag2
    - onchanges:
      - file: /tmp/a/c # run the cmd above only if this file changed

create-dir-d:
  file.directory: # ensure the directory is created
    - name: /tmp/d
    - user: root
    - dirmode: 700

extract-archive-e:
  archive.extracted: # extract from a specific archive file
    - name: /tmp/e
    - source: https://somedomain/somefile.tgz
    - force: True
    - keep_source: True
    - clean: True

To apply, run: salt-call state.apply requirements
Another example: we can create a template with Jinja and YAML combined, like this:

statemodule0:
  file.managed:
    - name: /tmp/myconf.cfg # will copy a file based on the jinja condition
    {% if '/usr/bin' in grains['pythonpath'] %}
    - source: salt://files/defaultpython.conf
    {% elif 'Pop' == grains['os'] %}
    - source: salt://files/popos.conf
    {% else %}
    - source: salt://files/unknown.conf
    {% endif %}
    - makedirs: True
  cmd.run:
    - name: echo
    - onchanges:
      - file: statemodule0 # referring to statemodule0's file.managed name above

To create a Python state module, you can create a file containing something like this:

#!py
def run():
  config = {}
  a_var = 'test1' # we can also do a loop, everything is dict/array
  config['create_file_{}'.format(a_var)] = {
    'file.managed': [
      {'name': '/tmp/{}'.format(a_var)},
      {'makedirs': True},
      {'contents': [
        'line1',
        'line2'
        ]
      },
    ],
  }
  return config
 
To include another state module, you can specify it in statemodulename/init.sls, something like this:

include:
  - .statemodule2 # if this is a folder, this will run the init.sls inside it
  - .statemodule3 # if this is a file, this will run statemodule3.sls

To run all states you can call salt-call state.highstate, or salt-call state.apply without any parameter. It would execute the top.sls file and its includes, in order, recursively.
To create a scheduled state, you can create a file containing something like this:

id_for_this:
  schedule.present:
    - name: highstate
    - run_on_start: True
    - function: state.highstate
    - minutes: 60
    - maxrunning: 1
    - enabled: True
    - returner: rawfile_json
    - splay: 600 


A full example can look something like this:

install nginx:
  pkg.installed:
    - name: nginx

/etc/nginx/nginx.conf: # used as the file.managed name
  file.managed:
    - source: salt://_files/nginx.j2
    - template: jinja
    - require:
      - pkg: install nginx

run nginx:
  service.running:
    - name: nginx
    - enable: True
    - watch:
      - file: /etc/nginx/nginx.conf

Next, to create a pillar config, just create a normal sls file, containing something like this:

user1:
  active: true
  sudo: true
  ssh_keys:
    - ssh-rsa censored user1@domain1
nginx:
  server_name: 'foo.domain1'

To reference this in another salt file, you can use Jinja, something like this:

{% set sn = salt['pillar.get']('nginx:server_name') -%}
server {
  listen 443 ssl;
  server_name {{ sn }};
...
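
To check pillar values from the CLI, you can use the standard pillar module calls (the key names here follow the example above):

# after editing pillar files, refresh first
sudo salt '*' saltutil.refresh_pillar
sudo salt '*' pillar.items
sudo salt '*' pillar.get nginx:server_name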

That's it for now; if you want to learn more, the next step is to create your own execution module, or check the other topics here.

2022-05-08

How to structure/layer your Golang Project (or whatever language you are using)

I've been maintaining a lot of other people's projects, and I see that most people blindly follow a framework's structure without purpose. So I'm writing this so that people can be convinced of how to write a good directory structure or good layering in an application, especially in this case when creating a service.

It would be better to split your project into exactly 3 layers:
  1. presentation (handles only serialization/deserialization and transport)
  2. business-logic (handles pure business logic, DTOs go here)
  3. persistence (handles records and their persistence, DAOs go here)
It could be one package with a bunch of sub-packages, or split per domain, or per user role. For example:

# monolith example

bin/ # should be added to .gitignore
presenter/ # de/serialization, transport, formatter goes here
  grpc.go
  restapi.go
  cli.go
  ...
businesslogic/ # pure business logic and INPUT/OUTPUT=DTO struct
  role1|role2|etc.go
  domain1|domain2|etc.go
  ...
models/ # all DAO/data access model goes here
  users.go
  schedules.go
  posts.go
  ...
deploy/ 
  Dockerfile
  docker-compose.yml
  deploy.sh
pkg/ # all 3rd party helpers/lib/wrapper goes here
  mysql/
  redpanda/
  aerospike/
  minio/

# Vertical-slice example

bin/
domain1/
  presenter1.go
  business1.go
  models1.go
domain2/
  presenter2.go
  business2.go
  models2.go
...
deploy/
pkg/

Also, it's better to inject dependencies on a per-function basis instead of as a whole struct/interface, something like this:

type LoginIn struct {
  Email string
  Password string
  // normally I embed CommonRequest object
}
type LoginOut struct {
  // normally I embed a CommonResponse object, with properties:
  SessionToken string
  Error string
  Success bool
}
type GuestDeps struct {
  GetUserByEmailPass func(string, string) (*User, error)
  // other injected dependencies, eg. S3 uploader, 3rd party libs
}
func (g *GuestDeps) Login(in *LoginIn) (out LoginOut) {
  // do validation
  // do retrieve from database
  // return proper object
}

So when you need to do testing, all you need is to create a fake, either with counterfeiter (if you inject an interface instead of a function) or manually, then check with autogold:

func TestLogin(t *testing.T) {
  t.Run(`fail_case`, func(t *testing.T) {
    in := LoginIn{}
    deps := GuestDeps{
      GetUserByEmailPass: func(string, string) (*User, error) {
        return nil, errors.New("failed to retrieve from db")
      },
    }
    out := deps.Login(&in)
    want := autogold.Want("fail_case", nil)
    want.Equal(t, out) // ^ update the expectation with: go test -update
  })
}

then on the main (the real server implementation), you can wire the real dependencies, something like this:

rUser := userRepo.NewPgConn(conf.PgConnStr)
srv := httpRest.New( ... )
guest := GuestDeps{
  GetUserByEmailPass: rUser.GetUserByEmailPass,
  DoFileUpload: s3client.DoFileUpload,
  ...
}
srv.Handle[LoginIn,LoginOut](LoginPath,guest.Login)
srv.Listen(conf.HostPort)

Why are we doing it like this? Because usually the frameworks I have used are either insanely overlayered or have no clear separation between controller and business logic (the controller still handles transport, serialization, etc), and validation happens only on the outermost layer, or sometimes half outside and half inside the business logic (which can make it vulnerable); and when writing unit tests, the programmer is tempted to test the whole http/grpc layer instead of the pure business logic. With this layering we can also use another kind of serialization/transport layer without having to modify the business logic side. Imagine if you used one controller to handle the business logic: how much hassle would it be to switch frameworks for some reason (framework no longer maintained, performance bottleneck on the framework side, the framework doesn't provide proper middleware for some fragile telemetry, the need to add another kind of serialization format or protocol, etc)? But with this kind of layering, if we want to add grpc or json-rpc or a command line interface, or switch frameworks, or anything else, it's easy: just add a layer with the proper serialization/transport, then call the original business logic.
 
 
mermaid link (numbers 2-4 are under our control)

Talk is cheap, show me the code example! gingorm1 or echogorm1 are minimal examples (you can always change the framework to fiber or the default net/http or any other framework, and the ORM to sqlc, sqlx, or jet). But if you are all alone, don't want to inject the database functions (which is against clean architecture, but is the most sensible way), and want to test directly against the database, you can check these examples: fiber1 or sveltefiber (without a database). Note that those are just examples; I would not inject the database as a function (see fiber1 for example), I would depend on it directly and use dockertest for managed dependencies, and only use function injection for unmanaged dependencies (3rd party). A more complex example can be found here: street.

This is only from the code maintainer's point of view. There are WORST practices that I found in the past when continuing other people's projects:
  • table-based entity microservices: every table or set of tables has its own microservice; it's overly granular, where some should be coupled instead (eg. user, group, roles, permission -- these should be one service instead of 2-3 services)
  • MVC microservices: one microservice for each layer of the MVC, and worse, each on a different repository, eg. API service for X (V), API service for Y (V), webhook for X (C), adapter for third party Z (C/M), service for persistence (M), etc -- these should be separated by domain/capability instead of by MVC layer. Why? Because a feature that would normally require changing 1 repository in a monolith now requires modifying 2-4 microservices, and starting 2-4 services just to debug something, which doesn't make sense. It might make a bit of sense if you are using a monorepo, but without one, it's more pain than benefit.
  • pub-sub with channels/goroutines without persistence: this one is ok only if the request is discardable (all or nothing), but if it's very important (money, for example) you should always persist every state, and it would be better to have a worker progressing every state into the next state so we don't have to fix things manually.
There are also some GOOD things I found:
  • Start with auth, metrics, logs (show the request id to the user, and useful responses), and proper dependency injection; don't let this get so late that you have to fix everything afterwards
  • Standard go test instead of a custom test framework, because go test has proper tooling in the IDEs
  • Use workers for slow things, which makes sense: don't make the user wait, unless the product guy wants it all sync. Also send failures to Slack or Telegram or something
  • CRON pattern, running code at a specific time; this is good when you need a scheduled task (eg. billing, reminder, etc)
  • Query-job publish task: this pattern separates the CRON from the time dependency (query the db, get the list of items to be processed, publish to the MQ), so the publish task can be triggered independently (eg. if there's a bug in only 1 item) and regardless of time, and the workers will pick up any work that is late.
  • Separating the proxy (transport/serialization) and the processor (business-logic/persistence): this has a really good benefit in terms of scalability for small requests (not for uploading/downloading big files), where we put a generic reverse proxy, push to a two-way pub-sub, then return the response. For example, we create a rest proxy, a grpc proxy, and a json-rpc proxy; all 3 push to NATS, then the worker/processor processes the request and returns a proper response. This works like lambda, so the programmer only needs to focus on building the worker/processor part instead of the generic serialization/transport; all the generic auth, logging, request id, and metrics can be handled by the proxy (see the sketch after this chart).
    the chart is something like this:

    generic proxy <--> NATS <--> worker/processor <--> databases/3rd party

    this way we can also scale independently, either with a monolith or microservices. Service mesh? no! service bus/star is the way :3
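
A minimal sketch of that proxy/processor split using NATS request-reply (assuming the github.com/nats-io/nats.go client; the subject name and payloads are just illustrative):

package main

import (
	"fmt"
	"time"

	"github.com/nats-io/nats.go"
)

func main() {
	nc, err := nats.Connect(nats.DefaultURL)
	if err != nil {
		panic(err)
	}
	defer nc.Drain()

	// worker/processor side: subscribe to a subject, run the business logic
	nc.Subscribe("guest.login", func(m *nats.Msg) {
		// deserialize m.Data into LoginIn, call guest.Login, serialize the output
		m.Respond([]byte(`{"success":true}`))
	})

	// generic proxy side: forward the raw request body and wait for the reply
	resp, err := nc.Request("guest.login", []byte(`{"email":"a@b.c"}`), 2*time.Second)
	if err != nil {
		panic(err)
	}
	fmt.Println(string(resp.Data))
}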

2022-05-07

Getting started with Trino

Trino is a distributed query engine that allows you to JOIN across multiple data sources (databases like mysql, postgresql, bigquery, cassandra, mongodb, redis, prometheus, elasticsearch, csv files, google sheets, s3, etc). It's like ClickHouse but without the high-tech (merge-tree) storage ability, so it cannot do blazingly fast analytics queries like ClickHouse, but it can be as fast as the connected database it uses; eg. if it's connected to ClickHouse, then it can be as fast as ClickHouse. It was developed by Facebook (previously named Presto). The list of database connectors can be seen here. To use Trino, you can use the dockerized version or install it manually:

# Docker
docker run -d -p 8080:8080 --name trino1 trinodb/trino
# the web UI is only for monitoring, use any random username

docker exec -it trino1 trino

# Ubuntu 22.04
java --version
python3 --version
# download and extract from https://trino.io/download.html
mkdir ./trino-server-379/etc
cd trino-server-379/etc
SRCURL=https://raw.githubusercontent.com/trinodb/trino-the-definitive-guide/master/single-installation/etc
wget -c $SRCURL/jvm.config
wget -c $SRCURL/log.properties
wget -c $SRCURL/node.properties
echo '
coordinator=true
node-scheduler.include-coordinator=true
http-server.http.port=8081
query.max-memory=5GB
query.max-memory-per-node=1GB
discovery.uri=http://127.0.0.1:8081
' > config.properties
echo '
node.data-dir=/tmp/
' >> node.properties
mkdir catalog
echo '
connector.name=cassandra
cassandra.contact-points=127.0.0.1
# more here https://trino.io/docs/current/connector/cassandra.html
' > catalog/localscylla.properties
cd ..
python3 ./bin/launcher.py run # to run in the background, use: start

# CLI/Client
EXELOC=/usr/bin/trino
sudo curl -o $EXELOC https://repo1.maven.org/maven2/io/trino/trino-cli/379/trino-cli-379-executable.jar
sudo chmod a+x $EXELOC
trino --server http://localhost:8081

These are the commands that can be used in Trino (other than standard SQL):

SHOW CATALOGS;
SHOW SCHEMAS FROM/IN __CATALOG__; # eg. localscylla
SHOW TABLES FROM/IN __CATALOG__.__SCHEMA__;
DESCRIBE __CATALOG__.__SCHEMA__.__TABLE__;
EXPLAIN SELECT * FROM __CATALOG__.__SCHEMA__.__TABLE__;
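
And the main selling point, a cross-catalog JOIN, looks just like regular SQL (a sketch; the mysql1 catalog and the schema/table names here are made up):

SELECT u.name, o.total
FROM mysql1.shop.users AS u
JOIN localscylla.shop.orders AS o ON o.user_id = u.id;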

That's it; you can add more database connections by creating more etc/catalog/*.properties files with the proper configuration (username, password, port, etc).