2022-12-24

Map to Struct and Struct to Map Golang Benchmark 2022 Edition

Sometimes we want to convert from map to struct or struct to map (dictionary in other languages), or even struct to struct. There are some libraries that can help us do this, for example structs, mapstructure, copier, or smapping. We could also utilize serialization and deserialization libraries to do this, with the caveat that some serialization formats (eg. JSON) cannot represent integers larger than 2^53.
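
For example, a quick way to see that caveat in Go (a standalone sketch; 9007199254740993 is 2^53+1):

package main

import (
	"encoding/json"
	"fmt"
)

func main() {
	b, _ := json.Marshal(map[string]int64{"v": 9007199254740993}) // 2^53 + 1
	var m map[string]any
	_ = json.Unmarshal(b, &m) // JSON numbers decode into float64 by default
	fmt.Println(m["v"])       // 9.007199254740992e+15, the last digit is lost
}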

Here's a benchmark that I ran this morning:

map to struct                                    total   ns/op     B/op   allocs/op
M2S_GoccyGoJson_MarshalUnmarshal-32          6,661,932     517       80           3
M2S_JsonIteratorGo_MarshalUnmarshal-32       4,892,611     724      196           8
M2S_VmihailencoMspackV5_MarhsalUnmarshal-32  4,572,597     741      188           5
M2S_FxamackerCbor_MarshalUnmarshal-32        4,418,558     799      120           8
M2S_SurrealdbCork_EncodeDecode-32            3,080,282   1,080    1,217           6
M2S_GopkgInMgoV2Bson_MarshalUnmarshal-32     3,227,905   1,092      232          13
M2S_ShamatonMsgpackV2_MarshalUnmarshal-32    3,062,677   1,161      956          15
M2S_MitchellhMapstructure_Decode-32          2,487,428   1,395      720          18
M2S_MongoDriverBson_MarshalUnmarshal-32      2,477,983   1,459      414          14
M2S_KokizzuJson5b_MarshalUnmarshal-32        1,987,240   1,711      632          16
M2S_EncodingJson_MarshalUnmarshal-32         2,056,944   1,780      600          16
M2S_EtNikBinngo_MarshalUnmarshal-32          1,985,595   1,857      425          39
M2S_PquernaFfjson_MarshalUnmarshal-32        1,739,968   1,986      609          16
M2S_UngorjiGocodec_BincEncodeDecode-32       1,401,453   2,582    4,340          23
M2S_UngorjiGoCodec_CborEncodeDecode-32       1,304,828   2,636    4,340          23
M2S_PelletierGoTomlV2_MarshalUnmarshal-32    1,284,037   2,787    1,600          27
M2S_UngorjiGocodec_SimpleEncodeDecode-32     1,295,926   2,810    4,340          23
M2S_UngorjiGocodec_JsonEncodeDecode-32       1,000,000   3,028    4,956          25
M2S_IchibanTnetstrings_MarshalUnmarshal-32     749,947   5,056    9,329          48
M2S_BurntSushiToml_EncodeUnmarshal-32          425,335   8,065    7,958          71
M2S_HjsonHjsonGoV4_MarshalUnmarshal-32         355,784  10,870    3,936          78
M2S_GopkgInYamlV3_MarshalUnmarshal-32          271,190  13,524   14,112          80
M2S_DONUTSLz4Msgpack_MarshalUnmarshal-32       240,619  15,498    1,264          16
M2S_GoccyGoYaml_MarshalUnmarshal-32            214,776  16,192    7,821         214
M2S_GhodssYaml_MarshalUnmarshal-32             156,412  23,347   21,378         161
M2S_NaoinaToml_MarshalUnmarshal-32              57,607  58,331  398,544          77

struct to map                                    total   ns/op     B/op   allocs/op
S2M_MitchellhMapstructure_Decode-32          5,055,402     716      536          12
S2M_GoccyGoJson_MarshalUnmarshal-32          4,660,224     747      522          12
S2M_JsonIteratorGo_MarshalUnmarshal-32       4,283,262     835      505          14
S2M_VmihailencoMspackV5_MarhsalUnmarshal-32  4,009,863     908      607          12
S2M_FxamackerCbor_MarshalUnmarshal-32        3,562,352   1,023      452          11
S2M_ShamatonMsgpackV2_MarshalUnmarshal-32    3,180,010   1,089      556          15
S2M_GopkgInMgoV2Bson_MarshalUnmarshal-32     3,047,396   1,145      528          15
S2M_SurrealdbCork_EncodeDecode-32            2,976,328   1,196    1,611          12
S2M_EncodingJson_MarshalUnmarshal-32         1,914,165   1,782      688          18
S2M_PquernaFfjson_MarshalUnmarshal-32        1,911,950   1,845      697          18
S2M_EtNikBinngo_MarshalUnmarshal-32          1,948,802   1,859      768          45
S2M_KokizzuJson5b_MarshalUnmarshal-32        1,888,774   1,884      960          20
S2M_MongoDriverBson_MarshalUnmarshal-32      1,857,649   1,995      759          18
S2M_PelletierGoTomlV2_MarshalUnmarshal-32    1,244,012   2,864    1,800          31
S2M_UngorjiGocodec_BincEncodeDecode-32       1,000,000   3,234    4,888          34
S2M_UngorjiGoCodec_CborEncodeDecode-32         989,671   3,358    4,888          34
S2M_UngorjiGocodec_SimpleEncodeDecode-32     1,000,000   3,400    4,888          34
S2M_UngorjiGocodec_JsonEncodeDecode-32         912,512   3,639    5,504          36
S2M_IchibanTnetstrings_MarshalUnmarshal-32     776,796   4,744    9,561          46
S2M_BurntSushiToml_EncodeUnmarshal-32          447,216   8,538    8,231          73
S2M_HjsonHjsonGoV4_MarshalUnmarshal-32         389,476   9,416    3,868          66
S2M_GopkgInYamlV3_MarshalUnmarshal-32          315,939  13,338   14,400          81
S2M_DONUTSLz4Msgpack_MarshalUnmarshal-32       242,330  14,298      744          16
S2M_GoccyGoYaml_MarshalUnmarshal-32            230,042  14,919    7,580         202
S2M_GhodssYaml_MarshalUnmarshal-32             151,023  22,682   21,441         161
S2M_NaoinaToml_MarshalUnmarshal-32              60,916  52,047  398,112          80

struct to struct                                 total   ns/op     B/op   allocs/op
S2S_GoccyGoJson_MarshalUnmarshal-32         12,046,497     317      112           4
S2S_ShamatonMsgpackV2_MarshalUnmarshal-32    7,897,488     458      148           6
S2S_JsonIteratorGo_MarshalUnmarshal-32       7,853,592     494       92           6
S2S_FxamackerCbor_MarshalUnmarshal-32        7,038,808     511       80           5
S2S_GopkgInMgoV2Bson_MarshalUnmarshal-32     5,105,343     715      144           9
S2S_VmihailencoMspackV5_MarhsalUnmarshal-32  4,549,700     818      213           6
S2S_MongoDriverBson_MarshalUnmarshal-32      3,560,946   1,019      321           8
S2S_EncodingJson_MarshalUnmarshal-32         2,731,051   1,313      304           9
S2S_PquernaFfjson_MarshalUnmarshal-32        2,734,357   1,330      304           9
S2S_KokizzuJson5b_MarshalUnmarshal-32        2,594,728   1,343      504           9
S2S_SurrealdbCork_EncodeDecode-32            2,555,745   1,397    1,241           7
S2S_EtNikBinngo_MarshalUnmarshal-32          1,995,468   1,840      400          41
S2S_PelletierGoTomlV2_MarshalUnmarshal-32    1,460,683   2,459    1,440          23
S2S_UngorjiGocodec_SimpleEncodeDecode-32     1,205,648   2,919    4,364          24
S2S_UngorjiGoCodec_CborEncodeDecode-32       1,290,734   2,920    4,364          24
S2S_UngorjiGocodec_BincEncodeDecode-32       1,207,327   3,007    4,364          24
S2S_UngorjiGocodec_JsonEncodeDecode-32       1,000,000   3,223    4,980          26
S2S_IchibanTnetstrings_MarshalUnmarshal-32     722,493   4,950    9,289          47
S2S_BurntSushiToml_EncodeUnmarshal-32          398,366   8,458    7,918          72
S2S_HjsonHjsonGoV4_MarshalUnmarshal-32         304,369  11,189    4,578          79
S2S_DONUTSLz4Msgpack_MarshalUnmarshal-32       282,505  12,215      237           7
S2S_GopkgInYamlV3_MarshalUnmarshal-32          279,332  12,541   14,016          76
S2S_GoccyGoYaml_MarshalUnmarshal-32            211,952  15,542    7,982         208
S2S_GhodssYaml_MarshalUnmarshal-32             160,660  23,148   21,073         154
S2S_NaoinaToml_MarshalUnmarshal-32              64,468  59,672  399,065          83

The repository and the always-updated results are here, feel free to add your own serialization/deserialization library. As we can see, goccy-gojson is the fastest of all, but too bad that if you store an int64 larger than 2^53 it gives a wrong result. So it's better to use the second best and all-rounder vmihailenco-msgpack, or, for the specific struct to map/struct use case, mapstructure.

Here's the top ranking:

Ser/Deser              M2S  S2M  S2S
GoccyGoJson             1    2    2
JsonIteratorGo          2    3    4
MitchellhMapstructure   8    1    1
VmihailencoMspackV5     3    4    7
FxamackerCbor           4    5    5
ShamatonMsgpackV2       7    6    3
SurrealdbCork           5    8   12
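
For illustration, here's roughly how the two recommended approaches are called (a sketch; the struct and field names are made up, not taken from the benchmark repo):

package main

import (
	"fmt"

	"github.com/mitchellh/mapstructure"
	"github.com/vmihailenco/msgpack/v5"
)

type User struct {
	Name string
	Age  int
}

func main() {
	// map to struct with mapstructure
	m := map[string]any{"Name": "alice", "Age": 30}
	var u User
	if err := mapstructure.Decode(m, &u); err != nil {
		panic(err)
	}
	fmt.Printf("%+v\n", u) // {Name:alice Age:30}

	// struct to map via a msgpack round-trip (binary format, so no 2^53 problem)
	b, err := msgpack.Marshal(u)
	if err != nil {
		panic(err)
	}
	var back map[string]any
	if err := msgpack.Unmarshal(b, &back); err != nil {
		panic(err)
	}
	fmt.Println(back) // map[Age:30 Name:alice]
}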


2022-09-08

Getting started with InfluxDB

Today we're gonna learn about InfluxDB, a time series database that has already become a standard for open source metric storage, part of the TICK stack (Telegraf -- a metric collector like datadog-agent/fluentd, InfluxDB -- the database, Chronograf -- a visualizer like Grafana, Kapacitor -- an alert manager like Prometheus alert-manager, also usable for ETL). To install InfluxDB, use the steps from this link (there's also docker). What are the cons of InfluxDB compared to other solutions? Datadog is (very very) expensive, Prometheus seems too kube-oriented, and InfluxDB open source doesn't support HA and clustering; that's why I would rather use ClickHouse for time series / log collection (as a data sink), aggregate those logs as metrics (materialized views), and copy them periodically to a faster database like Tarantool. InfluxDB has 2 query syntaxes: an SQL-like one called InfluxQL, and a JavaScript+FP-like one called Flux (it has the |> pipe like most FP languages). Here's an example of the docker setup and the query language:

docker run -d -e INFLUXDB_ADMIN_USER=user1 -e INFLUXDB_ADMIN_PASSWORD=pass1 --name influxdb1 influxdb

docker exec -it influxdb1 influx -host 127.0.0.1 -port 8086 -username user1 -password pass1

show databases
create database db1
use db1

-- show all tables (=measurements)

show measurements

-- tags/indexes are always strings; fields can be float, int, bool, etc

insert table1,tag1=a,tag2=b field1=1,field2=2

-- if not set, the default time is the insert time, in nanoseconds

select * from "table1"
select * from table1 where tag1='a'

-- describe columns from all table:

show tag keys
show field keys from "table1"

-- select distinct tag from table1
show tag values from "table1" with key in ("tag1")

-- automatically delete after N days, default 0 = never delete
-- shard by N hours, default 168 hours (a week)
show retention policies

-- like partition in other database
show shards
show shard groups
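
The queries above are InfluxQL; a simple filtered select in Flux would look something like this (a sketch, assuming db1 is exposed as a bucket -- bucket naming differs per InfluxDB version):

from(bucket: "db1")
  |> range(start: -1h)
  |> filter(fn: (r) => r._measurement == "table1" and r.tag1 == "a")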

I guess that's all you need to know to get started with InfluxDB. If you're looking for one comparable with ClickHouse, you can check TDengine or TiFlash.

2022-07-28

Techempower Framework Benchmark Round 21

The results for Techempower framework benchmark round 21 are out; as usual the most important benchmarks are the update and multi-query ones:

The top rankers are Rust, JS (cheating), Java, C++, C#, PHP, C, Scala, Kotlin, and Go. For multiple queries:

Top rankers are Rust, Java, JS (cheating), Scala, Kotlin, C++, C#, and Go. These benchmarks show how efficient each language's database driver is (which is usually the biggest bottleneck), and how much overhead each language's framework has (including serialization, alloc/GC, async-I/O, etc).

For memory usage and CPU utilization you can check here https://ajdust.github.io/tfbvis/?testrun=Citrine_started2022-06-30_edd8ab2e-018b-4041-92ce-03e5317d35ea&testtype=update&round=false

2022-06-07

How to profile your Golang Fiber server

Usually you need to load test your webserver to find where the memory leaks or the bottlenecks are, and Golang already provides a tool to do that, called pprof. What you need to do depends on the framework you use, but it's all similar; most frameworks already have a middleware that you can import and use. For example, in Fiber there's the pprof middleware; to use it:

// import
  "github.com/gofiber/fiber/v2/middleware/pprof"

// use
  app.Use(pprof.New())
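
For context, a minimal complete wiring would look something like this (a sketch, assuming Fiber v2; the port is arbitrary):

package main

import (
	"log"

	"github.com/gofiber/fiber/v2"
	"github.com/gofiber/fiber/v2/middleware/pprof"
)

func main() {
	app := fiber.New()
	app.Use(pprof.New()) // registers the /debug/pprof routes
	log.Fatal(app.Listen(":3000"))
}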

It would create a route called /debug/pprof that you can use: just start the server, then open that path. To profile CPU or check the heap you just need to click the profile/heap link; it will wait for around 10 seconds, and while it waits you must hit other endpoints to generate traffic/function calls. After 10 seconds, it will show a download dialog to save your CPU or heap profile. From that file, you can run a command similar to gops, for example to generate an svg or a web view that shows your profiling:

pprof -web /tmp/profile # or: go tool pprof -web /tmp/profile
pprof -svg /tmp/profile # /tmp/profile = the file that you just downloaded

It would generate something like this:

So you can find out which function took most of the CPU time (or, if it's a heap profile, which function allocates the most memory). In my case the bottleneck was the default built-in pretty logger: it limited the number of requests the server could handle to ~9K rps at concurrency 255 on a database write benchmark; if we remove the built-in logging and replace it with zerolog, it can handle ~57K rps on the same benchmark.

2022-05-31

Getting started with SaltStack Configuration Management

SaltStack is one of the less popular automation and configuration management tools, similar to Ansible, Chef, and Puppet (and also stuff that is not a CM but an IaC tool, which, if you are skilled enough, can be used as a CM: Terraform and Pulumi). Unlike Ansible, it has an agent (the minion) that needs to be installed on the target servers. Unlike Chef and Puppet, it uses Python. Like Ansible, SaltStack can be used for infrastructure lifecycle management too. To install Salt, you need to run these commands:

sudo add-apt-repository ppa:saltstack/salt
sudo apt install salt-master salt-minion salt-cloud

# for minion:
sudo vim /etc/salt/minion
# change line "#master: salt" into:
# master: 127.0.0.1 # should be master's IP
# id: thisMachineId
sudo systemctl restart salt-minion


There's a simpler way to install and bootstrap the master and minion: using the bootstrap script:

curl -L https://bootstrap.saltstack.com -o install_salt.sh
sudo sh install_salt.sh -M # for master

sudo sh install_salt.sh    # for minion
# don't forget to change /etc/salt/minion as above, then restart salt-minion


Then the minion will send its key to the master; to list all known minion keys, run:

sudo salt-key -L

To accept/add minion, you can use:

sudo salt-key --accept=FROM_LIST_ABOVE
sudo salt-key -A # accept all unaccepted, -R reject all

# test connection to all minion
sudo salt '*' test.ping

# run a command 
sudo salt '*' cmd.run 'uptime'

If you don't want to open a port on the minions, you can also use salt-ssh, so it would work like Ansible:

pip install salt-ssh
# create /etc/salt/roster containing:
ID1:
  host: minionIp1
  user: root
  sudo: True
ID2_usually_hostname:
  host: minionIp2
  user: root
  sudo: True

To execute a command on all roster entries you can use salt-ssh:

salt-ssh '*' cmd.run 'hostname'

On SaltStack, there are 5 things that you need to know:
  1. Master (the controller)
  2. Minion (the servers/machines being controlled)
  3. States (the desired state of the servers)
  4. Modules (same as Ansible modules)
    1. Grains (facts/properties of the machines; to gather facts call: salt-call grains.items)
    2. Execution (execute an action on machines: salt-call moduleX.functionX, for example: cmd.run 'cat /etc/hosts | grep 127. ; uptime ; lsblk', or user.list_users, or grains.ls -- to list all available grain properties, or grains.get ipv4_interfaces, or grains.get ipv4_interfaces:docker0)
    3. States (idempotent, multiplatform modules for CM)
    4. Renderers (modules that transform any format into a Python dictionary)
    5. Pillars (user configuration properties)
  5. salt-call (the primary command)
To run locally, just add --local. To run on every host we can use salt '*' modulename.functionname. The wildcard can be replaced with a compound filtering argument (more detail and examples here), as shown below.
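
For example, compound matching (-C) combines grain matches (G@) with minion-id globs; the minion IDs here are just illustrative:

sudo salt -C 'G@os:Ubuntu and web*' test.ping # grain + minion-id glob
sudo salt -G 'os:Ubuntu' cmd.run 'uptime'     # grain-only targeting
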
To start using Salt in file mode, create a directory called salt, and create a top.sls file (it's just YAML combined with Jinja) which contains the list of hosts, filters, and state modules you want to call; it's usually saved in the /srv/ directory, containing:

base:
  '*': # every machine
    - requirements # state module to be called
    - statemodule0
dev1:
  'osrelease:*22.04*': # only machine with specific os version
     - match: grain
     - statemodule1
dev2:
  'os:MacOS': # only run on mac
     - match: grain
     - statemodule2/somesubmodule1
prod:
  'os:Pop and host:*.domain1': # only run on PopOS with tld domain1
     - match: grain
     - statemodule3
     - statemodule4
  'os:Pop': # this too will run if the machine match
     - match: grain
     - statemodule5

Then create a directory for each of those items containing an init.sls, or create a file for each of those items with the .sls extension. For example, requirements.sls:

essential-packages: # the ID describes what this state does
  pkg.installed: # module name, function name
    - pkgs:
      - bash
      - build-essential
      - git
      - tmux
      - byobu
      - zsh
      - curl
      - htop
      - python-software-properties
      - software-properties-common
      - apache2-utils

# only one function per module may appear under a single ID,
# so each file.* call below gets its own ID
copy-file-b:
  file.managed:
    - name: /tmp/a/b
    - makedirs: True
    - user: root
    - group: wheel
    - mode: 644
    - source: salt://files/b # will copy ./files/b to the machine

create-file-c:
  file.managed:
    - name: /tmp/a/c
    - contents: # this will create a file with these specific lines
      - line 1
      - line 2

myservice1:
  service.running:
    - name: myservice1
    - watch: # will restart the service if these change
      - file: /etc/myservice.conf # watched files must be managed by a file state
      - file: /tmp/a/b

append-to-file-c:
  file.append:
    - name: /tmp/a/c
    - text: 'some line' # will append this line to that file

run-some-commands:
  cmd.run:
    - name: /bin/someCmd1 param1; /bin/cmd2 --someflag2
    - onchanges:
      - file: /tmp/a/c # run the cmd above only if this file changed

create-dir-d:
  file.directory: # ensure the directory is created
    - name: /tmp/d
    - user: root
    - dirmode: 700

extract-archive-e:
  archive.extracted: # extract from a specific archive file
    - name: /tmp/e
    - source: https://somedomain/somefile.tgz
    - force: True
    - keep_source: True
    - clean: True

To apply, run: salt-call state.apply requirements
Another example: we can create a template with Jinja and YAML combined, like this:

statemodule0:
  file.managed:
    - name: /tmp/myconf.cfg # will copy a file based on the jinja condition
    {% if '/usr/bin' in grains['pythonpath'] %}
    - source: salt://files/defaultpython.conf
    {% elif 'Pop' == grains['os'] %}
    - source: salt://files/popos.conf
    {% else %}
    - source: salt://files/unknown.conf
    {% endif %}
    - makedirs: True
  cmd.run:
    - name: echo
    - onchanges:
      - file: statemodule0 # referring to statemodule0's file.managed name above

To create a Python state module, you can create a file containing something like this:

#!py
def run():
  config = {}
  a_var = 'test1' # we can also do a loop, everything is dict/array
  config['create_file_{}'.format(a_var)] = {
    'file.managed': [
      {'name': '/tmp/{}'.format(a_var)},
      {'makedirs': True},
      {'contents': [
        'line1',
        'line2'
        ]
      },
    ],
  }
  return config
 
To include another state module, you can specify it in statemodulename/init.sls, something like this:

include:
  - .statemodule2 # if this is a folder, this will run the init.sls inside it
  - .statemodule3 # if this is a file, this will run statemodule3.sls

To run all states you can call salt-call state.highstate, or salt-call state.apply without any parameter. It would execute the top.sls file and its includes, in order, recursively.
To create a scheduled state, you can create a file containing something like this:

id_for_this:
  schedule.present:
    - name: highstate
    - run_on_start: True
    - function: state.highstate
    - minutes: 60
    - maxrunning: 1
    - enabled: True
    - returner: rawfile_json
    - splay: 600 


A full example can look something like this:

install nginx:
  pkg.installed:
    - name: nginx

/etc/nginx/nginx.conf: # used as the file.managed name
  file.managed:
    - source: salt://_files/nginx.j2
    - template: jinja
    - require:
      - pkg: install nginx

run nginx:
  service.running:
    - name: nginx
    - enable: True
    - watch:
      - file: /etc/nginx/nginx.conf

Next, to create a pillar config, just create a normal sls file, containing something like this:

user1:
  active: true
  sudo: true
  ssh_keys:
    - ssh-rsa censored user1@domain1
nginx:
  server_name: 'foo.domain1'

To reference this in another salt file, you can use Jinja, something like this:

{% set sn = salt['pillar.get']('nginx:server_name') -%}
server {
  listen 443 ssl;
  server_name {{ sn }};
...
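
To check pillar values from the CLI, you can use the standard pillar module calls (the key names here follow the example above):

# after editing pillar files, refresh first
sudo salt '*' saltutil.refresh_pillar
sudo salt '*' pillar.items
sudo salt '*' pillar.get nginx:server_name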

That's it for now; if you want to learn more, the next step is to create your own execution module, or check the other topics here.

2022-05-08

How to structure/layer your Golang Project (or whatever language you are using)

I've been maintaining a lot of other people's projects, and I see that most people blindly follow a framework's structure without purpose. So I'm writing this so that people can be convinced of how to write a good directory structure or good layering in an application, especially in this case when creating a service.

It would be better to split your project into exactly 3 layers:
  1. presentation (handles only serialization/deserialization and transport)
  2. business-logic (handles pure business logic, DTOs go here)
  3. persistence (handles records and their persistence, DAOs go here)
It could be one package with a bunch of sub-packages, or split per domain, or per user role. For example:

# monolith example

bin/ # should be added to .gitignore
presenter/ # de/serialization, transport, formatter goes here
  grpc.go
  restapi.go
  cli.go
  ...
businesslogic/ # pure business logic and INPUT/OUTPUT=DTO struct
  role1|role2|etc.go
  domain1|domain2|etc.go
  ...
models/ # all DAO/data access model goes here
  users.go
  schedules.go
  posts.go
  ...
deploy/ 
  Dockerfile
  docker-compose.yml
  deploy.sh
pkg/ # all 3rd party helpers/lib/wrapper goes here
  mysql/
  redpanda/
  aerospike/
  minio/

# Vertical-slice example

bin/
domain1/
  presenter1.go
  business1.go
  models1.go
domain2/
  presenter2.go
  business2.go
  models2.go
...
deploy/
pkg/

Also, it's better to inject dependencies on a per-function basis instead of as a whole struct/interface, something like this:

type LoginIn struct {
  Email string
  Password string
  // normally I embed CommonRequest object
}
type LoginOut struct {
  // normally I embed a CommonResponse object, with properties:
  SessionToken string
  Error string
  Success bool
}
type GuestDeps struct {
  GetUserByEmailPass func(string, string) (*User, error)
  // other injected dependencies, eg. S3 uploader, 3rd party libs
}
func (g *GuestDeps) Login(in *LoginIn) (out LoginOut) {
  // do validation
  // do retrieve from database
  // return proper object
}

So when you need to do testing, all you need is to create a fake, either with counterfeiter (if you inject an interface instead of a function) or manually, then check with autogold:

func TestLogin(t *testing.T) {
  t.Run(`fail_case`, func(t *testing.T) {
    in := LoginIn{}
    deps := GuestDeps{
      GetUserByEmailPass: func(string, string) (*User, error) {
        return nil, errors.New("failed to retrieve from db")
      },
    }
    out := deps.Login(&in)
    want := autogold.Want("fail_case", nil)
    want.Equal(t, out) // ^ update the expectation with: go test -update
  })
}

then on the main (the real server implementation), you can wire the real dependencies, something like this:

rUser := userRepo.NewPgConn(conf.PgConnStr)
srv := httpRest.New( ... )
guest := GuestDeps{
  GetUserByEmailPass: rUser.GetUserByEmailPass,
  DoFileUpload: s3client.DoFileUpload,
  ...
}
srv.Handle[LoginIn,LoginOut](LoginPath,guest.Login)
srv.Listen(conf.HostPort)

Why are we doing it like this? Because usually the frameworks I have used are either insanely overlayered or have no clear separation between controller and business logic (the controller still handles transport, serialization, etc), and validation happens only on the outermost layer, or sometimes half outside and half inside the business logic (which can make it vulnerable); and when writing unit tests, the programmer is tempted to test the whole http/grpc layer instead of the pure business logic. With this layering we can also use another kind of serialization/transport layer without having to modify the business logic side. Imagine if you used one controller to handle the business logic: how much hassle would it be to switch frameworks for some reason (framework no longer maintained, performance bottleneck on the framework side, the framework doesn't provide proper middleware for some fragile telemetry, the need to add another kind of serialization format or protocol, etc)? But with this kind of layering, if we want to add grpc or json-rpc or a command line interface, or switch frameworks, or anything else, it's easy: just add a layer with the proper serialization/transport, then call the original business logic.
 
 
mermaid link (numbers 2-4 are under our control)

Talk is cheap, show me the code example! gingorm1 or echogorm1 are minimal examples (you can always change the framework to fiber or the default net/http or any other framework, and the ORM to sqlc, sqlx, or jet). But if you are all alone, don't want to inject the database functions (which is against clean architecture, but is the most sensible way), and want to test directly against the database, you can check these examples: fiber1 or sveltefiber (without a database). Note that those are just examples; I would not inject the database as a function (see fiber1 for example), I would depend on it directly and use dockertest for managed dependencies, and only use function injection for unmanaged dependencies (3rd party). A more complex example can be found here: street.

This is only from the code maintainer's point of view. There are WORST practices that I found in the past when continuing other people's projects:
  • table-based entity microservices: every table or set of tables has its own microservice; it's overly granular, where some should be coupled instead (eg. user, group, roles, permission -- these should be one service instead of 2-3 services)
  • MVC microservices: one microservice for each layer of the MVC, and worse, each on a different repository, eg. API service for X (V), API service for Y (V), webhook for X (C), adapter for third party Z (C/M), service for persistence (M), etc -- these should be separated by domain/capability instead of by MVC layer. Why? Because a feature that would normally require changing 1 repository in a monolith now requires modifying 2-4 microservices, and starting 2-4 services just to debug something, which doesn't make sense. It might make a bit of sense if you are using a monorepo, but without one, it's more pain than benefit.
  • pub-sub with channels/goroutines without persistence: this one is ok only if the request is discardable (all or nothing), but if it's very important (money, for example) you should always persist every state, and it would be better to have a worker progressing every state into the next state so we don't have to fix things manually.
There are also some GOOD things I found:
  • Start with auth, metrics, logs (show the request id to the user, and useful responses), and proper dependency injection; don't let this get so late that you have to fix everything afterwards
  • Standard go test instead of a custom test framework, because go test has proper tooling in the IDEs
  • Use workers for slow things, which makes sense: don't make the user wait, unless the product guy wants it all sync. Also send failures to Slack or Telegram or something
  • CRON pattern, running code at a specific time; this is good when you need a scheduled task (eg. billing, reminder, etc)
  • Query-job publish task: this pattern separates the CRON from the time dependency (query the db, get the list of items to be processed, publish to the MQ), so the publish task can be triggered independently (eg. if there's a bug in only 1 item) and regardless of time, and the workers will pick up any work that is late.
  • Separating the proxy (transport/serialization) and the processor (business-logic/persistence): this has a really good benefit in terms of scalability for small requests (not for uploading/downloading big files), where we put a generic reverse proxy, push to a two-way pub-sub, then return the response. For example, we create a rest proxy, a grpc proxy, and a json-rpc proxy; all 3 push to NATS, then the worker/processor processes the request and returns a proper response. This works like lambda, so the programmer only needs to focus on building the worker/processor part instead of the generic serialization/transport; all the generic auth, logging, request id, and metrics can be handled by the proxy (see the sketch after this chart).
    the chart is something like this:

    generic proxy <--> NATS <--> worker/processor <--> databases/3rd party

    this way we can also scale independently, either with a monolith or microservices. Service mesh? no! service bus/star is the way :3
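
A minimal sketch of that proxy/processor split using NATS request-reply (assuming the github.com/nats-io/nats.go client; the subject name and payloads are just illustrative):

package main

import (
	"fmt"
	"time"

	"github.com/nats-io/nats.go"
)

func main() {
	nc, err := nats.Connect(nats.DefaultURL)
	if err != nil {
		panic(err)
	}
	defer nc.Drain()

	// worker/processor side: subscribe to a subject, run the business logic
	nc.Subscribe("guest.login", func(m *nats.Msg) {
		// deserialize m.Data into LoginIn, call guest.Login, serialize the output
		m.Respond([]byte(`{"success":true}`))
	})

	// generic proxy side: forward the raw request body and wait for the reply
	resp, err := nc.Request("guest.login", []byte(`{"email":"a@b.c"}`), 2*time.Second)
	if err != nil {
		panic(err)
	}
	fmt.Println(string(resp.Data))
}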

2022-05-07

Getting started with Trino

Trino is a distributed query engine that allows you to JOIN across multiple data sources (databases like mysql, postgresql, bigquery, cassandra, mongodb, redis, prometheus, elasticsearch, csv files, google sheets, s3, etc). It's like ClickHouse but without the high-tech (merge-tree) storage ability, so it cannot do blazingly fast analytics queries like ClickHouse, but it can be as fast as the connected database it uses; eg. if it's connected to ClickHouse, then it can be as fast as ClickHouse. It was developed by Facebook (previously named Presto). The list of database connectors can be seen here. To use Trino, you can use the dockerized version or install it manually:

# Docker
docker run -d -p 8080:8080 --name trino1 trinodb/trino
# the web UI is only for monitoring, use any random username

docker exec -it trino1 trino

# Ubuntu 22.04
java --version
python3 --version
# download and extract from https://trino.io/download.html
mkdir ./trino-server-379/etc
cd trino-server-379/etc
SRCURL=https://raw.githubusercontent.com/trinodb/trino-the-definitive-guide/master/single-installation/etc
wget -c $SRCURL/jvm.config
wget -c $SRCURL/log.properties
wget -c $SRCURL/node.properties
echo '
coordinator=true
node-scheduler.include-coordinator=true
http-server.http.port=8081
query.max-memory=5GB
query.max-memory-per-node=1GB
discovery.uri=http://127.0.0.1:8081
' > config.properties
echo '
node.data-dir=/tmp/
' >> node.properties
mkdir catalog
echo '
connector.name=cassandra
cassandra.contact-points=127.0.0.1
# more here https://trino.io/docs/current/connector/cassandra.html
' > catalog/localscylla.properties
cd ..
python3 ./bin/launcher.py run # to run in the background, use: start

# CLI/Client
EXELOC=/usr/bin/trino
sudo curl -o $EXELOC https://repo1.maven.org/maven2/io/trino/trino-cli/379/trino-cli-379-executable.jar
sudo chmod a+x $EXELOC
trino --server http://localhost:8081

These are the commands that can be used in Trino (other than standard SQL):

SHOW CATALOGS;
SHOW SCHEMAS FROM/IN __CATALOG__; # eg. localscylla
SHOW TABLES FROM/IN __CATALOG__.__SCHEMA__;
DESCRIBE __CATALOG__.__SCHEMA__.__TABLE__;
EXPLAIN SELECT * FROM __CATALOG__.__SCHEMA__.__TABLE__;
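
And the main selling point, a cross-catalog JOIN, looks just like regular SQL (a sketch; the mysql1 catalog and the schema/table names here are made up):

SELECT u.name, o.total
FROM mysql1.shop.users AS u
JOIN localscylla.shop.orders AS o ON o.user_id = u.id;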

That's it; you can add more database connections by creating more etc/catalog/*.properties files with the proper configuration (username, password, port, etc).