
2022-06-07

How to profile your Golang Fiber server

Usually you need to load test your web server to find the memory leak or the bottleneck, and Golang already provides a tool for that, called pprof. The exact steps depend on the framework you use, but they are all similar: most frameworks already have a middleware that you can import and use. For example, Fiber has a pprof middleware; to use it:

// import
  "github.com/gofiber/fiber/v2/middleware/pprof"

// use
  app.Use(pprof.New())
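
Putting it together, a minimal runnable server would look roughly like this (a sketch; the port and the example route are my additions, not from the original post):

package main

import (
  "github.com/gofiber/fiber/v2"
  "github.com/gofiber/fiber/v2/middleware/pprof"
)

func main() {
  app := fiber.New()
  app.Use(pprof.New()) // mounts the /debug/pprof routes

  app.Get("/", func(c *fiber.Ctx) error {
    return c.SendString("hello")
  })

  _ = app.Listen(":3000")
}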

It registers a route called /debug/pprof that you can open after starting the server. To profile the CPU or check the heap, click the profile/heap link; it records for around 10 seconds, and while it waits you must hit your other endpoints to generate traffic/function calls. After 10 seconds it shows a download dialog to save your CPU profile or heap profile. From that file, you can run a command (similar to gops) to generate an svg or a web view of your profile:

pprof -web /tmp/profile # or
pprof -svg /tmp/profile # <-- file that you just downloaded
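
If you have the Go toolchain installed, the bundled pprof can also serve the visualization directly in an interactive web UI (an alternative invocation, not mentioned in the original post):

go tool pprof -http=:8080 /tmp/profile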

It would generate something like this:



So you can find out which function took most of the CPU time (or, for a heap profile, which function allocates the most memory). In my case the bottleneck was the default built-in pretty logger: it limited the server to ~9K rps at concurrency 255 on a database-write benchmark; after removing the built-in logging and replacing it with zerolog, it could handle ~57K rps on the same benchmark.
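
For illustration, replacing the built-in logger with zerolog could look roughly like this (a sketch; the middleware and the log fields are my assumptions, not the exact code used in the benchmark):

package main

import (
  "os"

  "github.com/gofiber/fiber/v2"
  "github.com/rs/zerolog"
)

func main() {
  logger := zerolog.New(os.Stderr).With().Timestamp().Logger()

  app := fiber.New()
  // instead of the default pretty logger middleware:
  app.Use(func(c *fiber.Ctx) error {
    err := c.Next()
    logger.Info().
      Str("path", c.Path()).
      Int("status", c.Response().StatusCode()).
      Msg("request")
    return err
  })
  _ = app.Listen(":3000")
}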

2022-04-05

Start/restart Golang or any other binary program automatically on boot/crash

There are some alternatives to make a program start on boot on Linux, the usual ways being:

1. SystemD, which can ensure that dependencies are started before your service and can also limit your CPU/RAM usage. Generate a template using this website or use kardianos/service

2. PM2 (requires NodeJS), or PMG

3. docker-compose (requires docker, but you can skip the build part and just copy the binary directly in the Dockerfile command (which can be deployed using rsync); set the restart property in docker-compose and it would restart when the computer boots) -- the bad part: you cannot limit cpu/ram unless using docker swarm. But you can use docker directly to limit resources and use the --restart flag.

4. lxc/lxd or multipass or other VM/lightweight VM (you still need systemd inside it XD, but at least it won't ruin your host); you can rsync directly into the container to redeploy, for example using overseer or tableflip. You must add a reverse proxy, NAT, or proper routing/IP forwarding though if you want it to be accessible from outside

5. supervisord (python) or ochinchina/supervisord (golang), tutorial here

6. create one daemon manager with systemd/docker-compose, then spawn the other services using goproc or pioz/god

7. monit, which can monitor and ensure a program is started/not dead

8. nomad (actually a deployment tool, but it can also manage workloads)

9. kubernetes XD overkill

10. immortal.run, a supervisor (this one actually uses systemd)

11. other containerization/VM workload orchestrators/managers that are usually already provided by the hoster/PaaS provider (Amazon ECS/Beanstalk/Fargate, Google AppEngine, Heroku, Jelastic, etc)


This is the systemd unit that I usually use (you need to create a user named "web" and install "unbuffer"):

$ cat /usr/lib/systemd/system/xxx.service
[Unit]
Description=xxx
After=network-online.target postgresql.service
Wants=network-online.target

[Service]
Type=simple
Restart=on-failure
User=web
Group=users
WorkingDirectory=/home/web/xxx
ExecStart=/home/web/xxx/run_production.sh
ExecStop=/usr/bin/killall xxx
LimitNOFILE=2097152
LimitNPROC=65536
ProtectSystem=full
NoNewPrivileges=true

[Install]
WantedBy=multi-user.target

$ cat /home/web/xxx/run_production.sh
#!/usr/bin/env bash

mkdir -p `pwd`/logs
ofile=`pwd`/logs/access_`date +%F_%H%M%S`.log
echo Logging into: $ofile
unbuffer time ./xxx | tee $ofile
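
After placing both files, reload systemd, then enable and inspect the service like this:

sudo systemctl daemon-reload
sudo systemctl enable --now xxx   # start now and on every boot
systemctl status xxx
journalctl -u xxx -f              # follow the service logs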



2022-04-04

Automatic Load Balancer Registration/Deregistration with NATS or FabioLB

Today we're gonna test two alternatives for automatic load balancing. Previously I always used Caddy or NginX with manual reverse proxy configuration, because most of my projects run on a single server (the bottleneck is always the database, not the backend/compute part). But today we're gonna test two possible high-availability load balancing strategies (without Kubernetes, of course): the first using NATS, the second using a standard load balancer, in this case FabioLB.

To use NATS, we're gonna use this strategy:
the first thing we deploy is our custom reverse proxy, which should be able to convert any query string, form body of any content-type, and any header if needed. We can use any serialization format (json, msgpack, protobuf, etc), but in this case we'll just use a plain string; we call this service "apiproxy". The apiproxy sends the serialized payload (from a map/object) into NATS using the request-reply mechanism. The other service is our backend "worker"/handler, which could be anything, but in this case it is the real handler containing our business logic, so it subscribes and returns a reply to the apiproxy, which deserializes it back to the client with whatever serialization format and protocol (gRPC/Websocket/HTTP-REST/JSONP/etc).
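
The request-reply part of that strategy boils down to something like this minimal sketch (the subject name and payload are made up for illustration; the real apiproxy serializes the whole request into the payload):

package main

import (
  "log"
  "time"

  "github.com/nats-io/nats.go"
)

func main() {
  nc, err := nats.Connect(nats.DefaultURL)
  if err != nil {
    log.Fatal(err)
  }
  defer nc.Close()

  // worker side: a queue group load-balances messages between all
  // workers subscribed with the same group name (see the UPDATE below)
  _, _ = nc.QueueSubscribe("svc.hello", "workers", func(m *nats.Msg) {
    _ = m.Respond([]byte("hello " + string(m.Data)))
  })

  // apiproxy side: forward the serialized request, wait for the reply
  reply, err := nc.Request("svc.hello", []byte("world"), 2*time.Second)
  if err != nil {
    log.Fatal(err)
  }
  log.Println(string(reply.Data)) // hello world
}

Here's the benchmark result of plain Fiber without any proxy, versus apiproxy-nats-worker with a single NATS instance vs multiple NATS instances: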

# no proxy
go run main.go apiserver
hey -n 1000000 -c 255 http://127.0.0.1:3000
  Average:      0.0011 secs
  Requests/sec: 232449.1716

# single nats
go run main.go apiproxy
go run main.go # worker
hey -n 1000000 -c 255 http://127.0.0.1:3000
  Average:      0.0025 secs
  Requests/sec: 100461.5866

# 2 worker
  Average:      0.0033 secs
  Requests/sec: 76130.4079

# 4 worker
  Average:      0.0051 secs
  Requests/sec: 50140.6288

# limit the apiserver CPU
GOMAXPROCS=2 go run main.go apiserver
  Average:      0.0014 secs
  Requests/sec: 184234.0106

# apiproxy 2 core
# 1 worker 2 core each
  Average:      0.0025 secs
  Requests/sec: 103007.4516

# 2 worker 2 core each
  Average:      0.0029 secs
  Requests/sec: 87522.6801

# 4 worker 2 core each
  Average:      0.0037 secs
  Requests/sec: 67714.5851

# seems that the bottleneck is spawning the producer's NATS
# spawning 8 connections using round-robin

# 1 worker 2 core each
  Average:      0.0021 secs
  Requests/sec: 121883.4324

# 4 worker 2 core each
  Average:      0.0030 secs
  Requests/sec: 84289.4330

# seems also the apiproxy is hogging all the CPU cores
# limiting to 8 core for apiproxy
# now synchronous handler changed into async/callback version
GOMAXPROCS=8 go run main.go apiserver

# 1 worker 2 core each
  Average:      0.0017 secs
  Requests/sec: 148298.8623

# 2 worker 2 core each
  Average:      0.0017 secs
  Requests/sec: 143958.4056

# 4 worker 2 core each
  Average:      0.0029 secs
  Requests/sec: 88447.5352

# limiting the NATS to 4 core using go run on the source
# 1 worker 2 core each
  Average:      0.0013 secs
  Requests/sec: 194787.6327

# 2 worker 2 core each
  Average:      0.0014 secs
  Requests/sec: 176702.0119

# 4 worker 2 core each
  Average:      0.0022 secs
  Requests/sec: 116926.5218

# same nats core count, increase worker core count
# 1 worker 4 core each
  Average:      0.0013 secs
  Requests/sec: 196075.4366

# 2 worker 4 core each
  Average:      0.0014 secs
  Requests/sec: 174912.7629

# 4 worker 4 core each
  Average:      0.0021 secs
  Requests/sec: 121911.4473 --> see update below


It would be better if this was tested across multiple servers, but it seems the bottleneck is on the NATS connection when there are many subscribers; they could not scale linearly (16-66% overhead for a single API proxy) IT'S A BUG ON MY SIDE, SEE UPDATE BELOW. Next we're gonna try FabioLB with Consul. Consul is used as the service registry (a synchronous-consistent "database", like Zookeeper or Etcd). To install all of it, use these commands:

# setup:
curl -fsSL https://apt.releases.hashicorp.com/gpg | sudo apt-key add -
sudo apt-add-repository "deb [arch=amd64] https://apt.releases.hashicorp.com $(lsb_release -cs) main"
sudo apt install consul
go install github.com/fabiolb/fabio@latest

# start:
sudo consul agent -dev --data-dir=/tmp/consul
fabio
go run main.go -addr 172.17.0.1:5000 -name svc-a -prefix /foo -consul 127.0.0.1:8500

# benchmark:
# without fabio
  Average:      0.0013 secs
  Requests/sec: 197047.9124

# with fabio 1 backend
  Average:      0.0038 secs
  Requests/sec: 65764.9021

# with fabio 2 backend
go run main.go -addr 172.17.0.1:5001 -name svc-a -prefix /foo -consul 127.0.0.1:8500

# the bottleneck might be the cores, so we limit the cores to 2 for each worker
# with fabio 1 backend 2 core each
  Average:      0.0045 secs
  Requests/sec: 56339.5518

# with fabio 2 backend 2 core each
  Average:      0.0042 secs
  Requests/sec: 60296.9714

# what if we limit also the fabio
GOMAXPROCS=8 fabio

# with fabio 8 core, 1 backend 2 core each
  Average:      0.0042 secs
  Requests/sec: 59969.5206

# with fabio 8 core, 2 backend 2 core each
  Average:      0.0041 secs
  Requests/sec: 62169.2256

# with fabio 8 core, 4 backend 2 core each
  Average:      0.0039 secs
  Requests/sec: 64703.8253

All CPU cores were utilized around 50% on a 32-core, 128GB RAM server; I can't find which part is the bottleneck for now, but for sure both strategies have around 16% vs 67% overhead compared to no proxy (which makes sense, because adding more layers adds more transport, more things to copy/transfer, and more transform/serialize-deserialize steps). The code used in this benchmark is here, in the 2022mid directory, and the code for fabio-consul registration was copied from eBay's github repository.
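
For reference, registering a backend into Consul so Fabio can route to it looks roughly like this (a sketch using the hashicorp consul api client; the /health endpoint and service ID are my assumptions -- Fabio only routes to instances with a passing health check, and it picks up tags starting with "urlprefix-"):

package main

import (
  "log"

  "github.com/hashicorp/consul/api"
)

func main() {
  client, err := api.NewClient(api.DefaultConfig()) // 127.0.0.1:8500 by default
  if err != nil {
    log.Fatal(err)
  }
  err = client.Agent().ServiceRegister(&api.AgentServiceRegistration{
    ID:      "svc-a-1",
    Name:    "svc-a",
    Address: "172.17.0.1",
    Port:    5000,
    Tags:    []string{"urlprefix-/foo"}, // route /foo to this instance
    Check: &api.AgentServiceCheck{
      HTTP:     "http://172.17.0.1:5000/health",
      Interval: "10s",
    },
  })
  if err != nil {
    log.Fatal(err)
  }
}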

Why would we even need to do this? If we're using the api gateway pattern (one of the patterns used in my past company, but with Kubernetes on the worker side), we can deploy independently and communicate between services through the gateway (proxy) without knowing the IP address or domain name of the service itself; as long as the request has the proper route and payload, it can be handled wherever the service is deployed. What if you want to do canary or blue-green deployment? You can just register a handler in NATS or Consul with a different route name (especially for communication between services, not public-to-service), and wait for all traffic to move there before killing the previous deployment.

So what should you choose? Both strategies require 3 moving parts (apiproxy-nats-worker, fabio-consul-worker), but the NATS strategy is simpler to develop and can give better performance (especially if you make the apiproxy as flexible as possible), though it needs better serialization: since serialization is not measured in this benchmark, if you need better serialization performance you must use codegen, which may require you to deploy twice (once for the apiproxy, once for the worker, unless you split the raw response meta with jsonparser or use a map only in the apiproxy). The FabioLB strategy has more features, and you can also use Consul for service discovery (contacting other services directly by name without going through FabioLB). The NATS strategy has some benefit in terms of security: the NATS cluster can sit inside a DMZ, and workers can be on different subnets without the ability to connect to each other and it would still work, whereas if you use Consul to connect directly to another service, they must have a route or connection to reach each other. The bad part about NATS is that you should not use it for file uploads, or it would hog a lot of resources; uploads should be handled by the apiproxy directly, and only a reference to the uploaded file forwarded as payload through NATS. You can check NATS traffic statistics using nats-top.

What's next? Maybe we can try Traefik, a load balancer with built-in service discovery support in one binary; it can also use Consul.

UPDATE: by changing the code from Subscribe (broadcast/fan-out) to QueueSubscribe (load balance), it has similar performance with 1/2/4 subscribers, so we can use NATS for high availability/fault tolerance in the api gateway pattern at the cost of ~16% overhead.
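
In code, the difference is just the subscribe call (a fragment, continuing the sketch above):

// fan-out: every subscriber receives every message,
// so N workers multiply the work instead of sharing it
_, _ = nc.Subscribe("svc.hello", handler)

// queue group: each message is delivered to exactly one
// member of the "workers" group -> real load balancing
_, _ = nc.QueueSubscribe("svc.hello", "workers", handler)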

TL;DR

no LB: 232K rps
-> LB with NATS request-reply: 196K rps (16% overhead)
no LB: 197K rps
-> LB with Fabio+Consul: 65K rps (67% overhead)

 



2022-03-20

1 million Go goroutine vs C# task

Let's compare 1 million goroutines with 1 million C# tasks: which is more efficient in CPU and memory usage? The code is forked from kjpgit's techdemo.
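
The Go side is essentially this pattern (a simplified sketch, not the exact techdemo code; the sleep duration and polling interval are illustrative):

package main

import (
  "sync/atomic"
  "time"
)

func main() {
  const n = 1_000_000
  var done int64
  for i := 0; i < n; i++ {
    go func() {
      time.Sleep(10 * time.Second) // simulate a waiting task
      atomic.AddInt64(&done, 1)    // sleep first, then increment (see below)
    }()
  }
  for atomic.LoadInt64(&done) < n {
    time.Sleep(time.Second) // check completion every 1s
  }
}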

Name     UserCPU    SysCPU     AvgRSS       MaxRSS    Wall
c#_1t      38.96    0.68      582,951      636,136    1:00
c#_2t      88.33    0.95      623,956      820,620    1:02
c#_4t     142.86    1.09      687,365      814,028    1:03
c#_8t     235.80    1.71      669,882      820,704    1:05
c#_16t    434.76    4.01      734,545      771,240    1:08
c#_32t    717.39    4.81      720,235      769,888    1:11
go_1t      58.77    0.65    2,635,380    2,635,632    1:04
go_2t      64.48    0.71    2,639,206    2,642,752    1:00
go_4t      72.55    1.42    2,651,086    2,654,972    1:00
go_8t      80.87    2.82    2,641,664    2,643,392    1:00
go_16t     83.18    4.03    2,673,404    2,681,100    1:00
go_32t     86.65    4.30    2,645,494    2,657,580    1:00


The result is as expected, because all tasks/goroutines are spawned first before processing: Go's scheduler is more efficient in CPU usage, but the C# runtime is more efficient in memory usage, which is normal since goroutines require a minimum of 2KB overhead per goroutine, a way higher cost than spawning a task. What if we increase it to 10 million tasks/goroutines, and let the spawning be done in another task/goroutine, so that when a goroutine is done its memory can be returned to the GC? Here's the result:

Name     UserCPU    SysCPU     AvgRSS        MaxRSS    Wall
c#_1t      12.78    1.28    2,459,190     5,051,528    0:13
c#_2t      22.60    1.54    2,692,439     5,934,796    0:18
c#_4t      42.09    1.54    2,370,239     5,538,280    0:21
c#_8t      88.54    2.29    2,522,053     6,334,176    0:29
c#_16t    204.39    3.32    2,395,001     5,803,808    0:34
c#_32t    259.09    3.25    1,842,458     4,710,012    0:28
go_1t      13.97    0.97    4,514,200     6,151,088    0:14
go_2t      12.35    1.51    5,595,418     9,506,076    0:07
go_4t      22.09    2.40    6,394,162    12,517,848    0:07
go_8t      31.00    3.09    7,115,281    13,428,344    0:06
go_16t     40.32    3.52    7,126,851    13,764,940    0:06
go_32t     58.58    3.58    7,104,882    12,145,396    0:06

The result seems normal; the high memory usage is caused by a lot of goroutines spawned at the same time on different threads, not blocking the main thread, and after they're done the memory gets collected by the GC (previously the exit condition was time-based; this time it exits after all work is done, since I moved the sleep before the atomic increment). What if we go back down to 1 million but with the same exit rule, the spawning executed in a different task/goroutine, and completion checked every 1s? Here's the result:

Name    UserCPU SysCPU    AvgRSS     MaxRSS    Wall
c#_1t    1.18    0.16     328,134     511,652    0:02
c#_2t    2.18    0.22     294,608     554,488    0:02
c#_4t    3.19    0.20     305,336     554,064    0:02
c#_8t    7.77    0.31     292,281     530,368    0:02
c#_16t  12.33    0.25     304,352     569,460    0:02
c#_32t  37.90    1.25     337,837     684,252    0:03
go_1t    2.72    0.42   1,592,978   2,519,040    0:03
go_2t    3.04    0.47   1,852,084   2,637,532    0:03
go_4t    3.65    0.54   1,936,626   2,637,272    0:03
go_8t    3.27    0.59   1,768,540   2,655,208    0:02
go_16t   4.01    0.71   1,770,673   2,664,504    0:02
go_32t   4.96    0.72   1,770,354   2,669,244    0:02

The difference in processing time is negligible, but the CPU and memory usage contrast quite a bit. Next, let's try to spawn in bursts (100K per second), adding a 1 second sleep after every 100Kth task/goroutine, since it's not realistic even for a DDOS'ed server to receive that much (unless the server is finely tuned). Here's the result:

Name    UserCPU  SysCPU    AvgRSS     MaxRSS     Wall
c#_1t     0.61    0.08    146,849     284,436    0:05
c#_2t     1.17    0.10    131,778     261,720    0:05
c#_4t     1.53    0.08    133,505     289,584    0:05
c#_8t     4.17    0.15    131,924     284,960    0:05
c#_16t   10.94    0.68    135,446     289,028    0:05
c#_32t   19.86    3.01    130,533     284,924    0:05
go_1t     1.84    0.24    731,872   1,317,796    0:06
go_2t     1.87    0.26    659,382   1,312,220    0:05
go_4t     2.00    0.30    661,296   1,322,152    0:05
go_8t     2.37    0.34    660,641   1,324,684    0:05
go_16t    2.82    0.39    660,225   1,323,932    0:05
go_32t    3.36    0.45    659,176   1,327,264    0:05

And for 5 million:

Name    UserCPU    SysCPU    AvgRSS       MaxRSS    Wall
c#_1t     3.39    0.24      309,103      573,772    0:11
c#_2t     8.30    0.26      278,683      553,592    0:11
c#_4t    13.65    0.32      274,679      658,104    0:11
c#_8t    23.20    0.46      286,336      641,376    0:12
c#_16t   45.85    1.32      286,311      640,336    0:12
c#_32t   64.83    2.46      264,866      615,552    0:12
go_1t     6.25    0.50    1,397,434    2,629,936    0:13
go_2t     6.20    0.56    1,386,336    2,631,580    0:11
go_4t     7.52    0.65    1,410,523    2,625,308    0:11
go_8t     8.21    0.86    1,441,080    2,779,456    0:11
go_16t   11.17    0.96    1,436,220    2,687,908    0:11
go_32t   12.97    1.06    1,430,573    2,668,816    0:11

And for 25 million:

Name    UserCPU   SysCPU    AvgRSS        MaxRSS    Wall
c#_1t     15.94    0.69     590,411    1,190,340    0:24
c#_2t     34.88    0.84     699,288    1,615,372    0:32
c#_4t     59.95    0.89     761,308    1,794,116    0:34
c#_8t    100.64    1.36     758,161    1,845,944    0:36
c#_16t   199.56    2.99     765,791    2,014,856    0:38
c#_32t   332.02    4.07     811,809    1,972,400    0:41
go_1t     21.76    0.71   2,846,565    4,413,968    0:29
go_2t     25.77    1.03   2,949,433    5,553,608    0:25
go_4t     28.74    1.24   2,920,447    5,800,088    0:24
go_8t     37.28    1.96   2,869,074    5,502,776    0:23
go_16t    43.46    2.67   2,987,114    5,769,356    0:24
go_32t    43.77    2.92   3,027,179    5,867,084    0:24

How about 25 million with a sleep per 200K?

Name    UserCPU   SysCPU      AvgRSS       MaxRSS    Wall
c#_1t     18.47    0.91      842,492    1,820,788    0:22
c#_2t     40.32    0.93    1,070,555    2,454,324    0:31
c#_4t     62.39    1.16    1,103,741    2,581,476    0:33
c#_8t    100.84    1.34    1,074,820    2,377,580    0:34
c#_16t   218.26    2.91    1,062,642    2,726,700    0:37
c#_32t   339.00    6.51    1,042,254    2,275,644    0:40
go_1t     22.61    0.88    3,474,195    5,071,944    0:27
go_2t     25.83    1.20    3,912,071    6,964,640    0:20
go_4t     37.98    1.68    4,180,188    7,392,800    0:20
go_8t     38.56    2.44    4,189,265    8,481,852    0:18
go_16t    44.49    3.19    4,187,142    8,483,236    0:18
go_32t    48.82    3.44    4,218,591    8,424,200    0:18

And lastly, 25 million with a sleep per 400K?

Name    UserCPU    SysCPU    AvgRSS        MaxRSS    Wall
c#_1t     18.66    0.98    1,183,313    2,622,464    0:20
c#_2t     41.27    1.14    1,326,415    3,155,948    0:31
c#_4t     67.21    1.11    1,436,280    3,015,212    0:33
c#_8t    107.14    1.56    1,492,179    3,378,688    0:35
c#_16t   233.50    2.45    1,498,421    3,732,368    0:41
c#_32t   346.87    3.74    1,335,756    2,882,676    0:39
go_1t     24.13    0.82    4,048,937    5,099,220    0:26
go_2t     28.85    1.41    4,936,677    8,023,568    0:18
go_4t     31.51    1.95    5,193,653    9,537,080    0:14
go_8t     45.27    2.65    5,461,107    9,499,308    0:14
go_16t    53.43    3.19    5,183,009    9,476,084    0:14
go_32t    61.98    3.86    5,589,156   10,587,788    0:14

How to read the results above: Wall = time needed to complete, lower is better; AvgRSS/MaxRSS = average/max memory usage (KB), lower is better; UserCPU/SysCPU = CPU time used, where more than the wall time means more than 1 full core of compute was used, lower is better. Versions used in this benchmark:

go version go1.17.6 linux/amd64
dotnet --version
6.0.201

2022-02-22

C# vs Go in Simple Benchmark

Today we're gonna revisit two of my favorite languages in an associative array and comb sort benchmark (compile and run, not just runtime performance, because the time a developer waits for compilation also matters), like in the past benchmark. For installing DotNet:

wget https://packages.microsoft.com/config/ubuntu/21.04/packages-microsoft-prod.deb -O packages-microsoft-prod.deb
sudo dpkg -i packages-microsoft-prod.deb
rm packages-microsoft-prod.deb
sudo apt install apt-transport-https
sudo apt-get update
sudo apt-get install -y dotnet-sdk-6.0 aspnetcore-runtime-6.0

For installing Golang:

sudo add-apt-repository ppa:longsleep/golang-backports
sudo apt-get update
sudo apt install -y golang-1.17

Result (best of 3 runs)

cd assoc; time dotnet run
6009354 6009348 611297
36186112 159701682 23370001

CPU: 14.16s     Real: 14.41s    RAM: 1945904KB

cd assoc; time go run map.go
6009354 6009348 611297
36186112 159701682 23370001

CPU: 14.80s     Real: 12.01s    RAM: 2305384KB

This is a bit weird: usually I see Go use less memory but run slower, but in this benchmark it's C# that uses less memory yet runs a bit slower (14.41s vs 12.01s), possibly because compilation speed is also included.

cd num-assoc; time dotnet run
CPU: 2.21s      Real: 2.19s     RAM: 169208KB

cd num-assoc; time go run comb.go
CPU: 0.46s      Real: 0.44s     RAM: 83100KB

What if we increase the N from 1 million to 10 million?

cd num-assoc; time dotnet run
CPU: 19.25s     Real: 19.16s    RAM: 802296KB

cd num-assoc; time go run comb.go
CPU: 4.60s      Real: 4.67s     RAM: 808940KB
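
For reference, the comb sort being benchmarked is roughly this algorithm (a sketch, not necessarily the exact comb.go):

package main

import "fmt"

// combSort sorts in place using a gap that shrinks by a factor of ~1.3
func combSort(a []int) {
  gap, swapped := len(a), true
  for gap > 1 || swapped {
    gap = gap * 10 / 13 // shrink the gap
    if gap < 1 {
      gap = 1
    }
    swapped = false
    for i := 0; i+gap < len(a); i++ {
      if a[i] > a[i+gap] {
        a[i], a[i+gap] = a[i+gap], a[i]
        swapped = true
      }
    }
  }
}

func main() {
  a := []int{5, 2, 9, 1, 7}
  combSort(a)
  fmt.Println(a) // [1 2 5 7 9]
}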

If you want to contribute (eg. if I made a mistake coding the C# or Go version of the algorithm, or if there's a more efficient data structure), just fork and create a pull request, and I will redo the benchmark.

2021-11-17

Alternative Strategy for Dependency Injection (lambda-returning vs function-pointer)

There are some common strategies for injecting a dependency (one or a set of functions) using an interface, something like this:

type Foo interface{ Bla() string }
type RealAyaya struct{}
func (a *RealAyaya) Bla() string { return `real` }
type MockAyaya struct{} // generated from gomock or others
func (a *MockAyaya) Bla() string { return `mock` }
// real usage:
var deps Foo = &RealAyaya{}
deps.Bla()
// test usage:
deps = &MockAyaya{}
deps.Bla()

and there's another one (dependency parameter on function returning a lambda):

type Bla func() string
type DepsIface interface{ /* ... */ }
func NewBla(deps DepsIface) Bla {
  return func() string {
    // do something with deps
  }
}
// real usage:
bla := NewBla(realDeps)
res := bla()
// test usage:
bla := NewBla(mockedOrFakeDeps)
res := bla()

And there's another way: combining both fake and real implementations like this, or alternatively using proxy/cache+codegen if it's for a 3rd party dependency.
And there's yet another way (pluggable at the per-function level):

type Bla func() string
type BlaCaller struct {
  BlaFunc Bla
}
// real usage:
bla := BlaCaller{ BlaFunc: deps.SomeMethod }
res := bla.BlaFunc()
// test usage:
bla := BlaCaller{ BlaFunc: func() string { return `fake` } }
res := bla.BlaFunc()

Analysis


The first one is the most popular way. The 2nd one I saw recently (it's also used in an openApi/swagger codegen library, I forgot which); the bad part is that we have to sanitize the stack trace manually, because it shows something like NewBla.func1 in the traces, and we have to use a generated mock or implement everything when testing. The last style is what I thought of when writing some task where the specs were still unclear on whether I should:
1. query from the local database
2. hit another service
3. or just return fake data (in the tests)
I can easily swap out any function without depending on a whole struct or interface, and it stays easy to debug (set breakpoints) and jump around the methods, compared to a generated mock or the interface version.
Probably the bad part is that we have to inject every function one by one for each function we want to call (which is nearly the same effort as the 2nd one). But if that's the case, when your function requires 10+ injected functions, maybe it's time to refactor?

The real use case would be something like this:

type LoanExpirationFunc func(userId string) time.Time

type InProcess1 struct {
  UserId string
  // add more input here
  LoanExpirationFunc LoanExpirationFunc
  // add more injectable functions, eg. 3rd party hit or db read/save
}
type OutProcess1 struct{}

func Process1(in *InProcess1) (out *OutProcess1) {
  // eg. validation here
  x := in.LoanExpirationFunc(in.UserId)
  // ... do something with x
}

func defaultLoanExpirationFunc(userId string) time.Time {
  // eg. query from database
}

type thirdParty struct{} // to put dependencies

func NewThirdParty() *thirdParty { return &thirdParty{} }

func (t *thirdParty) extLoanExpirationFunc(userId string) time.Time {
  // eg. hit another service
}

// init input:
func main() {
  http.HandleFunc("/case1", func(w http.ResponseWriter, r *http.Request) {
    in := InProcess1{LoanExpirationFunc: defaultLoanExpirationFunc}
    in.ParseFromRequest(r)
    out := Process1(&in)
    out.WriteToResponse(w)
  })
  tp := NewThirdParty()
  http.HandleFunc("/case2", func(w http.ResponseWriter, r *http.Request) {
    in := InProcess1{LoanExpirationFunc: tp.extLoanExpirationFunc}
    in.ParseFromRequest(r)
    out := Process1(&in)
    out.WriteToResponse(w)
  })
}

// on test:
func TestProcess1(t *testing.T) {
  t.Run(`test one year from now`, func(t *testing.T) {
    in := InProcess1{LoanExpirationFunc: func(string) time.Time {
      return time.Now().AddDate(1, 0, 0)
    }}
    out := Process1(&in)
    assert.Equal(t, out, ...)
  })
}

I haven't used this strategy extensively on a new project yet (since I only thought about it today and yesterday while creating a horrid integration test), but I'll update this post when I find annoyances with this strategy.
 
UPDATE 2022: after using this strategy extensively for a while, it is better than interfaces (especially when using IntelliJ); my tip: give the function pointer field and the injected function the same name.

2021-07-30

Mock vs Fake and Classical Testing

The motivation of this article is to promote a less painful way of testing and structuring code, with fewer broken tests when changing logic/implementation details (changing only the logic, not the input/output). This post recaps ~4 years' worth of articles from other developers who realized similar pain points with the popular approach (mock), concluding that Fake > Mock and Classical Test > Mock Test.

Mock Approach

Given a code like this:

type Obj struct {
  *sql.DB // or Provider
}

func (o *Obj) DoMultipleQuery(in InputStruct) (out OutputStruct, err error) {
  ... = o.DoSomeQuery()
  ... = o.DoOtherQuery()
}

I've seen code that tests with the mock technique like this:

func TestObjDoMultipleQuery(t *testing.T) {
  testCases := []struct {
    name      string
    mockFunc  func(sqlmock.Sqlmock, *gomock.Controller)
    in        InputStruct
    out       OutputStruct
    shouldErr bool
  }{
    {
      name: `best case`,
      mockFunc: func(db sqlmock.Sqlmock, c *gomock.Controller) {
        db.ExpectExec(`UPDATE t1 SET bla = \?, foo = \?, yay = \? WHERE bar = \? LIMIT 1`).
          WillReturnResult(sqlmock.NewResult(1, 1))
        db.ExpectQuery(`SELECT a, b, c, d, bar, bla, yay FROM t1 WHERE bar = \? AND state IN \(1,2\)`).
          WithArgs(3).
          WillReturnRows(sqlmock.NewRows([]string{"id", "channel_name", "display_name", "color", "description", "active", "updated_at"}).
            AddRow("2", "bla2", "Bla2", "#0000", "bla bla", "1", "2021-05-18T15:04:05Z").
            AddRow("3", "wkwk", "WkWk", "#0000", "wkwk", "1", "2021-05-18T15:04:05Z"))
        ...
      },
      in:        InputStruct{...},
      out:       OutputStruct{...},
      shouldErr: false,
    },
    {
      // ... other cases
    },
  }
  for _, tc := range testCases {
    t.Run(tc.name, func(t *testing.T) {
      ... // prepare the mock object
      o := Obj{mockProvider}
      out := o.DoMultipleQueryBusinessLogic(tc.in)
      assert.Equal(t, out, tc.out)
    })
  }
}

This approach has pros and cons:

+ can check for typos (eg. add one character to the original query and this test would detect the error)

+ can check whether some queries are properly called, or not called but expected to be called

+ a unit test is always faster than an integration test

- tests implementation details (easily breaks when the logic changes)

- cannot check whether the SQL statements are correct

- possible coupled implementation between data provider and business logic

- duplicated work between the original query and the regex version of the query: if we add a column, we must change both

For the last con, we can change it to something like this:

db.ExpectQuery(`SELECT.+FROM t1.+`).
WillReturnRows( ... )

This approach has pros and cons:

+ no duplicated work (since it's just a simplified regex of the full SQL statement)

+ can still check whether queries are properly called or not

+ a unit test is always faster than an integration test

- tests implementation details (easily breaks when the logic changes)

- cannot detect typos/whether the query no longer matches (eg. if we accidentally add one character to the original query that causes an sql error)

- cannot check the correctness of the SQL statement

- possible coupled implementation between data provider and business logic

We could also create a helper function that converts the original query into its regex version:

func SqlToRegexSql(sql string) string {
  return regexp.QuoteMeta(sql) // escapes regex special characters: ( ) ? . etc
}

db.ExpectQuery(SqlToRegexSql(ORIGINAL_QUERY)) ...

This approach has the same pros and cons as the previous approach.

Fake Approach

Fake testing uses the classical approach: instead of checking implementation details (expected calls to a dependency), we use a compatible implementation as the dependency (eg. a slice/map of structs standing in for a database table/DataProvider).

Given a code like this:

type Obj struct {
  FooDataProvider // interface{UpdateFoo,GetFoo,...}
}

func (o *Obj) DoBusinessLogic(in *Input) (out *Output, err error) {
  ... = o.UpdateFoo(in.bla)
  ... = o.GetFoo(in.bla)
  ...
}

It’s better to make a fake data provider like this:

type FakeFooDataProvider struct {
  Rows map[int]FooRow // or a slice
}

func (f *FakeFooDataProvider) UpdateFoo(a string) (...) { /* update Rows */ }
func (f *FakeFooDataProvider) GetFoo(a string) (...)    { /* get one row */ }
// ... insert, delete, count, get batched/paged
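
Filled in, such a fake can be as simple as this (the row shape and method signatures here are hypothetical; adjust them to the real FooDataProvider interface, and it assumes "fmt" is imported):

type FooRow struct {
  Id  int
  Bla string
}

type FakeFooDataProvider struct {
  Rows map[int]FooRow // in-memory stand-in for the table
}

func (f *FakeFooDataProvider) UpdateFoo(id int, bla string) error {
  row, ok := f.Rows[id]
  if !ok {
    return fmt.Errorf(`foo %d not found`, id)
  }
  row.Bla = bla
  f.Rows[id] = row
  return nil
}

func (f *FakeFooDataProvider) GetFoo(id int) (FooRow, bool) {
  row, ok := f.Rows[id]
  return row, ok
}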

So in the test, we can do something like this:

func TestObjDoBusinessLogic(t *testing.T) {
  o := Obj{FakeFooDataProvider{}}
  testCases := []struct {
    name      string
    in        Input
    out       Output
    shouldErr bool
  }{
    {
      name:      `best case`,
      in:        Input{...},
      out:       Output{...},
      shouldErr: false,
    },
    {
      // ...
    },
  }
  for _, tc := range testCases {
    t.Run(tc.name, func(t *testing.T) {
      out := o.DoBusinessLogic(tc.in)
      assert.Equal(t, tc.out, out)
    })
  }
}

This approach has pros and cons:

+ tests behavior (this input should give this output) instead of implementation details (doesn't break easily/no need to modify the test when the algorithm/logic changes)

+ a unit test is always faster than an integration test

- cannot check whether the queries are called, or not called but expected to be called

- double work in Golang (since there are no generics/templates yet; go 1.18 must wait until Feb 2022): you must create a minimal fake implementation (map/slice) that simulates basic database table logic, or if the data provider is not separated per table (repository/entity pattern), create the join logic too -- a better approach in this case is to always create Insert, Update, Delete, GetOne, GetBatch instead of joining

+ should be no coupling between queries and business logic

- cannot check whether the queries in the data provider are correct (which should not be this unit's problem; it's the DataProvider integration/unit test's problem, not this unit's)

Classical Approach for DataProvider

It's better to test the queries using the classical (black box) approach with an integration test instead of mocks (white box), since mock and fake testing can only test the correctness of the business logic, not the logic of the data provider, which mostly depends on a 2nd party (the database). Fake testing is also considered a classical approach, since it tests input/output, not implementation details.

Using dockertest when testing locally and a gitlab-ci service when testing in the pipeline can look something like this:

var testDbConn *sql.DB

func TestMain(m *testing.M) { // called before tests
  if env == `` || env == `development` { // spawn dockertest, pass the connection to the callback
    prepareDb(func(db *sql.DB) int {
      testDbConn = db
      if db == nil {
        return 0
      }
      return m.Run()
    })
  } else {
    // connect to the gitlab-ci service
    var err error
    testDbConn, err = ... // log error
  }
}

func TestDataProviderLogic(t *testing.T) {
  if testDbConn == nil {
    if env == `` || env == `development` || env == `test` {
      t.Fail()
    }
    return
  }
  f := FooDataProvider{testDbConn}
  f.InitTables()
  f.MigrateTables() // if testing migration
  // test f.UpdateFoo, f.GetFoo, ...
}

Where the prepareDb function can be something like this (taken from the dockertest example):

var globalPool *dockertest.Pool

func prepareDb(onReady func(db *sql.DB) int) {
  const dockerRepo = `yandex/clickhouse-server`
  const dockerVer = `latest`
  const chPort = `9000/tcp`
  const dbDriver = "clickhouse"
  const dbConnStr = "tcp://127.0.0.1:%s?debug=true"
  var err error
  if globalPool == nil {
    globalPool, err = dockertest.NewPool("")
    if err != nil {
      log.Printf("Could not connect to docker: %s\n", err)
      return
    }
  }
  resource, err := globalPool.Run(dockerRepo, dockerVer, []string{})
  if err != nil {
    log.Printf("Could not start resource: %s\n", err)
    return
  }
  var db *sql.DB
  if err := globalPool.Retry(func() error {
    var err error
    db, err = sql.Open(dbDriver, fmt.Sprintf(dbConnStr, resource.GetPort(chPort)))
    if err != nil {
      return err
    }
    return db.Ping()
  }); err != nil {
    log.Printf("Could not connect to docker: %s\n", err)
    return
  }
  code := onReady(db)
  if err := globalPool.Purge(resource); err != nil {
    log.Fatalf("Could not purge resource: %s", err)
  }
  os.Exit(code)
}

In the pipeline, the .gitlab-ci.yml file can be something like this for PostgreSQL (use a tmpfs/in-memory version for the database data directory to make it faster):

test:
  stage: test
  image: golang:1.16.4
  dependencies: []
  services:
    - postgres:13-alpine # TODO: create a tmpfs version
  tags:
    - cicd
  variables:
    ENV: test
    POSTGRES_DB: postgres
    POSTGRES_HOST: postgres
    POSTGRES_PASSWORD: postgres
    POSTGRES_PORT: "5432"
    POSTGRES_USER: postgres
  script:
    - source env.sample
    - go test

The Dockerfile for a tmpfs database, if using MySQL, can be something like this:

FROM circleci/mysql:5.5

RUN echo '\n\
[mysqld]\n\
datadir = /dev/inmemory/mysql\n\
' >> /etc/mysql/my.cnf

Or for MongoDB:

FROM circleci/mongo:3.6.9

RUN sed -i '/exec "$@"/i mkdir \/dev\/inmemory\/mongo' /usr/local/bin/docker-entrypoint.sh

CMD ["mongod", "--nojournal", "--noprealloc", "--smallfiles", "--dbpath=/dev/inmemory/mongo"]

The benefits of this classical integration test approach:

+ high confidence that your SQL statements are correct; can detect typos (wrong column, wrong table, etc)

+ isolated test, not testing business logic but only the data provider layer; can also test schema migrations

- not a good approach for databases with eventual consistency (eg. Clickhouse)

- since this is an integration test, it would be slower than a mock/fake unit test (1-3s+ total delay overhead when spawning docker)

Conclusion

  1. use mock for databases with eventual consistency

  2. prefer fake over mock for business logic correctness because it’s better for maintainability to test behavior (this input should give this output), instead of implementation details

  3. prefer classical testing over mock testing for checking data provider logic correctness

References

(aka confirmation bias :3)

https://martinfowler.com/articles/mocksArentStubs.html
https://stackoverflow.com/questions/1595166/why-is-it-so-bad-to-mock-classes
https://medium.com/javascript-scene/mocking-is-a-code-smell-944a70c90a6a
https://chemaclass.medium.com/to-mock-or-not-to-mock-af995072b22e
https://accu.org/journals/overload/23/127/balaam_2108/
https://news.ycombinator.com/item?id=7809402
https://philippe.bourgau.net/careless-mocking-considered-harmful/
https://debugged.it/blog/mockito-is-bad-for-your-code/
https://engineering.talkdesk.com/double-trouble-why-we-decided-against-mocking-498c915bbe1c
https://blog.thecodewhisperer.com/permalink/you-dont-hate-mocks-you-hate-side-effects
https://agilewarrior.wordpress.com/2015/04/18/classical-vs-mockist-testing/
https://www.slideshare.net/davidvoelkel/mockist-vs-classicists-tdd-57218553
https://www.thoughtworks.com/insights/blog/mockists-are-dead-long-live-classicists
https://stackoverflow.com/questions/184666/should-i-practice-mockist-or-classical-tdd
https://bencane.com/2020/06/15/dont-mock-a-db-use-docker-compose/
https://swizec.com/blog/what-i-learned-from-software-engineering-at-google/#stubs-and-mocks-make-bad-tests

https://www.freecodecamp.org/news/end-to-end-api-testing-with-docker/
https://medium.com/@june.pravin/mocking-is-not-practical-use-fakes-e30cc6eaaf4e 
https://www.c-sharpcorner.com/article/stub-vs-fake-vs-spy-vs-mock/

2021-03-13

Pyroscope: Continuous Tracing in Go, Python, or Ruby

Recently I stumbled upon a slow library/function problem and didn't know which part was causing it, and found out that there's an easy way to trace Go, Ruby, or Python code using Pyroscope. The feature set is a bit minimalist; there's no memory usage tracing yet, unlike in gops or pprof. Pyroscope consists of 2 parts: the server and the agent/client library (if using Golang) or executor (if using Ruby or Python). Here's how to run the Pyroscope server:

# run server using docker
docker run -it -p 4040:4040 pyroscope/pyroscope:latest server

And here's an example of how to use the client library/agent (modifying the Go source code, just like in DataDog or any other APM tool) or install the Pyroscope CLI to run Ruby/Python scripts:

// golang: add the agent inside the source code
import "github.com/pyroscope-io/pyroscope/pkg/agent/profiler"

func main() {
  profiler.Start(profiler.Config{
    ApplicationName: "my.app.server",
    ServerAddress:   "http://pyroscope:4040",
  })
  // rest of your code
}

# ruby or python, install CLI client 
cd /tmp
wget https://dl.pyroscope.io/release/pyroscope_0.0.28_amd64.deb
sudo apt-get install ./pyroscope_0.0.28_amd64.deb

# ruby
pyroscope exec ruby yourcode.rb

# python
pyroscope exec python yourcode.py

If you open the server URL (localhost:4040) in the browser, it shows something like this, so you can check which part of the code took most of the runtime.




2021-01-26

GOPS: Trace your Golang service with ease

GoPS is one alternative (also made by Google, other than pprof) to measure, trace, or diagnose the performance and memory usage of your Go-powered service/long-lived program. The usage is very easy: you just need to import it and add 3 lines in your main (so the gops command line can communicate with your program):

import "github.com/google/gops/agent"

func main() {
  if err := agent.Listen(agent.Options{}); err != nil {
    log.Fatal(err)
  }
  // remaining of your long-lived program logic
}

If you don't add those lines, you can still use gops, but only to get the list of Go programs running on your computer/server with limited statistics, using these commands:

$ go get -u -v github.com/google/gops

$ gops  # show the list of running golang program
1248    1       dnscrypt-proxy  go1.13.4  /usr/bin/dnscrypt-proxy
1259    1       containerd      go1.13.15 /usr/bin/containerd
18220   1       dockerd         go1.13.15 /usr/bin/dockerd
1342132 1306434 docker          go1.13.15 /usr/bin/docker

$ gops tree # show running process in tree

$ gops PID # check the stats and whether the program have GOPS agent

#########################################################
# these commands below only available
# if the binary compiled with GOPS agent
# PID can be replaced with GOPS host:port of that program

$ gops stack PID # get current stack trace of running PID

$ gops memstats PID # get memory statistics of running PID

$ gops gc PID # force garbage collection

$ gops setgc PID X # set GC percentage

$ gops pprof-cpu PID # get cpu profile graph
$ gops pprof-heap PID # get memory usage profile graph
profile saved at /tmp/heap_profile070676630
$ gops trace PID # get 5 sec execution trace

# you can install graphviz to visualize the cpu/memory profile
$ sudo apt install graphviz

# visualize the cpu/memory profile graph on the web browser
$ go tool pprof /tmp/heap_profile070676630
> web 

The next step is to analyze the call graph for memory leaks (which are mostly just a wrongly handled/forgotten defer of a response body/sql rows, holding a slice reference into a huge buffer, or a certain framework's cache trashing) or slow functions, whichever your mission is.

What if the Golang service you need to trace runs inside a Kubernetes pod whose GOPS address (host:port) is not exposed to the outside world? Kubernetes is a popular solution for companies that manage bunches of servers/microservices, or clouds (GKE, AKS, Amazon EKS, ACK, DOKS, etc), but it is obviously overkill for small companies that don't need to scale elastically (or whose servers number fewer than 10, or that don't use a microservice architecture).

First, you must compile gops statically so it can run inside an alpine container (which is what most people use):

$ cd $GOPATH/src/github.com/google/gops
$ export CGO_ENABLED=0
$ go mod vendor
$ go build

# copy gops to your kubernetes pod
$ export POD_NAME=blabla
$ kubectl cp ./gops $POD_NAME:/bin

# ssh/exec to your pod
$ kubectl exec -it $POD_NAME -- sh
$ gops

# for example you want to check heap profile for PID=1
$ gops pprof-heap 1
$ exit

# copy the trace file back to local, then you can analyze the dump
kubectl cp $POD_NAME:/tmp/heap_profile070676630 out.dump

But if the address and port are exposed, you can use gops directly from your computer to the pod, or create a tunnel inside the pod if it doesn't have a public IP, for example using ngrok.

Btw, if you know any companies migrating from/to a certain language (especially Go), framework, or database, you can contribute here.

2020-12-22

String Associative Array and CombSort Benchmark 2020 Edition

5 years after the last string associative benchmark and lesser string associative benchmark (measuring string concat operations and built-in associative array set and get), and the numeric comb sort benchmark and string comb sort benchmark (measuring basic array random access, string conversion, and array swap for numbers and strings), this year's edition uses a newer processor: an AMD Ryzen 3 3100 running 64-bit Ubuntu 20.04. Now with 10x more data, to hopefully make the benchmark run 10x slower (at least 1 sec); best of 3 runs.

alias time='/usr/bin/time -f "\nCPU: %Us\tReal: %es\tRAM: %MKB"'

$ php -v
PHP 7.4.3 (cli) (built: Oct  6 2020 15:47:56) ( NTS )

$ time php assoc.php 
637912 641149 67002
3808703 14182513 2343937
CPU: 1.25s      Real: 1.34s     RAM: 190644KB

$ python3 -V
Python 3.8.5

$ time python3 dictionary.py
637912 641149 67002
3808703 14182513 2343937
CPU: 5.33s      Real: 5.47s     RAM: 314564KB

$ ruby3.0 -v
ruby 3.0.0p0 (2020-12-25 revision 95aff21468) [x86_64-linux-gnu]

$ time ruby3.0 --jit hash.rb 
637912 641149 67002
3808703 14182513 2343937
CPU: 6.50s      Real: 5.94s     RAM: 371832KB

$ go version
go version go1.14.7 linux/amd64

$ time go run map.go
637912 641149 67002
3808703 14182513 2343937
CPU: 1.79s      Real: 1.56s     RAM: 257440KB

$ node -v       
v14.15.2

$ time node object.js
637912 641149 67002
3808703 14182513 2343937
CPU: 2.24s      Real: 2.21s     RAM: 326636KB

$ luajit -v 
LuaJIT 2.1.0-beta3 -- Copyright (C) 2005-2017 Mike Pall. http://luajit.org/

$ time luajit table.lua
637912  641149  67002
3808703 14182513        2343937
CPU: 4.11s      Real: 4.22s     RAM: 250828KB

$ dart --version
Dart SDK version: 2.10.4 (stable) (Unknown timestamp) on "linux_x64"

$ time dart map.dart
637912 641149 67002
3808703 14182513 2343937
CPU: 2.99s      Real: 2.91s     RAM: 385496KB

$ v version
V 0.2 36dcace

$ time v run map.v
637912, 641149, 67002
3808703, 14182513, 2343937
CPU: 4.79s      Real: 5.28s     RAM: 1470668KB

$ tcc -v
tcc version 0.9.27 (x86_64 Linux)

$ time tcc -run uthash.c
637912 641149 67002
3808703 14182513 2343937
Command exited with non-zero status 25

CPU: 2.52s      Real: 2.61s     RAM: 291912KB

export GOPHERJS_GOROOT="$(go1.12.16 env GOROOT)"
$ npm install --global source-map-support

$ gopherjs version
GopherJS 1.12-3

$ time gopherjs 
637912 641149 67002
3808703 14182513 2343937

CPU: 14.13s     Real: 12.01s    RAM: 597712KB

$ java -version
java version "14.0.2" 2020-07-14
Java(TM) SE Runtime Environment (build 14.0.2+12-46)
Java HotSpot(TM) 64-Bit Server VM (build 14.0.2+12-46, mixed mode, sharing)

$ time java hashmap.java
637912 641149 67002
3808703 14182513 2343937

CPU: 5.18s      Real: 1.63s     RAM: 545412KB

The results show a huge improvement for PHP since the old 5.4. NodeJS also improved hugely compared to the old 0.10. The rest are pretty much the same. Also keep in mind that Golang and V include build/compile time, not just run duration, and it seems V's performance is really bad when it comes to string operations (the compile itself is really fast, less than 1s for 36dcace -- using gcc 9.3.0).
Next we benchmark the comb sort implementation. But this time we use the jit version of ruby 2.7, since it's far faster (19s vs 26s and 58s vs 66s for the string benchmark); for ruby 3.0 we always use the jit version since it's faster than non-jit. For C (TCC), which doesn't have a built-in associative array, I used uthash, because it's the most popular. TinyGo did not complete the first benchmark after more than 1000s, sometimes segfaulting. The XS JavaScript engine failed to give correct results; engine262 also failed to finish within 1000s.

Language   | Command & Flags           | Version               | Assoc (s) | RAM (KB)  | Num Comb (s) | RAM (KB)  | Str Comb (s) | RAM (KB)  | Total (s) | RAM (KB)
Go         | go run                    | 1.14.7                | 1.56      | 257,440   | 0.73         | 82,844    | 4.74         | 245,432   | 7.03      | 585,716
Go         | go run                    | 1.15.6                | 1.73      | 256,620   | 0.78         | 82,896    | 4.86         | 245,468   | 7.37      | 584,984
Nim        | nim r -d:release --gc:arc | 1.4.2                 | 1.56      | 265,172   | 0.79         | 79,284    | 5.77         | 633,676   | 8.12      | 978,132
Nim        | nim r -d:release --gc:orc | 1.4.2                 | 1.53      | 265,160   | 0.94         | 79,380    | 5.83         | 633,636   | 8.30      | 978,176
Javascript | node                      | 14.15.2               | 2.21      | 327,048   | 0.87         | 111,972   | 6.13         | 351,520   | 9.21      | 790,540
Crystal    | crystal run --release     | 0.35.1                | 1.81      | 283,648   | 1.44         | 146,700   | 6.09         | 440,796   | 9.34      | 871,144
Javascript | ~/.esvu/bin/v8            | 8.9.201               | 1.77      | 177,748   | 0.89         | 105,416   | 6.71         | 335,236   | 9.37      | 618,400
C          | tcc -run                  | 0.9.27                | 2.61      | 291,912   | 1.45         | 80,832    | 6.40         | 393,352   | 10.46     | 766,096
Java       | java                      | 14.0.2 2020-07-14     | 1.63      | 545,412   | 1.50         | 165,864   | 7.69         | 743,572   | 10.82     | 1,454,848
Nim        | nim r -d:release          | 1.4.2                 | 1.91      | 247,456   | 0.96         | 79,476    | 8.38         | 1,211,116 | 11.25     | 1,538,048
Dart       | dart                      | 2.10.4                | 2.91      | 385,496   | 1.61         | 191,916   | 7.31         | 616,716   | 11.83     | 1,194,128
Python     | pypy                      | 7.3.1+dfsg-2          | 2.19      | 331,776   | 2.83         | 139,740   | 8.04         | 522,648   | 13.06     | 994,164
Javascript | ~/.esvu/bin/chakra        | 1.11.24.0             | 2.73      | 487,400   | 1.27         | 102,192   | 11.27        | 803,168   | 15.27     | 1,392,760
Javascript | ~/.esvu/bin/jsc           | 271117                | 5.90      | 593,624   | 0.68         | 111,972   | 9.09         | 596,088   | 15.67     | 1,301,684
V          | v -prod run               | 0.2 32091dd gcc-10.2  | 4.78      | 1,469,932 | 1.86         | 79,376    | 14.06        | 1,560,516 | 20.70     | 3,109,824
Lua        | luajit                    | 2.1.0-beta3           | 4.11      | 250,828   | 3.76         | 133,424   | 12.91        | 511,196   | 20.78     | 895,448
Javascript | ~/.esvu/bin/sm            | JavaScript-C86.0a1    | 5.61      | 378,064   | 1.40         | 96,480    | 13.81        | 393,376   | 20.82     | 867,920
V          | v -prod run               | 0.2 32091dd gcc-9.3   | 5.05      | 1,469,936 | 2.14         | 79,408    | 14.62        | 1,560,484 | 21.81     | 3,109,828
Javascript | ~/.esvu/bin/graaljs       | CE Native 20.3.0      | 7.78      | 958,380   | 4.45         | 405,900   | 14.31        | 911,220   | 26.54     | 2,275,500
Go         | gopherjs run              | 1.12-3 (node 14.15.2) | 11.76     | 594,896   | 2.04         | 119,604   | 18.46        | 397,396   | 32.26     | 1,111,896
Nim        | nim r                     | 1.4.2                 | 6.60      | 247,444   | 3.05         | 79,332    | 31.85        | 1,211,208 | 41.50     | 1,537,984
PHP        | php                       | 7.4.3                 | 1.34      | 190,644   | 10.11        | 328,452   | 34.51        | 641,664   | 45.96     | 1,160,760
Ruby       | truffleruby               | 21.1.0-dev-c1517c55   | 14.54     | 2,456,156 | 3.09         | 453,152   | 29.27        | 3,660,284 | 46.90     | 6,569,592
Crystal    | crystal run               | 0.35.1                | 5.69      | 284,328   | 12.00        | 153,828   | 31.69        | 441,740   | 49.38     | 879,896
Javascript | ~/.esvu/bin/quickjs       | 2020-11-08            | 3.90      | 252,484   | 23.48        | 80,772    | 34.80        | 471,624   | 62.18     | 804,880
V          | v run                     | 0.2 36dcace gcc-9.3   | 5.28      | 1,470,668 | 6.60         | 80,232    | 58.99        | 1,561,176 | 70.87     | 3,112,076
Lua        | lua                       | 5.3.3                 | 5.98      | 366,516   | 27.26        | 264,648   | 46.05        | 864,300   | 79.29     | 1,495,464
Ruby       | ruby                      | 2.7.0p0               | 6.31      | 371,456   | 19.29        | 100,536   | 58.82        | 694,560   | 84.42     | 1,166,552
Python     | python3                   | 3.8.5                 | 5.47      | 314,564   | 33.96        | 404,976   | 47.79        | 722,820   | 87.22     | 1,442,360
Ruby       | jruby                     | 9.2.9.0               | 7.45      | 1,878,184 | 34.11        | 1,976,844 | 59.83        | 7,115,448 | 101.39    | 10,970,476
Ruby       | ruby                      | 3.0.0p0               | 5.94      | 371,832   | 24.87        | 92,844    | 74.32        | 1,015,096 | 105.13    | 1,479,772
Go         | tinygo run                | 0.16.0                | 999.99    | 318,148   | 3.68         | 300,548   | 252.34       | 711,340   | 1256.01   | 1,330,036

Golang is still the winner (obviously, since it's compiled), then Nim (compiled); the next best JIT or interpreter is NodeJS, then Crystal (compiled, not JIT), v8, followed by Java, TCC (compiled), Dart, PyPy, V (compiled, not JIT), LuaJIT, PHP, Ruby, and Python3. The recap spreadsheet can be accessed here.

FAQ:
1. Why do you measure the compile duration too? Because developer experience (the feedback loop) also matters, at least for me.
2. Why not warm up the VM first? Each implementation has its own advantages and disadvantages.
3. Why is there no C++, VB.NET, C#, D, or Object-Pascal? I don't want to compile things manually (since there's no build-and-run command in one flag).
4. Why is there no Kotlin, Scala, Rust, Pony, Swift, Groovy, Julia, Crystal, or Zig? Too lazy to add :3 you can contribute though (create a pull request, and I'll run the benchmark again, preferably when there's a precompiled binary/deb/apt/ppa repository for the compiler/interpreter).

Contributors
ilmanzo (Nim, Crystal, D)