Showing posts with label nodejs. Show all posts
Showing posts with label nodejs. Show all posts

2023-06-14

Simple Websocket Echo Benchmark

Today we're gonna benchmark nodejs/bin+uwebsocket with golang+nbio. The code is here, both taken from their own example. The benchmark plan is create 10k client/connection, send both text/binary string and receive back the from server and sleep for 1s,100ms,10ms or 1ms, the result is as expected:

go 1.20.5 nbio 1.3.16
rps: 19157.16 avg/max latency = 1.66ms/319.88ms elapsed 10.2s
102 MB 0.6 core usage
rps: 187728.05 avg/max latency = 0.76ms/167.76ms elapsed 10.2s
104 MB 5 core usage
rps: 501232.80 avg/max latency = 12.48ms/395.01ms elapsed 10.1s
rps: 498869.28 avg/max latency = 12.67ms/425.04ms elapsed 10.1s
134 MB 15 core usage

bun 0.6.9
rps: 17420.17 avg/max latency = 5.57ms/257.61ms elapsed 10.1s
48 MB 0.2 core usage
rps: 95992.29 avg/max latency = 29.93ms/242.74ms elapsed 10.4s
rps: 123589.91 avg/max latency = 40.67ms/366.15ms elapsed 10.2s
rps: 123171.42 avg/max latency = 62.74ms/293.29ms elapsed 10.1s
55 MB 1 core usage

node 18.16.0
rps: 18946.51 avg/max latency = 6.64ms/229.28ms elapsed 10.3s
59 MB 0.2 core usage
rps: 97032.08 avg/max latency = 44.06ms/196.41ms elapsed 11.1s
rps: 114449.91 avg/max latency = 72.62ms/295.33ms elapsed 10.3s
rps: 109512.05 avg/max latency = 79.27ms/226.03ms elapsed 10.2s
59 MB 1 core usage


First line until 4th line are with 1s, 100ms, 10ms, 1ms delay before next request. Since Golang/nbio is by default can utilize multi-core so can handle ~50 rps per client, while Bun/Nodejs 11-12 rps per client. If you found a bug, or want to contribute another language (or create better client, just create a pull request on the github link above.

2022-04-05

Start/restart Golang or any other binary program automatically on boot/crash

There are some alternative to make program start on boot on Linux, the usual way is using:

1. SystemD, it could ensure that dependency started before your service, also could limit your CPU/RAM usage. Generate a template using this website or use kardianos/service

2. PM2 (requires NodeJS), or PMG

3. docker-compose (requires docker, but you can skip the build part, just copy the binary directly on Dockerfile command (that can be deployed using rsync), just set restart property on docker-compose and it would restart when computer boot),  -- bad part, you cannot limit cpu/ram unless using docker swarm. But you can use docker directly to limit and use --restart flag.

3. lxc/lxd or multipass or other vm/lightweight vm (but still need systemd inside it XD at least it won't ruin your host), you can rsync directly to the container to redeploy, for example using overseer or tableflip, you must add reverse proxy or nat or proper routing/ip forwarding tho if you want it to be accessed from outside

4. supervisord (python) or ochinchina/supervisord (golang) tutorial here

5. create one daemon manager with systemd/docker-compose, then spawn the other services using goproc or pioz/god

6. monit it can monitor and ensure a program started/not dead

7. nomad (actually this one is deployment tool), but i can also manage workload

8. kubernetes XD overkill

9. immortal.run a supervisor, this one actually using systemd

10. other containerization/VM workload orchestrator/manager that usually already provided by the hoster/PaaS provider (Amazon ECS/Beanstalk/Fargate, Google AppEngine, Heroku, Jelastic, etc)


This is the systemd script that I usually use (you need to create user named "web" and install "unbuffer"):

$ cat /usr/lib/systemd/system/xxx.service
[Unit]
Description=xxx
After=network-online.target postgresql.service
Wants=network-online.target

[Service]
Type=simple
Restart=on-failure
User=web
Group=users
WorkingDirectory=/home/web/xxx
ExecStart=/home/web/xxx/run_production.sh
ExecStop=/usr/bin/killall xxx
LimitNOFILE=2097152
LimitNPROC=65536
ProtectSystem=full
NoNewPrivileges=true

[Install]
WantedBy=multi-user.target

$ cat /home/web/xxx/run_production.sh
#!/usr/bin/env bash

mkdir -p `pwd`/logs
ofile=`pwd`/logs/access_`date +%F_%H%M%S`.log
echo Logging into: $ofile
unbuffer time ./xxx | tee $ofile



2021-12-31

String Associative Array and CombSort Benchmark 2021 Edition

Last year, we've done string associative benchmark and lesser string associative benchmark (measuring string concat operation and built-in associative array set and get), numeric comb sort benchmark and string comb sort benchmark (measuring basic array random access, string conversion, and array swap for number and string), this year's using newer server: 32-core running on 64-bit Ubuntu 21.10. This time we will skip programming languages that are no deb packages (unless the install script is just one line and doesn't ruin system directories) or no direct compile-run command like previous one, also only best of 3 runs.

$ alias time='/usr/bin/time -f "\nCPU: %Us\tReal: %es\tRAM: %MKB"

This time, NodeJS failed to complete (after waiting for an hour) with 10x more data compared to last year for assoc benchmark. Here's the spreadsheet and final result (Real duration and RAM):

LanguageCommand FlagsVersionAssocRAMNum CombRAMStr CombRAMTotalRAM
Gogo rungo1.17.512.392,305,8240.4683,0963.33245,89616.182,634,816
Javajava18-ea+15-Ubuntu-410.705,582,3080.96170,2406.93722,15218.596,474,700
Nimnim r -d:release --gc:arc1.4.217.904,200,2121.2179,4445.45633,81624.564,913,472
Pythonpypy7.3.519.703,104,4402.17140,1245.73523,26427.603,767,828
Ctcc -run0.9.2722.682,820,5681.0180,4844.80392,89628.493,293,948
Juliajulia1.6.523.983,714,4480.52255,8444.60861,07629.104,831,368
Vv -prod run0.2.4 a0a180720.368,910,8562.67124,2486.48470,13629.519,505,240
Lualuajit2.1.0-beta316.701,133,8763.27133,50410.80511,46830.771,778,848
Dartdart2.15.125.922,101,1521.40208,0565.75574,85233.072,884,060
Crystalcrystal run --release1.2.213.772,371,3207.73202,55212.08440,74833.583,014,620
Nimnim r -d:release1.4.224.923,864,1001.7679,5487.511,211,20034.195,154,848
PHPphp8.0.89.941,368,8087.74328,34424.84641,44842.522,338,600
Crystalcrystal run0.35.138.852,372,0049.97179,17622.27441,72071.092,992,900
Vv run0.2.4 a0a180751.818,911,0046.6379,71618.43470,42076.879,461,140
Pythonpython33.9.743.114,106,89229.16405,33243.08722,996115.355,235,220
Nimnim r1.4.288.053,864,0482.9379,53632.601,211,260123.585,154,844
Rubyrubyruby 2.7.4p19152.482,970,90827.15100,32052.23708,940131.863,780,168
Javascriptnodev16.13.1999.999,999,9990.71115,0446.11461,8361006.8110,576,879

FAQ

1. Why you measure the compile duration too? because developer experience also important (feedback loop, edit-rebuild/compile-run), at least for me, it would be sucks a lot if we have to wait a minute to compile before we can test something. We could always write precalculated values with C++ template to make runtime faster for example, but the compilation delay would be very sucks.
2. Why not warming up the VM first? each implementation have it's own advantage and disadvantage. We already know, that compiled language mostly faster at runtime, but at cost of relatively slower development feedback loop. Interpreted language mostly slower at runtime, especially if executed using VM that have startup overhead, in exception of one with AOT or JIT optimization. So to make it fair for every kind of implementation, we do it differently by not glorifying the runtime performance (which super make sense for server or long-lived process, but not best for development or CI cost which people often neglected), but by total performance which consist of Compile duration (if any) + VM startup duration (if any) + AOT or JIT duration (if any) + Runtime duration, so every strategy the PL's implementator use can be fairly judged.
3. Why there's no C++, VB.NET, C#, D, Object-Pascal? don't want to compile things (since there's no build and run command in one flag).  
4. Why there's no Kotlin, Scala, Rust, Elixir, Pony, Swift, Groovy, or Zig? Too lazy to add :3 you can contribute tho (create a pull request, then I'll run the benchmark again as preferably as there's precompiled binary/deb/apt/ppa repository for the compiler/interpreter).
5. Why there's no Ruby 3.1? I can't find any PPA for latest Ruby, latest one on Ubuntu 21.10 repo is Ruby2.7.

Contributorsilmanzo (Nim, Crystal, D), inkydragon (Julia)

2020-12-22

String Associative Array and CombSort Benchmark 2020 Edition

5 Years later since last string associative benchmark and lesser string associative benchmark (measuring string concat operation and built-in associative array set and get), numeric comb sort benchmark and string comb sort benchmark (measuring basic array random access, string conversion, and array swap for number and string), this year's using newer processor: AMD Ryzen 3 3100 running on 64-bit Ubuntu 20.04. Now with 10x more data to hopefully make the benchmark runs 10x slower (at least 1 sec), best of 3 runs.

alias time='/usr/bin/time -f "\nCPU: %Us\tReal: %es\tRAM: %MKB"'

$ php -v
PHP 7.4.3 (cli) (built: Oct  6 2020 15:47:56) ( NTS )

$ time php assoc.php 
637912 641149 67002
3808703 14182513 2343937
CPU: 1.25s      Real: 1.34s     RAM: 190644KB

$ python3 -V
Python 3.8.5

$ time python3 dictionary.py
637912 641149 67002
3808703 14182513 2343937
CPU: 5.33s      Real: 5.47s     RAM: 314564KB

$ ruby3.0 -v
ruby 3.0.0p0 (2020-12-25 revision 95aff21468) [x86_64-linux-gnu]

$ time ruby3.0 --jit hash.rb 
637912 641149 67002
3808703 14182513 2343937
CPU: 6.50s      Real: 5.94s     RAM: 371832KB

$ go version
go version go1.14.7 linux/amd64

$ time go run map.go
637912 641149 67002
3808703 14182513 2343937
CPU: 1.79s      Real: 1.56s     RAM: 257440KB

$ node -v       
v14.15.2

$ time node object.js
637912 641149 67002
3808703 14182513 2343937
CPU: 2.24s      Real: 2.21s     RAM: 326636KB

$ luajit -v 
LuaJIT 2.1.0-beta3 -- Copyright (C) 2005-2017 Mike Pall. http://luajit.org/

$ time luajit table.lua
637912  641149  67002
3808703 14182513        2343937
CPU: 4.11s      Real: 4.22s     RAM: 250828KB

$ dart --version
Dart SDK version: 2.10.4 (stable) (Unknown timestamp) on "linux_x64"

$ time dart map.dart
637912 641149 67002
3808703 14182513 2343937
CPU: 2.99s      Real: 2.91s     RAM: 385496KB

$ v version
V 0.2 36dcace

$ time v run map.v
637912, 641149, 67002
3808703, 14182513, 2343937
CPU: 4.79s      Real: 5.28s     RAM: 1470668KB

$ tcc -v
tcc version 0.9.27 (x86_64 Linux)

$ time tcc -run uthash.c
637912 641149 67002
3808703 14182513 2343937
Command exited with non-zero status 25

CPU: 2.52s      Real: 2.61s     RAM: 291912KB

export GOPHERJS_GOROOT="$(go1.12.16 env GOROOT)"
$ npm install --global source-map-support

$ goperjs version
GopherJS 1.12-3

$ time gopherjs 
637912 641149 67002
3808703 14182513 2343937

CPU: 14.13s     Real: 12.01s    RAM: 597712KB

$ java -version
java version "14.0.2" 2020-07-14
Java(TM) SE Runtime Environment (build 14.0.2+12-46)
Java HotSpot(TM) 64-Bit Server VM (build 14.0.2+12-46, mixed mode, sharing)

$ time java hashmap.java
637912 641149 67002
3808703 14182513 2343937

CPU: 5.18s      Real: 1.63s     RAM: 545412KB

The result shows a huge improvement for PHP since the old 5.4. NodeJS also huge improvement compared to old 0.10. The rest is quite bit the same. Also please keep note that Golang and V includes build/compile time not just run duration, and it seems V performance really bad when it comes to string operations (the compile itself really fast, less than 1s for 36dcace -- using gcc 9.3.0). 
Next we're gonna benchmark comb sort implementation. But this time we use jit version of ruby 2.7, since it's far way faster (19s vs 26s and 58s vs 66s for string benchmark), for ruby 3.0 we always use jit version since it's faster than non-jit. In case for C (TCC) which doesn't have built-in associative array, I used uthash, because it's the most popular. TinyGo does not complete first benchmark after more than 1000s, sometimes segfault. XS Javascript engine failed to give correct result, engine262 also failed to finish within 1000s.

LanguageCommand FlagsVersionAssocRAMNum CombRAMStr CombRAMTotalRAM
Gogo run1.14.71.56257,4400.7382,8444.74245,4327.03585,716
Gogo run1.15.61.73256,6200.7882,8964.86245,4687.37584,984
Nimnim r -d:release --gc:arc1.4.21.56265,1720.7979,2845.77633,6768.12978,132
Nimnim r -d:release --gc:orc1.4.21.53265,1600.9479,3805.83633,6368.30978,176
Javascriptnode14.15.22.21327,0480.87111,9726.13351,5209.21790,540
Crystalcrystal run --release0.35.11.81283,6481.44146,7006.09440,7969.34871,144
Javascript~/.esvu/bin/v88.9.2011.77177,7480.89105,4166.71335,2369.37618,400
Ctcc -run0.9.272.61291,9121.4580,8326.40393,35210.46766,096
Javajava14.0.2 2020-07-141.63545,4121.50165,8647.69743,57210.821,454,848
Nimnim r -d:release1.4.21.91247,4560.9679,4768.381,211,11611.251,538,048
Dartdart2.10.42.91385,4961.61191,9167.31616,71611.831,194,128
Pythonpypy7.3.1+dfsg-22.19331,7762.83139,7408.04522,64813.06994,164
Javascript~/.esvu/bin/chakra1.11.24.02.73487,4001.27102,19211.27803,16815.271,392,760
Javascript~/.esvu/bin/jsc2711175.90593,6240.68111,9729.09596,08815.671,301,684
Vv -prod run0.2 32091dd gcc-10.24.781,469,9321.8679,37614.061,560,51620.703,109,824
Lualuajit2.1.0-beta34.11250,8283.76133,42412.91511,19620.78895,448
Javascript~/.esvu/bin/smJavaScript-C86.0a15.61378,0641.4096,48013.81393,37620.82867,920
Vv -prod run0.2 32091dd gcc-9.35.051,469,9362.1479,40814.621,560,48421.813,109,828
Javascript~/.esvu/bin/graaljsCE Native 20.3.07.78958,3804.45405,90014.31911,22026.542,275,500
Gogopherjs run1.12-3 (node 14.15.2)11.76594,8962.04119,60418.46397,39632.261,111,896
Nimnim r1.4.26.60247,4443.0579,33231.851,211,20841.501,537,984
PHPphp7.4.31.34190,64410.11328,45234.51641,66445.961,160,760
Rubytruffleruby21.1.0-dev-c1517c5514.542,456,1563.09453,15229.273,660,28446.906,569,592
Crystalcrystal run0.35.15.69284,32812.00153,82831.69441,74049.38879,896
Javascript~/.esvu/bin/quickjs2020-11-083.90252,48423.4880,77234.80471,62462.18804,880
Vv run0.2 36dcace gcc-9.35.281,470,6686.6080,23258.991,561,17670.873,112,076
Lualua5.3.35.98366,51627.26264,64846.05864,30079.291,495,464
Rubyruby2.7.0p06.31371,45619.29100,53658.82694,56084.421,166,552
Pythonpython33.8.55.47314,56433.96404,97647.79722,82087.221,442,360
Rubyjruby9.2.9.07.451,878,18434.111,976,84459.837,115,448101.3910,970,476
Rubyruby3.0.0p05.94371,83224.8792,84474.321,015,096105.131,479,772
Gotinygo run0.16.0999.99318,1483.68300,548252.34711,3401256.011,330,036

Golang still the winner (obviously, since it's compiled), then Nim (Compiled), next best JIT or interpreter is NodeJS, Crystal (Compiled, not JIT), v8, followed Java, by TCC (Compiled) Dart, PyPy, V (Compiled, not JIT), LuaJIT, PHP, Ruby, and Python3. The recap spreadsheet can be accessed here

FAQ:
1. Why you measure the compile duration too? because developer experience also important (feedback loop), at least for me.
2. Why not warming up the VM first? each implementation have it's own advantage and disadvantage.
3. Why there's no C++, VB.NET, C#, D, Object-Pascal? don't want to compile things (since there's no build and run command in one flag).  
4. Why there's no Kotlin, Scala, Rust, Pony, Swift, Groovy, Julia, Crystal, or Zig? Too lazy to add :3 you can contribute tho (create a pull request, then I'll run the benchmark again as preferabbly as there's precompiled binary/deb/apt/ppa repository for the compiler/interpreter).

Contributors
ilmanzo (Nim, Crystal, D)