Programming Rants: 2021

2021-12-31

String Associative Array and CombSort Benchmark 2021 Edition

Last year, we ran the string associative array benchmark and the lesser string associative benchmark (measuring string concatenation and built-in associative array set and get), plus the numeric comb sort and string comb sort benchmarks (measuring basic array random access, string conversion, and array swaps for numbers and strings). This year we're using a newer server: 32 cores running 64-bit Ubuntu 21.10. As in the previous edition, we skip programming languages that have no deb package (unless the install script is a single line and doesn't mess up system directories) or no direct compile-and-run command, and we only take the best of 3 runs.
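For readers who haven't met comb sort: it's a bubble-sort variant that compares elements a shrinking gap apart, so almost all of its work is indexed reads and swaps. Here's a minimal Go version for reference; it's a sketch, not the benchmark repo's source, and the 1.3 shrink factor is the commonly used value, assumed here.

```go
package main

import "fmt"

// combSort sorts xs in place: compare elements `gap` apart, shrink the gap
// by a factor of 1.3 each pass, and once the gap reaches 1 keep doing
// bubble passes until a pass makes no swaps.
func combSort(xs []int) {
	const shrink = 1.3
	gap := len(xs)
	sorted := false
	for !sorted {
		gap = int(float64(gap) / shrink)
		if gap <= 1 {
			gap = 1
			sorted = true // assume sorted until a swap proves otherwise
		}
		for i := 0; i+gap < len(xs); i++ {
			if xs[i] > xs[i+gap] {
				xs[i], xs[i+gap] = xs[i+gap], xs[i] // random access + swap
				sorted = false
			}
		}
	}
}

func main() {
	xs := []int{5, 3, 9, 1, 7, 2, 8}
	combSort(xs)
	fmt.Println(xs)
}
```

The string variant does the same thing on a slice of strings produced by number-to-string conversion, which is why it also exercises string conversion and comparison.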

$ alias time='/usr/bin/time -f "\nCPU: %Us\tReal: %es\tRAM: %MKB"'

This time, NodeJS failed to complete the assoc benchmark (I gave up after waiting for an hour), since it now handles 10x more data than last year; its 999.99 and 9,999,999 entries are placeholders for that failure. Here's the spreadsheet and final result (Real duration and RAM):

Language	Command Flags	Version	Assoc (s)	RAM (KB)	Num Comb (s)	RAM (KB)	Str Comb (s)	RAM (KB)	Total (s)	RAM (KB)
Go	go run	go1.17.5	12.39	2,305,824	0.46	83,096	3.33	245,896	16.18	2,634,816
Java	java	18-ea+15-Ubuntu-4	10.70	5,582,308	0.96	170,240	6.93	722,152	18.59	6,474,700
Nim	nim r -d:release --gc:arc	1.4.2	17.90	4,200,212	1.21	79,444	5.45	633,816	24.56	4,913,472
Python	pypy	7.3.5	19.70	3,104,440	2.17	140,124	5.73	523,264	27.60	3,767,828
C	tcc -run	0.9.27	22.68	2,820,568	1.01	80,484	4.80	392,896	28.49	3,293,948
Julia	julia	1.6.5	23.98	3,714,448	0.52	255,844	4.60	861,076	29.10	4,831,368
V	v -prod run	0.2.4 a0a1807	20.36	8,910,856	2.67	124,248	6.48	470,136	29.51	9,505,240
Lua	luajit	2.1.0-beta3	16.70	1,133,876	3.27	133,504	10.80	511,468	30.77	1,778,848
Dart	dart	2.15.1	25.92	2,101,152	1.40	208,056	5.75	574,852	33.07	2,884,060
Crystal	crystal run --release	1.2.2	13.77	2,371,320	7.73	202,552	12.08	440,748	33.58	3,014,620
Nim	nim r -d:release	1.4.2	24.92	3,864,100	1.76	79,548	7.51	1,211,200	34.19	5,154,848
PHP	php	8.0.8	9.94	1,368,808	7.74	328,344	24.84	641,448	42.52	2,338,600
Crystal	crystal run	0.35.1	38.85	2,372,004	9.97	179,176	22.27	441,720	71.09	2,992,900
V	v run	0.2.4 a0a1807	51.81	8,911,004	6.63	79,716	18.43	470,420	76.87	9,461,140
Python	python3	3.9.7	43.11	4,106,892	29.16	405,332	43.08	722,996	115.35	5,235,220
Nim	nim r	1.4.2	88.05	3,864,048	2.93	79,536	32.60	1,211,260	123.58	5,154,844
Ruby	ruby	ruby 2.7.4p191	52.48	2,970,908	27.15	100,320	52.23	708,940	131.86	3,780,168
Javascript	node	v16.13.1	999.99	9,999,999	0.71	115,044	6.11	461,836	1006.81	10,576,879

FAQ

1. Why do you measure the compile duration too? Because developer experience is also important (the feedback loop: edit-rebuild/compile-run), at least for me. It would suck a lot if we had to wait a minute for compilation before we could test something. For example, we could always precompute values with C++ templates to make the runtime faster, but the compilation delay would be terrible.

2. Why not warm up the VM first? Each implementation has its own advantages and disadvantages. We already know that compiled languages are mostly faster at runtime, but at the cost of a relatively slower development feedback loop. Interpreted languages are mostly slower at runtime, especially when executed on a VM with startup overhead, except those with AOT or JIT optimization. So to be fair to every kind of implementation, we don't glorify runtime performance (which makes a lot of sense for servers or long-lived processes, but not for development or CI cost, which people often neglect). Instead we measure total performance: compile duration (if any) + VM startup duration (if any) + AOT or JIT duration (if any) + runtime duration, so every strategy a PL implementer uses can be judged fairly.
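Written as a formula, each benchmark's score is its Real (wall-clock) time, which already contains every phase listed above, and the Total column is the sum of the three benchmarks. For example, the Go row of the results table:

```latex
\begin{aligned}
T_{\text{bench}} &= T_{\text{compile}} + T_{\text{startup}} + T_{\text{AOT/JIT}} + T_{\text{run}} \\
T_{\text{total}} &= T_{\text{Assoc}} + T_{\text{NumComb}} + T_{\text{StrComb}} \\
T_{\text{total}}^{\text{Go}} &= 12.39 + 0.46 + 3.33 = 16.18\ \text{s}
\end{aligned}
```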

3. Why is there no C++, VB.NET, C#, D, or Object Pascal? I don't want to compile things separately (there's no single command/flag that builds and runs).

4. Why is there no Kotlin, Scala, Rust, Elixir, Pony, Swift, Groovy, or Zig? Too lazy to add them :3 You can contribute though (create a pull request, then I'll run the benchmark again, preferably if there's a precompiled binary/deb/apt/PPA repository for the compiler/interpreter).

5. Why is there no Ruby 3.1? I couldn't find any PPA for the latest Ruby, and the latest one in the Ubuntu 21.10 repo is Ruby 2.7.

Contributors: ilmanzo (Nim, Crystal, D), inkydragon (Julia)

2021-12-28

Object Storage Service with CDN

There are a lot of S3-like services, but some of them don't have a CDN-like feature, so we have to cache manually or put a CDN in front ourselves. Today we're going to compare each service, either S3 or CDN storage, in terms of storage price, bandwidth price, minimum price, and location (SG or Tokyo if possible).

These prices were collected as of 2021-12-28 10:58 GMT+7.

Provider Name	Location	Storage Price/GB/month	Bandwidth Price/GB	Other Price
Azure Blob	SEA	$0.195-$0.016 (hot/cold)	$0.09	Put cost $0.0296-$0.13, reads are also charged
IBM Object Storage	Tokyo	$0.0237-$0.0085 (auto hot/cold)	$0.14 (50TB), $0.11 (+100TB), $0.08 (+350TB)	Put cost $0.0050 per 10K
Google Cloud Storage	SG	$0.020-$0.005 (hot/cold)	$0.12 (1TB), $0.11 (9TB), $0.08	Put cost $0.05-$0.004 per 10K
AWS S3	SG	$0.025 (50TB), $0.024 (450TB), $0.023	$0.12 (10TB), $0.085 (40TB), $0.082, $0.09 to other S3 regions	Put cost $0.005 per 1K
Dreamhost Cloud Storage	?	$0.025, $0.0238-$0.0146 (40GB, 20TB prepaid)	$0.05
BunnyNet CDN+EdgeStorage	Asia	$0.03	$0.03
Linode Object Storage	SG	$0.02	$0.01	$5 minimum for 250GB storage/500GB transfer
Vultr Object Storage	Not stated, SG available	$0.02	$0.01	$5 minimum for 250GB storage/1TB transfer
DigitalOcean Spaces	SG	$0.02	$0.01	$5 minimum for 250GB storage/1TB transfer
5Cents CDN/Akamai	WorldWide	$0.05, $0.0143 (alacarte, pay-as-you-go)	$0.0075-$0.015 (akamai)	$7.5 minimum for 1TB for alacarte
PushR CDN+SFS	US, EU, Asia	$0.015	$0.01-$0.04 (depends on network zone)
Backblaze B2	Only US or NL	$0.005	$0.01, free if through CDN partner	Put cost $0.004 per 10K, 2500 free per day
Contabo	EU only	$0.00996 (promo)	free	$2.49 minimum for 250GB storage
Filebase	?	$0.0059	$0.0059	$5.99 minimum for 1TB storage/1TB transfer
Wasabi	APAC	$0.0068	free*	$6.99 minimum for 1TB storage (=max egress)
StorJ	?	$0.004 ($4/TB after 150GB)	$0.007 ($7/TB after 150GB)

What if you need to transfer data between S3-compatible providers? Try https://packetfabric.com/transporter

2021-12-18

Coolest PaaS/IaaS I've ever used: Jelastic

So, I'm looking for the simplest deployment strategy for my next side project. I don't want to use Kubernetes since I'm working alone XD, and I don't want to learn Nomad, Waypoint, Swarm, or other popular tools that make it easier like Portainer either (why can't they make it as simple as Vercel or Fly.io?). I also don't want to use big cloud providers (GCP, AWS, Azure, etc.), whose UIs are quite bad, as if everything was developed by different teams with little communication, and where you have to go through a lot of setup hassle just to deploy simple things. Then I found a really cool product called Jelastic that fits my needs:

Can autoscale out (like AWS ELB/ECS, GCR, ACS, etc.) and auto-cluster (as easy as Cloud SQL or AWS RDS/Aurora, but automatic)
Can autoscale up '__') without downtime: it only took 1 second to scale up from 1 core/640MB to 16 cores/32GB (it seems they only change the container's resource quota limit), and you can see the changes directly without a restart
Can deploy a VPS on the same cluster/network (for my databases, since I don't use "standard/popular" databases), and it's super cheap (only $3.9 per month for a VPS with 1 static IP that can autoscale up); you only pay for what you utilize (CPU and RAM usage), instead of being charged 100% while the server is up like with other VPS providers
The UI doesn't suck XD: you can use WebSSH or normal SSH (as long as you have a public IP), set up SSL easily, and change config super easily. The lacking part of Jelastic is probably config-file/GitOps-based setup (for working with multiple team members in the future), though at least there's an API and CLI to create and modify environments; not sure if auditing is available (haven't checked yet).
Can also deploy automatically from git (checked every N minutes), from a CI pipeline, or using the CLI.
Easy to move (live migration) to different providers or change ownership of a cluster; even if that's not enabled, there's at least no vendor lock-in, since you can manually export and import environments (for example, copying a staging setup to production with a similar architecture but a different deployment branch and scaling strategy).

Other cool things that I won't use: deploying any container/Docker image in a few easy steps, deploying Kubernetes, and a bunch of stacks provided in the marketplace (may vary by provider).

For $3.9 per month (if you utilize only 1%) for a 16-core, 32GB RAM, 200GB NVMe VPS with 1 static IP (provider: ToggleBox), you get the greenest (highest on average) benchmark result among all the VPSes I've ever tried.

You can see the raw benchmark result here and recap here.

What's the catch?

It's quite expensive if you utilize 100% (around $339 with ToggleBox for the specs above). For comparison:
1. the cheapest highest-spec Contabo VPS (9 cores, 60GB RAM, 1.5TB SSD) with unmetered bandwidth costs only ~$55 per month (not apples-to-apples since the spec and performance differ; also, this is what you pay every month regardless of utilization)
2. a similar-spec GCE n1-custom-16-32768 (16 cores, 32GB RAM, 200GB SSD), non-committed, costs $525 excluding bandwidth
3. a similar-spec AWS EC2 a1.4xlarge (16 cores, 32GB RAM, 200GB gp2 SSD), on-demand, costs only $317 excluding bandwidth
4. a similar-spec Azure F16s (16 cores, 32GB RAM, 256GB SSD), pay-as-you-go, costs $634 excluding bandwidth
5. the cheapest OVH in SG (8 cores, 64GB RAM, 400GB SSD) costs only $135 with unmetered 200Mbps bandwidth
But still, for minimal usage this is way cheaper than GCR, where you will be billed around ~$10 per month for an idle instance, or ~$37 for a standby instance (1 vCPU, 1GB RAM, not including bandwidth, which is quite pricey at $0.085/GB).
Some providers have different "free" tiers: for example, ToggleBox gives 2GB of free bandwidth per hour (GCR only gives 1GB free per month XD), some other providers give 1 free static IP, others give 10GB of free disk usage per hour, etc.
The license might be pricey if you install Jelastic on your own cluster instead of using an existing provider (e.g. DewaCloud or CloudKilat for the Indonesia region, ToggleBox for the US region, etc.), but they have a profit-sharing model if you are a reseller (have your own servers and rent them out).
Billing is hourly (so you are always billed for at least 1 cloudlet; the spec of 1 cloudlet can vary per provider), whereas GCR, for example, uses one second as its minimum billing resolution (vCPU, GB RAM, requests, and bandwidth).
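To put a rough number on the catch, here's a Go sketch of where pay-per-use stops paying off. It assumes (this is not ToggleBox's actual pricing formula) that the monthly bill grows linearly with average utilization between the two figures above, $3.9 at 1% and $339 at 100%, and compares that line against fixed monthly prices from the list. `estimatedCost` and `breakEven` are hypothetical helpers.

```go
package main

import "fmt"

// Figures from the post: ~$3.9/month at 1% utilization and ~$339/month at
// 100% for 16 cores/32GB on ToggleBox. Assumption: linear in between.
const (
	lowUtil, lowCost   = 1.0, 3.9
	highUtil, highCost = 100.0, 339.0
)

// slope is the assumed extra USD per month for each percentage point of utilization.
func slope() float64 {
	return (highCost - lowCost) / (highUtil - lowUtil)
}

// estimatedCost returns the assumed monthly bill at utilization u (percent).
func estimatedCost(u float64) float64 {
	return lowCost + slope()*(u-lowUtil)
}

// breakEven returns the utilization (percent) at which the assumed bill
// equals a fixed monthly price.
func breakEven(fixed float64) float64 {
	return lowUtil + (fixed-lowCost)/slope()
}

func main() {
	fmt.Printf("vs Contabo $55: %.1f%%\n", breakEven(55))
	fmt.Printf("vs AWS $317: %.1f%%\n", breakEven(317))
}
```

Under that assumption the break-even point is roughly 16% average utilization against the $55 Contabo VPS and roughly 93.5% against the $317 AWS instance.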

That's it for now, I'll create a new post if I found something better.

2021-11-22

Kafka vs RedPanda Benchmark (also Tarantool and Clickhouse as queue)

Using the default settings from their docker-compose examples, today we're going to benchmark some popular MQ/PubSub software. I've never used MQ extensively before (only NATS, Google Pub/Sub, ActiveMQ, and Amazon SQS); usually a standard database that stores events is sufficient (consumers pull by tailing from the last primary key counter, and if fan-out is needed, just use multiple goroutines and channels), because my projects have never been latency-sensitive applications.
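The "tail from the last primary key, fan out with goroutines and channels" pattern can be sketched in Go like this. The in-memory `store` stands in for a real database (its `fetchAfter` mimics `SELECT ... WHERE id > ? ORDER BY id LIMIT ?`), and every name here is hypothetical; a real consumer would also sleep between polls and persist its last ID.

```go
package main

import (
	"fmt"
	"sync"
)

// event is a row in a hypothetical append-only events table;
// ID plays the role of the auto-increment primary key.
type event struct {
	ID      int64
	Payload string
}

// store stands in for the database.
type store struct {
	mu   sync.Mutex
	rows []event
}

func (s *store) insert(payload string) {
	s.mu.Lock()
	defer s.mu.Unlock()
	s.rows = append(s.rows, event{ID: int64(len(s.rows) + 1), Payload: payload})
}

// fetchAfter returns up to limit events with ID > lastID, in ID order.
func (s *store) fetchAfter(lastID int64, limit int) []event {
	s.mu.Lock()
	defer s.mu.Unlock()
	var out []event
	for _, e := range s.rows {
		if e.ID > lastID {
			out = append(out, e)
			if len(out) == limit {
				break
			}
		}
	}
	return out
}

// tail pulls everything after lastID in batches, fans each event out to
// every subscriber channel, and returns the new last-seen ID once caught up.
func tail(s *store, lastID int64, subs []chan event) int64 {
	for {
		batch := s.fetchAfter(lastID, 100)
		if len(batch) == 0 {
			return lastID
		}
		for _, e := range batch {
			for _, ch := range subs {
				ch <- e
			}
			lastID = e.ID
		}
	}
}

func main() {
	s := &store{}
	for i := 0; i < 5; i++ {
		s.insert(fmt.Sprintf("msg-%d", i))
	}
	subs := []chan event{make(chan event, 10), make(chan event, 10)}
	counts := make([]int, len(subs))
	var wg sync.WaitGroup
	for i, ch := range subs {
		wg.Add(1)
		go func(i int, ch chan event) { // one consumer goroutine per channel
			defer wg.Done()
			for range ch {
				counts[i]++
			}
		}(i, ch)
	}
	last := tail(s, 0, subs)
	for _, ch := range subs {
		close(ch)
	}
	wg.Wait()
	fmt.Println(last, counts)
}
```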

Some issues:

The benchmark has locking (atomic counters, sync.Map, etc.), so consumers might not utilize all CPU cores.
Confluent's Kafka Docker image always errors on startup because /var/lib/kafka/data is not writable, so I bind /var/lib/kafka instead. Clickhouse also always fails to start when binding /var/lib/clickhouse/data, so I don't bind a volume for Clickhouse.
RedPanda failed to start because fs.aio-max-nr was too low, even at ~1 million (originally only 64K), so I set it to 4194304.

Benchmarking 1000 goroutines publishing 2000 messages each, with 100 goroutines consuming in parallel.

REDPANDA version: v21.10.1 (rev e7b6714)

=== redpanda single:
FailProduce: 0
FailConsume: 0
DoubleConsume: 0
Produced (ms): 2387
MaxLatency (ms): 2125
AvgLatency (ms): 432
Total (s) 3.457646367s

FailProduce: 0
FailConsume: 0
DoubleConsume: 0
Produced (ms): 2408
MaxLatency (ms): 2663
AvgLatency (ms): 490
Total (s) 3.459949739s

=== redpanda multi:
FailProduce: 0
FailConsume: 0
DoubleConsume: 0
Produced (ms): 4187
MaxLatency (ms): 12146
AvgLatency (ms): 9701
Total (s) 13.610533861s
# ^ weird, maybe startup was not yet complete?
# re-initialized docker-compose and retried: the 1st run is always slow,
# but the 2nd run is always fast:
FailProduce: 0
FailConsume: 0
DoubleConsume: 0
Produced (ms): 2413
MaxLatency (ms): 2704
AvgLatency (ms): 467
Total (s) 3.545496041s

KAFKA version: 7.0.0-ccs (Commit:c6d7e3013b411760), equivalent to Kafka 3.0.0

=== kafka single:
FailProduce: 0
FailConsume: 0
DoubleConsume: 0
Produced (ms): 6634
MaxLatency (ms): 12052
AvgLatency (ms): 8579
Total (s) 13.722706977s

FailProduce: 0
FailConsume: 0
DoubleConsume: 0
Produced (ms): 6380
MaxLatency (ms): 11856
AvgLatency (ms): 8636
Total (s) 13.625928209s

=== kafka multi:
FailProduce: 0
FailConsume: 0
DoubleConsume: 0
Produced (ms): 6596
MaxLatency (ms): 11932
AvgLatency (ms): 8523
Total (s) 13.659630863s

FailProduce: 0
FailConsume: 0
DoubleConsume: 0
Produced (ms): 6535
MaxLatency (ms): 11903
AvgLatency (ms): 8588
Total (s) 13.677644818s

These benchmarks use the default settings from the Docker examples I found, except SMP (I set it to the number of cores on the benchmark server, to be fair to Kafka, whose JVM can use all cores by default; apparently this has an insignificant impact). The current conclusion: RedPanda is way faster than Kafka in terms of publishing speed (~1μs per message, 477K-837K msg/s) and consuming latency (432ms average to 2.7s max), while Kafka takes ~3μs per message (301K-313K msg/s) with 8.5s to 12s consuming latency. As for RAM, though, RedPanda uses 12GB for each node (10% of the server's RAM), while Kafka only uses 355MB, 375MB, and 788MB for its nodes, plus 120MB for ZooKeeper. The repo to reproduce this benchmark is here, in the 2021mq directory.

Btw, if you're looking for a Kafka/RedPanda GUI, try Kowl; it's way more beautiful than ActiveMQ's default web UI.

Bonus round: using one of the fastest OLTP databases, Tarantool, and one of the fastest OLAP databases, Clickhouse, as a queue by leveraging a sequence (auto increment) or an internal function to generate a sequence. The differences are that there's only one consumer group (you have to fan out manually using goroutines), and there's no JSON encoding/decoding since it's a structured database:

TARANTOOL version: 2.8.2

=== tarantool single (memtx):
FailProduce: 0
FailConsume: 0
DoubleConsume: 0
Produced (ms): 11238
MaxLatency (ms): 1071
AvgLatency (ms): 101
Total (s) 11.244551225s

FailProduce: 0
FailConsume: 0
DoubleConsume: 0
Produced (ms): 9596
MaxLatency (ms): 816
AvgLatency (ms): 61
Total (s) 9.957516119s

=== tarantool single (vinyl):
FailProduce: 0
FailConsume: 0
DoubleConsume: 0
Produced (ms): 11383
MaxLatency (ms): 1076
AvgLatency (ms): 157
Total (s) 11.388865281s

FailProduce: 0
FailConsume: 0
DoubleConsume: 0
Produced (ms): 9104
MaxLatency (ms): 102
AvgLatency (ms): 13
Total (s) 9.196549551s

CLICKHOUSE version: 21.11.4.14

=== clickhouse single:
FailProduce: 0
FailConsume: 0
DoubleConsume: 0
Produced (ms): 2052
MaxLatency (ms): 2078
AvgLatency (ms): 1461
Total (s) 3.570767491s

FailProduce: 0
FailConsume: 0
DoubleConsume: 0
Produced (ms): 2057
MaxLatency (ms): 2008
AvgLatency (ms): 1445
Total (s) 3.536277427s

The result recap table (ms = millisecond, us = microsecond, ns = nanosecond):

only best of 2 runs	RedPanda single	RedPanda multi	Kafka single	Kafka multi	Tarantool memtx	Tarantool vinyl	Clickhouse single
Publish (ms)	2,387	2,413	6,380	6,535	9,596	9,104	2,052
Sub Max Latency (ms)	2,125	2,704	11,856	11,903	816	102	2,008
Sub Avg Latency (ms)	490	467	8,636	8,523	61	13	1,445
Pub Throughput (msg/s)	837,872	828,844	313,480	306,044	208,420	219,684	974,659
est. Pub Latency (ns)	1,194	1,207	3,190	3,268	4,798	4,552	1,026
est. Sub Throughput (msg/s)	4,081,633	4,282,655	231,589	234,659	32,786,885	153,846,154	1,384,083
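The derived rows of the recap table can be reproduced from the 2,000,000 messages published per run (1000 goroutines × 2000 messages). Pub Throughput is messages divided by publish time, est. Pub Latency is publish time divided by messages, and est. Sub Throughput is messages divided by the average subscribe latency, so it's a rough figure rather than a measured rate. A small Go sketch of that arithmetic (the `derived` helper is hypothetical):

```go
package main

import "fmt"

// totalMessages: 1000 producer goroutines x 2000 messages each.
const totalMessages = 1000 * 2000

// derived computes the estimated rows from the measured publish duration
// and the average subscribe latency, both in milliseconds.
func derived(publishMs, avgSubMs float64) (pubMsgPerSec, pubLatencyNs, subMsgPerSec float64) {
	pubMsgPerSec = totalMessages / (publishMs / 1000)
	pubLatencyNs = publishMs * 1e6 / totalMessages
	subMsgPerSec = totalMessages / (avgSubMs / 1000)
	return
}

func main() {
	// RedPanda single column: Publish 2,387 ms, Sub Avg Latency 490 ms.
	pub, lat, sub := derived(2387, 490)
	fmt.Printf("%.0f msg/s, %.0f ns, %.0f msg/s\n", pub, lat, sub)
}
```

For the RedPanda single column this gives about 837,872 msg/s, 1,194 ns, and 4,081,633 msg/s, matching the table.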

Conclusion: Tarantool is probably the only single-node database that can compete with Kafka for the queue use case (it can run as a multi-master replica, but that's not recommended; it's better to use a master-slave config where the slave is used for failover). Other databases, especially RDBMSes that persist to disk, can pretty surely only do ~50K TPS. Clickhouse can be multi-master, and last time I checked it could do ~600K inserts per second (this time it's around 1M inserts per second). I simulate the atomic counter on Clickhouse using TimeStamp64Milli; the query is limited to 100 queries per second, but that's good enough for the pub-sub use case. The benefits of using a database as MQ/PubSub are that you can do very flexible queries (SQL support), get mostly better tooling (especially Clickhouse), or update records for new consumers. The cons are that you must notify/fan out yourself (for example using NATS broadcast, only pushing a signal for workers to pull), and track acks/retries and each worker's read offset yourself (pull).
