2021-12-31

String Associative Array and CombSort Benchmark 2021 Edition

Last year, we've done string associative benchmark and lesser string associative benchmark (measuring string concat operation and built-in associative array set and get), numeric comb sort benchmark and string comb sort benchmark (measuring basic array random access, string conversion, and array swap for number and string), this year's using newer server: 32-core running on 64-bit Ubuntu 21.10. This time we will skip programming languages that are no deb packages (unless the install script is just one line and doesn't ruin system directories) or no direct compile-run command like previous one, also only best of 3 runs.

$ alias time='/usr/bin/time -f "\nCPU: %Us\tReal: %es\tRAM: %MKB"

This time, NodeJS failed to complete (after waiting for an hour) with 10x more data compared to last year for assoc benchmark. Here's the spreadsheet and final result (Real duration and RAM):

LanguageCommand FlagsVersionAssocRAMNum CombRAMStr CombRAMTotalRAM
Gogo rungo1.17.512.392,305,8240.4683,0963.33245,89616.182,634,816
Javajava18-ea+15-Ubuntu-410.705,582,3080.96170,2406.93722,15218.596,474,700
Nimnim r -d:release --gc:arc1.4.217.904,200,2121.2179,4445.45633,81624.564,913,472
Pythonpypy7.3.519.703,104,4402.17140,1245.73523,26427.603,767,828
Ctcc -run0.9.2722.682,820,5681.0180,4844.80392,89628.493,293,948
Juliajulia1.6.523.983,714,4480.52255,8444.60861,07629.104,831,368
Vv -prod run0.2.4 a0a180720.368,910,8562.67124,2486.48470,13629.519,505,240
Lualuajit2.1.0-beta316.701,133,8763.27133,50410.80511,46830.771,778,848
Dartdart2.15.125.922,101,1521.40208,0565.75574,85233.072,884,060
Crystalcrystal run --release1.2.213.772,371,3207.73202,55212.08440,74833.583,014,620
Nimnim r -d:release1.4.224.923,864,1001.7679,5487.511,211,20034.195,154,848
PHPphp8.0.89.941,368,8087.74328,34424.84641,44842.522,338,600
Crystalcrystal run0.35.138.852,372,0049.97179,17622.27441,72071.092,992,900
Vv run0.2.4 a0a180751.818,911,0046.6379,71618.43470,42076.879,461,140
Pythonpython33.9.743.114,106,89229.16405,33243.08722,996115.355,235,220
Nimnim r1.4.288.053,864,0482.9379,53632.601,211,260123.585,154,844
Rubyrubyruby 2.7.4p19152.482,970,90827.15100,32052.23708,940131.863,780,168
Javascriptnodev16.13.1999.999,999,9990.71115,0446.11461,8361006.8110,576,879

FAQ

1. Why you measure the compile duration too? because developer experience also important (feedback loop, edit-rebuild/compile-run), at least for me, it would be sucks a lot if we have to wait a minute to compile before we can test something. We could always write precalculated values with C++ template to make runtime faster for example, but the compilation delay would be very sucks.
2. Why not warming up the VM first? each implementation have it's own advantage and disadvantage. We already know, that compiled language mostly faster at runtime, but at cost of relatively slower development feedback loop. Interpreted language mostly slower at runtime, especially if executed using VM that have startup overhead, in exception of one with AOT or JIT optimization. So to make it fair for every kind of implementation, we do it differently by not glorifying the runtime performance (which super make sense for server or long-lived process, but not best for development or CI cost which people often neglected), but by total performance which consist of Compile duration (if any) + VM startup duration (if any) + AOT or JIT duration (if any) + Runtime duration, so every strategy the PL's implementator use can be fairly judged.
3. Why there's no C++, VB.NET, C#, D, Object-Pascal? don't want to compile things (since there's no build and run command in one flag).  
4. Why there's no Kotlin, Scala, Rust, Elixir, Pony, Swift, Groovy, or Zig? Too lazy to add :3 you can contribute tho (create a pull request, then I'll run the benchmark again as preferably as there's precompiled binary/deb/apt/ppa repository for the compiler/interpreter).
5. Why there's no Ruby 3.1? I can't find any PPA for latest Ruby, latest one on Ubuntu 21.10 repo is Ruby2.7.

Contributorsilmanzo (Nim, Crystal, D), inkydragon (Julia)

2021-12-28

Object Storage Service with CDN

There's a lot of S3-like service, but some of them doesn't have CDN-like feature, we have to manually cache them or use CDN manually. Today we're gonna compare each service either S3 or CDN storage in terms of storage, bandwidth, and minimum price and location (SG or Tokyo if possible).

These price collected as per 2021-12-28 10:58 GMT+7

Provider Name Location Storage Price/GB/month Bandwidth Price/GB Other Price
Azure Blob SEA $0.195-$0.016 (hot/cold) $0.09 Put cost $0.0296-$0.13, read also have cost
IBM Object Storage Tokyo $0.0237-$0.0085 (auto hot/cold) $0.14 (50TB), $0.11 (+100TB), $0.08 (+350TB) Put cost $0.0050 per 10K
Google Cloud Storage SG $0.020-$0.005 (hot/cold)/GB $0.12 (1TB), $0.11 (9TB), $0.08 Put cost $0.05-$0.004 per 10K
AWS S3 SG $0.025 (50TB), $0.024 (450TB), $0.023 $0.12 (10TB), $0.085 (40TB), $0.082, $0.09 to S3 other S3 region Put cost $0.005 per 1K
Dreamhost Cloud Storage ? $0.025, $0.0238-$0.0146 (40GB, 20TB prepaid) $0.05
BunnyNet CDN+EdgeStorage Asia $0.03 $0.03
Linode Object Storage SG $0.02 $0.01 $5 minimum for 250GB storage/500GB transfer
Vultr Object Storage Not stated, SG available $0.02 $0.01 $5 minimum for 250GB storage/1TB transfer
DigitalOcean Spaces SG $0.02 $0.01 $5 minimum for 250GB storage/1TB transfer
5Cents CDN/Akamai WorldWide $0.05, $0.0143 (alacarte, pay-as-you-go) $0.0075-$0.015 (akamai) $7.5 minimum for 1TB for alacarte
PushR CDN+SFS US, EU, Asia
$0.015 $0.01-$0.04 (depends on network zone)
Backblaze B2 Only US or NL $0.005 $0.01, free if through CDN partner Put cost $0.004 per 10K, 2500 free per day
Contabo EU only
$0.00996 (promo) free $2.49 minimum for 250GB storage
Filebase ? $0.0059 $0.0059 $5.99 minimum for 1TB storage/1TB transfer
Wasabi APAC $0.0068 free*
$6.99 minimum for 1TB storage (=max egress)       
StorJ ? $0.004 ($4/TB after 150GB) $0.007 ($7/TB after 150GB)

What if you need to transfer between S3-compatible provider? try https://packetfabric.com/transporter

2021-12-18

Coolest PaaS/IaaS I've ever use: Jelastic

So, I'm looking an simplest deployment strategy for my next side project, I don't want to use Kubernetes since I'm all alone XD, learning Nomad, WayPoint, Swarm, and other popular tool to make it easy like Portainer, but why they doesn't make it just as simple as Vercel or Fly.io). Also don't want to use big cloud providers (GCP, AWS, Azure, etc) which the UI quite sucks like everything developed by different team with lack of communication and you have to do a lot of setup hassle just to deploy simple things. Then I found a really cool product called Jelastic, that fit my needs:
  1. Can autoscale out (like AWS ELB/ECS, GCR, ACS, etc) and auto-clustering (as easy as CloudSQL or AWS RDS/Aurora, but can be automatic)
  2. Can autoscale up '__') without downtime, only took 1 second to scale up from 1 core 640MB to 16 core 32GB (seems like they only changing container's resource quota limit) but you can see the changes directly without restart
  3. Can deploy VPS on the same cluster/network (for my databases, since I don't use "standard/popular" databases) and it's super cheap (it only took 3.9$ per month to deploy a VPS with 1 static IP, and can autoscale up), you only need to pay what you utilize (CPU and RAM usage), not charged 100% when server up unlike other VPS providers
  4. The UI doesn't sucks XD you can WebSSH, normal SSH (as long as have real IP), easy SSL setup, super easy to change config, the lacking part about Jelastic probably configfile/gitops-based setup (for working with multiple members in the future) at least there's API and CLI to create and modify environment, not sure if there's auditing available (haven't checked yet). 
  5. Can also deploy automatically from git (checked every N minutes) or CI pipeline or using CLI.
  6. Easy to move (live migration) to different providers, change ownership of a cluster, or if it's not enabled, at least there's no vendor locking, you can also manually export and import environment (for example copying staging setup to production has similar architecture just different deployment branch and scaling strategy).


Other cool things that I won't use: deploying any-container/docker-based with easy steps, deploying kubernetes, bunch of stack in marketplace provided (may vary on different provider).

For 3.9$ (if you utilize only 1%) per month (16 core, 32GB RAM, 200 GB NVMe VPS, 1 static IP, provider: ToggleBox), you can get the greenest (highest on average) result among all VPS I've ever tried:


You can see the raw benchmark result here and recap here.

What's the catch?
  1. It's quite expensive if you utilize 100% (around 339$ if you use ToggleBox for the specs above), for comparison:
    1. cheapest highest spec Contabo's VPS (9 core, 60GB RAM, 1.5TB SSD) unmetered bandwidth only cost $55-ish per month (not apple-to-apple since it's different spec and performance, also this is what you should pay per month regardless your utilization)
    2. similar spec GCE n1-custom-16-32768 (16 core, 32GB, 200GB SSD) non-committed, cost $525 excluding bandwidth
    3. similar spec AWS EC2 a1.xlarge (16 core, 32GB RAM, 200GB gp2 SSD) on-demand, only cost  $317 excluding bandwidth
    4. similar spec Azure F16s (16 core, 32GB RAM, 256GB SSD) pay-as-you-go, cost $634 excluding bandwidth
    5. cheapest OVH on SG (8 core. 64GB RAM, 400GB SSD) only cost $135 with unmetered 200Mbps bandwidth
    but still, this is way cheaper for minimum usage than if you use GCR you will be billed around ~$10 per month for idle instance, or ~$37 for standby instance (for 1 VCPU, 1 GB RAM, not including bandwidth that quite pricey $0.085)
  2. Some provider have different "free" tier, for example ToggleBox give free 2GB bandwidth per hour (GCR only give free 1GB per month XD), some other provider give free 1 static IP, some other provider give free 10GB disk usage per hour, etc.
  3. License might be pricey if you install it on your own cluster instead of using the already provided (eg. DewaCloud or CloudKilat for Indonesia region, ToggleBox for US region, etc), but they have profit sharing model if you are a reseller (have your own VPS and rent it).
  4. The billing is hourly (so you will always billed at minimum 1 cloudlet -- specs of 1 cloudlet can be vary per provider), compared to for example GCR that use second as minimum billing resolution (VCPU, GB RAM, Requests, and Bandwidth).


That's it for now, I'll create a new post if I found something better.

2021-11-22

Kafka vs RedPanda Benchmark (also Tarantool and Clickhouse as queue)

Using default settings from their docker-compose example, today we're gonna benchmark one of popular MQ/PubSub software. I never used MQ extensively before (only NATS, Google PubSub, ActiveMQ, and Amazon SQS), usually just using standard database that stores event is sufficient (the consumer using pull, tailing from last primary key counter, and if need to fan-out just use multiple goroutine and multiple channel), because my projects never been a latency sensitive applications.

Some issues: 
  1. the benchmark has locking (atomic counters, sync.Map, etc), so consumer might not utilize whole CPU cores.
  2. confluent's kafka docker always error when starting because /var/lib/kafka/data not writable, so I bind on /var/lib/kafka instead. Clickhouse also always failed to start when bind to /var/lib/clickhouse/data, so I don't bind volume for Clickhouse.
  3. RedPanda failed to start when fs.aio-max-nr even when it's already ~1 million (originally only 64K), so I set it to 4194304
Benchmarking 1000 goroutines publishing 2000 messages each, with 100 goroutines consuming in parallel.

REDPANDA version: v21.10.1 (rev e7b6714)

=== redpanda single:

FailProduce:  0
FailConsume:  0
DoubleConsume:  0
Produced (ms):  2387
MaxLatency (ms):  2125
AvgLatency (ms):  432
Total (s) 3.457646367s

FailProduce:  0
FailConsume:  0
DoubleConsume:  0
Produced (ms):  2408
MaxLatency (ms):  2663
AvgLatency (ms):  490
Total (s) 3.459949739s

=== redpanda multi:

FailProduce:  0
FailConsume:  0
DoubleConsume:  0
Produced (ms):  4187
MaxLatency (ms):  12146
AvgLatency (ms):  9701
Total (s) 13.610533861s 

# ^ weird, maybe startup not yet complete?
# retried reinit docker-compose, 1st time always slow
# but 2nd time always fast:

FailProduce:  0
FailConsume:  0
DoubleConsume:  0
Produced (ms):  2413
MaxLatency (ms):  2704
AvgLatency (ms):  467
Total (s) 3.545496041s


KAFKA version: 7.0.0-ccs (Commit:c6d7e3013b411760)
equal to kafka 3.0.0

=== kafka single:

FailProduce:  0
FailConsume:  0
DoubleConsume:  0
Produced (ms):  6634
MaxLatency (ms):  12052
AvgLatency (ms):  8579
Total (s) 13.722706977s

FailProduce:  0
FailConsume:  0
DoubleConsume:  0
Produced (ms):  6380
MaxLatency (ms):  11856
AvgLatency (ms):  8636
Total (s) 13.625928209s

=== kafka multi:

FailProduce:  0
FailConsume:  0
DoubleConsume:  0
Produced (ms):  6596
MaxLatency (ms):  11932
AvgLatency (ms):  8523
Total (s) 13.659630863s

FailProduce:  0
FailConsume:  0
DoubleConsume:  0
Produced (ms):  6535
MaxLatency (ms):  11903
AvgLatency (ms):  8588
Total (s) 13.677644818s

These benchmark using default settings that exists in the docker examples I found, except SMP (I set it to the same amount of cores in the server that used to benchmark to make it fair with Kafka that uses JVM that by default can utilize all cores -- apparently this has insignificant impact). Current conclusion is, RedPanda way faster than Kafka, in terms of publishing speed (around ~1μs per message, 477K-837K msg/s) and consuming latency (432ms to 2.7s per message), while Kafka (around ~3μs per message, 301K-313K msg/s) and 8.5s to 12s per message. The RAM statistics tho, RedPanda uses 12GB for each node (10% of server's RAM), while Kafka only uses 355MB, 375MB, 788MB for nodes, and 120MB for zookeeper. The repo to reproduce this benchmark is here on 2021mq directory.

Btw if you're looking for Kafka/RedPanda GUI, try KOwl, this way more beautiful than ActiveMQ default Web UI.

Bonus rounds, using one of the fastest OLTP database: Tarantool and one of the fastest OLAP database: Clickhouse as Queue, by laveraging sequence (auto increment) or internal function to generate a sequence, the difference is there's only one consumer group (have to manually fan out using goroutine), no json encode and decode since it's structured database:


TARANTOOL version: 2.8.2

=== tarantool single (memtx):

FailProduce:  0
FailConsume:  0
DoubleConsume:  0
Produced (ms):  11238
MaxLatency (ms):  1071
AvgLatency (ms):  101
Total (s) 11.244551225s

FailProduce:  0
FailConsume:  0
DoubleConsume:  0
Produced (ms):  9596
MaxLatency (ms):  816
AvgLatency (ms):  61
Total (s) 9.957516119s

=== tarantool single (vinyl):

FailProduce:  0
FailConsume:  0
DoubleConsume:  0
Produced (ms):  11383
MaxLatency (ms):  1076
AvgLatency (ms):  157
Total (s) 11.388865281s

FailProduce:  0
FailConsume:  0
DoubleConsume:  0
Produced (ms):  9104
MaxLatency (ms):  102
AvgLatency (ms):  13
Total (s) 9.196549551s


CLICKHOUSE version: 21.11.4.14

=== clickhouse single:

FailProduce:  0
FailConsume:  0
DoubleConsume:  0
Produced (ms):  2052
MaxLatency (ms):  2078
AvgLatency (ms):  1461
Total (s) 3.570767491s

FailProduce:  0
FailConsume:  0
DoubleConsume:  0
Produced (ms):  2057
MaxLatency (ms):  2008
AvgLatency (ms):  1445
Total (s) 3.536277427s

The result recap table (ms = millisecond, us = microsecond, ns = nanosecond):

only best of 2 runsRedPanda singleRedPanda multiKafka singleKafka multiTarantool memtxTarantool vinylClickhouse single
Publish (ms)2,3872,4136,3806,5359,5969,1042,052
Sub Max Latency (ms)2,1252,70411,85611,9038161022,008
Sub Avg Latency (ms)4904678,6368,52361131,445
Pub Troughput (msg/s)837,872828,844313,480306,044208,420219,684974,659
est. Pub Latency (ns)1,1941,2073,1903,2684,7984,5521,026
est. Sub Throughput (msg/s)4,081,6334,282,655231,589234,65932,786,885153,846,1541,384,083

Conclusion: Tarantool probably the only single node database that can compete with Kafka for queue use case (we can have multi-master replica but not recommended, it's better to use master-slave config where slave used as failover), for other database especially RDBMS that persist to disk pretty sure can only do ~50K tps, Clickhouse can be multi-master, and last time i check, it can do ~600K inserts per seconds (while this time it's around 1M inserts per seconds), I simulate the atomic counter on Clickhouse using TimeStamp64Milli, the query limited to 100 queries per second but it's quite good enough for pub-sub use case. The benefit of using database as MQ/PubSub is that you can do a very flexible query (SQL support), mostly better tooling (especially Clickhouse), or update the record for new consumer, but the cons is that you must notify/fan-out (for example using NATS broadcast, only push the signal for worker to pull), track the ack/retries and the read offset of the workers yourself (pull).