2019-07-25

The Benchmarker's Web Framework Benchmark

Latest update (2019-07-19) from the-benchmarker's web-framework:

Language (Runtime) | Framework (Middleware) | Requests / s | Throughput
c (11) | agoo-c (0.5) | 199670.00 | 115.49 MB
python (3.7) | japronto (0.1) | 177634.00 | 212.57 MB
java (8) | rapidoid (5.5) | 153167.00 | 275.56 MB
go (1.12) | fasthttprouter (0.1) | 146986.67 | 236.54 MB
python (3.6) | vibora (0.0) | 144171.33 | 163.66 MB
c (99) | kore (3.1) | 142837.67 | 370.30 MB
cpp (11) | evhtp (1.2) | 141011.33 | 136.87 MB
java (8) | act (1.8) | 137266.33 | 236.87 MB
ruby (2.6) | agoo (2.8) | 132990.67 | 76.84 MB
rust (1.36) | gotham (0.4) | 130192.33 | 266.35 MB
crystal (0.29) | router.cr (0.2) | 123911.33 | 116.40 MB
nim (0.2) | jester (0.4) | 123719.00 | 248.70 MB
crystal (0.29) | raze (0.3) | 122186.33 | 114.87 MB
crystal (0.29) | spider-gazelle (1.4) | 120138.00 | 128.27 MB
crystal (0.29) | kemal (0.25) | 114424.33 | 187.01 MB
rust (1.36) | actix-web (1.0) | 114286.67 | 163.27 MB
crystal (0.29) | amber (0.28) | 105704.33 | 193.62 MB
rust (1.36) | nickel (0.11) | 102067.33 | 202.98 MB
csharp (7.3) | aspnetcore (2.2) | 100367.67 | 163.49 MB
rust (1.36) | iron (0.6) | 99692.33 | 125.66 MB
crystal (0.29) | orion (1.7) | 95829.67 | 156.64 MB
go (1.12) | gorouter (4.0) | 91250.00 | 121.51 MB
node (12.6) | polkadot (1.0) | 90498.00 | 135.64 MB
go (1.12) | chi (4.0) | 89401.33 | 119.52 MB
node (12.6) | 0http (1.0) | 88940.67 | 133.26 MB
go (1.12) | gin (1.4) | 88229.00 | 154.70 MB
go (1.12) | violetear (7.0) | 87979.00 | 116.68 MB
node (12.6) | restana (3.3) | 87181.67 | 130.61 MB
go (1.12) | echo (4.1) | 86944.33 | 152.32 MB
go (1.12) | kami (2.2) | 85569.00 | 113.85 MB
go (1.12) | beego (1.12) | 83531.33 | 112.24 MB
go (1.12) | gorilla-mux (1.7) | 83107.67 | 110.75 MB
kotlin (1.3) | ktor (1.2) | 76189.67 | 118.63 MB
go (1.12) | gf (1.8) | 73145.67 | 110.94 MB
node (12.6) | polka (0.5) | 71049.67 | 106.46 MB
scala (2.12) | akkahttp (10.1) | 69006.00 | 147.87 MB
node (12.6) | rayo (1.3) | 68066.67 | 102.05 MB
python (3.7) | falcon (2.0) | 60301.00 | 141.34 MB
swift (5.0) | perfect (3.1) | 60239.67 | 56.60 MB
node (12.6) | muneem (2.4) | 58723.67 | 87.98 MB
scala (2.12) | http4s (0.18) | 58317.33 | 102.08 MB
node (12.6) | fastify (2.6) | 58029.33 | 147.94 MB
node (12.6) | foxify (0.1) | 53745.00 | 112.74 MB
java (8) | spring-boot (2.1) | 52174.00 | 39.04 MB
node (12.6) | koa (2.7) | 50993.67 | 107.80 MB
python (3.7) | blacksheep (0.1) | 50145.67 | 102.88 MB
python (3.7) | bottle (0.12) | 49704.67 | 122.36 MB
node (12.6) | restify (8.2) | 45617.00 | 79.87 MB
php (7.3) | slim (3.12) | 43847.33 | 217.11 MB
php (7.3) | zend-expressive (3.2) | 42281.00 | 209.34 MB
php (7.3) | symfony (4.3) | 42019.67 | 208.50 MB
python (3.7) | starlette (0.12) | 41710.67 | 89.72 MB
node (12.6) | express (4.17) | 41081.33 | 100.31 MB
php (7.3) | zend-framework (3.1) | 39650.00 | 196.61 MB
swift (5.0) | kitura (2.7) | 39061.33 | 72.50 MB
ruby (2.6) | roda (3.22) | 38720.67 | 36.90 MB
swift (5.0) | vapor (3.3) | 38685.00 | 64.54 MB
python (3.7) | hug (2.5) | 37882.33 | 93.84 MB
php (7.3) | lumen (5.8) | 37822.00 | 196.49 MB
ruby (2.6) | cuba (3.9) | 35257.00 | 41.55 MB
crystal (0.28) | lucky (0.14) | 33885.00 | 41.73 MB
crystal (0.29) | onyx (0.5) | 32685.67 | 83.76 MB
node (12.6) | turbo_polka (2.0) | 31139.67 | 29.22 MB
ruby (2.6) | rack-routing (0.0) | 29710.33 | 17.13 MB
node (12.6) | hapi (18.1) | 29298.33 | 75.73 MB
php (7.3) | laravel (5.8) | 28941.33 | 151.14 MB
swift (5.0) | kitura-nio (2.7) | 28372.00 | 53.53 MB
python (3.7) | fastapi (0.33) | 27457.67 | 59.12 MB
python (3.7) | aiohttp (3.5) | 23169.00 | 52.40 MB
ruby (2.6) | flame (4.18) | 20298.33 | 11.70 MB
python (3.7) | molten (0.27) | 19610.00 | 36.40 MB
python (3.7) | flask (1.1) | 19088.33 | 46.94 MB
ruby (2.6) | hanami (1.3) | 18242.67 | 137.89 MB
rust (nightly) | rocket (0.4) | 17988.33 | 27.86 MB
python (3.7) | bocadillo (0.18) | 17408.33 | 33.59 MB
python (3.7) | sanic (19.6) | 14934.00 | 26.61 MB
ruby (2.6) | sinatra (2.0) | 14907.33 | 38.66 MB
swift (5.0) | swifter (1.4) | 11351.67 | 14.52 MB
python (3.7) | quart (0.9) | 10817.67 | 21.55 MB
python (3.7) | responder (1.3) | 8826.33 | 19.23 MB
python (3.7) | django (2.2) | 7604.67 | 22.02 MB
python (3.7) | tornado (5.1) | 7089.33 | 20.92 MB
python (3.7) | masonite (2.2) | 6298.67 | 15.47 MB
crystal (0.29) | athena (0.7) | 6247.67 | 7.81 MB
ruby (2.6) | rails (5.2) | 3680.33 | 11.28 MB
python (3.7) | cyclone (0.0) | 2889.33 | 7.85 MB
It's interesting to see new frameworks (or ones I had never heard of, such as Vibora, Agoo, and Gotham) performing well.
But as usual, this only benchmarks the router; the bottleneck is almost always the database.
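
As the note above says, these benchmarks mostly exercise routing and a trivial response. A minimal sketch of that kind of handler in Go (plain net/http, no framework; port and body are arbitrary):

package main

import (
	"log"
	"net/http"
)

func main() {
	// a single route returning a constant body, which is roughly all these benchmarks exercise
	http.HandleFunc("/", func(w http.ResponseWriter, r *http.Request) {
		w.Write([]byte("Hello, World!"))
	})
	log.Fatal(http.ListenAndServe(":3000", nil))
}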

Techempower Framework Benchmark Round 18

Framework Benchmark Round 18 is out (half a year after the previous one); the shocking result is that the Vert.x version of JavaScript is beating almost everyone except Rust. The top performing languages for the database-update benchmark are: Rust, Java, JavaScript, C++, C#, Go, Kotlin, Dart, Python.

For the multiple-queries benchmark, the top performers are: Rust, Java, JavaScript, C, Kotlin, C++, Clojure, Go, PHP, Perl, C#.

Rust is quite interesting; the only drawback I found other than the syntax is the slow compile time. It took nearly 6 seconds to recompile even a minor change (with the Actix framework) on a ramdisk, even with the compile flags that slow things down turned off.

2019-07-24

Expose LXC/LXD Container Ports to Public

LXC/LXD is lightweight OS-level virtualization on Linux, much like OpenVZ. It was used by early versions of Docker. The benefit of using LXC/LXD is when you need virtualization but also need fast startup and near-baremetal performance (especially compared to full virtualization like KVM or VirtualBox). The difference between Docker and LXC is the level they target: Docker is more for application deployment, while LXC is machine-level. LXD adds a REST API on top of LXC. Another major difference is that Docker has a copy-on-write file system built in. To start using LXD, just install and run:

sudo apt install lxc lxd libvirt-bin zfsutils-linux
sudo lxd init

# there would be questions to be answered like these:
Would you like to use LXD clustering? (yes/no) [default=no]: 
Do you want to configure a new storage pool? (yes/no) [default=yes]: 
Name of the new storage pool [default=default]: 
Name of the storage backend to use (dir, lvm, zfs) [default=zfs]: 
Create a new ZFS pool? (yes/no) [default=yes]: 
Would you like to use an existing block device? (yes/no) [default=no]: 
Size in GB of the new loop device (1GB minimum) [default=100GB]:    
Would you like to connect to a MAAS server? (yes/no) [default=no]: 
Would you like to create a new local network bridge? (yes/no) [default=yes]: 
What should the new bridge be called? [default=lxdbr0]: 
What IPv4 address should be used? (CIDR subnet notation, “auto” or “none”) [default=auto]: 
What IPv6 address should be used? (CIDR subnet notation, “auto” or “none”) [default=auto]: 
Would you like LXD to be available over the network? (yes/no) [default=no]: yes
Address to bind LXD to (not including port) [default=all]: 127.0.0.1
Port to bind LXD to [default=8443]: 
Trust password for new clients: 
Again: 
Would you like stale cached images to be updated automatically? (yes/no) [default=yes] 
Would you like a YAML "lxd init" preseed to be printed? (yes/no) [default=no]:

# cache an image and run one container, but this will only show up in lxc-ls
sudo lxc-create -t download -n container1 -- --dist ubuntu --release bionic --arch amd64
sudo lxc-start --name container1 --daemon
sudo lxc-info --name container1
sudo lxc-stop --name container1
sudo lxc-destroy --name container1

# or run one container
lxc launch ubuntu:18.04 container1


# run a shell inside, enable ssh login with password, change the root password
lxc exec container1 bash
# note: this overwrites sshd_config entirely; anything not listed falls back to the defaults
echo '
PermitRootLogin yes
PasswordAuthentication yes
' > /etc/ssh/sshd_config
systemctl restart ssh
passwd

Then you'll need to expose (or port forward) from outside to your container:

# get ip from your container
lxc list
+------------+---------+-----------------------+------------+-----------+
|    NAME    |  STATE  |         IPV4          |    TYPE    | SNAPSHOTS |
+------------+---------+-----------------------+------------+-----------+
| container1 | RUNNING | 10.123.126.200 (eth0) | PERSISTENT | 0         |
+------------+---------+-----------------------+------------+-----------+

# forward public port 2200 to the container's port 22
# note: the FORWARD chain sees the packet after DNAT, so match the container's
# port 22 here, and make sure the ACCEPT rule comes before the generic DROP
iptables -A FORWARD -i eth0 -d 10.123.126.200 -p tcp --dport 22 -j ACCEPT
iptables -A FORWARD -i eth0 -j DROP
iptables -A FORWARD -i lxdbr0 -m state --state NEW,INVALID -j DROP
iptables -t nat -A PREROUTING -p tcp --dport 2200 -j DNAT --to 10.123.126.200:22

You can test whether the port forwarding and ssh work using this command from another computer:

ssh -o PreferredAuthentications=keyboard-interactive,password -o PubkeyAuthentication=no root@thePublicIpAddress -p 2200

If you need to expose more ports, for example the container's port 80 as the host's port 8080, you can add rules like this:

# again, match the post-DNAT port (the container's 80) and insert before the DROP rule
iptables -I FORWARD -i eth0 -d 10.123.126.200 -p tcp --dport 80 -j ACCEPT
iptables -t nat -A PREROUTING -p tcp --dport 8080 -j DNAT --to 10.123.126.200:80

But for this case, I think it's better to use a reverse proxy instead.
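
For example, instead of the DNAT rules above, a tiny reverse proxy can be written with Go's standard library. This is only a minimal sketch (reusing the container address from the example above), not a production setup:

package main

import (
	"log"
	"net/http"
	"net/http/httputil"
	"net/url"
)

func main() {
	// the container's web server from the example above
	target, err := url.Parse("http://10.123.126.200:80")
	if err != nil {
		log.Fatal(err)
	}
	// listen on the host's port 8080 and forward everything to the container
	proxy := httputil.NewSingleHostReverseProxy(target)
	log.Fatal(http.ListenAndServe(":8080", proxy))
}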

Here's the performance difference between the baremetal machine and LXC. Baremetal:

CPU model:  Intel(R) Xeon(R) CPU E5-2630 v4 @ 2.20GHz 
Number of cores: 8
CPU frequency:  2199.996 MHz
Total amount of RAM: 30151 MB
Total amount of swap:  MB
System uptime:   147 days, 20:48,    
I/O speed:  132 MB/s
Bzip 25MB: 8.01s
Download 100MB file: 69.2MB/s


I/O speed(1st run)   : 127 MB/s
I/O speed(2nd run)   : 107 MB/s
I/O speed(3rd run)   : 107 MB/s
Average I/O speed    : 113.7 MB/s

LXC (the much higher I/O numbers are probably because writes were not yet committed to the underlying disk):

CPU model:  Intel(R) Xeon(R) CPU E5-2630 v4 @ 2.20GHz 
Number of cores: 8
CPU frequency:  2199.996 MHz
Total amount of RAM: 30151 MB
Total amount of swap:  MB
System uptime:   20 min,    
I/O speed:  451 MB/s
Bzip 25MB: 9.40s
Download 100MB file: 63.7MB/s


I/O speed(1st run)   : 925 MB/s
I/O speed(2nd run)   : 1.2 GB/s
I/O speed(3rd run)   : 956 MB/s
Average I/O speed    : 1036.6 MB/s

2019-05-07

Serialization Benchmark

It's interesting to see the results of jeromefoe's metser and smallnest's gosercomp serialization benchmarks and their combined best results.
There are also interesting results from formats optimized for deserialization speed (since they only index/point into the buffer, at the expense of bandwidth), such as FlatBuffers (which performs badly on metser's benchmark), or ZeroFormatter, which is not included in either benchmark but has the best result in the original C# implementation (which also explains how it works). If you have to use JSON anyway for browser compatibility, please use jsoniter instead of Golang's default (a drop-in example is sketched after the notes below). If you really need to communicate between services, it's preferable to use a binary format (gRPC instead of REST, or a near-dead spec like SOAP). For best practices on using gRPC, see the videos below:



How FlatBuffers works:

self-note tl;dr
  • use FlatBuffers if you don't care about bandwidth and want really fast deserialization
  • use ProtoBuf if you care about bandwidth and are communicating between services using gRPC (since it's already implemented in most languages' libraries)
  • use JSON if you need browser compatibility, e.g. when using REST
  • use Colfer or Gencode if you care about bandwidth and want real speed in both cases (serialization and deserialization), and both client and server are written in Go
  • use ZeroFormatter if both client and server are written in C#
I haven't researched bounds checking though (when a network packet is forged/tampered with), so I'm not sure which of these binary formats are secure against that kind of attack.
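
About the jsoniter recommendation above: it's designed as a drop-in replacement for encoding/json, so switching usually only takes one line. A minimal sketch (the struct here is made up for illustration):

package main

import (
	"fmt"

	jsoniter "github.com/json-iterator/go"
)

// drop-in replacement for the standard encoding/json package
var json = jsoniter.ConfigCompatibleWithStandardLibrary

type User struct {
	Id   int64  `json:"id"`
	Name string `json:"name"`
}

func main() {
	buf, _ := json.Marshal(User{Id: 1, Name: "a"})
	fmt.Println(string(buf)) // {"id":1,"name":"a"}

	var u User
	_ = json.Unmarshal(buf, &u)
	fmt.Println(u.Name) // a
}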

Also check out these interesting new databases (still experimental, so they don't support important features such as replication, but you can use them for any embedded-database use case) that try to reduce/minimize the serialization-deserialization step between disk, memory, and network (most databases convert rows/columns/documents to structs and then serialize them before sending to the client; I'm not sure how this affects query performance though):
EDIT 2019-06-05: dammit, there's FastBinaryEncoding, which is even faster than all of them -_-, I should write a new benchmark for this..

2019-04-19

Huge List of Database Benchmark

Today we will benchmark single-node setups of distributed databases (plus some non-distributed databases for comparison); the clients are all written in Go (with whatever driver is available). The judgement is about performance (mostly writes, with some infrequent reads), not about distribution performance (I'll take a look at that some other time). I searched DbEngines for databases that could suit the needs of my next project. For the session kv-store the obvious first choice is Aerospike, but since it cannot run inside the server that I rent (which uses OpenVZ), I'll go with the second choice, Redis. Here's the list of today's contenders:
  • CrateDB, highly optimized for huge amounts of data (they said), probably the best for updatable time series, also with a built-in search engine, so it quite fits my use case, probably replacing [Riot (small scale) or Manticore (large scale)] and [InfluxDB or TimescaleDB]; does not support auto increment
  • CockroachDB, a self-healing database with a PostgreSQL-compatible connector; the community edition does not support table partitioning
  • MemSQL, which can also replace a kv-store; there's a 128GB RAM limit for the free version. Row-store tables can only have one PRIMARY key or one UNIQUE key or one AUTO increment column that must be a SHARD key, and it cannot be updated or altered. Column-store tables do not support UNIQUE/PRIMARY keys, only a SHARD KEY. The client/connector is MySQL-compatible
  • MariaDB (MySQL), one of the most popular open source RDBMS, for the sake of comparison
  • PostgreSQL, my favorite RDBMS, for the sake of comparison
  • NuoDB, in another benchmark even faster than Google Spanner or CockroachDB; the community edition only supports 3 transaction engines (TE) and 1 storage manager (SM)
  • YugaByteDB, distributed KV+SQL with Cassandra- and PostgreSQL-compatible protocols. Some SQL syntax is not yet supported (ALTER USER, UNIQUE on CREATE TABLE).
  • ScyllaDB, a C++ version of Cassandra. All Cassandra-like databases have a lot of restrictions/annoyances by design compared to a traditional RDBMS (cannot CREATE INDEX on a composite PRIMARY KEY, no AUTO INCREMENT, no UNION ALL or OR operator, must use the COUNTER type if you want to UPDATE x=x+n, cannot mix COUNTER columns with non-counter columns in the same table, etc.), does not support ORDER BY on anything other than the clustering key, and does not support OFFSET on LIMIT.
  • Clickhouse, claimed to be the fastest and one of the most storage-space-efficient OLAP databases, but it doesn't have UPDATE/DELETE syntax (requires ALTER TABLE to UPDATE/DELETE), only supports batch inserts, and does not support UNIQUE or AUTO INCREMENT. Since it's not designed to be an OLTP database, this benchmark is obviously totally unfair to Clickhouse.
What's the extra motivation for this post?
I almost never use a distributed database, since none of my projects have more than 200 concurrent users/sec. I've hit a bottleneck before, and the culprit was multiple slow complex queries; that could be solved by pushing them to a message queue and processing them one by one, instead of bombarding the database all at once and hogging its memory.
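
That queue-and-drain approach is straightforward in Go; here's a minimal sketch (all names are hypothetical): a buffered channel as the queue and a single worker goroutine executing the slow queries one at a time:

package main

import (
	"fmt"
	"time"
)

// a hypothetical slow/complex query job
type job struct{ query string }

func main() {
	queue := make(chan job, 1024) // the buffered channel acts as the queue
	done := make(chan bool)
	go func() { // a single worker drains the queue one job at a time
		for j := range queue {
			// run the slow query here, sequentially, instead of in parallel
			fmt.Println("executing:", j.query)
			time.Sleep(10 * time.Millisecond)
		}
		done <- true
	}()
	for i := 0; i < 3; i++ {
		queue <- job{query: fmt.Sprintf("SELECT heavy_report(%d)", i)}
	}
	close(queue)
	<-done
}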

The benchmark scenario would be like this:
1. 50k inserts of a single unique string column (users), 200k inserts of 2-column values (items), and 900k inserts of unique relations (rels)
INSERT INTO users(id, uniq_str) -- x50k
INSERT INTO items(fk_id, typ, amount) -- x50k x4
INSERT INTO rels(fk_low, fk_high, bond) -- x900k

2. While the inserts are at 5%+ progress, there would be at least 100k random searches for the unique value and 300k random item searches; for every search there would be 3 random updates of the amount
SELECT * FROM users WHERE uniq_str = ? -- x100k
SELECT * FROM items WHERE fk_id = ? AND typ IN (?) -- x100k x3
UPDATE items SET amount = amount + xxx WHERE id = ? -- x100k x3

3. While the inserts are at 5%+ progress, there would also be at least 100k random item search queries
SELECT * FROM items WHERE fk_id = ?

4. While the inserts are at 5%+ progress, there would also be at least 200k queries of relations, each with a 50% chance to update the bond
SELECT * FROM rels WHERE fk_low = ? or fk_high = ? -- x200k
UPDATE rels SET bond = bond + xxx WHERE id = ? -- x200k / 2
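
The actual benchmark code is in the repo linked at the end of this post; as a rough idea, each query above is issued through Go's database/sql. A minimal sketch for the PostgreSQL run (DSN and values here are just placeholders):

package main

import (
	"database/sql"
	"log"

	_ "github.com/lib/pq" // PostgreSQL driver used for the Pg run
)

func main() {
	db, err := sql.Open("postgres", "host=127.0.0.1 user=b1 dbname=b1 sslmode=disable")
	if err != nil {
		log.Fatal(err)
	}
	defer db.Close()

	// step 2: a random amount update, repeated 3 times per item search
	if _, err := db.Exec(`UPDATE items SET amount = amount + $1 WHERE id = $2`, 7, 123); err != nil {
		log.Println(err)
	}
}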


This benchmark represents a simplified real use case of the game I'm currently developing. Let's start with PostgreSQL 10.7 (the current version on Ubuntu 18.04.1 LTS), with the configuration generated by the pgtune website:

max_connections = 400
shared_buffers = 8GB
effective_cache_size = 24GB
maintenance_work_mem = 2GB
checkpoint_completion_target = 0.9
wal_buffers = 16MB
default_statistics_target = 100
random_page_cost = 1.1
effective_io_concurrency = 200
work_mem = 5242kB
min_wal_size = 2GB
max_wal_size = 4GB
max_worker_processes = 8
max_parallel_workers_per_gather = 4
max_parallel_workers = 8

Create the user and database first:

sudo su - postgres
createuser b1
createdb b1
psql 
GRANT ALL PRIVILEGES ON DATABASE b1 TO b1
\q

Add to pg_hba.conf if required, then restart:

local   all b1 trust
host all b1 127.0.0.1/32 trust
host all b1 ::1/128 trust

For slow databases, all workload counts are reduced by a factor of 20 (the SLOW FACTOR below), except the query-only part.

$ go run pg.go lib.go
[Pg] RandomSearchItems (100000, 100%) took 24.62s (246.21 µs/op)
[Pg] SearchRelsAddBonds (10000, 100%) took 63.73s (6372.56 µs/op)
[Pg] UpdateItemsAmounts (5000, 100%) took 105.10s (21019.88 µs/op)
[Pg] InsertUsersItems (2500, 100%) took 129.41s (51764.04 µs/op)
USERS CR    :    2500 /    4999 
ITEMS CRU   :   17500 /   14997 +  698341 / 14997
RELS  CRU   :    2375 /   16107 / 8053
SLOW FACTOR : 20
CRU µs/rec  : 5783.69 / 35.26 / 7460.65

Next we'll try MySQL 5.7. Create the user and database first, then multiply all the memory configs by 10 (are there automatic config generators for MySQL?):

innodb_buffer_pool_size=4G

sudo mysql
CREATE USER 'b1'@'localhost' IDENTIFIED BY 'b1';
CREATE DATABASE b1;
GRANT ALL PRIVILEGES ON b1.* TO 'b1'@'localhost';
FLUSH PRIVILEGES;
sudo mysqltuner # not sure if this useful

And here's the result:

$ go run maria.go lib.go
[My] RandomSearchItems (100000, 100%) took 16.62s (166.20 µs/op)
[My] SearchRelsAddBonds (10000, 100%) took 86.32s (8631.74 µs/op)
[My] UpdateItemsAmounts (5000, 100%) took 172.35s (34470.72 µs/op)
[My] InsertUsersItems (2500, 100%) took 228.52s (91408.86 µs/op)
USERS CR    :    2500 /    4994 
ITEMS CRU   :   17500 /   14982 +  696542 / 13485 
RELS  CRU   :    2375 /   12871 / 6435 
SLOW FACTOR : 20 
CRU µs/rec  : 10213.28 / 23.86 / 13097.44

Next we'll try MemSQL 6.7.16-55671ba478; while the insert and update performance is amazing, the query/read performance is 3-4x slower than a traditional RDBMS:

$ memsql-admin start-node --all

$ go run memsql.go lib.go # 4 sec before start RU
[Mem] InsertUsersItems (2500, 100%) took 4.80s (1921.97 µs/op)
[Mem] UpdateItemsAmounts (5000, 100%) took 13.48s (2695.83 µs/op)
[Mem] SearchRelsAddBonds (10000, 100%) took 14.40s (1440.29 µs/op)
[Mem] RandomSearchItems (100000, 100%) took 64.87s (648.73 µs/op)
USERS CR    :    2500 /    4997 
ITEMS CRU   :   17500 /   14991 +  699783 / 13504 
RELS  CRU   :    2375 /   19030 / 9515 
SLOW FACTOR : 20 
CRU µs/rec  : 214.75 / 92.70 / 1255.93

$ go run memsql.go lib.go # 2 sec before start RU
[Mem] InsertUsersItems (2500, 100%) took 5.90s (2360.01 µs/op)
[Mem] UpdateItemsAmounts (5000, 100%) took 13.76s (2751.67 µs/op)
[Mem] SearchRelsAddBonds (10000, 100%) took 14.56s (1455.95 µs/op)
[Mem] RandomSearchItems (100000, 100%) took 65.30s (653.05 µs/op)
USERS CR    :    2500 /    4998 
ITEMS CRU   :   17500 /   14994 +  699776 / 13517 
RELS  CRU   :    2375 /   18824 / 9412 
SLOW FACTOR : 20 
CRU µs/rec  : 263.69 / 93.32 / 1282.38

$ go run memsql.go lib.go # SLOW FACTOR 5
[Mem] InsertUsersItems (10000, 100%) took 31.22s (3121.90 µs/op)
[Mem] UpdateItemsAmounts (20000, 100%) took 66.55s (3327.43 µs/op)
[Mem] RandomSearchItems (100000, 100%) took 85.13s (851.33 µs/op)
[Mem] SearchRelsAddBonds (40000, 100%) took 133.05s (3326.29 µs/op)
USERS CR    :   10000 /   19998
ITEMS CRU   :   70000 /   59994 +  699944 / 53946
RELS  CRU   :   37896 /  300783 / 150391
SLOW FACTOR : 5
CRU µs/rec  : 264.80 / 121.63 / 1059.16

$ go run memsql.go lib.go # SLOW FACTOR 1, DB SIZE: 548.2 MB
[Mem] RandomSearchItems (100000, 100%) took 88.84s (888.39 µs/op)
[Mem] UpdateItemsAmounts (100000, 100%) took 391.87s (3918.74 µs/op)
[Mem] InsertUsersItems (50000, 100%) took 482.57s (9651.42 µs/op)
[Mem] SearchRelsAddBonds (200000, 100%) took 5894.22s (29471.09 µs/op)
USERS CR    :   50000 /   99991 
ITEMS CRU   :  350000 /  299973 +  699846 / 269862 
RELS  CRU   :  946350 / 7161314 / 3580657 
SLOW FACTOR : 1
CRU µs/rec  : 358.43 / 126.94 / 1549.13

Column store tables with MemSQL 6.7.16-55671ba478:

$ go run memsql-columnstore.go lib.go # SLOW FACTOR 20
[Mem] InsertUsersItems (2500, 100%) took 6.44s (2575.26 µs/op)
[Mem] UpdateItemsAmounts (5000, 100%) took 17.51s (3502.94 µs/op)
[Mem] SearchRelsAddBonds (10000, 100%) took 18.82s (1881.71 µs/op)
[Mem] RandomSearchItems (100000, 100%) took 79.48s (794.78 µs/op)
USERS CR    :    2500 /    4997 
ITEMS CRU   :   17500 /   14991 +  699776 / 13512 
RELS  CRU   :    2375 /   18861 / 9430 
SLOW FACTOR : 20 
CRU µs/rec  : 287.74 / 113.58 / 1645.84

Next we'll try CrateDB 3.2.7, with a setup similar to PostgreSQL; the result:

go run crate.go lib.go
[Crate] SearchRelsAddBonds (10000, 100%) took 49.11s (4911.38 µs/op)
[Crate] RandomSearchItems (100000, 100%) took 101.40s (1013.95 µs/op)
[Crate] UpdateItemsAmounts (5000, 100%) took 246.42s (49283.84 µs/op)
[Crate] InsertUsersItems (2500, 100%) took 306.12s (122449.00 µs/op)
USERS CR    :    2500 /    4965 
ITEMS CRU   :   17500 /   14894 +  690161 / 14895 
RELS  CRU   :    2375 /    4336 / 2168 
SLOW FACTOR : 20 
CRU µs/rec  : 13681.45 / 146.92 / 19598.85

Next is CockroachDB 19.1, the result:

go run cockroach.go lib.go
[Cockroach] SearchRelsAddBonds (10000, 100%) took 59.25s (5925.42 µs/op)
[Cockroach] RandomSearchItems (100000, 100%) took 85.84s (858.45 µs/op)
[Cockroach] UpdateItemsAmounts (5000, 100%) took 261.43s (52285.65 µs/op)
[Cockroach] InsertUsersItems (2500, 100%) took 424.66s (169864.55 µs/op)
USERS CR    :    2500 /    4988
ITEMS CRU   :   17500 /   14964 +  699331 / 14964 
RELS  CRU   :    2375 /    5761 / 2880 
SLOW FACTOR : 20 
CRU µs/rec  : 18979.28 / 122.75 / 19022.43

Next is NuoDB 3.4.1; here are the storage manager and transaction engine setup and the benchmark result:

chown nuodb:nuodb /media/nuodb
$ nuodbmgr --broker localhost --password nuodb1pass
  start process sm archive /media/nuodb host localhost database b1 initialize true 
  start process te host localhost database b1 
    --dba-user b2 --dba-password b3
$ nuosql b1 --user b2 --password b3


$ go run nuodb.go lib.go
[Nuo] RandomSearchItems (100000, 100%) took 33.79s (337.90 µs/op)
[Nuo] SearchRelsAddBonds (10000, 100%) took 72.18s (7218.04 µs/op)
[Nuo] UpdateItemsAmounts (5000, 100%) took 117.22s (23443.65 µs/op)
[Nuo] InsertUsersItems (2500, 100%) took 144.51s (57804.21 µs/op)
USERS CR    :    2500 /    4995 
ITEMS CRU   :   17500 /   14985 +  698313 / 14985 
RELS  CRU   :    2375 /   15822 / 7911 
SLOW FACTOR : 20 
CRU µs/rec  : 6458.57 / 48.39 / 8473.22

Next is TiDB 2.1.7, the config and the result:

sudo sysctl -w net.core.somaxconn=32768
sudo sysctl -w vm.swappiness=0
sudo sysctl -w net.ipv4.tcp_syncookies=0
sudo sysctl -w fs.file-max=1000000

pd-server --name=pd1 \
                --data-dir=pd1 \
                --client-urls="http://127.0.0.1:2379" \
                --peer-urls="http://127.0.0.1:2380" \
                --initial-cluster="pd1=http://127.0.0.1:2380" \
                --log-file=pd1.log
$ tikv-server --pd-endpoints="127.0.0.1:2379" \
                --addr="127.0.0.1:20160" \
                --data-dir=tikv1 \
                --log-file=tikv1.log
$ tidb-server --store=tikv --path="127.0.0.1:2379" --log-file=tidb.log

$ go run tidb.go lib.go
[Ti] InsertUsersItems (125, 5%) took 17.59s (140738.00 µs/op)
[Ti] SearchRelsAddBonds (500, 5%) took 9.17s (18331.36 µs/op)
[Ti] RandomSearchItems (5000, 5%) took 10.82s (2163.28 µs/op)
# failed with a bunch of errors on tikv, such as:
[2019/04/26 04:20:11.630 +07:00] [ERROR] [endpoint.rs:452] [error-response] [err="locked LockInfo { primary_lock: [116, 128, 0, 0, 0, 0, 0, 0, 50, 95, 114, 128, 0, 0, 0, 0, 0, 0, 96], lock_version: 407955626145349685, key: [116, 128, 0, 0, 0, 0, 0, 0, 50, 95, 114, 128, 0, 0, 0, 0, 0, 0, 96], lock_ttl: 3000, unknown_fields: UnknownFields { fields: None }, cached_size: CachedSize { size: 0 } }"]

Next is YugaByte 1.2.5.0, the result:

export YB_PG_FALLBACK_SYSTEM_USER_NAME=user1
./bin/yb-ctl --data_dir=/media/yuga create
# edit yb-ctl set use_cassandra_authentication = True
./bin/yb-ctl --data_dir=/media/yuga start
./bin/cqlsh -u cassandra -p cassandra
./bin/psql -h 127.0.0.1 -p 5433 -U postgres
CREATE DATABASE b1;
GRANT ALL ON b1 TO postgres;

$ go run yuga.go lib.go
[Yuga] InsertUsersItems (2500, 100%) took 116.42s (46568.71 µs/op)
[Yuga] UpdateItemsAmounts (5000, 100%) took 173.10s (34620.48 µs/op)
[Yuga] RandomSearchItems (100000, 100%) took 350.04s (3500.43 µs/op)
[Yuga] SearchRelsAddBonds (10000, 100%) took 615.17s (61516.91 µs/op)
USERS CR    :    2500 /    4999 
ITEMS CRU   :   17500 /   14997 +  699587 / 14997 
RELS  CRU   :    2375 /   18713 / 9356 
SLOW FACTOR : 20 
CRU µs/rec  : 5203.21 / 500.36 / 38646.88

Next is ScyllaDB 3.0.8, the result:

$ cqlsh
CREATE KEYSPACE b1 WITH replication = {'class': 'SimpleStrategy', 'replication_factor' : 1};

$ go run scylla.go lib.go
[Scylla] InsertUsersItems (2500, 100%) took 10.92s (4367.99 µs/op)
[Scylla] UpdateItemsAmounts (5000, 100%) took 26.85s (5369.63 µs/op)
[Scylla] SearchRelsAddBonds (10000, 100%) took 28.70s (2870.26 µs/op)
[Scylla] RandomSearchItems (100000, 100%) took 49.74s (497.41 µs/op)
USERS CR    :    2500 /    5000 
ITEMS CRU   :   17500 /   14997 +  699727 / 15000 
RELS  CRU   :    2375 /    9198 / 9198 
SLOW FACTOR : 20 
CRU µs/rec  : 488.04 / 71.09 / 2455.20
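
The Scylla run talks CQL from Go; a minimal sketch assuming the gocql driver (keyspace b1 as created above, and a counter-style amount update since Cassandra-like stores require the COUNTER type for x = x + n; id and values are placeholders):

package main

import (
	"log"

	"github.com/gocql/gocql"
)

func main() {
	cluster := gocql.NewCluster("127.0.0.1")
	cluster.Keyspace = "b1"
	session, err := cluster.CreateSession()
	if err != nil {
		log.Fatal(err)
	}
	defer session.Close()

	// amount must be a COUNTER column for this kind of in-place increment
	if err := session.Query(`UPDATE items SET amount = amount + ? WHERE id = ?`,
		7, 123).Exec(); err != nil {
		log.Println(err)
	}
}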

Next is Clickhouse 19.7.3.9 with batch INSERT, the result:

$ go run clickhouse.go lib.go
[Click] InsertUsersItems (2500, 100%) took 13.54s (5415.17 µs/op)
[Click] RandomSearchItems (100000, 100%) took 224.58s (2245.81 µs/op)
[Click] SearchRelsAddBonds (10000, 100%) took 421.16s (42115.93 µs/op)
[Click] UpdateItemsAmounts (5000, 100%) took 581.63s (116325.46 µs/op)
USERS CR    :    2500 /    4999 
ITEMS CRU   :   17500 /   14997 +  699748 / 15000 
RELS  CRU   :    2375 /   19052 / 9526 
SLOW FACTOR : 20 
CRU µs/rec  : 605.05 / 320.95 / 41493.35

When INSERT is not batched on Clickhouse 19.7.3.9:

$ go run clickhouse-1insertPreTransaction.go lib.go
[Click] InsertUsersItems (2500, 100%) took 110.78s (44312.56 µs/op)
[Click] RandomSearchItems (100000, 100%) took 306.10s (3060.95 µs/op)
[Click] SearchRelsAddBonds (10000, 100%) took 534.91s (53491.35 µs/op)
[Click] UpdateItemsAmounts (5000, 100%) took 710.39s (142078.55 µs/op)
USERS CR    :    2500 /    4999 
ITEMS CRU   :   17500 /   14997 +  699615 / 15000 
RELS  CRU   :    2375 /   18811 / 9405 
SLOW FACTOR : 20 
CRU µs/rec  : 4951.12 / 437.52 / 52117.48
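
The batch INSERT above follows the usual database/sql batching pattern: one transaction, one prepared statement, many Exec calls, then a single Commit that flushes the whole batch. A rough sketch, assuming a ClickHouse driver such as clickhouse-go registered under the name "clickhouse" (DSN, columns, and error handling simplified):

package main

import (
	"database/sql"
	"log"

	_ "github.com/ClickHouse/clickhouse-go" // registers the "clickhouse" driver
)

func main() {
	db, err := sql.Open("clickhouse", "tcp://127.0.0.1:9000")
	if err != nil {
		log.Fatal(err)
	}
	defer db.Close()

	tx, err := db.Begin()
	if err != nil {
		log.Fatal(err)
	}
	stmt, err := tx.Prepare(`INSERT INTO items (fk_id, typ, amount) VALUES (?, ?, ?)`)
	if err != nil {
		log.Fatal(err)
	}
	for i := 0; i < 1000; i++ { // many rows per prepared statement = one batch
		if _, err := stmt.Exec(int64(i), "a", int64(1)); err != nil {
			log.Println(err)
		}
	}
	if err := tx.Commit(); err != nil { // the whole batch is written on commit
		log.Fatal(err)
	}
}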

These benchmarks were performed on an i7-4720HQ with 32GB RAM and an SSD. There's a lot more that I want to add to this benchmark (maybe someday) to make it truly huge '__'), such as:
  • DGraph, a graph database written in Go; the backup is local (same as MemSQL, so you cannot do something like ssh foo@bar "pg_dump | xz - -c" | pv -r -b > /tmp/backup_`date +%Y%m%d_%H%M%S`.sql.xz)
  • Cayley, a graph layer written in Go, can support many backend storage
  • ArangoDB, multi-model database, with built-in Foxx Framework for creating REST APIs, has unfamiliar AQL syntax
  • MongoDB, one of the most popular open source document databases, for the sake of comparison; I don't prefer this one because of its memory usage
  • InfluxDB or TimeScaleDB or SiriDB or GridDB for comparison with Clickhouse
  • Redis or SSDB or LedisDB or Codis or Olric or SummitDB, obviously for the sake of comparison. Also Cete, a distributed key-value store, but instead of the memcache protocol this one uses gRPC and REST
  • Tarantool, a Redis competitor with ArangoDB-like features but with Lua instead of JS; I want to see if this is simpler to use but with near-equal performance to Aerospike
  • Aerospike, the fastest distributed kv-store I've ever tested, just for the sake of comparison; the free version is limited to 2 namespaces with 4 billion objects. Too bad this one cannot be started on an OpenVZ-based VM.
  • Couchbase, a document-oriented database that supports SQL-like syntax (N1QL); the free-for-production edition is a few months behind the enterprise edition. The community edition cannot create indexes (always error 5000?).
  • GridDB, in-memory database from Toshiba, benchmarked to be superior to Cassandra
  • ClustrixDB (New name: MariaDB XPand), distributed columnstore version of MariaDB, community version does not support automatic failover and non-blocking backup
  • Altibase, an open source in-memory database promoted as Oracle-compatible; not sure what the limitations of the open source version are.
  • RedisGraph, fastest in-memory graph database, community edition available.
Skipped databases:
  • RethinkDB, a document-oriented database; the latest Ubuntu package cannot be installed, probably because the project is no longer maintained
  • OrientDB, a multi-model (document and graph) database; their screenshots look interesting, but too bad both Golang drivers are unmaintained and probably unusable for the latest version (3.x)
  • TiDB, a work-in-progress take on CockroachDB but with a MySQL-compatible connector; as seen in the benchmark above, a lot of errors happen
  • RQLite, a distributed SQLite; the Go driver is not threadsafe by default
  • VoltDB, doesn't seem to be free, since the website only offers a "free evaluation"
  • HyperDex, have good benchmark on paper, but no longer maintained
  • LMDB-memcachedb, faster version of memcachedb, a distributed kv, but no longer maintained
  • FoundationDB, a multi-model database, built from kv-database with additional layers for other models, seems to have complicated APIs
  • TigerGraph, fastest distributed graph database, developer edition free but may not be used for production
For now I have found what I need, so I'll probably add the rest later. The code for this benchmark can be found here: https://github.com/kokizzu/hugedbbench (send a pull request and I'll run it and update this post) and the spreadsheet here: http://tiny.cc/hugedb

The chart (lower is better) is shown below:


Other benchmarks from 2018 are here (tl;dr: CockroachDB mostly has higher throughput, YugabyteDB the lowest latency, and TiDB the lowest performance among those 3).

2019-04-04

How to Make a 2D Game that Fits Multiple Resolutions in Unity

There are a lot of screen resolutions out there, so how do we make our UI objects (canvas) fit all of them? One of the easiest solutions is to envelope the canvas with borders; here's how you do it:

  1. Create a canvas object
  2. Set the Canvas Scaler to Scale with Screen Size
  3. Set the Reference Resolution to for example: 1080 x 800
  4. Set the Screen Match Mode to Match Width Or Height
  5. Set the match to 1 if your current screen width is smaller, 0 if height is smaller
  6. Create an Image as background inside the Canvas
  7. Add Aspect Ratio Fitter script
  8. Set the Aspect Mode to Fit in Parent (so the UI anchor can be anywhere)
  9. Set the Aspect Ratio to 1080/800 = 1.35

Now you can add any UI elements inside the background Image.

The last piece is to add this bit of script to the canvas' Awake method:

var canvasScaler = GetComponent<CanvasScaler>();
// compare the screen's height:width ratio against the reference resolution
var ratio = Screen.height / (float) Screen.width;
var rr = canvasScaler.referenceResolution;
// pick whether to match width (0) or height (1) based on that comparison
canvasScaler.matchWidthOrHeight = (ratio < rr.x / rr.y) ? 1 : 0;

This ensures that the scaling/aspect ratio works correctly across all screen resolutions. There will be borders on the top and bottom if the screen is taller than the aspect ratio, and borders on the left and right if the screen is wider.



2019-03-18

New Promising Programming Language: V

I found out that there's a new programming language called V (2019). At a glance it looks like a combination of Go and Rust. It seems really promising and has really fast compilation speed.
  • No global state
  • No null
  • No undefined values
  • Option types
  • Generics
  • Immutability by default
  • Partially pure functions
  • Hot Code Reload
  • REPL
  • C/C++ converter
  • Native cross platform UI library
  • Run Everywhere
But there's no compiler released yet (even though there's already software built with it); wait until May 2019. I hope it gets hyped :3
Here's a somewhat negative review of V.



Pony (2012) also shares a similar philosophy (like Rust and Erlang/Elixir combined):
  • Pony is type safe
  • Pony is memory safe
  • Exception-Safe
  • Data-race Free
  • Deadlock-Free
  • Native Code
  • Compatible with C
  • Garbage Collected
  • and many more
Zig (2017) will hopefully succeed C:
  • Integration with C without binding, can compile C
  • Cross compile
  • Generics
  • Error handling and Stacktrace by default
  • Compile-time reflection and compile-time code execution
  • Simple build system
Crystal (2014) doesn't share quite the same spirit, but hey, it's a faster Ruby :3

And don't forget this interesting fib benchmark.