Showing posts with label postgresql. Show all posts

2023-04-26

GeoSearch Database Benchmark

So today we're gonna benchmark databases that can store latitude-longitude (GPS coordinates). The benchmark spec: unbatched INSERT of 100K (id, lat, long) records; 200K searches (or until the deadline is reached) for the 500 nearest points, returning the id and the distance; and moving 100 points to another location, 50 times. All benchmarks are done with 16 threads (so the other 16 threads can be used by the database).

The contenders that have already been benchmarked are:

  1. Redis GEOSEARCH (2024: also added KeyDB, 4 cores)
  2. PostgreSQL cube/earthdistance
  3. Tarantool RTREE
  4. TypeSense geosearch
  5. MeiliSearch _geo

Other databases were attempted but failed because they are not truly Redis-compatible: DragonFlyDB, Garnet, KVRocks.
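For the Redis-compatible contenders, the benchmark boils down to GEOADD for the unbatched inserts and GEOSEARCH for the 500-nearest query. Roughly like the sketch below, sending raw commands through the go-redis client's Do (assuming the github.com/redis/go-redis/v9 client; this is not the exact code from the repo):

package main

import (
	"context"
	"fmt"
	"log"

	"github.com/redis/go-redis/v9"
)

func main() {
	ctx := context.Background()
	rdb := redis.NewClient(&redis.Options{Addr: `127.0.0.1:6379`})

	// unbatched insert: one GEOADD per (id, lat, long) record
	if err := rdb.Do(ctx, "GEOADD", "points", 106.8, -6.2, "id1").Err(); err != nil {
		log.Fatal(err)
	}

	// 500 nearest points with their distances, sorted ascending
	res, err := rdb.Do(ctx, "GEOSEARCH", "points",
		"FROMLONLAT", 106.8, -6.2,
		"BYRADIUS", 2, "km",
		"ASC", "COUNT", 500, "WITHDIST").Result()
	if err != nil {
		log.Fatal(err)
	}
	fmt.Println(res) // e.g. [[id1 0.0001] ...]
}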

Here's the result for unbatched insert:

REDIS         9.4 sec,  10639.7 rps
KEYDB         1.3 sec,  76546.2 rps
POSTGRES     10.5 sec,   9523.7 rps
TARANTOOL     0.8 sec, 126801.9 rps
TYPESENSE    96.7 sec,   1023.8 rps
MEILISEARCH 365.0 sec,    271.2 rps


This benchmark is totally unfair for MeiliSearch since their API expects inserts to be batched, just like Clickhouse.

Next, the 500-nearest-points-of-interest search benchmark:

REDIS        30676  (15.3%) in 50.0 sec,  613.5 rps
KEYDB         7381   (3.7%) in 50.0 sec,  147.6 rps
POSTGRES      7778   (3.9%) in 50.0 sec,  155.6 rps
TARANTOOL   200000 (100.0%) in 35.0 sec, 5716.3 rps
TYPESENSE     2177   (1.1%) in 50.0 sec,   43.5 rps


PostgreSQL is actually quite fast, at Tarantool's level, if the distance ordering is removed (random order, only fetching the first 500). I believe the search scope is too wide (a 4 km square), causing too many points to be sorted by distance (and the database config is the default, not tuned at all, which also contributes to the slowdown). TypeSense cannot search beyond its 250-row limit, so that result is for 250. MeiliSearch in this case always returns 0 results; not sure what's wrong.
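For reference, the PostgreSQL cube/earthdistance variant boils down to a query like the one below; dropping the ORDER BY on distance is what brings it to Tarantool-level speed. This is a sketch with a hypothetical points(id, lat, lon) table and the lib/pq driver, not necessarily the exact code in the benchmark repo:

package main

import (
	"database/sql"
	"log"

	_ "github.com/lib/pq"
)

// Nearest500 returns the ids and distances (meters) of up to 500 points nearest to (lat, lon)
// within radiusM meters, using the cube/earthdistance extensions.
// Assumes: CREATE EXTENSION cube; CREATE EXTENSION earthdistance;
// and an index: CREATE INDEX ON points USING gist (ll_to_earth(lat, lon));
func Nearest500(db *sql.DB, lat, lon, radiusM float64) (ids []int64, dists []float64, err error) {
	rows, err := db.Query(`
SELECT id, earth_distance(ll_to_earth(lat, lon), ll_to_earth($1, $2)) AS dist
FROM points
WHERE earth_box(ll_to_earth($1, $2), $3) @> ll_to_earth(lat, lon)
ORDER BY dist -- removing this ORDER BY is what makes it much faster
LIMIT 500`, lat, lon, radiusM)
	if err != nil {
		return nil, nil, err
	}
	defer rows.Close()
	for rows.Next() {
		var id int64
		var dist float64
		if err := rows.Scan(&id, &dist); err != nil {
			return nil, nil, err
		}
		ids, dists = append(ids, id), append(dists, dist)
	}
	return ids, dists, rows.Err()
}

func main() {
	db, err := sql.Open(`postgres`, `postgres://b1@127.0.0.1/b1?sslmode=disable`)
	if err != nil {
		log.Fatal(err)
	}
	ids, dists, err := Nearest500(db, -6.2, 106.8, 2000)
	if err != nil {
		log.Fatal(err)
	}
	log.Println(len(ids), len(dists))
}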

The last benchmark is about moving 10 points 50 times, so the index needs to be updated periodically; the results are:

REDIS        0.1 sec,  55088.4 rps
KEYDB        0.2 sec,  28926.9 rps
POSTGRES     0.6 sec,   7954.3 rps
TARANTOOL    0.0 sec, 137424.8 rps
TYPESENSE    9.7 sec,    515.2 rps
MEILISEARCH  8.8 sec,    569.7 rps

So I guess I'll use Tarantool for this case, since it's the fastest for geo datapoints with persistence. Other possible databases to benchmark in the future are: MySQL, CockroachDB, ElasticSearch, TiDB, CouchBase, PostGIS, but I'm not sure whether their indexes can beat Tarantool's. If you want to contribute to this benchmark, you can create a PR to the hugedbbench repo.


2022-12-24

CockroachDB Benchmark on Different Disk Types

Today we're going to benchmark CockroachDB, one of the databases that I used this year to create an embedded application. I use CockroachDB because I don't want to use SQLite or any other embedded database that lacks tooling or cannot be accessed by multiple programs at the same time. With CockroachDB I only need to distribute my application binary and the cockroachdb binary, and that's it. The offline backup is also quite simple: just rsync the directory, or do a manual row export like with other PostgreSQL-like databases. Scaling out is also quite simple.

Here's the result:

Disk Type        Ins Dur (s)  Upd Dur (s)  Sel Dur (s)  Many Dur (s)  Insert Q/s  Update Q/s  Select1 Q/s  SelMany Row/s  SelMany Q/s
TMPFS (RAM)              1.3          2.1          4.9           1.5       31419       19275        81274        8194872        20487
NVMe DA 1TB              2.7          3.7          5.0           1.5       15072       10698        80558        8019435        20048
NVMe Team 1TB            3.8          3.7          4.9           1.5       10569       10678        81820        8209889        20524
SSD GALAX 250GB          8.0          7.1          5.0           1.5        4980        5655        79877        7926162        19815
HDD WD 8TB              32.1         31.7          4.9           3.9        1244        1262        81561        3075780         7689

From the table we can see that TMPFS (RAM, obviously) is the fastest in all cases, especially the insert and update benchmarks; NVMe is faster than SSD, and the standard magnetic HDD is the slowest. The query part doesn't show much difference, probably because the dataset is so small that it all fits in the cache.

The test is done with 100 goroutines, 400 insert/update records per goroutine, and each record is only an integer and a string. Queries are done 10x for select and 300x for select-many. As shown there, sending small queries reaches the limit of ~80K rps, inserts can reach 31K rps, and multirow queries/updates can reach ~20K rps.
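The insert part is roughly a fan-out like the sketch below (a simplified version, not the exact benchmark code; it assumes a hypothetical bench(num INT, str TEXT) table and the lib/pq driver, since CockroachDB speaks the PostgreSQL wire protocol):

package main

import (
	"database/sql"
	"fmt"
	"log"
	"sync"

	_ "github.com/lib/pq"
)

func main() {
	// 26257 is CockroachDB's default SQL port, running in insecure mode
	db, err := sql.Open(`postgres`, `postgresql://root@127.0.0.1:26257/bench?sslmode=disable`)
	if err != nil {
		log.Fatal(err)
	}
	const goroutines, perGoroutine = 100, 400
	var wg sync.WaitGroup
	for g := 0; g < goroutines; g++ {
		wg.Add(1)
		go func(g int) {
			defer wg.Done()
			for i := 0; i < perGoroutine; i++ {
				_, err := db.Exec(`INSERT INTO bench(num, str) VALUES ($1, $2)`,
					g*perGoroutine+i, fmt.Sprintf(`row-%d-%d`, g, i))
				if err != nil {
					log.Println(err)
				}
			}
		}(g)
	}
	wg.Wait()
}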

The repository is here if you want to run the benchmark on your own machine.

2022-05-07

Getting started with Trino

Trino is a distributed query engine that allows you to JOIN across multiple datasources (databases like mysql, postgresql, bigquery, cassandra, mongodb, redis, prometheus, elasticsearch, csv files, google sheets, s3, etc). It's like Clickhouse but without the high-tech (merge-tree) storage ability, so it cannot do blazing-fast analytics queries like Clickhouse, but it can be as fast as the database it connects to, e.g. if it is connected to Clickhouse, then it can be as fast as Clickhouse. It was developed by Facebook (previously named Presto). The list of database connectors can be seen here. To use Trino, you can use the dockerized version or install it manually:

# Docker
docker run -d -p 8080:8080 --name trino1 trinodb/trino
# web UI only for monitoring, use random username 

docker exec -it trino1 trino

# Ubuntu 22.04
java --version
python3 --version
# download and extract from https://trino.io/download.html
mkdir ./trino-server-379/etc
cd trino-server-379/etc
SRCURL=https://raw.githubusercontent.com/trinodb/trino-the-definitive-guide/master/single-installation/etc
wget -c $SRCURL/jvm.config
wget -c $SRCURL/log.properties
wget -c $SRCURL/node.properties
echo '
coordinator=true
node-scheduler.include-coordinator=true
http-server.http.port=8081
query.max-memory=5GB
query.max-memory-per-node=1GB
discovery.uri=http://127.0.0.1:8081
' > config.properties
echo '

node.data-dir=/tmp/
' >> node.properties
mkdir catalog
echo '
connector.name=cassandra
cassandra.contact-points=127.0.0.1
# more here https://trino.io/docs/current/connector/cassandra.html
' > catalog/localscylla.properties 
cd ..
python3 ./bin/launcher.py run # to run in background: start

# CLI/Client
EXELOC=/usr/bin/trino
curl -o $EXELOC https://repo1.maven.org/maven2/io/trino/trino-cli/379/trino-cli-379-executable.jar
chmod a+x $EXELOC
trino --server http://localhost:8081

This is the list of commands that can be used in Trino (other than standard SQL):

SHOW CATALOGS;
SHOW SCHEMAS FROM/IN __CATALOG__; # eg. localscylla
SHOW TABLES FROM/IN __CATALOG__.__SCHEMA__;
DESCRIBE __CATALOG__.__SCHEMA__.__TABLE__;
EXPLAIN SELECT * FROM __CATALOG__.__SCHEMA__.__TABLE__;

That's it. You can add more database connections by creating more etc/catalog/*.properties files with the proper configuration (username, password, port, etc).
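You can also query Trino from Go through database/sql. The sketch below assumes the github.com/trinodb/trino-go-client driver (which registers itself as "trino") and the localscylla catalog configured above; check the driver's README for the exact DSN options:

package main

import (
	"database/sql"
	"fmt"
	"log"

	_ "github.com/trinodb/trino-go-client/trino" // registers the "trino" driver
)

func main() {
	// DSN format: http(s)://user@host:port?catalog=...&schema=...
	db, err := sql.Open(`trino`, `http://trino@localhost:8081?catalog=localscylla`)
	if err != nil {
		log.Fatal(err)
	}
	defer db.Close()

	rows, err := db.Query(`SHOW SCHEMAS FROM localscylla`)
	if err != nil {
		log.Fatal(err)
	}
	defer rows.Close()
	for rows.Next() {
		var schema string
		if err := rows.Scan(&schema); err != nil {
			log.Fatal(err)
		}
		fmt.Println(schema)
	}
}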

2019-04-19

Huge List of Database Benchmark

Today we will benchmark single-node versions of distributed databases (and some non-distributed databases for comparison); the clients are all written in Go (with whatever driver is available). The judgement will be about performance (mostly writes, and some infrequent reads), not about distributed performance (I will take a look at that some other time). I searched DbEngines for a long time for databases that could suit the needs of my next project. For the session kv-store the obvious first choice is Aerospike, but since it cannot be run inside the server that I rent (which uses OpenVZ), I'll go with the second choice, Redis. Here's the list of today's contenders:
  • CrateDB, highly optimized for huge amounts of data (they said), probably the best for updatable time series, also with a built-in search engine, so this one quite fits my use case, probably to replace [Riot (small scale) or Manticore (large scale)] and [InfluxDB or TimescaleDB]; does not support auto increment
  • CockroachDB, self-healing database with PostgreSQL-compatible connector, the community edition does not support table partitioning
  • MemSQL, which can also replace a kv-store; there's a limit of 128GB RAM for the free version. Row-store tables can only have one PRIMARY key or one UNIQUE key or one AUTO increment column that must be a SHARD key, and it cannot be updated or altered. Column-store tables do not support UNIQUE/PRIMARY keys, only a SHARD KEY. The client/connector is MySQL-compatible
  • MariaDB (MySQL), one of the most popular open source RDBMS, for the sake of comparison
  • PostgreSQL, my favorite RDBMS, for the sake of comparison 
  • NuoDB on another benchmark even faster than GoogleSpanner or CockroachDB, the community edition only support 3 transaction engine (TE) and 1 storage manager (SM)
  • YugaByteDB, distributed KV+SQL with Cassandra- and PostgreSQL-compatible protocols. Some SQL syntax is not yet supported (ALTER USER, UNIQUE on CREATE TABLE).
  • ScyllaDB, a C++ version of Cassandra. All Cassandra-like databases have a lot of restrictions/annoyances by design compared to a traditional RDBMS (cannot CREATE INDEX on a composite PRIMARY KEY, no AUTO INCREMENT, doesn't support UNION ALL or the OR operator, must use the COUNTER type if you want to UPDATE x=x+n, cannot mix COUNTER types with non-counter types in the same table, etc), does not support ORDER BY on anything other than the clustering key, and does not support OFFSET on LIMIT.
  • Clickhouse, claimed to be the fastest and one of the most storage-space-efficient OLAP databases, but it doesn't support UPDATE/DELETE syntax (requires ALTER TABLE to UPDATE/DELETE), only supports batch inserts, and does not support UNIQUE or AUTO INCREMENT. Since this is not designed to be an OLTP database, this benchmark would obviously be totally unfair for Clickhouse.
What's the extra motivation for this post?
I almost never use distributed databases, since none of my projects have more than 200 concurrent users/sec. I've encountered a bottleneck before, and the culprit was multiple slow complex queries; that could be solved by queuing them to a message queue and processing them one by one, instead of bombarding the database's process at the same time and hogging the memory.

The benchmark scenario would be like this:
1. 50k inserts of a single-column string value, 200k inserts of a 2-column unique value, and 900k inserts of unique relations:
INSERT INTO users(id, uniq_str) -- x50k
INSERT INTO items(fk_id, typ, amount) -- x50k x4
INSERT INTO rels(fk_low, fk_high, bond) -- x900k

2. While inserting is at 5%+ progress, there would be at least 100k random search queries by the unique value and 300k random search queries of items, and for every search query there would be 3 random updates of the amount:
SELECT * FROM users WHERE uniq_str = ? -- x100k
SELECT * FROM items WHERE fk_id = ? AND typ IN (?) -- x100k x3
UPDATE items SET amount = amount + xxx WHERE id = ? -- x100k x3

3. While inserting is at 5%+ progress, there would also be at least 100k random search queries:
SELECT * FROM items WHERE fk_id = ?

4. While inserting is at 5%+ progress, there would also be at least 200k queries of the relations, each with a 50% chance to update the bond:
SELECT * FROM rels WHERE fk_low = ? or fk_high = ? -- x200k
UPDATE rels SET bond = bond + xxx WHERE id = ? -- x200k / 2
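Each scenario is implemented as a Go function hammering the database from multiple goroutines; step 4, for example, looks roughly like the sketch below (a simplified version, not the exact lib.go code):

package main

import (
	"database/sql"
	"math/rand"
)

// SearchRelsAddBonds is a simplified sketch of step 4:
// search the relations of a pair, then 50% of the time update the bond.
func SearchRelsAddBonds(db *sql.DB, fkLow, fkHigh int64) error {
	rows, err := db.Query(`SELECT id, bond FROM rels WHERE fk_low = $1 OR fk_high = $2`, fkLow, fkHigh)
	if err != nil {
		return err
	}
	defer rows.Close()
	for rows.Next() {
		var id, bond int64
		if err := rows.Scan(&id, &bond); err != nil {
			return err
		}
		if rand.Intn(2) == 0 { // 50% chance
			if _, err := db.Exec(`UPDATE rels SET bond = bond + $1 WHERE id = $2`, rand.Int63n(10), id); err != nil {
				return err
			}
		}
	}
	return rows.Err()
}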


This benchmark represents a simplified real use case of the game I'm currently developing. Let's start with PostgreSQL 10.7 (the current one on Ubuntu 18.04.1 LTS), with the configuration generated by the pgtune website:

max_connections = 400
shared_buffers = 8GB
effective_cache_size = 24GB
maintenance_work_mem = 2GB
checkpoint_completion_target = 0.9
wal_buffers = 16MB
default_statistics_target = 100
random_page_cost = 1.1
effective_io_concurrency = 200
work_mem = 5242kB
min_wal_size = 2GB
max_wal_size = 4GB
max_worker_processes = 8
max_parallel_workers_per_gather = 4
max_parallel_workers = 8

Create the user and database first:

sudo su - postgres
createuser b1
createdb b1
psql 
GRANT ALL PRIVILEGES ON DATABASE b1 TO b1
\q

Add to pg_hba.conf if required, then restart:

local   all b1 trust
host all b1 127.0.0.1/32 trust
host all b1 ::1/128 trust

For slow databases, all counts are reduced by a factor of 20, except for the query-only part.

$ go run pg.go lib.go
[Pg] RandomSearchItems (100000, 100%) took 24.62s (246.21 µs/op)
[Pg] SearchRelsAddBonds (10000, 100%) took 63.73s (6372.56 µs/op)
[Pg] UpdateItemsAmounts (5000, 100%) took 105.10s (21019.88 µs/op)
[Pg] InsertUsersItems (2500, 100%) took 129.41s (51764.04 µs/op)
USERS CR    :    2500 /    4999 
ITEMS CRU   :   17500 /   14997 +  698341 / 14997
RELS  CRU   :    2375 /   16107 / 8053
SLOW FACTOR : 20
CRU µs/rec  : 5783.69 / 35.26 / 7460.65

Next we'll try MySQL 5.7. Create the user and database first, then multiply all the memory configs by 10 (is there an automatic config generator for MySQL?):

innodb_buffer_pool_size=4G

sudo mysql
CREATE USER 'b1'@'localhost' IDENTIFIED BY 'b1';
CREATE DATABASE b1;
GRANT ALL PRIVILEGES ON b1.* TO 'b1'@'localhost';
FLUSH PRIVILEGES;
sudo mysqltuner # not sure if this useful

And here's the result:

$ go run maria.go lib.go
[My] RandomSearchItems (100000, 100%) took 16.62s (166.20 µs/op)
[My] SearchRelsAddBonds (10000, 100%) took 86.32s (8631.74 µs/op)
[My] UpdateItemsAmounts (5000, 100%) took 172.35s (34470.72 µs/op)
[My] InsertUsersItems (2500, 100%) took 228.52s (91408.86 µs/op)
USERS CR    :    2500 /    4994 
ITEMS CRU   :   17500 /   14982 +  696542 / 13485 
RELS  CRU   :    2375 /   12871 / 6435 
SLOW FACTOR : 20 
CRU µs/rec  : 10213.28 / 23.86 / 13097.44

Next we'll try MemSQL 6.7.16-55671ba478; while the insert and update performance is amazing, the query/read performance is 3-4x slower than a traditional RDBMS:

$ memsql-admin start-node --all

$ go run memsql.go lib.go # 4 sec before start RU
[Mem] InsertUsersItems (2500, 100%) took 4.80s (1921.97 µs/op)
[Mem] UpdateItemsAmounts (5000, 100%) took 13.48s (2695.83 µs/op)
[Mem] SearchRelsAddBonds (10000, 100%) took 14.40s (1440.29 µs/op)
[Mem] RandomSearchItems (100000, 100%) took 64.87s (648.73 µs/op)
USERS CR    :    2500 /    4997 
ITEMS CRU   :   17500 /   14991 +  699783 / 13504 
RELS  CRU   :    2375 /   19030 / 9515 
SLOW FACTOR : 20 
CRU µs/rec  : 214.75 / 92.70 / 1255.93

$ go run memsql.go lib.go # 2 sec before start RU
[Mem] InsertUsersItems (2500, 100%) took 5.90s (2360.01 µs/op)
[Mem] UpdateItemsAmounts (5000, 100%) took 13.76s (2751.67 µs/op)
[Mem] SearchRelsAddBonds (10000, 100%) took 14.56s (1455.95 µs/op)
[Mem] RandomSearchItems (100000, 100%) took 65.30s (653.05 µs/op)
USERS CR    :    2500 /    4998 
ITEMS CRU   :   17500 /   14994 +  699776 / 13517 
RELS  CRU   :    2375 /   18824 / 9412 
SLOW FACTOR : 20 
CRU µs/rec  : 263.69 / 93.32 / 1282.38

$ go run memsql.go lib.go # SLOW FACTOR 5
[Mem] InsertUsersItems (10000, 100%) took 31.22s (3121.90 µs/op)
[Mem] UpdateItemsAmounts (20000, 100%) took 66.55s (3327.43 µs/op)
[Mem] RandomSearchItems (100000, 100%) took 85.13s (851.33 µs/op)
[Mem] SearchRelsAddBonds (40000, 100%) took 133.05s (3326.29 µs/op)
USERS CR    :   10000 /   19998
ITEMS CRU   :   70000 /   59994 +  699944 / 53946
RELS  CRU   :   37896 /  300783 / 150391
SLOW FACTOR : 5
CRU µs/rec  : 264.80 / 121.63 / 1059.16

$ go run memsql.go lib.go # SLOW FACTOR 1, DB SIZE: 548.2 MB
[Mem] RandomSearchItems (100000, 100%) took 88.84s (888.39 µs/op)
[Mem] UpdateItemsAmounts (100000, 100%) took 391.87s (3918.74 µs/op)
[Mem] InsertUsersItems (50000, 100%) took 482.57s (9651.42 µs/op)
[Mem] SearchRelsAddBonds (200000, 100%) took 5894.22s (29471.09 µs/op)
USERS CR    :   50000 /   99991 
ITEMS CRU   :  350000 /  299973 +  699846 / 269862 
RELS  CRU   :  946350 / 7161314 / 3580657 
SLOW FACTOR : 1
CRU µs/rec  : 358.43 / 126.94 / 1549.13

Column store tables with MemSQL 6.7.16-55671ba478:

$ go run memsql-columnstore.go lib.go # SLOW FACTOR 20
[Mem] InsertUsersItems (2500, 100%) took 6.44s (2575.26 µs/op)
[Mem] UpdateItemsAmounts (5000, 100%) took 17.51s (3502.94 µs/op)
[Mem] SearchRelsAddBonds (10000, 100%) took 18.82s (1881.71 µs/op)
[Mem] RandomSearchItems (100000, 100%) took 79.48s (794.78 µs/op)
USERS CR    :    2500 /    4997 
ITEMS CRU   :   17500 /   14991 +  699776 / 13512 
RELS  CRU   :    2375 /   18861 / 9430 
SLOW FACTOR : 20 
CRU µs/rec  : 287.74 / 113.58 / 1645.84

Next we'll try CrateDB 3.2.7, with a similar setup to PostgreSQL; the result:

go run crate.go lib.go
[Crate] SearchRelsAddBonds (10000, 100%) took 49.11s (4911.38 µs/op)
[Crate] RandomSearchItems (100000, 100%) took 101.40s (1013.95 µs/op)
[Crate] UpdateItemsAmounts (5000, 100%) took 246.42s (49283.84 µs/op)
[Crate] InsertUsersItems (2500, 100%) took 306.12s (122449.00 µs/op)
USERS CR    :    2500 /    4965 
ITEMS CRU   :   17500 /   14894 +  690161 / 14895 
RELS  CRU   :    2375 /    4336 / 2168 
SLOW FACTOR : 20 
CRU µs/rec  : 13681.45 / 146.92 / 19598.85

Next is CockroachDB 19.1, the result:

go run cockroach.go lib.go
[Cockroach] SearchRelsAddBonds (10000, 100%) took 59.25s (5925.42 µs/op)
[Cockroach] RandomSearchItems (100000, 100%) took 85.84s (858.45 µs/op)
[Cockroach] UpdateItemsAmounts (5000, 100%) took 261.43s (52285.65 µs/op
[Cockroach] InsertUsersItems (2500, 100%) took 424.66s (169864.55 µs/op)
USERS CR    :    2500 /    4988
ITEMS CRU   :   17500 /   14964 +  699331 / 14964 
RELS  CRU   :    2375 /    5761 / 2880 
SLOW FACTOR : 20 
CRU µs/rec  : 18979.28 / 122.75 / 19022.43

Next is NuoDB 3.4.1, the storage manager and transaction engine config and the benchmark result:

chown nuodb:nuodb /media/nuodb
$ nuodbmgr --broker localhost --password nuodb1pass
  start process sm archive /media/nuodb host localhost database b1 initialize true 
  start process te host localhost database b1 
    --dba-user b2 --dba-password b3
$ nuosql b1 --user b2 --password b3


$ go run nuodb.go lib.go
[Nuo] RandomSearchItems (100000, 100%) took 33.79s (337.90 µs/op)
[Nuo] SearchRelsAddBonds (10000, 100%) took 72.18s (7218.04 µs/op)
[Nuo] UpdateItemsAmounts (5000, 100%) took 117.22s (23443.65 µs/op)
[Nuo] InsertUsersItems (2500, 100%) took 144.51s (57804.21 µs/op)
USERS CR    :    2500 /    4995 
ITEMS CRU   :   17500 /   14985 +  698313 / 14985 
RELS  CRU   :    2375 /   15822 / 7911 
SLOW FACTOR : 20 
CRU µs/rec  : 6458.57 / 48.39 / 8473.22

Next is TiDB 2.1.7, the config and the result:

sudo sysctl -w net.core.somaxconn=32768
sudo sysctl -w vm.swappiness=0
sudo sysctl -w net.ipv4.tcp_syncookies=0
sudo sysctl -w fs.file-max=1000000

pd-server --name=pd1 \
                --data-dir=pd1 \
                --client-urls="http://127.0.0.1:2379" \
                --peer-urls="http://127.0.0.1:2380" \
                --initial-cluster="pd1=http://127.0.0.1:2380" \
                --log-file=pd1.log
$ tikv-server --pd-endpoints="127.0.0.1:2379" \
                --addr="127.0.0.1:20160" \
                --data-dir=tikv1 \
                --log-file=tikv1.log
$ tidb-server --store=tikv --path="127.0.0.1:2379" --log-file=tidb.log

$ go run tidb.go lib.go
[Ti] InsertUsersItems (125, 5%) took 17.59s (140738.00 µs/op)
[Ti] SearchRelsAddBonds (500, 5%) took 9.17s (18331.36 µs/op)
[Ti] RandomSearchItems (5000, 5%) took 10.82s (2163.28 µs/op)
# failed with bunch of errors on tikv, such as:
[2019/04/26 04:20:11.630 +07:00] [ERROR] [endpoint.rs:452] [error-response] [err="locked LockInfo { primary_lock: [116, 128, 0, 0, 0, 0, 0, 0, 50, 95, 114, 128, 0, 0, 0, 0, 0, 0, 96], lock_version: 407955626145349685, key: [116, 128, 0, 0, 0, 0, 0, 0, 50, 95, 114, 128, 0, 0, 0, 0, 0, 0, 96], lock_ttl: 3000, unknown_fields: UnknownFields { fields: None }, cached_size: CachedSize { size: 0 } }"]

Next is YugaByte 1.2.5.0, the result:

export YB_PG_FALLBACK_SYSTEM_USER_NAME=user1
./bin/yb-ctl --data_dir=/media/yuga create
# edit yb-ctl set use_cassandra_authentication = True
./bin/yb-ctl --data_dir=/media/yuga start
./bin/cqlsh -u cassandra -p cassandra
./bin/psql -h 127.0.0.1 -p 5433 -U postgres
CREATE DATABASE b1;
GRANT ALL ON b1 TO postgres;

$ go run yuga.go lib.go
[Yuga] InsertUsersItems (2500, 100%) took 116.42s (46568.71 µs/op)
[Yuga] UpdateItemsAmounts (5000, 100%) took 173.10s (34620.48 µs/op)
[Yuga] RandomSearchItems (100000, 100%) took 350.04s (3500.43 µs/op)
[Yuga] SearchRelsAddBonds (10000, 100%) took 615.17s (61516.91 µs/op)
USERS CR    :    2500 /    4999 
ITEMS CRU   :   17500 /   14997 +  699587 / 14997 
RELS  CRU   :    2375 /   18713 / 9356 
SLOW FACTOR : 20 
CRU µs/rec  : 5203.21 / 500.36 / 38646.88

Next is ScyllaDB 3.0.8, the result:

$ cqlsh
CREATE KEYSPACE b1 WITH replication = {'class': 'SimpleStrategy', 'replication_factor' : 1};
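The Go client for Cassandra-compatible databases is gocql; a minimal sketch of how the benchmark could talk to ScyllaDB looks like this (assuming the github.com/gocql/gocql driver and a hypothetical users(id bigint PRIMARY KEY, uniq_str text) table, not the exact scylla.go code):

package main

import (
	"log"

	"github.com/gocql/gocql"
)

func main() {
	cluster := gocql.NewCluster(`127.0.0.1`)
	cluster.Keyspace = `b1`
	session, err := cluster.CreateSession()
	if err != nil {
		log.Fatal(err)
	}
	defer session.Close()

	// insert then read back one user
	if err := session.Query(`INSERT INTO users (id, uniq_str) VALUES (?, ?)`, int64(1), `u-1`).Exec(); err != nil {
		log.Fatal(err)
	}
	var uniq string
	iter := session.Query(`SELECT uniq_str FROM users WHERE id = ?`, int64(1)).Iter()
	for iter.Scan(&uniq) {
		log.Println(uniq)
	}
	if err := iter.Close(); err != nil {
		log.Fatal(err)
	}
}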

$ go run scylla.go lib.go
[Scylla] InsertUsersItems (2500, 100%) took 10.92s (4367.99 µs/op)
[Scylla] UpdateItemsAmounts (5000, 100%) took 26.85s (5369.63 µs/op)
[Scylla] SearchRelsAddBonds (10000, 100%) took 28.70s (2870.26 µs/op)
[Scylla] RandomSearchItems (100000, 100%) took 49.74s (497.41 µs/op)
USERS CR    :    2500 /    5000 
ITEMS CRU   :   17500 /   14997 +  699727 / 15000 
RELS  CRU   :    2375 /    9198 / 9198 
SLOW FACTOR : 20 
CRU µs/rec  : 488.04 / 71.09 / 2455.20

Next is Clickhouse 19.7.3.9 with batch INSERT, the result:

$ go run clickhouse.go lib.go
[Click] InsertUsersItems (2500, 100%) took 13.54s (5415.17 µs/op)
[Click] RandomSearchItems (100000, 100%) took 224.58s (2245.81 µs/op)
[Click] SearchRelsAddBonds (10000, 100%) took 421.16s (42115.93 µs/op)
[Click] UpdateItemsAmounts (5000, 100%) took 581.63s (116325.46 µs/op)
USERS CR    :    2500 /    4999 
ITEMS CRU   :   17500 /   14997 +  699748 / 15000 
RELS  CRU   :    2375 /   19052 / 9526 
SLOW FACTOR : 20 
CRU µs/rec  : 605.05 / 320.95 / 41493.35
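Batching on Clickhouse's Go driver is done through a transaction plus a prepared statement: rows are buffered client-side and sent as one block on Commit. A sketch, assuming the github.com/ClickHouse/clickhouse-go database/sql driver (not the exact clickhouse.go code):

package main

import (
	"database/sql"
	"log"

	_ "github.com/ClickHouse/clickhouse-go" // registers the "clickhouse" driver
)

// insertItemsBatch buffers one batch of (fk_id, typ, amount) rows and flushes it on Commit.
func insertItemsBatch(db *sql.DB, items [][3]int64) error {
	tx, err := db.Begin()
	if err != nil {
		return err
	}
	stmt, err := tx.Prepare(`INSERT INTO items (fk_id, typ, amount) VALUES (?, ?, ?)`)
	if err != nil {
		return err
	}
	defer stmt.Close()
	for _, it := range items {
		if _, err := stmt.Exec(it[0], it[1], it[2]); err != nil {
			return err
		}
	}
	return tx.Commit() // the whole batch goes over the wire here
}

func main() {
	db, err := sql.Open(`clickhouse`, `tcp://127.0.0.1:9000`)
	if err != nil {
		log.Fatal(err)
	}
	if err := insertItemsBatch(db, [][3]int64{{1, 2, 3}, {4, 5, 6}}); err != nil {
		log.Fatal(err)
	}
}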

When INSERT is not batched on Clickhouse 19.7.3.9:

$ go run clickhouse-1insertPreTransaction.go lib.go
[Click] InsertUsersItems (2500, 100%) took 110.78s (44312.56 µs/op)
[Click] RandomSearchItems (100000, 100%) took 306.10s (3060.95 µs/op)
[Click] SearchRelsAddBonds (10000, 100%) took 534.91s (53491.35 µs/op)
[Click] UpdateItemsAmounts (5000, 100%) took 710.39s (142078.55 µs/op)
USERS CR    :    2500 /    4999 
ITEMS CRU   :   17500 /   14997 +  699615 / 15000 
RELS  CRU   :    2375 /   18811 / 9405 
SLOW FACTOR : 20 
CRU µs/rec  : 4951.12 / 437.52 / 52117.48

These benchmarks were performed on an i7-4720HQ with 32GB RAM and an SSD disk. There's a lot that I want to add to this benchmark (maybe someday) to make it huge '__'), such as:
  • DGraph, a graph database written in Go, the backup is local (same as MemSQL, so you cannot do something like this: ssh foo@bar "pg_dump | xz - -c" | pv -r -b > /tmp/backup_`date +%Y%m%d_%H%M%S`.sql.xz)
  • Cayley, a graph layer written in Go, can support many backend storage
  • ArangoDB, multi-model database, with built-in Foxx Framework for creating REST APIs, has unfamiliar AQL syntax
  • MongoDB, one of the most popular open source document databases, for the sake of comparison; I don't prefer this one because of its memory usage.
  • InfluxDB or TimeScaleDB or SiriDB or GridDB for comparison with Clickhouse
  • Redis or SSDB or LedisDB or Codis or Olric or SummitDB, obviously for the sake of comparison. Also Cete, distributed key-value but instead using memcache protocol this one uses gRPC and REST
  • Tarantool, a Redis competitor with ArangoDB-like features but with Lua instead of JS; I want to see if this is simpler to use but with nearly equal performance to Aerospike
  • Aerospike, the fastest distributed kv-store I've ever tested, just for the sake of comparison; the free version is limited to 2 namespaces with 4 billion objects. Too bad this one cannot be started on an OpenVZ-based VM.
  • Couchbase, document oriented database that support SQL-like syntax (N1QL), the free for production one is few months behind the enterprise edition. Community edition cannot create index (always error 5000?).
  • GridDB, in-memory database from Toshiba, benchmarked to be superior to Cassandra
  • ClustrixDB (New name: MariaDB XPand), distributed columnstore version of MariaDB, community version does not support automatic failover and non-blocking backup
  • Altibase, open source in-memory database promoted to be Oracle-compatible, not sure what's the limitation of the open source version.
  • RedisGraph, fastest in-memory graph database, community edition available.
Skipped databases:
  • RethinkDB, document-oriented database, last ubuntu package cannot be installed, probably because the project no longer maintained
  • OrientDB, multi-model (document and graph) database, their screenshots look interesting, but too bad both Golang drivers are unmaintained and probably unusable for the latest version (3.x)
  • TiDB, a work in progress approach of CockroachDB but with MySQL-compatible connector, as seen from benchmark above, there's a lot of errors happening
  • RQLite, a distributed SQLite; the Go driver is not threadsafe by default
  • VoltDB, seems not free, since the website shows "free evaluation"
  • HyperDex, have good benchmark on paper, but no longer maintained
  • LMDB-memcachedb, faster version of memcachedb, a distributed kv, but no longer maintained
  • FoundationDB, a multi-model database, built from kv-database with additional layers for other models, seems to have complicated APIs
  • TigerGraph, fastest distributed graph database, developer edition free but may not be used for production
For now, I have found what I need, so I'll probably add the rest later. The code for this benchmark can be found here: https://github.com/kokizzu/hugedbbench (send a pull request and I'll run it and update this post) and the spreadsheet here: http://tiny.cc/hugedb

The chart (lower is better) is shown below:


Another 2018 benchmark is here (tl;dr: CockroachDB mostly has higher throughput, YugabyteDB the lowest latency, and TiDB the lowest performance among those 3).

2017-05-26

GotRo Framework Tutorial: Go, Redis and PostgreSQL

Gotro is an opinionated collection of libraries and a framework for Go; it's a rewrite of the Gokil framework, built specifically for Go+Redis+PostgreSQL web application development. Previously Gokil was written using the well-known julienschmidt's httprouter, the fastest at that time (2014), but a year later (2015) it was forked as buaazp's fasthttprouter, built on top of the faster valyala's fasthttp. In this tutorial, we will learn how to use this framework with Redis and PostgreSQL to create SoHo or even medium-enterprise-class web projects.

Each folder/package in the repository uses a 1-letter name and contains specific commonly used functions:

  • A - Array
  • B - Boolean
  • C - Character (or Rune)
  • D - Database
  • F - Floating Point
  • L - Logging
  • M - Map
  • I - Integer
  • S - String
  • T - Time (and Date)
  • W - Web (the "framework") -- deprecated, use W2 instead
  • X - Anything (aka interface{})
  • Z - Z-Template Engine, which has syntax similar to Ruby string interpolation #{foo}, with additional JavaScript-friendly syntaxes {/* foo */}, [/* bar */], and /*! bar */
To use these libraries, run this command on the console:

go get -u -v github.com/kokizzu/gotro

To start a new project, copy the W/example-simplified directory to your $GOPATH/src; that's the base of your project, and it should contain something like this:

├── public
│   └── lib
│       └── jquery.js
├── start_dev.sh
├── server.go
└── views
    ├── error.html
    ├── layout.html
    ├── login_example.html
    ├── named_params_example.html
    ├── post_values_example.html
    └── query_string_example.html

To start the development server, run ./start_dev.sh; it should show something like this:

set ownership of $GOROOT..
remove $GOPATH/pkg if go upgraded/downgraded..
precompile all dependencies..
hello1
starting gin..
[gin] listening on port 3000
2017-05-26 10:55:59.835 StartServer ▶ Gotro Example [DEVELOPMENT] server with 6 route(s) on :3001
  Work Directory: /home/asd/go/src/hello1/

If you get an error, it's probably because you haven't installed Redis, which is used in this example to store sessions; to install it on Ubuntu, you can type:

sudo apt-get install redis-server
sudo systemctl enable redis-server
sudo systemctl start redis-server

To see the example, open your browser at http://localhost:3000.
Port 3000 is the proxy port for the gin program that auto-recompiles when the source code changes; the server itself listens on port 3001. If you change it in the source code, you must also change the gin target port in the start_dev.sh file, by replacing -a 3001 and -p 3000.

Next, let's look at the example in the server.go file:

redis_conn := Rd.NewRedisSession(``, ``, 9, `session::`)
global_conn := Rd.NewRedisSession(``, ``, 10, `session::`)
W.InitSession(`Aaa`, 2*24*time.Hour, 1*24*time.Hour, *redis_conn, *global_conn)
W.Mailers = map[string]*W.SmtpConfig{
``: {
Name:     `Mailer Daemon`,
Username: `test.test`,
Password: `123456`,
Hostname: `smtp.gmail.com`,
Port:     587,
},
}
W.Assets = ASSETS
W.Webmasters = WEBMASTER_EMAILS
W.Routes = ROUTERS
W.Filters = []W.Action{AuthFilter}
// web engine
server := W.NewEngine(DEBUG_MODE, false, PROJECT_NAME+VERSION, ROOT_DIR)
server.StartServer(LISTEN_ADDR)

There are 2 Redis connections, one for storing the local session and one for storing the global session (used for cross-app communication).
You must call W.InitSession to tell the framework the name of the cookie, the default expiration (how long until a cookie expires), and how often it should be renewed. On the next line we set the mailer W.Mailers, the connection used to send email if there is a panic or any other critical error within your web server.
W.Assets is the asset list; it should contain any CSS or JavaScript that will be included on every page, and the assets should be saved in the public/css/ or public/js/ directory. This is an example of how to fill them:

var ASSETS = [][2]string{
//// http://api.jquery.com/ 1.11.1
{`js`, `jquery`},
////// http://hayageek.com/docs/jquery-upload-file.php
{`css`, `uploadfile`},
{`js`, `jquery.form`},
{`js`, `jquery.uploadfile`},
//// https://vuejs.org/v2/guide/ 2.0
{`js`, `vue`},
//// http://momentjs.com/ 2.17.1
{`js`, `moment`},
//// github.com/kokizzu/semantic-ui-daterangepicker
{`css`, `daterangepicker`},
{`js`, `daterangepicker`},
//// http://semantic-ui.com 2.2 // should be below `js` and `css` items
{`/css`, `semantic/semantic`},
{`/js`, `semantic/semantic`},
//// global, helpers, project specific
{`/css`, `global`},
{`/js`, `global`},
}

If you start the file's type with a slash, the file will be located by absolute path starting from public/. Currently only js and css files are supported.

Next we must set W.Webmasters, the hardcoded superadmins, the ones that will receive the error emails; they can be checked through ctx.IsWebMaster(), which matches the ctx.Session.GetStr(`email`) value against those values.

In the next initialization phase, you must set the routes W.Routes, which assign a URL path to a handler function, for example:

var ROUTERS = map[string]W.Action{
``:                            LoginExample,
`login_example`:               LoginExample,
`post_values_example`:         PostValuesExample,
`named_params_example/:test1`: NamedParamsExample,
`query_string_example`:        QueryStringExample,
}

In this example, there are five routes with four different handler functions (you can put them in a package; normally you separate them into different packages based on access level). On the fourth route we capture :test1 as a string, which can be anything and can be retrieved by calling ctx.ParamStr(`test1`). Here's an example of how to separate handlers based on the first segment:

`accounting/acct_payments`:            fAccounting.AcctPayments,
`accounting/acct_invoices`:            fAccounting.AcctInvoices,
`employee/attendance_list`:            fEmployee.AttendanceList,
`employee/business_trip`:              fEmployee.BusinessTrip,
`human_resource/business_trip`:        fHumanResource.BusinessTrip,
`human_resource/employee/profile/:id`: fHumanResource.EmployeeProfileEdit,
`human_resource/employees`:            fHumanResource.Employees,

A handler function should have exactly one parameter with type *W.Context, for example:

func PostValuesExample(ctx *W.Context) {
if ctx.IsAjax() {
ajax := AjaxResponse()
value := ctx.Posts().GetStr(`test2`)
ajax.Set(`test3`, value)
ctx.AppendJson(ajax.SX)
return
}
ctx.Render(`view1`, M.SX{ // <-- locals of the view
`title`: `Post example`,
`map`: M.SI{`test1`:1,`test4`:4},
`arr`: []int{1,2,3,4},
})
}

In the function above, we check whether the request was sent via AJAX (e.g. a POST request); if it is, we respond with JSON. The request looks something like this if using jQuery:

var data = {test2: 'foo'};
$.post('', data, function(res) {
alert("Value: " + res.test3);
}).fail(function(xhr, textStatus, errorThrown ) {
alert(textStatus + '\n' + xhr.status);
});

In the JavaScript snippet above, we send an AJAX HTTP POST to the current page, sending a value with key test2 filled with the string foo. The server captures it and sends it back to the client as an object with key test3; note that anything you put in it will be converted to JSON. The JavaScript then retrieves that value through the callback (third line of the JavaScript snippet).

But if the client's request is not a POST, the server will call ctx.Render, which loads the file view1.html from the views/ directory; if you need to pass anything to that view, put it in an M.SX, which is a map with string keys and any value type, and note that everything you put in this map will be rendered as JSON. But what's the syntax? This template engine is called the Z-Template engine, designed for simplicity and compatibility with JavaScript syntax; unlike other template engines, its syntax does not interfere with a JavaScript IDE's autocomplete feature. Here's an example of rendering the values above:

<h1>#{title}</h1>
<h2>#{something that not exists}</h2>
<script>
  var title = '#{title}'; // 'Post example'
  var a_map = {/* map */}; // {"test1":1,"test4":4}
  var an_arr = [/* arr */]; // [1,2,3,4]
</script>

Unlike other template engines, any value given to the Render method that is not used will show a warning, and any key used in the template that is not provided to the Render call will render the key itself (e.g. something that not exists).

Wait, in PHP you can retrieve a query parameter using the $_GET variable; how do you do that in this framework?

// this is Go
ctx.QueryParams().GetInt(`theKey`) // equal to $_GET['theKey']

Now back to the handler function: the ctx parameter can be used to control the output. Normally when you call the Render method, it also wraps the rendered view with views/layout.html, but if you don't want that, you can do this:

ctx.NoLayout = true
ctx.Buffer.Reset() // to clear rendered things if you already call Render method
ctx.Title = `something` // to set the title, if you use the layout

The layout view has some provided values (locals): title, project_name, assets (the js and css you listed in the assets), is_superadmin (whether the currently logged-in person is a webmaster), and debug_mode (always true if you didn't set the VERSION variable at compile time).

You can see the other available methods and properties by control-clicking the W.Context type in your IDE (Gogland, Wide, Visual Studio Code, etc).

Now, how do we connect to the database? First you must install the database, for example PostgreSQL 9.6 on Ubuntu:

sudo apt-get install postgresql
sudo systemctl enable postgresql
hba=/etc/postgresql/9.6/main/pg_hba.conf
sudo sed -i 's|local   all             all                                     peer|local all all trust|g' $hba
sudo sed -i 's|host    all             all             127.0.0.1/32            md5|host all all 127.0.0.1/32 trust|g' $hba
sudo sed -i 's|host    all             all             ::1/128                 md5|host all all ::1/128 trust|g' $hba
echo 'local all test1 trust' | sudo tee -a $hba # if needed
sudo systemctl start postgresql 
sudo su - postgres <<EOF
createuser test1
createdb test1
psql -c 'GRANT ALL PRIVILEGES ON DATABASE test1 TO test1;'

EOF

After checking that your database was created correctly, create a directory, for example model/, then create a file inside it, for example conn.go, with this content:

package model
import (
"github.com/kokizzu/gotro/D/Pg"
_ "github.com/lib/pq"
)
var PG_W, PG_R *Pg.RDBMS
func init() {
PG_W = Pg.NewConn(`test1`, `test1`) 
        // ^ later when scaling we replace this one
PG_R = Pg.NewConn(`test1`, `test1`)
}

In the code above we create 2 connections, a writer and a reader; this is the recommended way to scale reads across multiple servers (if you need a faster writer, one that doesn't support JOINs, you can use ScyllaDB or Redis). Next, we create a program to initialize our tables, for example in go/init.go:

package main
import "hello1/model"
func main() {
  model.PG_W.CreateBaseTable(`users`, `users`)
  model.PG_W.CreateBaseTable(`todos`, `users`) // 2nd table
}

You must execute gotro/D/Pg/functions.sql using psql before running the code above; it will create the 2 tables with their 2 log tables, triggers, and some indexes. You can check them inside psql -U test1 using the \dt+ or \d users command, which will show something like this:

                          Table "public.users" 
  Column    |           Type           |                     Modifiers 
------------+--------------------------+-----------------------------------------
id          | bigint                   | not null default nextval('users_id_seq'::regclass) 
unique_id   | character varying(4096)  | 
created_at  | timestamp with time zone | default now() 
updated_at  | timestamp with time zone | 
deleted_at  | timestamp with time zone | 
restored_at | timestamp with time zone | 
modified_at | timestamp with time zone | default now() 
created_by  | bigint                   | 
updated_by  | bigint                   | 
deleted_by  | bigint                   | 
restored_by | bigint                   | 
is_deleted  | boolean                  | default false 
data        | jsonb                    |

This is our generic table. What if we need more columns? You don't need to alter the table; we use PostgreSQL's JSONB data column. JSONB is very powerful: it can be indexed, queried using the arrow operators, and is better than its competitors. Using this exact table design, we can store the old and updated values in the log table every time somebody changes a value.
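For example, querying the data column directly looks like this (a hypothetical sketch using plain database/sql and a made-up email key; the framework's D/Pg helpers generate similar queries for you):

package model

import "database/sql"

// FullNameByEmail is a hypothetical example of querying the JSONB data column
// directly with the ->> (get as text) operator.
func FullNameByEmail(db *sql.DB, email string) (string, error) {
	var fullName string
	err := db.QueryRow(
		`SELECT data->>'full_name' FROM users WHERE data->>'email' = $1 AND is_deleted = false LIMIT 1`,
		email).Scan(&fullName)
	return fullName, err
}

// To keep lookups like this fast, you can add a GIN or expression index, e.g.:
//   CREATE INDEX users_data_gin ON users USING gin (data);
//   CREATE INDEX users_data_email_idx ON users ((data->>'email'));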

OK, now let's create a real model for the users table; create a package and a file mUsers/m_users.go with this content:

package mUsers
import (
 "Billions/sql"
 "github.com/kokizzu/gotro/A"
 "github.com/kokizzu/gotro/D/Pg"
 "github.com/kokizzu/gotro/I"
 "github.com/kokizzu/gotro/M"
 "github.com/kokizzu/gotro/S"
 "github.com/kokizzu/gotro/T"
 "github.com/kokizzu/gotro/W"
)
const TABLE = `users`
var TM_MASTER Pg.TableModel
var SELECT = ``
var Z func(string) string
var ZZ func(string) string
var ZJ func(string) string
var ZB func(bool) string
var ZI func(int64) string
var ZLIKE func(string) string
var ZT func(...string) string
var PG_W, PG_R *Pg.RDBMS
func init() {
 Z = S.Z
 ZB = S.ZB
 ZZ = S.ZZ
 ZJ = S.ZJ
 ZI = S.ZI
 ZLIKE = S.ZLIKE
 ZT = S.ZT
 PG_W = sql.PG_W
 PG_R = sql.PG_R
 TM_MASTER = Pg.TableModel{
  CacheName: TABLE + `_USERS_MASTER`,
  Fields: []Pg.FieldModel{
   {Key: `id`},
   {Key: `is_deleted`},
   {Key: `modified_at`},
   {Label: `E-Mail(s)`, Key: `emails`, CustomQuery: `emails_join(data)`, Type: `emails`, FormTooltip: `separate with comma`},
   {Label: `Phone`, Key: `phone`, Type: `phone`, FormHide: true},
   {Label: `Full Name`, Key: `full_name`},
  },
 }
 SELECT = TM_MASTER.Select()
}
func One_ByID(id string) M.SX {
 ram_key := ZT(id)
 query := ram_key + `
SELECT ` + SELECT + `
FROM ` + TABLE + ` x1
WHERE x1.id::TEXT = ` + Z(id)
 return PG_R.CQFirstMap(TABLE, ram_key, query)
}
func Search_ByQueryParams(qp *Pg.QueryParams) {
 qp.RamKey = ZT(qp.Term)
 if qp.Term != `` {
  qp.Where += ` AND (x1.data->>'name') LIKE ` + ZLIKE(qp.Term)
 }
 qp.From = `FROM ` + TABLE + ` x1`
 qp.OrderBy = `x1.id`
 qp.Select = SELECT
 qp.SearchQuery_ByConn(PG_W)

}
/* accessed through: {"order":["-col1","+col2"],"filter":{"is_deleted":false,"created_at":">isodate"},"limit":10,"offset":5}
this will retrieve record 6-15 order by col1 descending, col2 ascending, filtered by is_deleted=false and created_at > isodate
*/

If the example above is too complex for you, you can also do it manually; see gotro/D/Pg/_example for a simpler example. In the example above we created a query model that queries a single table. If you need multiple tables (a join), you can extend the fields, something like this:

 {Label: `Admin`, Key: `admin`, CustomQuery: `x2.data->>'full_name'`},

And the query params something like this:

qp.From = `FROM ` + TABLE + ` x1 LEFT JOIN ` + mAdmin.TABLE + ` x2 ON (x1.data->>'admin_id') = x2.id::TEXT `

You can also do something like this:

func All_ByStartID_ByLimit_IsAsc_IsIncl(id string, limit int64, is_asc, is_incl bool) A.MSX {
 sign := S.IfElse(is_asc, `>`, `<`) + S.If(is_incl, `=`)
 ram_key := ZT(id, I.ToS(limit), sign)
 where := ``
 if id != `` {
  where = `AND x1.id ` + sign + Z(id)
 }
 query := ram_key + `
SELECT ` + SELECT + `
FROM ` + TABLE + ` x1
WHERE x1.is_deleted = false
 ` + where + `
ORDER BY x1.id ` + S.If(!is_asc, `DESC`) + `
LIMIT ` + I.ToS(limit)
 return PG_R.CQMapArray(TABLE, ram_key, query)
}
// accessed through: {"limit":10}
// this will retrieve the last 10 records

Or query a single row:

func API_Backoffice_Form(rm *W.RequestModel) {
 rm.Ajax.SX = One_ByID(rm.Id)
}
// accessed through: {a:'form',id:'123'}
// this will retrieve all columns of this record

Or create a save/delete/restore function:

func API_Backoffice_SaveDeleteRestore(rm *W.RequestModel) {
 PG_W.DoTransaction(func(tx *Pg.Tx) string {
  dm := Pg.NewRow(tx, TABLE, rm) // NewPostlessData
  emails := rm.Posts.GetStr(`emails`) // rm is the requestModel, values provided by http req
  dm.Set_UserEmails(emails) // dm is the dataModel, row we want to update
  // we can call dm.Get* to retrieve old record values
  dm.SetStr(`full_name`)
  dm.UpsertRow()
  if !rm.Ajax.HasError() {
   dm.WipeUnwipe(rm.Action)
  }
  return rm.Ajax.LastError()
 })
}
// accessed through: {a:'save',full_name:'foo',id:'1'} // update
// if without id, it would insert

Then you can call them on a handler or package-internal function, something like:

func API_Backoffice_FormLimit(rm *W.RequestModel) {
 id := rm.Posts.GetStr(`id`)
 limit := rm.Posts.GetInt(`limit`)
 is_asc := rm.Posts.GetBool(`asc`)
 is_incl := rm.Posts.GetBool(`incl`)
 result := All_ByStartID_ByLimit_IsAsc_IsIncl(id, limit, is_asc, is_incl)
 rm.Ajax.Set(`result`, result)
}

func API_Backoffice_Search(rm *W.RequestModel) {
 qp := Pg.NewQueryParams(rm.Posts, &TM_MASTER)
 Search_ByQueryParams(qp)
 qp.ToMap(rm.Ajax)
}

And call those two API functions inside a handler, something like this:

func PrepareVars(ctx *W.Context, title string) (rm *W.RequestModel) {
 user_id := ctx.Session.GetStr(`id`)
 rm = &W.RequestModel{
  Actor:   user_id,
  DbActor: user_id,
  Level:   ctx.Session.SX,
  Ctx:     ctx,
 }
 ctx.Title = title
 is_ajax := ctx.IsAjax()
 if is_ajax {
  rm.Ajax = NewAjaxResponse()
 }
 page := rm.Level.GetMSB(`page`)
 first_segment := ctx.FirstPath()
 // validate if this user may access this first segment,
 // check their access level, if it's not ok, set rm.Ok to false,
 // then render an error, something like this:
 /*
  if is_ajax {
   rm.Ajax.Error(sql.ERR_403_MUST_LOGIN_HIGHER)
   ctx.AppendJson(rm.Ajax.SX)
   return
  }
  ctx.Error(403, sql.ERR_403_MUST_LOGIN_HIGHER)
  return
 */
 if !is_ajax {
  // render menu based on privilege
 } else {
  // prepare variables required for ajax response
  rm.Posts = ctx.Posts()
  rm.Action = rm.Posts.GetStr(`a`)
  id := rm.Posts.GetStr(`id`)
  rm.Id = S.IfElse(id == `0`, ``, id)
 }
 return rm
}

func Users(ctx *W.Context) {
 rm := PrepareVars(ctx, `Users`)
 if !rm.Ok {
  return
 }
 if rm.IsAjax() {
  // handle ajax
  switch rm.Action {
  case `search`: // @API
   mUsers.API_Backoffice_Search(rm)
  case `form_limit`: // @API
   mUsers.API_Backoffice_FormLimit(rm)
  case `form`: // @API
   mUsers.API_Backoffice_Form(rm)
  case `save`, `delete`, `restore`: // @API
   mUsers.API_Backoffice_SaveDeleteRestore(rm)
  default: // @API-END
   handler.ErrorHandler(rm.Ajax, rm.Action)
  }
  ctx.AppendJson(rm.Ajax.SX)
  return
 }
 locals := W.Ajax{SX: M.SX{
  `title`: ctx.Title,
 }}
 qp := Pg.NewQueryParams(nil, &mUsers.TM_MASTER)
 mUsers.Search_ByQueryParams(qp)
 qp.ToMap(locals)
 ctx.Render(`backoffice/users`, locals.SX)
}

Now that we're done creating the backend API server, all that's left is to create the systemd service hello1.service:

[Unit]
Description=My Hello1 Service
After=network-online.target postgresql.service
Wants=network-online.target systemd-networkd-wait-online.service

[Service]
Type=simple
Restart=on-failure
User=yourusername
Group=users
WorkingDirectory=/home/yourusername/web
ExecStart=/home/yourusername/web/run_production.sh
ExecStop=/usr/bin/killall Hello1
LimitNOFILE=2097152
LimitNPROC=65536
ProtectSystem=full
NoNewPrivileges=true

[Install]
WantedBy=multi-user.target

Create the run_production.sh shell script

#!/usr/bin/env bash
ofile=logs/access_`date +%F_%H%M%S`.log
echo Logging into: `pwd`/$ofile
unbuffer time ./Hello1 | tee $ofile

Then compile the binary (you can also set the VERSION here, to make it a production build):

go build -ldflags " -X main.LISTEN_ADDR=:${SUB_PORT} " -o /tmp/Subscriber
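For the -X flag to take effect, the variable must be a package-level string variable in the main package, for example (a sketch; the variable names follow the ones used above):

package main

// overridable at build time, e.g.:
//   go build -ldflags "-X main.LISTEN_ADDR=:9090 -X main.VERSION=1.0.0"
var (
	LISTEN_ADDR = `:3001`       // default when not overridden
	VERSION     = `development` // setting this at build time marks the build as production
)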

Copy the binary, the script above, and the whole public/ and views/ directories to /home/yourusername/web on the server, copy the service file to /usr/lib/systemd/system/, then reload systemd on the server:

sudo systemctl daemon-reload
sudo systemctl enable hello1
sudo systemctl start hello1

You're good to go; you can check the service status using journalctl -f -u hello1.
Of course, you can automate the hassle above using the scp or rsync commands.

Well, that's all for now. You can see the complete example in the W/example-complex directory; if you have any questions, you can contact me through Telegram @kokizzu. For frontend stuff, I recommend learning VueJS, or Weex for mobile.