Programming Rants: typesense

2023-04-26

GeoSearch Database Benchmark

So today we're gonna benchmark database that can store latitude-longitude (GPS coordinate), the benchmark spec is unbatched INSERT 100K records of (id, lat, long) tuple, search 200K times (or until deadline reached) 500 nearest point, id and the distance, move 100 points to another location 50 times, all benchmark done in 16 threads (so other 16 threads can be used by the database).

The contender that already benchmarked are:

Redis GEOSEARCH (2024: also add KeyDB 4 core)
PostgreSQL cube/earthdistance
Tarantool RTREE
TypeSense geosearch
MeiliSearch _geo

Other database attempted but failed because not truly redis-compatible: DragonFlyDB, Garnet, KVRocks

Here's the result for unbatched insert:

REDIS         9.4 sec, 10639.7 rps
KEYDB 1.3 sec, 76546.2 rps
POSTGRES    10.5 sec, 9523.7 rps
TARANTOOL    0.8 sec, 126801.9 rps
TYPESENSE 96.7 sec, 1023.8 rps
MEILISEARCH 365.0 sec,    271.2 rps

This benchmark is totally unfair for Meilisearch since, their API expect it to be batched, just like Clickhouse.

Next the 500 nearby point of interest search benchmark:

REDIS        30676 (15.3%) in 50.0 sec, 613.5 rps
KEYDB         7381   (3.7%) in 50.0 sec, 147.6 rps
POSTGRES      7778 (3.9%) in 50.0 sec, 155.6 rps
TARANTOOL   200000 (100.0%) in 35.0 sec, 5716.3 rps
TYPESENSE     2177 (1.1%) in 50.0 sec,   43.5 rps

PostgreSQL actually quite fast in Tarantool level if the distance ordering removed (random order, only fetch first 500), I believe the search scope is too wide (4km square) causing too many points to be sorted by distance (and database config is default, not tuned at all causing this slowdown). TypeSense cannot search more than 250 rows limit, so that result is for 250, MeiliSearch in this case always return 0 result, not sure what's wrong.

Last benchmark is about moving 10 points 50 times, so it need to update the index periodically, the results are:

REDIS        0.1 sec, 55088.4 rps
KEYDB    0.2 sec, 28926.9 rps
POSTGRES     0.6 sec,   7954.3 rps
TARANTOOL    0.0 sec, 137424.8 rps
TYPESENSE    9.7 sec,    515.2 rps
MEILISEARCH 8.8 sec,    569.7 rps

So I guess I'll use Tarantool for this case, since it's the fastest for geo datapoints with persistence. Other possible database to benchmarked in the future are: MySQL, CockroachDB, ElasticSearch, TiDB, CouchBase, PostGIS, but I'm not sure whether their index can beat Tarantool's. If you want to contribute to this benchmark you can create PR to hugedbbench repo.

2022-04-01

Georeplicable Architecture Tech Stacks

Since I got a use case for geo-replicable (multiple datacenters around the world), I need to create metrics and billing and log aggregator that survives even when multiple datacenter is down. So we need to create an architecture that the each cluster of databases in datacenter can give correct answer on their own, the layers are:

1. Cache Layer

2. Source of Immediate Truth (OLTP) Layer

3. Source of Event Truth Layer (either CDC or subscribable events or both)

4. Metrics and Analytics (OLAP) Layer

For single datacenter I might choose Tarantool or Aerospike (too bad if you want to store more than your RAM you must use paid version) or Redis (master-slave) for cache layer, especially if the DNS is smart enough to route the client's request to nearest datacenter.

Tarantool can also be source of immediate truth, but the problem is it should be master-slave replication so it could be fault tolerant, but it seems there might have to do manual intervention to fallback to promote slave node as master and reroute other slave to recognize new master?

So the other choice is either TiDB (for consistent use case, lower write rate, more complex query use case) or ScyllaDB (for partition tolerant use case, higher write rate). Both are good in terms of availability. TiDB's TiFlash also good for analytics use case.

For source of event truth, we can use RedPanda for pubsub/MQ use cases, and Debezium (that requires Kafka/RedPanda) for change data capture from ScyllaDB, or TiCDC for TiDB to stream to RedPanda.

Lastly for analytics, we can use Clickhouse, also to store all structured logs that can be easily queried, or can also be Loki. For metrics might aggregate from RedPanda using MaterializeIO (too bad that cluster is paid).

So.. what are the combinations possible?

1. Tarantool (manual counter for metrics) + Clickhouse (manual publish logs and analytics), this one good only for single location/datacenter, unless the clients are all can hit proper server location (like game servers, or smart CDNs)

2. same as #1 but with RedPanda if have multiple service, all logs and events published thru RedPanda

3. Aerospike/Tarantool/Redis (manual counter for metrics) + TiDB + TiFlash (analytics) + Loki for logs

4. Aerospike/Tarantool/Redis (manual counter for metrics) + TIDB + TiCDC + ClickHouse (for logs and analytics)

5. Aerospike/Tarantool/Redis (cache only) + TiDB + TiFlash (analytics) + TiCDC + MaterializeIO (for metrics) + Loki (logs)

6. Aerospike/Tarantool/Redis (cache only) + TiDB + TiCDC + Clickhouse (analytics and logs) + MaterializeIO (for metrics)

7. Aerospike/Tarantool/Redis (cache only) + ScyllaDB + Debezium + RedPanda + Clickhouse (analytics and logs) + MaterializeIO (for metrics)

for number #5-#7 you can remove the Aerospike/Tarantool part if no need to cache (consistency matters, and you have a very large cluster that can handle peaks).

wait, why don't you include full-text search use case? '__') ok, we can use Debezium that publish to ElasticSearch (might be overkill), or manual publish to TypeSense or MeiliSearch.

That's only for data part of the cluster, what about computation and presentation layer (backend/frontend)?

Backend of course I will use Golang or C#, frontend either Svelte (web only) or Flutter (mobile and web), Unity3D (game).

For object storage (if locally I would use MinIO) and CDN (CloudFlare for example), you can see my previous post.

So why you choose those?

1. Tarantool, one of the fastest in-mem, like 200K wps/qps but the cons I already mentioned above, the performance similar to Redis, but this one supports SQL.

2. Aerospike, also one of the fastest in-mem, like also 200K wps/qps last time I checked, also can do master-master replication, if I'm not mistaken, limited to 4 nodes for free verrsion, can set the replication factor, but see other cons I mentioned above.

3. TiDB, one newsql with automatic rebalance (one node died, and it would still works fast), can do around 30-50K single node last time I benchmarked, their benchmark mostly shows 30K-40K rps for writes, 120K rps for mixed read-write, 330K rps for read-only multi nodes benchmark, need more space? add more TiKV instance, need more compute? add more TiDB instance, need to do faster analytics queries? add TiFlash instance. But.. the bad part is, so many moving parts, you have to deploy TiDB (query engine), TiKV (storage), and PD (placement driver), also TiCDC and TiFlash if you need those too, also I'm not sure how it would perform at multi-DC use-cases. What's the managed alternative? AlloyDB-PostgreSQL (GCP) or TiDB-Cloud (AWS)

4. ScyllaDB, is faster version of Cassandra, most of benchmark can do 120K-400K insert and queries per second, one of the best database for mutli-DC, each keyspace can be set whether should replicate in which datacenter, how many the replication factor, consistency controlled in client-side, also can manage multiple view since 2 years ago (materialized view), so we don't have to create and maintain multiple table manually for each query pattern. What's the managed alternative? ScyllaDB-Cloud

5. RedPanda, is faster version of Kafka, last time I checked, one instance can receive 800K events per seconds, and publishes 4 millions events per second.

6. Clickhouse, is one of the best analytics database, can do a 1M batched ingestion per second in single node, can do complex queries really fast under a second (but depends on how big your data, but kind of query that if you did it in normal RDBMS would took minutes), the cons is one node can only handle 100 queries concurrently.

7. MaterializeIO, is like ksqldb but written in Rust, haven't checked the performance but they claimed they perform as fast as Redis.

What's other alternative? Yugabyte looks ok, especially the YCQL part that works like Cassandra/ScyllaDB. Especially yugabyte seems to be combining Redis, Postgres, and Cassandra in one deployment, but I like TiDB more because last time I checked, I need to config something to make it writable when one node died.

Proxy and load balancer? Caddy, Traefik, FabioLB (with Consul), or NATS (eg. 3 load balancer/custom api gateway that deployed in front, then it would serialize request to NATS inside DMZ, the worker/handler will receive that and return response that deserialized back to the loadbalancer/api gateway, that way load balancer doesn't need to know how exactly how many worker/handler, services can also communicate synchronously thru NATS without knowing each other service's IP address, and the worker/handler part can be scaled independently)

Deployment? Nomad, just rsync it to Jelastic, DockerCompose for dependencies (don't forget to bind the volume or your data will gone), Kubernetes

Tracker? maybe Pirsch

Why there's no __INSERT_DB_TYPE_HERE__? Graph database? because I rarely do recursive/graph queries, and especially I don't know which one that are best? Neo4J? DGraph? NebulaGraph? TigerGraph? AnzoGraph? TerminusDB? Age? JanusGraph? HugeGraph?

Have some other cool/high-performance tech stack suggestion? Chat with me at http://t.me/kokizzu (give proper intro tho, or I would think you are a crypto scam spammer XD)