2023-04-26

GeoSearch Database Benchmark

So today we're gonna benchmark database that can store latitude-longitude (GPS coordinate), the benchmark spec is unbatched INSERT 100K records of (id, lat, long) tuple, search 200K times (or until deadline reached) 500 nearest point, id and the distance, move 100 points to another location 50 times, all benchmark done in 16 threads (so other 16 threads can be used by the database).

The contender that already benchmarked are:

  1. Redis GEOSEARCH (2024: also add KeyDB 4 core)
  2. PostgreSQL cube/earthdistance
  3. Tarantool RTREE
  4. TypeSense geosearch
  5. MeiliSearch _geo

Other database attempted but failed because not truly redis-compatible: DragonFlyDB, Garnet, KVRocks

Here's the result for unbatched insert:

REDIS         9.4 sec,  10639.7 rps
KEYDB         1.3 sec,  76546.2 rps
POSTGRES     10.5 sec,   9523.7 rps
TARANTOOL     0.8 sec, 126801.9 rps
TYPESENSE    96.7 sec,   1023.8 rps
MEILISEARCH 365.0 sec,    271.2 rps


This benchmark is totally unfair for Meilisearch since, their API expect it to be batched, just like Clickhouse.

Next the 500 nearby point of interest search benchmark:

REDIS        30676  (15.3%) in 50.0 sec,  613.5 rps
KEYDB         7381   (3.7%) in 50.0 sec,  147.6 rps
POSTGRES      7778   (3.9%) in 50.0 sec,  155.6 rps
TARANTOOL   200000 (100.0%) in 35.0 sec, 5716.3 rps
TYPESENSE     2177   (1.1%) in 50.0 sec,   43.5 rps


PostgreSQL actually quite fast in Tarantool level if the distance ordering removed (random order, only fetch first 500), I believe the search scope is too wide (4km square) causing too many points to be sorted by distance (and database config is default, not tuned at all causing this slowdown). TypeSense cannot search more than 250 rows limit, so that result is for 250, MeiliSearch in this case always return 0 result, not sure what's wrong.

Last benchmark is about moving 10 points 50 times, so it need to update the index periodically, the results are:

REDIS        0.1 sec,  55088.4 rps
KEYDB        0.2 sec,  28926.9 rps
POSTGRES     0.6 sec,   7954.3 rps
TARANTOOL    0.0 sec, 137424.8 rps
TYPESENSE    9.7 sec,    515.2 rps
MEILISEARCH  8.8 sec,    569.7 rps

So I guess I'll use Tarantool for this case, since it's the fastest for geo datapoints with persistence. Other possible database to benchmarked in the future are: MySQL, CockroachDB, ElasticSearch, TiDB, CouchBase, PostGIS, but I'm not sure whether their index can beat Tarantool's. If you want to contribute to this benchmark you can create PR to hugedbbench repo.


No comments :

Post a Comment

THINK: is it True? is it Helpful? is it Inspiring? is it Necessary? is it Kind?