2022-09-08

Getting started with InfluxDB

Today we're going to learn about InfluxDB, a time-series database that has become a standard for open-source metric storage and is part of the TICK stack: Telegraf (a metric collector like datadog-agent/fluentd), InfluxDB (the database), Chronograf (a visualizer like Grafana), and Kapacitor (an alert manager like Prometheus alertmanager, also usable for ETL). To install InfluxDB, use the steps from this link (there's also Docker). What are the cons of InfluxDB compared to other solutions? Datadog is (very, very) expensive, Prometheus seems too Kubernetes-oriented, and open-source InfluxDB doesn't support HA and clustering; that's why I would rather use ClickHouse for time series / log collection (as a data sink), aggregate those logs into metrics (materialized views), and copy them periodically to a faster database like Tarantool. InfluxDB has two query syntaxes: a SQL-like one called InfluxQL, and a JavaScript+FP-like one called Flux (it has the |> pipe like most FP languages; see the sketch after the InfluxQL examples below). Here's an example of using Docker and the query language:

docker run -d -p 8086:8086 -e INFLUXDB_ADMIN_USER=user1 -e INFLUXDB_ADMIN_PASSWORD=pass1 --name influxdb1 influxdb:1.8

docker exec -it influxdb1 influx -host 127.0.0.1 -port 8086 -username user1 -password pass1

show databases
create database db1
use db1

-- show all tables (=measurements)

show measurements

-- tags (the indexed columns) are always strings; fields can be float, int, bool, etc

insert table1,tag1=a,tag2=b field1=1,field2=2

-- if no timestamp is given, it defaults to the insert time in nanoseconds

select * from "table1"
select * from table1 where tag1='a'

-- describe columns (tag/field keys) of the measurements:

show tag keys
show field keys from "table1"

-- select distinct tag values from table1
show tag values from "table1" with key in ("tag1")

-- automatically delete after N days, default 0 = never delete
-- shard by N hours, default 168 hours (a week)
show retention policies

-- like partition in other database
show shards
show shard groups
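
If you want to hit InfluxDB from Go instead of the CLI, here's a rough sketch using the official influxdb-client-go v2 library. Treat it as an assumption-heavy example: it assumes the container's port 8086 is published, that you're on InfluxDB 1.8's compatibility API (token = "username:password", bucket = database name), and that Flux is enabled (flux-enabled = true) in the config. It also shows the Flux |> pipe syntax mentioned above.

package main

import (
	"context"
	"fmt"

	influxdb2 "github.com/influxdata/influxdb-client-go/v2"
)

func main() {
	// 1.8 compatibility mode: token is "username:password", org is ignored,
	// bucket is the database name (optionally "db/retention-policy")
	client := influxdb2.NewClient("http://127.0.0.1:8086", "user1:pass1")
	defer client.Close()

	// write one point, equivalent to the INSERT example above
	writeAPI := client.WriteAPIBlocking("", "db1")
	point := influxdb2.NewPointWithMeasurement("table1").
		AddTag("tag1", "a").
		AddTag("tag2", "b").
		AddField("field1", 1).
		AddField("field2", 2)
	if err := writeAPI.WritePoint(context.Background(), point); err != nil {
		panic(err)
	}

	// read it back with Flux, note the |> pipes
	queryAPI := client.QueryAPI("")
	result, err := queryAPI.Query(context.Background(), `
from(bucket: "db1")
  |> range(start: -1h)
  |> filter(fn: (r) => r._measurement == "table1" and r.tag1 == "a")`)
	if err != nil {
		panic(err)
	}
	for result.Next() {
		fmt.Println(result.Record().Field(), result.Record().Value())
	}
}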

I guess that's all you need to know to get started with InfluxDB. If you're looking for something more comparable to ClickHouse, you can check out TDengine or TiFlash.

2022-07-28

Techempower Framework Benchmark Round 21

The results for TechEmpower Framework Benchmark round 21 are out. As usual, the most important benchmarks are the data-update and multiple-query ones:

For data updates, the top rankers are Rust, JS (cheating), Java, C++, C#, PHP, C, Scala, Kotlin, and Go. For multiple queries:


The top rankers are Rust, Java, JS (cheating), Scala, Kotlin, C++, C#, and Go. These benchmarks show how efficient each language's database driver is (usually the biggest bottleneck) and how much overhead each framework adds (including serialization, alloc/GC, async I/O, etc).

For memory usage and CPU utilization you can check here https://ajdust.github.io/tfbvis/?testrun=Citrine_started2022-06-30_edd8ab2e-018b-4041-92ce-03e5317d35ea&testtype=update&round=false

2022-06-07

How to profile your Golang Fiber server

Usually you need to load test your web server to find memory leaks or bottlenecks, and Go already provides a tool for that, called pprof. The exact steps depend on the framework you use, but they are all similar; most frameworks already have a middleware that you can import and use. For example, Fiber has a pprof middleware; to use it:

// import
  "github.com/gofiber/fiber/v2/middleware/pprof"

// use
  app.Use(pprof.New())
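
For context, a complete minimal Fiber server with this middleware wired up could look like this (a sketch; the port and the /hello route are just assumptions for illustration):

package main

import (
	"log"

	"github.com/gofiber/fiber/v2"
	"github.com/gofiber/fiber/v2/middleware/pprof"
)

func main() {
	app := fiber.New()

	// exposes the /debug/pprof/* endpoints
	app.Use(pprof.New())

	// an example endpoint to generate some traffic against while profiling
	app.Get("/hello", func(c *fiber.Ctx) error {
		return c.SendString("hello")
	})

	log.Fatal(app.Listen(":3000"))
}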

It creates a route called /debug/pprof that you can use: just start the server, then open that path in the browser. To profile CPU or inspect the heap, you just need to click the profile/heap link; it waits for around 10 seconds, and while it waits you must hit other endpoints to generate traffic/function calls. After 10 seconds, it shows a download dialog to save your CPU or heap profile. From that file, you can run a command (much like you would with gops), for example to generate an SVG or a web view of your profile (both need Graphviz installed to render the graph):

pprof -web /tmp/profile # or
pprof -svg /tmp/profile # <-- the file that you just downloaded
go tool pprof -web /tmp/profile # also works if you don't have the standalone pprof binary

It would generate something like this:



So you can find out which function takes most of the CPU time (or, if it's a heap profile, which function allocates the most memory). In my case the bottleneck was the default built-in pretty logger: it limited the server to ~9K rps at concurrency 255 on a database-write benchmark, but if we remove the built-in logging and replace it with zerolog, it can handle ~57K rps on the same benchmark.
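
For illustration, replacing the built-in Logger middleware with a simple zerolog-based one could look roughly like this (a sketch based on my assumptions about which fields to log, not the exact middleware used in that benchmark):

package main

import (
	"log"
	"os"

	"github.com/gofiber/fiber/v2"
	"github.com/rs/zerolog"
)

func main() {
	app := fiber.New()

	// structured, low-overhead logger instead of the default pretty logger
	logger := zerolog.New(os.Stderr).With().Timestamp().Logger()

	// minimal request-logging middleware on top of zerolog
	app.Use(func(c *fiber.Ctx) error {
		err := c.Next()
		logger.Info().
			Str("method", c.Method()).
			Str("path", c.Path()).
			Int("status", c.Response().StatusCode()).
			Msg("request")
		return err
	})

	log.Fatal(app.Listen(":3000"))
}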