programming: the action or process of writing computer programs. | rants: speak or shout at length in a wild, [im]passioned way.
2023-10-14
Benchmarking docker-volume vs mount-fs vs tmpfs
version: "3.7"
services:
  web:
    image: ubuntu
    command: "sleep 3600"
    volumes:
      - ./temp1:/temp1 # mountfs
      - temp2:/temp2 # dockvol
      - temp3:/temp3 # tmpfs
volumes:
  temp2:
  temp3:
    driver_opts:
      type: tmpfs
      device: tmpfs
The docker compose file sits in a directory that is a sibling of docker's data-root, to make sure the bind mount and the docker volume use the same SSD. For the first benchmark we clone this repository, copy it, create 100 files, compile, then do 2 sequential writes (small and large). Here are the results (some steps are not pasted below, e.g. removing files when running a benchmark twice):
apt install git g++ make time
alias time='/usr/bin/time -f "\nCPU: %Us\tReal: %es\tRAM: %MKB"'
cd /temp3 # tmpfs
git clone https://github.com/nikolausmayer/file-IO-benchmark.git
### copy small files
time cp -R /temp3/file-IO-benchmark /temp2 # dockvol
CPU: 0.00s Real: 1.02s RAM: 2048KB
time cp -R /temp3/file-IO-benchmark /temp1 # mountfs
CPU: 0.00s Real: 1.00s RAM: 2048KB
### create 100 x 10MB files
cd /temp3/file*
time make data # tmpfs
CPU: 0.41s Real: 0.91s RAM: 3072KB
cd /temp2/file*
time make data # dockvol
CPU: 0.44s Real: 1.94s RAM: 2816KB
cd /temp1/file*
time make data # mountfs
CPU: 0.51s Real: 1.83s RAM: 2816KB
### compile
cd /temp3/file*
time make # tmpfs
CPU: 2.93s Real: 3.23s RAM: 236640KB
cd /temp2/file*
time make # dockvol
CPU: 2.94s Real: 3.22s RAM: 236584KB
cd /temp1/file*
time make # mountfs
CPU: 2.89s Real: 3.13s RAM: 236300KB
### sequential small
cd /temp3 # tmpfs
time dd if=/dev/zero of=./test.img count=10 bs=200M
2097152000 bytes (2.1 GB, 2.0 GiB) copied, 0.910784 s, 2.3 GB/s
cd /temp2 # dockvol
time dd if=/dev/zero of=./test.img count=10 bs=200M
2097152000 bytes (2.1 GB, 2.0 GiB) copied, 2.26261 s, 927 MB/s
cd /temp1 # mountfs
time dd if=/dev/zero of=./test.img count=10 bs=200M
2097152000 bytes (2.1 GB, 2.0 GiB) copied, 2.46954 s, 849 MB/s
### sequential large
cd /temp3 # tmpfs
time dd if=/dev/zero of=./test.img count=10 bs=1G
10737418240 bytes (11 GB, 10 GiB) copied, 4.95956 s, 2.2 GB/s
cd /temp2 # dockvol
time dd if=/dev/zero of=./test.img count=10 bs=1G
10737418240 bytes (11 GB, 10 GiB) copied, 81.8511 s, 131 MB/s
10737418240 bytes (11 GB, 10 GiB) copied, 44.2367 s, 243 MB/s
# ^ running twice because I'm not sure why it's so slow
cd /temp1 # mountfs
time dd if=/dev/zero of=./test.img count=10 bs=1G
10737418240 bytes (11 GB, 10 GiB) copied, 12.7516 s, 842 MB/s
The conclusion: a docker volume is a bit faster (about +9%) for the small sequential write, but significantly slower (about -71% to -84%) for the large sequential write compared to bind/mount-fs; for the other cases there seems to be no noticeable difference. I always prefer bind/mount-fs over docker volumes because of safety: for example, if you accidentally run docker volume rm $(docker volume ls -q), it deletes all your docker volumes (I did this multiple times on my own dev PC). You can also easily backup/rsync/copy/manage files when using bind/mount-fs. For cases where you don't care about losing files and need high performance (as long as you have enough RAM), just use tmpfs.
2023-10-07
NATS: at-most once Queue / simpler networking
As you might already know from the past article, I'm a huge fan of NATS. NATS is one of the fastest non-persistent message brokers, with at-most-once delivery. Unlike rabbitmq/lavinmq/kafka/redpanda, NATS is not a persistent queue; the persistent version is called NATS-Jetstream (but it's better to use rabbitmq/lavinmq/kafka/redpanda since they are more popular and have more integrations than NATS-Jetstream, or use a cloud provider's service like GooglePubSub/AmazonSQS).
Features of NATS:
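1. embeddable: you can embed NATS directly in a Go application, so you don't have to run a dedicated instance
// imports: "log", "time", "github.com/nats-io/nats-server/v2/server", "github.com/nats-io/nats.go"
opts := &server.Options{} // assumption: default options; set Host/Port as needed
ns, err := server.NewServer(opts)
if err != nil {
	log.Fatal(err)
}
go ns.Start()
if !ns.ReadyForConnections(4 * time.Second) {
	log.Fatal("not ready for connections")
}
nc, err := nats.Connect(ns.ClientURL())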
2. high performance, old benchmark (not apples-to-apples, because the rest mostly have persistence (at-least-once delivery) by default)
3. wildcard topic: a topic can contain a wildcard, so you can subscribe to a topic like foo.*.bar (0-9a-zA-Z, separated by a dot . for subtopics)
4. autoreconnect (use MaxReconnect: -1 to retry forever), unlike some amqp client libraries where you have to handle reconnection manually
5. built-in top-like monitoring: nats-top -s serverHostname -m port
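To make the wildcard topic feature (item 3) more concrete, here is a minimal sketch of NATS-style subject matching in plain Go. It is not the matcher NATS itself uses, just an illustration of the rules; matchSubject is a hypothetical helper. Besides the * wildcard mentioned above (exactly one token), it also handles >, which per the NATS docs matches one or more trailing tokens.

```go
package main

import (
	"fmt"
	"strings"
)

// matchSubject reports whether subject matches pattern using NATS-style
// wildcards: "*" matches exactly one token, ">" matches one or more trailing
// tokens and is only valid as the last pattern token.
// Simplification: allowed characters are not validated, only empty tokens
// in the part of the subject that is compared are rejected.
func matchSubject(pattern, subject string) bool {
	p := strings.Split(pattern, ".")
	s := strings.Split(subject, ".")
	for i, tok := range p {
		if tok == ">" {
			// Must be the last pattern token, with at least one token left to match.
			return i == len(p)-1 && len(s) > i
		}
		if i >= len(s) || s[i] == "" {
			return false // subject ran out of tokens, or has an empty token
		}
		if tok != "*" && tok != s[i] {
			return false
		}
	}
	return len(p) == len(s)
}

func main() {
	fmt.Println(matchSubject("foo.*.bar", "foo.x.bar"))   // true
	fmt.Println(matchSubject("foo.*.bar", "foo.x.y.bar")) // false: * is a single token
	fmt.Println(matchSubject("foo.*.bar", "foo.bar"))     // false: * needs a token
	fmt.Println(matchSubject("foo.>", "foo.x.y.bar"))     // true
	fmt.Println(matchSubject("foo.>", "foo"))             // false: > needs at least one token
}
```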
Gotchas
Since NATS does not support persistence, if there's no subscriber the message will be lost (unlike other message queues). So it behaves like Redis PubSub, not like a Redis queue; the difference is that Redis PubSub is fan-out/broadcast only, while NATS can do both broadcast and non-persistent queueing.
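The delivery semantics above are easier to see in a toy model. The sketch below is a hypothetical in-memory broker (toyBroker and all its methods are made up for illustration, it is not how NATS is implemented): nothing is stored, so a publish with no subscriber is simply lost, plain subscribers all get a copy, and each queue group gets exactly one delivery. The round-robin pick inside a group is a simplification of mine.

```go
package main

import "fmt"

// handler receives a message payload.
type handler func(msg string)

// toyBroker mimics non-persistent delivery: no storage, no retries.
type toyBroker struct {
	plain  map[string][]handler            // topic -> broadcast subscribers
	queues map[string]map[string][]handler // topic -> queue name -> members
	next   map[string]int                  // round-robin cursor per topic+queue
}

func newToyBroker() *toyBroker {
	return &toyBroker{
		plain:  map[string][]handler{},
		queues: map[string]map[string][]handler{},
		next:   map[string]int{},
	}
}

// Subscribe registers a broadcast subscriber: it gets every message.
func (b *toyBroker) Subscribe(topic string, h handler) {
	b.plain[topic] = append(b.plain[topic], h)
}

// QueueSubscribe registers a member of a queue group: only one member of
// each group gets a given message.
func (b *toyBroker) QueueSubscribe(topic, queue string, h handler) {
	if b.queues[topic] == nil {
		b.queues[topic] = map[string][]handler{}
	}
	b.queues[topic][queue] = append(b.queues[topic][queue], h)
}

// Publish delivers msg to every plain subscriber and to one member of each
// queue group. It returns the number of deliveries; 0 means the message is
// lost, because nothing is kept for later.
func (b *toyBroker) Publish(topic, msg string) int {
	delivered := 0
	for _, h := range b.plain[topic] {
		h(msg)
		delivered++
	}
	for queue, members := range b.queues[topic] {
		key := topic + "\x00" + queue
		members[b.next[key]%len(members)](msg)
		b.next[key]++
		delivered++
	}
	return delivered
}

func main() {
	b := newToyBroker()
	fmt.Println(b.Publish("items.changed", "v1")) // 0: nobody listening, message is gone

	b.Subscribe("items.changed", func(m string) { fmt.Println("cache got", m) })
	b.QueueSubscribe("items.changed", "workers", func(m string) { fmt.Println("worker-1 got", m) })
	b.QueueSubscribe("items.changed", "workers", func(m string) { fmt.Println("worker-2 got", m) })
	b.QueueSubscribe("items.changed", "audit", func(m string) { fmt.Println("audit got", m) })

	fmt.Println(b.Publish("items.changed", "v2")) // 3: cache + one worker + audit
}
```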
Use Case of NATS
1. publish/subscribe, for example to signal other services. This assumes broadcast, since there's no queue.
A common pattern: there is an API to list items changed after a certain updatedAt, and service A wants to signal service B that there are new/updated items. A just broadcasts a message to NATS, and any subscriber of that topic fetches from A when it gets the signal, and also periodically as a fallback.
// on publisher
err := nc.Publish(strTopic, bytMsg)
// on subscriber
_, err := nc.Subscribe(strTopic, func(m *nats.Msg) {
	_ = m.Subject
	_ = m.Data
})
// sync version
sub, err := nc.SubscribeSync(strTopic)
for {
	msg, err := sub.NextMsg(5 * time.Second)
	if err != nil { // nats.ErrTimeout if no message
		break
	}
	_ = msg.Subject
	_ = msg.Data
}
2. request/reply, for example to load-balance requests or autoscale: you can deploy workers anywhere, and they will pick up the requests.
We can use QueueSubscribe with the same queueName to make sure only 1 worker handles each message. If you have 2 different queueNames, the same message will be processed by 2 different workers, one from each queueName.
// on requester
msg, err := nc.Request(strTopic, bytMsg, 5*time.Second)
// check err before using msg (it is nil on timeout)
_ = string(msg.Data) == "reply" // from worker/responder
// on worker/responder
_, err := nc.QueueSubscribe(strTopic, queueName, func(m *nats.Msg) {
	_ = m.Subject
	_ = m.Data
	m.Respond([]byte("reply")) // send back to requester
})
You can see more examples here, an example of how to add an otel trace on NATS here, and how to set up mTLS here.