2022-03-31

Getting started with Ansible

Ansible is one of the most popular server automation tools (besides Pulumi and Terraform); it's agentless and only needs SSH access to run. It can also help you provision servers or VM instances using cloud modules, and you can provision vagrant/virtualbox/qemu/docker/lxc containers inside an already running server. Other competitors in this category include Puppet and Chef, but they both require an agent to be installed on the nodes you want to control. To install Ansible on Ubuntu, run:

sudo add-apt-repository --yes --update ppa:ansible/ansible
sudo apt install ansible


You can put the list of servers you want to control in /etc/ansible/hosts
or any other inventory file, something like this:

[DC01]
hostname.or.ip

[DC02]
hostname.or.ip
hostname.or.ip

[DC01:vars]
ansible_user=foo
ansible_password=bar
# it's preferred to use ssh-keygen + ssh-copy-id (passwordless login)
# and sudoers set to ALL=(ALL) NOPASSWD:ALL for the ansible user
# instead of hardcoding the username and password (see below)
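
The passwordless setup mentioned in that comment is roughly this (the user foo and hostname.or.ip are just the placeholders from the example above):

ssh-keygen -t ed25519
ssh-copy-id foo@hostname.or.ip
# then on each node, add a sudoers entry (via visudo), for example:
# foo ALL=(ALL) NOPASSWD:ALL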

If you put it in another file, you can use -i inventoryFileName to set the inventory file. Also don't forget to check the default configs in /etc/ansible/ansible.cfg; for example, you can point the default inventory to another file there.
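
For example, a minimal ansible.cfg could look something like this (assuming your inventory file is named hosts in the current directory):

[defaults]
inventory = ./hosts
# host_key_checking = False  # optional: skip the SSH host key prompt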

Example of checking whether all servers in DC02 are up:

ansible DC02 -m ping

To run an arbitrary command on all servers in DC01:

ansible DC01 -a "cat /etc/lsb-release" 
# add -f N, to run N forks in parallel

To create a set of commands, we can create a playbook file (which contains one or more plays, each with one or more tasks), which is just a YAML file with a specific structure, something like this:

---
  - name: make sure all tools installed # a play
    hosts: DC01 # or all or "*" or localhost
    become: yes # sudo
    tasks:
      - name: make sure micro installed # a task
        apt: # a module
          name: micro
          state: latest
      - name: make sure golang ppa added
        ansible.builtin.apt_repository:
          repo: deb http://ppa.launchpad.net/longsleep/golang-backports/ubuntu/ focal main
      - name: make sure latest golang installed
        apt: name=golang-1.18 state=present # absent to uninstall
      - name: make sure docker gpg key installed
        raw: 'curl -fsSL https://download.docker.com/linux/ubuntu/gpg | gpg --dearmor -o - > /usr/share/keyrings/docker-archive-keyring.gpg'
      - name: make sure docker-ce repo added
        ansible.builtin.apt_repository:
          repo: 'deb [arch=amd64 signed-by=/usr/share/keyrings/docker-archive-keyring.gpg] https://download.docker.com/linux/ubuntu impish stable'
      - name: make sure docker-ce installed
        apt: name=docker-ce

If you save that playbook as playbooks/ensure-tools-installed.yml, for example, you can run it using ansible-playbook playbooks/ensure-tools-installed.yml
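
If your inventory isn't in /etc/ansible/hosts, pass it with -i; --check (dry run) and --limit are also handy:

ansible-playbook -i inventoryFileName playbooks/ensure-tools-installed.yml
ansible-playbook --check playbooks/ensure-tools-installed.yml  # dry run, only report what would change
ansible-playbook --limit DC01 playbooks/ensure-tools-installed.yml  # only target the DC01 group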

How to know the list of modules and their key-value options? Visit this site: https://docs.ansible.com/ansible/2.9/modules/modules_by_category.html
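
You can also browse the modules and their options from the command line with ansible-doc:

ansible-doc -l    # list all available modules
ansible-doc apt   # show options and examples for the apt module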


2022-03-26

Move docker Data Directory to Another Partition

My first NVMe drive (I hate this brand, S*****P****) sometimes hangs periodically (the filesystem gets remounted read-only), causing 50-100% CPU usage while nothing more can be done. So I had to move some parts of it to another NVMe drive; here's how I moved the docker data directory to another partition.

sudo systemctl stop docker

then you can edit the daemon config:

sudo vim /etc/docker/daemon.json
# where /media/asd/nvme2 is the mount point of your other partition
# add something like this:
{
  "dns": ["8.8.8.8","1.1.1.1"],
  "data-root": "/media/asd/nvme2/docker"
}

Copy the old data directory to the new partition (no need to mkdir the docker folder first), then move the original out of the way:

sudo rsync -aP /var/lib/docker/ /media/asd/nvme2/docker
sudo mv /var/lib/docker /var/lib/docker.backup

Then try to start the docker service again:

sudo systemctl start docker 
sudo systemctl status docker 

If it all works, you can delete the backup of the original data directory.
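
To double check that docker really uses the new location before deleting anything (the paths here are the ones from the example above):

docker info | grep 'Docker Root Dir'
# should print: Docker Root Dir: /media/asd/nvme2/docker
sudo rm -rf /var/lib/docker.backup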

2022-03-20

1 million Go goroutine vs C# task

Let's compare 1 million goroutines with 1 million tasks: which one is more efficient in CPU and memory usage? The code is forked from kjpgit's techdemo
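
The Go side of such a benchmark boils down to roughly this (a simplified sketch under my own assumptions, not kjpgit's exact code; the 10-second sleep and the counter are placeholders):

package main

import (
    "sync/atomic"
    "time"
)

const N = 1_000_000

func main() {
    var done int64
    for i := 0; i < N; i++ {
        go func() {
            time.Sleep(10 * time.Second) // simulate a task that just waits
            atomic.AddInt64(&done, 1)
        }()
    }
    for atomic.LoadInt64(&done) < N { // wait until every goroutine is finished
        time.Sleep(time.Second)
    }
}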

Name     UserCPU    SysCPU     AvgRSS       MaxRSS    Wall
c#_1t      38.96    0.68      582,951      636,136    1:00
c#_2t      88.33    0.95      623,956      820,620    1:02
c#_4t     142.86    1.09      687,365      814,028    1:03
c#_8t     235.80    1.71      669,882      820,704    1:05
c#_16t    434.76    4.01      734,545      771,240    1:08
c#_32t    717.39    4.81      720,235      769,888    1:11
go_1t      58.77    0.65    2,635,380    2,635,632    1:04
go_2t      64.48    0.71    2,639,206    2,642,752    1:00
go_4t      72.55    1.42    2,651,086    2,654,972    1:00
go_8t      80.87    2.82    2,641,664    2,643,392    1:00
go_16t     83.18    4.03    2,673,404    2,681,100    1:00
go_32t     86.65    4.30    2,645,494    2,657,580    1:00


The result is as expected: because all tasks/goroutines are spawned first before processing, Go's scheduler is more efficient in CPU usage, but the C# runtime is more efficient in memory usage, which is normal though, because goroutines require a minimum of 2KB of overhead per goroutine, a much higher cost than spawning a task. What if we increase it to 10 million tasks/goroutines, and do the spawning in another task/goroutine, so that when a goroutine is done it can give memory back to the GC? Here's the result:
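
The modification is roughly this (again an assumed sketch, not the exact diff), reusing the same N and done counter as the sketch above, so the spawning itself no longer blocks main:

go func() { // spawn from a separate goroutine so main only polls the counter
    for i := 0; i < N; i++ {
        go func() {
            time.Sleep(10 * time.Second)
            atomic.AddInt64(&done, 1)
        }()
    }
}()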

Name     UserCPU    SysCPU     AvgRSS        MaxRSS    Wall
c#_1t      12.78    1.28    2,459,190     5,051,528    0:13
c#_2t      22.60    1.54    2,692,439     5,934,796    0:18
c#_4t      42.09    1.54    2,370,239     5,538,280    0:21
c#_8t      88.54    2.29    2,522,053     6,334,176    0:29
c#_16t    204.39    3.32    2,395,001     5,803,808    0:34
c#_32t    259.09    3.25    1,842,458     4,710,012    0:28
go_1t      13.97    0.97    4,514,200     6,151,088    0:14
go_2t      12.35    1.51    5,595,418     9,506,076    0:07
go_4t      22.09    2.40    6,394,162    12,517,848    0:07
go_8t      31.00    3.09    7,115,281    13,428,344    0:06
go_16t     40.32    3.52    7,126,851    13,764,940    0:06
go_32t     58.58    3.58    7,104,882    12,145,396    0:06

The result seems normal: the high memory usage is caused by a lot of goroutines being spawned at the same time from a different thread (not blocking the main thread), but after they're done the memory gets collected by the GC (previously the exit condition was time-based; this time it exits after all processing is done, since I moved the sleep before the atomic increment). What if we lower it back to 1 million, but with the same exit rule, spawning executed in a different task/goroutine, and checking for completion every 1s? Here's the result:

Name    UserCPU SysCPU    AvgRSS     MaxRSS    Wall
c#_1t    1.18    0.16     328,134     511,652    0:02
c#_2t    2.18    0.22     294,608     554,488    0:02
c#_4t    3.19    0.20     305,336     554,064    0:02
c#_8t    7.77    0.31     292,281     530,368    0:02
c#_16t  12.33    0.25     304,352     569,460    0:02
c#_32t  37.90    1.25     337,837     684,252    0:03
go_1t    2.72    0.42   1,592,978   2,519,040    0:03
go_2t    3.04    0.47   1,852,084   2,637,532    0:03
go_4t    3.65    0.54   1,936,626   2,637,272    0:03
go_8t    3.27    0.59   1,768,540   2,655,208    0:02
go_16t   4.01    0.71   1,770,673   2,664,504    0:02
go_32t   4.96    0.72   1,770,354   2,669,244    0:02

The difference in processing time is negligible, but the CPU and memory usage contrast quite a bit. Next, let's try to spawn in bursts (100K per second), by adding a 1 second sleep every 100K-th task/goroutine, since it's not realistic for even a DDoS'ed server to receive more than that (unless the server is finely tuned). Here's the result:
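
In the sketch above, the burst would look roughly like this (the 100_000 threshold being my assumption of what "sleep per 100K" means):

for i := 0; i < N; i++ {
    if i > 0 && i%100_000 == 0 {
        time.Sleep(time.Second) // throttle: spawn at most ~100K goroutines per second
    }
    go func() {
        time.Sleep(10 * time.Second)
        atomic.AddInt64(&done, 1)
    }()
}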

Name    UserCPU  SysCPU    AvgRSS     MaxRSS     Wall
c#_1t     0.61    0.08    146,849     284,436    0:05
c#_2t     1.17    0.10    131,778     261,720    0:05
c#_4t     1.53    0.08    133,505     289,584    0:05
c#_8t     4.17    0.15    131,924     284,960    0:05
c#_16t   10.94    0.68    135,446     289,028    0:05
c#_32t   19.86    3.01    130,533     284,924    0:05
go_1t     1.84    0.24    731,872   1,317,796    0:06
go_2t     1.87    0.26    659,382   1,312,220    0:05
go_4t     2.00    0.30    661,296   1,322,152    0:05
go_8t     2.37    0.34    660,641   1,324,684    0:05
go_16t    2.82    0.39    660,225   1,323,932    0:05
go_32t    3.36    0.45    659,176   1,327,264    0:05

And for 5 million:

Name    UserCPU    SysCPU    AvgRSS       MaxRSS    Wall
c#_1t     3.39    0.24      309,103      573,772    0:11
c#_2t     8.30    0.26      278,683      553,592    0:11
c#_4t    13.65    0.32      274,679      658,104    0:11
c#_8t    23.20    0.46      286,336      641,376    0:12
c#_16t   45.85    1.32      286,311      640,336    0:12
c#_32t   64.83    2.46      264,866      615,552    0:12
go_1t     6.25    0.50    1,397,434    2,629,936    0:13
go_2t     6.20    0.56    1,386,336    2,631,580    0:11
go_4t     7.52    0.65    1,410,523    2,625,308    0:11
go_8t     8.21    0.86    1,441,080    2,779,456    0:11
go_16t   11.17    0.96    1,436,220    2,687,908    0:11
go_32t   12.97    1.06    1,430,573    2,668,816    0:11

And for 25 million:

Name    UserCPU   SysCPU    AvgRSS        MaxRSS    Wall
c#_1t     15.94    0.69     590,411    1,190,340    0:24
c#_2t     34.88    0.84     699,288    1,615,372    0:32
c#_4t     59.95    0.89     761,308    1,794,116    0:34
c#_8t    100.64    1.36     758,161    1,845,944    0:36
c#_16t   199.56    2.99     765,791    2,014,856    0:38
c#_32t   332.02    4.07     811,809    1,972,400    0:41
go_1t     21.76    0.71   2,846,565    4,413,968    0:29
go_2t     25.77    1.03   2,949,433    5,553,608    0:25
go_4t     28.74    1.24   2,920,447    5,800,088    0:24
go_8t     37.28    1.96   2,869,074    5,502,776    0:23
go_16t    43.46    2.67   2,987,114    5,769,356    0:24
go_32t    43.77    2.92   3,027,179    5,867,084    0:24

How about 25 million and sleep per 200K?

Name    UserCPU   SysCPU      AvgRSS       MaxRSS    Wall
c#_1t     18.47    0.91      842,492    1,820,788    0:22
c#_2t     40.32    0.93    1,070,555    2,454,324    0:31
c#_4t     62.39    1.16    1,103,741    2,581,476    0:33
c#_8t    100.84    1.34    1,074,820    2,377,580    0:34
c#_16t   218.26    2.91    1,062,642    2,726,700    0:37
c#_32t   339.00    6.51    1,042,254    2,275,644    0:40
go_1t     22.61    0.88    3,474,195    5,071,944    0:27
go_2t     25.83    1.20    3,912,071    6,964,640    0:20
go_4t     37.98    1.68    4,180,188    7,392,800    0:20
go_8t     38.56    2.44    4,189,265    8,481,852    0:18
go_16t    44.49    3.19    4,187,142    8,483,236    0:18
go_32t    48.82    3.44    4,218,591    8,424,200    0:18

And lastly, 25 million and sleep per 400K?

Name    UserCPU    SysCPU    AvgRSS        MaxRSS    Wall
c#_1t     18.66    0.98    1,183,313    2,622,464    0:20
c#_2t     41.27    1.14    1,326,415    3,155,948    0:31
c#_4t     67.21    1.11    1,436,280    3,015,212    0:33
c#_8t    107.14    1.56    1,492,179    3,378,688    0:35
c#_16t   233.50    2.45    1,498,421    3,732,368    0:41
c#_32t   346.87    3.74    1,335,756    2,882,676    0:39
go_1t     24.13    0.82    4,048,937    5,099,220    0:26
go_2t     28.85    1.41    4,936,677    8,023,568    0:18
go_4t     31.51    1.95    5,193,653    9,537,080    0:14
go_8t     45.27    2.65    5,461,107    9,499,308    0:14
go_16t    53.43    3.19    5,183,009    9,476,084    0:14
go_32t    61.98    3.86    5,589,156   10,587,788    0:14

How to read the results above: Wall = how much time was needed to complete, lower is better; AvgRSS/MaxRSS = average/max memory usage, lower is better; UserCPU = CPU time used in percent, where >100% means more than 1 full core of compute time was used, lower is better. Versions used in this benchmark:

go version go1.17.6 linux/amd64
dotnet --version
6.0.201