2022-05-31

Getting started with SaltStack Configuration Management

SaltStack is one of the least popular automation and configuration management tools, similar to Ansible, Chef, and Puppet (and also tools that are not CM but IaC, though if you are skilled enough they can be used as CM: Terraform and Pulumi). Unlike Ansible, it has an agent (the minion) that needs to be installed on the target servers. Unlike Chef and Puppet, it uses Python. Like Ansible, SaltStack can also be used for infrastructure lifecycle management. To install Salt, run these commands:

sudo add-apt-repository ppa:saltstack/salt
sudo apt install salt-master salt-minion salt-cloud

# for minion:
sudo vim /etc/salt/minion
# change line "#master: salt" into:
# master: 127.0.0.1 # should be master's IP
# id: thisMachineId
sudo systemctl restart salt-minion


There's a simpler way to install and bootstrap the master and minion: using the bootstrap script:

curl -L https://bootstrap.saltstack.com -o install_salt.sh
sudo sh install_salt.sh -M # for master

sudo sh install_salt.sh    # for minion
# don't forget to change /etc/salt/minion and restart salt-minion


The minion will then send its key to the master. To list all known minion keys, run:

sudo salt-key -L

To accept/add a minion, you can use:

sudo salt-key --accept=FROM_LIST_ABOVE
sudo salt-key -A # accept all unaccepted, -R reject all

# test connection to all minion
sudo salt '*' test.ping

# run a command 
sudo salt '*' cmd.run 'uptime'

If you don't want to open a port on the minions, you can also use salt-ssh, so it works like Ansible (agentless, over SSH):

pip install salt # salt-ssh ships with the salt package (on Ubuntu: sudo apt install salt-ssh)
# create /etc/salt/roster containing:
ID1:
  host: minionIp1
  user: root
  sudo: True
ID2_usually_hostname:
  host: minionIp2
  user: root
  sudo: True

To execute a command on every host in the roster, you can use salt-ssh:

salt-ssh '*' cmd.run 'hostname'

In SaltStack, there are 5 things you need to know:
  1. Master (the controller) 
  2. Minion (the servers/machines being controlled)
  3. States (the desired state of the servers, written declaratively)
  4. Modules (similar to Ansible modules)
    1. Grains (facts/properties of the machines; to gather facts call: salt-call grains.items)
    2. Execution (execute actions on machines, salt-call moduleX.functionX, for example: cmd.run 'cat /etc/hosts | grep 127. ; uptime ; lsblk', or user.list_users, or grains.ls -- to list all available grain properties, or grains.get ipv4_interfaces, or grains.get ipv4_interfaces:docker0)
    3. States (idempotent, multiplatform modules for CM)
    4. Renderers (modules that transform any format into a Python dictionary)
    5. Pillars (user-defined configuration properties)
  5. salt-call (primary command)
To run locally just add --local. To run on every host we can use salt '*' modulename.functionname. The wildcard can be replaced with a compound filtering argument, as shown below (more detail and examples here).
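For example (a sketch; the minion IDs and grain values below are made up), running modules locally on a minion and targeting minions with compound matchers from the master:

salt-call --local grains.get os      # run directly on this minion, no master needed
salt-call --local cmd.run 'uptime'

# compound matching: G@ matches a grain, E@ is a PCRE on the minion id,
# bare globs match the minion id
salt -C 'G@os:Ubuntu and web*' test.ping
salt -C 'G@os:Pop or E@db[0-9]+' cmd.run 'uptime'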
To start using Salt in file mode, create a directory called salt (usually under /srv/, so /srv/salt/), and inside it create a top.sls file (it's just YAML combined with Jinja) which contains the list of hosts, filters, and state modules you want to apply:

base:
  '*': # every machine
    - requirements # state module to be called
    - statemodule0
dev1:
  'osrelease:*22.04*': # only machines with a specific os version
    - match: grain
    - statemodule1
dev2:
  'os:MacOS': # only run on mac
    - match: grain
    - statemodule2.somesubmodule1
prod:
  'G@os:Pop and G@host:*.domain1': # only run on PopOS with tld domain1
    - match: compound
    - statemodule3
    - statemodule4
  'os:Pop': # this too will run if the machine matches
    - match: grain
    - statemodule5

Then create a directory for each of those items containing an init.sls, or create a file for each of them with the .sls extension. For example, requirements.sls:

essential-packages: # ID = what this state does
  pkg.installed: # module name, function name
    - pkgs:
      - bash
      - build-essential
      - git
      - tmux
      - byobu
      - zsh
      - curl
      - htop
      - python-software-properties
      - software-properties-common
      - apache2-utils

/tmp/a/b: # the ID doubles as the target name; only one function per module per ID
  file.managed:
    - makedirs: True
    - user: root
    - group: wheel
    - mode: 644
    - source: salt://files/b # will copy ./files/b to the machine

/tmp/a/c:
  file.managed:
    - contents: # this will create a file with these specific lines
      - line 1
      - line 2

append to c:
  file.append:
    - name: /tmp/a/c
    - text: 'some line' # will append that line to the file

myservice1:
  service.running:
    - watch: # will restart the service if these managed files change
      - file: /tmp/a/b
      - file: /tmp/a/c

run some command:
  cmd.run:
    - name: /bin/someCmd1 param1; /bin/cmd2 --someflag2
    - onchanges:
      - file: /tmp/a/c # run the command above only if this file changed

/tmp/d: # ensure the directory is created
  file.directory:
    - user: root
    - dir_mode: 700

/tmp/e: # extract an archive into this directory
  archive.extracted:
    - source: https://somedomain/somefile.tgz
    - skip_verify: True # remote sources need a source_hash or skip_verify
    - force: True
    - keep_source: True
    - clean: True

To apply run: salt-call state.apply requirements
As another example, we can create a template with Jinja and YAML combined, like this:

statemodule0:
  file.managed:
    - name: /tmp/myconf.cfg # will copy file based on jinja condition
    {% if '/usr/bin' in grains['pythonpath'] %}
    - source: salt://files/defaultpython.conf
    {% elif 'Pop' == grains['os'] %}
    - source: salt://files/popos.conf
    {% else %}
    - source: salt://files/unknown.conf
    {% endif %}
    - makedirs: True
  cmd.run:
    - name: echo
    - onchanges:
      - file: statemodule0 # referring to statemodule0.file.managed.name

To create a python state module, you can create a file containing something like this:

#!py
def run():
  config = {}
  a_var = 'test1' # we can also do a loop, everything is dict/array
  config['create_file_{}'.format(a_var)] = {
    'file.managed': [
      {'name': '/tmp/{}'.format(a_var)},
      {'makedirs': True},
      {'contents': [
        'line1',
        'line2'
        ]
      },
    ],
  }
  return config
 
To include another state module, you can specify it in statemodulename/init.sls, something like this:

include:
  - .statemodule2 # if this is a folder, it will run the init.sls inside
  - .statemodule3 # if this is a file, it will run statemodule3.sls

To run all states, you can call salt-call state.highstate or salt-call state.apply without any parameters. It will execute the top.sls file and its includes recursively, in order.
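You can also do a dry run first with test=True to see what would change without changing anything (this works for a single state module too):

salt-call state.apply test=True               # dry run the whole highstate
salt-call state.apply requirements test=True  # dry run one state module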
To create a scheduled state, you can create a file containing something like this:

id_for_this:
  schedule.present:
    - name: highstate
    - run_on_start: True
    - function: state.highstate
    - minutes: 60
    - maxrunning: 1
    - enabled: True
    - returner: rawfile_json
    - splay: 600 


A full example can look something like this:

install nginx:
  pkg.installed:
    - name: nginx

/etc/nginx/nginx.conf: # used as name
  file.managed:
    - source: salt://_files/nginx.j2
    - template: jinja
    - require:
      - pkg: install nginx

run nginx:
  service.running:
    - name: nginx
    - enable: True
    - watch:
      - file: /etc/nginx/nginx.conf

Next, to create a pillar config, just create a normal sls file, containing something like this:

user1:
  active: true
  sudo: true
  ssh_keys:
    - ssh-rsa censored user1@domain1
nginx:
  server_name: 'foo.domain1'
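Note that pillar data only reaches a minion when it is assigned in a pillar top file (by default under /srv/pillar/); a minimal sketch, assuming the pillar sls above is saved as /srv/pillar/mypillar.sls:

# /srv/pillar/top.sls
base:
  '*':
    - mypillar # the pillar file above
# verify what a minion sees with: salt-call pillar.items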

To reference this in another Salt file, you can use Jinja, something like this:

{% set sn = salt['pillar.get']('nginx:server_name') -%}
server {
  listen 443 ssl;
  server_name {{ sn }};
...

That's it for now; if you want to learn more, the next step is to create your own execution module or explore the other topics here.

2022-05-08

How to structure/layer your Golang Project (or whatever language you are using)

I've been maintaining a lot of other people's projects, and I see that most people blindly follow a framework's structure without purpose. So I'm writing this to convince people how to build a good directory structure and good layering in an application, especially when creating a service.

It would be better to split your project into exactly 3 layers:
  1. presentation (handles only serialization/deserialization and transport)
  2. business-logic (handles pure business logic, DTOs go here)
  3. persistence (handles records and their persistence, DAOs go here)
It could be one package with a bunch of sub-packages, split per domain, or per user role. For example:

# monolith example

bin/ # should be added to .gitignore
presenter/ # de/serialization, transport, formatter goes here
  grpc.go
  restapi.go
  cli.go
  ...
businesslogic/ # pure business logic and INPUT/OUTPUT=DTO struct
  role1|role2|etc.go
  domain1|domain2|etc.go
  ...
models/ # all DAO/data access model goes here
  users.go
  schedules.go
  posts.go
  ...
deploy/ 
  Dockerfile
  docker-compose.yml
  deploy.sh
pkg/ # all 3rd party helpers/lib/wrapper goes here
  mysql/
  redpanda/
  aerospike/
  minio/

# Vertical-slice example

bin/
domain1/
  presenter1.go
  business1.go
  models1.go
domain2/
  presenter2.go
  business2.go
  models2.go
...
deploy/
pkg/

Also, it's better to inject dependencies on a per-function basis instead of injecting a whole struct/interface, something like this:

type LoginIn struct {
  Email string
  Password string
  // normally I embed CommonRequest object
}
type LoginOut struct {
  // normally I embed CommonResponse object with properties:
  SessionToken string
  Error string
  Success bool
}
type GuestDeps struct {
  GetUserByEmailPass func(string, string) (*User, error)
  // other injected dependencies, eg. S3 uploader, 3rd party libs
}
func (g *GuestDeps) Login(in *LoginIn) (out LoginOut) {
  // do validation
  // do retrieve from database
  // return proper object
}

So when you need to do testing, all you need is to create a fake, either with counterfeiter (if you inject an interface instead of a function) or manually, then check the result with autogold:

func TestLogin(t *testing.T) {
  t.Run(`fail_case`, func(t *testing.T) {
    in := &LoginIn{}
    deps := GuestDeps{
      GetUserByEmailPass: func(string, string) (*User, error) {
        return nil, errors.New("failed to retrieve from db")
      },
    }
    out := deps.Login(in)
    want := autogold.Want("fail_case", nil)
    want.Equal(t, out) // ^ update with go test -update
  })
}

Then in main (the real server implementation), you can wire in the real dependencies, something like this:

rUser := userRepo.NewPgConn(conf.PgConnStr)
srv := httpRest.New( ... )
guest := GuestDeps{
  GetUserByEmailPass: rUser.GetUserByEmailPass,
  DoFileUpload:       s3client.DoFileUpload,
  ...
}
srv.Handle[LoginIn,LoginOut](LoginPath,guest.Login)
srv.Listen(conf.HostPort)

Why do it like this? Because most frameworks I've used are either insanely over-layered or have no clear separation between controller and business logic (the controller still handles transport, serialization, etc), and validation happens only on the outermost layer, or sometimes half outside and half inside the business logic (which can make it vulnerable); so when writing unit tests, the programmer is tempted to test the whole http/grpc layer instead of the pure business logic. With this layering we can also swap in another kind of serialization/transport layer without modifying the business logic. Imagine using one controller to handle the business logic: how much hassle would it be to switch frameworks for some reason (the framework is no longer maintained, there's a performance bottleneck on the framework side, the framework doesn't provide proper middleware for some fragile telemetry, you need to add another serialization format or protocol, etc)? But with this kind of layering, if we want to add grpc or json-rpc or a command line interface, or switch frameworks, or anything else, it's easy: just add a layer with the proper serialization/transport and call the original business logic, as sketched below.
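As an illustration, a minimal sketch (reusing the GuestDeps and LoginIn types above; the flag names are made up) of adding a CLI presenter on top of the same business logic:

import (
  "encoding/json"
  "flag"
  "fmt"
)

func cliLogin(guest *GuestDeps) {
  email := flag.String("email", "", "email to login with")
  pass := flag.String("pass", "", "password")
  flag.Parse()

  // transport/serialization concerns live here, never inside the business logic
  out := guest.Login(&LoginIn{Email: *email, Password: *pass})

  b, _ := json.MarshalIndent(out, "", "  ")
  fmt.Println(string(b))
}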
 
 
mermaid link (number 2-4 is our control)

Talk is cheap, show me the code example! gingorm1 or echogorm1 are the minimal examples (you can always change the framework to fiber or the default net/http or any other framework, and the ORM to sqlc, sqlx, or jet). But if you are working alone, don't want to inject the database functions (which goes against clean architecture, but is the most sensible way), and want to test directly against the database, you can check these examples: fiber1 or sveltefiber (without database). Note that those are just examples; I would not inject the database as a function (see fiber1 for example), I would depend on it directly and use dockertest for managed dependencies, and only use function injection for unmanaged dependencies (3rd parties). A more complex example can be found here: street.

This is only from a code maintainer's point of view; these are the WORST practices I found in the past when continuing other people's projects:
  • table-based entity microservices, where every table or set of tables has its own microservice; it's overly granular, and some of them should be coupled instead (eg. user, group, roles, permission -- this should be one service instead of 2-3 services)
  • MVC microservices, one microservice for each layer of the MVC, and worse, each on a different repository, eg. API service for X (V), API service for Y (V), webhook for X (C), adapter for third party Z (C/M), service for persistence (M), etc -- they should be separated by domain/capability instead of by MVC layer. Why? Because a feature that would normally touch 1 repository in a monolith now requires modifying 2-4 microservices, and starting 2-4 services just to debug something, which doesn't make sense. It might make a bit of sense with a monorepo, but without one it's more pain than benefit.
  • pub-sub with channels/goroutines but without persistence; this is OK only if the request is discardable (all or nothing), but if it's very important (money, for example) you should always persist every state, and it would be better to have a worker that progresses every state into the next state so we don't have to fix things manually (see the sketch below)
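A minimal sketch of that last point (the payments table, column names, and polling interval are made up): every request is persisted as a row with a status, and a worker advances it to the next state, so nothing is lost if the process crashes:

import (
  "database/sql"
  "log"
  "time"
)

func runWorker(db *sql.DB) {
  for {
    // pick up items that are still stuck in the previous state
    rows, err := db.Query(`SELECT id FROM payments WHERE status = 'pending'`)
    if err != nil {
      log.Println(err)
      time.Sleep(5 * time.Second)
      continue
    }
    var ids []int64
    for rows.Next() {
      var id int64
      if rows.Scan(&id) == nil {
        ids = append(ids, id)
      }
    }
    rows.Close()
    for _, id := range ids {
      // do the real work here (charge, call the 3rd party, etc), then move the
      // row to the next state; reruns are safe because the status is persisted
      if _, err := db.Exec(`UPDATE payments SET status = 'charged' WHERE id = ?`, id); err != nil {
        log.Println(err)
      }
    }
    time.Sleep(5 * time.Second)
  }
}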
There are also some GOOD things I found:
  • Start with auth, metrics, logs (show the request id to the user, and return useful responses), and proper dependency injection; don't postpone these until it's too late and you have to fix everything later
  • Standard go test instead of a custom test framework, because go test has proper tooling in the IDEs
  • Use workers for slow things, which makes sense: don't make the user wait, unless the product guy wants it all synchronous. Also send failures to Slack or Telegram or something
  • CRON pattern, running code at specific times; this is good when you need a scheduled task (eg. billing, reminders, etc)
  • Query-job publish task, a pattern that decouples the CRON from the time dependency (query the db, get the list of items to be processed, publish to the MQ), so the publish task can be triggered independently (eg. if there's a bug in only 1 item) and regardless of time, and the workers will pick up any work that is late
  • Separating the proxy (transport/serialization) from the processor (business-logic/persistence); this has a really good benefit for the scalability of small requests (not for uploading/downloading big files): we put a generic reverse proxy in front, push the request to a two-way pub-sub, then return the response. For example, we create a REST proxy, a gRPC proxy, and a JSON-RPC proxy; all three push to NATS, then the worker/processor processes the request and returns the proper response. This works like lambda, so the programmer only needs to focus on building the worker/processor part instead of the generic serialization/transport; all the generic auth, logging, request ids, and metrics can be handled by the proxy (a minimal sketch follows after this list).
    the chart is something like this:

    generic proxy <--> NATS <--> worker/processor <--> databases/3rd party

    this way we can also scale independently, whether monolith or microservice. Service mesh? No! Service bus/star is the way :3
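Here is a minimal sketch of that proxy <--> NATS <--> worker flow, using the nats.go client (the subject name "rpc.login" and the JSON payloads are made up; in reality the proxy and the worker would be separate processes):

import (
  "fmt"
  "time"

  "github.com/nats-io/nats.go"
)

func main() {
  nc, err := nats.Connect(nats.DefaultURL)
  if err != nil {
    panic(err)
  }
  defer nc.Drain()

  // worker/processor side: deserialize, call the business logic, reply
  nc.Subscribe("rpc.login", func(m *nats.Msg) {
    m.Respond([]byte(`{"success":true}`))
  })

  // proxy side: forward the already-authenticated request and wait for the reply
  resp, err := nc.Request("rpc.login", []byte(`{"email":"a@b.c"}`), 2*time.Second)
  if err == nil {
    fmt.Println(string(resp.Data))
  }
}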

2022-05-07

Getting started with Trino

Trino is a distributed query engine that allows you to JOIN across multiple datasources (databases like MySQL, PostgreSQL, BigQuery, Cassandra, MongoDB, Redis, Prometheus, Elasticsearch, CSV files, Google Sheets, S3, etc). It's like ClickHouse but without the high-tech (merge-tree) storage ability, so it cannot do blazing-fast analytics queries like ClickHouse can; it can only be as fast as the connected database, eg. if it's connected to ClickHouse, then it can be as fast as ClickHouse. It was developed by Facebook (previously named Presto). The list of database connectors can be seen here. To use Trino, you can use the dockerized version or install it manually:

# Docker
docker run -d -p 8080:8080 --name trino1 trinodb/trino
# web UI only for monitoring, use random username 

docker exec -it trino1 trino

# Ubuntu 22.04
java --version
python3 --version
# download and extract from https://trino.io/download.html
mkdir ./trino-server-379/etc
cd trino-server-379/etc # the config files below belong inside etc/
SRCURL=https://raw.githubusercontent.com/trinodb/trino-the-definitive-guide/master/single-installation/etc
wget -c $SRCURL/jvm.config
wget -c $SRCURL/log.properties
wget -c $SRCURL/node.properties
echo '
coordinator=true
node-scheduler.include-coordinator=true
http-server.http.port=8081
query.max-memory=5GB
query.max-memory-per-node=1GB
discovery.uri=http://127.0.0.1:8081
' > config.properties
echo '
node.data-dir=/tmp/
' >> node.properties
mkdir catalog
echo '
connector.name=cassandra
cassandra.contact-points=127.0.0.1
# more here https://trino.io/docs/current/connector/cassandra.html
' > catalog/localscylla.properties 
cd ..
python3 ./bin/launcher.py run # to run in background: start

# CLI/Client
EXELOC=/usr/bin/trino
sudo curl -o $EXELOC https://repo1.maven.org/maven2/io/trino/trino-cli/379/trino-cli-379-executable.jar
sudo chmod a+x $EXELOC
trino --server http://localhost:8081

These are the commands that can be used in Trino (other than standard SQL):

SHOW CATALOGS;
SHOW SCHEMAS FROM/IN __CATALOG__; # eg. localscylla
SHOW TABLES FROM/IN __CATALOG__.__SCHEMA__;
DESCRIBE __CATALOG__.__SCHEMA__.__TABLE__;
EXPLAIN SELECT * FROM __CATALOG__.__SCHEMA__.__TABLE__;
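Since the whole point is joining across datasources, a query can reference tables from different catalogs at once; a sketch (the somepostgres catalog and all schema/table/column names below are made up):

SELECT u.email, o.total
FROM localscylla.shop.users AS u
JOIN somepostgres.public.orders AS o
  ON o.user_id = u.id
WHERE o.total > 1000;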

That's it. You can add more database connections by creating more etc/catalog/*.properties files with the proper configuration (username, password, port, etc), as sketched below.
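For example, a PostgreSQL catalog would look roughly like this (the catalog name somepostgres, host, and credentials are placeholders; see the connector docs for the full list of options):

# etc/catalog/somepostgres.properties
connector.name=postgresql
connection-url=jdbc:postgresql://127.0.0.1:5432/mydb
connection-user=trino
connection-password=secret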