2024-02-20

Dump/export Cassandra/BigQuery tables and import to Clickhouse

Today we're gonna dump a Cassandra table and load it into Clickhouse. Cassandra is a columnar database but is often used for OLTP since it has really good distributed capabilities (customizable replication factor, multi-cluster/region, clustered/partitioned by default -- so it's good for multitenant applications), but if you need to run analytics or other complex queries, it becomes super painful, even with ScyllaDB's materialized views (which are only good for recaps/summaries). To dump a Cassandra table, all you need to do is construct a query and use dsbulk, something like this:

./dsbulk unload -delim '|' -k "KEYSPACE1" \
   -query "SELECT col1,col2,col3 FROM table1" -c csv \
   -u 'USERNAME1' -p 'PASSWORD1' \
   -b secure-bundle.zip | tr '\\' '"' |
    gzip -9 > table1_dump_YYYYMMDD.csv.gz ;


The tr command above is used to unescape backslashes, since dsbulk doesn't export CSV in the proper format (it writes \" instead of ""). After that you can restore it by running something like this:

CREATE TABLE table1 (
    col1 String,
    col2 Int64,
    col3 UUID
) ENGINE = ReplacingMergeTree()
ORDER BY (col1, col2);

SET format_csv_delimiter = '|';
SET input_format_csv_skip_first_lines = 1;

INSERT INTO table1
FROM INFILE 'table1_dump_YYYYMMDD.csv.gz'
FORMAT CSV;


BigQuery

Similar to Clickhouse, BigQuery is one of the best analytical engines (because of virtually unlimited compute and massively parallel storage), but it comes with a cost: improper partitioning/clustering on a large table (and sometimes even proper partitioning, since it's limited to only 1 column, unlike Clickhouse which can use more) will cause a huge scan ($6.25 per TiB) and burn a lot of compute slots; combined with materialized views or periodic queries on a cron -- eg. a daily query scanning a 2 TiB table already costs around $375/month -- it would definitely kill your wallet. To dump from BigQuery, all you need to do is create a GCS (Google Cloud Storage) bucket and then run a query, something like this:

EXPORT DATA
  OPTIONS (
    uri = 'gs://BUCKET1/table2_dump/1-*.parquet',
    format = 'PARQUET',
    overwrite = true
    --, compression = 'GZIP' -- import failed: ZLIB_INFLATE_FAILED
  )
AS (
  SELECT * FROM `dataset1.table2`
);

-- it's better to create a snapshot table first
-- if you apply a WHERE filter to the query above, eg.
CREATE TABLE dataset1.table2_filtered_snapshot AS
  SELECT * FROM `dataset1.table2` WHERE col1 = 'yourFilter';

We're not using compression here because the import failed with it (ZLIB_INFLATE_FAILED), not sure why. The parquet files will show up in your bucket; click "Remove public access prevention", and allow them to be publicly accessible with this gcloud command:

gcloud storage buckets add-iam-policy-binding gs://BUCKET1 --member=allUsers --role=roles/storage.objectViewer
# remove-iam-policy-binding to undo this

Then just restore it:

CREATE TABLE table2 (
          Col1 String,
          Col2 DateTime,
          Col3 Int32
) ENGINE = ReplacingMergeTree()
ORDER BY (Col1, Col2, Col3);

SET parallel_distributed_insert_select = 1;

INSERT INTO table2
SELECT Col1, Col2, Col3
FROM s3Cluster(
    'default',
    'https://storage.googleapis.com/BUCKET1/table2_dump/1-*.parquet',
    '', -- s3 access id, remove or leave empty if public
    ''  -- s3 secret key, remove or leave empty if public
);


Writing UDF for Clickhouse using Golang

Today we're going to create a UDF (User-Defined Function) in Golang that can be called inside a Clickhouse query; this function will parse a UUID v1 and return its embedded timestamp, since Clickhouse doesn't have such a function for now. It's inspired by the python version, using the TabSeparated format (since it's the easiest to parse): a Clickhouse UDF reads line by line (each line is a row, and each tab-separated text is a column/cell value):

package main
import (
    "bufio"
    "encoding/binary"
    "encoding/hex"
    "fmt"
    "os"
    "strings"
    "time"
)
func main() {
    // Clickhouse writes one row per line to stdin and expects exactly
    // one output line per input line (TabSeparated format)
    scanner := bufio.NewScanner(os.Stdin)
    scanner.Split(bufio.ScanLines)
    for scanner.Scan() {
        id, _ := FromString(scanner.Text()) // on parse error this prints the zero time
        fmt.Println(id.Time())
    }
}
func (me UUID) Nanoseconds() int64 {
    time_low := int64(binary.BigEndian.Uint32(me[0:4]))
    time_mid := int64(binary.BigEndian.Uint16(me[4:6]))
    time_hi := int64((binary.BigEndian.Uint16(me[6:8]) & 0x0fff))
    return int64((((time_low) + (time_mid << 32) + (time_hi << 48)) - epochStart) * 100)
}
func (me UUID) Time() time.Time {
    nsec := me.Nanoseconds()
    return time.Unix(nsec/1e9, nsec%1e9).UTC()
}
// code below Copyright (C) 2013 by Maxim Bublis <b@codemonkey.ru>
// see https://github.com/satori/go.uuid
// Difference in 100-nanosecond intervals between
// UUID epoch (October 15, 1582) and Unix epoch (January 1, 1970).
const epochStart = 122192928000000000
// UUID representation compliant with specification
// described in RFC 4122.
type UUID [16]byte
// FromString returns UUID parsed from string input.
// Following formats are supported:
// "6ba7b810-9dad-11d1-80b4-00c04fd430c8",
// "{6ba7b810-9dad-11d1-80b4-00c04fd430c8}",
// "urn:uuid:6ba7b810-9dad-11d1-80b4-00c04fd430c8"
func FromString(input string) (u UUID, err error) {
    s := strings.Replace(input, "-", "", -1)
    if len(s) == 41 && s[:9] == "urn:uuid:" {
        s = s[9:]
    } else if len(s) == 34 && s[0] == '{' && s[33] == '}' {
        s = s[1:33]
    }
    if len(s) != 32 {
        err = fmt.Errorf("uuid: invalid UUID string: %s", input)
        return
    }
    b := []byte(s)
    _, err = hex.Decode(u[:], b)
    return
}
// Returns canonical string representation of UUID:
// xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx.
func (u UUID) String() string {
    return fmt.Sprintf("%x-%x-%x-%x-%x",
        u[:4], u[4:6], u[6:8], u[8:10], u[10:])
}


Compile it, put the binary with the proper owner and permissions at /var/lib/clickhouse/user_scripts/uuid2timestr, and create /etc/clickhouse-server/uuid2timestr_function.xml (the file name must have the proper suffix) containing:

<functions>
    <function>
        <type>executable</type>
        <name>uuid2timestr</name>
        <return_type>String</return_type>
        <argument>
            <type>String</type>
        </argument>
        <format>TabSeparated</format>
        <command>uuid2timestr</command>
        <lifetime>0</lifetime>
    </function>
</functions>


After that you can restart Clickhouse (sudo systemctl restart clickhouse-server or sudo clickhouse restart, depending on how you installed it: apt or binary setup).

Usage


To make sure it's loaded, you can just look for this line in the log:

<Trace> ExternalUserDefinedExecutableFunctionsLoader: Loading config file '/etc/clickhouse-server/uuid2timestr_function.xml'

then just run a query using that function:

SELECT uuid2timestr('51038948-97ea-11ee-b7e0-52de156a77d8')

┌─uuid2timestr('51038948-97ea-11ee-b7e0-52de156a77d8')─┐
│ 2023-12-11 05:58:33.2391752 +0000 UTC                │
└──────────────────────────────────────────────────────┘

2024-01-31

OLLAMA with AMD GPU (ROCm)

Today we're gonna test ollama (just like in the previous article) but with an AMD GPU. To do this you'll need to run it under docker, for example using this docker compose file:

version: "3.7"

services:
  ollama:
    container_name: ollama
    image: ollama/ollama:0.1.22-rocm
    environment:
      HSA_OVERRIDE_GFX_VERSION: 10.3.0 # only if you are using 6600XT
    volumes:
      - /usr/share/ollama/.ollama:/root/.ollama # reuse existing model
      #- ./etc__resolv.conf:/etc/resolv.conf # if your dns sucks
    devices:
      - /dev/dri
      - /dev/kfd
    restart: unless-stopped

  ollama-webui:
    image: ghcr.io/ollama-webui/ollama-webui:main
    container_name: ollama-webui
    ports:
      - "3122:8080"
    volumes:
      - ./ollama-webui:/app/backend/data
    environment:
      - 'OLLAMA_API_BASE_URL=http://ollama:11434/api'
    restart: unless-stopped

To get a shell inside the container, you just need to execute docker exec -it `docker ps | grep ollama/ollama | cut -f 1 -d ' '` bash

Make sure that you already have ROCm installed:

$ dpkg -l | grep rocm | cut -d ' ' -f 3
rocm-cmake
rocm-core
rocm-device-libs
rocm-hip-libraries
rocm-hip-runtime
rocm-hip-runtime-dev
rocm-hip-sdk
rocm-language-runtime
rocm-llvm
rocm-ocl-icd
rocm-opencl
rocm-opencl-runtime
rocm-smi-lib
rocminfo
$ cat /etc/apt/sources.list.d/* | grep -i 'rocm'
deb [arch=amd64 signed-by=/etc/apt/keyrings/rocm.gpg] https://repo.radeon.com/amdgpu/6.0/ubuntu jammy main
deb [arch=amd64 signed-by=/etc/apt/keyrings/rocm.gpg] https://repo.radeon.com/rocm/apt/6.0 jammy main
deb [arch=amd64 signed-by=/etc/apt/keyrings/rocm.gpg] https://repo.radeon.com/rocm/apt/6.0 jammy main


For example, the same statement that took around 60s in the previous article only takes 20-30s using the GPU (2-3x faster):

time ollama run codellama 'show me inplace mergesort using golang'

Or from outside: 

time docker exec -it `docker ps | grep ollama/ollama | cut -f 1 -d ' '` ollama run codellama 'show me inplace mergesort using golang'


Here is an example of how to implement an in-place merge sort algorithm in Go:
```
package main

import (
        "fmt"
)

func main() {
        // Test data
        arr := []int{5, 2, 8, 3, 1, 6, 4}

        // Sort the array using an in-place merge sort algorithm
        inPlaceMergeSort(arr)

        fmt.Println("Sorted array:", arr)
}

func inPlaceMergeSort(arr []int) {
        if len(arr) <= 1 {
                return
        }

        // Split the array into two halves
        mid := len(arr) / 2
        left := arr[:mid]
        right := arr[mid:]

        // Recursively sort each half
        inPlaceMergeSort(left)
        inPlaceMergeSort(right)

        // Merge the two sorted halves into a single array
        mergeInPlace(arr, left, right)
}

func mergeInPlace(arr []int, left, right []int) {
        i := 0
        j := 0
        k := 0

        for i < len(left) && j < len(right) {
                if left[i] <= right[j] {
                        arr[k] = left[i]
                        i++
                } else {
                        arr[k] = right[j]
                        j++
                }
                k++
        }

        // Copy the remaining elements from either left or right array to arr
        for i < len(left) {
                arr[k] = left[i]
                i++
                k++
        }

        for j < len(right) {
                arr[k] = right[j]
                j++
                k++
        }
}
```
This implementation of in-place merge sort uses a divide and conquer approach to sort the array. It first splits the array into two halves, sorts
each half recursively using itself, and then merges the two sorted halves into a single array. The `mergeInPlace` function is used to merge the two
sorted halves in-place, without requiring any additional memory.


real    0m30.528s
CPU: 0.02s      Real: 21.07s    RAM: 25088KB

NOTE: the answer from codellama above is wrong: it's not an in-place merge sort, and even ignoring that, this "normal" merge sort via slices is broken too, because left and right share the underlying array with arr, so writing into arr while merging overwrites the values still being read and produces wrong results.
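The aliasing issue is easy to reproduce; here's a minimal sketch (my own toy example) of why slicing doesn't copy in Go:

```go
package main

import "fmt"

func main() {
	arr := []int{3, 1, 2}
	left := arr[:2]  // left shares arr's backing array, no copy is made
	right := arr[1:] // right overlaps both arr and left
	arr[0] = 9       // writing through arr is visible through left
	fmt.Println(left[0], right[0]) // 9 1
}
```

This is exactly what happens in the generated mergeInPlace: it reads from left and right while writing into the same backing array through arr.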

You can also visit http://localhost:3122 for web UI.

2023-12-22

Benchmarking LLM models

Today we're gonna try to use ollama to generate some code. It's quite easy: you just need to install it using curl like this, and pull some models:

curl https://ollama.ai/install.sh | sh

# these models already predownloaded before first run
time ollama run codellama 'show me inplace mergesort using golang'
#time ollama run deepseek-coder 'show me inplace mergesort using golang'
time ollama run llama2-uncensored 'show me inplace mergesort using golang'
time ollama run mistral 'show me inplace mergesort using golang'

#time ollama run starcoder 'show me inplace mergesort using golang'
time ollama run codeup 'show me inplace mergesort using golang'
time ollama run codebooga 'show me inplace mergesort using golang'
time ollama run magicoder 'show me inplace mergesort using golang'

ollama list
NAME                     ID           SIZE    MODIFIED
codebooga:latest         05b83c5673dc 19 GB   3 seconds ago
codellama:latest         8fdf8f752f6e 3.8 GB  4 minutes ago
deepseek-coder:latest    140a485970a6 776 MB  10 minutes ago
llama2-uncensored:latest 44040b922233 3.8 GB  10 minutes ago
mistral:latest           4d9f4b269c33 4.1 GB  2 days ago
codeup:latest            54289661f7a9 7.4 GB  4 minutes ago
magicoder:latest         8007de06f5d9 3.8 GB  4 minutes ago

starcoder:latest         847e5a7aa26f 1.8 GB  10 minutes ago



Here's the result for each of them; the benchmarks ran on a 32-core CPU, since my GPU is still unsupported (gfx1032):

codellama - 62.59s - will overwrite "left" if the arrays overlap, but at least the structure is a bit correct

 Sure! Here is an example of how to implement in-place merge sort using Go:
```
package main

import (
        "fmt"
)

func merge(arr []int, low int, mid int, high int) {
        left := arr[low : mid+1]
        right := arr[mid+1 : high+1]

        for i := low; i <= high; i++ {
                if left[i-low] > right[i-mid] {
                        arr[i] = left[i-low]
                } else {
                        arr[i] = right[i-mid]
                }
        }
}

func inplaceMergeSort(arr []int, low int, high int) {
        if low >= high {
                return
        }

        mid := (low + high) / 2
        inplaceMergeSort(arr, low, mid)
        inplaceMergeSort(arr, mid+1, high)
        merge(arr, low, mid, high)
}

func main() {
        arr := []int{5, 3, 7, 9, 2, 4, 6, 8}
        inplaceMergeSort(arr, 0, len(arr)-1)
        fmt.Println("Sorted array:", arr)
}
```
This code sorts the input array in place using the merge sort algorithm. The `merge` function is used to combine two sorted subarrays
into a single sorted array. The `inplaceMergeSort` function recursively splits the array into smaller subarrays until each subarray
contains only one element, at which point the `merge` function is called to combine the subarrays.

The `main` function demonstrates how to use the `inplaceMergeSort` function on an example input array.

deepseek-coder - 23.53s - fast but messy indentation, also spouting quicksort, not mergesort

 
Here's an in-place version of merge sort using Golang:

```go
package main
 
import (
    "fmt"
)     
    
func partition(array []int, low int , high int ) int {
        pivot := array[high] // Picks the last element as a pivot. This is done arbitrarily here for simplicity of code but in
real-world cases it should be chosen by an external algorithm like quicksort or similar with better average case performance then
mergesorts which pick random elements
        i := (low - 1)    // Index of smaller element  
        for j:=  low ;j <= high-1; j++ {     
            if array[j] < pivot{      
                i++                     // Swap arr[i+1] and arra[high], then increment i.  This is the part that makes sure all elements sma
smaller than Pivot go to left of it, while rest are in right side    // The key process is to check if current element 'array'[] has
value less
                //than pivot'. If yes,"then do nothing", else swap values and increment i.
                }                          }      return (i + 1)            }
            func merge( array []int, low int , mid int,high int ) { // The main function that sorts arr[l..r] using auxiliary space
a[]={0...n} and i is index of start element in left subarray å’Œ right.   pivot such as to be inserted
                // elements into correct position for final sorted array, then call itself recursively      return }            public func m
main() {          arr :=  [7,12,45689,30,-2 ,-5]         fmt.Println(array)    mergeSort(&arr[..])     // Calling the function to sort
array using mergesort
                }` `go run main.go 输出结果为[-5 -2  12  7 30 45689 ]] ) and go playground https://play.golang.org/p/-Ni_kJvKlf for running the code
```     Sorry, I can't provide you with a full in-place merge sort because it depends on your requirements (whether to print or not),
as well complexity analysis and other factors which may affect performance of an algorithm. The above is just how one might implement
such functionality within Go language using Goroutines if necessary for multi threading/parallelism, but would need a different
approach otherwise due the nature of merge sort itself being inherently sequential in operation (each element has to be processed
before it can go on). 
 

llama2-uncensored - 50.87s - err, using the builtin sort XD

Sure, here's an implementation of in-place mergesort using Go:
```go
package main
import "fmt"
func mergeSort(arr []int) {
 if len(arr) <= 1 {
 return
 }
 mid := len(arr) / 2
 left := arr[:mid]
 right := arr[mid:]
 fmt.Println("Merging", left, "and", right)
 sort.Ints(left)
 sort.Ints(right)
 mergeSort(left)
 merge(left, right)
}
func merge(left []int, right []int) []int {
 res := make([]int, len(left)+len(right))
 for i := 0; i < len(res); i++ {
 if left[i] <= right[i-1] {
 res[i] = left[i]
 } else {
 res[i] = right[i-1]
 }
 }
 return res
}
func main() {
 arr := []int{5, 3, 2, 1, 7, 8, 4, 6}
 fmt.Println("Original array:", arr)
 mergeSort(arr)
 fmt.Println("Merged array: ", arr)
}
```
This implementation uses the built-in `sort` package to sort the left and right halves of the original array separately, and then
merges them using a recursive function called `merge`. Finally, it applies the sorted array to the original one using the `mergeSort`
function.

mistral - 96.77s - not really inplace XD, it creates copies

 Here's an example of in-place MergeSort implementation in Golang. In this approach, we divide the given slice into smaller sub-arrays
until each sub-array contains only one element. Then we merge these sub-arrays back together to sort the original slice.

```go
func merge(arr []int, left int, mid int, right int) { // Merge function
    n1 := mid - left + 1
    n2 := right - mid

    Left := make([]int, n1)
    Right := make([]int, n2)

    for i := 0; i < n1; i++ {
        Left[i] = arr[left + i]
    }
    for j := 0; j < n2; j++ {
        Right[j] = arr[mid+1+j]
    }

    i := 0
    j := 0
    k := left

    for i < n1 && j < n2 {
        if Left[i] <= Right[j] {
            arr[k] = Left[i]
            i++
        } else {
            arr[k] = Right[j]
            j++
        }
        k++
    }

    for i < n1 { // Copy the rest of Left
        arr[k] = Left[i]
        i++
        k++
    }

    for j < n2 { // Copy the rest of Right
        arr[k] = Right[j]
        j++
        k++
    }
}

func mergeSort(arr []int, left int, right int) { // Merge sort function
    if len(arr) <= 1 {
        return
    }

    mid := (left + right) / 2

    mergeSort(arr, left, mid)
    mergeSort(arr, mid+1, right)
    merge(arr, left, mid, right)
}

func main() {
    arr := []int{12, 11, 13, 5, 6, 7}
    fmt.Println("Unsorted array:")
    printArray(arr)
    mergeSort(arr, 0, len(arr)-1)
    fmt.Println("Sorted array:")
    printArray(arr)
}

func printArray(arr []int) {
    for _, a := range arr {
        fmt.Println(a, " ")
    }
    fmt.Println()
}
```

In this example, the `mergeSort` function is called recursively on the array until each sub-array contains only one element. After
that, the merge function merges these sub-arrays back together, sorting the entire array in place. 

starcoder - 71.65s - BAD, spouting python nonsense

# golang
go run -race./cmd/test
# python
python3.7 -m pytest tests_unit --cov=src
-- spouting nonsense python code --

magicoder - 91.89s - not really in-place

 Here is a simple implementation of the in-place merge sort algorithm in Golang. It's very similar to the traditional merge sort
algorithm but works in-place without needing an additional array space:

```go
package main

import (
        "fmt"
)

func printArray(arr []int) {
        for _, value := range arr {
                fmt.Print(value, " ")
        }
        fmt.Println()
}

func mergeSort(arr []int) {
        var helper [100]int // declare an array to hold the helper values
        mergeSortRecursive(&helper[0], arr, 0, len(arr)-1)
}

func mergeSortRecursive(aux *[100]int, src []int, lo, hi int) {
        if lo >= hi {
                return // base case: array is sorted
        }

        // sort the left half
        mid := (lo + hi) / 2
        mergeSortRecursive(aux, src, lo, mid)

        // sort the right half
        mergeSortRecursive(aux, src, mid+1, hi)

        // merge the sorted halves together
        merge(&helper[0], aux, src, lo, mid, hi)
}

func merge(dst *[100]int, aux *[100]int, src []int, lo, mid, hi int) {
        i := lo
        j := mid + 1

        for k := lo; k <= hi; k++ {
                if i > mid {
                        dst[k] = src[j]
                        j++
                } else if j > hi {
                        dst[k] = src[i]
                        i++
                } else if src[i] < src[j] {
                        dst[k] = src[i]
                        i++
                } else {
                        dst[k] = src[j]
                        j++
                }
        }

        // copy back the sorted values to our original array
        for i, value := range dst[:hi-lo+1] {
                src[i+lo] = value
        }
}

func main() {
        arr := []int{37, 49, 38, 50, 3, 13}
        printArray(arr) // before sorting
        mergeSort(arr)
        printArray(arr) // after sorting
}
```
This code prints the input array, sorts it using merge sort algorithm and prints again to check if the sorting is successful.

codeup - 127.32s - using temporary variables, so this is just normal mergesort; quite slow

Here is an example of an in-place mergesort implementation in Go:
```
func mergeSort(arr []int, low, high int) {
    if low < high {
        // divide the array into two halves
        mid := (low + high) / 2
        mergeSort(arr, low, mid)
        mergeSort(arr, mid+1, high)
        merge(arr, low, mid, high)
    }
}

func merge(arr []int, low, mid, high int) {
    var result []int
    left := arr[:mid]
    right := arr[mid:]
    for i := 0; i < len(left); i++ {
        result = append(result, left[i])
    }
    for j := 0; j < len(right); j++ {
        result = append(result, right[j])
    }
    return result
}
```
This code sorts the input array in place using the mergesort algorithm. The `mergeSort` function divides the array into two halves, sorts each half recursively, and then merges the two sorted halves back together using the `merge` function. The `merge` function takes four
integer parameters: the left and right indices of the two arrays being merged, and the low and high indices of the resulting array. It appends the elements of the left and right arrays to the resulting array in the correct order.

To use this code, you can simply call the `mergeSort` function with the address of the input array as an argument, like this:
```
arr := []int{1, 2, 3, 4, 5, 6}
mergeSort(arr)
fmt.Println(arr) // [1, 2, 3, 4, 5, 6]
```
This will sort the input array in place and print the resulting sorted array. Note that this implementation of mergesort has a time complexity of O(n log n), which means it will take longer to sort larger arrays.

codebooga - 490.72s - using temporary variables, so this is just normal mergesort; slow AF

Here is an implementation of in-place merge sort using Go:

```go
package main

import (
        "fmt"
)

func merge(arr []int, low int, mid int, high int) {
        n1 := mid - low + 1
        n2 := high - mid

        L := make([]int, n1)
        R := make([]int, n2)

        for i := 0; i < n1; i++ {
                L[i] = arr[low+i]
        }
        for j := 0; j < n2; j++ {
                R[j] = arr[mid+1+j]
        }

        i, j, k := 0, 0, low

        for i < n1 && j < n2 {
                if L[i] <= R[j] {
                        arr[k] = L[i]
                        i++
                } else {
                        arr[k] = R[j]
                        j++
                }
                k++
        }

        for i < n1 {
                arr[k] = L[i]
                i++
                k++
        }

        for j < n2 {
                arr[k] = R[j]
                j++
                k++
        }
}

func mergeSort(arr []int, low int, high int) {
        if low < high {
                mid := (low + high) / 2
                mergeSort(arr, low, mid)
                mergeSort(arr, mid+1, high)
                merge(arr, low, mid, high)
        }
}

func inPlaceMergeSort(arr []int) {
        if len(arr) > 1 {
                mergeSort(arr, 0, len(arr)-1)
        }
}

func main() {
        arr := []int{12, 11, 13, 5, 6, 7}
        fmt.Println("Given array is: ", arr)
        inPlaceMergeSort(arr)
        fmt.Println("Sorted array is: ", arr)
}
```

This program sorts the given array using in-place merge sort algorithm. The `merge` function takes three parameters, an array `arr`, and two indices `low` and `high`. It merges the two sorted subarrays of `arr`, one starting from index `low` to `mid` (inclusive) and another from
`mid+1` to `high` (inclusive), into a single sorted subarray. The `mergeSort` function is a recursive function that sorts the array using merge sort algorithm by dividing it into two halves until each half has only one element, then calling the `merge` function to combine them
back in order. Finally, the `inPlaceMergeSort` function calls `mergeSort` on the input array and sorts it in place. The `main` function demonstrates how to use this implementation by sorting an example array.

Conclusion

So as you can see, the size of the model determines how fast it resolves the answer; also, none of them are good enough for now to implement the in-place merge part.

2023-11-27

Install Ruby 2.7 on Ubuntu 22.04

So I have a past project from 2014 that is still running until now (5 independent services: 3 written in Golang -- 2 of them read-only -- and 2 write-only services written in Ruby, plus 1 PostgreSQL and 1 Redis; ~45GB database / 2.7GB daily backup, 1.1GB cache, 8GB logs, <800MB compressed), and the server's RAID is already near death (the server has been alive without issue for 12 years). The new server comes with a freshly installed 22.04, and the old services always fail to run with ruby 3.x because of library compilation issues (eg. they still depend on the outdated postgres12-dev). Here's how you can install ruby 2.7 without rvm:

# add old brightbox PPA (must be focal, not jammy)
echo 'deb https://ppa.launchpadcontent.net/brightbox/ruby-ng/ubuntu focal main
deb-src https://ppa.launchpadcontent.net/brightbox/ruby-ng/ubuntu focal main ' > /etc/apt/sources.list.d/brightbox-ng-ruby-2.7.list

# add the signing key
sudo apt-key adv --keyserver hkp://keyserver.ubuntu.com:80 --recv-keys F5DA5F09C3173AA6

# add the focal security for libssl1.1
echo "deb http://security.ubuntu.com/ubuntu focal-security main" | sudo tee /etc/apt/sources.list.d/focal-security.list

# update and install
sudo apt update
sudo apt install -y ruby2.7 ruby2.7-dev


That's it, now you can run gem install (with sudo for global) without any compilation issues, as long as you have the correct C library dependencies (eg. postgresql-server-dev-12).

2023-11-24

mTLS using Golang Fiber

In this demo we're going to set up mTLS using Go and Fiber. To create certificates that can be used for mutual authentication, all you need is the OpenSSL program (or simplecert, or mkcert like in the previous natsmtls1 example); create a CA (certificate authority), server certs, and client certs, something like this:

# generate CA Root
openssl req -newkey rsa:2048 -new -nodes -x509 -days 3650 -out ca.crt -keyout ca.key -subj "/C=SO/ST=Earth/L=MyLocation/O=MyOrganiz/OU=MyOrgUnit/CN=localhost"

# generate Server Certs
openssl genrsa -out server.key 2048
# generate server Cert Signing request
openssl req -new -key server.key -days 3650 -out server.csr -subj "/C=SO/ST=Earth/L=MyLocation/O=MyOrganiz/OU=MyOrgUnit/CN=localhost"
# sign with CA Root
openssl x509  -req -in server.csr -extfile <(printf "subjectAltName=DNS:localhost") -CA ca.crt -CAkey ca.key -days 3650 -sha256 -CAcreateserial -out server.crt

# generate Client Certs
openssl genrsa -out client.key 2048
# generate client Cert Signing request
openssl req -new -key client.key -days 3650 -out client.csr -subj "/C=SO/ST=Earth/L=MyLocation/O=MyOrganiz/OU=MyOrgUnit/CN=localhost"
# sign with CA Root
openssl x509  -req -in client.csr -extfile <(printf "subjectAltName=DNS:localhost") -CA ca.crt -CAkey ca.key -out client.crt -days 3650 -sha256 -CAcreateserial


You will get at least 2 files related to the CA, 3 related to the server, and 3 related to the client, but what you really need are just the CA public key, the server key pair (private and public key), and the client key pair. If you need to generate another client or roll over the server keys, you will still need the CA's private key, so don't erase it.

Next, now that you have those 5 keys, you need to load the CA public key and the server key pair and use them in fiber, something like this:

caCertFile, _ := os.ReadFile(in.CaCrt)
caCertPool := x509.NewCertPool()
caCertPool.AppendCertsFromPEM(caCertFile)

serverCerts, _ := tls.LoadX509KeyPair(in.ServerCrt, in.ServerKey)

tlsConfig := &tls.Config{
    ClientCAs:        caCertPool,
    ClientAuth:       tls.RequireAndVerifyClientCert,
    MinVersion:       tls.VersionTLS12,
    CurvePreferences: []tls.CurveID{tls.CurveP521, tls.CurveP384, tls.CurveP256},
    CipherSuites: []uint16{
        tls.TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384,
        tls.TLS_ECDHE_RSA_WITH_AES_256_CBC_SHA,
        tls.TLS_RSA_WITH_AES_256_GCM_SHA384,
        tls.TLS_RSA_WITH_AES_256_CBC_SHA,
        tls.TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256,
        tls.TLS_ECDHE_ECDSA_WITH_AES_128_GCM_SHA256,
    },
    Certificates: []tls.Certificate{serverCerts},
}

// attach the certs to TCP socket, and start Fiber server
app := fiber.New(fiber.Config{
    Immutable: true,
})
app.Get("/", func(c *fiber.Ctx) error {
    return c.String(`secured string`)
})
ln, _ := tls.Listen("tcp", `:1443`, tlsConfig)
app.Listener(ln)


Next, on the client side, you just need to load the CA public key and the client key pair, something like this:

caCertFile, _ := os.ReadFile(in.CaCrt)
caCertPool := x509.NewCertPool()
caCertPool.AppendCertsFromPEM(caCertFile)
certificate, _ := tls.LoadX509KeyPair(in.ClientCrt, in.ClientKey)

httpClient := &http.Client{
    Timeout: time.Minute * 3,
    Transport: &http.Transport{
        TLSClientConfig: &tls.Config{
            RootCAs:      caCertPool,
            Certificates: []tls.Certificate{certificate},
        },
    },
}

r, _ := httpClient.Get(`https://localhost:1443`)


That's it, that's how you secure client-server communication between a Go client and server with mTLS; this code can be found here.

2023-10-14

Benchmarking docker-volume vs mount-fs vs tmpfs

So today we're gonna benchmark docker-volume (binding to a docker-managed volume), bind/mount-fs (binding to the host filesystem), and tmpfs. Which one is the fastest? Here's the docker compose:

version: "3.7"
services:
  web:
    image: ubuntu
    command: "sleep 3600"
    volumes:
        - ./temp1:/temp1 # mountfs
        - temp2:/temp2   # dockvol
        - temp3:/temp3   # tmpfs

volumes:
  temp2:
  temp3:
    driver_opts:
      type: tmpfs
      device: tmpfs


The docker compose file is in a sibling directory of docker's data-root, to ensure everything uses the same SSD. For the benchmark we're gonna clone this repository, then copy it around, create 100 small files, compile, and do 2 sequential writes (small and large). Here are the results (some steps are not pasted below, eg. removing files before running a benchmark twice):

apt install git g++ make time
alias time='/usr/bin/time -f "\nCPU: %Us\tReal: %es\tRAM: %MKB"'

cd /temp3 # tmpfs
git clone https://github.com/nikolausmayer/file-IO-benchmark.git

### copy small files

time cp -R /temp3/file-IO-benchmark /temp2 # dockvol
CPU: 0.00s      Real: 1.02s     RAM: 2048KB

time cp -R /temp3/file-IO-benchmark /temp1 # bindfs
CPU: 0.00s      Real: 1.00s     RAM: 2048KB

### create 100 x 10MB files

cd /temp3/file*
time make data # tmpfs
CPU: 0.41s      Real: 0.91s     RAM: 3072KB

cd /temp2/file*
time make data # dockvol
CPU: 0.44s      Real: 1.94s     RAM: 2816KB

cd /temp1/file*
time make data # mountfs
CPU: 0.51s      Real: 1.83s     RAM: 2816KB

### compile

cd /temp3/file*
time make # tmpfs
CPU: 2.93s  Real: 3.23s RAM: 236640KB

cd /temp2/file*
time make # dockvol
CPU: 2.94s  Real: 3.22s RAM: 236584KB

cd /temp1/file*
time make # mountfs
CPU: 2.89s  Real: 3.13s RAM: 236300KB

### sequential small

cd /temp3 # tmpfs
time dd if=/dev/zero of=./test.img count=10 bs=200M

2097152000 bytes (2.1 GB, 2.0 GiB) copied, 0.910784 s, 2.3 GB/s

cd /temp2 # dockvol
time dd if=/dev/zero of=./test.img count=10 bs=200M

2097152000 bytes (2.1 GB, 2.0 GiB) copied, 2.26261 s, 927 MB/s

cd /temp1 # mountfs
time dd if=/dev/zero of=./test.img count=10 bs=200M
2097152000 bytes (2.1 GB, 2.0 GiB) copied, 2.46954 s, 849 MB/s

### sequential large

cd /temp3 # tmpfs
time dd if=/dev/zero of=./test.img count=10 bs=1G
10737418240 bytes (11 GB, 10 GiB) copied, 4.95956 s, 2.2 GB/s

cd /temp2 # dockvol
time dd if=/dev/zero of=./test.img count=10 bs=1G
10737418240 bytes (11 GB, 10 GiB) copied, 81.8511 s, 131 MB/s
10737418240 bytes (11 GB, 10 GiB) copied, 44.2367 s, 243 MB/s
# ^ running twice because I'm not sure why it's so slow

cd /temp1 # mountfs
time dd if=/dev/zero of=./test.img count=10 bs=1G
10737418240 bytes (11 GB, 10 GiB) copied, 12.7516 s, 842 MB/s

The conclusion: docker volume is a bit faster (+10%) for small sequential writes, but significantly slower (-72% to -84%) for large sequential writes compared to bind/mount-fs; for the other cases there's no noticeable difference. I always prefer bind/mount-fs over docker volume because of safety: if you accidentally run docker volume rm $(docker volume ls -q), it would delete all your docker volumes (I did this multiple times on my own dev PC), and with bind/mount-fs you can easily backup/rsync/copy/manage the files. For cases where you don't care about losing files and need high performance (as long as your RAM is enough), just use tmpfs.

2023-10-07

NATS: at-most once Queue / simpler networking

As you might already know from a past article, I'm a huge fan of NATS; it's one of the fastest at-most-once-delivery, non-persistent message brokers. Unlike rabbitmq/lavinmq/kafka/redpanda, NATS is not a persistent queue; the persistent version is called NATS JetStream (but it's better to use rabbitmq/lavinmq/kafka/redpanda since they are more popular and have more integrations than NATS JetStream, or use a cloud provider's queue like GooglePubSub/AmazonSQS).

Features of NATS:

1. embeddable, you can embed NATS directly in a Go application, so you don't have to run a dedicated instance

ns, err := server.NewServer(opts)
if err != nil {
    log.Fatal(err)
}
go ns.Start()

if !ns.ReadyForConnections(4 * time.Second) {
    log.Fatal("not ready for connections")
}
nc, err := nats.Connect(ns.ClientURL())


2. high performance, see this old benchmark (not apples-to-apples, because the others mostly have persistence (at-least-once delivery) by default)
3. wildcard topics, a subscription can contain wildcards, so you can subscribe to a topic like foo.*.bar (tokens are 0-9a-zA-Z, separated with a dot . for subtopics)
4. autoreconnect (e.g. nats.MaxReconnects(-1)), unlike some AMQP client libraries where you have to handle reconnection manually
5. built-in top-like monitoring: nats-top -s serverHostname -m port

Gotchas

Since core NATS does not support persistence, if there's no subscriber the message is lost (unlike other message queues), so it behaves like Redis PubSub, not like a Redis queue. The difference is that Redis PubSub is fan-out/broadcast only, while NATS can do both broadcast and non-persistent queueing.
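This drop-when-no-subscriber behavior can be pictured with a non-blocking send on an unbuffered Go channel (just an analogy to illustrate the semantics, not NATS code):

```go
package main

import "fmt"

// trySend mimics at-most-once delivery: a non-blocking send that
// drops the message when nobody is receiving, like core NATS
// dropping a published message when there is no subscriber.
func trySend(ch chan string, msg string) bool {
	select {
	case ch <- msg:
		return true // someone was listening
	default:
		return false // message silently lost, no persistence
	}
}

func main() {
	ch := make(chan string) // unbuffered: nothing buffers the message

	// no receiver yet: the message is simply dropped
	fmt.Println(trySend(ch, "event1")) // false

	// with a receiver blocked and waiting, delivery succeeds
	done := make(chan struct{})
	go func() {
		fmt.Println("got:", <-ch)
		close(done)
	}()
	for !trySend(ch, "event2") {
		// retry until the receiver goroutine is scheduled
	}
	<-done
}
```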

Use Case of NATS

1. publish/subscribe, for example to give a signal to other services. This assumes broadcast, since there's no queue.
A common pattern for this: service A has an API to list items changed after a certain updatedAt, and A wants to signal another service B that there are new/updated items. A just needs to broadcast the message to NATS, and any subscriber of that topic fetches from A whenever it gets the signal (and periodically, as a fallback).

// on publisher
err := nc.Publish(strTopic, bytMsg)

// on subscriber
_, err := nc.Subscribe(strTopic, func(m *nats.Msg) {
    _ = m.Subject
    _ = m.Data
})

// sync version
sub, err := nc.SubscribeSync(strTopic)
for {
  msg, err := sub.NextMsg(5 * time.Second)
  if err != nil { // nats.ErrTimeout if no message
    break
  }
  _ = msg.Subject
  _ = msg.Data
}


2. request/reply, for example to load-balance requests/autoscale workers, so you can deploy them anywhere and the "workers" will catch up.
We can use QueueSubscribe with the same queueName to make sure only 1 worker handles each message; if you have 2 different queueNames, the same message will be processed by 2 different workers, one subscribed to each queueName.

// on requester
msg, err := nc.Request(strTopic, bytMsg, 5*time.Second)
_ = bytes.Equal(msg.Data, []byte("reply")) // reply from worker/responder

// on worker/responder
_, err := nc.QueueSubscribe(strTopic, queueName, func(m *nats.Msg) {
    _ = m.Subject
    _ = m.Data
    m.Respond([]byte("reply")) // send back to requester
})

You can see more examples here, an example of how to add an OTel trace to NATS here, and how to set up mTLS here.

2023-09-28

Chisel: Ngrok local-tunnel Alternative

So today we're gonna learn a tool called chisel, from the same creator as overseer, which I usually use to gracefully restart production services.

What can chisel do? It can forward traffic from a private network to a public network. For example, you have a service/port that is only accessible from your internal network, and you want it exposed/tunneled on a server that has a public IP (accessible from outside), but you cannot use a plain reverse proxy because the reverse proxy cannot reach your private server (e.g. because it's protected by a firewall or doesn't have a public IP at all).

internet --> public server <-- internet <-- private server/localhost
                           <--> tunnel <-->


You can install chisel by running this command:

go install github.com/jpillora/chisel@latest 

or download the binary directly. 

Then in the public server (server with public IP), you can do something like this:

chisel server --port 3229 --reverse

This listens on port 3229 for tunnel requests.

On the client/private network side that you want to expose to the public, run this command:

chisel client http://publicServerIP:3229 R:3006:127.0.0.1:3111

The command above means that the server will listen on port 3006, and any traffic that goes to that port will be forwarded through the tunnel to port 3111 on the client.

After that you can add HTTPS, for example using Caddy (don't forget to add the DNS record first so Let's Encrypt can issue the proper certificate):

https://myWebSite.com {
  reverse_proxy localhost:3006
}

Other alternatives: Cloudflare Tunnel, but it requires you to set up the network and other stuff on their website (not sure what they charge for excess traffic); ngrok (the original, but now a paid service); and localtunnel (but it always dies after a few requests).

More alternative and resources here: