2024-02-20

Writing UDF for Clickhouse using Golang

Today we're going to create an UDF (User-defined Function) in Golang that can be run inside Clickhouse query, this function will parse uuid v1 and return timestamp of it since Clickhouse doesn't have this function for now. Inspired from the python version with TabSeparated delimiter (since it's easiest to parse), UDF in Clickhouse will read line by line (each row is each line, and each text separated with tab is each column/cell value):

package main
import (
    "bufio"
    "encoding/binary"
    "encoding/hex"
    "fmt"
    "os"
    "strings"
    "time"
)
func main() {
    scanner := bufio.NewScanner(os.Stdin)
    scanner.Split(bufio.ScanLines)
    for scanner.Scan() {
        id, _ := FromString(scanner.Text())
        fmt.Println(id.Time())
    }
}
func (me UUID) Nanoseconds() int64 {
    time_low := int64(binary.BigEndian.Uint32(me[0:4]))
    time_mid := int64(binary.BigEndian.Uint16(me[4:6]))
    time_hi := int64((binary.BigEndian.Uint16(me[6:8]) & 0x0fff))
    return int64((((time_low) + (time_mid << 32) + (time_hi << 48)) - epochStart) * 100)
}
func (me UUID) Time() time.Time {
    nsec := me.Nanoseconds()
    return time.Unix(nsec/1e9, nsec%1e9).UTC()
}
// code below Copyright (C) 2013 by Maxim Bublis <b@codemonkey.ru>
// see https://github.com/satori/go.uuid
// Difference in 100-nanosecond intervals between
// UUID epoch (October 15, 1582) and Unix epoch (January 1, 1970).
const epochStart = 122192928000000000
// UUID representation compliant with specification
// described in RFC 4122.
type UUID [16]byte
// FromString returns UUID parsed from string input.
// Following formats are supported:
// "6ba7b810-9dad-11d1-80b4-00c04fd430c8",
// "{6ba7b810-9dad-11d1-80b4-00c04fd430c8}",
// "urn:uuid:6ba7b810-9dad-11d1-80b4-00c04fd430c8"
func FromString(input string) (u UUID, err error) {
    s := strings.Replace(input, "-", "", -1)
    if len(s) == 41 && s[:9] == "urn:uuid:" {
        s = s[9:]
    } else if len(s) == 34 && s[0] == '{' && s[33] == '}' {
        s = s[1:33]
    }
    if len(s) != 32 {
        err = fmt.Errorf("uuid: invalid UUID string: %s", input)
        return
    }
    b := []byte(s)
    _, err = hex.Decode(u[:], b)
    return
}
// Returns canonical string representation of UUID:
// xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx.
func (u UUID) String() string {
    return fmt.Sprintf("%x-%x-%x-%x-%x",
        u[:4], u[4:6], u[6:8], u[8:10], u[10:])
}


Compile and put it with proper owner and permission on /var/lib/clickhouse/user_scripts/uuid2timestr and create /etc/clickhouse-server/uuid2timestr_function.xml (must be have proper suffix) containing:

<functions>
    <function>
        <type>executable</type>
        <name>uuid2timestr</name>
        <return_type>
String</return_type>
        <argument>
            <type>String</type>
        </argument>
        <format>TabSeparated</format>
        <command>uuid2timestr</command>
        <lifetime>0</lifetime>
    </function>
</functions>


after that you can restart Clickhouse (sudo systemctl restart clickhouse-server or sudo clickhouse restart) depends on how you install it (apt or binary setup).

Usage


to make sure it's loaded, you can just find this line on the log:

<Trace> ExternalUserDefinedExecutableFunctionsLoader: Loading config file '/etc/clickhouse-server/uuid2timestr_function.xml

then just run a query using that function:

SELECT uuid2timestr('51038948-97ea-11ee-b7e0-52de156a77d8')

┌─uuid2timestr('51038948-97ea-11ee-b7e0-52de156a77d8')─┐
│ 2023-12-11 05:58:33.2391752 +0000 UTC                │
└──────────────────────────────────────────────────────┘