2022-05-07

Getting started with Trino

Trino is a distributed query engine, that allows you to JOIN from multiple datasources (databases like mysql, postgresql, bigquery, cassandra, mongodb, redis, prometheus, elasticsearch, csv file, google sheets, s3, etc). It's like Clickhouse but without high-tech (merge-tree) storage ability, so it cannot do blazing fast analytics query like in Clickhouse, but it can be as fast as the connected database that it uses, eg. if it uses Clickhouse connected, then it can be as fast as Clickhouse. It was developed by Facebook (previously named Presto). List of database connectors can be seen here. To use Trino, you can use dockerized version or manual:

# Docker
docker run -d
 -p 8080:8080 --name trino1 trinodb/trino
# web UI only for monitoring, use random username 

docker exec -it trino1 trino

# Ubuntu 22.04
java --version
python3 --version
# download and extract from https://trino.io/download.html
mkdir 
./trino-server-379/etc
cd trino-server-379
SRCURL=https://raw.githubusercontent.com/trinodb/trino-the-definitive-guide/master/single-installation/etc
wget -c $SRCURL/jvm.config
wget -c $SRCURL/log.properties
wget -c $SRCURL/node.properties
echo '
coordinator=true
node-scheduler.include-coordinator=true
http-server.http.port=8081
query.max-memory=5GB
query.max-memory-per-node=1GB
discovery.uri=http://127.0.0.1:8081
' > config.properties
echo '

node.data-dir=/tmp/
' >> node.properties
mkdir catalog
echo '
connector.name=cassandra
cassandra.contact-points=127.0.0.1
# more here https://trino.io/docs/current/connector/cassandra.html
' > catalog/localscylla.properties 
cd ..
python3 ./bin/launcher.py run # to run in background: start

# CLI/Client
EXELOC=/usr/bin
/trino
curl -O $EXELOC https://repo1.maven.org/maven2/io/trino/trino-cli/379/trino-cli-379-executable.jar
chmod a+x $EXELOC
trino --server http://localhost:8081

These are the list of commands can be used in trino (other than standard SQL):

SHOW CATALOGS;
SHOW SCHEMAS FROM/IN __CATALOG__; # eg. localscylla
SHOW TABLES FROM/IN __CATALOG__.__SCHEMA__;
DESCRIBE __CATALOG__.__SCHEMA__.__TABLE__;
EXPLAIN SELECT * FROM __CATALOG__.__SCHEMA__.__TABLE__;

That's it, you can add more databases connection by creating more etc/catalog/*.properties file with proper configuration (username, password, port, etc).

No comments:

Post a Comment

THINK: is it True? is it Helpful? is it Inspiring? is it Necessary? is it Kind?