
2020-02-16

Create Leaderboard using MongoDB and Go

So now I tried to create a leaderboard using MongoDB and Go (since I failed to make one using Couchbase). First, you need to install MongoDB using these commands:

wget -qO - https://www.mongodb.org/static/pgp/server-4.2.asc | sudo apt-key add -
echo "deb [ arch=amd64,arm64 ] https://repo.mongodb.org/apt/ubuntu bionic/mongodb-org/4.2 multiverse" | sudo tee /etc/apt/sources.list.d/mongodb-org-4.2.list
sudo apt-get update
sudo apt-get install -y mongodb-org
# if conflict, do: sudo apt-get purge mongodb mongodb-server mongodb-server-core mongodb-clients mongodb-org mongo-tools

sudo systemctl start mongod
sudo systemctl status mongod
sudo systemctl enable mongod

mongo

After installing, create a new Go file (don't forget to go get the external imports):

package main

import (
  "context"
  "fmt"
  "github.com/kokizzu/gotro/L"
  //"github.com/simagix/keyhole/mdb"
  "go.mongodb.org/mongo-driver/bson"
  "go.mongodb.org/mongo-driver/mongo"
  "go.mongodb.org/mongo-driver/mongo/options"
  "math/rand"
  "time"
)

type Score struct {
  Id    string `bson:"_id"`
  User  int64  `bson:"user"`
  RefId int64  `bson:"ref_id"`
  Epoch int64  `bson:"epoch"`
  Score int    `bson:"score"`
}

func main() {
  ctx, cancel := context.WithTimeout(context.Background(), 10*time.Second)
  defer cancel()
  client, err := mongo.Connect(ctx, options.Client().ApplyURI("mongodb://127.0.0.1:27017"))
  if L.IsError(err, `cannot connect to mongodb`) {
    return
  }
  
  collection := client.Database("test1").Collection("scores")
  
  for x := 0; x < 1000*1000; x++ {
    userId := 1 + rand.Intn(1000*10)
    refId := 1 + rand.Intn(1000)
    epoch := time.Now().AddDate(0, 0, -(1 + rand.Intn(365))).Unix() // 1..365 days ago
    score := 10 + rand.Intn(100)
    id := fmt.Sprintf("user%dref%d", userId, refId)
    update := bson.M{
      "$set": &Score{
        Id:    id,
        User:  int64(userId),
        RefId: int64(refId),
        Epoch: epoch,
        Score: score,
      },
    }
    filter := bson.M{"_id": bson.M{"$eq": id}}
    _, err = collection.UpdateOne(context.Background(), filter, update, options.Update().SetUpsert(true))
    if L.IsError(err, `failed upsert`) {
      return
    }
  }
  
  // Use query
  window := []int{7, 30, 365}
  for _, delta := range window {
    fmt.Println(delta)
    epoch := time.Now().AddDate(0, 0, -delta).Unix()
/*
db.scores.aggregate([
  {
    $match:{  
      epoch:{$gt: 1251251}
    }
  },
  {
    $group:{
      _id: '$user',
      score:{$sum: '$score'}
    }
  },
  {
    $sort:{
      score:-1
    }
  },
  {$limit:10}
])
*/
    t := time.Now()
    cursor, err := collection.Aggregate(context.Background(), []bson.M{{
      `$match`: bson.M{
        `epoch`: bson.M{
          `$gt`: epoch,
        },
      },
    }, {
      `$group`: bson.M{
        `_id`:   `$user`,
      `score`: bson.M{`$sum`: `$score`},
      },
    }, {
      `$sort`: bson.M{
        `score`: -1,
      },
    }, {
      `$limit`: 10,
    }}, options.Aggregate())
    if L.IsError(err, `failed query`) {
      return
    }
    for cursor.Next(context.Background()) {
      //row := Score{}
      var row map[string]interface{}
      cursor.Decode(&row)
      fmt.Printf("Row: %v\n", row)
    }
    cursor.Close(context.Background())
    dur := time.Since(t)
    fmt.Printf("Done in %s\n",dur)
  }
}

Well, this is the first time I've really used MongoDB (I was studying the syntax back when MongoDB silently discarded strings that were too long, probably 8 years ago?). Also, that context everywhere is quite annoying; it would be nicer if a default could be set globally on the connection instead of per query, with each method split into two functions, one using the default context and one taking an explicit context (a rough sketch of such a wrapper follows the results below). The result is quite nice by the way, the performance is blazing fast for 9.5k rows:

7
Row: map[_id:188 score:207]
Row: map[_id:491 score:205]
Row: map[_id:806 score:204]
Row: map[_id:720 score:186]
Row: map[_id:544 score:180]
Row: map[_id:682 score:163]
Row: map[_id:693 score:154]
Row: map[_id:138 score:152]
Row: map[_id:24 score:139]
Row: map[_id:174 score:127]
Done in 6.30876ms
30
Row: map[_id:693 score:467]
Row: map[_id:188 score:446]
Row: map[_id:780 score:300]
Row: map[_id:682 score:290]
Row: map[_id:570 score:280]
Row: map[_id:544 score:279]
Row: map[_id:943 score:261]
Row: map[_id:340 score:251]
Row: map[_id:714 score:250]
Row: map[_id:737 score:248]
Done in 6.561656ms
365
Row: map[_id:398 score:1346]
Row: map[_id:837 score:1324]
Row: map[_id:570 score:1180]
Row: map[_id:947 score:1173]
Row: map[_id:81 score:1165]
Row: map[_id:531 score:1121]
Row: map[_id:821 score:1105]
Row: map[_id:919 score:1104]
Row: map[_id:278 score:1095]
Row: map[_id:12 score:1091]
Done in 11.173371ms
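
As an illustration of the default-context wrapper mentioned above, here's a minimal hypothetical sketch (not part of the official driver; it reuses the mongo-driver imports from the program above):

// Coll is a hypothetical wrapper carrying a default timeout,
// so callers don't have to build a context on every call.
type Coll struct {
  *mongo.Collection
  Timeout time.Duration
}

// UpsertOne uses the default timeout internally.
func (c Coll) UpsertOne(filter, update interface{}) error {
  ctx, cancel := context.WithTimeout(context.Background(), c.Timeout)
  defer cancel()
  _, err := c.Collection.UpdateOne(ctx, filter, update, options.Update().SetUpsert(true))
  return err
}

// UpsertOneCtx is the explicit-context variant.
func (c Coll) UpsertOneCtx(ctx context.Context, filter, update interface{}) error {
  _, err := c.Collection.UpdateOne(ctx, filter, update, options.Update().SetUpsert(true))
  return err
}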

Upsert performance on my PC is about 5k rows per second. The query syntax will take a while to get used to (unlike standard SQL). The query performance for 759k rows is:

7
Row: map[_id:47896 score:281]
Row: map[_id:56009 score:268]
Row: map[_id:84819 score:268]
Row: map[_id:69869 score:253]
Row: map[_id:74753 score:241]
Row: map[_id:89237 score:233]
Row: map[_id:138 score:221]
Row: map[_id:30651 score:215]
Row: map[_id:7545 score:213]
Row: map[_id:13532 score:213]
Done in 335.78427ms
30
Row: map[_id:693 score:467]
Row: map[_id:188 score:446]
Row: map[_id:756 score:366]
Row: map[_id:40581 score:363]
Row: map[_id:42851 score:347]
Row: map[_id:60245 score:340]
Row: map[_id:89237 score:332]
Row: map[_id:138 score:327]
Row: map[_id:22743 score:325]
Row: map[_id:858 score:325]
Done in 422.796678ms
365
Row: map[_id:85 score:1688]
Row: map[_id:837 score:1546]
Row: map[_id:398 score:1517]
Row: map[_id:531 score:1498]
Row: map[_id:406 score:1492]
Row: map[_id:821 score:1489]
Row: map[_id:570 score:1468]
Row: map[_id:81 score:1458]
Row: map[_id:178 score:1444]
Row: map[_id:947 score:1437]
Done in 1.488410158s

Now let's check the index performance. We'll add two indexes: user ascending and epoch descending:

db.scores.createIndex( { user: 1 } )
db.scores.createIndex( { epoch: -1 } )
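
The same two indexes can also be created from Go via the driver's IndexView (a sketch, assuming the collection handle and imports from the program above):

_, err = collection.Indexes().CreateMany(context.Background(), []mongo.IndexModel{
  {Keys: bson.D{{Key: "user", Value: 1}}},   // ascending
  {Keys: bson.D{{Key: "epoch", Value: -1}}}, // descending
})
if L.IsError(err, `failed to create indexes`) {
  return
}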

The query performance for the same number of rows are:

7
Row: map[_id:47896 score:281]
Row: map[_id:56009 score:268]
Row: map[_id:84819 score:268]
Row: map[_id:69869 score:253]
Row: map[_id:74753 score:241]
Row: map[_id:89237 score:233]
Row: map[_id:138 score:221]
Row: map[_id:30651 score:215]
Row: map[_id:13532 score:213]
Row: map[_id:7545 score:213]
Done in 62.307086ms
30
Row: map[_id:693 score:467]
Row: map[_id:188 score:446]
Row: map[_id:756 score:366]
Row: map[_id:40581 score:363]
Row: map[_id:42851 score:347]
Row: map[_id:60245 score:340]
Row: map[_id:89237 score:332]
Row: map[_id:138 score:327]
Row: map[_id:858 score:325]
Row: map[_id:22743 score:325]
Done in 254.839226ms
365
Row: map[_id:85 score:1688]
Row: map[_id:837 score:1546]
Row: map[_id:398 score:1517]
Row: map[_id:531 score:1498]
Row: map[_id:406 score:1492]
Row: map[_id:821 score:1489]
Row: map[_id:570 score:1468]
Row: map[_id:81 score:1458]
Row: map[_id:178 score:1444]
Row: map[_id:947 score:1437]
Done in 2.849935466s

Weirdly, when the index can't filter the result well (365 days before today, which possibly matches all rows), performance gets worse by almost 2x.
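
To confirm whether the planner actually used the epoch index for a given window, you can run the explain command from Go (a sketch, assuming the aggregation pipeline above was stored in a pipeline variable):

// ask the server how it executed the aggregation
raw, err := client.Database("test1").RunCommand(context.Background(), bson.D{
  {Key: "explain", Value: bson.D{
    {Key: "aggregate", Value: "scores"},
    {Key: "pipeline", Value: pipeline},
    {Key: "cursor", Value: bson.D{}},
  }},
  {Key: "verbosity", Value: "executionStats"},
}).DecodeBytes()
if L.IsError(err, `failed explain`) {
  return
}
fmt.Println(raw) // look for IXSCAN vs COLLSCAN in the winning plan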

2017-05-11

TechEmpower Framework Benchmark Round 14

The new benchmark results are out; as usual, the important part is the data-update benchmark:


At that chart, the top-ranking languages are: Kotlin, C, Java, C++, Go, Perl, Javascript, Scala, and C#; and for the databases: MySQL, PostgreSQL, and MongoDB.

Also, the other benchmark that reflects a real-world case is multiple-queries:
On that benchmark, the top-performing programming languages are: Dart, C++, Java, C, Go, Kotlin, Javascript, Scala, Ruby, and Ur; and the databases: MongoDB, PostgreSQL, and MySQL. You can see the previous results here, and here.


2016-11-16

Techempower Framework Benchmark Round 13

After a long wait, the latest TechEmpower Framework Benchmark Round 13 is out! And just like the previous one, Go's fasthttp is really, really fast. As usual, the most important thing is the data-updates benchmark:


The top rankers in this part (more than 1024 req/s) are Go, C++, Java, Javascript (NodeJS), Perl, C#, Elixir, Dart, Scala, Python, Clojure, and Groovy (actually, PHP is there below the last one on the picture with 1018 req/s). And for the database part: PostgreSQL, MongoDB, and MySQL.

2016-02-28

Unfair Database Benchmark

So today we will benchmark some popular databases: PostgreSQL, MySQL (Percona), Aerospike, Couchbase, MongoDB, and Redis, under 64-bit Linux (i7-4700MQ, 16GB RAM, non-SSD harddisk) and Go, each using a certain library. The settings: each database may use at most 4GB of working memory, each insert is done in a transaction (if possible), an index must be created (if possible), and searches are performed 40 times the number of insertions, in random order (see the harness sketch below).
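
Every database below goes through roughly this harness shape (a hypothetical sketch, not the actual source; insert and search stand for the per-database operations):

package main

import (
  "fmt"
  "math/rand"
  "time"
)

// bench inserts n records, then runs 40*n random-key searches, timing both.
func bench(label string, n int, insert func(id int), search func(id int)) {
  t := time.Now()
  for x := 0; x < n; x++ {
    insert(x)
  }
  fmt.Printf("INSERT %s %d %.2f ms\n", label, n, float64(time.Since(t))/float64(time.Millisecond))

  t = time.Now()
  for x := 0; x < 40*n; x++ {
    search(rand.Intn(n))
  }
  fmt.Printf("SEARCH %s %d %.2f ms\n", label, n, float64(time.Since(t))/float64(time.Millisecond))
}

func main() {
  sizes := []struct {
    label string
    n     int
  }{{"micro", 100}, {"tiny", 1000}, {"small", 10000}}
  for _, s := range sizes {
    bench(s.label, s.n,
      func(id int) { /* per-database insert */ },
      func(id int) { /* per-database search */ })
  }
}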

Disclaimer: this is not an apples-to-apples comparison; you can look at the source (and send a pull request) on my GitHub or Dropbox. Let's get started!

PostgreSQL (9.5.1)

using the JSONB data type

sudo pacman -S postgresql

sudo su - postgres <<EOF
initdb --locale en_CA.UTF-8 -E UTF8 -D '/var/lib/postgres/data'
EOF

echo "
max_connections = 64                 
superuser_reserved_connections = 3   
shared_buffers = 2GB                 
work_mem = 32MB                      
dynamic_shared_memory_type = posix  
effective_cache_size = 4GB
logging_collector = off         
datestyle = 'iso, dmy'
timezone = 'Asia/Jakarta'
lc_messages = 'en_CA.UTF-8'                    
lc_monetary = 'en_CA.UTF-8'                    
lc_numeric = 'en_CA.UTF-8'                     
lc_time = 'en_CA.UTF-8'                        
default_text_search_config = 'pg_catalog.english'
" > /var/lib/postgresql/data/postgresql.conf

sudo systemctl start postgresql

sudo su - postgres <<EOF
createdb test2
EOF

go get -u -v gitlab.com/kokizzu/gokil
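
The actual code uses my gokil library; as a rough equivalent, here's a minimal sketch using the standard database/sql package with the lib/pq driver (an assumption, not the benchmarked code):

package main

import (
  "database/sql"
  "fmt"

  _ "github.com/lib/pq"
)

func main() {
  db, err := sql.Open("postgres", "user=postgres dbname=test2 sslmode=disable")
  if err != nil {
    panic(err)
  }
  defer db.Close()

  // JSONB column plus a GIN index so containment queries can use it
  if _, err = db.Exec(`CREATE TABLE IF NOT EXISTS docs (id BIGSERIAL PRIMARY KEY, val JSONB)`); err != nil {
    panic(err)
  }
  if _, err = db.Exec(`CREATE INDEX IF NOT EXISTS docs_val_idx ON docs USING GIN (val)`); err != nil {
    panic(err)
  }

  // each insert in its own transaction, per the benchmark rules
  tx, _ := db.Begin()
  tx.Exec(`INSERT INTO docs(val) VALUES($1)`, `{"name":"test","score":42}`)
  tx.Commit()

  var id int64
  err = db.QueryRow(`SELECT id FROM docs WHERE val @> $1`, `{"name":"test"}`).Scan(&id)
  fmt.Println(id, err)
}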

The result:

initial database size: 7192 kB
INSERT micro 100 983.62 ms
INSERT tiny 1000 13223.53 ms
INSERT small 10000 119006.15 ms
final database size: 10 MB
SEARCH micro 100 354.34 ms
SEARCH tiny 1000 3499.50 ms
SEARCH small 10000 35499.98 ms


MySQL (Percona 5.7.10)

using InnoDB; the current JSON data type doesn't support indexing

sudo pacman -S percona-server percona-server-clients 

echo '[client]
port            = 3306
socket          = /run/mysqld/mysqld.sock
[mysqld]
port            = 3306
socket          = /run/mysqld/mysqld.sock
datadir         = /var/lib/mysql
symbolic-links=0
sql_mode=NO_ENGINE_SUBSTITUTION,STRICT_TRANS_TABLES
key_buffer_size = 800M
max_allowed_packet = 16M
table_open_cache = 64
sort_buffer_size = 32M
net_buffer_length = 4M
read_buffer_size = 16M
read_rnd_buffer_size = 16M
myisam_sort_buffer_size = 128M
innodb_buffer_pool_size = 4G
[mysqld_safe]
log-error       = /var/log/mysqld.log
pid-file        = /run/mysqld/mysqld.pid
' > /etc/mysql/my.cnf

sudo systemctl start mysqld

echo 'CREATE DATABASE test2' | mysql -u root
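
For illustration, a minimal sketch using database/sql with the go-sql-driver/mysql driver and a JSON column (an assumption, not the benchmarked code):

package main

import (
  "database/sql"
  "fmt"

  _ "github.com/go-sql-driver/mysql"
)

func main() {
  db, err := sql.Open("mysql", "root@unix(/run/mysqld/mysqld.sock)/test2")
  if err != nil {
    panic(err)
  }
  defer db.Close()

  // JSON column; note that in 5.7 it cannot be indexed directly
  if _, err = db.Exec(`CREATE TABLE IF NOT EXISTS docs (id BIGINT AUTO_INCREMENT PRIMARY KEY, val JSON)`); err != nil {
    panic(err)
  }

  tx, _ := db.Begin()
  tx.Exec(`INSERT INTO docs(val) VALUES(?)`, `{"name":"test","score":42}`)
  tx.Commit()

  var id int64
  err = db.QueryRow(`SELECT id FROM docs WHERE JSON_UNQUOTE(JSON_EXTRACT(val, '$.name')) = ?`, "test").Scan(&id)
  fmt.Println(id, err)
}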

The result:

initial database size: 0
INSERT micro 100 3258.85 ms
INSERT tiny 1000 32297.04 ms
INSERT small 10000 341951.99 ms
final database size: 2.02 MB
SEARCH micro 100 282.32 ms
SEARCH tiny 1000 2843.62 ms
SEARCH small 10000 28404.53 ms


MongoDB (3.2.3)

using struct and map[string]interface{} inside the struct

sudo pacman -S mongodb

go get -u -v gopkg.in/mgo.v2
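
A minimal mgo sketch matching that shape (hypothetical field names, not the benchmarked code):

package main

import (
  "fmt"

  "gopkg.in/mgo.v2"
  "gopkg.in/mgo.v2/bson"
)

// Doc mirrors "a struct with a map[string]interface{} inside"
type Doc struct {
  Id   string                 `bson:"_id"`
  Name string                 `bson:"name"`
  Rest map[string]interface{} `bson:"rest"`
}

func main() {
  sess, err := mgo.Dial("127.0.0.1")
  if err != nil {
    panic(err)
  }
  defer sess.Close()

  coll := sess.DB("test2").C("docs")
  if err = coll.Insert(&Doc{Id: "doc1", Name: "test", Rest: map[string]interface{}{"score": 42}}); err != nil {
    panic(err)
  }

  var out Doc
  err = coll.Find(bson.M{"name": "test"}).One(&out)
  fmt.Println(out, err)
}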

The result:

initial database size: 78 MB

INSERT micro 100 10.19 ms
INSERT tiny 1000 83.58 ms
INSERT small 10000 894.40 ms
final database size: 78 MB
SEARCH micro 100 321.75 ms
SEARCH tiny 1000 3183.75 ms
SEARCH small 10000 33069.50 ms


CouchBase (4.0.0)

invalid implementation, always got panic: {MCResponse status=0x20 keylen=0, extralen=0, bodylen=12}

sudo yaourt -S couchbase-server-community

go get -u -v github.com/couchbase/go-couchbase

sudo systemctl restart couchbase-server

Visit http://127.0.0.1:8091 to configure, I use 3000MB data cache and 1000MB index cache.

The result (incorrect):

initial database size: 4.06 MB
INSERT micro 100 132.29 ms
INSERT tiny 1000 1317.61 ms
INSERT small 10000 13082.86 ms
final database size: -
SEARCH micro 100 - ms
SEARCH tiny 1000 69020.08 ms
SEARCH small 10000 - ms


AeroSpike (3.7.4)

without a secondary index, since I could not find a source or binary package for their command-line tool for creating the index

sudo yaourt -S aerospike-server-ce
# change these two lines in the PKGBUILD:
# source=("http://aerospike.com/download/server/latest/artifact/tgz")
# sha512sums=('e9a574e5020db9542de51ad0a1d8da28b8f89d10505848c406656cf113654c0135dfd63fe0aaaafb06f50938124a00275f1b2a0a5ead7058b7b53964c046f3a1')

go get -u -v github.com/aerospike/aerospike-client-go

cd /
sudo aerospike init 

echo '
namespace test2 {
  replication-factor 2
  memory-size 4G
  default-ttl 0 # 30 days, use 0 to never expire/evict.
  storage-engine device {
    file /opt/aerospike/data
    filesize 1G
  }
}' >> /etc/aerospike.conf

sudo mkdir /opt/aerospike

sudo aerospike start
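
A minimal aerospike-client-go sketch of a put and a get (hypothetical names, not the benchmarked code):

package main

import (
  "fmt"

  aero "github.com/aerospike/aerospike-client-go"
)

func main() {
  client, err := aero.NewClient("127.0.0.1", 3000)
  if err != nil {
    panic(err)
  }
  defer client.Close()

  // a record is addressed by (namespace, set, key) and holds a map of bins
  key, _ := aero.NewKey("test2", "docs", 1)
  if err = client.Put(nil, key, aero.BinMap{"name": "test", "score": 42}); err != nil {
    panic(err)
  }

  rec, err := client.Get(nil, key)
  if err != nil {
    panic(err)
  }
  fmt.Println(rec.Bins)
}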

The result:

initial database size: 1 GB
INSERT micro 100 4.63 ms
INSERT tiny 1000 40.75 ms
INSERT small 10000 426.20 ms
final database size: 1 GB
SEARCH micro 100 113.93 ms
SEARCH tiny 1000 1086.70 ms
SEARCH small 10000 11259.24 ms


Redis (3.0.7)

using gob for serialization and deserialization.

sudo yaourt -S redis

go get -u -v github.com/garyburd/redigo/redis

sudo systemctl start redis
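
The gob round-trip through Redis looks roughly like this (a sketch with hypothetical names):

package main

import (
  "bytes"
  "encoding/gob"
  "fmt"

  "github.com/garyburd/redigo/redis"
)

type Doc struct {
  Name  string
  Score int
}

func main() {
  conn, err := redis.Dial("tcp", "127.0.0.1:6379")
  if err != nil {
    panic(err)
  }
  defer conn.Close()

  // serialize with gob, store under a string key
  var buf bytes.Buffer
  gob.NewEncoder(&buf).Encode(Doc{Name: "test", Score: 42})
  conn.Do("SET", "doc:1", buf.Bytes())

  // fetch and deserialize
  raw, err := redis.Bytes(conn.Do("GET", "doc:1"))
  if err != nil {
    panic(err)
  }
  var out Doc
  gob.NewDecoder(bytes.NewReader(raw)).Decode(&out)
  fmt.Println(out)
}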

The result:

initial database size: 18 bytes

INSERT micro 100 3.94 ms
INSERT tiny 1000 38.28 ms
INSERT small 10000 377.24 ms
final database size: 3.9 MB
SEARCH micro 100 424.79 ms
SEARCH tiny 1000 4263.79 ms
SEARCH small 10000 42904.11 ms

Here's the recap:


2015-10-15

State of JSON in SQLite and NoSQL

More news for those who like the JSON format: there's a JSON1 extension for SQLite, which is great news for those who only need an embedded database for their app. Its functions are similar to PostgreSQL's JSON/JSONB functions. On the other hand, there are already a lot of BSON solutions in the NoSQL world, such as the infamous MongoDB (and TokuMX), RethinkDB, ArangoDB (SQL-like syntax), PouchDB (a Javascript version of CouchDB), SequoiaDB, DjonDB, etc. For more information about those databases, see the videos below.

SQLite 3 Tutorial

PostgreSQL 9.4's JSONB

What's new in MongoDB 3

Fractal Tree Index (TokuMX)

RethinkDB

ArangoDB

EDIT: Hey, apparently there is also JSON data type support in MySQL 5.7.8 (2015-08-03, Release Candidate).

2014-11-24

Database GUI for MongoDB

There are many database GUIs for MongoDB; today I will review GenghisApp, Humongous, and RoboMongo.

You can install the Ruby-based ones using these commands:

gem install humongous
gem install genghisapp bson_ext

To start them, use these commands:

humongous # visit http://localhost:9000
genghisapp # visit http://localhost:8000
php -S localhost:8000 genghis.php # php version

To stop them, use these commands:

humongous -K
genghisapp --kill

I don't know why, but when I tried both, they didn't run at all; they just stopped after showing "trying port xxxx". Those programs close immediately after that and don't show up in ps or netstat.

The next in line is RoboMongo. First, download it using this command:

wget -c http://robomongo.org/files/linux/robomongo-0.8.4-x86_64.tar.gz

Extract it, then run the robomongo.sh shell script in the bin/ folder and set up the connection; it will show something like this:


This one is quite good; it shows the command line when performing any action, but as usual, a Java-based app doesn't work well with a dark GTK+ theme.

There are some other GUIs for MongoDB that support live monitoring, for example MongoBird and MongOwl, but don't forget, there is also built-in web monitoring on port 28017.