2016-02-10

Simpler Way to find rows with Largest/Smallest value for each group in SQL

Sometimes we need to find largest or smallest value for each column and also associated column. Recently I found out that DISTINCT ON is quite good, than my previous way (LEFT JOIN less IS NULL, or WHERE IN SELECT MAX/MIN), see this example:

CREATE TABLE test1 ( id bigserial primary key, val int, name text );

INSERT INTO test1(val,name) VALUES(6,'a'),(9,'a'),(5,'b'),(11,'b'),(2,'c'),(8,'c');

SELECT * FROM test1;

 id | val | name 
----+-----+------
  1 |   6 | a
  2 |   9 | a
  3 |   5 | b
  4 |  11 | b
  5 |   2 | c
  6 |   8 | c

To find which id for each name that has greatest value, we usually do something like this (WHERE IN SELECT MAX):

SELECT *
FROM test1 x1
WHERE (name, val) IN (
  SELECT name, MAX(val) 
  FROM test1
  GROUP BY 1
)
ORDER BY x1.name;


 id | val | name
----+-----+------
  2 |   9 | a
  4 |  11 | b
  6 |   8 | c

 Sort  (cost=73.95..74.64 rows=275 width=44)
   Sort Key: test1.name
   ->  Hash Join  (cost=33.50..62.81 rows=275 width=44)
         Hash Cond: ((test1.name = test1_1.name) AND (test1.val = (max(test1_1.val))))
         ->  Seq Scan on test1  (cost=0.00..21.00 rows=1100 width=44)
         ->  Hash  (cost=30.50..30.50 rows=200 width=36)
               ->  HashAggregate  (cost=26.50..28.50 rows=200 width=36)
                     Group Key: test1_1.name
                     ->  Seq Scan on test1 test1_1  (cost=0.00..21.00 rows=1100 width=36)

Or this (LEFT JOIN less IS NULL):

SELECT x1.*
FROM test1 x1
  LEFT JOIN test1 x2
    ON x1.name = x2.name
    AND x1.val < x2.val
WHERE x2.id IS NULL
GROUP BY x1.id, x1.name
ORDER BY x1.name; 


 id | val | name 
----+-----+------
  2 |   9 | a
  4 |  11 | b
  6 |   8 | c

 Group  (cost=264.68..264.75 rows=10 width=44)
   Group Key: x1.name, x1.id
   ->  Sort  (cost=264.68..264.70 rows=10 width=44)
         Sort Key: x1.name, x1.id
         ->  Merge Left Join  (cost=153.14..264.51 rows=10 width=44)
               Merge Cond: (x1.name = x2.name)
               Join Filter: (x1.val < x2.val)
               Filter: (x2.id IS NULL)
               ->  Sort  (cost=76.57..79.32 rows=1100 width=44)
                     Sort Key: x1.name
                     ->  Seq Scan on test1 x1  (cost=0.00..21.00 rows=1100 width=44)
               ->  Sort  (cost=76.57..79.32 rows=1100 width=44)
                     Sort Key: x2.name
                     ->  Seq Scan on test1 x2  (cost=0.00..21.00 rows=1100 width=44)


There some other way, that I found simpler (and faster in real life) also return only one row if there's duplicate value, that is DISTINCT ON!

SELECT DISTINCT ON (x1.name) *
FROM test1 x1
ORDER BY x1.name, x1.val DESC;

 id | val | name 
----+-----+------
  2 |   9 | a
  4 |  11 | b
  6 |   8 | c

 Unique  (cost=76.57..82.07 rows=200 width=44)
   ->  Sort  (cost=76.57..79.32 rows=1100 width=44)
         Sort Key: name, val
         ->  Seq Scan on test1 x1  (cost=0.00..21.00 rows=1100 width=44)

So in this case, we look for highest value, distinct by name. Well, that's all for now.

2016-01-20

Kostya's Programming Language Implementation Benchmark

These data taken from https://github.com/kostya/benchmarks with some values removed, I only take the best result, and number of implementation is greater or equal to 3 (Comp, this could be the indicator how easy it's to implement). No further delay, here's the recap:

Implementation Json Brainfuck Mandel Base64 Matmul Havlak Comp Avg Time Avg RAM
Nim Clang 0.12.0 3.37 849.60 3.21 0.70 28.96 1.00 4.67 52.70 3.70 142.30 17.36 907.00 6 10.21 325.55
Rust 1.5.0-nightly 1.35 208.70 5.46 4.90 46.34 4.90 4.25 42.90 4.61 76.90

5 12.40 67.66
Crystal 0.10.0-9d59a34 2.63 1.20 6.97 1.30 48.62 1.30 2.21 85.80 3.83 72.20 15.87 398.10 6 13.36 93.32
Nim Gcc 0.12.0 3.49 903.50 4.52 0.60 50.45 0.90 4.57 52.70 3.76 152.70 17.51 889.10 6 14.05 333.25
Java 1.8.0_45 1.48 518.30 4.94 147.60 55.14 69.90

3.50 136.20

4 16.27 218.00
D 2.068.0 12.42 1,417.10 6.57 1.00 45.29 1.20 6.18 89.10 2.30 71.30 28.90 418.20 6 16.94 332.98
C++ 4.8.2 0.72 1.00 5.08 1.10 56.63 1.10 5.45 65.20

17.72 174.50 5 17.12 48.58
D Ldc 0.15.2-beta1 27.23 919.60 6.61 0.90 43.30 0.90 3.27 44.10 2.01 68.90 25.15 214.90 6 17.93 208.22
D Gdc 1.20.0-232-gc746732 0.34 226.70 8.87 1.00 70.12 1.50 3.16 45.20 2.33 73.00 31.79 197.60 6 19.44 90.83
Go 1.5 4.62 273.10 7.29 1.30 52.56 7.60 13.27 106.20 4.76 73.30 35.34 347.10 6 19.64 134.77
Javascript Node v5.0.0 2.80 829.90 8.74 15.00 92.65 15.80 4.38 628.40 5.88 85.90

5 22.89 315.00
Julia 0.3.11 10.27 2,353.90 9.25 59.00 94.33 56.90 14.76 380.20 31.34 375.80

5 31.99 645.16
Go Gcc 4.9.1 17.64 473.10 13.60 10.00 85.67 10.70 39.56 185.50 3.90 84.50 32.94 365.70 6 32.22 188.25
Python Pypy 4.0.0 with GCC 4.8.4 4.81 1,553.00 13.94 55.40 126.46 64.50 7.32 582.30 7.68 122.60 45.51 625.90 6 34.29 500.62
C# Mono 4.0.1 25.74 3,757.90 18.08 15.40 118.72 13.60 9.01 71.70 15.17 83.60 40.54 270.00 6 37.88 702.03
Javascript Jx 0.10.40 2.73 706.80 17.14 11.00 192.23 12.40 6.97 710.60 5.92 83.80

5 45.00 304.92
Scala 2.11.6 360.95 2,789.00 5.90 116.30 64.37 126.40 10.69 292.50 3.62 136.20 32.18 363.00 6 79.62 637.23
Ruby Jruby 1.7.20 21.98 2,761.10 87.05 124.10

12.65 514.90 416.12 582.40

4 134.45 995.63
Ruby 2.1.2p95 8.23 1,085.50 226.86 8.00

2.73 125.30 338.40 82.80

4 144.06 325.40
Ruby Jruby9K 9.0.0.0.pre2 16.53 2,050.50 160.15 297.20

12.16 530.60 467.59 608.30

4 164.11 871.65
Python3 3.4.3 5.92 1,037.80 480.78 5.50

8.16 47.50



3 164.95 363.60
Python 2.7.6 5.07 1,352.90 452.44 4.90

7.62 52.60 3.08 65.30 396.54 724.00 5 172.95 439.94
Perl 5.18.2 2.68 888.40



3.63 47.90 666.46 604.10

3 224.26 513.47
Ruby Rbx 2.2.10 67.13 4,681.00 472.08 45.00

4.29 30.70 591.70 325.00

4 283.80 1,270.43
Tcl 8.6

262.20 2.70

7.20 66.00 1,066.66 279.90

3 445.35 116.20

For now i'm still betting on Crystal (LLVM-based) and Golang. For next I think I'll try D, maybe with DLangIDE or CoEdit..