Programming Rants

2016-02-10

Simpler Way to find rows with Largest/Smallest value for each group in SQL

Sometimes we need to find largest or smallest value for each column and also associated column. Recently I found out that DISTINCT ON is quite good, than my previous way (LEFT JOIN less IS NULL, or WHERE IN SELECT MAX/MIN), see this example:

CREATE TABLE test1 ( id bigserial primary key, val int, name text );

INSERT INTO test1(val,name) VALUES(6,'a'),(9,'a'),(5,'b'),(11,'b'),(2,'c'),(8,'c');

SELECT * FROM test1;

id | val | name
----+-----+------
1 | 6 | a
2 | 9 | a
3 | 5 | b
4 | 11 | b
5 | 2 | c
6 | 8 | c

To find which id for each name that has greatest value, we usually do something like this (WHERE IN SELECT MAX):

SELECT *
FROM test1 x1
WHERE (name, val) IN (
SELECT name, MAX(val)
FROM test1
GROUP BY 1
)
ORDER BY x1.name;

id | val | name
----+-----+------
2 | 9 | a
4 | 11 | b
6 | 8 | c

Sort (cost=73.95..74.64 rows=275 width=44)
Sort Key: test1.name
-> Hash Join (cost=33.50..62.81 rows=275 width=44)
Hash Cond: ((test1.name = test1_1.name) AND (test1.val = (max(test1_1.val))))
-> Seq Scan on test1 (cost=0.00..21.00 rows=1100 width=44)
-> Hash (cost=30.50..30.50 rows=200 width=36)
-> HashAggregate (cost=26.50..28.50 rows=200 width=36)
Group Key: test1_1.name
-> Seq Scan on test1 test1_1 (cost=0.00..21.00 rows=1100 width=36)

Or this (LEFT JOIN less IS NULL):

SELECT x1.*
FROM test1 x1
LEFT JOIN test1 x2
ON x1.name = x2.name
AND x1.val < x2.val
WHERE x2.id IS NULL
GROUP BY x1.id, x1.name
ORDER BY x1.name;

id | val | name
----+-----+------
2 | 9 | a
4 | 11 | b
6 | 8 | c

Group (cost=264.68..264.75 rows=10 width=44)
Group Key: x1.name, x1.id
-> Sort (cost=264.68..264.70 rows=10 width=44)
Sort Key: x1.name, x1.id
-> Merge Left Join (cost=153.14..264.51 rows=10 width=44)
Merge Cond: (x1.name = x2.name)
Join Filter: (x1.val < x2.val)
Filter: (x2.id IS NULL)
-> Sort (cost=76.57..79.32 rows=1100 width=44)
Sort Key: x1.name
-> Seq Scan on test1 x1 (cost=0.00..21.00 rows=1100 width=44)
-> Sort (cost=76.57..79.32 rows=1100 width=44)
Sort Key: x2.name
-> Seq Scan on test1 x2 (cost=0.00..21.00 rows=1100 width=44)

There some other way, that I found simpler (and faster in real life) also return only one row if there's duplicate value, that is DISTINCT ON!

SELECT DISTINCT ON (x1.name) *
FROM test1 x1
ORDER BY x1.name, x1.val DESC;

id | val | name
----+-----+------
2 | 9 | a
4 | 11 | b
6 | 8 | c

Unique (cost=76.57..82.07 rows=200 width=44)
-> Sort (cost=76.57..79.32 rows=1100 width=44)
Sort Key: name, val
-> Seq Scan on test1 x1 (cost=0.00..21.00 rows=1100 width=44)

So in this case, we look for highest value, distinct by name. Well, that's all for now.

2016-01-20

Kostya's Programming Language Implementation Benchmark

These data taken from https://github.com/kostya/benchmarks with some values removed, I only take the best result, and number of implementation is greater or equal to 3 (Comp, this could be the indicator how easy it's to implement). No further delay, here's the recap:

Implementation	Json		Brainfuck		Mandel		Base64		Matmul		Havlak		Comp	Avg Time	Avg RAM
Nim Clang 0.12.0	3.37	849.60	3.21	0.70	28.96	1.00	4.67	52.70	3.70	142.30	17.36	907.00	6	10.21	325.55
Rust 1.5.0-nightly	1.35	208.70	5.46	4.90	46.34	4.90	4.25	42.90	4.61	76.90			5	12.40	67.66
Crystal 0.10.0-9d59a34	2.63	1.20	6.97	1.30	48.62	1.30	2.21	85.80	3.83	72.20	15.87	398.10	6	13.36	93.32
Nim Gcc 0.12.0	3.49	903.50	4.52	0.60	50.45	0.90	4.57	52.70	3.76	152.70	17.51	889.10	6	14.05	333.25
Java 1.8.0_45	1.48	518.30	4.94	147.60	55.14	69.90			3.50	136.20			4	16.27	218.00
D 2.068.0	12.42	1,417.10	6.57	1.00	45.29	1.20	6.18	89.10	2.30	71.30	28.90	418.20	6	16.94	332.98
C++ 4.8.2	0.72	1.00	5.08	1.10	56.63	1.10	5.45	65.20			17.72	174.50	5	17.12	48.58
D Ldc 0.15.2-beta1	27.23	919.60	6.61	0.90	43.30	0.90	3.27	44.10	2.01	68.90	25.15	214.90	6	17.93	208.22
D Gdc 1.20.0-232-gc746732	0.34	226.70	8.87	1.00	70.12	1.50	3.16	45.20	2.33	73.00	31.79	197.60	6	19.44	90.83
Go 1.5	4.62	273.10	7.29	1.30	52.56	7.60	13.27	106.20	4.76	73.30	35.34	347.10	6	19.64	134.77
Javascript Node v5.0.0	2.80	829.90	8.74	15.00	92.65	15.80	4.38	628.40	5.88	85.90			5	22.89	315.00
Julia 0.3.11	10.27	2,353.90	9.25	59.00	94.33	56.90	14.76	380.20	31.34	375.80			5	31.99	645.16
Go Gcc 4.9.1	17.64	473.10	13.60	10.00	85.67	10.70	39.56	185.50	3.90	84.50	32.94	365.70	6	32.22	188.25
Python Pypy 4.0.0 with GCC 4.8.4	4.81	1,553.00	13.94	55.40	126.46	64.50	7.32	582.30	7.68	122.60	45.51	625.90	6	34.29	500.62
C# Mono 4.0.1	25.74	3,757.90	18.08	15.40	118.72	13.60	9.01	71.70	15.17	83.60	40.54	270.00	6	37.88	702.03
Javascript Jx 0.10.40	2.73	706.80	17.14	11.00	192.23	12.40	6.97	710.60	5.92	83.80			5	45.00	304.92
Scala 2.11.6	360.95	2,789.00	5.90	116.30	64.37	126.40	10.69	292.50	3.62	136.20	32.18	363.00	6	79.62	637.23
Ruby Jruby 1.7.20	21.98	2,761.10	87.05	124.10			12.65	514.90	416.12	582.40			4	134.45	995.63
Ruby 2.1.2p95	8.23	1,085.50	226.86	8.00			2.73	125.30	338.40	82.80			4	144.06	325.40
Ruby Jruby9K 9.0.0.0.pre2	16.53	2,050.50	160.15	297.20			12.16	530.60	467.59	608.30			4	164.11	871.65
Python3 3.4.3	5.92	1,037.80	480.78	5.50			8.16	47.50					3	164.95	363.60
Python 2.7.6	5.07	1,352.90	452.44	4.90			7.62	52.60	3.08	65.30	396.54	724.00	5	172.95	439.94
Perl 5.18.2	2.68	888.40					3.63	47.90	666.46	604.10			3	224.26	513.47
Ruby Rbx 2.2.10	67.13	4,681.00	472.08	45.00			4.29	30.70	591.70	325.00			4	283.80	1,270.43
Tcl 8.6			262.20	2.70			7.20	66.00	1,066.66	279.90			3	445.35	116.20

For now i'm still betting on Crystal (LLVM-based) and Golang. For next I think I'll try D, maybe with DLangIDE or CoEdit..