Showing posts with label tips. Show all posts
Showing posts with label tips. Show all posts

2016-02-10

Simpler Way to find rows with Largest/Smallest value for each group in SQL

Sometimes we need to find largest or smallest value for each column and also associated column. Recently I found out that DISTINCT ON is quite good, than my previous way (LEFT JOIN less IS NULL, or WHERE IN SELECT MAX/MIN), see this example:

CREATE TABLE test1 ( id bigserial primary key, val int, name text );

INSERT INTO test1(val,name) VALUES(6,'a'),(9,'a'),(5,'b'),(11,'b'),(2,'c'),(8,'c');

SELECT * FROM test1;

 id | val | name 
----+-----+------
  1 |   6 | a
  2 |   9 | a
  3 |   5 | b
  4 |  11 | b
  5 |   2 | c
  6 |   8 | c

To find which id for each name that has greatest value, we usually do something like this (WHERE IN SELECT MAX):

SELECT *
FROM test1 x1
WHERE (name, val) IN (
  SELECT name, MAX(val) 
  FROM test1
  GROUP BY 1
)
ORDER BY x1.name;


 id | val | name
----+-----+------
  2 |   9 | a
  4 |  11 | b
  6 |   8 | c

 Sort  (cost=73.95..74.64 rows=275 width=44)
   Sort Key: test1.name
   ->  Hash Join  (cost=33.50..62.81 rows=275 width=44)
         Hash Cond: ((test1.name = test1_1.name) AND (test1.val = (max(test1_1.val))))
         ->  Seq Scan on test1  (cost=0.00..21.00 rows=1100 width=44)
         ->  Hash  (cost=30.50..30.50 rows=200 width=36)
               ->  HashAggregate  (cost=26.50..28.50 rows=200 width=36)
                     Group Key: test1_1.name
                     ->  Seq Scan on test1 test1_1  (cost=0.00..21.00 rows=1100 width=36)

Or this (LEFT JOIN less IS NULL):

SELECT x1.*
FROM test1 x1
  LEFT JOIN test1 x2
    ON x1.name = x2.name
    AND x1.val < x2.val
WHERE x2.id IS NULL
GROUP BY x1.id, x1.name
ORDER BY x1.name; 


 id | val | name 
----+-----+------
  2 |   9 | a
  4 |  11 | b
  6 |   8 | c

 Group  (cost=264.68..264.75 rows=10 width=44)
   Group Key: x1.name, x1.id
   ->  Sort  (cost=264.68..264.70 rows=10 width=44)
         Sort Key: x1.name, x1.id
         ->  Merge Left Join  (cost=153.14..264.51 rows=10 width=44)
               Merge Cond: (x1.name = x2.name)
               Join Filter: (x1.val < x2.val)
               Filter: (x2.id IS NULL)
               ->  Sort  (cost=76.57..79.32 rows=1100 width=44)
                     Sort Key: x1.name
                     ->  Seq Scan on test1 x1  (cost=0.00..21.00 rows=1100 width=44)
               ->  Sort  (cost=76.57..79.32 rows=1100 width=44)
                     Sort Key: x2.name
                     ->  Seq Scan on test1 x2  (cost=0.00..21.00 rows=1100 width=44)


There some other way, that I found simpler (and faster in real life) also return only one row if there's duplicate value, that is DISTINCT ON!

SELECT DISTINCT ON (x1.name) *
FROM test1 x1
ORDER BY x1.name, x1.val DESC;

 id | val | name 
----+-----+------
  2 |   9 | a
  4 |  11 | b
  6 |   8 | c

 Unique  (cost=76.57..82.07 rows=200 width=44)
   ->  Sort  (cost=76.57..79.32 rows=1100 width=44)
         Sort Key: name, val
         ->  Seq Scan on test1 x1  (cost=0.00..21.00 rows=1100 width=44)

So in this case, we look for highest value, distinct by name. Well, that's all for now.

2015-09-08

Query for Student Transcript (and Cumulative GPA)

There are some rules to generate transcript, some universities will use bet the highest grade, but some other will use get the latest grade. For example if we have this schema

CREATE TABLE schedules (
  id BIGSERIAL PRIMARY KEY,
  subject_id BIGINT FOREIGN KEY REFERENCES subjects(id),
  semester VARCHAR(5) -- YYYYS -- year and semester
);
CREATE TABLE enrollments (
  id BIGSERIAL PRIMARY KEY,
  student_id BIGINT FOREIGN KEY REFERENCES students(id),
  schedule_id BIGINT FOREIGN KEY REFERENCES schedules(id),
  grade VARCHAR(2) -- letter
);

If the university rule for transcript is get the highest score, the query is quite simple, first you'll need to make sure that grade is ordered in certain way on MAX agregate (for example: A+ > A > A- > B+ > ...) , then you'll need to do this query:

SELECT s.subject_id
  , MAX(e.grade)
FROM enrollments e
  JOIN schedules s
    ON e.schedule_id = s.id
WHERE e.student_id = ?
GROUP BY 1

If the university rules is get the latest score, first of all, you'll need to make sure that the semester is ordered (for example: 2015A < 2015B < 2015C < 2015A < ...), then you'll need to find the latest semester of each subject, for example:

SELECT s.subject_id
  , MAX(s.semester)
FROM enrollments e
  JOIN schedules s
    ON e.schedule_id = s.id
WHERE e.student_id = ?
GROUP BY 1

Then you'll need to fetch the credits that is the latest semester, for example:

SELECT s.subject_id
  , e.grade
FROM enrollments e
  JOIN schedules s
    ON e.schedule_id = s.id
WHERE e.student_id = ?
  AND (s.subject_id, s.semester) IN
    ( SELECT s.subject_id
        , MAX(s.semester)
      FROM enrollments e
        JOIN schedules s
          ON e.schedule_id = s.id
      WHERE e.student_id = ?
      GROUP BY 1
    )
GROUP BY 1

That way, it you'll only get the latest grade (as defined on the sub-query).

There are another alternative for this, you can use aggregate with OVER and PARTITION BY syntax, then compare the result with the aggregate to remove the previous scores, for example:

SELECT r.subject_id
  , r.grade
FROM (
  SELECT s.subject_id
    , e.grade
    , s.semester
    , MAX(s.semester) OVER (PARTITION BY s.subject_id) max_sem
  FROM enrollments e
    JOIN schedules s
      ON e.schedule_id = s.id
  WHERE e.student_id = ?) r
WHERE r.semester_id = r.max_sem

That's it, that's the way to get the transcript. Those queries can be modified to fetch the Cumulative GPA. just add one more join with subjects table to get the units/credits, then just calculate the SUM(subjects.unit * enrollments.num_grade) / SUM(subjects.unit)
You can also calculate the Cumulative GPA per semester by adding the semester for the partition/group by part. If you wonder, why do I write this post? because I have googled and not found any posts that write about this.

2015-08-14

Moving PostgreSQL Database to RAM

If you are software developer, sometimes you need to test your program faster. Normally the bottleneck of your program are in the database (writing to disk). You can increase the performance of your development/testing, by moving the database to the RAM if you have enough free RAM (>3GB). In this article, I will guide you to move your PostgreSQL database to RAM. First of all, you'll need to stop and disable the PostgreSQL, so the port won't conflict:

sudo systemctl stop postgresql
sudo systemctl disable postgresql

As we already know, /tmp folder on Linux mostly use tmpfs, that is a RAM file system. So if we create the database on the /tmp directory, it's on the RAM. What you'll need to do is create a script containing something like this:

#!/usr/bin/env bash
sudo killall postgres
# init directories
src=/tmp/data
sudo rm -rf $src
mkdir -p $src
sudo chown postgres:postgres $src
sudo su - postgres <<EOF
initdb --locale en_CA.UTF-8 -E UTF8 -D '/tmp/data'
sed -i -- 's/max_connections = 100/max_connections = 1024/' /tmp/data/postgresql.conf
sed -i -- 's/#logging_collector = off/logging_collector = on/' /tmp/data/postgresql.conf
sed -i -- "s/#log_directory = 'pg_log'/log_directory = '\/tmp'/" /tmp/data/postgresql.conf
sed -i -- "s/#log_file_mode = 0600/log_file_mode = 0644/" /tmp/data/postgresql.conf
sed -i -- "s/#log_min_duration_statement = -1/log_min_duration_statement = 0/" /tmp/data/postgresql.conf
sed -i -- "s/#log_error_verbosity = default/log_error_verbosity = verbose/" /tmp/data/postgresql.conf
sed -i -- "s/#log_statement = 'none'/log_statement = 'all'/" /tmp/data/postgresql.conf
# sed -i -- "s///" /tmp/data/postgresql.conf
postgres -D /tmp/data &
echo sleep 2 seconds..
sleep 2
createuser xxx
createdb xxx
psql -c 'GRANT ALL PRIVILEGES ON DATABASE xxx TO xxx;'
echo you can restore database now..
EOF

This script will erase all your database and create a new empty database on the RAM, that you can restore into later. This script will also create a log file that shows all queries that could help on the softwade development process. Lastly, to measure your PostgreSQL's data directory size in MB you can type this command:

$ sudo du -s -B 1M /tmp/data/
336     /tmp/data/

That's it, as a measurement, to restore a 137 MB database (about 300k records) in my PC normally it took about 17 seconds, when the database moved to RAM, it only took 5 seconds, so yes, it's a huge improvement.

2015-02-14

How to Learn Programming and Train your Logic

So today one of my student ask me, how to train your logic in programming? There are two steps that you must do, here's my tips in English, the Indonesian slang version below. For those who doesn't understand English, it's greatly recommended to learn about it (at least you must able to read and understand English) before studying programming, since most all programming language tools, errors, and documentation written in English.

Tips on learning programming and train your logic

Actually, learning programming language is quite similar to learning ordinary language (Indonesian or English).

1. memorize the syntax

  • first thing you must do is.. memorize the syntax, yes, because it would took a long time to describe logic with shapes and pictures ^^ (especially when the logic is too long and complex), so learn how to describe a logic, and the most concrete way to do it is using a programming language!
    • understand and memorize how to create a variable, data types (character, integer, real numbers, boolean, string), how program works (from beginning to end, or line by line), how to create a main function and how to compile
    • this is analogous to understand a alphabet and how words are made, kind of words (nouns, adjective, verbs, adverb, etc)
    • we could not make a correct word without understanding alphabet
    • we could not make a correct statement without understanding kinds of words (is this word a subject/name? is this word a predicate/verb? is this an object/complement? is this word a modifier?)
    • people (or the compiler) would complain if we write many statements without punctuations (dot or question mark or exclamation sign, etc)
    • by doing a compilation process, it's like to tell the word processor to do a spell-check on our script, is there any miss-spelling on my script?
  • understand and memorize how to use ask for input and give an output, using built-in functions and math operators (addition, subtraction, division, multiplication and modulo), grammar, and also feedback (error and how to run a program)
    • this step analogous to how to create a statement, how to ask a question and how to give a correct answer
    • after we know how to ask and answer, we now could interact with computer, we also required to store their answer for later decision (just like approaching someone you like, if we remember that she like noodle (stored on a variable), maybe sometime we could ask her out on some noodle restaurant)
    • understand that there would be an input (how we tell something to the computer), process (how computer transform data into information), and output (how computer tell us something)
    • including a customer feedback, in this case, we are the store manager (or producer), and compiler is our prospective buyer/customer, when we knew that customer want, we already reduce 1/3rd of the problem (compile error - correct purchase request), 2 other problem would be runtime error (correct purchase order but our staff execute it poorly or give wrong item), and logic error (understand what customer need, but give a bad service)
    • running a program is just like accepting reader's feedback, is what we write could be understood well by computer (give an expected result)
  • learn and understand how boolean algebra works (and, or), branch or decision (if-then), looping or iteration (while, until) 
    • this is analogous to understand how to make correct life decision (if I need money, I should work; if I want to be an expert in certain field, I must practice and use it in my life, I must keep trying until I succeed or until I gave up; etc)
  • just doing first three steps is enough to write a program, but if you want your life to be easier, learn how to make a phrase/synonym/acronym (function/procedure + array/list/associative array/map + struct/record)
    • this is analogous to understanding certain word or phrase to shorten our statements (rather than say "4 legged animal that say meow", it would be easier to call it a "cat")
  • these first steps can be done by reading a good programming books
  • the criteria of success for step #1 is when you could read many program's source code and understand it's meaning, line by line

2. try to write, practice, practice and practice..

  • read a question, understand what's the input, what's the output, and what how to process those input into a correct output, for example, if you're given a raw ingredients, how to process it into a well prepared hot soup that ready to be eaten?
    • if you could not found the answer by yourself, you must read other people's source code (for example in codeforces), try to read and understand, imitate (try to cook it yourself), don't ask for other people to cook it for you (directly copying and pasting) then you claim it as your own soup
  • try to solve questions on the online judge (for example in URI, SPOJ, UVA, etc)
    • more you practice cooking, or workout/exercise, or anything, we would become more experienced in that field.. (practice doesn't lie - the practice result/planting/investing time/money correctly will produce a result)
    • start from easiest problem (start crawling, walking before you learn to run)
  • and lastly, don't start step 2 before you do step 1 (start by writing an alphabet before you write a poem)

Tips sukses belajar bahasa pemrograman dan mengasah logika

Benernya belajar bahasa pemrograman itu sama saja seperti mempelajari bahasa biasa (Indonesia or Inggris).

1. hafalkan syntax-nya.

  • pertama kali yang harus dilakukan adalah, hafalkan syntax, yup, karena mendeskripsikan suatu logika dengan gambar itu merepotkan ^^ (apalagi kalo logika-nya panjang dan kompleks); jadi dipelajari dahulu cara mendeskripsikan suatu logika, dan cara yang paling kokrit adalah dengan bahasa pemrograman!
    • pahami dan hafalkan cara membuat variabel, jenis2 tipe data (karakter, bilangan bulat, bilangan riil, boolean, string), cara program berjalan (dari atas ke bawah atau baris demi baris), cara membuat fungsi main dan cara compile..
    • ini analoginya seperti memahami huruf dan bagaimana sebuah kata dibentuk, jenis-jenis kata (kata benda, kata sifat, kata kerja, kata keterangan, dst)
    • kita tidak bisa membuat kata dengan benar kalau belum hafal semua huruf
    • kita tidak bisa membuat kalimat dengan benar, kalau tidak tahu kata ini jenisnya apa (subject/nama orang kah? predikat/kata kerja kah? obyek kah? kata keterangankah?)
    • orang (atau compiler) akan ngomel2 ketika kita menulis banyak kalimat tanpa diberi titik (atau titik koma atau pemisah lainnya)
    • dengan melakukan proses compile, itu seperti meminta program word processor untuk memeriksa spelling, ada kata/huruf yang salah atau tidak di naskah ku?
  • pahami dan hafalkan cara meminta input dan mencetak output, cara menggunakan fungsi2 bawaan dan operator2 matematika (penjumlahan, pengurangan, pembagian, perkalian, modulo, akar), tata bahasa, dan feedback (cara membaca error dan cara run program)
    • ini analoginya seperti memahami cara membuat kalimat (SPOK), cara membuat kalimat tanya, dan cara menjawab yang benar
    • setelah tahu cara bertanya dan cara menjawab, kita sudah bisa berinteraksi (PDKT dengan komputer), kita tinggal simpan jawaban si komputer untuk masa depan, misal dia suka makan ayam, ya suatu saat kita masak'kan soto ayam atau ayam goreng, dll
    • pahami bahwa minimal ada input (cara kita memberi tahu komputer), proses (cara data diubah menjadi informasi oleh komputer), dan output (cara komputer memberi tahu kita)
    • termasuk dengan customer feedback, dalam hal ini kita yang jadi toko/produser/distributor, compiler yang jadi customer; ketika tahu customer maunya apa, ya kita sudah mengurangi 1/3 masalah.. (compile error - pesanan benar), 2 masalah lagi adalah: runtime error (pesanan benar tapi staff kalian salah ngasih barang) dan logic error.. (paham kebutuhan customer tapi salah dalam memberikan pelayanan)
    • run program itu seperti menerima feedback dari pembaca, apakah yang ingin kumaksud sama dengan yang dipahami oleh komputer (hasilnya sesuai keinginan)
  • pahami dan hafalkan cara kerja aljabar boolean (dan, atau), percabangan (bila/jika, maka), perulangan (sampai, ketika masih)
    • ini analoginya seperti memahami cara membuat keputusan hidup yang benar, (jika butuh uang, maka saya harus bekerja; jika saya ingin mahir di suatu bidang, maka saya harus latihan dan sering menggunakannya dalam kehidupan; saya akan mencoba terus sampai saya berhasil atau saya putus asa; dst)
  • benernya dari 3 hal di atas sudah cukup untuk membuat program, tapi kalau mau hidup lebih mudah, pelajari cara membuat istilah/sinonim/frasa (fungsi/prosedur + array/list/associative array/map + struct/record)
    • ini analoginya seperti memahami suatu istilah, dengan menggunakan istilah, kita dapat mempersingkat kalimat (daripada menyebut "binatang berkaki empat yang mengeong", kan lebih muda menyebutnya "kucing")
  • step pertama ini bisa dilakukan dengan membaca buku programming yang tepat.
  • kesuksesan step pertama bisa diukur ketika sudah sanggup membaca banyak source code program dan memahami maksudnya baris demi baris.

2. coba menulis, latihan, latihan, dan latihan..

  • baca soal, dipahami apa inputnya, apa outputnya, dan kira2 prosesnya seperti apa (diapakan), misal dikasih 1 bungkus mie instan, maka harus diapakan supaya jadi mie goreng yang siap dimakan?
    • kalau belum bisa menemukan sendiri, ya terpaksa baca source code orang (misal di codeforces), pahami, tiru (dicoba masak sendiri), jangan minta orang lain yg membuatkan (copy paste langsung) lalu di claim sebagai mie goreng buatan sendiri..
  • coba soal2 di online judge (misal TOKI Learning Center, URI, SPOJ, dst)
    • makin sering kita latihan masak, atau olahraga, atau apapun, maka kita akan makin mahir.. (practice doesn't lie - hasil latihan/menanan/menginvestasikan materi/waktu dengan benar pasti berbuah)
    • mulailah dari problem/soal yang paling sederhana (belajarlah merangkak terlebih dahulu sebelum belajar berlari).
  • jangan lupa, jangan mulai step 2 sebelum menjalankan step 1 (belajarlah menulis huruf terlebih dahulu, sebelum belajar membuat puisi).