Getting names in the right order with PostgreSQL
Western and Eastern name orders can be a PITA, particularly when titles and honorifics come into play.
The application I’m currently working on is a database of articles from a military magazine. Most of the authors are Hungarians, meaning they have a reversed name order. Many of them are also holding military ranks which have to be displayed. Others are Westerners, some with ranks and honorifics, some without. Some of them are fighter pilots, and they use their nicknames (callsigns) in writing. Not all of them though. At the end of the day it’s a completely random heap of names, titles, nicknames and honorifics, and I have to put this into order.
Here are some examples:
Tőrös István
is the editor-in-chief with a very basic Hungarian name: last name first, and first name last. A few other languages, for example Japanese or Korean also follow this reversed name order.
Varga Csaba Béla
is another Hungarian name, but with two given names (or as we call them, Christian names). We don’t call it a middle name like in English, but “second given name”.
Hadi "Dino" István
is a Hungarian fighter pilot who used his nickname when signing his articles. Therefore it should be displayed.
Dr. Kositzky Attila altbgy.
is another military officer and pilot, but he used his scientific title and military rank (Brigadier General) and no nickname.
(I wonder how long it will take for some of them to find this article and wonder how their names became examples in a coding tutorial.)
This diversity may already be enough to make your head spin, but let’s throw a few foreigners into the mix:
Anatoly Nikolaevich Kvochur
is Russian, so he of has a “middle name” called a patronymic. He’s also a military officer, but doesn’t use his rank when writing.
Gen. Chuck Yeager
does though. He has a Western name, first name and last name in “proper” order (for Westerners at least) and here we have the rank before the name, not after it like for Hungarians.
Let’s bring a really fancy person into the pool, although he never wrote any articles for this magazine:The Right Honourable Mr. Justin Pierre James Trudeau, Prime Minister of Canada.
This guy has not one, but three titles, and two middle names! Yikes! How would you handle that?!
If your application is English only, it’s a little bit easier. As a general rule, foreign names in English should be written using Western name order. For example my own name would be Tamas Polgar
and not Polgár Tamás
. In many other languages however, including Hungarian, the original name order is retained, so Donald Trump
will never become Trump Donald
. Titles and ranks will still follow Hungarian customs though: President Donald Trump
will be Donald Trump elnök
. Let’s just ignore the problem of translating titles and ranks for now.
The solution
To retrieve names in correct order you can code some lengthy processing function in your backend, but it’s easier to teach PostgreSQL to do it for you.
Here is the structure of my authors
table:
CREATE TABLE IF NOT EXISTS topgun.authors (
id serial NOT NULL PRIMARY KEY,
title character varying NOT NULL DEFAULT '',
first_name character varying NOT NULL,
middle_name character varying NOT NULL DEFAULT '',
last_name character varying NOT NULL,
rank character varying NOT NULL DEFAULT '',
nickname character varying NOT NULL DEFAULT '',
name_order numeric NOT NULL DEFAULT 0
);
Only first_name
and last_name
are mandatory. The name_order
field is a flag with the value of 0
for Eastern and 1
for Western name order. We are going to ignore multiple middle names because technically they’re one string, so a single field, middle_name
will serve just fine for them. The same goes for multiple titles and ranks. We have title
and rank
separated because the first will always stay at the beginning of the name (Mr., Dr., etc.) but the other one may vary.
Here’s how we’ll retrieve the names.
Western order:
Long: title rank first_name middle_name nickname last_name
Short: first_name middle_name last_name
Eastern order:
Long: title, last_name, nickname, middle_name, first_name, rank
Short: last_name, first_name, middle_name
Let’s create or query:
SELECT
*
FROM topgun.authors
ORDER BY last_name, first_name, id
One of the reasons to store first, middle and last names separately is to simplify ordering, but it’ll also come handy when implementing a search function.
Now let’s extend our query with a conditional.
SELECT
*, (CASE WHEN(name_order = 1)
THEN
CONCAT(
(first_name || ' '),
(CASE WHEN (middle_name <> '')
THEN (middle_name || ' ')
END),
last_name
) ELSE
CONCAT(
(last_name || ' '),
(first_name),
(CASE WHEN (middle_name <> '')
THEN (' ' || middle_name)
END)
)
END) AS nameFROM topgun.authors
ORDER BY last_name, first_name, id
The result will now contain a field called name
which contains the name of the person in proper order.
CONCAT
means string concatenation. In Western name order, which is the first branch of the CASE WHEN
condition, it concatenates first_name
, middle_name
and last_name.
There’s another CASE WHEN
in it because we don’t want to add the middle name if the person doesn’t have one. The same happens in the ELSE
branch, which means Eastern name order, but the fields are concatenated in a different order. In both cases the field values are also concatenated with a space character either before or after the value, as required.
The full name of the person with titles, nicknames, etc. works the same way, but it’s a little more complicated. I’m only showing the conditional here:
(CASE WHEN (name_order = 1)
THEN
CONCAT(
(CASE WHEN (title <> '')
THEN (title || ' ')
END),
(CASE WHEN (rank <> '')
THEN (rank || ' ')
END),
(first_name || ' '),
(CASE WHEN (middle_name <> '')
THEN (middle_name || ' ')
END),
(CASE WHEN (nickname <> '')
THEN ('"' || nickname || '" ')
END),
last_name
) ELSE
CONCAT(
(CASE WHEN (title <> '')
THEN (title || ' ')
END),
(last_name || ' '),
(CASE WHEN (nickname <> '')
THEN ('"' || nickname || '" ')
END),
(CASE WHEN (middle_name <> '')
THEN (middle_name || ' ')
END),
first_name,
(CASE WHEN (rank <> '')
THEN (' ' || rank)
END)
)
END) AS full_name
Aaand magic! The query now returns full names in proper name order! Nicknames will be displayed in quotes.
If you let your fields take a NULL
value, simply add IS NOT NULL
to your conditions like this:
(CASE WHEN (middle_name IS NOT NULL AND middle_name <> '')
THEN (middle_name || ' ')
END),
Of course this can be polished even further, but you probably got the idea.