Using macrons for te reo Māori with GIS

Toitū Te Whenua LINZ
On Location

--

Learn about the use and importance of macrons for te reo Māori with GIS, including UTF-8 character encoding, fonts and databases.

What is a macron?

In te reo Māori a macron is the horizontal line above any of the five vowels, a e i o u or A E I O U. A macron indicates a long vowel sound, which influences a word’s meaning and pronunciation.

Macrons have been used in conjunction with the Māori language since it was first documented. However, they were typically not used on maps because they could be misinterpreted as cartographic symbols. Te Taura Whiri i te Reo Māori (the Māori Language Commission) has published Guidelines for Māori Language Orthography, which sets out the standard orthographic conventions for writing the Māori language. The New Zealand Geographic Board Ngā Pou Taunaha o Aotearoa began using macrons on official place names in the 1990s.

Historic Survey: A sketch of the Ngāmoe Block published in May 1885 includes Māori name labels with macrons absent.

Using official Māori place names helps to preserve New Zealand’s unique heritage and culture and ensures that place names are standardised, consistent and accurate.

The New Zealand Geographic Board (Ngā Pou Taunaha o Aotearoa) Act 2008 includes a compliance section, which requires official place names to be used in all official documents. This means that if a macron is in an official place name, it must be shown. Crown agencies are bound by this clause.

Macrons and geospatial software

All major modern GIS platforms support macrons, so you are unlikely to run into problems unless you are using older versions of software.

Character encoding: UTF-8 and ASCII

Character encoding assigns a numeric identifier to each character in a character set so that text can be stored and exchanged digitally. In other words, it defines how individual characters are represented in a text document.

Unicode is the most widely used universal character standard. UTF-8 is a Unicode encoding which supports many different languages and characters, including Māori words with macrons.

ASCII is another common character encoding. It was developed in the 1960s to represent basic English characters, and its character set includes only 128 characters, none of which are macronised vowels. In contrast, UTF-8 can encode every Unicode character, which amounts to over a million code points.

As UTF-8 supports a significantly larger character set, it is recommended to use UTF-8 as your default character encoding.
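
As a minimal sketch of the difference, the Python snippet below lists the Unicode code point, Unicode name and UTF-8 bytes for each macronised vowel, then checks whether a word can be encoded as ASCII. Python is used here purely for illustration, and the function names `describe_macrons` and `fits_in_ascii` are hypothetical.

```python
import unicodedata

MACRON_VOWELS = "āēīōūĀĒĪŌŪ"

def describe_macrons(chars=MACRON_VOWELS):
    """Return (character, code point, Unicode name, UTF-8 bytes as hex) per character."""
    rows = []
    for ch in chars:
        rows.append((
            ch,
            f"U+{ord(ch):04X}",
            unicodedata.name(ch),
            ch.encode("utf-8").hex(" "),  # each macronised vowel takes two bytes
        ))
    return rows

def fits_in_ascii(text):
    """Return True if every character in text can be encoded as ASCII."""
    try:
        text.encode("ascii")
        return True
    except UnicodeEncodeError:
        return False

for row in describe_macrons():
    print(*row, sep="\t")

print(fits_in_ascii("Maori"))  # plain vowels only
print(fits_in_ascii("Māori"))  # the macronised vowel is outside ASCII
```

Each macronised vowel occupies two bytes in UTF-8, whereas ASCII has no slot for it at all, which is why text forced into ASCII either loses the macron or fails to save.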

An example of New Zealand lake place names with macrons included, using the UTF-8 encoding.

Typing Māori macrons

There are two ways to type a macron: enter a Unicode key combination or install the Māori keyboard.

The Unicode key combination is a keyboard shortcut which outputs a specific macronised vowel.

The other option involves changing your current keyboard language preference to the Māori keyboard.

Installing the Māori keyboard

  1. Open Settings
  2. Click on Time & Language
  3. Click on Language
  4. Under the Preferred languages section, click Add a language
  5. Search and select Te reo Māori
  6. Click Next, accept the default settings and install

Changing Windows 10 language preferences

  1. Click the Input Indicator icon in the bottom-right corner of the taskbar.
  2. Select the Māori keyboard layout.

Once the Māori keyboard is selected, you can type a macronised vowel by pressing ` (the ~ key) followed by the vowel.

Macrons and Excel

The default character encoding of an Excel document may not be set to UTF-8, which can cause macrons to be dropped or corrupted. However, it is easy to change the character encoding of an Excel spreadsheet.

Changing character encoding

  1. Click the Data tab and then select From Text/CSV.
  2. Browse to the spreadsheet file, select it and then click Import.
  3. Click the File Origin drop-down menu, select UTF-8 and then click Load.
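
The same choice of encoding applies when a file is read programmatically. The Python sketch below assumes the file was written as UTF-8: decoding its bytes with the wrong encoding corrupts the macron, while decoding them as UTF-8 preserves it. Latin-1 stands in here for a legacy default; the actual default on a given machine may differ.

```python
# Bytes as they would sit on disk in a UTF-8 encoded CSV file.
raw = "name\nMāori\n".encode("utf-8")

wrong = raw.decode("latin-1")  # single-byte legacy encoding: the macron is corrupted
right = raw.decode("utf-8")    # matching encoding: the macron survives

print(repr(wrong))
print(repr(right))
```

The bytes on disk are identical in both cases. Only the encoding used to interpret them changes, which is what the File Origin setting controls.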

The file type in which an Excel spreadsheet is saved also determines whether macrons are retained in the document.

Saving in UTF-8

Once the character encoding is set to UTF-8, saving as an Excel file will automatically retain the encoding. Saving as a CSV file does not retain the encoding automatically: you need to choose Save As and select CSV UTF-8 (Comma delimited). If the CSV file is not saved as this file type, you will need to repeat the instructions above to set the character encoding again when the file is reopened.

Note: If macrons weren’t included in the dataset when it was created, then changing the spreadsheet’s character encoding or saving it as a UTF-8 file will not add them.
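
For CSV files produced outside Excel, a comparable result can be sketched in Python. The example assumes Python's `utf-8-sig` codec, which writes a byte order mark (BOM) at the start of the file; Excel generally uses that mark to recognise a CSV as UTF-8. The file name, helper names and sample rows are illustrative only.

```python
import csv
import tempfile
from pathlib import Path

def write_csv_utf8(path, rows):
    """Write rows to a CSV file as UTF-8 with a leading BOM."""
    with open(path, "w", encoding="utf-8-sig", newline="") as f:
        csv.writer(f).writerows(rows)

def read_csv_utf8(path):
    """Read the CSV back; utf-8-sig strips the BOM if one is present."""
    with open(path, "r", encoding="utf-8-sig", newline="") as f:
        return list(csv.reader(f))

# Illustrative rows only, not real gazetteer records.
rows = [["name"], ["Ngāmoe"], ["Toitū Te Whenua"]]
path = Path(tempfile.mkdtemp()) / "names.csv"

write_csv_utf8(path, rows)
print(read_csv_utf8(path))
```

Reading with `utf-8-sig` also works for UTF-8 files without a BOM, so the same reader handles both cases.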

When to use macrons

There are a couple of respected resources which can be used to identify whether a word includes a macron.

The Māori Dictionary is an online resource which provides definitions for a wide range of Māori words.

The New Zealand Gazetteer is the authoritative source for place names in New Zealand.

Macrons and databases

Character encoding is defined when a database is first set up and generally can’t be changed afterwards, so it’s important to check the encoding before trying to load characters with macrons into it. If the encoding is set to UTF-8, macrons won’t cause a problem. However, if the encoding is set to ASCII, it’s very likely you’ll have to create a new database with the encoding set to UTF-8.

Once you have a database that supports UTF-8, a common issue to consider is how to handle macrons in database searches. If the query matches against the macronised name, a user who types a place name without its macron will not get the correct result. A solution is to add a duplicate field in which the macrons are replaced with their non-macron equivalents, and to search against that field instead.

However, if someone does include a macron in their search words, a query against the plain text field will not return the correct result. To solve this, any macrons in the search term need to be replaced with their plain text equivalents before the query runs. This allows the correct record to be found using the ASCII name field, while the UTF-8 macronised text can still be returned.
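
The approach described above can be sketched in Python using the standard library's `sqlite3` and `unicodedata` modules. This is one possible implementation under stated assumptions, not a prescribed method: the table `place`, the columns `name` and `name_ascii`, the helper names and the sample values are all hypothetical.

```python
import sqlite3
import unicodedata

COMBINING_MACRON = "\u0304"

def strip_macrons(text):
    """Replace macronised vowels with their plain equivalents."""
    decomposed = unicodedata.normalize("NFD", text)  # splits ā into a + combining macron
    plain = "".join(ch for ch in decomposed if ch != COMBINING_MACRON)
    return unicodedata.normalize("NFC", plain)

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE place (name TEXT, name_ascii TEXT)")

# Store the macronised name alongside a plain text duplicate.
for name in ["Ngāmoe", "Toitū"]:
    conn.execute("INSERT INTO place VALUES (?, ?)", (name, strip_macrons(name)))

def search(term):
    """Fold the search term, match on the plain field, return the macronised name."""
    key = strip_macrons(term)
    cur = conn.execute(
        "SELECT name FROM place WHERE name_ascii = ? COLLATE NOCASE", (key,)
    )
    return [row[0] for row in cur]

print(search("Ngamoe"))  # typed without the macron
print(search("Ngāmoe"))  # typed with the macron
```

Because both the stored duplicate and the search term pass through the same `strip_macrons` function, the two inputs match the same record, and the user always receives the macronised form.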

Macron fonts

Data is stored in a database format but displayed using a font, so when labelling a map it is important to select a font which supports macrons. To check whether a particular font includes macrons, search online for the characters it supports.

Macrons and shapefiles

Shapefiles support macron use and while they are often regarded as a dated format, they are widely supported and so are still commonly used. If a shapefile contains corrupted macrons, the first thing to check is the shapefile’s encoding.

Open the shapefile folder in File Explorer and see if there is a CPG file. If there is, open the CPG file in a text editor.

The CPG file will include the shapefile’s character encoding. Check to see if the encoding is set to “UTF-8”. If the shapefile is using another encoding system then it is unlikely to support macrons. If the shapefile is still corrupted after ensuring the character encoding is set to UTF-8, it will need to be recreated. The source data may itself have been corrupted, in which case the macrons will need to be entered again at source.

Note: File geodatabases and GeoPackages are much more robust than shapefiles when handling macrons as their default character encoding is UTF-8.
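
The CPG check can also be scripted. The Python sketch below assumes the CPG file sits beside the shapefile with the same base name and a lowercase `.cpg` extension, and that it contains only the encoding name. The function names and the `lakes` file name are hypothetical.

```python
import tempfile
from pathlib import Path

def shapefile_encoding(shp_path):
    """Return the encoding named in the .cpg file, or None if no such file exists."""
    cpg = Path(shp_path).with_suffix(".cpg")
    if not cpg.exists():
        return None
    return cpg.read_text(encoding="ascii", errors="replace").strip()

def is_utf8(encoding_name):
    """Treat 'UTF-8', 'utf8' and similar spellings as the same encoding."""
    return encoding_name is not None and encoding_name.replace("-", "").upper() == "UTF8"

# Demonstration with a stand-in CPG file; no real shapefile is needed.
folder = Path(tempfile.mkdtemp())
(folder / "lakes.cpg").write_text("UTF-8", encoding="ascii")

enc = shapefile_encoding(folder / "lakes.shp")
print(enc, is_utf8(enc))
```

A result of `None` means there is no CPG file, in which case software reading the shapefile has to guess the encoding.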

Want to know more?

Check out the video below from the LINZ GrowGISNZ GeoBites online learning programme:

Using macrons for te reo Māori with NZ GIS tools and data


Toitū Te Whenua LINZ is the New Zealand Government’s lead agency for location and property information, Crown land and managing overseas investment.