What in the Hell is a Grapheme Cluster?

In my previous post I talked about Swift’s handling of strings and why making the characters of a string randomly accessible is a problem when Swift Strings are Unicode compliant. Another part of the issue, and the reason that we can think of Characters in Swift as just Strings in and of themselves, is the concept of Grapheme Clusters.

One way to think of a Grapheme Cluster is as a situation where two characters that sit side-by-side visually become ONE character. A sort of melding of the two individual characters. For an example, think of the letter ‘o’ and a grave accent. Two separate characters but, in Unicode, when the ‘o’ is followed by the combining form of the grave accent (U+0300, not the standalone backtick ‘`’ on your keyboard) they become this character, ‘ò’. The Latin letter ‘o’ with grave. Accent marks like this are called combining characters and Swift supports them as well.
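Here is a small sketch of that melding in Swift. The constant names are just ones I made up for illustration, and the counts in the comments are what I would expect from a current Swift toolchain.

```swift
// The letter "o" followed by U+0300 COMBINING GRAVE ACCENT.
let decomposed = "o\u{300}"

// The single precomposed code point U+00F2, LATIN SMALL LETTER O WITH GRAVE.
let precomposed = "\u{F2}"

// Swift sees ONE Character, even though two Unicode scalars are stored.
print(decomposed.count)                 // 1
print(decomposed.unicodeScalars.count)  // 2
print(decomposed.utf8.count)            // 3 (one byte for "o", two for U+0300)

// String comparison treats the two spellings as the same text.
print(decomposed == precomposed)        // true
```

Both spellings show up as ‘ò’ on screen, which is why Swift counts and compares by Grapheme Cluster rather than by the pieces underneath.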

So when I tell Swift that I want the fifth character in a string, not only does it have to take into account the fact that a character might take more than one byte, it also has to take into account that the character might be part of a Grapheme Cluster and look at the other “characters” around it. This is why the Swift Character type is not just a single character but a “mini-String” of characters that make up a single visual character on the screen.
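To make the "fifth character" idea concrete, here is a minimal sketch. The sample string and the constant names are my own invention; the point is only that Swift has to walk the string cluster by cluster to find the position I asked for.

```swift
// "cafés!" spelled with a plain "e" followed by U+0301 COMBINING ACUTE ACCENT.
let word = "cafe\u{301}s!"

// Six visual characters, seven Unicode scalars, eight UTF-8 bytes.
print(word.count)                 // 6
print(word.unicodeScalars.count)  // 7
print(word.utf8.count)            // 8

// Asking for the fifth character means stepping over four Grapheme Clusters,
// not four bytes or four scalars.
let fifthIndex = word.index(word.startIndex, offsetBy: 4)
let fifth = word[fifthIndex]
print(fifth)                      // s

// The fourth Character is the "mini-String": one Character, two scalars.
let fourth = word[word.index(word.startIndex, offsetBy: 3)]
print(String(fourth).unicodeScalars.count)  // 2
```

If Swift had simply jumped to the fifth scalar it would have landed on the accent instead of the ‘s’.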

And, yes, you can tell Swift to treat them separately but that is a story for another post on String “views”.
