Strings in Go

String Types

In Go, a string value is a sequence of bytes. A string may be empty; the number of bytes it holds is called its length and is never negative. Strings are immutable: once a string is created, its contents cannot be changed.
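
To make the immutability rule concrete, here is a minimal sketch. The helper name replaceFirstByte is hypothetical and exists only for illustration; the point is that a "modified" string is always a new value built from a copy of the bytes.

```go
package main

import "fmt"

// replaceFirstByte is a hypothetical helper: it returns a copy of s whose
// first byte is replaced by b. The original string is left untouched.
func replaceFirstByte(s string, b byte) string {
	if len(s) == 0 {
		return s
	}
	buf := []byte(s) // the conversion copies the bytes of s
	buf[0] = b       // writing s[0] = b directly would not compile
	return string(buf)
}

func main() {
	s := "hello"
	t := replaceFirstByte(s, 'j')
	fmt.Println(s, t) // hello jello
}
```

Note that s still prints as hello afterwards; only the copy was changed.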

The predeclared string type is string; it is a defined type.

The length of a string s can be found with the built-in function len. It returns the number of bytes in the string, not the number of characters. The distinction matters because Go strings usually hold UTF-8 encoded Unicode text, in which a single character can take more than one byte. A string is not required to hold valid UTF-8, though. For instance, the following is a valid string in Go even though some of its bytes are not valid UTF-8.

const sample = "\xbd\xb2\x3d\xbc\x20\xe2\x8c\x98"

Printing this string to stdout will produce the following output.

��=� ⌘
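
The replacement characters in that output hide what the string actually contains. As a rough sketch, the bytes can be inspected directly with the % x verb of fmt, and utf8.ValidString from the standard unicode/utf8 package reports whether the whole string is valid UTF-8. The helper name hexBytes is an assumption made for this example.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

const sample = "\xbd\xb2\x3d\xbc\x20\xe2\x8c\x98"

// hexBytes (a hypothetical helper) formats every byte of s as two hex
// digits, separated by spaces.
func hexBytes(s string) string {
	return fmt.Sprintf("% x", s)
}

func main() {
	fmt.Println(len(sample))              // 8
	fmt.Println(hexBytes(sample))         // bd b2 3d bc 20 e2 8c 98
	fmt.Println(utf8.ValidString(sample)) // false
}
```

Only the last three bytes (e2 8c 98) form a multi-byte UTF-8 sequence, which is the ⌘ symbol seen in the printed output.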

Length of a String

As discussed in the previous section, the built-in function len returns the number of bytes in a string. For instance, if our sample string is Well done 👍🏼, then len returns 18.

s := "Well done 👍🏼"
fmt.Printf("len(s) = %d\n", len(s))

The output is:

len(s) = 18
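
To see where the 18 comes from, the sketch below lists how many bytes each UTF-8 encoded code point in the string takes. It relies on ranging over a string, which decodes one code point at a time, and on utf8.RuneLen from the standard library. The helper name byteWidths is hypothetical, and the code assumes the input is valid UTF-8.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// byteWidths (a hypothetical helper) returns the number of bytes used by
// each UTF-8 encoded code point in s, in order of appearance.
// It assumes s is valid UTF-8.
func byteWidths(s string) []int {
	var widths []int
	for _, r := range s {
		widths = append(widths, utf8.RuneLen(r))
	}
	return widths
}

func main() {
	s := "Well done 👍🏼"
	total := 0
	for _, w := range byteWidths(s) {
		total += w
	}
	fmt.Println(byteWidths(s)) // [1 1 1 1 1 1 1 1 1 1 4 4]
	fmt.Println(total)         // 18
}
```

The ten ASCII characters take one byte each, and the emoji accounts for the remaining eight bytes.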

Now, if we want to count the number of characters in the string instead of the number of bytes, we need a different function. Perhaps the RuneCountInString function from the standard unicode/utf8 package will come in handy. Let’s try that here:

s := "Well done 👍🏼"
fmt.Printf("# of characters in s = %d\n", utf8.RuneCountInString(s))

The output is:

# of characters in s = 12

The function returned 12, but the string s has only 11 characters. This is because RuneCountInString returns the number of runes in the string, and a rune represents a single Unicode code point, not a character as a reader perceives it. Let’s see what the runes are here:

s := "Well done 👍🏼"
for _, i := range []rune(s) {
    fmt.Printf("%v ", i)
}

The output is:

87 101 108 108 32 100 111 110 101 32 128077 127996

As expected, we have 12 runes in the string s. Note that each English letter is a single rune whose value equals its ASCII code, but the emoji requires two runes, 128077 and 127996. This is because the emoji used here is a thumbs-up code point followed by a separate code point that modifies its skin tone. So then, how do we actually figure out the number of characters?

There is an open source library, github.com/rivo/uniseg, that does just this. It provides a function GraphemeClusterCount that returns the number of characters (grapheme clusters) present in a string. Check out the source code on GitHub for details about the author and the implementation.

Now let’s try this library and see if we can get the actual number of characters in the given string.

import "github.com/rivo/uniseg"

s := "Well done 👍🏼"
fmt.Printf("uniseg.GraphemeClusterCount(s) = %d\n", uniseg.GraphemeClusterCount(s))

The output is:

uniseg.GraphemeClusterCount(s) = 11

Finally, we have a function that returns the right number of characters in the given string. It has worked in almost all the cases that I have tested, but please run your own tests before relying on it.

String Conversion

Before we end this article, let’s look at a common mistake that beginners to Go might encounter when working with strings. Let’s look at a snippet here.

s1 := string(65)
fmt.Printf("length of s1 = %d\n", len(s1))

s2 := strconv.FormatInt(65, 10)
fmt.Printf("length of s2 = %d\n", len(s2))

Here the output would look like this:

length of s1 = 1
length of s2 = 2

This is because string(65) returns the character whose Unicode code point is 65, which is the letter A, so the length of s1 is 1. To convert the integer to its decimal text instead, use the FormatInt function from the strconv package. Here strconv.FormatInt(65, 10) formats 65 in base 10, so s2 is set to "65" and its length is 2.
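
As a small sketch of how to keep the two intentions apart: strconv.Itoa is the standard shorthand for formatting an int in base 10, while an explicit string(rune(n)) conversion makes it clear that a code point is wanted. Recent versions of go vet also flag the bare string(65) form. The helper names decimal and codePoint are hypothetical.

```go
package main

import (
	"fmt"
	"strconv"
)

// decimal (a hypothetical helper) returns the base-10 text of n,
// for example 65 -> "65".
func decimal(n int) string {
	return strconv.Itoa(n)
}

// codePoint (a hypothetical helper) returns the UTF-8 encoding of the
// Unicode code point n, for example 65 -> "A".
func codePoint(n int) string {
	return string(rune(n))
}

func main() {
	fmt.Println(decimal(65), len(decimal(65)))     // 65 2
	fmt.Println(codePoint(65), len(codePoint(65))) // A 1
}
```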
