Source code in Go is defined to be UTF-8 text; no other representation is allowed. This means that when you write a program in Go, the text editor you use must save the source code file using UTF-8 encoding.

Strings

Strings are read-only slices of bytes with extra syntactic support in Go. A string header consists of a pointer to the underlying array and the length of the string. (Unlike a slice header, it has no capacity field; a capacity only appears once the string is converted to a byte slice.) The array underlying a string is hidden from view.

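// Requires the fmt, reflect, and unsafe packages.
// Note: reflect.StringHeader is deprecated since Go 1.20; it is used here only to peek at the header.
original := "Hello, Go!"
originalHeader := (*reflect.StringHeader)(unsafe.Pointer(&original))

newStr := original[:5]
newStrHeader := (*reflect.StringHeader)(unsafe.Pointer(&newStr))

// Both headers point at the same array; the address itself varies between builds.
fmt.Printf("Array address: %x\n", originalHeader.Data) // e.g. 49c5b6
fmt.Printf("Array address: %x\n", newStrHeader.Data)   // e.g. 49c5b6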

Slicing a string behaves like slicing a slice: the result is a new string header that points into the same underlying array, so no bytes are copied. This makes taking substrings cheap.
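The sharing described above can also be checked without the deprecated header type. The following is a minimal sketch, assuming Go 1.20 or later for unsafe.StringData; the helpers sharesStart and offsetWithin are hypothetical names introduced only for this illustration.

```go
package main

import (
	"fmt"
	"unsafe"
)

// sharesStart reports whether two non-empty strings begin at the same address.
func sharesStart(a, b string) bool {
	return unsafe.StringData(a) == unsafe.StringData(b)
}

// offsetWithin returns how many bytes into parent the substring sub begins.
// It assumes sub was produced by slicing parent and is not empty.
func offsetWithin(parent, sub string) uintptr {
	p := uintptr(unsafe.Pointer(unsafe.StringData(parent)))
	s := uintptr(unsafe.Pointer(unsafe.StringData(sub)))
	return s - p
}

func main() {
	original := "Hello, Go!"
	prefix := original[:5] // "Hello"
	suffix := original[7:] // "Go!"

	fmt.Println(sharesStart(original, prefix))  // true
	fmt.Println(offsetWithin(original, suffix)) // 7

	// Converting to []byte copies the bytes, so the string stays unchanged.
	b := []byte(original)
	b[0] = 'J'
	fmt.Println(original, string(b)) // Hello, Go! Jello, Go!
}
```

The conversion to a byte slice is where the copy happens: the string must stay immutable, so the slice gets its own array.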

A historical note: The earliest implementation of strings always allocated, but when slices were added to the language, they provided a model for efficient string handling.

Not only UTF-8 strings

Because Go source is UTF-8, a string literal written without byte-level escapes always holds UTF-8 text. String values (the actual data stored in a string variable), however, can contain arbitrary bytes. This means you can store and manipulate any sequence of bytes in a Go string, not just UTF-8 encoded text.

"Arbitrary bytes" means any byte sequence at all, including sequences that are not valid UTF-8, such as raw binary data.

// UTF-8 encoded string literal
utf8Str := "Hello, 世界"  // "Hello, World" in Chinese
fmt.Println("UTF-8 String Literal:", utf8Str) // Hello, 世界

// String holding bytes that are not valid UTF-8
// byteArray := []byte{0x48, 0x65, 0x6C, 0x6C, 0x6F, 0x00, 0xFF}
// arbitraryStr := string(byteArray)
arbitraryStr := "\x48\x65\x6c\x6c\x6f\x00\xff"
fmt.Println("Arbitrary Byte String:", arbitraryStr) // Hello �

// Displaying bytes of the arbitrary string
fmt.Printf("Bytes of Arbitrary String: %v\n", []byte(arbitraryStr)) // [72 101 108 108 111 0 255]

The invalid byte 0xFF is typically displayed as �, the Unicode replacement character (U+FFFD), indicating it is not valid UTF-8. The byte representation confirms the presence of these arbitrary bytes.
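Rather than relying on how a terminal renders the bytes, a program can test validity directly. The sketch below uses the standard unicode/utf8 package; firstInvalidOffset is a hypothetical helper written for this example, not a library function.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// firstInvalidOffset returns the byte offset of the first byte that is not
// part of a valid UTF-8 sequence, or -1 if the whole string is valid.
func firstInvalidOffset(s string) int {
	for i := 0; i < len(s); {
		r, size := utf8.DecodeRuneInString(s[i:])
		// A decoding failure is reported as RuneError with a width of 1.
		if r == utf8.RuneError && size == 1 {
			return i
		}
		i += size
	}
	return -1
}

func main() {
	valid := "Hello, 世界"
	arbitrary := "Hello\x00\xff"

	fmt.Println(utf8.ValidString(valid), utf8.ValidString(arbitrary))     // true false
	fmt.Println(firstInvalidOffset(valid), firstInvalidOffset(arbitrary)) // -1 6

	// %q and % x show the exact bytes without depending on the terminal.
	fmt.Printf("%q\n", arbitrary)  // "Hello\x00\xff"
	fmt.Printf("% x\n", arbitrary) // 48 65 6c 6c 6f 00 ff
}
```

Note that the NUL byte at offset 5 is valid UTF-8 (it encodes U+0000); only the 0xFF byte at offset 6 is rejected.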

Length is Number of Bytes

When used with a string, the len() function returns the number of bytes in the string, not the number of characters.

str := "Aé€𐍈ñ́"
fmt.Println(len(str)) // 1 + 2 + 3 + 4 + 4 = 14
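To count characters (more precisely, code points) instead of bytes, utf8.RuneCountInString can be used. The sketch below spells the string out with escapes so the encoding is unambiguous; it assumes the example string uses precomposed é and ñ followed by a combining acute accent (U+0301), which is what the byte arithmetic above implies. runeWidths is a hypothetical helper for this illustration.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// runeWidths returns the encoded width in bytes of each code point in s.
func runeWidths(s string) []int {
	widths := make([]int, 0, utf8.RuneCountInString(s))
	for i := 0; i < len(s); {
		_, size := utf8.DecodeRuneInString(s[i:])
		widths = append(widths, size)
		i += size
	}
	return widths
}

func main() {
	// A, é (U+00E9), € (U+20AC), 𐍈 (U+10348), ñ (U+00F1), combining acute (U+0301)
	str := "A\u00e9\u20ac\U00010348\u00f1\u0301"

	fmt.Println(len(str))                    // 14 bytes
	fmt.Println(utf8.RuneCountInString(str)) // 6 code points
	fmt.Println(runeWidths(str))             // [1 2 3 4 2 2]
}
```

The last two code points render as a single visible character, so even the rune count is not the same as the number of characters a reader sees.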