🔡 How Unicode Characters Become Bytes in Go: From Code Points to UTF-8

Ever wondered how text in Telugu, emojis, or even a simple letter like 'A' is stored in Go?
It all comes down to Unicode and UTF-8 — and Go makes working with them surprisingly clean.
Let’s peel back the layers of abstraction and see what really happens under the hood when you write a character in Go.
Unicode vs UTF-8 vs Go Types
| Concept | What It Is |
|---|---|
| Unicode | A universal standard that assigns each character a unique number, its code point |
| UTF-8 | An encoding that stores each code point in 1 to 4 bytes |
| Go | Uses rune (an alias for int32) to hold a code point, and string or []byte to hold its UTF-8 bytes |
Let's Take a Character: త
This is a Telugu letter, pronounced “ta”.
Step 1: Unicode
Every character has a unique code point in the Unicode spec.
```go
r := 'త'
fmt.Printf("Unicode: U+%04X\n", r) // Output: U+0C24
```
So 'త' has Unicode code point U+0C24 = 3108 in decimal.
Step 2: Convert to UTF-8
The Unicode code point is abstract — we need a way to store it in memory. That’s where UTF-8 comes in.
UTF-8 stores U+0C24 as:
- Bytes: [224, 176, 164]
- Hex: [0xE0, 0xB0, 0xA4]
- Binary: 11100000 10110000 10100100
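UTF-8 is a variable-length encoding: code points up to U+007F take 1 byte, up to U+07FF take 2 bytes, up to U+FFFF take 3 bytes, and the rest (up to U+10FFFF) take 4 bytes. Since త falls in the U+0800–U+FFFF range, it takes 3 bytes with the bit pattern 1110xxxx 10xxxxxx 10xxxxxx. The 16 bits of U+0C24 (0000 1100 0010 0100) are split 4 + 6 + 6 into 0000, 110000 and 100100, which fill the x slots to give 11100000 10110000 10100100.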
Step 3: See It in Go
Here's a complete Go function to visualise this transformation:
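```go
package main

import (
	"fmt"
)

// printUTF8Encoding prints, for every rune in s, its code point,
// the code point's bits, and the UTF-8 bytes that encode it.
func printUTF8Encoding(s string) {
	fmt.Printf("Input: %q\n\n", s)

	n := 0 // character counter: the range index is a byte offset, not a character number
	for _, r := range s {
		n++
		utf8Bytes := []byte(string(r))           // UTF-8 encoding of this single rune
		binaryUnicode := fmt.Sprintf("%016b", r) // code point in binary, at least 16 digits

		fmt.Printf("Character #%d: %q\n", n, r)
		fmt.Printf("→ Unicode: U+%04X (decimal: %d)\n", r, r)
		fmt.Printf("→ Binary (Unicode): %s\n", insertEvery4Bits(binaryUnicode))
		fmt.Printf("→ UTF-8 Bytes: %v\n", utf8Bytes)
		fmt.Print("→ Hex: ")
		for _, b := range utf8Bytes {
			fmt.Printf("0x%02X ", b)
		}
		fmt.Print("\n→ Binary: ")
		for _, b := range utf8Bytes {
			fmt.Printf("%08b ", b)
		}
		fmt.Println("\n---")
	}
}

// insertEvery4Bits groups a binary string into 4-bit chunks, counting from the right.
func insertEvery4Bits(s string) string {
	out := ""
	for i, c := range s {
		if i > 0 && (len(s)-i)%4 == 0 {
			out += " "
		}
		out += string(c)
	}
	return out
}

func main() {
	printUTF8Encoding("త")
}
```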
🧪 Output:
```
Input: "త"

Character #1: 'త'
→ Unicode: U+0C24 (decimal: 3108)
→ Binary (Unicode): 0000 1100 0010 0100
→ UTF-8 Bytes: [224 176 164]
→ Hex: 0xE0 0xB0 0xA4
→ Binary: 11100000 10110000 10100100
---
```
🔍 Summary
| Layer | Value |
|---|---|
| Unicode | U+0C24 (decimal: 3108) |
| UTF-8 Bytes | [224, 176, 164] |
| Go rune | int32 → 3108 |
| Go []byte | UTF-8 bytes [224 176 164] |
✨ Try More!
Pass any string to printUTF8Encoding, such as "😊", "hi", "తెలుగు" or "你好", and you'll see how many bytes Go uses for each character.