🔡 How Unicode Characters Become Bytes in Go: From Code Points to UTF-8

Ever wondered how languages like Telugu, emojis, or even simple letters like 'A' are stored in Go?
It all comes down to Unicode and UTF-8 — and Go makes working with them surprisingly clean.

Let’s peel back the layers of abstraction and see what really happens under the hood when you write a character in Go.

Unicode vs UTF-8 vs Go Types

Concept	What It Is
Unicode	A universal set of character codes (code points) — one number per symbol
UTF-8	A way to store those code points using 1–4 bytes
Go	Uses `rune` to store a Unicode code point, and `[]byte` to store its UTF-8 bytes

Let's Take a Character: `త`

This is a Telugu letter, pronounced “ta”.

Step 1: Unicode

Every character has a unique code point in the Unicode spec.

goCopyEditr := 'త'

fmt.Printf("Unicode: U+%04X\n", r) // Output: U+0C24

So 'త' has Unicode code point U+0C24 = 3108 in decimal.

Step 2: Convert to UTF-8

The Unicode code point is abstract — we need a way to store it in memory. That’s where UTF-8 comes in.

UTF-8 stores U+0C24 as:

Bytes: [224, 176, 164]
Hex: [0xE0, 0xB0, 0xA4]
Binary:
```
  11100000 10110000 10100100
```

UTF-8 uses variable-length encoding. Since త is in the U+0800–U+FFFF range, it uses 3 bytes.

Step 3: See It in Go

Here's a complete Go function to visualise this transformation:

package main

import (
    "fmt"
)

func printUTF8Encoding(s string) {
    fmt.Printf("Input: %q\n\n", s)
    for i, r := range s {
        utf8Bytes := []byte(string(r))
        binaryUnicode := fmt.Sprintf("%016b", r)

        fmt.Printf("Character #%d: %q\n", i+1, r)
        fmt.Printf("→ Unicode:  U+%04X (decimal: %d)\n", r, r)
        fmt.Printf("→ Binary (Unicode): %s\n", insertEvery4Bits(binaryUnicode))
        fmt.Printf("→ UTF-8 Bytes: %v\n", utf8Bytes)

        fmt.Print("→ Hex:       ")
        for _, b := range utf8Bytes {
            fmt.Printf("0x%X ", b)
        }

        fmt.Print("\n→ Binary:    ")
        for _, b := range utf8Bytes {
            fmt.Printf("%08b ", b)
        }

        fmt.Println("\n---")
    }
}

func insertEvery4Bits(s string) string {
    out := ""
    for i, c := range s {
        if i > 0 && (len(s)-i)%4 == 0 {
            out += " "
        }
        out += string(c)
    }
    return out
}

func main() {
    printUTF8Encoding("త")
}

🧪 Output:

Input: "త"

Character #1: 'త'
→ Unicode:  U+0C24 (decimal: 3108)
→ Binary (Unicode): 0000 1100 0010 0100
→ UTF-8 Bytes: [224 176 164]
→ Hex:       0xE0 0xB0 0xA4
→ Binary:    11100000 10110000 10100100

🔍 Summary

Layer	Value
Unicode	`U+0C24` (decimal: 3108)
UTF-8 Bytes	`[224, 176, 164]`
Go `rune`	`int32` → 3108
Go `[]byte`	UTF-8 bytes

✨ Try More!

Pass any string into the function: "😊", "hi", "తెలుగు", "你好" — and you'll see how Go handles it.

🔡 How Unicode Characters Become Bytes in Go: From Code Points to UTF-8

Let's Take a Character: `త`

Step 1: Unicode

Step 2: Convert to UTF-8

Step 3: See It in Go

🧪 Output:

🔍 Summary

✨ Try More!

Comments

More from this blog

Curo x Rive

Vibe Match: Building AI-Powered Vibe-Based Search

Understanding Go Channels: Why for range data Can Trip You Up

Building Harbor Health: Behind the App Flow

Command Palette

Let's Take a Character: త

Step 1: Unicode

Step 2: Convert to UTF-8

Step 3: See It in Go

🧪 Output:

🔍 Summary

✨ Try More!

Comments

More from this blog

Let's Take a Character: `త`