🔡 How Unicode Characters Become Bytes in Go: From Code Points to UTF-8


A Full Stack Developer with a knack for creating engaging web experiences. Currently tinkering with Go.

Ever wondered how languages like Telugu, emojis, or even simple letters like 'A' are stored in Go?
It all comes down to Unicode and UTF-8 — and Go makes working with them surprisingly clean.

Let’s peel back the layers of abstraction and see what really happens under the hood when you write a character in Go.

Unicode vs UTF-8 vs Go Types

| Concept | What It Is |
|---------|------------|
| Unicode | A universal set of character codes (code points), one number per symbol |
| UTF-8   | A way to store those code points using 1–4 bytes |
| Go      | Uses rune to store a Unicode code point, and []byte to store its UTF-8 bytes |
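
To make the table concrete, here is a small sketch of how the two Go types relate. The helper name `describe` is made up for this example; `utf8.RuneCountInString` comes from the standard `unicode/utf8` package.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// describe is a hypothetical helper: it returns the number of bytes and the
// number of code points (runes) in s.
func describe(s string) (nBytes, nRunes int) {
	return len(s), utf8.RuneCountInString(s)
}

func main() {
	var r rune = 'A' // rune is an alias for int32 and holds one code point
	b := []byte("A") // []byte holds the UTF-8 encoding of the string

	fmt.Println(r, b) // 65 [65]

	nBytes, nRunes := describe("A😊")
	fmt.Println(nBytes, nRunes) // 5 2
}
```

'A' fits in a single byte, while the emoji needs four, so the string holds two code points but five bytes.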

Let's Take a Character: త

This is a Telugu letter, pronounced “ta”.

Step 1: Unicode

Every character has a unique code point in the Unicode spec.

r := 'త'

fmt.Printf("Unicode: U+%04X\n", r) // Output: U+0C24

So 'త' has Unicode code point U+0C24 = 3108 in decimal.
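
As a quick check, the snippet below sketches three ways of writing the same code point in Go: the character literal, a `\u` escape, and the plain number. The helper `codePoint` is a hypothetical name used only here.

```go
package main

import "fmt"

// codePoint is a hypothetical helper that formats a rune in U+XXXX notation.
func codePoint(r rune) string {
	return fmt.Sprintf("U+%04X", r)
}

func main() {
	a := 'త'          // character literal
	b := '\u0C24'     // Unicode escape for the same code point
	c := rune(0x0C24) // the same number, converted to a rune

	fmt.Println(a == b, b == c)       // true true
	fmt.Println(codePoint(a), int(a)) // U+0C24 3108
}
```

All three values are the same int32 under the hood, which is why the comparisons hold.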

Step 2: Convert to UTF-8

The Unicode code point is abstract — we need a way to store it in memory. That’s where UTF-8 comes in.

UTF-8 stores U+0C24 as:

  • Bytes: [224, 176, 164]

  • Hex: [0xE0, 0xB0, 0xA4]

  • Binary: [11100000, 10110000, 10100100]

UTF-8 uses variable-length encoding. Since త is in the U+0800–U+FFFF range, it uses 3 bytes.
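
For reference, UTF-8 uses 1 byte for U+0000–U+007F, 2 bytes for U+0080–U+07FF, 3 bytes for U+0800–U+FFFF, and 4 bytes for U+10000–U+10FFFF. In the 3-byte form the 16 bits of the code point are split 4/6/6 and placed behind the prefixes 1110, 10, and 10. The sketch below does that bit packing by hand and compares it with the standard library's `utf8.EncodeRune`. The function `encode3` is a hypothetical helper for illustration only: it handles just the 3-byte range and skips validation.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// encode3 is a hypothetical helper that hand-encodes a code point in the
// U+0800..U+FFFF range. It does no validation (for example, it does not
// reject surrogates), so treat it as a demonstration, not production code.
func encode3(r rune) [3]byte {
	return [3]byte{
		0xE0 | byte(r>>12),         // 1110xxxx: top 4 bits
		0x80 | byte((r>>6)&0x3F),   // 10xxxxxx: middle 6 bits
		0x80 | byte(r&0x3F),        // 10xxxxxx: low 6 bits
	}
}

func main() {
	r := 'త'
	manual := encode3(r)

	// Standard library encoding, for comparison.
	buf := make([]byte, utf8.UTFMax)
	n := utf8.EncodeRune(buf, r)

	fmt.Printf("% X\n", manual[:]) // E0 B0 A4
	fmt.Printf("% X\n", buf[:n])   // E0 B0 A4
}
```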

Step 3: See It in Go

Here's a complete Go function to visualise this transformation:

package main

import (
    "fmt"
)

func printUTF8Encoding(s string) {
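    fmt.Printf("Input: %q\n\n", s)
    n := 0 // character counter; the index that range yields is a byte offset
    for _, r := range s {
        n++
        utf8Bytes := []byte(string(r)) // UTF-8 encoding of this rune
        binaryUnicode := fmt.Sprintf("%016b", r)

        fmt.Printf("Character #%d: %q\n", n, r)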
        fmt.Printf("→ Unicode:  U+%04X (decimal: %d)\n", r, r)
        fmt.Printf("→ Binary (Unicode): %s\n", insertEvery4Bits(binaryUnicode))
        fmt.Printf("→ UTF-8 Bytes: %v\n", utf8Bytes)

        fmt.Print("→ Hex:       ")
        for _, b := range utf8Bytes {
            fmt.Printf("0x%02X ", b) // pad to two hex digits, e.g. 0x0A
        }

        fmt.Print("\n→ Binary:    ")
        for _, b := range utf8Bytes {
            fmt.Printf("%08b ", b)
        }

        fmt.Println("\n---")
    }
}

func insertEvery4Bits(s string) string {
    out := ""
    for i, c := range s {
        if i > 0 && (len(s)-i)%4 == 0 {
            out += " "
        }
        out += string(c)
    }
    return out
}

func main() {
    printUTF8Encoding("త")
}

🧪 Output:

Input: "త"

Character #1: 'త'
→ Unicode:  U+0C24 (decimal: 3108)
→ Binary (Unicode): 0000 1100 0010 0100
→ UTF-8 Bytes: [224 176 164]
→ Hex:       0xE0 0xB0 0xA4
→ Binary:    11100000 10110000 10100100
---

🔍 Summary

| Layer       | Value                  |
|-------------|------------------------|
| Unicode     | U+0C24 (decimal: 3108) |
| UTF-8 Bytes | [224, 176, 164]        |
| Go rune     | int32 → 3108           |
| Go []byte   | UTF-8 bytes            |
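
The layers in this summary also work in reverse. Here is a minimal round-trip sketch, assuming a hypothetical helper named `roundTrip`: it encodes a rune to bytes and decodes those bytes back with `utf8.DecodeRune` from the standard library.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// roundTrip is a hypothetical helper: it encodes r to UTF-8 and decodes the
// bytes back, returning the encoded bytes and the decoded rune.
func roundTrip(r rune) ([]byte, rune) {
	encoded := []byte(string(r))           // rune -> UTF-8 bytes
	decoded, _ := utf8.DecodeRune(encoded) // UTF-8 bytes -> rune
	return encoded, decoded
}

func main() {
	encoded, decoded := roundTrip('త')
	fmt.Println(encoded)                    // [224 176 164]
	fmt.Printf("%c %d\n", decoded, decoded) // త 3108
}
```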

✨ Try More!

Pass any string into the function: "😊", "hi", "తెలుగు", "你好" — and you'll see how Go handles it.
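
If you only want the byte counts for those strings, a shorter sketch along these lines may help. The helper `byteLengths` is a made-up name; `utf8.RuneLen` is from the standard `unicode/utf8` package.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// byteLengths is a hypothetical helper that returns, for each code point in s,
// how many bytes its UTF-8 encoding takes.
func byteLengths(s string) []int {
	var lengths []int
	for _, r := range s {
		lengths = append(lengths, utf8.RuneLen(r))
	}
	return lengths
}

func main() {
	for _, s := range []string{"😊", "hi", "తెలుగు", "你好"} {
		fmt.Printf("%q: %d bytes total, per code point %v\n", s, len(s), byteLengths(s))
	}
}
```

ASCII letters take 1 byte each, Telugu and Chinese characters take 3, and the emoji takes 4. Note that a word like "తెలుగు" contains more code points than visible letters, because vowel signs are stored as separate code points.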