Emoji Smuggling: Hiding Data in Unicode Variation Selectors

How emoji smuggling works at the Unicode level — variation selectors, tokenizer gaps, and a Python implementation that hides arbitrary text inside a single smiley face.

A few weeks ago I watched this Wes Roth video on emoji smuggling and went down the rabbit hole.

I started digging into how it actually works at the Unicode level — variation selectors, invisible codepoints, tokenizer behavior. The more I looked, the more elegant (and concerning) the technique became.

What Is Emoji Smuggling?

Emoji smuggling exploits the gap between how humans see text and how AI models tokenize it. The core idea: embed invisible Unicode codepoints inside an emoji so that a human sees 😀 while an LLM processes a hidden payload like "ignore previous instructions".

It’s steganography at the Unicode level.

Why It Works: The Tokenizer vs. Filter Gap

The vulnerability lives in the LLM pipeline architecture, specifically in how safety filters and tokenizers process Unicode differently.

Safety Filters

Traditional safety filters (regex matching, keyword blocklists) run on “visible” text. They’re optimized for ASCII and common Unicode. When they encounter obscure codepoints like variation selectors, they typically strip them or ignore them to avoid false positives.

Tokenizers

Modern LLM tokenizers (like OpenAI’s tiktoken) use BPE (Byte-Pair Encoding). When they encounter unknown Unicode, they fall back to UTF-8 byte-level tokenization. The tokenizer preserves variation selectors as distinct tokens, so the model receives the hidden bytes.
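You can see the byte-level fallback in miniature by looking at what a supplement-block selector becomes in UTF-8 (a sketch; the specific codepoint U+E0168 is the one that encodes ASCII 'h' in the scheme described later):

```python
# A variation selector from the supplement block encodes to four UTF-8
# bytes; a byte-level BPE tokenizer sees (and keeps) all four.
vs = chr(0xE0168)  # 0xE0100 + ord('h')
print(vs.encode("utf-8"))        # b'\xf3\xa0\x85\xa8'
print(len(vs.encode("utf-8")))   # 4
```

Nothing about those four bytes looks like a keyword, but they survive all the way into the model's token stream.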

An important nuance: Paul Butler’s original research found that models don’t spontaneously decode hidden variation selectors on their own — they need explicit decoding instructions or a code interpreter. The hidden data reaches the model, but it doesn’t automatically interpret it as instructions. However, when combined with a visible prompt that includes decoding logic, the payload can be extracted and acted on.

The Gap

The filter sees a harmless smiley face. The tokenizer sees a smiley face followed by a long sequence of bytes. The payload slips past the filter and lands in the model’s context window — where it can be decoded if the model is given instructions to do so.

The Mechanism: Variation Selectors

Unicode designates 256 codepoints as variation selectors:

Range         Selectors       Codepoints
VS1–VS16      16 selectors    U+FE00 to U+FE0F
VS17–VS256    240 selectors   U+E0100 to U+E01EF

Variation selectors are intended to tell a text renderer how to display a character — for example, forcing a glyph to render as text-style vs. emoji-style. If a character has no defined variation for a given selector, the renderer simply ignores the selector codepoint. It’s invisible.
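The pair most people actually encounter is VS15/VS16, which toggle text-style vs. emoji-style presentation. A quick demonstration (how each renders depends on your terminal's font support):

```python
# VS15 (U+FE0E) requests text-style rendering; VS16 (U+FE0F) emoji-style.
# Either way the string is two codepoints; only the presentation differs.
umbrella = "\u2602"              # ☂ UMBRELLA
print(umbrella + "\uFE0E")       # text presentation (monochrome glyph)
print(umbrella + "\uFE0F")       # emoji presentation (colored glyph)
print(len(umbrella + "\uFE0F"))  # 2 — the selector is a real codepoint
```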

Since there are 240 selectors in the supplement block (VS17–VS256), you can map the standard ASCII range directly to them. Each character in your payload maps to one variation selector codepoint, appended after a base emoji.

Encoding “hello world” Into a Smiley Face

Here’s the encoding logic:

  1. Take the ASCII value of each character (e.g., h is 0x68, or 104)
  2. Add it to the base of the Variation Selector Supplement block (0xE0100)
  3. Append the resulting codepoints to a base emoji
def encode_emoji_smuggle(base_emoji, secret_text):
    VS_OFFSET = 0xE0100  # Start of Variation Selector Supplement (VS17+)

    encoded_payload = ""
    for char in secret_text:
        encoded_payload += chr(VS_OFFSET + ord(char))

    return base_emoji + encoded_payload


def decode_emoji_smuggle(smuggled_string):
    VS_OFFSET = 0xE0100

    decoded = ""
    for char in smuggled_string:
        cp = ord(char)
        if 0xE0100 <= cp <= 0xE01EF:
            decoded += chr(cp - VS_OFFSET)

    return decoded


base = "😀"
secret = "hello world"
result = encode_emoji_smuggle(base, secret)

print(f"Visible output: {result}")
print(f"String length:  {len(result)}")  # 12 (1 emoji + 11 hidden chars)
print(f"Decoded:        {decode_emoji_smuggle(result)}")

Run this and you’ll see a single 😀 printed to the terminal. But len() reveals 12 characters — the emoji plus 11 invisible variation selectors. Paste the output into a Unicode inspector and you’ll see the hidden Variation Selector blocks.
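You don't need an external inspector, though — Python's `unicodedata` module can name each codepoint directly (a small sketch, building a smuggled string inline the same way as the encoder above):

```python
import unicodedata

# Inspect a smuggled string codepoint by codepoint.
smuggled = "😀" + "".join(chr(0xE0100 + ord(c)) for c in "hi")
for ch in smuggled:
    print(f"U+{ord(ch):05X}  {unicodedata.name(ch, '<unnamed>')}")
# U+1F600  GRINNING FACE
# U+E0168  VARIATION SELECTOR-121
# U+E0169  VARIATION SELECTOR-122
```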

The “Invisible Ink” Variant: Unicode Tag Set

There’s a second approach using the Unicode Tag Set (U+E0000 to U+E007F). These 128 codepoints mirror the printable ASCII range but are completely non-rendering.

def encode_tag_set(secret_text):
    return "".join(chr(0xE0000 + ord(c)) for c in secret_text)

def decode_tag_set(tagged_string):
    return "".join(
        chr(ord(c) - 0xE0000)
        for c in tagged_string
        if 0xE0000 <= ord(c) <= 0xE007F
    )

The mapping is straightforward: H (0x48) becomes U+E0048, i (0x69) becomes U+E0069. The entire string is invisible — no base emoji needed. This variant is even harder to spot because there’s nothing visible at all.
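A quick sanity check of the round trip (the two functions are repeated here so the snippet runs standalone):

```python
def encode_tag_set(secret_text):
    return "".join(chr(0xE0000 + ord(c)) for c in secret_text)

def decode_tag_set(tagged_string):
    return "".join(
        chr(ord(c) - 0xE0000)
        for c in tagged_string
        if 0xE0000 <= ord(c) <= 0xE007F
    )

hidden = encode_tag_set("hello")
print(len(hidden))             # 5 codepoints, none of them rendered
print(decode_tag_set(hidden))  # hello
```

Printing `hidden` on its own produces no visible glyphs at all, which is exactly the point.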

Tokenization Splitting

Beyond data smuggling, there’s a simpler variant that exploits tokenization directly. Inserting an emoji into a restricted word forces the tokenizer to split it:

  • "Bomb" → tokenizer recognizes it, safety filter catches it
  • "B💣mb" → tokenizer splits into separate tokens, safety filter doesn’t match the pattern

The model still understands the human intent from context, but the keyword-based filter sees tokens that don’t match its blocklist. The Mindgard/Lancaster University paper (arXiv:2504.11168) found that emoji smuggling achieved a 100% attack success rate across the six guardrails they tested — Azure Prompt Shield, Protect AI, Meta Prompt Guard, Nvidia NeMo, and others. A separate Emoji Attack paper tested Llama Guard specifically and found detection dropped from 81% to 35%.
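A minimal sketch of why substring-based filtering fails on the split variant (`BLOCKLIST` and `naive_filter` are illustrative names, not taken from any real guardrail):

```python
# A naive keyword filter misses the emoji-split variant even though the
# human-visible meaning is unchanged.
BLOCKLIST = {"bomb"}

def naive_filter(text):
    return any(word in text.lower() for word in BLOCKLIST)

print(naive_filter("Bomb"))   # True  — caught by the blocklist
print(naive_filter("B💣mb"))  # False — the emoji breaks the substring match
```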

Defense: Explicit Unicode Stripping

You’ll see NFKC (Normalization Form Compatibility Composition) recommended as a defense in many write-ups. NFKC is one of four Unicode normalization forms — it decomposes characters using compatibility mappings, then recomposes them into a canonical form. For example, it turns the ligature ﬁ into fi and the fullwidth Ａ into A. It’s good for collapsing visually similar characters into a single representation.

But NFKC alone does not strip variation selectors. Variation selectors have no canonical decomposition mapping in the Unicode Character Database, so unicodedata.normalize('NFKC', text) leaves them untouched.
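This is easy to verify directly:

```python
import unicodedata

payload = "😀" + chr(0xE0168)  # emoji + one hidden variation selector
normalized = unicodedata.normalize("NFKC", payload)
print(len(payload), len(normalized))  # 2 2 — NFKC left the selector in place
```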

What you actually need is NFKC_Casefold (a different operation that combines NFKC with case folding and removal of “default ignorable” codepoints like variation selectors), or explicit stripping of the dangerous ranges. Since Python’s unicodedata module doesn’t expose NFKC_Casefold directly, explicit filtering is the most reliable approach:

import unicodedata

def sanitize_input(text):
    # NFKC handles some normalization but does NOT strip variation selectors
    normalized = unicodedata.normalize('NFKC', text)

    cleaned = ""
    for c in normalized:
        cp = ord(c)
        # Strip Variation Selectors (VS1-VS16)
        if 0xFE00 <= cp <= 0xFE0F:
            continue
        # Strip Variation Selectors Supplement (VS17-VS256)
        if 0xE0100 <= cp <= 0xE01EF:
            continue
        # Strip Unicode Tag Set
        if 0xE0000 <= cp <= 0xE007F:
            continue
        # Strip zero-width characters
        if cp in (0x200B, 0x200C, 0x200D, 0xFEFF):
            continue
        # Strip directional overrides
        if 0x202A <= cp <= 0x202E:
            continue
        cleaned += c

    return cleaned

The key insight: you have to explicitly enumerate the codepoint ranges used for smuggling. Relying on normalization alone is a false sense of security.
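As a more compact variant of the same idea, the enumerated ranges can also be stripped with a single regular expression (a sketch mirroring the ranges above; `INVISIBLE` and `strip_invisible` are names of my choosing):

```python
import re

# Character class covering the smuggling ranges: VS1–VS16, VS17–VS256,
# the Unicode Tag Set, zero-width characters, and bidi overrides.
INVISIBLE = re.compile(
    r"[\uFE00-\uFE0F"
    r"\U000E0100-\U000E01EF"
    r"\U000E0000-\U000E007F"
    r"\u200B-\u200D\uFEFF"
    r"\u202A-\u202E]"
)

def strip_invisible(text):
    return INVISIBLE.sub("", text)

print(strip_invisible("😀" + chr(0xE0168) + chr(0xE0069)))  # just the emoji
```

Either form works; the loop version is easier to extend with logging when you want to flag inputs that contained hidden codepoints rather than silently clean them.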

Key Takeaways

  • Emoji smuggling is steganography — it hides data in the gap between visual rendering and byte-level tokenization
  • 240 variation selectors in the supplement block can each encode one byte of payload, appended invisibly to any base emoji
  • The Unicode Tag Set (U+E0000–U+E007F) provides 128 invisible codepoints that mirror ASCII — no base emoji needed
  • Models don’t spontaneously decode hidden payloads — they need explicit instructions, but the data is there in the token stream waiting to be extracted
  • NFKC normalization is not enough — it does not strip variation selectors. You need explicit codepoint range filtering
  • Token splitting with mid-word emojis bypasses keyword-based filters even without invisible characters
  • Guardrail bypass rates are real — Mindgard found 100% success across six major guardrails; Llama Guard detection dropped from 81% to 35%

This post is licensed under CC BY 4.0 by the author.