home.social

#adversarial_ml — Public Fediverse posts

Live and recent posts from across the Fediverse tagged #adversarial_ml, aggregated by home.social.

  1. ⚠️ Vulnerability Report
    =======================

    🎯 AI

    Executive summary: New analysis highlights that emojis and uncommon
    Unicode byte sequences can cause brittle behavior in large language
    models by producing unexpected tokenization outputs under Byte-Pair
    Encoding (BPE) or similar tokenizers. This is an operational security
    concern for any pipeline that accepts user text and relies on
    deterministic token boundaries.

    Technical details:
    • Tokenizers relying on BPE or byte-level vocabularies split input
    into subword units; multi-byte Unicode characters (for example emoji
    or combining character sequences) may be tokenized as rare or
    out-of-vocabulary byte patterns.
    • Rare or unseen byte sequences can cause token fragmentation (many
    short tokens) or map to tokens whose embeddings carry unrelated
    semantics, altering the model's context and generation (see the
    sketch after this list).
    • Edge cases include surrogate pairs, zero-width joiners, skin-tone
    modifiers, and compound emoji sequences that change byte alignment.
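
    A minimal sketch of the fragmentation effect, assuming the
    third-party tiktoken package and its cl100k_base encoding; exact
    token counts vary by tokenizer, so treat the output as illustrative:

    # Compare character, byte, and token counts for plain ASCII versus
    # compound emoji; ZWJ sequences tend to fall back to rare byte-level
    # tokens. Requires: pip install tiktoken
    import tiktoken

    enc = tiktoken.get_encoding("cl100k_base")

    samples = [
        "hello world",                 # ASCII baseline
        "\U0001F44D\U0001F3FD",        # thumbs-up + skin-tone modifier
        "\U0001F469\u200D\U0001F4BB",  # woman + ZWJ + laptop (compound emoji)
    ]

    for text in samples:
        tokens = enc.encode(text)
        print(f"chars={len(text):2d} "
              f"bytes={len(text.encode('utf-8')):2d} "
              f"tokens={len(tokens):2d}  {text!r}")

    Running the same samples through different tokenizers makes the
    byte-alignment differences between emoji variants directly visible.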

    Analysis and impact:
    • Downstream effects include unintended prompt truncation, semantic
    drift, and increased susceptibility to adversarial inputs that
    leverage token boundary manipulation.
    • Attackers can craft inputs that force models into degraded contexts,
    leak system prompts through context misalignment, or trigger unsafe
    completions by exploiting tokenization mismatches.

    Detection:
    • Monitor token-count distributions against character counts to
    catch anomalies where the token count balloons relative to the
    character count (sketched after this list).
    • Instrument preprocessing logs to capture unusual byte-sequence
    frequencies and new tokens entering the embedding table.
    • Use synthetic test suites that include emoji variants, combining
    characters, and long multi-byte sequences.
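
    A detection sketch along the lines of the first point, again
    assuming tiktoken; the 1.0 ratio threshold is an illustrative
    starting point and should be calibrated against real traffic:

    # Flag inputs whose tokenization is unusually fragmented. Ordinary
    # prose usually yields well under one token per character, while
    # rare multi-byte sequences can push the ratio above that.
    import tiktoken

    enc = tiktoken.get_encoding("cl100k_base")

    def token_char_ratio(text: str) -> float:
        """Tokens emitted per input character."""
        return len(enc.encode(text)) / len(text) if text else 0.0

    def is_suspicious(text: str, threshold: float = 1.0) -> bool:
        """True when tokenization looks abnormally fragmented."""
        return token_char_ratio(text) > threshold

    for text in ["a normal sentence", "\U0001F469\u200D\U0001F680" * 10]:
        print(f"ratio={token_char_ratio(text):.2f} "
              f"suspicious={is_suspicious(text)}")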

    Mitigation:
    • Implement Unicode normalization (NFC/NFKC) in preprocessing and
    strip or canonicalize zero-width joiners where appropriate (a
    sketch follows this list).
    • Expand tokenizer training data with diverse emoji and multi-byte
    sequences, or use byte-level tokenizers robust to unseen sequences.
    • Add input sanitization layers that flag or constrain user-supplied
    content with high token/character ratios and apply rate limits or
    transformation policies.
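
    A preprocessing sketch for the first point, standard library only.
    Stripping zero-width characters outright is an illustrative policy;
    some scripts and legitimate emoji sequences rely on the ZWJ, so a
    real pipeline may prefer to canonicalize or allow-list instead:

    # Normalize Unicode, then drop zero-width code points before the
    # text reaches the tokenizer.
    import unicodedata

    ZERO_WIDTH = {
        "\u200B",  # zero-width space
        "\u200C",  # zero-width non-joiner
        "\u200D",  # zero-width joiner
        "\uFEFF",  # zero-width no-break space / BOM
    }

    def sanitize(text: str, form: str = "NFKC") -> str:
        """Apply Unicode normalization, then remove zero-width characters."""
        normalized = unicodedata.normalize(form, text)
        return "".join(ch for ch in normalized if ch not in ZERO_WIDTH)

    # NFKC folds the "fi" ligature to plain "fi"; the embedded ZWJ is dropped.
    print(sanitize("\uFB01le\u200Dname"))  # -> "filename"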

    References / notes:
    • This is a tokenizer-level robustness issue rather than a single
    CVE-class vulnerability; mitigations focus on preprocessing, tokenizer
    coverage, and monitoring.

    🔹 #llm_security #tokenization #BPE #unicode #adversarial_ml

    🔗 Source: infosecwriteups.com/the-emoji-