home.social

#adversarial_ml — Public Fediverse posts

Live and recent posts from across the Fediverse tagged #adversarial_ml, aggregated by home.social.

  1. ⚠️ Vulnerability Report
    =======================

    🎯 AI

    Executive summary: New analysis highlights that emojis and uncommon
    Unicode byte sequences can cause brittle behavior in large language
    models by producing unexpected tokenization outputs under Byte-Pair
    Encoding (BPE) or similar tokenizers. This is an operational security
    concern for any pipeline that accepts user text and relies on
    deterministic token boundaries.

    Technical details:
    • Tokenizers relying on BPE or byte-level vocabularies split input
    into subword units; multi-byte Unicode characters (for example emoji
    or combining character sequences) may be tokenized as rare or
    out-of-vocabulary byte patterns.
    • Rare or unseen byte sequences can cause token fragmentation (many
    short tokens) or map to tokens whose embeddings carry unrelated
    semantics, altering the model's context and generation (see the
    sketch after this list).
    • Edge cases include surrogate pairs, zero-width joiners, skin-tone
    modifiers, and compound emoji sequences that change byte alignment.
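
    A minimal sketch of the fragmentation effect, assuming the
    third-party tiktoken package and its cl100k_base encoding; exact
    token counts vary by tokenizer, so treat the output as illustrative:

    # Compare character, byte, and token counts for plain ASCII versus
    # compound emoji; ZWJ sequences tend to fall back to rare byte-level
    # tokens. Requires: pip install tiktoken
    import tiktoken

    enc = tiktoken.get_encoding("cl100k_base")

    samples = [
        "hello world",                 # ASCII baseline
        "\U0001F44D\U0001F3FD",        # thumbs-up + skin-tone modifier
        "\U0001F469\u200D\U0001F4BB",  # woman + ZWJ + laptop (compound emoji)
    ]

    for text in samples:
        tokens = enc.encode(text)
        print(f"chars={len(text):2d} "
              f"bytes={len(text.encode('utf-8')):2d} "
              f"tokens={len(tokens):2d}  {text!r}")

    Running the same samples through different tokenizers makes the
    byte-alignment differences between emoji variants directly visible.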

    Analysis and impact:
    • Downstream effects include unintended prompt truncation, semantic
    drift, and increased susceptibility to adversarial inputs that
    leverage token boundary manipulation.
    • Attackers can craft inputs that force models into degraded contexts,
    leak system prompts through context misalignment, or trigger unsafe
    completions by exploiting tokenization mismatches.

    Detection:
    • Monitor token-count distributions against character counts to
    catch anomalies where the token count balloons relative to the
    character count (sketched after this list).
    • Instrument preprocessing logs to capture unusual byte-sequence
    frequencies and new tokens entering the embedding table.
    • Use synthetic test suites that include emoji variants, combining
    characters, and long multi-byte sequences.
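
    A detection sketch along the lines of the first point, again
    assuming tiktoken; the 1.0 ratio threshold is an illustrative
    starting point and should be calibrated against real traffic:

    # Flag inputs whose tokenization is unusually fragmented. Ordinary
    # prose usually yields well under one token per character, while
    # rare multi-byte sequences can push the ratio above that.
    import tiktoken

    enc = tiktoken.get_encoding("cl100k_base")

    def token_char_ratio(text: str) -> float:
        """Tokens emitted per input character."""
        return len(enc.encode(text)) / len(text) if text else 0.0

    def is_suspicious(text: str, threshold: float = 1.0) -> bool:
        """True when tokenization looks abnormally fragmented."""
        return token_char_ratio(text) > threshold

    for text in ["a normal sentence", "\U0001F469\u200D\U0001F680" * 10]:
        print(f"ratio={token_char_ratio(text):.2f} "
              f"suspicious={is_suspicious(text)}")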

    Mitigation:
    • Implement Unicode normalization (NFC/NFKC) in preprocessing and
    strip or canonicalize zero-width joiners where appropriate (a
    sketch follows this list).
    • Expand tokenizer training data with diverse emoji and multi-byte
    sequences, or use byte-level tokenizers robust to unseen sequences.
    • Add input sanitization layers that flag or constrain user-supplied
    content with high token/character ratios and apply rate limits or
    transformation policies.
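
    A preprocessing sketch for the first point, standard library only.
    Stripping zero-width characters outright is an illustrative policy;
    some scripts and legitimate emoji sequences rely on the ZWJ, so a
    real pipeline may prefer to canonicalize or allow-list instead:

    # Normalize Unicode, then drop zero-width code points before the
    # text reaches the tokenizer.
    import unicodedata

    ZERO_WIDTH = {
        "\u200B",  # zero-width space
        "\u200C",  # zero-width non-joiner
        "\u200D",  # zero-width joiner
        "\uFEFF",  # zero-width no-break space / BOM
    }

    def sanitize(text: str, form: str = "NFKC") -> str:
        """Apply Unicode normalization, then remove zero-width characters."""
        normalized = unicodedata.normalize(form, text)
        return "".join(ch for ch in normalized if ch not in ZERO_WIDTH)

    # NFKC folds the "fi" ligature to plain "fi"; the embedded ZWJ is dropped.
    print(sanitize("\uFB01le\u200Dname"))  # -> "filename"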

    References / notes:
    • This is a tokenizer-level robustness issue rather than a single
    CVE-class vulnerability; mitigations focus on preprocessing, tokenizer
    coverage, and monitoring.

    🔹 #llm_security #tokenization #BPE #unicode #adversarial_ml

    🔗 Source: infosecwriteups.com/the-emoji-