
When you set a max length on a form field or API, you expect it to hold. But what if a four-character string could secretly carry 10,000 extra invisible code points (about 30 KB in UTF-8), crashing your database or bypassing your validation? That is the vulnerability I found and fixed in validator, the popular JavaScript validation library. It was a subtle bug involving Unicode Variation Selectors that let attackers inject massive payloads while still passing length checks.

Introduction

validator’s isLength function looks simple on the surface: count characters and compare the count to a limit. But in a Unicode world, “character” is tricky. You need to reconcile the perceived length (what the user sees) with the storage length (what your database handles), or you risk truncation, performance issues, and, as I found, critical bypasses.
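To make the gap between perceived and stored length concrete, here is a minimal sketch (not part of validator) that measures one string four ways. The helper name measureLength is hypothetical, and it assumes a runtime with Intl.Segmenter and TextEncoder, such as a modern browser or Node.js 16+.

```javascript
// Hypothetical helper: measure the same string four different ways.
// Intl.Segmenter in grapheme mode approximates what a user perceives;
// TextEncoder gives the UTF-8 byte count that storage or the network sees.
function measureLength(str) {
  const segmenter = new Intl.Segmenter('en', { granularity: 'grapheme' });
  return {
    utf16Units: str.length,                          // what String.prototype.length reports
    codePoints: [...str].length,                     // the string iterator walks code points
    graphemes: [...segmenter.segment(str)].length,   // user-perceived characters
    utf8Bytes: new TextEncoder().encode(str).length, // storage size in UTF-8
  };
}

console.log(measureLength('test'));
// utf16Units 4, codePoints 4, graphemes 4, utf8Bytes 4
console.log(measureLength('test' + '\uFE0F'.repeat(3)));
// utf16Units 7, codePoints 7, graphemes 4, utf8Bytes 13
```

The padded string still looks like four characters because each U+FE0F attaches to the preceding grapheme, yet every selector adds three bytes in UTF-8.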

Unicode and the illusion of length

JavaScript strings use UTF‑16, so some visible “characters” (most emoji, for example) span two code units, known as a surrogate pair. isLength adjusts for that. It also treats a base character followed by a Variation Selector as one perceived character, because the selector only changes the presentation.

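import validator from 'validator';
const { isLength } = validator;

// Emoji still count as one perceived character
const smile = '🙂'; // U+1F642, a surrogate pair in UTF-16
console.log(smile.length); // 2 (UTF-16 code units)
console.log(isLength(smile, { min: 1, max: 1 })); // true: isLength counts it as 1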
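The surrogate-pair adjustment can be sketched with a single regex over UTF-16 code units. This is a minimal illustration that assumes well-formed pairs (a lone surrogate is not matched and still counts as one unit); the helper name countSurrogatePairs is hypothetical, not validator’s API.

```javascript
// Hypothetical helper: count high surrogate (U+D800-U+DBFF) + low surrogate
// (U+DC00-U+DFFF) pairs by matching over UTF-16 code units.
function countSurrogatePairs(str) {
  return (str.match(/[\uD800-\uDBFF][\uDC00-\uDFFF]/g) || []).length;
}

for (const sample of ['abc', '🙂', 'a🙂b🙂']) {
  const adjusted = sample.length - countSurrogatePairs(sample);
  // For well-formed strings, the adjusted length equals the code point count.
  console.log(sample, sample.length, adjusted, [...sample].length);
}
// abc 3 3 3
// 🙂 2 1 1
// a🙂b🙂 6 4 4
```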

The attack vector: Variation selectors gone wild

Variation Selectors (U+FE0F, U+FE0E) tweak how a base character is displayed; they aren’t content on their own. The old logic subtracted every selector from the length, so an attacker could pad a short string with thousands of them and still pass a check with a low max.

// Apparent 4 chars, but thousands of extra selectors
const s = 'test' + '\uFE0F'.repeat(10_000);
console.log(s.length); // 10,004 (UTF-16)
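To see why such a payload passed, here is a minimal re-creation of the pre-fix computation. It assumes the old logic subtracted every match of /(\uFE0F|\uFE0E)/g plus every surrogate pair from str.length; the name oldEffectiveLength is hypothetical, and the min/max comparison is left out.

```javascript
// Hypothetical re-creation of the pre-fix effective length:
// every selector is subtracted, whether or not a base character precedes it.
function oldEffectiveLength(str) {
  const selectors = str.match(/(\uFE0F|\uFE0E)/g) || [];
  const surrogatePairs = str.match(/[\uD800-\uDBFF][\uDC00-\uDFFF]/g) || [];
  return str.length - selectors.length - surrogatePairs.length;
}

const padded = 'test' + '\uFE0F'.repeat(10_000);
console.log(padded.length);                           // 10004 UTF-16 code units
console.log(oldEffectiveLength(padded));              // 4, so a { max: 4 } check passes
console.log(new TextEncoder().encode(padded).length); // 30004 bytes in UTF-8
```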

What are Unicode variation selectors?

These are zero‑width code points that modify the presentation of the character immediately preceding them. The two that matter here are U+FE0E (text style) and U+FE0F (emoji style), among 256 general-purpose selectors in U+FE00-U+FE0F and U+E0100-U+E01EF. They change appearance, not meaning, so a base character plus a selector is meant to count as one perceived character.
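A minimal sketch of recognising any general-purpose variation selector by code point follows. The function name isVariationSelector is hypothetical, and it deliberately ignores the script-specific Mongolian free variation selectors; note that the isLength regexes discussed in this article only target U+FE0E and U+FE0F.

```javascript
// Hypothetical helper: true for the general-purpose variation selectors,
// VS1-VS16 (U+FE00..U+FE0F) and VS17-VS256 (U+E0100..U+E01EF).
// Mongolian free variation selectors are intentionally not covered.
function isVariationSelector(codePoint) {
  return (codePoint >= 0xfe00 && codePoint <= 0xfe0f) ||
         (codePoint >= 0xe0100 && codePoint <= 0xe01ef);
}

// Iterate by code point so supplementary selectors (surrogate pairs) are seen whole.
const sample = '\u2764\uFE0F \u2764\uFE0E A\u{E0100}';
for (const ch of sample) {
  const cp = ch.codePointAt(0);
  if (isVariationSelector(cp)) {
    console.log('U+' + cp.toString(16).toUpperCase().padStart(4, '0'));
  }
}
// U+FE0F
// U+FE0E
// U+E0100
```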

As of Unicode 17.0, using them to choose emoji vs. text for many legacy dingbat bases is being phased out. Newer emoji are instead assigned separate code points for each style. For example, the emoji 🪯 (U+1FAAF KHANDA) has a dedicated emoji code point, while the older dingbat-style symbol is ☬ (U+262C ADI SHAKTI). Variation selectors still exist and work for bases that support them, but modern additions increasingly prefer distinct code points over selector‑based presentation. Unicode publishes the supported combinations in its emoji presentation sequences chart.

Here are some examples (text vs. emoji):

Heart: ❤︎ (text, U+2764 U+FE0E) vs ❤️ (emoji, U+2764 U+FE0F)
Airplane: ✈︎ (text, U+2708 U+FE0E) vs ✈️ (emoji, U+2708 U+FE0F)
Snowman: ☃︎ (text, U+2603 U+FE0E) vs ☃️ (emoji, U+2603 U+FE0F)
Writing hand: ✍︎ (text, U+270D U+FE0E) vs ✍️ (emoji, U+270D U+FE0F)
Telephone: ☎︎ (text, U+260E U+FE0E) vs ☎️ (emoji, U+260E U+FE0F)

Console example:

// Base character: Heavy Black Heart
const heart = '\u2764';
const heartText = heart + '\uFE0E'; // request text style
const heartEmoji = heart + '\uFE0F'; // request emoji style

console.log(heart, heartText, heartEmoji); // ❤ ❤︎ ❤️

By default (with no selector), presentation varies by platform, browser, and font: some show emoji style, others prefer text style. If you need a specific look, add U+FE0E (text) or U+FE0F (emoji). On the web, you can also use the CSS font-variant-emoji property.

Important: a selector only makes sense immediately after a compatible base. Multiple selectors don’t stack or change meaning: '\u2764\uFE0F\uFE0F' doesn’t become “more emoji.” Only the first selector after the base affects presentation; all the extras are stray code points.

For validation, discount only the selector in each valid base+selector pair, and count stray or repeated selectors toward the length. Otherwise, a tiny word can hide kilobytes of padding and waste CPU. You can find more information about the issue in the Snyk Vulnerability Database under entry SNYK-JS-VALIDATOR-13653476.
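The rule above can be sketched as a small diagnostic that reports which selectors are stray. It treats a selector as paired when the code unit right before it is not itself U+FE0E or U+FE0F; it does not check whether the base actually supports a variation sequence. SELECTORS and findStraySelectors are hypothetical names.

```javascript
// Hypothetical helper: a U+FE0E/U+FE0F is "paired" when the code unit before it
// is not a selector, and "stray" when it starts the string or follows a selector.
const SELECTORS = new Set(['\uFE0E', '\uFE0F']);

function findStraySelectors(str) {
  const stray = [];
  for (let i = 0; i < str.length; i++) {
    if (!SELECTORS.has(str[i])) continue;
    if (i === 0 || SELECTORS.has(str[i - 1])) stray.push(i);
  }
  return stray; // UTF-16 indices of selectors that should count toward length
}

console.log(findStraySelectors('\u2764\uFE0F'));             // []
console.log(findStraySelectors('\u2764\uFE0F\uFE0F\uFE0F')); // [ 2, 3 ]
console.log(findStraySelectors('\uFE0Ftest'));               // [ 0 ]
```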

The surgical fix: Counting only valid pairs

The change is precise: subtract only base+selector pairs instead of every selector. Old: /(\uFE0F|\uFE0E)/g. New: /[^\uFE0F\uFE0E][\uFE0F\uFE0E]/g. The negated class [^…] means “not these,” so a selector only matches when the code unit before it is a non‑selector base. A selector at the start of the string or right after another selector no longer matches, so it counts toward the length. This distinction is the key to the fix.

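Effective length: str.length - surrogatePairs - basePlusSelectorPairs, where the last two terms are the number of surrogate pairs and the number of matches of the new selector regex.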
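Here is a minimal sketch of that formula as a standalone function, using the new selector regex and a standard surrogate-pair regex. The name newEffectiveLength is hypothetical and not validator’s exported API; validator’s own isLength also applies its min/max options, which this sketch leaves out.

```javascript
// Hypothetical re-implementation of the patched formula:
//   str.length - surrogatePairs - basePlusSelectorPairs
function newEffectiveLength(str) {
  const basePlusSelectorPairs = str.match(/[^\uFE0F\uFE0E][\uFE0F\uFE0E]/g) || [];
  const surrogatePairs = str.match(/[\uD800-\uDBFF][\uDC00-\uDFFF]/g) || [];
  return str.length - surrogatePairs.length - basePlusSelectorPairs.length;
}

console.log(newEffectiveLength('A\uFE0F'));                        // 1
console.log(newEffectiveLength('\u{1F44D}\uFE0F'));                // 1 (astral base + selector)
console.log(newEffectiveLength('test' + '\uFE0F'.repeat(5)));      // 8
console.log(newEffectiveLength('test' + '\uFE0F'.repeat(10_000))); // 10003
```

For an astral base such as U+1F44D, the [^\uFE0F\uFE0E] class matches the low surrogate, so the pair is still discounted once and the surrogate pair once, giving 1.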

// Assumes a validator release that includes the fix
import validator from 'validator';
const { isLength } = validator;

// Padding with stray selectors now fails the max check
const payload = 'test' + '\uFE0F'.repeat(5);
console.log('isLength(payload, { max: 4 }):', isLength(payload, { max: 4 })); // false (effective length 8)

// A valid base+selector pair still counts as one perceived char
const basePair = 'A' + '\uFE0F';
console.log('isLength(basePair, { max: 1 }):', isLength(basePair, { max: 1 })); // true

Disclosure and timeline

I followed a responsible disclosure process. I reported the issue, prepared a minimal PoC, and worked with the maintainers on a fix. The maintainers merged the patch, published a release, and then the advisory went live.

I reported the issue along with a proposed fix on October 18, 2025. The maintainers merged a pull request on November 5. Finally, CVE-2025-12758 was published on November 26. More details and links are available in the GitHub Security Advisory.

Conclusion

The key takeaway: Variation Selectors are modifiers, not content. The bug allowed them to be used as invisible padding to bypass max-length checks. After the fix, only a selector that directly follows a base character is discounted, so every stray or repeated selector counts toward the actual length.

Please update validator to the release that contains the fix (listed in the advisory) or newer. When working with string length, always measure what you are storing, not just what users see.

When Zero‑Width Isn’t Zero: How I Found and Fixed a Vulnerability
