JS: RegExp.prototype.unicode (flag u)
RegExp.prototype.unicode (u)
u
- (new in ECMAScript 2015)
- Interpret the regex pattern and the text to be matched as sequences of unicode code points, instead of sequences of UTF-16 code units.
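- This is useful if you want to know whether a specific code unit or code point occurs in the text.
- With flag u off, the text is a sequence of UTF-16 code units (16 bits each). A character outside the Basic Multilingual Plane, such as "🦋", is two code units (a surrogate pair).
- With flag u on, the text is a sequence of unicode code points. A surrogate pair counts as one character.
- For this flag to make a difference, the regex must contain the code unit you seek, typically specified by \uxxxx where the xxxx is 4 hexadecimal digits.
- If the regex uses literal characters such as abc or literal unicode "🦋", then with flag u off they are matched as their UTF-16 code units. A plain literal match gives the same result with or without the flag.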
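The two views of the text can be seen directly on the butterfly character. The following is a minimal sketch; the variable names are chosen for illustration only.

```javascript
// The butterfly is one code point but two UTF-16 code units.
const butterfly = "🦋";

// String length counts UTF-16 code units.
const codeUnitCount = butterfly.length;

// Spreading a string iterates by code point.
const codePointCount = [...butterfly].length;

// The two code units, in hexadecimal.
const codeUnits = [0, 1].map((i) => butterfly.charCodeAt(i).toString(16));

// The single code point, in hexadecimal.
const codePoint = butterfly.codePointAt(0).toString(16);

// The dot matches one code unit with flag u off, one code point with flag u on.
const dotWithoutU = /^.$/.test(butterfly);
const dotWithU = /^.$/u.test(butterfly);

console.log(codeUnitCount, codePointCount); // 2 1
console.log(codeUnits, codePoint); // [ 'd83e', 'dd8b' ] 1f98b
console.log(dotWithoutU, dotWithU); // false true
```

The dot is used here only because it is the shortest pattern whose result depends on the flag.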
    console.log(/\uD83E/.test("🦋") === true); // true
    console.log(/\uD83E/u.test("🦋") === false); // true

    /*
    D83E is the first UTF-16 code unit (the first 2 bytes) of the butterfly character.
    D83E is a high surrogate, half of a surrogate pair, not a standalone unicode character.
    The code point of the butterfly character is 1F98B in hexadecimal.
    */

    /*
    🦋
    Name: BUTTERFLY
    ID 129419
    HEXD 1F98B
    UTF8 F0 9F A6 8B
    UTF16 D83E DD8B
    */

🛑 WARNING: when you use the unicode escape sequence form \u{hexadecimal} in a regex literal, e.g. "🦋".match(/\u{1F98B}/), it is not interpreted as a code point unless you have the u flag. But if you use it in a string passed as arg to the regex constructor, e.g. "🦋".match(RegExp("\u{1F98B}")); it works, because the string literal turns the escape into the character 🦋 before the regex is built.

    /*
    digit 0 has codepoint 48. It is 30 in hexadecimal.
    the unicode escape form \u{30} stands for the char 0 in a string.
    */
    console.log("\u{30}" === "0");

    /*
    however, in a regex literal, /\u{30}/ means the char u repeated 30 times.
    you need the unicode flag u for it to be interpreted as digit 0.
    /\u{30}/u
    */

    // the regex here is interpreted as u repeated 30 times
    // deno-fmt-ignore
    console.log(/\u{30}/.test("0") === false);

    // the regex here is interpreted as the character 0, because of the flag u
    // deno-fmt-ignore
    console.log(/\u{30}/u.test("0") === true);

    // if you use the regex constructor with a string literal, you don't have this problem.
    console.log(RegExp("\u{30}", "").test("0") === true);

    /* replace butterfly char by x */

    // you don't need the regex flag u to match unicode characters
    console.log("🦋".replace(/🦋/, "x") === "x");

    // but having it on doesn't hurt.
    console.log("🦋".replace(/🦋/u, "x") === "x");
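The property named in the title can be read directly from a regex object to check whether flag u is set. The following is a small sketch with illustrative variable names. It also shows the case where the backslash is doubled in a constructor string: the escape then reaches the regex engine unchanged, so flag u is needed again, just as in a regex literal.

```javascript
const withoutFlag = /\uD83E/;
const withFlag = /\uD83E/u;

// The unicode property is true only when flag u is set.
const unicodeOff = withoutFlag.unicode;
const unicodeOn = withFlag.unicode;

// The flags property lists all flags of the regex as a string.
const flagsText = withFlag.flags;

// "\\u{1F98B}" is the 9 characters \u{1F98B}, not the butterfly,
// so the regex engine sees the escape and needs flag u to read it as a code point.
const escapedWithU = new RegExp("\\u{1F98B}", "u").test("🦋");
const escapedWithoutU = new RegExp("\\u{1F98B}").test("🦋");

console.log(unicodeOff, unicodeOn, flagsText); // false true u
console.log(escapedWithU, escapedWithoutU); // true false
```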