Regex Guide - Masking w/ Custom Entity


Regex Guide for Masking with Regex-based Custom Entity


Regular expressions (regex) are patterns used to find and manipulate specific text segments.
In the Redaction service, regex allows you to define which parts of the text should be detected and masked — such as numbers, words, symbols, or specific formats (e.g., emails or IDs).

This guide summarizes the regex syntax supported by the Masking feature and provides examples showing how each pattern behaves when used in masking.

Use this reference to design precise, valid patterns that match your masking needs without producing syntax errors or unintended results.

Only Full Token/String Masking is Applicable

Partial token/string masking is not available for regex-based custom entities. Even if your regex matches only part of a token/string, the entire token/string containing the match is masked.

Example Input | Example Expression | Matching | Masking
educated | ^ed | "ed" at the start of "educated" | "educated" (the entire token)
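The whole-token behaviour described above can be sketched as follows. This is only an illustration: the helper name mask_tokens, the splitting on spaces, and the "*" mask character are assumptions made for the example, not the service's documented implementation.

```python
import re

def mask_tokens(text: str, pattern: str, mask: str = "*") -> str:
    """Mask every space-separated token that contains a regex match (hypothetical helper)."""
    regex = re.compile(pattern)
    masked = []
    for token in text.split(" "):
        # A match anywhere in the token causes the whole token to be masked.
        if regex.search(token):
            masked.append(mask * len(token))
        else:
            masked.append(token)
    return " ".join(masked)

result = mask_tokens("educated people", "^ed")
print(result)  # ******** people
```

Although ^ed matches only the first two characters, all eight characters of "educated" are replaced.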

Allowed Characters

1. Letters and Numbers

A–Z, a–z, 0–9, Ç, Ö, Ü, ß, é, ı, أ, Ж, あ, 你, etc.

2. Common Symbols (Literal)

These are safe to use directly as normal characters — they have no special regex meaning unless used inside certain contexts like []:

- _ @ ! # % & : , ; ' " ` / < = > ~ space tab

(Inside a character class [ ... ], the hyphen - defines a range.
If you mean a literal dash, place it at the start or end: [-abc] or [abc-].)
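The difference between a range and a literal dash can be checked with a short sketch. Python's re module is used here as a stand-in; the sample text is invented for the example, and the service's own engine may differ in other respects.

```python
import re

text = "a-c b"

# Hyphen between two characters: a range from "a" to "c" (the dash itself is not matched).
as_range = re.findall(r"[a-c]", text)

# Hyphen first or last in the class: a literal dash plus the listed characters.
dash_first = re.findall(r"[-ac]", text)
dash_last = re.findall(r"[ac-]", text)

print(as_range)    # ['a', 'c', 'b']
print(dash_first)  # ['a', '-', 'c']
print(dash_last)   # ['a', '-', 'c']
```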

3. Regex Metacharacters (Special Meaning)

These control how matching works:

. ^ $ * + ? ( ) [ ] { } | \

Basics of Regular Expression Structure

Anchors & Character Classes

Character | Name | Description | Example Expression | Example Match
. | Dot (wildcard) | Matches any character except a newline. | edu.ated | "educated", "edu_ated", "edu4ated" (each in full)
^ | Beginning | Matches the beginning of the string. This matches a position, not a character. | ^ed | the leading "ed" in "educated"
$ | End | Matches the end of the string. This matches a position, not a character. | ed$ | the trailing "ed" in "educated"
\b | Word boundary | Matches a word boundary: a position between a word character and a non-word character, or the start or end of the string. | ed\b | the "ed" ending "educated" and the "ed" ending "qualified" in "educated and qualified"
\B | Not word boundary | Matches any position that is not a word boundary. This matches a position, not a character. | ed\B | the leading "ed" of "educated" in "educated and qualified"
\d | Digit | Matches any digit character (0-9). Equivalent to [0-9]. | \d | "2" and "5" in "file_25"
\D | Not digit | Matches any character that is not a digit character (0-9). Equivalent to [^0-9]. | \D | "f", "i", "l", "e", and "_" in "file_25"
\w | Word | Matches any word character (alphanumeric and underscore). Only matches low-ASCII characters (no accented or non-Roman characters). Equivalent to [A-Za-z0-9_]. | \w | "2" and "5" in "25 $"
\W | Not word | Matches any character that is not a word character (alphanumeric and underscore). Equivalent to [^A-Za-z0-9_]. | \W | the space and "$" in "25 $"
\s | Whitespace | Matches any whitespace character (spaces, tabs, line breaks). | a\sb\s | "a b " (including the trailing space) in "a b c"
\S | Not whitespace | Matches any character that is not a whitespace character (spaces, tabs, line breaks). | \S | every character of "anything-without-whitespace"
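The rows above can be tried out with a short sketch. Python's re module is used as a stand-in for the service's engine, which is an assumption; the re.ASCII flag is passed so that \w and \W behave as the table describes (ASCII letters, digits, and underscore only).

```python
import re

sentence = "educated and qualified"

# Start offsets of "ed" at a word boundary versus inside a word.
at_boundary = [m.start() for m in re.finditer(r"ed\b", sentence)]
inside_word = [m.start() for m in re.finditer(r"ed\B", sentence)]

digits = re.findall(r"\d", "file_25")
non_digits = re.findall(r"\D", "file_25")

# re.ASCII restricts \w and \W to [A-Za-z0-9_] and its complement.
word_chars = re.findall(r"\w", "25 $", flags=re.ASCII)
non_word_chars = re.findall(r"\W", "25 $", flags=re.ASCII)

spaced = re.findall(r"a\sb\s", "a b c")

print(at_boundary, inside_word)    # [6, 20] [0]
print(digits, non_digits)          # ['2', '5'] ['f', 'i', 'l', 'e', '_']
print(word_chars, non_word_chars)  # ['2', '5'] [' ', '$']
print(spaced)                      # ['a b ']
```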

Quantifiers & Alternation

Character | Name | Description | Example Expression | Example Match
+ | Plus | Matches 1 or more of the preceding token. | go+ | "go", "gooo" (not "g")
* | Star | Matches 0 or more of the preceding token. | go* | "g", "go", "gooo"
? | Optional / Lazy | After a token, matches 0 or 1 of that token (makes it optional). After another quantifier (as in +? or *?), makes that quantifier lazy, so it matches as few characters as possible; by default, quantifiers are greedy and match as many characters as possible. | colou?r | "color", "colour"
{} | Quantifier bounds | Matches the specified quantity of the preceding token: {1,3} matches 1 to 3, {3} matches exactly 3, {3,} matches 3 or more. | b\w{2,3} | "bee", "beer", and the "beer" in "beers" within "b be bee beer beers"
\ | Escape (backreference) | Followed by a number, matches the result of a capture group: \1 matches the result of the first capture group. | (\w)a\1 | "hah" and "dad" in "hah dad bad"
\ | Escape (literal) | Followed by a metacharacter, matches that character literally: \. matches a literal dot. | \. | every "." in "abc...abcd...."
| (pipe) | Alternation (OR) | Acts like a boolean OR: matches the expression before or after the pipe. It can operate within a group or on a whole expression. The patterns are tested in order. | b(a|e|i)d | "bad", "bed", "bid" (not "bud" or "bod")
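The quantifier, escape, and alternation rows can be checked with the sketch below. Python's re module stands in for the service's engine (an assumption), and the go+? example is added here only to show lazy matching.

```python
import re

words = ["g", "go", "gooo"]
plus_matches = [w for w in words if re.fullmatch(r"go+", w)]
star_matches = [w for w in words if re.fullmatch(r"go*", w)]

# "?" after a token makes it optional; "?" after a quantifier makes it lazy.
optional = [w for w in ["color", "colour"] if re.fullmatch(r"colou?r", w)]
greedy = re.search(r"go+", "gooo").group(0)
lazy = re.search(r"go+?", "gooo").group(0)

bounded = re.findall(r"b\w{2,3}", "b be bee beer beers")

# group(0) is the whole match; \1 repeats whatever the first group captured.
backrefs = [m.group(0) for m in re.finditer(r"(\w)a\1", "hah dad bad")]
dots = re.findall(r"\.", "abc...abcd....")

candidates = ["bad", "bud", "bod", "bed", "bid"]
alternation = [w for w in candidates if re.fullmatch(r"b(a|e|i)d", w)]

print(plus_matches, star_matches)  # ['go', 'gooo'] ['g', 'go', 'gooo']
print(optional, greedy, lazy)      # ['color', 'colour'] gooo go
print(bounded)                     # ['bee', 'beer', 'beer']
print(backrefs, len(dots))         # ['hah', 'dad'] 7
print(alternation)                 # ['bad', 'bed', 'bid']
```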

Ranges & Groups & Lookaround

Character | Name | Description | Example Expression | Example Match
[] | Character list | Matches any character in the set. | [abc] | "a", "b", or "c"
[] | Character range | Matches a character whose character code lies between the two specified characters, inclusive. | [A-Z] | any character from "A" to "Z"
() | Capturing group | Groups multiple tokens together and creates a capture group for extracting a substring or using a backreference. | (ha)+ | "ha", the "hahaha" in "hahahah", the "ha" in "haa", the "ha" in "hah!"
(?= ... ) | Positive lookahead | Requires the group to match immediately after the main expression, without including it in the result. | \d(?=px) | "2" and "4" in "1pt 2px 3em 4px"
(?! ... ) | Negative lookahead | Specifies a group that cannot match after the main expression (if it matches, the result is discarded). | \d(?!px) | both "1"s and the "3" in "1pt1 2px 3em 4px"
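A brief sketch of ranges, groups, and lookaround, again using Python's re module as a stand-in for the service's engine (an assumption). The sample string "Regex Guide" is invented for the example.

```python
import re

# Character range: only the uppercase letters are picked out.
range_hits = re.findall(r"[A-Z]", "Regex Guide")

# A repeated capturing group: group(0) is the whole match, group(1) the last repetition.
repeated = re.search(r"(ha)+", "hahahah")
whole_match, last_group = repeated.group(0), repeated.group(1)

# Lookaheads test what follows the digit without adding it to the match.
before_px = re.findall(r"\d(?=px)", "1pt 2px 3em 4px")
not_before_px = re.findall(r"\d(?!px)", "1pt1 2px 3em 4px")

print(range_hits)               # ['R', 'G']
print(whole_match, last_group)  # hahaha ha
print(before_px)                # ['2', '4']
print(not_before_px)            # ['1', '1', '3']
```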

⚠️ Characters to Avoid (Not Allowed)

An entity cannot be created with an invalid regex.

Character Type | Why to Avoid
Unescaped double quote " | Causes a syntax error
Incomplete or dangling escapes such as \ or \x or \u | Invalid syntax
Invalid quantifier bounds such as a{,3} or a{5,2} | Invalid bounds
Unpaired surrogate code points such as [\uD800-\uDFFF] or \uD800 | Invalid Unicode
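Before creating an entity, a pattern can be screened locally for some of the problems listed above. The sketch below is a rough pre-check under stated assumptions: precheck_pattern is a hypothetical helper, Python's re module stands in for the service's validator, and the two engines do not agree on every rule (Python accepts a{,3}, so that case is checked separately).

```python
import re

def precheck_pattern(pattern: str) -> list[str]:
    """Return the problems found in a pattern; an empty list means none were found."""
    problems = []

    # Walk the pattern once, tracking whether the current character is escaped.
    escaped = False
    for ch in pattern:
        if escaped:
            escaped = False
        elif ch == "\\":
            escaped = True
        elif ch == '"':
            problems.append("unescaped double quote")
            break

    # Python accepts a{,3}, so the missing lower bound is checked explicitly.
    if re.search(r"\{,\d+\}", pattern):
        problems.append("quantifier is missing its lower bound")

    # Dangling escapes and reversed bounds such as a{5,2} fail to compile.
    try:
        re.compile(pattern)
    except re.error as exc:
        problems.append(f"does not compile: {exc}")

    return problems

print(precheck_pattern(r"^C\d{7}$"))  # []
print(precheck_pattern("a{5,2}"))
```

This sketch does not detect unpaired surrogate code points, and passing it does not guarantee that the service will accept the pattern.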

Common Regex Examples

Type of Input | Pattern | Example Match
Date format | ^(0?[1-9]|1[0-2])[\/](0?[1-9]|[12]\d|3[01])[\/](19|20)\d{2}$ | 10/2/2019, 01/02/2019
USD currency | ^(\$)(\d)+ | $10, $100000
IPv4 address | \b(?:(?:2(?:[0-4][0-9]|5[0-5])|[0-1]?[0-9]?[0-9])\.){3}(?:(?:2([0-4][0-9]|5[0-5])|[0-1]?[0-9]?[0-9]))\b | 192.168.0.1, 255.255.255.255
Phone number | ^\s*(?:\+?(\d{1,3}))?([-. (]*(\d{3})[-. )]*)?((\d{3})[-. ]*(\d{2,4})(?:[-.x ]*(\d+))?)\s*$ | 0400123000, +61 0400000000
Alphanumeric starting with the letter "C", followed by exactly 7 digits | ^C\d{7}$ | C0000001
Alphanumeric with spaces allowed, max length 20 characters | ^[\w\s]{1,20}$ | 12345678912345678912, abcdefghijklmnopqrst
Alphanumeric starting with 2 letters, an underscore, then 3 digits | ^[A-Za-z]{2}_[0-9]{3}$ | DK_003
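The sample values above can be checked against their patterns with a short sketch. Several details are assumptions about the intended patterns rather than confirmed service behaviour: the literal $, ., and + are written escaped, the phone separator classes carry a trailing * so that separators are optional, and the letter class is written [A-Za-z]. Python's re module stands in for the service's engine.

```python
import re

# Pattern per input type, paired with sample values from the table above.
EXAMPLES = {
    "date": (r"^(0?[1-9]|1[0-2])[\/](0?[1-9]|[12]\d|3[01])[\/](19|20)\d{2}$",
             ["10/2/2019", "01/02/2019"]),
    "usd": (r"^(\$)(\d)+", ["$10", "$100000"]),
    "ipv4": (r"\b(?:(?:2(?:[0-4][0-9]|5[0-5])|[0-1]?[0-9]?[0-9])\.){3}"
             r"(?:(?:2([0-4][0-9]|5[0-5])|[0-1]?[0-9]?[0-9]))\b",
             ["192.168.0.1", "255.255.255.255"]),
    "phone": (r"^\s*(?:\+?(\d{1,3}))?([-. (]*(\d{3})[-. )]*)?"
              r"((\d{3})[-. ]*(\d{2,4})(?:[-.x ]*(\d+))?)\s*$",
              ["0400123000", "+61 0400000000"]),
    "c_id": (r"^C\d{7}$", ["C0000001"]),
    "short_text": (r"^[\w\s]{1,20}$",
                   ["12345678912345678912", "abcdefghijklmnopqrst"]),
    "code": (r"^[A-Za-z]{2}_[0-9]{3}$", ["DK_003"]),
}

# True when every sample for a pattern produces a match.
results = {
    name: all(re.search(pattern, sample) for sample in samples)
    for name, (pattern, samples) in EXAMPLES.items()
}
print(results)

# Why escaping matters: an unescaped "." accepts any separator character.
loose_ipv4 = EXAMPLES["ipv4"][0].replace(r"\.", ".")
print(bool(re.fullmatch(loose_ipv4, "192x168x0x1")))  # True
```

The last two lines show the effect of a missing escape: with a bare dot, a string that is not an IP address is still accepted.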