Regex Guide - Masking w/ Custom Entity

Regex Guide for Masking with Regex-based Custom Entity

Regular expressions (regex) are patterns used to find and manipulate specific text segments.
In the Redaction service, regex allows you to define which parts of the text should be detected and masked — such as numbers, words, symbols, or specific formats (e.g., emails or IDs).

This guide summarizes the regex syntax supported by the Masking feature and provides examples showing how each pattern behaves when used in masking.

Use this reference to design precise, valid patterns that match your masking needs without producing syntax errors or unintended results.

Only Full Token/String Masking is Applicable

Partial token/string masking is not supported for regex-based custom entities. Even if your regex matches only part of a token/string, the entire token/string that contains the match is masked.

Example Input	Example Expression	Matching	Masking
educated	^ed	"ed" (the start of "educated")	"educated" (the entire token)
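The full-token behavior described above can be sketched in Python. This is only an illustration, not the Redaction service's implementation: the function name mask_full_tokens, the assumption that tokens are delimited by whitespace, and the use of "*" as the mask character are all hypothetical.

```python
import re

def mask_full_tokens(text: str, pattern: str, mask_char: str = "*") -> str:
    """Mask every whitespace-delimited token that contains a regex match.

    Hypothetical helper: the token rule and the mask character are assumptions.
    """
    chars = list(text)
    for match in re.finditer(pattern, text):
        start, end = match.span()
        if start == end:
            continue  # ignore empty matches
        # Grow the matched span outward until whitespace or the string edge.
        while start > 0 and not text[start - 1].isspace():
            start -= 1
        while end < len(text) and not text[end].isspace():
            end += 1
        for i in range(start, end):
            if not text[i].isspace():
                chars[i] = mask_char
    return "".join(chars)

print(mask_full_tokens("educated", r"^ed"))          # ********
print(mask_full_tokens("an educated guess", r"duc")) # an ******** guess
```

Although ^ed matches only the first two characters, the whole token is masked, which mirrors the table row above.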

Allowed Characters

1. Letters and Numbers

A–Z, a–z, 0–9, Ç, Ö, Ü, ß, é, ı, أ, Ж, あ, 你, etc.

2. Common Symbols (Literal)

These can be used directly as ordinary characters. They have no special regex meaning except in certain contexts, such as inside [ ]. The double quote " is the exception: it must be escaped (see Characters to Avoid).

- _ @ ! # % & : , ; ' " ` / < = > ~ space tab

(Inside a character class [ ... ], the hyphen - defines a range.
If you mean a literal dash, place it at the start or end: [-abc] or [abc-].)
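The difference between a range hyphen and a literal hyphen can be checked with a short sketch. It uses Python's re module as a stand-in; the Redaction service's regex engine is assumed to treat character classes the same way.

```python
import re

text = "a-c b"

# Hyphen between two characters: the range a..c (matches a, b, c).
range_hits = re.findall(r"[a-c]", text)

# Hyphen at the start or end of the class: a literal dash.
leading_hits = re.findall(r"[-ac]", text)
trailing_hits = re.findall(r"[ac-]", text)

print(range_hits)     # ['a', 'c', 'b']
print(leading_hits)   # ['a', '-', 'c']
print(trailing_hits)  # ['a', '-', 'c']
```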

3. Regex Metacharacters (Special Meaning)

These control how matching works:

. ^ $ * + ? ( ) [ ] { } | \

Basics of Regular Expression Structure

Anchors & Character Classes

Character	Name	Description	Example Expression	Example Match
.	Dot (wildcard)	Matches any single character except a newline.	edu.ated	"educated", "edu_ated", "edu4ated" (each in full)
^	Beginning	Matches the beginning of the string. This matches a position, not a character.	^ed	The leading "ed" in "educated"
$	End	Matches the end of the string. This matches a position, not a character.	ed$	The trailing "ed" in "educated"
\b	Word boundary	Matches the position between a word character and a non-word character, or at the start or end of the string. This matches a position, not a character.	ed\b	The "ed" that ends "educated" and the "ed" that ends "qualified" in "educated and qualified"
\B	Not word boundary	Matches any position that is not a word boundary. This matches a position, not a character.	ed\B	The "ed" that begins "educated" in "educated and qualified"
\d	Digit	Matches any digit character (0-9). Equivalent to [0-9].	\d	"2" and "5" in "file_25"
\D	Not digit	Matches any character that is not a digit (0-9). Equivalent to [^0-9].	\D	"f", "i", "l", "e", and "_" in "file_25"
\w	Word	Matches any word character (alphanumeric and underscore). Only matches low-ASCII characters (no accented or non-Roman characters). Equivalent to [A-Za-z0-9_].	\w	"2" and "5" in "25 $"
\W	Not word	Matches any character that is not a word character (alphanumeric and underscore). Equivalent to [^A-Za-z0-9_].	\W	The space and "$" in "25 $"
\s	Whitespace	Matches any whitespace character (space, tab, newline).	a\sb\s	"a b " (including the trailing space) in "a b c"
\S	Not whitespace	Matches any character that is not a whitespace character (spaces, tabs, line breaks).	\S	Every character in "anything-without-whitespace"
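The rows above can be reproduced with a small sketch. Python's re module is used here as a stand-in for the service's engine, which is an assumption; the re.ASCII flag is set so that \w and \d behave as the ASCII-only classes described in the table. The helper name hits is hypothetical.

```python
import re

def hits(pattern: str, text: str):
    """Return (start_index, matched_text) pairs, using ASCII-only classes."""
    return [(m.start(), m.group()) for m in re.finditer(pattern, text, flags=re.ASCII)]

sentence = "educated and qualified"

print(hits(r"^ed", sentence))      # [(0, 'ed')]
print(hits(r"ed\b", sentence))     # [(6, 'ed'), (20, 'ed')]
print(hits(r"ed\B", sentence))     # [(0, 'ed')]
print(hits(r"\d", "file_25"))      # [(5, '2'), (6, '5')]
print(hits(r"\W", "25 $"))         # [(2, ' '), (3, '$')]
print(hits(r"a\sb\s", "a b c"))    # [(0, 'a b ')]
```

The start indexes make it visible that ^, $, \b, and \B select positions: the same two characters "ed" are matched or skipped depending on where they sit.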

Quantifiers & Alternation

Character	Name	Description	Example Expression	Example Match
+	Plus	Matches 1 or more of the preceding token.	go+	"go", "gooo" (but not "g" alone)
*	Star	Matches 0 or more of the preceding token.	go*	"g", "go", "gooo"
?	Optional / Lazy	Matches 0 or 1 of the preceding token. When placed after another quantifier (for example +? or *?), it makes that quantifier lazy, so it matches as few characters as possible. By default, quantifiers are greedy and match as many characters as possible.	colou?r	"color", "colour"
{}	Quantifier bounds	Matches the specified quantity of the previous token. For example: {1,3} matches 1 to 3, {3} matches exactly 3, and {3,} matches 3 or more.	b\w{2,3}	"bee", "beer", and the "beer" in "beers" in "b be bee beer beers"
\	Escape or backreference	Followed by a metacharacter, matches that character literally: \. matches a literal dot. Followed by a group number, matches the text captured by that group: \1 matches the result of the first capture group.	(\w)a\1 and \.	"hah" and "dad" in "hah dad bad"; every "." in "abc...abcd...."
|	Alternation (OR)	Acts like a boolean OR. Matches the expression before or after the |. It can operate within a group or on a whole expression. The alternatives are tested in order.	b(a|e|i)d	"bad", "bed", "bid" (but not "bud" or "bod")
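A short sketch of the quantifier, escape, and alternation rows, again using Python's re module as an assumed stand-in for the service's engine. Alternation is written with a plain | character. The greedy versus lazy comparison uses < and >, which are ordinary literal characters.

```python
import re

words = ["g", "go", "gooo"]
plus_ok = [w for w in words if re.fullmatch(r"go+", w)]   # needs at least one "o"
star_ok = [w for w in words if re.fullmatch(r"go*", w)]   # "o" may be absent

optional = re.findall(r"colou?r", "color colour")         # "u" is optional
greedy = re.findall(r"<.+>", "<a><b>")                    # as much as possible
lazy = re.findall(r"<.+?>", "<a><b>")                     # as little as possible

bounded = re.findall(r"b\w{2,3}", "b be bee beer beers")
backref = [m.group() for m in re.finditer(r"(\w)a\1", "hah dad bad")]
dots = re.findall(r"\.", "abc...abcd....")

candidates = ["bad", "bud", "bod", "bed", "bid"]
alternation = [w for w in candidates if re.fullmatch(r"b(a|e|i)d", w)]

print(plus_ok)      # ['go', 'gooo']
print(star_ok)      # ['g', 'go', 'gooo']
print(optional)     # ['color', 'colour']
print(greedy)       # ['<a><b>']
print(lazy)         # ['<a>', '<b>']
print(bounded)      # ['bee', 'beer', 'beer']
print(backref)      # ['hah', 'dad']
print(len(dots))    # 7
print(alternation)  # ['bad', 'bed', 'bid']
```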

Ranges & Groups & Lookaround

Character	Name	Description	Example Expression	Example Match
[]	Character list or range	Matches any one character in the set, or any one character whose character code lies between the two specified characters, inclusive.	[abc] and [A-Z]	"a", "b", or "c"; any letter from "A" to "Z"
()	Capturing group	Groups multiple tokens together and creates a capture group for extracting a substring or using a backreference.	(ha)+	"ha"; "hahaha" in "hahahah"; "ha" in "haa"; "ha" in "hah!"
(?= ... )	Positive lookahead	Requires the group to match immediately after the main expression, without including it in the result.	\d(?=px)	"2" and "4" in "1pt 2px 3em 4px"
(?! ... )	Negative lookahead	Specifies a group that must not match immediately after the main expression (if it matches, the result is discarded).	\d(?!px)	Both "1"s and "3" in "1pt1 2px 3em 4px"
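The group and lookaround rows can be verified with the sketch below. It assumes Python's re module behaves like the service's engine for these constructs.

```python
import re

# Character list and range.
in_set = re.findall(r"[abc]", "cabbage")
uppercase = re.findall(r"[A-Z]", "Abc DEF")

# Capturing group with a quantifier: the whole "ha" unit repeats.
repeated = re.search(r"(ha)+", "hahahah").group()

# Lookaheads check what follows without consuming it.
before_px = re.findall(r"\d(?=px)", "1pt 2px 3em 4px")
not_before_px = re.findall(r"\d(?!px)", "1pt1 2px 3em 4px")

print(in_set)         # ['c', 'a', 'b', 'b', 'a']
print(uppercase)      # ['A', 'D', 'E', 'F']
print(repeated)       # hahaha
print(before_px)      # ['2', '4']
print(not_before_px)  # ['1', '1', '3']
```

Note that the lookahead results contain only the digit: "px" is checked but is not part of the match.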

⚠️ Characters to Avoid (Not Allowed)

An entity cannot be created with an invalid regex.

Character Type	Why to Avoid
Unescaped double quote "	Causes syntax error
Incomplete or dangling escapes such as \ or \x or \u	Invalid syntax
Invalid quantifier bounds such as a{,3} or a{5,2}	Invalid bounds
Unpaired surrogate code points such as [\uD800-\uDFFF] or \uD800	Invalid Unicode
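Before creating an entity, a pattern can be pre-checked locally by trying to compile it. The sketch below uses Python's re module, and the helper name is_valid_regex is hypothetical. It is only a rough first pass: Python's rules are not identical to the service's. For example, Python accepts a{,3} (reading it as 0 to 3) and an unescaped double quote, both of which are listed above as invalid.

```python
import re

def is_valid_regex(pattern: str) -> bool:
    """Return True if Python's re module can compile the pattern."""
    try:
        re.compile(pattern)
    except re.error:
        return False
    return True

print(is_valid_regex(r"ed\b"))    # True
print(is_valid_regex(r"a{5,2}"))  # False: minimum is greater than maximum
print(is_valid_regex("abc\\"))    # False: dangling backslash at the end
print(is_valid_regex(r"\x"))      # False: incomplete \x escape
print(is_valid_regex(r"\u"))      # False: incomplete \u escape
```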

Regex Common Examples

Type of Input	Pattern	Example Match
Date format	^(0?[1-9]|1[0-2])[\/](0?[1-9]|[12]\d|3[01])[\/](19|20)\d{2}$	10/2/2019, 01/02/2019
USD currency	^(\$)(\d)+	$10, $100000
IPv4 address	\b(?:(?:2(?:[0-4][0-9]|5[0-5])|[0-1]?[0-9]?[0-9])\.){3}(?:(?:2([0-4][0-9]|5[0-5])|[0-1]?[0-9]?[0-9]))\b	192.168.0.1, 255.255.255.255
Phone number	^\s*(?:\+?(\d{1,3}))?([-. (]*(\d{3})[-. )]*)?((\d{3})[-. ]*(\d{2,4})(?:[-.x ]*(\d+))?)\s*$	0400123000, +61 0400000000
Alphanumeric code starting with the letter "C", followed by exactly 7 digits	^C\d{7}$	C0000001
Alphanumeric with spaces allowed. Max length: 20 characters	^[\w\d\s]{1,20}$	12345678912345678912, abcdefghijklmnopqrst
Alphanumeric code starting with 2 letters, then an underscore, then 3 digits	^[A-Za-z]{2}_[0-9]{3}$	DK_003
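A few of these patterns can be tested against their example values before use. The sketch below is a minimal check in Python's re module, assumed to behave like the service's engine for these constructs. Alternation in the date pattern is written with a plain | character, and the names PATTERNS and matches are hypothetical.

```python
import re

# Patterns taken from the table above.
PATTERNS = {
    "date": r"^(0?[1-9]|1[0-2])[\/](0?[1-9]|[12]\d|3[01])[\/](19|20)\d{2}$",
    "c_code": r"^C\d{7}$",
    "max_20": r"^[\w\d\s]{1,20}$",
}

def matches(name: str, value: str) -> bool:
    """Return True if the named pattern matches the value."""
    return re.search(PATTERNS[name], value) is not None

print(matches("date", "10/2/2019"))      # True
print(matches("date", "01/02/2019"))     # True
print(matches("date", "13/02/2019"))     # False: there is no month 13
print(matches("c_code", "C0000001"))     # True
print(matches("c_code", "C123"))         # False: too few digits
print(matches("max_20", "abcdefghijklmnopqrst"))   # True: 20 characters
print(matches("max_20", "abcdefghijklmnopqrstu"))  # False: 21 characters
```

Testing both a value that should match and one that should not helps catch patterns that are too permissive.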