SMC regular expression syntax

A regular expression is a sequence of characters that defines a matching pattern. These patterns are used for matching byte sequences in network traffic.

Matching always starts from the beginning of the traffic stream, as defined by the associated Situation Context. Depending on the context, this can mean:
  • The beginning of a TCP stream.
  • The beginning of a UDP packet.
  • A protocol-specific field or header, such as the beginning of an HTTP request header or the beginning of an HTTP Request URI.

A regular expression consists of one or more branches that are separated by a logical OR symbol “|”. A Situation match occurs if any of the branches matches the traffic stream.

Regular expression matching

# This regular expression matches
# if any of the following patterns are seen
# at the beginning of the traffic stream: "aaa", "bbb", "ccc"
aaa|bbb|ccc
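
The start-anchored behavior of the branches above can be approximated outside the SMC for experimentation. The following is a minimal sketch that assumes Python's re module as a stand-in; it is not the SMC engine, and the helper name matches_at_start is hypothetical.

```python
import re

# Assumption: Python's re module only approximates SMC matching.
BRANCHES = re.compile(rb"aaa|bbb|ccc")


def matches_at_start(stream: bytes) -> bool:
    """Return True if any branch matches at the beginning of the stream."""
    # re.match only attempts a match at offset 0, which mirrors matching
    # that always starts from the beginning of the traffic stream.
    return BRANCHES.match(stream) is not None


print(matches_at_start(b"bbb and more data"))  # True
print(matches_at_start(b"xxaaa"))              # False: "aaa" is not at the start
```

In this sketch, a stream matches as soon as one branch matches at offset 0; the remaining bytes are not examined.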

The basic sequences that can be used in an SMC regular expression are listed in the following table:

Table 1. SMC regular expression syntax
Sequence Description Example
<char> Matches only the defined characters. "2", "A", and "foo" match exactly the characters "2", "A", and "foo", respectively.
. (dot) Matches any character, including the null character \x00 and a missing character. Also matches non-printable characters, such as the linefeed. A missing character is a special character used by the engine to represent characters missing from a TCP connection. For example, in capture mode, the engine might not see all traffic of a TCP connection. "." matches any single character or byte.
\x<hex> Matches the byte that has the given hexadecimal value, from \x00 to \xFF. "\x4d" matches the hexadecimal value "4d", which represents the decimal value 77 and the ASCII character "M".
[<char>] Matches any single character in the list. "[15aB]" matches when any of the characters "1", "5", "a", or "B" is in the matching location of the inspected string.
[^<char>] Matches any single character that is not in the list. "[^aBc]" matches if none of the characters "a", "B", or "c" is present in the matching location of the inspected string.
[<char1>-<char2>] Matches all characters ranging from <char1> to <char2>, these two characters included. "[a-f]" matches any character within the range from "a" to "f", with "a" and "f" included.
\<char> Escapes a special metacharacter so that it is interpreted as a normal character. The metacharacters are: \ | ) ( ] [ ^ - * + ? . # "\[" matches the "[" character instead of interpreting it as the regular expression class metacharacter.
#<text> Anything starting with "#" up to the linefeed (\x0a) or the carriage return character (\x0d) is regarded as a comment and not used in the matching process. "# my comment." is not used in the matching process.
(<expr1>|<expr2>) Matches if either expression <expr1> or <expr2> matches. "a(bc|de)" matches "abc" and "ade".
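
Most of the sequences in Table 1 have close counterparts in common regular expression libraries, so their behavior can be tried out before a pattern is used in a Situation. The sketch below assumes Python's re module with byte patterns as a stand-in; the helper name matches is hypothetical. Two differences are worth keeping in mind: Python has no equivalent of the missing character, and it treats "#" as a comment marker only in verbose mode.

```python
import re


def matches(pattern: bytes, data: bytes) -> bool:
    """Hypothetical helper: True if pattern matches at the start of data."""
    # re.DOTALL makes "." match every byte, including the linefeed (0x0a).
    return re.match(pattern, data, re.DOTALL) is not None


assert b"\x4d" == b"M"                # hex 4d is decimal 77, ASCII "M"
print(matches(rb"\x4d", b"M"))        # True: hex escape
print(matches(rb".", b"\x00"))        # True: dot matches the null byte
print(matches(rb".", b"\n"))          # True: dot matches the linefeed
print(matches(rb"[15aB]", b"a"))      # True: character list
print(matches(rb"[^aBc]", b"B"))      # False: "B" is in the excluded list
print(matches(rb"[a-f]", b"f"))       # True: range end points are included
print(matches(rb"\[", b"["))          # True: escaped metacharacter
print(matches(rb"a(bc|de)", b"ade"))  # True: grouped alternatives
```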

Example regular expressions

# This regular expression matches any of the following strings: 
# "login.php", "login1.php", "login2.php", "login_internal.php"
# Note: to match the "." character, the character must be escaped in the 
# regular expression by prefixing the character with "\"
login\.php|login[12]\.php|login_internal\.php

# Alternatively, the branches of the above regular expression can be 
# combined into one single branch as shown below
login([12]|_internal)?\.php
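
As a rough check of the example above, the following sketch compares the multi-branch form with a single-branch form against the four listed file names. It assumes Python's re module as a stand-in for the SMC engine, and the helper name both_match is hypothetical. The combined pattern uses the class [12] so that it covers exactly the four listed names.

```python
import re

# Assumption: Python's re module only approximates SMC matching.
SEPARATE = re.compile(rb"login\.php|login[12]\.php|login_internal\.php")
COMBINED = re.compile(rb"login([12]|_internal)?\.php")

NAMES = [b"login.php", b"login1.php", b"login2.php", b"login_internal.php"]


def both_match(name: bytes) -> bool:
    """True if both forms of the pattern match at the start of name."""
    return SEPARATE.match(name) is not None and COMBINED.match(name) is not None


for name in NAMES:
    print(name.decode(), both_match(name))  # each line ends with True

# Without the backslash, "." matches any byte, so "loginXphp" also matches.
print(re.match(rb"login.php", b"loginXphp") is not None)   # True
print(re.match(rb"login\.php", b"loginXphp") is not None)  # False
```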

Quantifiers indicate that a character or a regular expression is repeated consecutively. The quantifiers available in SMC regular expression syntax are listed in the following table.

Table 2. SMC regular expression quantifiers
Sequence Description Example
<expr>* Matches if there are zero or more consecutive <expr> strings. "a*" matches "<empty>", "a", "aa" and so on.
<expr>+ Matches if there are one or more consecutive <expr> strings. "a+" matches "a", "aa", "aaa" and so on, but not the empty string.
<expr>? Matches if there is zero or one <expr> string. "a?" matches "<empty>" and "a".
<expr>{n,m} "{n}" matches the expression exactly n times. "{n,}" matches the expression n or more times. "{n,m}" matches the expression at least n and no more than m times. "a{5,}" matches five or more consecutive "a" characters. "a{5,7}" matches 5, 6, or 7 consecutive "a" characters.

A quantifier applies only to the single preceding character (or special character sequence), unless parentheses indicate otherwise. For example, the regular expression “login*” matches “logi”, “login”, or “loginnnn”, whereas the regular expression “(login)*” matches the empty string “”, “login”, or “loginloginlogin”.
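
The difference in scope between a bare quantifier and a parenthesized one can be seen in the following sketch. It assumes Python's re module as a stand-in for the SMC engine and uses re.fullmatch, so that the whole input must be consumed; this choice only serves to show which strings each pattern describes and does not reflect how the engine scans a stream. The helper name full_match is hypothetical.

```python
import re


def full_match(pattern: bytes, data: bytes) -> bool:
    """Hypothetical helper: True if pattern describes the whole of data."""
    return re.fullmatch(pattern, data) is not None


# "*" applies only to the preceding "n".
print(full_match(rb"login*", b"logi"))               # True
print(full_match(rb"login*", b"loginnnn"))           # True
print(full_match(rb"login*", b"loginlogin"))         # False

# Parentheses make "*" apply to the whole group.
print(full_match(rb"(login)*", b""))                 # True
print(full_match(rb"(login)*", b"loginloginlogin"))  # True
print(full_match(rb"(login)*", b"loginnnn"))         # False

# Bounded repetition.
print(full_match(rb"a{5,7}", b"a" * 6))              # True
print(full_match(rb"a{5,7}", b"a" * 8))              # False
print(full_match(rb"a{5,}", b"a" * 4))               # False
```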

Because matching always starts from the beginning of the traffic stream, “.*” (any character zero or more times) is often needed when writing SMC regular expressions. For example, the regular expression “.*/etc/passwd” searches for the string “/etc/passwd” anywhere in the traffic stream.

Note: Use the quantifiers "*" and "+", as well as "<expr>{n,m}" where m has a large value, with care. If used in the middle of a regular expression, they can result in an expression that has a very large number of matching states and that is too complex for efficient use. It is recommended to use these quantifiers only at the beginning of a branch.
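
The effect of a leading “.*” on start-anchored matching can be illustrated with the following sketch. It assumes Python's re module as a stand-in for the SMC engine; the helper names and the sample request line are hypothetical. Python's engine works differently internally, so the sketch says nothing about the number of matching states mentioned in the note.

```python
import re

# re.DOTALL lets "." match every byte, including the linefeed.
ANYWHERE = re.compile(rb".*/etc/passwd", re.DOTALL)
AT_START = re.compile(rb"/etc/passwd")


def found_at_start(stream: bytes) -> bool:
    """True only if the stream begins with "/etc/passwd"."""
    return AT_START.match(stream) is not None


def found_anywhere(stream: bytes) -> bool:
    """True if "/etc/passwd" appears anywhere in the stream."""
    return ANYWHERE.match(stream) is not None


# Hypothetical sample data for illustration.
stream = b"GET /cgi-bin/view?file=../../etc/passwd HTTP/1.1\r\n"

print(found_at_start(stream))   # False: the stream starts with "GET"
print(found_anywhere(stream))   # True: ".*" consumes the bytes before the string
```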