SMC regular expression syntax
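A regular expression is a sequence of characters that defines a matching pattern. These patterns are used for matching byte sequences in network traffic. Matching starts at the beginning of the traffic stream, which, depending on the context, is one of the following: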
- The beginning of a TCP stream.
- The beginning of a UDP packet.
- A protocol-specific field or header, such as the beginning of an HTTP request header or the beginning of an HTTP Request URI.
A regular expression consists of one or more branches that are separated by a logical OR symbol “|”. A Situation match occurs if any of the branches matches the traffic stream.
Regular expression matching
```
# This regular expression matches
# if any of the following patterns are seen
# at the beginning of the traffic stream: "aaa", "bbb", "ccc"
aaa|bbb|ccc
```
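The SMC matching engine is not available as a library, so the following is only an approximation using Python's standard `re` module on byte strings. It assumes that anchoring the match at the first byte of the input (`re.match`) mirrors the SMC behavior of matching from the beginning of the traffic stream; the helper name `smc_like_match` is hypothetical.

```python
import re

def smc_like_match(pattern: bytes, stream: bytes) -> bool:
    # Approximation: re.match only accepts a match that starts at the first
    # byte of the stream. re.DOTALL lets "." match the linefeed as well.
    # Hypothetical helper, not an SMC API.
    return re.match(pattern, stream, re.DOTALL) is not None

pattern = rb"aaa|bbb|ccc"  # three branches separated by "|"

print(smc_like_match(pattern, b"bbb followed by more data"))  # True
print(smc_like_match(pattern, b"xx ccc"))                     # False: not at the start
```

Any single branch is enough for a match, but only when it appears at the start of the input.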
The basic sequences that can be used in an SMC regular expression are listed in the following table:
Sequence | Description | Example |
---|---|---|
<char> | Matches only the defined characters. | "2", "A", and "foo" match exactly the defined characters "2", "A", and "foo", respectively. |
. (dot) | Matches any character, including the null character \x00 and a missing character. Also matches non-printable characters, such as the linefeed. A missing character is a special character that the engine uses to represent characters missing from a TCP connection. For example, in capture mode, the engine might not see all traffic of a TCP connection. | "." matches any single character or byte. |
\x<hex> | Matches the hexadecimal byte value, ranging from \x00 to \xFF. | "\x4d" matches the hexadecimal value "4d", which represents the decimal value 77 and the ASCII character "M". |
[<char>] | Matches any single character in the list. | "[15aB]" matches when any of the characters "1", "5", "a", or "B" is in the matching location of the inspected string. |
[^<char>] | Matches any single character that is not in the list. | "[^aBc]" matches if none of the characters "a", "B", or "c" is present in the matching location of the inspected string. |
[<char1>-<char2>] | Matches all characters ranging from <char1> to <char2>, both characters included. | "[a-f]" matches any character in the range from "a" to "f", with "a" and "f" included. |
\<char> | Escapes special metacharacters, such as "[" and ".", so that they are interpreted as normal characters. | "\[" matches the "[" character instead of interpreting it as the regular expression class metacharacter. |
#<text> | Anything from "#" up to the linefeed (\x0a) or the carriage return character (\x0d) is regarded as a comment and is not used in the matching process. | "# my comment." is not used in the matching process. |
(<expr1>\|<expr2>) | Matches if either expression <expr1> or <expr2> matches. | "a(bc\|de)" matches "abc" and "ade". |
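The sequences in the table can be tried out with Python's standard `re` module, used here as an approximation of the SMC engine, not as a reference for it. Two assumptions apply: `re.DOTALL` stands in for SMC's rule that "." also matches the linefeed, and Python has no equivalent of the SMC "missing character". The "#" comment sequence is left out because Python only accepts "#" comments in `re.VERBOSE` mode, which also ignores whitespace. The helper `matches_at_start` is hypothetical.

```python
import re

def matches_at_start(pattern: bytes, data: bytes) -> bool:
    # re.match anchors the pattern at the start of data; re.DOTALL lets "."
    # match the linefeed too. Hypothetical helper, not an SMC API.
    return re.match(pattern, data, re.DOTALL) is not None

# (pattern, input bytes, expected result) for the sequences in the table
checks = [
    (rb"foo", b"foo", True),        # <char>: literal characters
    (rb".", b"\n", True),           # "." matches any byte, including the linefeed
    (rb"\x4d", b"M", True),         # \x4d is byte 77, ASCII "M"
    (rb"[15aB]", b"5", True),       # any single character from the list
    (rb"[^aBc]", b"B", False),      # "B" is on the excluded list
    (rb"[a-f]", b"f", True),        # range with both end points included
    (rb"\[", b"[", True),           # escaped metacharacter matches a literal "["
    (rb"a(bc|de)", b"ade", True),   # alternation inside parentheses
]

for pattern, data, expected in checks:
    print(pattern, data, matches_at_start(pattern, data))
```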
Example regular expressions
```
# This regular expression matches any of the following strings:
# "login.php", "login1.php", "login2.php", "login_internal.php"
# Note: to match the "." character, the character must be escaped in the
# regular expression by prefixing the character with "\"
login\.php|login[12]\.php|login_internal\.php
# Alternatively, the branches of the above regular expression can be
# combined into a single branch as shown below
login([12]|_internal)?\.php
```
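The note about escaping the dot can be checked with Python's standard `re` module, used here only as a stand-in for the SMC engine (an assumption: both treat these simple patterns alike). The variable names `branches` and `unescaped` are illustrative.

```python
import re

# Branch form of the example above, as a Python bytes pattern.
branches = rb"login\.php|login[12]\.php|login_internal\.php"
# The same pattern with the dots left unescaped: "." now matches any byte.
unescaped = rb"login.php|login[12].php|login_internal.php"

names = [b"login.php", b"login1.php", b"login2.php", b"login_internal.php"]
for name in names:
    assert re.match(branches, name, re.DOTALL)  # all four file names match

# Only the unescaped version accepts another byte in place of ".".
print(re.match(branches, b"loginXphp", re.DOTALL))               # None
print(re.match(unescaped, b"loginXphp", re.DOTALL) is not None)  # True
```

Without the escape, the pattern silently matches more traffic than intended.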
Quantifiers indicate that a character or a regular expression is repeated consecutively. The quantifiers available in SMC regular expression syntax are listed in the following table.
Sequence | Description | Example |
---|---|---|
<expr>* | Matches if there are zero or more consecutive <expr> strings. | "a*" matches "<empty>", "a", "aa", and so on. |
<expr>+ | Matches if there are one or more consecutive <expr> strings. | "a+" matches "a", "aa", "aaa", and so on, but not the empty string. |
<expr>? | Matches if there is zero or one <expr> string. | "a?" matches "<empty>" and "a". |
<expr>{n,m} | Matches if there are at least n and at most m consecutive <expr> strings. {num} matches the expression exactly num times, and {num,} matches it num or more times. | "a{5,}" matches five or more consecutive "a" characters. |
The quantifiers always apply only to the single preceding character (or special character sequence), unless parentheses indicate otherwise. For example, the regular expression “login*” matches “logi”, “login”, or “loginnnn”, whereas the regular expression “(login)*” matches the empty string “”, “login”, or “loginloginlogin”.
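The quantifiers and their scope can be approximated with Python's `re` module. The sketch uses `re.fullmatch` only to make the repetition counts easy to see; that is an illustration choice and not a claim about how SMC decides where a match ends. The helper `covers` and the "a{2,3}" sample are hypothetical additions.

```python
import re

def covers(pattern: bytes, sample: bytes) -> bool:
    # True only if the whole sample matches the pattern (hypothetical helper).
    return re.fullmatch(pattern, sample) is not None

# Repetition counts for each quantifier in the table.
print([covers(rb"a*", s) for s in (b"", b"a", b"aaa")])                 # [True, True, True]
print([covers(rb"a+", s) for s in (b"", b"a", b"aaa")])                 # [False, True, True]
print([covers(rb"a?", s) for s in (b"", b"a", b"aa")])                  # [True, True, False]
print([covers(rb"a{2,3}", s) for s in (b"a", b"aa", b"aaa", b"aaaa")])  # [False, True, True, False]
print([covers(rb"a{5,}", s) for s in (b"aaaa", b"aaaaa", b"a" * 9)])    # [False, True, True]

# Scope: "*" binds only to the preceding "n" unless parentheses group it.
samples = [b"", b"logi", b"login", b"loginnnn", b"loginlogin"]
print([s for s in samples if covers(rb"login*", s)])    # [b'logi', b'login', b'loginnnn']
print([s for s in samples if covers(rb"(login)*", s)])  # [b'', b'login', b'loginlogin']
```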
The expression “.*” (any character zero or more times) is often needed when writing SMC regular expressions. For example, the regular expression “.*/etc/passwd” searches for the string “/etc/passwd” anywhere in the traffic stream.
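To see why the leading “.*” matters, the following sketch approximates SMC matching with Python's `re.match`, which also starts at the beginning of the input. The HTTP request bytes are made up for illustration, and `re.DOTALL` is assumed to mirror the SMC rule that “.” also matches the linefeed.

```python
import re

# Made-up request: the target string is not on the first line.
stream = b"POST /upload HTTP/1.1\r\nHost: example.com\r\n\r\nfile=/etc/passwd"

# Without a leading ".*", the pattern must match at the very start.
print(re.match(rb"/etc/passwd", stream, re.DOTALL))                # None
# Python's "." stops at b"\n" unless re.DOTALL is set.
print(re.match(rb".*/etc/passwd", stream))                         # None
# With re.DOTALL, ".*" can skip any prefix, linefeeds included.
print(re.match(rb".*/etc/passwd", stream, re.DOTALL) is not None)  # True
```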
Use the wildcards “*” and “+”, as well as “<expr>{n,m}” (where m has a large value), with care. If used in the middle of a regular expression, they can result in an expression that has a very large number of matching states and is too complex for efficient use. It is recommended to use these wildcards only at the beginning of a branch.