The following substitutions were used to convert headers and links from MediaWiki to DokuWiki syntax:
Regex: =+(.+?)=+ Substitute: ====== $1 ======
Regex: \[\[ll(.+?)\]\] Substitute: ''ll$1''
Regex: \[(http){1}(.+?)\s(.+?)\] Substitute: [[$1$2|$3]]
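As a rough sketch, two of the substitutions above can be reproduced with Python's re module. The function names are made up for illustration, and Python writes back references as \1 where the rules above use $1.

```python
import re

def convert_header(line: str) -> str:
    # Any MediaWiki header level becomes a top-level DokuWiki header.
    return re.sub(r"=+(.+?)=+", r"====== \1 ======", line)

def convert_external_link(text: str) -> str:
    # [http://host Label] becomes [[http://host|Label]].
    return re.sub(r"\[(http){1}(.+?)\s(.+?)\]", r"[[\1\2|\3]]", text)

if __name__ == "__main__":
    print(convert_header("==Title=="))
    print(convert_external_link("[http://example.com Example site]"))
```

Note that the header rule keeps whatever whitespace sits between the equals signs and the title, so "== Title ==" ends up with doubled spaces.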
^([A-Z ]+?):$
Search:
^(\|)\[\[http\:\/\/was\.fm\/wiki/(.+?)]]
Replace:
$1{{wiki:$2}}
[0-9A-Fa-f]{8}\-[0-9A-Fa-f]{4}\-[0-9A-Fa-f]{4}\-[0-9A-Fa-f]{4}\-[0-9A-Fa-f]{12}
[0-9A-Fa-f]{8}\-[0-9A-Fa-f]{4}\-4[0-9A-Fa-f]{3}\-[89ABab][0-9A-Fa-f]{3}\-[0-9A-Fa-f]{12}
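The two patterns above appear to match a generic UUID and a version 4 UUID respectively (version digit 4, variant digit 8, 9, A or B). A minimal Python sketch, with hypothetical helper names, that checks whole strings against them:

```python
import re
import uuid

UUID_ANY = re.compile(
    r"[0-9A-Fa-f]{8}\-[0-9A-Fa-f]{4}\-[0-9A-Fa-f]{4}\-[0-9A-Fa-f]{4}\-[0-9A-Fa-f]{12}"
)
UUID_V4 = re.compile(
    r"[0-9A-Fa-f]{8}\-[0-9A-Fa-f]{4}\-4[0-9A-Fa-f]{3}\-[89ABab][0-9A-Fa-f]{3}\-[0-9A-Fa-f]{12}"
)

def is_uuid(text: str) -> bool:
    # fullmatch rejects leading or trailing garbage around the UUID.
    return UUID_ANY.fullmatch(text) is not None

def is_uuid_v4(text: str) -> bool:
    return UUID_V4.fullmatch(text) is not None

if __name__ == "__main__":
    sample = str(uuid.uuid4())
    print(sample, is_uuid(sample), is_uuid_v4(sample))
```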
sed -e '/^#/d' -e '/^$/d' squid.conf
(.+?)([a-zA-Z0-9\-]+)\.([a-z]{2,4})$
(([0-9a-fA-F]{2}[:]){5}([0-9a-fA-F]{2}))
where the first captured group is the whole MAC address.
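A short Python sketch of pulling MAC addresses out of free text with the pattern above; find_macs is an illustrative name. Since the pattern has nested groups, re.findall would return tuples, so the first group is read explicitly.

```python
import re

MAC = re.compile(r"(([0-9a-fA-F]{2}[:]){5}([0-9a-fA-F]{2}))")

def find_macs(text: str) -> list:
    # Group 1 is the whole address; groups 2 and 3 only hold single octets.
    return [m.group(1) for m in MAC.finditer(text)]

if __name__ == "__main__":
    print(find_macs("eth0 00:1A:2b:3C:4d:5E up"))
```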
The problem is that back references such as \1 may clash with an immediately following digit. For example, \10 will make the compiler understand the replacement as the 10th group instead of substituting \1 for the first group and then appending a 0 to the replacement.
There are various ways to avoid the confusion and the following table lists them by language.
Language | Solution |
---|---|
PHP | \${1}0 |
AWK | \\10 |
Python | \g<1>0 |
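For the Python row, a small sketch showing both the unambiguous form and what happens with the ambiguous one. The function names are invented for the example.

```python
import re

def append_zero_to_digits(text: str) -> str:
    # \g<1>0 reads as "group 1, then a literal 0".
    return re.sub(r"(\d)", r"\g<1>0", text)

def ambiguous_replacement_fails(text: str) -> bool:
    # The pattern has a single group, so the template \10 (group 10) is rejected.
    try:
        re.sub(r"(\d)", r"\10", text)
    except re.error:
        return True
    return False

if __name__ == "__main__":
    print(append_zero_to_digits("a1b2"))
    print(ambiguous_replacement_fails("a1"))
```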
Given text files containing scripts placed in a folder, the following command:
find . -name \*.txt -exec perl -i'' -pe 's/([^0-9\.])0\.([0-9]+)([^\.])/\1.\2\3/g' '{}' \;
will strip the leading zero from floating point numbers. For example, it will replace 0.0234 by .0234. In cases such as LSL, where the stack, heap and code are stored in the same container, it makes sense to reduce the code size.
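The same substitution can be tried out on a single string in Python before running it over a whole folder. This is only a sketch and strip_leading_zero is a hypothetical name. The pattern needs one character on either side of the number, so a number at the very start of the input is left alone.

```python
import re

def strip_leading_zero(text: str) -> str:
    # Group 1: a character that is neither a digit nor a dot (so 10.5 is untouched).
    # Group 2: the fractional digits. Group 3: the character after the number.
    return re.sub(r"([^0-9\.])0\.([0-9]+)([^\.])", r"\1.\2\3", text)

if __name__ == "__main__":
    print(strip_leading_zero("float x = 0.0234;"))
    print(strip_leading_zero("float y = 10.5;"))
```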
The next refinement is to eliminate trailing zeroes off floating point numbers:
find . -name \*.txt -exec perl -i'' -pe 's/([\s,<])([1-9])\.[0]+([>,\s;)])/\1\2\3/g' '{}' \;
since it is completely redundant to write 1.000000 instead of just 1.
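A Python sketch of the trailing zero rule, again with an invented function name. As written, the pattern only covers a single non-zero digit before the dot, so values such as 10.0 or 0.0 are not rewritten.

```python
import re

def strip_zero_fraction(text: str) -> str:
    # A single digit 1-9 followed by ".000..." loses the fractional part.
    return re.sub(r"([\s,<])([1-9])\.[0]+([>,\s;)])", r"\1\2\3", text)

if __name__ == "__main__":
    print(strip_zero_fraction("vector v = <1.000000, 2.0, 0.5>;"))
```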
For both Perl Compatible Regular Expression (PCRE) and POSIX Extended Regular Expressions (POSIX ERE), the following characters carry special meaning:
. ^ $ * + ? ( ) [ ] { \ | -
and should be escaped when they are meant to be matched literally (the - is only special inside a bracket expression).
In POSIX Basic Regular Expressions (BRE) only the following characters carry special meaning:
. ^ $ * [ \
and escaping parentheses and curly brackets gives them the special meaning that they have in POSIX ERE.
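When the text to match comes from elsewhere, escaping by hand is error prone. Python's re module is Perl-like rather than PCRE or POSIX proper, but its re.escape illustrates the idea; literal_search is a made-up helper.

```python
import re

def literal_search(needle: str, haystack: str) -> bool:
    # re.escape backslash-escapes every character that could be special.
    return re.search(re.escape(needle), haystack) is not None

if __name__ == "__main__":
    # Unescaped, the dot matches any character.
    print(re.search("a.b", "axb") is not None)
    # Escaped, only a literal dot matches.
    print(literal_search("a.b", "axb"))
    print(literal_search("1+1=2", "so 1+1=2"))
```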
Using word boundaries, the following substitution:
s/\b(\w)/\u$1/g
will capitalise the first letter of every word.
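Python replacement templates have no \u, so an equivalent sketch uses a function as the replacement. The name capitalise_words is illustrative. As with the Perl form, an apostrophe counts as a word boundary, so "don't" becomes "Don'T".

```python
import re

def capitalise_words(text: str) -> str:
    # \b(\w) finds the first word character after each word boundary.
    return re.sub(r"\b(\w)", lambda m: m.group(1).upper(), text)

if __name__ == "__main__":
    print(capitalise_words("hello wide world"))
    print(capitalise_words("don't stop"))
```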
The difference between:
string a = "good"; string b = "day"; a.Equals(b, StringComparison.Ordinal);
and:
string a = "good"; string b = "day"; string.Equals(a, b, StringComparison.Ordinal);
is that the latter variant performs a reference equality test first, which may in some cases be faster.
We can thus build a regex replacement rule that will refactor all instances of a.Equals(b) into string.Equals(a, b). We search for:
\b([a-zA-Z_@0-9\.]+?)\.Equals\((.+?), StringComparison\.([a-zA-Z_@0-9\.]+?)\)
and replace with:
string.Equals(\1, \2, StringComparison.\3)
where \1, \2 and \3 represent the capture groups.
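The rule can be dry-run on a line of C# source in Python before applying it in an editor. This is a sketch with an assumed function name, and the groups are emitted in the order \1, \2 so that the receiver stays the first argument.

```python
import re

EQUALS_CALL = re.compile(
    r"\b([a-zA-Z_@0-9\.]+?)\.Equals\((.+?), StringComparison\.([a-zA-Z_@0-9\.]+?)\)"
)

def refactor_equals(source: str) -> str:
    # Group 1: receiver, group 2: argument, group 3: comparison kind.
    return EQUALS_CALL.sub(r"string.Equals(\1, \2, StringComparison.\3)", source)

if __name__ == "__main__":
    print(refactor_equals("if (a.Equals(b, StringComparison.Ordinal)) {"))
```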
Example | Name | Description |
---|---|---|
def(?!abc) | Negative lookahead. | Match a group (def) not followed by a group (abc) |
def(?=abc) | Positive lookahead. | Match a group (def) followed by a group (abc) |
(?<!abc)def | Negative lookbehind. | Match a group (def) not preceded by a group (abc) |
(?<=abc)def | Positive lookbehind. | Match a group (def) preceded by a group (abc) |
For instance, matching all instances like:
double.TryParse( float.TryParse(
but no instances of:
UUID.TryParse( Vector2.TryParse( Vector2d.TryParse( Vector3.TryParse( Vector3d.TryParse( Quaternion.TryParse( bool.TryParse( DateTime.TryParse(
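one could use a negative lookbehind regular expression (the alternatives differ in length, which requires an engine with variable-length lookbehind such as .NET):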
(?<!UUID|Vector[23]d?|Quaternion|bool|DateTime)\.TryParse\(
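Python's re module only accepts fixed-width lookbehinds, so the alternation above would be rejected there. A possible workaround, sketched below with an invented helper name, is to chain one fixed-width lookbehind per excluded type.

```python
import re

# One negative lookbehind per width: Vector[23] is 7 characters, Vector[23]d is 8.
TRYPARSE = re.compile(
    r"(?<!UUID)(?<!Vector[23])(?<!Vector[23]d)(?<!Quaternion)(?<!bool)(?<!DateTime)"
    r"\.TryParse\("
)

def tryparse_receivers(source: str) -> list:
    # Return the identifier directly in front of each matched ".TryParse(".
    found = []
    for m in TRYPARSE.finditer(source):
        prefix = re.search(r"(\w+)$", source[:m.start()])
        found.append(prefix.group(1) if prefix else "")
    return found

if __name__ == "__main__":
    code = "double.TryParse(x); UUID.TryParse(y); float.TryParse(z); Vector3d.TryParse(w);"
    print(tryparse_receivers(code))
```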
Conforming to RFC 1738, the following pattern will match one or more characters that are able to appear in a URL:
[$\-_\.\+!\*'\(\),a-zA-Z0-9]+
In other words, given a URL such as http://www.google.com, the pattern can be run against the segment www.google.com and it will match all characters.
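A quick Python check of that claim, using a hypothetical helper that requires the whole segment to consist of the allowed characters:

```python
import re

URL_CHARS = re.compile(r"[$\-_\.\+!\*'\(\),a-zA-Z0-9]+")

def only_url_chars(segment: str) -> bool:
    # fullmatch succeeds only if every character is in the class.
    return URL_CHARS.fullmatch(segment) is not None

if __name__ == "__main__":
    print(only_url_chars("www.google.com"))
    # The scheme separator "://" is outside the class, so the full URL fails.
    print(only_url_chars("http://www.google.com"))
```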
A Windows NetSH URL reservation needs to conform to the following rules:
/
.
The following pattern will match if a URL conforms to the aforementioned rules:
^https?:\/\/[$\-_\.\+!\*'\(\),a-zA-Z0-9]+:[0-9]{1,5}.*/[$\-_\.\+!\*'\(\),a-zA-Z0-9]*$
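A sketch of exercising the pattern in Python; the helper name and the sample URLs are made up. Judging from the pattern alone, it expects an http or https scheme, a host, an explicit port and at least one slash after the port.

```python
import re

NETSH_URL = re.compile(
    r"^https?:\/\/[$\-_\.\+!\*'\(\),a-zA-Z0-9]+:[0-9]{1,5}.*/[$\-_\.\+!\*'\(\),a-zA-Z0-9]*$"
)

def is_netsh_reservation(url: str) -> bool:
    return NETSH_URL.match(url) is not None

if __name__ == "__main__":
    for candidate in ("http://+:8080/", "http://localhost:9000/api/", "http://localhost/"):
        print(candidate, is_netsh_reservation(candidate))
```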
Conforming to RFC 1035, the following characters are allowed:
a-z, A-Z
0-9
- but not as a starting or ending character
. as a separator for the textual portions of a domain name
A single textual portion (label) of up to 63 characters is matched by:
(?:[A-Za-z0-9][A-Za-z0-9\-]{0,61}[A-Za-z0-9]|[A-Za-z0-9])
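The label pattern can be combined with the dot separator to check a whole host name. The sketch below is one way of doing that in Python, with an assumed function name. It does not enforce any limit on the overall length of the name.

```python
import re

LABEL = r"(?:[A-Za-z0-9][A-Za-z0-9\-]{0,61}[A-Za-z0-9]|[A-Za-z0-9])"
# One label, followed by any number of ".label" repetitions.
HOSTNAME = re.compile(LABEL + r"(?:\." + LABEL + r")*")

def is_hostname(name: str) -> bool:
    return HOSTNAME.fullmatch(name) is not None

if __name__ == "__main__":
    for candidate in ("www.google.com", "-bad.example", "bad-.example"):
        print(candidate, is_hostname(candidate))
```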
A comprehensive rule for IPv4 addresses that restricts each octet to the range 0-255 is the following:
(?:(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.){3}(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)
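A minimal Python sketch that validates whole strings with the rule above; is_ipv4 is an illustrative name. Note that the pattern accepts leading zeros, so 01.2.3.4 passes.

```python
import re

IPV4 = re.compile(
    r"(?:(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.){3}"
    r"(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)"
)

def is_ipv4(text: str) -> bool:
    # fullmatch prevents partial hits such as the "56.1.1.1" inside "256.1.1.1".
    return IPV4.fullmatch(text) is not None

if __name__ == "__main__":
    for candidate in ("192.168.1.1", "255.255.255.255", "256.1.1.1", "1.2.3"):
        print(candidate, is_ipv4(candidate))
```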