MediaWiki Headers to DokuWiki Headers

This pattern was used to convert headers from MediaWiki to DokuWiki:

Regex: =+(.+?)=+
Substitute: ======    $1 ======

Convert DokuWiki ll-Functions to Monospace Font

Regex: \[\[ll(.+?)\]\]
Substitute: ''ll$1''

MediaWiki Links to DokuWiki Links

Regex: \[(http){1}(.+?)\s(.+?)\]
Substitute: [[$1$2|$3]]
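
As a minimal sketch, the link rule above can be applied from Python, where the $1 style of back reference is spelled \1. The function name convert_links and the sample sentence are assumptions made for illustration.

```python
import re

# Pattern from the text; the replacement uses Python's \1 instead of $1.
MEDIAWIKI_LINK = re.compile(r"\[(http){1}(.+?)\s(.+?)\]")


def convert_links(text: str) -> str:
    """Rewrite [http://url label] as [[http://url|label]]."""
    return MEDIAWIKI_LINK.sub(r"[[\1\2|\3]]", text)


converted = convert_links("See [http://example.com Example site] for details.")
print(converted)
```

The first whitespace character separates the URL from the label, so labels may contain spaces while URLs may not.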

Convert Uppercase Titles to DokuWiki Titles

^([A-Z ]+?):$

Converting MediaWiki Links to DokuWiki Links

Search:

^(\|)\[\[http\:\/\/was\.fm\/wiki/(.+?)]]

Replace:

$1{{wiki:$2}}

Matching UUIDs

[0-9A-Fa-f]{8}\-[0-9A-Fa-f]{4}\-[0-9A-Fa-f]{4}\-[0-9A-Fa-f]{4}\-[0-9A-Fa-f]{12}

UUID v4

[0-9A-Fa-f]{8}\-[0-9A-Fa-f]{4}\-4[0-9A-Fa-f]{3}\-[89ABab][0-9A-Fa-f]{3}\-[0-9A-Fa-f]{12}
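
The difference between the two UUID patterns can be sketched in Python as follows. The helper names and the two sample UUIDs are made up for illustration; fullmatch is used so that the whole string has to be a UUID.

```python
import re

# Both patterns are copied from the text above.
UUID_ANY = re.compile(
    r"[0-9A-Fa-f]{8}\-[0-9A-Fa-f]{4}\-[0-9A-Fa-f]{4}\-[0-9A-Fa-f]{4}\-[0-9A-Fa-f]{12}"
)
UUID_V4 = re.compile(
    r"[0-9A-Fa-f]{8}\-[0-9A-Fa-f]{4}\-4[0-9A-Fa-f]{3}\-[89ABab][0-9A-Fa-f]{3}\-[0-9A-Fa-f]{12}"
)


def is_uuid(candidate: str) -> bool:
    """True if the whole string has the generic UUID layout."""
    return UUID_ANY.fullmatch(candidate) is not None


def is_uuid_v4(candidate: str) -> bool:
    """True if the string also carries the version 4 and variant digits."""
    return UUID_V4.fullmatch(candidate) is not None


SAMPLE_V4 = "f47ac10b-58cc-4372-a567-0e02b2c3d479"  # third block starts with 4
SAMPLE_V1 = "123e4567-e89b-12d3-a456-426614174000"  # third block starts with 1

print(is_uuid(SAMPLE_V1), is_uuid_v4(SAMPLE_V1))
print(is_uuid(SAMPLE_V4), is_uuid_v4(SAMPLE_V4))
```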

Strip #-prefix Comments and Empty Lines

sed -e '/^#/d' -e '/^$/d' squid.conf
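
Where sed is not available, the same filtering can be approximated in Python. This is a sketch only: the function name and the sample configuration text are assumptions, and the behaviour mirrors the two sed rules (drop lines starting with #, drop empty lines).

```python
def strip_comments_and_blanks(text: str) -> str:
    """Drop lines starting with '#' and empty lines, like the sed rules."""
    kept = [
        line
        for line in text.splitlines()
        if line and not line.startswith("#")
    ]
    return "\n".join(kept)


SAMPLE_CONF = "# listening port\nhttp_port 3128\n\n# memory\ncache_mem 256 MB\n"
cleaned = strip_comments_and_blanks(SAMPLE_CONF)
print(cleaned)
```

Lines that contain only whitespace, or comments that are indented, are kept, just as with the sed expressions.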

Grabbing Top-Level Domains

(.+?)([a-zA-Z0-9\-]+)\.([a-z]{2,4})$
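
A short Python sketch of how the three groups come out; the helper name split_host and the host name are illustrative assumptions.

```python
import re

# Pattern from the text: prefix, second-level name, top-level domain.
DOMAIN_PARTS = re.compile(r"(.+?)([a-zA-Z0-9\-]+)\.([a-z]{2,4})$")


def split_host(host: str):
    """Return (prefix, name, tld) or None when the pattern does not match."""
    match = DOMAIN_PARTS.search(host)
    return match.groups() if match else None


parts = split_host("www.example.com")
print(parts)
```

Because the first group requires at least one character, a bare name such as example.com loses its first letter to that group, so the pattern is best applied to host names that carry a prefix.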

Match MAC Address

(([0-9a-fA-F]{2}[:]){5}([0-9a-fA-F]{2}))

where the first captured group is the whole MAC address.
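
As an illustration, the pattern can be used in Python to pull MAC addresses out of free text. The function name and the sample line are made up; finditer is used so that group 1 (the whole address) can be read directly.

```python
import re

# Pattern from the text; group 1 spans the whole MAC address.
MAC_ADDRESS = re.compile(r"(([0-9a-fA-F]{2}[:]){5}([0-9a-fA-F]{2}))")


def find_macs(text: str) -> list:
    """Return every colon-separated MAC address found in the text."""
    return [match.group(1) for match in MAC_ADDRESS.finditer(text)]


macs = find_macs("eth0 00:1A:2b:3C:4d:5E up, wlan0 aa:bb:cc:dd:ee:ff down")
print(macs)
```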

Back Reference Followed by Number

A back reference such as \1 may clash with a digit that immediately follows it. For example:

\10

is interpreted as a reference to the 10th group, rather than as the first group followed by a literal 0.

There are various ways to avoid the ambiguity; the following table lists them by language.

Language   Solution
PHP        \${1}0
AWK        \\10
Python     \g<1>0
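
The Python entry can be checked with a small sketch. The pattern (\d) and the input "7" are made up for the demonstration; with only one group defined, Python rejects \10 as an invalid group reference, while \g<1>0 appends a literal 0.

```python
import re

# Ambiguous form: \10 is read as group 10, which this pattern does not have.
try:
    re.sub(r"(\d)", r"\10", "7")
    ambiguous_failed = False
except re.error:
    ambiguous_failed = True

# Unambiguous form: group 1 followed by a literal 0.
padded = re.sub(r"(\d)", r"\g<1>0", "7")

print(ambiguous_failed, padded)
```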

Floating Point Refinement for LSL Scripts

Given text files containing scripts placed in a folder, the following command:

find . -name \*.txt -exec perl -i'' -pe 's/([^0-9\.])0\.([0-9]+)([^\.])/\1.\2\3/g' '{}' \;

will strip the leading zero from floating point numbers.

For example, it will replace 0.0234 by .0234. In cases such as LSL, where the stack, heap and code are stored in the same container, it makes sense to reduce the code size.

The next refinement is to strip an all-zero fractional part from floating point numbers:

find . -name \*.txt -exec perl -i'' -pe 's/([\s,<])([1-9])\.[0]+([>,\s;)])/\1\2\3/g' '{}' \;

since it is completely redundant to write 1.000000 instead of just 1.
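
To preview what the two substitutions do before running them over a folder, they can be ported to Python. This is a sketch under the assumption that Python's re behaves like Perl for these particular patterns; the function name and the sample LSL line are illustrative.

```python
import re

# Same patterns as the two Perl one-liners above.
LEADING_ZERO = re.compile(r"([^0-9\.])0\.([0-9]+)([^\.])")
ZERO_FRACTION = re.compile(r"([\s,<])([1-9])\.[0]+([>,\s;)])")


def shrink_floats(source: str) -> str:
    """Apply the leading-zero rule first, then the zero-fraction rule."""
    source = LEADING_ZERO.sub(r"\1.\2\3", source)
    return ZERO_FRACTION.sub(r"\1\2\3", source)


shrunk = shrink_floats("vector v = <1.000000, 0.0234, 3.5>;")
print(shrunk)
```

Each match consumes the delimiter that follows the number, so two numbers separated by a single character (for instance 1.0,2.0) are not both rewritten in one pass.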

Escaping Special Characters

For both Perl Compatible Regular Expressions (PCRE) and POSIX Extended Regular Expressions (POSIX ERE), the following characters carry special meaning:

. ^ $ * + ? ( ) [ ] { \ | -

and should be escaped when they are meant literally. The hyphen - is only special inside a bracket expression.
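
In Python, whose regular expressions follow the Perl style, the escaping can be delegated to re.escape instead of being done by hand. The sample strings below are made up for illustration.

```python
import re

literal = "price: $4.50 (approx.)"

# re.escape backslash-escapes the special characters in the literal.
escaped_pattern = re.escape(literal)
matches_itself = re.fullmatch(escaped_pattern, literal) is not None

# Without escaping, the dot matches any character, not just a period.
unescaped_hit = re.fullmatch("4.50", "4x50") is not None
escaped_hit = re.fullmatch(re.escape("4.50"), "4x50") is not None

print(matches_itself, unescaped_hit, escaped_hit)
```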

In POSIX Basic Regular Expressions (BRE) only the following characters carry special meaning:

. ^ $ * [ \

and escaping parentheses and curly brackets gives them the special meaning that they have in POSIX ERE.

Capitalise First Letter of Every Word

Using word boundaries, the following substitution:

s/\b(\w)/\u$1/g

will capitalise the first letter of every word.
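
Python's re module has no \u modifier in replacement strings, so an equivalent sketch uses a replacement function. The function name and the sample text are assumptions.

```python
import re


def capitalise_words(text: str) -> str:
    """Upper-case the first word character after every word boundary."""
    return re.sub(r"\b(\w)", lambda match: match.group(1).upper(), text)


title = capitalise_words("wizardry and steamworks")
print(title)
```

An apostrophe also counts as a word boundary, so a word such as don't would come out as Don'T.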

Refactor String Comparisons for Better Performance

The difference between:

string a = "good";
string b = "day";
a.Equals(b, StringComparison.Ordinal);

and:

string a = "good";
string b = "day";
string.Equals(a, b, StringComparison.Ordinal);

is that the latter variant performs a reference equality test which may in some cases be faster.

We can build a regex replacement rule that will thus refactor all instances of a.Equals(b) into string.Equals(a, b). We search for:

\b([a-zA-Z_@0-9\.]+?)\.Equals\((.+?), StringComparison\.([a-zA-Z_@0-9\.]+?)\)

and replace with:

string.Equals(\1, \2, StringComparison.\3)

where \1, \2 and \3 represent the capture groups.
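
The rule can be tried out on a line of C# source from Python before applying it to a code base. This sketch keeps the receiver as the first argument (\1 before \2), in line with the a.Equals(b) to string.Equals(a, b) description; since equality is symmetric, the order does not change the result. The function name and the sample line are illustrative.

```python
import re

# Search pattern from the text.
INSTANCE_EQUALS = re.compile(
    r"\b([a-zA-Z_@0-9\.]+?)\.Equals\((.+?), StringComparison\.([a-zA-Z_@0-9\.]+?)\)"
)


def refactor_equals(csharp_source: str) -> str:
    """Rewrite a.Equals(b, StringComparison.X) as string.Equals(a, b, ...)."""
    return INSTANCE_EQUALS.sub(
        r"string.Equals(\1, \2, StringComparison.\3)", csharp_source
    )


refactored = refactor_equals("if (a.Equals(b, StringComparison.Ordinal)) {")
print(refactored)
```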

Lookahead

Example        Name                   Description
def(?!abc)     Negative lookahead     Match a group (def) not followed by a group (abc)
def(?=abc)     Positive lookahead     Match a group (def) followed by a group (abc)
(?<!abc)def    Negative lookbehind    Match a group (def) not preceded by a group (abc)
(?<=abc)def    Positive lookbehind    Match a group (def) preceded by a group (abc)

For instance, matching all instances like:

double.TryParse(
float.TryParse(

but no instances of:

UUID.TryParse(
Vector2.TryParse(
Vector2d.TryParse(
Vector3.TryParse(
Vector3d.TryParse(
Quaternion.TryParse(
bool.TryParse(
DateTime.TryParse(

one could use a negative lookbehind regular expression:

(?<!UUID|Vector[23]d?|Quaternion|bool|DateTime)\.TryParse\(
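
The expression above uses a variable-width lookbehind, which the .NET regex engine accepts. Python's re module only allows fixed-width lookbehinds, so a sketch in Python has to chain one lookbehind per width. The list of sample lines is taken from the examples above; the variable names are assumptions.

```python
import re

# One fixed-width negative lookbehind per excluded type name.
NUMERIC_TRYPARSE = re.compile(
    r"(?<!UUID)(?<!Vector[23])(?<!Vector[23]d)"
    r"(?<!Quaternion)(?<!bool)(?<!DateTime)"
    r"\.TryParse\("
)

lines = [
    "double.TryParse(",
    "float.TryParse(",
    "UUID.TryParse(",
    "Vector2.TryParse(",
    "Vector3d.TryParse(",
    "Quaternion.TryParse(",
    "bool.TryParse(",
    "DateTime.TryParse(",
]

kept = [line for line in lines if NUMERIC_TRYPARSE.search(line)]
print(kept)
```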

Validating a URL Address

Conforming to RFC 1738, the following pattern will match one or more characters that are able to appear in a URL address:

[$\-_\.\+!\*'\(\),a-zA-Z0-9]+

In other words, given a URL such as http://www.google.com, the pattern can be run against the segment www.google.com and it will match all characters.
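
A minimal Python check of that claim; fullmatch is used so that every character of the segment has to belong to the class. The helper name is an assumption.

```python
import re

# Character class from the text, matched one or more times.
URL_CHARS = re.compile(r"[$\-_\.\+!\*'\(\),a-zA-Z0-9]+")


def is_url_segment(segment: str) -> bool:
    """True if the segment consists only of characters from the class."""
    return URL_CHARS.fullmatch(segment) is not None


print(is_url_segment("www.google.com"))
print(is_url_segment("www.goo gle.com"))
```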

Matching a Windows NetSH URL Reservation

A Windows NetSH URL reservation needs to conform to the following rules:

  • If no URL path is given, then the last character in the URL reservation must be a forward-slash /.
  • URL reservations must specify a port number - even if the port is a typical HTTP port (port 80).
  • URLs may be reserved for both HTTP and HTTPS.

The following pattern will match in case a URL conforms to the aforementioned rules:

^https?:\/\/[$\-_\.\+!\*'\(\),a-zA-Z0-9]+:[0-9]{1,5}.*/[$\-_\.\+!\*'\(\),a-zA-Z0-9]*$
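
The rules can be exercised against a few candidate reservations in Python. The sample URLs and the helper name are made up for illustration; the pattern is the one given above.

```python
import re

# Pattern from the text: scheme, host, mandatory port, then a path
# that contains at least one forward-slash.
NETSH_URL = re.compile(
    r"^https?:\/\/[$\-_\.\+!\*'\(\),a-zA-Z0-9]+:[0-9]{1,5}.*/"
    r"[$\-_\.\+!\*'\(\),a-zA-Z0-9]*$"
)


def is_valid_reservation(url: str) -> bool:
    """True if the URL satisfies the reservation pattern."""
    return NETSH_URL.match(url) is not None


for candidate in (
    "http://+:8080/",
    "https://example.com:443/api/",
    "http://example.com/",
    "http://example.com:80",
):
    print(candidate, is_valid_reservation(candidate))
```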

Matching Domain Names

Conforming to RFC 1035, the labels of a domain name may contain the following characters:

  • a-z, A-Z
  • 0-9
  • - but not as a starting or ending character
  • . as a separator for the textual portions of a domain name

Each label must be 63 octets or less. The following pattern matches a single label:

(?:[A-Za-z0-9][A-Za-z0-9\-]{0,61}[A-Za-z0-9]|[A-Za-z0-9])
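
As a sketch, the label pattern can be combined with a split on the separator to check a whole domain name in Python. The helper names are hypothetical and the check covers only the rules listed above.

```python
import re

# Label pattern from the text: 1 to 63 characters, no hyphen at either end.
LABEL = re.compile(r"(?:[A-Za-z0-9][A-Za-z0-9\-]{0,61}[A-Za-z0-9]|[A-Za-z0-9])")


def is_valid_label(label: str) -> bool:
    """True if the whole label matches the pattern."""
    return LABEL.fullmatch(label) is not None


def is_valid_domain(name: str) -> bool:
    """True if every dot-separated label is valid."""
    return all(is_valid_label(label) for label in name.split("."))


print(is_valid_domain("www.example.com"))
print(is_valid_domain("-bad.example.com"))
```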

Matching IP Addresses

A comprehensive rule that restricts every octet of an IPv4 address to the range 0-255 is the following:

(?:(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.){3}(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)
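
A short Python sketch of the pattern in use; the helper name and the sample addresses are assumptions. fullmatch matters here, because a plain search would find the valid tail 56.1.1.1 inside 256.1.1.1.

```python
import re

# Pattern from the text: four octets, each limited to 0-255.
IPV4 = re.compile(
    r"(?:(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.){3}"
    r"(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)"
)


def is_ipv4(candidate: str) -> bool:
    """True if the whole string is a dotted-quad IPv4 address."""
    return IPV4.fullmatch(candidate) is not None


for candidate in ("192.168.1.1", "255.255.255.255", "256.1.1.1", "10.0.0"):
    print(candidate, is_ipv4(candidate))
```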

fuss/regex.txt · Last modified: 2022/04/19 08:28 by 127.0.0.1
