fuss:regex [2017/02/14 13:20] – [Matching a Windows NetSH URL Reservation] office
fuss:regex [2022/04/19 08:28] (current) – external edit 127.0.0.1
 +====== MediaWiki Headers to DokuWiki Headers ======
 +
 +This one was used to convert headers from MediaWiki to DokuWiki:
 +
 +<code>
 +Regex: =+(.+?)=+
 +Substitute: ======    $1 ======
 +</code>
 +
 +====== Convert DokuWiki ll-Functions to Monospace Font ======
 +
 +<code>
 +Regex: \[\[ll(.+?)\]\]
 +Substitute: ''ll$1''
 +</code>
 +
 +====== MediaWiki Links to DokuWiki Links ======
 +
 +<code>
 +Regex: \[(http){1}(.+?)\s(.+?)\]
 +Substitute: [[$1$2|$3]]
 +</code>
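As a minimal sketch of how this rule behaves, the pattern can be tried in Python, where ''$1'' is spelled ''\1''. The function name ''convert_links'' and the sample link are assumptions made for illustration only.

<code python>
import re

# Pattern taken from the section above; the substitution uses Python's \N syntax.
MEDIAWIKI_LINK = re.compile(r'\[(http){1}(.+?)\s(.+?)\]')


def convert_links(text: str) -> str:
    """Rewrite [http://url label] as [[http://url|label]]."""
    return MEDIAWIKI_LINK.sub(r'[[\1\2|\3]]', text)


print(convert_links('See [http://example.com Example site] for details.'))
# See [[http://example.com|Example site]] for details.
</code>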
 +
 +====== Convert Uppercase Titles to DokuWiki Titles ======
 +
 +<code>
 +^([A-Z ]+?):$
 +</code>
 +
 +====== Converting MediaWiki Links to DokuWiki Links ======
 +
 +Search:
 +<code>
 +^(\|)\[\[http\:\/\/was\.fm\/wiki/(.+?)]]
 +</code>
 +
 +Replace:
 +<code>
 +$1{{wiki:$2}}
 +</code>
 +
 +====== Matching UUIDs ======
 +
 +<code>
 +[0-9A-Fa-f]{8}\-[0-9A-Fa-f]{4}\-[0-9A-Fa-f]{4}\-[0-9A-Fa-f]{4}\-[0-9A-Fa-f]{12}
 +</code>
 +
 +===== UUID v4 =====
 +
 +<code>
 +[0-9A-Fa-f]{8}\-[0-9A-Fa-f]{4}\-4[0-9A-Fa-f]{3}\-[89ABab][0-9A-Fa-f]{3}\-[0-9A-Fa-f]{12}
 +</code>
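The two patterns can be compared side by side with a short Python sketch. The helper names ''is_uuid'' and ''is_uuid_v4'' and the sample values are hypothetical; ''fullmatch'' is used so that the whole string has to be a UUID.

<code python>
import re

# Both patterns are copied from the sections above.
UUID_ANY = re.compile(
    r'[0-9A-Fa-f]{8}\-[0-9A-Fa-f]{4}\-[0-9A-Fa-f]{4}\-[0-9A-Fa-f]{4}\-[0-9A-Fa-f]{12}')
UUID_V4 = re.compile(
    r'[0-9A-Fa-f]{8}\-[0-9A-Fa-f]{4}\-4[0-9A-Fa-f]{3}\-[89ABab][0-9A-Fa-f]{3}\-[0-9A-Fa-f]{12}')


def is_uuid(value: str) -> bool:
    """True if the whole string has the generic 8-4-4-4-12 UUID layout."""
    return UUID_ANY.fullmatch(value) is not None


def is_uuid_v4(value: str) -> bool:
    """True if the whole string also carries the version 4 and variant digits."""
    return UUID_V4.fullmatch(value) is not None


print(is_uuid('123e4567-e89b-12d3-a456-426614174000'))     # True
print(is_uuid_v4('123e4567-e89b-12d3-a456-426614174000'))  # False
print(is_uuid_v4('f47ac10b-58cc-4372-a567-0e02b2c3d479'))  # True
</code>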
 +
 +====== Strip #-prefix Comments and Empty Lines ======
 +
 +<code bash>
 +sed -e '/^#/d' -e '/^$/d' squid.conf
 +</code>
 +
 +====== Grabbing Top-Level Domains ======
 +
 +<code>
 +(.+?)([a-zA-Z0-9\-]+)\.([a-z]{2,4})$
 +</code>
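 +
 +where the third captured group is the top-level domain. Note that the ''{2,4}'' bound rejects longer top-level domains such as ''museum''.
 +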
 +====== Match MAC Address ======
 +
 +<code>
 +(([0-9a-fA-F]{2}[:]){5}([0-9a-fA-F]{2}))
 +</code>
 +
 +where the first captured group is the whole MAC address.
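A short Python sketch of extracting the first group; the function name ''find_mac'' and the sample line are assumptions for illustration.

<code python>
import re

# Pattern copied from the section above; group 1 spans the whole address.
MAC = re.compile(r'(([0-9a-fA-F]{2}[:]){5}([0-9a-fA-F]{2}))')


def find_mac(text: str):
    """Return the first MAC address found in text, or None."""
    match = MAC.search(text)
    return match.group(1) if match else None


print(find_mac('eth0 HWaddr 00:1A:2b:3C:4d:5E up'))  # 00:1A:2b:3C:4d:5E
</code>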
 +
 +====== Back Reference Followed by Number ======
 +
 +A back reference such as ''\1'' may clash with a digit that immediately follows it. For example:
 +<code>
 +\10
 +</code>
 +is read by the regex engine as a reference to the 10th group rather than as ''\1'' for the first group followed by a literal ''0''.
 +
 +There are various ways to avoid the confusion, and the following table lists them by language.
 +
 +^ Language ^ Solution   ^
 +| PHP      | ''\${1}0'' |
 +| AWK      | ''\\10''   |
 +| Python   | ''\g<1>0'' |
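
The Python row of the table can be checked with a small sketch. The function names and the sample string are hypothetical; the second function only exists to show how the ambiguous form is interpreted.

<code python>
import re


def append_zero(text: str) -> str:
    """Append a literal 0 after every run of digits."""
    # \g<1> delimits the group number, so the trailing 0 stays literal.
    return re.sub(r'(\d+)', r'\g<1>0', text)


def append_zero_ambiguous(text: str) -> str:
    """Same intent, but \\10 is parsed as a reference to group 10."""
    return re.sub(r'(\d+)', r'\10', text)


print(append_zero('price 42'))  # price 420

try:
    append_zero_ambiguous('price 42')
except re.error as exc:
    # The pattern defines only one group, so group 10 is rejected.
    print('rejected:', exc)
</code>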
 +
 +====== Floating Point Refinement for LSL Scripts ======
 +
 +Given text files containing scripts placed in a folder, the following command:
 +<code bash>
 +find . -name \*.txt -exec perl -i -pe 's/([^0-9\.])0\.([0-9]+)([^\.])/$1.$2$3/g' '{}' \;
 +</code>
 +will strip the leading zero from floating point numbers.
 +
 +For example, it will replace ''0.0234'' by ''.0234''. In cases such as LSL, where the stack, heap and code are stored in the same container, it makes sense to reduce the code size.
 +
 +The next refinement drops an all-zero fractional part from single-digit floating point numbers:
 +<code bash>
 +find . -name \*.txt -exec perl -i -pe 's/([\s,<])([1-9])\.[0]+([>,\s;)])/$1$2$3/g' '{}' \;
 +</code>
 +since it is completely redundant to write ''1.000000'' instead of just ''1''.
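
To try both substitutions without touching any files, the two expressions can be ported to Python as a rough sketch. The function names and sample strings are assumptions; unlike the ''perl -i'' commands above, this works on strings rather than editing files in place.

<code python>
import re


def strip_leading_zero(source: str) -> str:
    """Turn 0.0234 into .0234 when the 0 is not part of a longer number."""
    return re.sub(r'([^0-9\.])0\.([0-9]+)([^\.])', r'\1.\2\3', source)


def drop_zero_fraction(source: str) -> str:
    """Turn 1.000000 into 1 for single-digit integer parts."""
    return re.sub(r'([\s,<])([1-9])\.[0]+([>,\s;)])', r'\1\2\3', source)


print(strip_leading_zero('float x = 0.0234;'))     # float x = .0234;
print(strip_leading_zero('float y = 10.5;'))       # float y = 10.5;
print(drop_zero_fraction('<1.000000, 2.0, 3.5>'))  # <1, 2, 3.5>
</code>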
 +
 +====== Escaping Special Characters ======
 +
 +For both Perl Compatible Regular Expressions (PCRE) and POSIX Extended Regular Expressions (POSIX ERE), the following characters carry special meaning:
 +
 +<code>
 +. ^ $ * + ? ( ) [ ] { \ | -
 +</code>
 +
 +and should be escaped when they are meant literally. The hyphen ''-'' is only special inside a bracket expression.
 +
 +In POSIX Basic Regular Expressions (BRE) only the following characters carry special meaning:
 +<code>
 +. ^ $ * [ \
 +</code>
 +
 +and escaping parentheses and curly brackets gives them the special meaning that they have in POSIX ERE.
 +
 +====== Capitalise First Letter of Every Word ======
 +
 +Using word boundaries, the following substitution:
 +<code>
 +s/\b(\w)/\u$1/g
 +</code>
 +
 +will capitalise the first letter of every word.
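
Python's replacement strings have no ''\u'' case modifier, so a rough equivalent passes a function as the replacement. The name ''capitalise_words'' and the sample text are hypothetical.

<code python>
import re


def capitalise_words(text: str) -> str:
    """Upper-case the first word character after every word boundary."""
    return re.sub(r'\b(\w)', lambda m: m.group(1).upper(), text)


print(capitalise_words('hello wide world'))  # Hello Wide World
</code>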
 +
 +====== Refactor String Comparisons for Better Performance ======
 +
 +The difference between:
 +<code csharp>
 +string a = "good";
 +string b = "day";
 +a.Equals(b, StringComparison.Ordinal);
 +</code>
 +and:
 +<code csharp>
 +string a = "good";
 +string b = "day";
 +string.Equals(a, b, StringComparison.Ordinal);
 +</code>
 +
 +is that the latter is a static method that first performs a reference equality test, which may in some cases be faster, and that also accepts a null ''a'' without throwing.
 +
 +A regex replacement rule can then refactor all instances of ''a.Equals(b, StringComparison.X)'' into ''string.Equals(a, b, StringComparison.X)''. Search for:
 +<code>
 +\b([a-zA-Z_@0-9\.]+?)\.Equals\((.+?), StringComparison\.([a-zA-Z_@0-9\.]+?)\)
 +</code>
 +
 +and replace with:
 +<code>
 +string.Equals(\1, \2, StringComparison.\3)
 +</code>
 +
 +where ''\1'', ''\2'' and ''\3'' represent the capture groups.
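
The search expression can be exercised from Python as a sketch; the function name ''refactor_equals'' and the sample C# line are assumptions. The replacement here places the receiver first and the argument second, so the output reads ''string.Equals(a, b, ...)''.

<code python>
import re

# Search expression copied from the section above.
EQUALS_CALL = re.compile(
    r'\b([a-zA-Z_@0-9\.]+?)\.Equals\((.+?), StringComparison\.([a-zA-Z_@0-9\.]+?)\)')


def refactor_equals(source: str) -> str:
    """Rewrite a.Equals(b, StringComparison.X) as string.Equals(a, b, StringComparison.X)."""
    return EQUALS_CALL.sub(r'string.Equals(\1, \2, StringComparison.\3)', source)


print(refactor_equals('if (a.Equals(b, StringComparison.Ordinal)) { }'))
# if (string.Equals(a, b, StringComparison.Ordinal)) { }
</code>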
 +
 +====== Lookahead and Lookbehind ======
 +
 +^ Example ^ Name ^ Description ^
 +| ''def(?!abc)'' | Negative lookahead | Match ''def'' //not// followed by ''abc'' |
 +| ''def(?=abc)'' | Positive lookahead | Match ''def'' followed by ''abc'' |
 +| ''(?<!abc)def'' | Negative lookbehind | Match ''def'' //not// preceded by ''abc'' |
 +| ''(?<=abc)def'' | Positive lookbehind | Match ''def'' preceded by ''abc'' |
 +
 +For instance, to match all instances like:
 +<code>
 +double.TryParse(
 +float.TryParse(
 +</code>
 +
 +but no instances of:
 +<code>
 +UUID.TryParse(
 +Vector2.TryParse(
 +Vector2d.TryParse(
 +Vector3.TryParse(
 +Vector3d.TryParse(
 +Quaternion.TryParse(
 +bool.TryParse(
 +DateTime.TryParse(
 +</code>
 +
 +one could use a negative lookbehind regular expression. The alternatives differ in length, which engines such as .NET accept but some others reject:
 +<code>
 +(?<!UUID|Vector[23]d?|Quaternion|bool|DateTime)\.TryParse\(
 +</code>
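
Python's ''re'' module only accepts fixed-width lookbehinds, so the expression above does not compile there as written. A possible workaround, sketched below under that assumption, chains one fixed-width negative lookbehind per excluded type; the names ''EXCLUDED'', ''TRY_PARSE'' and ''wanted'' are hypothetical.

<code python>
import re

# Type names taken from the list of unwanted matches above.
EXCLUDED = ('UUID', 'Vector2', 'Vector2d', 'Vector3', 'Vector3d',
            'Quaternion', 'bool', 'DateTime')

# One fixed-width negative lookbehind per name, followed by the literal call.
TRY_PARSE = re.compile(
    ''.join('(?<!{})'.format(re.escape(name)) for name in EXCLUDED)
    + r'\.TryParse\(')


def wanted(line: str) -> bool:
    """True if the line contains a TryParse call on a non-excluded type."""
    return TRY_PARSE.search(line) is not None


print(wanted('double.TryParse('))   # True
print(wanted('Vector3d.TryParse(')) # False
</code>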
 +
 +====== Validating a URL Address ======
 +
 +Conforming to [[http://www.faqs.org/rfcs/rfc1738.html|RFC 1738]], the following pattern will match one or more characters that are able to appear in a URL:
 +<code>
 +[$\-_\.\+!\*'\(\),a-zA-Z0-9]+
 +</code>
 +
 +In other words, given a URL such as ''http://www.google.com'', the pattern can be run against the segment ''www.google.com'' and it will match all characters.
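
A minimal Python sketch of running the pattern against a segment; the helper name ''is_url_segment'' is an assumption, and ''fullmatch'' is used so that every character of the segment has to be in the allowed set.

<code python>
import re

# Character class copied from the section above.
URL_CHARS = re.compile(r"[$\-_\.\+!\*'\(\),a-zA-Z0-9]+")


def is_url_segment(segment: str) -> bool:
    """True if every character of the segment is in the allowed set."""
    return URL_CHARS.fullmatch(segment) is not None


print(is_url_segment('www.google.com'))   # True
print(is_url_segment('www.goo gle.com'))  # False
</code>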
 +
 +====== Matching a Windows NetSH URL Reservation ======
 +
 +A Windows NetSH URL reservation needs to conform to the following rules:
 +  * If no URL path is given, then the last character in the URL reservation must be a forward slash ''/''.
 +  * URL reservations must specify a port number, even if the port is the default HTTP port (port 80).
 +  * URLs may be reserved for both HTTP and HTTPS.
 +
 +The following pattern will match when a URL conforms to the aforementioned rules:
 +<code regex>
 +^https?:\/\/[$\-_\.\+!\*'\(\),a-zA-Z0-9]+:[0-9]{1,5}.*/[$\-_\.\+!\*'\(\),a-zA-Z0-9]*$
 +</code>
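
The rules can be checked against a few sample reservations with a Python sketch. The function name ''is_reservation'' and the sample URLs are assumptions chosen for illustration.

<code python>
import re

# Pattern copied from the section above.
RESERVATION = re.compile(
    r"^https?:\/\/[$\-_\.\+!\*'\(\),a-zA-Z0-9]+:[0-9]{1,5}.*/[$\-_\.\+!\*'\(\),a-zA-Z0-9]*$")


def is_reservation(url: str) -> bool:
    """True if the URL has a scheme, a host, an explicit port and a slash after it."""
    return RESERVATION.match(url) is not None


print(is_reservation('http://+:8080/'))                # True
print(is_reservation('https://example.com:443/app/'))  # True
print(is_reservation('http://+:8080'))                 # False, no trailing slash
print(is_reservation('http://localhost/'))             # False, no port
</code>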
 +
 +====== Matching Domain Names ======
 +
 +Conforming to [[https://tools.ietf.org/html/rfc1035|RFC 1035]], domain names obey the following rules:
 +  * ''a-z'', ''A-Z''
 +  * ''0-9''
 +  * ''-'' but not as a starting or ending character
 +  * ''.'' as a separator for the textual portions of a domain name
 +  * labels must be 63 octets or less
 +
 +The following pattern matches a single label of a domain name:
 +
 +<code regex>
 +(?:[A-Za-z0-9][A-Za-z0-9\-]{0,61}[A-Za-z0-9]|[A-Za-z0-9])
 +</code>
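
One way to apply the pattern to a whole domain name, sketched in Python, is to split the name on ''.'' and test every part. The helper names ''is_valid_label'' and ''is_valid_domain'' are hypothetical.

<code python>
import re

# Label pattern copied from the section above.
LABEL = re.compile(r'(?:[A-Za-z0-9][A-Za-z0-9\-]{0,61}[A-Za-z0-9]|[A-Za-z0-9])')


def is_valid_label(label: str) -> bool:
    """True if the label is 1 to 63 characters and has no leading or trailing hyphen."""
    return LABEL.fullmatch(label) is not None


def is_valid_domain(name: str) -> bool:
    """True if every dot-separated part of the name is a valid label."""
    return all(is_valid_label(part) for part in name.split('.'))


print(is_valid_domain('www.example.com'))   # True
print(is_valid_domain('-bad.example.com'))  # False
print(is_valid_label('a' * 64))             # False
</code>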
 +
 +====== Matching IP Addresses ======
 +
 +The following pattern matches a dotted-quad IPv4 address and restricts each octet to the range 0 to 255:
 +
 +<code regex>
 +(?:(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.){3}(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)
 +</code>
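
A short Python sketch of using the pattern as a validator; the name ''is_ipv4'' is an assumption. The pattern carries no anchors, so ''fullmatch'' is used here to keep it from matching inside a longer string.

<code python>
import re

# Pattern copied from the section above.
IPV4 = re.compile(
    r'(?:(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.){3}'
    r'(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)')


def is_ipv4(value: str) -> bool:
    """True if the whole string is four dot-separated octets in 0..255."""
    return IPV4.fullmatch(value) is not None


print(is_ipv4('192.168.1.1'))  # True
print(is_ipv4('256.1.1.1'))    # False
</code>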
  

fuss/regex.1487078449.txt.bz2 · Last modified: 2017/02/14 13:20 by office
