fuss:regex (current revision 2022/04/19 08:28; previous revision 2017/02/14 13:20)
====== MediaWiki Headers to DokuWiki Headers ======

This substitution was used to convert headers from MediaWiki to DokuWiki syntax:

<code>
Regex: =+(.+?)=+
Substitute: ======
</code>
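
A minimal Python sketch of the same conversion. The substitution above is cut off, so the sketch assumes it wraps the captured title in six equals signs, turning every header into a DokuWiki level-1 header; the function name ''mediawiki_to_dokuwiki_header'' is hypothetical. Because the pattern is not anchored, it should only be applied to header lines: an assignment such as ''a = b = c'' would otherwise be rewritten as well.

<code python>
import re

# Pattern from the section above: a run of "=", the lazily captured title,
# then another run of "=".
HEADER = re.compile(r"=+(.+?)=+")

def mediawiki_to_dokuwiki_header(line: str) -> str:
    # Assumed replacement: every header becomes a level-1 DokuWiki header.
    # The captured group already includes the spaces around the title.
    return HEADER.sub(r"======\1======", line)

print(mediawiki_to_dokuwiki_header("== Installation =="))  # ====== Installation ======
</code>

Note that every MediaWiki level collapses to the same DokuWiki level, so nested headings need to be adjusted by hand afterwards.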

====== Convert DokuWiki ll-Functions to Monospace Font ======

<code>
Regex: \[\[ll(.+?
Substitute: ''
</code>

====== MediaWiki Links to DokuWiki Links ======

<code>
Regex: \[(http){1}(.+?
Substitute: [[$1$2|$3]]
</code>
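
A Python sketch of this conversion for a MediaWiki external link such as ''%%[http://example.com Example site]%%''. The regex above is truncated, so the pattern below is an assumed completion that is consistent with the three groups used by the substitution: ''http'', the rest of the URL up to the first space, and the link label. Splitting off ''http'' means ''https'' links are handled too, since the ''s'' simply lands in the second group. The function name is hypothetical.

<code python>
import re

# Hypothetical completion of the truncated pattern: "http" (group 1), the rest
# of the URL up to the first space (group 2) and the link label (group 3).
# The redundant "{1}" quantifier of the original is dropped.
MW_LINK = re.compile(r"\[(http)(\S+?) (.+?)\]")

def mediawiki_to_dokuwiki_link(text: str) -> str:
    # Same replacement as above, written with Python's \N back-references.
    return MW_LINK.sub(r"[[\1\2|\3]]", text)

print(mediawiki_to_dokuwiki_link("See [http://example.com Example site]."))
</code>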

====== Convert Uppercase Titles to DokuWiki Titles ======

<code>
^([A-Z ]+?):$
</code>
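
The pattern above only locates lines made of uppercase letters and spaces that end with a colon. As a sketch, assuming the intended replacement turns each such line into a level-1 DokuWiki header (the section does not give the substitution), Python needs the ''re.MULTILINE'' flag so that ''^'' and ''$'' match at every line boundary. The function name is hypothetical.

<code python>
import re

# ^ and $ only match at line boundaries when re.MULTILINE is set.
UPPER_TITLE = re.compile(r"^([A-Z ]+?):$", re.MULTILINE)

def uppercase_titles_to_dokuwiki(text: str) -> str:
    # Assumed replacement: each matched line becomes a level-1 header.
    return UPPER_TITLE.sub(r"====== \1 ======", text)

print(uppercase_titles_to_dokuwiki("INSTALLATION NOTES:\nRun make.\n"))
</code>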

====== Converting MediaWiki Links to DokuWiki Links ======

Search:
<code>
^(\|)\[\[http\:
</code>

Replace:
<code>
$1{{wiki:
</code>

====== Matching UUIDs ======

<code>
[0-9A-Fa-f]{8}\-[0-9A-Fa-f]{4}\-[0-9A-Fa-f]{4}\-[0-9A-Fa-f]{4}\-[0-9A-Fa-f]{12}
</code>

===== UUID v4 =====

<code>
[0-9A-Fa-f]{8}\-[0-9A-Fa-f]{4}\-4[0-9A-Fa-f]{3}\-[89ABab][0-9A-Fa-f]{3}\-[0-9A-Fa-f]{12}
</code>
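
In Python, ''re.fullmatch'' applies either pattern to a whole string, so a value that merely contains a UUID is rejected. A short sketch (function names are illustrative) that also checks the v4 pattern against identifiers generated by the standard ''uuid'' module:

<code python>
import re
import uuid

# Patterns from the section above; "-" needs no escaping outside a character class.
UUID_ANY = re.compile(
    r"[0-9A-Fa-f]{8}-[0-9A-Fa-f]{4}-[0-9A-Fa-f]{4}-[0-9A-Fa-f]{4}-[0-9A-Fa-f]{12}"
)
# Version 4: the third group starts with "4", the fourth with 8, 9, A or B.
UUID_V4 = re.compile(
    r"[0-9A-Fa-f]{8}-[0-9A-Fa-f]{4}-4[0-9A-Fa-f]{3}-[89ABab][0-9A-Fa-f]{3}-[0-9A-Fa-f]{12}"
)

def is_uuid(text: str) -> bool:
    # fullmatch() rejects strings that merely contain a UUID somewhere.
    return UUID_ANY.fullmatch(text) is not None

def is_uuid_v4(text: str) -> bool:
    return UUID_V4.fullmatch(text) is not None

print(is_uuid("123e4567-e89b-12d3-a456-426614174000"))     # True
print(is_uuid_v4("123e4567-e89b-12d3-a456-426614174000"))  # False, version digit is 1
print(is_uuid_v4(str(uuid.uuid4())))                       # True
</code>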

====== Strip #-prefix Comments and Newlines ======

<code bash>
cat squid.conf | sed '/
</code>
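
The ''sed'' expression above is cut off. As a rough Python equivalent, assuming that only whole-line comments (optionally indented) and blank lines should go while inline comments after a directive are kept, a line-by-line filter could look like this (the function name is hypothetical):

<code python>
import re

# A line is dropped when it is empty, whitespace only, or starts
# (after optional indentation) with "#". Inline comments are kept.
COMMENT_OR_BLANK = re.compile(r"\s*(?:#.*)?")

def strip_comments(text: str) -> str:
    return "\n".join(
        line for line in text.splitlines()
        if not COMMENT_OR_BLANK.fullmatch(line)
    )

sample = "# Squid configuration\n\nhttp_port 3128\n  # cache settings\nacl all src all\n"
print(strip_comments(sample))
</code>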

====== Grabbing Top-Level Domains ======

<code>
(.+?
</code>

====== Match MAC Address ======

<code>
(([0-9a-fA-F]{2}[:
</code>

where the first captured group is the whole MAC address.
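
The pattern above is truncated. A common completion, assumed here rather than taken from the original, accepts either '':'' or ''-'' as the separator. The inner group only keeps its last repetition, which is why the whole address has to be read from the outer, first group. The helper name is hypothetical.

<code python>
import re

# Assumed completion: five "hh:" or "hh-" pairs followed by a final pair.
# Group 1 spans the whole address; group 2 only holds the last inner pair.
# Mixed separators such as "00:1A-2B:..." are accepted as well.
MAC = re.compile(r"(([0-9a-fA-F]{2}[:-]){5}[0-9a-fA-F]{2})")

def find_mac(text: str):
    match = MAC.search(text)
    return match.group(1) if match else None

print(find_mac("ether 00:1A:2b:3C:4d:5E  txqueuelen 1000"))  # 00:1A:2b:3C:4d:5E
</code>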

====== Back Reference Followed by Number ======

The problem is that back references such as ''
<code>
\10
</code>
will make the regex engine read the replacement as a reference to the 10th group instead of substituting ''

Each language has its own way to avoid the ambiguity, as listed in the following table.

^ Language ^ Solution ^
| PHP | ''
| AWK | ''
| Python
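
In Python, the ''\g<N>'' form in a replacement string spells out the group number explicitly, so ''\g<1>0'' means group 1 followed by a literal ''0''. A minimal sketch (the helper name is illustrative) that also shows Python rejecting the ambiguous ''\10'' when the pattern has fewer than ten groups:

<code python>
import re

def append_zero_to_digits(text: str) -> str:
    # "\g<1>0" is group 1 followed by a literal "0"; a bare "\10" would
    # instead be parsed as a reference to group 10.
    return re.sub(r"(\d)", r"\g<1>0", text)

print(append_zero_to_digits("version 1"))  # version 10

try:
    re.sub(r"(\d)", r"\10", "version 1")
except re.error as err:
    # The pattern has a single group, so the reference to group 10 is rejected.
    print("rejected:", err)
</code>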

====== Floating Point Refinement for LSL Scripts ======

Given a folder of text files containing LSL scripts, the following command:
<code bash>
find . -name \*.txt -exec perl -i''
</code>
will strip the leading zero from floating point numbers.

For example, it will replace ''

The next refinement strips trailing zeroes from floating point numbers:
<code bash>
find . -name \*.txt -exec perl -i''
</code>
since it is completely redundant to write ''
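
The Perl one-liners above are truncated, so here is a Python sketch of both refinements using assumed patterns rather than the original ones. It presumes, as the section implies, that LSL accepts floats written as ''.5'' and ''1.''. Trailing zeroes are removed first so that ''0.50'' ends up as ''.5'', and requiring a digit before the decimal point keeps ''0.0'' from collapsing into a bare ''.''. Every match is rewritten, so numbers inside string literals and comments change too. The function name is hypothetical.

<code python>
import re

# Zeroes ending a fractional part, e.g. the "00" in "1.500". Requiring a digit
# before the "." keeps "0.0" from becoming a bare ".".
TRAILING_ZEROES = re.compile(r"(\d\.\d*?)0+\b")
# A "0" that opens a float (not preceded by a word character or ".") and is
# followed by ".digit", e.g. the first "0" in "0.5".
LEADING_ZERO = re.compile(r"(?<![\w.])0(?=\.\d)")

def refine_floats(source: str) -> str:
    # Trailing zeroes first, so "0.50" becomes "0.5" and then ".5".
    source = TRAILING_ZEROES.sub(r"\1", source)
    return LEADING_ZERO.sub("", source)

print(refine_floats("vector v = <0.50, 1.0, -0.250>;"))  # vector v = <.5, 1., -.25>;
</code>

To apply it in place, read each ''*.txt'' file, pass its contents through ''refine_floats'' and write the result back.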

====== Escaping Special Characters ======

In both Perl Compatible Regular Expressions (PCRE) and POSIX Extended Regular Expressions (POSIX ERE), the following characters carry special meaning:

<code>
. ^ $ * + ? ( ) [ ] { \ | -
</code>

and must be escaped with a backslash to be matched literally. The hyphen is only special inside a bracket expression such as ''[a-z]''.

In POSIX Basic Regular Expressions (BRE), only the following characters carry special meaning:
<code>
. ^ $ * [ \
</code>

Conversely, in BRE, escaping parentheses and curly brackets (''\('', ''\)'', ''\{'', ''\}'') gives them the special meaning that they have unescaped in POSIX ERE.
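
A small Python sketch that escapes exactly the characters listed above (plus the closing ''}'') so that arbitrary text can be embedded in a pattern. Python's ''re'' module is not PCRE, but it uses the same backslash escapes for these characters. In practice the standard library's ''re.escape'' does the same job and escapes a somewhat larger set of characters, including whitespace. The function name is hypothetical.

<code python>
import re

# The PCRE / POSIX ERE metacharacters listed above, plus the closing "}".
SPECIAL = set(".^$*+?()[]{}\\|-")

def escape_literal(text: str) -> str:
    # Prefix each metacharacter with a backslash so it is matched literally.
    return "".join("\\" + ch if ch in SPECIAL else ch for ch in text)

pattern = escape_literal("total (USD): $4.99+")
print(pattern)                                                   # total \(USD\): \$4\.99\+
print(re.fullmatch(pattern, "total (USD): $4.99+") is not None)  # True
</code>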

====== Capitalise First Letter of Every Word ======

Using word boundaries, the following substitution:
<code>
s/
</code>

will capitalise the first letter of every word.
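
The substitution above is cut off. Python replacement strings cannot change case (there is no equivalent of Perl's ''\u''), so a sketch of the same idea passes a function to ''re.sub'' that upper-cases the character captured after each word boundary. Because an apostrophe also creates a word boundary, ''don't'' becomes ''Don'T''. The function name is hypothetical.

<code python>
import re

def capitalise_words(text: str) -> str:
    # \b(\w) captures the first word character after each word boundary;
    # the lambda upper-cases it.
    return re.sub(r"\b(\w)", lambda m: m.group(1).upper(), text)

print(capitalise_words("the quick brown fox"))  # The Quick Brown Fox
</code>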

====== Refactor String Comparisons for Better Performance ======

The difference between:
<code csharp>
string a = "
string b = "
a.Equals(b, StringComparison.Ordinal);
</code>
and:
<code csharp>
string a = "
string b = "
string.Equals(a,
</code>

is that the latter, static variant starts with a reference equality test, which may make it faster in some cases. It also does not throw a ''NullReferenceException'' when ''a'' is ''null'', so the refactoring slightly changes behaviour for null receivers.

A regex replacement rule can thus refactor all instances of ''
<code>
\b([a-zA-Z_@0-9\.]+?
</code>

and replace with:
<code>
string.Equals(\2,
</code>

where ''
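
Since the pattern and replacement above are truncated, this Python sketch uses a hypothetical pattern of its own: group 1 is the receiver, group 2 the argument and group 3 the ''StringComparison'' value. It only handles simple arguments without commas or parentheses, so the refactored code should still be reviewed. The function name is hypothetical.

<code python>
import re

# Hypothetical pattern, not the truncated original: receiver (group 1),
# argument (group 2) and StringComparison value (group 3).
INSTANCE_EQUALS = re.compile(
    r"\b([A-Za-z_0-9.]+?)\.Equals\(([^,()]+),\s*(StringComparison\.\w+)\)"
)

def to_static_equals(code: str) -> str:
    return INSTANCE_EQUALS.sub(r"string.Equals(\1, \2, \3)", code)

print(to_static_equals("if (a.Equals(b, StringComparison.Ordinal)) { }"))
# if (string.Equals(a, b, StringComparison.Ordinal)) { }
</code>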

====== Lookahead ======

^ Example ^ Name ^ Description ^
| ''
| ''
| ''
| ''

For instance, to match all instances such as:
<code>
double.TryParse(
float.TryParse(
</code>

but none of:
<code>
UUID.TryParse(
Vector2.TryParse(
Vector2d.TryParse(
Vector3.TryParse(
Vector3d.TryParse(
Quaternion.TryParse(
bool.TryParse(
DateTime.TryParse(
</code>

one could use a negative lookbehind regular expression:
<code>
(?<
</code>
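
Python's ''re'' module only accepts fixed-width look-behinds, so an alternation of type names with different lengths inside a single ''(?<!...)'' is rejected, whereas engines such as .NET allow variable-width look-behinds. A sketch that chains one negative look-behind per excluded name instead; like any exclusion list, it still matches other calls such as ''int.TryParse(''.

<code python>
import re

EXCLUDED = ["UUID", "Vector2", "Vector2d", "Vector3", "Vector3d",
            "Quaternion", "bool", "DateTime"]

# One fixed-width negative look-behind per excluded type name, all checked
# at the position of ".TryParse(".
TRY_PARSE = re.compile(
    "".join(f"(?<!{re.escape(name)})" for name in EXCLUDED) + r"\.TryParse\("
)

for call in ["double.TryParse(", "float.TryParse(", "Vector3d.TryParse(", "bool.TryParse("]:
    print(call, TRY_PARSE.search(call) is not None)
</code>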

====== Validating a URL ======

Conforming to [[http://
<code>
[$\-_\.\+!\*'
</code>

In other words, given a URL such as ''

====== Matching a Windows NetSH URL Reservation ======

A Windows NetSH URL reservation must conform to the following rules:
  * If no URL path is given, the last character in the URL reservation must be a forward slash ''/''.
  * URL reservations must specify a port number, even if the port is the typical HTTP port (port 80).
  * URLs may be reserved for both HTTP and HTTPS.

The following pattern matches a URL that conforms to these rules:
<code regex>
^https?:
</code>
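
The pattern above is truncated, so here is a Python sketch built directly from the three rules rather than a reconstruction of the original: the scheme is ''http'' or ''https'', the host may also be one of the http.sys wildcards ''+'' or ''*'', a port is mandatory, and a ''/'' must follow it. Bracketed IPv6 hosts are not handled. The function name is hypothetical.

<code python>
import re

# Sketch of the rules above, not the truncated original: http or https, a host
# (possibly "+" or "*"), a mandatory port, then at least "/".
# [0-9]{1,5} does not enforce the 65535 upper bound.
NETSH_URL = re.compile(r"https?://[^/:\s]+:[0-9]{1,5}/\S*")

def is_valid_reservation(url: str) -> bool:
    return NETSH_URL.fullmatch(url) is not None

for url in ["http://+:80/", "https://*:8443/api/", "http://localhost/", "http://localhost:8080"]:
    print(url, is_valid_reservation(url))
</code>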

====== Matching Domain Names ======

Conforming to [[https://
  * ''
  * ''
  * ''
  * ''
  * labels must be 63 octets or less

<code regex>
(?:
</code>
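
The rule list above is partly truncated, so this Python sketch assumes the usual host-name label rules: each label has 1 to 63 letters, digits or hyphens and neither starts nor ends with a hyphen, and a name has at least two labels. It does not check the overall length limit of a domain name. The function name is hypothetical.

<code python>
import re

# One label: 1 to 63 letters, digits or hyphens, not starting or ending with "-".
LABEL = r"[A-Za-z0-9](?:[A-Za-z0-9-]{0,61}[A-Za-z0-9])?"
# At least two dot-separated labels.
DOMAIN = re.compile(r"(?:" + LABEL + r"\.)+" + LABEL)

def is_domain(name: str) -> bool:
    return DOMAIN.fullmatch(name) is not None

for name in ["example.com", "-bad.example.com", "a" * 64 + ".com", "localhost"]:
    print(name, is_domain(name))
</code>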

====== Matching IP Addresses ======

A comprehensive rule that takes IP address classes into account is the following:

<code regex>
(?:
</code>
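
The pattern above is truncated. As a simpler sketch, and not the class-aware rule the section describes, this Python version only checks that an IPv4 address consists of four decimal octets in the range 0 to 255, rejecting leading zeros. The function name is hypothetical.

<code python>
import re

# One octet from 0 to 255 without leading zeros; longer alternatives come
# first so that an unanchored search() prefers the full octet.
OCTET = r"(?:25[0-5]|2[0-4][0-9]|1[0-9]{2}|[1-9]?[0-9])"
IPV4 = re.compile(OCTET + r"(?:\." + OCTET + r"){3}")

def is_ipv4(address: str) -> bool:
    # fullmatch() ensures nothing precedes or follows the four octets.
    return IPV4.fullmatch(address) is not None

for candidate in ["192.168.0.1", "255.255.255.255", "256.1.1.1", "1.2.3", "01.2.3.4"]:
    print(candidate, is_ipv4(candidate))
</code>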