Regex vocabulary
The following are the minimal building blocks you will need in order to write regular expressions and RewriteRules. They certainly do not represent a complete regular expression vocabulary, but they are a good place to start and should help you read basic regular expressions as well as write your own.
Character | Meaning | Example |
. | Matches any single character | c.t will match cat, cot, cut, etc. |
+ | Repeats the previous match one or more times | a+ matches a, aa, aaa, etc. |
* | Repeats the previous match zero or more times. | a* matches all the same things a+ matches, but will also match an empty string. |
? | Makes the match optional. | colou?r will match color and colour. |
^ | Called an anchor, matches the beginning of the string | ^a matches a string that begins with a |
$ | The other anchor, this matches the end of the string. | a$ matches a string that ends with a. |
( ) | Groups several characters into a single unit, and captures a match for use in a backreference. | (ab)+ matches ababab - that is, the + applies to the group. For more on backreferences see below. |
[ ] | A character class - matches one of the characters | c[uoa]t matches cut, cot or cat. |
[^ ] | Negative character class - matches any character not specified | c[^/]t matches cat or c=t but not c/t |
In mod_rewrite the ! character can be used before a regular expression to negate it. That is, a string will be considered to have matched only if it does not match the rest of the expression.
Here are the very basics of regexp (expanded from the Apache mod_rewrite documentation).
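The table entries and the ! negation can be checked with a short script. This is only a sketch: it uses Python's re module rather than the PCRE engine behind mod_rewrite (the two agree on these basic constructs), and rule_matches is a hypothetical helper written for this example, not part of Apache or Python.

```python
import re

def rule_matches(pattern: str, text: str) -> bool:
    """Hypothetical helper: unanchored search; a leading ! negates the result."""
    negate = pattern.startswith("!")
    if negate:
        pattern = pattern[1:]
    found = re.search(pattern, text) is not None
    return found != negate

# (pattern, strings expected to match, strings expected not to match)
TABLE_EXAMPLES = [
    (r"c.t", ["cat", "cot", "cut"], ["ct"]),
    (r"a+", ["a", "aa", "aaa"], [""]),
    (r"a*", ["", "a", "aaa"], []),
    (r"colou?r", ["color", "colour"], ["colr"]),
    (r"^a", ["apple"], ["banana"]),
    (r"a$", ["banana"], ["apple"]),
    (r"(ab)+", ["ababab"], ["ba"]),
    (r"c[uoa]t", ["cut", "cot", "cat"], ["cit"]),
    (r"c[^/]t", ["cat", "c=t"], ["c/t"]),
]

for pattern, should_match, should_not in TABLE_EXAMPLES:
    for text in should_match + should_not:
        print(f"{pattern!r} vs {text!r}: {rule_matches(pattern, text)}")

# Negation: "!^a" succeeds only for strings that do NOT begin with a.
print(rule_matches("!^a", "banana"), rule_matches("!^a", "apple"))
```

The helper uses a search rather than a full match because a pattern matches anywhere in the string unless it is anchored with ^ or $.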
Escaping:
\char Escapes that particular character so that it is matched literally
For instance, to match special characters such as [ ] . ( ) \ themselves.
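To see why escaping matters, here is a small sketch in Python (an assumption of this example; mod_rewrite itself uses PCRE, which treats these escapes the same way). The file names are made up for illustration.

```python
import re

# Unescaped, the dot is a wildcard, so it also accepts a non-dot character.
unescaped = re.search(r"index.html", "indexZhtml") is not None

# Escaped, the dot must be a literal "." in the input.
escaped_wrong = re.search(r"index\.html", "indexZhtml") is not None
escaped_right = re.search(r"index\.html", "index.html") is not None

# re.escape builds a literal-matching pattern from arbitrary text.
literal = re.escape("a.b(c)")
literal_ok = re.fullmatch(literal, "a.b(c)") is not None

print(unescaped, escaped_wrong, escaped_right, literal_ok)
```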
Text:
. Any single character (used alone as a pattern, it matches any non-empty URI)
[chars] Character class: any one of the listed characters
[^chars] Negated character class: any one character that is not listed
text1|text2 Alternative: text1 or text2 (i.e. "or")
e.g. [^/] matches any character except /
(foo|bar)\.html matches foo.html and bar.html
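The two examples above can be tried out as follows. This is a sketch in Python's re module, assumed here as a stand-in for the PCRE engine; baz.html and the a/b strings are invented test inputs.

```python
import re

# Alternation inside a group: either foo or bar, followed by a literal ".html".
alt = re.compile(r"(foo|bar)\.html")
alt_results = {name: alt.search(name) is not None
               for name in ["foo.html", "bar.html", "baz.html"]}

# Negated class: one or more characters, none of which is a slash.
# The anchors force the whole string to be slash-free.
no_slash = re.compile(r"^[^/]+$")
slash_results = {s: no_slash.search(s) is not None for s in ["ab", "a/b"]}

print(alt_results)
print(slash_results)
```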
Quantifiers:
? 0 or 1 of the preceding text
* 0 or more of the preceding text (greedy: matches as much as possible)
+ 1 or more of the preceding text
e.g. (.+)\.html? matches foo.htm and foo.html
(foo)?bar\.html matches bar.html and foobar.html
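A short sketch of the quantifier examples, again using Python's re module as an assumed stand-in for PCRE. The last check illustrates greedy matching: .* grabs as much as it can and only gives characters back when the rest of the pattern needs them. The path a/b/c is an invented input.

```python
import re

# ? makes the final "l" optional, so both extensions are accepted.
ext = re.compile(r"(.+)\.html?")
stem_htm = ext.search("foo.htm").group(1)
stem_html = ext.search("foo.html").group(1)

# ? applied to a group makes the whole group optional.
opt = re.compile(r"(foo)?bar\.html")
opt_plain = opt.search("bar.html") is not None
opt_prefixed = opt.search("foobar.html") is not None

# Greedy *: the group runs up to the LAST slash, not the first.
greedy_group = re.search(r"(.*)/", "a/b/c").group(1)

print(stem_htm, stem_html, opt_plain, opt_prefixed, greedy_group)
```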
Grouping:
(text) Grouping of text
Used either to set the boundaries of an alternative or to create backreferences, where the nth group can be used in the target of a RewriteRule as $n
e.g. ^(.*)\.html foo.php?bar=$1
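How $n is filled in from a captured group can be sketched in a few lines of Python. The rewrite function below is a hypothetical, simplified model written for this example: it ignores flags, query-string handling, and per-directory prefix stripping, all of which real mod_rewrite performs, and it handles only single-digit references.

```python
import re

def rewrite(pattern: str, substitution: str, path: str) -> str:
    """Hypothetical model of one RewriteRule: on a match, the whole path is
    replaced by the substitution, with $n expanded to the nth captured group."""
    m = re.search(pattern, path)
    if m is None:
        return path  # rule does not apply; the path is left untouched
    # Expand $0..$9; a group that did not participate becomes an empty string.
    return re.sub(r"\$(\d)",
                  lambda ref: m.group(int(ref.group(1))) or "",
                  substitution)

print(rewrite(r"^(.*)\.html", "foo.php?bar=$1", "about.html"))
print(rewrite(r"^(.*)\.html", "foo.php?bar=$1", "about.php"))
```

The first call expands $1 to the text captured by (.*); the second returns the path unchanged because the pattern does not match.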
Anchors:
^ Start-of-string anchor
$ End-of-string anchor
An anchor explicitly states that the character right next to it MUST be either the very first character ("^") or the very last character ("$") of the URI string being matched against the pattern.
e.g. ^foo(.*) matches foo and foobar but not eggfoo
(.*)l$ matches fool and cool, but not foo
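The anchor examples can be verified with a small sketch, using Python's re module as an assumed stand-in for the PCRE engine in mod_rewrite.

```python
import re

# ^ pins the pattern to the start of the string.
starts = re.compile(r"^foo(.*)")
start_results = {s: starts.search(s) is not None
                 for s in ["foo", "foobar", "eggfoo"]}

# $ pins the pattern to the end of the string.
ends = re.compile(r"(.*)l$")
end_results = {s: ends.search(s) is not None
               for s in ["fool", "cool", "foo"]}

print(start_results)
print(end_results)
```

Without the ^, the first pattern would also accept eggfoo, because an unanchored pattern may match anywhere in the string.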