Monday, August 9, 2010

Very basics of regexp for mod_rewrite

Regex vocabulary
The following are the minimal building blocks you will need, in order to write regular expressions and RewriteRules. They certainly do not represent a complete regular expression vocabulary, but they are a good place to start, and should help you read basic regular expressions, as well as write your own.

Character | Meaning | Example
. | Matches any single character | c.t will match cat, cot, cut, etc.
+ | Repeats the previous match one or more times | a+ matches a, aa, aaa, etc.
* | Repeats the previous match zero or more times | a* matches all the same things a+ matches, but will also match an empty string.
? | Makes the match optional | colou?r will match color and colour.
^ | Called an anchor, matches the beginning of the string | ^a matches a string that begins with a.
$ | The other anchor, this matches the end of the string | a$ matches a string that ends with a.
( ) | Groups several characters into a single unit, and captures a match for use in a backreference | (ab)+ matches ababab - that is, the + applies to the group. For more on backreferences see below.
[ ] | A character class - matches one of the characters | c[uoa]t matches cut, cot or cat.
[^ ] | Negative character class - matches any character not specified | c[^/]t matches cat or c=t but not c/t.
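
The examples in the table can be checked with a short script. This is only a sketch: it uses Python's re module as a stand-in for the PCRE engine behind mod_rewrite (an assumption made for illustration; the two agree on these basic constructs), and the helper name full_match is hypothetical.

```python
import re

def full_match(pattern, text):
    """Return True if the whole of `text` matches `pattern`."""
    return re.fullmatch(pattern, text) is not None

# (pattern, strings that should match, strings that should not)
examples = [
    (r"c.t",     ["cat", "cot", "cut"], ["ct"]),
    (r"a+",      ["a", "aa", "aaa"],    [""]),
    (r"a*",      ["", "a", "aaa"],      ["b"]),
    (r"colou?r", ["color", "colour"],   ["colouur"]),
    (r"(ab)+",   ["ab", "ababab"],      ["aba"]),
    (r"c[uoa]t", ["cut", "cot", "cat"], ["cit"]),
    (r"c[^/]t",  ["cat", "c=t"],        ["c/t"]),
]

for pattern, good, bad in examples:
    assert all(full_match(pattern, s) for s in good)
    assert not any(full_match(pattern, s) for s in bad)
```

Whole-string matching is used here so each table row is tested in isolation; mod_rewrite itself searches anywhere in the URI unless the pattern is anchored.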


In mod_rewrite the ! character can be used before a regular expression to negate it. That is, a string will be considered to have matched only if it does not match the rest of the expression.
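
A rough way to picture the negation, sketched in Python rather than in Apache itself. The function name rule_applies and the admin/ pattern are hypothetical, chosen only for illustration.

```python
import re

def rule_applies(pattern, uri):
    """Mimic a leading '!': a negated pattern applies only when the rest does NOT match."""
    negate = pattern.startswith("!")
    if negate:
        pattern = pattern[1:]  # strip the '!' before handing the rest to the regex engine
    matched = re.search(pattern, uri) is not None
    return matched != negate

print(rule_applies(r"^admin/", "admin/users"))   # True
print(rule_applies(r"!^admin/", "admin/users"))  # False
print(rule_applies(r"!^admin/", "blog/post"))    # True
```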




Here are the very basics of regexp (expanded from the Apache mod_rewrite documentation):

Escaping:
\char Escapes that particular character, so that it is matched literally
Use this to specify special characters such as [ ] . ( ) \ etc.
e.g. \. matches a literal dot
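
To see why escaping matters, here is a small sketch using Python's re module in place of mod_rewrite's regex engine (an assumption for illustration). The helper name matches is hypothetical.

```python
import re

def matches(pattern, text):
    """Unanchored search, which is how mod_rewrite applies a pattern."""
    return re.search(pattern, text) is not None

# Unescaped, "." matches any character, so "fooXhtml" slips through.
print(matches(r"^foo.html$", "fooXhtml"))    # True
# Escaped, "\." matches only a literal dot.
print(matches(r"^foo\.html$", "fooXhtml"))   # False
print(matches(r"^foo\.html$", "foo.html"))   # True
# Brackets and parentheses need the same treatment to be matched literally.
print(matches(r"^\[a\]\(b\)$", "[a](b)"))    # True
```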

Text:
. Any single character (used as the whole pattern, . matches any non-empty URI)
[chars] Character class: any one of the listed chars
[^chars] Negated character class: any one character that is not listed
text1|text2 Alternative: text1 or text2 (i.e. "or")
e.g. [^/] matches any character except /
(foo|bar)\.html matches foo.html and bar.html
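
The two examples above can be tried out as follows. This is a sketch with Python's re module standing in for Apache's engine; the helper name matches and the test strings other than foo.html and bar.html are assumptions.

```python
import re

def matches(pattern, text):
    """Unanchored search, which is how mod_rewrite applies a pattern."""
    return re.search(pattern, text) is not None

# Alternation inside a group: either foo or bar, then a literal ".html".
print(matches(r"(foo|bar)\.html", "foo.html"))  # True
print(matches(r"(foo|bar)\.html", "bar.html"))  # True
print(matches(r"(foo|bar)\.html", "baz.html"))  # False

# Negated class: a whole string made only of non-slash characters.
print(matches(r"^[^/]+$", "software"))   # True
print(matches(r"^[^/]+$", "soft/ware"))  # False
```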

Quantifiers:
? 0 or 1 of the preceding text
* 0 or more of the preceding text (greedy)
+ 1 or more of the preceding text
e.g. (.+)\.html? matches foo.htm and foo.html
(foo)?bar\.html matches bar.html and foobar.html
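
A quick sketch of the quantifiers in action, again using Python's re module as a stand-in (an assumption for illustration). The helper name matches and the a/b/c string are hypothetical.

```python
import re

def matches(pattern, text):
    """Unanchored search, which is how mod_rewrite applies a pattern."""
    return re.search(pattern, text) is not None

# "l?" makes the final l optional; ".+" needs at least one character before the dot.
print(matches(r"(.+)\.html?", "foo.htm"))   # True
print(matches(r"(.+)\.html?", "foo.html"))  # True
print(matches(r"(.+)\.html?", ".html"))     # False

# "(foo)?" makes the whole group optional.
print(matches(r"(foo)?bar\.html", "bar.html"))     # True
print(matches(r"(foo)?bar\.html", "foobar.html"))  # True

# "*" grabs as much as it can: the group runs up to the LAST slash.
print(re.search(r"(.*)/", "a/b/c").group(1))  # a/b
```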

Grouping:
(text) Grouping of text
Used either to set the borders of an alternative or to make backreferences, where the nth group can be used in the target of a RewriteRule as $n
e.g. ^(.*)\.html foo.php?bar=$1 (pattern, then target: about.html becomes foo.php?bar=about)
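
The way a captured group flows into the target can be sketched like this. It is an approximation in Python, not Apache's actual implementation; the function name rewrite and the sample URI about.html are assumptions.

```python
import re

def rewrite(pattern, target, uri):
    """Return the target with $n filled in, or None when the pattern does not match."""
    match = re.search(pattern, uri)
    if match is None:
        return None
    # Replace each $n in the target with the text captured by group n.
    return re.sub(r"\$(\d)", lambda m: match.group(int(m.group(1))), target)

print(rewrite(r"^(.*)\.html", "foo.php?bar=$1", "about.html"))  # foo.php?bar=about
print(rewrite(r"^(.*)\.html", "foo.php?bar=$1", "about.php"))   # None
```

Note that the target replaces the whole requested path, not just the part that matched.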

Anchors:
^ Start of line anchor
$ End of line anchor
An anchor explicitly states that the character right next to it MUST be either the very first character ("^") or the very last character ("$") of the URI string to match against the pattern, e.g.
^foo(.*) matches foo and foobar but not eggfoo
(.*)l$ matches fool and cool, but not foo
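
The anchor examples can be verified with a short sketch, using Python's re module as a stand-in for Apache's engine (an assumption for illustration; the helper name matches is hypothetical).

```python
import re

def matches(pattern, text):
    """Unanchored search, which is how mod_rewrite applies a pattern."""
    return re.search(pattern, text) is not None

# "^" pins the match to the start of the string.
print(matches(r"^foo(.*)", "foo"))     # True
print(matches(r"^foo(.*)", "foobar"))  # True
print(matches(r"^foo(.*)", "eggfoo"))  # False

# "$" pins the match to the end of the string.
print(matches(r"(.*)l$", "fool"))  # True
print(matches(r"(.*)l$", "cool"))  # True
print(matches(r"(.*)l$", "foo"))   # False
```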


A mod_rewrite beginner's Example

http://www.workingwith.me.uk/articles/scripting/mod_rewrite

What we'll do with mod_rewrite is silently redirect users from page/software/ to index.php?page=software, and so on.

The following is what needs to go into your .htaccess file to accomplish that:

RewriteEngine on 
RewriteRule ^page/([^/\.]+)/?$ index.php?page=$1 [L]

Let's walk through that RewriteRule, and work out exactly what's going on:

^page/

Sees whether the requested page starts with page/. If it doesn't, this rule will be ignored.

([^/\.]+)

Here, the enclosing brackets signify that anything that is matched will be remembered by the RewriteRule. Inside the brackets, it says "I'd like one or more characters that aren't a forward slash or a period, please". Whatever is found here will be captured and remembered.

/?$

Makes sure that the only thing that is found after what was just matched is a possible forward slash, and nothing else. If anything else is found, then this RewriteRule will be ignored.

index.php?page=$1

The actual page which will be loaded by Apache. $1 is magically replaced with the text which was captured previously.

[L]

Tells Apache to not process any more RewriteRules if this one was successful.
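
To get a feel for which requests this rule catches, here is a minimal simulation in Python. It is a sketch, not how Apache works internally: the function name rewrite_page and the sample paths are assumptions, and paths are written without a leading slash because that is how a RewriteRule in .htaccess sees them.

```python
import re

# Same pattern as in the RewriteRule above.
RULE = re.compile(r"^page/([^/\.]+)/?$")

def rewrite_page(path):
    """Return the rewritten target, or None when the rule would be ignored."""
    match = RULE.search(path)
    if match is None:
        return None
    return "index.php?page=" + match.group(1)

print(rewrite_page("page/software"))        # index.php?page=software
print(rewrite_page("page/software/"))       # index.php?page=software
print(rewrite_page("page/software/extra"))  # None
print(rewrite_page("page/soft.ware"))       # None
print(rewrite_page("articles/software"))    # None
```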

Let's write a quick page to test that this is working. The following test script will simply echo the name of the page you asked for to the screen, so that you can check that the RewriteRule is working.

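<html>
   <head>
      <title>Second mod_rewrite example</title>
   </head>
   <body>
      <p>
         The requested page was:
         <?php
            // Escape the value so that user input is never echoed as raw HTML.
            echo isset($_GET['page']) ? htmlspecialchars($_GET['page'], ENT_QUOTES) : '';
         ?>
      </p>
   </body>
</html>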

Upload both the index.php page and the .htaccess file to the same directory, then test it! If you put them in http://www.somesite.com/mime_test/, try requesting http://www.somesite.com/mime_test/page/software. The URL in your browser window will still show the page you requested, but the content of the page will be created by the index.php script! This technique can obviously be extended to pass multiple query-string parameters to a page - all you're limited by is your imagination.

The Network Access Battle: Proxy Servers

http://www.williamlong.info/archives/2283.html

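In the previous article, "The Network Access Battle", I noted that many large companies and some countries impose access restrictions to keep employees or citizens from visiting certain websites or using certain network applications. The usual methods include IP filtering on routers and forcing traffic through a proxy server. This article covers attack and defense when network access goes through a proxy server.

Accessing the network through a proxy server is in many ways very similar to accessing it directly. A proxy server can apply every filtering method that is available without one, and those methods were described in detail in the previous article. The only difference is that the whole contest takes place on the proxy server. In other words, if you want to apply the direct-access techniques in a proxied environment, the browser or network application must first be configured to use the proxy server.

Finding the address of the proxy server that reaches the outside network is a skill in itself, however. In some environments the proxy address and port are set directly in the browser, so obtaining them is easy. Others use an "automatic proxy configuration script" so that different networks are reached through different proxy servers; the well-known AutoProxy extension uses this technique too. Some even use "automatically detect proxy settings" so that a computer configures its proxy automatically in whatever network it joins. In the latter two kinds of environment, finding the outbound proxy address requires understanding these techniques; see Proxy auto-config and Web Proxy Autodiscovery Protocol. If you are not familiar with them, you can also find the proxy address with netstat or a sniffer. These techniques and tools are not discussed further here.

Once you have found the proxy server's address, we can analyze what restrictions a proxy server can impose and how to get around them.

Skipping the direct-access attack and defense already covered in the previous article, let us look at what further filtering a proxy server can do. The filtering methods you will commonly run into are: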

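1. Domain name filtering. When accessing the network through a proxy server, you may find that every page under a certain domain is refused; this is domain name filtering. Usually, however, the proxy filters only by the domain name in the requested address, not by the Host field of the HTTP proxy protocol. We can therefore work around it by replacing the domain name in the address with the corresponding IP address.

2. IP address filtering. Sometimes every page under a certain IP address is refused. As with domain name filtering, in many cases only the IP in the requested address is filtered, so replacing the IP address with its domain name solves the problem. If the IP has no domain name, or you cannot find one for the moment, you can register a free subdomain for the IP and then use it in place of the IP.

3. Port filtering. Web browsing uses HTTP and HTTPS, whose default ports are 80 and 443 respectively. To stop people from using other protocols, proxy servers often restrict outbound ports to 80 and 443. Against this kind of filtering, the only option is software that supports proxy chaining to reach other ports. Because HTTP is transmitted in plaintext, however, many proxy servers do not filter the port for HTTP and only restrict HTTPS to port 443. If you need another protocol and it happens to use port 443, you can take advantage of the encrypted nature of HTTPS and connect to port 443 on the target server; the proxy cannot tell whether you are using HTTPS or something else. If you are not so lucky and the other protocol uses a different port, you still need proxy-chaining software to reach it.

4. Probing the HTTPS protocol header. Because the initial HTTPS handshake is still in plaintext, the proxy server can inspect the protocol header of connections to outbound port 443 and drop the connection if it is not HTTPS. Against this, we can sniff a normal HTTPS protocol header, have both ends of the connection exchange it first, and then carry on with the other protocol.

5. NTLM password authentication. Some proxy servers use NTLM password authentication. IE users will not notice anything, but browsers built on other engines and other applications will prompt for the proxy password. Because much proxy-chaining software does not support proxies that require password authentication, this causes some trouble. A program called NTLM Authorization Proxy Server can solve the problem.

6. URL filtering. Sometimes, to block a particular class of application (BBS, for example), the proxy server filters every request whose URL contains bbs. Against this kind of filtering, the only option is again proxy-chaining software.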
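
As background for item 3: clients normally reach an HTTPS port through an HTTP proxy by sending a CONNECT request, and the host and port named in that request are what a port filter gets to inspect. The sketch below only builds the request text and sends nothing; the function name build_connect_request and the host example.com are hypothetical.

```python
def build_connect_request(host, port=443):
    """Build the text of an HTTP CONNECT request asking a proxy to tunnel to host:port."""
    target = f"{host}:{port}"
    return (
        f"CONNECT {target} HTTP/1.1\r\n"
        f"Host: {target}\r\n"
        "\r\n"
    )

print(build_connect_request("example.com"))
```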

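The above roughly covers the proxy filtering methods you will meet most often. Proxy filtering comes in endless varieties, though, and this article cannot list them all. Moreover, apart from proxy chaining, for which ready-made software exists, the other workarounds require proxy users to write their own network programs, so they are offered mainly for study.

Finally, here is the source code (download) of a program that chains through the HTTP protocol. Usage: java -Dhttp.proxyHost=[proxy server address] -Dhttp.proxyPort=[proxy server port] net.tools.web.TunnelClient [local proxy port] [chained proxy server URL]; you can then access the network through the local proxy. For example, if we must go through the proxy server 192.168.0.200:8080 for outside access, we can run java -Dhttp.proxyHost=192.168.0.200 -Dhttp.proxyPort=8080 net.tools.web.TunnelClient 7890 [chained proxy server URL] and then set the browser's proxy server to 127.0.0.1:7890. Here is one chained proxy server URL: http://jinshan.isysjs.com.cn/tunnel/. This URL is for testing only; please do not abuse it. If you run into NTLM password authentication, see filtering method 5 above.

If you would like to keep discussing the proxy-server side of the network access battle with me, you can find me on the forum I frequent (it must be accessed from a foreign IP), or email me directly.

Author's Twitter: @davidsky2012; author's Google Reader: https://www.google.com/reader/shared/lehui99 .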
