The following is the PHP Manual's description of conditional subpatterns in regular expressions:
Conditional subpatterns
It is possible to cause the matching process to obey a subpattern conditionally or to choose between two alternative subpatterns, depending on the result of an assertion, or whether a previous capturing subpattern matched or not. The two possible forms of conditional subpattern are
(?(condition)yes-pattern)
(?(condition)yes-pattern|no-pattern)
If the condition is satisfied, the yes-pattern is used; otherwise the no-pattern (if present) is used. If there are more than two alternatives in the subpattern, a compile-time error occurs.
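To make the second kind of condition concrete (whether an earlier capturing group matched), here is a minimal Python sketch. Python is my choice here, not something the manual uses. Python's re module accepts a group number or name as the condition, written (?(1)yes|no). The optional-parentheses pattern and the name is_area_code are made-up examples:

```python
import re

# (\()?      optionally capture an opening parenthesis as group 1
# \d{3}      three digits
# (?(1)\))   conditional: if group 1 matched, require ")"; otherwise require nothing
AREA_CODE = re.compile(r"(\()?\d{3}(?(1)\))")


def is_area_code(text):
    """True for '123' or '(123)', False for unbalanced forms like '(123'."""
    return AREA_CODE.fullmatch(text) is not None


for s in ["123", "(123)", "(123", "123)"]:
    print(s, is_area_code(s))
```

The no-pattern is omitted here, so when group 1 did not take part nothing extra is required. This is the first form, (?(condition)yes-pattern).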
Example:
(?(?=[^a-z]*[a-z])\d{2}-[a-z]{3}-\d{2} | \d{2}-\d{2}-\d{2} )
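Python's re only accepts a group number or name as the condition, so the manual's example cannot be used there directly. An equivalent rewrite puts the assertion in front of the first branch and its negation in front of the second: (?:(?=cond)yes|(?!cond)no). Below is a minimal sketch under that substitution. The spaces in the manual's version are presumably layout for the x (extended) flag, so they are left out. The names HAS_LETTER, NO_LETTER and is_date are hypothetical:

```python
import re

HAS_LETTER = r"(?=[^a-z]*[a-z])"  # condition: a lowercase letter appears somewhere ahead
NO_LETTER = r"(?![^a-z]*[a-z])"   # negation of the same condition

DATE = re.compile(
    "(?:"
    + HAS_LETTER + r"\d{2}-[a-z]{3}-\d{2}"  # yes-pattern, e.g. 12-jan-09
    + "|"
    + NO_LETTER + r"\d{2}-\d{2}-\d{2}"      # no-pattern, e.g. 12-01-09
    + ")"
)


def is_date(text):
    return DATE.fullmatch(text) is not None


print(is_date("12-jan-09"), is_date("12-01-09"), is_date("12-01-ab"))
```

If the condition holds but the yes-pattern fails, the negated lookahead blocks the second branch. That matches the conditional, which never falls back to the no-pattern in that case.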
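Yesterday, while scraping data from this web page, I found that only 5 of the entries were captured. Here is the regular expression I had written: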
<div class="title">[\s\S]*?<a href="/(?<url>[^"]*)">(?<name>[\s\S]*?)</a>[\S\s]*?<span class="myerror">\$(?<price>[^<]*)</span>
When extracting the price, the regex above only picks up prices marked with the myerror class. But a myerror price is only one of the two forms that prices take on this page:
1.
<div class="sprice"> Manufacturers RRP*: <strike>$29.95</strike><br/>Our Price: <span class="myerror">$24.00</span> </div>
2.
<div class="sprice"> $11.95 </div>
So the regex above can only match the first form.
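A quick way to see why the original pattern loses the second form is to keep only its price part and run it on the two sample snippets. In the sketch below, (?<price>...) is rewritten as Python's (?P<price>...). The names SAMPLE_1, SAMPLE_2, OLD_PRICE and old_price are hypothetical:

```python
import re
from typing import Optional

SAMPLE_1 = ('<div class="sprice"> Manufacturers RRP*: <strike>$29.95</strike>'
            '<br/>Our Price: <span class="myerror">$24.00</span> </div>')
SAMPLE_2 = '<div class="sprice"> $11.95 </div>'

# Price part of the original pattern: it requires the myerror <span>.
OLD_PRICE = re.compile(r'<span class="myerror">\$(?P<price>[^<]*)</span>')


def old_price(html: str) -> Optional[str]:
    m = OLD_PRICE.search(html)
    return m.group("price") if m else None


print(old_price(SAMPLE_1), old_price(SAMPLE_2))
```

The second snippet has no myerror span at all, so that item is simply skipped.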
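I had never used conditional subpatterns before, and lookahead and lookbehind assertions left me thoroughly confused. It took almost two hours of trial and error to get everything working.
The final expression (price part only):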
<div class="sprice">[\s]*(?(?=M)[\s\S]*?<span class="myerror">\$(?<price>[^<]*)|\$(?<price>[^<]*))
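The part after <div class="sprice">[\s]* is a conditional subpattern.
(?=M) is the condition: it checks that the character immediately after the whitespace is the letter M (the start of "Manufacturers RRP*"). If the condition holds, the price marked with myerror is captured; otherwise the price is taken from the second form described above.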
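The same logic can be sketched in Python. This sketch makes two adjustments of its own that are not part of the original. First, re cannot use (?=M) as a condition, so the conditional becomes (?:(?=M)...|(?!M)...). Second, re does not allow the name price to appear twice, so each branch gets its own hypothetical group (sale and plain):

```python
import re
from typing import Optional

SAMPLE_1 = ('<div class="sprice"> Manufacturers RRP*: <strike>$29.95</strike>'
            '<br/>Our Price: <span class="myerror">$24.00</span> </div>')
SAMPLE_2 = '<div class="sprice"> $11.95 </div>'

PRICE = re.compile(
    r'<div class="sprice">\s*'                                    # \s* is the same as [\s]*
    r'(?:(?=M)[\s\S]*?<span class="myerror">\$(?P<sale>[^<]*)'   # form 1: starts with "Manufacturers"
    r'|(?!M)\$(?P<plain>[^<]*))'                                 # form 2: bare "$11.95"
)


def extract_price(html: str) -> Optional[str]:
    m = PRICE.search(html)
    if m is None:
        return None
    price = m.group("sale") if m.group("sale") is not None else m.group("plain")
    return price.strip()  # form 2 also captures the space before </div>


print(extract_price(SAMPLE_1), extract_price(SAMPLE_2))
```

Here the (?!M) only restates the conditional's semantics. The second branch could not match at an M anyway, because it needs a literal $.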
There is also another version:
<div class="sprice">(?([\s\S]*?(?=myerror))[\s\S]*?<span class="myerror">\$(?<price>[^<]*)</span>|[\s\S]*?\$(?<price>[^<]*)</div>)
But it only works on the text below. When run against the actual source page, it matches incorrectly. I suspect the [\s\S]*? is the cause:
1.
<div class="sprice"> Manufacturers RRP*: <strike>$29.95</strike><br/>Our Price: <span class="myerror">$24.00</span> </div>
2.
<div class="sprice"> $11.95 </div>
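One plausible reading of the failure, consistent with the guess above, is that the condition [\s\S]*?(?=myerror) is not confined to the current <div>. Using a bare expression as the condition is, for example, a .NET feature, where it is evaluated as a zero-width lookahead. In the sample text the second-form item comes last, so the scan finds no myerror after it. On a real page, a second-form item followed later by a first-form item sees that later myerror. It then takes the yes-branch and runs into the next item. The sketch below emulates the conditional in Python with lookaheads. It compares the unbounded scan with a tempered one, (?:(?!</div>)[\s\S]), that cannot cross </div>. SNIPPET and PAGE are made-up inputs built from the two samples, not the real page, and build and prices are hypothetical helpers:

```python
import re

SAMPLE_1 = ('<div class="sprice"> Manufacturers RRP*: <strike>$29.95</strike>'
            '<br/>Our Price: <span class="myerror">$24.00</span> </div>')
SAMPLE_2 = '<div class="sprice"> $11.95 </div>'
SNIPPET = SAMPLE_1 + "\n" + SAMPLE_2  # order as in the sample text above
PAGE = SAMPLE_2 + "\n" + SAMPLE_1     # hypothetical page: form 2 item, then form 1 item


def build(step):
    """Second version of the pattern; (?(cond)yes|no) emulated as (?:(?=cond)yes|(?!cond)no).
    `step` is the single-character sub-pattern each lazy scan repeats."""
    return re.compile(
        r'<div class="sprice">'
        + r'(?:(?=' + step + r'*?myerror)'                              # condition holds
        + step + r'*?<span class="myerror">\$(?P<sale>[^<]*)</span>'
        + r'|(?!' + step + r'*?myerror)'                               # condition fails
        + step + r'*?\$(?P<plain>[^<]*)</div>)'
    )


UNBOUNDED = build(r'[\s\S]')              # like the original: may scan past </div>
BOUNDED = build(r'(?:(?!</div>)[\s\S])')  # tempered: never crosses </div>


def prices(pattern, html):
    out = []
    for m in pattern.finditer(html):
        price = m.group("sale") if m.group("sale") is not None else m.group("plain")
        out.append(price.strip())
    return out


print(prices(UNBOUNDED, SNIPPET), prices(UNBOUNDED, PAGE), prices(BOUNDED, PAGE))
```

On PAGE, the unbounded version attributes 24.00 to the first item and swallows the second. The tempered version keeps each match inside its own <div>.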
Last Updated ( Wednesday, 28 October 2009 11:18 )



