一千萬個為什麽


Parsing links that contain a specific phrase

I have this script. I need it to write only the links that contain "/product-product/" to the file items.txt. Well, not the whole link, just the 10-digit item number, e.g. 1007687980 from product-product/1007687980.

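In this example you will see that the code looks for item numbers starting with /100. I was looking for items in a particular category whose numbers start with 100, but that restriction is no longer needed.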

$keyword= $_SERVER['QUERY_STRING'];
$site=1;
while ($site<30) {
  $content = file_get_contents('http://www.domain.com/?keywords='. $keyword .'&x=0&y=0&pagecount='.$site.'&sort=sort');
  $html = $content;
  $dom = new DomDocument();
  @$dom->loadHTML($html);
  $urls = $dom->getElementsByTagName('a');

  $lookfor='http://www.domain.com';

  foreach ($urls as $url){
    if(substr($url->getAttribute('href'),0,strlen($lookfor))==$lookfor){
      $tubeurl = str_replace ("http://www.domain.com","",$url->getAttribute('href'));
      $tubeurl = substr($tubeurl, strpos($tubeurl,"/product-product/100")+17, 10);
      file_put_contents("items.txt", "" .$tubeurl. "
", FILE_APPEND | LOCK_EX);// this line must remain, it makes it so that there is a new line  \n wouldn't work

    }
  }
  $site++;
  echo $site;
}

A regular expression would be one solution, but I have read here on Stack Overflow that regular expressions put a lot of load on the server.
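On the regex cost concern: PHP's PCRE extension caches compiled patterns. One simple match per link is therefore unlikely to be the bottleneck compared with the 29 HTTP requests this script makes. If you would still rather avoid regex, the following is a minimal sketch that uses strpos(), substr() and ctype_digit(). The name extract_item_number_plain() is hypothetical. The sketch also checks the strpos() result, which the question's code does not do. When the marker is missing, strpos() returns false, and false + 17 evaluates to 17, so text from unrelated links ends up in items.txt.

```php
<?php
// Sketch: extract the 10-digit item number without a regex.
// extract_item_number_plain() is a hypothetical helper, not part of the original code.
function extract_item_number_plain(string $href): ?string
{
    $marker = '/product-product/';
    $pos = strpos($href, $marker);
    // strpos() returns false when the marker is absent; check it explicitly.
    if ($pos === false) {
        return null;
    }
    $start = $pos + strlen($marker);
    $candidate = substr($href, $start, 10);
    // Require exactly 10 ASCII digits right after the marker.
    if (strlen($candidate) !== 10 || !ctype_digit($candidate)) {
        return null;
    }
    // Reject numbers longer than 10 digits (an 11th digit follows).
    $after = $start + 10;
    if (isset($href[$after]) && ctype_digit($href[$after])) {
        return null;
    }
    return $candidate;
}

$links = [
    'http://www.domain.com/product-product/1007687980',
    'http://www.domain.com/about',
    'http://www.domain.com/product-product/12345',
];
foreach ($links as $href) {
    var_dump(extract_item_number_plain($href));
}
```

For the three sample links, this prints the item number for the first and NULL for the other two. The check on the 11th character keeps the sketch consistent with a pattern that requires exactly 10 digits.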

Best answer

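A simple regular expression can capture your product ID in its first capture group. Perl-style syntax calls that group $1; with PHP's preg_match() it ends up in $matches[1]. You may need more logic to make sure a match was actually found before you use it. The \d{10} in the pattern ensures the captured ID is always exactly 10 digits.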
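To show how the capture group is read back in PHP, here is a minimal sketch; extract_item_number() is a hypothetical helper name. It uses # as the pattern delimiter so the slashes in the path do not need escaping. preg_match() returns 1 on a match, 0 on no match, and false on error, and it fills $matches[1] with group 1. The $1 notation works in preg_replace() replacement strings, but it is not a variable you can read after preg_match().

```php
<?php
// Sketch: capture the item number with preg_match().
// extract_item_number() is a hypothetical helper, not part of the original answer.
function extract_item_number(string $href): ?string
{
    // '#' delimiter: the '/' characters in the path need no escaping.
    // (\d{10}) is capture group 1; (?!\d) rejects numbers longer than 10 digits.
    if (preg_match('#/product-product/(\d{10})(?!\d)#', $href, $matches) === 1) {
        return $matches[1]; // group 1, i.e. what the answer calls "$1"
    }
    return null; // no match (or a regex error)
}

echo extract_item_number('http://www.domain.com/product-product/1007687980'), "\n";
var_dump(extract_item_number('http://www.domain.com/category/shoes'));
```

The first call prints 1007687980. The second prints NULL because the link has no /product-product/ segment.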

$keyword = $_SERVER['QUERY_STRING'];
$site = 1;
while ($site < 30) {
  $content = file_get_contents('http://www.domain.com/?keywords=' . $keyword . '&x=0&y=0&pagecount=' . $site . '&sort=sort');
  if ($content === false || $content === '') {
    $site++; // page could not be fetched; skip it (loadHTML() rejects an empty string)
    continue;
  }
  $dom = new DomDocument();
  @$dom->loadHTML($content); // @ silences warnings about malformed HTML
  $urls = $dom->getElementsByTagName('a');

  $lookfor = 'http://www.domain.com';

  foreach ($urls as $url) {
    $href = $url->getAttribute('href');
    if (substr($href, 0, strlen($lookfor)) == $lookfor) {
      // '#' as the delimiter means the '/' in the path needs no escaping.
      // (\d{10}) captures exactly 10 digits into $matches[1]; (?!\d) rejects longer numbers.
      if (preg_match('#/product-product/(\d{10})(?!\d)#', $href, $matches)) {
        // "\n" inside double quotes is a real newline (it is literal only in single quotes).
        file_put_contents("items.txt", $matches[1] . "\n", FILE_APPEND | LOCK_EX);
      }
    }
  }

  $site++;

  echo $site;
}
