How to get description with php regular expression?



I am making a web crawler and need to extract the meta tag that contains the description. This is what I did:


$html = file_get_contents('http://www.google.com');
preg_match('/<meta name="description" content="(.*)"/>i', $html, $description);
$description_out = $description;
var_dump($description_out);



And I get this error:



Warning: preg_match(): Unknown modifier '>' in
C:\xampp\htdocs\webcrawler\php-web-crawler\index.php on line 21



What is the correct regular expression?





Possible duplicate of RegEx match open tags except XHTML self-contained tags
– Yunnosch
Jun 30 at 5:56




2 Answers



Your pattern is incorrect. It starts with a / delimiter, and the unescaped / inside the pattern (the one in />) ends the pattern early, so everything after it is read as modifiers. That is why PHP reports the unknown modifier '>'.
Escape that inner slash as \/ and close the pattern with a / delimiter before the i modifier.




'/<meta name="description" content="(.*)"\/>/i',
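Another way to sidestep the delimiter clash is to pick a delimiter that does not occur in the markup, so the slash in /> needs no escaping. The following is only a sketch under some assumptions: the sample $html string is made up, the name attribute is assumed to come before content, and both are assumed to be double-quoted. It also uses a lazy quantifier so the capture stops at the first closing quote.

```php
<?php
// Sample markup standing in for a fetched page (an assumption for this sketch).
$html = '<head><meta name="description" content="A test page" /></head>';

// "~" as delimiter: the "/" in "/>" no longer needs escaping.
// (.*?) is lazy, so the capture stops at the first closing quote.
// \s*/? accepts both `">` and `" />` endings.
$pattern = '~<meta\s+name="description"\s+content="(.*?)"\s*/?>~i';

preg_match($pattern, $html, $description);
echo $description[1], PHP_EOL;
```

Because the closing quote and the optional /> sit outside the capture group, they should not end up in $description[1].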





That works, but there is a problem. When I crawl Twitter, for example, the page does not contain a meta description. How can I check whether there is one or not? Also, the pages I can crawl do return the content, but with "/> at the end.
– Diesan Romero
Jun 30 at 6:07





I assume that if there is no meta tag, the returned array is empty? I.e. count == 0?
– Andreas
Jun 30 at 6:09





When I try with Google I get this error: Notice: Undefined offset: 1 in C:\xampp\htdocs\webcrawler\php-web-crawler\index.php on line 24
– Diesan Romero
Jun 30 at 6:12





This is my line 24: $description_out = $description[1];
– Diesan Romero
Jun 30 at 6:12





Yes, and that is because the array is empty. Nothing is wrong there; the fault is in your code, which does not check whether the array is empty. Never assign variables from array elements unless you know the data is there. Check first with empty(), count() or isset().
– Andreas
Jun 30 at 6:14
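The check described in the comments could look roughly like this. It is a sketch rather than the asker's actual code: the $html string is a made-up stand-in for a page without a description tag, and the pattern is a slightly loosened variant of the one in the answer.

```php
<?php
// Stand-in for a page without a description tag (an assumption for this sketch).
$html = '<html><head><title>No description here</title></head></html>';

$pattern = '/<meta name="description" content="(.*?)"\s*\/?>/i';

// preg_match() returns 1 on a match, 0 on no match, and false on error.
if (preg_match($pattern, $html, $description) === 1) {
    $description_out = $description[1];
} else {
    // Nothing was captured, so $description[1] must not be read.
    $description_out = null;
}

var_dump($description_out);
```

Testing the return value of preg_match() avoids the "Undefined offset: 1" notice, because index 1 is only read when the capture group actually matched.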



As an alternative, instead of using a regex you might use DOMDocument and DOMXPath with an xpath expression /html/head/meta[@name="description"]/@content to get the content attribute.




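// $html holds the page source, e.g. from file_get_contents() as in the question.
$document = new DOMDocument();
$document->loadHTML($html);
$xpath = new DOMXPath($document);
// Select the content attribute of the description meta tag.
$items = $xpath->query('/html/head/meta[@name="description"]/@content');
foreach ($items as $item) {
    echo $item->value . "<br>";
}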



$items is a DOMNodeList, which you can iterate with, for example, a foreach. Each $item is a DOMAttr, from which you can read the value.
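For a crawler, it may help to silence the warnings that libxml tends to emit for real-world HTML and to handle pages that have no description at all. The following is a sketch under those assumptions; the sample $html string is made up, and falling back to null is just one possible choice.

```php
<?php
// Sample markup standing in for a fetched page (an assumption for this sketch).
$html = '<html><head><meta name="description" content="A test page"></head><body></body></html>';

// Collect libxml parse warnings internally instead of printing them.
libxml_use_internal_errors(true);
$document = new DOMDocument();
$document->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($document);
$items = $xpath->query('/html/head/meta[@name="description"]/@content');

// An empty DOMNodeList means the page has no description meta tag.
$description = $items->length > 0 ? $items->item(0)->value : null;
var_dump($description);
```

Checking the length property of the node list before calling item(0) plays the same role here as checking the preg_match() result in the regex approach.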







