How to get description with php regular expression?

I am making a web crawler and I need to extract the meta tag that contains the description. This is what I did:
$html = file_get_contents('http://www.google.com');
preg_match('/<meta name="description" content="(.*)"/>i', $html, $description);
$description_out = $description;
var_dump($description_out);
and I get this error:
Warning: preg_match(): Unknown modifier '>' in C:\xampp\htdocs\webcrawler\php-web-crawler\index.php on line 21
What is the correct regular expression?
2 Answers
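Your pattern is incorrect. You start with a / delimiter, and then you have an unescaped / in the pattern. This ends the pattern, and everything after it is read as modifiers, which is why PHP reports the unknown modifier '>'.
Your end delimiter was also in the wrong place: the closing / should come at the very end of the pattern, after the >, and the / inside the pattern needs to be escaped as \/.
'/<meta name="description" content="(.*)"\/>/i'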
That works, but there is a problem. When I crawl Twitter, for example, that page does not contain a meta description. How can I check whether there is one or not? Also, the pages that I can crawl return the content, but with "/> at the end.
– Diesan Romero
Jun 30 at 6:07
I assume that if there is no meta tag, the returned array is empty, i.e. count == 0?
– Andreas
Jun 30 at 6:09
When I try with Google I get this error: Notice: Undefined offset: 1 in C:\xampp\htdocs\webcrawler\php-web-crawler\index.php on line 24
– Diesan Romero
Jun 30 at 6:12
This is my line 24: $description_out = $description[1];
– Diesan Romero
Jun 30 at 6:12
Yes, and that is because the array is empty. Nothing is wrong with that. The fault is in your code, which does not check whether the array is empty. Never assign variables from array elements if you don't know whether the data is there. Check first with empty(), count(), or isset().
– Andreas
Jun 30 at 6:14
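The two issues raised in the comments above (a page with no description, and a trailing "/> in the captured text) can be handled along these lines. This is only a minimal sketch and not the answerer's code: the helper name extractDescription and the sample HTML strings are made up for illustration. It swaps the greedy (.*) for ([^"]*) so the capture stops at the closing quote, and it uses # as the delimiter so the / in the pattern needs no escaping. It still assumes the attributes appear in exactly this order with double quotes, which real pages do not guarantee.

```php
<?php
// Hypothetical helper (not from the original answer): returns the
// description text, or null when no matching meta tag is found.
function extractDescription(string $html): ?string
{
    // '#' is the delimiter, so the '/' before '>' needs no escaping.
    // [^"]* stops at the first closing quote, unlike the greedy (.*).
    // \s*/?> accepts both "> and "/> endings.
    $pattern = '#<meta\s+name="description"\s+content="([^"]*)"\s*/?>#i';

    // preg_match() returns 1 on a match, 0 on no match, false on error.
    if (preg_match($pattern, $html, $matches) === 1) {
        return $matches[1];
    }
    return null;
}

// Made-up sample inputs for illustration.
$with = '<head><meta name="description" content="A test page"/><meta name="x" content="y"/></head>';
$without = '<head><title>No description here</title></head>';

var_dump(extractDescription($with));    // string(11) "A test page"
var_dump(extractDescription($without)); // NULL
```

Checking the return value of preg_match() before reading $matches[1] is what avoids the "Undefined offset: 1" notice on pages without a description.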
As an alternative to a regex, you might use DOMDocument and DOMXPath with the XPath expression /html/head/meta[@name="description"]/@content to get the content attribute:
$document = new DOMDocument();
$document->loadHTML($html);
$xpath = new DOMXPath($document);
$items = $xpath->query('/html/head/meta[@name="description"]/@content');
foreach ($items as $item) {
    echo $item->value . "<br>";
}
The $items are of type DOMNodeList, which you can loop over using, for example, a foreach. Each $item is of type DOMAttr, from which you can get the value.
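For a crawler that also visits pages without a description, the same XPath query can be wrapped so that a missing tag yields null instead of an empty loop. The following is a hedged sketch rather than part of the answer: the function name getMetaDescription and the sample HTML are hypothetical, and the libxml_use_internal_errors() calls are an added assumption to keep parser warnings from real-world markup out of the output.

```php
<?php
// Hypothetical helper (not from the original answer): returns the
// description via DOMXPath, or null when the page has none.
function getMetaDescription(string $html): ?string
{
    // Assumption: collect libxml warnings internally instead of printing
    // them, since real-world HTML often triggers them.
    $previous = libxml_use_internal_errors(true);

    $document = new DOMDocument();
    $document->loadHTML($html);

    // Discard the collected warnings and restore the previous setting.
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($document);
    $items = $xpath->query('/html/head/meta[@name="description"]/@content');

    // query() returns false on an invalid expression; an empty
    // DOMNodeList means the page has no meta description.
    if ($items === false || $items->length === 0) {
        return null;
    }

    // The first node is a DOMAttr; its value is the content attribute.
    return $items->item(0)->value;
}

// Made-up sample inputs for illustration.
$with = '<html><head><meta name="description" content="A test page"></head><body></body></html>';
$without = '<html><head><title>No description</title></head><body></body></html>';

var_dump(getMetaDescription($with));
var_dump(getMetaDescription($without));
```

Returning null for the missing case lets the caller decide what to do with such pages, for example skipping them or falling back to the page title.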
Possible duplicate of RegEx match open tags except XHTML self-contained tags
– Yunnosch
Jun 30 at 5:56