John Davidson

php - Privacy Crawler

Message:


I need your help. Can anyone explain why my code doesn't find the privacy a-tag on zoho.com?


My code finds the "privacy" link on other sites just fine, but not on zoho.com.


I use the Symfony DomCrawler component: https://symfony.com/doc/current/components/dom_crawler.html


// Privacy Check //

use Symfony\Component\DomCrawler\Crawler;

function findPrivacy($domain) {

    $ua = 'Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US) AppleWebKit/525.13 (KHTML, like Gecko) Chrome/0.A.B.C Safari/525.13';
    $curl = curl_init($domain);
    curl_setopt($curl, CURLOPT_RETURNTRANSFER, 1);
    curl_setopt($curl, CURLOPT_FOLLOWLOCATION, 1);
    curl_setopt($curl, CURLOPT_SSL_VERIFYPEER, 0);
    curl_setopt($curl, CURLOPT_SSL_VERIFYHOST, 0);
    curl_setopt($curl, CURLOPT_CONNECTTIMEOUT, 30);
    curl_setopt($curl, CURLOPT_USERAGENT, $ua);
    $data = curl_exec($curl);

    // curl_exec() returns false on failure; there is nothing to parse then
    if ($data === false) {
        return [];
    }

    $crawler = new Crawler($data);

    // One boolean per a-tag: true if its href contains "privacy"
    // (this also covers hrefs such as "privacy-policy")
    $nodeValues = $crawler->filter('a')->each(function ($node) {
        // attr() returns null when the a-tag has no href attribute
        return str_contains((string) $node->attr('href'), 'privacy');
    });

    return $nodeValues;

}


If you view the source code of zoho.com, you will see that the footer is empty. But on the rendered site, the footer isn't empty if you scroll down.


How can I find the Privacy link now?



My Answer:
Here is a first attempt that tries to find the link to the privacy policy page on zoho.com, but it doesn't find the specific a-tag with the text "Privacy":

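<?php
$url = 'https://www.zoho.com/';
$html = file_get_contents($url);
if ($html === false) {
    exit("Could not fetch $url\n");
}

$dom = new DOMDocument();
// Real-world HTML is rarely valid; keep libxml parse warnings out of the output
libxml_use_internal_errors(true);
$dom->loadHTML($html);

$links = $dom->getElementsByTagName('a');
foreach ($links as $link) {
    // Exact, case-sensitive comparison of the link text
    if ($link->nodeValue == 'Privacy') {
        echo $link->getAttribute('href');
        break;
    }
}
?>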


The issue might be that the link text on zoho.com is not exactly "Privacy" (for example "Privacy Policy", or different casing or surrounding whitespace). A more tolerant approach is to check whether the href attribute contains the word "privacy", or to match the link text with a case-insensitive regular expression.
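A minimal sketch of that more tolerant matching, using only PHP's bundled DOM extension. The helper name findPrivacyLinks and the sample markup are made up for illustration; neither comes from the original question.

```php
<?php
// Hypothetical helper (not from the original post): collect the href of every
// a-tag whose href or visible text mentions "privacy", ignoring case.
function findPrivacyLinks(string $html): array
{
    if (trim($html) === '') {
        return []; // DOMDocument::loadHTML() rejects an empty string
    }

    $dom = new DOMDocument();
    $previous = libxml_use_internal_errors(true); // silence warnings on sloppy markup
    $dom->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $hrefs = [];
    foreach ($dom->getElementsByTagName('a') as $link) {
        $href = $link->getAttribute('href'); // '' when the attribute is missing
        $text = $link->textContent;          // includes text of nested elements
        if (preg_match('/privacy/i', $href) || preg_match('/privacy/i', $text)) {
            $hrefs[] = $href;
        }
    }

    // Drop duplicate hrefs while keeping the order of first appearance
    return array_values(array_unique($hrefs));
}

// Made-up sample markup for illustration
$sample = '<p><a href="/about">About</a> <a href="/legal/PRIVACY.html">Legal</a>'
        . ' <a href="/pp"><span> Privacy Policy </span></a></p>';
print_r(findPrivacyLinks($sample));
```

Matching on both the href and the text is a design choice: it catches links labelled "Privacy Policy" that point to an unrelated-looking URL, as well as links with a generic label that point to a privacy page.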

You can also use XPath to search for the a-tag whose text contains "Privacy". Here is an example:

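<?php
$url = 'https://www.zoho.com/';
$html = file_get_contents($url);
if ($html === false) {
    exit("Could not fetch $url\n");
}

$dom = new DOMDocument();
// Real-world HTML is rarely valid; keep libxml parse warnings out of the output
libxml_use_internal_errors(true);
$dom->loadHTML($html);

$xpath = new DOMXPath($dom);
// "." is the full text of the a-tag, including text inside nested elements
$links = $xpath->query('//a[contains(., "Privacy")]');
foreach ($links as $link) {
    echo $link->getAttribute('href'), "\n";
}
?>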


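This code snippet uses XPath to select every a-tag whose text contains "Privacy" (the match is case-sensitive) and prints its href. Keep in mind that DOMDocument, like the Symfony Crawler, only sees the HTML returned by the server. As the question notes, the footer is empty in the page source of zoho.com, so a link that is inserted later by JavaScript will not be found by any of these approaches unless the page is rendered first, for example in a headless browser.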
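XPath 1.0, which DOMXPath implements, has no lower-case function, so a case-insensitive text match is usually written with translate(). The sketch below assumes non-empty HTML is passed in; the helper name findPrivacyLinksByText and the sample markup are hypothetical.

```php
<?php
// Hypothetical helper: hrefs of all a-tags whose text contains "privacy",
// regardless of case. Assumes $html is a non-empty string.
function findPrivacyLinksByText(string $html): array
{
    $dom = new DOMDocument();
    $previous = libxml_use_internal_errors(true); // silence warnings on sloppy markup
    $dom->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($dom);
    // translate() lowercases just the letters of "PRIVACY", which is enough
    // to compare the link text against the lowercase needle.
    $query = '//a[contains(translate(., "PRIVACY", "privacy"), "privacy")]';

    $hrefs = [];
    foreach ($xpath->query($query) as $link) {
        $hrefs[] = $link->getAttribute('href');
    }
    return $hrefs;
}

// Made-up sample markup for illustration
$sample = '<div><a href="/a">PRIVACY policy</a>'
        . '<a href="/b"><b>privacy</b></a><a href="/c">Terms</a></div>';
print_r(findPrivacyLinksByText($sample));
```

Like the snippets above, this only inspects the HTML string it is given, so it finds nothing in markup that a browser would add later.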





© 2024 Hayatsk.info - Personal Blogs Platform. All Rights Reserved.