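I'm looking for a way to get all the external domains used by a website. For example, stackoverflow.com uses googletagservices.com, google-analytics.com, fbcdn.net, i.stack.imgur.com and cdn.sstatic.net. Is there a way to get this list of domains in bash or PHP? My Google-fu has failed me. Basically, I want this list:
googletagservices.com, google-analytics.com, fbcdn.net, i.stack.imgur.com, cdn.sstatic.net
Another example, using webpagetest.org on www.example.com.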
1 Answer
<?php
// Download the remote web page
$websiteURL = "https://www.google.com";
$curl = curl_init($websiteURL);
curl_setopt($curl, CURLOPT_RETURNTRANSFER, true);
$webPageContent = curl_exec($curl);
// Print the download size of the main page
print("Download size of main page: " . curl_getinfo($curl, CURLINFO_SIZE_DOWNLOAD) . "\n");

// Match and extract the URLs found in src and href attributes
preg_match_all('/(?:src=)"([^"]*)"/m', $webPageContent, $matchessrc);            // all src URLs
preg_match_all('/link.*\s*(?:href=)"([^"]*)"/m', $webPageContent, $matcheslink); // all <link href> URLs
$matches = array_merge($matchessrc[1], $matcheslink[1]);

$scheme = parse_url($websiteURL, PHP_URL_SCHEME);
$domain = $scheme . '://' . parse_url($websiteURL, PHP_URL_HOST);
// parse_url() returns null when the URL has no path, so cast before concatenating
$base = rtrim($domain . (string) parse_url($websiteURL, PHP_URL_PATH), '/');
$checked = array();
print_r($matches); // print all resource URLs exactly as they appear in the page

foreach ($matches as $m) {
    if ($m === '')                      // skip empty attributes
        continue;
    if (substr($m, 0, 2) == '//')       // protocol-relative URL: reuse the page's scheme
        $m = $scheme . ':' . $m;
    elseif ($m[0] == '/')               // root-relative path: prepend the main domain
        $m = $domain . $m;
    elseif (substr($m, 0, 5) != 'http:' and substr($m, 0, 6) != 'https:')
        $m = $base . '/' . $m;          // other relative path: resolve against the page URL
    if (in_array($m, $checked))         // skip duplicate resource URLs
        continue;
    $checked[] = $m;                    // $checked ends up holding the unique absolute URLs
}
?>
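Since the question also allows bash, here is a minimal sketch of the same idea that reduces the resource URLs to a list of external host names. `external_hosts` is a hypothetical helper name, not part of the answer above. It assumes double-quoted `src`/`href` attributes, treats every host other than the exact one you pass in as external (so subdomains of the site count too), and only sees what is in the static HTML. Resources requested later by JavaScript, which a tool like webpagetest.org would catch, will not show up.

```shell
#!/usr/bin/env bash
# Hypothetical helper: reads HTML on stdin and prints the unique host names of
# absolute or protocol-relative src/href URLs that differ from the site's own
# host, which is passed as the first argument.
external_hosts() {
    local own_host="$1"
    # 1. keep only the attribute prefix up to the end of the host part
    # 2. strip everything through "//", then any user-info and port
    # 3. lowercase, drop the site's own host, de-duplicate
    grep -oiE '(src|href)="(https?:)?//[^"/?#]+' \
        | sed -E 's#^.*//##; s#^[^@]*@##; s#:[0-9]+$##' \
        | tr '[:upper:]' '[:lower:]' \
        | grep -vxF "$own_host" \
        | sort -u
}

# Usage (needs network access):
#   curl -sL "https://www.example.com" | external_hosts "www.example.com"
```

Matching with a regular expression is a shortcut here, just as in the PHP code above. It will miss single-quoted or unquoted attributes and URLs referenced from CSS or inline scripts.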