PHP cURL is detected as a bot when run on a web server

ia2d9nvy posted on 2023-03-18 in PHP

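I am using the PHP cURL code from this answer: https://stackoverflow.com/a/46834320/12616388. When I run the script on localhost, I get the expected output. When I run it from a web server, I get a CAPTCHA asking me to verify that I am not a robot instead. I am new to this topic and would like to understand why. My code: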

$request = array();
//$request[] = 'host:www.amazon.com';
$request[] = 'Connection: keep-alive';
$request[] = 'Pragma: no-cache';
$request[] = 'Cache-Control: no-cache';
$request[] = 'Accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8';
$request[] = 'User-Agent: Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:98.0) Gecko/20100101 Firefox/98.0';//Mozilla/5.0 (Windows NT 5.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/41.0.2272.101 Safari/537.36';
$request[] = 'DNT: 1';
$request[] = 'Accept-Encoding: gzip, deflate';
$request[] = 'Accept-Language: en-US,en;q=0.8';

$url = 'https://www.amazon.de/Wenn-Dunkeln-Sterne-funkeln-Puste-Licht-Buch/dp/3480236529/ref=sr_1_3?keywords=buch&qid=1670662644&sr=8-3';
$ch = curl_init($url);

curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, false);
curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, false);
curl_setopt($ch, CURLOPT_POST, false);
curl_setopt($ch, CURLOPT_HTTPHEADER, $request);
curl_setopt($ch, CURLOPT_ENCODING,"");
curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, 10);
curl_setopt($ch, CURLOPT_TIMEOUT,10);
curl_setopt($ch, CURLOPT_FAILONERROR,true);
$output = curl_exec($ch);

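Edit: I modified the code slightly (a random User-Agent string and multiple cURL requests in a loop), but the problem is the same: no issues on localhost, while on the web server I get the CAPTCHA.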

$user_agents = array(
    'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/14.1.1 Safari/605.1.15',
    'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/87.0.4280.67 Safari/537.36',
    'Mozilla/5.0 (Windows NT 5.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/41.0.2272.101 Safari/537.36',
    'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4472.114 Safari/537.36',
    'Mozilla/5.0 (Macintosh; Intel Mac OS X 10.15; rv:89.0) Gecko/20100101 Firefox/89.0',
    'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_13_1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/61.0.3163.100 Safari/537.36',
    'Mozilla/5.0 (Windows NT 6.1; Win64; x64; rv:59.0) Gecko/20100101 Firefox/59.0',
    'Mozilla/5.0 (X11; Linux x86_64; rv:58.0) Gecko/20100101 Firefox/58.0',
    'Mozilla/5.0 (Windows NT 6.2; WOW64; rv:17.0) Gecko/20100101 Firefox/17.0',
);
foreach ($products as $key => $value) {
    $request = array();
    $request[] = 'Connection: keep-alive';
    $request[] = 'Pragma: no-cache';
    $request[] = 'Cache-Control: no-cache';
    $request[] = 'Accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8';
    $request[] = 'User-Agent: ' . $user_agents[array_rand($user_agents)];
    $request[] = 'DNT: 1';
    $request[] = 'Accept-Encoding: gzip, deflate';
    $request[] = 'Accept-Language: en-US,en;q=0.8';
    $url = $value['url'];
    $ch = curl_init($url);

    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, false);
    curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, false);
    curl_setopt($ch, CURLOPT_POST, false);
    curl_setopt($ch, CURLOPT_HTTPHEADER, $request);
    curl_setopt($ch, CURLOPT_ENCODING,"");
    curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, 10);
    curl_setopt($ch, CURLOPT_TIMEOUT,10);
    curl_setopt($ch, CURLOPT_FAILONERROR,true);
    $output = curl_exec($ch);
    ...
}

jyztefdp1#

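Since the CAPTCHA is triggered only when you run the script from the server, it is probably tied to the IP address. Could it be a reCAPTCHA?
Whatever kind of CAPTCHA it is, one thing that may help is solving it from the web server's IP address.
If the web server has a desktop environment, connect via VNC (or whatever you normally use to connect), open a browser, and solve the CAPTCHA.
If it does not, try setting up a VPN server on the web server (this one seems easy), connect to the VPN from your own computer (so that you get the same IP address as the web server), open a browser, and solve the CAPTCHA.
Another option is to set up a proxy server, which achieves a similar result to the VPN.
Sadly, you will have to do this from time to time, because that is exactly what a CAPTCHA is for: preventing bots from automatically scraping the site.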
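
Because the CAPTCHA is expected to come back from time to time, it may help for the script to notice when it has been served one instead of the real page. The following is only a rough sketch: the function name `looksLikeCaptcha` and the marker strings are assumptions made for illustration, not a documented response format, and a plain substring check can produce false positives.

```php
<?php
// Heuristic check: returns true if the response body contains any marker
// string (case-insensitive). The default markers are assumptions and
// should be adjusted to whatever the CAPTCHA page actually contains.
function looksLikeCaptcha(string $html, array $markers = ['captcha', 'not a robot']): bool
{
    foreach ($markers as $marker) {
        if (stripos($html, $marker) !== false) {
            return true;
        }
    }
    return false;
}

// Hypothetical usage inside the request loop from the question:
//
// $output = curl_exec($ch);
// if ($output !== false && looksLikeCaptcha($output)) {
//     error_log('CAPTCHA served for ' . $url . '; solve it from the server IP, then retry.');
//     break; // stop sending further requests until the CAPTCHA is solved
// }
```

Stopping the loop as soon as a CAPTCHA appears avoids sending further requests that would only return the same challenge page.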


ndh0cuux2#

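To work around this, you can try including additional headers or cookies in the cURL request so that it looks more like a real user. You could include a User-Agent header to specify the browser and operating system the request claims to come from, and a Cookie header carrying the cookies a real user would normally send.
For example: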

$ch = curl_init();

curl_setopt($ch, CURLOPT_URL, $url);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);

// Include additional headers
curl_setopt($ch, CURLOPT_HTTPHEADER, array(
  'User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/85.0.4183.102 Safari/537.36',
  'Cookie: __cfduid=<cookie-data-goes-here>'
));

$response = curl_exec($ch);
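
Rather than hard-coding a Cookie header, cURL can also store and resend whatever cookies the site sets. Below is a minimal sketch of that idea, assuming a writable cookie file path; the helper names `buildCookieOptions` and `fetchWithCookies` are hypothetical, and nothing here guarantees that the CAPTCHA goes away.

```php
<?php
// Hypothetical helper: builds cURL options that persist cookies across
// requests by reading from and writing to the same cookie file.
function buildCookieOptions(string $url, string $cookieFile, string $userAgent): array
{
    return [
        CURLOPT_URL            => $url,
        CURLOPT_RETURNTRANSFER => true,
        CURLOPT_USERAGENT      => $userAgent,
        CURLOPT_COOKIEFILE     => $cookieFile, // cookies to send with the request
        CURLOPT_COOKIEJAR      => $cookieFile, // where received cookies are saved
        CURLOPT_ENCODING       => '',          // accept any encoding cURL supports
        CURLOPT_CONNECTTIMEOUT => 10,
        CURLOPT_TIMEOUT        => 10,
    ];
}

// Hypothetical helper: performs one GET request and returns the body,
// or null if the transfer failed.
function fetchWithCookies(string $url, string $cookieFile, string $userAgent): ?string
{
    $ch = curl_init();
    curl_setopt_array($ch, buildCookieOptions($url, $cookieFile, $userAgent));
    $body = curl_exec($ch);
    unset($ch); // releasing the handle lets libcurl write the cookie jar to disk
    return $body === false ? null : $body;
}
```

Calling `fetchWithCookies($url, __DIR__ . '/cookies.txt', $userAgent)` for each product URL would then reuse the cookies received from earlier responses, provided the same cookie file is passed every time.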
