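I have a form with several URL fields. I wrote a Zend Framework validator that runs a simple preg_match to weed out nonsensical strings, then issues a curl HEAD request (CURLOPT_NOBODY) to weed out 404s and other connection problems. In testing I ran into a mysterious return code 0 with "Unknown SSL protocol error", so I added a check that accepts any result whose message contains "SSL", on the theory that this indicates the URL reached a web server.
However, one particular URL that our customers are likely to use in practice redirects to an s3.amazonaws.com URL for a PDF file. In a browser, both the original URL and the S3 URL it redirects to display the PDF just fine. Since I am using CURLOPT_FOLLOWLOCATION, I expected my validator to accept it. Instead the result is a 404. I then tried specifying the S3 URL directly and got a 403 (!). I thought the 403 might be triggered by the 'HTTP_X_REQUESTED_WITH: XMLHttpRequest' header I was sending, so I commented that line out of the code. It still gives a 403.
How can this be? It seems to me that Amazon S3 would have to be explicitly looking for HEAD requests and deliberately issuing a 404 or a 403 depending on whether the request arrived via a redirect???
I suppose I could remove CURLOPT_NOBODY and let it send a GET request, but that seems silly since I don't care about the body.
Here is my complete code: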
<?php
class Oshk_ZendX_Validate_Url {
    static $debug = true;
    // Based on https://stackoverflow.com/a/42619410/467590
    const PATTERN = '/^(https?:\/\/)?[^" ]+(\.[^" ]+)*$/';

    public static function isValid($value) {
        $STDERR = fopen("php://stderr", "w");
        $value = (string) $value;
        $matches = array();
        if (! preg_match(self::PATTERN, $value, $matches)) {
            fwrite($STDERR, sprintf("File '%s', line %d, value '%s' does not match pattern '%s'\n", __FILE__, __LINE__, $value, self::PATTERN));
            fclose($STDERR);
            return false;
        }
        // No scheme was captured, so default to https
        if (! array_key_exists(1, $matches)) {
            $value = "https://$value";
        }
        if (self::$debug) {
            fwrite($STDERR, sprintf("File '%s', line %d, \$value = '%s', \$matches = %s", __FILE__, __LINE__, $value, print_r($matches, true)));
        }
        // URL looks well-formed. Ask curl to send a HEAD request to it
        $ch = curl_init($value);
        if ($ch === false) {
            throw new Exception("curl_init($value) failed!");
        }
        curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
        curl_setopt($ch, CURLOPT_HEADER, 0); // From https://www.php.net/manual/en/curl.examples-basic.php
        curl_setopt($ch, CURLOPT_HTTPHEADER, array('HTTP_X_REQUESTED_WITH: XMLHttpRequest'));
        curl_setopt($ch, CURLOPT_USERAGENT, 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/112.0.0.0 Safari/537.36');
        curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, false);
        curl_setopt($ch, CURLOPT_NOBODY, true); // HEAD request
        curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
        curl_setopt($ch, CURLOPT_FAILONERROR, true); // Treat HTTP status >= 400 as a curl error
        if (self::$debug) {
            curl_setopt($ch, CURLOPT_VERBOSE, true);
            curl_setopt($ch, CURLOPT_STDERR, $STDERR);
        }
        $data = curl_exec($ch);
        $msg = curl_error($ch);
        $status = curl_getinfo($ch, CURLINFO_HTTP_CODE);
        if (self::$debug) {
            // https://stackoverflow.com/a/14436877/467590
            $allinfo = curl_getinfo($ch);
            fwrite($STDERR, sprintf("File '%s', line %d, \$allinfo = %s\n", __FILE__, __LINE__, print_r($allinfo, true)));
        }
        curl_close($ch);
        if (self::$debug) {
            fwrite($STDERR, sprintf("File '%s', line %d, data = '%s'\n", __FILE__, __LINE__, substr($data, 0, 255)));
        }
        // Reject an empty response with a non-zero status, unless the message mentions SSL
        if (! strlen($data) && $status != 0 && false === strpos($msg, 'SSL')) {
            fwrite($STDERR, sprintf("File '%s', line %d, '%s' gives bad status code %d when accessed, with message '%s'\n", __FILE__, __LINE__, $value, $status, $msg));
            fclose($STDERR);
            return false;
        }
        if (self::$debug) {
            fwrite($STDERR, sprintf("File '%s', line %d, url = '%s'\n", __FILE__, __LINE__, $value));
            fwrite($STDERR, sprintf("File '%s', line %d, data = '%s'\n", __FILE__, __LINE__, substr($data, 0, 255)));
        }
        unset($data);
        if (self::$debug) {
            fwrite($STDERR, sprintf("File '%s', line %d, \$msg = '%s'\n", __FILE__, __LINE__, $msg));
            fwrite($STDERR, sprintf("File '%s', line %d, \$status = '%s'\n", __FILE__, __LINE__, $status));
            fwrite($STDERR, sprintf("File '%s', line %d, \$value = '%s'\n", __FILE__, __LINE__, $value));
        }
        // Accept 1xx-3xx statuses, or any error whose message mentions SSL
        if (($status >= 100 && $status < 400) || false !== strpos($msg, 'SSL')) {
            fclose($STDERR);
            return true;
        }
        fwrite($STDERR, sprintf("File '%s', line %d, '%s' gives bad status code %d when accessed, with message '%s'\n", __FILE__, __LINE__, $value, $status, $msg));
        fclose($STDERR);
        return false;
    }
}
var_dump(Oshk_ZendX_Validate_Url::isValid($argv[1]));
Here is a bash shell session running it with the original URL:
$ php curltest.php 'https://americandrivingsociety.org/docs.ashx?id=1037680'
File 'C:\xampp1826\htdocs\OSH0\curltest.php', line 21, $value = 'https://americandrivingsociety.org/docs.ashx?id=1037680', $matches = Array
(
[0] => https://americandrivingsociety.org/docs.ashx?id=1037680
[1] => https://
)
* Trying 208.66.171.71:443...
* Connected to americandrivingsociety.org (208.66.171.71) port 443 (#0)
* ALPN, offering h2
* ALPN, offering http/1.1
* successfully set certificate verify locations:
* CAfile: \xampp7412\apache\bin\curl-ca-bundle.crt
CApath: none
* SSL connection using TLSv1.3 / TLS_AES_256_GCM_SHA384
* ALPN, server did not agree to a protocol
* Server certificate:
* subject: CN=americandrivingsociety.org
* start date: Sep 2 00:00:00 2022 GMT
* expire date: Oct 3 23:59:59 2023 GMT
* subjectAltName: host "americandrivingsociety.org" matched cert's "americandrivingsociety.org"
* issuer: C=GB; ST=Greater Manchester; L=Salford; O=Sectigo Limited; CN=Sectigo RSA Domain Validation Secure Server CA
* SSL certificate verify ok.
> HEAD /docs.ashx?id=1037680 HTTP/1.1
Host: americandrivingsociety.org
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/112.0.0.0 Safari/537.36
Accept: */*
HTTP_X_REQUESTED_WITH: XMLHttpRequest
* old SSL session ID is stale, removing
* Mark bundle as not supporting multiuse
* The requested URL returned error: 404 Not Found
* Closing connection 0
File 'C:\xampp1826\htdocs\OSH0\curltest.php', line 46, $allinfo = Array
(
[url] => https://americandrivingsociety.org/docs.ashx?id=1037680
[content_type] =>
[http_code] => 404
[header_size] => 0
[request_size] => 250
[filetime] => -1
[ssl_verify_result] => 0
[redirect_count] => 0
[total_time] => 0.132769
[namelookup_time] => 0.009406
[connect_time] => 0.035694
[pretransfer_time] => 0.090879
[size_upload] => 0
[size_download] => 0
[speed_download] => 0
[speed_upload] => 0
[download_content_length] => -1
[upload_content_length] => -1
[starttransfer_time] => 0.132714
[redirect_time] => 0
[redirect_url] =>
[primary_ip] => 208.66.171.71
[certinfo] => Array
(
)
[primary_port] => 443
[local_ip] => 16.1.1.151
[local_port] => 55977
[http_version] => 2
[protocol] => 2
[ssl_verifyresult] => 0
[scheme] => HTTPS
[appconnect_time_us] => 90757
[connect_time_us] => 35694
[namelookup_time_us] => 9406
[pretransfer_time_us] => 90879
[redirect_time_us] => 0
[starttransfer_time_us] => 132714
[total_time_us] => 132769
)
File 'C:\xampp1826\htdocs\OSH0\curltest.php', line 50, data = ''
File 'C:\xampp1826\htdocs\OSH0\curltest.php', line 53, 'https://americandrivingsociety.org/docs.ashx?id=1037680' gives bad status code 404 when accessed, with message 'The requested URL returned error: 404 Not Found'
C:\xampp1826\htdocs\OSH0\curltest.php:77:
bool(false)
repete@DESKTOP-CLQS7C1 /cygdrive/c/xampp1826/htdocs/OSH0
$
And here is the same thing with the S3 URL that it redirects to:
$ php curltest.php 'https://s3.amazonaws.com/ClubExpressClubFiles/548049/documents/Omnibus_02-01-2023_Black_Prong_Driving_Derby_4_581817244.pdf?AWSAccessKeyId=AKIA6MYUE6DNNNCCDT4J&Expires=1683645984&response-content-disposition=inline%3B%20filename%3DOmnibus_02-01-2023_Black_Prong_Driving_Derby_4.pdf&Signature=YQGemVm9Gphf2EZ%2F4K%2FIyK%2Bmm7I%3D'
File 'C:\xampp1826\htdocs\OSH0\curltest.php', line 21, $value = 'https://s3.amazonaws.com/ClubExpressClubFiles/548049/documents/Omnibus_02-01-2023_Black_Prong_Driving_Derby_4_581817244.pdf?AWSAccessKeyId=AKIA6MYUE6DNNNCCDT4J&Expires=1683645984&response-content-disposition=inline%3B%20filename%3DOmnibus_02-01-2023_Black_Prong_Driving_Derby_4.pdf&Signature=YQGemVm9Gphf2EZ%2F4K%2FIyK%2Bmm7I%3D', $matches = Array
(
[0] => https://s3.amazonaws.com/ClubExpressClubFiles/548049/documents/Omnibus_02-01-2023_Black_Prong_Driving_Derby_4_581817244.pdf?AWSAccessKeyId=AKIA6MYUE6DNNNCCDT4J&Expires=1683645984&response-content-disposition=inline%3B%20filename%3DOmnibus_02-01-2023_Black_Prong_Driving_Derby_4.pdf&Signature=YQGemVm9Gphf2EZ%2F4K%2FIyK%2Bmm7I%3D
[1] => https://
)
* Trying 52.216.56.0:443...
* Connected to s3.amazonaws.com (52.216.56.0) port 443 (#0)
* ALPN, offering h2
* ALPN, offering http/1.1
* successfully set certificate verify locations:
* CAfile: \xampp7412\apache\bin\curl-ca-bundle.crt
CApath: none
* SSL connection using TLSv1.2 / ECDHE-RSA-AES128-GCM-SHA256
* ALPN, server accepted to use http/1.1
* Server certificate:
* subject: CN=s3.amazonaws.com
* start date: Apr 11 00:00:00 2023 GMT
* expire date: Dec 20 23:59:59 2023 GMT
* subjectAltName: host "s3.amazonaws.com" matched cert's "s3.amazonaws.com"
* issuer: C=US; O=Amazon; CN=Amazon RSA 2048 M01
* SSL certificate verify ok.
> HEAD /ClubExpressClubFiles/548049/documents/Omnibus_02-01-2023_Black_Prong_Driving_Derby_4_581817244.pdf?AWSAccessKeyId=AKIA6MYUE6DNNNCCDT4J&Expires=1683645984&response-content-disposition=inline%3B%20filename%3DOmnibus_02-01-2023_Black_Prong_Driving_Derby_4.pdf&Signature=YQGemVm9Gphf2EZ%2F4K%2FIyK%2Bmm7I%3D HTTP/1.1
Host: s3.amazonaws.com
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/112.0.0.0 Safari/537.36
Accept: */*
HTTP_X_REQUESTED_WITH: XMLHttpRequest
* Mark bundle as not supporting multiuse
* The requested URL returned error: 403 Forbidden
* Closing connection 0
File 'C:\xampp1826\htdocs\OSH0\curltest.php', line 46, $allinfo = Array
(
[url] => https://s3.amazonaws.com/ClubExpressClubFiles/548049/documents/Omnibus_02-01-2023_Black_Prong_Driving_Derby_4_581817244.pdf?AWSAccessKeyId=AKIA6MYUE6DNNNCCDT4J&Expires=1683645984&response-content-disposition=inline%3B%20filename%3DOmnibus_02-01-2023_Black_Prong_Driving_Derby_4.pdf&Signature=YQGemVm9Gphf2EZ%2F4K%2FIyK%2Bmm7I%3D
[content_type] =>
[http_code] => 403
[header_size] => 0
[request_size] => 523
[filetime] => -1
[ssl_verify_result] => 0
[redirect_count] => 0
[total_time] => 0.128771
[namelookup_time] => 0.027331
[connect_time] => 0.043198
[pretransfer_time] => 0.107906
[size_upload] => 0
[size_download] => 0
[speed_download] => 0
[speed_upload] => 0
[download_content_length] => -1
[upload_content_length] => -1
[starttransfer_time] => 0.128721
[redirect_time] => 0
[redirect_url] =>
[primary_ip] => 52.216.56.0
[certinfo] => Array
(
)
[primary_port] => 443
[local_ip] => 16.1.1.151
[local_port] => 56277
[http_version] => 2
[protocol] => 2
[ssl_verifyresult] => 0
[scheme] => HTTPS
[appconnect_time_us] => 107740
[connect_time_us] => 43198
[namelookup_time_us] => 27331
[pretransfer_time_us] => 107906
[redirect_time_us] => 0
[starttransfer_time_us] => 128721
[total_time_us] => 128771
)
File 'C:\xampp1826\htdocs\OSH0\curltest.php', line 50, data = ''
File 'C:\xampp1826\htdocs\OSH0\curltest.php', line 53, 'https://s3.amazonaws.com/ClubExpressClubFiles/548049/documents/Omnibus_02-01-2023_Black_Prong_Driving_Derby_4_581817244.pdf?AWSAccessKeyId=AKIA6MYUE6DNNNCCDT4J&Expires=1683645984&response-content-disposition=inline%3B%20filename%3DOmnibus_02-01-2023_Black_Prong_Driving_Derby_4.pdf&Signature=YQGemVm9Gphf2EZ%2F4K%2FIyK%2Bmm7I%3D' gives bad status code 403 when accessed, with message 'The requested URL returned error: 403 Forbidden'
C:\xampp1826\htdocs\OSH0\curltest.php:77:
bool(false)
repete@DESKTOP-CLQS7C1 /cygdrive/c/xampp1826/htdocs/OSH0
$
1 Answer
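"I added a check that accepts any result whose message contains 'SSL'"
That seems dangerous. What if the error message is "invalid SSL certificate"?
"because this would indicate that the URL reached a web server"
That is true of any response: 300, 400, 500, whatever. If your connection did not time out, you successfully connected to something, regardless of the status code. In other words, by that logic, if what you are validating is "reached a web server", then only a timeout should fail.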
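"I suppose I could remove CURLOPT_NOBODY and let it send a GET request, but that seems silly since I don't care about the body."
You can't expect every URL to be reachable with a HEAD request, or expect the result of a HEAD request to always match the result of a GET request.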
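One way to act on this without downloading a whole PDF is to send a real GET and abort as soon as body bytes start to arrive. The sketch below is only an illustration under assumptions, not the answerer's code: the function name probeStatusWithGet, the timeout, and the redirect limit are all made up for the example. It may matter for the S3 case in particular, because pre-signed S3 URLs are, as far as I know, signed for one specific HTTP method, so a URL signed for GET would be expected to answer a HEAD with 403.

```php
<?php
// Hypothetical helper (not from the question or the answer): returns the final
// HTTP status of a GET request while downloading as little of the body as possible.
function probeStatusWithGet(string $url, int $timeoutSeconds = 10): int
{
    $ch = curl_init($url);
    if ($ch === false) {
        return 0;
    }
    curl_setopt_array($ch, [
        CURLOPT_HTTPGET        => true,   // plain GET, CURLOPT_NOBODY is not set
        CURLOPT_FOLLOWLOCATION => true,
        CURLOPT_MAXREDIRS      => 5,      // assumed limit
        CURLOPT_RANGE          => '0-0',  // ask for one byte; servers may ignore this
        CURLOPT_CONNECTTIMEOUT => $timeoutSeconds,
        CURLOPT_TIMEOUT        => $timeoutSeconds,
        // Reporting 0 bytes handled for a non-empty chunk makes curl abort the
        // transfer, so at most the first chunk of the body is received.
        CURLOPT_WRITEFUNCTION  => function ($handle, string $chunk): int {
            return 0;
        },
    ]);
    // curl_exec() is expected to report a write error once the body starts;
    // the status line has already been parsed by then.
    curl_exec($ch);
    return (int) curl_getinfo($ch, CURLINFO_HTTP_CODE);
}
```

A status of 0 here means no HTTP response was received at all (for example, the connection was refused). Certificate verification is left at curl's defaults, which keeps it enabled.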
"curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, false);"
Don't do this. If verification fails, you want the request to fail; that is the whole point of SSL.
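Overall, if you are not going to verify the actual content of the page, then I don't see any point in making a request at all. Just validate the URL's syntax. Otherwise you will fail on things like transient network errors, maintenance downtime, ad blockers, IP-based filtering, and so on. You have a lot of code for something that should be a single line: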
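The answer's one-liner did not survive on this page. A minimal sketch of a syntax-only check, assuming PHP's built-in filter_var() with FILTER_VALIDATE_URL is the kind of thing intended, might look like this. The wrapper name isUrlSyntaxValid and the restriction to http/https are assumptions added for illustration.

```php
<?php
// Hypothetical helper: syntax-only URL validation, no network access.
function isUrlSyntaxValid(string $url): bool
{
    // filter_var() returns the URL string on success and false on failure.
    if (filter_var($url, FILTER_VALIDATE_URL) === false) {
        return false;
    }
    // FILTER_VALIDATE_URL accepts any scheme (ftp:, mailto:, ...),
    // so additionally restrict to web URLs (an assumption for this form).
    $scheme = strtolower((string) parse_url($url, PHP_URL_SCHEME));
    return in_array($scheme, ['http', 'https'], true);
}
```

Note that FILTER_VALIDATE_URL requires a scheme, so a bare "example.com" is rejected; the question's validator prepends "https://" in that case, and the same could be done before calling this function.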
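If you also want to test connectivity and make sure a live server responds to the request at form submission time, then the status doesn't matter, and you can simply check for a non-false return value from file_get_contents() using the HTTP wrapper: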
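The code that followed is also missing from this page, so here is a hedged sketch of that idea rather than the answerer's exact code. The function name urlResponds, the timeout value, and the one-byte read limit are assumptions. The ignore_errors option is set because, without it, the HTTP wrapper makes file_get_contents() return false for 4xx and 5xx responses, which would contradict "the status doesn't matter".

```php
<?php
// Hypothetical helper: true if some server answered the request at all,
// whatever the HTTP status was.
function urlResponds(string $url, float $timeoutSeconds = 5.0): bool
{
    $context = stream_context_create([
        'http' => [
            'method'        => 'GET',
            'timeout'       => $timeoutSeconds,
            // Return the body even for 4xx/5xx instead of failing.
            'ignore_errors' => true,
        ],
    ]);
    // Read at most one byte: only the fact of a response matters here.
    // The @ suppresses the warning PHP emits when the connection fails.
    $result = @file_get_contents($url, false, $context, 0, 1);
    return $result !== false;
}
```

This relies on allow_url_fopen being enabled, and on the openssl extension for https URLs.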