Erlang: httpc inside spawn + multiple requests not working

dba5bblo posted on 2022-12-08 in Erlang

I am trying to use Erlang's httpc module for highly concurrent requests.
My code, which handles many requests inside spawn, does not work:

-module(t).
-compile(export_all).

start() ->
  ssl:start(),
  inets:start( httpc, [{profile, default}] ),
  httpc:set_options([{max_sessions, 200}, {pipeline_timeout, 20000}], default),

  {ok, Device} = file:open("c:\urls.txt", read),
  read_each_line(Device).

read_each_line(Device) ->
  case io:get_line(Device, "") of
    eof  -> file:close(Device);
    Line -> go( string:substr(Line, 1,length(Line)-1)),
      read_each_line(Device)
  end.

go(Url)->
  spawn(t,geturl, [Url] ).

geturl(Url)->
  UrlHTTP=lists:concat(["http://www.",  Url]),
  io:format(UrlHTTP),io:format("~n"),

  {ok, RequestId}=httpc:request(get,{UrlHTTP,[{"User-Agent", "Opera/9.80 (Windows NT 6.1; U; ru) Presto/2.8.131 Version/11.10"}]}, [],[{sync, false}]),

  receive
    {http, {RequestId, {_HttpOk, _ResponseHeaders, Body}}} -> io:format("ok"),ok
  end.

With httpc, the HTML Body is never received (the receive clause never fires) whenever spawn is used in:

go(Url)->
      spawn(t,geturl, [Url] ).

http://erlang.org/doc/man/httpc.html

Note (from the httpc documentation):

If possible, the client keeps its connections alive and uses persistent connections, with or without pipelining, depending on configuration and current circumstances. The HTTP/1.1 specification does not provide a guideline for how many requests it is ideal to send on a persistent connection; this depends very much on the application. Note that a very long queue of requests can cause a user-perceived delay, as earlier requests can take a long time to complete. The HTTP/1.1 specification does suggest a limit of two persistent connections per server, which is the default value of the max_sessions option.
urls.txt contains different URLs, for example:

google.com
amazon.com
alibaba.com
...

What am I doing wrong?

kmb7vmvb (answer 1)

Your code never actually starts the httpc service (and inets, the application that it depends on), and the confusion probably comes from the unfortunate overloading of the inets:start/[0,1,2,3] function:

  • inets:start/[0,1] starts the inets application itself and the httpc service with the default profile (called default).
  • inets:start/[2,3] (which should be called start_service) starts one of the services that can run atop inets (viz. ftpc, tftp, httpc, httpd) once the inets application has already started.
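As an illustration of the two forms, here is a minimal sketch assuming a fresh node; my_profile is just an illustrative profile name:

%% Sketch: start the inets application (which brings up the default
%% httpc profile), then optionally add a second httpc service under a
%% named profile.
ok = inets:start(),
{ok, _Pid} = inets:start(httpc, [{profile, my_profile}]).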

From the inets documentation:

start() -> ok | {error, Reason}
start(Type) -> ok | {error, Reason}
    Starts the Inets application.

start(Service, ServiceConfig) -> {ok, Pid} | {error, Reason}
start(Service, ServiceConfig, How) -> {ok, Pid} | {error, Reason}
    Dynamically starts an Inets service after the Inets application has been
    started (with inets:start/[0,1]).
So your spawned process simply crashed when trying to call httpc:request/4, as the service itself was not running. To illustrate, the inets:start(httpc, [{profile, default}]) call from your start/0 function cannot start the httpc service because the inets application itself is not running:

Eshell V10.7  (abort with ^G)                                                                                                            
1> inets:start(httpc, [{profile, default}]).
{error,inets_not_started}

You should check the return values of the application start calls to catch potential problems:

...
ok = ssl:start(),
ok = inets:start(),
...

Or, if the applications might already be started, use a helper function like this:

...
ok = ensure_start(ssl),
ok = ensure_start(inets),
...
ensure_start(M) ->
    case M:start() of
        ok -> ok;
        {error,{already_started,M}} -> ok;
        Other -> Other
    end.
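A possible alternative (not used in the code below, but part of standard OTP) is application:ensure_all_started/1, which starts an application together with any applications it depends on and is also fine if everything is already running:

%% Sketch: ensure_all_started/1 also starts dependencies (e.g. crypto,
%% asn1 and public_key for ssl) and returns the list of applications it
%% actually had to start.
{ok, _StartedForSsl} = application:ensure_all_started(ssl),
{ok, _StartedForInets} = application:ensure_all_started(inets),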

[edit 2 - small code enhancement]

I have tested this code and it works on my PC. Note that you were using a '\' in the file path string ("c:\urls.txt"); the backslash starts an escape sequence inside an Erlang string, so the path is not what you expect and that line fails. Use forward slashes (or a doubled backslash) instead.

-module(t).
-compile(export_all).

start() -> start(2000).

% `To` is a parameter which is passed to `geturl`
% to change the timeout value. You can play with
% it to see the request queue effect, and how
% much the response times of each site vary.
%
% The default timeout value is set to 2 seconds.
start(To) ->
  ok = ensure_start(ssl),
  ok = ensure_start(inets),
  ok = httpc:set_options([{max_sessions, 200}, {pipeline_timeout, 20000}], default),

  {ok, Device} = file:open("D:/urls.txt", read),
  read_each_line(Device,To).

read_each_line(Device,To) ->
  case io:get_line(Device, "") of
    eof  -> file:close(Device);
    Line -> go( string:substr(Line, 1,length(Line)-1),To),
      read_each_line(Device,To)
  end.

go(Url,To)->
  spawn(t,geturl, [Url,To] ).

geturl(Url,To)->
  UrlHTTP=lists:concat(["http://www.",  Url]),
  io:format(UrlHTTP), io:format("~n"),

  {ok, RequestId}=httpc:request(get,{UrlHTTP,[{"User-Agent", "Opera/9.80 (Windows NT 6.1; U; ru) Presto/2.8.131 Version/11.10"}]}, [],[{sync, false}]),

  M = receive
        {http, {RequestId, {_HttpOk, _ResponseHeaders, _Body}}} -> ok
      after To ->
        not_ok
      end,
  io:format("httprequest to ~p: ~p~n",[UrlHTTP,M]).

ensure_start(M) ->
    case M:start() of
        ok -> ok;
        {error,{already_started,M}} -> ok;
        Other -> Other
    end.

and in the console:

1> t:start().
http://www.povray.org
http://www.google.com
http://www.yahoo.com
ok
httprequest to "http://www.google.com": ok
httprequest to "http://www.povray.org": ok
httprequest to "http://www.yahoo.com": ok
2> t:start().
http://www.povray.org
http://www.google.com
http://www.yahoo.com
ok
httprequest to "http://www.google.com": ok
httprequest to "http://www.povray.org": ok
httprequest to "http://www.yahoo.com": ok
3>

Note that thanks to ensure_start/1 you can call t:start() twice without a crash.
I have also tested with a bad URL, and the failure is detected.
My test includes only 3 URLs, and I guess that with many URLs the time to get the responses will increase, because the loop that spawns the processes executes faster than the requests themselves complete, so at some point you should expect timeouts. There may also be some limitation in the HTTP client; I did not check the documentation on this particular point.
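If the list of URLs grows, one way to keep track of completion is to have each worker report back to the caller, which then waits for all replies (or gives up after a grace period). The following is only a sketch built on the same asynchronous httpc calls as above; fetch_all/2, collect/3 and fetch_one/2 are illustrative names, not part of the module in the answer:

%% Sketch: spawn one worker per URL, then block until every worker has
%% sent its result back, so the caller knows when the batch is done.
fetch_all(Urls, Timeout) ->
    Parent = self(),
    Pids = [spawn(fun() -> Parent ! {done, Url, fetch_one(Url, Timeout)} end) || Url <- Urls],
    collect(length(Pids), Timeout, []).

collect(0, _Timeout, Acc) -> Acc;
collect(N, Timeout, Acc) ->
    receive
        {done, Url, Result} -> collect(N - 1, Timeout, [{Url, Result} | Acc])
    after Timeout + 1000 ->
        Acc    % give up if a worker died without reporting back
    end.

%% Same asynchronous request/receive pattern as geturl/2 above, but the
%% result is returned instead of only being printed.
fetch_one(Url, Timeout) ->
    UrlHTTP = lists:concat(["http://www.", Url]),
    {ok, RequestId} = httpc:request(get, {UrlHTTP, []}, [], [{sync, false}]),
    receive
        {http, {RequestId, {_Status, _Headers, _Body}}} -> ok
    after Timeout ->
        not_ok
    end.

A call such as fetch_all(["google.com", "povray.org"], 2000) then returns a list of {Url, ok | not_ok} pairs once every worker has answered or the grace period has expired.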
