I have a PowerShell script, scheduled through SQL Agent, that downloads files via HTTP and loads the data into a table. It does not know the names of the files in advance. Instead, it processes the folder listing to determine which files need to be loaded, based on their dates and the date of its last run. The available files and dates are in an HTML table on an index page. Example folder: https://lehd.ces.census.gov/data/lodes/LODES7/al/od/
I've tried this a couple of different ways. Based on "Can Powershell be used to list the contents of a URL directory?", I tried this:
    try
    {
        $r = Invoke-WebRequest -Uri $url
    }
    catch {
        $_
        "Page not found - $url"
        return
    }
    $r.ParsedHtml.body.getElementsByTagName('TR') | ForEach-Object {
        $c = $_.getElementsByTagName('TD') | Select-Object -ExpandProperty innerhtml
I also tried this, using the Read-HTMLTable gallery package:
    try
    {
        $t = Read-HTMLTable $url
    }
    catch {
        $_
        "Page not found - $url"
        return
    }
    if ($null -ne $t)
    {
        foreach ($r in $t)
        {
Both work fine in testing, but when I run the task under SQL Agent, I get the following error:
Executed as user: NT Service\SQLSERVERAGENT... The response content cannot be parsed because the Internet Explorer engine is not available, or Internet Explorer's first-launch configuration is not complete. Specify the UseBasicParsing parameter and try again.
When I add the UseBasicParsing parameter, the ParsedHtml property is null. I cannot complete the IE first-launch configuration because I cannot sign in as the SQL Agent service account. I would prefer not to use the Proxy/Delegate feature in SQL Agent.
Is there an easy way to extract the file names and date stamps from this page?
1 Answer
With the addition of a helper/parser function
Example
Results
The Table-Valued Function if Interested
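For anyone who wants to stay in PowerShell: with -UseBasicParsing the raw HTML is still available in the response's Content property, so the rows can be picked apart with regular expressions instead of ParsedHtml. Below is a minimal sketch of that idea, not the helper function referred to in this answer. The function name Get-IndexEntry is hypothetical, and the code assumes each file row of the index page holds a link followed by a "yyyy-MM-dd HH:mm" timestamp cell, which may not hold for every server.

```powershell
# Hypothetical helper: pulls file names and timestamps out of the raw HTML of
# a directory index page. Assumes each file row contains an <a href="..."> and
# a "yyyy-MM-dd HH:mm" timestamp; adjust the patterns if the page differs.
function Get-IndexEntry {
    param(
        [Parameter(Mandatory = $true)]
        [string]$Html
    )

    $rowPattern  = '(?is)<tr[^>]*>(.*?)</tr>'            # one table row at a time
    $linkPattern = '(?i)<a\s+href="([^"?/][^"]*)"'       # skips sort links (?...) and absolute paths (/...)
    $datePattern = '\d{4}-\d{2}-\d{2}\s+\d{2}:\d{2}'     # e.g. 2019-08-20 09:41

    foreach ($row in [regex]::Matches($Html, $rowPattern)) {
        $cells = $row.Groups[1].Value
        $link  = [regex]::Match($cells, $linkPattern)
        $date  = [regex]::Match($cells, $datePattern)

        # Keep only rows that have both a relative link and a timestamp,
        # and ignore subfolders (their href ends with a slash).
        if ($link.Success -and $date.Success -and -not $link.Groups[1].Value.EndsWith('/')) {
            [pscustomobject]@{
                Name         = $link.Groups[1].Value
                LastModified = [datetime]::ParseExact(
                    ($date.Value -replace '\s+', ' '),
                    'yyyy-MM-dd HH:mm',
                    [cultureinfo]::InvariantCulture)
            }
        }
    }
}

# Usage (makes a network call, so shown commented out; $lastRun is a placeholder
# for however the job stores the time of its previous run):
# $r = Invoke-WebRequest -Uri $url -UseBasicParsing
# Get-IndexEntry -Html $r.Content | Where-Object { $_.LastModified -gt $lastRun }
```

Regex parsing of HTML is brittle, so this is only reasonable for a simple, stable listing page like the one in the question. Because it never touches the IE engine, it should not depend on the account the job runs under.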