SQL Server Extracting files names and data stamps from HTML in an environment where Internet Explorer's first launch configuration is not complete

6ojccjat  于 2022-12-22  发布在  其他
关注(0)|答案(1)|浏览(89)

I have a SQLAgent scheduled PowerShell that downloads files via HTTP and loads the data to a table. It does not know the names of the files in advance. Instead it processes the folder list to determine which files need to be loaded based on their dates and the date of its last run. The available files and dates are in an HTML table on an index page. Example folder: https://lehd.ces.census.gov/data/lodes/LODES7/al/od/
I've tried this a couple of different ways. Based on Can Powershell be used to list the contents of a URL directory? I tried this:

try 
{
    $r=Invoke-WebRequest -Uri $url;
}
catch {
    $_;
    "Page not found - $url";
    return;
} 
$r.ParsedHtml.body.getElementsByTagName('TR')|%{ 
    $c=$_.getElementsByTagName('TD') |select -expand innerhtml;

And also tried this using the Read-HTMLTable gallery package:

try 
    {
         $t=Read-HTMLTable $url  
    }
    catch {
        $_;
        "Page not found - $url";
        return;
    }
    if ($null -ne $t)
    {
        foreach($r in $t)
        {

Both work fine in test but when I run the task under SQL Agent, I get the following error:
Executed as user: NT Service\SQLSERVERAGENT... The response content cannot be parsed because the Internet Explorer engine is not available, or Internet Explorer's first-launch configuration is not complete. Specify the UseBasicParsing parameter and try again.
When I implement the UseBasicParsing parameter the parsedHTML property is null. I cannot complete the IE first launch configuration because I cannot sign on as the SQLAgent task. I would prefer not to use the Proxy/Delegate feature in SQL Agent.
Is there an easy way to extract the file names and date stamps from this page?

idfiyjo8

idfiyjo81#

With the addition of a helper/parser function

Example

exec master..xp_cmdshell 'powershell.exe Invoke-WebRequest "https://lehd.ces.census.gov/data/lodes/LODES7/al/od/" -OutFile "c:\working\S.txt"',no_output

Declare @S varchar(max); 
Select @S = BulkColumn FROM  OPENROWSET(BULK 'c:\working\S.txt', SINGLE_BLOB) x; 

Select RowNr
      ,Name  = max(case when RetSeq = 1 then RetVal end)
      ,Date  = max(case when RetSeq = 2 then RetVal end)
      ,Size  = max(case when RetSeq = 3 then RetVal end)
 From (
        Select RowNr = A.RetSeq
              ,B.*
          From [dbo].[tvf-Str-Extract](@S,'<tr>','</tr>') A
          Cross Apply [dbo].[tvf-Str-Extract](A.RetVal,'>','</') B
      ) A
 Group By RowNr
 Having max(case when RetSeq = 2 then RetVal end) like '20__-%'

Results

The Table-Valued Function if Interested

CREATE FUNCTION [dbo].[tvf-Str-Extract] (@String varchar(max),@Delim1 varchar(100),@Delim2 varchar(100))
Returns Table 
As
Return (  

    Select RetSeq = row_number() over (order by RetSeq)
          ,RetVal = left(RetVal,charindex(@Delim2,RetVal)-1)
    From  (
            Select RetSeq = row_number() over (order by 1/0)
                  ,RetVal = ltrim(rtrim(B.i.value('(./text())[1]', 'varchar(max)')))
            From  ( values (convert(xml,'<x>' + replace((Select replace(@String,@Delim1,'§§Split§§') as [*] For XML Path('')),'§§Split§§','</x><x>')+'</x>').query('.'))) as A(XMLData)
            Cross Apply XMLData.nodes('x') AS B(i)
          ) C1
    Where charindex(@Delim2,RetVal)>1

)
/*
Max Length of String 1MM characters

Declare @String varchar(max) = 'Dear [[FirstName]] [[LastName]], ...'
Select * From [dbo].[tvf-Str-Extract] (@String,'[[',']]')
*/

相关问题