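I am trying to scrape product specifications from Amazon using VBA. The HTML page to scrape: https://www.amazon.in/dp/B01FXJI1OY
My two main requirements are: 1) split the product title to get specific specifications, and 2) get the remaining specifications from the bullet points (BP) listed on the page.
The solution I have in mind (please suggest a better approach if you see one) is to use text identifiers (either the spec value itself or the text that follows the spec value):
My current code can fetch the product title. It also fetches the bullet points that match the value stored in Cells(2, 2). Please help me get the spec values using identifiers (some specs have multiple identifiers, such as month/year for warranty):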
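Option Compare Text ' module-level statement; it cannot appear inside a procedure

Sub GetchDetails()
    Dim IE As Object ' InternetExplorer.Application
    Dim HTMLDoc As Object, itm As Object, Item As Object
    Dim url As String
    Dim sh As Worksheet
    Dim rw As Range
    Dim i As Long

    Application.ScreenUpdating = False
    Application.DisplayAlerts = False
    Application.EnableEvents = False

    Set sh = ThisWorkbook.Sheets("Crawler")
    ' Assumption: results are written to row 2; the snippet as posted never assigns rw
    Set rw = sh.Rows(2)

    Set IE = CreateObject("InternetExplorer.Application")
    ' IE.Visible = True
    url = "https://amazon.in/dp/B01FXJI1OY"
    IE.Navigate2 url
    ' Wait until the page has finished loading (readyState 4 = complete)
    Do While IE.Busy Or IE.readyState <> 4
        DoEvents
    Loop
    Application.Wait Now + TimeValue("0:00:01")
    Set HTMLDoc = IE.document

    ' Product title
    Set itm = HTMLDoc.getElementById("productTitle")
    If Not itm Is Nothing Then sh.Cells(rw.Row, 3).Value = itm.innerText

    ' Bullet points that contain the keyword stored in Cells(2, 2)
    Set itm = HTMLDoc.getElementsByClassName("a-unordered-list a-vertical a-spacing-none")(0)
    i = 0
    If Not itm Is Nothing Then
        For Each Item In itm.getElementsByTagName("li")
            If LCase(Item.innerText) Like "*" & LCase(sh.Cells(2, 2).Value) & "*" Then
                sh.Cells(rw.Row, 5 + i).Value = Item.innerText
                i = i + 1
            End If
        Next Item
    End If

    ' Close the browser and restore the application settings changed above
    IE.Quit
    Application.ScreenUpdating = True
    Application.DisplayAlerts = True
    Application.EnableEvents = True
End Sub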
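
To make the identifier idea concrete, here is a rough sketch of how a value could be pulled out of a single bullet point when a spec has several possible identifiers (for example month/year for warranty). The function name ExtractSpecValue is hypothetical and not part of the code above. It uses the VBScript.RegExp object through late binding, and it assumes the spec value is a number sitting directly before the identifier and that the identifiers contain no regex metacharacters.

```vba
' Hypothetical helper (not from the original post): returns the first
' "<number> <identifier>" fragment found in bulletText, or "" if none matches.
' Assumption: the value is numeric and appears directly before the identifier.
Function ExtractSpecValue(ByVal bulletText As String, ByVal identifiers As Variant) As String
    Dim re As Object
    Dim matches As Object

    Set re = CreateObject("VBScript.RegExp")
    re.IgnoreCase = True
    re.Global = False
    ' Build an alternation from the identifiers, e.g. (year|month),
    ' preceded by an integer or decimal number and optional whitespace
    re.Pattern = "(\d+(?:\.\d+)?)\s*(" & Join(identifiers, "|") & ")s?"

    ExtractSpecValue = ""
    If re.Test(bulletText) Then
        Set matches = re.Execute(bulletText)
        ExtractSpecValue = matches(0).Value
    End If
End Function
```

Inside the bullet-point loop this could be called as ExtractSpecValue(Item.innerText, Array("year", "month")), which would pick up a fragment such as "1 Year" from a bullet like "1 Year warranty on product". Specs whose value comes after the identifier, or is not numeric, would need a different pattern.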
1 Answer
I was thinking of something like this, as a start. Of course, you can modify it to suit your needs.
Result: