Python: How to fix an error when scraping Facebook Marketplace with infinite scrolling using Selenium

Posted by dy1byipe on 2023-02-07 in Python

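My code scrapes housing data from Facebook Marketplace, but I've run into a problem. When the page first opens, the code can only read 24 listings. When I try to load more listings by scrolling down the page, my code starts reading all the listings from the beginning instead of continuing from the 25th listing. How can I fix this?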

import random
from time import sleep

from selenium.webdriver.common.by import By

# driver is an already-open WebDriver on the Marketplace results page;
# close_xpath (the XPath of a listing's close button) is defined elsewhere
open_xpath = '//div[@class="x3ct3a4"]'
open = driver.find_elements(By.XPATH, open_xpath)
# open is a list of all clickable housing listings present when the page first loads

while True:
    for o in open:
        sleep(random.randint(1, 2))

        # Here I read the data that I need

        close_button = driver.find_element(By.XPATH, close_xpath)
        close_button.click()
        sleep(random.randint(1, 2))
        # Here I close the listing and go to the next one

    # After reading all 24 listings in the 'open' list, scroll the page down,
    # then try to get the new listings and read them
    driver.execute_script('window.scrollTo(0, document.body.scrollHeight);')
    sleep(random.randint(2, 4))
    open = driver.find_elements(By.XPATH, open_xpath)

    # But after the scroll, my code starts reading the same listings it already read

Here is my output:

1
['', '2 Beds 1 Bath Apartment']
['$1,600 / Month']
2
['', '1 Bed 1 Bath Apartment']
['$1,500 / Month']
...
24
['', '2 Beds 2 Baths Apartment']
['$1,350 / Month']
25
['', '2 Beds 1 Bath Apartment']
['$1,600 / Month']
26
['', '1 Bed 1 Bath Apartment']
['$1,500 / Month']

So after the 24th listing is opened, the code starts reading all the listings again from the beginning.
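The repeat in the output follows from how `find_elements` works: it returns every matching element currently in the page, not just the ones added since the last call, so the list fetched after the scroll starts again at listing 1. The sketch below models this without a browser. `FakeMarketplacePage` and `naive_scrape` are hypothetical stand-ins (a page that loads 24 listings per scroll), not Selenium APIs.

```python
class FakeMarketplacePage:
    """Hypothetical stand-in for the Marketplace page. Each scroll loads one
    more batch, and find_listings() returns every listing loaded so far,
    the way driver.find_elements() returns every matching element."""

    def __init__(self, batch_size=24, batches=3):
        self.batch_size = batch_size
        self.total = batch_size * batches
        self.loaded = batch_size  # listings present on first load

    def find_listings(self):
        # All loaded listings, oldest first
        return [f"listing {i}" for i in range(1, self.loaded + 1)]

    def scroll(self):
        # Load one more batch, up to the total available
        self.loaded = min(self.loaded + self.batch_size, self.total)


def naive_scrape(page, rounds):
    """Mirrors the question's loop: after each scroll, re-query and iterate
    over the whole result again."""
    visited = []
    listings = page.find_listings()
    for _ in range(rounds):
        for listing in listings:
            visited.append(listing)  # "read" the listing
        page.scroll()
        listings = page.find_listings()
    return visited


visited = naive_scrape(FakeMarketplacePage(), rounds=2)
print(len(visited))  # 72: 24 on the first pass, then all 48 again
print(visited[24])   # listing 1, as in the output above
```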

Source: https://stackoverflow.com/questions/75350283/how-to-fix-an-error-in-web-scraping-facebook-marketplace-with-infinite-scrolling

1 Answer

Answer #1 by von4xj4u

There are a couple of approaches you can try.

Remove the element from the HTML

At the end of each iteration of the for o in open: loop, use JavaScript to remove the current element o from the HTML.

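for o in open:
    ...
    # After reading listing o, delete its node from the page so that the
    # next find_elements call no longer returns it
    driver.execute_script('var element = arguments[0]; element.remove();', o)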

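However, this method may not work: sometimes, when you scroll down to load new elements, the page reloads all of the previous elements and adds them back to the HTML. If that is the case, try the next method.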
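The removal approach, and the failure case just described, can be modeled the same way. In this sketch, `FakeDom` is a hypothetical in-memory page: `remove` plays the role of the `element.remove()` call above, and `rerender=True` simulates a page that puts removed listings back whenever it loads more. The loop also stops once a scroll leaves no unread listings, an assumption that replaces the question's endless `while True:`.

```python
class FakeDom:
    """Hypothetical in-memory model of the listings container.
    rerender=True simulates a page that re-adds all previously loaded
    listings whenever it loads more."""

    def __init__(self, batch_size=24, batches=3, rerender=False):
        self.batch_size = batch_size
        self.total = batch_size * batches
        self.loaded = batch_size  # listings loaded so far
        self.removed = set()      # listings deleted with remove()
        self.rerender = rerender

    def find_listings(self):
        # Like find_elements: only what is currently in the DOM
        return [i for i in range(1, self.loaded + 1) if i not in self.removed]

    def remove(self, listing):
        # Stands in for driver.execute_script('arguments[0].remove();', o)
        self.removed.add(listing)

    def scroll(self):
        if self.loaded >= self.total:
            return  # nothing more to load
        self.loaded = min(self.loaded + self.batch_size, self.total)
        if self.rerender:
            self.removed.clear()  # previously removed listings come back


def scrape_with_removal(dom):
    visited = []
    listings = dom.find_listings()
    while listings:
        for listing in listings:
            visited.append(listing)  # read the listing here
            dom.remove(listing)      # then delete it from the DOM
        dom.scroll()
        listings = dom.find_listings()
    return visited


print(len(scrape_with_removal(FakeDom())))               # 72, no repeats
print(len(scrape_with_removal(FakeDom(rerender=True))))  # 144: earlier listings are read again
```

With a real driver, the three fake methods would correspond to `driver.find_elements(By.XPATH, ...)`, `driver.execute_script('arguments[0].remove();', o)`, and the `window.scrollTo` call followed by a wait. The model cannot tell you whether Facebook actually re-renders earlier listings; comparing the element count in the browser after one scroll would.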

Add a counter and loop over only the new elements

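Define a counter and loop only over the elements whose index is at least the counter. The first time the while loop runs, the counter is 0, so the for loop covers every element in open. At the end of the for loop, counter equals 24, so on the second pass through the while loop we get for o in open[24:]:, which means the first 24 elements are now excluded.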

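open = driver.find_elements(By.XPATH, '//div[@class="x3ct3a4"]')
counter = 0  # number of listings already processed
while True:
    # Skip the listings handled on earlier passes
    for o in open[counter:]:
        ...
        counter += 1

    ...  # scroll down and re-fetch open, as in the question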
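The counter approach can be packaged as a function that also terminates, which the original `while True:` loop never does. This is a sketch under assumptions: `find_listings`, `read_listing` and `scroll` are hypothetical callbacks (with Selenium, `find_listings` could wrap `driver.find_elements(By.XPATH, ...)`), and stopping after two passes that find nothing new is an arbitrary choice, not part of the answer.

```python
def scrape_with_counter(find_listings, read_listing, scroll, max_idle_scrolls=2):
    """Counter approach from the answer, plus a stopping rule.

    find_listings: returns all currently loaded listings in page order.
    read_listing:  per-listing work (open, read, close).
    scroll:        scroll down and wait for new listings to load.
    Stops after max_idle_scrolls consecutive passes that find nothing new.
    """
    counter = 0  # listings processed so far
    idle = 0     # consecutive passes with no new listings
    while idle < max_idle_scrolls:
        new_listings = find_listings()[counter:]  # skip already-read ones
        idle = 0 if new_listings else idle + 1
        for listing in new_listings:
            read_listing(listing)
            counter += 1
        scroll()
    return counter


class FakePage:
    """Hypothetical page that loads 24 listings per scroll, 72 in total."""

    def __init__(self):
        self.loaded = 24

    def find_listings(self):
        return [f"listing {i}" for i in range(1, self.loaded + 1)]

    def scroll(self):
        self.loaded = min(self.loaded + 24, 72)


page = FakePage()
read = []
print(scrape_with_counter(page.find_listings, read.append, page.scroll))  # 72
print(read[24])  # listing 25
```

This relies on the same assumption as the answer: each new `find_elements` call returns the previously seen listings first, in the same order. If the page drops or reorders off-screen listings as you scroll, the indices shift and listings may be skipped or read twice.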

Answered 2023-02-07
