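My code scrapes housing data from Facebook Marketplace, but I have run into a problem. When the page first opens, it can only read 24 listings. When I scroll down to load more listings, my code starts reading all the listings again from the beginning instead of continuing from the 25th. How can I fix this?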
import random
from time import sleep
from selenium.webdriver.common.by import By

# driver and close_xpath are defined earlier (not shown)
open_xpath = '//div[@class="x3ct3a4"]'
open = driver.find_elements(By.XPATH, open_xpath)
# open is a list of all clickable housing listings present when I open the page
while True:
    for o in open:
        sleep(random.randint(1, 2))
        # Here I read the data that I need
        close_button = driver.find_element(By.XPATH, close_xpath)
        close_button.click()
        sleep(random.randint(1, 2))
        # Here I close the listing and go to the next one
    # When I have read all 24 listings that were in the 'open' list, I scroll the page down to load new listings and then read them
    driver.execute_script('window.scrollTo(0, document.body.scrollHeight);')
    sleep(random.randint(2, 4))
    open = driver.find_elements(By.XPATH, open_xpath)
    # But after the scroll, my code starts reading the same listings that it already read.
Here is my output:
1
['', '2 Beds 1 Bath Apartment']
['$1,600 / Month']
2
['', '1 Bed 1 Bath Apartment']
['$1,500 / Month']
.
.
.
24
['', '2 Beds 2 Baths Apartment']
['$1,350 / Month']
25
['', '2 Beds 1 Bath Apartment']
['$1,600 / Month']
26
['', '1 Bed 1 Bath Apartment']
['$1,500 / Month']
So after opening the 24th listing, the code starts reading all the listings again.
1 Answer
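There are a couple of approaches you can try.
Remove elements from the HTML
At the end of each iteration of the for o in open: loop, use JavaScript to remove the current element o from the HTML. However, this method may not work: sometimes, when you scroll down to load new elements, the page re-renders all the previous elements and adds them back to the HTML. If that is the case, try the next approach.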
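A minimal sketch of the removal step, assuming driver is a Selenium WebDriver and element is the WebElement that was just processed. The helper name remove_from_dom is made up for illustration. Selenium's execute_script passes extra arguments to the script as arguments[0], arguments[1], and so on, and Element.remove() is the standard DOM method that detaches a node.

```python
def remove_from_dom(driver, element):
    """Detach `element` from the page so later lookups no longer return it.

    `driver` is assumed to be a Selenium WebDriver and `element` a WebElement
    located on the current page (both are assumptions of this sketch).
    """
    # The element is handed to the browser as arguments[0];
    # Element.remove() then deletes that node from the DOM.
    driver.execute_script("arguments[0].remove();", element)
```

Calling remove_from_dom(driver, o) as the last statement inside for o in open: should leave only unread listings for the next find_elements call, provided the page does not re-add the removed nodes when it loads more results.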
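Add a counter and loop over only the new elements
Define a counter and loop only over the elements whose index is at or beyond the counter. The first time the while loop runs, the counter is 0, so the for loop goes through every element in open. At the end of the for loop, counter equals 24, so on the second pass of the while loop we effectively run for o in open[24:]:, which means the first 24 elements are now skipped.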
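The counter idea can be sketched as follows. This is an illustration under assumptions, not the asker's actual code: the function name scrape_new_listings and its three callback parameters are hypothetical, and it assumes the page keeps previously loaded listings in the DOM in the same order, so that slicing by count skips exactly the ones already read.

```python
def scrape_new_listings(find_listings, process, load_more, max_rounds=10):
    """Process each listing once, skipping those handled in earlier rounds.

    find_listings: callable returning the current list of listing elements
    process:       callable invoked once per new listing
    load_more:     callable that scrolls the page to load further listings
    max_rounds:    safety cap on the number of scroll rounds
    All four names are hypothetical and exist only for this sketch.
    """
    counter = 0  # number of listings already processed
    for _ in range(max_rounds):
        listings = find_listings()
        new_listings = listings[counter:]  # skip everything already read
        if not new_listings:
            break  # scrolling produced nothing new, so stop
        for listing in new_listings:
            process(listing)
        counter = len(listings)  # remember how far we have read
        load_more()
    return counter
```

With Selenium, find_listings could be lambda: driver.find_elements(By.XPATH, open_xpath) and load_more could wrap the scrollTo call followed by a sleep, while process would hold the code that opens, reads, and closes a listing. Stopping when no new listings appear also replaces the endless while True loop.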