Getting nested tag items in Python - Selenium

Posted by 4ioopgfo on 2023-06-04 in Python

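I am working on an automation script with Selenium; the goal is to scrape Indeed jobs. Here is the link to the worldwide site, along with a description of what I need:
Indeed worldwide site
1. Get the name and href of every location from the <a> tags, skipping any <a> that does not hold the desired result.
2. Save all locations to a .json file, something like: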

{
"id": "1", "title": "location name", "href": "location href"
}

Answer #1 (jk9hmnmh)

This should help:

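from selenium import webdriver
from selenium.webdriver.common.by import By
import time
import json

driver = webdriver.Chrome()
driver.get('https://www.indeed.com/worldwide')

# Crude fixed wait so the page has time to render
time.sleep(3)

final = {}

# Collect every <a> nested under the element with class "countries".
# The find_element_by_* helpers were removed in Selenium 4.3, so use By locators.
a_tags = driver.find_element(By.CLASS_NAME, 'countries').find_elements(By.XPATH, './/a')
idx = 1
for a in a_tags:
    # Skip anchors that have no visible text
    if a.text != "":
        final.setdefault('id', []).append(idx)
        final.setdefault('title', []).append(a.text)
        final.setdefault('href', []).append(a.get_attribute('href'))
        idx += 1
print(final)
# quit() ends the whole browser session, not just the current window
driver.quit()

with open('D:\\jobs.json', 'w') as f:
    json.dump(final, f)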

Output:

{'id': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62], 'title': ['Argentina', 'Australia', 'Austria', 'Bahrain', 'Belgium', 'Brazil', 'Canada', 'Chile', 'China', 'Colombia', 'Costa Rica', 'Czech Republic', 'Denmark', 'Ecuador', 'Egypt', 'Finland', 'France', 'Germany', 'Greece', 'Hong Kong', 'Hungary', 'India', 'Indonesia', 'Ireland', 'Israel', 'Italy', 'Japan', 'Kuwait', 'Luxembourg', 'Malaysia', 'Mexico', 'Morocco', 'Netherlands', 'New Zealand', 'Nigeria', 'Norway', 'Oman', 'Pakistan', 'Panama', 'Peru', 'Philippines', 'Poland', 'Portugal', 'Qatar', 'Romania', 'Russia', 'Saudi Arabia', 'Singapore', 'South Africa', 'South Korea', 'Spain', 'Sweden', 'Switzerland', 'Taiwan', 'Thailand', 'Turkey', 'Ukraine', 'United Arab Emirates', 'United Kingdom', 'Uruguay', 'Venezuela', 'Vietnam'], 'href': ['https://ar.indeed.com/', 'https://au.indeed.com/', 'https://at.indeed.com/', 'https://bh.indeed.com/', 'https://be.indeed.com/', 'https://www.indeed.com.br/', 'https://ca.indeed.com/', 'https://cl.indeed.com/', 'https://cn.indeed.com/', 'https://co.indeed.com/', 'https://cr.indeed.com/', 'https://cz.indeed.com/', 'https://dk.indeed.com/', 'https://ec.indeed.com/', 'https://eg.indeed.com/', 'https://fi.indeed.com/', 'https://www.indeed.fr/', 'https://de.indeed.com/', 'https://gr.indeed.com/', 'https://hk.indeed.com/', 'https://hu.indeed.com/', 'https://www.indeed.co.in/', 'https://id.indeed.com/', 'https://ie.indeed.com/', 'https://il.indeed.com/', 'https://it.indeed.com/', 'https://jp.indeed.com/', 'https://kw.indeed.com/', 'https://lu.indeed.com/', 'https://malaysia.indeed.com/', 'https://www.indeed.com.mx/', 'https://ma.indeed.com/', 'https://www.indeed.nl/', 'https://nz.indeed.com/', 'https://ng.indeed.com/', 'https://no.indeed.com/', 'https://om.indeed.com/', 'https://pk.indeed.com/', 
'https://pa.indeed.com/', 'https://pe.indeed.com/', 'https://ph.indeed.com/', 'https://pl.indeed.com/', 'https://pt.indeed.com/', 'https://qa.indeed.com/', 'https://ro.indeed.com/', 'https://ru.indeed.com/', 'https://sa.indeed.com/', 'https://sg.indeed.com/', 'https://za.indeed.com/', 'https://kr.indeed.com/', 'https://es.indeed.com/', 'https://se.indeed.com/', 'https://www.indeed.ch/', 'https://tw.indeed.com/', 'https://th.indeed.com/', 'https://tr.indeed.com/', 'https://ua.indeed.com/', 'https://www.indeed.ae/', 'https://www.indeed.co.uk/', 'https://uy.indeed.com/', 'https://ve.indeed.com/', 'https://vn.indeed.com/']}
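
The question asked for one record per location, whereas the output above is a dict of three parallel lists. If the per-location shape is preferred, the result can be reshaped before dumping it. The following is only a minimal sketch, assuming `final` has the `id`/`title`/`href` layout printed above; `to_records` is a hypothetical helper name, and the two-entry `sample` is a shortened stand-in for the real output.

```python
import json


def to_records(final):
    """Turn {'id': [...], 'title': [...], 'href': [...]} into a list of per-location dicts."""
    return [
        # The question shows the id as a string, so convert it here
        {"id": str(i), "title": title, "href": href}
        for i, title, href in zip(final["id"], final["title"], final["href"])
    ]


# Shortened sample in the same shape as the output above
sample = {
    "id": [1, 2],
    "title": ["Argentina", "Australia"],
    "href": ["https://ar.indeed.com/", "https://au.indeed.com/"],
}

records = to_records(sample)
print(json.dumps(records, indent=2))
```

Passing `to_records(final)` to `json.dump` instead of `final` would then write a JSON array with one object per location.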
