pandas: how to find a specific URL in a list of URLs - Python

Posted by zaq34kh6 on 2023-08-01 in Python

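I am working with URLs. I have a list of URLs collected during a scraping run, and I need to select the one that contains "decision-makers" (spelled "decidion-makers" in the sample output below).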

Code:

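from bs4 import BeautifulSoup

# `driver` is assumed to be an already-initialized Selenium WebDriver
src = driver.page_source

# Parse the page with Beautiful Soup and print every link target
soup = BeautifulSoup(src, 'lxml')
for a in soup.find_all('a', href=True):
    print(a['href'])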

Output of the code:

https://www.linkedin.com/mynetwork/?
https://www.linkedin.com/jobs/?decidion-makers
https://www.linkedin.com/messaging/?
https://www.linkedin.com/notifications/?
/company/infosys/


Expected output:

https://www.linkedin.com/jobs/?decidion-makers


Please help.
Thanks!

Answer #1 (yi0zb3m4)

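# Select the first <a> whose href contains the substring.
# The sample output spells it "decidion-makers"; use whichever spelling your hrefs contain.
a = soup.select_one('a[href*="decision-makers"]')
if a is not None:  # select_one returns None when nothing matches
    dm = a['href']
    print(dm)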

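If more than one link may match, or if relative links such as /company/infosys/ should be made absolute, the hrefs can also be filtered in plain Python. The following is only a sketch: filter_urls and its base_url parameter are hypothetical names introduced for illustration, and the list simply repeats the output shown in the question.

```python
from urllib.parse import urljoin


def filter_urls(hrefs, keyword, base_url=None):
    """Return the hrefs that contain `keyword`, optionally made absolute."""
    matches = []
    for href in hrefs:
        if keyword in href:
            # urljoin leaves absolute URLs unchanged and resolves relative ones
            matches.append(urljoin(base_url, href) if base_url else href)
    return matches


# Sample hrefs copied from the question's output
hrefs = [
    "https://www.linkedin.com/mynetwork/?",
    "https://www.linkedin.com/jobs/?decidion-makers",
    "https://www.linkedin.com/messaging/?",
    "https://www.linkedin.com/notifications/?",
    "/company/infosys/",
]

print(filter_urls(hrefs, "decidion-makers"))
# ['https://www.linkedin.com/jobs/?decidion-makers']
print(filter_urls(hrefs, "infosys", base_url="https://www.linkedin.com"))
# ['https://www.linkedin.com/company/infosys/']
```

With the soup object from the question, the input could be built as (a['href'] for a in soup.find_all('a', href=True)).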

Answer #2 (ojsjcaue)

If you prefer, you can use pandas together with the re (regular expression) module:

import pandas as pd
import re

urls = ["https://www.linkedin.com/mynetwork/?",
        "https://www.linkedin.com/jobs/?decidion-makers",
        "https://www.linkedin.com/messaging/?",
        "https://www.linkedin.com/notifications/?",
        "/company/infosys/"]

df = pd.DataFrame(data=urls)

for index, row in df.iterrows():
    url = row[0]

    match = re.search(r"(.*decidion-makers)", url)
    if match:
        print(url)

Output:

https://www.linkedin.com/jobs/?decidion-makers
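
As a possible alternative to looping with iterrows, the same filter can be written with the vectorized Series.str.contains method. This is a sketch rather than the answerer's code; it reuses the sample URLs from the question and passes regex=False so the keyword is matched as a literal substring.

```python
import pandas as pd

# Sample URLs copied from the question's output
urls = [
    "https://www.linkedin.com/mynetwork/?",
    "https://www.linkedin.com/jobs/?decidion-makers",
    "https://www.linkedin.com/messaging/?",
    "https://www.linkedin.com/notifications/?",
    "/company/infosys/",
]

s = pd.Series(urls)

# Boolean mask: True where the URL contains the literal substring
mask = s.str.contains("decidion-makers", regex=False)

matches = s[mask].tolist()
print(matches)
# ['https://www.linkedin.com/jobs/?decidion-makers']
```

The boolean mask avoids the per-row Python loop, which tends to matter only once the list of URLs grows large.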
