How to match a regex expression against a URL in Selenium?

xytpbqjk posted on 2021-07-09 in Java
import java.util.concurrent.TimeUnit;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

import org.openqa.selenium.By;
import org.openqa.selenium.Keys;
import org.openqa.selenium.WebDriver;
import org.openqa.selenium.firefox.FirefoxDriver;

public class Unsplash {

    public static void main(String[] args) {
        // "webdriver.gecko.driver" is the system property Selenium reads for the geckodriver path
        System.setProperty("webdriver.gecko.driver","d:\\selenium\\gecko\\geckodriver.exe");
        WebDriver driver = new FirefoxDriver();

        driver.manage().timeouts().implicitlyWait(30,TimeUnit.SECONDS);
        driver.manage().window().maximize();
        //driver.manage().window().setPosition(new Point(1920,0));
        //driver.manage().window().setSize(new Dimension(1920/2,1080));
        driver.get("http://unsplash.com/");
        driver.findElement(By.className("_32SMR")).click();
        // press PAGE_DOWN repeatedly so that more photos are loaded into the page
        for(int i=0;i<30;i++)
        {
            driver.findElement(By.tagName("body")).sendKeys(Keys.PAGE_DOWN);
        }
        //driver.getPageSource();
        Pattern p = Pattern.compile("/?photo=(.*?)");
        Matcher m = p.matcher(driver.getPageSource());
        while(m.find())
        {
            driver.get("https://unsplash.com"+m.group());
            System.out.println(m.group());
        }

        driver.quit();
    }

}

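I am trying to extract href links from unsplash.com so that I can automate downloading from the site. The href links have the format href="/photos/9l_326fiszk".
For the line System.out.println(m.group()); I only get "/photos/" as output. How can I get the full href URL, for example "/photos/9l_326fiszk", as the output?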
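Why only a prefix comes back: in the pattern "/?photo=(.*?)" the unescaped `?` is a quantifier that makes the slash optional rather than matching a literal question mark, and the lazy `(.*?)` at the very end of the pattern is satisfied by the empty string, so every match stops right after the literal prefix. `m.group()` also returns the whole match, not a capture group. A minimal sketch using only java.util.regex, assuming the page source contains links written as href="/photos/<id>" (the format quoted in the question; the real Unsplash markup may differ, and regex over HTML is fragile). The class `PhotoHrefExtractor`, the method `extractPhotoHrefs` and the sample HTML are hypothetical:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class PhotoHrefExtractor {

    // Assumed link format: href="/photos/<id>".
    // [^"]+ consumes every character up to the closing quote, so the match
    // cannot stop early the way a trailing lazy (.*?) does.
    private static final Pattern PHOTO_HREF = Pattern.compile("href=\"(/photos/[^\"]+)\"");

    /** Returns every /photos/... path found in the given HTML, in document order. */
    public static List<String> extractPhotoHrefs(String html) {
        List<String> hrefs = new ArrayList<>();
        Matcher m = PHOTO_HREF.matcher(html);
        while (m.find()) {
            // m.group() would be the whole match, href="..."; group(1) is just the path
            hrefs.add(m.group(1));
        }
        return hrefs;
    }

    public static void main(String[] args) {
        String html = "<a href=\"/photos/9l_326fiszk\">x</a> <a href=\"/about\">y</a>";
        System.out.println(extractPhotoHrefs(html)); // prints [/photos/9l_326fiszk]
    }
}
```

In the question's code, the string passed to the matcher would be driver.getPageSource(), and "https://unsplash.com" + m.group(1) would give the full photo URL.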


Answer #1 (zd287kbt)

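Instead of matching the regex against the whole of driver.getPageSource(), a more "Selenium" approach is to locate the elements that carry an href attribute and then evaluate the regex against each attribute value.
Assuming you only want the href of every <a> tag on the page: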

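import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

import org.openqa.selenium.By;
import org.openqa.selenium.WebElement;

// escape '?' so it is matched literally; group 1 captures the photo ID
Pattern p = Pattern.compile("/\\?photo=([^&#]+)");

List<WebElement> aTags = driver.findElements(By.tagName("a"));
for (WebElement aTag : aTags) {
    String href = aTag.getAttribute("href"); // absolute URL, or null if the tag has no href
    if (href == null) continue;
    Matcher m = p.matcher(href);
    if (m.find()) { // find() rather than matches(): the pattern covers only part of the URL
        // do something with href; m.group(1) holds the photo ID
    }
}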
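The filtering step of this answer can be checked without a browser. getAttribute("href") returns the resolved absolute URL (the second answer's console output shows values such as https://unsplash.com/?photo=6VjPmyMj5KM), so `matches()`, which must cover the entire string, fails against a pattern that describes only the /?photo= part, while `find()` looks for it anywhere. getAttribute also returns null for an <a> without an href. The sketch below runs that logic over a plain list of strings standing in for the attribute values; `HrefFilter` and `photoIds` are hypothetical names, and the ?photo= URL shape is assumed from that output:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class HrefFilter {

    // '?' is escaped so it is matched literally; group 1 captures the ID up to
    // the next '&' or '#', if any. The "/?photo=" shape is an assumption taken
    // from the second answer's output and may not match the current site.
    private static final Pattern PHOTO_QUERY = Pattern.compile("/\\?photo=([^&#]+)");

    /** Collects photo IDs from href values like those WebElement.getAttribute("href") returns. */
    public static List<String> photoIds(List<String> hrefs) {
        List<String> ids = new ArrayList<>();
        for (String href : hrefs) {
            if (href == null) {
                continue; // getAttribute returns null when the attribute is absent
            }
            Matcher m = PHOTO_QUERY.matcher(href);
            // find() searches anywhere in the string; matches() would require the
            // pattern to cover the whole absolute URL, including "https://unsplash.com"
            if (m.find()) {
                ids.add(m.group(1));
            }
        }
        return ids;
    }

    public static void main(String[] args) {
        List<String> hrefs = Arrays.asList(
                "https://unsplash.com/?photo=6VjPmyMj5KM",
                null,
                "https://unsplash.com/about");
        System.out.println(photoIds(hrefs)); // prints [6VjPmyMj5KM]
    }
}
```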

Answer #2 (plicqrtu)

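Here is the answer to your question:
We can take a simpler approach and collect the image URLs for the different artists with Java collections. The following code block collects the links to all images, together with the artist named in each link's title: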

import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.TimeUnit;

import org.openqa.selenium.By;
import org.openqa.selenium.WebDriver;
import org.openqa.selenium.WebElement;
import org.openqa.selenium.firefox.FirefoxDriver;

public class Q45106505_REGEX 
{

    public static void main(String[] args) 
    {

        System.setProperty("webdriver.gecko.driver", "C:\\Utility\\BrowserDrivers\\geckodriver.exe");
        WebDriver driver = new FirefoxDriver();

        driver.manage().timeouts().implicitlyWait(5,TimeUnit.SECONDS);     
        driver.manage().window().maximize();
        driver.get("http://unsplash.com/");
        driver.findElement(By.xpath("//button[@class='_2OLVr _21rCr']/*[name()='svg' and @class='_32SMR']")).click();
        // photo links in the grid; these ids and class names reflect Unsplash's markup when this was written and may have changed
        List<WebElement> elem_list = driver.findElements(By.xpath("//div[@id='app']//div[@id='gridSingle']/div[@class='y5w1y' and @data-test='photo-component']//a[contains(@href,'/?photo=')]"));
        List<String> title_list = new ArrayList<String>();
        List<String> href_list = new ArrayList<String>();
        for (WebElement we:elem_list)
        {
            String my_title = we.getAttribute("title");
            title_list.add(my_title);
            String my_href = we.getAttribute("href");
            href_list.add(my_href);
        }

        for(int i=0; i<title_list.size(); i++)
        {
            System.out.println(title_list.get(i)+" at : "+href_list.get(i));
        }

        // close the browser and end the WebDriver session
        driver.quit();
    }

}

The console output is as follows:

View the photo By timothy muza at : https://unsplash.com/?photo=6VjPmyMj5KM
View the photo By Stephanie McCabe at : https://unsplash.com/?photo=_Ajm-ewEC24
View the photo By John Moore at : https://unsplash.com/?photo=Fdhyrhb9x7o
View the photo By Jason Blackeye at : https://unsplash.com/?photo=KUgDg__TMGk
View the photo By Mahkeo at : https://unsplash.com/?photo=m76_jjV-rRI
View the photo By Samara Doole at : https://unsplash.com/?photo=5VuLCwvZCQU
View the photo By Craig  Whitehead at : https://unsplash.com/?photo=2pdDHpqbKr8
View the photo By Chris Marquardt at : https://unsplash.com/?photo=5KmkrOjOBrE
View the photo By Annie Spratt at : https://unsplash.com/?photo=MN31CWOoEmc
View the photo By Alexandra Kusper at : https://unsplash.com/?photo=T8kr3JLALFU

Let me know if this answers your question.
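The second answer keeps titles and hrefs in two parallel lists that must stay index-aligned. A minimal alternative sketch, assuming the same title/href pairs: store them in a LinkedHashMap keyed by href (unique per photo, whereas several photos can share an artist's title), which also preserves page order, and pull the photo ID out of the ?photo= query parameter with java.net.URI. `PhotoIndex` and `photoId` are hypothetical names, and the sample entries are copied from the console output above; in the Selenium loop the put call would take we.getAttribute("href") and we.getAttribute("title").

```java
import java.net.URI;
import java.util.LinkedHashMap;
import java.util.Map;

public class PhotoIndex {

    /**
     * Hypothetical helper: returns the value of the "photo" query parameter,
     * or null if there is none. Assumes hrefs shaped like
     * https://unsplash.com/?photo=<id>, as in the console output.
     */
    public static String photoId(String href) {
        String query = URI.create(href).getQuery(); // e.g. "photo=6VjPmyMj5KM"
        if (query == null) {
            return null;
        }
        for (String pair : query.split("&")) {
            if (pair.startsWith("photo=")) {
                return pair.substring("photo=".length());
            }
        }
        return null;
    }

    public static void main(String[] args) {
        // Stand-ins for the (href, title) pairs the Selenium loop collects;
        // LinkedHashMap keeps insertion (page) order and pairs each href with its title.
        Map<String, String> titleByHref = new LinkedHashMap<>();
        titleByHref.put("https://unsplash.com/?photo=m76_jjV-rRI", "View the photo By Mahkeo");
        titleByHref.put("https://unsplash.com/?photo=MN31CWOoEmc", "View the photo By Annie Spratt");

        for (Map.Entry<String, String> e : titleByHref.entrySet()) {
            System.out.println(e.getValue() + " -> " + photoId(e.getKey()));
        }
    }
}
```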
