Extracting a URL part (blog name) with Groovy

Posted by g6ll5ycj on 2022-11-01 in Other


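I am working with the following URL: http://www.espn.com/blog/stephania-bell/post/_/id/3563/key-fantasy-football-injury-updates-for-week-4-2
I am trying to extract the name of the blog (stephania-bell).
I have implemented the following function to extract the expected value from the URL: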

def getBlogName( def decodeUrl )
{
    def urlParams = this.paramsParser.parseURIToMap( URI.create( decodeUrl ) )
    def temp = decodeUrl.replace( "http://www.espn.com", "" )
            .replaceAll( "(/_/|\\?).*", "" )
            .replace( "/index", "" )
            .replace( "/insider", "" )
            .replace( "/post", "" )
            .replace( "/tag", "" )
            .replace( "/category", "" )
            .replace( "/", "" )
            .replace( "/blog/", "" )
    def blogName = temp.replace( "/", "" )
    return blogName
}

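But I am missing something, because it returns blogstephania-bell. Can you help me understand what is wrong in the function implementation? Or is there perhaps a better way to do the same thing?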
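Tracing the chained calls suggests the cause is their order: .replace( "/", "" ) runs before .replace( "/blog/", "" ), so by the time the "/blog/" replacement is reached there are no slashes left for it to match. A minimal sketch of the reordered chain follows; the function name blogNameFixedOrder is hypothetical, and the paramsParser line is left out because its result is never used.

```groovy
// Hypothetical reordered variant of getBlogName (name and String parameter are assumptions).
def blogNameFixedOrder(String decodeUrl) {
    return decodeUrl.replace('http://www.espn.com', '')
            .replaceAll('(/_/|\\?).*', '')  // drop everything from "/_/" or "?" onward
            .replace('/index', '')
            .replace('/insider', '')
            .replace('/post', '')
            .replace('/tag', '')
            .replace('/category', '')
            .replace('/blog/', '')          // strip the prefix while its slashes still exist
            .replace('/', '')               // only now remove any leftover slashes
}

assert blogNameFixedOrder('http://www.espn.com/blog/stephania-bell/post/_/id/3563/key-fantasy-football-injury-updates-for-week-4-2') == 'stephania-bell'
```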

groovy

Source: https://stackoverflow.com/questions/53018136/extracting-url-part-blog-name-with-groovy

3 Answers


#1 f8rj6qna

Not what you asked for, just for fun (I thought this was what you wanted at first):

@Grab('org.jsoup:jsoup:1.11.3')
import static org.jsoup.Jsoup.connect

def name = connect('http://www.espn.com/blog/stephania-bell/post/_/id/3563/key-fantasy-football-injury-updates-for-week-4-2')
  .get()
  .select('.sticky-header h1 a')
  .text()

assert name == 'Stephania Bell Blog'

Posted 2022-11-01

#2 p1tboqfb

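It may be more useful to handle the URL with Java's URL class:
1. Use getPath() to extract the path as a string.
2. Split it into segments on the path separator with split("/").
3. Use the array index pathSegments[2] to pick out the relevant path segment (index 0 is the empty string before the leading slash, index 1 is "blog").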

String plainText="http://www.espn.com/blog/stephania-bell/post/_/id/3563/key-fantasy-football-injury-updates-for-week-4-2";

def url = plainText.toURL();
def fullPath = url.getPath();
def pathSegments = fullPath.split("/")
assert "stephania-bell" == pathSegments[2]
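The fixed index 2 assumes "blog" is always the first path segment. As a hedged variation on the same idea, the sketch below looks for the segment that follows "blog" and returns null when there is none. The helper name blogNameFromPath is hypothetical, and java.net.URI is swapped in for URL here only to parse the path.

```groovy
// Hypothetical helper: returns the path segment after "blog", or null if there is none.
String blogNameFromPath(String address) {
    // URI.getPath() can be null for opaque URIs, so fall back to an empty string.
    def path = new URI(address).path ?: ''
    // Split on "/" and drop empty segments such as the one before the leading slash.
    def segments = path.split('/').findAll { it }
    int i = segments.indexOf('blog')
    return (i >= 0 && i + 1 < segments.size()) ? segments[i + 1] : null
}

assert blogNameFromPath('http://www.espn.com/blog/stephania-bell/post/_/id/3563/key-fantasy-football-injury-updates-for-week-4-2') == 'stephania-bell'
assert blogNameFromPath('http://www.espn.com/nfl/') == null
```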

Posted 2022-11-01

#3 dtcbnfnu

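This kind of task is easy to handle with a regular expression. If we want to extract the part of the URL between http://www.espn.com/blog/ and the next /, the following code does the job: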

import java.util.regex.Pattern

def url = 'http://www.espn.com/blog/stephania-bell/post/_/id/3563/key-fantasy-football-injury-updates-for-week-4-2'

def pattern = Pattern.compile('^https?://www\\.espn\\.com/blog/([^/]+)/.*$')

def (_, blog) = (url =~ pattern)[0]

assert blog == 'stephania-bell'
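Indexing the matcher with [0] throws when the URL does not match the pattern. If non-matching input is possible, one option is a small wrapper that returns null instead. This is only a sketch: the helper name blogNameByRegex is hypothetical, and the pattern is relaxed so that a trailing slash after the blog name is not required.

```groovy
// Hypothetical helper: returns the captured blog name, or null when the URL does not match.
String blogNameByRegex(String address) {
    // =~ creates a java.util.regex.Matcher; group 1 is the segment right after /blog/.
    def matcher = (address =~ '^https?://www\\.espn\\.com/blog/([^/]+)')
    return matcher.find() ? matcher.group(1) : null
}

assert blogNameByRegex('http://www.espn.com/blog/stephania-bell/post/_/id/3563/key-fantasy-football-injury-updates-for-week-4-2') == 'stephania-bell'
assert blogNameByRegex('http://www.espn.com/nfl/story') == null
```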

Posted 2022-11-01
