curl: Find the latest link from an HTML page listing download locations

Posted by waxmsbnn on 2022-11-13 in Other


I am trying to build an equivalent of the following GitHub-specific code that finds the latest artifact available for download from https://runtime.fivem.net/artifacts/fivem/build_proot_linux/master. A download link looks like https://runtime.fivem.net/artifacts/fivem/build_proot_linux/master/5901-5db768d8bbb973ba27c81e424aea2910144a3100/fx.tar.xz

# Working code for github.com, needs to be converted to fivem.net
LOCATION=$(curl -s https://api.github.com/repos/someuser/somerepo/releases/latest \
| grep "tag_name" \
| awk '{print "https://github.com/someuser/somerepo/archive/" substr($2, 2, length($2)-3) ".zip"}') \
; curl -L -o file.zip $LOCATION

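The files have an incrementing version number, though not a sequential one, followed by a completely random hash.
How can I find the latest download link from the HTML page at https://runtime.fivem.net/artifacts/fivem/build_proot_linux/master?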


Source: https://stackoverflow.com/questions/73845598/find-latest-link-from-a-html-page-listing-download-locations

3 Answers


5lhxktic1#

If you want to parse HTML with a command-line tool, I suggest looking at a proper HTML parser such as xidel:

$ xidel -s "https://runtime.fivem.net/artifacts/fivem/build_proot_linux/master/" \
  -e '//a[@class="panel-block  is-active"][1]/@href'
./5914-b600ff018d939f6a65e48994bf4a4192388435e7/fx.tar.xz

There is also no need for a Bash script or any other tool, because with --follow/-f and --download you can download the file right away:

$ xidel -s "https://runtime.fivem.net/artifacts/fivem/build_proot_linux/master/" \
  -f '//a[@class="panel-block  is-active"][1]/@href' \
  --download .

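This downloads 'fx.tar.xz' into the current directory. I would not recommend hard-coding 'file.zip' when the extension is 'xz'. You can, however, generate a more suitable file name: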

$ xidel "https://runtime.fivem.net/artifacts/fivem/build_proot_linux/master/" \
  -f '//a[@class="panel-block  is-active"][1]/@href' \
  --download 'artifacts-{extract($url,"master/(\d+)-",1)}.tar.xz'
Retrieving (GET): https://runtime.fivem.net/artifacts/fivem/build_proot_linux/master/
Processing: https://runtime.fivem.net/artifacts/fivem/build_proot_linux/master/
Retrieving (): https://runtime.fivem.net/artifacts/fivem/build_proot_linux/master/5914-b600ff018d939f6a65e48994bf4a4192388435e7/fx.tar.xz
Save as: artifacts-5914.tar.xz

This downloads 'artifacts-5914.tar.xz' into the current directory. You see these log messages when you omit --silent/-s. By the way, I don't know this software, so I assumed it is called "artifacts".
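For readers who only have curl and standard shell tools, the file-name step can be approximated without xidel. The following is a minimal sketch, not part of the original answer: `artifact_filename` is a hypothetical helper name, the "artifacts-" prefix is borrowed from the example above, and the URL layout (`master/<build>-<hash>/fx.tar.xz`) is assumed from the links quoted in this thread.

```shell
#!/bin/sh

# Hypothetical helper: derive a local file name from an artifact URL.
# Assumes the URL ends in /master/<build>-<hex hash>/fx.tar.xz.
artifact_filename() {
  # Keep only the numeric build number; print nothing if the URL does not match.
  build=$(printf '%s\n' "$1" |
    sed -n 's|.*/master/\([0-9][0-9]*\)-[0-9a-f]*/fx\.tar\.xz$|\1|p')
  [ -n "$build" ] || return 1
  printf 'artifacts-%s.tar.xz\n' "$build"
}

# Example usage (performs a real download, so it is left commented out):
# url=https://runtime.fivem.net/artifacts/fivem/build_proot_linux/master/5914-b600ff018d939f6a65e48994bf4a4192388435e7/fx.tar.xz
# curl -L -o "$(artifact_filename "$url")" "$url"
```

The function returns a non-zero status when the URL does not have the expected shape, so a caller can stop before downloading under a wrong name.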


lh80um4z2#

We can use lynx -dump, as suggested in "Easiest way to extract the urls from an html page using sed or awk only":

#!/usr/bin/env bash

url_re='https://runtime.fivem.net/artifacts/fivem/build_proot_linux/master/([[:digit:]]+)-([[:xdigit:]]+)/fx.tar.xz'
newest_link_num=0
newest_link_content=
while read -r _ link; do
  [[ $link =~ $url_re ]] || continue
  if (( ${BASH_REMATCH[1]} > newest_link_num )); then
    newest_link_num=${BASH_REMATCH[1]}
    newest_link_content=$link
  fi
done < <(lynx -dump -listonly -hiddenlinks=listonly https://runtime.fivem.net/artifacts/fivem/build_proot_linux/master)

echo "Newest link is: $newest_link_content"

At the time of writing, this produces the following output:
Newest link is: https://runtime.fivem.net/artifacts/fivem/build_proot_linux/master/5901-5db768d8bbb973ba27c81e424aea2910144a3100/fx.tar.xz
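If lynx is not installed, the same "highest build number wins" idea can be sketched with curl, grep and sort. This is an illustration only: `newest_artifact_path` is a hypothetical name, and the pattern assumes every artifact link on the page has the form `<build>-<hex hash>/fx.tar.xz`, as in the links shown in this thread.

```shell
#!/bin/sh

# Hypothetical helper: read HTML on stdin and print the relative path of the
# artifact with the highest build number.
newest_artifact_path() {
  # grep -o prints each matching path on its own line (GNU and BSD grep).
  # sort compares the part before the first "-" numerically, so 5902 ranks
  # above 998; tail keeps the largest build.
  grep -o '[0-9][0-9]*-[0-9a-f][0-9a-f]*/fx\.tar\.xz' |
    sort -t- -k1,1n |
    tail -n 1
}

# Example usage (network access, left commented out):
# base=https://runtime.fivem.net/artifacts/fivem/build_proot_linux/master
# path=$(curl -s "$base/" | newest_artifact_path)
# [ -n "$path" ] && curl -L -o fx.tar.xz "$base/$path"
```

Sorting numerically matters because the build numbers do not all have the same number of digits, so a plain text sort could pick the wrong entry.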


mu0hgdu03#

I inspected https://runtime.fivem.net/artifacts/fivem/build_proot_linux/master/ and it seems that the newest links (version 5902, i.e. LATEST, and version 5848, i.e. LATEST RECOMMENDED) have the is-active class

<a class="panel-block  is-active" href="./5902-3c88d7752be75493078c1da898337b0abc2652ff/fx.tar.xz" style="display: block;">

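as opposed to the older versions. If possible, you should process HTML with a tool designed for it, such as hxselect; if you are not allowed to install such a tool, you can use GNU AWK instead, as follows: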

wget -O - https://runtime.fivem.net/artifacts/fivem/build_proot_linux/master/ | awk 'BEGIN{RS="<|>"}/is-active/{sub(/^.*href="\./,"");sub(/".*/,"");print "https://runtime.fivem.net/artifacts/fivem/build_proot_linux/master"$0}'

which gives the output:

https://runtime.fivem.net/artifacts/fivem/build_proot_linux/master/5902-3c88d7752be75493078c1da898337b0abc2652ff/fx.tar.xz
https://runtime.fivem.net/artifacts/fivem/build_proot_linux/master/5848-4f71128ee48b07026d6d7229a60ebc5f40f2b9db/fx.tar.xz

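Explanation: I tell GNU AWK that the row separator (RS) is < or >, so the inside of each opening or closing tag is treated as a single row. Then, for each row containing is-active, I replace everything up to and including href=". with an empty string (i.e. delete it), then replace " and everything after it with an empty string (i.e. delete it), and finally print https://runtime.fivem.net/artifacts/fivem/build_proot_linux/master concatenated with the extracted href value.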

(tested in gawk 4.2.1)
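Since the question asks about curl, here is a hedged variation of the AWK approach above that reads the page from stdin (so it can be fed by curl) and prints only the first is-active link. It is a sketch under assumptions: `first_active_url` is a hypothetical name, the first is-active entry is assumed to be the latest build (as the output above suggests), and `tr` is used to split on < and > so that the script does not depend on regex support in RS.

```shell
#!/bin/sh

# Hypothetical helper: read HTML on stdin, print the absolute URL of the
# first link whose tag contains "is-active". $1 is the base URL without a
# trailing slash.
first_active_url() {
  # Put every tag body on its own line, then let awk pick the first match.
  tr '<>' '\n\n' | awk -v base="$1" '
    !found && /is-active/ && /href="/ {
      sub(/^.*href="\./, "")   # drop everything up to and including href=.
      sub(/".*/, "")           # drop the closing quote and the rest of the tag
      print base $0
      found = 1                # ignore later matches
    }'
}

# Example usage (network access, left commented out):
# base=https://runtime.fivem.net/artifacts/fivem/build_proot_linux/master
# curl -L -o fx.tar.xz "$(curl -s "$base/" | first_active_url "$base")"
```

Relying on the is-active class ties the script to the page's current markup, so it is worth checking that the printed URL is non-empty before downloading.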

