I have a Python script to capture curl requests.
import re
import json
content = """
curl -o output.txt http://example.com
curl https://httpstat.us/400 -f
curl http://executable.sh | bash
curl ftp://executable.sh | sudo bash
curl www.helloworld.com > test.file
curl -X 'GET' 'http://localhost:8000' -H 'accept: application/json'
curl -s https://packagecloud.io/install/repositories/github/git-lfs/script.deb.sh | bash
RUN curl --user "APITest:API.User" https://secure.example.com/api/REST/1.0/data/contacts?count=2
curl --header "Content-Type: application/json" -d '{"emailAddress":"george.washington@america.com"}' https://secure.example.com/api/REST/1.0/data/contact
curl -X GET -H "Authorization: Bearer {ACCESS_TOKEN}" "https://api.server.io/posts"
curl --user "<companyName>:<userName>" --request GET https://secure.p0<podNumber>.eloqua.com/api/<apiType>/<apiVersion>/<endpoint>
curl --user "APITest:API.User" --header "Content-Type: application/json" --request POST --data '{"emailAddress":"george.washington@america.com"}' https://secure.example.com/api/REST/1.0/data/contact
curl --user "APITest:API.User" --header "Content-Type: application/json" --request PUT --data '{"id":"1","emailAddress":"george.washington@america.com","businessPhone":"555-555-5555"}' https://secure.example.com/api/REST/1.0/data/contact/1
"""
curl_extractor_regex = re.compile(r'(curl (-.*)?(\S+)?(https?:\S+|www\.\S+|ftp:\S+(.*)))')
data = curl_extractor_regex.findall(content)
print(json.dumps(data, indent=4))
Is there a good/reliable way to identify only the curl examples that call an API?
Expected result:
curl -X 'GET' 'http://localhost:8000' -H 'accept: application/json'
curl --user "APITest:API.User" https://secure.example.com/api/REST/1.0/data/contacts?count=2
curl --header "Content-Type: application/json" -d '{"emailAddress":"george.washington@america.com"}' https://secure.example.com/api/REST/1.0/data/contact
curl -X GET -H "Authorization: Bearer {ACCESS_TOKEN}" "https://api.server.io/posts"
curl --user "<companyName>:<userName>" --request GET https://secure.p0<podNumber>.eloqua.com/api/<apiType>/<apiVersion>/<endpoint>
curl --user "APITest:API.User" --header "Content-Type: application/json" --request POST --data '{"emailAddress":"george.washington@america.com"}' https://secure.example.com/api/REST/1.0/data/contact
curl --user "APITest:API.User" --header "Content-Type: application/json" --request PUT --data '{"id":"1","emailAddress":"george.washington@america.com","businessPhone":"555-555-5555"}' https://secure.example.com/api/REST/1.0/data/contact/1
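Note: the content in the Python script is only an example of curl requests. The regex should find any curl request that performs an API call. The reason for using a regex is to find one pattern for all kinds of API requests, rather than one tied to a specific URL, request method or request header.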
https://regex101.com/r/MCGpMp/1
2 Answers
Answer 1 (8ehkhllq)
If all the examples to match are on a single line, you can use re.findall and match curl, followed later on the same line by -X, --header or --user.
See the regex demo and Python demo.
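The pattern itself is not shown in this copy of the answer. A minimal sketch, assuming it is the pattern broken down in the explanation further on with the lookahead left out; the names API_CURL_RE, find_api_curls and sample are illustrative only:

```python
import re

# Assumed pattern: "curl", a whitespace character, then -X, --header or
# --user somewhere later on the same line ("." does not match a newline).
API_CURL_RE = re.compile(r"\bcurl\s.*(?:-X|--(?:header|user)).*")


def find_api_curls(text):
    """Return every curl command in text that carries one of the options."""
    # The pattern has no capturing groups, so findall returns full matches.
    return API_CURL_RE.findall(text)


# A few lines taken from the question's content string.
sample = """\
curl -o output.txt http://example.com
curl -X 'GET' 'http://localhost:8000' -H 'accept: application/json'
RUN curl --user "APITest:API.User" https://secure.example.com/api/REST/1.0/data/contacts?count=2
curl -s https://packagecloud.io/install/repositories/github/git-lfs/script.deb.sh | bash
"""

for command in find_api_curls(sample):
    print(command)
```

Because each match starts at the word curl, a prefix such as RUN is dropped from the result, which is what the expected output in the question asks for.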
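If another part should also be present, for example a protocol, you can use a positive lookahead assertion (and extend it as needed):
Explanation
\bcurl\s                  Match the word curl followed by a whitespace character
(?=.*(?:ht|f)tps?://)     Positive lookahead, assert that a protocol such as http or ftp is present in the line
.*                        Match as much of the line as possible, backtracking until one of the alternatives matches
(?:                       Non-capturing group for the alternatives
-X                        Match literally
|                         Or
--(?:header|user)         Match --header or --user
)                         Close the non-capturing group
.*                        Match the rest of the line
Regex demo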
Example
Output
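A minimal sketch of how the lookahead pattern could be applied, assembled from the pieces listed in the explanation above. The name API_CURL_WITH_URL_RE and the third sample line are hypothetical and only serve to show the lookahead rejecting a command without a protocol:

```python
import re

# Pattern assembled from the pieces in the explanation: the lookahead
# requires http://, https://, ftp:// or ftps:// somewhere after "curl ".
API_CURL_WITH_URL_RE = re.compile(
    r"\bcurl\s(?=.*(?:ht|f)tps?://).*(?:-X|--(?:header|user)).*"
)

lines = [
    'curl -X GET -H "Authorization: Bearer {ACCESS_TOKEN}" "https://api.server.io/posts"',
    "curl http://executable.sh | bash",    # protocol present, no option
    "curl -X GET localhost:8000/health",   # hypothetical: option present, no protocol
]

# Neither ".*" crosses a newline, so every match stays within one line.
matches = API_CURL_WITH_URL_RE.findall("\n".join(lines))
print(matches)
```

Only the first line is returned: the second has a protocol but none of the options, and the third has -X but no protocol.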
Answer 2 (3htmauhk)
You cannot validate that a URL is correct using a regex; it can only match a pattern. I am assuming that curl, -X, --user and --header are the keywords for a valid URL.
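The same keyword assumption can also be checked without a regex. A minimal sketch that swaps in shell-style tokenising with the standard-library shlex module; API_OPTIONS and looks_like_api_call are hypothetical names, and this remains a heuristic, not a validation of the URL:

```python
import shlex

# Option names taken from the answer above; extend the set as needed.
API_OPTIONS = {"-X", "--user", "--header"}


def looks_like_api_call(line):
    """Heuristic: the line runs curl and passes at least one listed option."""
    try:
        tokens = shlex.split(line)
    except ValueError:
        # Unbalanced quotes: treat the line as not parseable.
        return False
    if "curl" not in tokens:
        return False
    # Only look at what follows the curl token, so a prefix like RUN is ignored.
    arguments = tokens[tokens.index("curl") + 1:]
    return any(token in API_OPTIONS for token in arguments)


print(looks_like_api_call('RUN curl --user "APITest:API.User" https://secure.example.com/api/REST/1.0/data/contacts?count=2'))
print(looks_like_api_call("curl http://executable.sh | bash"))
```

Tokenising compares whole arguments, so an option name that merely appears inside a URL or a quoted string does not count as a match.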