Python regular expression to capture API requests made with curl

Posted by tyu7yeag on 2022-11-13 in Python

I have a Python script that captures curl requests.

import re
import json

content = """
curl -o output.txt http://example.com
curl https://httpstat.us/400 -f
curl http://executable.sh | bash
curl ftp://executable.sh | sudo bash
curl www.helloworld.com > test.file
curl -X 'GET' 'http://localhost:8000' -H 'accept: application/json'

curl -s https://packagecloud.io/install/repositories/github/git-lfs/script.deb.sh | bash
RUN curl --user "APITest:API.User" https://secure.example.com/api/REST/1.0/data/contacts?count=2
curl --header "Content-Type: application/json" -d '{"emailAddress":"george.washington@america.com"}' https://secure.example.com/api/REST/1.0/data/contact
curl -X GET -H "Authorization: Bearer {ACCESS_TOKEN}" "https://api.server.io/posts"
curl --user "<companyName>:<userName>" --request GET https://secure.p0<podNumber>.eloqua.com/api/<apiType>/<apiVersion>/<endpoint>
curl --user "APITest:API.User" --header "Content-Type: application/json" --request POST --data '{"emailAddress":"george.washington@america.com"}' https://secure.example.com/api/REST/1.0/data/contact
curl --user "APITest:API.User" --header "Content-Type: application/json" --request PUT --data '{"id":"1","emailAddress":"george.washington@america.com","businessPhone":"555-555-5555"}' https://secure.example.com/api/REST/1.0/data/contact/1
"""

curl_extractor_regex = re.compile(r'(curl (-.*)?(\S+)?(https?:\S+|www\.\S+|ftp:\S+(.*)))')
data = curl_extractor_regex.findall(content)
print(json.dumps(data, indent=4))

Is there a good, reliable way to identify only the curl examples that call an API?

Expected result:

curl -X 'GET' 'http://localhost:8000' -H 'accept: application/json'
curl --user "APITest:API.User" https://secure.example.com/api/REST/1.0/data/contacts?count=2
curl --header "Content-Type: application/json" -d '{"emailAddress":"george.washington@america.com"}' https://secure.example.com/api/REST/1.0/data/contact
curl -X GET -H "Authorization: Bearer {ACCESS_TOKEN}" "https://api.server.io/posts"
curl --user "<companyName>:<userName>" --request GET https://secure.p0<podNumber>.eloqua.com/api/<apiType>/<apiVersion>/<endpoint>
curl --user "APITest:API.User" --header "Content-Type: application/json" --request POST --data '{"emailAddress":"george.washington@america.com"}' https://secure.example.com/api/REST/1.0/data/contact
curl --user "APITest:API.User" --header "Content-Type: application/json" --request PUT --data '{"id":"1","emailAddress":"george.washington@america.com","businessPhone":"555-555-5555"}' https://secure.example.com/api/REST/1.0/data/contact/1

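Note: the content in the Python script is only a sample of curl requests. The regular expression should find any curl request that performs an API call. The reason for using a regex is to find one pattern that covers all kinds of API requests, rather than one tied to a specific URL, request method, or request header.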
https://regex101.com/r/MCGpMp/1

Answer #1 (8ehkhllq)

If every example you want to match is on a single line, you can use re.findall and match curl followed later in the line by -X, --header, or --user:

\bcurl\s.*(?:-X|--(?:header|user)).*

If another part must also be present, for example a protocol, you can add a positive lookahead assertion (and extend it as needed):

\bcurl\s(?=.*(?:ht|f)tps?://).*(?:-X|--(?:header|user)).*

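Explanation

  • \bcurl\s matches the word curl followed by a whitespace character
  • (?=.*(?:ht|f)tps?://) is a positive lookahead asserting that a protocol such as http or ftp occurs later in the line
  • .* matches as many characters as possible on the line, backtracking as needed
  • (?: opens a non-capturing group for the alternatives
  • -X matches literally
  • | or
  • --(?:header|user) matches --header or --user
  • ) closes the non-capturing group
  • .* matches the rest of the line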
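The lookahead variant explained above can be tried with a short sketch like the following. The sample lines are a hypothetical, shortened subset of the question's content, and the name API_CURL is only illustrative.

```python
import re

# Lookahead variant from this answer: the line must contain a protocol
# (http, https, ftp, ftps) and one of -X, --header, --user.
API_CURL = re.compile(r'\bcurl\s(?=.*(?:ht|f)tps?://).*(?:-X|--(?:header|user)).*')

# Hypothetical sample, shortened from the question's content.
sample = """\
curl -o output.txt http://example.com
curl www.helloworld.com -X GET
curl -X 'GET' 'http://localhost:8000' -H 'accept: application/json'
RUN curl --user "APITest:API.User" https://secure.example.com/api/REST/1.0/data/contacts?count=2
"""

# The pattern has no capturing groups, so findall returns the full matches.
matches = API_CURL.findall(sample)
for match in matches:
    print(match)
```

The first line has a protocol but none of the flags, and the second has a flag but no protocol, so neither should be returned. The match for the last line starts at curl, so the leading RUN is not part of it.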

Example

import re
import json

content = """
curl -o output.txt http://example.com
curl https://httpstat.us/400 -f
curl http://executable.sh | bash
curl ftp://executable.sh | sudo bash
curl www.helloworld.com > test.file
curl -X 'GET' 'http://localhost:8000' -H 'accept: application/json'

curl -s https://packagecloud.io/install/repositories/github/git-lfs/script.deb.sh | bash
curl -X 'GET' 'http://localhost:8000' -H 'application/json'
curl -X 'GET' "http://localhost:8000" -H 'application/json'
RUN curl --user "APITest:API.User" https://secure.example.com/api/REST/1.0/data/contacts?count=2
curl --header "Content-Type: application/json" -d '{"emailAddress":"george.washington@america.com"}' https://secure.example.com/api/REST/1.0/data/contact
curl -X GET -H "Authorization: Bearer {ACCESS_TOKEN}" "https://api.server.io/posts"
curl --user "<companyName>:<userName>" --request GET https://secure.p0<podNumber>.eloqua.com/api/<apiType>/<apiVersion>/<endpoint>
curl --user "APITest:API.User" --header "Content-Type: application/json" --request POST --data '{"emailAddress":"george.washington@america.com"}' https://secure.example.com/api/REST/1.0/data/contact
curl --user "APITest:API.User" --header "Content-Type: application/json" --request PUT --data '{"id":"1","emailAddress":"george.washington@america.com","businessPhone":"555-555-5555"}' https://secure.example.com/api/REST/1.0/data/contact/1
"""

curl_extractor_regex = re.compile(r'\bcurl\s.*(?:-X|--(?:header|user)\b).*')
data = curl_extractor_regex.findall(content)
print(json.dumps(data, indent=4))

Output

[
    "curl -X 'GET' 'http://localhost:8000' -H 'accept: application/json'",
    "curl -X 'GET' 'http://localhost:8000' -H 'application/json'",
    "curl -X 'GET' \"http://localhost:8000\" -H 'application/json'",
    "curl --user \"APITest:API.User\" https://secure.example.com/api/REST/1.0/data/contacts?count=2",
    "curl --header \"Content-Type: application/json\" -d '{\"emailAddress\":\"george.washington@america.com\"}' https://secure.example.com/api/REST/1.0/data/contact",
    "curl -X GET -H \"Authorization: Bearer {ACCESS_TOKEN}\" \"https://api.server.io/posts\"",
    "curl --user \"<companyName>:<userName>\" --request GET https://secure.p0<podNumber>.eloqua.com/api/<apiType>/<apiVersion>/<endpoint>",
    "curl --user \"APITest:API.User\" --header \"Content-Type: application/json\" --request POST --data '{\"emailAddress\":\"george.washington@america.com\"}' https://secure.example.com/api/REST/1.0/data/contact",
    "curl --user \"APITest:API.User\" --header \"Content-Type: application/json\" --request PUT --data '{\"id\":\"1\",\"emailAddress\":\"george.washington@america.com\",\"businessPhone\":\"555-555-5555\"}' https://secure.example.com/api/REST/1.0/data/contact/1"
]
Answer #2 (3htmauhk)

You cannot verify that a URL is correct with a regex; it can only match a pattern. I assume that curl followed by -X, --user, or --header is the marker of a valid API call.

import re

content = """
curl -o output.txt http://example.com
curl https://httpstat.us/400 -f
curl http://executable.sh | bash
curl ftp://executable.sh | sudo bash
curl www.helloworld.com > test.file
curl -X 'GET' 'http://localhost:8000' -H 'accept: application/json'

curl -s https://packagecloud.io/install/repositories/github/git-lfs/script.deb.sh | bash
curl -X 'GET' 'http://localhost:8000' -H 'application/json'
curl -X 'GET' "http://localhost:8000" -H 'application/json'
RUN curl --user "APITest:API.User" https://secure.example.com/api/REST/1.0/data/contacts?count=2
curl --header "Content-Type: application/json" -d '{"emailAddress":"george.washington@america.com"}' https://secure.example.com/api/REST/1.0/data/contact
curl -X GET -H "Authorization: Bearer {ACCESS_TOKEN}" "https://api.server.io/posts"
curl --user "<companyName>:<userName>" --request GET https://secure.p0<podNumber>.eloqua.com/api/<apiType>/<apiVersion>/<endpoint>
curl --user "APITest:API.User" --header "Content-Type: application/json" --request POST --data '{"emailAddress":"george.washington@america.com"}' https://secure.example.com/api/REST/1.0/data/contact
curl --user "APITest:API.User" --header "Content-Type: application/json" --request PUT --data '{"id":"1","emailAddress":"george.washington@america.com","businessPhone":"555-555-5555"}' https://secure.example.com/api/REST/1.0/data/contact/1
"""
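# Match "curl" immediately followed by -X, --user or --header, up to the end of the line.
regex = re.compile(r'(curl)\s(-X|--user|--header).*')
url_lst = []
# Check each non-empty line and keep the text matched from "curl" onwards.
for line in content.splitlines():
    if line:
        for match in regex.finditer(line):
            url_lst.append(match.group(0))

print(url_lst)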

>>>["curl -X 'GET' 'http://localhost:8000' -H 'accept: application/json'", "curl -X 'GET' 'http://localhost:8000' -H 'application/json'", 'curl -X \'GET\' "http://localhost:8000" -H \'application/json\'', 'curl --user "APITest:API.User" https://secure.example.com/api/REST/1.0/data/contacts?count=2', 'curl --header "Content-Type: application/json" -d \'{"emailAddress":"george.washington@america.com"}\' https://secure.example.com/api/REST/1.0/data/contact', 'curl -X GET -H "Authorization: Bearer {ACCESS_TOKEN}" "https://api.server.io/posts"', 'curl --user "<companyName>:<userName>" --request GET https://secure.p0<podNumber>.eloqua.com/api/<apiType>/<apiVersion>/<endpoint>', 'curl --user "APITest:API.User" --header "Content-Type: application/json" --request POST --data \'{"emailAddress":"george.washington@america.com"}\' https://secure.example.com/api/REST/1.0/data/contact', 'curl --user "APITest:API.User" --header "Content-Type: application/json" --request PUT --data \'{"id":"1","emailAddress":"george.washington@america.com","businessPhone":"555-555-5555"}\' https://secure.example.com/api/REST/1.0/data/contact/1']
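Since a regex can only match a pattern, one possible complement is to tokenise each line and check the URL separately. The following is only a rough sketch, not part of the answer's method: it uses shlex.split and urllib.parse.urlsplit from the standard library, the helper names looks_like_http_url and is_api_curl are hypothetical, and the flag set simply reuses the three flags assumed above.

```python
import shlex
from urllib.parse import urlsplit

# Assumption carried over from this answer: these flags mark an API call.
API_FLAGS = {"-X", "--user", "--header"}

def looks_like_http_url(token):
    """Return True if the token parses as an http(s) URL with a host part."""
    try:
        parts = urlsplit(token)
    except ValueError:  # e.g. malformed brackets in the host
        return False
    return parts.scheme in ("http", "https") and bool(parts.netloc)

def is_api_curl(line):
    """Return True if the line has curl, one of API_FLAGS, and an http(s) URL."""
    try:
        tokens = shlex.split(line)  # shell-like splitting, removes quotes
    except ValueError:  # unbalanced quotes
        return False
    if "curl" not in tokens:
        return False
    args = tokens[tokens.index("curl") + 1:]
    has_flag = any(arg in API_FLAGS for arg in args)
    has_url = any(looks_like_http_url(arg) for arg in args)
    return has_flag and has_url

# Hypothetical sample lines, shortened from the question's content.
lines = [
    "curl -o output.txt http://example.com",
    "curl www.helloworld.com -X GET",
    "curl -X 'GET' 'http://localhost:8000' -H 'accept: application/json'",
    'RUN curl --user "APITest:API.User" https://secure.example.com/api/REST/1.0/data/contacts?count=2',
]
api_lines = [line for line in lines if is_api_curl(line)]
print(api_lines)
```

This still only checks the shape of the URL (scheme and host); it does not confirm that the endpoint exists or is actually an API.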
