selenium: Creating a list of video links, blocked by email obfuscation

20jt8wwn  posted on 2022-11-10  in Other

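I'm fairly new to Python and to programming in general, so please bear with me. What I'm trying to do is write a script that opens my monthly fire department training, goes to the monthly training videos, builds a list of the available videos (the list in the database changes every month and contains a different number of videos each time), and then plays them. I have used Selenium to reach the web page and log in. I'm currently stuck on building the list of that month's videos so they can be pulled and played. The screenshot shows the "assignment" and the code layout from inspecting the video element. Below is the code I came up with to pull the video links, but every time I run it, it returns an email obfuscation link. I'm not sure what causes this or how to get around it. Any help would be appreciated.

Edit: added all of my code.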

from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from webdriver_manager.chrome import ChromeDriverManager
from selenium.webdriver.chrome.options import Options #for maximize, disabling pop ups, enabling/disabling ext, etc..
from selenium.webdriver.common.keys import Keys 
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
import time
import requests
from bs4 import BeautifulSoup
import httplib2
import re

# Target Solutions Credentials

username = "#"
password = "#"

# opening web page

chrome_options = Options()
chrome_options.add_experimental_option("detach", True)
s = Service(ChromeDriverManager().install())
driver = webdriver.Chrome(service=s, options=chrome_options)

# open window in maximize

driver.maximize_window()

# website

driver.get('https://www.targetsolutions.com/')
driver.implicitly_wait(10)

# login screen button

lms_login_button = driver.find_element(by=By.XPATH, value='//*[@id="riverbend-ButtonElement--oouR2NYdw9Ns5lb2VrED"]')
lms_login_button.click()

# username & password

username = driver.find_element(by=By.XPATH, value='//*[@id="username"]').send_keys("#")
password = driver.find_element(by=By.XPATH, value ='//*[@id="password"]').send_keys("#")

# Login button

login_screen_button = driver.find_element(by=By.XPATH, value='//*[@id="form-login"]/ul/li[3]/input')
login_screen_button.click()

# Assignments page

my_assignments = driver.find_element(by=By.XPATH, value ='//*[@id="navLeft"]/ul/li[2]/a')
my_assignments.click()

# EMAIL OBFUSCATION===============

def email(string):
    r = int(string[:2], 16)
    email = ''.join([chr(int(string[i:i+2], 16) ^ r)
                     for i in range(2, len(string), 2)])
    return email

print(email('d0a3a5a0a0bfa2a490a4b1a2b7b5a4a3bfbca5a4b9bfbea3feb3bfbd'))

# WEBSCRAPER====================

url = 'https://app.targetsolutions.com/tsapp/dashboard/pl_fb/index.cfm?fuseaction=c_pro_assignments.showHome'

links = []
website = requests.get(url)
website_text = website.text
soup = BeautifulSoup(website_text, features='html.parser')

for link in soup.find_all('a'):
    links.append(link.get('href'))

for link in links:
    print(link)

Result:

====== WebDriver manager ======
Current google-chrome version is 107.0.5304
Get LATEST chromedriver version for 107.0.5304 google-chrome
Driver [C:\Users\Wrd_3.wdm\drivers\chromedriver\win32\107.0.5304.62\chromedriver.exe] found in cache
DevTools listening on ws://127.0.0.1:55154/devtools/browser/d4e0b939-a7c4-4cfb-b828-1187823a031e
/cdn-cgi/l/email-protection#3a494f4a4a55484e7a4e5b485d5f4e4955564f4e5355544914595557

As far as I can tell, this is some form of Cloudflare email obfuscation.


gfttwv5a1#

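I can't tell exactly what is going on in your case, because you haven't given us enough information to check and verify.
Basically, you are dealing with a website protected by Cloudflare, and the result you are getting is an obfuscated email address, support@targetsolutions.com. You can decode the result with this script: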

def email(string):
    r = int(string[:2], 16)
    email = ''.join([chr(int(string[i:i+2], 16) ^ r)
                     for i in range(2, len(string), 2)])
    return email

print(email('d0a3a5a0a0bfa2a490a4b1a2b7b5a4a3bfbca5a4b9bfbea3feb3bfbd'))  # support@targetsolutions.com

This is Cloudflare's email address obfuscation; it is described in the Cloudflare documentation.
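The scraper prints whole `href` values rather than bare hex strings, so a small wrapper around the same XOR decoding can be handy. The following is only a sketch: the names `decode_cf_email`, `decode_href` and `CF_PREFIX` are made up for illustration, and it assumes the link has the `/cdn-cgi/l/email-protection#<hex>` shape shown in the question's output.

```python
from typing import Optional

# Prefix seen in the question's output; assumed to mark an obfuscated email link.
CF_PREFIX = "/cdn-cgi/l/email-protection#"


def decode_cf_email(encoded: str) -> str:
    """XOR every following byte with the first byte, which acts as the key."""
    key = int(encoded[:2], 16)
    return "".join(
        chr(int(encoded[i:i + 2], 16) ^ key)
        for i in range(2, len(encoded), 2)
    )


def decode_href(href: Optional[str]) -> Optional[str]:
    """Return the decoded email for obfuscated links, otherwise the href unchanged."""
    if not href or CF_PREFIX not in href:
        return href
    encoded = href.split(CF_PREFIX, 1)[1]
    return decode_cf_email(encoded)


if __name__ == "__main__":
    print(decode_href(
        "/cdn-cgi/l/email-protection#"
        "3a494f4a4a55484e7a4e5b485d5f4e4955564f4e5355544914595557"
    ))
```

Passing every scraped `href` through `decode_href` leaves ordinary links untouched and turns the obfuscated one into a readable address.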

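Update

Now that you have posted your full code: the result you are getting,

/cdn-cgi/l/email-protection#3a494f4a4a55484e7a4e5b485d5f4e4955564f4e5355544914595557

comes from the HTTP request made with requests. You cannot fetch this page that way because it requires a login, and the requests call does not share the session of the browser that Selenium logged in with. So you don't need the requests part at all; let's remove it: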


# Target Solutions Credentials

username = "#"
password = "#"

# opening web page

chrome_options = Options()
chrome_options.add_experimental_option("detach", True)
s = Service(ChromeDriverManager().install())
driver = webdriver.Chrome(service=s, options=chrome_options)

# open window in maximize

driver.maximize_window()

# website

driver.get('https://www.targetsolutions.com/')
driver.implicitly_wait(10)

# login screen button

lms_login_button = driver.find_element(by=By.XPATH, value='//*[@id="riverbend-ButtonElement--oouR2NYdw9Ns5lb2VrED"]')
lms_login_button.click()

# username & password

username = driver.find_element(by=By.XPATH, value='//*[@id="username"]').send_keys("#")
password = driver.find_element(by=By.XPATH, value ='//*[@id="password"]').send_keys("#")

# Login button

login_screen_button = driver.find_element(by=By.XPATH, value='//*[@id="form-login"]/ul/li[3]/input')
login_screen_button.click()

# Assignments page

my_assignments = driver.find_element(by=By.XPATH, value ='//*[@id="navLeft"]/ul/li[2]/a')
my_assignments.click()

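At this point you are on your assignments page.

Now open DevTools (F12), inspect your assignments, and find the tag that holds the URL. Look for a unique id or class on it, the same way you located the Assignments page button. Then all you need to do is fetch every element that has that unique value, so your code should look like this: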

from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from webdriver_manager.chrome import ChromeDriverManager
from selenium.webdriver.chrome.options import Options #for maximize, disabling pop ups, enabling/disabling ext, etc..
from selenium.webdriver.common.by import By

# Target Solutions Credentials

username = "#"
password = "#"

# opening web page

chrome_options = Options()
chrome_options.add_experimental_option("detach", True)
s = Service(ChromeDriverManager().install())
driver = webdriver.Chrome(service=s, options=chrome_options)

# open window in maximize

driver.maximize_window()

# website

driver.get('https://www.targetsolutions.com/')
driver.implicitly_wait(10)

# login screen button

lms_login_button = driver.find_element(by=By.XPATH, value='//*[@id="riverbend-ButtonElement--oouR2NYdw9Ns5lb2VrED"]')
lms_login_button.click()
driver.implicitly_wait(10)

# username & password

# send_keys() returns None, so do not assign its result back to the credentials
driver.find_element(by=By.XPATH, value='//*[@id="username"]').send_keys(username)
driver.find_element(by=By.XPATH, value='//*[@id="password"]').send_keys(password)
driver.implicitly_wait(10)

# Login button

login_screen_button = driver.find_element(by=By.XPATH, value='//*[@id="form-login"]/ul/li[3]/input')
login_screen_button.click()
driver.implicitly_wait(10)

# Assignments page

my_assignmentspage = driver.find_element(by=By.XPATH, value ='//*[@id="navLeft"]/ul/li[2]/a')
my_assignmentspage.click()
driver.implicitly_wait(10)

my_assignmentslist = driver.find_elements(by=By.XPATH, value = "find element with a unique value")
for my_assignment in my_assignmentslist:
    print(my_assignment.get_attribute("href"))
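
Once the `href` values have been collected, they usually need some cleanup before they can be opened one by one. The sketch below is one possible way to do that in plain Python. The helper name `clean_links` is hypothetical, the default base URL is the assignments URL from the question, and the `/tsapp/video/...` paths in the usage example are invented placeholders, not real site paths.

```python
from typing import Iterable, List, Optional
from urllib.parse import urljoin

# Assignments page URL taken from the question; used to resolve relative links.
BASE_URL = (
    "https://app.targetsolutions.com/tsapp/dashboard/pl_fb/"
    "index.cfm?fuseaction=c_pro_assignments.showHome"
)


def clean_links(hrefs: Iterable[Optional[str]], base_url: str = BASE_URL) -> List[str]:
    """Drop empty, non-navigable and email-protection links; return unique absolute URLs."""
    seen = set()
    result = []
    for href in hrefs:
        if not href:
            continue  # anchors without an href yield None or ""
        if "/cdn-cgi/l/email-protection" in href:
            continue  # obfuscated email link, not a video
        if href.startswith(("javascript:", "mailto:", "#")):
            continue  # nothing to navigate to
        absolute = urljoin(base_url, href)
        if absolute not in seen:
            seen.add(absolute)
            result.append(absolute)
    return result


if __name__ == "__main__":
    # Hypothetical hrefs, standing in for my_assignment.get_attribute("href") values.
    sample = [None, "/cdn-cgi/l/email-protection#3a49", "/tsapp/video/1", "/tsapp/video/1"]
    for url in clean_links(sample):
        print(url)
```

Each URL in the cleaned list could then be passed to `driver.get(...)` in the logged-in browser session.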

xlpyo6sf2#

With the js2py module you can reuse the JavaScript decoding routine as it is:

import js2py

js_script = """\
  function decode(email) {
      function r(e, t) {
        var r = e.substr(t, 2);
        return parseInt(r, 16);
      }

      function n(n, c) {
        for (var o = "", a = r(n, c), i = c + 2; i < n.length; i += 2) {
          var l = r(n, i) ^ a;
          o += String.fromCharCode(l);
        }
        return o;
      }

    var l = "/cdn-cgi/l/email-protection#";
    return n(email, email.indexOf(l) + l.length);
  }
"""

decoder = js2py.eval_js(js_script)
email = decoder(
    "/cdn-cgi/l/email-protection#d0a3a5a0a0bfa2a490a4b1a2b7b5a4a3bfbca5a4b9bfbea3feb3bfbd"
)
print(email)

Running the script prints the decoded email address.
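
If installing js2py is not an option, the same routine can be approximated with the standard library and applied to a whole page at once, for example to `driver.page_source`. This is a sketch under the assumption that every obfuscated address appears as `/cdn-cgi/l/email-protection#<hex>`; the names `CF_PATTERN`, `decode_cf_email` and `find_emails` are illustrative only.

```python
import re
from typing import List

# Matches the hex payload that follows the email-protection prefix.
CF_PATTERN = re.compile(r"/cdn-cgi/l/email-protection#([0-9a-fA-F]+)")


def decode_cf_email(encoded: str) -> str:
    """The first byte is the XOR key for all bytes that follow."""
    key = int(encoded[:2], 16)
    return "".join(
        chr(int(encoded[i:i + 2], 16) ^ key)
        for i in range(2, len(encoded), 2)
    )


def find_emails(html: str) -> List[str]:
    """Decode every obfuscated address found in an HTML string, in document order."""
    return [decode_cf_email(payload) for payload in CF_PATTERN.findall(html)]


if __name__ == "__main__":
    page = (
        '<a href="/cdn-cgi/l/email-protection'
        '#d0a3a5a0a0bfa2a490a4b1a2b7b5a4a3bfbca5a4b9bfbea3feb3bfbd">contact</a>'
    )
    print(find_emails(page))
```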
