Selenium: how to jump back before a for loop and update a variable until my condition is met

mklgxw1f posted on 2023-02-12 in Other

I'm scraping nested text from parent categories and their child categories. My loops look like this:

first for loop will scrape all parent categories:
      ...second for loop will scrape the child1 categories of each parent category
          ...third for loop will scrape the child2 categories of each child1 category
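Stripped of the browser, the three nested loops above boil down to the sketch below. It walks a hypothetical in-memory tree (parent -> child1 -> list of child2 names, standing in for whatever Selenium scrapes) and writes one CSV row per child2 entry, using the same header as the question's code. `write_category_rows` is an illustrative name, not part of the original code.

```python
import csv
import io
from typing import Dict, List, TextIO

# Hypothetical category tree: parent -> child1 -> list of child2 names.
Tree = Dict[str, Dict[str, List[str]]]


def write_category_rows(tree: Tree, out: TextIO) -> int:
    """Write one CSV row per (parent, child1, child2) triple; return the row count."""
    writer = csv.writer(out)
    writer.writerow(["Main Category", "Sub Category 1", "Sub Category 2"])
    rows = 0
    for main_cat, children in tree.items():                # first loop: parents
        for sub_cat_1, grandchildren in children.items():  # second loop: child1
            for sub_cat_2 in grandchildren:                # third loop: child2
                writer.writerow([main_cat, sub_cat_1, sub_cat_2])
                rows += 1
    return rows


if __name__ == "__main__":
    buf = io.StringIO()
    write_category_rows({"Parent A": {"Child 1": ["Leaf x", "Leaf y"]}}, buf)
    print(buf.getvalue())
```

Keeping the tree separate from the scraping makes it easy to check the row logic without opening a browser.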

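I'm trying to scrape all parent and child categories from this page (https://www.daraz.com.bd/).
If "sub_cat_1 = y.text" gives None or an empty string, I want to increase the number in Level_1_Category_No{n} by 1. This is the variable in question: sub_category_one = driver.find_elements(By.CSS_SELECTOR, ".Level_1_Category_No1 .lzd-site-menu-sub-item > a span"). Here is my full code: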

import csv
import time

from selenium import webdriver
from selenium.webdriver.common.action_chains import ActionChains
from selenium.webdriver.common.by import By

driver = webdriver.Chrome()
driver.get("https://www.daraz.com.bd/")
time.sleep(10)  # wait for the page and its menu to load
main_category = driver.find_elements(By.CSS_SELECTOR, '.lzd-site-menu-root-item span')
with open("all_category_subcat.csv", "w", encoding="utf-8", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["Main Category", "Sub Category 1", "Sub Category 2"])

    for i in main_category:
        # Hover over the parent category so its submenu opens
        ActionChains(driver).move_to_element(i).perform()
        main_cat = i.text
        print(main_cat)

        # Note: this always queries the first submenu (Level_1_Category_No1)
        sub_category_one = driver.find_elements(By.CSS_SELECTOR, ".Level_1_Category_No1 .lzd-site-menu-sub-item > a span")
        for y in sub_category_one:
            ActionChains(driver).move_to_element(y).perform()
            sub_cat_1 = y.text
            print("--------------", sub_cat_1, "--------------")
            if sub_cat_1 is None or sub_cat_1 == "":
                # Update the value of sub_category_one and run the for loop again
                # (placeholder: this is what the question asks how to do)
                pass
            sub_category_two = driver.find_elements(By.CSS_SELECTOR, ".lzd-site-menu-grand-active span")
            for z in sub_category_two:
                sub_cat_2 = z.text
                print(sub_cat_2)
                writer.writerow([main_cat, sub_cat_1, sub_cat_2])
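Rather than jumping back in the loop, the "increase Level_1_Category_No{n} and try again" idea can be expressed as two small pieces: build the selector from the parent's position, and re-query until something non-empty comes back. The sketch below assumes (it is not confirmed by the page) that the n-th root item, counting from 1, owns a submenu with class Level_1_Category_No{n}; `level1_selector` and `fetch_until_nonempty` are hypothetical helper names. One likely reason for the empty strings: Selenium's WebElement.text only returns rendered, visible text, so a menu entry that is not currently displayed reads as ""; get_attribute("textContent") returns the text regardless of visibility.

```python
import time
from typing import Callable, List


def level1_selector(n: int) -> str:
    # Hypothetical: assumes root item n (1-based) has a submenu with class
    # "Level_1_Category_No{n}", generalising the selector from the question.
    return f".Level_1_Category_No{n} .lzd-site-menu-sub-item > a span"


def fetch_until_nonempty(fetch: Callable[[], List[str]],
                         attempts: int = 5, delay: float = 0.5) -> List[str]:
    """Call fetch() repeatedly until it yields at least one non-blank string."""
    texts: List[str] = []
    for _ in range(attempts):
        # Drop None, "" and whitespace-only entries
        texts = [t.strip() for t in fetch() if t and t.strip()]
        if texts:
            break
        time.sleep(delay)  # give the hover menu time to render, then re-query
    return texts


# Usage sketch with the question's driver (not executed here):
# for n, item in enumerate(main_category, start=1):
#     ActionChains(driver).move_to_element(item).perform()
#     subs = fetch_until_nonempty(lambda: [
#         e.get_attribute("textContent")
#         for e in driver.find_elements(By.CSS_SELECTOR, level1_selector(n))])
```

Because the selector is recomputed from n on each parent, there is no need to mutate sub_category_one mid-loop; a bounded retry also avoids spinning forever if a submenu is genuinely empty.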
Answer #1 (fivyi3re):

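# Assumes `driver` from the question has already loaded https://www.daraz.com.bd/
# Raw string (r''') so that '\n' reaches JavaScript as an escape, not a literal newline
out = driver.execute_script(r'''
const out = [];
const els = document.querySelectorAll('.lzd-site-menu-root-item');
els.forEach(function (a) {
    // Level 1: text of the root menu item, with newlines and double spaces removed
    const txt1 = a.querySelector('a').textContent.replaceAll('\n', '').replaceAll('  ', '').trim();
    // Submenu panel whose class matches the root item's id
    const els2 = document.querySelector('.lzd-site-menu-sub.' + a.getAttribute('id'));
    if (!els2) return;  // skip root items without a submenu
    // Level 2: slug of the first link in the panel (shared by every row of this root item)
    const url_2 = els2.querySelector('a').href;
    const txt2 = url_2.split('/')[3];
    for (const b of els2.children) {
        // Level 3: every grand-child item under this child
        b.querySelectorAll('.lzd-site-menu-grand-item').forEach(function (c) {
            const url_3 = c.querySelector('a').href;
            const txt3 = url_3.split('/')[3];
            out.push([txt1, txt2, url_2, txt3, url_3]);
        });
    }
});
return out;
''')
print(out)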
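The script names the level-2 and level-3 categories by the first path segment of each link (url.split('/')[3]). If you prefer to do that on the Python side, a sketch with the standard library's urllib.parse follows; `url_slug` is a hypothetical helper name. For links shaped like https://host/slug/?query it gives the same result as the split, and it also drops the query string when there is no trailing slash.

```python
from urllib.parse import urlparse


def url_slug(url: str) -> str:
    # First non-empty path segment, e.g. "/womens-clothing/" -> "womens-clothing";
    # the query string ("?price=...") is not part of .path, so it is ignored.
    segments = [s for s in urlparse(url).path.split("/") if s]
    return segments[0] if segments else ""
```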

The execute_script call returns output like this, which I believe is what you are really after in the end:
[["Women's Fashion", 'womens-clothing', 'https://www.daraz.com.bd/womens-clothing/?price=999-&service=OS&from=filter/', 'womens-dresses', 'https://www.daraz.com.bd/womens-dresses/?from=filter&price=1200-&service=OS&from=filter/'], ...
 ["Women's Fashion", 'womens-clothing', ..., 'womens-kurtis', ...], ...
 ['Health & Beauty', 'bath-body', ..., 'body-oils', ...], ...
 ['Watches, Bags, Jewelry', 'school-bags', 'https://www.daraz.com.bd/school-bags/', 'kids-shoulder-bags', 'https://www.daraz.com.bd/kids-shoulder-bags/'],
 ['Watches, Bags, Jewelry', 'school-bags', 'https://www.daraz.com.bd/school-bags/', 'school-bags-backpack', 'https://www.daraz.com.bd/school-bags-backpack/'], ...
 ['Automotive & Motorcycles', 'automotive', 'https://www.daraz.com.bd/automotive/?spm=a2a0e.home.cate_12.1.735212f73vbvt5', 'motorcycle-helmets', 'https://www.daraz.com.bd/motorcycle-helmets/'],
 ['Automotive & Motorcycles', 'automotive', 'https://www.daraz.com.bd/automotive/?spm=a2a0e.home.cate_12.1.735212f73vbvt5', 'motorcycle-tools-maintenance', 'https://www.daraz.com.bd/motorcycle-tools-maintenance/']]
💪🏽
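Note: the output above is abridged. As a bonus, each row also carries the individual URLs, which you can drop if you don't need them. The code also had to change: the query uses no hover and no Python for loops.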
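To get back to the three-column CSV from the question, the URL fields can be dropped on the Python side. This is a minimal sketch assuming each row has the shape the script pushes, [txt1, txt2, url_2, txt3, url_3]; `names_only` and `save_categories` are hypothetical helper names, and the file name is the one from the question.

```python
import csv
from typing import List, Sequence


def names_only(rows: Sequence[Sequence[str]]) -> List[List[str]]:
    # Each row is [txt1, txt2, url_2, txt3, url_3]; keep the three names.
    return [[r[0], r[1], r[3]] for r in rows]


def save_categories(rows: Sequence[Sequence[str]],
                    path: str = "all_category_subcat.csv") -> None:
    # Same header and encoding as the question's CSV writer
    with open(path, "w", encoding="utf-8", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["Main Category", "Sub Category 1", "Sub Category 2"])
        writer.writerows(names_only(rows))


# Usage with the answer's result: save_categories(out)
```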
