## Preface

It's been a while since I last used Selenium. I happened to see a request in a web-scraping group chat: the target site rate-limits visits. I said, in that case you might as well just brute-force it with Selenium and simulate a human, so I spent five minutes writing this, and took the chance to brush up on the new version of Selenium.

```python
import time

import pandas as pd
from bs4 import BeautifulSoup
from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By

url = "http://search.ccgp.gov.cn/"

options = Options()
options.add_argument("--incognito")  # incognito mode
options.add_experimental_option("excludeSwitches", ["enable-automation"])

# Selenium 4 passes the driver path via a Service object
# (executable_path= is deprecated)
driver = webdriver.Chrome(
    service=Service(r"D:\chrome driver\chromedriver"), options=options
)

# Hide the navigator.webdriver flag before any page script runs
driver.execute_cdp_cmd(
    "Page.addScriptToEvaluateOnNewDocument",
    {
        "source": """
        Object.defineProperty(navigator, 'webdriver', {
            get: () => undefined
        })
        """
    },
)

driver.get(url)
time.sleep(1)

# Type the keyword into the search box and submit
driver.find_element(By.XPATH, '//*[@id="kw"]').click()
time.sleep(1)
driver.find_element(By.XPATH, '//*[@id="kw"]').send_keys("供应商")
time.sleep(1)
driver.find_element(By.XPATH, '//*[@id="doSearch1"]').click()
time.sleep(1)

# Parse the result list into (title, date) rows
soup = BeautifulSoup(driver.page_source, "html.parser")
li = soup.find("ul", "vT-srch-result-list-bid").findAll("li")
pd.DataFrame(
    [
        (
            i.find("a").text.replace(" ", "").replace("\n", ""),
            i.find("span").text.replace(" ", "").replace("\n", ""),
        )
        for i in li
    ]
)
```

The only piece left is paginating with Selenium and appending each page's results to the list. Remember to keep all the `time.sleep` calls. This was dashed off quickly, so there are surely other issues; I'll deal with them if he has more requirements later.

Last modification: July 25, 2023
© Reprint permitted
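The parsing step at the end of the script can be exercised offline, without a browser. The snippet below is a minimal sketch: the HTML fragment is a hypothetical reconstruction of the `vT-srch-result-list-bid` result list (the live page's markup may differ), and the `columns` names are my own labels, not anything the site defines.

```python
import pandas as pd
from bs4 import BeautifulSoup

# Hypothetical static snippet imitating the result-list structure on the
# search page; only the class name and the <a>/<span> layout are assumed.
html = """
<ul class="vT-srch-result-list-bid">
  <li><a href="#"> Notice A </a><span> 2023-07-01 </span></li>
  <li><a href="#"> Notice B </a><span> 2023-07-02 </span></li>
</ul>
"""

soup = BeautifulSoup(html, "html.parser")
li = soup.find("ul", "vT-srch-result-list-bid").findAll("li")

# Same cleanup as in the script: strip spaces and newlines from both fields
df = pd.DataFrame(
    [
        (
            i.find("a").text.replace(" ", "").replace("\n", ""),
            i.find("span").text.replace(" ", "").replace("\n", ""),
        )
        for i in li
    ],
    columns=["title", "date"],  # illustrative column names
)
print(df)
```

Testing the extraction like this first makes it easier to spot selector problems before adding the pagination loop on top.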