## Preface

It's been a while since I last used Selenium. I happened to see a request in a web-scraping group chat: the target site rate-limits visits. I said, in that case you might as well just brute-force it with Selenium and simulate a human, so I spent five minutes writing this, and took the chance to brush up on the new version of Selenium.

```python
import time

import pandas as pd
from bs4 import BeautifulSoup
from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By

url = "http://search.ccgp.gov.cn/"

options = Options()
options.add_argument("--incognito")  # incognito mode
options.add_experimental_option("excludeSwitches", ["enable-automation"])

# Selenium 4 passes the driver path via a Service object
# (executable_path= is deprecated)
driver = webdriver.Chrome(
    service=Service(r"D:\chrome driver\chromedriver"), options=options
)

# Hide the navigator.webdriver flag before any page script runs
driver.execute_cdp_cmd(
    "Page.addScriptToEvaluateOnNewDocument",
    {
        "source": """
        Object.defineProperty(navigator, 'webdriver', {
            get: () => undefined
        })
        """
    },
)

driver.get(url)
time.sleep(1)

# Type the keyword into the search box and submit
driver.find_element(By.XPATH, '//*[@id="kw"]').click()
time.sleep(1)
driver.find_element(By.XPATH, '//*[@id="kw"]').send_keys("供应商")
time.sleep(1)
driver.find_element(By.XPATH, '//*[@id="doSearch1"]').click()
time.sleep(1)

# Parse the result list into (title, date) rows
soup = BeautifulSoup(driver.page_source, "html.parser")
li = soup.find("ul", "vT-srch-result-list-bid").findAll("li")
pd.DataFrame(
    [
        (
            i.find("a").text.replace(" ", "").replace("\n", ""),
            i.find("span").text.replace(" ", "").replace("\n", ""),
        )
        for i in li
    ]
)
```

The only piece left is paginating with Selenium and appending each page's results to the list. Remember to keep all the `time.sleep` calls. This was dashed off quickly, so there are surely other issues; I'll deal with them if he has more requirements later.

Last modification: July 25, 2023
© Reprint permitted
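The parsing step at the end of the script can be exercised offline, without a browser. The snippet below is a minimal sketch: the HTML fragment is a hypothetical reconstruction of the `vT-srch-result-list-bid` result list (the live page's markup may differ), and the `columns` names are my own labels, not anything the site defines.

```python
import pandas as pd
from bs4 import BeautifulSoup

# Hypothetical static snippet imitating the result-list structure on the
# search page; only the class name and the <a>/<span> layout are assumed.
html = """
<ul class="vT-srch-result-list-bid">
  <li><a href="#"> Notice A </a><span> 2023-07-01 </span></li>
  <li><a href="#"> Notice B </a><span> 2023-07-02 </span></li>
</ul>
"""

soup = BeautifulSoup(html, "html.parser")
li = soup.find("ul", "vT-srch-result-list-bid").findAll("li")

# Same cleanup as in the script: strip spaces and newlines from both fields
df = pd.DataFrame(
    [
        (
            i.find("a").text.replace(" ", "").replace("\n", ""),
            i.find("span").text.replace(" ", "").replace("\n", ""),
        )
        for i in li
    ],
    columns=["title", "date"],  # illustrative column names
)
print(df)
```

Testing the extraction like this first makes it easier to spot selector problems before adding the pagination loop on top.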