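## Without further ado, let's get started

### Import packages

```python
import pandas as pd
import requests
import json
from bs4 import BeautifulSoup
import re
```

### Set up the template

```python
url = 'https://888mz.cn/'
api = url + 'joe/api'
# Request parameters: ask for the list of published posts in a single large page
dt = {'routeType': 'publish_list', 'page': '1', 'pageSize': '999', 'type': 'created'}
```

### Parse the response

```python
r = requests.post(api, params=dt)
j = json.loads(r.text)
```

### Append the data to lists

```python
links = []; title = []; time = []; nr = []
# Collect the link, title, and time of every post returned by the API
for i in j['data']:
    links.append(i['permalink']); title.append(i['title']); time.append(i['time'])
# Fetch each post and keep the text of its joe_detail div, with newlines removed
for u in links:
    nr.append(re.sub('\n', '', BeautifulSoup(requests.get(u).text, 'html.parser').find('div', 'joe_detail').text))
```

### Convert to tabular data

```python
# Row labels: 标题 = title, 内容 = content, 链接 = link, 时间 = time
df = pd.DataFrame([title, nr, links, time], index=['标题', '内容', '链接', '时间'])
# Transpose so that each post becomes one row, then write to CSV
df.T.to_csv('data.csv')
```

## Summary

This is a very simple crawler. It does nothing to handle blocking by C盾 (C-Shield), and the site targeted in this demo has fewer than 100 posts.

There is plenty of room for improvement, and I leave that to you!

Last modification: October 6, 2021

© Reprinting allowed with attribution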
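
As one possible direction for the improvements mentioned in the summary, here is a minimal sketch of a more defensive version of the page-fetching step. It is not the author's code: the `HEADERS` value, the one-second delay, and the helper names `extract_text` and `fetch_all` are all assumptions made for illustration. It adds a timeout, skips pages that fail to load, tolerates a missing `joe_detail` div, and pauses between requests.

```python
import re
import time

import requests
from bs4 import BeautifulSoup

# Hypothetical settings, not taken from the original post.
HEADERS = {'User-Agent': 'Mozilla/5.0 (compatible; demo-crawler)'}
DELAY_SECONDS = 1.0


def extract_text(html):
    """Return the text of the first div with class joe_detail, or '' if absent."""
    soup = BeautifulSoup(html, 'html.parser')
    detail = soup.find('div', 'joe_detail')
    if detail is None:
        return ''
    # Same cleanup as the original loop: drop newlines
    return re.sub('\n', '', detail.text)


def fetch_all(links, session=None, delay=DELAY_SECONDS):
    """Fetch every link in turn; a failed page yields '' so rows stay aligned."""
    session = session or requests.Session()
    contents = []
    for u in links:
        try:
            resp = session.get(u, headers=HEADERS, timeout=10)
            resp.raise_for_status()
            contents.append(extract_text(resp.text))
        except requests.RequestException:
            contents.append('')
        # Pause between requests to avoid hammering the site
        time.sleep(delay)
    return contents
```

With this in place, the second loop could be replaced by something like `nr = fetch_all(links)`. Note that the original script binds the name `time` to a list, so one of the two names would need to change if the pieces are combined in a single file.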