# So You Use you-get Too!

A few days ago, one of the newer-generation gurus I know needed to scrape Bilibili videos: cat and dog videos, from which he wanted to extract xxxx (roughly speaking; I won't disclose the details yet).

The moment I heard that, I thought of you-get, which I used to play with N years ago.

Heh. And then you-get threw an error at me.

<div class="panel panel-default collapse-panel box-shadow-wrap-lg"><div class="panel-heading panel-collapse" data-toggle="collapse" data-target="#collapse-2aa06da352f9c8ac4ba6f78b261e95fd71" aria-expanded="true"><div class="accordion-toggle"><span style="">Full error output</span> <i class="pull-right fontello icon-fw fontello-angle-right"></i> </div> </div> <div class="panel-body collapse-panel-body"> <div id="collapse-2aa06da352f9c8ac4ba6f78b261e95fd71" class="collapse collapse-content"><p></p>C:\Users\Allen>you-get --debug https://www.bilibili.com/video/BV1oC4y1w7do
[DEBUG] get_content: https://www.bilibili.com/video/BV1oC4y1w7do
[DEBUG] get_content: https://www.bilibili.com/video/BV1oC4y1w7do
you-get: You will need login cookies for 720p formats or above. (use --cookies to load cookies.txt.)
[DEBUG] get_content: https://interface.bilibili.com/v2/playurl?appkey=iVGUTjsxvpLeuDCf&cid=1349898517&otype=json&qn=112&quality=112&type=&sign=2f1174f7babe041913db31eed27eecf0
[DEBUG] HTTP Error with code504
[DEBUG] HTTP Error with code429
[DEBUG] HTTP Error with code429
you-get: version 0.4.1650, a tiny downloader that scrapes the web.
you-get: Namespace(URL=['https://www.bilibili.com/video/BV1oC4y1w7do'], auto_rename=False, cookies=None, debug=True, extractor_proxy=None, first=None, force=False, format=None, help=False, http_proxy=None, info=False, input_file=None, insecure=False, itag=None, json=False, last=None, m3u8=False, no_caption=False, no_merge=False, no_proxy=False, output_dir='.', output_filename=None, password=None, player=None, playlist=False, postfix=False, prefix=None, size=None, skip_existing_file_size_check=False, socks_proxy=None, stream=None, timeout=600, url=False, version=False)
Traceback (most recent call last):
  File "C:\Users\Allen\AppData\Local\Programs\Python\Python38\Scripts\you-get-script.py", line 33, in &lt;module&gt;
    sys.exit(load_entry_point('you-get==0.4.1650', 'console_scripts', 'you-get')())
  File "C:\Users\Allen\AppData\Local\Programs\Python\Python38\lib\site-packages\you_get-0.4.1650-py3.8.egg\you_get\__main__.py", line 92, in main
  File "C:\Users\Allen\AppData\Local\Programs\Python\Python38\lib\site-packages\you_get-0.4.1650-py3.8.egg\you_get\common.py", line 1879, in main
  File "C:\Users\Allen\AppData\Local\Programs\Python\Python38\lib\site-packages\you_get-0.4.1650-py3.8.egg\you_get\common.py", line 1771, in script_main
  File "C:\Users\Allen\AppData\Local\Programs\Python\Python38\lib\site-packages\you_get-0.4.1650-py3.8.egg\you_get\common.py", line 1385, in download_main
  File "C:\Users\Allen\AppData\Local\Programs\Python\Python38\lib\site-packages\you_get-0.4.1650-py3.8.egg\you_get\common.py", line 1870, in any_download
  File "C:\Users\Allen\AppData\Local\Programs\Python\Python38\lib\site-packages\you_get-0.4.1650-py3.8.egg\you_get\extractor.py", line 48, in download_by_url
  File "C:\Users\Allen\AppData\Local\Programs\Python\Python38\lib\site-packages\you_get-0.4.1650-py3.8.egg\you_get\extractors\bilibili.py", line 282, in prepare
  File "C:\Users\Allen\AppData\Local\Programs\Python\Python38\lib\site-packages\you_get-0.4.1650-py3.8.egg\you_get\common.py", line 478, in get_content
  File "C:\Users\Allen\AppData\Local\Programs\Python\Python38\lib\site-packages\you_get-0.4.1650-py3.8.egg\you_get\common.py", line 447, in urlopen_with_retry
  File "C:\Users\Allen\AppData\Local\Programs\Python\Python38\lib\site-packages\you_get-0.4.1650-py3.8.egg\you_get\common.py", line 438, in urlopen_with_retry
  File "C:\Users\Allen\AppData\Local\Programs\Python\Python38\lib\urllib\request.py", line 222, in urlopen
    return opener.open(url, data, timeout)
  File "C:\Users\Allen\AppData\Local\Programs\Python\Python38\lib\urllib\request.py", line 531, in open
    response = meth(req, response)
  File "C:\Users\Allen\AppData\Local\Programs\Python\Python38\lib\urllib\request.py", line 640, in http_response
    response = self.parent.error(
  File "C:\Users\Allen\AppData\Local\Programs\Python\Python38\lib\urllib\request.py", line 569, in error
    return self._call_chain(*args)
  File "C:\Users\Allen\AppData\Local\Programs\Python\Python38\lib\urllib\request.py", line 502, in _call_chain
    result = func(*args)
  File "C:\Users\Allen\AppData\Local\Programs\Python\Python38\lib\urllib\request.py", line 649, in http_error_default
    raise HTTPError(req.full_url, code, msg, hdrs, fp)
urllib.error.HTTPError: HTTP Error 429: Too Many Requests<p></p></div></div></div>

**urllib.error.HTTPError: HTTP Error 429: Too Many Requests**

I was speechless.....
I went straight to GitHub to look at the pull requests.

Sure enough, someone had raised exactly this problem just a few hours earlier.

And then I just let it go......

Later, on December 9, [ljhcage](https://github.com/ljhcage) provided a solution:

```diff
diff --git a/src/you_get/extractors/bilibili.py b/src/you_get/extractors/bilibili.py
index 6335e6d..160b713 100644
--- a/src/you_get/extractors/bilibili.py
+++ b/src/you_get/extractors/bilibili.py
@@ -279,10 +279,10 @@ class Bilibili(VideoExtractor):
                         message = api_playinfo['data']['message']
                 if best_quality is None or qn <= best_quality:
                     api_url = self.bilibili_interface_api(cid, qn=qn)
-                    api_content = get_content(api_url, headers=self.bilibili_headers(referer=self.url))
-                    api_playinfo_data = json.loads(api_content)
-                    if api_playinfo_data.get('quality'):
-                        playinfos.append({'code': 0, 'message': '0', 'ttl': 1, 'data': api_playinfo_data})
+                    # api_content = get_content(api_url, headers=self.bilibili_headers(referer=self.url))
+                    # api_playinfo_data = json.loads(api_content)
+                    # if api_playinfo_data.get('quality'):
+                    #     playinfos.append({'code': 0, 'message': '0', 'ttl': 1, 'data': api_playinfo_data})
             if not playinfos:
                 log.w(message)
                 # use bilibili error video instead
```

---

At that point I couldn't sit still any longer, so I fired up VS Code and read through the whole thing.

(Back when I was younger, I wasn't very good at scraping.) Since a tool like you-get existed, I never had to work out for myself how downloading from Bilibili actually works.

I took one look and, wow, how is this so easy to scrape?

Set a User-Agent, request the page, use a regex to pull out the `playinfo` blob, and parse it straight into JSON.

After that you just download the streams and you're done.

Of course, this only applies to ordinary videos. For things like bangumi (anime series), you-get still can't download them (Bilibili changed something).

I still remember the days of downloading Railgun and EVA with you-get.....
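One command and it was done; how comfortable that was....

Below is a simple implementation that uses requests to scrape an ordinary Bilibili video.

For study and reference only...

```python
import json
import re
import subprocess

import requests

url = "https://www.bilibili.com/video/BV1Zi4y1e7dC/?spm_id_from=333.1007.tianma.2-2-5.click&vd_source=cb8e6d5011291db8df766a2f304463e3"
ua = "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_12_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/63.0.3239.84 Safari/537.36"
headers = {
    "Accept": "*/*",
    "Accept-Language": "en-US,en;q=0.5",
    "User-Agent": ua,
    # Assumption: Bilibili's media servers tend to reject requests without a Referer
    "Referer": url,
}

response = requests.get(url=url, headers=headers)
title = re.findall('<h1 title="(.*?)"', response.text)[0]
# Strip characters that are not allowed in Windows file names
title = re.sub(r'[\\/:*?"<>|]', "", title)
print("Video title:", title)

# The stream metadata is embedded in the page as window.__playinfo__
html_data = re.findall("<script>window.__playinfo__=(.*?)</script>", response.text)[0]
json_data = json.loads(html_data)
video_url = json_data["data"]["dash"]["video"][0]["baseUrl"]
print("Video stream URL:", video_url)
audio_url = json_data["data"]["dash"]["audio"][0]["baseUrl"]
print("Audio stream URL:", audio_url)

video_content = requests.get(url=video_url, headers=headers).content
audio_content = requests.get(url=audio_url, headers=headers).content

video_path = title + ".mp4"
audio_path = title + ".mp3"
# Write the video-only stream to disk
with open(video_path, mode="wb") as f:
    f.write(video_content)
# Create the audio file and write the binary data
with open(audio_path, mode="wb") as f:
    f.write(audio_content)


def merge_video_and_audio(video_path, audio_path, output_path):
    # Build the FFmpeg command
    ffmpeg_cmd = [
        "ffmpeg",
        "-i", video_path,  # input video file
        "-i", audio_path,  # input audio file
        "-c:v", "copy",  # copy the video stream without re-encoding
        "-c:a", "aac",  # encode the audio as AAC
        "-strict", "experimental",
        "-map", "0:v:0",  # take the video stream from the first input
        "-map", "1:a:0",  # take the audio stream from the second input
        "-shortest",  # stop when the shorter of the two streams ends
        output_path,  # output file
    ]
    # check=True raises if ffmpeg exits with an error
    subprocess.run(ffmpeg_cmd, check=True)


output_file = "merged-" + title + ".mp4"
merge_video_and_audio(video_path, audio_path, output_file)
print("done")
```

Note that you need to install ffmpeg beforehand; it is used to merge the audio and the video.

After installing it, remember to add it to your PATH environment variable.

Sigh..

Pretty much all the information for an ordinary Bilibili video lives in `playinfo`. I was too young back then and never dug into it.

Now that I have the chance, whenever something comes to mind I naturally try my best to understand it...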
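Since everything hangs off `playinfo`, here is a small offline sketch of just the parsing step, so you can poke at it without hitting Bilibili. The helper names `extract_playinfo` and `pick_best_stream` are made up for illustration, the sample page is fake, and I'm assuming each DASH entry carries a `bandwidth` field next to `baseUrl`; treat it as a rough sketch, not as how you-get does it.

```python
import json
import re

# Same inline-script pattern as the scraper; re.S lets the JSON span multiple lines.
PLAYINFO_RE = re.compile(r"<script>window\.__playinfo__=(.*?)</script>", re.S)


def extract_playinfo(html):
    """Return the parsed window.__playinfo__ object, or None if the page has none."""
    match = PLAYINFO_RE.search(html)
    if match is None:
        return None
    return json.loads(match.group(1))


def pick_best_stream(streams):
    """Pick the entry with the highest 'bandwidth' (assumed field; missing counts as 0)."""
    return max(streams, key=lambda s: s.get("bandwidth", 0))


# A tiny fake page, only to show the shape; real pages carry many more fields.
sample_html = (
    '<html><script>window.__playinfo__={"data": {"dash": {'
    '"video": [{"id": 32, "bandwidth": 300000, "baseUrl": "https://example.invalid/v32.m4s"},'
    ' {"id": 80, "bandwidth": 1500000, "baseUrl": "https://example.invalid/v80.m4s"}],'
    ' "audio": [{"id": 30280, "bandwidth": 130000, "baseUrl": "https://example.invalid/a.m4s"}]'
    "}}}</script></html>"
)

playinfo = extract_playinfo(sample_html)
dash = playinfo["data"]["dash"]
best_video = pick_best_stream(dash["video"])
best_audio = pick_best_stream(dash["audio"])
print(best_video["id"], best_video["baseUrl"])
print(best_audio["id"], best_audio["baseUrl"])
```

Returning `None` when the script tag is missing is handy for pages like bangumi, where the `[0]` indexing on `re.findall` would otherwise just blow up with an `IndexError`.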
The copies of EVA and Railgun that I grabbed with you-get are still sitting on my USB drive.

As for you-get, it feels like nobody maintains it anymore. Whenever a problem comes up, it always takes some unsung hero to step forward.

This time, that hero is [ljhcage](https://github.com/ljhcage).

That's open source for you.

As long as I can freeload, I'm happy.

Last modification: December 30, 2023
One comment
I want to be a boy~
A boy who just calls all kinds of tools~