Python Web Scraping Tutorial 09

loskyertt

1. Code example

import requests
from lxml import etree

url = "https://www.spiderbuf.cn/playground/s08"

headers = {
    "User-Agent": "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/130.0.0.0 Safari/537.36",
}
# Fetch the page with a plain GET request
html = requests.get(url=url, headers=headers).text

# Save the raw HTML so it can be inspected later
with open('./课程/08course/08.html', 'w', encoding='utf-8') as f:
    f.write(html)

root = etree.HTML(html)
trs = root.xpath('//tr')

# Write each table row as one line, with cells separated by '|'
with open('./课程/08course/data08.txt', 'w', encoding='utf-8') as f:
    for tr in trs:
        tds = tr.xpath('./td')
        s = ''
        for td in tds:
            s = s + str(td.xpath('string(.)')) + '|'
        print(s)
        if s != '':
            f.write(s + '\n')

Running this code as-is parses no data at all, and the saved HTML shows that the page we fetched is not the page we wanted.
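One quick way to confirm that a response really lacks the table is to count its `<td>` cells. The sketch below is only an illustration: `count_cells` and both sample HTML strings are hypothetical stand-ins, not the actual responses from the site, and it uses the standard library's `html.parser` so that it runs without network access or lxml. With lxml, the same check would be `len(root.xpath('//td'))`.

```python
from html.parser import HTMLParser


class CellCounter(HTMLParser):
    """Counts the <td> start tags seen while parsing."""

    def __init__(self):
        super().__init__()
        self.td_count = 0

    def handle_starttag(self, tag, attrs):
        # HTMLParser passes tag names in lowercase
        if tag == "td":
            self.td_count += 1


def count_cells(html: str) -> int:
    """Return the number of <td> cells in an HTML string."""
    parser = CellCounter()
    parser.feed(html)
    parser.close()
    return parser.td_count


# Hypothetical samples: one page with a data table, one without
page_with_table = "<table><tr><td>a</td><td>b</td></tr></table>"
page_without_table = "<html><body><p>No table here</p></body></html>"

print(count_cells(page_with_table))
print(count_cells(page_without_table))
```

A count of zero for the saved `08.html` would explain why the row loop writes nothing.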

2. Page analysis

Open the browser's developer console and select the Network tab.

[Image: 控制台.png (browser console)]

The Network tab shows that the request method is POST, so the code has to send a POST request as well:

import requests
from lxml import etree

url = "https://www.spiderbuf.cn/playground/s08"

headers = {
    "User-Agent": "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/130.0.0.0 Safari/537.36",
}
# Data sent in the body of the POST request
payload = {'level': '8'}
# Send the POST request
html = requests.post(url=url, headers=headers, data=payload).text

# Save the raw HTML so it can be inspected later
with open('./课程/08course/08.html', 'w', encoding='utf-8') as f:
    f.write(html)

root = etree.HTML(html)
trs = root.xpath('//tr')

# Write each table row as one line, with cells separated by '|'
with open('./课程/08course/data08.txt', 'w', encoding='utf-8') as f:
    for tr in trs:
        tds = tr.xpath('./td')
        s = ''
        for td in tds:
            s = s + str(td.xpath('string(.)')) + '|'
        print(s)
        if s != '':
            f.write(s + '\n')

The payload can be found here in the console:

[Image: payload.png]

For further reading, see "More complicated POST requests" in the Requests documentation.
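To make the `data=payload` argument more concrete, here is a minimal sketch of what the request body looks like. It relies on the documented behaviour that requests form-encodes a dictionary passed as `data`; the snippet uses the standard library's `urlencode` as a stand-in for that step rather than calling requests itself. The JSON variant is an assumption added only for contrast, since the tutorial's code uses the form-encoded version.

```python
import json
from urllib.parse import urlencode

payload = {'level': '8'}

# Form encoding: the body produced for a dict passed as data=
form_body = urlencode(payload)

# JSON encoding: how the same dict would look as a JSON body instead
# (shown for comparison only; not what the tutorial's code sends)
json_body = json.dumps(payload)

print(form_body)  # level=8
print(json_body)  # {"level": "8"}
```

Comparing these strings with the payload shown in the browser's Network tab is a simple way to decide which encoding a page expects.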

  • Title: Python Web Scraping Tutorial 09
  • Author: loskyertt
  • Created at : 2024-10-21 14:35:56
  • Updated at : 2024-11-13 03:07:38
  • Link: https://redefine.ohevan.com/2024/10/21/09Python爬虫/
  • License: This work is licensed under CC BY-NC-SA 4.0.