搭建代理池

loskyertt Unknown

1.开源项目地址

jhao104/proxy_pool

2.搭建教程

2.1 通过 docker 搭建

默认情况下,Docker 容器彼此无法通过localhost访问,需要通过容器的网络名称(桥接网络:bridge)进行通信。因此可以将proxy_poolredis放在同一个 Docker 网络中。

创建网络:

1
docker network create mynetwork

运行 Redis 容器并加入网络:

1
docker run -d --name redis --network=mynetwork -p 6379:6379 redis:latest

运行proxy_pool容器并连接 Redis:

1
docker run -d --name proxy_pool --network=mynetwork --env DB_CONN=redis://redis:6379/0 -p 5010:5010 jhao104/proxy_pool:latest

查看日志信息:

1
docker logs proxy_pool

这是很就能看到获得的代理地址和端口。如下面所示:

1
2
3
4
5
6
7
8
9
2024-10-17 21:03:11,533 fetch.py[line:39] INFO ProxyFetch - freeProxy11: 39.175.75.144:30001     ok
2024-10-17 21:03:11,533 fetch.py[line:39] INFO ProxyFetch - freeProxy11: 8.221.141.88:8443 ok
2024-10-17 21:03:11,533 fetch.py[line:39] INFO ProxyFetch - freeProxy11: 47.109.83.196:8080 ok
2024-10-17 21:03:11,533 fetch.py[line:39] INFO ProxyFetch - freeProxy11: 185.49.31.205:8080 ok
2024-10-17 21:03:11,533 fetch.py[line:39] INFO ProxyFetch - freeProxy11: 72.10.160.171:11319 ok
2024-10-17 21:03:11,533 fetch.py[line:39] INFO ProxyFetch - freeProxy11: 102.213.146.206:8080 ok
2024-10-17 21:03:11,533 fetch.py[line:39] INFO ProxyFetch - freeProxy11: 79.110.202.131:8081 ok
2024-10-17 21:03:11,533 fetch.py[line:39] INFO ProxyFetch - freeProxy11: 178.130.40.231:8080 ok
2024-10-17 21:03:11,533 fetch.py[line:39] INFO ProxyFetch - freeProxy11: 188.166.56.246:80 ok

3.使用教程

代码示例:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
import requests

def get_proxy():
return requests.get("http://127.0.0.1:5010/get/").json()

def delete_proxy(proxy):
requests.get("http://127.0.0.1:5010/delete/?proxy={}".format(proxy))

# your spider code

def getHtml(url, headers):
retry_count = 15
while retry_count > 0:
proxy = get_proxy().get("proxy")
proxies = {
"http": "http://" + proxy,
"https": "http://" + proxy
}
print(get_proxy().get("proxy"))
try:
# 使用代理访问
html = requests.get(url=url, headers=headers, proxies=proxies, timeout=5)
return html.text
except Exception:
retry_count -= 1
# 删除代理池中代理
delete_proxy(proxy)
return None

url = "https://wuhan.anjuke.com/sale/p1/?from=esf_list"
headers = {
"User-Agent": "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/130.0.0.0 Safari/537.36",
"Cookie": "your-cookie"
}

html_data = getHtml(url, headers)
print(html_data)

getHtml函数中,会通过while循环来切换代理ipport,直到请求成功,然后返回出结果并退出循环。

  • Title: 搭建代理池
  • Author: loskyertt
  • Created at : 2024-10-17 20:53:49
  • Updated at : 2024-11-13 03:07:38
  • Link: https://redefine.ohevan.com/2024/10/17/搭建代理池/
  • License: This work is licensed under CC BY-NC-SA 4.0.
Comments