
App Packet Capture

Environment Setup

Installing an Android emulator



Download the Nox emulator (NoxPlayer) from its official website and install it.

Installing the capture tools

Installing Appium

https://github.com/appium/appium-desktop/releases/tag/v1.11.0

Installing mitmproxy

Download the installer and click Next through the wizard to install it:
https://github.com/mitmproxy/mitmproxy/releases/

After installation, add it to the PATH environment variable.

Alternatively, install it directly with pip install mitmproxy.

Installing the certificate

Run mitmdump in cmd. It starts and listens on port 8080:

```
C:\Users\IIce>mitmdump
Proxy server listening at http://*:8080
```

Open the emulator and configure its proxy.

First, look up the PC's IP address:

```
Ethernet adapter Ethernet:

   Connection-specific DNS Suffix  . : North-Class.com
   Link-local IPv6 Address . . . . . : fe80::68d7:38a8:2729:4d97%6
   IPv4 Address. . . . . . . . . . . : 192.168.100.243
   Subnet Mask . . . . . . . . . . . : 255.255.255.0
   Default Gateway . . . . . . . . . : 192.168.100.250
```

Once the proxy is configured, open the browser and visit baidu.com to check it.
A certificate warning appears at this point. Click Continue.

Visit mitm.it.
Choose the certificate that matches your platform and install it.

After that, websites no longer show certificate warnings.

Installing Docker

Pick one of the two options below, depending on your system.

docker-toolbox

https://docs.docker.com/toolbox/toolbox_install_windows/
Download Docker Toolbox and double-click the installer to install it.

If the installer fails near the end with
IPersistFile:Save failed, code 0x80070005 Access denied
check your antivirus software.

After a successful installation, three shortcut icons appear.

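Double-click the Docker Quickstart Terminal icon to start a terminal.
It downloads a boot2docker.iso file. If the download is slow, copy the link and download the file yourself.
When the download finishes, copy the file into the target directory.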

If you see Unable to start the VM: C:\Program Files\Oracle\VirtualBox\VBoxManage.exe startvm default --type headless failed:, uninstall Oracle VM VirtualBox and install the latest version:
https://www.virtualbox.org/wiki/Downloads

When setup finishes, run docker run hello-world to verify the installation.

Docker for Windows

https://docs.docker.com/docker-for-windows/install/

Download the installer and double-click it to install.

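If the installation stalls, check your antivirus software. The installer modifies the registry, startup items, and similar settings, which antivirus tools may block.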

If Docker fails to start with this error:
"Hardware assisted virtualization and data execution protection must be enabled in the BIOS"

you need to enable virtualization and Hyper-V.

If Docker still fails to start with both enabled, see
https://www.e-learn.cn/content/wangluowenzhang/589447

Errors that can occur when both options are installed side by side:
https://blog.csdn.net/qq_35852248/article/details/80925154

Configure a registry mirror (accelerator):
https://cr.console.aliyun.com/cn-hangzhou/instances/mirrors

Fiddler setup

Phone connection settings
Look up the PC's IP address:

```
...

Ethernet adapter Ethernet:

   Connection-specific DNS Suffix  . : North-Class.com
   Link-local IPv6 Address . . . . . : fe80::f44c:fb33:30bf:5c57%18
   IPv4 Address. . . . . . . . . . . : 192.168.100.248
   Subnet Mask . . . . . . . . . . . : 255.255.255.0
   Default Gateway . . . . . . . . . : 192.168.100.250
...
```

Set the phone's proxy, using the PC's IPv4 address as the proxy hostname.

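Then open http://<PC IP>:<port> in the phone's browser to download and install Fiddler's root certificate. Fiddler listens on port 8888 by default.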

If apps lose network access while the capture tool is running, see:

http://www.imooc.com/article/251500

If Fiddler cannot capture traffic, see:

https://testerhome.com/topics/11462?from=singlemessage

Scraping Douguo Meishi recipes

Download and install Douguo Meishi in the emulator.
Set the proxy to prepare for capturing traffic.
Open Fiddler and Douguo Meishi.

Tap the recipe categories.

Tap a tag to open its detail page.

Analyzing the captured traffic shows that http://api.douguo.net/recipe/flatcatalogs returns the recipe categories.

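http://api.douguo.net/recipe/v2/search/0/20 returns the detail list.
The "best overall", "most favorited", and "most cooked" tabs all use this same URL. Only the submitted parameters differ: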

```python
# 0: best overall   2: most favorited   3: most cooked
"order": "0",
```
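The three sort tabs differ only in `order`, so one small helper can build a search payload for each sort mode. This is a minimal sketch. It assumes the form fields shown in the captured request (`client`, `keyword`, `order`, `_vs`). `build_search_payloads` and `ORDER_MODES` are hypothetical names.

```python
# Sort modes observed in the captured traffic: order value -> tab name
ORDER_MODES = {"0": "best overall", "2": "most favorited", "3": "most cooked"}


def build_search_payloads(keyword):
    """Return one search form payload per sort mode.

    Hypothetical helper: the field values are copied from the captured
    request, and only `order` varies between payloads.
    """
    return [
        {"client": "4", "keyword": keyword, "order": order, "_vs": "400"}
        for order in ORDER_MODES
    ]


for payload in build_search_payloads("pork belly"):
    print(payload["order"], ORDER_MODES[payload["order"]])
```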

Implementation

Request headers

First, extract the headers that every request shares. The commented-out headers do not need to be sent.

```python
import requests


def handle_reques(url, data):
    # Headers shared by every request, copied from the captured traffic;
    # the commented-out ones turned out to be optional
    header = {
        "client": "4",
        "version": "6934.2",
        "device": "OPPO R11",
        "sdk": "22,5.1.1",
        "imei": "866174010942858",
        "channel": "baidu",
        # "mac": " 6A:07:15:F0:34:85",
        "resolution": "1280*720",
        "dpi": "1.5",
        # "android-id": "6a0715f034851883",
        # "pseudo - id": "5f0348518836a071",
        "brand": "OPPO",
        "scale": "1.5",
        "timezone": "28800",
        "language": "zh",
        "cns": "3",
        "carrier": "CHINA+MOBILE",
        # "imsi": "460071060715240",
        "user-agent": "Mozilla/5.0 (Linux; Android 5.1.1; OPPO R11 Build/NMF26X) AppleWebKit/537.36 (KHTML, like Gecko) Version/4.0 Chrome/39.0.0.0 Mobile Safari/537.36",
        "reach": "1",
        "newbie": "1",
        "Content-Type": "application/x-www-form-urlencoded; charset=utf-8",
        "Accept-Encoding": "gzip, deflate",
        "Connection": "Keep-Alive",
        # "Cookie": "duid=59159842",
        "Host": "api.douguo.net",
        # "Content-Length": "74",  # requests computes this from the body
    }

    # POST the form data; the timeout keeps a stalled request from hanging a worker
    response = requests.post(url=url, headers=header, data=data, timeout=10)
    return response
```

Recipe categories

```python
import json
from queue import Queue  # thread-safe queue; the workers later on are threads, not processes

queue_list = Queue()


def handle_index():
    url = "http://api.douguo.net/recipe/flatcatalogs"
    data = {
        "client": "4",
        # "_session": "1552715432169866174010942858",
        # "v": "1503650468",
        # "_vs": "0",  # both 0 and 2305 work
        "_vs": "2305",
    }

    response = handle_reques(url=url, data=data)
    # print(response.text)
    response_to_dict = json.loads(response.text)

    # Categories are nested three levels deep; the innermost names
    # become the search keywords
    for item in response_to_dict['result']['cs']:
        for item_1 in item['cs']:
            for item_2 in item_1['cs']:
                data_2 = {
                    "client": "4",
                    # "_session": "1552715831226866174010942858",
                    "keyword": item_2['name'],
                    # 0: best overall   2: most favorited   3: most cooked
                    "order": "0",
                    "_vs": "400",
                }
                queue_list.put(data_2)
```
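The three nested loops assume the category tree is exactly three levels deep. A recursive walk collects the leaf names at any depth. This sketch assumes that every node carries a `name` and an optional `cs` list of children, like the `flatcatalogs` response. The sample data is made up, and `iter_leaf_names` is a hypothetical name.

```python
def iter_leaf_names(nodes):
    """Yield the `name` of every node that has no `cs` children."""
    for node in nodes:
        children = node.get("cs") or []
        if children:
            # Descend into the sub-categories
            yield from iter_leaf_names(children)
        else:
            yield node["name"]


# Made-up sample shaped like response_to_dict['result']['cs']
sample = [
    {"name": "Ingredients", "cs": [
        {"name": "Meat", "cs": [{"name": "pork belly"}, {"name": "beef"}]},
        {"name": "Vegetables", "cs": [{"name": "tomato"}]},
    ]},
]

print(list(iter_leaf_names(sample)))  # ['pork belly', 'beef', 'tomato']
```

The fixed loops always take level-three nodes. The recursive version keeps descending if a level-three node has its own children.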

Details

```python
import json


def handle_caipu_list(data):
    print("Processing:", data['keyword'])
    # The search URL has the form /search/{offset}/{page_size}; start at offset 0
    caipu_list_url = 'http://api.douguo.net/recipe/v2/search/0/20'
    caipu_list_response = handle_reques(url=caipu_list_url, data=data)
    response_to_dict = json.loads(caipu_list_response.text)
    handle_caipu_detail(data, response_to_dict)

    # Page forward 20 results at a time until the API reports end != 0
    count = 0
    while response_to_dict['result']['end'] == 0:
        count += 1
        caipu_list_url = 'http://api.douguo.net/recipe/v2/search/{}/20'.format(count * 20)
        caipu_list_response = handle_reques(url=caipu_list_url, data=data)
        response_to_dict = json.loads(caipu_list_response.text)
        handle_caipu_detail(data, response_to_dict)
```
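The paging logic works like this. Request offset 0, then keep adding the page size until the response's `end` flag is non-zero. The sketch below runs that loop without the network. `fetch_page` is a hypothetical stand-in for `handle_reques` plus `json.loads`, so the loop can run against fake data.

```python
SEARCH_URL = "http://api.douguo.net/recipe/v2/search/{offset}/{size}"


def iter_search_pages(fetch_page, page_size=20):
    """Yield decoded result pages until the API reports end != 0.

    fetch_page(url) must return the decoded JSON dict for that URL.
    """
    offset = 0
    while True:
        page = fetch_page(SEARCH_URL.format(offset=offset, size=page_size))
        yield page
        if page["result"]["end"] != 0:
            break
        offset += page_size


# Fake fetcher: three pages, the last one flagged with end = 1
def fake_fetch(url):
    offset = int(url.split("/")[-2])
    return {"result": {"end": 1 if offset >= 40 else 0, "url": url}}


urls = [p["result"]["url"] for p in iter_search_pages(fake_fetch)]
print(urls[-1])  # http://api.douguo.net/recipe/v2/search/40/20
```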

Cooking steps

```python
import json


def handle_caipu_detail(data, response_to_dict):
    for item in response_to_dict['result']['list']:
        # Only entries of type 13 are recipes; skip everything else
        if item['type'] != 13:
            continue

        caipu_info = {}
        caipu_info['shicai'] = data['keyword']
        caipu_info['author'] = item['r']['an']
        caipu_info['shicai_id'] = item['r']['id']  # needed to fetch the detailed steps
        caipu_info['describe'] = item['r']['cookstory']
        caipu_info['caipu_name'] = item['r']['n']
        caipu_info['zuoliao_list'] = item['r']['major']

        detail_url = 'http://api.douguo.net/recipe/detail/' + str(caipu_info['shicai_id'])
        detail_data = {
            "client": "4",
            # "_session": "1552715831226866174010942858",
            "author_id": "0",
            "_vs": "2803",
            # Build _ext with json.dumps so the keyword is quoted and escaped;
            # plain string concatenation left it unquoted (invalid JSON)
            "_ext": json.dumps(
                {"query": {"kw": data["keyword"], "src": "2803", "type": "13",
                           "id": caipu_info["shicai_id"]}},
                ensure_ascii=False, separators=(',', ':')),
        }

        detail_response = handle_reques(url=detail_url, data=detail_data)
        # print(detail_response.text)
        detail_response_to_dict = json.loads(detail_response.text)

        caipu_info['tips'] = detail_response_to_dict['result']['recipe']['tips']
        caipu_info['cook_step'] = detail_response_to_dict['result']['recipe']['cookstep']

        print('Saving:', caipu_info['caipu_name'])
        mongo_info.insert_item(caipu_info)
```

Saving to the database

```python
import pymongo


class Connect_Mongo:
    def __init__(self):
        # With no arguments, MongoClient connects to localhost:27017
        self.client = pymongo.MongoClient()
        self.db_data = self.client['dou_guo_mei_shi']

    def insert_item(self, item):
        # Insert one recipe document into the mei_shi collection
        self.db_data['mei_shi'].insert_one(item)


mongo_info = Connect_Mongo()
```

Multithreaded run

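```python
from concurrent.futures import ThreadPoolExecutor

if __name__ == '__main__':
    handle_index()
    # print(queue_list.qsize())
    # handle_caipu_list(queue_list.get())  # single-threaded test run
    # The with-block waits for every submitted task before exiting
    with ThreadPoolExecutor() as pool:
        while queue_list.qsize() > 0:
            pool.submit(handle_caipu_list, queue_list.get())
```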
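An exception inside a task passed to `pool.submit` is stored on its future and never printed unless the future is inspected. Collecting the futures and calling `result()` makes such errors visible. Below is a sketch with a dummy worker standing in for `handle_caipu_list`. `drain_with_pool` is a hypothetical helper, and `max_workers=4` is an arbitrary choice.

```python
from concurrent.futures import ThreadPoolExecutor
from queue import Queue


def drain_with_pool(task_queue, worker, max_workers=4):
    """Submit every queued item to `worker` and return the results in order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = []
        while not task_queue.empty():
            futures.append(pool.submit(worker, task_queue.get()))
        # result() re-raises any exception that happened inside the worker
        return [f.result() for f in futures]


q = Queue()
for kw in ["egg", "tofu", "beef"]:
    q.put({"keyword": kw})

print(drain_with_pool(q, lambda data: data["keyword"].upper()))  # ['EGG', 'TOFU', 'BEEF']
```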

Installing the Android SDK

http://tools.android-studio.org/index.php/sdk
Download and install it.

Configure the environment variables

Variables:
ANDROID_HOME (new): G:\Program Files (x86)\Android\android-sdk
Path (append): %ANDROID_HOME%\tools
Path (append): %ANDROID_HOME%\platform-tools
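To confirm the variables took effect, check two things: `ANDROID_HOME` points at a directory that contains `platform-tools`, and `adb` resolves on `PATH`. The sketch below uses only the standard library. `check_android_env` is a hypothetical name.

```python
import os
import shutil


def check_android_env(environ=None):
    """Return a list of problems with the Android SDK setup (empty if OK)."""
    environ = os.environ if environ is None else environ
    problems = []
    sdk = environ.get("ANDROID_HOME")
    if not sdk:
        problems.append("ANDROID_HOME is not set")
    elif not os.path.isdir(os.path.join(sdk, "platform-tools")):
        problems.append("platform-tools not found under ANDROID_HOME")
    # shutil.which searches only the directories listed in `path`
    if shutil.which("adb", path=environ.get("PATH", "")) is None:
        problems.append("adb is not on PATH")
    return problems


print(check_android_env() or "Android SDK environment looks OK")
```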

Run SDK Manager.exe.

Selecting only the latest Android version is enough, since it is compatible with older versions.

After installation, open cmd and run adb. The adb version information appears:

```
Android Debug Bridge version 1.0.40
Version 28.0.2-5303910
Installed as G:\Program Files (x86)\Android\android-sdk\platform-tools\adb.exe

global options:
 -a         listen on all network interfaces, not just localhost
 -d         use USB device (error if multiple devices connected)
 -e         use TCP/IP device (error if multiple TCP/IP devices available)
 -s SERIAL  use device with given serial (overrides $ANDROID_SERIAL)
 -t ID      use device with given transport id
 -H         name of adb server host [default=localhost]
 -P         port of adb server [default=5037]
 -L SOCKET  listen on given socket for adb server [default=tcp:localhost:5037]
```

Upgrading the Nox emulator's adb

Copy the three adb files from android-sdk\platform-tools into the emulator's installation directory.

Make a copy of adb.exe and use it to overwrite the original nox_adb.exe.
Enable the emulator's developer options.
Restart the emulator and open cmd:

```
C:\Users\lenovo>adb devices
List of devices attached
127.0.0.1:52001 device
```

The emulator is now connected.

uiautomatorviewer

File location: D:\Program Files (x86)\Android\android-sdk\tools\uiautomatorviewer.bat

Double-click it to run it. Minimize the black console window, but do not close it.

Click the button that takes a device screenshot. You can then inspect each element's information with the mouse.

appium

Startup parameters (desired capabilities), see:
http://www.testclass.net/appium

```
{
    "platformName": "Android",
    "deviceName": "127.0.0.1:52001",
    "platformVersion": "5.1.1",
    "appPackage": "com.tal.kaoyan",
    "appActivity": "com.tal.kaoyan.ui.activity.SplashActivity"
}
```

Getting appPackage and appActivity
Run aapt.exe dump badging on the APK:

```
D:\Program Files (x86)\Android\android-sdk\build-tools\28.0.3>aapt.exe dump badging F:\BrowserDownload\kaoyanbang_3.3.7beta.243.apk

package: name='com.tal.kaoyan' versionCode='92' versionName='3.3.7beta' compileSdkVersion='28' compileSdkVersionCodename='9'
sdkVersion:'16'
...

launchable-activity: name='com.tal.kaoyan.ui.activity.SplashActivity' label='' icon=''

...
```
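The two values can also be pulled out of the `aapt dump badging` text programmatically, which helps when testing several APKs. This sketch assumes the output format shown above (a `package: name='...'` line and a `launchable-activity: name='...'` line). `caps_from_badging` is a hypothetical helper. The remaining capabilities are copied from the config above.

```python
import re


def caps_from_badging(badging_text, device_name="127.0.0.1:52001", platform_version="5.1.1"):
    """Build Appium desired capabilities from `aapt dump badging` output."""
    package = re.search(r"^package: name='([^']+)'", badging_text, re.M)
    activity = re.search(r"^launchable-activity: name='([^']+)'", badging_text, re.M)
    if not package or not activity:
        raise ValueError("package or launchable-activity not found")
    return {
        "platformName": "Android",
        "deviceName": device_name,
        "platformVersion": platform_version,
        "appPackage": package.group(1),
        "appActivity": activity.group(1),
    }


# Excerpt of the aapt output shown above
sample = """package: name='com.tal.kaoyan' versionCode='92' versionName='3.3.7beta'
sdkVersion:'16'
launchable-activity: name='com.tal.kaoyan.ui.activity.SplashActivity' label='' icon=''
"""
print(caps_from_badging(sample)["appActivity"])  # com.tal.kaoyan.ui.activity.SplashActivity
```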

Testing with Kaoyanbang

Install the Python client:

```
pip install Appium-Python-Client
```

Use these capabilities. noReset keeps the app's data, such as the login state, between sessions:

```
{
    "platformName": "Android",
    "deviceName": "127.0.0.1:52001",
    "platformVersion": "5.1.1",
    "appPackage": "com.tal.kaoyan",
    "appActivity": "com.tal.kaoyan.ui.activity.SplashActivity",
    "noReset": true
}
```
```python
import time

from appium import webdriver
from selenium.common.exceptions import NoSuchElementException, TimeoutException
from selenium.webdriver.support.ui import WebDriverWait

cap = {
    "platformName": "Android",
    "deviceName": "127.0.0.1:52001",
    "platformVersion": "5.1.1",
    "appPackage": "com.tal.kaoyan",
    "appActivity": "com.tal.kaoyan.ui.activity.SplashActivity",
    "noReset": True
}

name = ""  # account
pwd = ""   # password

driver = webdriver.Remote("http://localhost:4723/wd/hub", cap)


def get_size():
    size = driver.get_window_size()
    return size['width'], size['height']


try:
    # "Skip" button on the splash screen
    if WebDriverWait(driver, 3).until(
            lambda x: x.find_element_by_xpath("//android.widget.TextView[@resource-id='com.tal.kaoyan:id/tv_skip']")):
        driver.find_element_by_xpath("//android.widget.TextView[@resource-id='com.tal.kaoyan:id/tv_skip']").click()
except (TimeoutException, NoSuchElementException):
    pass

try:
    # Login form, only shown when not logged in
    if WebDriverWait(driver, 3).until(lambda x: x.find_element_by_xpath(
            "//android.widget.EditText[@resource-id='com.tal.kaoyan:id/login_email_edittext']")):
        driver.find_element_by_xpath(
            "//android.widget.EditText[@resource-id='com.tal.kaoyan:id/login_email_edittext']").send_keys(name)
        driver.find_element_by_xpath(
            "//android.widget.EditText[@resource-id='com.tal.kaoyan:id/login_password_edittext']").send_keys(pwd)
        driver.find_element_by_xpath(
            "//android.widget.Button[@resource-id='com.tal.kaoyan:id/login_login_btn']").click()
except (TimeoutException, NoSuchElementException):
    pass

try:
    # Privacy agreement dialog
    if WebDriverWait(driver, 3).until(
            lambda x: x.find_element_by_xpath("//android.widget.TextView[@resource-id='com.tal.kaoyan:id/tv_title']")):
        driver.find_element_by_xpath("//android.widget.TextView[@resource-id='com.tal.kaoyan:id/tv_agree']").click()
        driver.find_element_by_xpath(
            "//android.support.v7.widget.RecyclerView[@resource-id='com.tal.kaoyan:id/date_fix']/android.widget.RelativeLayout[3]").click()
except (TimeoutException, NoSuchElementException):
    pass

# Tap the Yanxun (news) tab
YANXUN_TAB = ("//android.support.v7.widget.RecyclerView[@resource-id='com.tal.kaoyan:id/date_fix']"
              "/android.widget.RelativeLayout[3]/android.widget.LinearLayout[1]/android.widget.ImageView[1]")
if WebDriverWait(driver, 3).until(lambda x: x.find_element_by_xpath(YANXUN_TAB)):
    driver.find_element_by_xpath(YANXUN_TAB).click()

l = get_size()

x1 = int(l[0] * 0.5)
y1 = int(l[1] * 0.75)
y2 = int(l[1] * 0.25)

# Swipe up from 75% to 25% of the screen height, forever (stop with Ctrl+C)
while True:
    driver.swipe(x1, y1, x1, y2)
    time.sleep(0.5)
```
Overall, the workflow is much the same as with Selenium.
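The swipe in the loop runs from 75% to 25% of the screen height along the vertical centre line, so a finger drags upwards and the list scrolls down. Moving that arithmetic into a pure function lets you check it without a device. `swipe_up_coords` is a hypothetical name, and 720x1280 is just an example resolution.

```python
def swipe_up_coords(width, height, start=0.75, end=0.25):
    """Return (x1, y1, x2, y2) for an upward swipe along the vertical centre line."""
    x = int(width * 0.5)
    return x, int(height * start), x, int(height * end)


# Example for a 720x1280 screen; pass the tuple as driver.swipe(*coords)
coords = swipe_up_coords(720, 1280)
print(coords)  # (360, 960, 360, 320)
```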

Scraping Douyin followers

First, find a profile share link:
https://www.douyin.com/share/user/96578108671

Open it in a browser. The numbers on the page are obfuscated with a custom font.
Font file: https://s3.bytecdn.cn/ies/resource/falcon/douyin_falcon/static/font/iconfont_9eb9a50.woff

Online font viewer: http://fontstore.baidu.com/static/editor/index.html
Upload the downloaded font file to the site to see which glyph stands for which digit.
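Once the font viewer shows which glyph draws which digit, decoding is a plain character substitution. The sketch below uses `str.translate`. The code points in it (U+E602 and so on) are made-up placeholders, not the real Douyin mapping, which you have to read from the downloaded font. If the page source holds the glyphs as HTML character references such as `&#xe602;`, `html.unescape` turns them into characters first.

```python
import html

# Hypothetical mapping: private-use code point -> digit it renders as.
# Replace with the pairs read off the real font in the font viewer.
GLYPH_TO_DIGIT = {
    0xE602: "1",
    0xE603: "0",
    0xE604: "3",
}
TRANSLATION = str.maketrans({chr(cp): digit for cp, digit in GLYPH_TO_DIGIT.items()})


def decode_digits(raw):
    """Unescape HTML character references, then replace each mapped glyph with its digit."""
    return html.unescape(raw).translate(TRANSLATION)


print(decode_digits("&#xe602;&#xe603;&#xe604;"))  # 103
```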

Scraping the share page

```python
import re

import requests
from lxml import etree

# from douyin.handle_mongo import get_task  # project-local helper, only needed by the commented-out call below


def handle_decode(input_data, share_web_url, task):
    search_douyin_str = re.compile('抖音ID:')  # the literal "Douyin ID:" label on the page
    # Each digit is drawn by three different icon-font glyphs. The glyph
    # strings in 'name' did not survive extraction of the original post;
    # fill them in from the font mapping before running.
    regex_list = [
        {'name': ['  ', '  ', '  '], 'value': 0},
        {'name': ['  ', '  ', '  '], 'value': 1},
        {'name': ['  ', '  ', '  '], 'value': 2},
        {'name': ['  ', '  ', '  '], 'value': 3},
        {'name': ['  ', '  ', '  '], 'value': 4},
        {'name': ['  ', '  ', '  '], 'value': 5},
        {'name': ['  ', '  ', '  '], 'value': 6},
        {'name': ['  ', '  ', '  '], 'value': 7},
        {'name': ['  ', '  ', '  '], 'value': 8},
        {'name': ['  ', '  ', '  '], 'value': 9},
    ]

    # Replace every glyph in the raw HTML with the digit it renders as
    for i1 in regex_list:
        for i2 in i1['name']:
            input_data = re.sub(i2, str(i1['value']), input_data)
    share_web_html = etree.HTML(input_data)

    douyin_info = {}
    douyin_info['nick_name'] = \
        share_web_html.xpath("//div[@class='personal-card']/div[@class='info1']//p[@class='nickname']/text()")[0]
    if 'douyin_id' in task:
        douyin_info['douyin_id'] = task['douyin_id']
    else:
        # The ID is either drawn with glyphs (<i> elements) or shown as plain text
        douyin_id = ''.join(
            share_web_html.xpath("//div[@class='personal-card']/div[@class='info1']/p[@class='shortid']/i/text()"))
        if douyin_id == '':
            try:
                douyin_info['douyin_id'] = re.sub(search_douyin_str, '', share_web_html.xpath(
                    "//div[@class='personal-card']/div[@class='info1']/p[@class='shortid']/text()")[0]).strip()
            except IndexError:
                douyin_info['douyin_id'] = 'no data'
        else:
            douyin_info['douyin_id'] = douyin_id

    try:
        # Verification label (e.g. occupation); not every account has one
        douyin_info['job'] = share_web_html.xpath(
            "//div[@class='personal-card']/div[@class='info2']/div[@class='verify-info']/span[@class='info']/text()")[
            0].strip()
    except IndexError:
        pass
    douyin_info['describe'] = \
        share_web_html.xpath("//div[@class='personal-card']/div[@class='info2']/p[@class='signature']/text()")[0].replace(
            '\n', ',')
    douyin_info['location'] = \
        share_web_html.xpath("//div[@class='personal-card']/div[@class='info2']/p[@class='extra-info']/span[1]/text()")
    douyin_info['xingzuo'] = \
        share_web_html.xpath("//div[@class='personal-card']/div[@class='info2']/p[@class='extra-info']/span[2]/text()")  # zodiac sign
    douyin_info['follow_count'] = share_web_html.xpath(
        "//div[@class='personal-card']/div[@class='info2']/p[@class='follow-info']//span[@class='focus block']//i[@class='icon iconfont follow-num']/text()")[
        0].strip()

    # Joining the digit glyphs apparently drops the decimal point ("5046.8w"
    # is read as "50468"), hence the division by 10 when the unit is w (10,000)
    fans_value = ''.join(share_web_html.xpath(
        "//div[@class='personal-card']/div[@class='info2']/p[@class='follow-info']//span[@class='follower block']//i[@class='icon iconfont follow-num']/text()"))
    unit = share_web_html.xpath(
        "//div[@class='personal-card']/div[@class='info2']/p[@class='follow-info']//span[@class='follower block']/span[@class='num']/text()")
    if unit[-1].strip() == 'w':
        douyin_info['fans'] = str(int(fans_value) / 10) + 'w'
    else:
        douyin_info['fans'] = fans_value
    like = ''.join(share_web_html.xpath(
        "//div[@class='personal-card']/div[@class='info2']/p[@class='follow-info']//span[@class='liked-num block']//i[@class='icon iconfont follow-num']/text()"))
    unit = share_web_html.xpath(
        "//div[@class='personal-card']/div[@class='info2']/p[@class='follow-info']//span[@class='liked-num block']/span[@class='num']/text()")
    if unit[-1].strip() == 'w':
        douyin_info['like'] = str(int(like) / 10) + 'w'
    else:
        douyin_info['like'] = like
    douyin_info['from_url'] = share_web_url

    print(douyin_info)


def handle_douyin_web_share(task):
    share_web_url = 'https://www.douyin.com/share/user/' + task
    print(share_web_url)
    share_web_header = {
        'user-agent': 'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/68.0.3440.75 Safari/537.36'
    }
    share_web_response = requests.get(url=share_web_url, headers=share_web_header, timeout=10)
    handle_decode(share_web_response.text, share_web_url, task)


if __name__ == '__main__':
    # task = get_task("share_id")
    handle_douyin_web_share("88445518961")
```
```
https://www.douyin.com/share/user/88445518961
{'nick_name': 'Dear-迪丽热巴', 'douyin_id': '274110380', 'job': '演员', 'describe': '先定一个能达到的小目标,比方说来句签名', 'location': [], 'xingzuo': [], 'follow_count': '0', 'fans': '5046.8w', 'like': '13527.7w', 'from_url': 'https://www.douyin.com/share/user/88445518961'}
```
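The `w` suffix stands for 万 (10,000), so `'5046.8w'` means about 50,468,000. If you prefer numeric values for storage, a small converter can normalize them. `parse_count` is a hypothetical helper, and it assumes `w` is the only suffix. It uses `Decimal` to avoid float rounding.

```python
from decimal import Decimal


def parse_count(text):
    """Convert counts like '5046.8w' (w = 10,000) or '1234' to an int."""
    text = text.strip().lower()
    if text.endswith("w"):
        # Decimal keeps 5046.8 exact before scaling by 10,000
        return int(Decimal(text[:-1]) * 10000)
    return int(text)


print(parse_count("5046.8w"))   # 50468000
print(parse_count("13527.7w"))  # 135277000
```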

Scraping followers

Prerequisites: logged in, latest app version.

Scraping one user's followers

```python
import time

from appium import webdriver
from selenium.common.exceptions import TimeoutException
from selenium.webdriver.support.ui import WebDriverWait

desired_caps = {}
desired_caps['platformName'] = 'Android'
desired_caps['deviceName'] = '127.0.0.1:62001'
desired_caps['platformVersion'] = '5.1.1'
desired_caps['appPackage'] = 'com.ss.android.ugc.aweme'
desired_caps['appActivity'] = 'com.ss.android.ugc.aweme.splash.SplashActivity'
desired_caps['noReset'] = True
desired_caps['unicodeKeyboard'] = True  # allows typing non-ASCII text
desired_caps['resetKeyboard'] = True

driver = webdriver.Remote('http://localhost:4723/wd/hub', desired_caps)

# Obfuscated resource IDs; they may change between Douyin versions,
# so re-check them with uiautomatorviewer
SEARCH_ICON = "//android.widget.LinearLayout[@resource-id='com.ss.android.ugc.aweme:id/aps']"
SEARCH_BOX = "//android.widget.EditText[@resource-id='com.ss.android.ugc.aweme:id/afo']"
SEARCH_BUTTON = "//android.widget.TextView[@resource-id='com.ss.android.ugc.aweme:id/afr']"
USER_TAB = "//android.widget.TextView[@text='用户']"  # the "Users" tab
AVATAR = ("/hierarchy/android.widget.FrameLayout/android.widget.LinearLayout/android.widget.FrameLayout"
          "/android.widget.FrameLayout/android.widget.FrameLayout[2]/android.widget.RelativeLayout"
          "/android.widget.FrameLayout/android.widget.RelativeLayout/android.support.v4.view.ViewPager"
          "/android.widget.LinearLayout/android.widget.FrameLayout/android.view.View"
          "/android.support.v7.widget.RecyclerView/android.widget.RelativeLayout[1]"
          "/android.widget.RelativeLayout[1]/android.widget.ImageView[2]")
FANS_BUTTON_ID = "com.ss.android.ugc.aweme:id/aj1"
DOUYIN_ID = '706942127'  # the account whose followers are scraped


def get_size(driver):
    size = driver.get_window_size()
    return size['width'], size['height']


def handle_douyin(driver):
    # Tap the search icon on the home screen
    try:
        if WebDriverWait(driver, 10).until(lambda x: x.find_element_by_xpath(SEARCH_ICON)):
            driver.find_element_by_xpath(SEARCH_ICON).click()
    except TimeoutException:
        print("Search button not found")

    # Locate the search box and type the Douyin ID, retrying until the text sticks
    if WebDriverWait(driver, 3).until(lambda x: x.find_element_by_xpath(SEARCH_BOX)):
        driver.find_element_by_xpath(SEARCH_BOX).send_keys(DOUYIN_ID)
        while driver.find_element_by_xpath(SEARCH_BOX).text != DOUYIN_ID:
            driver.find_element_by_xpath(SEARCH_BOX).send_keys(DOUYIN_ID)
            time.sleep(0.1)
    # Tap "Search"
    driver.find_element_by_xpath(SEARCH_BUTTON).click()

    # Switch to the "Users" tab
    if WebDriverWait(driver, 3).until(lambda x: x.find_element_by_xpath(USER_TAB)):
        driver.find_element_by_xpath(USER_TAB).click()

    # Open the first result's profile by tapping its avatar
    if WebDriverWait(driver, 3).until(lambda x: x.find_element_by_xpath(AVATAR)):
        driver.find_element_by_xpath(AVATAR).click()
    # Tap the followers button
    if WebDriverWait(driver, 3).until(lambda x: x.find_element_by_id(FANS_BUTTON_ID)):
        driver.find_element_by_id(FANS_BUTTON_ID).click()

    # Swipe up until the list shows "没有更多了" ("no more")
    width, height = get_size(driver)
    x1 = int(width * 0.5)
    y1 = int(height * 0.75)
    y2 = int(height * 0.25)
    while True:
        if '没有更多了' in driver.page_source:
            break
        driver.swipe(x1, y1, x1, y2)
        time.sleep(0.5)


if __name__ == '__main__':
    handle_douyin(driver)
```

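The script works through these steps in order:
1. Appium opens Douyin.
2. It taps the search icon and types the ID into the search box.
3. It taps the search button, then the Users tab.
4. It taps the avatar, then the followers button.
5. It keeps swiping until there are no more followers.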

Saving followers to the database

Use mitmdump to write the data to the database. Run the script below with:
mitmdump -s xxx.py

```python
import json

from douyin.handle_mongo import save_task  # project-local MongoDB helper


def response(flow):
    # mitmdump calls this hook for every HTTP response passing through the proxy
    if 'aweme/v1/user/follower/list/' in flow.request.url:
        for user in json.loads(flow.response.text)['followers']:
            douyin_info = {}
            douyin_info['share_id'] = user['uid']
            douyin_info['douyin_id'] = user['short_id']
            douyin_info['nickname'] = user['nickname']
            save_task(douyin_info)
```

Now, as the script swipes through the followers list, each follower's information is saved to the database.
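mitmdump only ever calls `response(flow)`, so it helps to keep the JSON parsing in a plain function that you can test without a proxy. This sketch uses the same field mapping as the script above (`uid`, `short_id`, and `nickname` under `followers`). `extract_followers` is a hypothetical name, and the sample payload is made up.

```python
import json

FOLLOWER_API = "aweme/v1/user/follower/list/"


def extract_followers(url, body):
    """Return follower records from a follower-list response, or [] for other URLs."""
    if FOLLOWER_API not in url:
        return []
    return [
        {"share_id": u["uid"], "douyin_id": u["short_id"], "nickname": u["nickname"]}
        for u in json.loads(body).get("followers", [])
    ]


# Inside the mitmdump script the hook would then reduce to:
# def response(flow):
#     for info in extract_followers(flow.request.url, flow.response.text):
#         save_task(info)

sample_body = json.dumps({"followers": [{"uid": "1001", "short_id": "42", "nickname": "test"}]})
print(extract_followers("https://example.com/aweme/v1/user/follower/list/?count=20", sample_body))
```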

Scraping with multiple devices

Configure Appium as follows:
On the Appium client, set udid.
On the Appium server, set bootstrapPort.

You need to run several emulators or several real devices at the same time.
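Each device needs its own Appium server, and each server needs a distinct port and bootstrap port. The sketch below shows one way to hand them out. It assumes a server port p paired with bootstrap port p + 1, which matches the stride of 2 in the script's `4723 + 2*device`. `assign_ports` is a hypothetical helper. The printed command lines use the Appium 1.x server options `-p` (port) and `-bp` (bootstrap port).

```python
def assign_ports(devices, base_port=4723):
    """Give each device serial its own Appium server port and bootstrap port."""
    plan = []
    for index, udid in enumerate(devices):
        port = base_port + 2 * index  # 4723, 4725, ...
        plan.append({"udid": udid, "port": port, "bootstrap_port": port + 1})
    return plan


for entry in assign_ports(["127.0.0.1:62001", "127.0.0.1:62025"]):
    # One server per device, e.g. each started in its own terminal
    print("appium -p {port} -bp {bootstrap_port}".format(**entry), "#", entry["udid"])
```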

```python
import multiprocessing
import time

from appium import webdriver
from selenium.common.exceptions import TimeoutException
from selenium.webdriver.support.ui import WebDriverWait

# Obfuscated resource IDs; they may change between Douyin versions,
# so re-check them with uiautomatorviewer
SEARCH_ICON = "//android.widget.LinearLayout[@resource-id='com.ss.android.ugc.aweme:id/aps']"
SEARCH_BOX = "//android.widget.EditText[@resource-id='com.ss.android.ugc.aweme:id/afo']"
SEARCH_BUTTON = "//android.widget.TextView[@resource-id='com.ss.android.ugc.aweme:id/afr']"
USER_TAB = "//android.widget.TextView[@text='用户']"  # the "Users" tab
AVATAR = ("/hierarchy/android.widget.FrameLayout/android.widget.LinearLayout/android.widget.FrameLayout"
          "/android.widget.FrameLayout/android.widget.FrameLayout[2]/android.widget.RelativeLayout"
          "/android.widget.FrameLayout/android.widget.RelativeLayout/android.support.v4.view.ViewPager"
          "/android.widget.LinearLayout/android.widget.FrameLayout/android.view.View"
          "/android.support.v7.widget.RecyclerView/android.widget.RelativeLayout[1]"
          "/android.widget.RelativeLayout[1]/android.widget.ImageView[2]")
FANS_BUTTON_ID = "com.ss.android.ugc.aweme:id/aj1"
BACK_BUTTON_ID = "com.ss.android.ugc.aweme:id/n7"
DOUYIN_ID = '706942127'  # hard-coded for this demo


def get_size(driver):
    size = driver.get_window_size()
    return size['width'], size['height']


def handle_douyin(driver):
    width, height = get_size(driver)
    x1 = int(width * 0.5)
    y1 = int(height * 0.75)
    y2 = int(height * 0.25)

    # Loops forever; with a hard-coded ID every pass repeats the same search
    while True:
        # Type the Douyin ID into the search box, retrying until the text sticks
        if WebDriverWait(driver, 10).until(lambda x: x.find_element_by_xpath(SEARCH_BOX)):
            driver.find_element_by_xpath(SEARCH_BOX).send_keys(DOUYIN_ID)
            while driver.find_element_by_xpath(SEARCH_BOX).text != DOUYIN_ID:
                driver.find_element_by_xpath(SEARCH_BOX).send_keys(DOUYIN_ID)
                time.sleep(0.1)
        # Tap "Search"
        driver.find_element_by_xpath(SEARCH_BUTTON).click()

        # Switch to the "Users" tab
        if WebDriverWait(driver, 10).until(lambda x: x.find_element_by_xpath(USER_TAB)):
            driver.find_element_by_xpath(USER_TAB).click()

        # Open the first result's profile by tapping its avatar
        if WebDriverWait(driver, 10).until(lambda x: x.find_element_by_xpath(AVATAR)):
            driver.find_element_by_xpath(AVATAR).click()
        # Tap the followers button
        if WebDriverWait(driver, 10).until(lambda x: x.find_element_by_id(FANS_BUTTON_ID)):
            driver.find_element_by_id(FANS_BUTTON_ID).click()

        # Swipe until "没有更多了" ("no more") or "还没有粉丝" ("no followers yet") appears
        while True:
            page = driver.page_source
            if '没有更多了' in page or '还没有粉丝' in page:
                break
            driver.swipe(x1, y1, x1, y2)
            time.sleep(0.5)

        # n7 appears to be the back button: two taps return to the search page,
        # then the search box is cleared for the next round
        driver.find_element_by_id(BACK_BUTTON_ID).click()
        driver.find_element_by_id(BACK_BUTTON_ID).click()
        driver.find_element_by_xpath(SEARCH_BOX).clear()


def handle_appium(device, port):
    caps = {
        "platformName": "Android",
        "deviceName": device,
        "platformVersion": "5.1.1",
        "appPackage": "com.ss.android.ugc.aweme",
        "appActivity": "com.ss.android.ugc.aweme.splash.SplashActivity",
        "noReset": True,
        "unicodeKeyboard": True,
        "resetKeyboard": True,
        "udid": device,  # binds this session to one specific device
    }

    # Each device talks to its own Appium server listening on `port`
    driver = webdriver.Remote('http://localhost:' + str(port) + '/wd/hub', caps)

    # Tap the search icon once; handle_douyin then stays on the search page
    try:
        if WebDriverWait(driver, 10).until(lambda x: x.find_element_by_xpath(SEARCH_ICON)):
            driver.find_element_by_xpath(SEARCH_ICON).click()
    except TimeoutException:
        print("Search button not found")

    handle_douyin(driver)


if __name__ == '__main__':
    m_list = []

    # Device serials as listed by `adb devices`
    devices_list = ['127.0.0.1:62001', '127.0.0.1:62025']
    for index, device in enumerate(devices_list):
        # One Appium server per device: 4723, 4725, ...
        port = 4723 + 2 * index
        m_list.append(multiprocessing.Process(target=handle_appium, args=(device, port)))

    for m in m_list:
        m.start()

    for m in m_list:
        m.join()
```

You can get the entries for devices_list with adb devices:

```
C:\Users\IIce>adb devices
List of devices attached
127.0.0.1:62001 device
127.0.0.1:62025 device
```
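Instead of hard-coding the list, you can read it from `adb devices`. This sketch parses the output format shown above. It keeps only entries whose state is `device`, so `offline` or `unauthorized` ones are skipped. `list_devices` assumes adb is on `PATH`. Both function names are hypothetical.

```python
import subprocess


def parse_adb_devices(output):
    """Return serials whose state is 'device' from `adb devices` output."""
    serials = []
    for line in output.splitlines():
        parts = line.split()
        # Device lines look like "<serial> <state>"; the header and daemon
        # messages never have "device" as their second word
        if len(parts) >= 2 and parts[1] == "device":
            serials.append(parts[0])
    return serials


def list_devices():
    """Run `adb devices` and parse its output (requires adb on PATH)."""
    result = subprocess.run(["adb", "devices"], capture_output=True, text=True, check=True)
    return parse_adb_devices(result.stdout)


sample = """List of devices attached
127.0.0.1:62001\tdevice
127.0.0.1:62025\tdevice
emulator-5554\toffline
"""
print(parse_adb_devices(sample))  # ['127.0.0.1:62001', '127.0.0.1:62025']
```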

Scraping Douyin videos

In the Douyin app, share a user's profile and copy the link to get the profile URL. For example:
https://www.iesdouyin.com/share/user/58862693224

Analyzing the video API

Use Chrome DevTools to capture the request that the video-list API sends.

URL parameters

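```
https://www.iesdouyin.com/web/api/v2/aweme/post/?
user_id=58862693224                     # the id from the share link
count=21                                # number of videos
max_cursor=0                            # pagination cursor: 0 on the first request, then taken from the previous response
aid=1128                                # fixed value
_signature=laPLvBAVyX-c77Gpje7Ys5Wjy6   # signature, computed by the signing algorithm
dytk=66cb5d220e0e48ed9195a7f62ac32764   # purpose unknown; can be extracted directly from the web page
```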
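With these parameters, you can build the request URL with `urllib.parse.urlencode`. This sketch uses the example values above. `_signature` still has to come from the signing step, and `dytk` from the profile page, so here they are plain arguments. `build_post_url` is a hypothetical name.

```python
from urllib.parse import urlencode

API = "https://www.iesdouyin.com/web/api/v2/aweme/post/"


def build_post_url(user_id, signature, dytk, max_cursor=0, count=21):
    """Assemble the video-list URL from the parameters described above."""
    params = {
        "user_id": user_id,
        "count": count,
        "max_cursor": max_cursor,
        "aid": 1128,  # fixed value
        "_signature": signature,
        "dytk": dytk,
    }
    return API + "?" + urlencode(params)


print(build_post_url("58862693224", "laPLvBAVyX-c77Gpje7Ys5Wjy6",
                     "66cb5d220e0e48ed9195a7f62ac32764"))
```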

Obtaining the signing algorithm

Open the DevTools console and search for _signature.


Locate _bytedAcrawler.

Locate douyin_falcon:node_modules/byted-acrawler/dist/runtime.

Locate __M.define.

Analyze how the signing algorithm executes:

① Define the __M object and its define and require functions.
② Run the __M.define("douyin_falcon:node_modules/byted-acrawler/dist/runtime......" code.
③ Run _bytedAcrawler = require("douyin_falcon:node_modules/byted-acrawler/dist/runtime").

④ Compute the signature: _signature = _bytedAcrawler.sign(user_id)

We can write our own HTML file and open it to obtain _signature.
Taobao chromedriver mirror
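Steps ① to ④ can be packaged into a local page. The sketch below only renders the HTML, and it rests on several assumptions. It assumes you have copied the `__M` definition and the `__M.define(...)` block into a local file named `runtime.js` (a hypothetical name). It also assumes that code exposes a global `require`, as step ③ suggests. Writing the signature into `document.title` is just one convenient way to read it back. `render_sign_page` is a hypothetical name.

```python
# Assumption: runtime.js holds the __M object (with define/require) and the
# __M.define("douyin_falcon:node_modules/byted-acrawler/dist/runtime", ...) block
PAGE_TEMPLATE = """<!DOCTYPE html>
<html>
<head><meta charset="utf-8"><title>pending</title></head>
<body>
<script src="{runtime_js}"></script>
<script>
  // Steps 3 and 4: load the module, sign the user_id, expose the result
  var _bytedAcrawler = require("douyin_falcon:node_modules/byted-acrawler/dist/runtime");
  document.title = _bytedAcrawler.sign("{user_id}");
</script>
</body>
</html>
"""


def render_sign_page(user_id, runtime_js="runtime.js"):
    """Return the HTML of a page that computes _signature for user_id in the browser."""
    return PAGE_TEMPLATE.format(runtime_js=runtime_js, user_id=user_id)


print(render_sign_page("58862693224"))
```

To use the page, save the string as `sign.html` and open it through chromedriver with Selenium, for example `driver.get(Path("sign.html").resolve().as_uri())`. Then read the value from `driver.title`.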

Source code

About watermarks

There are two kinds of video URLs:

  1. https://aweme.snssdk.com/aweme/v1/play/?video_id=v0300f6d0000bj81rdqrh6f3j18kvnpg&line=0&ratio=540p&media_type=4&vr_type=0&improve_bitrate=0
  2. https://aweme.snssdk.com/aweme/v1/playwm/?video_id=v0300f6d0000bj81rdqrh6f3j18kvnpg&line=0&ratio=540p&media_type=4&vr_type=0&improve_bitrate=0

Differences:

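  1. The first requests play, the second requests playwm.
  2. The first URL cannot be opened in a browser, but the second can.
  3. Both can be fetched with requests.
  4. The first one has no watermark!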

Testing with Postman shows that only the video_id parameter needs to be kept.
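Combining these two observations, you can derive a watermark-free link from a `playwm` link. Switch the path to `play` and drop every query parameter except `video_id`. This is a sketch built with `urllib.parse`. Whether the endpoint still behaves this way is not guaranteed. `to_no_watermark` is a hypothetical name.

```python
from urllib.parse import parse_qs, urlencode, urlsplit, urlunsplit


def to_no_watermark(url):
    """Turn a .../playwm/?video_id=... URL into .../play/?video_id=..."""
    parts = urlsplit(url)
    video_id = parse_qs(parts.query)["video_id"][0]
    path = parts.path.replace("/playwm/", "/play/")
    return urlunsplit((parts.scheme, parts.netloc, path, urlencode({"video_id": video_id}), ""))


wm = ("https://aweme.snssdk.com/aweme/v1/playwm/?video_id=v0300f6d0000bj81rdqrh6f3j18kvnpg"
      "&line=0&ratio=540p&media_type=4&vr_type=0&improve_bitrate=0")
print(to_no_watermark(wm))
# https://aweme.snssdk.com/aweme/v1/play/?video_id=v0300f6d0000bj81rdqrh6f3j18kvnpg
```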

Parameter notes


has_more indicates whether there is another page to fetch.
max_cursor must be sent with the next request. It is 0 on the first request.
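Together, these two fields drive a cursor loop. Send `max_cursor=0`, then keep sending the `max_cursor` from each response while `has_more` is truthy. The sketch below runs without the network. `fetch` is a hypothetical callable that stands in for the signed request and returns the decoded response dict.

```python
def iter_video_pages(fetch, start_cursor=0):
    """Yield response dicts, following max_cursor until has_more is falsy."""
    cursor = start_cursor
    while True:
        page = fetch(cursor)
        yield page
        if not page.get("has_more"):
            break
        cursor = page["max_cursor"]


# Fake API: three pages chained by max_cursor
FAKE = {
    0: {"has_more": True, "max_cursor": 1555},
    1555: {"has_more": True, "max_cursor": 1444},
    1444: {"has_more": False, "max_cursor": 0},
}
pages = list(iter_video_pages(FAKE.__getitem__))
print(len(pages))  # 3
```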

References

Serving the Douyin signature algorithm with NodeJS

Watermark-free video parsing
