
XPath Study Notes

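A common way to extract content from a web page is requests + BeautifulSoup.

Another approach that I also find very convenient is requests + XPath (through lxml).

Example: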

import requests
from lxml import etree


url = 'http://book.dangdang.com/'
header = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/71.0.3578.98 Safari/537.36'
}
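# Send the request with a browser-like User-Agent; fail early on HTTP errors
res = requests.get(url, headers=header, timeout=10)
res.raise_for_status()

# Parse the raw response bytes into an element tree
tree = etree.HTML(res.content)
# Option 1: select the text nodes directly (one flat list of strings)
a = tree.xpath('//div[@class="sub"]//a/text()')
print(a)
# Option 2: select the <a> elements, then run a relative XPath on each one
b = tree.xpath('//div[@class="sub"]//a')
for i in b:
    print(i.xpath('text()'))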
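
The two queries above return differently shaped results, which is easier to see on a small fixed input than on a live page. The following is a minimal sketch, assuming a made-up HTML snippet (SAMPLE_HTML and its links are hypothetical and are not taken from dangdang.com). It also shows reading an attribute with @href, which the example above does not do.

```python
from lxml import etree

# Hypothetical markup standing in for a real page (an assumption for this sketch).
SAMPLE_HTML = """
<html><body>
  <div class="sub">
    <a href="/books/1">Fiction</a>
    <a href="/books/2">History</a>
  </div>
  <div class="other"><a href="/x">Ignored</a></div>
</body></html>
"""

tree = etree.HTML(SAMPLE_HTML)

# Query ending in text(): one flat list of strings across all matching <a> tags.
flat_texts = tree.xpath('//div[@class="sub"]//a/text()')

# Query returning elements, then a relative XPath per element:
# each call yields its own small list.
links = tree.xpath('//div[@class="sub"]//a')
per_link_texts = [link.xpath('text()') for link in links]

# Attributes can be selected with @name in XPath, or read with Element.get().
hrefs = tree.xpath('//div[@class="sub"]//a/@href')
first_href = links[0].get('href')

print(flat_texts)      # ['Fiction', 'History']
print(per_link_texts)  # [['Fiction'], ['History']]
print(hrefs)           # ['/books/1', '/books/2']
print(first_href)      # /books/1
```

Keeping the elements (the second style) tends to be the better fit when several fields are needed per link, for example both the text and the href, since each relative query stays tied to its own element.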

 
