BeautifulSoup4: Introduction and Installation

Beautiful Soup
A Python library (bs4) for extracting data from HTML or XML files
Beautiful Soup 4.4.0 documentation

Install Beautiful Soup (and also install lxml, which is used as the parser)
pip install beautifulsoup4
pip install lxml
Source code: Beautiful Soup

Using BeautifulSoup4

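  1. Import the Beautiful Soup library
    from bs4 import BeautifulSoup
    import requests

  2. Fetch the page content with requests
    url = 'https://baidu.com/'
    response = requests.get(url, timeout=10)  # a timeout keeps the call from hanging indefinitely
    response.raise_for_status()  # raise an exception on a 4xx/5xx response

  3. Create a BeautifulSoup object
    soup = BeautifulSoup(response.text, 'lxml')

  4. Print the result
    print(soup)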

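from bs4 import BeautifulSoup
import requests

url = 'https://baidu.com/'
response = requests.get(url, timeout=10)  # a timeout keeps the call from hanging indefinitely
response.raise_for_status()  # raise an exception on a 4xx/5xx response
soup = BeautifulSoup(response.text, 'lxml')  # parse the HTML with the lxml parser
print(soup)  # print the parsed document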
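
Printing the whole soup only confirms that parsing worked. As a rough sketch of the "extracting data" part, the example below pulls the page title and the link targets out of a document. To keep it runnable offline, it parses a small inline HTML string rather than a live response; SAMPLE_HTML and extract_title_and_links are hypothetical names made up for illustration, and the built-in 'html.parser' is assumed here so the sketch does not depend on lxml (passing 'lxml' instead should work the same way for this markup).

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup standing in for response.text
SAMPLE_HTML = """
<html>
  <head><title>Example page</title></head>
  <body>
    <a href="https://example.com/a">First</a>
    <a href="https://example.com/b">Second</a>
    <a>No href here</a>
  </body>
</html>
"""


def extract_title_and_links(html, parser="html.parser"):
    """Return the <title> text and the href of every <a> tag that has one."""
    soup = BeautifulSoup(html, parser)
    # soup.title is None when the document has no <title> tag
    title = soup.title.string if soup.title else None
    # href=True skips <a> tags that carry no href attribute
    links = [a["href"] for a in soup.find_all("a", href=True)]
    return title, links


title, links = extract_title_and_links(SAMPLE_HTML)
print(title)
print(links)
```

With a real page, the same function could be called as extract_title_and_links(response.text, 'lxml') after the request step shown earlier.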