Title: Parse HTML
Slug: parse_html
Summary: How to parse HTML for machine learning in Python.
Date: 2016-09-07 12:00
Category: Machine Learning
Tags: Preprocessing Text
Authors: Chris Albon
In [4]:
# Load library
from bs4 import BeautifulSoup
In [5]:
# Create some HTML code
html = "<div class='full_name'><span style='font-weight:bold'>Masego</span> Azra</div>"
In [6]:
# Parse html
soup = BeautifulSoup(html, "lxml")
# Find the div with the class "full_name", show text
soup.find("div", { "class" : "full_name" }).text
Out[6]: