Beautifulsoup4 Essentials

I love the utility made possible by beautifulsoup4, aka bs4. I also struggle with the docs every single time I pick it up. Somehow, I find it hard to locate the parts I need, and end up searching for them for a long time. Which is annoying the 5th time around.

So here’s a quick overview of the most essential snippets. You’re welcome, future me.

Installation & Docs

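The package is called beautifulsoup4 and can be installed with pip install beautifulsoup4. You can find the docs at https://www.crummy.com/software/BeautifulSoup/bs4/doc/.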

Import & Reading HTML Data

from bs4 import BeautifulSoup
# htmlString is a str (or bytes) containing the HTML to parse
soup = BeautifulSoup(htmlString, 'html.parser')

The soup variable is the one we’re going to work with.
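To make that concrete, here is a minimal sketch of parsing a small document and poking at the result. The markup and the variable names (html_string, title_text, first_paragraph) are made up for illustration.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup, made up for illustration.
html_string = """
<html>
  <head><title>Soup of the day</title></head>
  <body>
    <p id="intro">Hello, <b>world</b>!</p>
  </body>
</html>
"""

# 'html.parser' ships with Python, so no extra parser package is needed.
soup = BeautifulSoup(html_string, "html.parser")

# Tags can be reached as attributes of the soup; the first matching tag wins.
title_text = soup.title.string
first_paragraph = soup.p

print(title_text)             # Soup of the day
print(first_paragraph["id"])  # intro
```

Attribute-style access like soup.title is handy for quick checks. For anything more specific, the search methods are the way to go.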

Finding Elements

From the docs: find_all(name, attrs, recursive, string, limit, **kwargs)
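The less obvious parameters in that signature are limit, string and recursive. Here is a small sketch of what each one does, assuming a made-up list of links; the markup and variable names are purely illustrative.

```python
import re
from bs4 import BeautifulSoup

# Hypothetical sample markup, made up for illustration.
html_string = """
<ul>
  <li><a href="/one">First</a></li>
  <li><a href="/two">Second</a></li>
  <li><a href="/three">Third</a></li>
</ul>
"""
soup = BeautifulSoup(html_string, "html.parser")

# limit: stop searching after this many matches.
first_two = soup.find_all("a", limit=2)

# string: match on the text inside a tag (a plain string or a compiled regex).
ends_with_d = soup.find_all("a", string=re.compile("d$"))

# recursive=False: only consider direct children, not all descendants.
ul = soup.find("ul")
direct_links = ul.find_all("a", recursive=False)   # empty, the <a> tags sit inside <li>
direct_items = ul.find_all("li", recursive=False)  # the three <li> tags

print([a["href"] for a in first_two])
print([a.get_text() for a in ends_with_d])
print(len(direct_links), len(direct_items))
```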

# find_all returns a list-like ResultSet; each element is a Tag,
# which supports the same search methods as the soup itself
result_list = soup.find_all("a")

soup.find_all(id="the-id")

# <a> elements with the CSS class "something"
soup.find_all("a", class_="something")

# you can also pass a function
def function_evaluating_value(href):
    return href == "value"

soup.find_all(href=function_evaluating_value)
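Here are the same filters run against a tiny made-up document, so you can see what each one actually picks up. The markup is hypothetical. The find() call at the end is not covered above, but it is a close sibling of find_all that returns only the first match.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup, made up for illustration.
html_string = """
<div id="the-id">
  <a class="something" href="value">match</a>
  <a class="something else" href="/other">other</a>
  <a href="value">no class</a>
  <span class="something">not a link</span>
</div>
"""
soup = BeautifulSoup(html_string, "html.parser")

all_links = soup.find_all("a")
by_id = soup.find_all(id="the-id")

# class_ matches if ANY of the element's classes equals the given value.
styled_links = soup.find_all("a", class_="something")

def function_evaluating_value(href):
    # Receives the attribute's value, not the whole tag.
    return href == "value"

by_function = soup.find_all(href=function_evaluating_value)

# find() returns the first match directly, or None if nothing matches.
first_link = soup.find("a")
missing = soup.find("table")

print([a.get_text() for a in styled_links])
print([a.get_text() for a in by_function])
```

Note that class_="something" also matches the link with class="something else", which is easy to trip over.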

Getting Content

# get the text of an element (works on the soup itself, too)
element.get_text()

# get an attribute
# returns None if the element has no such attribute
# element['href'] would raise a KeyError in that case
element.get('href')

# multi-valued attributes such as class are returned as lists
# https://www.crummy.com/software/BeautifulSoup/bs4/doc/#multi-valued-attributes
element.get('class')

# all attributes, as a dict
element.attrs

# tag name
element.name
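And the same accessors in action on a single made-up link, as a rough sketch. The markup and the variable names are assumptions for illustration only.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup, made up for illustration.
html_string = '<p><a class="btn primary" href="/start">Get <b>started</b></a></p>'
soup = BeautifulSoup(html_string, "html.parser")
element = soup.find("a")

text = element.get_text()               # text of the tag and all of its children
href = element.get("href")
missing = element.get("title")          # None, there is no title attribute
fallback = element.get("title", "n/a")  # get() accepts a default, like dict.get
classes = element.get("class")          # multi-valued, so this is a list

# Bracket access raises instead of returning None.
try:
    element["title"]
    bracket_access_failed = False
except KeyError:
    bracket_access_failed = True

print(text)           # Get started
print(classes)        # ['btn', 'primary']
print(element.attrs)  # {'class': ['btn', 'primary'], 'href': '/start'}
print(element.name)   # a
```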

That’s it! Happy soup-ing :)

vsupalov.com

© 2024 vsupalov.com. All rights reserved.