| Library resources | |
|---|---|
| PyPI | https://pypi.org/project/beautifulsoup4/ | 
| GitHub | --- |
| Documentation | https://beautiful-soup-4.readthedocs.io/en/latest/ | 
Getting started
pip3 install beautifulsoup4
Usage
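from bs4 import BeautifulSoup

file = 'path/to/folder/file.html'
# Use a context manager so the file handle is closed after parsing.
with open(file) as f:
    soup = BeautifulSoup(f, 'html.parser')

books = soup.find_all('h3')
for book in books:
    title = book.text
    print(f"{title=}")
    # The first child of the <h3> is expected to be the tag carrying the link.
    first_child = book.contents[0]
    try:
        link = first_child['href']
    except (KeyError, TypeError):
        # No href attribute, or the first child is plain text rather than a tag.
        link = ''
    print(f"{link=}")
    # The immediate next_sibling is typically a whitespace text node, hence the double hop.
    author = book.next_sibling.next_sibling.text
    print(f"{author=}")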
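A self-contained variant of the loop above, as a rough sketch: it parses an inline string instead of a file so it can be run as-is. The markup, the `extract_books` helper, and the assumption that the author sits in the next `<p>` sibling are all invented for illustration. It swaps the chained `next_sibling` hops for `find_next_sibling`, which skips whitespace text nodes.

```python
from bs4 import BeautifulSoup

# Hypothetical markup, invented for illustration: each <h3> may wrap a link,
# and the author is assumed to be in the next <p> sibling.
HTML = """
<div>
<h3><a href="/books/1">The Silver Chair</a></h3>
<p>C. S. Lewis</p>
<h3>Ready Player One</h3>
<p>Ernest Cline</p>
</div>
"""


def extract_books(html):
    """Return one dict per <h3> with its title, link, and author."""
    soup = BeautifulSoup(html, "html.parser")
    results = []
    for book in soup.find_all("h3"):
        title = book.get_text(strip=True)
        # find() returns None when the <h3> has no <a>, so no try/except is needed.
        anchor = book.find("a")
        link = anchor.get("href", "") if anchor else ""
        # find_next_sibling() only considers tags, so whitespace nodes are skipped.
        author_tag = book.find_next_sibling("p")
        author = author_tag.get_text(strip=True) if author_tag else ""
        results.append({"title": title, "link": link, "author": author})
    return results


books = extract_books(HTML)
for entry in books:
    print(entry)
```

Returning empty strings for missing pieces mirrors the `link = ''` fallback in the original loop.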
see also:
Target by ID with wildcard
Useful when IDs or classes contain random strings, e.g.:
-<div id="content-author-B00CR42MOY" class="information_row">C. S. Lewis</div>
-<div id="content-author-B08CGP9TJ7" class="information_row">Ernest Cline</div>
Snippet:
# x is the element to search within (the soup itself or any Tag)
author = x.find("div", {"id": lambda value: value and value.startswith('content-author')}).text
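Two other ways to express the same prefix match, sketched against the sample divs above: a compiled regular expression passed as the `id` filter, and a CSS attribute selector via `select`. The third `content-title-...` div is invented here only to show that non-matching IDs are skipped.

```python
import re

from bs4 import BeautifulSoup

# The first two divs come from the example above; the third is hypothetical
# and is only there to show that other IDs are not matched.
HTML = """
<div id="content-author-B00CR42MOY" class="information_row">C. S. Lewis</div>
<div id="content-author-B08CGP9TJ7" class="information_row">Ernest Cline</div>
<div id="content-title-B08CGP9TJ7" class="information_row">Ready Player One</div>
"""

soup = BeautifulSoup(HTML, "html.parser")

# A compiled regex is searched within the attribute value, so anchor it
# with ^ to get "starts with" behaviour.
by_regex = [
    div.get_text()
    for div in soup.find_all("div", id=re.compile(r"^content-author-"))
]

# CSS attribute selector: [id^="..."] means "id starts with ...".
by_css = [div.get_text() for div in soup.select('div[id^="content-author-"]')]

print(by_regex)
print(by_css)
```

Both forms return every match; swap `find_all` for `find`, or `select` for `select_one`, when only the first is wanted.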
Find the next element after a tag
soup.head.next_element.next_element
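A small sketch of what that chain returns, using an invented one-line document. `next_element` walks the parse tree in document order and can land on a text node as well as a tag, so `find_next` is often the safer choice when a specific tag is wanted.

```python
from bs4 import BeautifulSoup

# Hypothetical document, invented for illustration.
HTML = "<html><head><title>Books</title></head><body><p>Hello</p></body></html>"
soup = BeautifulSoup(HTML, "html.parser")

# The element parsed immediately after <head> is its first child, <title>.
first = soup.head.next_element

# One more hop lands on the text inside <title>, which is a string, not a tag.
second = soup.head.next_element.next_element

# find_next() keeps walking forward until it meets a matching tag.
paragraph = soup.head.find_next("p")

print(first.name)
print(repr(str(second)))
print(paragraph.get_text())
```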
Find an element based on a custom tag attribute
e.g. the data-field attribute:
if content.find('a', {'data-field': 'experience_company_logo'}):
    pass  # a matching <a> exists; handle it here
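A runnable sketch of the same lookup, with invented markup and href values. Attribute names containing a hyphen cannot be written as Python keyword arguments, which is why the filter goes in a dictionary; a CSS selector is shown as an alternative.

```python
from bs4 import BeautifulSoup

# Hypothetical markup, invented for illustration.
HTML = """
<section>
  <a data-field="experience_company_logo" href="/company/acme">Acme</a>
  <a data-field="profile_link" href="/in/someone">Profile</a>
</section>
"""
content = BeautifulSoup(HTML, "html.parser")

# data-field is not a valid keyword argument name, so pass it via attrs.
logo = content.find("a", attrs={"data-field": "experience_company_logo"})
logo_href = logo["href"] if logo is not None else None

# Equivalent CSS attribute selector.
css_match = content.select_one('a[data-field="experience_company_logo"]')

# find() returns None when nothing matches, which is what makes the
# `if content.find(...)` test work.
missing = content.find("a", attrs={"data-field": "no_such_field"})

print(logo_href)
print(css_match.get_text())
print(missing)
```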