Source: https://www.geeksforgeeks.org/extracting-an-attribute-value-with-beautifulsoup-in-python

Extracting an attribute value with beautifulsoup in Python

Last Updated : 29 Dec, 2020

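Prerequisite: Beautiful Soup installation. The examples below pass "lxml" as the parser, which also requires the separate lxml package to be installed.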

Beautiful Soup is a Python library for parsing HTML and XML documents. It is commonly used for web scraping, which means extracting data from websites with automated tools instead of by hand. An HTML tag may have any number of attributes. For example, the tag <b class="active"> has an attribute “class” whose value is “active”. Beautiful Soup exposes a tag’s attributes as a dictionary, so you can read them with dictionary-style access.
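A minimal sketch of dictionary-style access on the <b class="active"> example. The id="status" attribute is a hypothetical addition for comparison, and Python's built-in "html.parser" is used so lxml is not needed. Note that "class" is treated as a multi-valued attribute, so its value comes back as a list, while a single-valued attribute such as "id" comes back as a string.

```python
from bs4 import BeautifulSoup

# Illustrative markup: the <b class="active"> tag from the text, plus a
# hypothetical id attribute added for comparison
soup = BeautifulSoup('<b class="active" id="status">Done</b>', "html.parser")
tag = soup.b

# Dictionary-style lookup by attribute name
print(tag["class"])  # class is multi-valued in HTML, so a list is returned
print(tag["id"])     # id is single-valued, so this prints the string: status
```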


Syntax: 

tag.attrs

Implementation:
Example 1: Program to extract all attributes of a tag using the attrs property.

# Import Beautiful Soup
from bs4 import BeautifulSoup

# Parse an HTML string into a BeautifulSoup object
soup = BeautifulSoup('''
    <html>
        <h2 class="hello"> Heading 1 </h2>
        <h1> Heading 2 </h1>
    </html>
    ''', "lxml")

# Get the first h2 tag in the document
tag = soup.h2

# Get all of the tag's attributes as a dictionary
attribute = tag.attrs

# Print the output
print(attribute)

Output: 

{'class': ['hello']}
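Because attrs is an ordinary Python dictionary, the usual dict methods work on it. The sketch below uses a hypothetical <a> tag with several attributes and Python's built-in "html.parser"; it loops over every attribute name and value.

```python
from bs4 import BeautifulSoup

# Hypothetical anchor tag carrying several attributes
html = '<a href="/docs" title="Docs" class="nav link">Docs</a>'
soup = BeautifulSoup(html, "html.parser")
tag = soup.a

# tag.attrs is a dict, so .items() yields (name, value) pairs
for name, value in tag.attrs.items():
    print(name, "=", value)
```

The "class" value is printed as a list, while "href" and "title" are printed as plain strings.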

Example 2: Program to extract a single attribute value using dictionary-style access.

# Import Beautiful Soup
from bs4 import BeautifulSoup

# Parse an HTML string into a BeautifulSoup object
soup = BeautifulSoup('''
    <html>
        <h2 class="hello"> Heading 1 </h2>
        <h1> Heading 2 </h1>
    </html>
    ''', "lxml")

# Get the first h2 tag in the document
tag = soup.h2

# Get the value of the class attribute
attribute = tag['class']

# Print the output
print(attribute)

Output: 

['hello']
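Square-bracket access behaves like a dictionary lookup, so it raises KeyError when the attribute is missing. A minimal sketch of the safer alternative, Tag.get(), which returns None or a supplied default instead; the "no-id" default string is only illustrative.

```python
from bs4 import BeautifulSoup

soup = BeautifulSoup('<h2 class="hello"> Heading 1 </h2>', "html.parser")
tag = soup.h2

# Square-bracket access raises KeyError when the attribute is absent
try:
    tag["id"]
except KeyError:
    print("no id attribute")

# .get() returns None, or the given default, instead of raising
print(tag.get("id"))           # None
print(tag.get("id", "no-id"))  # no-id
print(tag.get("class"))        # the class list, same as tag["class"]
```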


Example 3: Program to extract a multi-valued attribute using dictionary-style access.

# Import Beautiful Soup
from bs4 import BeautifulSoup

# Parse an HTML string into a BeautifulSoup object
soup = BeautifulSoup('''
    <html>
        <h2 class="first second third"> Heading 1 </h2>
        <h1> Heading 2 </h1>
    </html>
    ''', "lxml")

# Get the first h2 tag in the document
tag = soup.h2

# Get the value of the class attribute (split into separate class names)
attribute = tag['class']

# Print the output
print(attribute)

Output: 

['first', 'second', 'third']
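The list appears because Beautiful Soup treats "class" as a multi-valued attribute and splits it on whitespace. A short sketch of working with that list and of turning the splitting off with the multi_valued_attributes=None constructor argument, using the built-in "html.parser":

```python
from bs4 import BeautifulSoup

html = '<h2 class="first second third"> Heading 1 </h2>'

# Default behaviour: class comes back as a list of individual class names
tag = BeautifulSoup(html, "html.parser").h2
classes = tag["class"]
print(" ".join(classes))    # first second third
print("second" in classes)  # True

# multi_valued_attributes=None disables the splitting,
# so the raw attribute string is returned instead of a list
raw_tag = BeautifulSoup(html, "html.parser", multi_valued_attributes=None).h2
print(raw_tag["class"])     # first second third
```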

