Last Updated : 23 Jul, 2025
In this article, we will see how to scrape Reddit with Python, using the requests module to fetch a page and Beautiful Soup to parse it.
Module needed
pip install bs4
pip install requests
Approach:
Syntax: requests.get(url, args)
Syntax: soup = BeautifulSoup(r.content, 'html.parser')
Parameters:
- r.content : The raw HTML content of the response.
- html.parser : The HTML parser we want to use.
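As a small offline illustration of the parsing call above, the sketch below skips requests.get() and feeds BeautifulSoup a hand-written HTML string; the markup and variable names are made up for the example. The second argument names the parser: 'html.parser' ships with Python, while 'html5lib' is a separate package that would need its own pip install.

```python
from bs4 import BeautifulSoup

# Hypothetical markup standing in for r.content from a real response.
sample_html = "<html><body><h1>Hello</h1><p class='intro'>First paragraph.</p></body></html>"

# 'html.parser' is Python's built-in parser, so no extra install is needed.
soup = BeautifulSoup(sample_html, "html.parser")

# find() returns the first matching tag; get_text() returns its text content.
heading = soup.find("h1").get_text()
intro = soup.find("p", class_="intro").get_text()
print(heading)
print(intro)
```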
Let’s see the stepwise execution of the script.
Step 1: Import all dependencies
Python3
# import module
import requests
from bs4 import BeautifulSoup
Step 2: Create a function that fetches a URL
Python3
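# user-defined function
# that fetches the page data
# HEADERS was not defined in the original script; this value is an assumed example
HEADERS = {"User-Agent": "Mozilla/5.0"}

def getdata(url):
    r = requests.get(url, headers=HEADERS)
    return r.text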
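A slightly more defensive variant of getdata() is sketched below. The User-Agent string, the timeout value, and the optional client parameter are assumptions added for illustration and are not part of the original script. requests.get() accepts a timeout, and raise_for_status() raises an exception on 4xx/5xx responses, so a blocked or failed request does not silently yield an error page.

```python
import requests

# Assumed example header; any descriptive User-Agent string could be used here.
HEADERS = {"User-Agent": "Mozilla/5.0 (compatible; example-scraper/0.1)"}

def getdata(url, client=None, timeout=10):
    """Return the response body as text, raising on HTTP errors."""
    # `client` is a hypothetical hook: pass a requests.Session to reuse
    # connections, or leave it as None to use the requests module directly.
    http = client if client is not None else requests
    r = http.get(url, headers=HEADERS, timeout=timeout)
    r.raise_for_status()  # raises requests.HTTPError on 4xx/5xx status codes
    return r.text
```

Wrapping the call in try/except requests.RequestException is one way to handle both network failures and HTTP errors in the calling code.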
Step 3: Now pass the URL into the getdata() function and parse the returned HTML with BeautifulSoup.
Python3
url = "https://www.reddit.com/r/learnpython/comments/78qnze/web_scraping_in_20_lines_of_code_with/"
# pass the url
# into getdata function
htmldata = getdata(url)
soup = BeautifulSoup(htmldata, 'html.parser')
# display html code
print(soup)
Output:
Note: This is only the raw HTML of the page.
Getting Author Name
Now find the author with a div tag where class_ ="NAURX0ARMmhJ5eqxQrlQW". We can open the webpage in the browser and inspect the relevant element by right-clicking it, as shown in the figure.
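Class names like the one above appear to be auto-generated and may differ on the page you actually fetch, so it can help to try the lookup on a small hand-written snippet first. In this sketch the markup is hypothetical; only the class name and the author name are taken from this article.

```python
from bs4 import BeautifulSoup

# Hypothetical markup; a real Reddit page is far larger and may use other classes.
sample_html = """
<div class="NAURX0ARMmhJ5eqxQrlQW"><a href="/user/kashaziz/">kashaziz</a></div>
<div class="other">not an author</div>
"""
soup = BeautifulSoup(sample_html, "html.parser")

# Collect the text of every div carrying the author class.
authors = [div.get_text(strip=True)
           for div in soup.find_all("div", class_="NAURX0ARMmhJ5eqxQrlQW")]

if authors:
    print(authors[0])
else:
    print("No element with that class was found; inspect the page again.")
```

An empty result on the live page usually means the class name needs to be looked up again in the browser's inspector.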
Example:
Python3
# find the HTML tags
# with find_all()
# and join their text into one string
data_str = ""
for item in soup.find_all("div", class_="NAURX0ARMmhJ5eqxQrlQW"):
    data_str = data_str + item.get_text()
print(data_str)
Output:
kashaziz
Getting article content
Now find the article text. Here we follow the same method as in the example above.
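The example in this section passes three space-separated class names in a single class_ string. According to the Beautiful Soup documentation, such a string is compared with the exact value of the class attribute, so the same classes written in a different order would not match; a CSS selector via select() is one order-independent alternative. The markup and class names below are hypothetical.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: the same three classes, written in two different orders.
sample_html = """
<div class="alpha beta gamma">first</div>
<div class="gamma alpha beta">second</div>
"""
soup = BeautifulSoup(sample_html, "html.parser")

# A class_ string containing spaces is compared with the whole attribute value.
exact = [d.get_text() for d in soup.find_all("div", class_="alpha beta gamma")]

# A CSS selector matches elements that carry all three classes, in any order.
any_order = [d.get_text() for d in soup.select("div.alpha.beta.gamma")]

print(exact)
print(any_order)
```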
Example:
Python3
# find the HTML tags
# with find_all()
# and join their text into one string
data_str = ""
for item in soup.find_all("div", class_="_3xX726aBn29LDbsDtzr_6E _1Ap4F5maDtT1E1YuCiaO0r D3IL3FD0RFy_mkKLPwL4"):
    data_str = data_str + item.get_text()
print(data_str)
Output:
Getting the comments
Now scrape the comments. Here we follow the same method as in the example above.
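Appending each get_text() result to one string, as the examples in this article do, runs separate comments together. A minimal sketch that keeps each comment as its own list entry follows; the markup is hypothetical, and only the class name comes from this article.

```python
from bs4 import BeautifulSoup

# Hypothetical markup standing in for the comment section of a fetched page.
sample_html = """
<p class="_1qeIAgB0cPwnLhDF9XSiJM">Nice write-up, thanks!</p>
<p class="_1qeIAgB0cPwnLhDF9XSiJM">  Does this work with pagination?  </p>
<p class="unrelated">footer text</p>
"""
soup = BeautifulSoup(sample_html, "html.parser")

# strip=True trims surrounding whitespace from each comment's text.
comments = [p.get_text(strip=True)
            for p in soup.find_all("p", class_="_1qeIAgB0cPwnLhDF9XSiJM")]

for number, comment in enumerate(comments, start=1):
    print(f"{number}. {comment}")
```

A list also makes it straightforward to count the comments or write them out one per line later.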
Python3
# find the HTML tags
# with find_all()
# and join their text into one string
data_str = ""
for item in soup.find_all("p", class_="_1qeIAgB0cPwnLhDF9XSiJM"):
    data_str = data_str + item.get_text()
print(data_str)
Output: