Wikipedia-API is an easy-to-use Python wrapper for Wikipedia's API. It supports extracting texts, sections, links, categories, translations, etc. from Wikipedia. The documentation provides code snippets for the most common use cases.
This package requires at least Python 3.9 to install because it uses IntEnum.

    pip3 install wikipedia-api
The goal of Wikipedia-API is to provide a simple and easy-to-use API for retrieving information from Wikipedia. Below are examples of common use cases.
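Getting a single page is straightforward. You have to initialize a Wikipedia object and ask for the page by its name. To initialize it, you have to provide a user agent that identifies your project and the language of the Wikipedia edition to query: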
    import wikipediaapi

    wiki_wiki = wikipediaapi.Wikipedia(user_agent='MyProjectName (merlin@example.com)', language='en')
    page_py = wiki_wiki.page('Python_(programming_language)')

How To Check If Wiki Page Exists
To check whether a page exists, you can use the function exists.
    page_py = wiki_wiki.page('Python_(programming_language)')
    print("Page - Exists: %s" % page_py.exists())
    # Page - Exists: True

    page_missing = wiki_wiki.page('NonExistingPageWithStrangeName')
    print("Page - Exists: %s" % page_missing.exists())
    # Page - Exists: False
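When several titles need checking, the same call can be wrapped in a small helper. The sketch below is not part of Wikipedia-API: `existing_titles` is a hypothetical helper, and `FakeWiki` and `FakePage` are offline stand-ins that only mimic the `page()` and `exists()` calls shown above, so the snippet runs without network access.

```python
class FakePage:
    """Offline stand-in for a page object; only mimics exists()."""

    def __init__(self, title, known_titles):
        self.title = title
        self._known_titles = known_titles

    def exists(self):
        return self.title in self._known_titles


class FakeWiki:
    """Offline stand-in for the Wikipedia object; only mimics page()."""

    def __init__(self, known_titles):
        self._known_titles = set(known_titles)

    def page(self, title):
        return FakePage(title, self._known_titles)


def existing_titles(wiki, titles):
    """Hypothetical helper: keep only the titles whose page exists."""
    return [title for title in titles if wiki.page(title).exists()]


fake_wiki = FakeWiki(["Python_(programming_language)"])
found = existing_titles(
    fake_wiki,
    ["Python_(programming_language)", "NonExistingPageWithStrangeName"],
)
print(found)
```

With the real library, the `wiki_wiki` object would presumably take the place of `fake_wiki`; each `exists()` call may then trigger a network request.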
Class WikipediaPage has the property summary, which returns a description of the wiki page.
    import wikipediaapi

    wiki_wiki = wikipediaapi.Wikipedia('MyProjectName (merlin@example.com)', 'en')
    page_py = wiki_wiki.page('Python_(programming_language)')

    print("Page - Title: %s" % page_py.title)
    # Page - Title: Python (programming language)

    print("Page - Summary: %s" % page_py.summary[0:60])
    # Page - Summary: Python is a widely used high-level programming language for
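Slicing with `[0:60]` can cut the summary in the middle of a word. As a possible alternative, the standard library's `textwrap.shorten` truncates at a word boundary. The sketch below works on a made-up sample string, not on the real page summary, and the width of 60 is just the value used in the example above.

```python
import textwrap

# Made-up sample text standing in for page_py.summary.
sample_summary = (
    "Python is a widely used high-level programming language "
    "for many kinds of tasks."
)

# shorten() collapses whitespace and drops whole words until the result,
# including the placeholder, fits into the requested width.
short = textwrap.shorten(sample_summary, width=60, placeholder="...")
print(short)
```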
WikipediaPage has two properties with the URL of the page: fullurl and canonicalurl.
    print(page_py.fullurl)
    # https://en.wikipedia.org/wiki/Python_(programming_language)

    print(page_py.canonicalurl)
    # https://en.wikipedia.org/wiki/Python_(programming_language)
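If you only have such a URL and need the language code and title back, the standard library is enough. The sketch below assumes the URL shape shown above (language code as the first host label, title after `/wiki/` with underscores for spaces); `split_wiki_url` is a hypothetical helper, not a function of Wikipedia-API.

```python
from urllib.parse import unquote, urlsplit


def split_wiki_url(url):
    """Hypothetical helper: return (language, title) for a /wiki/ article URL."""
    parts = urlsplit(url)
    prefix = "/wiki/"
    if not parts.path.startswith(prefix):
        raise ValueError("not an article URL: %s" % url)
    # The language code is assumed to be the first label of the host name.
    language = parts.netloc.split(".")[0]
    # Percent-decode the title and turn underscores back into spaces.
    title = unquote(parts.path[len(prefix):]).replace("_", " ")
    return language, title


language, title = split_wiki_url(
    "https://en.wikipedia.org/wiki/Python_(programming_language)"
)
print(language, title)
```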
To get the full text of a Wikipedia page, use the property text, which constructs the text of the page as a concatenation of the summary and the sections with their titles and texts.
    wiki_wiki = wikipediaapi.Wikipedia(
        user_agent='MyProjectName (merlin@example.com)',
        language='en',
        extract_format=wikipediaapi.ExtractFormat.WIKI
    )

    p_wiki = wiki_wiki.page("Test 1")
    print(p_wiki.text)
    # Summary
    # Section 1
    # Text of section 1
    # Section 1.1
    # Text of section 1.1
    # ...

    wiki_html = wikipediaapi.Wikipedia(
        user_agent='MyProjectName (merlin@example.com)',
        language='en',
        extract_format=wikipediaapi.ExtractFormat.HTML
    )
    p_html = wiki_html.page("Test 1")
    print(p_html.text)
    # <p>Summary</p>
    # <h2>Section 1</h2>
    # <p>Text of section 1</p>
    # <h3>Section 1.1</h3>
    # <p>Text of section 1.1</p>
    # ...
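To make the "concatenation of summary and sections" concrete, here is a minimal offline sketch of how such a text could be assembled for the plain-text case. It is not the library's implementation: `Section` is a stand-in for WikipediaPageSection with only the `title`, `text` and `sections` attributes, and `assemble_text` is a hypothetical helper.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Section:
    """Offline stand-in for WikipediaPageSection."""
    title: str
    text: str
    sections: List["Section"] = field(default_factory=list)


def assemble_text(summary, sections):
    """Hypothetical helper: join the summary and all nested sections."""
    lines = [summary]

    def visit(section_list):
        for section in section_list:
            lines.append(section.title)
            lines.append(section.text)
            # Depth-first, so subsections directly follow their parent.
            visit(section.sections)

    visit(sections)
    return "\n".join(lines)


page_sections = [
    Section("Section 1", "Text of section 1", [
        Section("Section 1.1", "Text of section 1.1"),
    ]),
]
full_text = assemble_text("Summary", page_sections)
print(full_text)
```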
To get all top-level sections of a page, use the property sections. It returns a list of WikipediaPageSection objects, so you have to use recursion to get all subsections.
    def print_sections(sections, level=0):
        for s in sections:
            print("%s: %s - %s" % ("*" * (level + 1), s.title, s.text[0:40]))
            print_sections(s.sections, level + 1)


    print_sections(page_py.sections)
    # *: History - Python was conceived in the late 1980s,
    # *: Features and philosophy - Python is a multi-paradigm programming l
    # *: Syntax and semantics - Python is meant to be an easily readable
    # **: Indentation - Python uses whitespace indentation, rath
    # **: Statements and control flow - Python's statements include (among other
    # **: Expressions - Some Python expressions are similar to l

How To Get Page Section By Title
To get the last section of a page with a given title, use the function section_by_title. It returns the last WikipediaPageSection with this title.
    section_history = page_py.section_by_title('History')
    print("%s - %s" % (section_history.title, section_history.text[0:40]))
    # History - Python was conceived in the late 1980s b

How To Get All Page Sections By Title
To get all sections of a page with a given title, use the function sections_by_title. It returns all WikipediaPageSection objects with this title.
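The difference between "last match" and "all matches" can be illustrated offline. The sketch below is an assumption-laden model, not the library's code: `Section` is a stand-in for WikipediaPageSection, `index_by_title` is a hypothetical helper, the page layout is made up, and sections are assumed to be indexed in document order.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from typing import List


@dataclass
class Section:
    """Offline stand-in for WikipediaPageSection."""
    title: str
    text: str
    sections: List["Section"] = field(default_factory=list)


def index_by_title(sections):
    """Hypothetical helper: map each title to its sections in document order."""
    index = defaultdict(list)

    def visit(section_list):
        for section in section_list:
            index[section.title].append(section)
            visit(section.sections)

    visit(sections)
    return index


# Made-up layout of a year page where the same subsection title repeats.
sections = [
    Section("Events", "", [Section("January", "events text")]),
    Section("Births", "", [Section("January", "births text")]),
    Section("Deaths", "", [Section("January", "deaths text")]),
]
index = index_by_title(sections)
all_january = index["January"]   # analogous to "all sections with this title"
last_january = all_january[-1]   # analogous to "the last section with this title"
print(len(all_january), last_january.text)
```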
    page_1920 = wiki_wiki.page('1920')
    sections_january = page_1920.sections_by_title('January')
    for s in sections_january:
        print("* %s - %s" % (s.title, s.text[0:40]))

    # * January - January 1
    # Polish–Soviet War in 1920: The
    # * January - January 2
    # Isaac Asimov, American author
    # * January - January 1 – Zygmunt Gorazdowski, Polish

How To Get Page In Other Languages
If you want to get other translations of a given page, use the property langlinks. It is a map, where the key is a language code and the value is a WikipediaPage.
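Because the result behaves like a dictionary keyed by language code, ordinary dictionary operations apply. The sketch below picks the first available translation from a preference list. It uses a plain dict of title strings as an offline stand-in for the real mapping of WikipediaPage objects, and `first_available` is a hypothetical helper.

```python
def first_available(langlinks, preferred):
    """Hypothetical helper: return (code, value) for the first preferred
    language code that is present, or None if none of them is."""
    for code in preferred:
        if code in langlinks:
            return code, langlinks[code]
    return None


# Stand-in data: strings instead of WikipediaPage objects.
sample_langlinks = {
    "af": "Python (programmeertaal)",
    "als": "Python (Programmiersprache)",
    "an": "Python",
}
choice = first_available(sample_langlinks, ["cs", "als", "af"])
print(choice)
```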
    def print_langlinks(page):
        langlinks = page.langlinks
        for k in sorted(langlinks.keys()):
            v = langlinks[k]
            print("%s: %s - %s: %s" % (k, v.language, v.title, v.fullurl))

    print_langlinks(page_py)
    # af: af - Python (programmeertaal): https://af.wikipedia.org/wiki/Python_(programmeertaal)
    # als: als - Python (Programmiersprache): https://als.wikipedia.org/wiki/Python_(Programmiersprache)
    # an: an - Python: https://an.wikipedia.org/wiki/Python
    # ar: ar - بايثون: https://ar.wikipedia.org/wiki/%D8%A8%D8%A7%D9%8A%D8%AB%D9%88%D9%86
    # as: as - পাইথন: https://as.wikipedia.org/wiki/%E0%A6%AA%E0%A6%BE%E0%A6%87%E0%A6%A5%E0%A6%A8

    page_py_cs = page_py.langlinks['cs']
    print("Page - Summary: %s" % page_py_cs.summary[0:60])
    # Page - Summary: Python (anglická výslovnost [ˈpaiθtən]) je vysokoúrovňový sk

How To Get Links To Other Pages
If you want to get all links to other wiki pages from a given page, use the property links. It is a map, where the key is a page title and the value is a WikipediaPage.
    def print_links(page):
        links = page.links
        for title in sorted(links.keys()):
            print("%s: %s" % (title, links[title]))

    print_links(page_py)
    # 3ds Max: 3ds Max (id: ??, ns: 0)
    # ?:: ?: (id: ??, ns: 0)
    # ABC (programming language): ABC (programming language) (id: ??, ns: 0)
    # ALGOL 68: ALGOL 68 (id: ??, ns: 0)
    # Abaqus: Abaqus (id: ??, ns: 0)
    # ...

How To Get Page Categories
If you want to get all categories to which a page belongs, use the property categories. It is a map, where the key is a category title and the value is a WikipediaPage.
    def print_categories(page):
        categories = page.categories
        for title in sorted(categories.keys()):
            print("%s: %s" % (title, categories[title]))


    print("Categories")
    print_categories(page_py)
    # Category:All articles containing potentially dated statements: ...
    # Category:All articles with unsourced statements: ...
    # Category:Articles containing potentially dated statements from August 2016: ...
    # Category:Articles containing potentially dated statements from March 2017: ...
    # Category:Articles containing potentially dated statements from September 2017: ...

How To Get All Pages From Category
To get all pages from a given category, use the property categorymembers. It returns all members of the given category. You have to implement recursion and deduplication yourself.
    def print_categorymembers(categorymembers, level=0, max_level=1):
        for c in categorymembers.values():
            print("%s: %s (ns: %d)" % ("*" * (level + 1), c.title, c.ns))
            if c.ns == wikipediaapi.Namespace.CATEGORY and level < max_level:
                print_categorymembers(c.categorymembers, level=level + 1, max_level=max_level)


    cat = wiki_wiki.page("Category:Physics")
    print("Category members: Category:Physics")
    print_categorymembers(cat.categorymembers)

    # Category members: Category:Physics
    # *: Statistical mechanics (ns: 0)
    # *: Category:Physical quantities (ns: 14)
    # **: Refractive index (ns: 0)
    # **: Vapor quality (ns: 0)
    # **: Electric susceptibility (ns: 0)
    # **: Specific weight (ns: 0)
    # **: Category:Viscosity (ns: 14)
    # ***: Brookfield Engineering (ns: 0)
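The example above recurses but does not deduplicate, so a page that belongs to several subcategories would be visited more than once. One possible approach is to carry a set of already seen titles through the recursion. The sketch below runs offline: `FakeMember` is a stand-in that only mimics the `title`, `ns` and `categorymembers` attributes used above, `collect_members` is a hypothetical helper, and the duplicate entry in the sample data is made up to exercise the deduplication.

```python
CATEGORY_NS = 14  # namespace number shown for categories in the output above


class FakeMember:
    """Offline stand-in for a category member page."""

    def __init__(self, title, ns, members=None):
        self.title = title
        self.ns = ns
        self.categorymembers = members or {}


def collect_members(categorymembers, max_level=1, level=0, seen=None):
    """Hypothetical helper: unique non-category titles, depth-limited."""
    if seen is None:
        seen = set()
    titles = []
    for member in categorymembers.values():
        if member.title in seen:
            continue  # already reached through another category
        seen.add(member.title)
        if member.ns == CATEGORY_NS:
            if level < max_level:
                titles.extend(
                    collect_members(member.categorymembers, max_level, level + 1, seen)
                )
        else:
            titles.append(member.title)
    return titles


stat_mech = FakeMember("Statistical mechanics", 0)
quantities = FakeMember("Category:Physical quantities", CATEGORY_NS, {
    "Refractive index": FakeMember("Refractive index", 0),
    "Statistical mechanics": stat_mech,  # made-up duplicate
})
physics_members = {
    "Statistical mechanics": stat_mech,
    "Category:Physical quantities": quantities,
}
unique_titles = collect_members(physics_members)
print(unique_titles)
```

With the real library, `wikipediaapi.Namespace.CATEGORY` would replace the `CATEGORY_NS` constant, as in the example above.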
The official API supports many different parameters. You can see them in the sandbox. Not all of these parameters are supported directly as parameters of the functions. If you want to specify them, you can pass them as additional parameters in the constructor. For example, the info API call accepts the parameter converttitles, which you can specify as follows:
    import sys

    import wikipediaapi
    wiki_wiki = wikipediaapi.Wikipedia('MyProjectName (merlin@example.com)', 'zh', 'zh-tw', extra_api_params={'converttitles': 1})
    page = wiki_wiki.page("孟卯")
    print(repr(page.varianttitles))

How To See Underlying API Call
If you have problems retrieving data, you can get the URL of the underlying API call. This will help you determine whether the problem is in the library or somewhere else.
    import sys

    import wikipediaapi
    wikipediaapi.log.setLevel(level=wikipediaapi.logging.DEBUG)

    # Set handler if you use Python in interactive mode
    out_hdlr = wikipediaapi.logging.StreamHandler(sys.stderr)
    out_hdlr.setFormatter(wikipediaapi.logging.Formatter('%(asctime)s %(message)s'))
    out_hdlr.setLevel(wikipediaapi.logging.DEBUG)
    wikipediaapi.log.addHandler(out_hdlr)

    wiki = wikipediaapi.Wikipedia(user_agent='MyProjectName (merlin@example.com)', language='en')

    page_ostrava = wiki.page('Ostrava')
    print(page_ostrava.summary)
    # logger prints out: Request URL: http://en.wikipedia.org/w/api.php?action=query&prop=extracts&titles=Ostrava&explaintext=1&exsectionformat=wiki
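Once you have the logged request URL, it can help to split it into its query parameters before comparing it with what you expected. The sketch below uses only the standard library and the URL from the log line above; it does not call Wikipedia-API or the network.

```python
from urllib.parse import parse_qs, urlsplit

# Request URL copied from the log output above.
logged_url = (
    "http://en.wikipedia.org/w/api.php?action=query&prop=extracts"
    "&titles=Ostrava&explaintext=1&exsectionformat=wiki"
)

parts = urlsplit(logged_url)
# parse_qs returns a list per key; each key occurs once here, so keep the first value.
params = {key: values[0] for key, values in parse_qs(parts.query).items()}

print(parts.netloc, parts.path)
for key in sorted(params):
    print("%s = %s" % (key, params[key]))
```

Pasting the same URL into a browser should return the raw API response, which helps separate library problems from API problems.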
.. toctree::
   :maxdepth: 2

   API
   CHANGES
   DEVELOPMENT
   wikipediaapi/api