alphapapa/org-web-tools: View, capture, and archive Web pages in Org-mode

This file contains library functions and commands useful for retrieving web page content and processing it into Org-mode content.

For example, you can copy a URL to the clipboard or kill-ring, then run a command that downloads the page, isolates the “readable” content with eww-readable, converts it to Org-mode content with Pandoc, and displays it in an Org-mode buffer. Another command does all of that but inserts it as an Org entry instead of displaying it in a new buffer.

Requires Emacs 27.1 or later.
Commands that process HTML into Org require Pandoc. Note: The output of current Pandoc versions differs substantially from versions that may still be present in stable Linux distros. If you encounter any issues, please install a more recent version of Pandoc.

After installing from MELPA, just run one of the commands below. If you want to use any of the functions in your own code, you should (require 'org-web-tools).
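As a rough sketch of that setup, the snippet below loads the library and warns if Pandoc cannot be found. The executable-find check and the warning text are illustrative assumptions; the package itself is not documented here as performing such a check.

```elisp
;; Load the library so its functions can be called from your own code.
(require 'org-web-tools)

;; Assumption: Pandoc is looked up as "pandoc" on `exec-path'.
;; Warn early rather than failing later inside a conversion command.
(unless (executable-find "pandoc")
  (display-warning 'org-web-tools
                   "pandoc not found; commands that convert HTML to Org will not work"))
```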

org-web-tools-insert-link-for-url: Insert an Org-mode link to the URL in the clipboard or kill-ring. Downloads the page to get the HTML title.
org-web-tools-insert-web-page-as-entry: Insert the web page for the URL in the clipboard or kill-ring as an Org-mode entry, as a sibling heading of the current entry.
org-web-tools-read-url-as-org: Display the web page for the URL in the clipboard or kill-ring as Org-mode text in a new buffer, processed with eww-readable.
org-web-tools-convert-links-to-page-entries: Convert all URLs and Org links in current Org entry to Org headings, each containing the web page content of that URL, converted to Org-mode text and processed with eww-readable. This should be called on an entry that solely contains a list of URLs or links.
org-web-tools-archive-attach: Download archive of page at URL and attach with org-attach. If CHOOSE-FN is non-nil (interactively, with universal prefix), prompt for the archive function to use. If VIEW is non-nil (interactively, with two universal prefixes), view the archive immediately after attaching. (See also org-board.)
org-web-tools-archive-view: Open Zip file archive of web page. Extracts to a temp directory and opens with browse-url-default-browser. Note: the extracted files are left on-disk in the temp directory.
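The commands above can be run with M-x, or bound to keys. The bindings below are only a sketch: the key sequences are arbitrary choices for illustration (the package is not documented here as defining any), and the snippet assumes the commands are autoloaded, as is usual for a MELPA install.

```elisp
;; Hypothetical key bindings; pick any keys that are free in your setup.
(with-eval-after-load 'org
  ;; Insert a titled link for the URL in the clipboard or kill-ring.
  (define-key org-mode-map (kbd "C-c w l") #'org-web-tools-insert-link-for-url)
  ;; Insert the page's readable content as a sibling Org entry.
  (define-key org-mode-map (kbd "C-c w e") #'org-web-tools-insert-web-page-as-entry)
  ;; Archive the page and attach it to the current entry.
  (define-key org-mode-map (kbd "C-c w a") #'org-web-tools-archive-attach))

;; This command displays a new buffer, so a global binding may suit it better.
(global-set-key (kbd "C-c w r") #'org-web-tools-read-url-as-org)
```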

These are used in the commands above and may be useful in building your own commands.

org-web-tools--dom-to-html: Return parsed HTML DOM as an HTML string. Note: This is an approximation and is not necessarily correct HTML (e.g. IMG tags may be rendered with a closing “</img>” tag).
org-web-tools--eww-readable: Return “readable” part of HTML with title.
org-web-tools--get-url: Return content for URL as string.
org-web-tools--html-to-org-with-pandoc: Return string of HTML converted to Org with Pandoc. When SELECTOR is non-nil, the HTML is filtered using esxml-query SELECTOR and re-rendered to HTML with org-web-tools--dom-to-html, which see.
org-web-tools--url-as-readable-org: Return string containing Org entry of URL’s web page content. Content is processed with eww-readable and Pandoc. Entry will be a top-level heading, with article contents below a second-level “Article” heading, and a timestamp in the first-level entry for writing comments.
org-web-tools--demote-headings-below: Demote all headings in buffer so the highest level is below LEVEL.
org-web-tools--get-first-url: Return URL in clipboard, or first URL in the kill-ring, or nil if none.
org-web-tools--read-url: Return a URL by searching at point, then in clipboard, then in kill-ring, and finally prompting the user.
org-web-tools--read-org-bracket-link: Return (TARGET . DESCRIPTION) for Org bracket LINK or next link on current line.
org-web-tools--remove-dos-crlf: Remove all DOS CRLF (^M) in buffer.
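As an illustration of combining these functions, here is a minimal sketch of a custom command that fetches a page, converts the whole HTML to Org with Pandoc (skipping the eww-readable step), and shows the result. The command name and buffer name are hypothetical, and the sketch assumes the argument lists implied by the descriptions above: org-web-tools--read-url callable with no arguments, org-web-tools--get-url taking a URL, and org-web-tools--html-to-org-with-pandoc taking an HTML string. Check the source before relying on it, since double-dash functions are internal and may change.

```elisp
(require 'org-web-tools)

(defun my/org-web-tools-show-full-page (url)
  "Display the full page at URL as Org text in a new buffer.
Hypothetical example command; not part of org-web-tools."
  ;; Find a URL at point, in the clipboard, or in the kill-ring,
  ;; falling back to prompting the user.
  (interactive (list (org-web-tools--read-url)))
  (let* ((html (org-web-tools--get-url url))
         ;; Requires Pandoc; no SELECTOR is passed, so the HTML is not filtered.
         (org-text (org-web-tools--html-to-org-with-pandoc html))
         (buffer (generate-new-buffer "*my-web-page*")))
    (with-current-buffer buffer
      (insert org-text)
      (org-mode)
      (goto-char (point-min)))
    (pop-to-buffer buffer)))
```

After evaluating the definition, copy a URL and run M-x my/org-web-tools-show-full-page.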

Changes

Errors from Pandoc are now displayed. (#47. Thanks to c1-g.)

Fixes

Default options to Wget (see #35).
Finding URL in clipboard on macOS and Windows. (See #66. Thanks to @askdkc.)
Org timestamp format when inserting pages. (#54. Thanks to p4v4n for reporting.)

Internal

Use plz HTTP library and make various related optimizations.

Removed

Internal function org-web-tools--html-title. (If your program used this function, it’s trivially reimplemented; see source code.)

Improvements

Archiving tools:
- Can use multiple functions to attempt archiving.
- Associated options control retry attempts, delays, and fallbacks to other functions.
- Functions to archive Web pages with wget and tar:
  - Function org-web-tools-archive--wget-tar archives a URL’s Web page, including page resources.
  - Function org-web-tools-archive--wget-tar-html-only archives a URL’s HTML only.
- Command org-web-tools-archive-view handles both zip and tar archives.
- The default settings use wget and tar to archive pages (because the archive.today service has not worked reliably with external tools for a long time).

Changes

Option org-web-tools-archive-fn defaults to using wget and tar to archive pages to XZ archives with HTML and page resources. (The archive.is service has not worked reliably with other tools for a long time.)

Fixes

org-web-tools--org-link-for-url now returns the URL if the HTML page has no title tag. This avoids an error, e.g. when used in an Org capture template.

Compatibility

Emacs 27.1 or later is now required.
Updated for Org 9.3’s changes to org-bracket-link-regexp. (Thanks to Aaron Zeng and Akira Komamura.)
Activate org-mode in temporary buffer for org-web-tools--html-to-org-with-pandoc. (#56. Thanks to mooseyboots.)
Use compat library.

Fixed

Only test non-nil items in org-web-tools--get-first-url. This makes it work properly in non-GUI Emacs sessions. (Thanks to Ben Sima for reporting.)

Fixed

Require org-attach.

Additions

Command org-web-tools-attach-url-archive.
Command org-web-tools-view-archive.
Function org-web-tools--read-url.

Changes

Remove all property drawers that contain the CUSTOM_ID property from Pandoc output.

First declared stable release.

Contributions and suggestions are welcome.

GPLv3
