The climate-news-db has two goals:

1. `make crawl` pulls `urls.jsonl` from S3 and crawls articles into `articles/{newspaper}.jsonl` and into the database (a sketch of this step follows this list).
2. `make regen-db` takes URLs from `articles/{newspaper}.jsonl` and saves them into the database. This is useful when you want to re-create the database without scraping articles (see the sketch after the diagram below).
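A minimal sketch of the crawl step, assuming a `boto3`/`requests` stack; the bucket name, field names, and the newspaper-from-domain mapping are illustrative assumptions, not the project's actual code:

```python
"""Sketch of `make crawl`: pull urls.jsonl from S3, fetch each page, and
append one JSON object per article to articles/{newspaper}.jsonl.
Bucket name, fields, and newspaper detection are illustrative assumptions."""
import json
from pathlib import Path
from urllib.parse import urlparse

import boto3
import requests


def crawl(bucket: str = "climate-news-db", key: str = "urls.jsonl") -> None:
    # Pull the raw URL log from S3 (bucket/key are assumptions)
    s3 = boto3.client("s3")
    s3.download_file(bucket, key, "urls.jsonl")

    Path("articles").mkdir(exist_ok=True)
    with open("urls.jsonl") as fp:
        for line in fp:
            record = json.loads(line)
            url = record["url"]
            # Crude stand-in for the project's real newspaper mapping
            newspaper = urlparse(url).netloc
            resp = requests.get(url, timeout=30)
            article = {"url": url, "body": resp.text}
            # Append-only, one JSON object per line
            with open(f"articles/{newspaper}.jsonl", "a") as out:
                out.write(json.dumps(article) + "\n")


if __name__ == "__main__":
    crawl()
```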
## Interactive Search for Getting URLs

Requires Go + Gum:

```shell
$ ./scripts/search-cli.sh
```
```mermaid
graph LR
  1(urls.jsonl) -->|make crawl| 2(articles.jsonl)
  2(articles.jsonl) -->|make crawl, make regen-db| 3(database)
```
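The `make regen-db` edge of the diagram rebuilds the database straight from the per-newspaper JSONL files, with no network access. A minimal sketch, assuming a SQLite target; the schema and field names are illustrative assumptions rather than the project's actual code:

```python
"""Sketch of `make regen-db`: rebuild the database from
articles/{newspaper}.jsonl without re-scraping. The SQLite schema and
field names here are illustrative assumptions."""
import json
import sqlite3
from pathlib import Path


def regen_db(articles_dir: str = "articles", db_path: str = "climate.db") -> None:
    conn = sqlite3.connect(db_path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS articles "
        "(url TEXT PRIMARY KEY, newspaper TEXT, body TEXT)"
    )
    for path in Path(articles_dir).glob("*.jsonl"):
        newspaper = path.stem  # articles/{newspaper}.jsonl
        with path.open() as fp:
            for line in fp:
                article = json.loads(line)
                conn.execute(
                    "INSERT OR REPLACE INTO articles VALUES (?, ?, ?)",
                    (article["url"], newspaper, article.get("body", "")),
                )
    conn.commit()
    conn.close()


if __name__ == "__main__":
    regen_db()
```

`INSERT OR REPLACE` keeps the rebuild idempotent, so the database can be regenerated repeatedly from the same files.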
{"url": "https://www.chinadaily.com.cn/a/202302/21/WS63f4aea4a31057c47ebb004e.html", "search_time_utc": "2023-03-20T00:05:02.998560"} {"url": "https://www.chinadaily.com.cn/a/202301/19/WS63c8a4a8a31057c47ebaa8e4.html", "search_time_utc": "2023-03-20T00:05:02.998560"}
Append-only storage of raw newspaper URLs, created by a daily Google search for each newspaper with the keywords *climate change* and *climate crisis*. This file contains many duplicates.
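Because the file is append-only, consumers have to deduplicate on the `url` field themselves. A minimal sketch that keeps the first (and, since the log is chronological, earliest) record per URL:

```python
"""Sketch of deduplicating the append-only urls.jsonl on the url field."""
import json


def unique_urls(path: str = "urls.jsonl") -> list[dict]:
    seen: dict[str, dict] = {}
    with open(path) as fp:
        for line in fp:
            record = json.loads(line)
            # First occurrence wins, keeping the earliest search_time_utc
            seen.setdefault(record["url"], record)
    return list(seen.values())


if __name__ == "__main__":
    for record in unique_urls():
        print(record["url"])
```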
Deployed as a Fly.io app:
Deployed with AWS CDK: