Showing content from https://github.com/keyATM/keyATM/issues/222 below:

An option to drop topics which failed `keyATM:::check_keywords` · Issue #222 · keyATM/keyATM · GitHub

Requested feature

Currently, both keyATM(options = list(prune = TRUE)) and keyATM(options = list(prune = FALSE)) raise an error when at least one topic has all of its keywords pruned. The error is a useful reminder for the user to refine the keywords, but it is undesirable when one wants to run everything automatically.

My current workaround is to copy and modify keyATM:::check_keywords so that the keywords are filtered before calling keyATM. It works for now, but it would be nice to have this supported in the package.

## Adapted from keyATM:::check_keywords; modified to return the indices of
## topics that keep at least one keyword after pruning
check_keywords <- function(docs, keywords) {
    info <- list()
    
    if (is.null(docs$wd_names)) {
        info$wd_names <- unique(unlist(docs$W_raw, use.names = FALSE, recursive = FALSE))
        keyATM:::check_vocabulary(info$wd_names)
    } else {
        info$wd_names <- docs$wd_names
    }

    unique_words <- info$wd_names
    
    # Prune keywords that do not appear in the corpus
    keywords_flat <- unlist(keywords, use.names = FALSE, recursive = FALSE)
    non_existent <- keywords_flat[!keywords_flat %in% unique_words]
    keywords <- lapply(keywords, function(x) {x[!x %in% non_existent]})
    # Find the topics that still have at least one keyword
    num_keywords <- lengths(keywords)
    non_empty <- which(as.vector(num_keywords) != 0)
    return(non_empty)
}
# Example: "b" does not appear in the corpus, so topic b is dropped
keyATM_docs <- keyATM::keyATM_read(texts = data.frame(text = c("a", "a", "c")))
dict <- list(a = c("a"), b = c("b"))
dict <- dict[check_keywords(keyATM_docs, dict)]

I can look into it and submit a PR (perhaps adding one more option, drop_empty_topics, defaulting to FALSE) if you find it useful.
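To make the proposal concrete, here is a minimal sketch of how such an option might behave, written as a standalone base-R helper. The function name filter_keywords and its vocabulary argument are hypothetical and not part of keyATM; only the option name drop_empty_topics comes from the suggestion above, and the sketch assumes keywords is a named list of character vectors.

```r
# Hypothetical helper, NOT part of keyATM: prunes keywords that are missing
# from the vocabulary and optionally drops topics left with no keywords.
filter_keywords <- function(vocabulary, keywords, drop_empty_topics = FALSE) {
  # Keep only the keywords that occur in the vocabulary
  pruned <- lapply(keywords, function(x) x[x %in% vocabulary])
  empty <- lengths(pruned) == 0

  if (any(empty)) {
    msg <- paste0(
      "Topics with no keywords in the corpus: ",
      paste(names(pruned)[empty], collapse = ", ")
    )
    # Default: fail, which mirrors the behavior described in this issue
    if (!drop_empty_topics) {
      stop(msg)
    }
    # Opt-in: warn and continue without the empty topics
    warning(msg, "; dropping them.")
  }
  pruned[!empty]
}

vocabulary <- c("a", "c")
dict <- list(a = c("a"), b = c("b"))
kept <- suppressWarnings(
  filter_keywords(vocabulary, dict, drop_empty_topics = TRUE)
)
names(kept)
```

With drop_empty_topics = FALSE the helper stops with an error, so existing scripts would keep the current behavior unless they opt in. Emitting a warning rather than dropping topics silently keeps the reminder to refine the keywords.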

