Currently, both `keyATM(options = list(prune = TRUE))` and `keyATM(options = list(prune = FALSE))` raise an error when at least one topic has all of its keywords pruned. This can be a useful reminder to refine the keywords, but it is undesirable when one wants to run everything automatically.
My current workaround is to copy and modify `keyATM:::check_keywords` so that the keywords are filtered before calling `keyATM()`. It works for now, but it would be nice to have this behavior built in.
```r
## Adapted from keyATM:::check_keywords, modified to return the indices
## of keyword topics that still have at least one keyword in the corpus
check_keywords <- function(docs, keywords) {
  if (is.null(docs$wd_names)) {
    wd_names <- unique(unlist(docs$W_raw, use.names = FALSE, recursive = FALSE))
    keyATM:::check_vocabulary(wd_names)
  } else {
    wd_names <- docs$wd_names
  }

  # Prune keywords that do not appear in the corpus
  keywords <- lapply(keywords, function(x) x[x %in% wd_names])

  # Indices of topics with at least one remaining keyword
  which(lengths(keywords) > 0)
}

keyATM_docs <- keyATM::keyATM_read(texts = data.frame(text = c("a", "a", "c")))
dict <- list(a = c("a"), b = c("b"))
dict <- dict[check_keywords(keyATM_docs, dict)]
```
I can look into it and submit a PR (perhaps adding one more option, `drop_empty_topics`, defaulting to `FALSE`) if you find it useful.
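To make the proposal concrete, here is a minimal base-R sketch of what `drop_empty_topics = TRUE` could do: prune keywords missing from the corpus vocabulary, then drop the topics left empty and emit a warning instead of an error. The helper name `drop_empty_keyword_topics` and its arguments are hypothetical; this is not part of the keyATM API, and the real option would live inside keyATM's own keyword checks.

```r
# Hypothetical sketch of the proposed `drop_empty_topics` behaviour.
# Not part of keyATM: drops keyword topics with no keywords left in the
# vocabulary and warns, instead of stopping with an error.
drop_empty_keyword_topics <- function(keywords, vocabulary) {
  # Keep only keywords that appear in the corpus vocabulary
  pruned <- lapply(keywords, function(x) x[x %in% vocabulary])

  # TRUE for topics that still have at least one keyword
  keep <- lengths(pruned) > 0

  if (any(!keep)) {
    warning(
      "Dropping topics with no keywords in the corpus: ",
      paste(names(keywords)[!keep], collapse = ", "),
      call. = FALSE
    )
  }
  pruned[keep]
}

dict <- list(a = c("a"), b = c("b"))
vocab <- c("a", "c")
filtered <- suppressWarnings(drop_empty_keyword_topics(dict, vocab))
str(filtered)
```

Unlike the workaround above, this sketch returns the already-pruned keywords, so the result would also pass with `prune = FALSE`. The vocabulary could be taken from `docs$wd_names` or `unique(unlist(docs$W_raw))`, as in the workaround.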