A fast version of the Rapid Automatic Keyword Extraction (RAKE) algorithm
Assuming you're using Maven, follow these two steps to use `rapidrake` in your Java project:

1. Add a dependency on `rapidrake` in your POM:

   ```xml
   <dependency>
     <groupId>io.github.crew102</groupId>
     <artifactId>rapidrake</artifactId>
     <version>0.1.4</version>
   </dependency>
   ```

2. Download the `opennlp` trained models for sentence detection and part-of-speech tagging. You can find these two models (trained on various languages) on opennlp's model page. For example, you could use the English versions of the sentence detection and POS-tagger models. You'll specify the file paths to these models when you instantiate a `RakeAlgorithm`
object (see below for example).

```java
import io.github.crew102.rapidrake.RakeAlgorithm;
import io.github.crew102.rapidrake.data.SmartWords;
import io.github.crew102.rapidrake.model.RakeParams;
import io.github.crew102.rapidrake.model.Result;

public class Example {

  public static void main(String[] args) throws java.io.IOException {

    // Create an object to hold algorithm parameters
    String[] stopWords = new SmartWords().getSmartWords();
    String[] stopPOS = {"VB", "VBD", "VBG", "VBN", "VBP", "VBZ"};
    int minWordChar = 1;
    boolean shouldStem = true;
    String phraseDelims = "[-,.?():;\"!/]";
    RakeParams params = new RakeParams(stopWords, stopPOS, minWordChar, shouldStem, phraseDelims);

    // Create a RakeAlgorithm object
    // You can use the RakeAlgorithm(RakeParams, POSTaggerME, SentenceDetectorME)
    // constructor instead of the one shown below if you want to pass in
    // pre-initialized opennlp models.
    String POStaggerURL = "model-bin/en-pos-maxent.bin"; // The path to your POS tagging model
    String SentDetectURL = "model-bin/en-sent.bin"; // The path to your sentence detection model
    RakeAlgorithm rakeAlg = new RakeAlgorithm(params, POStaggerURL, SentDetectURL);

    // Call the rake method
    String txt = "dogs are great, don't you agree? I love dogs, especially big dogs";
    Result result = rakeAlg.rake(txt);

    // Print the resulting keywords (not stemmed)
    System.out.println(result.distinct());
  }
}
// [dogs (1.33), great (1), big dogs (3.33)]
```
You can learn more about how RAKE works and the various parameters you can set by visiting `slowraker`'s website.
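If you want a rough feel for the scoring before reading further, here is a minimal sketch of the commonly described RAKE procedure in plain Java, with no dependency on `rapidrake` or `opennlp`. It is not the library's implementation. The class `RakeSketch` and its methods `candidates` and `score` are hypothetical names made up for this illustration, the stop word list is a tiny hand-written one rather than the SMART list, the words "agree" and "love" are added to it as a stand-in for the verb POS filter, and no stemming is done.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

// Hypothetical illustration class, not part of rapidrake.
class RakeSketch {

  // Split text into candidate keywords: break on the phrase delimiters first,
  // then break each chunk wherever a stop word occurs.
  static List<List<String>> candidates(String text, String phraseDelims, Set<String> stopWords) {
    List<List<String>> phrases = new ArrayList<>();
    for (String chunk : text.toLowerCase(Locale.ROOT).split(phraseDelims)) {
      List<String> current = new ArrayList<>();
      for (String word : chunk.trim().split("\\s+")) {
        if (word.isEmpty() || stopWords.contains(word)) {
          // A stop word (or empty token) ends the phrase being built
          if (!current.isEmpty()) phrases.add(current);
          current = new ArrayList<>();
        } else {
          current.add(word);
        }
      }
      if (!current.isEmpty()) phrases.add(current);
    }
    return phrases;
  }

  // Score each distinct phrase as the sum of degree(word) / frequency(word),
  // where degree counts the sizes of all phrases a word appears in.
  static Map<String, Double> score(List<List<String>> phrases) {
    Map<String, Integer> freq = new HashMap<>();
    Map<String, Integer> degree = new HashMap<>();
    for (List<String> phrase : phrases) {
      for (String word : phrase) {
        freq.merge(word, 1, Integer::sum);
        degree.merge(word, phrase.size(), Integer::sum);
      }
    }
    Map<String, Double> scores = new LinkedHashMap<>();
    for (List<String> phrase : phrases) {
      double total = 0.0;
      for (String word : phrase) {
        total += (double) degree.get(word) / freq.get(word);
      }
      scores.put(String.join(" ", phrase), total);
    }
    return scores;
  }

  public static void main(String[] args) {
    // Assumption: "agree" and "love" are listed here only to mimic the verb POS filter
    Set<String> stopWords = new HashSet<>(
        Arrays.asList("are", "don't", "you", "i", "especially", "agree", "love"));
    String txt = "dogs are great, don't you agree? I love dogs, especially big dogs";
    Map<String, Double> scores = score(candidates(txt, "[-,.?():;\"!/]", stopWords));
    for (Map.Entry<String, Double> e : scores.entrySet()) {
      System.out.println(String.format(Locale.ROOT, "%s (%.2f)", e.getKey(), e.getValue()));
    }
  }
}
```

On the sample sentence from the example above, "dogs" appears three times with a total degree of 4, so it scores 4/3, and "big dogs" scores 2 + 4/3. These agree with the `dogs (1.33)` and `big dogs (3.33)` values shown in the example output.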