
jprante/elasticsearch-plugin-bundle: A bundle of useful Elasticsearch plugins

Using decompounds can lead to undesired results in phrase queries. After indexing, decompound tokens cannot be distinguished from original tokens. A phrase query for "Deutsche Bank" could match "Deutsche Spielbankgesellschaft", which is clearly an unexpected result. To enable "exact" phrase queries, each decompound token is tagged with additional payload data.

To evaluate this payload data, use exact_phrase as a wrapper around a query that contains your phrase queries.

use_payload - if set to true, enables payload creation. Default: false
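The effect of tagging decompound tokens can be illustrated with a small simulation. This is only a sketch of the idea, not the plugin's implementation: the `Token` class, the `is_decompound` flag (standing in for the payload), the hand-written split table, and the assumption that decompound parts share the position of their source word are all hypothetical.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Token:
    term: str
    position: int
    is_decompound: bool  # hypothetical stand-in for the payload tag

def analyze(words, splits):
    """Emit each word plus its decompound parts (assumed: same position)."""
    tokens = []
    for pos, word in enumerate(words):
        w = word.lower()
        tokens.append(Token(w, pos, False))
        for part in splits.get(w, []):
            tokens.append(Token(part, pos, True))
    return tokens

def phrase_match(tokens, phrase, exact):
    """True if the phrase terms occur at consecutive positions.

    With exact=True, tokens tagged as decompounds are ignored.
    """
    terms = [t.lower() for t in phrase]
    by_pos = {}
    for t in tokens:
        if exact and t.is_decompound:
            continue
        by_pos.setdefault(t.position, set()).add(t.term)
    return any(
        all(term in by_pos.get(start + i, set()) for i, term in enumerate(terms))
        for start in by_pos
    )

# Hand-written split table, for illustration only.
SPLITS = {"spielbankgesellschaft": ["spiel", "bank", "gesellschaft"]}
doc = analyze(["Deutsche", "Spielbankgesellschaft"], SPLITS)

print(phrase_match(doc, ["Deutsche", "Bank"], exact=False))  # True
print(phrase_match(doc, ["Deutsche", "Bank"], exact=True))   # False
```

Without the tag, "bank" looks like any other token next to "deutsche", so the phrase matches. Once tagged tokens are skipped, only the original words take part in the phrase match.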

# Langdetect

    curl -XDELETE 'localhost:9200/test'

    curl -XPUT 'localhost:9200/test'

    curl -XPOST 'localhost:9200/test/article/_mapping' -d '
    {
      "article" : {
        "properties" : {
           "content" : { "type" : "langdetect" }
        }
      }
    }
    '

    curl -XPUT 'localhost:9200/test/article/1' -d '
    {
      "title" : "Some title",
      "content" : "Oh, say can you see by the dawn`s early light, What so proudly we hailed at the twilight`s last gleaming?"
    }
    '

    curl -XPUT 'localhost:9200/test/article/2' -d '
    {
      "title" : "Ein Titel",
      "content" : "Einigkeit und Recht und Freiheit für das deutsche Vaterland!"
    }
    '

    curl -XPUT 'localhost:9200/test/article/3' -d '
    {
      "title" : "Un titre",
      "content" : "Allons enfants de la Patrie, Le jour de gloire est arrivé!"
    }
    '

    curl -XGET 'localhost:9200/test/_refresh'

    curl -XPOST 'localhost:9200/test/_search' -d '
    {
       "query" : {
           "term" : {
                "content" : "en"
           }
       }
    }
    '

    curl -XPOST 'localhost:9200/test/_search' -d '
    {
       "query" : {
           "term" : {
                "content" : "de"
           }
       }
    }
    '

    curl -XPOST 'localhost:9200/test/_search' -d '
    {
       "query" : {
           "term" : {
                "content" : "fr"
           }
       }
    }
    '
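The three searches above differ only in the language code passed to the term query on the langdetect field. As a minimal sketch, the request bodies could be generated like this; the helper name `language_term_query` is hypothetical, and sending the body to a cluster is left out on purpose.

```python
import json

def language_term_query(field: str, code: str) -> dict:
    """Build a term query body that matches documents by detected language code."""
    return {"query": {"term": {field: code}}}

# Same field and codes as in the curl examples.
for code in ("en", "de", "fr"):
    print(json.dumps(language_term_query("content", code)))
```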

# Standardnumber

Try it out:

    GET _analyze
    {
      "tokenizer": "standard",
      "filter": [
        {
          "type": "standardnumber"
        }
      ],
      "text": "Die ISBN von Elasticsearch in Action lautet 9781617291623"
    }

Example index settings that define a standardnumber filter and analyzer:

    {
       "index" : {
          "analysis" : {
              "filter" : {
                  "standardnumber" : {
                      "type" : "standardnumber"
                  }
              },
              "analyzer" : {
                  "standardnumber" : {
                      "tokenizer" : "whitespace",
                      "filter" : [ "standardnumber", "unique" ]
                  }
              }
          }
       }
    }
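To give a feel for what recognizing a standard number involves, the following sketch checks candidates against the ISBN-13 check digit rule (alternating weights 1 and 3, sum divisible by 10). It is not the plugin's detection logic, which is not shown here; the function name `is_valid_isbn13` and the whitespace tokenization are assumptions for illustration.

```python
import re

def is_valid_isbn13(candidate: str) -> bool:
    """Check the ISBN-13 checksum: weights 1,3,1,3,... must sum to a multiple of 10."""
    digits = re.sub(r"[\s-]", "", candidate)  # tolerate hyphens and spaces
    if not re.fullmatch(r"\d{13}", digits):
        return False
    total = sum(int(d) * (1 if i % 2 == 0 else 3) for i, d in enumerate(digits))
    return total % 10 == 0

# Sample text from the _analyze request.
text = "Die ISBN von Elasticsearch in Action lautet 9781617291623"
found = [tok for tok in text.split() if is_valid_isbn13(tok)]
print(found)  # ['9781617291623']
```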


- WordDelimiterFilter2: taken from Lucene

- baseform: also index base forms of words (German, English)

- decompound: decompose words if possible (German)

- langdetect: find language code of detected languages

- standardnumber: standard number entity recognition

- hyphen: token filter for shingling and combining hyphenated words (German: Bindestrichwörter), the opposite of the decompound token filter

- sortform: process string forms for bibliographical sorting, taking non-sort areas into account

- year: token filter for 4-digit sequences

- reference:


## Crypt mapper

    {
        "someType" : {
            "_source" : {
                "enabled": false
            },
            "properties" : {
                "someField":{ "type" : "crypt", "algo": "SHA-512" }
            }
        }
    }
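The mapping above names SHA-512 as the algorithm and disables `_source`. As a rough sketch of what hashing a field value with that algorithm produces, here is a standard-library version. How the plugin encodes or stores the digest is not stated in this section, so the hex output and the helper name `hash_field_value` are assumptions.

```python
import hashlib

def hash_field_value(value: str, algo: str = "SHA-512") -> str:
    """Hash a field value; the hex encoding is an assumption, not the plugin's format."""
    name = algo.lower().replace("-", "")  # "SHA-512" -> "sha512"
    return hashlib.new(name, value.encode("utf-8")).hexdigest()

digest = hash_field_value("my secret value")
print(len(digest))  # 128 hex characters for a 512-bit digest
```

The hash is one-way, so the original value cannot be recovered from it.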

## Issues

All feedback is welcome! If you find issues, please post them on [GitHub](https://github.com/jprante/elasticsearch-plugin-bundle/issues).

# References

The decompounder is a derived work of the ASV toolbox http://asv.informatik.uni-leipzig.de/asv/methoden

Copyright (C) 2005 Abteilung Automatische Sprachverarbeitung, Institut für Informatik, Universität Leipzig

The Compact Patricia Trie data structure can be found in

*Morrison, D.: Patricia - practical algorithm to retrieve information coded in alphanumeric. Journal of ACM, 1968, 15(4):514–534*

The compound splitter used for generating features for document classification is described in

*Witschel, F., Biemann, C.: Rigorous dimensionality reduction through linguistically motivated feature selection for text categorization. Proceedings of NODALIDA 2005, Joensuu, Finland*

The base form reduction step (for Norwegian) is described in

*Eiken, U.C., Liseth, A.T., Richter, M., Witschel, F. and Biemann, C.: Ord i Dag: Mining Norwegian Daily Newswire. Proceedings of FinTAL, Turku, 2006, Finland*



# License

elasticsearch-plugin-bundle - a compilation of useful plugins for Elasticsearch

Copyright (C) 2014 Jörg Prante

This program is free software: you can redistribute it and/or modify
it under the terms of the GNU Affero General Public License as published by
the Free Software Foundation, either version 3 of the License, or
(at your option) any later version.

This program is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
GNU Affero General Public License for more details.

You should have received a copy of the GNU Affero General Public License
along with this program.  If not, see <http://www.gnu.org/licenses/>.
