
makcedward/nlpaug: Data augmentation for NLP

This Python library helps you augment NLP data for your machine learning projects. Visit this introduction to learn about data augmentation in NLP. An Augmenter is the basic element of augmentation, while a Flow is a pipeline that orchestrates multiple augmenters together.
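
The Augmenter/Flow split can be pictured with a small pure-Python sketch. Each augmenter maps one text to one augmented text. A flow chains augmenters in one of two ways: it applies all of them in order, or it applies each one with some probability. These two modes mirror the roles of the Sequential and Sometimes pipelines. All names below (`make_word_swap`, `make_word_delete`, `sequential`, `sometimes`) are hypothetical illustrations, not nlpaug's API, and the library's real augmenters expose more options than shown here.

```python
import random
from typing import Callable, List

# Hypothetical augmenter type: maps one input text to one augmented text.
Augmenter = Callable[[str], str]


def make_word_swap(rng: random.Random) -> Augmenter:
    """Augmenter that swaps one randomly chosen pair of adjacent words."""
    def augment(text: str) -> str:
        words = text.split()
        if len(words) < 2:
            return text
        i = rng.randrange(len(words) - 1)
        words[i], words[i + 1] = words[i + 1], words[i]
        return " ".join(words)
    return augment


def make_word_delete(rng: random.Random) -> Augmenter:
    """Augmenter that deletes one randomly chosen word (single-word inputs are kept)."""
    def augment(text: str) -> str:
        words = text.split()
        if len(words) < 2:
            return text
        del words[rng.randrange(len(words))]
        return " ".join(words)
    return augment


def sequential(augmenters: List[Augmenter]) -> Augmenter:
    """Flow that applies every augmenter in order, feeding each output to the next."""
    def augment(text: str) -> str:
        for aug in augmenters:
            text = aug(text)
        return text
    return augment


def sometimes(augmenters: List[Augmenter], p: float, rng: random.Random) -> Augmenter:
    """Flow that applies each augmenter independently with probability p."""
    def augment(text: str) -> str:
        for aug in augmenters:
            if rng.random() < p:
                text = aug(text)
        return text
    return augment


rng = random.Random(0)
pipeline = sequential([make_word_swap(rng), make_word_delete(rng)])
print(pipeline("The quick brown fox jumps over the lazy dog"))
```

Passing a seeded `random.Random` makes the output reproducible, which helps when comparing augmented datasets across runs.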

Generate synthetic data for improving model performance without manual effort
Simple, easy-to-use and lightweight library. Augment data in 3 lines of code
Plug and play with any machine learning / neural network framework (e.g. scikit-learn, PyTorch, TensorFlow)
Supports textual and audio input

Textual Data Augmentation Example

Acoustic Data Augmentation Example

Quick Example
Example of Augmentation for Textual Inputs
Example of Augmentation for Multilingual Textual Inputs
Example of Augmentation for Spectrogram Inputs
Example of Augmentation for Audio Inputs
Example of Orchestrating Multiple Augmenters
Example of Showing Augmentation History
How to train TF-IDF model
How to train LAMBADA model
How to create custom augmentation
API Documentation
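
One of the how-to guides above covers training the TF-IDF model used by TfIdfAug. The sketch below is a rough, self-contained illustration of the statistic involved, not nlpaug's training code. It computes smoothed IDF and per-document TF-IDF scores over a toy tokenized corpus. The IDF smoothing, log((1 + N) / (1 + df)) + 1, follows scikit-learn's default formula. Term frequency is taken as count divided by document length. Both are illustrative assumptions, not necessarily what nlpaug's model does.

```python
import math
from collections import Counter
from typing import Dict, List


def compute_idf(corpus: List[List[str]]) -> Dict[str, float]:
    """Smoothed IDF: log((1 + N) / (1 + df)) + 1 (scikit-learn-style, assumed)."""
    n_docs = len(corpus)
    # Document frequency: number of documents containing each token at least once.
    df = Counter(token for doc in corpus for token in set(doc))
    return {tok: math.log((1 + n_docs) / (1 + count)) + 1 for tok, count in df.items()}


def tfidf(doc: List[str], idf: Dict[str, float]) -> Dict[str, float]:
    """Term frequency (count / document length) multiplied by IDF for one document."""
    counts = Counter(doc)
    length = len(doc)
    return {tok: (c / length) * idf.get(tok, 0.0) for tok, c in counts.items()}


corpus = [
    "the cat sat on the mat".split(),
    "the dog sat on the log".split(),
    "a cat and a dog".split(),
]
idf = compute_idf(corpus)
scores = tfidf(corpus[0], idf)
# Print the first document's tokens from lowest to highest TF-IDF score.
for token, score in sorted(scores.items(), key=lambda kv: kv[1]):
    print(f"{token}: {score:.3f}")
```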

| Category | Target | Augmenter | Action | Description |
|---|---|---|---|---|
| Textual | Character | KeyboardAug | substitute | Simulate keyboard distance error |
| Textual | | OcrAug | substitute | Simulate OCR engine error |
| Textual | | RandomAug | insert, substitute, swap, delete | Apply augmentation randomly |
| Textual | Word | AntonymAug | substitute | Substitute opposite-meaning word according to WordNet antonym |
| Textual | | ContextualWordEmbsAug | insert, substitute | Feed surrounding words to a BERT, DistilBERT, RoBERTa or XLNet language model to find the most suitable word for augmentation |
| Textual | | RandomWordAug | swap, crop, delete | Apply augmentation randomly |
| Textual | | SpellingAug | substitute | Substitute word according to spelling mistake dictionary |
| Textual | | SplitAug | split | Split one word into two words randomly |
| Textual | | SynonymAug | substitute | Substitute similar word according to WordNet/PPDB synonym |
| Textual | | TfIdfAug | insert, substitute | Use TF-IDF to find out how a word should be augmented |
| Textual | | WordEmbsAug | insert, substitute | Leverage word2vec, GloVe or fasttext embeddings to apply augmentation |
| Textual | | BackTranslationAug | substitute | Leverage two translation models for augmentation |
| Textual | | ReservedAug | substitute | Replace reserved words |
| Textual | Sentence | ContextualWordEmbsForSentenceAug | insert | Insert sentence according to XLNet, GPT2 or DistilGPT2 prediction |
| Textual | | AbstSummAug | substitute | Summarize article by abstractive summarization method |
| Textual | | LambadaAug | substitute | Use a language model to generate text, then a classification model to retain high-quality results |
| Signal | Audio | CropAug | delete | Delete audio's segment |
| Signal | | LoudnessAug | substitute | Adjust audio's volume |
| Signal | | MaskAug | substitute | Mask audio's segment |
| Signal | | NoiseAug | substitute | Inject noise |
| Signal | | PitchAug | substitute | Adjust audio's pitch |
| Signal | | ShiftAug | substitute | Shift time dimension forward/backward |
| Signal | | SpeedAug | substitute | Adjust audio's speed |
| Signal | | VtlpAug | substitute | Change vocal tract |
| Signal | | NormalizeAug | substitute | Normalize audio |
| Signal | | PolarityInverseAug | substitute | Swap positive and negative for audio |
| Signal | Spectrogram | FrequencyMaskingAug | substitute | Set block of values to zero according to frequency dimension |
| Signal | | TimeMaskingAug | substitute | Set block of values to zero according to time dimension |
| Signal | | LoudnessAug | substitute | Adjust volume |

| Category | Augmenter | Description |
|---|---|---|
| Pipeline | Sequential | Apply list of augmentation functions sequentially |
| Pipeline | Sometimes | Apply some augmentation functions randomly |
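
KeyboardAug's idea of simulating keyboard-distance errors can be illustrated with a small standalone sketch. The neighbor map below covers only a handful of QWERTY keys and is a hypothetical stand-in. The function `keyboard_typo` is not part of nlpaug's API. The library's actual keyboard mapping and sampling rules may differ.

```python
import random
from typing import Dict, List

# Hypothetical, partial QWERTY neighbor map (lowercase letters only).
QWERTY_NEIGHBORS: Dict[str, List[str]] = {
    "a": ["q", "w", "s", "z"],
    "e": ["w", "r", "s", "d"],
    "i": ["u", "o", "j", "k"],
    "o": ["i", "p", "k", "l"],
    "s": ["a", "w", "e", "d", "x", "z"],
    "t": ["r", "y", "f", "g"],
}


def keyboard_typo(text: str, aug_prob: float, rng: random.Random) -> str:
    """Replace each mappable character, with probability aug_prob, by a random adjacent key."""
    out = []
    for ch in text:
        neighbors = QWERTY_NEIGHBORS.get(ch.lower())
        if neighbors and rng.random() < aug_prob:
            replacement = rng.choice(neighbors)
            # Preserve the original character's case.
            out.append(replacement.upper() if ch.isupper() else replacement)
        else:
            # Characters without a mapping (spaces, unmapped letters) pass through unchanged.
            out.append(ch)
    return "".join(out)


rng = random.Random(42)
print(keyboard_typo("The quick brown fox", aug_prob=0.3, rng=rng))
```

Restricting substitutions to adjacent keys keeps the noise realistic: the errors resemble real typing slips rather than arbitrary character swaps.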

The library supports Python 3.5+ on Linux and Windows.

To install the library:

pip install numpy requests nlpaug

or install the latest version (including BETA features) directly from GitHub

pip install numpy git+https://github.com/makcedward/nlpaug.git

or install via conda

conda install -c makcedward nlpaug

If you use BackTranslationAug, ContextualWordEmbsAug, ContextualWordEmbsForSentenceAug or AbstSummAug, install the following dependencies as well:

pip install "torch>=1.6.0" "transformers>=4.11.3" sentencepiece

If you use LambadaAug, install the following dependency as well:

pip install "simpletransformers>=0.61.10"

If you use AntonymAug or SynonymAug (WordNet), install the following dependency as well:

pip install nltk

If you use WordEmbsAug (word2vec, GloVe or fasttext), download a pre-trained model first and install the following dependency as well:

from nlpaug.util.file.download import DownloadUtil
DownloadUtil.download_word2vec(dest_dir='.') # Download word2vec model
DownloadUtil.download_glove(model_name='glove.6B', dest_dir='.') # Download GloVe model
DownloadUtil.download_fasttext(model_name='wiki-news-300d-1M', dest_dir='.') # Download fasttext model

pip install "gensim>=4.1.2"

If you use SynonymAug (PPDB), download the PPDB file from the following URI. The augmenter may not run if you obtain the PPDB file from another website.

http://paraphrase.org/#/download

If you use PitchAug, SpeedAug or VtlpAug, install the following dependencies as well:

pip install "librosa>=0.9.1" matplotlib

See changelog for more details.

This library uses external data (e.g. data captured from the internet), research (e.g. augmentation ideas from published work) and models (e.g. pre-trained models). See data source for more details.

@misc{ma2019nlpaug,
  title={NLP Augmentation},
  author={Edward Ma},
  howpublished={https://github.com/makcedward/nlpaug},
  year={2019}
}

This package is cited by many books, workshops and academic research papers (70+). Here are some examples; visit here for the full list.

S. Vajjala. NLP without a readymade labeled dataset. Toronto Machine Learning Summit. 2021
S. Vajjala, B. Majumder, A. Gupta and H. Surana. Practical Natural Language Processing: A Comprehensive Guide to Building Real-World NLP Systems. 2020
A. Bartoli and A. Fusiello. Computer Vision–ECCV 2020 Workshops. 2020
L. von Werra, L. Tunstall and T. Wolf. Natural Language Processing with Transformers. 2022

Research papers citing nlpaug

Google: M. Raghu and E. Schmidt. A Survey of Deep Learning for Scientific Discovery. 2020
Sirius XM: E. Jing, K. Schneck, D. Egan and S. A. Waterman. Identifying Introductions in Podcast Episodes from Automatically Generated Transcripts. 2021
Salesforce Research: B. Newman, P. K. Choubey and N. Rajani. P-adapters: Robustly Extracting Factual Information from Language Models with Diverse Prompts. 2021
Salesforce Research: L. Xue, M. Gao, Z. Chen, C. Xiong and R. Xu. Robustness Evaluation of Transformer-based Form Field Extractors via Form Attacks. 2021
