APPLIES TO: Azure Database for PostgreSQL - Flexible Server
Generative AI refers to a class of AI algorithms that can learn from existing multimedia content and produce new content. The produced content can be customized through techniques such as prompts and fine-tuning. Generative AI algorithms apply specific machine learning models:
Generative AI is used in image and music synthesis and in healthcare, along with common tasks such as text autocompletion, text summarization, and translation. Generative AI techniques enable features on data such as clustering and segmentation, semantic search and recommendations, topic modeling, question answering, and anomaly detection.
The following video demonstrates the use of generative AI with Azure Database for PostgreSQL and the pgvector extension, which can help you understand the concepts in this article.
OpenAI is a research organization and technology company known for its pioneering work in the field of AI and machine learning. Its mission is to ensure that artificial general intelligence (AGI), which refers to highly autonomous AI systems that can outperform humans in most economically valuable work, benefits all of humanity. OpenAI brought to market state-of-the-art generative models such as GPT-3, GPT-3.5, and GPT-4.
Azure OpenAI is a Microsoft service offering to help build generative AI applications by using Azure. Azure OpenAI gives customers advanced language AI with OpenAI GPT-4, GPT-3, Codex, DALL-E, and Whisper models, with the security and enterprise capabilities of Azure. Azure OpenAI codevelops the APIs with OpenAI to ensure compatibility and a smooth transition from one to the other.
With Azure OpenAI, customers get the security capabilities of Microsoft Azure while running the same models as OpenAI. Azure OpenAI offers private networking, regional availability, and responsible AI content filtering.
Learn more about Azure OpenAI.
Large language model
A large language model (LLM) is a type of AI model that's trained on massive amounts of text data to understand and generate humanlike language. LLMs are typically based on deep learning architectures, such as transformers. They're known for their ability to perform a wide range of natural language understanding and generation tasks. The Azure OpenAI service and OpenAI's ChatGPT are examples of LLM offerings.
Key characteristics and capabilities of LLMs include:
GPT stands for Generative Pretrained Transformer, and it refers to a series of large language models that OpenAI developed. The GPT models are neural networks that are pretrained on vast amounts of data from the internet, so they're capable of understanding and generating humanlike text.
Here's an overview of the major GPT models and their key characteristics:
GPT-3: Released in June 2020 and a well-known model in the GPT series. It has 175 billion parameters, which made it one of the largest and most powerful language models at the time of its release.
GPT-3 achieved remarkable performance on a wide range of natural language understanding and generation tasks. It can perform tasks like text completion, translation, and question answering with human-level fluency.
GPT-3 is divided into various model sizes, ranging from the smallest (125 million parameters) to the largest (175 billion parameters).
GPT-4: Released in March 2023 as the successor to GPT-3.5. OpenAI has not disclosed its size; the often-cited figure of 1.76 trillion parameters is an unofficial estimate.
A vector is a mathematical concept that's used in linear algebra and geometry to represent quantities that have both magnitude and direction. In the context of machine learning, vectors are often used to represent data points or features.
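As a minimal sketch of these ideas, the following Python snippet computes the magnitude, the direction (as a unit vector), and the dot product for small example vectors. The function names and sample values are illustrative assumptions, not part of any Azure or PostgreSQL API.

```python
import math

def magnitude(v):
    """Euclidean length of a vector given as a list of numbers."""
    return math.sqrt(sum(x * x for x in v))

def direction(v):
    """Unit vector that points the same way as v."""
    m = magnitude(v)
    if m == 0:
        raise ValueError("the zero vector has no direction")
    return [x / m for x in v]

def dot(a, b):
    """Dot product of two vectors of equal length."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same number of components")
    return sum(x * y for x, y in zip(a, b))

v = [3.0, 4.0]                # a data point with two features
print(magnitude(v))           # 5.0
print(direction(v))           # [0.6, 0.8]
print(dot(v, [1.0, 0.0]))     # 3.0
```

In machine learning, the same operations apply unchanged to vectors with hundreds or thousands of components.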
Key attributes and operations of vectors include:
Vectors are commonly written as an ordered list of numeric components (for example, {x1, x2, …, xn}).
A vector database, also known as a vector database management system (DBMS), is a type of database system that's designed to store, manage, and query vector data efficiently. Traditional relational databases primarily handle structured data in tables, whereas vector databases are optimized for the storage and retrieval of multidimensional data points represented as vectors. These databases are useful for applications that involve similarity search, geospatial data, recommendation systems, and clustering.
Key characteristics of vector databases include:
PostgreSQL can gain the capabilities of a vector database with the help of the pgvector extension.
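To make the idea of a similarity search concrete, here is a conceptual sketch in Python of the question a vector database answers: given a query vector, which stored vectors are closest to it? The row identifiers, the sample vectors, and the `top_k` helper are hypothetical. The sketch scans every row, whereas a real vector database typically relies on specialized indexes, so it illustrates the concept rather than how pgvector is implemented.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two nonzero vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def top_k(query, rows, k=2):
    """Return the ids of the k stored vectors most similar to query, best first."""
    scored = [(cosine_similarity(query, vec), row_id) for row_id, vec in rows.items()]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [row_id for _, row_id in scored[:k]]

# Hypothetical table contents: row id -> stored vector.
rows = {
    "doc_a": [1.0, 0.0, 0.0],
    "doc_b": [0.9, 0.1, 0.0],
    "doc_c": [0.0, 1.0, 0.0],
}

print(top_k([1.0, 0.0, 0.0], rows, k=2))  # ['doc_a', 'doc_b']
```

Cosine similarity is only one possible measure; Euclidean distance and inner product are other common choices for ranking vectors.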
Embeddings are a concept in machine learning and natural language processing that involves representing objects (such as words, documents, or entities) as vectors in a multidimensional space.
These vectors are typically dense. That is, most of their components are nonzero, in contrast to sparse representations such as one-hot encodings. They often have hundreds or thousands of dimensions, and they're learned through various techniques, including neural networks. Embeddings aim to capture semantic relationships and similarities between objects in a continuous vector space.
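The following toy example suggests why embeddings are useful for capturing similarity. It contrasts one-hot vectors, where every pair of different words looks equally unrelated, with small dense vectors. The dense values are hand-picked for illustration only; real embeddings are learned by a model and have many more dimensions.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two nonzero vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

vocabulary = ["cat", "kitten", "car"]

# One-hot encoding: one dimension per word, a single 1.0 and zeros elsewhere.
one_hot = {
    word: [1.0 if i == j else 0.0 for j in range(len(vocabulary))]
    for i, word in enumerate(vocabulary)
}

# Hypothetical dense vectors, hand-picked so that related words point the same way.
dense = {
    "cat": [0.9, 0.8],
    "kitten": [0.8, 0.9],
    "car": [-0.7, 0.2],
}

# One-hot vectors carry no notion of relatedness between different words.
print(cosine_similarity(one_hot["cat"], one_hot["kitten"]))  # 0.0

# Dense vectors can place related words closer together than unrelated ones.
related = cosine_similarity(dense["cat"], dense["kitten"])
unrelated = cosine_similarity(dense["cat"], dense["car"])
print(related > unrelated)  # True
```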
Common types of embeddings include:
Word2Vec and GloVe are popular word-embedding techniques.
Doc2Vec is popular for creating document embeddings.
Embeddings are central to representing complex, high-dimensional data in a form that machine learning models can easily process. They can be trained on large datasets and then used as features for various tasks. LLMs use them.
PostgreSQL can gain the capability to generate vector embeddings through the Azure OpenAI integration in the Azure AI extension.
Scenarios
Generative AI has a wide range of applications across various domains and industries, including technology, healthcare, entertainment, finance, manufacturing, and more. Here are some common tasks that people can accomplish by using generative AI: