AI Development Glossary
Comprehensive dictionary of AI, machine learning, and LLM terms. Clear definitions, examples, and links to detailed guides.
A
AI Agent
An autonomous software system that uses artificial intelligence to perform tasks, make decisions, and interact with environments or users without continuous human intervention. AI agents can be simple (single-task chatbots) or complex (multi-agent systems working together).
API (Application Programming Interface)
A set of protocols and tools that allows different software applications to communicate. In an AI context, this often refers to cloud AI APIs such as the OpenAI API, Azure AI, and AWS Bedrock, which provide access to AI models via HTTP requests.
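For illustration, a minimal sketch of calling an OpenAI-style chat completions endpoint over HTTP with Python's requests library; the model name and the OPENAI_API_KEY environment variable are assumptions:

```python
import os
import requests

# Minimal HTTP call to an OpenAI-style chat completions endpoint.
# Assumes an API key is available in the OPENAI_API_KEY environment variable.
response = requests.post(
    "https://api.openai.com/v1/chat/completions",
    headers={"Authorization": f"Bearer {os.environ['OPENAI_API_KEY']}"},
    json={
        "model": "gpt-4",
        "messages": [{"role": "user", "content": "Explain what an API is in one sentence."}],
    },
    timeout=30,
)
print(response.json()["choices"][0]["message"]["content"])
```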
Attention Mechanism
A neural network component that allows models to focus on specific parts of the input when generating output. Core technology behind Transformers and modern LLMs like GPT-4. Enables models to weigh the importance of different input tokens.
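A minimal NumPy sketch of scaled dot-product attention, the core computation behind the mechanism:

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Weight the values V by how similar each query in Q is to each key in K."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                    # query-key similarity
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)     # softmax over keys
    return weights @ V                                 # weighted sum of values

# Toy example: 3 tokens with 4-dimensional representations
rng = np.random.default_rng(0)
Q = K = V = rng.normal(size=(3, 4))
print(scaled_dot_product_attention(Q, K, V).shape)     # (3, 4)
```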
AutoML (Automated Machine Learning)
Automated process of applying machine learning to real-world problems. AutoML tools automatically select algorithms, tune hyperparameters, and engineer features, making ML accessible to non-experts.
B
BERT (Bidirectional Encoder Representations from Transformers)
Google's pre-trained NLP model that understands context from both left and right sides of a word. Revolutionized natural language understanding tasks like question answering, sentiment analysis, and text classification. BioBERT and ClinicalBERT are domain-specific variants.
Bias (in AI)
Systematic errors in AI model predictions that favor certain outcomes over others, often reflecting biases in training data. Can lead to unfair treatment of demographic groups. Requires bias testing and mitigation strategies.
BM25
A ranking function used for information retrieval that scores documents based on query term frequency and inverse document frequency. Often combined with vector search in hybrid search systems for better RAG retrieval.
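A minimal sketch using the rank_bm25 package (assumed installed; the corpus is a toy example):

```python
from rank_bm25 import BM25Okapi   # pip install rank-bm25

corpus = [
    "vector databases store embeddings",
    "BM25 ranks documents by keyword relevance",
    "hybrid search combines BM25 with vector search",
]
tokenized_corpus = [doc.lower().split() for doc in corpus]

bm25 = BM25Okapi(tokenized_corpus)
query = "keyword search with BM25".lower().split()
print(bm25.get_scores(query))   # one relevance score per document
```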
C
ChatGPT
OpenAI's conversational AI system based on GPT-3.5 and GPT-4 models. Trained using reinforcement learning from human feedback (RLHF) to follow instructions and provide helpful, harmless, and honest responses.
Claude
Anthropic's family of large language models (Claude 1, 2, 3) designed for safety and helpfulness. Known for long context windows (100K-200K tokens) and strong reasoning capabilities. Competes with GPT-4.
Computer Vision
AI field focused on enabling computers to interpret and understand visual information from images and videos. Applications include object detection, image classification, facial recognition, and medical imaging analysis.
CNN (Convolutional Neural Network)
Deep learning architecture designed for processing grid-like data such as images. Uses convolutional layers to automatically learn spatial hierarchies of features. Backbone of most computer vision applications.
Context Window
Maximum number of tokens (words/subwords) an LLM can process at once. GPT-4: 8K-32K tokens, Claude 2: 100K tokens, GPT-4 Turbo: 128K tokens. Larger context windows enable processing longer documents.
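A sketch of checking input length against an assumed 8K-token limit using OpenAI's tiktoken tokenizer:

```python
import tiktoken   # pip install tiktoken

MAX_CONTEXT = 8_192   # example limit; the actual value depends on the model

enc = tiktoken.encoding_for_model("gpt-4")
document = "Long document text goes here..."
n_tokens = len(enc.encode(document))

if n_tokens > MAX_CONTEXT:
    print(f"{n_tokens} tokens exceed the context window; chunk or truncate the input.")
else:
    print(f"{n_tokens} tokens fit within the context window.")
```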
D
Data Augmentation
Technique to artificially expand training datasets by creating modified versions of existing data (rotation, cropping, noise addition). Improves model generalization and reduces overfitting.
Deep Learning
Subset of machine learning using multi-layer neural networks (deep networks) to learn hierarchical representations of data. Enables breakthroughs in computer vision, NLP, and speech recognition.
Diffusion Models
Generative AI models that create images by gradually removing noise from random data. Examples: Stable Diffusion, DALL-E 2, Midjourney. Used for text-to-image generation and image editing.
E
Embeddings
Dense vector representations of data (text, images, audio) in high-dimensional space where similar items are close together. Text embeddings from models like text-embedding-ada-002 enable semantic search and RAG systems.
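A toy NumPy sketch of why embeddings enable semantic comparison via cosine similarity; the vectors are invented for illustration, and real text embeddings typically have hundreds to thousands of dimensions:

```python
import numpy as np

def cosine_similarity(a, b):
    """Similarity in [-1, 1]; values near 1 mean the items are semantically close."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

cat = np.array([0.8, 0.1, 0.3, 0.0])
kitten = np.array([0.7, 0.2, 0.4, 0.1])
car = np.array([0.0, 0.9, 0.0, 0.4])

print(cosine_similarity(cat, kitten))   # high: related concepts
print(cosine_similarity(cat, car))      # lower: unrelated concepts
```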
Encoder-Decoder
Neural network architecture with two components: encoder processes input into representation, decoder generates output from that representation. Used in sequence-to-sequence tasks like translation.
Epoch
One complete pass through the entire training dataset. Weights are typically updated after each batch, so one epoch consists of many update steps. Typical training uses 3-10 epochs; more epochs risk overfitting.
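A minimal PyTorch sketch on toy data, showing that weights update once per batch while an epoch is one full pass over the data loader:

```python
import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset

# Toy regression data
X, y = torch.randn(256, 10), torch.randn(256, 1)
loader = DataLoader(TensorDataset(X, y), batch_size=32, shuffle=True)

model = nn.Linear(10, 1)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.MSELoss()

for epoch in range(3):                      # one epoch = one full pass over the dataset
    for batch_x, batch_y in loader:         # weights are updated once per batch
        optimizer.zero_grad()
        loss = loss_fn(model(batch_x), batch_y)
        loss.backward()
        optimizer.step()
    print(f"epoch {epoch + 1}: loss {loss.item():.4f}")
```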
F
FAISS (Facebook AI Similarity Search)
Meta's library for efficient similarity search and clustering of dense vectors. Enables fast nearest neighbor search among millions/billions of vectors. Used in RAG systems and recommendation engines.
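A minimal sketch of indexing and querying random vectors with FAISS (exact L2 search; larger systems typically use approximate indexes such as IVF or HNSW):

```python
import faiss                 # pip install faiss-cpu
import numpy as np

d = 128                                     # embedding dimension
vectors = np.random.rand(10_000, d).astype("float32")

index = faiss.IndexFlatL2(d)                # exact L2 nearest-neighbor search
index.add(vectors)

query = np.random.rand(1, d).astype("float32")
distances, ids = index.search(query, 5)     # 5 nearest neighbors
print(ids[0], distances[0])
```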
Few-Shot Learning
ML technique where models learn from very few examples (2-10). LLMs excel at few-shot learning: providing examples in the prompt enables task performance without fine-tuning.
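An illustrative few-shot prompt for sentiment classification; the reviews are invented examples:

```python
# A few-shot sentiment-classification prompt: the examples teach the task in-context,
# with no fine-tuning. The final line is the new input the model should label.
prompt = """Classify the sentiment of each review as Positive or Negative.

Review: "The battery lasts all day and the screen is gorgeous."
Sentiment: Positive

Review: "Stopped working after a week and support never replied."
Sentiment: Negative

Review: "Setup took five minutes and it just works."
Sentiment:"""

# `prompt` would be sent to any LLM completion or chat API.
print(prompt)
```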
Fine-tuning
Process of adapting a pre-trained model to a specific task by training on domain-specific data. More efficient than training from scratch. Techniques include full fine-tuning, LoRA, and QLoRA.
G
GAN (Generative Adversarial Network)
Two neural networks (generator and discriminator) competing against each other. Generator creates fake data, discriminator tries to detect it. Used for image generation, style transfer, and data augmentation.
GPT (Generative Pre-trained Transformer)
OpenAI's family of large language models (GPT-2, GPT-3, GPT-3.5, GPT-4). Autoregressive models trained on vast text data to generate human-like text. GPT-4 is the most capable of this family.
Gradient Descent
Optimization algorithm that iteratively adjusts model parameters to minimize loss function. Variations include SGD (stochastic), Adam, AdamW. Core training mechanism for neural networks.
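A minimal NumPy sketch fitting a one-variable linear model by gradient descent on synthetic data:

```python
import numpy as np

# Fit y = w*x + b by gradient descent on mean squared error.
rng = np.random.default_rng(0)
x = rng.uniform(-1, 1, 100)
y = 3.0 * x + 0.5 + rng.normal(0, 0.1, 100)   # true w=3.0, b=0.5 plus noise

w, b, lr = 0.0, 0.0, 0.1
for step in range(200):
    error = (w * x + b) - y
    grad_w = 2 * np.mean(error * x)   # d(loss)/dw
    grad_b = 2 * np.mean(error)       # d(loss)/db
    w -= lr * grad_w                  # step opposite the gradient
    b -= lr * grad_b

print(round(w, 2), round(b, 2))       # close to 3.0 and 0.5
```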
H
Hallucination
When AI models generate false or nonsensical information presented as fact. Major challenge in LLMs. Mitigation strategies: RAG, fine-tuning on factual data, lower temperature, citation requirements.
Hugging Face
Leading platform for open-source NLP/AI models, datasets, and libraries. Hosts 200K+ models including BERT, GPT-2, Llama, Mistral. Its Transformers library is the standard for using pre-trained models.
Hyperparameter
Model configuration set before training (learning rate, batch size, number of layers). Unlike model parameters (weights), hyperparameters are not learned from data. Tuning them optimizes performance.
I
Inference
Using a trained model to make predictions on new data. In production, inference speed and cost are critical. Techniques: quantization, model pruning, batch inference, caching.
Instruction Tuning
Fine-tuning LLMs on instruction-following datasets (e.g., 'Summarize this text:', 'Translate to Spanish:'). Creates models better at following user instructions. Used for ChatGPT, Claude.
K
K-Nearest Neighbors (KNN)
Simple ML algorithm that classifies data points based on majority class of k nearest neighbors. Used in recommendation systems and as baseline for complex models.
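A minimal scikit-learn sketch on the Iris dataset:

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

knn = KNeighborsClassifier(n_neighbors=5)   # classify by the 5 nearest training points
knn.fit(X_train, y_train)
print(f"accuracy: {knn.score(X_test, y_test):.2f}")
```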
Keras
High-level neural network API running on top of TensorFlow. Provides user-friendly interface for building and training deep learning models. Good for rapid prototyping.
L
LangChain
Framework for developing applications powered by language models. Provides chains, agents, memory, and tools for building RAG systems, chatbots, and AI agents. Supports multiple LLMs.
Latency
Time delay between input and output. Critical metric for production AI systems. Target: <100ms for most applications, <2s for LLM responses. Reduced via optimization, caching, edge deployment.
LLM (Large Language Model)
Neural network trained on massive text datasets (trillions of tokens) to understand and generate human language. Examples: GPT-4, Claude, Llama 2, Mistral. Size: 7B-175B+ parameters.
Llama (LLaMA)
Meta's family of open-source large language models (7B, 13B, 70B parameters). Llama 2 released July 2023 with commercial license. Strong alternative to proprietary models like GPT-4.
LoRA (Low-Rank Adaptation)
Parameter-efficient fine-tuning technique that trains small adapter matrices instead of full model weights. Dramatically reduces trainable parameters and GPU memory use, enabling fine-tuning of large models on consumer GPUs.
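A sketch using Hugging Face's peft library; the base model name and target module names are illustrative and depend on the architecture (Llama-2 weights also require access approval):

```python
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model   # pip install peft

base = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-2-7b-hf")

config = LoraConfig(
    r=8,                                   # rank of the adapter matrices
    lora_alpha=16,                         # scaling factor
    target_modules=["q_proj", "v_proj"],   # attention projections to adapt (model-specific)
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
)
model = get_peft_model(base, config)
model.print_trainable_parameters()   # typically well under 1% of total parameters
```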
Loss Function
Mathematical function measuring difference between predicted and actual outputs. Model training minimizes loss. Common losses: cross-entropy (classification), MSE (regression), triplet loss (embeddings).
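A toy NumPy sketch of cross-entropy loss for a 3-class prediction, showing that confident correct predictions incur lower loss:

```python
import numpy as np

def cross_entropy(predicted_probs, true_class):
    """Penalizes low probability assigned to the correct class."""
    return -np.log(predicted_probs[true_class])

# Model A is confident and correct; model B spreads probability mass around.
probs_a = np.array([0.05, 0.90, 0.05])
probs_b = np.array([0.30, 0.40, 0.30])
true_class = 1

print(cross_entropy(probs_a, true_class))   # ~0.105  (low loss)
print(cross_entropy(probs_b, true_class))   # ~0.916  (higher loss)
```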
LSTM (Long Short-Term Memory)
Type of recurrent neural network that can learn long-term dependencies. Addresses vanishing gradient problem of basic RNNs. Used for time series, text, and sequential data.
M
Machine Learning
AI approach where systems learn patterns from data without explicit programming. Three main paradigms: supervised (labeled data), unsupervised (unlabeled), reinforcement (reward-based).
Milvus
Open-source vector database built for billion-scale similarity search. Supports horizontal scaling, multiple index types (IVF, HNSW), and various distance metrics. Enterprise-grade alternative to Pinecone.
Mistral
French AI startup's family of open-source LLMs. Mistral 7B outperforms Llama 2 13B. Mixtral 8x7B uses mixture-of-experts for efficient inference. Apache 2.0 license allows commercial use.
MLOps (Machine Learning Operations)
Practices for deploying and maintaining ML models in production. Includes CI/CD for ML, model monitoring, drift detection, retraining pipelines, and experiment tracking.
Model Drift
Degradation of model performance over time as real-world data distribution changes. Requires monitoring and periodic retraining. Types: concept drift, data drift, upstream drift.
N
NLP (Natural Language Processing)
AI field focused on interaction between computers and human language. Tasks: sentiment analysis, named entity recognition, machine translation, question answering, text generation.
Neural Network
Computing system inspired by biological neural networks. Consists of interconnected nodes (neurons) organized in layers. Learns by adjusting connection weights during training.
O
OpenAI
AI research company that created GPT models, DALL-E, Whisper, and ChatGPT. Provides API access to GPT-4, embeddings, and other models. Leader in large language model development.
Overfitting
When model learns training data too well, including noise and outliers, hurting performance on new data. Prevented via regularization, dropout, early stopping, more training data.
P
PEFT (Parameter-Efficient Fine-Tuning)
Family of techniques for fine-tuning large models by updating only a small subset of parameters. Includes LoRA, QLoRA, prefix tuning, adapter layers. Reduces memory and compute costs.
Pinecone
Managed vector database service for similarity search at scale. Fully managed, auto-scaling, <50ms latency. Best for fast prototyping. Pricing: $70-200/month for 10M vectors.
Prompt Engineering
Art and science of crafting effective prompts to get desired outputs from LLMs. Techniques: few-shot examples, chain-of-thought, role prompting, system messages, temperature tuning.
PyTorch
Meta's open-source deep learning framework. Dynamic computational graphs, Pythonic API, strong research community. Preferred for research and increasingly for production.
Q
QLoRA (Quantized LoRA)
Combines LoRA with 4-bit quantization for extreme memory efficiency. Fine-tune a 65B-parameter model on a single 48GB GPU, or ~33B models on a 24GB GPU. Far less memory than full fine-tuning with minimal performance loss.
Quantization
Reducing model precision from 32-bit floating point to 8-bit or 4-bit integers. Cuts memory roughly 75% (8-bit) to ~87% (4-bit) versus FP32 and speeds inference 2-4x with minimal accuracy loss.
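A toy NumPy sketch of symmetric 8-bit quantization of a weight matrix; production systems use libraries such as bitsandbytes or GPTQ rather than hand-rolled code:

```python
import numpy as np

# Symmetric 8-bit quantization of a float32 weight tensor.
weights = np.random.randn(1024, 1024).astype(np.float32)

scale = np.abs(weights).max() / 127.0            # map the largest weight to +/-127
q_weights = np.round(weights / scale).astype(np.int8)
dequantized = q_weights.astype(np.float32) * scale

print(weights.nbytes // q_weights.nbytes)        # 4x smaller in memory
print(np.abs(weights - dequantized).max())       # small round-off error
```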
Qdrant
Open-source vector database written in Rust. Fast performance, rich filtering, supports quantization. Good balance of features and cost. Self-host or use managed cloud.
R
RAG (Retrieval Augmented Generation)
Technique combining information retrieval with LLM generation. Retrieves relevant documents from knowledge base, injects into prompt, then generates response. Reduces hallucinations, enables up-to-date answers.
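A minimal sketch of the retrieve-augment-generate flow; embed, vector_store.search, and llm_complete are hypothetical placeholders for whatever embedding model, vector database, and LLM the application uses:

```python
# Minimal RAG sketch with placeholder components.
def answer_with_rag(question, vector_store, embed, llm_complete, top_k=3):
    # 1. Retrieve: embed the question and fetch the most similar documents.
    query_vector = embed(question)
    documents = vector_store.search(query_vector, top_k)

    # 2. Augment: inject the retrieved text into the prompt as grounding context.
    context = "\n\n".join(documents)
    prompt = (
        "Answer the question using only the context below. "
        "If the answer is not in the context, say you don't know.\n\n"
        f"Context:\n{context}\n\nQuestion: {question}\nAnswer:"
    )

    # 3. Generate: let the LLM produce an answer grounded in the retrieved documents.
    return llm_complete(prompt)
```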
Reinforcement Learning
ML paradigm where agents learn optimal actions through trial-and-error interactions with environment. Used for robotics, game playing (AlphaGo), recommendation systems, and RLHF for LLMs.
RLHF (Reinforcement Learning from Human Feedback)
Training technique where humans rank model outputs, and model learns to generate outputs humans prefer. Used to align LLMs with human values. Core technology behind ChatGPT and Claude.
RNN (Recurrent Neural Network)
Neural network type designed for sequential data (text, time series, audio). Processes inputs sequentially, maintaining hidden state. Largely superseded by Transformers for NLP.
S
Semantic Search
Search by meaning rather than exact keyword matching. Uses embeddings to find documents semantically similar to query. Core technology in RAG systems and modern search engines.
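A minimal sketch with the sentence-transformers library (assumed installed; the model name is one common choice):

```python
from sentence_transformers import SentenceTransformer, util   # pip install sentence-transformers

model = SentenceTransformer("all-MiniLM-L6-v2")

corpus = [
    "How do I reset my password?",
    "Shipping usually takes 3-5 business days.",
    "Our refund policy allows returns within 30 days.",
]
corpus_embeddings = model.encode(corpus)

query = "I forgot my login credentials"            # no keyword overlap with the best match
scores = util.cos_sim(model.encode(query), corpus_embeddings)[0]
print(corpus[int(scores.argmax())])                # matches the password-reset document
```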
SHAP (SHapley Additive exPlanations)
Method for explaining ML model predictions by assigning importance value to each feature. Provides consistent and accurate feature attributions. Widely used for model interpretability.
Stable Diffusion
Open-source text-to-image diffusion model. Can generate, edit, and manipulate images from text prompts. Runs on consumer GPUs. Alternatives: DALL-E, Midjourney.
Supervised Learning
ML approach using labeled training data (input-output pairs). Model learns mapping from inputs to outputs. Used for classification and regression tasks.
Synthetic Data
Artificially generated data that mimics real data. Used when real data is scarce, expensive, or privacy-sensitive. Generated via GANs, simulation, or LLMs.
T
Temperature
LLM parameter controlling output randomness. Low (0-0.3): focused, deterministic. Medium (0.7-0.9): balanced. High (1.0+): creative, diverse. Adjust based on use case.
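A NumPy sketch showing how temperature scales logits before sampling:

```python
import numpy as np

def softmax_with_temperature(logits, temperature):
    """Lower temperature sharpens the distribution; higher temperature flattens it."""
    scaled = np.array(logits) / temperature
    exp = np.exp(scaled - scaled.max())
    return exp / exp.sum()

logits = [2.0, 1.0, 0.5]                      # raw model scores for three candidate tokens
print(softmax_with_temperature(logits, 0.2))  # nearly all probability on the top token
print(softmax_with_temperature(logits, 1.0))  # balanced
print(softmax_with_temperature(logits, 2.0))  # flatter, more random sampling
```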
TensorFlow
Google's open-source ML framework. Production-focused with TensorFlow Serving, TFX, and TensorFlow Lite for mobile/edge. Strong ecosystem and enterprise adoption.
Token
Smallest unit of text LLMs process. Roughly 0.75 words (1 token ≈ 4 characters). GPT-4 pricing based on tokens. Context limits measured in tokens (8K, 32K, 128K).
Transformer
Neural network architecture using self-attention mechanisms. Revolutionized NLP. Foundation of all modern LLMs (GPT, BERT, T5). Introduced in 'Attention is All You Need' (2017).
Transfer Learning
Using knowledge learned from one task to improve performance on a related task. Pre-training on large datasets, then fine-tuning on a specific task. Core approach in modern deep learning.
U
Unsupervised Learning
ML approach using unlabeled data to discover patterns. Tasks: clustering, dimensionality reduction, anomaly detection. Examples: K-means, PCA, autoencoders.
V
Vector Database
Specialized database for storing and querying high-dimensional vectors (embeddings). Enables fast similarity search via approximate nearest neighbor algorithms. Examples: Pinecone, Weaviate, Qdrant, Milvus.
Vision Transformer (ViT)
Transformer architecture adapted for computer vision. Treats image patches as tokens. Outperforms CNNs on many tasks. Examples: ViT, CLIP, DINO.
W
Weaviate
Open-source vector database with GraphQL API, built-in vectorization, and hybrid search. Supports multi-tenancy and complex filtering. Good for on-premise deployments.
Whisper
OpenAI's automatic speech recognition (ASR) model. Transcribes and translates audio to text with high accuracy. Supports 99 languages. Open-source and API available.
X
XGBoost
Gradient boosting library optimized for speed and performance. Dominates structured/tabular data ML competitions. Used for classification, regression, ranking tasks.
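A minimal sketch training an XGBoost classifier on a scikit-learn dataset (assumes xgboost is installed):

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from xgboost import XGBClassifier   # pip install xgboost

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

model = XGBClassifier(n_estimators=200, max_depth=4, learning_rate=0.1)
model.fit(X_train, y_train)
print(f"accuracy: {model.score(X_test, y_test):.3f}")
```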
Z
Zero-Shot Learning
Model performing tasks without specific training examples. LLMs can do zero-shot classification, translation, etc. by understanding instructions alone. Less accurate than few-shot but more flexible.
Need Help with AI Development?
Get a free consultation from our AI experts. We'll help you choose the right technologies and build production-ready AI solutions.
Schedule Free Consultation →