The Art & Science of Prompt Engineering
Prompt engineering is the highest-leverage skill in AI development. Well-crafted prompts can improve output quality by 50-80%, reduce token costs by 30-50%, and often eliminate the need for fine-tuning entirely.
Core Techniques
1. Zero-Shot Prompting
Direct instruction without examples:
```
Classify the sentiment of this review as positive, negative, or neutral:
"The product arrived late but quality is excellent."
```
When to use: simple tasks and quick prototyping. Typical accuracy: 60-75%.
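In code, zero-shot prompting is a single instruction in the user message. Here is a minimal sketch using the OpenAI Python SDK; the model name and temperature are illustrative choices, not requirements:

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

review = "The product arrived late but quality is excellent."

# Zero-shot: one direct instruction, no examples
response = client.chat.completions.create(
    model="gpt-3.5-turbo",  # illustrative; any chat model works
    temperature=0.0,        # deterministic output suits classification
    messages=[{
        "role": "user",
        "content": "Classify the sentiment of this review as positive, "
                   f'negative, or neutral: "{review}"',
    }],
)
print(response.choices[0].message.content)
```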
2. Few-Shot Learning
Provide 2-5 examples of desired input-output pairs:
```
Extract key information from customer emails:

Email: "Hi, I need to return order #12345. It's too small."
Output: {"order_id": "12345", "intent": "return", "reason": "size"}

Email: "When will order #67890 ship?"
Output: {"order_id": "67890", "intent": "shipping_inquiry", "reason": null}

Email: "I want to cancel my subscription immediately."
Output:
```
Results: typically 75-90% accuracy with 3-5 examples. A strong default for most tasks.
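A common way to encode few-shot examples in the chat APIs is as alternating user/assistant turns, so the model sees each input paired with its ideal output. A sketch (the examples mirror the prompt above; moving the instruction into a system message is a stylistic choice):

```python
from openai import OpenAI

client = OpenAI()

# Each example becomes a user turn (input) plus an assistant turn
# (the ideal output we want the model to imitate).
examples = [
    ('Email: "Hi, I need to return order #12345. It\'s too small."',
     '{"order_id": "12345", "intent": "return", "reason": "size"}'),
    ('Email: "When will order #67890 ship?"',
     '{"order_id": "67890", "intent": "shipping_inquiry", "reason": null}'),
]

messages = [{"role": "system",
             "content": "Extract key information from customer emails as JSON."}]
for user_text, ideal_output in examples:
    messages.append({"role": "user", "content": user_text})
    messages.append({"role": "assistant", "content": ideal_output})
messages.append({"role": "user",
                 "content": 'Email: "I want to cancel my subscription immediately."'})

response = client.chat.completions.create(model="gpt-3.5-turbo", messages=messages)
print(response.choices[0].message.content)
```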
3. Chain-of-Thought (CoT)
Ask the model to explain its reasoning step-by-step:
```
Problem: A store sold 48 apples. 1/3 were green, rest were red.
How many red apples?

Solve this step-by-step:
1. Calculate number of green apples
2. Calculate number of red apples
3. Provide final answer
```
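For reference, one correct step-by-step response to this prompt looks like:

```
1. Green apples: 1/3 of 48 = 16
2. Red apples: 48 - 16 = 32
3. Final answer: 32 red apples
```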
Results: 30-50% improvement on complex reasoning tasks.
4. Role Prompting
Assign specific expertise to the model:
```
You are an expert Python developer with 15 years of experience in
data engineering. Review this code for bugs and suggest improvements:

[code here]
```
Impact: 20-40% improvement in specialized tasks.
5. System Messages (ChatGPT/Claude)
```
System: You are a helpful customer service agent for TechCorp. Be concise,
professional, and always provide order numbers when discussing purchases.
Never make promises about refunds without manager approval.

User: I want my money back!
```
Use case: Set consistent behavior and constraints.
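In the chat APIs, the system message is simply the first entry in the message list; role prompting from the previous section is set the same way. A sketch, with the TechCorp persona copied from the example above and an illustrative model choice:

```python
from openai import OpenAI

client = OpenAI()

SYSTEM_PROMPT = (
    "You are a helpful customer service agent for TechCorp. "
    "Be concise, professional, and always provide order numbers when "
    "discussing purchases. Never make promises about refunds without "
    "manager approval."
)

# The system message sets persistent behavior for every turn that follows.
response = client.chat.completions.create(
    model="gpt-4",  # illustrative
    messages=[
        {"role": "system", "content": SYSTEM_PROMPT},
        {"role": "user", "content": "I want my money back!"},
    ],
)
print(response.choices[0].message.content)
```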
6. Output Formatting
Request specific output format:
```
Extract company information and return as JSON:

Text: "Apple Inc., founded in 1976, is headquartered in Cupertino, CA."

Output format:
{
  "company": "company name",
  "founded": "year",
  "headquarters": "location"
}
```
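When you request JSON, parse and validate the reply rather than trusting it blindly, since models sometimes wrap the object in prose or markdown fences. A minimal sketch; the brace-extraction heuristic is our own assumption, not a library feature:

```python
import json

def parse_model_json(raw: str) -> dict:
    """Extract and parse the first JSON object in a model reply."""
    # Locate the outermost braces so surrounding prose or markdown
    # fences don't break json.loads.
    start, end = raw.find("{"), raw.rfind("}")
    if start == -1 or end == -1:
        raise ValueError(f"no JSON object found in: {raw!r}")
    return json.loads(raw[start : end + 1])

reply = '{"company": "Apple Inc.", "founded": "1976", "headquarters": "Cupertino, CA"}'
print(parse_model_json(reply)["company"])  # Apple Inc.
```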
Advanced Techniques
ReAct (Reasoning + Acting)
Combine reasoning with tool use:
```
You have access to: [search_web, calculate, get_weather]

Question: What's the total GDP of countries with population > 100M?

Thought: I need to find countries with population > 100M
Action: search_web("countries population over 100 million")
Observation: [list of countries]
Thought: Now I need their GDPs
Action: search_web("GDP of [countries]")
...
```
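A ReAct agent is essentially a loop: send the transcript, parse the model's Action line, run the named tool, append the Observation, and repeat until the model produces a final answer. A bare-bones sketch in which the tool stubs, the Action/Answer line format, and the step limit are all illustrative assumptions:

```python
import re
from openai import OpenAI

client = OpenAI()

# Hypothetical tool stubs for illustration; real tools would call actual APIs.
TOOLS = {
    "search_web": lambda q: f"[stub search results for: {q}]",
    "calculate": lambda expr: str(eval(expr)),  # demo only; never eval untrusted input
}

def react(question: str, max_steps: int = 5) -> str:
    transcript = (
        f"You have access to: {list(TOOLS)}\n"
        "Alternate Thought/Action lines. Write actions as "
        'Action: tool_name("argument") and finish with "Answer: ...".\n\n'
        f"Question: {question}\n"
    )
    for _ in range(max_steps):
        reply = client.chat.completions.create(
            model="gpt-4",  # illustrative
            messages=[{"role": "user", "content": transcript}],
            stop=["Observation:"],  # pause so we can run the tool ourselves
        ).choices[0].message.content
        transcript += reply + "\n"
        if "Answer:" in reply:  # the model has finished reasoning
            return reply.split("Answer:", 1)[1].strip()
        match = re.search(r'Action:\s*(\w+)\("(.*)"\)', reply)
        if match is None:
            return reply  # no parsable action; give up gracefully
        tool, arg = match.groups()
        transcript += f"Observation: {TOOLS[tool](arg)}\n"
    return "No answer within max_steps."
```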
Self-Consistency
Generate multiple reasoning paths and choose the most common answer (see the sketch after this list):
- Run same prompt 5-10 times with temperature 0.7
- Use majority voting on final answers
- 20-30% accuracy improvement on complex reasoning
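A minimal sketch of self-consistency over the chain-of-thought apple problem from earlier; the "Final answer" extraction format and the sample count are illustrative choices:

```python
import re
from collections import Counter
from openai import OpenAI

client = OpenAI()

PROMPT = ('A store sold 48 apples. 1/3 were green, the rest were red. '
          'How many red apples? Think step by step, then end with '
          '"Final answer: <number>".')

def self_consistent_answer(prompt: str, samples: int = 7) -> str:
    answers = []
    for _ in range(samples):
        reply = client.chat.completions.create(
            model="gpt-3.5-turbo",  # illustrative
            temperature=0.7,        # diversity across reasoning paths
            messages=[{"role": "user", "content": prompt}],
        ).choices[0].message.content
        match = re.search(r"Final answer:\s*(\d+)", reply)
        if match:
            answers.append(match.group(1))
    # Majority vote over the extracted final answers
    return Counter(answers).most_common(1)[0][0]

print(self_consistent_answer(PROMPT))  # expected: 32
```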
Parameter Tuning
Temperature
- 0.0-0.3: Deterministic, factual (data extraction, classification)
- 0.7-0.9: Balanced creativity (content generation, conversations)
- 1.0+: Highly creative (brainstorming, creative writing)
Top-p (Nucleus Sampling)
- 0.1: Very focused (use with low temperature)
- 0.9: Balanced (default for most tasks)
- 1.0: Maximum diversity
Max Tokens
- Set generous limits for reasoning tasks (1000-2000 tokens)
- Constrain for simple tasks (100-300 tokens)
- Monitor the cost vs. quality tradeoff; all three parameters are set per request, as the sketch after this list shows
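All three knobs are per-request arguments in the chat APIs. A sketch contrasting an extraction-style call with a creative one, with parameter values taken from the guidelines above and placeholder prompts:

```python
from openai import OpenAI

client = OpenAI()

# Factual extraction: deterministic, focused, short
extraction = client.chat.completions.create(
    model="gpt-3.5-turbo",  # illustrative
    temperature=0.0,
    top_p=0.1,
    max_tokens=150,
    messages=[{"role": "user",
               "content": "List the capitals of France and Japan."}],
)

# Creative generation: higher temperature, more room to write
brainstorm = client.chat.completions.create(
    model="gpt-3.5-turbo",
    temperature=0.9,
    top_p=0.9,
    max_tokens=1500,
    messages=[{"role": "user",
               "content": "Brainstorm ten names for a hiking app."}],
)
```

Note that OpenAI's documentation suggests tuning temperature or top_p rather than both at once; the pairings above simply mirror the guidelines in this section.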
Model-Specific Tips
GPT-4
- Excels at complex reasoning and coding
- Use system messages for consistent behavior
- 32K context: leverage long conversations and documents
- Best for: code generation, analysis, complex reasoning
Claude 2/3
- 100K-200K context: process entire books
- Strong at following detailed instructions
- More conservative, less prone to hallucination
- Best for: document analysis, content moderation, safe responses
GPT-3.5 Turbo
- Roughly 10x cheaper than GPT-4
- Use for simple tasks: classification, simple Q&A, summarization
- Requires more explicit prompts than GPT-4
Common Pitfalls
- Vague Instructions: Be specific about format, length, tone
- No Examples: Add 2-3 examples for 30-50% accuracy boost
- Wrong Temperature: Low for factual, high for creative
- Ignoring Context Window: Monitor token usage, summarize long conversations
- Not Testing Variations: A/B test prompts, measure quality
Testing & Optimization
- Create Test Set: 50-100 representative examples
- Iterate Prompts: Test variations systematically
- Measure Quality: Accuracy, completeness, format compliance
- Monitor Costs: Track tokens per request, optimize length
- Version Control: Track prompt changes and performance over time (a minimal eval harness is sketched below)
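Even a tiny harness makes that loop concrete. A sketch that scores one prompt variant against a labeled test set; the JSONL file format, the exact-match scoring rule, and the templates are all assumptions for illustration:

```python
import json
from openai import OpenAI

client = OpenAI()

def evaluate(template: str, test_path: str = "test_set.jsonl") -> float:
    """Return the fraction of test cases a prompt template answers exactly."""
    correct = total = 0
    with open(test_path) as f:  # one {"input": ..., "expected": ...} per line
        for line in f:
            case = json.loads(line)
            reply = client.chat.completions.create(
                model="gpt-3.5-turbo",  # illustrative
                temperature=0.0,        # deterministic, so evals are repeatable
                messages=[{"role": "user",
                           "content": template.format(input=case["input"])}],
            ).choices[0].message.content.strip()
            correct += reply == case["expected"]
            total += 1
    return correct / total

# A/B test variants side by side; keep whichever template scores highest.
for template in ('Classify sentiment: {input}',
                 'Classify the sentiment as positive, negative, or neutral: {input}'):
    print(f"{evaluate(template):.0%}  {template}")
```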
Real-World Examples
Customer Email Classification
- Technique: Few-shot (3 examples) + output formatting (JSON)
- Model: GPT-3.5 Turbo
- Result: 92% accuracy, 0.5s latency, ₹0.02 per classification
Code Review Agent
- Technique: Role prompting + chain-of-thought + system message
- Model: GPT-4
- Result: 87% bug detection rate, comparable to human reviewers
Legal Contract Analysis
- Technique: Claude 2 (100K context) + structured output
- Result: Processes 50-page contracts in about 30 seconds vs. roughly 2 hours of human review time
Need help optimizing your LLM prompts? Get a free prompt audit and optimization recommendations.