In today’s fast-moving world of AI and large language models (LLMs), I’ve learned that one of the most valuable skills is not just understanding what these models can do but knowing how to guide them effectively. As I’ve spent time building applications, conducting research, and experimenting with different prompts, I’ve realized that real progress comes from learning how to control the generation process.
In this blog, I want to share seven generation control techniques that have made a real difference in how I work with AI and that every practitioner, researcher, or enthusiast can benefit from.
Temperature is perhaps the most fundamental parameter for controlling AI generation. It controls the randomness of the model’s output by scaling the probability distribution over possible tokens.
Behind the scenes, language models output logits, unnormalized log-probabilities, one for each possible next token. The temperature T rescales these logits before the softmax turns them into a probability distribution:
p_i = exp(z_i / T) / Σ_j exp(z_j / T)
Where z_i is the logit for token i, T is the temperature, and p_i is the resulting probability of sampling token i.
Think of temperature as a “confidence dial”: values below 1 sharpen the distribution so the most probable tokens dominate, T = 1 leaves it unchanged, and values above 1 flatten it so less likely tokens get a real chance of being picked.
Here’s what happens under the hood:
import numpy as np

def temperature_sample(logits, temperature=1.0):
    # Step 1: Scale logits by temperature
    scaled_logits = logits / temperature

    # Step 2: Apply softmax (with numerical stability)
    exp_logits = np.exp(scaled_logits - np.max(scaled_logits))
    probs = exp_logits / np.sum(exp_logits)

    # Step 3: Sample from the distribution
    next_token = np.random.choice(len(probs), p=probs)
    return next_token
The numerical stability trick (subtracting max before exp) prevents overflow when dealing with large logit values.
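To see the effect, here's a small illustrative run of the function above with made-up logits (the numbers are only for demonstration): at a low temperature the samples almost always land on the highest-scoring token, while a high temperature spreads them across the vocabulary.

logits = np.array([2.0, 1.0, 0.5, -1.0])  # toy logits for a 4-token vocabulary

# Low temperature: samples concentrate on token 0
low_t_samples = [temperature_sample(logits, temperature=0.2) for _ in range(1000)]

# High temperature: samples spread across all tokens
high_t_samples = [temperature_sample(logits, temperature=2.0) for _ in range(1000)]

print(np.bincount(low_t_samples, minlength=4) / 1000)   # roughly [0.99, 0.01, 0.00, 0.00]
print(np.bincount(high_t_samples, minlength=4) / 1000)  # roughly [0.43, 0.26, 0.21, 0.10]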
Low temperature is perfect for tasks requiring consistency and precision:
# Example with low temperature
import openai

response = openai.ChatCompletion.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "What is the capital of France?"}],
    temperature=0.1
)
# Output: "The capital of France is Paris."
The model becomes highly deterministic, consistently choosing the most probable tokens.
Use cases: factual question answering, code generation, data extraction, and summarization, anywhere you want the same input to produce (nearly) the same output.
High temperature unleashes creativity and diverse outputs:
# Example with high temperature
response = openai.ChatCompletion.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Describe a sunset"}],
    temperature=0.9
)
# Output might vary each time:
# "The crimson orb melted into the horizon..."
# "Golden light spilled across the darkening sky..."
# "Fire painted the clouds as day surrendered to night..."
Each run produces notably different outputs as the model explores less probable but potentially more interesting token choices.
Use cases: creative writing, brainstorming, marketing copy, and dialogue generation, anywhere variety matters more than repeatability.
While temperature scales the entire probability distribution, top-p and top-k are truncation methods that eliminate low-probability tokens before sampling. They provide different ways to control output quality and diversity.
Top-k sampling keeps only the k most probable tokens and redistributes their probability mass.
How it works:
import torch
import torch.nn.functional as F

def top_k_sampling(logits, k=50, temperature=1.0):
    """
    Top-k sampling implementation

    Args:
        logits: [vocab_size] tensor of unnormalized scores
        k: number of top tokens to keep
        temperature: temperature scaling factor

    Returns:
        sampled token index
    """
    # Step 1: Apply temperature
    logits = logits / temperature

    # Step 2: Get top-k logits and their indices
    top_k_logits, top_k_indices = torch.topk(logits, k)

    # Step 3: Apply softmax to top-k logits only
    top_k_probs = F.softmax(top_k_logits, dim=-1)

    # Step 4: Sample from top-k distribution
    sampled_index = torch.multinomial(top_k_probs, num_samples=1)

    # Step 5: Map back to original vocabulary index
    token = top_k_indices[sampled_index]
    return token
Let’s say we have a vocabulary of 8 tokens:
tokens = ['the', 'a', 'is', 'very', 'quite', 'extremely', 'somewhat', 'rather']
logits = [5.0, 4.5, 3.2, 2.8, 1.5, 0.8, 0.3, -0.5]
# After softmax (temperature = 1.0)
probs = [0.515, 0.312, 0.085, 0.057, 0.016, 0.008, 0.005, 0.002]
With top-k = 3:
# Step 1: Select top-3 tokens
top_k_tokens = ['the', 'a', 'is']
top_k_probs = [0.515, 0.312, 0.085]
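After truncation the kept probabilities no longer sum to 1, so they are renormalized before sampling. A quick sketch of that step with the numbers above (values rounded):

top_k_probs = np.array([0.515, 0.312, 0.085])
renormalized = top_k_probs / top_k_probs.sum()
print(renormalized)  # roughly [0.565, 0.342, 0.093]; sampling now happens over just these 3 tokens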
Top-p (also called nucleus sampling) keeps the smallest set of tokens whose cumulative probability ≥ p.
How it works:
def top_p_sampling(logits, p=0.9, temperature=1.0):
    """
    Top-p (nucleus) sampling implementation

    Args:
        logits: [vocab_size] tensor of unnormalized scores
        p: cumulative probability threshold (0 < p ≤ 1)
        temperature: temperature scaling factor

    Returns:
        sampled token index
    """
    # Step 1: Apply temperature and softmax
    logits = logits / temperature
    probs = F.softmax(logits, dim=-1)

    # Step 2: Sort probabilities in descending order
    sorted_probs, sorted_indices = torch.sort(probs, descending=True)

    # Step 3: Calculate cumulative probabilities
    cumsum_probs = torch.cumsum(sorted_probs, dim=-1)

    # Step 4: Find the nucleus (tokens to keep)
    # Remove tokens where cumsum > p (keep first token that exceeds p)
    sorted_indices_to_remove = cumsum_probs > p

    # Shift right to keep the first token that exceeds p
    sorted_indices_to_remove[1:] = sorted_indices_to_remove[:-1].clone()
    sorted_indices_to_remove[0] = False

    # Step 5: Set removed token probabilities to 0
    sorted_probs[sorted_indices_to_remove] = 0.0

    # Step 6: Renormalize
    sorted_probs = sorted_probs / sorted_probs.sum()

    # Step 7: Sample from the nucleus
    sampled_sorted_index = torch.multinomial(sorted_probs, num_samples=1)

    # Step 8: Map back to original vocabulary
    token = sorted_indices[sampled_sorted_index]
    return token
tokens = ['the', 'a', 'is', 'very', 'quite', 'extremely', 'somewhat', 'rather']
probs = [0.515, 0.312, 0.085, 0.057, 0.016, 0.008, 0.005, 0.002]

With top-p = 0.9:

# Step 1: Sort by probability (already sorted)
# Step 2: Calculate cumulative sum
cumulative = [0.515, 0.827, 0.912, 0.969, 0.985, 0.993, 0.998, 1.000]
# cumulative[2] = 0.912 > 0.9 ← Stop here!
# Nucleus = ['the', 'a', 'is']

With top-p = 0.75:

# cumulative[1] = 0.827 > 0.75 ← Stop here!
# Nucleus = ['the', 'a']  (a smaller threshold keeps a smaller nucleus)
Top-k = 4 (fixed cutoff):

███████████████ the       (40%) ← Keep
██████████      a         (25%) ← Keep
████            is        (10%) ← Keep
███             very       (8%) ← Keep
--              quite      (7%) ← Discard (not in top-4)
--              extremely  (5%) ← Discard
--              somewhat   (3%) ← Discard
--              rather     (2%) ← Discard

Top-p = 0.9 (adaptive cutoff):

███████████████ the       (40%) ← Keep
██████████      a         (25%) ← Keep
████            is        (10%) ← Keep
███             very       (8%) ← Keep
--              quite      (7%) ← Keep (cumulative reaches 90% here)
--              extremely  (5%) ← Discard (threshold already met)
--              somewhat   (3%) ← Discard
--              rather     (2%) ← Discard
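In practice you rarely implement these samplers yourself; most libraries expose them as generation parameters. As a rough sketch, here is how the same knobs appear in Hugging Face transformers (the model name and prompt are just placeholders):

from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")          # placeholder model
model = AutoModelForCausalLM.from_pretrained("gpt2")

inputs = tokenizer("The weather today is", return_tensors="pt")
outputs = model.generate(
    **inputs,
    do_sample=True,     # enable sampling instead of greedy decoding
    temperature=0.8,    # scale the logits
    top_k=50,           # keep only the 50 most probable tokens
    top_p=0.9,          # ...then keep the smallest set with cumulative prob ≥ 0.9
    max_new_tokens=30,
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))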
Effective prompts are the foundation of controlled generation. The way you structure your prompts directly impacts the quality and relevance of outputs.
Bad: "Tell me about dogs"
Good: "Write a 200-word informative paragraph about dog training techniques for puppies, focusing on positive reinforcement methods."
Prompt: "You are an expert data scientist with 10 years of experience.
Explain gradient descent in simple terms for a beginner."
Prompt: "List the top 5 programming languages for beginners.
Format your response as:
1. [Language]: [Brief description]
2. [Language]: [Brief description]
..."
Prompt: "Write a product review for a smartphone. Requirements:
- Exactly 150 words
- Include both pros and cons
- Mention battery life, camera, and performance
- Use a neutral tone"
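These patterns are easy to wrap in a small helper so the constraints stay consistent across requests. A minimal sketch (the function and constraint wording are illustrative, not a standard API):

def build_review_prompt(product: str, word_count: int, aspects: list[str]) -> str:
    """Assemble a constraint-based prompt like the one above."""
    aspect_list = ", ".join(aspects)
    return (
        f"Write a product review for a {product}. Requirements:\n"
        f"- Exactly {word_count} words\n"
        f"- Include both pros and cons\n"
        f"- Mention {aspect_list}\n"
        f"- Use a neutral tone"
    )

prompt = build_review_prompt("smartphone", 150, ["battery life", "camera", "performance"])
print(prompt)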
Few-shot learning involves providing examples within your prompt to guide the model’s behavior. This technique is incredibly powerful for establishing patterns and desired output formats.
Prompt: "Classify the sentiment of these reviews:
Review: 'This product exceeded my expectations!'
Sentiment: Positive
Review: 'Terrible quality, waste of money.'
Sentiment: Negative
Review: 'It's okay, nothing special.'
Sentiment: Neutral
Review: 'I love this new feature update!'
Sentiment: ?"
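The same few-shot pattern can also be expressed as alternating user/assistant turns when calling a chat API, which often steers the model even more reliably than a single block of text. A rough sketch (message content taken from the example above; the system instruction is my own addition):

few_shot_examples = [
    ("This product exceeded my expectations!", "Positive"),
    ("Terrible quality, waste of money.", "Negative"),
    ("It's okay, nothing special.", "Neutral"),
]

messages = [{"role": "system", "content": "Classify the sentiment of each review as Positive, Negative, or Neutral."}]
for review, sentiment in few_shot_examples:
    messages.append({"role": "user", "content": f"Review: '{review}'"})
    messages.append({"role": "assistant", "content": f"Sentiment: {sentiment}"})

# The new review to classify goes in last
messages.append({"role": "user", "content": "Review: 'I love this new feature update!'"})
# messages can now be passed to the chat completion call shown earlier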
Prompt: "Convert natural language to Python functions:
Input: 'Create a function that adds two numbers'
Output:
def add_numbers(a, b):
    return a + b
Input: 'Create a function that finds the maximum in a list'
Output:
def find_maximum(numbers):
    return max(numbers)
Input: 'Create a function that reverses a string'
Output: ?"
Benefits of few-shot learning: it establishes the output pattern and format without any fine-tuning, reduces ambiguity about what you want, and usually makes responses more consistent across runs than zero-shot prompts.
In-context learning leverages the model’s ability to understand and apply new information provided within the conversation context, without updating the model’s parameters.
Prompt: "I'm working with a specific dataset format:
{
  'customer_id': 12345,
  'purchase_date': '2024-01-15',
  'items': ['laptop', 'mouse'],
  'total': 899.99
}
Based on this format, generate 3 sample customer records for an electronics store."
Conversation Context:
User: "I'm building a React application for a food delivery service."
AI: "Great! What specific functionality are you looking to implement?"
User: "I need help with the cart component."
AI: [Provides React-specific cart component code tailored to food delivery]
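With chat APIs, this kind of context is carried simply by resending the conversation history on each call: every prior turn is appended to the messages list, so the model's next reply stays anchored to the food-delivery React project. A minimal sketch reusing the openai client from earlier (the helper name is illustrative):

conversation = [
    {"role": "user", "content": "I'm building a React application for a food delivery service."},
    {"role": "assistant", "content": "Great! What specific functionality are you looking to implement?"},
]

def ask(user_message: str) -> str:
    """Append the user turn, call the model, and keep the reply in context."""
    conversation.append({"role": "user", "content": user_message})
    response = openai.ChatCompletion.create(model="gpt-4o", messages=conversation)
    reply = response.choices[0].message["content"]
    conversation.append({"role": "assistant", "content": reply})
    return reply

print(ask("I need help with the cart component."))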
Chain-of-Thought (CoT) prompting encourages the model to show its reasoning process, leading to more accurate and explainable outputs.
Prompt: "Solve this step by step:
A store has 24 apples. They sell 8 apples in the morning and 6 apples in the afternoon. How many apples are left?
Let me work through this step by step:
1) Starting apples: 24
2) Sold in morning: 8
3) Sold in afternoon: 6
4) Total sold: 8 + 6 = 14
5) Remaining: 24 - 14 = 10
Therefore, 10 apples are left."
Prompt: "A company's revenue increased by 20% in Q1 and decreased by 10% in Q2. If they started with $100,000, what's their revenue at the end of Q2? Let's think step by step."
Prompt: "Analyze whether this business model is sustainable:
Business: Subscription-based meal delivery service
- Monthly fee: $50
- Food cost per meal: $8
- Delivery cost per meal: $3
- 20 meals per month per subscriber
Let's break this down step by step:"
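The arithmetic the model should surface in its step-by-step answer is worth checking yourself; with these numbers, per-subscriber costs far exceed the monthly fee:

monthly_fee = 50
food_cost_per_meal = 8
delivery_cost_per_meal = 3
meals_per_month = 20

cost_per_subscriber = (food_cost_per_meal + delivery_cost_per_meal) * meals_per_month
margin = monthly_fee - cost_per_subscriber
print(cost_per_subscriber, margin)  # 220, -170  -> loses $170 per subscriber per month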
When to use chain-of-thought: multi-step math or logic problems, planning and analysis tasks, debugging and troubleshooting, and any situation where you need to audit how the model reached its answer.
Hallucinations, cases where AI models generate false or nonsensical information, are a significant challenge. Here are strategies to minimize them:
Prompt: "Based ONLY on the following text, answer the question:
Text: [Insert specific source material]
Question: [Your question]
If the answer cannot be found in the provided text, respond with 'Information not available in the source.'"
Prompt: "Answer the following question and indicate your confidence level (High/Medium/Low):
Question: What is the population of Tokyo in 2024?
Answer: [Response]
Confidence: [Level]
Reasoning: [Why this confidence level]"
Prompt: "Claim: 'Python was created in 1995 by Guido van Rossum'
Please verify this claim step by step:
1. Check the creation year
2. Verify the creator
3. Provide the correct information if any part is wrong
4. Rate the accuracy: Correct/Partially Correct/Incorrect"
Prompt: "Write a summary about renewable energy trends. For each major claim, indicate what type of source would be needed to verify it (e.g., 'government report', 'academic study', 'industry survey')."
(You can also use retrieval-augmented generation (RAG) for this 😃; a rough sketch follows.)
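As a minimal sketch of the RAG idea, assuming a hypothetical search_documents helper for retrieval (any vector store or keyword index would do), the retrieved text is simply pasted into a grounded prompt like the one above:

def search_documents(query: str, k: int = 3) -> list[str]:
    """Hypothetical retrieval step: return the k most relevant text snippets."""
    # In a real system this would query a vector store or search index;
    # here it just returns placeholders so the sketch runs end to end.
    return ["<retrieved snippet 1>", "<retrieved snippet 2>"]

def grounded_answer(question: str) -> str:
    context = "\n\n".join(search_documents(question))
    prompt = (
        "Based ONLY on the following text, answer the question:\n\n"
        f"Text: {context}\n\n"
        f"Question: {question}\n\n"
        "If the answer cannot be found in the provided text, "
        "respond with 'Information not available in the source.'"
    )
    response = openai.ChatCompletion.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": prompt}],
        temperature=0.1,   # keep the answer grounded and deterministic
    )
    return response.choices[0].message["content"]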
The real power comes from combining these techniques strategically:
Prompt: "You are a research assistant helping with academic writing.
Temperature: 0.3 (for accuracy)
Task: Summarize the key findings about machine learning bias from the following paper excerpt.
Follow this format:
1. Main Finding: [One sentence]
2. Supporting Evidence: [Key statistics or examples]
3. Implications: [What this means for practitioners]
4. Confidence: [High/Medium/Low based on source quality]
Paper Excerpt: [Insert text]
Think through this step by step, and only include information directly supported by the text."
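Wired into an API call, that combined prompt might look roughly like this (the system/user split and the excerpt variable are just one way to arrange it):

paper_excerpt = "..."  # insert the paper excerpt text here

response = openai.ChatCompletion.create(
    model="gpt-4o",
    temperature=0.3,  # low temperature for accuracy
    messages=[
        {"role": "system", "content": "You are a research assistant helping with academic writing."},
        {"role": "user", "content": (
            "Summarize the key findings about machine learning bias from the following paper excerpt.\n"
            "Follow this format:\n"
            "1. Main Finding: [One sentence]\n"
            "2. Supporting Evidence: [Key statistics or examples]\n"
            "3. Implications: [What this means for practitioners]\n"
            "4. Confidence: [High/Medium/Low based on source quality]\n\n"
            f"Paper Excerpt: {paper_excerpt}\n\n"
            "Think through this step by step, and only include information directly supported by the text."
        )},
    ],
)
print(response.choices[0].message["content"])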
Mastering generation control is essential for anyone working with AI models. By understanding and applying these seven techniques (temperature, top-k and top-p sampling, prompt engineering, few-shot learning, in-context learning, chain-of-thought prompting, and hallucination prevention) you can dramatically improve the quality, reliability, and usefulness of AI-generated content.
Thank you for reading! 🤗 I hope you found this article both informative and enjoyable. (Leave a comment if you've built any async agent applications lately; I'd love to hear about them 🙂)
For more information like this, follow me on LinkedIn.

