Step-by-Step LLM Prompt Engineering Guide

Step 1: Basic Understanding of LLMs

What is an LLM?

A Large Language Model (LLM), in simple words, is an advanced text predictor that can write, summarize, translate, or answer questions by learning patterns from billions of words it has seen during training. Some popular examples are OpenAI's GPT models, Anthropic's Claude, Google's Gemini, and Meta's Llama.

What are the key components of Language Models?

Transformers: An LLM is built on a special kind of neural network architecture called the transformer. Transformers help the model understand the input and predict what comes next.

Training Data: A vast and diverse body of text used during training to help the model learn relationships between words.

Tokenization: The process of breaking text into smaller pieces called tokens. Tokens are the basic units LLMs operate on; a token can be a whole word, part of a word, or even a single character, depending on the model and the tokenization method used.
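
To make tokens concrete, here is a minimal sketch, assuming the tiktoken library is installed; the cl100k_base encoding used below is one common choice, and other models split text differently.

```python
# A minimal sketch of tokenization, assuming the tiktoken library is installed
# (pip install tiktoken). "cl100k_base" is the encoding used by several OpenAI
# models; other models use different tokenizers, so splits will vary.
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")
text = "Prompt engineering is fun!"
token_ids = enc.encode(text)

print(token_ids)  # a list of integer token IDs
for tid in token_ids:
    # Decode each ID back to the text fragment it represents.
    print(tid, repr(enc.decode([tid])))
```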

Limitations: Hallucinations, Drifting, and Bias

Hallucinations: Sometimes LLMs confidently generate false or made-up information. This happens because models predict text based on patterns; they don't understand facts or reality. For example, you might ask a weaker model, "Tell me the history of the Eiffel Tower in Madrid," and it may confidently generate a history instead of telling you that the Eiffel Tower is located in Paris.

Drifting: Drifting happens when the LLM's knowledge and understanding fall behind the changing world, and as a consequence it gives inaccurate or unreliable answers. For example, natural languages (English, Spanish, etc.) change over time, and a model can fall behind those changes. Likewise, a model trained on data from before 2020 might not understand NFTs, TikTok trends, or other technologies that appeared after 2020.

Bias: Bias means the model generates output that unfairly favors or discriminates against certain groups, ideas, or perspectives. This happens because LLMs are trained on data that sometimes contains human biases, stereotypes, and imbalances.

| Limitation | What It Means | Cause / Why It Happens | Simple Example |
| --- | --- | --- | --- |
| Hallucination | Model generates false or made-up information confidently | Model predicts likely text based on patterns, not verified facts; incomplete or misleading training data | LLM says "The Great Emu War was between Australia and New Zealand" (false fact) |
| Drifting | Model's performance declines over time as language or data changes | Language evolves or new topics emerge after model training, causing mismatch with outdated knowledge | Model trained before 2020 doesn't understand "TikTok trends" or "NFTs" |
| Bias | Model reflects and amplifies unfair stereotypes or prejudices | Training data contains human biases, which the model learns and reproduces | Associating "doctor" with men and "nurse" with women, reinforcing gender stereotypes |

Step 2: Prompt and Prompt Engineering

What is a Prompt?

A prompt is the input or instruction we feed into an LLM to tell it what we want. The model reads this input and generates an output based on it. All LLMs take text input as a prompt.

Example 1: Vague prompt

"Tell me about diabetes."

Example 2: Detailed and Specific Prompt

"You are an expert medical doctor providing patient advice. Explain the causes and prevention of diabetes."

What is Prompt Engineering?

Imagine you walk into a coffee shop. You can ask for "a coffee," or you can ask for "a medium oat-milk latte with one pump of vanilla and extra foam." Which one gives you exactly what you want?

Prompt Engineering is the art and science of crafting LLM input in a way that produces exactly what we want, like ordering your favorite coffee with every ingredient spelled out. Applying prompt engineering best practices helps the LLM generate the most accurate and useful results, and can also reduce cost and latency and improve model performance. The process is more than writing text; it also includes monitoring, testing, securing, and versioning our prompts.

What are the responsibilities of a prompt engineer?

A Prompt Engineer is responsible for designing, developing, and refining prompts that guide Large Language Models (LLMs) to produce accurate, relevant, and useful outputs. Their responsibilities typically include:

  • Designing prompts: Crafting clear, precise, and context-aware prompts tailored to specific tasks or industries to elicit the best responses from AI models.
  • Testing and iterating: Continuously experimenting with prompts, analyzing outputs, and refining prompts to improve accuracy, relevance, and creativity.
  • Building prompt libraries: Creating and maintaining collections of effective prompts or prompt chains along with documentation or guidelines for reuse by teams or users.
  • Monitoring and reporting: Tracking prompt performance, documenting results, and providing feedback or reports to stakeholders for continuous improvement.
  • Training and tuning models: Assisting in fine-tuning or guiding AI learning processes to optimize model behavior and output quality.
  • Ethical oversight: Identifying and mitigating biases or ethical issues in AI outputs by adjusting prompt design and ensuring responsible AI use.

Step 3: How to write clear and concise prompts?

One of the recommended methods is using the COSTAR framework. COSTAR is a structured methodology for crafting effective prompts. This framework is flexible and helps you to structure prompts for different tasks like translation, summarization, code generation, etc. COSTAR breaks down prompt creation into six key elements to ensure clarity, relevance, and alignment with the desired output.

  • Context: Provide background information to help the model understand the scenario and generate relevant responses.
  • Objective: Clearly define the task or goal you want the model to accomplish.
  • Style: Specify the writing style or persona the model should emulate (e.g., formal, expert, humorous).
  • Tone: Set the attitude or emotional tone of the response (e.g., empathetic, professional).
  • Audience: Identify the intended audience to tailor the response appropriately (e.g., beginners, experts).
  • Response: Define the expected response format, such as a list, JSON, or report, to fit downstream use.

We can fold prompt engineering best practices into the COSTAR elements. For example, one important technique is role playing, where we assign the model a role, which helps it draw on the relevant knowledge in its training data. The role becomes part of the Context, since it provides background information; one way to assemble the elements is sketched below.
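
As an illustration, here is a minimal Python sketch of assembling the six COSTAR elements into one prompt string; the build_costar_prompt helper and the field values are hypothetical, reusing the medical example from the next step.

```python
# A minimal sketch of assembling a COSTAR prompt; build_costar_prompt is a
# hypothetical helper, not a standard API, and the values are illustrative.
def build_costar_prompt(context, objective, style, tone, audience, response):
    return (
        f"Context: {context}\n"
        f"Objective: {objective}\n"
        f"Style: {style}\n"
        f"Tone: {tone}\n"
        f"Audience: {audience}\n"
        f"Response: {response}\n"
    )

prompt = build_costar_prompt(
    context="You are an expert medical doctor providing patient advice.",  # role playing
    objective="Explain the causes and prevention of diabetes.",
    style="Clear and jargon-free.",
    tone="Empathetic and professional.",
    audience="Patients with no medical background.",
    response="A short list of bullet points.",
)
print(prompt)
```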

Step 4: Basic Rules and Practices for Prompt Engineering

Create Scenarios or Roles (Context)

Giving the model a role primes domain knowledge. LLMs are trained on vast amounts of data; a hint about the background and domain guides the model toward the relevant parts of that data and improves output quality and accuracy.

For example, we can include a role in the context section of the prompt: "You are an expert medical doctor providing patient advice."

Be specific about the Task (Objective)

This is the end goal: what we want the model to do. Be as specific as possible; remember the coffee shop example, where a vague request doesn't get you exactly what you want. In the medical example above, "Explain the causes and prevention of diabetes." is the specific task we defined for the LLM.

Use Examples

Giving examples can improve output quality, minimize ambiguity, and guide the model to exactly what we are asking for. If you don't provide any examples, you leave the model to guess based on its pre-trained data.

Examples are also called shots in the LLM context: a zero-shot prompt contains no examples, a one-shot prompt contains one, and a few-shot prompt contains several, as in the sketch below.
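
A minimal sketch of the difference, assuming a hypothetical sentiment-classification task; the reviews are invented purely to show the pattern.

```python
# A minimal sketch of zero-shot vs. few-shot prompting for a hypothetical
# sentiment-classification task; the reviews are invented for illustration.
zero_shot = (
    "Classify the sentiment of this review as Positive or Negative:\n"
    "'The battery died after one day.'"
)

few_shot = (
    "Classify the sentiment of each review as Positive or Negative.\n\n"
    "Review: 'Absolutely loved it, works perfectly.'\nSentiment: Positive\n\n"
    "Review: 'Broke within a week, total waste of money.'\nSentiment: Negative\n\n"
    "Review: 'The battery died after one day.'\nSentiment:"
)
```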

Quote Sources or Ask for Citations (Response)

Asking the model to cite sources or base answers on given data reduces fabricated or false information.

Providing data in the prompt has pros and cons. It improves accuracy, but too much information can confuse the model and raises costs, since we send more input tokens to the model.
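
One possible way to ground the model in supplied data and request citations, as a sketch; the source passage and the bracketed-citation convention are illustrative assumptions.

```python
# A minimal sketch of grounding the model in provided source text and asking for
# citations; the source passage and [1]-style citations are illustrative.
source = (
    "[1] WHO fact sheet: Diabetes is a chronic disease that occurs when the body "
    "cannot produce enough insulin or cannot effectively use the insulin it produces."
)

prompt = (
    "Using ONLY the source below, answer the question and cite the source number "
    "in brackets after each claim. If the answer is not in the source, say so.\n\n"
    f"Source:\n{source}\n\n"
    "Question: What is diabetes?"
)
```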

Constraints

Constraints are like traffic lights for LLMs. We can define constraints on:

  1. Output Length
  2. Style and Tone
  3. Output Format
  4. Filter for content
  5. Scope

A small study of 500 prompts (across GPT‐3.5 and GPT‐4) found a +25% increase in relevancy when length or format constraints were used and 30% fewer follow‐up edits needed if style/tone constraints were specified up front.

In user surveys, 82% of writers report that clear constraints cut revision time in half.

Here are some examples:

| Prompt (No Constraint) | Prompt (With Constraint) | Resulting Difference |
| --- | --- | --- |
| "Explain photosynthesis." | "Explain photosynthesis in 3 bullet points, each ≤ 20 words." | More concise, scannable output. |
| "Write a recipe for pancakes." | "Write a recipe for pancakes in JSON with keys ingredients and steps." | Easily machine-readable recipe. |
| "Give me marketing tips." | "Give me 5 marketing tips, each in a casual, friendly tone." | Consistent voice, clear listicle. |
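
Format constraints also pay off downstream. The sketch below reuses the pancake example from the table; model_reply is a stand-in for whatever text your LLM actually returns.

```python
import json

# A minimal sketch of why output-format constraints matter downstream; model_reply
# is a stand-in for the text an LLM would return for the constrained prompt.
prompt = (
    "Write a recipe for pancakes. Respond with JSON only, using exactly the keys "
    "'ingredients' (a list of strings) and 'steps' (a list of strings)."
)

model_reply = '{"ingredients": ["flour", "milk", "eggs"], "steps": ["Mix", "Fry"]}'

recipe = json.loads(model_reply)  # parses cleanly because the format was constrained
print(recipe["ingredients"])
```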

Step 5: Advanced Prompt Engineering Techniques

In this section, we review some advanced techniques that can improve prompt effectiveness and accuracy.

Chain-of-Thought Prompting

Chain-of-Thought (CoT) prompting encourages LLMs to “think out loud” by breaking down multi-step problems into intermediate reasoning steps. Instead of jumping straight to an answer, the model generates a chain of mini-inferences, much like a student showing their work on paper. This approach dramatically improves accuracy on tasks such as math word problems, logical puzzles, or multi-stage planning. To include CoT in your prompt:

  • Pose the question you want answered.
  • Add an instruction like “Let’s think step by step” or “Think out loud.”
  • The model replies with a chain of reasoning, then the final result, as in the sketch below.
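
A minimal sketch of adding a chain-of-thought instruction to a question; call_llm is a hypothetical stand-in for your own model client, and the train question is just an example.

```python
# A minimal sketch of Chain-of-Thought prompting; call_llm is a hypothetical
# stand-in for your own model client.
def call_llm(prompt: str) -> str:
    return f"<model response to: {prompt[:40]}...>"  # replace with a real API call

question = "A train travels 60 km in 45 minutes. What is its average speed in km/h?"
cot_prompt = (
    f"{question}\n\n"
    "Let's think step by step, then give the final answer on its own line."
)
print(call_llm(cot_prompt))
```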

Sequential Prompting or Prompt Chaining

Sequential Prompting is a technique where you give a language model a series of related prompts in a specific order, with each prompt building on the model’s previous response. This helps the AI handle complex tasks step-by-step, improving understanding and accuracy.

Instead of asking one big question, you break it down into smaller, connected questions or instructions. The model answers each step, and you use those answers to guide the next prompt. This creates a logical flow and helps the AI focus better.
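
Here is a minimal sketch of a two-step chain, reusing the diabetes example from earlier; call_llm is a hypothetical stand-in for your own model client.

```python
# A minimal sketch of sequential prompting (prompt chaining); call_llm is a
# hypothetical stand-in for your own model client.
def call_llm(prompt: str) -> str:
    return f"<model response to: {prompt[:40]}...>"  # replace with a real API call

# Step 1: a small, focused question.
causes = call_llm("List the three main causes of type 2 diabetes, one per line.")

# Step 2: the next prompt is built from the model's previous answer.
tips = call_llm(f"For each cause below, suggest one practical prevention tip:\n{causes}")
print(tips)
```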

Step-Back Prompting

Step-Back Prompting is a technique for solving complex problems more effectively by first "stepping back" to think about the bigger picture or fundamental principles before diving into specific details. For example, instead of asking directly, "If a sealed container of gas at room temperature is heated to 50°C, what happens to the pressure?", we can break the problem into two steps.

Step 1 - Abstraction: Let the model think about the broad principles or concept behind the question

What general laws or principles govern the relationship between temperature and pressure in gases?

Model answer: “The Ideal Gas Law, which relates pressure, volume, and temperature.”

Step 2 - Reasoning: Using the abstracted knowledge from step 1, let the model reason for the specific question

Using the Ideal Gas Law, explain what happens to the pressure when the temperature increases in a sealed container.
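
The two-step exchange above could be scripted roughly as follows; call_llm is again a hypothetical stand-in for a real model call.

```python
# A minimal sketch of Step-Back Prompting; call_llm is a hypothetical stand-in
# for your own model client.
def call_llm(prompt: str) -> str:
    return f"<model response to: {prompt[:40]}...>"  # replace with a real API call

# Step 1 - Abstraction: ask for the governing principle first.
principle = call_llm(
    "What general laws or principles govern the relationship between temperature "
    "and pressure in gases?"
)

# Step 2 - Reasoning: apply that principle to the original, specific question.
answer = call_llm(
    f"Using this principle: {principle}\n"
    "Explain what happens to the pressure when a sealed container of gas at room "
    "temperature is heated to 50°C."
)
print(answer)
```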

Self-Critique and Refinement

By using self-critique, we let the model first generate an initial answer, then review and critique its response, identify mistakes or weaknesses, and finally produce a refined, improved answer based on that critique.
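
A minimal sketch of that generate-critique-refine loop; call_llm is a hypothetical stand-in for your own model client.

```python
# A minimal sketch of self-critique and refinement; call_llm is a hypothetical
# stand-in for your own model client.
def call_llm(prompt: str) -> str:
    return f"<model response to: {prompt[:40]}...>"  # replace with a real API call

question = "Explain the causes and prevention of diabetes."

draft = call_llm(question)
critique = call_llm(
    f"Question: {question}\n\nDraft answer:\n{draft}\n\n"
    "Critique this draft: list any factual errors, missing points, or unclear wording."
)
final = call_llm(
    f"Question: {question}\n\nDraft answer:\n{draft}\n\nCritique:\n{critique}\n\n"
    "Rewrite the answer, fixing every issue raised in the critique."
)
print(final)
```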

Ethical and Safety Guardrails

Guardrails act like safety barriers, guiding the model to avoid dangerous or inappropriate outputs. They ensure AI respects privacy, avoids stereotypes, and provides trustworthy information.
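
One simple way to apply guardrails is to prepend standing rules to every prompt, as in this sketch; the rules listed are illustrative assumptions, not a complete safety policy.

```python
# A minimal sketch of prepending guardrail instructions to a task prompt; the
# rules below are illustrative, not a complete safety policy.
GUARDRAILS = (
    "Follow these rules in every answer:\n"
    "- Do not reveal personal or confidential data.\n"
    "- Avoid stereotypes about any group of people.\n"
    "- If you are unsure of a fact, say so instead of guessing.\n"
)

task = "Summarize the attached AI research paper for business executives."
prompt = f"{GUARDRAILS}\n{task}"
print(prompt)
```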

Step 6: Test and evaluate prompt effectiveness

Testing and evaluating prompt effectiveness is a systematic process that ensures your prompts consistently produce accurate, relevant, and high-quality outputs from LLMs.

1. Define Clear Success Criteria

Before testing, decide what a “good” response looks like for your use case.

  • Accuracy: Is the information correct?
  • Relevance: Is the response on-topic and does it answer the prompt?
  • Clarity: Is the output understandable for the intended audience?
  • Format: Does it follow the required structure (e.g., bullet points, JSON)?
  • Ethics & Safety: Is it free from bias, hallucination, or inappropriate content?

2. Create a Test Set

Prepare a diverse set of sample inputs (questions, scenarios, or tasks) that reflect real-world usage.

  • Include edge cases and ambiguous queries to test robustness.
  • Use both zero-shot and few-shot variations if relevant.

3. Run Prompt Trials

Feed your prompt and test cases into the LLM:

  • Use different model versions if possible (e.g., GPT-4o, Gemini).
  • Collect outputs for each test case.

4. Evaluate Outputs

Manual Review:

  • Check each output against your success criteria.
  • Note any errors, hallucinations, bias, or off-topic responses.
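
Steps 2-4 can be wired together in a small test harness like the sketch below; the call_llm stub, the prompt template, and the automatic checks are illustrative assumptions, and manual review is still needed for accuracy and bias.

```python
# A minimal sketch of a prompt test harness covering steps 2-4; call_llm, the
# template, and the automatic checks are illustrative assumptions.
def call_llm(prompt: str) -> str:
    return f"<model response to: {prompt[:40]}...>"  # replace with a real API call

PROMPT_TEMPLATE = "Summarize the following text in 3 bullet points:\n{text}"

test_cases = [
    "A short news article about renewable energy ...",
    "An ambiguous, half-finished sentence",  # edge case
]

def meets_criteria(output: str) -> dict:
    # Cheap automatic checks; accuracy, bias, and hallucinations still need manual review.
    return {
        "has_bullets": output.count("-") >= 3 or output.count("•") >= 3,
        "not_too_long": len(output.split()) <= 120,
    }

for case in test_cases:
    output = call_llm(PROMPT_TEMPLATE.format(text=case))
    print(case[:30], meets_criteria(output))
```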

5. Iterate and Refine

  • Identify Patterns: Are there recurring issues (e.g., too much jargon, missing citations)?
  • Adjust Prompt Elements: Refine COSTAR elements (context, objective, style, etc.), add or modify examples (shots), clarify constraints, or add chain-of-thought instructions.
  • Retest: Run the revised prompt on your test set and compare results.

6. Monitor in Production

  • Track user feedback and real-world outputs.
  • Set up monitoring for ethical and safety guardrails (flagging bias, hallucinations, or inappropriate content).
  • Continue to refine prompts based on ongoing data.

7. Document and Version

  • Keep records of prompt versions, test results, and changes.
  • Document what works and what doesn’t for future reference and team sharing.

Case Study: Writing an Effective Prompt with COSTAR for an AI Research Paper Summary

Step 1: Understand the Task and LLM Limitations

A strategy consultant at a tech firm needs to brief senior executives on a new AI research paper. The executives are not technical experts—they need a clear, concise summary of the paper’s business relevance, free from jargon and bias.

Step 2: Apply the COSTAR Framework

Let's break our prompt down into COSTAR elements and include role playing and ethical and safety guardrails.

| COSTAR Element | Example in Practice |
| --- | --- |
| Context | You are an expert technology analyst skilled at translating technical research into business insights. |
| Objective | Summarize the attached AI research paper, focusing on key findings and their potential impact on business. |
| Style | Clear, jargon-free, and suitable for non-technical readers. |
| Tone | Professional and insightful. |
| Audience | Senior business executives with limited technical background. |
| Response | Provide a summary in 4-5 bullet points. Highlight business relevance. Include citations for key claims. Avoid speculation, technical jargon, or sharing confidential data. |

Step 3: Add Prompt Shots (Examples)

Step 4: Final Prompt
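
Assembled from the COSTAR table above, the final prompt could read roughly as follows; the exact wording below is one possible assembly, not a canonical version.

```python
# One possible assembly of the case-study prompt from the COSTAR elements above;
# the exact wording is illustrative.
final_prompt = (
    "You are an expert technology analyst skilled at translating technical research "
    "into business insights.\n\n"
    "Summarize the attached AI research paper, focusing on key findings and their "
    "potential impact on business.\n\n"
    "Write clearly and without jargon, in a professional and insightful tone, for "
    "senior business executives with limited technical background.\n\n"
    "Provide the summary as 4-5 bullet points, highlight business relevance, include "
    "citations for key claims, and avoid speculation, technical jargon, or sharing "
    "confidential data."
)
print(final_prompt)
```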

Step 5: Test and Evaluate

The consultant systematically tests the prompt by running it on a diverse set of technical papers and reviewing the AI-generated summaries against clear success criteria: clarity, relevance, accuracy, format, and ethical compliance. They check if outputs are jargon-free, business-focused, and include citations for key claims, while also watching for hallucinations, bias, or speculative statements. Any recurring issues—such as overly technical language or missing citations—lead to prompt refinements, like clarifying constraints or adding more examples.

FAQs

How to handle unexpected or irrelevant AI responses?

To handle unexpected or irrelevant AI responses, best practice combines prompt refinement, response validation, and iterative feedback. Here is a summary of the main strategies:

| Strategy | What It Does | How It Helps | Explanation |
| --- | --- | --- | --- |
| Refine and Clarify Prompts | Improve prompt specificity and clarity | Reduces ambiguity and misinterpretation | Clear, detailed prompts guide the AI better, minimizing irrelevant or off-topic answers. |
| Use Follow-up or Correction Prompts | Ask the model to reconsider or correct its answer | Fixes mistakes or steers the conversation back | If the response is off, a follow-up prompt can request clarification or a revised answer. |
| Leverage Human-in-the-Loop | Have humans review and adjust AI outputs | Ensures quality and appropriateness | Human oversight is critical for catching errors the model misses and improving training data. |
| Use AI Incident Response Frameworks | Monitor, detect, and respond to model failures | Rapidly identify and mitigate issues | Similar to cybersecurity incident response, this involves continuous monitoring and automated remediation. |
| Iterative Testing and Feedback | Continuously test prompts and outputs | Improves prompt design and model behavior over time | Regular evaluation helps identify patterns of failure and refine prompts accordingly. |

Can prompt engineering be used for complex or technical topics?

Yes, by applying prompt engineering best practices—such as providing clear context, examples, stepwise instructions, and role assignments—you can effectively harness LLMs to handle complex or technical topics with improved accuracy, clarity, and relevance.

Will prompt engineering become obsolete?

Prompt engineering will not become obsolete anytime soon; rather, it is evolving and becoming more sophisticated. Here are some key points to consider:

  • Automation and Optimization: In 2025, prompt refinement is increasingly automated through tools, which iteratively improve prompts based on model performance, making prompt engineering more efficient.
  • Security and Safety: New security-focused prompting techniques embed safety guardrails directly into prompts, reducing vulnerabilities and ensuring ethical AI use. This highlights that prompt engineering is critical for safe AI deployment.
  • Increasing Complexity: Modern AI applications require prompt engineers to design robust, scalable, and task-specific prompt systems, often involving chaining, role prompting, and iterative feedback loops.
  • Tooling and Frameworks: Emerging tools like LangChain, CrewAI, and AutoGen support advanced prompt engineering workflows, indicating the field is becoming more technical and integrated into AI development pipelines rather than becoming obsolete.

Is prompt engineering still relevant, or is it overrated?

Prompt engineering is not overrated; it is a foundational and evolving discipline essential for interacting effectively with AI in 2025. It empowers users to guide AI models toward accurate, relevant, and ethical outputs, making it a must-have skill across industries.

Conclusion

Prompt engineering is an essential discipline for anyone working with large language models, enabling users to unlock the full power and reliability of AI systems. By applying structured frameworks like COSTAR, incorporating role playing, ethical guardrails, clear constraints, and illustrative examples, you can dramatically improve the clarity, accuracy, and safety of AI outputs—even for complex or technical tasks.
