Building a Robust PII Sanitization Chain with LangChain
- Neha Singh
- Dec 26, 2025
- 5 min read
Large Language Models (LLMs) are incredible, but asking them to do everything in one go, especially for critical tasks like PII (Personally Identifiable Information) sanitization, is often a recipe for disaster. We've all seen prompts like "Extract PII and summarize." While convenient, this approach often sacrifices accuracy for simplicity, leaving sensitive data vulnerable.
In this post, we'll dive into how to build a production-ready PII Sanitization Chain using LangChain Expression Language (LCEL). We'll leverage multiple prompt engineering techniques – Chain-of-Thought (CoT), Few-Shot Prompting, and Structured Output – to create a robust, auditable system, complete with a confidence assessment to flag uncertain results for human review.
The Challenge: Why One Prompt Isn't Enough
Imagine a customer support transcript: "Hi, this is Sarah Jenkins. I'm calling about my account #88291. My husband, Mark, and I are moving from 123 Maple St to a new place in Austin next week. You can reach me at sarah.j@email.com."
A single prompt asking an LLM to "remove PII" might miss "Mark" (contextual PII) or "Austin" (a common city name that becomes PII when tied to a specific individual's movement). Why? Because LLMs, when overloaded with multiple instructions, can deprioritize certain sub-tasks or get "creative" in their interpretations.
For PII sanitization, a mistake isn't just a bad answer; it's a security vulnerability. We need precision, auditability, and a safety net.
Our Three-Stage PII Sanitization Chain
We'll break down the complex task into three specialized, sequential steps, mimicking a human workflow:
The Analyst: Identifies all potential PII, using CoT reasoning for clarity.
The Engineer: Replaces the identified PII with generic placeholders.
The Auditor: Performs a final check, assesses confidence with reasoning, and flags potential leaks.
Let's build this with LangChain!
Step 1: The Analyst (Identification via Chain-of-Thought)
The first step is pure identification. We don't want the LLM to think about replacing yet, only about finding. By using Chain-of-Thought (CoT), we force the model to explicitly list its reasoning, making its identification process transparent and less prone to overlooking subtle PII. We also enforce Structured Output using Pydantic for easy parsing by the next step.
from langchain_openai import ChatOpenAI
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.output_parsers import StrOutputParser, JsonOutputParser
from langchain_core.runnables import RunnablePassthrough
from pydantic import BaseModel, Field

llm = ChatOpenAI(model="gpt-4o-mini", max_tokens=1000, temperature=0.2)

class PIIList(BaseModel):
    pii_found: list[str] = Field(description="A list of all identified PII items.")

analyst_prompt = ChatPromptTemplate.from_template(
    """You are a PII detection specialist. Your task is to analyze the following text and identify all sensitive information, including names, emails, phone numbers, account IDs, and specific locations.

Instructions:
1. Think step-by-step to identify any sensitive information.
2. For each identified item, briefly explain why it is sensitive.
3. Finish with a JSON object containing a single key "pii_found" whose value is a list of only the PII items themselves.

Text to Analyze:
{transcript}"""
)
# ChatPromptTemplate automatically treats anything in curly braces ({transcript}) as an input variable.

analyst_chain = analyst_prompt | llm | JsonOutputParser(pydantic_object=PIIList)
# To run and verify this in isolation, you can use:
# pii_data = analyst_chain.invoke({"transcript": raw_text})

Why CoT? By explicitly asking the model to "think step-by-step", we guide its reasoning process, leading to more thorough identification. The JSON output ensures the next step gets a clean, machine-readable list.
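As a side note, if you want to avoid describing the JSON shape by hand in the prompt (and keep it in sync with the Pydantic model), LangChain's PydanticOutputParser can generate the format instructions for you. Here's a minimal sketch that reuses the llm and PIIList defined above; the _v2 names are just illustrative alternatives, not part of the chain we build in this post:

from langchain_core.prompts import ChatPromptTemplate
from langchain_core.output_parsers import PydanticOutputParser

# Let the parser describe the expected JSON, so prompt and schema can't drift apart.
pii_parser = PydanticOutputParser(pydantic_object=PIIList)

analyst_prompt_v2 = ChatPromptTemplate.from_template(
    "You are a PII detection specialist. Think step-by-step, explain why each "
    "item is sensitive, then return the PII items.\n"
    "{format_instructions}\n\n"
    "Text to Analyze:\n{transcript}"
).partial(format_instructions=pii_parser.get_format_instructions())

# This variant returns a PIIList instance instead of a raw dict.
analyst_chain_v2 = analyst_prompt_v2 | llm | pii_parser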
Step 2: The Engineer (Sanitization with Few-Shot)
With the PII identified, the "Engineer" simply performs the replacement. This step is a Few-Shot pattern-matching task. We show it what a replacement should look like, and it applies that pattern.
eng_prompt = ChatPromptTemplate.from_template(
    """Replace the PII items in the transcript with placeholders like [NAME_1], [ID_1], [LOCATION_1], [EMAIL_1], etc.
Do not output anything else apart from the sanitized text.

Example:
PII Items: ['Sarah Jenkins', 'Mark', '88291', '123 Maple St', 'Austin', 'sarah.j@email.com']
Transcript: "Hi, this is Sarah Jenkins. I'm calling about my account #88291. My husband, Mark, and I are moving from 123 Maple St to a new place in Austin next week. You can reach me at sarah.j@email.com."
Sanitized Text: "Hi, this is [NAME_1]. I'm calling about my account [ID_1]. My husband, [NAME_2], and I are moving from [ADDRESS_1] to a new place in [LOCATION_1] next week. You can reach me at [EMAIL_1]."

Now sanitize this input:
PII Items: {pii_items}
Transcript: {transcript}

Sanitized Text:"""
)

engineer_chain = eng_prompt | llm | StrOutputParser()

# To run this in isolation (or as part of the full chain later),
# pass in the pii_items produced by the previous step:
# sanitized_text_output = engineer_chain.invoke({"transcript": raw_text, "pii_items": pii_data})

Why Few-Shot? For repetitive, pattern-based tasks like replacement, providing clear examples (few-shot) is more efficient than asking the LLM to "reason" through each one. I have added only one example to keep this post short; you can add more (see the sketch below for one way to manage a larger example set).
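If you do add more examples, LangChain's FewShotChatMessagePromptTemplate is one way to keep them manageable instead of growing a single template string. A rough sketch, reusing the llm from earlier; the examples list and the _v2 names are placeholders you would replace with your own data:

from langchain_core.prompts import ChatPromptTemplate, FewShotChatMessagePromptTemplate
from langchain_core.output_parsers import StrOutputParser

# Each example is a dict whose keys match the example_prompt variables.
examples = [
    {
        "pii_items": "['Sarah Jenkins', 'Mark', '88291', '123 Maple St', 'Austin', 'sarah.j@email.com']",
        "transcript": "Hi, this is Sarah Jenkins. I'm calling about my account #88291...",
        "sanitized": "Hi, this is [NAME_1]. I'm calling about my account [ID_1]...",
    },
]

example_prompt = ChatPromptTemplate.from_messages([
    ("human", "PII Items: {pii_items}\nTranscript: {transcript}"),
    ("ai", "{sanitized}"),
])

few_shot = FewShotChatMessagePromptTemplate(
    example_prompt=example_prompt,
    examples=examples,
)

eng_prompt_v2 = ChatPromptTemplate.from_messages([
    ("system", "Replace the PII items in the transcript with placeholders like "
               "[NAME_1], [ID_1], [LOCATION_1], [EMAIL_1]. Output only the sanitized text."),
    few_shot,
    ("human", "PII Items: {pii_items}\nTranscript: {transcript}"),
])

engineer_chain_v2 = eng_prompt_v2 | llm | StrOutputParser()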
Step 3: The Auditor (Verification with Confidence Assessment)
This is our crucial safety net. The "Auditor" critically reviews the sanitized text and, most importantly, provides a confidence score with explanation. This score allows us to automatically flag results for human review if the AI is not confident.
class AuditResult(BaseModel):
    is_clean: bool = Field(description="True if no PII remains, False otherwise")
    sanitized_text: str = Field(description="The final text after any extra corrections")
    confidence: float = Field(description="Score from 0.0 to 1.0 of how sure the AI is")
    reasoning: str = Field(description="Explanation of why this score was given")

auditor_prompt = ChatPromptTemplate.from_template(
    """Review this sanitized text for any missed PII (e.g., hidden dates or cities).
1. The Engineer has already redacted the text; check whether any PII is left.
2. REDACT any remaining sensitive information (names, emails, phone numbers, account IDs, or specific locations).
3. Set is_clean to true if the final text is clean.
4. Output the final sanitized text.
5. Assign a confidence score (0.0 to 1.0) for how likely it is that this text is now 100% PII free.
6. Provide a brief explanation for the given confidence score.

Text: {sanitized_text}

Respond ONLY in a structured JSON format with the keys is_clean, sanitized_text, confidence and reasoning."""
)

auditor_chain = auditor_prompt | llm | JsonOutputParser(pydantic_object=AuditResult)
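One caveat worth knowing: JsonOutputParser returns a plain dict and does not strictly enforce the schema. If you'd rather bind the Pydantic schema to the model call itself, ChatOpenAI exposes with_structured_output. A minimal alternative sketch that reuses auditor_prompt, llm, and AuditResult from above (the structured_auditor_chain name is just illustrative):

# Alternative: bind the schema directly to the model call.
# This variant returns AuditResult instances instead of raw dicts.
structured_auditor_chain = auditor_prompt | llm.with_structured_output(AuditResult)

# Usage (assuming `sanitized_text` holds the Engineer's output):
# audit = structured_auditor_chain.invoke({"sanitized_text": sanitized_text})
# print(audit.is_clean, audit.confidence, audit.reasoning)

The rest of this post sticks with JsonOutputParser, so downstream code uses dictionary access.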
Bringing It All Together: The Full LCEL Chain
Now, let's connect these three powerful steps into a single, cohesive LangChain Expression Language (LCEL) chain. RunnablePassthrough.assign allows us to pass intermediate results from one step to the next without losing the original input.
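If assign is new to you, here's a tiny self-contained illustration of what it does (toy lambdas only, no LLM involved):

from langchain_core.runnables import RunnableLambda, RunnablePassthrough

# assign keeps the existing keys of the input dict and adds a new one.
toy = RunnablePassthrough.assign(shout=RunnableLambda(lambda d: d["text"].upper()))
print(toy.invoke({"text": "hello"}))
# -> {'text': 'hello', 'shout': 'HELLO'}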
# Option 1:
# final_chain = (
#     RunnablePassthrough.assign(pii_items=analyst_chain)
#     | RunnablePassthrough.assign(sanitized_text=engineer_chain)
#     | auditor_chain
# )

# Option 2 (equivalent):
from operator import itemgetter

final_chain = (
    {
        "pii_items": analyst_chain,
        # itemgetter pulls out the raw transcript string; a bare RunnablePassthrough()
        # here would forward the whole input dict instead.
        "transcript": itemgetter("transcript"),
    }
    | RunnablePassthrough.assign(sanitized_text=engineer_chain)
    | auditor_chain
)
from typing import Any, Dict
THRESHOLD = 0.8

def run_agent(raw_text: str) -> Dict[str, Any]:
    try:
        return final_chain.invoke({"transcript": raw_text})
    except Exception as e:
        # Fail closed: mark the result as not clean with zero confidence.
        return {"is_clean": False, "confidence": 0.0,
                "reasoning": f"Error occurred: {e}", "sanitized_text": ""}

if __name__ == "__main__":
    user_input = input("Enter customer data to be sanitized for PII: ")
    print("Running agent .....")
    result = run_agent(raw_text=user_input)
    if result["confidence"] >= THRESHOLD:
        print(f"✅ High Confidence: Final PII-free customer data is ready for storage:\n\n{result['sanitized_text']}")
    else:
        print(f"⚠️ LOW CONFIDENCE ({result['confidence']}): Needs Human Review.")
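If you'd rather keep the routing logic inside the chain instead of in run_agent, RunnableBranch can express the same threshold gate. A sketch under the same assumptions, reusing final_chain and THRESHOLD; route_on_confidence and gated_chain are illustrative names:

from langchain_core.runnables import RunnableBranch, RunnableLambda

# Approve high-confidence audits, flag everything else for human review.
route_on_confidence = RunnableBranch(
    (
        lambda audit: audit.get("confidence", 0.0) >= THRESHOLD,
        RunnableLambda(lambda audit: {"status": "approved",
                                      "text": audit["sanitized_text"]}),
    ),
    # Default branch: anything below the threshold needs a human.
    RunnableLambda(lambda audit: {"status": "needs_human_review",
                                  "reason": audit["reasoning"]}),
)

gated_chain = final_chain | route_on_confidence
# gated_chain.invoke({"transcript": raw_text}) -> {"status": ..., ...}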
Test Cases and Output:
raw_text = "Hi, I'm John Doe. My account is 12345. I live in Seattle."
result = final_chain.invoke({"transcript": raw_text})

if result["confidence"] >= THRESHOLD:
    print(f"✅ High Confidence: Final PII-free customer data is ready for storage:\n\n{result['sanitized_text']}")
else:
    print(f"⚠️ LOW CONFIDENCE ({result['confidence']}): Needs Human Review.")
user_input = """Hi, this is Sarah Jenkins. I'm calling about my account #88291. My husband, Mark, and I are moving from 123 Maple St to a new place in Austin next week. You can reach me at sarah.j@email.com."""
result1 = final_chain.invoke({"transcript": user_input})

if result1["confidence"] >= THRESHOLD:
    print(f"✅ High Confidence: Final PII-free customer data is ready for storage:\n\n{result1['sanitized_text']}")
else:
    print(f"⚠️ LOW CONFIDENCE ({result1['confidence']}): Needs Human Review.")
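When you have more than a couple of transcripts, LCEL's batch method runs the full chain over all of them with built-in concurrency. A small sketch; the transcripts list is made-up sample data:

transcripts = [
    "Hi, I'm John Doe. My account is 12345. I live in Seattle.",
    "Hello, this is Priya. Call me back on 555-0142, please.",
]

# batch() invokes the chain for every input and returns the results in order.
results = final_chain.batch([{"transcript": t} for t in transcripts])

for t, audit in zip(transcripts, results):
    status = "OK" if audit["confidence"] >= THRESHOLD else "REVIEW"
    print(f"[{status}] {audit['sanitized_text']}")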
Conclusion
By breaking down a complex problem into smaller, specialized tasks and applying targeted prompt engineering techniques, we've built a more robust and reliable PII sanitization system.
Chain-of-Thought improves detection accuracy.
Few-Shot Prompting ensures consistent replacement.
Structured Output facilitates seamless data flow between steps.
Confidence Assessment provides a crucial safety net for human intervention.
This approach demonstrates that effective LLM applications often lie not in a single "magic prompt," but in intelligently chained sequences that leverage the strengths of each model interaction. The next time you face a complex LLM task, consider how you can decompose it into a series of smaller, auditable steps!

