What AI Is Actually Doing in Document Review
When people say AI is used for document review, what they often imagine is a machine reading a contract and instantly understanding its meaning — like a legal assistant with infinite patience. That’s… not what’s happening.
The reality is closer to this: a language model (like ChatGPT or Claude) is prompted with a massive chunk of legal text. The prompt instructs the model to summarize key sections (e.g., the termination clause), identify obligations, or flag specific risk language. AI isn’t interpreting the law; it is pattern-matching based on mountains of training data. Huge difference.
Let’s take a lease agreement as an example. You paste the lease into a prompt window and ask: “Does this lease require the tenant to maintain HVAC systems?” Sometimes, it returns a clear yes with a quote from the document. Other times, it gives you a foggy, noncommittal answer like “It appears the tenant is responsible… but consult an attorney.” Not super confidence-inspiring.
Main issue: Long contracts often exceed the input token limits of typical AI models. When that happens, the model either skips sections or, worse, truncates mid-clause (I’ve seen this with a 40-page supplier agreement: the AI cut off right in the definition of “Force Majeure” and started imagining clauses that weren’t there 🤯).
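One way to catch truncation before it happens is to count tokens up front. A minimal sketch using the tiktoken library; the 8,000-token limit here is illustrative, so check your model's actual context window:

```python
# Count tokens before sending a contract to the model (sketch).
# The 8,000-token limit is illustrative; check your model's real window.
import tiktoken

def fits_in_context(text: str, model: str = "gpt-4", limit: int = 8000) -> bool:
    enc = tiktoken.encoding_for_model(model)
    n_tokens = len(enc.encode(text))
    print(f"Contract is {n_tokens} tokens")
    return n_tokens <= limit
```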
Some tools solve this with a pre-parsing approach: they chunk documents into readable segments and summarize each chunk before merging the insights. Others use vector embeddings: chunks are converted into numerical representations so the tool can search by meaning rather than over the raw text.
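The chunk-then-merge approach is simple enough to sketch in a few lines of Python. Here ask_llm is a hypothetical stand-in for whatever API call you're actually using, and the fixed-size chunking is a deliberate simplification:

```python
# Chunk-then-summarize sketch. ask_llm() is a hypothetical stand-in
# for your actual model call (OpenAI, Anthropic, etc.).

def ask_llm(prompt: str) -> str:
    raise NotImplementedError("wire this to your model API")

def chunk_text(text: str, size: int = 3000) -> list[str]:
    # Naive fixed-size chunking; real tools split on clause boundaries.
    return [text[i:i + size] for i in range(0, len(text), size)]

def review_contract(text: str) -> str:
    summaries = [
        ask_llm(f"Summarize the obligations in this contract excerpt:\n{chunk}")
        for chunk in chunk_text(text)
    ]
    # Merge the per-chunk summaries into one combined overview.
    return ask_llm("Combine these partial summaries:\n" + "\n".join(summaries))
```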
| AI Capability | How It Behaves in Legal Review |
| --- | --- |
| Clause Recognition | Works well if exact language is standard. Poor with creative or customized wording. |
| Obligation Extraction | Often accurate, but may miss conditional dependencies (e.g., “unless X happens”). |
| Summarization | Great for overviews, but frequently omits detail needed for compliance/legal risk. |
To wrap up, AI in document review is more like a context-aware scanner than a silver-bullet lawyer — useful, but not yet a final authority.
Prompt Engineering That Actually Works for Contracts
Not all prompts are created equal. I’ve tested plain questions like “Summarize this contract” and more granular ones like “Extract only lease obligations of the lessee, ignoring background.” The second one gets drastically better results.
The best prompt format I’ve found (after dozens of tweaks) is:

> “You are a Legal Analyst trained in identifying actionable obligations in business contracts. Given the document below, extract each obligation with party name and clause number. Do NOT summarize. Focus on lessee’s responsibilities only.”
That structure usually triggers the model to behave more like a structured parser and less like a storyteller. Avoid open-ended prompts. Do not say “Is this contract safe?” — it will hallucinate an opinion and waffle around liability terms. Ask very specific questions like “Does Clause 11 require prior written notice for termination?”
Things that almost always improve prompt quality (combined into a sketch after this list):
- Use role-based instructions (“You are a contract analyst…”)
- List output format (“Return in table with three columns: Clause, Party, Obligation”)
- Tell it what to ignore (reduces noise hallucination)
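Put together, those three rules turn into a reusable template. A minimal sketch; the wording and placeholder names are my own, not a canonical format:

```python
# Reusable prompt template combining role, output format, and ignore rules.
# The exact wording and placeholder names are illustrative assumptions.
PROMPT_TEMPLATE = """You are a contract analyst reviewing a {contract_type}.
Extract each obligation of the {party} into a table with three columns:
Clause, Party, Obligation.
Ignore recitals, background sections, and boilerplate notices.

Contract text:
{contract_text}
"""

prompt = PROMPT_TEMPLATE.format(
    contract_type="commercial lease",
    party="lessee",
    contract_text="...",  # paste or load the contract text here
)
```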
Interesting discovery: When I used the word “compliance” in prompts, especially in enterprise vendor agreements, the AI would almost always veer into analyzing GDPR or data security. Removing that word brought it back to the actual clauses.
Ultimately, contract-focused prompts have to be treated like configuration files — tuned carefully, not written casually.
What Happens When You Feed Complex Contracts
This is one of the biggest traps: long, highly negotiated contracts (like SaaS master service agreements or partnership deals) don’t just break prompt windows. They break comprehension.
Here’s what happened in my test with a SaaS MSA that had over 25 pages:
- Pasting the whole thing into GPT-4 as one text block exceeded token limits — got clipped at midpoint.
- Chunking it manually worked, but context broke: the model didn’t connect Exhibit B (full of obligations) with Definitions on page 2.
- I tried embedding tools like LangChain to split the document and maintain a vector index (roughly as sketched below). Context tracking improved, but responses were significantly slower, and it required building a basic interface in Streamlit or similar.
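The split-and-index step looked approximately like this. The imports reflect recent LangChain packaging and have moved around between versions; the chunk sizes and the example query are assumptions:

```python
# Split a contract and build a searchable vector index (sketch).
# Import paths reflect recent LangChain packaging and vary by version;
# chunk sizes and the query are illustrative.
from langchain_text_splitters import RecursiveCharacterTextSplitter
from langchain_community.vectorstores import FAISS
from langchain_openai import OpenAIEmbeddings

contract_text = open("msa.txt").read()

splitter = RecursiveCharacterTextSplitter(chunk_size=1500, chunk_overlap=200)
chunks = splitter.split_text(contract_text)

# Embed each chunk and index it for similarity search.
index = FAISS.from_texts(chunks, OpenAIEmbeddings())

# Retrieve the chunks most relevant to a question before prompting.
hits = index.similarity_search("termination obligations", k=4)
```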
The best result came from hybrid prompting: break the contract into logical sections manually (Definitions, Payment Terms, Termination), run prompts on each, and save the responses into a flat doc. At the end, run a meta-prompt: “Summarize cross-section obligations and conflicts.” That loop is sketched below.
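Again, ask_llm is a hypothetical stand-in for your API call, and the section names come from the manual split:

```python
# Hybrid prompting loop: per-section prompts, then one meta-prompt.
# ask_llm() is a hypothetical stand-in for your model API call.

def ask_llm(prompt: str) -> str:
    raise NotImplementedError("wire this to your model API")

sections = {
    "Definitions": "...",     # paste each manually split section here
    "Payment Terms": "...",
    "Termination": "...",
}

notes = {
    name: ask_llm(f"Extract all obligations from this '{name}' section:\n{text}")
    for name, text in sections.items()
}

# Meta-prompt over the accumulated per-section notes.
combined = "\n\n".join(f"## {name}\n{note}" for name, note in notes.items())
summary = ask_llm("Summarize cross-section obligations and conflicts:\n" + combined)
```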
To sum up, large contract handling requires setup time — if you skip setup, the AI just politely misleads you.
Comparing AI Tools for Legal Doc Review
I ran the same set of prompts on four different assistant platforms using the exact same vendor agreement. Here’s what actually came out:
| Tool | Strength | Weakness | Output Style |
| --- | --- | --- | --- |
| ChatGPT-4 | Consistent structure, understands roles clearly | Sometimes truncates long contracts | Formal summary with bullet points |
| Claude | Accepts longer document input | Misses references to footnotes or exhibits | Natural language, some paraphrasing |
| Harvey | Domain-tuned for legal use (uses models trained on law data) | Not public access; enterprise only | Redlined outputs based on risk flags |
| Juro AI | Contracts-first UX; previews key terms visually | Limited support for complex custom contracts | Structured term pairs and summaries |
Overall, raw AI APIs like GPT work well if you manually prepare doc chunks and prompts. Specialized tools like Harvey or Juro skip that step, but with limited flexibility.
To wrap this section, the best choice comes down to your workload volume and how much prep you’re willing to do manually.
Setting Up a Stable AI Prompting Environment
If you’re using AI for legal review regularly, doing it through a basic chatbot window is going to fail you eventually. Trust me.
Instead, I’ve moved to a local notebook + API sync flow. Here’s the setup (a condensed sketch follows the list):
- Extract the contract text with a Python library like pdfplumber as the preprocessing step
- Use LangChain or LlamaIndex to chunk the text and build embeddings
- Store prompt templates in a YAML or JSON config
- Trigger GPT-4 or Claude via API call with the retrieved chunks, not the whole text
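Condensed into code, the flow looks roughly like this. pdfplumber, PyYAML, and the OpenAI client are real libraries; the file names, the prompts.yaml key, and the fixed 3,000-character chunking are my own illustrative assumptions:

```python
# Condensed pipeline: extract text, load a versioned prompt template,
# and call the model API per chunk. File names and YAML keys are
# illustrative assumptions.
import pdfplumber
import yaml
from openai import OpenAI

# 1. Preprocess: pull raw text out of the contract PDF.
with pdfplumber.open("vendor_agreement.pdf") as pdf:
    text = "\n".join(page.extract_text() or "" for page in pdf.pages)

# 2. Load a versioned prompt template from config.
config = yaml.safe_load(open("prompts.yaml"))
template = config["obligation_extraction"]  # assumed key name

# 3. Call the API one chunk at a time, logging each output.
client = OpenAI()
chunks = (text[j:j + 3000] for j in range(0, len(text), 3000))
for i, chunk in enumerate(chunks):
    response = client.chat.completions.create(
        model="gpt-4",
        messages=[{"role": "user", "content": template.format(chunk=chunk)}],
    )
    print(f"--- chunk {i} ---\n{response.choices[0].message.content}")
```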
Benefits:
- No manual re-pasting errors
- You get to log outputs, catch errors, re-run chunks
- You can version your prompts as you discover better phrasing
Setup time: about an afternoon. But after that, it saves hours per contract.
Situations where this helped me the most: high-volume NDAs where only clause triggers mattered (“Is there a unilateral termination?”). Also in vendor reviews where we had to extract warranties across 30 contracts in bulk.
The bottom line is, once you’re using AI for legal doc review routinely, a structured environment prevents hallucinations and dropped obligations.
When AI Legal Review Goes Wrong: Real Errors
Here’s a fun one (painful at the time): I was reviewing a reseller agreement and used ChatGPT to extract all revenue share clauses.
The contract had this line buried in the appendix: “Additional incentives are calculated net of applicable taxes and fees.” GPT completely missed that. It interpreted the previous section (“Reseller receives up to 30% commission”) as the entire answer. Missed post-deductions entirely 😤.
Another time, a clause defined “termination for cause” uniquely. GPT ignored that custom definition and applied a generic interpretation — dangerously wrong during dispute resolution planning.
Guidance to prevent this:
- Always check for Appendix or Footnote references — AI often skips these unless explicitly asked.
- Pre-define all uncommon terms before prompting (e.g., first ask “How does this contract define ‘Service Tail’?” and carry that definition into later prompts; see the sketch after this list)
- Use version control to test prompt responses across slightly different contracts
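On the second point, here is a minimal sketch of pre-defining a custom term as a two-step prompt; the wording is mine, and ask_llm is again a hypothetical stand-in:

```python
# Two-step prompting: pin down the contract's own definition first,
# then reuse it verbatim in the substantive question. ask_llm() is a
# hypothetical stand-in for your model API call.

def ask_llm(prompt: str) -> str:
    raise NotImplementedError("wire this to your model API")

contract = open("reseller_agreement.txt").read()

# Step 1: force the model to quote the custom definition.
definition = ask_llm(
    "Quote, verbatim, how this contract defines 'termination for cause':\n"
    + contract
)

# Step 2: carry that definition into the real question.
answer = ask_llm(
    f"Using ONLY this definition:\n{definition}\n\n"
    "Does Clause 11 require prior written notice for termination?\n"
    + contract
)
```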
Errors like these also show up when the formatting breaks: weird bullet points, tables, or track changes. If your contract came in DOCX form, convert it to clean Markdown or plain text before prompting.
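For the DOCX-to-Markdown step, the pypandoc wrapper (which requires pandoc to be installed) makes it a one-liner; the file name is illustrative:

```python
# Convert a DOCX contract to clean Markdown before prompting (sketch).
# Requires pandoc plus the pypandoc wrapper; the file name is illustrative.
import pypandoc

markdown = pypandoc.convert_file("reseller_agreement.docx", "markdown")
open("reseller_agreement.md", "w").write(markdown)
```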
As a final point, assume the AI will miss nuance unless proven otherwise — always double-check any automated outcome involving risk or payment duties.