Adversarial Test Report: Legal Citation Extraction System
Date: 2026-06-25
Tester: Phase3-Adversarial
Methodology: 5 adversarial test cases designed to expose failure modes in the contextual-distillation citation extraction pipeline. Each case targets a specific weakness in how the LLM parses legal text, maps it to CountryProfile level hierarchies, and produces reference_id values that must be verbatim substrings of the chunk.
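The constraint that all five cases probe fits in a one-line check. The sketch below is a minimal illustration that assumes a plain Python substring test. The `Extraction` record and the `is_verbatim` function are hypothetical names, not the pipeline's actual code. The fields mirror the columns of the Correct Extraction tables in this report.

```python
from dataclasses import dataclass
from typing import Literal


# Hypothetical record mirroring the relation / reference_id / section_name
# columns used in this report; the pipeline's real schema may differ.
@dataclass(frozen=True)
class Extraction:
    relation: Literal["direct", "indirect"]
    reference_id: str
    section_name: str


def is_verbatim(extraction: Extraction, chunk: str) -> bool:
    """Return True if reference_id is non-empty and occurs in the chunk exactly as written."""
    return bool(extraction.reference_id) and extraction.reference_id in chunk


chunk = "Article L1234-1\nLorsque le licenciement n'est pas motivé par une faute grave, le salarié a droit :"
ok = Extraction("direct", "Article L1234-1", "Préavis en cas de licenciement")
rewritten = Extraction("direct", "Art. L1234-1", "Préavis en cas de licenciement")
print(is_verbatim(ok, chunk))         # True
print(is_verbatim(rewritten, chunk))  # False
```

The check only tests containment. It cannot tell which occurrence a short id such as 2° refers to, so several of the failures below pass validation while still being wrong.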
Table of Contents
- Ambiguous Numbering
- Cross-Reference Confusion
- Multi-Document Chunk
- Amendment Text
- Code-Switching
- Robustness Rating
- Systemic Findings
1. Ambiguous Numbering
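Scenario: French statutory text in which the number 2 plays several roles: a point number (2°), a back-reference to that point in running text, a number spelled out in prose (deux ans), and a digit inside the article number (L1234-1).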
Query (FR)
Quelles sont les conditions d'ancienneté pour le préavis de licenciement ?
Chunk (FR)
Article L1234-1
Lorsque le licenciement n'est pas motivé par une faute grave, le salarié a droit :
1° S'il justifie chez le même employeur d'une ancienneté de services continus inférieure à six mois, à un préavis dont la durée est déterminée par la loi, la convention ou l'accord collectif de travail ou, à défaut, par les usages pratiqués dans la localité et la profession ;
2° S'il justifie chez le même employeur d'une ancienneté de services continus comprise entre six mois et moins de deux ans, à un préavis d'un mois ;
3° S'il justifie chez le même employeur d'une ancienneté de services continus d'au moins deux ans, à un préavis de deux mois.
Toutefois, les dispositions des 2° et 3° ne sont applicables que si la loi, la convention ou l'accord collectif de travail, le contrat de travail ou les usages ne prévoient pas un préavis ou une condition d'ancienneté de services plus favorable pour le salarié.
Why This Confuses the LLM
The number "2" appears in four structurally different roles:
| Occurrence | Actual Role | Risk |
|---|---|---|
| 2° (second item of the enumeration) | Point (level: point) | None: this is the correct extraction target |
| 2° in "des 2° et 3°" (last paragraph) | Back-reference to a point, not a new point | LLM may extract it as a new citation |
| deux ans | Prose number, not a structural reference at all | LLM may hallucinate reference_id: "deux ans" |
| L1234-1 | Article number in which 2 sits inside 1234 and 1 recurs at two depths | LLM may misparse the hyphenated structure |
The back-reference des 2° et 3° in the last paragraph is the real trap. It is embedded in running text and refers back to the enumerated points above. It is not a structural subdivision, so the LLM must recognize that 2° here cites the existing point rather than introducing a new unit. The few-shot schema asks for reference_id as a verbatim substring, and 2° does appear verbatim at this position, so the LLM will be tempted to extract it again.
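One possible post-processing guard looks at where each occurrence of a marker sits. The sketch below uses a heuristic that is not taken from the pipeline: a French point marker such as 2° counts as structural only when it opens a line, and any other occurrence counts as an in-text reference. `marker_roles` is a hypothetical helper.

```python
import re

# Abridged from the Test Case 1 chunk: three enumerated points, then the
# closing paragraph that refers back to 2° and 3°.
CHUNK = (
    "1° S'il justifie chez le même employeur d'une ancienneté de services continus inférieure à six mois ;\n"
    "2° S'il justifie chez le même employeur d'une ancienneté de services continus "
    "comprise entre six mois et moins de deux ans, à un préavis d'un mois ;\n"
    "3° S'il justifie chez le même employeur d'une ancienneté de services continus "
    "d'au moins deux ans, à un préavis de deux mois.\n"
    "Toutefois, les dispositions des 2° et 3° ne sont applicables que si la loi, "
    "la convention ou l'accord collectif de travail, le contrat de travail ou les usages "
    "ne prévoient pas un préavis ou une condition d'ancienneté de services plus favorable pour le salarié."
)


def marker_roles(chunk: str, marker: str) -> list[str]:
    """Label each occurrence of `marker` as 'structural' or 'back-reference'.

    Heuristic (an assumption): an occurrence preceded only by whitespace on its
    line opens an enumerated point; anything else is a reference in running text.
    """
    roles = []
    for match in re.finditer(re.escape(marker), chunk):
        line_start = chunk.rfind("\n", 0, match.start()) + 1
        prefix = chunk[line_start:match.start()]
        roles.append("structural" if prefix.strip() == "" else "back-reference")
    return roles


print(marker_roles(CHUNK, "2°"))  # ['structural', 'back-reference']
print(marker_roles(CHUNK, "3°"))  # ['structural', 'back-reference']
```

The heuristic depends on line breaks surviving chunking. If a PDF extractor joins the enumeration into one line, every marker after the first would be misclassified as a back-reference.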
Correct Extraction
| relation | reference_id | section_name |
|---|---|---|
| direct | Article L1234-1 | Préavis en cas de licenciement |
| direct | 3° S'il justifie chez le même employeur d'une ancienneté de services continus d'au moins deux ans | Préavis de deux mois (ancienneté ≥ 2 ans) |
| indirect | 2° S'il justifie chez le même employeur d'une ancienneté de services continus comprise entre six mois et moins de deux ans | Préavis d'un mois (ancienneté 6 mois–2 ans) |
The back-reference des 2° et 3° in the last paragraph should not produce a separate extraction — it's a pointer to the already-extracted points, not a structural unit itself.
Likely LLM Failure
- Duplicate extraction: `2°` extracted twice (once from the enumeration, once from the back-reference)
- Relation confusion: the back-reference `2°` classified as `direct` when it is really just prose
- reference_id truncation: `3° S'il justifie chez le même employeur d'une ancienneté de services continus d'au moins deux ans` is very long; LLMs tend to truncate it to `3°` or `3° S'il justifie…`
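The truncation failure is dangerous because it passes validation. 3° is a verbatim substring; it simply occurs more than once. The sketch below assumes the pipeline can see all reference_ids for a chunk at once. A hypothetical `ambiguous_ids` check flags any id that occurs more than once, so that it can be re-asked or rejected.

```python
# Points 2° and 3° and the closing paragraph of the Test Case 1 chunk.
CHUNK = (
    "2° S'il justifie chez le même employeur d'une ancienneté de services continus "
    "comprise entre six mois et moins de deux ans, à un préavis d'un mois ;\n"
    "3° S'il justifie chez le même employeur d'une ancienneté de services continus "
    "d'au moins deux ans, à un préavis de deux mois.\n"
    "Toutefois, les dispositions des 2° et 3° ne sont applicables que si la loi, "
    "la convention ou l'accord collectif de travail, le contrat de travail ou les usages "
    "ne prévoient pas un préavis ou une condition d'ancienneté de services plus favorable pour le salarié."
)


def ambiguous_ids(reference_ids: list[str], chunk: str) -> list[str]:
    """Return the reference_ids that occur more than once in the chunk (non-overlapping count)."""
    return [rid for rid in reference_ids if chunk.count(rid) > 1]


FULL_3 = "3° S'il justifie chez le même employeur d'une ancienneté de services continus d'au moins deux ans"
print(ambiguous_ids([FULL_3, "3°"], CHUNK))  # ['3°']
```

The full id is unique and passes. The truncated `3°` also matches the back-reference in the last paragraph, so it is flagged.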
2. Cross-Reference Confusion
Scenario: Korean law text referencing a section in a different law, where the cited law's section number happens to collide with the current law's structure.
Query (KR)
개인정보 유출 시 통지 의무와 처벌 규정은 어떻게 되나요?
Chunk (KR)
제34조의2(유출 등의 통지) ① 개인정보처리자는 개인정보가 분실·도난·유출된 사실을 알게 된 때에는 지체 없이 해당 정보주체에게 다음 각 호의 사항을 알려야 한다.
1. 유출된 개인정보의 항목
2. 유출된 시점
3. 대응조치 및 피해구제 절차
② 개인정보처리자는 제1항에 따른 통지를 받은 날부터 30일 이내에 「정보통신망 이용촉진 및 정보보호 등에 관한 법률」 제48조의2에 따른 개인정보 침해 사실 통지와 「신용정보의 이용 및 보호에 관한 법률」 제34조에 따른 유출 통지를 하여야 한다.
Why This Confuses the LLM
This chunk contains three different laws' section numbers in close proximity:
| Reference | Law | Section | Current Law? |
|---|---|---|---|
| 제34조의2 | 개인정보 보호법 | Art. 34-2 | ✅ Yes (this chunk's article) |
| 제1항 | Same law, self-reference | Para. 1 of Art. 34-2 | ✅ Yes |
| 제48조의2 | 정보통신망법 | Art. 48-2 | ❌ No, different law |
| 제34조 | 신용정보법 | Art. 34 | ❌ No, different law |
The critical collision: 제34조의2 (current law, inserted article 34-2) and 제34조 (completely different law, article 34). They share the same structural pattern (제N조) and nearly the same number. The LLM must:
- Recognize that `제48조의2` belongs to 정보통신망법, not the current law
- Recognize that `제34조` belongs to 신용정보법, not the current law
- Not confuse `제34조의2` (current law, inserted article) with `제34조` (other law)
- Handle the self-reference `제1항`, which points to paragraph 1 of the current article
Correct Extraction
| relation | reference_id | section_name |
|---|---|---|
| direct | 제34조의2 | 유출 등의 통지 |
| direct | ① | 통지 의무 |
| direct | 1. | 유출된 개인정보의 항목 |
| indirect | ② | 통지 방법 (타 법률 준용) |
제48조의2 and 제34조 from other laws should not be extracted as structural citations of this document — they're cross-references to external statutes. However, the system has no explicit mechanism to distinguish "this law's §34" from "that law's §34" when both appear in the same chunk.
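A lightweight mechanism is possible for this chunk. Korean statutes typically name an external act in corner brackets (「 」) right before the cited article. The sketch below assumes that convention. `split_article_refs` is a hypothetical helper, and it will miss external citations that do not name the act in brackets.

```python
import re

# Assumed convention: 「Act name」 followed by an article number cites another act.
EXTERNAL_REF = re.compile(r"「([^」]+)」\s*(제\d+조(?:의\d+)?)")
# Any article number, e.g. 제34조 or the inserted article 제34조의2.
ARTICLE = re.compile(r"제\d+조(?:의\d+)?")


def split_article_refs(chunk: str) -> tuple[list[str], list[tuple[str, str]]]:
    """Split article numbers into (internal, external) lists.

    External entries are (article, act name) pairs; internal entries are article
    numbers not immediately preceded by a bracketed act name.
    """
    external_matches = list(EXTERNAL_REF.finditer(chunk))
    external_spans = {m.span(2) for m in external_matches}
    external = sorted((m.group(2), m.group(1)) for m in external_matches)
    internal = [m.group(0) for m in ARTICLE.finditer(chunk) if m.span() not in external_spans]
    return internal, external


# Abridged from the Test Case 2 chunk (the heading sentence and paragraph ②).
CHUNK = (
    "제34조의2(유출 등의 통지) ① 개인정보처리자는 개인정보가 분실·도난·유출된 사실을 알게 된 때에는 "
    "지체 없이 해당 정보주체에게 다음 각 호의 사항을 알려야 한다.\n"
    "② 개인정보처리자는 제1항에 따른 통지를 받은 날부터 30일 이내에 "
    "「정보통신망 이용촉진 및 정보보호 등에 관한 법률」 제48조의2에 따른 개인정보 침해 사실 통지와 "
    "「신용정보의 이용 및 보호에 관한 법률」 제34조에 따른 유출 통지를 하여야 한다."
)

internal, external = split_article_refs(CHUNK)
print(internal)                      # ['제34조의2']
print([art for art, _ in external])  # ['제34조', '제48조의2']
```

Internal hits such as 제34조의2 remain candidates for direct extraction. The external pairs can be dropped or tagged indirect. The self-reference 제1항 is a paragraph, not an article, so it would need its own rule.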
Likely LLM Failure
- Cross-law extraction: `제48조의2` and `제34조` extracted as `direct` references of the current document
- Number collision: `제34조` (신용정보법) confused with `제34조의2` (current law)
- Relation inflation: external cross-references classified as `direct` instead of `indirect`, or instead of being excluded
- Missing context: the LLM has no way to know which law the chunk belongs to, because the user message template (`[Question]\n{query}\n\n[Document Chunk]\n{chunk}`) does not include the source document name
3. Multi-Document Chunk
Scenario: A chunk that concatenates text from two unrelated sources (e.g., a PDF extraction artifact, or a legislative amendment that embeds another law's text). In the example below, both articles come from the Code du travail, but from unrelated parts of it.
Query (FR)
Quelles sont les obligations de l'employeur en matière de sécurité et les droits du salarié en cas de harcèlement ?
Chunk (FR)
Article L4121-1
L'employeur prend les mesures nécessaires pour assurer la sécurité et protéger la santé physique et mentale des travailleurs.
Ces mesures comprennent :
1° Des actions de prévention des risques professionnels, y compris ceux mentionnés à l'article L. 4161-1 ;
2° Des actions d'information et de formation ;
3° La mise en place d'une organisation et de moyens adaptés.
Article L1152-2
Aucun salarié ne doit subir les agissements répétés de harcèlement moral qui ont pour objet ou pour effet une dégradation de ses conditions de travail susceptible de porter atteinte à ses droits et à sa dignité, d'altérer sa santé physique ou mentale ou de compromettre son avenir professionnel.
Why This Confuses the LLM
Two articles from the same code (Code du travail) but addressing completely different topics (workplace safety vs. moral harassment) are concatenated into one chunk. The LLM faces:
- Query relevance mismatch: the query asks about both topics, so both articles seem relevant, but they come from different parts of the code with no logical connection
- No document boundary marker: apart from the bare `Article L1152-2` heading, there is no separator or metadata indicating that the text jumped to an unrelated part of the code
- Level confusion: `Article L4121-1` and `Article L1152-2` both use the `L` prefix (legislative part): same level, same prefix scheme, unrelated content
- Cross-reference within chunk: `L. 4161-1` appears inside the first article's text; is it a new extraction or a cross-reference?
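A simple chunker-side heuristic treats every line that consists only of an article heading as a boundary. The sketch below assumes that heading format: Article followed by a Code du travail number such as L4121-1. `split_by_heading` is a hypothetical helper. An inline mention like L. 4161-1 does not start its own line, so it is not mistaken for a heading.

```python
import re

# Assumed heading format: a line containing only "Article" plus a number such as L4121-1.
HEADING = re.compile(r"^Article[ \t]+([LRD]\d+(?:-\d+)*)[ \t]*$", re.MULTILINE)


def split_by_heading(chunk: str) -> dict[str, str]:
    """Map each article heading in the chunk to the text that follows it."""
    matches = list(HEADING.finditer(chunk))
    segments = {}
    for i, match in enumerate(matches):
        end = matches[i + 1].start() if i + 1 < len(matches) else len(chunk)
        segments[match.group(1)] = chunk[match.end():end].strip()
    return segments


# Abridged from the Test Case 3 chunk.
CHUNK = (
    "Article L4121-1\n"
    "L'employeur prend les mesures nécessaires pour assurer la sécurité et protéger la santé "
    "physique et mentale des travailleurs.\n"
    "1° Des actions de prévention des risques professionnels, y compris ceux mentionnés à l'article L. 4161-1 ;\n"
    "Article L1152-2\n"
    "Aucun salarié ne doit subir les agissements répétés de harcèlement moral.\n"
)

segments = split_by_heading(CHUNK)
print(list(segments))  # ['L4121-1', 'L1152-2']
if len(segments) > 1:
    print("boundary suspected: chunk spans", len(segments), "articles")
```

A heading split does not say whether the two articles are related. It only tells the pipeline that the chunk covers more than one unit, a fact that could be passed to the LLM as metadata.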
Correct Extraction
| relation | reference_id | section_name |
|---|---|---|
| direct | Article L4121-1 | Obligation générale de sécurité |
| direct | Article L1152-2 | Interdiction du harcèlement moral |
| indirect | 1° Des actions de prévention des risques professionnels | Actions de prévention |
If it is extracted at all, L. 4161-1 in the first article's text should be classified as indirect (cross-reference). It is not the article being discussed but a mention inside another article.
Likely LLM Failure
- False coherence: the LLM treats the two unrelated articles as one logical unit, producing a narrative that connects safety obligations to harassment prevention
- section_name cross-contamination: both `Article L4121-1` and `Article L1152-2` are correct reference_id values, but the LLM may invent `section_name` values that falsely link them
- Cross-reference extraction: `L. 4161-1` extracted as `direct` when it is just a mention inside another article's text
- Missing the second article entirely: if the query had asked only about safety, the LLM might stop reading after `Article L4121-1` and miss `L1152-2` completely
4. Amendment Text
Scenario: A chunk that is an amending law inserting a new section into an existing code, where the amendment text describes the new section in meta-language rather than presenting it as operative text.
Query (FR)
Quelle est la nouvelle disposition relative aux lanceurs d'alerte dans le Code du travail ?
Chunk (FR)
Article 9
I. - Le titre IV du livre Ier de la première partie du code du travail est complété par un chapitre V ainsi rédigé :
" Chapitre V
" Dispositions relatives aux lanceurs d'alerte en matière sociale
" Art. L. 1312-1. - Un lanceur d'alerte au sens de l'article 6 de la loi n° 2016-1691 du 9 décembre 2016 relative à la transparence, à la lutte contre la corruption et à la modernisation de la vie économique bénéficie, dans les conditions prévues par ladite loi, de la protection contre les mesures de représailles mentionnées à l'article 12 de ladite loi.
" Art. L. 1312-2. - Les représentants du personnel sont informés et consultés sur les procédures de recueil des signalements établies par l'employeur."
II. - Le présent article entre en vigueur le 1er janvier 2023.
Why This Confuses the LLM
This is a meta-legislative chunk — it's an amending law that inserts new articles into the Code du travail. The structural references are nested:
| Reference | What It Is | Depth |
|---|---|---|
| Article 9 | The amending law's own article | Level 1: the "real" article |
| I. / II. | Paragraphs of Article 9 (Roman numerals) | Level 2 |
| Chapitre V | Chapter being inserted into the Code | Level 3: quoted/amended text |
| Art. L. 1312-1 | Article being created inside the Code | Level 4: double-nested |
| Art. L. 1312-2 | Another created article | Level 4 |
| article 6 de la loi n° 2016-1691 | Reference to a third law | Cross-reference |
| article 12 de ladite loi | Reference to the same third law | Cross-reference |
The LLM must decide: what is the "document" here? Is it the amending law (Article 9), or the Code du travail articles being inserted (L. 1312-1, L. 1312-2)? The query asks about the new provisions, but the chunk's structure is the amending law's article.
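Part of that decision can be made before the LLM sees the chunk, because inserted text is typographically marked. The sketch below assumes that quoted lines begin with a straight double quote, as in this chunk; official texts often use guillemets instead. Following the Correct Extraction table for this case, it maps quoted article headings to direct and the amending article to indirect. `classify_amendment` is hypothetical, and it ignores Chapitre V.

```python
import re

# Assumed markers: quoted (inserted) article headings start a line with '"',
# and the amending law's own article heading is a bare "Article <n>" line.
QUOTED_ARTICLE = re.compile(r'^"\s*(Art\. L\. \d+(?:-\d+)*)\.', re.MULTILINE)
AMENDING_ARTICLE = re.compile(r"^Article \d+$", re.MULTILINE)


def classify_amendment(chunk: str) -> dict[str, str]:
    """Map article markers to a relation: inserted Code articles are direct, the amending vehicle is indirect."""
    relations = {m.group(1): "direct" for m in QUOTED_ARTICLE.finditer(chunk)}
    relations.update({m.group(0): "indirect" for m in AMENDING_ARTICLE.finditer(chunk)})
    return relations


# Abridged from the Test Case 4 chunk.
CHUNK = '''Article 9
I. - Le titre IV du livre Ier de la première partie du code du travail est complété par un chapitre V ainsi rédigé :
" Chapitre V
" Art. L. 1312-1. - Un lanceur d'alerte au sens de l'article 6 de la loi n° 2016-1691 du 9 décembre 2016 bénéficie de la protection contre les mesures de représailles mentionnées à l'article 12 de ladite loi.
" Art. L. 1312-2. - Les représentants du personnel sont informés et consultés sur les procédures de recueil des signalements établies par l'employeur."
II. - Le présent article entre en vigueur le 1er janvier 2023.'''

print(classify_amendment(CHUNK))
# {'Art. L. 1312-1': 'direct', 'Art. L. 1312-2': 'direct', 'Article 9': 'indirect'}
```

Every key comes straight from the chunk text, so each one is also a valid verbatim reference_id.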
Correct Extraction
| relation | reference_id | section_name |
|---|---|---|
| direct | Art. L. 1312-1 | Protection des lanceurs d'alerte en matière sociale |
| direct | Art. L. 1312-2 | Information des représentants du personnel |
| indirect | Article 9 | Article d'insertion (loi modificative) |
| indirect | Chapitre V | Dispositions relatives aux lanceurs d'alerte |
The query asks about the new provisions, so L. 1312-1 and L. 1312-2 are direct. The amending Article 9 is indirect — it's the vehicle, not the content.
Likely LLM Failure
- Level inversion: `Article 9` extracted as `direct` (it is the "real" article in the chunk) while `L. 1312-1` is missed or classified as `indirect`
- Quoted text ignored: the leading double quotation marks (`" Art. L. 1312-1.`) signal quoted/amended text, and LLMs often skip or deprioritize quoted content
- Cross-reference explosion: `article 6 de la loi n° 2016-1691` and `article 12 de ladite loi` extracted as citations, although they are cross-references to a third law, not structural elements of either the amending law or the Code
- "ladite loi" resolution failure: `ladite loi` (the aforementioned law) requires resolving the anaphoric reference to `loi n° 2016-1691`; LLMs may fail this co-reference
- reference_id for quoted text: the literal span `" Art. L. 1312-1.` includes a leading quote and a trailing period. Dropping only those characters still yields a valid substring (`Art. L. 1312-1`). Any further normalization (guillemets instead of straight quotes, `Article` instead of `Art.`, `L.1312-1` without the space) produces a reference_id that does not match the chunk
5. Code-Switching
Scenario: A Belgian or Canadian legal text mixing French and Dutch (Belgium) or French and English (Canada), where the language of the text and its structural markers changes from one paragraph to the next within the same article.
Query (BE — Belgium, bilingual FR/NL)
Quelles sont les conditions pour la détention provisoire ?
Chunk (BE)
Artikel 16 § 1. De onderzoeksrechter kan, op vordering van het openbaar ministerie, de aanhouding bevelen van een verdachte wanneer er ernstige aanwijzingen van schuld bestaan en hetzij de feiten een misdaad of wanbedrijf betreffen waarvoor de wet een gevangenisstraf van meer dan één jaar vaststelt, hetzij de verdachte gevaar oplevert voor de openbare veiligheid.
§ 2. Le juge d'instruction peut, à la réquisition du ministère public, ordonner l'arrestation d'un prévenu lorsqu'il existe des indices graves de culpabilité et que soit les faits constituent un crime ou un délit puni d'une peine d'emprisonnement de plus d'un an, soit le prévenu constitue un danger pour la sécurité publique.
§ 3. Le mandat d'arrêt précise les faits qui en motivent la délivrance et la qualification légale.
Artikel 16bis. In afwachting van de beslissing van de raadkamer over de verlenging van de aanhouding, kan de onderzoeksrechter de gevangenhouding met ten hoogste vijftien dagen verlengen.
§ 2. En attendant la décision de la chambre du conseil sur la prolongation de la détention, le juge d'instruction peut prolonger la détention provisoire pour une durée maximale de quinze jours.
Why This Confuses the LLM
Belgian legislation is officially bilingual (French/Dutch), and consolidated texts often alternate languages at the article or paragraph level:
| Marker | Language | Structure |
|---|---|---|
| Artikel 16 § 1. | Dutch | Article 16, § 1 |
| § 2. | French | Same article, § 2 (language switched) |
| § 3. | French | Same article, § 3 |
| Artikel 16bis. | Dutch | Inserted article (Latin suffix bis) |
| § 2. | French | Article 16bis, § 2 (language switched again) |
The LLM faces:
- Numbering format shift: `Artikel 16` (Dutch) vs. the implicit `Article 16` (French): same concept, different token
- § symbol parsing: `§ 1`, `§ 2`, and `§ 3` are paragraph markers, but which CountryProfile level do they map to? Belgium may not have a profile yet
- Insertion pattern: `Artikel 16bis` uses the Latin suffix pattern (bis), the same as French `Article 16 bis`, but written without a space per Dutch convention
- Duplicate content: § 1 (Dutch) and § 2 (French) of Article 16 are the same provision in two languages; the LLM may extract them as two separate citations
- Context-free structural markers: `§ 2` appears twice, once under Artikel 16 and once under Artikel 16bis; the marker alone identifies neither the article nor the language version
Correct Extraction
If the system is configured for French (BE-FR):
| relation | reference_id | section_name |
|---|---|---|
| direct | § 2 (first occurrence, in Article 16) | Conditions d'arrestation |
| direct | § 2 (second occurrence, in Article 16bis) | Prolongation de détention |
| indirect | Artikel 16 | Mandat d'arrêt (version néerlandaise) |
| indirect | Artikel 16bis | Prolongation de détention (version néerlandaise) |
The Dutch Artikel 16 § 1 and French § 2 are the same provision in two languages. The system should ideally extract only the language-relevant version.
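Extracting only the language-relevant version requires knowing which lines are in which language. The sketch below uses a deliberately crude function-word count. This is an assumption, not a real language identifier; a production system would use a proper language-identification model. `line_language` and both word lists are hypothetical.

```python
import re

# Hand-picked function words (illustrative only). Words common to both languages,
# such as "de" and "en", are left out because they carry no signal.
DUTCH = {"het", "van", "een", "kan", "op", "met", "wanneer", "artikel"}
FRENCH = {"le", "la", "les", "du", "des", "une", "peut", "lorsqu'il"}


def line_language(line: str) -> str:
    """Guess 'nl', 'fr', or 'unknown' for one line by counting function words."""
    words = re.findall(r"[a-zà-ÿ']+", line.lower())
    nl = sum(word in DUTCH for word in words)
    fr = sum(word in FRENCH for word in words)
    if nl == fr:
        return "unknown"
    return "nl" if nl > fr else "fr"


# Abridged from the Test Case 5 chunk, one paragraph per line.
CHUNK = "\n".join([
    "Artikel 16 § 1. De onderzoeksrechter kan, op vordering van het openbaar ministerie, "
    "de aanhouding bevelen van een verdachte.",
    "§ 2. Le juge d'instruction peut, à la réquisition du ministère public, "
    "ordonner l'arrestation d'un prévenu.",
    "§ 3. Le mandat d'arrêt précise les faits qui en motivent la délivrance et la qualification légale.",
    "Artikel 16bis. In afwachting van de beslissing van de raadkamer kan de onderzoeksrechter "
    "de gevangenhouding met ten hoogste vijftien dagen verlengen.",
    "§ 2. En attendant la décision de la chambre du conseil, le juge d'instruction peut "
    "prolonger la détention provisoire.",
])

languages = [line_language(line) for line in CHUNK.splitlines()]
print(languages)  # ['nl', 'fr', 'fr', 'nl', 'fr']
french_only = [line for line in CHUNK.splitlines() if line_language(line) == "fr"]
```

Filtering this way keeps the three French paragraphs. It also drops the Artikel 16bis line, which is the only heading for the second § 2. A real splitter would have to carry the article context across the language switch before discarding anything.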
Likely LLM Failure
- Duplicate extraction: the Dutch and French versions of the same provision (`Artikel 16 § 1` and the first `§ 2`; the Dutch `Artikel 16bis` text and the second `§ 2`) extracted as separate provisions, although they are translations of each other
- reference_id collision: `§ 2` appears twice in the chunk, so a bare `§ 2` passes the substring check without identifying which paragraph it points to
- Language detection failure: no CountryProfile exists for Belgium (`BE.yaml` is absent from the repo), so the system has no guidance on bilingual handling
- `Artikel` vs `Article`: the Dutch `Artikel` won't match French-level few-shot patterns; the LLM may ignore it entirely
- `bis` spacing: `Artikel 16bis` (no space) vs. `Article 16 bis` (with space). The reference_id must be verbatim, so if the LLM produces `Article 16 bis` but the chunk has `Artikel 16bis`, the substring check fails
Robustness Rating
Overall: Fragile — would fail on edge cases
| Dimension | Rating | Evidence |
|---|---|---|
| Ambiguous numbering | Fragile | The system relies on the LLM to disambiguate structural numbers from back-references in prose. No post-processing validates that a reference_id like 2° isn't a back-reference to an already-extracted point. The few-shot examples show clean cases; adversarial ones with repeated numbers are untested. |
| Cross-reference confusion | Fragile | The user message template ([Question]\n{query}\n\n[Document Chunk]\n{chunk}) carries no source document metadata. The LLM cannot distinguish "this law's §34" from "that law's §34" without knowing which law the chunk belongs to. CountryProfile cross_references sections document the pattern but don't help the LLM resolve it. |
| Multi-document chunks | Brittle | No chunk boundary detection exists. The system assumes each chunk is from a single document. When chunking (e.g., PDF extraction) concatenates unrelated articles, the LLM has no signal to detect the boundary. This is a structural failure, not just an LLM reasoning failure. |
| Amendment text | Fragile | Quoted/amended text uses non-standard formatting (" Art. L. 1312-1.) that invites reference_id values with a leading quote and a trailing period. A plain substring check tolerates dropping those characters, but fails if the LLM normalizes the marker any further (different quote style, Art. expanded to Article, changed spacing). The meta-legislative structure (amending law inserting into another code) has no representation in the CountryProfile hierarchy. |
| Code-switching | Brittle | Bilingual and multilingual legal systems (Belgium, Canada, Switzerland, Luxembourg, South Africa) have no CountryProfile support for multi-language handling. The system treats each chunk as monolingual. When a chunk switches language mid-article, the LLM will either ignore the non-configured language or produce duplicate extractions. |
Summary Table
| Test Case | Failure Mode | Severity | Likelihood |
|---|---|---|---|
| Ambiguous numbering (FR) | Duplicate extraction of back-references | Medium | High |
| Cross-reference confusion (KR) | External citations extracted as direct | High | High |
| Multi-document chunk (FR) | False coherence between unrelated articles | High | Medium |
| Amendment text (FR) | Level inversion; quoted text reference_id mismatch | High | High |
| Code-switching (BE) | Duplicate extraction; reference_id collision | Critical | High |
Systemic Findings
Finding 1: No Source Document Context in User Message
The user message template is:
    [Question]
    {query}

    [Document Chunk]
    {chunk}
There is no field for the source document name, law title, or country code, so the LLM must infer the document's identity from the chunk content alone. Cross-reference disambiguation (Test Case 2) and multi-document detection (Test Case 3) therefore cannot be done reliably: without metadata, the LLM can only guess.
Recommendation: Add {source} or {document_title} to the user message template.
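A minimal sketch of that change follows. It assumes the template is filled with Python's `str.format`. The `[Source]` section, the `source` parameter, and `build_user_message` are hypothetical. Falling back to an explicit "unknown" is a design choice meant to keep the LLM from guessing silently.

```python
from typing import Optional

# Existing layout ([Question] / [Document Chunk]) with a hypothetical [Source] block prepended.
USER_TEMPLATE = "[Source]\n{source}\n\n[Question]\n{query}\n\n[Document Chunk]\n{chunk}"


def build_user_message(query: str, chunk: str, source: Optional[str] = None) -> str:
    """Fill the template; a missing source is stated explicitly rather than left blank."""
    return USER_TEMPLATE.format(source=source or "unknown", query=query, chunk=chunk)


message = build_user_message(
    query="개인정보 유출 시 통지 의무와 처벌 규정은 어떻게 되나요?",
    chunk="제34조의2(유출 등의 통지)",
    source="개인정보 보호법",
)
print(message)
```

Because the chunk and query are passed as format arguments rather than spliced into the template, braces inside legal text cannot break the formatting.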
Finding 2: reference_id Verbatim Constraint vs. Quoted/Amended Text
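The reference_id must be a verbatim substring of the chunk. This works for clean legislative text but is fragile for amendment text, where structural markers are embedded in quotation marks (" Art. L. 1312-1.). The LLM must either reproduce the quotes and trailing period exactly (unintuitive, fragile) or emit a cleaned-up form. Under a plain substring check, trimming only the outer quote and period still passes, because Art. L. 1312-1 is contained in the chunk. Any further normalization, such as switching to guillemets, expanding Art. to Article, or removing the space in L. 1312-1, fails validation.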
Recommendation: Allow reference_id normalization that strips leading/trailing punctuation and quotation marks during validation.
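A minimal sketch of that normalization follows. It assumes validation is a Python function applied to each extraction. `normalize`, `is_valid_reference`, and the exact set of trimmed characters are hypothetical choices. Whitespace is collapsed on both sides, so line wrapping or non-breaking spaces in the chunk cannot cause a false negative.

```python
import re

# Characters trimmed from the ends of a reference_id: straight and typographic quotes,
# guillemets, spaces, and common trailing punctuation. The set is an assumption.
TRIM_CHARS = "\"'«»“”‘’ .,;:"


def collapse_ws(text: str) -> str:
    """Replace every whitespace run (including non-breaking spaces) with one space."""
    return re.sub(r"\s+", " ", text)


def normalize(reference_id: str) -> str:
    return collapse_ws(reference_id).strip(TRIM_CHARS)


def is_valid_reference(reference_id: str, chunk: str) -> bool:
    """Substring check after trimming quotes/punctuation and collapsing whitespace."""
    rid = normalize(reference_id)
    return bool(rid) and rid in collapse_ws(chunk)


CHUNK = "\" Art. L. 1312-1. - Un lanceur d'alerte au sens de l'article 6 de la loi n° 2016-1691"

for candidate in ['" Art. L. 1312-1.', "« Art. L. 1312-1 »", "Article L. 1312-1"]:
    print(repr(candidate), is_valid_reference(candidate, CHUNK))
```

The first two candidates pass and the third fails. The normalization loosens the check rather than tightening it: short ids such as I. become even easier to match.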
Finding 3: No Chunk Boundary Metadata
The distillation pipeline processes chunks independently. When a chunk spans multiple documents (PDF artifact, amendment with embedded text), there's no mechanism to detect or handle the boundary. Each chunk is assumed to be a coherent unit from a single document.
Recommendation: Add optional source_document metadata per chunk, or implement a chunk-boundary heuristic in the chunker.
Finding 4: Monolingual Assumption
The entire system (CountryProfiles, few-shots, level hierarchies) assumes one language per chunk. Bilingual legal systems produce chunks where structural markers switch language from one paragraph to the next, even within a single article. The system will then do one of the following:
- Extract only the configured-language markers (missing half the legal structure)
- Produce duplicate extractions for the same provision in two languages
- Fail the reference_id substring check when the LLM normalizes `Artikel` to `Article`
Recommendation: Add a secondary_language field to CountryProfile, with language-specific label aliases per level. For bilingual chunks, either split by language before extraction or configure the prompt to handle both.
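The sketch below shows what label aliases could look like. It assumes a CountryProfile can be represented as a Python dict. `BE_PROFILE`, its keys, and `article_pattern` are hypothetical and do not reflect the real profile schema. The reference_id is taken from the matched chunk text, so it stays verbatim whatever language the label is in, which also sidesteps the bis-spacing failure from Test Case 5.

```python
import re

# Hypothetical profile fragment: per-level label aliases for a primary and a
# secondary language. Field names are illustrative only.
BE_PROFILE = {
    "primary_language": "fr",
    "secondary_language": "nl",
    "levels": {
        "article": {"fr": ["Article", "Art."], "nl": ["Artikel", "Art."]},
    },
}


def article_pattern(profile: dict) -> re.Pattern:
    """Build one regex matching an article label in any configured language."""
    labels = {alias for aliases in profile["levels"]["article"].values() for alias in aliases}
    # Longest labels first, so "Artikel"/"Article" are tried before the short "Art.".
    ordered = sorted(labels, key=lambda label: (-len(label), label))
    alternation = "|".join(re.escape(label) for label in ordered)
    # Article number, optionally followed by a Latin suffix with or without a space.
    return re.compile(rf"(?:{alternation})\s*\d+(?:\s?(?:bis|ter|quater))?")


CHUNK = (
    "Artikel 16 § 1. De onderzoeksrechter kan de aanhouding bevelen.\n"
    "Artikel 16bis. In afwachting van de beslissing van de raadkamer.\n"
)
print([m.group(0) for m in article_pattern(BE_PROFILE).finditer(CHUNK)])
# ['Artikel 16', 'Artikel 16bis']
```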
Finding 5: No Anaphoric Reference Resolution
Legal text is full of anaphoric references: ladite loi (the aforementioned law), the same section, 위 법률 (the above law). The LLM must resolve these to their antecedents, but the extraction schema has no field for resolved references. The reference_id captures the mention, not the meaning.
Recommendation: Add an optional resolved_to field in the extraction schema for anaphoric/cross-references, or exclude anaphoric mentions from extraction entirely.
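The sketch below illustrates the first option. It assumes extractions are plain Python objects. The `Reference` type, its `resolved_to` field, and `resolve_ladite_loi` are hypothetical. The resolution rule (the nearest preceding law number) is a naive heuristic for the single pattern ladite loi, not general co-reference resolution.

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Reference:
    reference_id: str           # the verbatim mention, as extracted today
    resolved_to: Optional[str]  # hypothetical field: the antecedent, or None if unresolved


LAW_NUMBER = re.compile(r"loi n° \d{4}-\d+")
LADITE_LOI = re.compile(r"article \d+ de ladite loi")


def resolve_ladite_loi(chunk: str) -> list[Reference]:
    """Attach each 'article N de ladite loi' to the closest law number mentioned before it."""
    laws = list(LAW_NUMBER.finditer(chunk))
    references = []
    for mention in LADITE_LOI.finditer(chunk):
        earlier = [law for law in laws if law.end() <= mention.start()]
        antecedent = earlier[-1].group(0) if earlier else None
        references.append(Reference(mention.group(0), antecedent))
    return references


CHUNK = (
    "Un lanceur d'alerte au sens de l'article 6 de la loi n° 2016-1691 du 9 décembre 2016 "
    "bénéficie de la protection contre les mesures de représailles mentionnées "
    "à l'article 12 de ladite loi."
)
print(resolve_ladite_loi(CHUNK))
# [Reference(reference_id='article 12 de ladite loi', resolved_to='loi n° 2016-1691')]
```

The reference_id stays verbatim, so existing validation is unaffected. The resolved_to value is None whenever no antecedent appears in the chunk, which makes unresolved anaphora visible instead of silently guessed.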
End of adversarial test report.