Adversarial Tests

Adversarial Test Report: Legal Citation Extraction System

Date: 2026-06-25
Tester: Phase3-Adversarial
Methodology: 5 adversarial test cases designed to expose failure modes in the contextual-distillation citation extraction pipeline. Each case targets a specific weakness in how the LLM parses legal text, maps it to CountryProfile level hierarchies, and produces reference_id values that must be verbatim substrings of the chunk.


Table of Contents

  1. Ambiguous Numbering
  2. Cross-Reference Confusion
  3. Multi-Document Chunk
  4. Amendment Text
  5. Code-Switching
  6. Robustness Rating
  7. Systemic Findings

1. Ambiguous Numbering

Scenario: French law text where "2" could be a paragraph number, a point number, or an article subdivision digit.

Query (FR)

Quelles sont les conditions d'ancienneté pour le préavis de licenciement ?

Chunk (FR)

Article L1234-1

Lorsque le licenciement n'est pas motivé par une faute grave, le salarié a droit :

1° S'il justifie chez le même employeur d'une ancienneté de services continus inférieure à six mois, à un préavis dont la durée est déterminée par la loi, la convention ou l'accord collectif de travail ou, à défaut, par les usages pratiqués dans la localité et la profession ;

2° S'il justifie chez le même employeur d'une ancienneté de services continus comprise entre six mois et moins de deux ans, à un préavis d'un mois ;

3° S'il justifie chez le même employeur d'une ancienneté de services continus d'au moins deux ans, à un préavis de deux mois.

Toutefois, les dispositions des 2° et 3° ne sont applicables que si la loi, la convention ou l'accord collectif de travail, le contrat de travail ou les usages ne prévoient pas un préavis ou une condition d'ancienneté de services plus favorable pour le salarié.

Why This Confuses the LLM

The number "2" appears in four structurally different roles:

Occurrence | Actual Role | Risk
2° (line 2 of enumeration) | Point (level: point) | Correct
2° in "des 2° et 3°" (last paragraph) | Back-reference to point, not a new point | LLM may extract it as a new citation
deux ans | Prose number, not a structural reference at all | LLM may hallucinate reference_id: "deux ans"
L1234-1 | Article number containing "1" at multiple depths | LLM may misparse the hyphenated structure

The last paragraph's 2° et 3° is the real trap. It's a back-reference embedded in running text, not a structural subdivision. The LLM must recognize that 2° here is a citation of the points above, not a new structural unit. The few-shot schema asks for reference_id as a verbatim substring, and 2° does appear verbatim, so the LLM will be tempted to extract it again.

Correct Extraction

relation | reference_id | section_name
direct | Article L1234-1 | Préavis en cas de licenciement
direct | 3° S'il justifie chez le même employeur d'une ancienneté de services continus d'au moins deux ans | Préavis de deux mois (ancienneté ≥ 2 ans)
indirect | 2° S'il justifie chez le même employeur d'une ancienneté de services continus comprise entre six mois et moins de deux ans | Préavis d'un mois (ancienneté 6 mois–2 ans)

The back-reference des 2° et 3° in the last paragraph should not produce a separate extraction — it's a pointer to the already-extracted points, not a structural unit itself.

Likely LLM Failure

  • Duplicate extraction: 2° extracted twice (once from the enumeration, once from the back-reference)
  • Relation confusion: The back-reference classified as direct when it's really just prose
  • reference_id truncation: 3° S'il justifie chez le même employeur d'une ancienneté de services continus d'au moins deux ans is very long, and LLMs tend to truncate it to 3° or 3° S'il justifie…
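
The duplicate-extraction risk above could be caught by a post-processing pass. The following is a minimal sketch, assuming that structural points open a line while back-references occur mid-line; the function names and the marker pattern are hypothetical and not part of the pipeline under test.

```python
import re

# Hypothetical pattern for French enumerated points such as "1°", "2°", "3°".
POINT_MARKER = re.compile(r"\d+°")


def structural_points(chunk: str) -> list[str]:
    """Return point markers that open a line (treated as structural units)."""
    found = []
    for line in chunk.splitlines():
        match = POINT_MARKER.match(line.lstrip())
        if match:
            found.append(match.group())
    return found


def back_references(chunk: str) -> list[str]:
    """Return point markers that occur mid-line (mentions in running text)."""
    found = []
    for line in chunk.splitlines():
        stripped = line.lstrip()
        for match in POINT_MARKER.finditer(stripped):
            if match.start() > 0:
                found.append(match.group())
    return found


# Abridged from the Article L1234-1 chunk above.
sample_chunk = (
    "1° S'il justifie d'une ancienneté inférieure à six mois ;\n"
    "2° S'il justifie d'une ancienneté comprise entre six mois et moins de deux ans ;\n"
    "3° S'il justifie d'une ancienneté d'au moins deux ans.\n"
    "Toutefois, les dispositions des 2° et 3° ne sont applicables que sous conditions.\n"
)

print(structural_points(sample_chunk))
print(back_references(sample_chunk))
```

On this abridged sample the first call returns the three enumerated points and the second returns only the two mid-line mentions, so an extraction whose only anchor is a mid-line marker could be dropped or flagged for review.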

2. Cross-Reference Confusion

Scenario: Korean law text referencing a section in a different law, where the cited law's section number happens to collide with the current law's structure.

Query (KR)

개인정보 유출 시 통지 의무와 처벌 규정은 어떻게 되나요?

Chunk (KR)

제34조의2(유출 등의 통지) ① 개인정보처리자는 개인정보가 분실·도난·유출된 사실을 알게 된 때에는 지체 없이 해당 정보주체에게 다음 각 호의 사항을 알려야 한다.
1. 유출된 개인정보의 항목
2. 유출된 시점
3. 대응조치 및 피해구제 절차

② 개인정보처리자는 제1항에 따른 통지를 받은 날부터 30일 이내에 「정보통신망 이용촉진 및 정보보호 등에 관한 법률」 제48조의2에 따른 개인정보 침해 사실 통지와 「신용정보의 이용 및 보호에 관한 법률」 제34조에 따른 유출 통지를 하여야 한다.

Why This Confuses the LLM

This chunk contains three different laws' section numbers in close proximity:

Reference | Law Section | Current Law?
제34조의2 | 개인정보 보호법 | ✅ Yes (this chunk's article)
제1항 | Same law, self-reference | ✅ Yes
제48조의2 | 정보통신망법 Art. 48-2 | ❌ No (different law)
제34조 | 신용정보법 Art. 34 | ❌ No (different law)

The critical collision: 제34조의2 (current law, inserted article 34-2) and 제34조 (completely different law, article 34). They share the same structural pattern (제N조) and nearly the same number. The LLM must:

  1. Recognize that 제48조의2 belongs to 정보통신망법, not the current law
  2. Recognize that 제34조 belongs to 신용정보법, not the current law
  3. NOT confuse 제34조의2 (current, inserted) with 제34조 (other law)
  4. Handle the self-reference 제1항 pointing to paragraph 1 of the current article

Correct Extraction

relation | reference_id | section_name
direct | 제34조의2 | 유출 등의 통지
direct | ① | 통지 의무
direct | 1. 유출된 개인정보의 항목 |
indirect | ② | 통지 방법 (타 법률 준용)

제48조의2 and 제34조 from other laws should not be extracted as structural citations of this document — they're cross-references to external statutes. However, the system has no explicit mechanism to distinguish "this law's §34" from "that law's §34" when both appear in the same chunk.

Likely LLM Failure

  • Cross-law extraction: 제48조의2 and 제34조 extracted as direct references of the current document
  • Number collision: 제34조 (신용정보법) confused with 제34조의2 (current law)
  • Relation inflation: External cross-references classified as direct instead of indirect or excluded
  • Missing context: The LLM has no way to know which law the chunk belongs to — the user message template ([Question]\n{query}\n\n[Document Chunk]\n{chunk}) doesn't include the source document name
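
One way to flag the cross-law citations listed above before or after extraction is a pattern match on the statute title. This is a rough sketch, assuming external statutes are always named in 「…」 brackets immediately before the article number, as they are in this chunk; the regex and function name are illustrative only.

```python
import re

# Assumption: an external statute is cited as 「law title」 followed by 제N조 or 제N조의M.
EXTERNAL_CITATION = re.compile(r"「([^」]+)」\s*(제\d+조(?:의\d+)?)")


def external_citations(chunk: str) -> list[tuple[str, str]]:
    """Return (law title, article reference) pairs for bracketed external statutes."""
    return [(m.group(1), m.group(2)) for m in EXTERNAL_CITATION.finditer(chunk)]


# Paragraph ② of the chunk above.
paragraph_2 = (
    "② 개인정보처리자는 제1항에 따른 통지를 받은 날부터 30일 이내에 "
    "「정보통신망 이용촉진 및 정보보호 등에 관한 법률」 제48조의2에 따른 개인정보 침해 사실 통지와 "
    "「신용정보의 이용 및 보호에 관한 법률」 제34조에 따른 유출 통지를 하여야 한다."
)

for law, article in external_citations(paragraph_2):
    print(law, "->", article)
```

Article references returned here belong to another statute, so a validator could exclude them or force their relation away from direct. The self-reference 제1항 is not bracketed and is left untouched.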

3. Multi-Document Chunk

Scenario: A chunk that contains text from two different laws concatenated together (e.g., a PDF extraction artifact, or a legislative amendment that embeds another law's text).

Query (FR)

Quelles sont les obligations de l'employeur en matière de sécurité et les droits du salarié en cas de harcèlement ?

Chunk (FR)

Article L4121-1

L'employeur prend les mesures nécessaires pour assurer la sécurité et protéger la santé physique et mentale des travailleurs.

Ces mesures comprennent :

1° Des actions de prévention des risques professionnels, y compris ceux mentionnés à l'article L. 4161-1 ;

2° Des actions d'information et de formation ;

3° La mise en place d'une organisation et de moyens adaptés.

Article L1152-2

Aucun salarié ne doit subir les agissements répétés de harcèlement moral qui ont pour objet ou pour effet une dégradation de ses conditions de travail susceptible de porter atteinte à ses droits et à sa dignité, d'altérer sa santé physique ou mentale ou de compromettre son avenir professionnel.

Why This Confuses the LLM

Two articles from the same code (Code du travail) but addressing completely different topics (workplace safety vs. moral harassment) are concatenated into one chunk. The LLM faces:

  1. Query relevance mismatch: The query asks about both topics, so both articles seem relevant — but they're from different parts of the code with no logical connection
  2. No document boundary marker: There's no separator or metadata indicating that the text has jumped to an unrelated part of the code
  3. Level confusion: Article L4121-1 uses the L. prefix (legislative), while Article L1152-2 also uses L. — same level, same prefix scheme, unrelated content
  4. Cross-reference within chunk: L. 4161-1 appears inside the first article's text — is it a new extraction or a cross-reference?

Correct Extraction

relation | reference_id | section_name
direct | Article L4121-1 | Obligation générale de sécurité
direct | Article L1152-2 | Interdiction du harcèlement moral
indirect | 1° Des actions de prévention des risques professionnels | Actions de prévention

L. 4161-1 in the first article's text should be classified as indirect (cross-reference) since it's not the article being discussed but rather mentioned within another article.

Likely LLM Failure

  • False coherence: LLM treats the two unrelated articles as if they're part of the same logical unit, producing a narrative connecting safety obligations to harassment prevention
  • reference_id collision: Both Article L4121-1 and Article L1152-2 are correct, but the LLM may invent section_name values that falsely link them
  • Cross-reference extraction: L. 4161-1 extracted as direct when it's just a mention inside another article's text
  • Missing the second article entirely: If the query only asked about safety, the LLM might stop reading after Article L4121-1 and miss L1152-2 completely

4. Amendment Text

Scenario: A chunk that is an amending law inserting a new section into an existing code, where the amendment text describes the new section in meta-language rather than presenting it as operative text.

Query (FR)

Quelle est la nouvelle disposition relative aux lanceurs d'alerte dans le Code du travail ?

Chunk (FR)

Article 9

I. - Le titre IV du livre Ier de la première partie du code du travail est complété par un chapitre V ainsi rédigé :

" Chapitre V

" Dispositions relatives aux lanceurs d'alerte en matière sociale

" Art. L. 1312-1. - Un lanceur d'alerte au sens de l'article 6 de la loi n° 2016-1691 du 9 décembre 2016 relative à la transparence, à la lutte contre la corruption et à la modernisation de la vie économique bénéficie, dans les conditions prévues par ladite loi, de la protection contre les mesures de représailles mentionnées à l'article 12 de ladite loi.

" Art. L. 1312-2. - Les représentants du personnel sont informés et consultés sur les procédures de recueil des signalements établies par l'employeur."

II. - Le présent article entre en vigueur le 1er janvier 2023.

Why This Confuses the LLM

This is a meta-legislative chunk — it's an amending law that inserts new articles into the Code du travail. The structural references are nested:

Reference | What It Is | Depth
Article 9 | The amending law's own article | Level 1 (the "real" article)
I. / II. | Paragraphs of Article 9 (Roman numeral) | Level 2
Chapitre V | Title being inserted into the Code | Level 3 (quoted/amended text)
Art. L. 1312-1 | Article being created inside the Code | Level 4 (double-nested)
Art. L. 1312-2 | Another created article | Level 4
article 6 de la loi n° 2016-1691 | Reference to a third law | Cross-reference
article 12 de ladite loi | Reference to the same third law | Cross-reference

The LLM must decide: what is the "document" here? Is it the amending law (Article 9), or the Code du travail articles being inserted (L. 1312-1, L. 1312-2)? The query asks about the new provisions, but the chunk's structure is the amending law's article.

Correct Extraction

relation | reference_id | section_name
direct | Art. L. 1312-1 | Protection des lanceurs d'alerte en matière sociale
direct | Art. L. 1312-2 | Information des représentants du personnel
indirect | Article 9 | Article d'insertion (loi modificative)
indirect | Chapitre V | Dispositions relatives aux lanceurs d'alerte

The query asks about the new provisions, so L. 1312-1 and L. 1312-2 are direct. The amending Article 9 is indirect — it's the vehicle, not the content.

Likely LLM Failure

  • Level inversion: Article 9 extracted as direct (it's the "real" article in the chunk) while L. 1312-1 is missed or classified as indirect
  • Quoted text ignored: The doubled quotation marks (" Art. L. 1312-1.) signal quoted/amended text — LLMs often skip or deprioritize quoted content
  • Cross-reference explosion: article 6 de la loi n° 2016-1691 and article 12 de ladite loi extracted as citations — they're cross-references to a third law, not structural elements of either the amending law or the Code
  • "ladite loi" resolution failure: ladite loi (the aforementioned law) requires resolving the anaphoric reference to loi n° 2016-1691 — LLMs may fail this co-reference
  • reference_id for quoted text: The verbatim substring " Art. L. 1312-1. includes leading quotes and a period — the LLM may strip the quotes, producing a reference_id that doesn't match the chunk

5. Code-Switching

Scenario: A Belgian or Canadian legal text mixing French and Dutch (Belgium) or French and English (Canada), where structural markers change language mid-sentence.

Query (BE — Belgium, bilingual FR/NL)

Quelles sont les conditions pour la détention provisoire ?

Chunk (BE)

Artikel 16 § 1. De onderzoeksrechter kan, op vordering van het openbaar ministerie, de aanhouding bevelen van een verdachte wanneer er ernstige aanwijzingen van schuld bestaan en hetzij de feiten een misdaad of wanbedrijf betreffen waarvoor de wet een gevangenisstraf van meer dan één jaar vaststelt, hetzij de verdachte gevaar oplevert voor de openbare veiligheid.

§ 2. Le juge d'instruction peut, à la réquisition du ministère public, ordonner l'arrestation d'un prévenu lorsqu'il existe des indices graves de culpabilité et que soit les faits constituent un crime ou un délit puni d'une peine d'emprisonnement de plus d'un an, soit le prévenu constitue un danger pour la sécurité publique.

§ 3. Le mandat d'arrêt précise les faits qui en motivent la délivrance et la qualification légale.

Artikel 16bis. In afwachting van de beslissing van de raadkamer over de verlenging van de aanhouding, kan de onderzoeksrechter de gevangenhouding met ten hoogste vijftien dagen verlengen.

§ 2. En attendant la décision de la chambre du conseil sur la prolongation de la détention, le juge d'instruction peut prolonger la détention provisoire pour une durée maximale de quinze jours.

Why This Confuses the LLM

Belgian legislation is officially bilingual (French/Dutch), and consolidated texts often alternate languages at the article or paragraph level:

Line | Language | Structure
Artikel 16 § 1. | Dutch | Article 16, § 1
§ 2. | French | Same article, § 2 (language switched!)
§ 3. | French | Same article, § 3
Artikel 16bis. | Dutch | Inserted article (Latin suffix)
§ 2. | French | Same article's § 2 (language switched again!)

The LLM faces:

  1. Numbering format shift: Artikel 16 (Dutch) vs. implicit Article 16 (French) — same concept, different token
  2. § symbol parsing: § 1 / § 2 / § 3 are paragraph markers — but which CountryProfile level do they map to? Belgium may not have a profile yet
  3. Insertion pattern: Artikel 16bis uses the Latin suffix pattern (bis) — same as French Article 16 bis, but concatenated without space in Dutch convention
  4. Duplicate content: § 1 (Dutch) and § 2 (French) are the same provision in two languages — the LLM may extract them as two separate citations
  5. Language-agnostic structural markers: § 2 appears in both the Dutch and French sections — it's the same paragraph number but in different languages

Correct Extraction

If the system is configured for French (BE-FR):

relation | reference_id | section_name
direct | § 2 (first occurrence, in Article 16) | Conditions d'arrestation
direct | § 2 (second occurrence, in Article 16bis) | Prolongation de détention
indirect | Artikel 16 | Mandat d'arrêt (version néerlandaise)
indirect | Artikel 16bis | Prolongation de détention (version néerlandaise)

The Dutch Artikel 16 § 1 and French § 2 are the same provision in two languages. The system should ideally extract only the language-relevant version.

Likely LLM Failure

  • Duplicate extraction: § 2 extracted for both Dutch and French versions as if they're separate provisions — they're translations of each other
  • reference_id collision: § 2 appears twice in the chunk (once under Artikel 16, once under Artikel 16bis), so a bare § 2 cannot tell the LLM or the validator which occurrence is meant
  • Language detection failure: No CountryProfile exists for Belgium (BE.yaml is absent from the repo), so the system has no guidance on bilingual handling
  • Artikel vs Article: The Dutch Artikel won't match French-level few-shot patterns; the LLM may ignore it entirely
  • bis spacing: Artikel 16bis (no space) vs. Article 16 bis (with space) — reference_id must be verbatim, so if the LLM produces Article 16 bis but the chunk has Artikel 16bis, the substring check fails
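
The reference_id collision above can be detected mechanically by counting occurrences. Below is a small sketch, assuming the validator has access to the raw chunk; the helper names are hypothetical, and the sample is an abridged version of the Belgian chunk.

```python
def find_occurrences(chunk: str, reference_id: str) -> list[int]:
    """Return the start offset of every occurrence of reference_id in chunk."""
    offsets = []
    start = chunk.find(reference_id)
    while start != -1:
        offsets.append(start)
        start = chunk.find(reference_id, start + 1)
    return offsets


def is_ambiguous(chunk: str, reference_id: str) -> bool:
    """A reference_id that matches more than once cannot identify one provision."""
    return len(find_occurrences(chunk, reference_id)) > 1


# Abridged from the bilingual chunk above.
be_chunk = (
    "Artikel 16 § 1. De onderzoeksrechter kan de aanhouding bevelen.\n"
    "§ 2. Le juge d'instruction peut ordonner l'arrestation d'un prévenu.\n"
    "§ 3. Le mandat d'arrêt précise les faits.\n"
    "Artikel 16bis. In afwachting van de beslissing van de raadkamer.\n"
    "§ 2. En attendant la décision de la chambre du conseil.\n"
)

print(is_ambiguous(be_chunk, "§ 2"))
print(is_ambiguous(be_chunk, "§ 3"))
print(is_ambiguous(be_chunk, "Artikel 16"))
```

Note that a plain substring count also treats Artikel 16 as ambiguous, because it is a prefix of Artikel 16bis. Returning character offsets alongside reference_id, rather than the string alone, would be one way to keep such matches distinguishable.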

Robustness Rating

Overall: Fragile — would fail on edge cases

Dimension | Rating | Evidence
Ambiguous numbering | Fragile | The system relies on the LLM to disambiguate structural numbers from back-references in prose. No post-processing validates that a reference_id like 2° isn't a back-reference to an already-extracted point. The few-shot examples show clean cases; adversarial ones with repeated numbers are untested.
Cross-reference confusion | Fragile | The user message template ([Question]\n{query}\n\n[Document Chunk]\n{chunk}) carries no source document metadata. The LLM cannot distinguish "this law's §34" from "that law's §34" without knowing which law the chunk belongs to. CountryProfile cross_references sections document the pattern but don't help the LLM resolve it.
Multi-document chunks | Brittle | No chunk boundary detection exists. The system assumes each chunk is from a single document. When chunking (e.g., PDF extraction) concatenates unrelated articles, the LLM has no signal to detect the boundary. This is a structural failure, not just an LLM reasoning failure.
Amendment text | Fragile | Quoted/amended text uses non-standard formatting (" Art. L. 1312-1.) that produces reference_id values with leading quotes. The verbatim substring check will fail if the LLM strips the quotes. The meta-legislative structure (amending law inserting into another code) has no representation in the CountryProfile hierarchy.
Code-switching | Brittle | Bilingual legal systems (Belgium, Canada, Switzerland, Luxembourg, South Africa) have no CountryProfile support for dual-language handling. The system treats each chunk as monolingual. When a chunk switches language mid-article, the LLM will either ignore the non-configured language or produce duplicate extractions.

Summary Table

Test Case | Failure Mode | Severity | Likelihood
Ambiguous numbering (FR) | Duplicate extraction of back-references | Medium | High
Cross-reference confusion (KR) | External citations extracted as direct | High | High
Multi-document chunk (FR) | False coherence between unrelated articles | High | Medium
Amendment text (FR) | Level inversion; quoted text reference_id mismatch | High | High
Code-switching (BE) | Duplicate extraction; reference_id collision | Critical | High

Systemic Findings

Finding 1: No Source Document Context in User Message

The user message template is:

[Question]
{query}

[Document Chunk]
{chunk}

There is no field for the source document name, law title, or country code, so the LLM must infer the document identity from the chunk content alone. Without that context, cross-reference disambiguation (Test Case 2) and multi-document detection (Test Case 3) cannot be done reliably: the model can only guess which law a section number belongs to.

Recommendation: Add {source} or {document_title} to the user message template.
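
As an illustration of this recommendation, the template could carry a source field ahead of the question. This is a sketch only: the [Source] label, the build_user_message helper, and the "unknown" fallback are assumptions, not the pipeline's actual code.

```python
# Hypothetical extension of the existing template with a source field.
USER_TEMPLATE = (
    "[Source]\n{source}\n\n"
    "[Question]\n{query}\n\n"
    "[Document Chunk]\n{chunk}"
)


def build_user_message(query: str, chunk: str, source: str = "unknown") -> str:
    """Fill the template; braces inside the arguments are not re-interpreted."""
    return USER_TEMPLATE.format(source=source, query=query, chunk=chunk)


message = build_user_message(
    query="개인정보 유출 시 통지 의무와 처벌 규정은 어떻게 되나요?",
    chunk="제34조의2(유출 등의 통지) ① 개인정보처리자는 ...",
    source="개인정보 보호법",
)
print(message)
```

With the law title present, the prompt can instruct the model to treat any article number attributed to a different statute as external.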

Finding 2: reference_id Verbatim Constraint vs. Quoted/Amended Text

The reference_id must be a verbatim substring of the chunk. This works for clean legislative text but breaks for amendment text where structural markers are embedded in quotation marks (" Art. L. 1312-1.). The LLM must either include the quotes (unintuitive, fragile) or strip them (fails validation).

Recommendation: Allow reference_id normalization that strips leading/trailing punctuation and quotation marks during validation.
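
A normalizing validator along these lines might look like the sketch below. The set of stripped characters is an assumption and would need tuning per CountryProfile; the function names are hypothetical.

```python
# Assumed set of wrapper characters to ignore at either end of a reference_id.
STRIP_CHARS = " \t\"'«»“”.,;:"


def normalize(reference_id: str) -> str:
    """Strip leading/trailing quotation marks, whitespace, and punctuation."""
    return reference_id.strip(STRIP_CHARS)


def is_valid_reference(chunk: str, reference_id: str) -> bool:
    """Accept a reference_id whose normalized core is a non-empty substring of chunk."""
    core = normalize(reference_id)
    return bool(core) and core in chunk


# Abridged from the amendment chunk above (note the space after the opening quote).
amend_chunk = '" Art. L. 1312-1. - Un lanceur d\'alerte bénéficie de la protection.'

candidate = '"Art. L. 1312-1."'  # quotes re-attached without the original space
print(candidate in amend_chunk)                    # strict check
print(is_valid_reference(amend_chunk, candidate))  # normalized check
```

In this sample the strict substring test rejects the candidate because the spacing around the quote differs, while the normalized check accepts it. Normalization only relaxes the edges; a reference_id whose interior was rewritten would still fail.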

Finding 3: No Chunk Boundary Metadata

The distillation pipeline processes chunks independently. When a chunk spans multiple documents (PDF artifact, amendment with embedded text), there's no mechanism to detect or handle the boundary. Each chunk is assumed to be a coherent unit from a single document.

Recommendation: Add optional source_document metadata per chunk, or implement a chunk-boundary heuristic in the chunker.
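
A chunk-boundary heuristic could be as simple as splitting on standalone article headings. The sketch below assumes headings of the form "Article L4121-1" on their own line, which holds for the Test Case 3 chunk but is not guaranteed elsewhere; the pattern and function name are illustrative.

```python
import re

# Assumed heading shape: "Article" + L/R/D prefix + number, alone on a line.
ARTICLE_HEADING = re.compile(
    r"^Article[ \t]+[LRD]\.?[ \t]?\d+-\d+[ \t]*$", re.MULTILINE
)


def split_on_article_headings(chunk: str) -> list[str]:
    """Split a chunk into one segment per standalone article heading."""
    starts = [m.start() for m in ARTICLE_HEADING.finditer(chunk)]
    if len(starts) < 2:
        return [chunk]
    starts[0] = 0  # keep any text before the first heading with the first segment
    bounds = starts + [len(chunk)]
    return [chunk[a:b].strip() for a, b in zip(bounds, bounds[1:])]


# Abridged from the Test Case 3 chunk.
fr_chunk = (
    "Article L4121-1\n\n"
    "1° Des actions de prévention des risques professionnels, "
    "y compris ceux mentionnés à l'article L. 4161-1 ;\n\n"
    "Article L1152-2\n\n"
    "Aucun salarié ne doit subir les agissements répétés de harcèlement moral.\n"
)

for segment in split_on_article_headings(fr_chunk):
    print(segment.splitlines()[0])
```

The inline mention of l'article L. 4161-1 does not trigger a split because it is neither capitalized nor alone on its line. Each segment could then be sent through extraction separately or tagged with its own heading as source metadata.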

Finding 4: Monolingual Assumption

The entire system — CountryProfiles, few-shots, level hierarchies — assumes one language per chunk. Bilingual legal systems produce chunks where structural markers switch language mid-paragraph. The system will either:

  • Extract only the configured-language markers (missing half the legal structure)
  • Produce duplicate extractions for the same provision in two languages
  • Fail the reference_id substring check when the LLM normalizes Artikel to Article

Recommendation: Add a secondary_language field to CountryProfile, with language-specific label aliases per level. For bilingual chunks, either split by language before extraction or configure the prompt to handle both.
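
To make the recommendation concrete, here is one possible shape for a bilingual profile. Everything in it is hypothetical: the class names, the fields, and the alias layout are assumptions and do not reflect the existing CountryProfile schema.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class LevelLabels:
    level: str                     # e.g. "article", "paragraph"
    aliases: dict[str, list[str]]  # language code -> labels used for this level


@dataclass
class BilingualProfile:
    country: str
    primary_language: str
    secondary_language: str
    levels: list[LevelLabels] = field(default_factory=list)

    def level_for(self, marker: str) -> Optional[str]:
        """Map a structural marker to its level, whichever language it is written in."""
        for entry in self.levels:
            for labels in entry.aliases.values():
                if any(marker.startswith(label) for label in labels):
                    return entry.level
        return None


be_profile = BilingualProfile(
    country="BE",
    primary_language="fr",
    secondary_language="nl",
    levels=[
        LevelLabels("article", {"fr": ["Article"], "nl": ["Artikel"]}),
        LevelLabels("paragraph", {"fr": ["§"], "nl": ["§"]}),
    ],
)

print(be_profile.level_for("Artikel 16bis"))
print(be_profile.level_for("§ 2"))
```

With aliases per language, Artikel and Article resolve to the same level without the LLM having to rewrite the marker, so the verbatim reference_id can stay in the language of the chunk.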

Finding 5: No Anaphoric Reference Resolution

Legal text is full of anaphoric references: ladite loi (the aforementioned law), the same section, 위 법률 (the above law). The LLM must resolve these to their antecedents, but the extraction schema has no field for resolved references. The reference_id captures the mention, not the meaning.

Recommendation: Add an optional resolved_to field in the extraction schema for anaphoric/cross-references, or exclude anaphoric mentions from extraction entirely.
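
The resolved_to idea could be prototyped as follows. This is a deliberately naive sketch: the Citation fields beyond relation, reference_id, and section_name are assumptions, and the resolver only handles the French ladite loi case by picking the nearest preceding explicit law number.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Assumed shape of an explicit French law number, e.g. "loi n° 2016-1691".
LAW_MENTION = re.compile(r"loi n° \d{4}-\d+")


@dataclass
class Citation:
    relation: str
    reference_id: str
    section_name: str
    resolved_to: Optional[str] = None  # hypothetical new field


def resolve_ladite_loi(chunk: str, reference_id: str) -> Optional[str]:
    """Resolve 'ladite loi' to the nearest explicit law number that precedes it."""
    position = chunk.find(reference_id)
    if position == -1 or "ladite loi" not in reference_id:
        return None
    preceding = LAW_MENTION.findall(chunk[:position])
    return preceding[-1] if preceding else None


# Abridged from Art. L. 1312-1 in the amendment chunk.
whistle_chunk = (
    "Un lanceur d'alerte au sens de l'article 6 de la loi n° 2016-1691 "
    "du 9 décembre 2016 bénéficie de la protection contre les mesures de "
    "représailles mentionnées à l'article 12 de ladite loi."
)

citation = Citation("indirect", "article 12 de ladite loi", "Mesures de représailles")
citation.resolved_to = resolve_ladite_loi(whistle_chunk, citation.reference_id)
print(citation)
```

Keeping reference_id verbatim and storing the antecedent in a separate field preserves the substring constraint while still recording what the mention points to.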


End of adversarial test report.