Viterbi extractor #1541
Draft
ampli wants to merge 105 commits into
Add dictionary and tokenizer support for replacing the Wj CONTAINS_ONE postprocessing rule with ordinary grammar. The English dictionary now marks prep+wh paths that need a hidden helper token. The tokenizer creates an optional wjqprep helper alternative for those marked paths, and linkage presentation suppresses the helper word and helper links after postprocessing. This lets the dictionary require the Jw/JQ witness for rule 12 without exposing the helper in displayed linkages. Add corpus-knowledge.batch as a focused corpus for 4.0.knowledge rule replacement tests, starting with rule 12. Verified with: link-parser < ./data/en/corpus-knowledge.batch; link-parser < ./data/en/corpus-basic.batch; link-parser < ./data/en/corpus-fixes.batch; link-parser < ./data/en/corpus-fix-long.batch; link-parser < ./data/en/corpus-failures.batch. Co-authored-by: OpenAI Codex <codex@openai.com>
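The presentation step described above removes the helper word and its links after postprocessing. Below is a minimal Python sketch of that step over a toy linkage representation. The `Link` record, `hide_helpers`, and the `helper_prefix` argument are hypothetical. The commit message does not say how the library stores linkages or what the helper-label prefix is.

```python
from dataclasses import dataclass
from typing import List, Set, Tuple


@dataclass(frozen=True)
class Link:
    left: int    # index of the left word in the linkage
    right: int   # index of the right word in the linkage
    label: str   # link label, e.g. "Jw"


def hide_helpers(words: List[str], links: List[Link],
                 helper_words: Set[str],
                 helper_prefix: str) -> Tuple[List[str], List[Link]]:
    """Drop helper words, plus any link that touches a helper word or whose
    label starts with helper_prefix, then renumber the remaining words."""
    keep = [i for i, w in enumerate(words) if w not in helper_words]
    new_index = {old: new for new, old in enumerate(keep)}
    shown = [
        Link(new_index[lk.left], new_index[lk.right], lk.label)
        for lk in links
        if lk.left in new_index and lk.right in new_index
        and not lk.label.startswith(helper_prefix)
    ]
    return [words[i] for i in keep], shown


# Toy linkage. "HLP" is a made-up helper-link prefix, not the real one.
words = ["to", "wjqprep", "whom"]
links = [Link(0, 2, "Jw"), Link(0, 1, "HLPa"), Link(1, 2, "HLPb")]
shown_words, shown_links = hide_helpers(words, links, {"wjqprep"}, "HLP")
```

The hiding happens only after postprocessing. That ordering lets the grammar and the postprocessor still see the helper witness, while the displayed linkage omits it.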
Replace the rule-specific WJ shadow tokenizer marker with a generic dictionary convention: INSERTL<token>+ on the previous word and INSERTR<token>+ on the following word request an optional helper token named <token>. Both markers point right, so they carry tokenizer data without forming a possible grammar link. Hide dictionary helper words and their links through a generic helper-label prefix instead of checking for a WJ-specific label. The English wjqprep helper now uses this convention. Verified with make, link-parser corpus-knowledge.batch (0 errors), corpus-basic.batch (88 errors), corpus-fixes.batch (367 errors), and git diff --check. Co-authored-by: OpenAI Codex <codex@openai.com>
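Here is a minimal sketch of how a tokenizer could turn the INSERTL/INSERTR markers into optional helper alternatives. Several details are assumptions that the commit does not state:

- Markers are spelled as connector strings such as `INSERTLwjqprep+`.
- A helper is inserted between two adjacent words only when the left word's INSERTL and the right word's INSERTR name the same token.
- Each such helper is independently optional.

`helper_alternatives` is a hypothetical name, not the library's tokenizer API.

```python
import itertools
import re
from typing import Iterator, List, Set

# Assumed marker spelling, e.g. "INSERTLwjqprep+" / "INSERTRwjqprep+".
MARKER = re.compile(r"^INSERT([LR])(\w+)\+$")


def requested(markers: List[str], side: str) -> Set[str]:
    """Helper-token names that a word requests on one side ('L' or 'R')."""
    names = set()
    for m in markers:
        match = MARKER.match(m)
        if match and match.group(1) == side:
            names.add(match.group(2))
    return names


def helper_alternatives(words: List[str],
                        markers: List[List[str]]) -> Iterator[List[str]]:
    """Yield each token sequence in which every requested helper is
    independently absent or present."""
    # A slot is the gap after words[i] where both neighbors request `token`.
    slots = [
        (i, token)
        for i in range(len(words) - 1)
        for token in sorted(requested(markers[i], "L")
                            & requested(markers[i + 1], "R"))
    ]
    for choice in itertools.product([False, True], repeat=len(slots)):
        chosen = {slot for slot, on in zip(slots, choice) if on}
        seq = []
        for i, word in enumerate(words):
            seq.append(word)
            seq.extend(token for gap, token in slots
                       if gap == i and (gap, token) in chosen)
        yield seq
```

Both markers end in `+`, so no pair of them can form a link, because a link needs a `+` connector on the left word to meet a `-` connector on the right word. That is what lets the markers carry tokenizer data without affecting parsing.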
Add JQ-specific preposition continuations for PP rule 13 so JQ-bearing preposition branches keep only the allowed companions: MVp, Mj, MX#j, or Wj. Include both the direct Wj/Qp question continuation and the helper-token WJI path. Remove the corresponding CONTAINS_ONE postprocessing rule from 4.0.knowledge and add focused corpus-knowledge coverage for JQ preposition questions and relatives. Verified with link-parser corpus-knowledge.batch (0 errors), corpus-basic.batch (88 errors), corpus-fixes.batch (367 errors), corpus-fix-long.batch (9 errors), and git diff --check. Co-authored-by: OpenAI Codex <codex@openai.com>
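Roughly, a CONTAINS_ONE rule in 4.0.knowledge works as follows: if a postprocessing domain contains a link matching the rule's selector, the same domain must also contain at least one link from the rule's list. The Python sketch below models that check with a simplified link-name matcher.

- The JQ selector and the companion list are reconstructed from this commit message for illustration. They are not copied from 4.0.knowledge.
- `pp_match` only approximates the library's own pattern matching.

```python
from typing import Iterable, List


def pp_match(pattern: str, label: str) -> bool:
    """Simplified postprocessing match: the uppercase part must be identical,
    '#' in the pattern's subscript matches any single character, and a
    pattern subscript shorter than the label's matches the remainder."""
    i = 0
    while i < len(pattern) and pattern[i].isupper():
        if i >= len(label) or label[i] != pattern[i]:
            return False
        i += 1
    if i < len(label) and label[i].isupper():
        return False  # the label's uppercase part is longer
    for p, t in zip(pattern[i:], label[i:]):
        if p != "#" and p != t:
            return False
    # Pattern characters past the end of the label must be wildcards.
    return all(c in "#*" for c in pattern[len(label):])


def contains_one_ok(domains: Iterable[List[str]], selector: str,
                    required: List[str]) -> bool:
    """Each domain is a list of link labels. A domain that contains the
    selector must also contain at least one required link."""
    for domain in domains:
        if any(pp_match(selector, lbl) for lbl in domain):
            if not any(pp_match(r, lbl) for r in required for lbl in domain):
                return False
    return True


# Illustrative companion list, taken from the commit text for rule 13.
RULE13_COMPANIONS = ["MVp", "Mj", "MX#j", "Wj"]
```

Moving such a rule into the dictionary means building the JQ branches so that a companion is always present. The postprocessing check then has nothing left to reject.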
Split the generic preposition continuation into ordinary and relative forms so Mj and MX#j relative-preposition continuations are only licensed when the same preposition also has a Jw wh-object link. Keep the JQ question paths explicit through the existing companion continuation. Remove 4.0.knowledge CONTAINS_ONE rules 10 and 11 and add focused corpus-knowledge coverage for direct, comma-delimited, conjoined, and of-relative preposition examples. Validated with link-parser corpus-knowledge at 0 errors, corpus-basic at 88 errors, corpus-fixes at 367 errors, and corpus-fix-long at 9 errors. Co-authored-by: OpenAI Codex <codex@openai.com>
Add data/en/GRAMMAR-FIXES.md as maintainer-facing documentation for the English PP migration work currently present in this branch. Document the library-assisted helper-token mechanism and the status of rules 12, 13, and 10/11. Rule 12 has dictionary support but remains in 4.0.knowledge; rules 13 and 10/11 have been moved into dictionary grammar and removed from 4.0.knowledge. Co-authored-by: OpenAI Codex <codex@openai.com>
The Wj contains-one condition is now handled by the library-assisted dictionary helper-token path documented in GRAMMAR-FIXES.md, so remove the old postprocessing backstop from 4.0.knowledge. Before removal, link-parser -test=noPP:12,no-metric-extraction matched the accepted corpus expectations for corpus-knowledge, corpus-basic, corpus-fixes, and corpus-fix-long. After removal, corpus-knowledge also passes with ordinary parsing. Co-authored-by: OpenAI Codex <codex@openai.com>
Remove the naked to.r I*a+ fallback and the matching contains-one rule 6 (incorrect use of 'to'). This ports the previously validated dictionary-side migration into the en-test grammar state. Validated with link-parser -test=noPP:6,no-metric-extraction before removal, then with link-parser -test=no-metric-extraction after removal on corpus-knowledge, corpus-basic, corpus-fixes, and corpus-fix-long. corpus-knowledge also passes with ordinary parsing. Co-authored-by: OpenAI Codex <codex@openai.com>
Require relative of-whom Jr paths to pass through OFJ, and expose postnominal B#j only with the same OFJ certificate. This removes the Incorrect relative15 and Incorrect relative16 contains-one PP rules. Validated with link-parser -test=noPP:15,noPP:16,no-metric-extraction before removal, then with link-parser -test=no-metric-extraction after removal on corpus-knowledge, corpus-basic, corpus-fixes, and corpus-fix-long. corpus-knowledge also passes with ordinary parsing. Co-authored-by: OpenAI Codex <codex@openai.com>
Remove the no-metric-extraction test flag from English grammar-fix verification commands. The en-test branch does not carry metric-extraction code, so PP-rule migration checks should use ordinary link-parser runs and noPP suppression only. Co-authored-by: OpenAI Codex <codex@openai.com>
Remove the BIh contains-one postprocessing rule after validating that ordinary en-test corpus runs remain unchanged. Keep the neighboring THb and BIq predicate rules in place for separate focused work. Also correct grammar-fix verification notes to use ordinary link-parser runs in this branch, which does not carry the PP suppression or metric-extraction test controls from viterbi-extraction. Co-authored-by: OpenAI Codex <codex@openai.com>
Drop the MX#a contains-one rule because it rejects documented good parenthetical adjective examples. The fixes corpus already marks these sentences as victims of the old adjective65 postprocessing rule. Validated with ordinary en-test link-parser runs: corpus-knowledge remains 0 errors, corpus-basic remains 88, corpus-fix-long remains 9, and corpus-fixes improves from 367 to 365 errors. Co-authored-by: OpenAI Codex <codex@openai.com>
Remove the CONTAINS_ONE pronoun66 PP rule by narrowing second-object dictionary paths. Standalone O*n+ second-object uses now go through <obj2-non-pronoun>, preserving noun-like second objects while excluding Ox pronoun objects; the unrelated dCO*n+ path is unchanged. Add focused corpus-knowledge coverage and document the migration in data/en/GRAMMAR-FIXES.md. Ordinary link-parser corpus checks stayed at 0 corpus-knowledge errors, 88 corpus-basic errors, 365 corpus-fixes errors, and 9 corpus-fix-long errors. Co-authored-by: OpenAI Codex <codex@openai.com>
Require postposed Ma adjective paths to carry a local complement license in the dictionary. POST_ADJ_LIC preserves predicative and conjoined adjective uses while limiting Ma and MJX postposed variants to adjective classes with an explicit license. Remove the adjective63 rule from 4.0.knowledge, keep the neighboring Mam adjective64 rule, and add focused corpus-knowledge coverage. Ordinary link-parser checks stayed at 0 corpus-knowledge errors, 88 corpus-basic errors, 365 corpus-fixes errors, and 9 corpus-fix-long errors. Co-authored-by: OpenAI Codex <codex@openai.com>
Remove seven CONTAINS_NONE checks whose absence leaves ordinary en-test corpus behavior unchanged: rules 69, 70, 74, 75, 76, 77, and 79. Document the removed rules in data/en/GRAMMAR-FIXES.md. Validated with link-parser on corpus-knowledge.batch (0 errors), corpus-basic.batch (88 errors), corpus-fixes.batch (365 errors), and corpus-fix-long.batch (9 errors). Co-authored-by: OpenAI Codex <codex@openai.com>
Rules 49 and 50 rejected valid more/less-than predicative adjective constructions, including corpus examples such as 'He is nothing less than inspired!' and 'He is more than capable!'. Remove both checks from 4.0.knowledge and document the cleanup in GRAMMAR-FIXES.md. Ordinary link-parser corpus checks now expect corpus-fixes.batch to report 363 errors; corpus-knowledge, corpus-basic, and corpus-fix-long stay at 0, 88, and 9 errors respectively. Co-authored-by: OpenAI Codex <codex@openai.com>
Rules 51 through 54 rejected valid reduced comparative adjuncts, including the corpus example 'they report less robust earnings than previously'. Remove the four checks from 4.0.knowledge and document the cleanup in GRAMMAR-FIXES.md. Ordinary link-parser corpus checks now expect corpus-fixes.batch to report 362 errors; corpus-knowledge, corpus-basic, and corpus-fix-long stay at 0, 88, and 9 errors respectively. Co-authored-by: OpenAI Codex <codex@openai.com>
Rules 60 and 61 no longer change the agreed English corpus behavior or focused comparative examples. Remove the broad THc/TOc domain checks from 4.0.knowledge and document the redundancy cleanup. Ordinary link-parser corpus checks remain at 0 errors for corpus-knowledge.batch, 88 for corpus-basic.batch, 362 for corpus-fixes.batch, and 9 for corpus-fix-long.batch. Co-authored-by: OpenAI Codex <codex@openai.com>
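Most commits in this PR report the same four corpus error counts, so a small regression check could compare a run against the expected baseline. The helper below is hypothetical and not part of the PR. It assumes that each batch run's output ends with a summary line such as `88 errors.`, so check the actual link-parser output format before relying on the regex. The baseline uses the counts stated in this commit.

```python
import re
from typing import Dict, Optional, Tuple

# Expected counts as stated in the commit message above.
BASELINE: Dict[str, int] = {
    "corpus-knowledge.batch": 0,
    "corpus-basic.batch": 88,
    "corpus-fixes.batch": 362,
    "corpus-fix-long.batch": 9,
}

# Assumed summary-line format, e.g. "88 errors." or "1 error."
_SUMMARY = re.compile(r"(\d+) errors?\.")


def parse_error_count(output: str) -> Optional[int]:
    """Return the count from the last summary line, or None if absent."""
    matches = _SUMMARY.findall(output)
    return int(matches[-1]) if matches else None


def count_mismatches(observed: Dict[str, Optional[int]],
                     baseline: Dict[str, int] = BASELINE
                     ) -> Dict[str, Tuple[int, Optional[int]]]:
    """Map each corpus whose observed count differs to (expected, observed)."""
    return {
        name: (expected, observed.get(name))
        for name, expected in baseline.items()
        if observed.get(name) != expected
    }
```

Some commits change a count on purpose. For example, the rule 49/50 removal moved corpus-fixes.batch from 365 to 363. In those cases the baseline changes in the same commit rather than being reported as a regression.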
Rules 45 and 46 do not change the agreed English corpus behavior or focused comparative examples. Remove the broad MV#a/MV#i domain checks from 4.0.knowledge and document the redundancy cleanup. Ordinary link-parser corpus checks remain at 0 errors for corpus-knowledge.batch, 88 for corpus-basic.batch, 362 for corpus-fixes.batch, and 9 for corpus-fix-long.batch. Co-authored-by: OpenAI Codex <codex@openai.com>
Rules 55 and 57 do not change the agreed English corpus behavior or focused comparative examples, while neighboring rules 56, 58, and 59 still protect corpus-basic cases and remain active. Remove the redundant U#t and Sp#c domain checks from 4.0.knowledge and document the cleanup in GRAMMAR-FIXES.md. Ordinary link-parser corpus checks remain at 0, 88, 362, and 9 errors for the knowledge, basic, fixes, and fix-long corpora respectively. Co-authored-by: OpenAI Codex <codex@openai.com>
Rule 62 no longer changes the agreed English corpus behavior or focused infinitival-comparative examples. Remove the broad TOtc/TOt domain check from 4.0.knowledge and document the redundancy cleanup. Ordinary link-parser corpus checks remain at 0 errors for corpus-knowledge.batch, 88 for corpus-basic.batch, 362 for corpus-fixes.batch, and 9 for corpus-fix-long.batch. Co-authored-by: OpenAI Codex <codex@openai.com>
Add a concise deferred-candidates table to GRAMMAR-FIXES.md for PP rules that were tested for simple removal or identified as likely future dictionary migration work. The section records why rule 14, expletive-it rules, existential-there rules, BIq, and remaining comparative checks are still active instead of leaving those findings only in local notes. Co-authored-by: OpenAI Codex <codex@openai.com>
Record the concrete corpus effects and diagnostic conclusions for the remaining preposition, expletive-it, existential-there, predicate/question, and comparative PP migration candidates. These notes document why the tested families are not safe deletion candidates and what kind of dictionary split is still needed. Co-authored-by: OpenAI Codex <codex@openai.com>
Replace the subscripted wh-preposition object connector with a distinct JW connector family so ordinary J+ object paths cannot match wh-object continuations. Add explicit JW continuations for relative and wh-question preposition paths, preserving the wjqprep helper path for Wj licensing. Remove the rule 14 check from 4.0.knowledge, add focused corpus-knowledge coverage, and document the migration in GRAMMAR-FIXES.md. Verified with link-parser on corpus-knowledge.batch, corpus-basic.batch, corpus-fixes.batch, and corpus-fix-long.batch. Co-authored-by: OpenAI Codex <codex@openai.com>
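A new uppercase family helps because of the classic connector-matching rule. Two connectors match when their uppercase parts are identical and their lowercase subscripts agree position by position. In the subscripts, `*` matches anything, and a missing character also matches anything. So an ordinary `J+` can meet a subscripted connector such as `Jw-`, but it can never meet `JW-`.

The Python sketch below models only that rule. It ignores head/dependent markers, multi-connectors, and costs. `connectors_match` is a hypothetical helper, not a library function.

```python
from typing import Tuple


def split_connector(conn: str) -> Tuple[str, str, str]:
    """Split e.g. 'MVp+' into ('MV', 'p', '+')."""
    body, direction = conn[:-1], conn[-1]
    i = 0
    while i < len(body) and (body[i].isupper() or body[i] == "_"):
        i += 1
    return body[:i], body[i:], direction


def connectors_match(left_conn: str, right_conn: str) -> bool:
    """left_conn sits on the left word (points '+'), right_conn on the
    right word (points '-')."""
    lu, ls, ld = split_connector(left_conn)
    ru, rs, rd = split_connector(right_conn)
    if ld != "+" or rd != "-" or lu != ru:
        return False
    # zip() stops at the shorter subscript, so missing characters match.
    return all(a == b or "*" in (a, b) for a, b in zip(ls, rs))
```

For example, `connectors_match("J+", "Jw-")` is true and `connectors_match("J+", "JW-")` is false. That difference is why a distinct family needs explicit `JW` continuations wherever a wh-object path is still wanted.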
Require postnominal comparative Mam paths to carry their local license in dictionary grammar. Reuse the MJX licensed conjunction path for conjoined comparatives where the license is supplied by one adjective rather than the conjunction. Remove the corresponding postprocessing rule from 4.0.knowledge, add corpus-knowledge coverage, regenerate 4.0.dict, and document the rule migration plus the affected English link types. Verified with top-five linkage comparison against master for representative Mam examples. Also verified link-parser corpus-knowledge.batch (0 errors), corpus-basic.batch (88 errors), corpus-fixes.batch (362 errors), corpus-fix-long.batch (8 errors), regenerated 4.0.dict consistency, and git diff --check. Co-authored-by: OpenAI Codex <codex@openai.com>
Document that the removed rule 6 postprocessing check covered only unlicensed I#a paths. Other bad to.r or going-to parses through different I subscripts remain separate grammar issues, so they should not be treated as evidence that rule 6 migration is incomplete. Co-authored-by: OpenAI Codex <codex@openai.com>
Give temporal as.#while a dedicated verb-side MVSWH connector instead of ordinary MVs, so comparative EAy adjective paths can no longer attach to subordinate temporal as through broad @mv+ modifier slots. Remove the corresponding Bad comparative78 CONTAINS_NONE postprocessing rule and add corpus-knowledge coverage for valid comparative-as and temporal-as examples plus the rejected EAy/MVs bad path. Verified with link-parser corpus-knowledge.batch (0 errors), corpus-basic.batch (88 errors), corpus-fixes.batch (361 errors), corpus-fix-long.batch (8 errors), focused top-linkage comparison against master, and git diff --check. Co-authored-by: OpenAI Codex <codex@openai.com>
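A CONTAINS_NONE rule is the opposite of CONTAINS_ONE: if a domain contains the selector link, it must contain none of the listed links. For brevity, the sketch below compares labels exactly instead of using postprocessing patterns. The EAy selector with an MVs prohibition is only an illustration, loosely based on the "EAy/MVs bad path" this commit mentions. It does not reproduce the actual rule 78 text.

```python
from typing import Iterable, List


def contains_none_ok(domains: Iterable[List[str]], selector: str,
                     forbidden: List[str]) -> bool:
    """Reject the linkage if any domain holding the selector link also
    holds a forbidden link. Labels are compared exactly (simplification)."""
    banned = set(forbidden)
    for domain in domains:
        labels = set(domain)
        if selector in labels and labels & banned:
            return False
    return True
```

Temporal as.#while now has its own MVSWH connector, so the bad path cannot be built in the first place. That leaves this kind of check with nothing to reject.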
Split comparative-clause subject continuations on than.e and as.e-c so singular and plural S*c paths require explicit dictionary-side antecedent certificates. Add CMPS, CMPP, and CMPX certificate connectors for singular/mass, plural, and agreement-neutral comparative antecedents, and remove the corresponding Bad comparative58 postprocessing rule. Add focused corpus-knowledge examples and document the new connector families and rule migration in GRAMMAR-FIXES.md. Verified with link-parser corpus-knowledge.batch (0 errors), corpus-basic.batch (88 errors), corpus-fixes.batch (361 errors), corpus-fix-long.batch (8 errors), and git diff --check. Co-authored-by: OpenAI Codex <codex@openai.com>
Require comparative antecedent certificates on the than.e Cc/CV comparative-clause path, matching the existing rule 58 CMPS/CMPP/CMPX split. This prevents raw MVt + Cc/CV paths such as comparative adjective clauses from relying on postprocessing rule 59. Remove the corresponding CONTAINS_ONE rule from 4.0.knowledge and document rules 58 and 59 as one certificate-based comparative-clause migration. Validated with focused link-parser probes plus corpus-knowledge.batch, corpus-basic.batch, corpus-fixes.batch, corpus-fix-long.batch, m4 dictionary comparison, and git diff --check. Co-authored-by: OpenAI Codex <codex@openai.com>
Add the CMPC internal certificate for clausal comparative continuations. Comparative modifier paths corresponding to the old rule 56 witnesses now supply CMPC, and as.e-c / than.e Cc-CV branches require it before they can form the comparative clause. Remove the corresponding CONTAINS_ONE rule from 4.0.knowledge and document the migration in GRAMMAR-FIXES.md with focused corpus coverage. Validated with focused link-parser probes plus corpus-knowledge.batch, corpus-basic.batch, corpus-fixes.batch, corpus-fix-long.batch, m4 dictionary comparison, and git diff --check. Co-authored-by: OpenAI Codex <codex@openai.com>
Remove the CONTAINS_ONE postprocessing rule that required Qe domains to contain EEh or EAh. Tighten the ordinary-adverb dictionary expression instead: keep the direct EEh- and Qe+ how-question path, but remove the loose Qe+ alternative from the generic EE/EF adverb branch. This prevents lower-ranked parses such as how--AF--is with tall.e--Qe--is from becoming accepted after the PP rule is removed, while preserving accepted how-adverb and how-adjective question parses. Verified with link-parser corpus-knowledge.batch (0 errors), corpus-basic.batch (88 errors), corpus-fixes.batch (361 errors), corpus-fix-long.batch (8 errors), m4 regeneration consistency, focused Qe witness checks, and git diff --check. Co-authored-by: OpenAI Codex <codex@openai.com>
Keep the DWHs replacement branch in <costly-common-noun> under the same cost wrapper as the old costly determiner path. This prevents the rule 68 certificate migration from resetting the ranking of costly common nouns such as type.n. Document the cost-preservation requirement separately from accepted/rejected parity, including why the degree-of-trust preferred-analysis change is not the cost-preservation reference. Verified with focused link-parser disjunct inspection for "He won't divulge what type it is." and with corpus-knowledge.batch, corpus-basic.batch, corpus-fixes.batch, and corpus-fix-long.batch. Co-authored-by: OpenAI Codex <codex@openai.com>
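This commit separates two checks:

- Accepted/rejected parity: the same sentences parse or fail.
- Cost preservation: the first linkage keeps its reference cost.

Below is a minimal Python sketch that keeps the two checks distinct. It assumes first-linkage costs have already been collected per sentence from the reference build and the migrated build, with `None` meaning no linkage was found. The function name and data layout are hypothetical, and the example keys and numbers in the tests are placeholders, not measured costs.

```python
import math
from typing import Dict, Optional, Tuple

Costs = Dict[str, Optional[float]]  # sentence -> first-linkage cost, or None


def audit(reference: Costs, migrated: Costs, tol: float = 1e-6
          ) -> Dict[str, Tuple[str, Optional[float], Optional[float]]]:
    """Classify each reference sentence whose outcome changed as a
    'parity' change (accepted vs. rejected) or a 'cost' change."""
    findings = {}
    for sentence, ref_cost in reference.items():
        new_cost = migrated.get(sentence)  # missing counts as rejected
        if (ref_cost is None) != (new_cost is None):
            findings[sentence] = ("parity", ref_cost, new_cost)
        elif ref_cost is not None and not math.isclose(
                ref_cost, new_cost, abs_tol=tol):
            findings[sentence] = ("cost", ref_cost, new_cost)
    return findings
```

A `cost` finding is not automatically a bug. Later commits, such as the ROCL and Rule 47 notes, record intentional preferred-analysis changes, so each finding still needs a maintainer decision.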
Preserve inherited ranking costs on migrated THi certificate paths. The direct copula, inverted seem/appear, infinitival be, perfect auxiliary, and modal be paths now keep the old branch costs while using the THi certificate links for rule-20 licensing. Document the focused Rule 20 cost-preservation checks in GRAMMAR-FIXES.md. Verified with focused link-parser positives and negatives for Rule 20 and with corpus-knowledge.batch, corpus-basic.batch, corpus-fixes.batch, corpus-fix-long.batch, and corpus-failures.batch. Co-authored-by: OpenAI Codex <codex@openai.com>
Split the object-raising cleft-object carrier so TOCL paths use IOCT from infinitival to into the lower predicate, while auxiliary and perfect paths continue to use IOCL and PPOCL. This preserves the old displayed cost ladder for modal, object-raising, and perfect cleft-object examples without letting auxiliary IOCL use the object-raising no-wall branch. Document the focused Rule 31 cost-preservation checks and record the ROCL shame-of-it path as an intentional preferred-analysis change rather than an exact preferred-linkage preservation target. Verified with focused link-parser positives and negatives for Rule 31 and with corpus-knowledge.batch, corpus-basic.batch, corpus-fixes.batch, corpus-fix-long.batch, and corpus-failures.batch. Co-authored-by: OpenAI Codex <codex@openai.com>
Restore the Qe/SFIs/THb inverted filler-it path for matrix how-questions while keeping the stricter root-wall requirement that blocks embedded inversions. Preserve the old Qw/Qe root-wall preference cost so the first displayed linkage for focused accepted examples keeps its reference ranking. Document the Rules 37/38/39 cost-preservation check in GRAMMAR-FIXES.md. Verified with focused link-parser positives and negatives for Rules 37/38/39 and with corpus-knowledge.batch, corpus-basic.batch, corpus-fixes.batch, corpus-fix-long.batch, and corpus-failures.batch. Co-authored-by: OpenAI Codex <codex@openai.com>
Add a dedicated ITAF certificate for accepted local filler-it comparative than paths that formerly used AFdi. The certificate preserves the lower copula disjunct cost without exposing an AF-prefixed connector that ordinary lower-subject clauses can satisfy. Document that the remaining infinitival preferred-cost delta belongs to the separate rule-23 TOi/TOIC audit. Verified focused Rule 30 accepted examples and the ordinary-subject negative, plus link-parser corpus-knowledge, corpus-basic, corpus-fixes, corpus-fix-long, and corpus-failures count checks. Co-authored-by: OpenAI Codex <codex@openai.com>
Add TOi carrier paths for filler-it predicates so the rule 23 dictionary replacement preserves the old predicative and auxiliary costs without a crossing direct certificate link. Direct object-complement cases still use TOIC, while direct copula, inverted, modal, perfect, object-raising, and comparative examples use TTOI, ITOI, PPTOI, and PTOI carriers. Add focused corpus-knowledge examples and document the carrier link families and cost-preservation checks in GRAMMAR-FIXES.md. Verified focused first-linkage costs for direct, inverted, modal, perfect, object-raising, object-complement, and comparative filler-it TOi examples. Verified corpus-knowledge.batch remains at 0 errors and standard English corpus error counts remain unchanged for corpus-basic.batch, corpus-fixes.batch, corpus-fix-long.batch, and corpus-failures.batch. Co-authored-by: OpenAI Codex <codex@openai.com>
Add TSi carrier paths for filler-it predicates so the rule 21 dictionary replacement preserves the old predicative and auxiliary costs without relying on a crossing direct TSIC certificate. Direct object-complement cases can still use TSIC, while direct copula, inverted, modal, perfect, and object-raising examples use TTSI, ITSI, PPTSI, and PTSI carriers. Add focused corpus-knowledge examples and document the carrier link families and cost-preservation checks in GRAMMAR-FIXES.md. Verified focused first-linkage costs for direct, inverted, modal, perfect, object-raising, object-complement, and passive subjunctive-that TSi examples. Verified corpus-knowledge.batch remains at 0 errors and standard English corpus error counts remain unchanged for corpus-basic.batch, corpus-fixes.batch, corpus-fix-long.batch, and corpus-failures.batch. Co-authored-by: OpenAI Codex <codex@openai.com>
Add QIi carrier paths for filler-it predicates so the rule 22 dictionary replacement preserves the old predicative and auxiliary costs. Direct object-complement cases can still use QIIC, while direct copula, inverted, modal, perfect, and object-raising examples use TQII, IQII, PPQII, and PQII carriers. Add focused corpus-knowledge examples and document the carrier link families and cost-preservation checks in GRAMMAR-FIXES.md. Verified focused first-linkage costs for direct, inverted, modal, perfect, object-complement, and object-raising QIi examples. Verified corpus-knowledge.batch remains at 0 errors and standard English corpus error counts remain unchanged for corpus-basic.batch, corpus-fixes.batch, corpus-fix-long.batch, and corpus-failures.batch. Co-authored-by: OpenAI Codex <codex@openai.com>
Add Ci carrier paths for copular, modal, perfect, and object-raising filler-it finite-clause complements so the dictionary replacement preserves the old predicative-link costs instead of relying on crossing direct CIIC evidence. Keep local CIIC for object-complement cases, add PPCII for perfect auxiliary paths, and document focused cost-preservation checks for direct, inverted, auxiliary, object-complement, and object-raising examples. Verified with focused Rule 24 accepted-linkage checks and link-parser runs over corpus-knowledge.batch, corpus-basic.batch, corpus-fixes.batch, corpus-fix-long.batch, and corpus-failures.batch. Co-authored-by: OpenAI Codex <codex@openai.com>
Keep the replacement JW connector on the to.r wh-preposition path in the same cost position as the old optional J connector. This preserves the preferred cost for "To what do you owe your success?" while retaining the rule 12 dictionary-side witness split. Document the focused cost-preservation check for representative fronted-preposition questions and relatives. Verified with focused Rule 12 accepted-linkage checks and link-parser runs over corpus-knowledge.batch, corpus-basic.batch, corpus-fixes.batch, corpus-fix-long.batch, and corpus-failures.batch. Co-authored-by: OpenAI Codex <codex@openai.com>
Allow wh/degree B extraction noun branches to expose a local TOn link to infinitival to. This keeps the broad to.r I*a fallback removed while restoring the pre-migration preferred cost for wh-object infinitives such as "which book/books to read" and "how much water to drink". Updated GRAMMAR-FIXES.md and corpus-knowledge coverage for the cost-preserving path. Verified focused accepted-linkage costs against the pre-migration behavior and checked link-parser corpus-knowledge, corpus-basic, corpus-fixes, corpus-fix-long, and corpus-failures error counts. Co-authored-by: OpenAI Codex <codex@openai.com>
Give the certified `so much that ...` replacement branch the inherited result-modifier cost that the old `much` path paid through `[[MVa-]]`. This keeps the local RTHAT certificate while restoring the preferred cost for the focused `so much that` result-clause example. Verified focused Rule 67 accepted and rejected examples, plus the standard English corpus checks. Co-authored-by: OpenAI Codex <codex@openai.com>
Add the missing DWHs extraction branch to way.n so wh-extraction questions such as `Which way did they go?` keep the certified DWH/B path and the previous preferred displayed cost after the PP-rule migration. Verified focused Qd/MX examples and the standard English corpus checks. Co-authored-by: OpenAI Codex <codex@openai.com>
Add a narrow ECQ/EEQ certificate chain for "how much more" adverb questions so the direct Qe linkage preserves the pre-migration zero-cost path without restoring the broad ordinary-adverb Qe fallback. Verified focused Rule 19 positives and negatives, corpus-knowledge.batch, corpus-basic.batch, corpus-fixes.batch, corpus-fix-long.batch, and corpus-failures.batch. Co-authored-by: OpenAI Codex <codex@openai.com>
Apply the old plural existential inversion branch cost to the Qd/THRP path for are.v, restoring "Are there dogs in the park?" to the pre-migration first-linkage cost while leaving the singular THRS question path unchanged. Document the late cost-preservation audit result in GRAMMAR-FIXES.md. Verified with focused existential-there positives and negatives, plus link-parser error-count checks for corpus-knowledge, corpus-basic, corpus-fixes, corpus-fix-long, and corpus-failures. Co-authored-by: OpenAI Codex <codex@openai.com>
Keep the migrated PVTHB passive certificate path at the same disjunct cost as the old Pvf passive path for THb predicate-complement linkages. In particular, `An allegation was made that he did it.` again sorts at the old first cost while direct and modal THb examples stay unchanged. Verified focused zero-null THb positives and negatives, and verified error-counts for link-parser runs over corpus-knowledge, corpus-basic, corpus-fixes, corpus-fix-long, and corpus-failures. Co-authored-by: OpenAI Codex <codex@openai.com>
Expose the plural comparative antecedent certificate from the specialized currency object branch, matching ordinary plural noun behavior for comparative auxiliary clauses. This restores `I earn as many dollars as John does.` to the comparative as.e-c first linkage and old first cost. Verified focused zero-null Rule 48 positives and the adjective negative, and verified error-counts for link-parser runs over corpus-knowledge, corpus-basic, corpus-fixes, corpus-fix-long, and corpus-failures. Co-authored-by: OpenAI Codex <codex@openai.com>
Add narrow filler-it BIh predicate paths for direct, inverted, modal, and perfect auxiliary forms without restoring the old broad BI+ copular branch for ordinary subjects. The new IBIH and PPBIH carriers preserve the old preferred costs for examples such as 'It was as if he knew.', modal/perfect variants, and inverted 'Was it as if he knew?', while ordinary-subject as-if predicates continue to use the previous adverbial modifier analysis. Verified focused Rule 41 positive and negative zero-null examples, and verified no error-count changes in link-parser runs over corpus-knowledge, corpus-basic, corpus-fixes, corpus-fix-long, and corpus-failures. Co-authored-by: OpenAI Codex <codex@openai.com>
Add narrow filler-it QIi paths for the idiomatic matter predicate. Direct matters uses the filler-it subject witness, and modal, inverted, and contracted negative do-support paths use IQII to carry that witness to lower matter without restoring bare ordinary-subject QIi+ on the predicate. This restores the old first costs for 'It matters what Ted does.', 'It doesn't matter what Ted does.', 'It may matter what Ted does.', and inverted do-support variants. Ordinary-subject controls such as 'Joe doesn't matter what Ted does.' remain rejected. Verified focused positive and negative zero-null examples, and verified no error-count changes in link-parser runs over corpus-knowledge, corpus-basic, corpus-fixes, corpus-fix-long, and corpus-failures. Co-authored-by: OpenAI Codex <codex@openai.com>
Restore the classic Qd,MX contains-none rule because the directive-opener dictionary rewrite is not an exact replacement for the full postprocessing check. The dictionary tightening still handles capitalized no-comma opener paths, but PP remains authoritative for other direct-question Qd+MX domains such as noun subjects with postnominal wh modifiers. Add focused corpus coverage for the noun-subject overgeneration and update GRAMMAR-FIXES.md to document the partial migration status and current validation counts. Verified focused Qd/MX positives and negatives, verified the filler-it matter QIi costs still sort first at DIS=1.50, and verified no error-count changes in link-parser runs over corpus-knowledge, corpus-basic, corpus-fixes, corpus-fix-long, and corpus-failures. Co-authored-by: OpenAI Codex <codex@openai.com>
Adjust the THR agreement carrier branches so the dictionary replacement for the old existential-there PP checks preserves the inherited preferred costs for going-to-be and likely-to-be chains. PGTHR now offsets the certified final be no-wall cost, and PATHR carries the remaining predicative-adjective cost delta from the old been-Pa path. Document the cost-preservation details in GRAMMAR-FIXES.md. Verified focused existential-there examples against master/pre-migration first costs, including singular/plural/uncountable going-to-be paths and likely/unlikely-to-be paths. Verified link-parser corpus-knowledge.batch, corpus-basic.batch, corpus-fixes.batch, corpus-fix-long.batch, and corpus-failures.batch kept their expected error counts. Co-authored-by: OpenAI Codex <codex@openai.com>
Record the accepted-linkage audit result for Rule 47. The migrated CMPO path intentionally makes "I did as much as he did" prefer the comparative as.e-c object-clause analysis over master's less specific temporal as.#while analysis, while the ordinary "the same as it did" control keeps its first cost and the lexical-verb object-clause example remains new coverage. Verified focused link-parser comparisons against master for "I did as much as he did", "The coffee tastes the same as it did last year", and "I earned as much as John earned"; verified the focused adjective negative remains rejected. Co-authored-by: OpenAI Codex <codex@openai.com>
Restore the inherited [Ma-] cost on MJX-licensed conjoined postposed-adjective paths. The migrated Rule 63 structure still uses MJXl/MJXr to prove that one conjoined adjective carries the required complement license, but the conjunction-side Ma anchor now pays the old postnominal-adjective cost instead of bypassing it. Focused checks verify that "Many Democrats unhappy about the economy but doubtful that Clinton can be elected probably will not vote at all" sorts at DIS=3.00 with but.j-m cost 1.000, while unlicensed postposed-adjective negatives are still rejected. Corpus error-count checks were stable for corpus-knowledge, corpus-basic, corpus-fixes, corpus-fix-long, and corpus-failures. Co-authored-by: OpenAI Codex <codex@openai.com>
Add a PVTHI carrier from filler-it be auxiliaries to passive participles so passive THi predicates no longer fall back to direct THIC evidence and inherited wall/passive costs remain intact. Cover ordinary passive verbs and the taken-for-granted Vtg path. Update GRAMMAR-FIXES.md with PVTHI and focused passive cost checks. Verified focused link-parser examples for believed/taken passive filler-it forms and ordinary-subject negatives. Verified error-count stability for corpus-knowledge, corpus-basic, corpus-fixes, corpus-fix-long, and corpus-failures. Co-authored-by: OpenAI Codex <codex@openai.com>
Limit CMPS and CMPP on noun-main paths to subject/object roles, so an ordinary prepositional object cannot license a comparative subject clause merely by appearing before "than" or "as". This blocks the party --CMPS-- than path while keeping subject/object comparative antecedents available. Update GRAMMAR-FIXES.md to describe the narrower certificate scope and the passive/expletive analysis retained for the singular "More people ... than was expected" case. Verified focused link-parser examples for Rules 58 and 59 positives, number-mismatch negatives, and the "More people ... than was expected" ranking case. Verified error-count stability for corpus-knowledge, corpus-basic, corpus-fixes, corpus-fix-long, and corpus-failures. Co-authored-by: OpenAI Codex <codex@openai.com>
Add as-specific CMPOA and CMPOPA certificates so comparative object nouns can license as.e-c object clauses through an intervening PP modifier without crossing the main object link or losing the old MVp-preposition preference cost. This restores "I earn as much money in a month as John earns in a year." to the pre-migration first cost, keeps "I earn as much money as John earns." at its old first cost, and keeps the mixed "as much money than John earns" and plain-adjective comparative controls rejected. Validated focused link-parser probes and corpus counts: corpus-knowledge.batch 0 errors, corpus-basic.batch 86 errors with the intended Rule56 acceptance, corpus-fixes.batch 355 errors, corpus-fix-long.batch 8 errors, and corpus-failures.batch 1495 errors. Co-authored-by: OpenAI Codex <codex@openai.com>
Add metric-ordered extraction for limited non-generation parses, including PP-aware exact checks for migrated rule families and bounded rule-78 feedback as a PP-taught optimization. The parser keeps classic postprocessing authoritative for bounded rule-78 feedback, keeps ordinary extraction available through maintainer test flags, and documents validation paths for PP agreement, first-linkage ordering, and extractor-side PP suppressions. Co-authored-by: OpenAI Codex <codex@openai.com>
Set maintained corpus batch files to !limit=1 so batch runs exercise the metric extraction path instead of relying on high ordinary linkage limits. This makes low-limit corpus testing consistent across languages that ship batch files. Co-authored-by: OpenAI Codex <codex@openai.com>
Add Python example tests for low-limit metric extraction: a public API check that a one-linkage parse belongs to the ordinary first equal-cost bucket, and a parse-file smoke fixture for a large finite parse set. Co-authored-by: OpenAI Codex <codex@openai.com>
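The "first equal-cost bucket" check can be sketched as follows. This is a minimal illustration only, not the actual test added in this PR: it assumes each linkage can be reduced to a `(cost, key)` pair, where `cost` is a comparable number and `key` is some hashable identity for the linkage (for example, its link set or diagram text). The helper names `first_bucket` and `metric_pick_in_first_bucket` are hypothetical.

```python
from typing import Hashable, Sequence, Set, Tuple

Scored = Tuple[float, Hashable]  # (cost, linkage identity); illustrative only

def first_bucket(ordinary: Sequence[Scored], tol: float = 1e-9) -> Set[Hashable]:
    """Identities of all ordinary linkages that tie for the lowest cost."""
    best = min(cost for cost, _ in ordinary)
    return {key for cost, key in ordinary if cost - best <= tol}

def metric_pick_in_first_bucket(metric: Scored, ordinary: Sequence[Scored],
                                tol: float = 1e-9) -> bool:
    """True if the single metric-extracted linkage is one of the lowest-cost ties."""
    _, key = metric
    return key in first_bucket(ordinary, tol)

ordinary = [(2.0, "A"), (1.0, "B"), (1.0, "C"), (3.0, "D")]
print(metric_pick_in_first_bucket((1.0, "C"), ordinary))  # True
print(metric_pick_in_first_bucket((2.0, "A"), ordinary))  # False
```

Checking membership in the whole tie set, rather than equality with the first ordinary linkage, keeps such a test independent of how equal-cost linkages happen to be ordered.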
Hello Linas,
This PR changes linkage extraction for cases where a sentence has more
linkages than requested, especially combinatorial-explosion cases. Instead of
randomly sampling raw linkages, the link extractor returns linkages in cost
order, so low linkage limits such as !limit=1 can still find the lowest-cost
valid linkage.
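To illustrate what "cost order" means here, the following sketch lazily enumerates combinations in non-decreasing total cost with a priority queue, so taking only the first item plays the role of `!limit=1`. It is a toy model under stated assumptions, not the extractor in this PR: each "slot" is an independent choice point with `(label, cost)` options and a linkage's cost is simply the sum of its chosen options, whereas the real parse set is a shared and/or structure. The function name `cost_ordered` is hypothetical.

```python
import heapq
from itertools import islice
from typing import Iterator, List, Sequence, Tuple

Option = Tuple[str, float]  # (label, cost) for one choice point; illustrative only

def cost_ordered(slots: Sequence[Sequence[Option]]) -> Iterator[Tuple[float, Tuple[str, ...]]]:
    """Yield (total_cost, chosen_labels) in non-decreasing total cost.

    Toy model: slots are independent choice points and a linkage's cost is
    the sum of the chosen options' costs.
    """
    ordered: List[List[Option]] = [sorted(opts, key=lambda o: o[1]) for opts in slots]
    if any(not opts for opts in ordered):
        return  # some choice point has no options, so no complete linkage exists

    def total(idx: Tuple[int, ...]) -> float:
        return sum(ordered[k][i][1] for k, i in enumerate(idx))

    start = (0,) * len(ordered)  # the cheapest option in every slot
    heap = [(total(start), start)]
    seen = {start}
    while heap:
        cost, idx = heapq.heappop(heap)
        yield cost, tuple(ordered[k][i][0] for k, i in enumerate(idx))
        # Successors advance exactly one slot to its next-cheapest option,
        # so a successor never costs less than the combination just yielded.
        for k in range(len(ordered)):
            if idx[k] + 1 < len(ordered[k]):
                nxt = idx[:k] + (idx[k] + 1,) + idx[k + 1:]
                if nxt not in seen:
                    seen.add(nxt)
                    heapq.heappush(heap, (total(nxt), nxt))

slots = [[("a", 1.0), ("b", 0.0)], [("x", 2.0), ("y", 0.5)]]
print(list(islice(cost_ordered(slots), 1)))  # [(0.5, ('b', 'y'))]
```

Because combinations are produced on demand, a low limit only pays for the handful of heap pops it needs, instead of counting or sampling the full set.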
To make cost-ordered extraction practical, the English post-processing rule
set was reduced to the single rule that still needs classic postprocessing.
Most rules were moved into the dictionary, and a few remaining exact checks
were moved into the metric/Viterbi-style extractor. The remaining bounded
rule uses feedback from postprocessing so the extractor can skip later
same-shape violations before linkage materialization.
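The feedback idea can be sketched as a filter over cost-ordered candidates. This is a simplified illustration, not the PR's implementation. It assumes that a postprocessing check can return a "witness" (the subset of links responsible for a violation) and that any later candidate containing that same subset would fail the same way, which is how "bounded" and "same-shape" are interpreted here. For simplicity, candidates are already full link sets, whereas the real extractor is described as skipping them before materialization. `extract_with_feedback` and `toy_pp_check` are hypothetical names.

```python
from typing import Callable, FrozenSet, Iterable, List, Optional, Tuple

Link = Tuple[int, int, str]      # (left word, right word, label); illustrative only
Linkage = FrozenSet[Link]

def extract_with_feedback(candidates: Iterable[Tuple[float, Linkage]],
                          pp_check: Callable[[Linkage], Optional[FrozenSet[Link]]],
                          limit: int) -> Tuple[List[Tuple[float, Linkage]], int]:
    """Return up to `limit` accepted candidates and the number of pp_check calls."""
    bad_shapes: List[FrozenSet[Link]] = []
    accepted: List[Tuple[float, Linkage]] = []
    pp_calls = 0
    for cost, linkage in candidates:  # assumed to arrive in cost order
        if any(shape <= linkage for shape in bad_shapes):
            continue  # same-shape violation: skipped without calling postprocessing
        pp_calls += 1
        witness = pp_check(linkage)
        if witness is None:
            accepted.append((cost, linkage))
            if len(accepted) >= limit:
                break
        else:
            bad_shapes.append(witness)  # remember the offending shape
    return accepted, pp_calls

def toy_pp_check(linkage: Linkage) -> Optional[FrozenSet[Link]]:
    """Toy rule: any link labelled "X" is a violation; the witness is those links."""
    return frozenset(l for l in linkage if l[2] == "X") or None

candidates = [
    (1.0, frozenset({(0, 1, "A"), (1, 2, "X")})),  # rejected by postprocessing
    (1.5, frozenset({(0, 2, "B"), (1, 2, "X")})),  # skipped via the stored witness
    (2.0, frozenset({(0, 1, "A"), (1, 2, "C")})),  # accepted
]
accepted, calls = extract_with_feedback(candidates, toy_pp_check, limit=1)
print(accepted[0][0], calls)  # 2.0 2
```

In this toy run the second candidate never reaches postprocessing, which is the saving the feedback is meant to provide when many cheap candidates share the same violation.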
The PP-rule migration was validated on
corpus-knowledge.batch, a focused English grammar-regression corpus, and on the broader English corpora
corpus-basic.batch, corpus-fixes.batch, corpus-fix-long.batch, and
corpus-failures.batch. The observed sentence-status changes were improvements
rather than regressions. Linkage structure and disjunct costs were also
compared on many focused examples where replacement paths could affect the
preferred linkage. Performance improved substantially on the main English
corpora, while corpus-failures.batch is somewhat slower. For sentences with
combinatorial explosion, returning the lowest-cost valid linkages first is a
much more meaningful result than sorted random extraction, which can still miss
the low-cost region.
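The point about random sampling missing the low-cost region can be illustrated with a small, deliberately extreme toy space. The sketch assumes independent slots with numeric costs and uniform random choice per slot, which is a simplification of how raw linkages are actually sampled. With ten slots of four options each, only one of the 4**10 = 1,048,576 combinations has cost 0.0, so sorting a random sample of 100 almost never surfaces it, while the exact minimum is trivial to compute. The helper names are hypothetical.

```python
import random
from typing import List, Sequence

def sample_then_sort(slots: Sequence[Sequence[float]], samples: int,
                     seed: int = 0) -> List[float]:
    """Draw random complete choices (uniform per slot) and return their costs sorted."""
    rng = random.Random(seed)
    drawn = [sum(rng.choice(opts) for opts in slots) for _ in range(samples)]
    return sorted(drawn)

def exact_best(slots: Sequence[Sequence[float]]) -> float:
    """Lowest achievable total cost: the cheapest option in every slot."""
    return sum(min(opts) for opts in slots)

# Exactly one of the 4**10 combinations picks 0.0 in every slot.
slots = [[0.0, 1.0, 1.0, 1.0]] * 10
print(exact_best(slots))                # 0.0
print(sample_then_sort(slots, 100)[0])  # best of 100 random draws; never below 0.0
```

Each draw hits the zero-cost combination with probability (1/4)**10, so the sorted sample's first entry is very likely well above the true minimum.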
This is a large piece of work touching the parser, English PP-rule migration,
corpus linkage limits, tests, and documentation. Most needed is a review of
the PP-rule migration into the English dictionary, since this part is far from
my own expertise and I cannot judge Codex's grammar changes well enough by
myself. Review of the metric/Viterbi-style extractor approach and validation
strategy is also welcome.
The PP-rule migration is documented in
data/en/GRAMMAR-FIXES.md. The metric/Viterbi-style extractor design, PP-blocker handling, and validation
strategy are documented in
link-grammar/parse/METRIC-EXTRACTION.md.

Amir