
Idempotency Regressions

Minimized repros for formatter non-idempotency bugs surfaced by dfmt/idempotency-test.

Each scenario's expected block is set to what the formatter currently produces on pass 1. FormatTestRunner then re-formats that output and asserts pass 2 == pass 1. When a bug is present the runner reports:

IDEMPOTENCY FAILURE
Pass 1: …
Pass 2: …

The expected block here records the current (buggy) pass-1 output so that the non-idempotency can be isolated. When a bug is fixed, the expected block must be updated to the correct output.
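The two-pass check described above can be sketched roughly as follows. This is a minimal illustration, not FormatTestRunner's actual code: `check_idempotent`, `IdempotencyFailure`, and the two toy formatters are hypothetical names introduced only for this example.

```python
from typing import Callable


class IdempotencyFailure(AssertionError):
    """Raised when re-formatting the formatter's own output changes it."""


def check_idempotent(format_sql: Callable[[str], str], source: str, expected: str) -> str:
    # Pass 1: format the raw scenario input and compare it with the expected block.
    pass1 = format_sql(source)
    assert pass1 == expected, f"expected block does not match pass 1:\n{pass1}"
    # Pass 2: feed pass 1 back in; an idempotent formatter returns it unchanged.
    pass2 = format_sql(pass1)
    if pass2 != pass1:
        raise IdempotencyFailure(
            f"IDEMPOTENCY FAILURE\nPass 1: {pass1}\nPass 2: {pass2}"
        )
    return pass1


# Toy stand-ins for a real formatter (hypothetical, for illustration only).
def stable(sql: str) -> str:
    # Lowercases and collapses whitespace; applying it twice changes nothing.
    return " ".join(sql.lower().split())


def unstable(sql: str) -> str:
    # Appends one more trailing space on every pass, so the output never settles.
    return sql.lower() + " "
```

With these stand-ins, `check_idempotent(stable, "SELECT   1", "select 1")` passes, while `check_idempotent(unstable, "SELECT 1", "select 1 ")` matches on pass 1 and then raises `IdempotencyFailure` on pass 2.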

Bug 1: MERGE WHEN NOT MATCHED BY SOURCE THEN DELETE leaks target/source aliases — FIXED

Previously the MERGE target/source aliases (a, b) were stripped from their position on the MERGE INTO/USING lines and re-emitted inside the DELETE branch (as delete a b), where they were then silently dropped on pass 2.

Fixed by the same FieldRoleAnalyzer change as Bug 2b: targetAlias and sourceAlias are Optional[Reference(Alias)], and Alias is a sum with an Implicit variant that starts with a bare identifier. Propagating "non-clause-like" through Sum variants means the aliases are now classified as inline OTHER instead of being floated out as CHILD_CLAUSE siblings.

MERGE INTO t1 a USING t2 b ON a.id = b.id WHEN MATCHED THEN UPDATE SET v = b.v WHEN NOT MATCHED BY SOURCE THEN DELETE
merge into
t1 a using t2 b on a.id = b.id
when matched then update set v = b.v
when not matched by source then delete

Bug 2b: CREATE TABLE FK REFERENCES(cols) ordered after ON DELETE — FIXED

Previously REFERENCES u(b) rendered with (b) after ON DELETE CASCADE on pass 1, and on pass 2 the Postgres parser silently truncated the body to just create table t. Fixed by making FieldRoleAnalyzer classify Optional[Reference] whose target starts with punctuation (e.g. a parenthesized column list) as inline OTHER instead of floating it out as a CHILD_CLAUSE sibling after the main clause body.

CREATE TABLE t (a INT, FOREIGN KEY (a) REFERENCES u(b) ON DELETE CASCADE)
create table t (a int, foreign key (a) references u (b) on delete cascade)
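The classification rule shared by Bug 1 and Bug 2b can be illustrated with a toy grammar model. Everything here is an assumption made for illustration: the node classes, `is_clause_like`, and `field_role` are hypothetical and do not mirror FieldRoleAnalyzer's real API. The sketch assumes that a field is "clause-like" only when it always starts with a keyword, and that a Sum is non-clause-like as soon as any one variant is. The Explicit (AS-prefixed) alias variant is also assumed; the text above only confirms the Implicit one.

```python
from dataclasses import dataclass
from typing import Tuple, Union


@dataclass(frozen=True)
class Keyword:
    text: str


@dataclass(frozen=True)
class Ident:
    pass


@dataclass(frozen=True)
class Punct:
    text: str


@dataclass(frozen=True)
class Product:
    parts: Tuple["Node", ...]


@dataclass(frozen=True)
class Sum:
    variants: Tuple["Node", ...]


@dataclass(frozen=True)
class Opt:
    inner: "Node"


@dataclass(frozen=True)
class Reference:
    target: "Node"


Node = Union[Keyword, Ident, Punct, Product, Sum, Opt, Reference]


def is_clause_like(node: Node) -> bool:
    """True only if every way of producing the node starts with a keyword."""
    if isinstance(node, Keyword):
        return True
    if isinstance(node, (Ident, Punct)):
        return False  # bare identifiers and punctuation stay inline
    if isinstance(node, Product):
        return bool(node.parts) and is_clause_like(node.parts[0])
    if isinstance(node, Sum):
        # One non-clause-like variant makes the whole sum non-clause-like.
        return all(is_clause_like(v) for v in node.variants)
    if isinstance(node, Opt):
        return is_clause_like(node.inner)
    return is_clause_like(node.target)  # Reference


def field_role(node: Node) -> str:
    return "CHILD_CLAUSE" if is_clause_like(node) else "OTHER"


# Alias = Explicit (AS ident, assumed) | Implicit (bare ident)
alias = Sum((Product((Keyword("AS"), Ident())), Ident()))
target_alias = Opt(Reference(alias))
ref_columns = Opt(Reference(Product((Punct("("), Ident(), Punct(")")))))
on_delete = Opt(Reference(Product((Keyword("ON"), Keyword("DELETE"), Ident()))))
```

Under these assumptions `field_role(target_alias)` and `field_role(ref_columns)` both yield `"OTHER"`, so the alias and the parenthesized column list stay inline, while `field_role(on_delete)` yields `"CHILD_CLAUSE"`.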

Bug 3: CREATE TABLE column DEFAULT break placement — FIXED

Previously the formatter inserted hard breaks inside column definitions, detaching DEFAULT <expr> from its owning column. Pass 2 then re-parsed the mangled output to a truncated AST and dropped the entire column body. Fixed via FormatHint.Inline on NotNullConstraint, NullConstraint, and DefaultValue — these column-constraint products now render as inline atoms (no Doc.Clause wrapping), so they stay on the same line as their owning column instead of forcing a newline before themselves.

CREATE TABLE t (id INT NOT NULL DEFAULT 1, name VARCHAR(100) NOT NULL DEFAULT 'x', FOREIGN KEY (id) REFERENCES u(x) ON DELETE CASCADE)
create table
t (
id int not null default 1,
name varchar (100) not null default 'x',
foreign key
(id) references u (x)
on delete cascade
)
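The effect of marking a column constraint as inline can be shown with a small toy renderer. This is only a sketch: `FormatHint.INLINE` stands in for the FormatHint.Inline mentioned above, while `FormatHint.CLAUSE`, `Constraint`, and `render_column` are hypothetical names, and the real Doc.Clause layout is more involved than a plain newline.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List


class FormatHint(Enum):
    CLAUSE = "clause"  # hypothetical: wrapped as a clause, breaks before itself
    INLINE = "inline"  # rendered as an inline atom on the owning column's line


@dataclass
class Constraint:
    text: str
    hint: FormatHint


def render_column(head: str, constraints: List[Constraint]) -> str:
    out = head
    for c in constraints:
        if c.hint is FormatHint.INLINE:
            out += " " + c.text      # stays attached to the column
        else:
            out += "\n  " + c.text   # clause wrapping forces a line break
    return out


inline = [
    Constraint("not null", FormatHint.INLINE),
    Constraint("default 1", FormatHint.INLINE),
]
clause = [
    Constraint("not null", FormatHint.CLAUSE),
    Constraint("default 1", FormatHint.CLAUSE),
]
```

Here `render_column("id int", inline)` keeps everything on one line, whereas `render_column("id int", clause)` detaches each constraint onto its own line, which is the shape that previously failed to re-parse.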

Bug 4: consistentSiblings=true not idempotent under preserveBreaks=breaks_and_alignment — FIXED

Previously non-idempotent: on pass 1 only sum(…) broke across lines; on pass 2 the sibling propagation additionally broke count(…) because triviaAnalysis.hasInnerNewline() read pass-1-emitted newlines back on pass 2. Each successive pass broke more siblings.

Fixed in DocRenderer.Sequence handling by removing the trivia-newline check from the consistentSiblings propagation: it now propagates only on structural breaks and width pressure, both of which are pass-independent signals. See dfmt/docs/features/sibling-layout.md for the idempotency test cases.

SELECT customer_id, count(order_id) AS order_count, sum(total_amount) AS total FROM orders GROUP BY customer_id
select
customer_id,
count(order_id) as order_count,
sum(total_amount) as total
from orders
group by customer_id
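Why a trivia-based signal is pass-dependent while width pressure is not can be seen in a toy model of sibling propagation. The names `Call`, `render`, and `reparse` are hypothetical, the model covers only width pressure (structural breaks are omitted), and it is not the DocRenderer.Sequence implementation.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Call:
    name: str
    arg: str
    inner_newline: bool = False  # trivia: did the input have a newline inside the parens?


def render(calls: List[Call], width: int, use_trivia: bool) -> List[str]:
    overflow = [len(f"{c.name}({c.arg})") > width for c in calls]
    if use_trivia:
        # Old behaviour: propagate when the *input* contained an inner newline.
        propagate = any(c.inner_newline for c in calls)
    else:
        # Fixed behaviour: propagate on width pressure, which depends only on content.
        propagate = any(overflow)
    out = []
    for c, over in zip(calls, overflow):
        if over or propagate:
            out.append(f"{c.name}(\n  {c.arg}\n)")
        else:
            out.append(f"{c.name}({c.arg})")
    return out


def reparse(rendered: List[str]) -> List[Call]:
    # Recover name/arg from rendered text and record newline trivia, as a parser would.
    calls = []
    for text in rendered:
        name, rest = text.split("(", 1)
        arg = rest.rsplit(")", 1)[0]
        calls.append(Call(name, arg.strip(), "\n" in arg))
    return calls


def two_passes(calls: List[Call], width: int, use_trivia: bool):
    pass1 = render(calls, width, use_trivia)
    pass2 = render(reparse(pass1), width, use_trivia)
    return pass1, pass2


siblings = [Call("count", "order_id"), Call("sum", "total_amount_in_cents")]
```

With `width=20`, `two_passes(siblings, 20, use_trivia=True)` breaks only `sum(...)` on pass 1 and additionally breaks `count(...)` on pass 2, whereas `use_trivia=False` breaks both siblings on pass 1 and produces the same result on pass 2.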

Bug 5: alignTokens=AS drift under preserveBreaks=BREAKS_AND_ALIGNMENT — FIXED

Previously non-idempotent on SELECT lists where one item overflows and internally breaks while others fit. Pass 1 emitted alignment padding before AS on the overflowing item's last line; pass 2 saw that padding as "manual alignment" and activated autoAlignPaddingActive globally, which then leaked into (a) the nested Align(lhs, -, rhs) representing binary operators inside the same item, and (b) the Align(e, ., name) nodes representing field access in other items. The result was huge pad runs appearing in the middle of unrelated expressions on every subsequent pass.

Fixed in two places:

  1. DocRenderer.Align case now saves and clears pendingAlignPad around emit(a.content()) so the enclosing SeparatedList's alignment target cannot leak into nested Align nodes inside the content subtree.
  2. PreserveBreaksDetector.detectAlignmentMarker now returns the specific marker text (e.g. "AS") that the user manually padded. DocRenderer only auto-aligns Align nodes whose marker text equals that value, so padding before AS no longer causes . or - Align nodes to be treated as alignment targets.

SELECT dept, COUNT(*) AS cnt, AVG(salary) AS avg_sal, MAX(salary) - MIN(salary) AS range FROM employees WHERE salary > 50000 GROUP BY dept
SELECT
dept, count(*) AS cnt, avg(salary) AS avg_sal, max(
salary
) - min(salary) AS range
FROM employees
WHERE salary > 50000
GROUP BY dept
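The two fixes for Bug 5 can be sketched together in a toy renderer. This is a simplified illustration under stated assumptions: `ToyRenderer`, `render_item`, and the `Text`/`Align` classes are hypothetical, `pending_align_pad` only loosely mirrors pendingAlignPad, and markers are always emitted with a single space on each side, so field-access Align nodes are not modelled.

```python
from dataclasses import dataclass
from typing import List, Optional, Union


@dataclass
class Text:
    value: str


@dataclass
class Align:
    content: "Doc"  # left-hand side, may itself contain Align nodes
    marker: str     # e.g. "AS" or "-"
    rhs: "Doc"


Doc = Union[Text, Align]


class ToyRenderer:
    def __init__(self, aligned_marker: Optional[str]) -> None:
        # Marker text the user manually padded in the input, or None if none was detected.
        self.aligned_marker = aligned_marker
        self.pending_align_pad = 0
        self.out: List[str] = []

    def render_item(self, doc: Doc, pad: int) -> str:
        # The enclosing list asks for `pad` extra spaces before this item's marker.
        self.out = []
        self.pending_align_pad = pad
        self.emit(doc)
        return "".join(self.out)

    def emit(self, doc: Doc) -> None:
        if isinstance(doc, Text):
            self.out.append(doc.value)
            return
        # Fix 1: save and clear the pad so Align nodes nested in the content never see it.
        saved, self.pending_align_pad = self.pending_align_pad, 0
        self.emit(doc.content)
        self.pending_align_pad = saved
        # Fix 2: pad only before the marker that was actually aligned by the user.
        pad = self.pending_align_pad if doc.marker == self.aligned_marker else 0
        self.out.append(" " * pad + " " + doc.marker + " ")
        self.emit(doc.rhs)


item = Align(Align(Text("max(salary)"), "-", Text("min(salary)")), "AS", Text("range"))
padded = ToyRenderer(aligned_marker="AS").render_item(item, pad=3)
plain = ToyRenderer(aligned_marker=None).render_item(item, pad=3)
```

In `padded`, the three pad spaces land only before AS and the nested `-` keeps single spaces around it; in `plain`, no marker was detected, so no padding is emitted at all.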