Idempotency Regressions
Minimized repros for formatter non-idempotency bugs surfaced by dfmt/idempotency-test.
Each scenario's expected block is set to what the formatter currently produces on pass 1.
FormatTestRunner then re-formats that output and asserts pass 2 == pass 1. When a bug is
present the runner reports:
IDEMPOTENCY FAILURE
Pass 1: …
Pass 2: …
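The expected block in each scenario documents the formatter's current pass-1 output, even when
that output is buggy, so that a failure isolates the non-idempotency rather than a plain
formatting mismatch. When a bug is fixed, update the expected block to the correct output.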
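The two-pass check described above can be sketched as follows. This is a minimal illustration, not FormatTestRunner itself: `check_idempotent`, `stable`, and `unstable` are hypothetical names, and the formatter is assumed to be any function from string to string.

```python
from typing import Callable


def check_idempotent(format_fn: Callable[[str], str], source: str) -> str:
    """Format twice and fail loudly if the second pass changes anything."""
    pass1 = format_fn(source)
    pass2 = format_fn(pass1)
    if pass2 != pass1:
        raise AssertionError(
            "IDEMPOTENCY FAILURE\n"
            f"Pass 1: {pass1}\n"
            f"Pass 2: {pass2}"
        )
    return pass1


# Toy formatters, for illustration only.
def stable(sql: str) -> str:
    # Lowercasing is idempotent: a second pass changes nothing.
    return sql.lower()


def unstable(sql: str) -> str:
    # Appends a space on every pass, so pass 2 never equals pass 1.
    return sql.lower() + " "
```

Calling `check_idempotent(stable, "SELECT 1")` returns the pass-1 text, while `check_idempotent(unstable, "SELECT 1")` raises with both passes in the message.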
Bug 1: MERGE WHEN NOT MATCHED BY SOURCE THEN DELETE leaks target/source aliases — FIXED
Previously the MERGE target/source aliases (a, b) were stripped from their position on the
MERGE INTO/USING lines and re-emitted inside the DELETE branch (as delete a b). Pass 2 then
silently dropped them.
Fixed by the same FieldRoleAnalyzer change as Bug 2b: targetAlias and sourceAlias are
Optional[Reference(Alias)], and Alias is a sum with an Implicit variant that starts with a
bare identifier. Propagating "non-clause-like" through Sum variants means the aliases are now
classified as inline OTHER instead of being floated out as CHILD_CLAUSE siblings.
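The classification rule can be illustrated with a toy grammar model. Everything below is an assumption made for the sketch: the node classes, `starts_clause_like`, `classify`, and the shape of the Explicit alias variant are hypothetical, the Reference indirection is omitted, and the real FieldRoleAnalyzer may differ. The point is only that a sum counts as clause-like when every variant does, so a single bare-identifier variant makes the whole field inline.

```python
from dataclasses import dataclass
from typing import Tuple, Union


@dataclass(frozen=True)
class Keyword:
    text: str


@dataclass(frozen=True)
class Identifier:
    pass


@dataclass(frozen=True)
class Punct:
    text: str


@dataclass(frozen=True)
class Product:
    name: str
    fields: Tuple["Node", ...]


@dataclass(frozen=True)
class Sum:
    name: str
    variants: Tuple["Node", ...]


@dataclass(frozen=True)
class Opt:
    inner: "Node"


Node = Union[Keyword, Identifier, Punct, Product, Sum, Opt]


def starts_clause_like(node: Node) -> bool:
    """True only if every way the node can begin is a keyword."""
    if isinstance(node, Keyword):
        return True
    if isinstance(node, (Identifier, Punct)):
        return False
    if isinstance(node, Opt):
        return starts_clause_like(node.inner)
    if isinstance(node, Product):
        return bool(node.fields) and starts_clause_like(node.fields[0])
    if isinstance(node, Sum):
        # One non-clause-like variant makes the whole sum non-clause-like.
        return all(starts_clause_like(v) for v in node.variants)
    raise TypeError(f"unknown node: {node!r}")


def classify(field: Node) -> str:
    return "CHILD_CLAUSE" if starts_clause_like(field) else "OTHER"


# Hypothetical grammar fragments.
ALIAS = Sum("Alias", (
    Product("Explicit", (Keyword("AS"), Identifier())),
    Product("Implicit", (Identifier(),)),
))
COLUMN_LIST = Product("ColumnList", (Punct("("), Identifier(), Punct(")")))
WHEN_CLAUSE = Product("WhenClause", (Keyword("WHEN"), Keyword("MATCHED")))
```

In this model `classify(Opt(ALIAS))` and `classify(Opt(COLUMN_LIST))` both yield "OTHER", while `classify(Opt(WHEN_CLAUSE))` yields "CHILD_CLAUSE".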
MERGE INTO t1 a USING t2 b ON a.id = b.id WHEN MATCHED THEN UPDATE SET v = b.v WHEN NOT MATCHED BY SOURCE THEN DELETE
merge into
t1 a using t2 b on a.id = b.id
when matched then update set v = b.v
when not matched by source then delete
Bug 2b: CREATE TABLE FK REFERENCES(cols) ordered after ON DELETE — FIXED
Previously REFERENCES u(b) rendered with (b) after ON DELETE CASCADE on pass 1, and on
pass 2 the Postgres parser silently truncated the body to just create table t. Fixed by making
FieldRoleAnalyzer classify Optional[Reference] whose target starts with punctuation (e.g. a
parenthesized column list) as inline OTHER instead of floating it out as a CHILD_CLAUSE
sibling after the main clause body.
CREATE TABLE t (a INT, FOREIGN KEY (a) REFERENCES u(b) ON DELETE CASCADE)
create table t (a int, foreign key (a) references u (b) on delete cascade)
Bug 3: CREATE TABLE column DEFAULT break placement — FIXED
Previously the formatter inserted hard breaks inside column definitions, detaching
DEFAULT <expr> from its owning column. Pass 2 then re-parsed the mangled output to a
truncated AST and dropped the entire column body. Fixed via FormatHint.Inline on
NotNullConstraint, NullConstraint, and DefaultValue — these column-constraint products
now render as inline atoms (no Doc.Clause wrapping), so they stay on the same line as
their owning column instead of forcing a newline before themselves.
CREATE TABLE t (id INT NOT NULL DEFAULT 1, name VARCHAR(100) NOT NULL DEFAULT 'x', FOREIGN KEY (id) REFERENCES u(x) ON DELETE CASCADE)
create table
t (
id int not null default 1,
name varchar (100) not null default 'x',
foreign key
(id) references u (x)
on delete cascade
)
Bug 4: consistentSiblings=true not idempotent under preserveBreaks=breaks_and_alignment — FIXED
Previously non-idempotent: on pass 1 only sum(…) broke across lines; on pass 2 the sibling
propagation additionally broke count(…) because triviaAnalysis.hasInnerNewline() read
pass-1-emitted newlines back on pass 2. Each successive pass broke more siblings.
Fixed in DocRenderer.Sequence handling by removing the trivia-newline check from the
consistentSiblings propagation: it now propagates only on structural breaks and width
pressure, both of which are pass-independent signals. See dfmt/docs/features/sibling-layout.md
for the idempotency test cases.
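A toy model shows why dropping the trivia check restores idempotency. The names here (`Sibling`, `break_all_old`, `break_all_new`) are hypothetical and the real DocRenderer logic is certainly richer; the sketch only contrasts a decision that reads input newlines with one that depends solely on structure and width.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Sibling:
    flat_text: str               # single-line rendering of the item
    has_structural_break: bool   # break required by the layout itself
    source_had_newline: bool     # trivia: a newline seen in the input text


def break_all_old(siblings: List[Sibling], width: int) -> bool:
    # Pre-fix rule (assumed shape): also reacts to newlines found in the input.
    return any(
        s.has_structural_break
        or s.source_had_newline
        or len(s.flat_text) > width
        for s in siblings
    )


def break_all_new(siblings: List[Sibling], width: int) -> bool:
    # Post-fix rule (assumed shape): only pass-independent signals.
    return any(
        s.has_structural_break or len(s.flat_text) > width
        for s in siblings
    )


# The same two items as seen on pass 1 and on pass 2. The only difference
# is that pass 2 reads back a newline that pass 1 emitted inside sum(...).
pass1_view = [
    Sibling("count(order_id) as order_count", False, False),
    Sibling("sum(total_amount) as total", False, False),
]
pass2_view = [
    Sibling("count(order_id) as order_count", False, False),
    Sibling("sum(total_amount) as total", False, True),
]
```

With a width of 80, `break_all_old` answers differently for the two views, whereas `break_all_new` gives the same answer for both, which is the property the idempotency check requires.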
SELECT customer_id, count(order_id) AS order_count, sum(total_amount) AS total FROM orders GROUP BY customer_id
select
customer_id,
count(order_id) as order_count,
sum(total_amount) as total
from orders
group by customer_id
Bug 5: alignTokens=AS drift under preserveBreaks=BREAKS_AND_ALIGNMENT — FIXED
Previously non-idempotent on SELECT lists where one item overflows and internally breaks while
others fit. Pass 1 emitted alignment padding before AS on the overflowing item's last line;
pass 2 saw that padding as "manual alignment" and activated autoAlignPaddingActive globally,
which then leaked into (a) the nested Align(lhs, -, rhs) representing binary operators inside
the same item, and (b) the Align(e, ., name) nodes representing field access in other items.
The result was huge pad runs appearing in the middle of unrelated expressions on every
subsequent pass.
Fixed in two places:
1. The DocRenderer.Align case now saves and clears pendingAlignPad around emit(a.content()) so the enclosing SeparatedList's alignment target cannot leak into nested Align nodes inside the content subtree.
2. PreserveBreaksDetector.detectAlignmentMarker now returns the specific marker text (e.g. "AS") that the user manually padded. DocRenderer only auto-aligns Align nodes whose marker text equals that value, so padding before AS no longer causes "." or "-" Align nodes to be treated as alignment targets.
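The marker-specific part of the fix can be sketched as below. This is an assumption-laden illustration, not the actual PreserveBreaksDetector: the function names, the "two or more spaces" heuristic for manual padding, and the case-sensitive match are all hypothetical choices made for the example.

```python
import re
from typing import Iterable, Optional, Sequence


def detect_alignment_marker(
    lines: Iterable[str], markers: Sequence[str] = ("AS",)
) -> Optional[str]:
    """Return the first marker preceded by a run of 2+ spaces, else None."""
    for line in lines:
        for marker in markers:
            # Non-space, then padding, then the marker as a whole token.
            pattern = r"\S {2,}" + re.escape(marker) + r"(?!\w)"
            if re.search(pattern, line):
                return marker
    return None


def should_auto_align(node_marker: str, detected: Optional[str]) -> bool:
    # Only Align nodes carrying the detected marker become alignment targets.
    return detected is not None and node_marker == detected


padded = [
    "count(*)      AS cnt,",
    "avg(salary)   AS avg_sal,",
]
unpadded = ["count(*) AS cnt,"]
```

Here `detect_alignment_marker(padded)` returns "AS", so an Align node with marker "AS" is auto-aligned while nodes with marker "-" or "." are left alone; `detect_alignment_marker(unpadded)` returns None and nothing is auto-aligned.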
SELECT dept, COUNT(*) AS cnt, AVG(salary) AS avg_sal, MAX(salary) - MIN(salary) AS range FROM employees WHERE salary > 50000 GROUP BY dept
SELECT
dept, count(*) AS cnt, avg(salary) AS avg_sal, max(
salary
) - min(salary) AS range
FROM employees
WHERE salary > 50000
GROUP BY dept