Interesting question. I’d be comfortable up to level 2 in this list, after which I want to have my eyes on the changes. Even where code is functionally or semantically equivalent, style can make a lot of difference for comprehension and maintainability.
I’d agree, for the same reasons. Communicating intent is definitely one of the main things that separates mediocre developers from amazing ones (and software can’t check that).
It’s interesting to consider a tool that does all of levels 1-3 (and more) as a way to verify that a style refactoring hasn’t changed logic. I assume that’s what they meant when they wrote “modifications that were supposed to be no-ops but aren’t”.
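For what it's worth, here is a rough sketch of how such a check might look for the simplest case, formatting-only changes. It assumes the code is Python and that "no-op" means "parses to the same syntax tree"; the `same_ast` helper and the sample snippets are made up for illustration, not taken from any particular tool.

```python
import ast


def same_ast(before: str, after: str) -> bool:
    """Return True if both sources parse to the same syntax tree."""
    # ast.dump() leaves out line/column attributes by default, and the parser
    # discards comments, so whitespace and comment changes do not affect it.
    return ast.dump(ast.parse(before)) == ast.dump(ast.parse(after))


# Hypothetical snippets: a formatting-only change and a logic change.
original = "def area(w,h):\n    return w*h  # width times height\n"
reformatted = "def area(w, h):\n    return w * h\n"
changed = "def area(w, h):\n    return w + h\n"

print(same_ast(original, reformatted))  # True
print(same_ast(original, changed))      # False
```

This only covers pure layout changes. Anything that touches the tree itself, such as a rename or an edited docstring, would be reported as a difference even when behaviour is unchanged, so the higher levels would need something smarter (and probably a human).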