Saturday, November 04, 2017

Understanding HLint rules

Summary: I added a degenerate foldr to map rule in the new version of HLint, here I describe how it works.

I've just released HLint 2.0.10, which includes a rule to recognise uses of foldr that should really be map. As an example:

foldr (\curr acc -> (+1) curr : acc) []

Can be rewritten as:

map (\curr -> (+1) curr)

Which is much more readable (and then subsequently HLint will suggest map (+1), which is vastly clearer than the initial foldr). The change required to HLint was to add a rule to the hlint.yaml saying:

- warn: {lhs: "foldr (\\c a -> x : a) []", rhs: "map (\\c -> x)"}

You can read this statement as saying if you see foldr (\c a -> x : a) [], suggest map (\c -> x) as a warning. The HLint matching engine then applies that template to every subexpression in your program. In the rest of the post I'll talk through the steps HLint performs.

Step 1: Unification

The first step is to try unifying the template foldr (\c a -> x : a) [] against the users subexpression, namely foldr (\curr acc -> (+1) curr : acc) []. HLint is trying to find assignments for the single-letter variables in the template (namely c, a and x) which cause it to match the subexpression. Unification proceeds top-down, and if it finds anything concrete that does not match (e.g. the user had written foldl) then it fails. In this case the unification succeeds with the bindings:

  • c = curr (from the first argument to the lambda)
  • a = acc (from the second argument to the lambda)
  • x = (+1) curr (from before the cons)
  • a = acc (from after the cons)

An example of a subexpression that would have failed unification is foldl (\curr acc -> (+1) curr : acc) [].

Step 2: Validity

The next step is to check that any value which has been bound more than once is equal in all bindings. In our case only a has been used twice, and it always binds to acc, so the unification is valid.

An example of a subexpression that would have failed validity is foldr (\curr acc -> (+1) curr : xs) [].

Step 3: Substitution

Now we've got some bindings, we can substitute them into the RHS, namely map (\c -> x). We replace c and x using the bindings above. Note that a isn't mentioned on the RHS, so we don't use it. After substitution we get:

map (\curr -> (+1) curr)

Step 4: Free variable check

Consider the expression foldr (\curr acc -> f acc : acc) []. Using the rules above we'd end up with map (\curr -> f acc), which is terrible, since we've gone from referring to a locally bound acc to whatever acc is in scope (if any). To solve that, we check that the result doesn't introduce any new free variables:

(freeVars result \\ freeVars hintRuleRHS) `isSubsetOf` freeVars original

Specifically any free variables introduced in the result, which weren't in the RHS (excluding the fake unification variables), must have been in the original subexpression.

With that, for foldr, we're done. There are a handful of other steps that apply in some cases.

Step A: Dot expansion in the template

If you write a hint map f (map g x) ==> map (f . g) x then HLint notices that also implies the rule map f . map g ==> map (f . g) and adds it. As a result, you shouldn't write your HLint rules in point-free style.

Step B: Dot/dollar expansion in the subexpression

When matching a subexpression HLint will expand f $ x and (f . g) x if doing so results in a match. These operators are used commonly enough that they are often treated more like brackets than functions.

Step C: Scope matching

When unifying qualified function names, HLint uses the active imports to guess whether they match. If you have import qualified Data.Vector as V then the subexpression V.length will unify with Data.Vector.length. Since HLint doesn't have complete import information it uses a few heuristics to figure out matching.

Step D: Scope moving

Similarly to scope matching on the LHS of a rule, after matching, HLint tries to requalify any necessary values on the RHS. As an example, assuming we are producing Data.Vector.null, if we know about import qualified Data.Vector as V then we suggest V.null.

Full code

To see the full code and all supporting definitions go to the HLint source, which defines matchIdea - here I show a gently simplified version. Given scope information, a rule (LHS and RHS) and a subexpression, we optionally produce a resulting expression after substitution.

matchIdea :: Scope -> HintRule -> Exp_ -> Maybe Exp_
matchIdea s HintRule{..} original = do
    u <- unifyExp hintRuleLHS original
    u <- validSubst u
    -- need to check free vars before unqualification, but after subst (with e)
    -- need to unqualify before substitution (with res)
    let result = substitute u hintRuleRHS
    guard $ (freeVars result Set.\\ Set.filter (not . isUnifyVar) (freeVars hintRuleRHS))
            `Set.isSubsetOf` freeVars original
        -- check no unexpected new free variables
    return result

4 comments:

Bogdan Penkovsky said...

Funny enough
$ git clone https://github.com/ndmitchell/hlint
$ hlint hlint
...
32 hints

Neil Mitchell said...

$ git clone https://github.com/ndmitchell/hlint
$ cd hlint
$ hlint src
No hints

And I check that in the test suite :-)

For info, when I repeat the experiment you did I get 33 hints. There are 41 hints inside non-code locations (e.g. misc, dist, generated test files), and I ignore 7 hints using the .hlint.yaml file in the root. I'm now curious why I don't get 48 hints with your experiment...

Neil Mitchell said...

I now know what "hlint ." has more errors - because I turn on lots of additional things as errors in the .hlint.yaml file which your experiment doesn't use. That explains all the differences in numbers.

Bogdan Penkovsky said...

Nice article by the way. Thanks