Sunday, February 18, 2018

Atomic Expressions Generically

Summary: For certain hints HLint needs to determine if a Haskell expression is atomic. I wrote a generic method to generate expressions and test if they are atomic.

With HLint, if you write a statement such as:

main = print ("Hello")

You get the hint:

Sample.hs:1:14: Warning: Redundant bracket
Found:
  ("Hello")
Why not:
  "Hello"

One of the ways HLint figures out if brackets are redundant is if the expression inside the brackets is "atomic" - if you never have to bracket it in any circumstances. As an example, a literal string is atomic, but an if expression is not. The isAtom function from haskell-src-exts-util has a list of the types of expression which are atomic, but the Exp type from haskell-src-exts has 55 distinct constructors, and I don't even know what many of them do. How can we check the isAtom function is correct?

One approach is to use human thought, and that's the approach used until now, with reasonable success. However, I've recently written a script which solves the problem more permanently, generating random expressions and checking that isAtom gives the right value. In this post I'm going to outline a few features of how that script works. There are basically three steps:

1) Generate a type-correct Exp

The first step is to generate a random Exp which follows the type definition. Fortunately the Data class in Haskell lets us generate values. We define:

mkValue :: forall a . Data a => Int -> IO a
mkValue depth
    | Just x <- cast "aA1:+" = randomElem x
    | Just x <- cast [-1 :: Int, 1] = randomElem x
    | Just x <- cast [-1 :: Integer, 1] = randomElem x
    | AlgRep cs <- dataTypeRep $ dataTypeOf (undefined :: a) =
        if depth <= 0 then throwIO LimitReached else fromConstrM (mkValue $ depth - 1) =<< randomElem cs

Here we are saying that given a depth, and a result type a, we generate a value of type a. Note that the a argument is the result, but we don't pass anything in of type a. The first three lines of the body follow the pattern:

    | Just x <- cast [list_of_element] = randomElem x

This tries to convert list_of_element to [a] by using runtime type information. If it succeeds, we pick a random element from the list. If it doesn't we continue onwards.

The final case uses dataTypeRep/dataTypeOf to get a list of the constructors of a. Note that we don't have a value of a, so we make one up using undefined :: a - but that's OK because dataTypeOf promises not to look at its argument. Given a list of constructors, we pick one at random, and then call fromConstrM - which says how to create a value of the right constructor, using some argument to fill in all the fields. We pass mkValue as that argument, which causes us to recursively build up random values.

One immediate problem is what if we are building a [Int] and the random generator often picks (:)? We'll take a very long time to finish. To solve this problem we keep a depth counter, decrement it in every recursive call, and when it runs out, throwIO an exception and give up.
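
As a rough illustration of the same technique without randomness, here is a minimal sketch that always tries constructors in declaration order and returns Nothing when the depth runs out. The name mkFirst and the hardcoded candidate lists are assumptions made for this example; it is not the code from the script.

```haskell
{-# LANGUAGE ScopedTypeVariables #-}
import Data.Data

-- | Deterministic cousin of mkValue (hypothetical name): build some value of
--   type a, or give up with Nothing once the depth budget is exhausted.
mkFirst :: forall a . Data a => Int -> Maybe a
mkFirst depth
    | Just (x:_) <- cast "a" = Just x           -- succeeds only when a is Char
    | Just (x:_) <- cast [0 :: Int] = Just x    -- succeeds only when a is Int
    | AlgRep cs <- dataTypeRep $ dataTypeOf (undefined :: a) =
        if depth <= 0 then Nothing else firstJust cs
    | otherwise = Nothing
    where
        -- Try each constructor in turn, filling its fields recursively.
        firstJust :: [Constr] -> Maybe a
        firstJust [] = Nothing
        firstJust (c:rest) = case fromConstrM (mkFirst (depth - 1)) c of
            Just v -> Just v
            Nothing -> firstJust rest
```

With this sketch, mkFirst 2 :: Maybe (Int, Bool) produces Just (0,False), while a depth of 1 is not enough to fill in the Bool field and gives Nothing.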

2) Generate a parsing Exp

Now we've got a valid Exp value, but just because an Exp can be represented in the AST doesn't mean it corresponds to a valid Haskell fragment. As an example, consider Var (UnQual (Ident "Test")). That's a valid value of type Exp, but if you pretty print it you get Test, and if you parse it back you'll get Con (UnQual (Ident "Test")) - variables must start with a leading lower-case letter.

To ignore invalid expressions we try pretty printing then parsing the expression, and ignore all expressions which don't roundtrip.

3) Determine if the Exp is atomic

Now we've got a valid Exp, which we know the user could have typed in as a source program, we need to figure out if isAtom is correct. To do that, given an expression x, we check whether self-application roundtrips, i.e. x x. As a positive example, foo (a variable) roundtrips as foo foo, being foo applied to itself. However, if b then t else f when applied to itself gives if b then t else f if b then t else f, which parses back more like if b then t else (f (if b then t else f)), so an if expression is not atomic.
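
To make steps 2 and 3 concrete, here is a minimal sketch on a toy expression language rather than haskell-src-exts. The Exp type, the naive printer and the small parser are all assumptions invented for this example; only the idea (print, parse back, compare) comes from the description above.

```haskell
-- Toy language: variables, application, if, and explicit brackets.
data Exp = Var String | App Exp Exp | If Exp Exp Exp | Paren Exp
    deriving (Eq, Show)

-- | Naive pretty printer that never inserts brackets of its own.
pretty :: Exp -> String
pretty (Var x) = x
pretty (App f x) = pretty f ++ " " ++ pretty x
pretty (If c t f) = "if " ++ pretty c ++ " then " ++ pretty t ++ " else " ++ pretty f
pretty (Paren e) = "(" ++ pretty e ++ ")"

-- | Parse a whole string, failing if any tokens are left over.
parseExp :: String -> Maybe Exp
parseExp s = case expr (words (concatMap pad s)) of
    Just (e, []) -> Just e
    _ -> Nothing
    where pad c = if c `elem` "()" then [' ', c, ' '] else [c]

expr :: [String] -> Maybe (Exp, [String])
expr ("if":ts) = do
    (c, ts1) <- expr ts
    (t, ts2) <- expr =<< expect "then" ts1
    (f, ts3) <- expr =<< expect "else" ts2
    Just (If c t f, ts3)
expr ts = do
    (a, rest) <- atom ts
    apps a rest

-- | Left-associative application; a trailing if swallows the rest of the input.
apps :: Exp -> [String] -> Maybe (Exp, [String])
apps f ts@("if":_) = do
    (x, rest) <- expr ts
    Just (App f x, rest)
apps f ts = case atom ts of
    Just (x, rest) -> apps (App f x) rest
    Nothing -> Just (f, ts)

atom :: [String] -> Maybe (Exp, [String])
atom ("(":ts) = do
    (e, rest) <- expr ts
    rest' <- expect ")" rest
    Just (Paren e, rest')
atom (t:ts) | t `notElem` ["if", "then", "else", ")"] = Just (Var t, ts)
atom _ = Nothing

expect :: String -> [String] -> Maybe [String]
expect t (x:xs) | t == x = Just xs
expect _ _ = Nothing

-- | Step 2: does the value survive pretty printing and parsing?
roundtrips :: Exp -> Bool
roundtrips e = parseExp (pretty e) == Just e

-- | Step 3: an expression is treated as atomic if x x roundtrips.
isAtomByRoundtrip :: Exp -> Bool
isAtomByRoundtrip x = roundtrips (App x x)
```

In this toy, Var "foo" is reported as atomic, the if expression is not, and Var "if" is an example of a value that fits the type but fails the step 2 roundtrip.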

Putting it all together

Now we've got a random expression, and we know if the atomicity agrees with what we were expecting, we can report any differences. That approach has identified many additional patterns to match, but it's not perfect, in particular:

  • Most values either exceed the depth limit or fail to roundtrip. For 10,000 if expressions I typically get 1 or 2 which roundtrip properly. For non-if expressions it's usually 100 or so. The advantage of random testing is that throwing more time at a problem solves such issues without thinking too hard.
  • For some expressions, e.g. ParComp, I've never managed to get a valid value created. Perhaps haskell-src-exts can't parse it, or perhaps it requires constants I don't have in my hardcoded list - none of these were particularly common examples.
  • haskell-src-exts has a bug where -1 is pretty printed as (-1), which is then parsed as a paren and -1. That fails step 2, so we don't test with negative literals. As it happens, non-negative literals are atomic, but negative literals aren't, so we need to take care.
  • Some patterns appear to roundtrip successfully on their own, but not when surrounded by brackets, and on closer inspection are just very weird. For example do rec\n [] parses successfully, but with source positions that are error values, and when applied to itself pretty prints incorrectly. There's at least one haskell-src-exts bug here.
  • The program appears to leak progressively more memory. I solved that by running slices of it at a time, and didn't look too hard. I've seen cases of blowup in Data constructors when recursing, so it could be that, but it needs investigating.

As a result of all this work a future HLint will spot unnecessary brackets for 20 more types of expression, 8 more types of pattern and 7 more types of type.

Sunday, December 17, 2017

Announcing the 'debug' package

Haskell is a great language, but debugging Haskell is undoubtedly a weak spot. To help with that problem, I've just released the debug library. This library is intended to be simple and easy to use for a common class of debugging tasks, without solving everything. As an example, let's take a function we are interested in debugging, e.g.:

module QuickSort(quicksort) where
import Data.List

quicksort :: Ord a => [a] -> [a]
quicksort [] = []
quicksort (x:xs) = quicksort lt ++ [x] ++ quicksort gt
    where (lt, gt) = partition (<= x) xs

Turn on the TemplateHaskell and ViewPatterns extensions, import Debug, indent your code and place it under a call to debug, e.g.:

{-# LANGUAGE TemplateHaskell, ViewPatterns #-}
module QuickSort(quicksort) where
import Data.List
import Debug

debug [d|
   quicksort :: Ord a => [a] -> [a]
   quicksort [] = []
   quicksort (x:xs) = quicksort lt ++ [x] ++ quicksort gt
       where (lt, gt) = partition (<= x) xs
   |]

We can now run our debugger with:

$ ghci QuickSort.hs
GHCi, version 8.2.1: http://www.haskell.org/ghc/  :? for help
[1 of 1] Compiling QuickSort        ( QuickSort.hs, interpreted )
Ok, 1 module loaded.
*QuickSort> quicksort "haskell"
"aehklls"
*QuickSort> debugView

The call to debugView starts a web browser to view the recorded information.

From there you can click around to explore the computation.

I'm interested in experiences using debug, and also have a lot of ideas for how to improve it, so feedback or offers of help most welcome at the bug tracker.

If you're interested in alternative debuggers for Haskell, you should check out the GHCi debugger or Hood/Hoed.

Tuesday, December 12, 2017

Benchmarking strchr vs memchr

Summary: memchr is faster, but the obvious implementation seems to beat the builtin versions.

There are two related C functions for finding the next character in a string - strchr which assumes the string has a NUL character at the end, and memchr which takes the string length as an argument. For strings where you have both the size and a NUL terminator, which is faster? Using gcc 6.2.0 64bit MSYS2 on Windows 10, searching for a single byte 10M bytes along a string, the times were (fastest to slowest):

  • 11.05ms memchr implemented the obvious way.
  • 14.82ms strchr implemented the obvious way.
  • 14.96ms memchr provided by GCC.
  • 19.63ms strchr provided by GCC.

Trying on 3 different Windows computers, the results are all similar (but scaled).

Given the choice, you should prefer memchr over strchr.

Surprise result

The optimised implementations shipped with GCC are slower than the obvious C implementations taken from a wiki. I have absolutely no idea why. From what I can tell, the builtin versions are coded in assembly, operating on multiple bytes at a time, using SSE instructions. In contrast, the C variants operate on a single byte at a time, and aren't vectorised by the optimiser according to Godbolt. If anyone has an explanation I'd be keen to hear it.

Benchmark Code

To benchmark the variants I wrote a Haskell program using criterion. The full code and build instructions are available in this gist. I compiled the C code with -O3, using the gcc shipped with GHC 8.2.1. I've reproduced the Haskell code below, with some comments:

-- Import all the necessary pieces
import qualified Data.ByteString as BS
import qualified Data.ByteString.Unsafe as BS
import Criterion.Main
import Foreign
import Foreign.C.Types
import Data.Monoid

-- Make all the C imports
foreign import ccall unsafe "string.h memchr" memchr_std :: Ptr Word8 -> CInt -> CSize -> IO (Ptr Word8)
foreign import ccall unsafe "string.h strchr" strchr_std :: Ptr Word8 -> CInt -> IO (Ptr Word8)
foreign import ccall unsafe memchr_c :: Ptr Word8 -> CInt -> CSize -> IO (Ptr Word8)
foreign import ccall unsafe strchr_c :: Ptr Word8 -> CInt -> IO (Ptr Word8)

-- Method for ignoring the size when using strchr
ignoreSize f a b _ = f a b

-- Build a suitable string with an interesting character i bytes along
cstr i = BS.replicate i 32 <> BS.singleton 64 <> BS.replicate i 32 <> BS.singleton 0

-- The functions to benchmark
funs =
    [("memchr_std", memchr_std)
    ,("strchr_std", ignoreSize strchr_std)
    ,("memchr_c", memchr_c)
    ,("strchr_c", ignoreSize strchr_c)]

-- The main function, using Criterion
main = defaultMain
    [ seq bs $ bench (show i ++ " " ++ name) $ whnfIO $ test fun bs
    | i <- [1,10,100,1000,10000,100000,1000000,10000000]
    , let bs = cstr i
    , (name, fun) <- funs]

-- The function under test and input string
{-# NOINLINE test #-}
test fun bs =
    BS.unsafeUseAsCStringLen bs $ \(ptr,len) ->
        fun (castPtr ptr) 64 (fromIntegral len)

Sunday, November 26, 2017

Haskell exceptions and FFI wrappers

Summary: If you create a C function pointer from a Haskell function with "wrapper", and it throws an exception, bad things happen.

The Haskell FFI is incredibly powerful, allowing you to convert Haskell functions into C function pointers. In this post I'll give a quick example, then go into what happens if the Haskell function throws an exception. First, let's define a C function (and put it in a file called c.c):

int apply(int(*f)(int), int x)
{
    return f(x);
}

The piece int(*f)(int) says f is a function of type Int -> Int. The function apply is equivalent to $, restricted to int - it applies the first argument f to the second argument x and returns the result. We can call that in Haskell with:

foreign import ccall apply :: FunPtr (CInt -> IO CInt) -> CInt -> IO CInt
foreign import ccall "wrapper" wrap :: (CInt -> IO CInt) -> IO (FunPtr (CInt -> IO CInt))

main :: IO ()
main = do
    f <- wrap $ \x -> return $ x + 20
    res <- apply f 22
    print res

On the first line we wrap apply into a Haskell definition, turning a C function pointer into FunPtr. In the second we define a special "wrapper" FFI definition - the name "wrapper" is a specific string which is part of the FFI spec - it converts a Haskell function into a C function pointer. In main we put these pieces together, and other than the pervasive IO, it looks like the equivalent Haskell.

Note: In real code you should always call freeHaskellFunPtr after you have finished using a "wrapper" function, usually using bracket.
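
As a minimal sketch of that advice, the helper below pairs wrap with freeHaskellFunPtr using bracket. The name withWrapped is an assumption for this example, and to keep it self-contained (no c.c file) it calls the pointer back through a "dynamic" import named call, which stands in for the C apply function.

```haskell
{-# LANGUAGE ForeignFunctionInterface #-}
import Control.Exception (bracket)
import Foreign.C.Types (CInt)
import Foreign.Ptr (FunPtr, freeHaskellFunPtr)

foreign import ccall "wrapper" wrap :: (CInt -> IO CInt) -> IO (FunPtr (CInt -> IO CInt))

-- "dynamic" goes the other way: it turns a function pointer back into a
-- Haskell function. Here it replaces the C function apply from c.c.
foreign import ccall "dynamic" call :: FunPtr (CInt -> IO CInt) -> CInt -> IO CInt

-- | Hypothetical helper: create the function pointer, use it, and always
--   release it afterwards, even if the action throws.
withWrapped :: (CInt -> IO CInt) -> (FunPtr (CInt -> IO CInt) -> IO a) -> IO a
withWrapped f = bracket (wrap f) freeHaskellFunPtr
```

A call such as withWrapped (\x -> return $ x + 20) (\f -> call f 22) mirrors the main function above, with the pointer freed once the inner action finishes.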

Consequences of Exceptions

What happens if the function we pass to wrap throws an exception? If you read the GHC manual, you'll find an incomplete link to the FFI spec, which stays silent on the subject. Thinking it through, Haskell has exceptions, but C does not - if the Haskell code throws an exception it can't be passed back through C. Haskell can't provide a return value, so it can never resume the C code that called it. The GHC runtime can block indefinitely or kill the thread, both of which are fairly fatal for a program. As a consequence, I strongly recommend never throwing an exception from a function generated by "wrapper" - but what if we do?

Suggestion: most of the FFI addendum should probably be reproduced in the GHC manual with details around corner cases and exceptions.

Testing Exceptions

First, let's change our wrapped function to wrap $ \x -> fail "finish". Running that prints out:

bug.exe: user error (finish)

That seems like a standard exception. However, let's go further and put the entire program inside a finally, to show we have a normal Haskell exception:

main = flip finally (print "done") $ do
    ...

The output doesn't change - we never print out "done". It seems the exception thrown inside wrap aborts the program rather than bubbling up.

Suggestion: This error looks like a normal exception, but really isn't. It should say you have violated the wrapper invariant and your program has been violently aborted.

We've encountered bad behaviour, but can we go worse? Yes we can, by adding threads:

main = do
    replicateM_ 100 $ do
        forkIO $ do
            ff <- wrap $ \_ -> fail "die"
            print =<< apply ff 12
    threadDelay 10000000

Here we spawn 100 threads, each of which does an apply with an exception, then we wait for 10 seconds. The output is:

bug.exe: user error (die)
bug.exe: user error (die)
bug.exe: warning: too many hs_exit()s

It looks like there is a race condition with the exit path, causing two fatal wrapper exceptions to try and take down the runtime twice.

Suggestion: The hs_exit bug should be fixed.

Avoiding Exceptions

Now we know we need to avoid throwing exceptions inside "wrapper" functions, the obvious approach is to wrap them in a catch, e.g.:

wrap $ \x -> ... `catch` \(_ :: SomeException) -> return (-1)

Namely catch all exceptions, and replace them with -1. As usual with catch, it is important to force evaluation of the ... inside the catch (e.g. using catchDeep from safe-exceptions). If you want to recover the original exception you can capture it in an IORef and throw it after leaving C:

ref <- newIORef Nothing
f <- wrap $ \x -> ... `catch` \(e :: SomeException) -> do
    writeIORef ref $ Just e
    return (-1)
res <- apply f 22
whenJustM (readIORef ref) throwIO
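
The capture-and-rethrow pattern can be sketched as a self-contained helper using only base. The name applySafely is an assumption for this example, a "dynamic" import named call stands in for the C apply function, evaluate forces the result to weak head normal form inside the catch, and maybe replaces whenJustM.

```haskell
{-# LANGUAGE ScopedTypeVariables #-}
import Control.Exception
import Data.IORef
import Foreign.C.Types (CInt)
import Foreign.Ptr (FunPtr, freeHaskellFunPtr)

foreign import ccall "wrapper" wrap :: (CInt -> IO CInt) -> IO (FunPtr (CInt -> IO CInt))
-- Stand-in for the C function apply, so no C file is needed.
foreign import ccall "dynamic" call :: FunPtr (CInt -> IO CInt) -> CInt -> IO CInt

-- | Hypothetical helper: run f through a function pointer, never letting an
--   exception escape the wrapper, then rethrow it once we are back in Haskell.
applySafely :: (CInt -> IO CInt) -> CInt -> IO CInt
applySafely f x = do
    ref <- newIORef Nothing
    let guarded v = (f v >>= evaluate) `catch` \(e :: SomeException) -> do
            writeIORef ref (Just e)   -- remember the exception
            return (-1)               -- dummy value handed back across the boundary
    res <- bracket (wrap guarded) freeHaskellFunPtr (\p -> call p x)
    readIORef ref >>= maybe (return res) throwIO
```

With this sketch a callback that calls fail surfaces as an ordinary Haskell exception from applySafely, which a surrounding finally or try can observe.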

However, what if there is an asynchronous exception after we leave the catch but before we return to C? From my experiments, this doesn't appear to be possible. Even though getMaskingState returns Unmasked, exceptions thrown to the function inside wrapper appear to be deferred until the C code returns.

Suggestion: The documentation should clarify if my experiments are correct. Should getMaskingState return MaskedUninterruptible?

Friday, November 10, 2017

Ghcid with VS Code

Summary: New versions of Ghcid and the VS Code extension work even better together.

I've just released Ghcid v0.6.8 and the associated VS Code extension haskell-ghcid v0.2.0. Together they vastly simplify the Ghcid VS Code experience.

Ghcid reads .ghcid files

A new feature in Ghcid is that if there is a .ghcid file in the current directory it will load it as additional arguments. For example, in the Shake repo I have a .ghcid file:

-c "ghci -fno-code -ferror-spans"

Which tells ghcid to not guess at the command (e.g. using stack if you have a .stack-work) but always run ghci -fno-code -ferror-spans. This command works because I have a .ghci file which loads all the necessary files, while -fno-code speeds up compilation and -ferror-spans gives better error highlighting.

Ghcid VS Code starts ghcid

A new feature in the VS Code extension is the action Start Ghcid which starts a new ghcid terminal, writes the output to a temporary file, and uses that output to populate the Problems pane. Importantly, the extension runs ghcid with no command line arguments, so having a sensible .ghcid lets you control what it does.

The effect of these changes is that starting ghcid in VS Code now takes a few keystrokes, whereas before it required special flags, opening files, running commands etc.

Saturday, November 04, 2017

Understanding HLint rules

Summary: I added a degenerate foldr to map rule in the new version of HLint. Here I describe how it works.

I've just released HLint 2.0.10, which includes a rule to recognise uses of foldr that should really be map. As an example:

foldr (\curr acc -> (+1) curr : acc) []

Can be rewritten as:

map (\curr -> (+1) curr)

Which is much more readable (and then subsequently HLint will suggest map (+1), which is vastly clearer than the initial foldr). The change required to HLint was to add a rule to the hlint.yaml saying:

- warn: {lhs: "foldr (\\c a -> x : a) []", rhs: "map (\\c -> x)"}

You can read this statement as saying if you see foldr (\c a -> x : a) [], suggest map (\c -> x) as a warning. The HLint matching engine then applies that template to every subexpression in your program. In the rest of the post I'll talk through the steps HLint performs.

Step 1: Unification

The first step is to try unifying the template foldr (\c a -> x : a) [] against the user's subexpression, namely foldr (\curr acc -> (+1) curr : acc) []. HLint is trying to find assignments for the single-letter variables in the template (namely c, a and x) which cause it to match the subexpression. Unification proceeds top-down, and if it finds anything concrete that does not match (e.g. the user had written foldl) then it fails. In this case the unification succeeds with the bindings:

  • c = curr (from the first argument to the lambda)
  • a = acc (from the second argument to the lambda)
  • x = (+1) curr (from before the cons)
  • a = acc (from after the cons)

An example of a subexpression that would have failed unification is foldl (\curr acc -> (+1) curr : acc) [].

Step 2: Validity

The next step is to check that any value which has been bound more than once is equal in all bindings. In our case only a has been used twice, and it always binds to acc, so the unification is valid.

An example of a subexpression that would have failed validity is foldr (\curr acc -> (+1) curr : xs) [].

Step 3: Substitution

Now we've got some bindings, we can substitute them into the RHS, namely map (\c -> x). We replace c and x using the bindings above. Note that a isn't mentioned on the RHS, so we don't use it. After substitution we get:

map (\curr -> (+1) curr)

Step 4: Free variable check

Consider the expression foldr (\curr acc -> f acc : acc) []. Using the rules above we'd end up with map (\curr -> f acc), which is terrible, since we've gone from referring to a locally bound acc to whatever acc is in scope (if any). To solve that, we check that the result doesn't introduce any new free variables:

(freeVars result \\ freeVars hintRuleRHS) `isSubsetOf` freeVars original

Specifically any free variables introduced in the result, which weren't in the RHS (excluding the fake unification variables), must have been in the original subexpression.
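
Steps 1 to 4 can be sketched on a toy AST. Everything here (the Exp type, list-based bindings, the helper names app2, ruleLHS, ruleRHS and userFoldr) is an assumption made for illustration and is much simpler than HLint's real matching engine, which works on full Haskell syntax.

```haskell
import Control.Monad (guard)
import Data.Char (isAlpha)
import Data.List (nub, (\\))
import Data.Maybe (fromMaybe)

data Exp = Var String | App Exp Exp | Lam String Exp
    deriving (Eq, Show)

-- | Single-letter names in a rule are unification variables.
isUnifyVar :: String -> Bool
isUnifyVar [c] = isAlpha c
isUnifyVar _ = False

-- | Step 1: match the template (first argument) against the subexpression.
unify :: Exp -> Exp -> Maybe [(String, Exp)]
unify (Var v) e | isUnifyVar v = Just [(v, e)]
unify (Var v) (Var w) | v == w = Just []
unify (App f x) (App g y) = (++) <$> unify f g <*> unify x y
unify (Lam v b) (Lam w c)
    | isUnifyVar v = ((v, Var w) :) <$> unify b c
    | v == w = unify b c
unify _ _ = Nothing

-- | Step 2: a variable bound more than once must bind to equal expressions.
valid :: [(String, Exp)] -> Maybe [(String, Exp)]
valid bs
    | and [e1 == e2 | (v1, e1) <- bs, (v2, e2) <- bs, v1 == v2] = Just (nub bs)
    | otherwise = Nothing

-- | Step 3: replace unification variables in the RHS.
substitute :: [(String, Exp)] -> Exp -> Exp
substitute bs (Var v) = fromMaybe (Var v) (lookup v bs)
substitute bs (App f x) = App (substitute bs f) (substitute bs x)
substitute bs (Lam v b) = case lookup v bs of
    Just (Var w) -> Lam w (substitute bs b)
    _ -> Lam v (substitute bs b)

freeVars :: Exp -> [String]
freeVars (Var v) = [v]
freeVars (App f x) = nub (freeVars f ++ freeVars x)
freeVars (Lam v b) = filter (/= v) (freeVars b)

-- | Steps 1-4 combined; Nothing means the rule does not apply.
matchRule :: Exp -> Exp -> Exp -> Maybe Exp
matchRule lhs rhs original = do
    u <- unify lhs original >>= valid
    let result = substitute u rhs
    -- Step 4: no new free variables beyond the rule's own names (e.g. map)
    guard $ all (`elem` freeVars original)
                (freeVars result \\ filter (not . isUnifyVar) (freeVars rhs))
    return result

app2 :: String -> Exp -> Exp -> Exp
app2 f x y = App (App (Var f) x) y

-- foldr (\c a -> x : a) []  ==>  map (\c -> x)
ruleLHS, ruleRHS :: Exp
ruleLHS = app2 "foldr" (Lam "c" (Lam "a" (app2 ":" (Var "x") (Var "a")))) (Var "[]")
ruleRHS = App (Var "map") (Lam "c" (Var "x"))

-- | foldr (\curr acc -> hd : tl) [] in the toy AST
userFoldr :: Exp -> Exp -> Exp
userFoldr hd tl = app2 "foldr" (Lam "curr" (Lam "acc" (app2 ":" hd tl))) (Var "[]")
```

Applying matchRule ruleLHS ruleRHS to userFoldr (App (Var "(+1)") (Var "curr")) (Var "acc") gives the toy form of map (\curr -> (+1) curr), while replacing the tail with Var "xs" fails validity and using f acc as the head fails the free variable check.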

With that, for foldr, we're done. There are a handful of other steps that apply in some cases.

Step A: Dot expansion in the template

If you write a hint map f (map g x) ==> map (f . g) x then HLint notices that also implies the rule map f . map g ==> map (f . g) and adds it. As a result, you shouldn't write your HLint rules in point-free style.

Step B: Dot/dollar expansion in the subexpression

When matching a subexpression HLint will expand f $ x and (f . g) x if doing so results in a match. These operators are used commonly enough that they are often treated more like brackets than functions.
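
A rough sketch of that idea on a toy AST follows. The Exp type and the names expand, matchWithExpansion and isNotCall are assumptions for this example, and it only expands at the top of the expression, whereas the real engine works on full Haskell syntax.

```haskell
import Control.Applicative ((<|>))

-- Toy AST with infix operators kept as a separate node.
data Exp = Var String | App Exp Exp | Op String Exp Exp
    deriving (Eq, Show)

-- | One expansion step: f $ x becomes f x, and (f . g) x becomes f (g x).
expand :: Exp -> Maybe Exp
expand (Op "$" f x) = Just (App f x)
expand (App (Op "." f g) x) = Just (App f (App g x))
expand _ = Nothing

-- | Try the matcher on the expression as written, then on its expansions.
matchWithExpansion :: (Exp -> Maybe a) -> Exp -> Maybe a
matchWithExpansion match e = match e <|> (expand e >>= matchWithExpansion match)

-- | Example matcher (hypothetical): recognise not applied to an argument.
isNotCall :: Exp -> Maybe Exp
isNotCall (App (Var "not") x) = Just x
isNotCall _ = Nothing
```

Here matchWithExpansion isNotCall succeeds on the toy forms of both not $ x and (not . f) x, even though neither is literally an application of not.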

Step C: Scope matching

When unifying qualified function names, HLint uses the active imports to guess whether they match. If you have import qualified Data.Vector as V then the subexpression V.length will unify with Data.Vector.length. Since HLint doesn't have complete import information it uses a few heuristics to figure out matching.

Step D: Scope moving

Similarly to scope matching on the LHS of a rule, after matching, HLint tries to requalify any necessary values on the RHS. As an example, assuming we are producing Data.Vector.null, if we know about import qualified Data.Vector as V then we suggest V.null.
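
Steps C and D can be sketched with plain strings. This is a simplification under stated assumptions: names are strings, imports are a list of (alias, module) pairs, and none of HLint's heuristics for incomplete import information are modelled. All names below are invented for the example.

```haskell
-- | Qualified imports as (alias, module) pairs, e.g. import qualified Data.Vector as V.
type Imports = [(String, String)]

exampleImports :: Imports
exampleImports = [("V", "Data.Vector")]

-- | Split a name at its last dot: "V.length" becomes ("V", "length").
--   Operators containing dots are ignored in this sketch.
splitQual :: String -> (String, String)
splitQual name = case break (== '.') (reverse name) of
    (revBase, '.':revQual) -> (reverse revQual, reverse revBase)
    (revBase, _) -> ("", reverse revBase)

-- | Step C: expand an alias to the full module name where we know it.
resolve :: Imports -> String -> String
resolve imports name = case splitQual name of
    ("", _) -> name
    (q, base) -> maybe name (\m -> m ++ "." ++ base) (lookup q imports)

-- | Does a name in a rule match a name written by the user?
sameName :: Imports -> String -> String -> Bool
sameName imports ruleName userName = ruleName == resolve imports userName

-- | Step D: turn a fully qualified name back into the user's alias.
requalify :: Imports -> String -> String
requalify imports name = case splitQual name of
    (m, base) | Just alias <- lookup m [(md, al) | (al, md) <- imports] -> alias ++ "." ++ base
    _ -> name
```

With exampleImports, V.length matches Data.Vector.length, and requalify turns Data.Vector.null into V.null.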

Full code

To see the full code and all supporting definitions go to the HLint source, which defines matchIdea - here I show a gently simplified version. Given scope information, a rule (LHS and RHS) and a subexpression, we optionally produce a resulting expression after substitution.

matchIdea :: Scope -> HintRule -> Exp_ -> Maybe Exp_
matchIdea s HintRule{..} original = do
    u <- unifyExp hintRuleLHS original
    u <- validSubst u
    -- need to check free vars before unqualification, but after subst (with e)
    -- need to unqualify before substitution (with res)
    let result = substitute u hintRuleRHS
    guard $ (freeVars result Set.\\ Set.filter (not . isUnifyVar) (freeVars hintRuleRHS))
            `Set.isSubsetOf` freeVars original
        -- check no unexpected new free variables
    return result

Wednesday, September 20, 2017

Shake 0.16 - revised rule definitions

Summary: I've just released shake v0.16. A lot has changed, but it's probably only visible if you have defined your own rules or oracles.

Shake-0.16 is now out, 8 months since the last release, and with a lot of improvements. For full details read the changelog, but in this post I'm going to go through a few of the things that might have the biggest impact on users.

Rule types redefined

Since the first version of Shake there has been a Rule key value type class defining all rule types - for instance the file rule type has key of filename and value of modification time. With version 0.16 the type class is gone, rules are harder to write, but offer higher performance and more customisation. For people using the builtin rule types, you'll see those advantages, and in the future see additional features that weren't previously possible. For people defining custom rule types, those will require rewriting - read the docs and if things get tough, ask on StackOverflow.

The one place many users might encounter the changes is that oracle rules now require a type instance defining the mapping between the key and value types. For example, if defining an oracle for the CompilerVersion given the CompilerName, you would have to add:

type instance RuleResult CompilerName = CompilerVersion

As a result of this type instance the previously problematic askOracle can now infer the result type, removing possible sources of error and simplifying callers.
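
To see why such a type instance lets the result type be inferred, here is a minimal sketch that does not use Shake at all. The local RuleResult family, the ToyOracle class and the placeholder version string are assumptions for this example; Shake's own definitions differ.

```haskell
{-# LANGUAGE TypeFamilies #-}

-- Local stand-in for the idea: the key type determines the result type.
type family RuleResult key

newtype CompilerName = CompilerName String deriving (Eq, Show)
newtype CompilerVersion = CompilerVersion String deriving (Eq, Show)

type instance RuleResult CompilerName = CompilerVersion

-- | Hypothetical oracle class: answering a key yields its RuleResult.
class ToyOracle key where
    answer :: key -> RuleResult key

instance ToyOracle CompilerName where
    -- The version text is a placeholder, not a real lookup.
    answer (CompilerName name) = CompilerVersion (name ++ " (version placeholder)")

-- | No annotation is needed on the result of answer: the compiler works out
--   that it is a CompilerVersion from the key type alone.
versionText :: CompilerName -> String
versionText name = case answer name of
    CompilerVersion v -> v
```

Because the result type is a function of the key type, a caller cannot ask for the wrong result type by mistake.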

The redefining of rule types represents most of the work in this release.

Add cmd_

The cmd_ function is not much code, but I suspect it will turn out to be remarkably useful. The cmd function in Shake is variadic (can take multiple arguments) and polymorphic in the return type (you can run it in multiple monads with multiple results). However, because of the overloading, if you didn't use the result of cmd it couldn't be resolved, leading to ugly code such as () <- cmd args. With cmd_ the result is constrained to be m (), so cmd_ args can be used.
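
The ambiguity can be reproduced in miniature without Shake. The class Run and the functions run and run_ below are hypothetical stand-ins for cmd and cmd_, they run nothing, and they handle only a single String argument rather than being variadic.

```haskell
{-# LANGUAGE FlexibleInstances #-}

-- | Hypothetical stand-in for a command runner that is polymorphic in its
--   return type.
class Run r where
    run :: String -> r

instance Run (IO ()) where
    run _ = return ()

instance Run (IO String) where
    run command = return ("ran: " ++ command)

-- | Fixing the result to IO () removes the ambiguity, in the spirit of cmd_.
run_ :: String -> IO ()
run_ = run

example :: IO String
example = do
    () <- run "gcc -c main.c"   -- the () pattern pins the result type
    run_ "gcc -c main.c"        -- no pattern or annotation needed
    run "gcc --version"         -- resolved by the signature of example
```

Writing a bare run "gcc -c main.c" as a statement in the middle of the do block would be rejected as ambiguous, which is the situation cmd_ avoids.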

Rework Skip/Rebuild

Since the beginning Shake has tried to mirror the make command line flags. In terms of flags to selectively control rebuilding, make is based entirely on ordered comparison of timestamps, and flags such as --assume-new don't make a lot of sense for Shake. In this release Shake stops trying to pretend to be make, removing the old flags (that never worked properly) and adding --skip (don't build something even if it is otherwise required) and --build (build something regardless). Both these flags can take file patterns, e.g. --build=**/*.o to rebuild all object files. I don't think these flags are finished with, but it's certainly less of a mess than before.