tag:blogger.com,1999:blog-50487338243763105792007-10-28T10:13:34.269-05:00ska's blogskahttp://www.blogger.com/profile/07206056709352200940noreply@blogger.comBlogger13125tag:blogger.com,1999:blog-5048733824376310579.post-22220793541588534792007-07-22T18:41:00.000-05:002007-07-22T18:46:44.726-05:00end of the (sudoku) roadApologies .... I thought I'd posted this entry last week but it turns out it was saved as a draft.<br /><br />Anyway, this is the last thing I'll say about sudoku, at least for a while. But I did promise something a little more useful out of all this discussion, so lets talk a bit about generating puzzles. Be warned, though, this is all pretty hackish.<br /><br />There are all sorts of issues here, particularly the issue that to have a reliable idea of how difficult these things are for a person to solve, we probably want a heuristic solver, no a backtracking solver like the one discussed here. So as far as `difficulty' goes, we'll just have to rely on the number of clues as a rough proxy.<br /><br />We want to generate puzzles, and one way to think about this is to first figure out how to make a valid solution (i.e. a fully solved puzzle) and then keep taking away positions until we have the right number of clues. We may not be able to take away as many as we want, though, and still keep the original position the <e>unique</e> solution. For my purposes I'm only going to consider unique solutions valid, anything else is cheating.<br /><br />First off, realize that because of the way the solver works, we already know how to generate a solved puzzle, we just start of with an empty (all zeros) one:<br /><pre>BLOG> (find-first-solution (make-array 81 :initial-element 0))<br />#(2 7 5 1 4 3 8 6 9 1 3 6 7 9 8 2 4 5 8 4 9 5 6 2 7 1 3 7 1 2 8 3 5 4 9 6 4 6 3<br /> 2 1 9 5 7 8 5 9 8 4 7 6 1 3 2 6 5 4 3 2 1 9 8 7 3 2 1 9 8 7 6 5 4 9 8 7 6 5 4<br /> 3 2 1)</pre>(notice that if we had used find-all-solutions instead of find-first-solution, we'd have to wait a long time --- the code would happily go along generating every possible solved sudoku board).<br /><br />So that's great --- except it will always generate the same board. We'll have to introduce some randomness. It's important to note that I have done no analysis of this, and while it will generate a really large number of different boards, I'm not going to claim that it is uniformly distributed or anything like that. What we do is `seed' a board with a certain number of random positions:<br /><pre>(defun seed-sudoku (num-sites)<br /> (let ((soln (make-array 81 :initial-element 0))<br /> (sites (shuffle! (loop for n below 81 collecting n))))<br /> (labels ((digit-at (n)<br /> (aref soln n))<br /> (possible-digits (n)<br /> (set-difference<br /> `(1 2 3 4 5 6 7 8 9)<br /> (mapcar #'digit-at (neighbors-of n))))<br /> (nth-value-set-p (n)<br /> (/= 0 (digit-at n)))<br /> (random-element (list)<br /> (nth (random (length list)) list))<br /> (update (n)<br /> (when (or (null sites) (zerop num-sites)) <br /> (return-from seed-sudoku soln))<br /> (let ((possible-digits (possible-digits n)))<br /> (when possible-digits<br /> (setf (aref soln n) (random-element possible-digits))<br /> (decf num-sites)))<br /> (update (pop sites))))<br /> (update (pop sites)))))</pre>It is possible that for larger num-sites this will not work, and for smaller num-sites there will be a large number of solutions. So we will just repeat until we find a valid board:<br /><pre>(defun generate-filled-soduku ()<br /> (loop for x = (seed-sudoku 20)<br /> unless (null x) do<br /> (let ((res (find-first-solution x)))<br /> (when res (return-from generate-filled-soduku res)))))</pre>We allow for the possibility that a solution isn't possible (it usually is) and just repeat until we get one. I picked 20 for the random seeds as an empirical balance. Too low (e.g. 10) and occasionally the solver takes a very long time to find a solution (on the order of minutes). Too high (e.g. 30) and often seed-sudoku will fail to find enough places to place a clue (i.e., they all become constrained)<br /><br />So now we can reliably generate solved boards, we'll add a little bit of code to try and remove clues until a desired number is achieved. We start of with 81 (everything known) and randomly select ones to remove so long as doing that doesn't make the solution non-unique.<br /><pre>(defun generate-sudoku (target-clues)<br /> "returns a valid sudoku puzzle, tries to get it down to target-clues number of non-zero entries" <br /> (loop with res = (generate-filled-soduku)<br /> with sites = (shuffle! (loop for n below 81 collect n))<br /> with clues = 81<br /> until (= clues target-clues)<br /> while sites<br /> for n = (pop sites) <br /> for val = (aref res n) do<br /> (setf (aref res n) 0)<br /> (if (multiple-solutions-p res)<br /> (setf (aref res n) val)<br /> (decf clues))<br /> finally<br /> (return (values res clues))))</pre>Now for one last thing, just to give more control. I might want to know that I actually have a 25 clue puzzle, not that I tried for 25 and only made it down to 30. We now do pure rejection sampling on the number of clues. In other words, we keep running generate-sudoku with an argument of n until it is successful:<br /><pre>(defun sudoku-of-n-clues (num-clues)<br /> (loop for (res clues) = (multiple-value-list (generate-sudoku num-clues))<br /> until (= clues num-clues)<br /> finally (return res)))</pre>This becomes slow (apparently exponentially growing in the time to find a solution) at about 20 clues. For my purposes this is fine. I don't do the puzzles myself, but have been told the 20 solution ones that I was generating were pretty hard to do. If we stay above 20, things are really not very expensive:<br /><pre>BLOG> (time (loop repeat 10000 collecting (sudoku-of-n-clues (+ 22 (random 20)))))<br />Evaluation took: 195.26 seconds of real time </pre>Combining this with the pdf-output I had before would generate a book of 10,000 puzzles, which should take a while to work through.<br /><br />Of course, this really isn't the end of the road, there are a large number of improvements that could be made, and this algorithm would need some rethinking if we wanted to seriously explore the space for whatever reasons, particularly for low clue numbers. Anyway, it was just an example to show a few CL idioms etc., but I added this as a starting-off point, perhaps, for a decent puzzle generator.skahttp://www.blogger.com/profile/07206056709352200940noreply@blogger.comtag:blogger.com,1999:blog-5048733824376310579.post-34525395298330515292007-07-16T23:03:00.000-05:002007-07-17T10:17:59.166-05:00pdf output<a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://bp2.blogger.com/_mB0w88uLkeU/Rpzc--b10vI/AAAAAAAAAAY/kWoTlgyz2jY/s1600-h/screen1.png"><img style="display:block; margin:0px auto 10px; text-align:center;cursor:pointer; cursor:hand;" src="http://bp2.blogger.com/_mB0w88uLkeU/Rpzc--b10vI/AAAAAAAAAAY/kWoTlgyz2jY/s400/screen1.png" border="0" alt=""id="BLOGGER_PHOTO_ID_5088184653491589874" /></a><br /><br />A short post about cl-pdf, to get the above output. I've found it handy to hack up a number of outputs for figures and the odd bits and pieces. So, since I had a puzzle generator for sudoku puzzle, I thought I make something to print them (I'll still do a post later on generating puzzles).<br /><br />This is a hack and I haven't bothered to clean it up but might be useful as an example so I'll post it.<br /><br />First off we loop over a list of puzzles, printing them four to a page and allowing for a short page left over. This handles opening a document and writing it out:<br /><pre><br />(defun pdf-sudoku (sudokus file)<br /> "print a list, sudokus, of puzzles to file name file. Print 4 to a page"<br /> (pdf:with-document () <br /> (loop until (< (length sudokus) 4) do<br /> (loop repeat 4 collecting (pop sudokus) into list<br /> finally (pdf-sudoku-4up-page list))<br /> finally<br /> (unless (null sudokus)<br /> (pdf-sudoku-4up-page sudokus)))<br /> (pdf:write-document file)))</pre><br />So that looks ok so long as the 4up function actually prints puzzles somehow. This handles the logic of translating around to 4 positions on the page. We're hard coding in 8.5" by 11" paper (72 points per inch) here, and doing a couple of calculations to get the offsets so everything is centered.<br /><pre><br />(defun pdf-sudoku-4up-page (sudokus)<br /> "print between 1 and 4 puzzles on a single page"<br /> (pdf:with-page ()<br /> (let* ((helvetica (pdf:get-font "Helvetica" :win-ansi-encoding))<br /> (spacer (/ 72 4)) ; 1/4 inch<br /> (width (- (/ (* 8.5 72) 2) (* 2 spacer)))<br /> (voffset (/ (- (* 11 72) (* 2 width) (* 4 spacer)) 2))) <br /><br /> (pdf:translate spacer voffset)<br /> (when (first sudokus)<br /> (fill-in-grid (first sudokus) helvetica width))<br /><br /> (pdf:translate 0 (+ width spacer))<br /> (when (second sudokus)<br /> (fill-in-grid (second sudokus) helvetica width))<br /> <br /> (pdf:translate (+ width spacer) 0)<br /> (when (third sudokus)<br /> (fill-in-grid (third sudokus) helvetica width))<br /> <br /> (pdf:translate 0 (- (+ width spacer)))<br /> (when (fourth sudokus)<br /> (fill-in-grid (fourth sudokus) helvetica width)))))</pre><br />Ok, assuming that works we still need to draw a grid and fill in the numbers. I separated out a function to draw the lines, draw-grid.<br /><pre><br />(defun draw-grid (x0 y0 width height cellsx cellsy)<br /> (pdf:move-to x0 y0)<br /> (loop for x from x0 upto (+ x0 width) by (/ width cellsx) do<br /> (pdf:move-to x y0)<br /> (pdf:line-to x (+ y0 height))<br /> (pdf:stroke))<br /> (loop for y from y0 upto (+ y0 height) by (/ height cellsy) do<br /> (pdf:move-to x0 y)<br /> (pdf:line-to (+ x0 width) y)<br /> (pdf:stroke)))<br /><br /><br />(defun fill-in-grid (array font &optional (scale (* 8.5 72)))<br /> "fill in numbers from the sudoku in array if they are set (non zero)"<br /> (let ((font-size (/ scale 10))<br /> (step-size (/ scale 9)))<br /> (pdf:set-line-width 1.0)<br /> (draw-grid 0 0 scale scale 9 9)<br /> (pdf:set-line-width 3.0)<br /> (draw-grid 0 0 scale scale 3 3)<br /> (pdf:in-text-mode<br /> (pdf:move-text 0 (- scale step-size))<br /> (pdf:set-font font font-size)<br /> (multiple-value-bind (w h x) <br /> (pdf:get-char-size #\9 font font-size)<br /> (pdf:move-text (/ (- step-size w) 2) (/ (- step-size h) 2)))<br /> (loop with n = -1<br /> repeat 9 do<br /> (loop repeat 9 <br /> for x = (aref array (incf n))<br /> unless (= 0 x) do<br /> (pdf:draw-text (format nil "~D" x))<br /> do<br /> (pdf:move-text (/ scale 9) 0)<br /> finally (pdf:move-text (- scale) (- (/ scale 9))))))))</pre><br />That's it then. There is all sorts of hard coded nonsense, and it should probably handle 2up and 1up as well as different paper sizes. Oh, and let us change the font. But like I said, it's just hacked together and I've tried to isolate the bits that have the hardcoding of sizes, etc. So perhaps it will be useful as an example.<br /><br />I owe you a short post on generating puzzles, and then will be done with these sudokus ... probably.skahttp://www.blogger.com/profile/07206056709352200940noreply@blogger.comtag:blogger.com,1999:blog-5048733824376310579.post-6519427382262516052007-07-12T01:08:00.000-05:002007-07-12T01:44:40.312-05:00optimizing the solver: wrap-upI thought I'd add a few closing remarks to the optimization comments. This post won't make much sense unless you've read the previous few solver posts. <br /><br />1) Particularly if you are new to lisp and coming from a c-family language, you may have a bad feel for how fast your lisp implementation can walk across a list. I made a version that stored all the neighborhood information in an array instead of a vector of lists -- it was slower.<br /><br />2) You might also question the speed of using recursion. I rewrote the recursive version of solve12 as a loop. It was both much more difficult to understand, and slower. This was because to use the same algorithm, I had to maintain lists of both the unassigned sites, and the ones I'd visited --- the call stack is doing the latter for me, and it does it quickly. With solve5, I made a slightly faster non-recursive version because I could compute where the last site I tried was, but in the more sophisticated solver you can't predict the order that way.<br /><br />3) There are any number of other tweaks like this you might try, I tried a few, but by no means exhausted the possibilities. Both the approaches in 1) and 2) could probably be finessed into at least catching up with our solver12, if not exceeding it.<br /><br /><br />I think I've hit (passed, really) diminishing returns with this code. You could spend a lot more time on it, but keep in mind the following:<br /><br />- any big gain i more likely to come from a better algorithm, if at all<br /><br />- at this point, we are pretty much hitting implementation dependent improvements. For example, I had one version using a do* loop that was a bit slower, I'm not sure why.<br /><br />- finding the reasons for questions like the previous will probably involve a combination of profiling an examining the actual code generation (via disassemble) in order to find out what is really going on.<br /><br />- the painstaking process described in the previous point takes a lot of time, is bug-prone, and tends to end up with code that is hard to follow and maintain. Your carefully measured `optimal' is unlikely to be the same for different implementations. It might not hold for the next upgrade of your compiler.<br /><br />- In my experience, making fast code say, twice as fast as it was is often a lot more work than making slow code thousands of times faster<br /><br /><br />All the above taken together means I pretty much never tweak code this far, let alone further if I can avoid it. Unless there is a compelling reason. Turning six months of runtime into a couple of weeks is worth spending a few days on. Quadrupling the speed of some one-off thing that will only run for a day would be a waste of time even if you just consider the time spent coding --- when you add in the much higher probability of some subtle bug that invalidates your results, it's <i>really</i> counterproductive....<br /><br />All that said, sometimes you need efficient code in research, and I hope this little series has left you with the idea that this is really quite possible with lisp, so that won't scare you off using the expressive power of a language like this. <br /><br />I'll do one more post in this series, making a simple little puzzle generator and some pdf output for it.skahttp://www.blogger.com/profile/07206056709352200940noreply@blogger.comtag:blogger.com,1999:blog-5048733824376310579.post-80325060905786298022007-07-08T15:07:00.000-05:002007-07-08T16:42:17.771-05:00optimizing the solver part IV: now we're cooking with gasLast time we established that while solve5 was pretty fast at what it does, it's not a great algorithm. As mentioned, I found a files somewhere that I believe originated <a href="http://magictour.free.fr/top2365">here</a>, which we can used for testing. If I add a small change to solve5 to just count how many times it calls update, look at what we get for the first 100 puzzles in this file:<br /><pre><br />BLOG> (loop repeat 100 for x in *top2365* summing (solve5 x))<br />77405348<br /></pre><br />Thats roughly 750,000 calls per puzzle, on average. Lets see if we can't do something about that. I'm going to make a simple and fairly obvious change, but it will require rethinking things. What we want to do is to visit the sites in order of most-constrained to least-constrained. As it is now, we just go and look at each neighbor of the current site once, to see what digits are constrained. If we stay with vector of bits to describe constrained/unconstrained, we have a problem. This is because when you set a new constraint all you do is set the bit to one, but if you want to clear a constraint, you'll have to check all the other sites that might affect that position, to see if it is really cleared (this because a site can be constrained from having a particular digit by multiple other sites). <br /><br />All that running around and checking seems like a lot of work, so I'm going to use an array that counts the number of contraints a particular position has, and also stores the total number. So we need to keep this array updated, and we'll re-introduce set-digit and clear-digit to wrap that logic up. Other than that, we do something very similar to solve5, except we sort the candidate sites by how constrained they are. Here's a first cut:<br /><pre><br />(defun solve7 (initialstate)<br /> (let ((soln (make-array `(81) :element-type '(integer 0 9)))<br /> (constraints (make-array `(81 10) :element-type '(integer 0 20) :initial-element 0))<br /> (unassigned `())<br /> (calls 0))<br /> (labels ((sort-unassigned ()<br /> (setf unassigned (sort unassigned #'(lambda (n m) (> (aref constraints n 0) (aref constraints m 0))))))<br /> (set-digit (n d)<br /> (setf (aref soln n) d)<br /> (over-neighbors (i n)<br /> (when (= 1 (incf (aref constraints i d)))<br /> (incf (aref constraints i 0)))))<br /> (clear-digit (n d)<br /> (over-neighbors (i n)<br /> (when (= 0 (decf (aref constraints i d)))<br /> (decf (aref constraints i 0))))<br /> (setf (aref soln n) 0))<br /> (solve (n)<br /> (incf calls)<br /> (loop for digit from 1 to 9<br /> when (zerop (aref constraints n digit)) do<br /> (set-digit n digit)<br /> (sort-unassigned)<br /> (if unassigned<br /> (solve (pop unassigned))<br /> (return-from solve7 (values calls soln)))<br /> (clear-digit n digit)<br /> (sort-unassigned)<br /> finally<br /> (push n unassigned))))<br /> ;;setup<br /> (loop for n below 81<br /> for x across initialstate<br /> if (= 0 x) do<br /> (push n unassigned)<br /> else do<br /> (set-digit n x))<br /> (sort-unassigned)<br /> (solve (pop unassigned)))))<br /></pre><br />And the result:<br /><pre><br />BLOG> (loop repeat 100 for x in *top2365* summing (solve7 x))<br />107568<br /></pre><br />From 77 million calls down to a hundred thousand? Sure, each step is a little more expensive, but there are a <span style="font-weight:bold;">lot</span> fewer steps.<br /><br />The only thing about this approach is that the sort is really heavyweight (go ahead and measure this....). I played around with a couple of options, and what I ended up with was code to do a a search through the list of possible options, and remove (one of) the most constrained one and return it. This code looks hairier than anything we've done so far, but it is fast. The complexity is there to manage unlinking the list. Here it is:<br /><pre><br />(defun solve8 (initialstate)<br /> (declare (optimize speed (debug 0) (safety 0)))<br /> (let ((soln (make-array `(81) :element-type '(integer 0 9)))<br /> (constraints (make-array `(81 10) :element-type '(integer 0 20) :initial-element 0))<br /> (unassigned `()))<br /> (labels ((find-next-site ()<br /> (loop with maxval = (aref constraints (car unassigned) 0)<br /> with head = nil<br /> for a on unassigned<br /> for b = (cadr a)<br /> while b<br /> for val = (aref constraints b 0) <br /> when (> val maxval) do<br /> (setf head a maxval val)<br /> finally<br /> (return (if head<br /> (prog1 (cadr head)<br /> (setf (cdr head) (cddr head)))<br /> (pop unassigned)))))<br /> (set-digit (n d)<br /> (setf (aref soln n) d)<br /> (over-neighbors (i n)<br /> (when (= 1 (incf (aref constraints i d)))<br /> (incf (aref constraints i 0)))))<br /> (clear-digit (n d)<br /> (over-neighbors (i n)<br /> (when (= 0 (decf (aref constraints i d)))<br /> (decf (aref constraints i 0))))<br /> (setf (aref soln n) 0))<br /> (solve (n)<br /> (loop for digit from 1 to 9<br /> when (zerop (aref constraints n digit)) do<br /> (set-digit n digit)<br /> (if unassigned<br /> (solve (find-next-site))<br /> (return-from solve8 soln))<br /> (clear-digit n digit)<br /> finally<br /> (push n unassigned))))<br /> ;;setup<br /> (loop for n below 81<br /> for x across initialstate<br /> if (= 0 x) do<br /> (push n unassigned)<br /> else do<br /> (set-digit n x))<br /> (solve (find-next-site)))))<br /></pre><br />You'll notice that on this final version I've gone ahead and used maximally fast (and minimally safe) optimization declares. I did the same for solve5, and it makes a small difference. The algorithmic change though, is anything but a small difference:<br /><pre><br />BLOG> (time (loop for x in *top2365* do (solve5 x)))<br />Evaluation took:<br /> 1650.157 seconds of real time<br /> 1586.752 seconds of user run time<br /> 23.794111 seconds of system run time<br /> [Run times include 112.596 seconds GC run time.]<br /> 0 calls to %EVAL<br /> 0 page faults and<br /> 185,974,228,944 bytes consed.<br />BLOG> (time (loop for x in *top2365* do (solve7 x)))<br />Evaluation took:<br /> 46.938 seconds of real time<br /> 46.745686 seconds of user run time<br /> 0.144156 seconds of system run time<br /> [Run times include 0.2 seconds GC run time.]<br /> 0 calls to %EVAL<br /> 0 page faults and<br /> 226,594,424 bytes consed.<br />NIL<br />BLOG> (time (loop for x in *top2365* do (solve8 x)))<br />Evaluation took:<br /> 3.55 seconds of real time<br /> 3.533416 seconds of user run time<br /> 0.012559 seconds of system run time<br /> [Run times include 0.037 seconds GC run time.]<br /> 0 calls to %EVAL<br /> 0 page faults and<br /> 40,511,344 bytes consed.<br />NIL<br />BLOG> (/ 3.55d0 (length *top2365*))<br />0.0015010570824524313d0<br /></pre><br />An average time of 0.0015s/puzzle looks pretty good, and it's 400 times faster than solve5, which was already 10,000 times faster than our first attempt, at least on a particular puzzle. I'm not going to run that one on 2000+ puzzles for you, but if you've got a few days of cpu to put to it, go ahead....<br /><br />This post is pretty long, so I'll leave it like that for now and recap the optimization stuff next time. Then I'm going to go ahead and build a simple puzzle generator (with pdf output) out of solve8 (at least, solve8 with the restart put back in). For those who are interests, putting the restart back in will raise th total runtime from about 3.5s to about 4.0s. Not nothing, but hardly the end of the world.<br /><br />Oh, and one last thing. It's always a good idea to sanity-check this sort of thing as you go along:<br /><pre><br />BLOG> (notany #'null (mapcar #'validate (loop for x in *top2365* collecting (solve8 x))))<br />T<br /></pre><br />Here validate is a simple routine that checks that every digit is between 1..9, and no digit is the same value as any neighbor it has in row,column, or cell.skahttp://www.blogger.com/profile/07206056709352200940noreply@blogger.comtag:blogger.com,1999:blog-5048733824376310579.post-65314388975572915632007-07-07T20:49:00.000-05:002007-07-07T21:52:23.592-05:00optimizing the solver part III: a bit more about measurementThe code called `solve5' is reasonably fast. I suspect we could make it a little faster, but at this point we've gone about as far with it as I'd like. After all, we're just tuning a particular algorithm, and eventually we have to hit diminishing returns --- as it is we have a clean, easy to follow implementation that is many orders of magnitude faster than to naive code we started off with.<br /><br />We haven't really asked if the algorithm is good. It isn't, particularly. It's not terrible, but it really isn't that good. So it doesn't make a lot of sense to keep tuning it. Which brings me to the second issue I mentioned a few posts ago; in optimizing code we have to balance effort spent on better algorithms vs. effort spent on tuning.<br /><br />I talked a bit about profilers. If you're doing this sort of thing, you should acquaint yourself with yours. If you're using a free lisp, you may not have a lot of options (for example, sbcl on darwin/ppc doesn't seem to have a working statistical profiler). There's lots of information out there, but most of it is implementation dependent and I won't spend time on it now. There are other things you can do; what follows is an example.<br /><br />If you think about the kind of search problem we are doing here, you're bound to start wondering about the order of search. As it is, we just look at the sites in order, lowest to highest, and try and fill in the digits. It should be clear that if someone knew how you were doing this, they could construct a puzzle that was very difficult for you to solve (but might be quite easy if you searched highest-to-lowest, say). This comes back to the issue I earlier raised -- that performance on this problem is controlled by two things: how long it takes you to try a single vertex (i.e. position in the puzzle), and how many times you have to try. This latter part is strongly affected by the order in which we look.<br /><br />If we're worried about this sort of thing, it doesn't make a lot of sense to look at only one or two puzzles, does it? That tells us a lot about the first part of the issues (how fast can one vertex be tested) but nothing much about how many things I have to try on average. <br /><br />Before we do something about that, here is a little experiment. I made a new solver, solve6, with a small change:<br /><pre><br />...<br /> else do<br /> (setf (aref soln n) x))<br />(setf unassigned (shuffle! unassigned))<br />...<br /></pre><br />At the end of the loop setting things up, I randomly shuffle the list of unassigned sites. So every time I run solve6, I look at the sites in a different (random) order. This should have only a tiny effect on execution speed, because it a) only happens once, and b) is done with fast code. Look what happens:<br /><pre><br />BLOG> (avg-runtime 30 (solve5 *easy*))<br />1.3333333333333334d-4<br />1.1954022988505748d-7<br />BLOG> (avg-runtime 30 (solve6 *easy*))<br />0.47433333333333333d0<br />0.8706354712643679d0<br /></pre><br />My macro (avg-runtime n body) executes body n times and returns the mean and variance of the runtime in seconds (The details probably aren't important, but I'll note it is computing seconds based on get-internal-runtime. This is implementation dependent, but a handy userspace tool to measure things with. I'll describe it more at the end)<br /><br />I used a puzzle called *easy* this time. Why? Because solve6 takes too long to solve the *slow* puzzle, even once. I got tired of waiting for just one solution after 5 minutes or so (compared to solve5's 1.3e-4s, remember). This little change made a <span style="font-weight:bold;">huge</span> difference. Why? Because the order we visit sites really, really matters. Which suggests that rather than fiddling around with tuning what we have, we should really think a bit about how to visit sites in a smart way.<br /><br />There's another clue here. solve6 execution speed has high variance, which means picking any old order of visitation can get you in all kinds of trouble. But also note, solve5 is comparatively very, very fast. Why? We're doing something clever, without thinking about it. What we're doing is walking along in row-major order through the array, basically. But remember that rows are one of the constraints for this puzzle. So as we move along, we are constraining the next choice (and the one after that, etc.) each time. Looks like this works for us pretty well!<br /><br />We can do a lot better, and I'll talk about that next time. And we're going to have to talk about average performance across a larger number of examples. I poked around on the web and found some <a href="http://magictour.free.fr/sudoku.htm">here</a>, that are labeled as `hard to solve'. I'll be using the file `top 2635' if you want to play along next time.<br /><br />Before I end though, ss promised, a bit more on the macro I used. First off, I used a handy little function that computes the mean and variance of any sequence, something like this:<br /><pre><br />(defun mean-and-variance (seq)<br /> (let ((n 0)<br /> (mean 0)<br /> (sum^2 0))<br /> (flet ((update (x)<br /> (let ((dev (- x mean)))<br /> (incf n)<br /> (incf mean (/ dev n))<br /> (incf sum^2 (* dev (- x mean))))))<br /> (map 'nil #'update seq)<br /> (values mean (/ sum^2 (1- n))))))<br /></pre><br />I mention this particularly, because when I first started lisping I would far too often special case different sequence types in a loop (a c programmers approach, I guess) and forgo code like this that is clean, reasonably quick, and doesn't care if you hand it a list or a vector, etc. Common lisp is full of nice function like map, and if you haven't found them yet you should look.<br /><br />Here is avg-runtime, and a macro it depends on:<br /><pre><br />(defmacro tictoc (&body body)<br /> (with-unique-names (t0 t1 res)<br /> `(let (,t0 ,t1 ,res) <br /> (setf ,t0 (get-internal-run-time)<br /> res ,@body<br /> ,t1 (get-internal-run-time))<br /> (values (float (/ (- ,t1 ,t0) internal-time-units-per-second) 1.0d0) res))))<br /><br />(defmacro avg-runtime (n &body body)<br /> (with-unique-names (res)<br /> `(loop repeat ,n <br /> collecting (tictoc ,@body) into ,res<br /> finally (return (mean-and-variance ,res)))))<br /></pre><br /><br />Of course there are loads of ways to do this, and if I was worried about long lists I wouldn't collect one up to pass to mean-and-variance, but it's fine for our purposes. I'm sure there are any number of weaknesses in the above, too. It's something I whipped off both to measure the solve6 run, and more as an example of how easy it is to build this sort of thing to help you.skahttp://www.blogger.com/profile/07206056709352200940noreply@blogger.comtag:blogger.com,1999:blog-5048733824376310579.post-65653272775573980402007-07-06T12:08:00.000-05:002007-07-06T13:03:24.444-05:00optimizing the solver, part II: Know your profilerLet me start off with a comment: I'm very new at this and having trouble keeping the entries a manageable size. Because of this, some of you have noticed I pulled a bit of a bait and switch on you last time. I talked about the need to measure things, but then I didn't show you any measurements. I'm going to start talking about that now, but there's lots to say so I'll just start with a small piece (and take it up again in future posts).<br /><br />The sudoku example I've show you is simple, and we can reason some things out about the performance. Also, of course, some things come from experience. In general though, you really can't hope to optimize what you can't measure. The first thing you can do this (assuming of course, you've all ready decided you actually *have* to optimize this bit of code) is get to know what sort of profiling abilities you have. This is by nature implementation dependent: take some time to familiarize yourself with your particular system.<br /><br />Here are some results from the statistical profiler in sbcl (somewhat trimmed). This showed me that my intuition was correct, and neighbors-of had a lot of unneeded overhead in the original implementation:<br /><pre><br />BLOG> (sb-sprof:with-profiling (:max-samples 50000 :report :flat :loop nil) (find-first-solution *slow*))<br /> Self Total Cumul<br /> Nr Count % Count % Count % Calls Function<br />------------------------------------------------------------------------<br /> 1 3285 30.2 3285 30.2 3285 30.2 - (FLET #:CLEANUP-FUN-1165)<br /> 2 2277 20.9 2277 20.9 5562 51.1 - (FLET #:CLEANUP-FUN-1030)<br /> 3 763 7.0 4053 37.2 6325 58.1 - SB-KERNEL:%PUTHASH<br /> 4 611 5.6 8555 78.6 6936 63.7 - SB-IMPL::LIST-REMOVE-DUPLICATES*<br /> 5 595 5.5 595 5.5 7531 69.2 - (FLET #:CLEANUP-FUN-77)<br /> 6 551 5.1 2828 26.0 8082 74.3 - SB-IMPL::GETHASH3<br /> 7 387 3.6 575 5.3 8469 77.8 - SET-DIFFERENCE<br /> 8 354 3.3 617 5.7 8823 81.1 - MAKE-HASH-TABLE<br /> 9 318 2.9 393 3.6 9141 84.0 - REMOVE<br /> 10 277 2.5 9425 86.6 9418 86.5 - NEIGHBORS-OF<br /> 11 232 2.1 232 2.1 9650 88.7 - SB-VM::MOVE-FROM-SIGNED<br /> 12 209 1.9 210 1.9 9859 90.6 - EQL<br /> 13 195 1.8 10880 100.0 10054 92.4 - (LABELS UPDATE)<br /> 14 119 1.1 174 1.6 10173 93.5 - SB-IMPL::%MAKE-HASH-TABLE<br /> 15 113 1.0 113 1.0 10286 94.5 - SB-VM::GENERIC-+<br /> 16 91 0.8 95 0.9 10377 95.3 - LENGTH<br /> 17 88 0.8 88 0.8 10465 96.2 - SUBS->INDEX<br /> 18 73 0.7 73 0.7 10538 96.8 - SB-IMPL::GETHASH2<br /> 19 50 0.5 50 0.5 10588 97.3 - TRUNCATE<br /> 20 39 0.4 88 0.8 10627 97.6 - FLOOR<br /></pre><br />Look at line 10, particularly.<br />After replacing neighbors-of with the lookup version we get this:<br /><pre><br />BLOG> (sb-sprof:with-profiling (:max-samples 50000 :report :flat :loop nil) (find-first-solution *slow*))<br /> Self Total Cumul<br /> Nr Count % Count % Count % Calls Function<br />------------------------------------------------------------------------<br /> 1 323 41.5 477 61.2 323 41.5 - SET-DIFFERENCE<br /> 2 183 23.5 779 100.0 506 65.0 - (LABELS UPDATE)<br /> 3 137 17.6 137 17.6 643 82.5 - EQL<br /> 4 62 8.0 62 8.0 705 90.5 - (FLET #:CLEANUP-FUN-77)<br /> 5 35 4.5 35 4.5 740 95.0 - (LABELS DIGIT-AT)<br /> 6 23 3.0 23 3.0 763 97.9 - SB-KERNEL:%COERCE-CALLABLE-TO-FUN<br /> 7 11 1.4 11 1.4 774 99.4 - SB-VM::MOVE-FROM-SIGNED<br /> 8 3 0.4 3 0.4 777 99.7 - SB-VM::GENERIC-+<br /> 9 1 0.1 1 0.1 778 99.9 - NEIGHBORS-OF<br /> 10 1 0.1 1 0.1 779 100.0 - (FLET #:CLEANUP-FUN-4)<br /> 11 0 0.0 779 100.0 779 100.0 - SOLVE2<br /> 12 0 0.0 779 100.0 779 100.0 - FIND-FIRST-SOLUTION<br /> 13 0 0.0 779 100.0 779 100.0 - NIL<br /> 14 0 0.0 779 100.0 779 100.0 - SB-INT:SIMPLE-EVAL-IN-LEXENV<br /> 15 0 0.0 779 100.0 779 100.0 - SWANK::EVAL-REGION<br /> 16 0 0.0 779 100.0 779 100.0 - "Unknown component: #xA81A978"<br /> 17 0 0.0 779 100.0 779 100.0 - (LAMBDA (SWANK-BACKEND::FN))<br /> 18 0 0.0 779 100.0 779 100.0 - SWANK::CALL-WITH-BUFFER-SYNTAX<br /> 19 0 0.0 779 100.0 779 100.0 - SWANK:LISTENER-EVAL<br /> 20 0 0.0 779 100.0 779 100.0 - "Unknown component: #xA80DB98"<br /></pre><br />So you see, not only have we reduced the overall runtime by 10x, but neighbors-of not accounts for a tiny percentage of the total runtime (so we can safely ignore it, at least for now). This result also showed me that to proceed I needed to concentrated on reducing the cost of the set-difference. Here I knew from experience I could do the same thing with a bit vector quickly, but even if I didn't know that, I'd know this is where I wanted to concentrate my efforts.<br /><br />There is one other thing I should note. There is a lot of cruft in these reports from flet, labels etc, which brings up a bit of an issue. I've sort of implicitly advocated wrapping things up as local functions to keep the local detail local, and not fill your namespace with all sorts of specialized functions.<br /><br />There are a couple of downsides to this. The more important one is that it can be much easier to debug separate functions. Additionally, when (if!) we get to the point of profiling and optimizing, it can be harder to see what is going on. On the other hand, a tiny function like the set-digit we had (which just wraps an array reference) could have more calling overhead than actual overhead, in this case we can distort the result by not allowing the compiler to easily elide the call. It can be a bit of a pain to move from one form to the other, but in practice I often start off will a bunch of little functions and then may wrap them all up in a flet or labels later.<br /><br />Next time I'll talk about more measurement and optimization issues, and begin discussing algorithmic changes we can make (and why).skahttp://www.blogger.com/profile/07206056709352200940noreply@blogger.comtag:blogger.com,1999:blog-5048733824376310579.post-82940638806668973002007-07-05T12:23:00.000-05:002007-07-05T14:35:56.029-05:00optimizing the solver, part IOptimization, particularly when `premature' has a deservedly bad reputation. In the quest for `fast' code, it is very easy to end up with something that is buggy and hard to read. Many coders fall into the trap of thinking about this far too early in the development cycle .. and the code suffers for it.<br /><br />That being said, sometimes we have no choice. Over the next couple of posts I'm going to discuss what we can do with this `sudoku' example. I'm going to make some specific points along the way, but before that, lets talk a bit about a philosophy of sorts for optimization.<br /><br />You hear a lot of pithy sayings about code optimization (e.g. `premature optimization is the root of all evil'), some much more useful than others. If I had to sum it up my feelings about it in a single sentence, I'd probably say (almost certainly borrowed this, but no idea where from): "Rules for optimization: 1) Don't 2) Don't yet 3) If you must, analyse & measure everything and work from there". <br /><br />This works for me, particularly because coming from a background (asm/c) that lends itself to blazingly fast code nobody can read, I can easily fall into the trap of premature optimization. The 3rd point is crucial though; you can waste an astonishing amount of time polishing code that you <b>thought</b> was slowing you down, but it wasn't. A corollary to this statement is `Know your implementation'. On that note, I'll try and stay as general as possible, but some of the things that follow will be tailored to the sbcl compiler. Really high performance code will often necessarily be implementation dependent (and cpu dependent, and cache/memory dependent ... and if you get to that level may be quite counter-intuitive). <br /><br />We'll be talking about common lisp here, and one of the really nice things about CL is that you get an extremely expressive high level language, but without resorting to libraries etc. you can still have very efficient compiled code. The language is unique in that respect, in my experience. For what it's worth I've found that with a bit of work, my numerical CL code tends to be within a factor of 2 or so of code (and yes, sometimes faster although usually not). The advantages in development time and correctness/debugging relative to c have been very large in my experience. It's a very research-friendly language. Of course we can't hope to meet carefully tuned asm coding --- but who can?<br /><br />So now we've dispensed with the obligatory nod to reason and good sense, let's hammer on this poor little sudoku solver (the previous few entries, if you haven't been following along) a bit. If you'll recall, we had a problem:<br /><pre><br />BLOG> (time (find-first-solution *slow*))<br />Evaluation took: 113.817 seconds of real time<br /></pre><br />(I'm on a different, slightly slower machine this time). I'll note that everything you see here has been declaimed to have (optimize speed). Now broadly speaking, algorithmic changes will beat `tuning', but both are important. There are always trade-offs, too; a very fast algorithm may be difficult to read and implement correctly, for example.<br /><br />We have here a simple algorithm that tries a number of possible configurations until it finds a solution. This means we have two ways to make it faster. 1) make each individual attempted entry faster, and 2) look at less possibilities.<br /><br />#2 involves changing the algorithm to be smarter about where we look. I'll take that up in a second post. For now, let's look at #1, how to make each step faster.<br /><br />There is one thing about the current code that immediately jumps out at me. There are only 81 possible positions we examine, and each of these has 20 neighbors (same row, same column, or same cell). Every time we want to look at a position, we compute every neighbor and cons them up in a list. Obviously we'll benefit from not repeating these computations:<br /><pre><br />BLOG> (memoize:memoize-function 'neighbors-of)<br />NEIGHBORS-OF<br />BLOG> (time (find-first-solution *slow*))<br />Evaluation took: 9.531 seconds of real time<br /></pre><br />Pretty easy way to gain an order of magnitude, no? And you know what they say: and order of magnitude here, an order of magnitude there --- pretty soon you're talking about real performance improvements.<br /><br />The above relied on <a href="http://www.cliki.net/memoize"> memoization</a> code. Since this is a simple problem of looking up a known (small) set of lists, we could also just minorly modify the definition of neighbors-of too, by precomputing everything and just closing over a vector of the lists:<br /><pre><br />(loop with array = (make-array 81)<br /> for idx below 81 do<br /> (loop with (i0 j0) = (index->subs idx)<br /> for n below 9 <br /> collecting (subs->index i0 n) into res<br /> collecting (subs->index n j0) into res<br /> finally<br /> (loop repeat 3 for i upfrom (* 3 (floor i0 3)) do<br /> (loop repeat 3 for j upfrom (* 3 (floor j0 3)) do<br /> (push (subs->index i j) res)))<br /> (setf (aref array idx) (sort (remove idx (remove-duplicates res)) #'<)))<br /> finally<br /> (defun neighbors-of (idx)<br /> (aref array idx)))<br /><br />BLOG> (time (find-first-solution *slow*))<br />Evaluation took: 8.081 seconds of real time<br /></pre><br /><br />This is less general, but also a little faster. With that obvious fix taken out of the way, let's move on the specific changes. For the sake of vertical space, just for the moment, I'm going to dispense with both the signal & restart stuff described previously, and the local functions set-digit and clear-digit. We'll bring this all back soon.<br /><br />I'm going to introduce several changes. If you're interested, I encourages separating out the changes and measuring their relative effect (I'm going to talk a bit more about measurement in another post soon). Here is the first variant:<br /><pre><br />(defun solve4 (initialstate)<br /> (declare (optimize speed))<br /> (let ((soln (make-array `(81) :element-type '(integer 0 9) :initial-contents initialstate)))<br /> (labels ((possible-digits (idx)<br /> (let ((set (make-array 10 :element-type 'bit :initial-element 0)))<br /> (over-neighbors (n idx)<br /> (setf (sbit set (aref soln n)) 1))<br /> set)) <br /> (solve (n)<br /> (when (= n 81) (return-from solve4 soln))<br /> (if (/= 0 (aref soln n))<br /> (solve (1+ n))<br /> (loop with possible = (possible-digits n)<br /> for digit from 1 to 9<br /> when (zerop (sbit possible digit)) do<br /> (setf (aref soln n) digit)<br /> (solve (1+ n))<br /> (setf (aref soln n) 0)))))<br /> (solve 0))))<br />BLOG> (time (solve4 *slow*))<br />Evaluation took: 0.043 seconds of real time<br /></pre><br />There are two significant changes here. I've added declarations to help the compiler (sbcl in this case) generate efficient code. I've also replaced the list-based sets of possible digits with a bit array. This may be implementation dependent, but in this case the gain is large.<br /><br />One little algorithmic tweak can be made too. As it is, we walk over every position, and just ignore it if it is already set. Rather than do this, we can walk a list and only look at the positions that we actually need to:<br /><pre><br />(defun solve5 (initialstate)<br /> (declare (optimize speed))<br /> (let ((soln (make-array `(81) :element-type '(integer 0 9)))<br /> (unassigned `()))<br /> (labels ((possible-digits (idx)<br /> (let ((set (make-array 10 :element-type 'bit :initial-element 0)))<br /> (over-neighbors (n idx)<br /> (setf (sbit set (aref soln n)) 1))<br /> set))<br /> (solve (n)<br /> (loop with possible = (possible-digits n)<br /> for digit from 1 to 9<br /> when (zerop (sbit possible digit)) do<br /> (setf (aref soln n) digit)<br /> (if unassigned<br /> (solve (pop unassigned))<br /> (return-from solve5 soln))<br /> (setf (aref soln n) 0)<br /> finally<br /> (push n unassigned))))<br /> ;;setup<br /> (loop for n below 81<br /> for x across initialstate<br /> if (= 0 x) do<br /> (push n unassigned)<br /> else do<br /> (setf (aref soln n) x))<br /> (solve (pop unassigned)))))<br />BLOG> (time (solve5 *slow*))<br />Evaluation took:0.01 seconds of real time<br /></pre><br /><br />So with very little effort, we've made it from about 100s to about .01s ... I wasn't kidding about the 10,000 times speedup. We're not finished though.. I'll add a post about the limitations of this sort of measurement, and some algorithmic improvements too. You'll probably notice I haven't really backed up my `measure everything' statement either, because I've just shown that the changes I made worked (on this example, anyway) not why I was driven to try them. This is a long enough entry, though, and I'll take that up next time.skahttp://www.blogger.com/profile/07206056709352200940noreply@blogger.comtag:blogger.com,1999:blog-5048733824376310579.post-15863487436132141242007-07-03T21:16:00.000-05:002007-07-03T23:04:52.443-05:00format-ing output, and a peek at performanceContinuing comments around the ongoing sudoku examples:<br /><br />Reader Leslie had a sensible comment, a post or two ago: `Why didn't you use a 2D array in the implementation, it seems more natural'.<br /><br />I admit, from the point of view of the puzzles themselves this would seem to be true. And if I was writing about an all-singing, all-dancing sudoku system, I'd probably be working in 2D. Actually, that's a lie; I'd probably start with (defclass sudoku ...) and hide these sorts of details. The interface would allow 2D subscripts though. <br /><br />In this case, I wasn't actually thinking of the problems the way you see them on paper, I was thinking about them as problems on a set of vertices with certain connectivity. I've renamed some variables and things to hide this, but a 1D array was natural. There's no reason we couldn't have written essentially the same code in 2D, though. The neighbors-of function would be nearly the same, now it would collect up (i,j) pairs instead of indices n. It would be less clean, because you'd have to take apart the i,j pairs all over the place to use them (or #'apply to them). Also, there isn't a nice ordering now of how to call the inner recursion. All these things could be done fairly nicely. There <span style="font-style:italic;">are</span> some performances reasons that may push you to use a 1D representation like this, but at this stage of the game that's the last thing we want to drive the design. Still, it will come up.<br /><br />Before we get into all that though, Leslie's comment reminded me that it's kind of counterintuitive to look at these things as vectors though, so I'll put together a little printing function and as an added bonus it will work for 1D or 2D representations:<br /><pre><br />(defun print-sudoku (array &optional (stream t))<br /> (let ((control-string <br /> "~{~A~^ ~#[~;~;~;| ~;~;~;| ~;~;~;~%~<br /> ~;~;~;| ~;~;~;| ~;~;~;~%~<br /> ~;~;~;| ~;~;~;| ~;~;~;~%~<br /> ------+-------+------~%~<br /> ~;~;~;| ~;~;~;| ~;~;~;~%~<br /> ~;~;~;| ~;~;~;| ~;~;~;~%~<br /> ~;~;~;| ~;~;~;| ~;~;~;~%~<br /> ------+-------+------~%~<br /> ~;~;~;| ~;~;~;| ~;~;~;~%~<br /> ~;~;~;| ~;~;~;| ~;~;~;~%~<br /> ~;~;~;| ~;~;~;| ~;~;~;~%~<br /> ~:;~]~}" )<br /> (displaced (make-array 81 :element-type (array-element-type array) :displaced-to array)))<br /> (loop for x across displaced<br /> if (= x 0) collect "." into res<br /> else collect x into res<br /> finally (format stream control-string res))))<br /></pre><br />You can get tricky about format strings, and with all the repetition here I'm sure I could have done something much more compact, but I like the way spreading it out like this makes it pretty obvious what I'm doing even if you're not so used to format. It's also assuming I only feed it nice input, etc. but we won't worry about that now. Here's how it looks:<br /><pre><br />BLOG> *slow*<br />#(0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 9 0 0 9 7 0 3 0 0 0 0 0 0 1 0 0 6 0 5 0 0 0 0 4<br /> 7 0 8 0 0 2 0 0 0 0 0 2 0 0 6 0 3 1 0 0 4 0 0 0 0 0 0 8 0 0 1 6 7 0 8 7 0 0 0<br /> 0 0 0)<br />BLOG> (print-sudoku *slow*)<br />. . . | . . . | . . . <br />. . . | . . . | 9 . . <br />9 7 . | 3 . . | . . . <br />------+-------+------<br />. 1 . | . 6 . | 5 . . <br />. . 4 | 7 . 8 | . . 2 <br />. . . | . . 2 | . . 6 <br />------+-------+------<br />. 3 1 | . . 4 | . . . <br />. . . | 8 . . | 1 6 7 <br />. 8 7 | . . . | . . .<br />BLOG> (loop with x = (make-array `(9 9)) <br /> for n below 81 do <br /> (setf (row-major-aref x n) (aref * n)) <br /> finally (return x))<br />#2A((0 0 0 0 0 0 0 0 0)<br /> (0 0 0 0 0 0 9 0 0)<br /> (9 7 0 3 0 0 0 0 0)<br /> (0 1 0 0 6 0 5 0 0)<br /> (0 0 4 7 0 8 0 0 2)<br /> (0 0 0 0 0 2 0 0 6)<br /> (0 3 1 0 0 4 0 0 0)<br /> (0 0 0 8 0 0 1 6 7)<br /> (0 8 7 0 0 0 0 0 0))<br />BLOG> (print-sudoku *)<br />. . . | . . . | . . . <br />. . . | . . . | 9 . . <br />9 7 . | 3 . . | . . . <br />------+-------+------<br />. 1 . | . 6 . | 5 . . <br />. . 4 | 7 . 8 | . . 2 <br />. . . | . . 2 | . . 6 <br />------+-------+------<br />. 3 1 | . . 4 | . . . <br />. . . | 8 . . | 1 6 7 <br />. 8 7 | . . . | . . .<br /></pre><br />So that looks a bit better. Where I'm heading with all this stuff, by the way (at least as an excuse to discuss bits an pieces) is to make a puzzle generator. Whenever I make a change in the generator, I'll want to verify it has a single unique solution. So each puzzle will get solved many, many times -- and not just looking for a first solution either.<br /><br />Now might be a good time for a sanity check then. I've shown that this algorithm works, but what sort of times are we looking at? You might wonder why the particular puzzle above is called *slow*. Here's why:<br /><pre><br />BLOG> (time (find-first-solution *slow*))<br />Evaluation took:<br /> 92.649 seconds of real time<br /> 88.828224 seconds of user run time<br /> 1.617718 seconds of system run time<br /> [Run times include 5.09 seconds GC run time.]<br /> 0 calls to %EVAL<br /> 0 page faults and<br /> 7,634,328,056 bytes consed.<br />#(1 2 3 9 4 6 7 8 5 8 4 6 2 5 7 9 3 1 9 7 5 3 8 1 6 2 4 3 1 2 4 6 9 5 7 8 5 6 4<br /> 7 1 8 3 9 2 7 9 8 5 3 2 4 1 6 2 3 1 6 7 4 8 5 9 4 5 9 8 2 3 1 6 7 6 8 7 1 9 5<br /> 2 4 3)<br />BLOG> (print-sudoku *)<br />1 2 3 | 9 4 6 | 7 8 5 <br />8 4 6 | 2 5 7 | 9 3 1 <br />9 7 5 | 3 8 1 | 6 2 4 <br />------+-------+------<br />3 1 2 | 4 6 9 | 5 7 8 <br />5 6 4 | 7 1 8 | 3 9 2 <br />7 9 8 | 5 3 2 | 4 1 6 <br />------+-------+------<br />2 3 1 | 6 7 4 | 8 5 9 <br />4 5 9 | 8 2 3 | 1 6 7 <br />6 8 7 | 1 9 5 | 2 4 3<br /></pre><br />For what it's worth, that's compiled under sbcl 1.0.6 on a intel dual-core 2.16Ghz.<br />Hmmmm. We may have a bit of a problem here. This is where a neophyte lisp user might get discouraged, thinking `what use is all this if I can't use the code for anything computationally expensive?'.<br /><br />Don't worry about that. Next time I'll start to describe several simple changes -- and will see a speedup of, oh, about 10,000 times.skahttp://www.blogger.com/profile/07206056709352200940noreply@blogger.comtag:blogger.com,1999:blog-5048733824376310579.post-26839336544249766742007-07-02T14:01:00.000-05:002007-07-02T15:16:33.178-05:00a glimpse of conditions and restartsCommon lisp has a `condition system'. It took me an embarrassingly long time to realize how useful this can be. I had a mixed impression of the exception systems in c++ and the ilk, and so when I first saw mention of signals in CL I guess I thought `oh, ok, it's got catch/throw', and moved on (it does have catch/throw, and they aren't the same thing as what I'm going to talk about today, but that's a different story and I haven't really had a use for them). It turns out though, they are quite a different animal for one key reason. Not only does a condition allow you to sidestep the call stack, but <span style="font-style:italic;">you don't have to unroll the stack</span>, so you don't need to lose all the state you've got when this happens.<br /><br /><br />Let's see what this means for the sudoku example. Take it on faith for now (in practice, I changed this code after writing something else that made it necessary, of course) that I actually want something more than just finding the first solution to a puzzle, if it exists. I want to be able to answer three questions: `is there a solution (and what is it)?', `are there multiple solutions?' , `what are all the solutions?'<br /><br /><br />I'm going to make a minor change to the core algorithmic in the function `update' (defined last time):<br /><br /><pre><br />(update (n)<br /> (when (= n 81) <br /> (restart-case <br /> (signal 'solution-found :solution soln)<br /> (keep-looking () <br /> (return-from update nil))))<br /> (if (nth-value-set-p n)<br /> (update (1+ n))<br /> (over-possible-digits (d n)<br /> (set-digit n d)<br /> (update (1+ n))<br /> (clear-digit n)))))<br /></pre><br />So what does this mean? First off, it won't work until I define the condition solution-found that I'm going to signal, so let's do that:<br /><pre><br />(define-condition solution-found (condition)<br /> ((solution :initarg :solution :reader solution)))<br /></pre><br />What does this buy us? Now all three of my questions can be answered by the same piece of code. A little thought about it will make it clear that the core `search' part of the algorithm is the same regardless if you are looking for one solution or all solutions, all that is needed is a mechanism to keep going. By wrapping up the solution in a condition (called solution-found) and using #'signal rather than returning-from the outer function, we merely sidestep the usual return mechnism and there is no real gain. However, it is the restart that makes this approach different: the key is that we don't have to unroll the stack and loose all the accumulated state; we can pop our heads up and say `hey, here's a solution' via the sigal mechanism, and then go back to the search via invoking the restart. Here is one way it could work.<br /><pre><br />(defun find-first-solution (puzzle)<br /> (handler-case (solve2 puzzle)<br /> (solution-found (x) (solution x))))<br /><br />(defun multiple-solutions-p (puzzle)<br /> (let ((found 0))<br /> (handler-bind ((solution-found<br /> #'(lambda (c) (declare (ignore c))<br /> (when (> (incf found) 1)<br /> (return-from multiple-solutions-p t))<br /> (invoke-restart 'keep-looking))))<br /> (solve2 puzzle))))<br /><br />(defun find-all-solutions (puzzle)<br /> (let ((solns `()))<br /> (handler-bind <br /> ((solution-found<br /> #'(lambda (c)<br /> (push (make-array 81 :initial-contents (solution c)) solns)<br /> (invoke-restart 'keep-looking))))<br /> (solve2 puzzle))<br /> solns))<br /><br />Which allows, for example:<br /><br />BLOG> (multiple-solutions-p (vector 0 0 0 9 0 0 0 0 0 0 0 0 8 7 0 0 3 5 0 1 4 0 <br />0 0 0 0 0 1 3 0 0 2 0 9 0 0 6 0 0 0 0 0 0 0 8 0 0 7 0 1 0 0 5 3 0 0 0 0 0 0 1 2 <br />0 2 4 0 0 8 0 0 0 0 0 0 0 0 0 0 0 0 0))<br />T<br />BLOG> <br /></pre><br />So what's good about this? We don't muddy up the implementation of solve1 with all sorts of policy details depending on what we want as output. We could have added a switch to solve1, and some sort of case-dependent handling of the solutions when we found them. This doesn't scale well, and can turn a clean, obvious implementation into something you have read very carefully. It also means we don't really care what comes along later, if we want to add a slightly different functionality, the implementation at the level of solve1 doesn't change. This is a clean way of separating the concerns of solve1 (finding solutions) from the broader concerns of what to do with them.<br /><br />As the title suggests, this was only a glimpse --- I've left out a load of stuff about signals, conditions, etc. You may have noticed I haven't mentioned errors at all! This has, I hope, given a picture of one thing that makes the CL condition ssystem <span style="font-style:italic;">different</span> from what you may already be used too, though.<br /><br />Oh, and one more thing. Reader Frédéric correctly notes that I could have implemented index->subs (in the last entry) more cleanly by simply using multiple-values-list:<br /><pre><br />(defun index->subs (n)<br /> (multiple-value-list (floor n 9)))<br /></pre><br />I had actually missed this the first time when I was changing (slightly) what I actually did into what I described in the blog entry. I've been away from lisp for about a year, and I'm currently knocking the rust out, so there is likely to be quite a bit of that. This time I caught it while writing it up and decided that it was probably worthwhile to just skip the discussion of multiple values today, assumeing my audience for this particular series of posts. Since it's allready there in the comments, I'll note it, and thanks for catching it!skahttp://www.blogger.com/profile/07206056709352200940noreply@blogger.comtag:blogger.com,1999:blog-5048733824376310579.post-33863715252433247012007-06-28T22:19:00.001-05:002007-07-06T13:03:01.891-05:00exploratory programmingSo one of the things I find most useful about lisp is the way it facilitates exporatory programming. The last entry described a bactracking algorithm, what would a recursive implementation look like? How about something like this:<br /><pre><br />(defun update (n)<br /> (when (= n 81) (return soln))<br /> (if (nth-value-set n)<br /> (update (1+ n))<br /> (over-possible-digits (d n)<br /> (set-digit n d)<br /> (update (1+ n))<br /> (clear-digit n)))<br /></pre><br />This is almost but not quite valid code, think of it as pseudo-code that describes what I'd like to do, I just haven't quite figured out how yet. But the idea is there. I have 81 possible positions in my 9x9 matrix. Lets call them 0..80. If all the bits of this code do what their names suggest, we'll be in business.<br /><br />This brings up a bit of an issue, development-wise. I don't really don't want to expose details I don't need to, and at the moment I'm not really sure how I'm going to do anything. It's nice to use labels/flet to keep locally important details and assumptions local, and not fill the namespace with little functions that might quite non-general. On the other hand, it can be a pain to debug local functions. It's a trade off.<br /><br />Now I need to flesh out the functions the first bit depended on. I'm to represent the state of things as a 81 element vector, and most of the code follows directly:<br /><pre><br />(defun solve1 (initialstate)<br /> (let ((soln (make-array 81 :initial-contents initialstate)))<br /> (labels ((digit-at (n)<br /> (aref soln n))<br /> (set-digit (n d)<br /> (setf (aref soln n) d))<br /> (clear-digit (n)<br /> (setf (aref soln n) 0)) <br /> (possible-digits (n)<br /> (set-difference<br /> `(1 2 3 4 5 6 7 8 9)<br /> (mapcar #'digit-at (neighbors-of n))))<br /> (nth-value-set-p (n)<br /> (/= 0 (digit-at n)))<br /> (update (n)<br /> (when (= n 81) (return-from solve1 soln))<br /> (if (nth-value-set-p n)<br /> (update (1+ n))<br /> (over-possible-digits (d n)<br /> (set-digit n d)<br /> (update (1+ n))<br /> (clear-digit n)))))<br /> (update 0))))</pre><br />So I've localized most assumptions. I'm representing the sudoku puzzle as a length 81 vector (later i'll specialize this) but I can pass in any sequence. I'm somewhat redundantly using set-digit and clear-digit, but if I want to change my representation, I don't have to touch the core algorithm.<br /><br />I've only got two problems left. I don't know what neighbors-of is, and I don't know what over-possible-digits is. It's pretty clear from the function possible-digits though, that for now at least I can implement this as a simple macro:<br /><pre><br />(defmacro over-possible-digits ((d n) &amp;body body)<br /> `(dolist (,d (possible-digits ,n))<br /> ,@body))</pre><br />This represents a bit of a departure from keeping things localized, but I'll live with that. If it sticks around we can turn this into a macro-let to get it in the body of solve1. I could, of course, have just coded the dolist and commented what it was doing, but this is both self-documenting and allows me to play with the representation without changing update at all, which I think is a good thing.<br /><br />It's the implementation of possible-digits that also gives me an idea of how to go about makeing `nieghbors-of'. What I really want is a way to check what constraints are. That means looking over every position in the matrix in the same row, column, or cell as the current one. We could do this in a 2D way with i,j indices but it is easier to label each position by its row-major index. The whole matrix looks like:<br /><pre> <br /> 0 1 2 3 4 5 6 7 8<br /> 9 10 11 12 13 14 15 16 17<br />18 19 20 21 22 23 24 25 26<br />27 28 29 30 31 32 33 34 35<br />36 37 38 39 40 41 42 43 44<br />45 46 47 48 49 50 51 52 53<br />54 55 56 57 58 59 60 61 62<br />63 64 65 66 67 68 69 70 71<br />72 73 74 75 76 77 78 79 80<br /></pre><br />I'll make a couple of helper functions to jump between 2D subscripts and the indices, hard-coded for 9x9 (at least for now):<br /><pre><br />(defun subs->index (i j)<br /> (+ (* 9 i) j))<br /><br />(defun index->subs (n)<br /> (let ((i (floor n 9)))<br /> (list i (- n (* 9 i)))))<br /></pre>Then neighbors-of is a function that gathers up a list of every index in the same row, column, or 3x3 submatrix as the current `idx'.<br /><pre><br />(defun neighbors-of (idx)<br /> (loop with (i0 j0) = (index->subs idx)<br /> for n below 9<br /> collecting (subs->index i0 n) into res<br /> collecting (subs->index n j0) into res<br /> finally<br /> (loop repeat 3 for i upfrom (* 3 (floor i0 3)) do<br /> (loop repeat 3 for j upfrom (* 3 (floor j0 3)) do<br /> (push (subs->index i j) res)))<br /> (return (remove idx (remove-duplicates res)))))</pre><br />And we have a complete solver:<br /><pre>CL-USER> (vector 0 6 0 0 0 0 0 1 0 0 0 0 6 5 1 0 0 0 1 0 7 0 0 0 6 0 2 6 2 0 3 0 5 0 9 4 0 0 3 0 0 0 2 0 0 4 8 0 9 0 7 0 3 6 9 0 6 0 0 0 4 0 8 0 0 0 7 9 4 0 0 0 0 5 0 0 0 0 0 7 0)<br />#(0 6 0 0 0 0 0 1 0 0 0 0 6 5 1 0 0 0 1 0 7 0 0 0 6 0 2 6 2 0 3 0 5 0 9 4 0 0 3<br />0 0 0 2 0 0 4 8 0 9 0 7 0 3 6 9 0 6 0 0 0 4 0 8 0 0 0 7 9 4 0 0 0 0 5 0 0 0 0<br />0 7 0)<br />CL-USER> (solve1 *)<br />#(5 6 8 4 7 2 3 1 9 2 3 9 6 5 1 8 4 7 1 4 7 8 3 9 6 5 2 6 2 1 3 8 5 7 9 4 7 9 3<br />1 4 6 2 8 5 4 8 5 9 2 7 1 3 6 9 7 6 5 1 3 4 2 8 8 1 2 7 9 4 5 6 3 3 5 4 2 6 8<br />9 7 1)<br /></pre><br />So that's something, we've got a backtracking solver in about 40 lines, if we wanted to play `shortest version' it could be a lot smaller. It's pretty rudimentary, and the input and output are hard to read, but it works.<br /><br />Something to throw darts at in the next installment, anyway.skahttp://www.blogger.com/profile/07206056709352200940noreply@blogger.comtag:blogger.com,1999:blog-5048733824376310579.post-39812163867840496532007-06-28T15:55:00.000-05:002007-07-02T18:34:02.343-05:00sorry, more background[Edit: I decided I went about this bass-ackward, so I've trimmed it and I'll try again in the next entry. Bear with me, I'm not used to this medium]<br /><br />Like I said before, I'm not really interested in solving sudoku puzzles, but we need a concrete problem to talk about. I'm not going to say much about the puzzles themselves, but I'll sketch the problem for completeness.<br /><br />We have a 9x9 matrix, and each entry can be one of the digits {1,2,...,9}. These are a form of Latin Square, and the rule is that each digit must be unique in it's row, in it's column, and in the 3x3 submatrix it belongs to. Given a matrix with some entries filled in and the rest left blank, can we fill in the blank entries to find a completed matrix that satisfies these constraints?<br /><br />Now clearly these puzzlies live within a large space (9^81 configurations) that is constrained, but still very large. The brute force approach of trying to randomly fill in blank values until we find a valid solution is likely to leave us twiddling our thumbs for rather a long time. What we need is some way to prune the tree of possible configurations in order to make the search reasonable. An obvious choice for this is a backtracking algorithm which basically works like this:<br /><br />1) Go to the first blank, and enter the first valid digit (one that doesn't violate a constraint)<br />2) Go to the next blank, try and find a valid digit<br /> 2a) if you can't, back up and try the next digit on the previous blank<br /> 2b) if you can, fill in that blank and move to the next one<br />3) repeat 2<br />4) stop if you've reached a valid solution<br /><br />If you think about this a bit you'll see that the algorithm will explore the tree of possible configurations as far along each branch as it can until it reaches an invalid configuration. Since most configurations are not allowed, this greatly reduces the search, but it is guaranteed to find a solution (eventually) if it exists.skahttp://www.blogger.com/profile/07206056709352200940noreply@blogger.comtag:blogger.com,1999:blog-5048733824376310579.post-43077327005900260002007-06-28T14:25:00.000-05:002007-07-06T13:08:58.038-05:00No, wait, hear me out.For my first trick, I'm going to talk about ..... sudoku. No wait, don't run away, or at least hear me out. At this point, you might ask `why bother?'. There are a million bits of code around, it's a simple problem (or is it?) and why would we even care?<br /><br />Fair enough. But I realized (while playing with a puzzle generator) that this is actually a pretty good simple example for some things I wanted to demonstrate. So I'm going to use this to motivate a discussion of some issues of both why Lisp is a good language for protyping and research code, and how (and how not) to make it fast when we need to. I'll also discuss a few typical tripping points for those coming from a more c-like background.<br /><br />Ok, so if you are still with me, I'm targetting this as a bit of exposition, aimed at a maths grad student, about how we might go about writing some lisp code, and why bother. That means my audience maybe hasn't done much programming, or has only done so in domain specific languages like matlab or mathematica. So I'll include some comments tailored to that issue. However, in the hopes some of my comments about optimization etc will be more generally useful, I'll try and not make much assumption about the readers background, either.<br /><br />Rather than have monolithic posts, I'm going to try and break things up a bit. To give you an idea where this is going, I want to talk a bit about design flexibility and abstraction, what Lisp buys you as a research coder, how to develop fast numerical code when you need to (and when not to), and introduce some parts of Common Lisp such as types, macros, and the conditions system, that will not be familiar to some readers.<br /><br />I should point out that I'm not reall interested in sudoku puzzles themselves. I'm told there are some interesting open problems associated with them, but I'm not a graph theorist by any stretch of the imagination. Furthermore, the methods I'm going to talk about arguably aren't a very good approach to solve these puzzles with, depending on what your actual goals are. For our purposes though, this will be fine.<br /><br />However, you can look at these as graph colouring problems (they are special cases of a `Latin Square') which puts us into the territory of an NP-complete search through a space of configurations. This is much closer to problems that I am interested in, but has the advantage of being easy to specify and play with.<br /><br />I'll take up the specifics in the following posts.skahttp://www.blogger.com/profile/07206056709352200940noreply@blogger.comtag:blogger.com,1999:blog-5048733824376310579.post-22175872312429681752007-06-28T11:56:00.001-05:002007-06-28T11:57:03.947-05:00I suspect there is a customary `hello world' post, so I guess this is it. I'm not sure how I'll use this, or if I actually will much. In any case, there are some thoughts I wanted to get down and keep somewhere, after mostly reproducing them in emails a few times. On the off chance it's useful to a few more people, I'll put them here.<br /><br />I suspect I'll mostly talk about Lisp, programming, research, and sundry. I plan to talk a bit about Lisp initially --- why I started using it, why I still use it for research projects when I can. I'm not any sort of expert but I'll outline a few things that work well for me, and why. I suspect I'll drop a few random bits and pieces here too.<br /><br />Till then!skahttp://www.blogger.com/profile/07206056709352200940noreply@blogger.com