July 2009 Archives

Meta-Programming for Dummies

Here's something I have done a few times. Take the "document":

Hello, World!

And turn it into a Python program:

print("""
Hello, World!
""")

As long as your "document" doesn't include triple-quotes, you're laughing.

The reason you might do this is that the "document" can then be (refactored to be) parameterized nicely; for example:

place='World'

print("""
Hello, %(place)s!
""" % vars())

(Note: percent signs in the text suddenly matter: a literal % must now be written as %%.) Going a little bit further:

import sys

greeting= sys.argv[1]
place   = sys.argv[2]

print("""
%(greeting)s, %(place)s!
""" % vars())

(Now the message effectively comes from the command line.)

The above is exactly how the pages of an old version of www.verilab.com were built. The version before that (identical-looking) was done entirely by copy-and-paste, so the Javascriptery, menus, sidebars, etc., were subtly and inexplicably different from page to page. I abstracted all of that guff out into Python variables (in a module called COMMON.py), and then splatted the guffery back in, exactly as above.
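A minimal sketch of that arrangement in Python 3, assuming the shared guff lives in a dict standing in for COMMON.py (the names `COMMON`, `PAGE_TEMPLATE`, and `render_page`, and the fragment contents, are hypothetical). It passes an explicit dict to `%` instead of `vars()`, which keeps the set of names a template can see under control.

```python
# Stand-in for a shared module like COMMON.py; the fragment names and
# contents here are hypothetical.
COMMON = {
    "menu": '<ul class="menu"><li><a href="/">Home</a></li></ul>',
    "footer": "<p>Footer text shared by every page</p>",
}

# Every page is built from this one template. A literal percent sign in
# the template must be doubled (%%), as in the width below.
PAGE_TEMPLATE = """<html>
<head><title>%(title)s</title></head>
<body>
<div style="width: 100%%">
%(menu)s
%(body)s
%(footer)s
</div>
</body>
</html>
"""


def render_page(title, body):
    """Splat the shared fragments and the page-specific text into the template."""
    fields = dict(COMMON, title=title, body=body)
    return PAGE_TEMPLATE % fields


if __name__ == "__main__":
    # Percent signs inside substituted values need no escaping.
    print(render_page("About", "<p>Now 100% hand-edit free.</p>"))
```

Changing the menu in one place then changes it on every page, which is exactly the copy-and-paste drift the old site suffered from.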

The main point about the technique: some nice gain for tiny effort.

An EDA context in which I've used nearly the same technique is generating Makefiles. (Precisely: instead of a Makefile, have a "script" called something like projmake. It generates the Makefile and, having done so, execs make to do the business.)

What I would do: Keep all of the project data in Perl or Python tables. Optionally, throw in a little sanity-checking to make sure those tables are sensible. Then generate -- and this is the important bit... -- a long but boneheadedly-simple Makefile from the data. Because you're generating it, there's no reason to use clever make-ry, and the simple Makefile should be easier to debug if you have to.

And one final thing: if there's some completely odd thing that needs to happen in your make workflow (the kind of thing that can give grief to a "real Makefile"), you just program it into the projmake script as an "outlier".
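A minimal sketch of such a projmake script in Python 3. Everything specific here is hypothetical: the table layout (`TARGETS`), the source files, the tool commands (`vcs_compile`, `run_sim`), the output name `Makefile.generated`, and the `outlier_rules` hook. It sanity-checks the table, writes a flat Makefile with one explicit rule per target (no pattern rules or other clever make-ry), and then replaces itself with make via `os.execvp`.

```python
import os
import sys

# Project data as a plain Python table. The target names, source files,
# and tool commands (vcs_compile, run_sim) are all hypothetical.
TARGETS = {
    "compile": {"deps": ["tb.sv", "dut.sv"], "cmd": "vcs_compile tb.sv dut.sv"},
    "sim": {"deps": ["compile"], "cmd": "run_sim -seed 1"},
}


def sanity_check(targets):
    """Stop early if a dependency is neither a target nor an existing file."""
    for name, spec in targets.items():
        for dep in spec["deps"]:
            if dep not in targets and not os.path.exists(dep):
                sys.exit(f"projmake: {name!r} depends on unknown {dep!r}")


def outlier_rules():
    """Hand-written special cases that would strain a 'real Makefile'."""
    return []  # none in this example


def render_makefile(targets):
    """Emit one explicit rule per target: no pattern rules, no variables."""
    # The targets are actions rather than files, so declare them phony.
    lines = [".PHONY: " + " ".join(targets), ""]
    for name, spec in targets.items():
        lines.append(f"{name}: {' '.join(spec['deps'])}")
        lines.append("\t" + spec["cmd"])  # make requires a tab before a recipe
        lines.append("")
    lines.extend(outlier_rules())
    return "\n".join(lines)


def main(argv):
    sanity_check(TARGETS)
    with open("Makefile.generated", "w") as f:
        f.write(render_makefile(TARGETS))
    # Replace this process with make; does not return on success.
    os.execvp("make", ["make", "-f", "Makefile.generated", *argv])
```

A real projmake would end with `main(sys.argv[1:])`, so that `projmake sim` regenerates the Makefile and then runs `make -f Makefile.generated sim`. Keeping `render_makefile` a pure function of the table also makes the generated text easy to inspect or diff when something goes wrong.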

Tedious, repetitive, nearly-the-same, textual content, whether code or data? -- consider this simplest form of "meta-programming" (a program that writes a program).
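The phrase can be taken literally. Below is a minimal sketch in Python 3 (where `print` is a function) of a program that turns a plain-text "document" into the print-it program from the start of this note; the helper name `document_to_program` is hypothetical. It rejects documents containing triple-quotes, per the caveat above, and emits a raw string so that backslashes in the document come out unchanged.

```python
def document_to_program(text: str) -> str:
    """Wrap a plain-text document in a Python 3 program that prints it."""
    # Same caveat as above: a triple-quote would end the string literal early.
    if '"""' in text:
        raise ValueError("document contains triple-quotes")
    # Raw string literal, so backslashes in the document are printed literally.
    return 'print(r"""\n' + text + '\n""")\n'


if __name__ == "__main__":
    program = document_to_program("Hello, World!")
    print(program)  # show the generated program
    exec(program)   # run it: prints the document
```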

[An earlier version of this note appeared in Verilab's internal newsletter.]

Crazy Code Commenting

Verilab folklore is that I am opposed to comments in code, whereas ... well, everyone else ... is in favor of fulsome "blue pencil" commentary. (My actual view is: Better to express yourself with code than comments -- clear code with no comments is the ideal.)

There is one type of commenting that I'm all for, and it tends not to show up in "coding guidelines" and "good practices". Maybe this should tell me something; however, I would like to think it is because the practice applies to mature code, not new code (which tends to steal the limelight).

Scenario: A body of code has been "released" for a while. A bug manifests itself, is logged in the bug tracker, and you're put on the job. You dig around, and you believe you've cracked the problem.

In this context, you will usually find that the system has been used in a way that you (the developers) didn't predict or, at best, one that you didn't test for. Sometimes the user has given a wacky-but-plausible combination of inputs. If threads are in play, things happened in an unexpected order.

There is a fighting chance that the fix, whatever it is, will be an odd-looking special case added to the code base.

How should this new special case be commented? I would argue: not as normal, and certainly not by merely tossing in the issue-tracking number. I would put the entire bug story inline, next to the funny code.

Why? Because the kind of code just added is exactly the kind that makes no sense at all to the guy who next reads the code, three years later. It's an obvious candidate for "What's this? Oh, delete it, then." (One plus point for including the full story is that maybe the code can be deleted later: "Ah, yes, but that can't happen any more.")

Another reason to wax rhapsodic is that your fix may contain bugs of its own. (Rule of thumb: 25% of fixes have bugs.) At least people will know why you checked in new boneheaded code.

Granted, the result of this commenting strategy will be pretty odd-looking: stretches of clean "self-documenting" code punctuated by long tales of heroic debugging-wrestling. (Perhaps when the latter starts to dominate, it's time for a rewrite?)
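A hypothetical illustration of the style, in Python. The function, the bug, and the user report are all invented for this example; what matters is the shape: the whole story sits next to the odd-looking special case, including what would have to change before it could be deleted.

```python
def average_latency(samples):
    """Return the mean of a list of latency samples, in nanoseconds."""
    # SPECIAL CASE. The story behind it (hypothetical example):
    # In the field, a user ran a regression where one test was killed before
    # it logged any transactions, so this function received an empty list.
    # The original code did sum(samples) / len(samples) and crashed with
    # ZeroDivisionError, which aborted the whole summary report. We never
    # tested the "no samples" case because our own runs always produced data.
    # Fix: treat "no samples" as "no latency figure"; the caller prints "n/a".
    # If tests can no longer be killed before logging, this branch can go.
    if not samples:
        return None
    return sum(samples) / len(samples)
```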

[An earlier version of this note appeared in Verilab's internal newsletter.]

What lies behind the SPEC CPU benchmark suite

I am a long-time fan of the SPEC CPU benchmark suite(s). The current version (SPEC CPU2006) was released in August 2006, replacing the 2000 suite, which in turn succeeded the 1995, 1992, and 1989 versions. SPEC is currently collecting programs for the next version of the benchmark.

I will give the history and significance of this benchmark suite, which concentrates on raw computing speed. It was "benchmarks done right" after four decades of those done wrong and so remains instructive.

History

A computer system never runs quicker than its slowest piece. For a long time (into the 1970s?), the CPU was the straggler, so a machine's MIPS (millions of instructions per second) or MFLOPS (millions of floating-point operations per second) rating told you most of what you needed to know about its speed. (Scientists cared about MFLOPS, everyone else about MIPS.)
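For reference, both ratings are simple rates: a count of operations executed divided by execution time, scaled to millions (the symbol names below are just labels chosen here):

```latex
\mathrm{MIPS} = \frac{N_{\mathrm{instr}}}{t_{\mathrm{exec}} \times 10^{6}}
\qquad
\mathrm{MFLOPS} = \frac{N_{\mathrm{flop}}}{t_{\mathrm{exec}} \times 10^{6}}
```

Here t_exec is in seconds; for example, 50 million instructions in 0.5 s gives 5 × 10^7 / (0.5 × 10^6) = 100 MIPS. Neither figure says how much useful work each instruction does.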

The simplest (CPU-intensive) benchmark is a small program for a well-defined problem that can be easily run across a range of computers. Common examples have been: computing Fibonacci numbers; solving the N-Queens problem; solving the Towers of Hanoi; matrix multiplication. Such benchmarks tell you a little and are a lot of fun -- and there are still many web pages discussing them. (Nowadays, they turn up more in "my programming language is better than yours" comparisons.)

The other thing about such benchmarks is that it is easy to game the system. Want your Scheme compiler to look good? -- keep an eye out for functions named fib and insert optimal machine code when you see them.
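A minimal sketch of such a toy benchmark in Python, for illustration only: the doubly-recursive `fib` is the classic workload, and `time_it` is a hypothetical helper around `time.perf_counter`. The memoized variant is not the compiler trick just described, but it shows the same weakness: an implementation that quietly changes the algorithm gets the same answers in a tiny fraction of the time, so the timing says more about the trick than about the machine.

```python
import time
from functools import lru_cache


def fib(n):
    """Doubly-recursive Fibonacci: exponential work, the classic toy benchmark."""
    return n if n < 2 else fib(n - 1) + fib(n - 2)


@lru_cache(maxsize=None)
def fib_memo(n):
    """Same answers, but each value is computed only once: linear work."""
    return n if n < 2 else fib_memo(n - 1) + fib_memo(n - 2)


def time_it(f, n, repeats=3):
    """Best-of-N wall-clock time in seconds for one call f(n)."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        f(n)
        best = min(best, time.perf_counter() - start)
    return best


if __name__ == "__main__":
    n = 25
    assert fib(n) == fib_memo(n)
    print(f"naive fib({n}):    {time_it(fib, n):.4f} s")
    print(f"memoized fib({n}): {time_it(fib_memo, n):.6f} s")
```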

Knowing that a particular machine is whizzy on matrix multiply is not confidence-inspiring if you want to buy something for a general mix of applications. Moreover, it has been a very long time since all of a computer's instructions executed in the same number of cycles.

The first major benchmark to address such concerns was the Whetstone benchmark (1972), a synthetic benchmark intended to reflect the statistical behavior of scientific programs (written in Algol 60, as it happens). It also tried to be less "game-able".

Whetstone was mostly floating-point, so it was inevitable that an integer synthetic benchmark followed -- yes, called Dhrystone (Reinhold P. Weicker, 1984). It had the same goals: being representative of integer-intensive programs, and not being trivialized by compilers. A nice piece of work; but it still meant you were judging complex machines by a single figure of merit (always tempting, always unwise).

As the 1980s moved along, two extra factors started to kick in. First, memory system performance really started to matter (and to become the bottleneck). Second, nearly all code was written in a high-level language (C, for example) and went through a compiler.

The SPEC idea

The idea of the SPEC benchmarks was fabulously simple: collect the source for real programs, and then set standard rules for compiling, running, and reporting the performance results for those programs. This approach acknowledged: that no synthetic benchmark could take the place of real programs; that the whole system needed testing (not just the CPU); and that the compiler was as much part of the system as anything else.

The work on SPEC began in September, 1988, and the first version came out in October, 1989. There were ten programs: gcc, espresso, li, eqntott [mostly integer]; spice, doduc, nasa7, matrix, fpppp, tomcatv [mostly floating-point]. And, yes, many of those were open-source programs floating around at the time (e.g. GCC = GNU C Compiler) -- a program couldn't (and still can't) be used as a benchmark unless the source is publicly available.

The 2006 integer benchmarks are: 400.perlbench, 401.bzip2, 403.gcc, 429.mcf, 445.gobmk, 456.hmmer, 458.sjeng, 462.libquantum, 464.h264ref, 471.omnetpp, 473.astar, and 483.xalancbmk. (The prefixed numbers are, essentially, version tags.) The floating-point benchmarks are: 410.bwaves, 416.gamess, 433.milc, 434.zeusmp, 435.gromacs, 436.cactusADM, 437.leslie3d, 444.namd, 447.dealII, 450.soplex, 453.povray, 454.calculix, 459.GemsFDTD, 465.tonto, 470.lbm, 481.wrf, 482.sphinx3.

The SPEC CPU benchmarks have changed several times. Why? Some of the main reasons:

  • Computers speed up so much that a program completes "too quickly";

  • Cache sizes increase so that a whole program fits in cache and the memory system (as a whole) becomes irrelevant;

  • Compiler tricks can turn a benchmark program into a joke (e.g. a way to compute the answer at compile time, legitimately);

  • New source languages (e.g. C++) and new domains (XML grokking) become interesting.

Computer system vendors took (and take) SPEC benchmarks very seriously. For each program in the suite, they figured out super-complex sets of compiler flags that worked well. (As long as they were reported, this was within the rules.) Because of this, a separate measure -- the SPECbase results -- was introduced: results from using one set of compiler flags across the whole suite.

The SPEC (CPU) benchmarks have affected your life -- some decent fraction of the performance of the fast machines you use is because of much laboring in the SPEC trenches. (It would be even better if the world hadn't fallen for Intel's GHz-are-good wheeze.)

My life as a benchmarker

Back when I was a Haskell guy, that world was particularly blighted by terrible benchmarking. For some reason, functional programmers were obsessed with the Fibonacci function: you could get a paper published -- if not make an entire career -- by having a nicely-behaved functional version of fib. The sub-field was held back by such madness.

One of the things we did at Glasgow was put together the (you guessed it...) nofib benchmark suite. The biggest obstacle at the time was that only a few real, non-trivial Haskell programs existed, never mind ones that could be put into a public suite. nofib was unashamedly based on the SPEC way of doing things. And, as with SPEC, I would say that Haskell users are seeing decent performance in part because of mining in the nofib pit.

Conclusion

The key insight of the SPEC CPU benchmarks was to use real programs. It is amazing that it took until 1989 for this idea to be taken up.

Hope springs eternal, however. People keep cooking up synthetic benchmarks, hoping for a short cut to quality performance comparisons. I am not confident. The SPEC way is likely to be one we will have the pleasure of learning -- again and again.

[An earlier version of this note appeared in Verilab's internal newsletter.]

About this Archive

This page is an archive of entries from July 2009 listed from newest to oldest.
