June 2009 Archives

The Small Dishes of Linux

A quick mention of some of the smaller morsels usually loitering in a nearby /usr/bin...

base64 - Encode/decode data in base64 format. (Useful if writing a script to push weird data through e-mail?)

chacl,getfacl,setfacl - Normal Unix permissions not fine-grained enough? Maybe Linux ACLs will help.

comm - Compare two sorted files line by line.

csplit - Split a file into sections determined by context lines.

diction, style - If you really care about readability of some text, run it through these programs. You might pick up some useful ideas.

env - The right way to modify/set a program's environment (but not other programs').

expand, unexpand - Convert all tabs to spaces (and vice versa).

flock - Manage locks from shell scripts.

getopt - If you need a shell script with real option parsing.

hexdump - Want to see a file in hex (or octal (or ...))?

id - Show my userid and groups.

killall - Kill processes by name. (I wouldn't trust it, but you might.)

logger - Do syslog stuff from a shell script/command line.

namei - Follow a pathname until a terminal point is found; untangles symlink junk.

pcregrep - Grep, but with Perl-compatible regular expressions.

pgrep,pkill - A safer way to grep for processes (safer than: ps -ef | grep ...)

printf - For formatted printing in shell scripts.

pstree - Show process tree; I use -p and -u a lot, too.

rename - Rename many files all at once, e.g. rename .htm .html *.htm

renice - Change (usually: lower) the priority of an existing process.

script - Keep a typescript of a terminal session.

seq - Shell script that counts by fives?

for i in `seq 0 5 100` ; do ...

sha1sum - Generate or check a strong checksum; best way to check that file-here is the same as file-there.

split - Split a file into pieces; you need --bytes to split into exact-sized chunks.

stat - More than you wanted to know about a file.

tail - Run, do not walk, if you don't know about tail -f.

tailf - A weird variant of tail -f that does less disk access and is therefore friendlier to spun-down laptop disks.

uniq - Squash out identical successive lines -- or show only those duplicates.
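
A few of these are easiest to appreciate by running them. The sketch below uses made-up scratch files (the names "here" and "there" and their contents are purely illustrative) to show comm, uniq -d, and seq side by side:

```shell
# Scratch files with made-up contents; comm needs both inputs sorted.
tmp=$(mktemp -d)
printf '%s\n' apple banana cherry > "$tmp/here"
printf '%s\n' banana cherry damson > "$tmp/there"

# comm -12 hides columns 1 and 2, leaving only the lines common to both files.
common=$(comm -12 "$tmp/here" "$tmp/there")

# uniq -d prints one copy of each line that appears more than once in a row.
dupes=$(sort "$tmp/here" "$tmp/there" | uniq -d)

# seq FIRST INCREMENT LAST: count by fives.
fives=$(seq 0 5 20 | tr '\n' ' ')

printf 'common:\n%s\ndupes:\n%s\nfives: %s\n' "$common" "$dupes" "$fives"
rm -rf "$tmp"
```

Both comm and uniq only look at neighbouring lines, which is why the inputs are sorted first.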

[An earlier version of this note appeared in Verilab's internal newsletter.]

Finding Yourself

If pushed, I would say that find is the least-used-most-useful general Unix command. Here we explore real-world uses of find, either things I do or that I wish I could remember how to do :-)

find crawls over a directory tree, typically reporting the files, directories, etc., that it encounters. It differs from ls in that it recurses into sub-directories.

The general syntax is...

find path1 [path2 ...] predicate(s) action

Almost always, the "path" is . (current working directory). The predicates are the fun part. The action (at least the way I use find) is nearly always -print or -ls. (See a little mention of -print0 later on.)

List what's in the current directory:

# short GNU-ish form:
find
# classic works-anywhere form:
find . -print

(In the GNU-ish minimal example, the "path" defaults to .; there is no predicate -- so everything matches; and the action is indeed -print.)

List the current directory, sorted:

find | sort

List only the things with "tommy" in the name:

find . -name '*tommy*'
# or, roughly (grep matches anywhere in the path, not just the basename):
find | grep tommy

Only things without "tommy" in the name:

find . '(' '!' -name '*tommy*' ')'
# or...
find | grep -v tommy

The -name thing only matches against the "basename" of the paths that find is chomping through. You can also match against the whole paths with:

find . -wholename '*public/*tommy*'
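
To make the basename-versus-whole-path distinction concrete, here is a small sketch run against a throwaway tree (the directory and file names are invented for the demonstration):

```shell
tmp=$(mktemp -d)
mkdir -p "$tmp/public" "$tmp/tommy"
touch "$tmp/public/tommy.txt" "$tmp/tommy/readme.txt"

# -name tests only the last path component, so readme.txt is not reported.
by_name=$(cd "$tmp" && find . -name '*tommy*' | sort)

# grep sees whole paths, so everything under ./tommy matches as well.
by_grep=$(cd "$tmp" && find . | grep tommy | sort)

# -wholename also tests the whole path, using shell-style wildcards.
by_path=$(cd "$tmp" && find . -wholename '*public/*tommy*')

printf '%s\n' "-name:" "$by_name" "grep:" "$by_grep" "-wholename:" "$by_path"
rm -rf "$tmp"
```

The grep version picks up ./tommy/readme.txt because its parent directory matches, which is the usual reason the two forms disagree.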

Choosing names with a regular expression, e.g. to find all files under a svnwork directory ending in .c, .h, .cc, .cpp, or .C:

find . -regex '.*/svnwork/.*\.\([Cch]\|cc\|cpp\)'

(Note: the -regex thing matches against full paths, as with -wholename. Also note: those are wacky Emacs-style regular expressions; you can change that with -regextype.)
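
As a sketch of what -regextype buys you (assuming GNU find; the file names are made up), the same extension match can be written in the default syntax or in POSIX extended syntax, where the backslashes on the grouping go away:

```shell
tmp=$(mktemp -d)
touch "$tmp/a.c" "$tmp/b.h" "$tmp/c.cc" "$tmp/d.cpp" "$tmp/e.C" "$tmp/f.txt" "$tmp/g.hpp"

# Default (Emacs-style) syntax: grouping and alternation need backslashes.
emacs_style=$(cd "$tmp" && find . -regex '.*\.\([Cch]\|cc\|cpp\)' | sort)

# Same pattern in POSIX extended syntax; -regextype must come before -regex.
posix_style=$(cd "$tmp" && find . -regextype posix-extended -regex '.*\.([Cch]|cc|cpp)' | sort)

printf '%s\n' "$emacs_style"
rm -rf "$tmp"
```

Either way the pattern has to match the entire path, which is why it starts with ".*".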

Back to something more sane... Choosing names case-insensitively:

find . -iname '*tommy*'

(There's also -iregex, -iwholename, ...)

Alright, I admit it: it is super-rare for me to use -regex, -iwholename, -iname, etc. However, I very commonly list all the files (no directories, symbolic links, sockets, named pipes, special device files, ...):

find . -type f

And I often want to know more about them than just their name; enter -ls:

find . -type f -ls

Very often, what I want to know is "What are the ten biggest files?" That's:

find . -type f -ls | sort -k 7nr | head

(sort -k 7nr: sort on the 7th column [size, as it happens], numerically, reverse order.)

Now, often in a case like that, I don't even want to bother with, say, files smaller than 100KB; filter those out ahead of time with -size:

find . -type f -size +100000c -ls | sort -k 7nr | head

(Note: with -size, and later with -mtime, +N means "greater than N" and -N means "less than N". You almost never want -size 1000000c, which means "exactly one million characters [bytes]".)
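
If counting -ls columns feels fragile, GNU find's -printf can put the size first. This is a sketch under that assumption (-printf is a GNU extension), using three scratch files of known sizes:

```shell
tmp=$(mktemp -d)
head -c 10   /dev/zero > "$tmp/small"
head -c 2000 /dev/zero > "$tmp/medium"
head -c 5000 /dev/zero > "$tmp/large"

# -size +1000c drops the small file; -printf emits "size path" so a plain
# numeric sort works without knowing which -ls column holds the size.
biggest=$(cd "$tmp" && find . -type f -size +1000c -printf '%s %p\n' | sort -nr | head -n 1)

echo "$biggest"
rm -rf "$tmp"
```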

In case it isn't clear: the component parts of a predicate (-type f, -size ...) get ANDed together. (Yes, there is an OR operator: -or.)
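
A small sketch of OR in practice (file names invented; -o is the portable spelling of -or). The parentheses matter, because AND binds more tightly than OR:

```shell
tmp=$(mktemp -d)
mkdir "$tmp/docs.h"            # a directory whose name happens to end in .h
touch "$tmp/main.c" "$tmp/util.h" "$tmp/notes.txt"

# Parentheses group the OR so that -type f applies to both name tests.
sources=$(cd "$tmp" && find . -type f \( -name '*.c' -o -name '*.h' \) | sort)

printf '%s\n' "$sources"
rm -rf "$tmp"
```

Without the grouping, a trailing test would attach only to the right-hand side of the OR.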

And, since I've mentioned it: -mtime checks the modification time of a file, i.e. how old it is. So, for files over 20KB modified sometime in the last month:

find . -type f -mtime -30 -size +20000c -ls | sort -k 7nr

There are a whole slew of find options to do with picking files by their modification/status-change/access times and doing so in super-precise ways... The only thing I ever use besides -mtime is...

find . -type f -newer ~/t/last-update -ls | sort -k 7nr

I.e. pick files by their age relative to a reference file. The nice thing about that is you can create a file with the exact time you care about and then find against that, e.g.:

touch -t 200904010000.01 ~/t/april-fool
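
Here is the reference-file trick end to end, as a sketch in a scratch directory (the "older" and "newer" file names are invented):

```shell
tmp=$(mktemp -d)
touch -t 200904010000.01 "$tmp/april-fool"   # the reference timestamp
touch -t 200901010000    "$tmp/older"
touch -t 200906010000    "$tmp/newer"

# Only files modified after the reference file are reported; the reference
# file is not newer than itself, so it is left out too.
changed=$(cd "$tmp" && find . -type f -newer april-fool)

echo "$changed"
rm -rf "$tmp"
```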

Another 'mtime-y' thing I often do is try to find the guilty among recently-changed files. This usually takes the form:

ls -ltr `find . -type f -mtime -1`
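
One caveat with the backtick form: it falls apart if any file name contains a space. A variation that should cope, sketched here with invented file names, is to let find hand the names to ls itself:

```shell
tmp=$(mktemp -d)
touch "$tmp/plain.txt" "$tmp/with space.txt"

# "{} +" passes find's results to ls as separate arguments, so a name
# containing spaces stays in one piece (unlike backtick expansion).
listing=$(find "$tmp" -type f -mtime -1 -exec ls -ltr {} +)

printf '%s\n' "$listing"
rm -rf "$tmp"
```

With very many files find may run ls more than once, in which case the time ordering only holds within each batch.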

Besides modification and access times, you can also go finding against permissions, with -perm. Typically, the problem is "readable files that shouldn't be", "unreadable files that should be", "files with gratuitous execute permission", or even "generally horked directory permissions". I always have to double-check the manual for this stuff but it's usually things like:

# no read permission of any kind (even owner):
find . -type f \( \! -perm /444 \)

# no read permission of any kind for non-owner (group or other):
find . -type f \( \! -perm /044 \)

# write permission for group or other:
find . -type f -perm /022

# any non-directory with some kind of execute permission:
find . \( \! -type d \) -perm /111
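
Since I always have to double-check this stuff, here is a sketch that checks it against files with known modes (assuming GNU find's -perm /MODE syntax; the file names are invented):

```shell
tmp=$(mktemp -d)
touch "$tmp/private" "$tmp/normal" "$tmp/shared"
chmod 600 "$tmp/private"   # rw-------
chmod 644 "$tmp/normal"    # rw-r--r--
chmod 664 "$tmp/shared"    # rw-rw-r--

# /022: at least one of group-write or other-write is set.
writable=$(cd "$tmp" && find . -type f -perm /022)

# ! -perm /044: neither group-read nor other-read is set.
unreadable=$(cd "$tmp" && find . -type f \! -perm /044)

echo "writable by group/other: $writable"
echo "unreadable by group/other: $unreadable"
rm -rf "$tmp"
```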

Again, there is often a find-plus-grep cheap-and-cheerful equivalent:

# look for directories with permissions we don't like:
find . -type d -ls | grep 'drwx------'

# even more flexible with a regular expression (plain grep copes with this one; egrep for fancier patterns):
find . -type d -ls | grep 'd...------'

Times, permissions,... yes, you can also look for users and groups. The obvious things: find stuff owned by a user (that perhaps shouldn't be), or find stuff that isn't owned by a user (but that should be).

find . -user root -ls
find . \( \! -user tommyk \) -ls

Another tangent... Files (-type f) are not always the object of my attentions. Directories sometimes feature; so...

# traverse in depth-first order, finding the empty directories:
find . -depth -type d -empty

(-depth is important if you want to do something rash like '/bin/rmdir' on them [example later].)

Symbolic links also get singled out for attention, say, when they go wonky (when copied, for example) and need to be fixed. Here's the finding part...:

find . -type l

If you use -ls, you can grep on the symbolic links' values:

# find symlinks pointing to Acme tree:
find . -type l -ls | grep -- '-> /usr/local/acme'
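
GNU find can also test the link target directly with -lname, which saves the trip through grep. A sketch, assuming GNU find and two invented links:

```shell
tmp=$(mktemp -d)
ln -s /usr/local/acme/bin "$tmp/tools"
ln -s /etc "$tmp/config"

# -lname matches the text of the link's target against a shell-style pattern;
# the target does not need to exist.
acme_links=$(cd "$tmp" && find . -type l -lname '/usr/local/acme*')

echo "$acme_links"
rm -rf "$tmp"
```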

OK, we're nearly done with the fun. Just a detail or two left.

If you have reason to find in the root directory, you run the risk of walking through every mounted filesystem (when perhaps all you want to know is "Why is my root filesystem 100% full?"). For this situation, use -mount to avoid crossing filesystem boundaries:

find / -mount -type f -size +1000000c -ls

A word about -print0 (I promised...): If you are going to do something with find output -- and I recommend xargs -- and if the file names contain spaces, quotes, or newlines (which xargs would otherwise split on or misread), then you need -print0 (separate names with NUL bytes) instead of -print (separate with newlines), paired with xargs -0. So, to remove empty directories:

find . -depth -type d -empty -print0 | xargs -0 /bin/rmdir
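
To see the NUL-separated form earn its keep, here is a sketch in a scratch tree whose directory names (invented for the purpose) contain spaces:

```shell
tmp=$(mktemp -d)
mkdir "$tmp/empty one" "$tmp/has file"
touch "$tmp/has file/x"

# NUL-separated names survive the trip through xargs even with spaces in them.
(cd "$tmp" && find . -depth -type d -empty -print0 | xargs -0 rmdir)

# Only the non-empty directory should be left.
remaining=$(ls "$tmp")
echo "left behind: $remaining"
rm -rf "$tmp"
```

With plain -print, xargs would have asked rmdir to remove "./empty" and "one" separately.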

Finally, if you run find across big wads of files not all your own, you may get lots of (expected) 'permission denied' errors which just clutter up the output. Two ways around that:

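# the find way (GNU find): prune the directories you cannot read.
# (-nowarn would not help: it only silences warnings about command-line usage.)
find . -type d \! -readable -prune -o -type f -name 'core.[0-9][0-9]*' -print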

# the olde (non-csh) Unix way:
find . -type f -name 'core.[0-9][0-9]*' 2> /dev/null

I hope the above illustrates that there are a lot of things you can (and should) do with find. And the dirty little secret, as our examples have shown: a simple find command in a straightforward shell pipeline is often easier than deep find magic.

[An earlier version of this note appeared in Verilab's internal newsletter.]

EDA Installs, the Yum Way?

Installing EDA tools for the major vendors requires using their individual (idiosyncratic) install tools.

Wouldn't it be lovely to use a normal Linux package manager? (I'll use yum, from the Red Hat orbit, in my examples.) A lowly sysadmin can dream, can't he?

Installing the hypothetical tool VCquestman might be, in full:

yum install vcquestman

Merits:

  • The installation task can be automated (scripted).

  • The software can be divided into its constituent parts (packages), and the dependency mechanism in yum can sort out which are needed.

  • Crazed duplication can be eliminated, again via the package dependency mechanism.

  • Software updates are just a yum update away.

  • It's easy and the Normal Way to Do Things in modern Linux.

However, EDA tools have extra installation requirements beyond those of a screensaver, including:

  • Best practice is that a project chooses a set of tools (and versions thereof) and, barring a good reason, does not budge.

    And not just until tape-out: tweaks, fixes, and possibly even derivatives -- maybe years later -- should be done with an identical toolset.

  • It follows that multiple versions of a tool must coexist, possibly for a very long time.

  • It must be pretty hard, preferably impossible, to invoke one version of a tool when you think you're invoking another.

  • Tools from different vendors must play together; for example, one vendor's verification environment with another's simulator.

  • One man's "all-important hot fix" is another's "you just broke my whole chip" unwelcome update.

These requirements eliminate the lovely-but-impractical notion of having (say) a single 'vcquestman' package that you simply go get, update, etc.

I would have a two-part solution.

One Tool, Many Packages

First part: I would have major versions of a tool appear to yum (and friends...) as different packages altogether. So, for instance, there would be a vcquestman61 package, a vcquestman62 package, and so on. You'd pick the one you wanted (or all of them). To yum they are as different as packages xemacs and vim.

These version-specific "packages" would mostly just splat bits into some well-known corner of the file namespace, e.g. /opt/cadmensys/vcquestman62 (to pick what I think the Filesystem Hierarchy Standard would prefer).

The bits should not be scribbled hither and yon in /usr. When that partition was created, it undoubtedly did not cater for dozens of gigabytes of EDA largesse being added, so much EDA yumming might cause disk-space troubles. Something like /opt/cadmensys can be a symlink to some spacious disk, or an NFS mount, or...

All of the back-room software that typically comes with an EDA install (e.g. Perl, GCC, Tcl, etc.) would be just the same, e.g. landing in /opt/cadmensys/perl588. It probably is not worth it to use the system Perl (for example) even if it is compatible -- because it might subsequently change in some unexpected way because of a yum update.

It ought to be the case that, if you set your PATH to the right-but-painful thing, e.g. ...

PATH=/opt/cadmensys/vcquestman62/bin:/opt/syncadmen/abc2009.6/bin:$PATH

... things ought to work (including getting the expected versions of those bits of software).

Much of the environment-variable madness that follows EDA tools ought to be redundant. I mean, if you invoke /opt/cadmensys/vcquestman62/bin/vcquestman, surely it can intuit that VCQUESTMAN_HOME is /opt/cadmensys/vcquestman62? Moreover, surely it can find the carefully-Cadmensys-hacked version of GCC that it needs?
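
The "intuiting" is not hard. As a minimal sketch (tool_home is a name I made up, and the tree under a temporary directory merely stands in for /opt/cadmensys), a launcher script can work out its own install root from the path it was invoked by:

```shell
# Print the install root for a tool, given the path it was invoked by
# (in a real launcher script that path would be "$0").
tool_home() {
    (cd "$(dirname "$1")/.." && pwd -P)
}

# Demonstration with a throwaway tree standing in for /opt/cadmensys.
tmp=$(mktemp -d)
mkdir -p "$tmp/cadmensys/vcquestman62/bin"
touch "$tmp/cadmensys/vcquestman62/bin/vcquestman"

home=$(tool_home "$tmp/cadmensys/vcquestman62/bin/vcquestman")
echo "VCQUESTMAN_HOME would be $home"
rm -rf "$tmp"
```

A real tool would presumably do the equivalent in whatever language its launcher is written in; the point is only that the information is already there.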

Having said that, some environment-variable gunk will inevitably live on. It's hard to see LM_LICENSE_FILE going away...

Wrapping A Tidy Package

So far, we can install many independent versions of each EDA tool. We can run yum update to bring in essential fixes (if we wish). By setting our PATH carefully (and the tiniest bit of environment-variable tweaking...), we can pick the exact versions of tools we want and run them.

Two problems. Lesser: the exact mix of tools we've selected may not work together. Greater: the scheme is a pain in the backside.

I have a (perhaps exaggerated) belief that relying on users' careful setting and maintenance of environment variables (including PATH) in their .bashrc files is a recipe for chaos-if-not-disaster. Far better if they just log in, type vcquestman, and it's the right thing.

Now, in some project contexts, typing vcquestman and getting...

vcquestman: command not found

... is exactly what you want. Running a tool outside a project (with its carefully-chosen tool/version set) should be an error.

But in other situations (e.g. Verilab's), typing vcquestman means "Run the latest slightly-stable release and don't give me a hassle." So there needs to be a way for the tool administrator to set up some tool defaults.

Existing Linux systems have mechanisms for this kind of thing. In Fedora, it's called alternatives. The /usr/sbin/sendmail program may point to an executable out of the sendmail package (no surprise there...) or the postfix package (and maybe some others). You run a little program after the fact to manipulate these "alternatives".

I'm guessing EDA tools would need something a bit heavier. One possibility might be... "Software profiles" that specify what mix of tool/versions you want, plus extra information ("My eVCs are over there"). One profile would be the default, i.e. what you get if you invoke /usr/bin/vcquestman. Individual projects could -- and should -- have their own profiles.

Profiles should be text files, but it's OK if there's a "wizard" to help make one.

Another needed tool would be something to take a profile and make it the default, i.e. set up executables in /usr/bin, and anything else it takes to make it Just Work.

You would probably end up with a "tool wrapper" program. Dirty little secret: "Invoke a wrapper to invoke the tool" is already pretty common in EDA vendors' software. Anyway, the wrapper's job would be to interpret a profile and then invoke the "real" executable.
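
To give the idea some shape, here is a rough sketch of the lookup such a wrapper might do. Everything in it is hypothetical: the profile format (one "tool version" pair per line), the resolve_tool name, and the /opt/cadmensys layout borrowed from the examples above.

```shell
# Hypothetical profile format: one "tool version" pair per line;
# lines starting with '#' are comments.
resolve_tool() {
    profile=$1
    tool=$2
    version=$(awk -v t="$tool" '$1 == t { print $2; exit }' "$profile")
    if [ -z "$version" ]; then
        echo "$tool: not in profile $profile" >&2
        return 1
    fi
    # Assumed install layout: /opt/cadmensys/<tool><version>/bin/<tool>
    printf '/opt/cadmensys/%s%s/bin/%s\n' "$tool" "$version" "$tool"
}

# Demonstration with a scratch profile.
tmp=$(mktemp -d)
cat > "$tmp/project.profile" <<'EOF'
# tool      version
vcquestman  62
EOF
real=$(resolve_tool "$tmp/project.profile" vcquestman)
echo "would exec: $real"
rm -rf "$tmp"
```

The wrapper proper would finish with something like exec "$real" "$@", and a tool missing from the profile gives the "command not found" style of failure that some projects want.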

A Little Install Verification

What I've described so far is fairly brainless (good), and still has scope for going wrong (bad). For instance, it's hard to stop someone getting ahold of the raw RPM (package) files and doing...

rpm -Uvh --force --nodeps cadmensys-vcquestman62-*.rpm

... thus making a genuine mess. A tool which ensures that a particular software profile does something sensible would be great. It could look up the packages mentioned in the profile, check that they are really installed, make sure that their dependencies are as expected, (possibly) look to see if there are hot-fix updates available ("We really think a yum update would do you good..."), and so on. Myself, I'd go for it keeping a few small test runs around and actually trying them, before issuing its Good-to-Go imprimatur.
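
The "are they really installed?" step might start out as simply as this sketch. The check_packages name is invented, and the package names would come from the (equally hypothetical) profile; rpm -q is the standard query.

```shell
# Succeed only if every named package is installed, per "rpm -q".
check_packages() {
    missing=0
    for pkg in "$@"; do
        if ! rpm -q "$pkg" >/dev/null 2>&1; then
            echo "not installed: $pkg" >&2
            missing=$((missing + 1))
        fi
    done
    [ "$missing" -eq 0 ]
}

# Example use (hypothetical package name):
#   check_packages cadmensys-vcquestman62 && echo "Good to go"
```

Adding rpm --verify per package would be the natural next step for catching the --force --nodeps sort of mess.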

(Cadence (has?) had a checkSysConf tool for a long time. It checks that a system is up to running some particular Cadence software. It began as a "Do you have the right patches?" tool for Solaris and HP-UX. My limited experience was that it is overly persnickety -- it declares many systems unsuitable when, in fact, they're fine. I would prefer something that really ran the tools and saw what did and didn't work.)

Keeping Vendors Happy

I see no reason that installation machinery like that outlined here wouldn't be good for vendors.

One potential issue is that marketing people love to re-bundle software into different clumps with new names, if only to show that Things Are Happening. That's OK... typing...

yum install shiny-new-toy-2009-06

... could "install" a virtual package; that is, its "dependencies" would be the real packages with real bits that get installed. Sysadmins will be very happy when they find that many of the bits for Shiny New Toy are, in fact, already sitting on their disk in /opt/cadmensys (or whatever).

This scheme will also be helpful when debugging clients' problems. Firstly, if this system is compellingly good, everyone will use it, and that will greatly cut down on the "bozo client installed everything wrongly to start with" problems. Standard system tools -- e.g. rpm --verify -- could be brought to bear.

Secondly, if there exists a post-install checker (or "software profile verifier"), it could serve as an excellent first line of "what's going on" defense. Moreover, if a client gets to the "you'd better send us your code so we can debug it here" stage, then they can send their "profile" along to the vendor, and, in many cases, the vendor could run the client's code with some confidence that the execution environment was "the same".

None of the above is rocket science, or even new. The ubiquitous availability of good package managers under Linux makes the job very much easier (they do the heavy lifting).

[An earlier version of this note appeared in Verilab's internal newsletter.]
