March 2009 Archives

The Shell History Meme

Several years ago, we had the "history meme", in which individuals mined their shell histories to find their ten most-executed Unix commands.

What interested me was how the shell one-liner to discover the "top ten" morphed as people got ahold of it -- a glorious game of Geek Chinese Whispers. Each variant is a small puzzle of "Why did they do that?"

The first version I saw was AWK-based:

history \
| awk '{a[$2]++} END{for(i in a){print a[i] " " i}}' \
| sort -rn \
| head

(Note: for zsh, you need something like history -1000 instead of plain history.)

Aside: AWK was the first Unix-y language with associative arrays (used here); that's where Perl nicked them.

I also saw a neater-printing version of the same thing:

history \
| awk '{a[$2]++} END{for(i in a){printf "%5d\t%s \n",a[i],i}}' \
| sort -rn \
| head

Another version eschewed AWK for true Unix purity:

history \
| tr -s ' ' \
| cut -d ' ' -f 3 \
| sort \
| uniq -c \
| sort -rn \
| head -n 10

(The -n 10 on head is sadly redundant -- ten lines is the default.)
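One of the small puzzles here is why cut wants field 3 when awk wants $2. A minimal sketch, assuming the leading padding that bash typically puts in front of the history number (the sample line is made up):

```shell
# A hypothetical history line, padded the way bash usually prints it.
line='  501  ls -l'

# tr -s squeezes each run of spaces to one, but one leading space remains,
# so cut sees an empty field 1, the number in field 2, the command in field 3.
printf '%s\n' "$line" | tr -s ' ' | cut -d ' ' -f 3    # prints: ls

# awk ignores leading blanks when splitting, so the command is field 2.
printf '%s\n' "$line" | awk '{print $2}'               # prints: ls
```

If your shell prints history without the leading padding, the cut field number would presumably need adjusting; the awk form is less fussy.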

The third variant used just a little AWK to avoid Unix obscurantism:

history \
| awk '{print $2}' \
| sort \
| uniq -c \
| sort -rn \
| head

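Colleague Gordon McGregor's version (and it was he who forced this meme upon us...) improved on that by taking pipelines into account -- showing a bug in all the other versions(!), which look only at the first word of each line. As written, the second awk just trims a | glued onto that first word (so ls|wc counts as ls); commands further along a pipeline still go uncounted: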

history \
| awk '{print $2}' \
| awk 'BEGIN {FS="|"}{print $1}' \
| sort \
| uniq -c \
| sort -n \
| tail \
| sort -rn
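For the curious, here is one way the counting might be extended to every command in a pipeline. It is only a sketch: sample_history and count_all_commands are names made up for illustration, the sample lines stand in for real history output, and splitting naively on | will miscount || and any | hiding inside quotes.

```shell
# Hypothetical sample standing in for the output of `history`.
sample_history() {
  cat <<'EOF'
    1  ls -l | wc -l
    2  cd /tmp
    3  ls
    4  grep foo bar.txt | sort | uniq -c
    5  ls|sort
EOF
}

# Strip the history number, split each line on "|", and count the first
# word of every pipeline stage.
count_all_commands() {
  awk '{
         sub(/^[ \t]*[0-9]+[ \t]+/, "")       # drop the leading history number
         n = split($0, stage, "|")            # one element per pipeline stage
         for (i = 1; i <= n; i++) {
           split(stage[i], word, " ")         # whitespace-split this stage
           if (word[1] != "") count[word[1]]++
         }
       }
       END { for (c in count) printf "%5d\t%s\n", count[c], c }' \
  | sort -rn \
  | head
}

sample_history | count_all_commands
```

On the sample, ls comes out on top with 3 and sort next with 2. On real data, something like history | count_all_commands ought to do (with the zsh caveat above).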

All of these variants put me in mind of my favorite Unix quote, now lost to the mists of time: In Unix there are a thousand ways to do any task -- all but one of them are wrong.

[An earlier version of this note appeared in Verilab's internal newsletter.]

Editing a Really Huge File

I recently had reason to edit the first 15 lines of a really huge file (cough, cough... 300+ MB). Maybe Emacs does this -- I didn't care to find out.

The obvious thing to do is to slice off the first 15 lines, and graft on a replacement. I used (the GNU versions of) head and tail.

First, let's test that our slicing-and-dicing is correct:

( head -n 15 huge; tail -n +16 huge ) | diff -U1 huge -

If diff gives any output, you're not there yet. Otherwise, proceed:

# slice:
head -n 15 huge > huge.head
# edit:
editor-of-choice huge.head

And splice it back in, checking that the changes are as expected.

( cat huge.head; tail -n +16 huge ) | diff -U1 huge -

Now do it for real (in my case, that meant feeding the spliced result straight to mysql):

( cat huge.head; tail -n +16 huge ) | mysql -u privusr -p

With a similar creeping-forward, you could do the same with the last part of a file, or even a middle section (but then there may be easier ways).
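The middle-section case might go something like the sketch below. It assumes a scratch directory and a 20-line stand-in for the huge file; the variables first and last, the file names huge.middle and huge.new, and the use of sed in place of editor-of-choice are all made up for illustration.

```shell
# Work in a scratch directory so nothing real is touched.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Hypothetical stand-in for the huge file: 20 numbered lines.
seq 1 20 | sed 's/^/line /' > huge

first=8    # first line of the section to edit (assumed)
last=10    # last line of the section to edit (assumed)

# Test the slicing: the three pieces must reassemble into the original.
( head -n $((first - 1)) huge
  sed -n "${first},${last}p;${last}q" huge
  tail -n +$((last + 1)) huge ) | diff -U1 huge - && echo "slices OK"

# Slice out the middle, then "edit" it (sed stands in for editor-of-choice).
sed -n "${first},${last}p;${last}q" huge > huge.middle
sed 's/line/LINE/' huge.middle > huge.middle.new

# Splice it back in and check that the changes are as expected.
( head -n $((first - 1)) huge
  cat huge.middle.new
  tail -n +$((last + 1)) huge ) > huge.new
diff -U1 huge huge.new || true   # diff exits non-zero when it finds changes
```

The q in the sed script stops reading once the last wanted line has been printed, which matters when the file really is huge.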

Note: such creeping-forward is just Unix incrementalism.

[An earlier version of this note appeared in Verilab's internal newsletter.]

Puppet Practice for the Paranoid

Puppet is a tool to maintain systems' configuration in an automated way; it is an intellectual descendant of Cfengine. Puppet has a vibrant community around it.

Puppet's Getting Started document suggests starting daemons (well, at least one) and running things that might change my system.

No, no, no, no! I am paranoid. I don't want anything changed until I start to be comfortable with what the tool is doing, and I give the go-ahead. For now I'm Just Testing.

The Puppet folks do have a relevant guide, a document lurking behind the name Test-driving Puppet. What follows is just a slight elaboration on that. (I'm assuming Linux here, notably Fedora...)

One machine in a Puppet universe is the "puppetmaster". (We'll call it puppet.example.com in the examples below.) Even though we're Just Testing, I suggest a pretty modern distro for the puppetmaster -- not RHEL/CentOS 4. (I got segmentation faults and other Ruby goodness, all of which went away when I switched to Fedora 10.)

A client machine phones home to the puppetmaster, is given a spec for the client configuration, and Puppet-on-the-client makes it so. Except: we're Just Testing, and we want it simply to tell us exactly what it would have done.

To begin, install the software with sudo yum install puppet-server (on the puppetmaster) or sudo yum install puppet (on a client). Or your equivalent.

Now, make up a little Puppet example (e.g. the standard sudoers example), perhaps in your home directory. It doesn't have to be right yet. Be sure it is in a directory that user puppet can read.

Check that your firewall (iptables) rules let your client get through to the server (puppetmaster) on port 8140.

Start the puppetmaster in its own terminal in the foreground where you can watch it:

sudo puppetmasterd --verbose --no-daemonize \
    --ignorecache --confdir /home/me/s-puppet

Now wander to your client and do the Puppet Certificate Dance:

client% sudo puppetd --server puppet.example.com --test
master% sudo /usr/sbin/puppetca --list
master% sudo /usr/sbin/puppetca --sign client.example.com

Finally, do as much Just Testing on the client as you wish:

sudo puppetd --server puppet.example.com --test --noop \
    --diff_args=-U1

I've run that last command dozens of times, as I slowly tweak my Puppet config (in /home/me/s-puppet) to try and learn about various things.

So far, I haven't changed a thing. I'm paranoid.

Finding All of a Spilled Mess

Here's a little Unixery that I haven't seen written down before... Imagine you have some job/script/program that went haywire and scribbled things in unknown places on the disk. You would like to find the damage.

Usually you know of some part of the damage, e.g. a log file or something. But the question is: What else got scribbled?

Another thing you often know is the approximate time of the damage; for example, the timestamp on a log file might give a clue. So the question turns into, "What files did I stomp on between 1625h and 1645h yesterday afternoon?"

The standard Unix tool for wandering over a filesystem looking for things is find.

The simplest solution I can think of for the problem at hand is:

  1. Create some files with our target times as their modification times:

    touch -t 200903021625 start-time
    touch -t 200903021645 stop-time
    
  2. Check that they are what you expect:

    ls -l {start,stop}-time
    
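  3. Now use them to search (the current directory) for the goods. (If the two marker files live in the directory being searched, stop-time itself will show up in the results; just ignore it.)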

    find . -newer start-time \( \! -newer stop-time \) -ls
    

You can make all the obvious changes, e.g. search a variety of specific directories, search only for files (-type f), only list the filenames (-print instead of -ls), pipe the output into xargs, etc.
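To try the recipe (and a couple of those variations) without a real mess to hand, something like the following sketch may help. It assumes a scratch directory; the tree directory and the three victim files are invented for the demonstration, with modification times placed before, inside, and after the window.

```shell
# Work in a scratch directory so nothing real is touched.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Marker files for the window, same timestamps as in the steps above.
touch -t 200903021625 start-time
touch -t 200903021645 stop-time

# Hypothetical victims: one before, one inside, one after the window.
mkdir tree
touch -t 200903021600 tree/before
touch -t 200903021630 tree/inside
touch -t 200903021700 tree/after

# Variations: search one specific directory, files only, names only.
find tree -type f -newer start-time \! -newer stop-time -print
# prints: tree/inside
```

Note the edges of the window: a file stamped exactly at the start time is not "newer" than start-time and so is left out, while one stamped exactly at the stop time is kept.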

[An earlier version of this note appeared in Verilab's internal newsletter.]

About this Archive

This page is an archive of entries from March 2009 listed from newest to oldest.
