Monday, March 21, 2011

Daily wtf

My wife is taking a shell programming class, and one of the questions on the homework was:

How would you use fgrep to replace this command? grep '[a-z]\{9\}'

Now, fgrep is grep without the regular expressions. So this seems to be a very strange question. Nevertheless, let us assume that there is some hidden wisdom the problem is designed to help us uncover.

Approach 1: can we turn regexes back on with a flag? The manpage is unclear on this point; it tells us that fgrep is equivalent to grep -F, but doesn't specify what happens if we were to give, say, grep -F -E. At least in ubuntu lucid, it produces "grep: conflicting matchers specified".

Trying 'fgrep -E' produces 'fgrep: invalid option -- 'E'', which was my first hint that fgrep is /not/ in fact exactly equivalent to 'grep -F'. /bin/fgrep is in fact its own binary, about half the size of /bin/grep.

Now, the paragraph talking about fgrep in the manpage looks like this:

       In  addition,  three  variant  programs  egrep,  fgrep  and  rgrep  are
       available.   egrep  is  the  same  as  grep -E.   fgrep  is the same as
       grep -F.  rgrep is the same as grep -r.  Direct  invocation  as  either
       egrep  or  fgrep  is  deprecated,  but  is provided to allow historical
       applications that rely on them to run unmodified.

Note the last sentence, which tells us that if we call grep as fgrep, it should behave like fgrep. This turns out not to be the case. Symlinking or simply copying grep to fgrep doesn't appear to change grep's behavior at all. So that's a second bug in the manpage.

Approach 2: Perhaps the instructor really does want us to avoid regexes entirely. The manpage has this as the description of -F:
        -F, --fixed-strings
              Interpret PATTERN as a  list  of  fixed  strings,  separated  by
              newlines,  any  of  which is to be matched.  (-F is specified by
              POSIX.)

Separated by newlines? That's a little weird. I can't actually figure out how to put a newline into a command line argument. Inserting a literal one with ^V doesn't seem to do the trick, nor do things like fgrep 'a\nb'. So as far as I can tell, that's another bug, although fgrep -e pattern1 -e pattern2 does seem to work.

The other way to specify multiple patterns is to put them in a file, and then use fgrep -f pattern-file.

So we could, in theory, enumerate all the possible matching strings in a file and thus avoid the regex. A file with all 9-letter lowercase words, one per line, has 26^9 = 5.4T lines of 10 characters each, for a total of 54TB. Newegg currently has a 2TB disk for $69.99, so not counting tax, shipping, or filesystem overhead, a cool $1900.05 worth of disk would be enough to hold the pattern file. There do also appear to be several options for transparently compressed filesystems in linux, which makes the problem much more manageable.

You could also give fgrep -f - and let it read the patterns from stdin, piping in a list generated on the fly (although it's not clear if this would violate the terms of the question, since it requires more than just fgrep, unless you're a very determined typist).

The next problem is what fgrep does with the patterns. If it copies them into RAM or puts them into some sort of data structure (like, say, a search tree), you're going to need a whole lot of swap space. But this limitation might in turn be overcome by compressed swap.

And of course all of this assumes a 64-bit OS, and an implementation of fgrep that can handle trillions of patterns.

So my answer to the question How would you use fgrep to replace this command? grep '[a-z]\{9\}' would have to be:

Enumerate all 9-letter lowercase words to a 54 terabyte text file called 'patterns'. Then run fgrep -f patterns, ensuring there's enough memory available to store the set of matching patterns.

Friday, March 18, 2011

Xenarc 702tsv on Lucid with evtouch driver

I got a Xenarc touchscreen working pretty well, except that intermittently (and usually after a reboot), the mouse pointer would behave erratically, while other times the touch behavior would work just fine.

The trick was to create an /etc/modprobe.d/xenarc.conf with:
# Stop messing with my Xenarc touch screen
blacklist usbtouchscreen

(I found that suggestion on Samiux's blog.

Now it seems to work consistently across reboots. The rest of the process was to set up a /usr/lib/X11/xorg.conf.d/11-touchkit.conf with:
Section "InputClass"
       Identifier "Touchkit Touch"
       MatchIsTablet   "on"
       MatchDevicePath "/dev/input/event*"
       Driver  "evtouch"
       Option  "ReportingMode" "Raw"
       Option  "Emulate3Buttons"
       Option  "Emultate3Timeout"      "50"
       Option  "SendCoreEvents"        "On"
       Option  "TapTimer"              "200"
       Option  "LongTouchTimer"        "400"
       Option  "MinX"                  "140"
       Option  "MinY"                  "193"
       Option  "MaxX"                  "1550"
       Option  "MaxY"                  "1750"
EndSection

Note that I've tried ~6 different 702TSV units, and they all seem to more or less work with these calibration parameters.

And then this is my /etc/X11/xorg.conf:
Section "ServerLayout"
    Identifier     "Layout0"
    Screen      0  "Screen0" 0 0
    InputDevice    "Keyboard0" "CoreKeyboard"
    InputDevice    "Mouse0" "CorePointer"
EndSection

Section "Files"
EndSection

Section "InputDevice"
    # generated from default
    Identifier     "Mouse0"
    Driver         "mouse"
    Option         "Protocol" "auto"
    Option         "Device" "/dev/psaux"
    Option         "Emulate3Buttons" "no"
    Option         "ZAxisMapping" "4 5"
EndSection

Section "InputDevice"
    # generated from default
    Identifier     "Keyboard0"
    Driver         "kbd"
EndSection

Section "Monitor"
    Identifier     "Xenarc700"
EndSection

Section "Device"
    Identifier     "Device0"
    Driver         "nvidia"
    VendorName     "NVIDIA Corporation"
EndSection

Section "Screen"
    Identifier     "Screen0"
    Device         "Device0"
    Monitor        "Xenarc700"
    DefaultDepth    24
    SubSection     "Display"
        Depth       24
        # The native resolution of a Xenarc 702TSV is 800x480, but
        # I couldn't find any modes that'd work.
        Modes       "800x600"
    EndSubSection
EndSection

Section "ServerFlags"
    Option         "BlankTime" "0"
    Option         "StandbyTime" "0"
    Option         "SuspendTime" "0"
    Option         "OffTime" "0"
EndSection