Article

Creating your own personal aspell dictionary

Something that has bothered me forever is that applications that use GNU aspell for spell checking kept marking my name as a misspelling (I'm looking at you, KMail). Most front-end applications don't provide a way for you to add your own custom words.

Apparently, creating your own personal dictionary is ridiculously easy with aspell.

If your language is English, create a file in your home directory called ".aspell.en.pws":

personal_ws-1.1 en 0
Samat
quasirhombicosidodecahedron

The first line is a required header. Every subsequent line is a word you want to add to your dictionary. I can't believe I've let this sit for so long. Because it's a nice text file, syncing this file between machines to take your dictionary with you is trivially easy.
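
If you keep your custom words somewhere else (say, a list you sync between machines), generating the file is only a few lines of Python. This is a minimal sketch, not part of aspell itself: the function name build_personal_wordlist is made up for illustration, and the header is copied from the example above.

```python
def build_personal_wordlist(words, lang="en"):
    """Return the text of an aspell personal word list.

    The header line mirrors the example in this article; the helper
    itself is hypothetical and only formats text.
    """
    # Drop blanks and duplicates while keeping the original order.
    unique = dict.fromkeys(w.strip() for w in words if w.strip())
    header = f"personal_ws-1.1 {lang} 0"
    return "\n".join([header, *unique]) + "\n"


# Example: build the file contents for two custom words.
text = build_personal_wordlist(["Samat", "quasirhombicosidodecahedron"])
print(text, end="")
```

To install the result, write the returned text to ".aspell.en.pws" in your home directory.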

Taking Drupal sites offline via mysql and the command line

Drupal-powered websites can be put into an "offline mode." This is much better than most alternatives (such as taking the web server offline), especially for search engines, as the message and HTTP status codes given to users and robots alike will tell them to patiently come back later.

I've found that putting the site into offline mode makes database backups go much faster on heavily trafficked sites (which is obvious). However, for a particular site I was working with, this needed to be done in an automated manner, and on a dedicated database server that did not have access to the Drupal installation.

Most people take their Drupal sites offline through Drupal's web-based administration interface. They can also be put offline through the Drupal Shell. Neither was suitable for me: the former cannot be automated easily, and the latter requires access to the Drupal installation. Fortunately, Drupal sites can be taken offline by changing a value in the database, which is easily done with a bash script and the command-line MySQL client.

Given your database user is my_db_user, password my_password, and database my_drupal_db, the backup script would look something similar to:

#!/bin/bash

# Take site offline
mysql --user my_db_user --password=my_password my_drupal_db << EOF
UPDATE variable SET value='s:1:"1";' WHERE name = 'site_offline';
DELETE FROM cache WHERE CID = 'variables';
EOF

# Do stuff here while the site is offline (e.g. backup)

# Bring site online
mysql --user my_db_user --password=my_password my_drupal_db << EOF
UPDATE variable SET value='s:1:"0";' WHERE name = 'site_offline';
DELETE FROM cache WHERE CID = 'variables';
EOF
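
The odd-looking values in those UPDATE statements are PHP-serialized strings: s, then the byte length, then the quoted string. As a rough sketch of where they come from, here is a small Python helper that builds the same SQL. Both function names are hypothetical, and the helper does no SQL escaping, so it is only suitable for simple flag values like "0" and "1".

```python
def php_serialize_string(value):
    """Serialize a string the way PHP's serialize() does: s:<bytes>:"<text>";"""
    length = len(value.encode("utf-8"))  # PHP counts bytes, not characters
    return f's:{length}:"{value}";'


def site_offline_sql(offline):
    """Build the two statements used in the script above (no escaping done)."""
    serialized = php_serialize_string("1" if offline else "0")
    return (
        f"UPDATE variable SET value='{serialized}' WHERE name = 'site_offline';\n"
        "DELETE FROM cache WHERE CID = 'variables';\n"
    )


print(site_offline_sql(True), end="")
```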

Update: The original version of this article had some problems on some setups with the variables table being cached. I added another SQL statement to make sure this cache is flushed so the site actually reflects its configuration.

Update: This method really doesn't work that well, and the more I think about it, there isn't a way to get around writing something that interacts with Drupal. I'm working on a script that will be more fool-proof.

Python-like tuple unpacking for PHP

Python provides a neat way for functions to return multiple values via "tuple unpacking". For example:

def blah():
  return ('one', 'two')

rval_1, rval_2 = blah()

The same can be done in PHP relatively easily via the list construct:

function blah()
{
  return array('one', 'two');
}

list($rval_1, $rval_2) = blah();

Speeding up SpamAssassin rule processing on Debian and Ubuntu

SpamAssassin is one of the most widely used spam filtering systems today. Unfortunately, because there are so many different ways SpamAssassin can be used, it remains subject to many performance problems. Fortunately, there are several optimizations that can be applied to most SpamAssassin installations to speed up its rule processing, especially on Debian and Ubuntu GNU/Linux-based systems. These instructions can be adapted to other operating systems as well.

This article does not discuss configuring your mail filtering system (e.g. procmail, maildrop). This depends completely on your setup, and more than likely there are plenty of other articles that describe the best way to set up what you want.

GPG public key signing post-party automation with KMail

This past Ubucon's key signing party was my first key signing party. One thing I noticed--signing keys after a key signing party is a boring repetitive task. Summarized from the Ubuntu wiki entry on typical key signing post-party protocol:

  1. Retrieve all public keys of key signing party participants, using gpg --recv-keys
  2. Compare the hardcopy fingerprint from the key signing party to the fingerprint of the retrieved public keys, using gpg --fingerprint
  3. Sign the key, using gpg --sign-key
  4. Send the signed key back, either by
    • E-mail: export the key, then e-mail it to the key owner, using gpg --export -a | mail -s "Your signed key" user@example.com
    • Key server: send the key to a public keyserver, using gpg --send-keys

This is incredibly monotonous, and people still wonder why Web of Trust-based encryption is not more popular?

The Debian signing-party package provides the utility caff to automate some of this. It's not very friendly to “desktop” users, however:

  • it's a CLI application
  • it requires a local MTA (/usr/sbin/sendmail in particular), or an “open” SMTP server, with no support for authenticated SMTP or SMTP/SSL
  • the configuration file syntax is Perl and confusing; there are also few examples on the Internet

You could add authenticated SMTP or SMTP/SSL support to the script, but having to know how to hack Perl definitely disqualifies caff from being a desktop-friendly application.

So, I hacked together my own key signing party script in Python that would send signed keys back to people via KMail. To use it, create a text file listing all key IDs you wish to sign, one per line. Pipe the contents of this list into the script:

cat list-of-ids.txt | key-signing-party-batch-process-via-kmail.py

The script will download each key, ask you to verify the fingerprint, and then sign it. It will then open a KMail composer window, pre-filled with the key owner's e-mail address, a friendly template message (customizable in the script), and the attached key. Review each e-mail to make sure it is kosher, and click send. Other than continuing to be a CLI program, I think this is much friendlier--the only manual work is creating the list of keys to sign, comparing fingerprints (this could be automated, but it seems in the spirit of Web of Trust-based systems not to), and clicking send in a familiar desktop e-mail client.
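
To give a feel for the per-key work, here is a minimal sketch of the first half of that flow: reading key IDs (one per line) and building the gpg command lines for each. This is not the actual script. The function names are hypothetical, the commands are only constructed and never run, and the KMail step is left out entirely.

```python
def parse_key_ids(text):
    """Return the key IDs found in text, one per line, skipping blank lines."""
    return [line.strip() for line in text.splitlines() if line.strip()]


def gpg_commands(key_id):
    """Build (but do not run) the gpg command lines for one key.

    Each entry is an argument list suitable for subprocess.run().
    """
    return [
        ["gpg", "--recv-keys", key_id],          # download the public key
        ["gpg", "--fingerprint", key_id],        # show it for manual comparison
        ["gpg", "--sign-key", key_id],           # sign after you have verified it
        ["gpg", "--armor", "--export", key_id],  # export the signed key as text
    ]


# Example with two made-up key IDs.
for key_id in parse_key_ids("DEADBEEF\n\n0A1B2C3D\n"):
    for command in gpg_commands(key_id):
        print(" ".join(command))
```

Keeping the commands as argument lists avoids shell quoting problems if you later pass them to subprocess.run().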

Now for some notes...

It uses the DCOP automation features of KDE's KMail to send messages. You could similarly use Evolution and D-Bus, but I don't use Evolution so I can't contribute that bit of functionality. Mozilla's Thunderbird unfortunately does not yet support any kind of automation features (as far as I know, anyway), so you're completely out of luck if you use it.

DCOP with Python is a complete, utter pain. The easy way, dragging and dropping boilerplate code from kdcop, did not work, as it appears the APIs have changed. A problem with KDE/Python's dcopext module and multiple identically named functions sealed the deal for me: I gave up trying to use DCOP with Python and settled for the hack of calling it from the shell instead. I'm looking forward to the one Linux desktop IPC protocol to rule them all, D-Bus, debuting in KDE4.

My script does not provide all the functionality of caff. For example, it does not encrypt the messages and signed keys it sends back to their owners. There doesn't appear to be an easy way to do this with KMail and DCOP, so it's a feature that will have to wait.

Sprint's EVDO Mobile Broadband on Ubuntu GNU/Linux


So, you've gotten your shiny new EVDO datacard working under Linux (if not, see High-speed cellular wireless modems (e.g. EVDO, HSDPA) in Ubuntu GNU/Linux 6.10) and you now want to set up the actual Internet connection?

In this article I document how I set up Sprint's Mobile Broadband service with ppp in Ubuntu GNU/Linux 6.10.

High-speed cellular wireless modems (e.g. EVDO, HSDPA) in Ubuntu GNU/Linux 6.10


Note: If you are running Ubuntu 7.04 or greater, this article is no longer relevant. Your EVDO modem should be detected and run at a higher speed automatically.

I've been raving about cellular wireless modems/data cards for a while now. While they've been available for a long while, they've finally become practical with networks such as EVDO and HSDPA that offer broadband-like speeds. I personally own a Novatel Merlin S720 that I use with Sprint's Mobile Broadband service.

Most of these datacards are easy to get running in Linux--I actually set mine up in Linux faster than I did in Microsoft Windows. However, due to some shortcomings in the kernel used by Ubuntu GNU/Linux 6.10, you cannot take advantage of the speeds that these modern wireless networks offer.

This article talks about some of the problems of the often-used usbserial driver, and how to use the better-performing airprime driver instead.

High-speed Internet access through cellular phone networks

I'm a T-Mobile Hotspot subscriber, but I cannot say I'm particularly happy with it. Reliability is in general pretty good, but there have been a few times a certain hotspot has been flaky, and these tend to be the times I needed access the most. It's also a pain to have to go somewhere to get Internet access, especially when, for example, I don't like Starbucks coffee. I'd rather have the Internet come to me.

Enter EVDO. It's a 3rd generation cellular technology that allows for broadband-like speeds, typically almost everywhere you have a cellular phone signal. There are different speeds depending on what network is available in a particular location:

  • 1xRTT, allowing for 144 Kbps/144 Kbps download/upload speeds
  • EVDO 1x Rev 0, allowing for 2.45 Mbps/150 Kbps
  • EVDO 1x Rev A, allowing for 3.1 Mbps/1.8 Mbps speeds.

All three types of networks can be found in the United States, and a typical provider's access plan lets you roam between them anywhere in the country for free.

Access comes through a provider-specific modem (i.e. you cannot use one provider's modem with another provider). These usually are PCMCIA cards, reminiscent of the 802.11b network cards people used before WiFi was built into notebook computers. Connection to a provider usually is provided through PPP software. Most of the modems available on the market today are a little oddball: they expose a USB controller, which then exposes a USB serial interface which controls a virtual modem. Yes, it's strange, especially when these devices aren't actually modems (there is no MOdulation or DEModulation taking place; the devices are more “network bridges”), but thankfully it allows these devices to easily work with alternative operating systems like Linux and Mac OS X.

In the USA, there are essentially three major EVDO providers: Sprint, Verizon Wireless, and Alltel, with Sprint and Verizon having the largest networks by far. What differentiates Sprint and Verizon, I think, is pricing and policies. If you do not want to sign a contract, both providers cost the same. If you sign a two-year contract, you only get a discounted rate with Verizon if you have a qualifying voice plan; Sprint has no such limitation.

Verizon does a bit of questionable marketing: they advertise their service as “unlimited,” but they pull a trick often used in contract writing and specifically define “unlimited” as 5 GB/month. If you go over this limit, you're breaking Verizon's terms of service. Verizon often cancels subscribers' accounts, and assumes you are a criminal, downloading illegal music or software. An article in the Washington Post, Bandwidth Bandit, discusses one subscriber's woes. Their terms of service also disallow many popular Internet applications, such as VoIP, video conferencing, or any online gaming. Sprint's terms of service are more vague and do not explicitly disallow these things, but reports from their subscribers say that Sprint has neither unreasonably low bandwidth limits nor draconian policy enforcement that assumes you are guilty until proven innocent.
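
To put that 5 GB figure in perspective, here is some back-of-the-envelope arithmetic, as a sketch only. It assumes 1 GB = 10^9 bytes and a link running flat out at the peak download rates listed earlier, which real-world connections will not sustain; the function name is made up.

```python
def hours_to_transfer(gigabytes, megabits_per_second):
    """Hours needed to move `gigabytes` at a constant `megabits_per_second`.

    Assumes decimal units: 1 GB = 10**9 bytes, 1 Mbps = 10**6 bits/second.
    """
    bits = gigabytes * 1e9 * 8
    seconds = bits / (megabits_per_second * 1e6)
    return seconds / 3600


# Peak download rates from the list above (theoretical maximums).
for name, mbps in [("EVDO Rev 0", 2.45), ("EVDO Rev A", 3.1)]:
    print(f"{name}: {hours_to_transfer(5, mbps):.1f} hours to reach 5 GB")
```

Under these assumptions, a Rev 0 connection at its peak rate would reach the cap in roughly four and a half hours of continuous downloading.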

This wouldn't be a good summary without me discussing what new bleeding-edge technology is right around the corner. EVDO Rev B, allowing for at least 4.9 Mbps/1.8 Mbps speeds, has been deployed in a few places in Asia, but given how backward North America tends to be in technology adoption, it won't be in the United States anytime soon. WiMAX, a 4th generation cellular technology allowing for speeds of at least 10 Mbps, will probably take the place of EVDO. Sprint is the only major provider dedicated to building a WiMAX network, with plans to begin deployment at the end of 2007.

Some external links with good information:

Connecting to the Columbia Medical Center's Athens WiFi network with Linux

Columbia University's Medical Center, like many university campuses, has many WiFi access points. To meet HIPAA privacy regulations all their wireless networks require use of VPNs or sophisticated encryption protocols.

Connecting to their athens wireless network, which uses IEEE 802.1X authentication, is a little non-obvious with Linux, but is possible provided your wireless card supports WPA and works with wpa_supplicant.

To save you the many weeks I spent fiddling, here is the magic wpa_supplicant.conf stanza that works for me:

network={
  ssid="athens"
  key_mgmt=WPA-EAP
  eap=TTLS
  pairwise=TKIP
  group=TKIP
  phase2="auth=PAP"
  identity="foo"
  password="bar"
  priority=2
}

Replace foo with your Columbia University UNI and bar with your password.

T-Mobile WiFi Hotspot login script

T-Mobile's WiFi Hotspot service, thankfully, forgoes a proprietary authentication mechanism for a solution that, while cross-platform (i.e. it works with Linux), can be extremely annoying. On opening a web browser and attempting to go to any website, you're required to log in on an SSL-protected website with your account username and password before you can use the connection. If your web browser automatically tries to open many pages on startup, such as when you're using the Session Saving extension for Firefox, you get T-Mobile's Hotspot login page in every tab--extremely annoying!

I've written a small Python script that can log in programmatically without the use of a web browser.

A quick shell include for setting paths for programs installed in non-traditional locations

A page in the Beyond Linux from Scratch manual describes environment variables that should be set when installing software in a non-traditional location (e.g. your home directory).

I've written a sh/bash include that can be sourced from .bashrc to set these variables, as well as PYTHONPATH for separately installed Python libraries:

#!/bin/bash

PREFIX=$HOME/usr

export PATH="$PREFIX/bin:$PATH"
export PYTHONPATH="$PREFIX/lib/python2.4/site-packages:$PYTHONPATH"
export MANPATH="$PREFIX/man:$MANPATH"
export INFOPATH="$PREFIX/info:$INFOPATH"
export LD_LIBRARY_PATH="$PREFIX/lib:$LD_LIBRARY_PATH"
export PKG_CONFIG_PATH="$PREFIX/lib/pkgconfig:$PKG_CONFIG_PATH"
export CPPFLAGS="-I$PREFIX/include $CPPFLAGS"
export LDFLAGS="-L$PREFIX/lib $LDFLAGS"

Notes on the CD images for the Ubuntu 6.06 LTS "Dapper Drake" release

There have been a couple of changes in the latest Ubuntu Linux release, changes that I had to dig to find information about...

First, the "live" and "install" CDs have been renamed. The "live" CD is now called "desktop", to reflect that the Live CD can now function as an installer. It's the preferred way to do installs for newbies. The "install" CD has been renamed "alternate install" and provides the text-mode installer for those who are familiar with it, and allows for installation with LVM and software RAID which are not available with the "desktop" installer.

There are also DVDs available for Dapper Drake, but they are difficult to find... I could only find links to them on the Ubuntu Forums, and not on the main Ubuntu website. The download pages, with ISOs and torrents:

Calculating bandwidth from a combined-format web server log

Given a combined web server access log, such as the ones generated by Apache, it can be useful to know the total amount of data transfer of all requests in that log. This task is simple: extract the field listing the number of bytes sent for a request, and add them all up. For something so simple, there is an odd lack of examples or pre-made scripts that do this. Or, at least, I couldn’t find any.

I wrote my solution, calculate-data-transfer.py, in Python:
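
The script itself isn't reproduced here, but the core idea fits in a few lines. The following is a minimal sketch of the technique, not calculate-data-transfer.py itself: the regular expression and function name are my own illustration, and it assumes Apache's standard combined format, where the bytes field follows the status code and is "-" when no body was sent.

```python
import re

# host ident user [time] "request" status bytes ...
LINE_RE = re.compile(
    r'^\S+ \S+ \S+ \[[^\]]*\] "(?:[^"\\]|\\.)*" \d{3} (\d+|-)'
)


def total_bytes(lines):
    """Sum the bytes-sent field over combined-format log lines.

    Lines that do not match the pattern are skipped; "-" counts as zero.
    """
    total = 0
    for line in lines:
        match = LINE_RE.match(line)
        if match and match.group(1) != "-":
            total += int(match.group(1))
    return total


sample = [
    '127.0.0.1 - frank [10/Oct/2000:13:55:36 -0700] "GET /a.gif HTTP/1.0" '
    '200 2326 "http://www.example.com/start.html" "Mozilla/4.08"',
    '127.0.0.1 - - [10/Oct/2000:13:55:40 -0700] "GET /a.gif HTTP/1.0" '
    '304 - "-" "Mozilla/4.08"',
]
print(total_bytes(sample))
```

Because total_bytes accepts any iterable of lines, you can pass it an open file object directly and it will read the log one line at a time.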

A better way to separate Apache log files by virtual host domains

Apache's "combined" log format is one of the most common log formats used in access logging, containing useful fields such as referrer and user agent. Unfortunately, it does not contain a field listing the virtual host for which a request was made. With Apache, this is easily rectified by defining a custom logging format and post-processing logs to maintain compatibility.
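
As a rough illustration of the post-processing half, here is a minimal sketch. It assumes a custom log format that simply puts the virtual host (Apache's %v directive) in front of the usual combined fields; the function name and sample lines are made up for the example.

```python
from collections import defaultdict


def split_by_vhost(lines):
    """Group log lines by their leading virtual-host field.

    Assumes each line is "<vhost> <combined-format line>". The vhost is
    stripped, so each group is plain combined format again.
    """
    per_host = defaultdict(list)
    for line in lines:
        vhost, sep, rest = line.partition(" ")
        if not sep:
            continue  # blank or malformed line: no vhost field to split on
        per_host[vhost.lower()].append(rest)
    return dict(per_host)


sample = [
    'example.com 1.2.3.4 - - [01/Jan/2007:00:00:00 +0000] "GET / HTTP/1.1" 200 100 "-" "x"',
    'example.org 5.6.7.8 - - [01/Jan/2007:00:00:01 +0000] "GET / HTTP/1.1" 200 200 "-" "x"',
    'Example.com 1.2.3.4 - - [01/Jan/2007:00:00:02 +0000] "GET /a HTTP/1.1" 404 50 "-" "x"',
]
for vhost, entries in split_by_vhost(sample).items():
    print(vhost, len(entries))
```

Each group can then be written to its own file and fed to any tool that expects the standard combined format.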

Alternative search engines to Google

I am making a personal effort to avoid using Google lately. If you've talked to me lately, you know why--and to be polite I'm going to keep my psychotic ranting and hating off my weblog. So, other than Google, what is there?

If you can remember the 90s, Alta Vista was the search engine that was all the rage. In my opinion, Alta Vista's peak was when they were owned by Digital Equipment Corporation, but it all went downhill when they split off into their own company during the dot-com boom of the late 90s. Today, Alta Vista clearly shows signs of neglect, and is not a very good search engine...

If you pay attention to the news, you'd know that Google does not like MSN Search very much: so much so that Google is suing Microsoft over search engine placement in Internet Explorer 7. I'm not so sure what Google is worried about, because if you use MSN Search for a while, you realize it's not very good and doesn't hold a candle to Google's search results.

However, Microsoft's "beta" search engine, Windows Live, is a different story. If Microsoft replaces MSN Search with the technology powering Windows Live, Google better start getting worried. Search results are a little bit more broad than Google, but still remain concise and accurate. Windows Live, however, has a totally horrible UI. It's awful! It's the definition of when you go overkill with AJAX and DHTML. Besides being slow, it does not work too well with Mozilla Firefox, and my personal pet peeve: it uses low-contrast greys and blues in its design, so it can be a strain to read anything.

Fortunately, you're not forced to use Windows Live's interface, because there are a few partners using their search results. Amazon's A9 search engine now fetches results from Windows Live, and its UI is great. For some hard-to-quantify reason I like its UI more than Google's.

Last but not least is Alltheweb. Bought by Yahoo a few years ago, Alltheweb has always delivered great search results, but was too small to really compete with the big boys. Being small, they've had some interesting innovative features, such as custom CSS for those who want to customize how their search results look.

Yahoo is continuing the tradition, especially with Alltheweb's Livesearch. Unlike Windows Live, Alltheweb's Livesearch uses AJAX in quite a slick way, providing a unique search UI that is focused on providing suggestions, similar to Google Suggest, but better.

So, what do I use for my Google-free web searching? Mostly, Amazon's A9, as well as Alltheweb when I feel like it.
