icculus.org:
cvs.icculus.org has passed on from this world after a long battle with
obsolescence. He was seven.
cvs.i.o died in the comfort of his rackmount server, colocated in Chicago, in
the early hours of March 31st. He was employed as a revision control manager,
serving thousands of open source developers.
He is survived by his children: Subversion, Mercurial, and Git.
The family has requested that a donation be made to charity in lieu of
flowers.
Other stuff:
This is a really really long, boring post about mailing lists. If your dog is
whining, you should walk him before sitting down to read this, or he's gonna
mess on the carpet by the time you're done.
So, we have a new mailing list manager now. We were using ezmlm (specifically,
ezmlm-idx), and it sucked for a lot of reasons. Now we've got GNU Mailman,
which resolves all my complaints with ezmlm, and adds a bunch of really nice
features I didn't even know I wanted at the time.
I waited a long time to do this conversion, because I thought it would be
really painful. It turns out, it wasn't, so for the sake of the next poor
soul that googles for "migrate ezmlm to mailman," here's some information
they might stumble upon.
First, I thought I'd have to ditch qmail. I didn't. Dropping qmail would be
way more painful than dropping ezmlm. The integration with qmail was dirt
simple.
Basically, I just installed Mailman, from source, following the directions
in the manual. Ultimately, you end up setting up a few .qmail files per-list,
and reserve some URL space on your webserver ("/mailman" and "/pipermail",
to be exact). Sane people will just "apt-get install mailman" or something,
but this broken-ass system didn't have that luxury (we're solving that with
a nice, clean install of Ubuntu in the near future, thank you very much).
Building lists is automated with a shell script. It creates the list from
the Mailman command line interface, updates /etc/aliases, tweaks some
default list settings to my liking, and sets up the .qmail files, so
incoming mail for the list goes to Mailman for consideration. This works
nicely for my setup, but of course you can build lists from the web interface,
too.
Please be careful with that script, though: in one case it deleted too many
.qmail files. We had quake3@icculus.org and quake3-bugzilla@icculus.org lists,
and after converting the bugzilla one ("rm .qmail-quake3-bugzilla*"), we lost
some of its new .qmail files on conversion of the main list
("rm .qmail-quake3*"), because that glob matches the bugzilla list's files too.
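The glob overlap is easy to reproduce. Here's a minimal Python sketch of a safer selection; it is not the cleanup step from the list-building script. It assumes you can supply the names of all your lists, and the helper names (owns, qmail_files_for_list) and the "-owner" file names are made up for illustration. It keeps only the .qmail files that belong to one list and skips anything owned by a longer list name that shares the prefix.

```python
from fnmatch import fnmatch


def owns(listname, filename):
    """True if filename is .qmail-<listname> or .qmail-<listname>-<anything>."""
    base = ".qmail-" + listname
    return filename == base or filename.startswith(base + "-")


def qmail_files_for_list(listname, all_lists, filenames):
    """Return the .qmail files for listname, skipping files owned by longer
    list names such as "quake3-bugzilla" when listname is "quake3".
    """
    longer = [name for name in all_lists if name.startswith(listname + "-")]
    return [
        f for f in filenames
        if owns(listname, f) and not any(owns(other, f) for other in longer)
    ]


# Illustrative file names only; real setups will have more of them.
files = [
    ".qmail-quake3",
    ".qmail-quake3-owner",
    ".qmail-quake3-bugzilla",
    ".qmail-quake3-bugzilla-owner",
]

# What "rm .qmail-quake3*" would match: every file above.
naive = [f for f in files if fnmatch(f, ".qmail-quake3*")]

# What belongs to the quake3 list alone.
safe = qmail_files_for_list("quake3", ["quake3", "quake3-bugzilla"], files)
```

Print the selection and eyeball it before deleting anything. This can't tell a list called quake3-bugzilla apart from a quake3 alias that happens to end in -bugzilla, so it errs on the side of keeping files.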
The real fun comes from migrating the existing mailing lists. First, we need
to migrate existing subscriptions.
To limit end-user confusion about the migration, I would send a note to each
list before the move, explaining what was about to happen. Some of them I
had to subscribe to before doing so, so the message wouldn't bounce. After
about 10 iterations of this, I decided to automate it with a hacky script.
Yes, it's a shell script that calls a perl script. It uses ezmlm command
line tools to manage subscribing me for one post, if necessary, and
qmail-inject to send the message. I'm not proud. It's messy.
Get m.sh and m.pl, EDIT THEM FIRST so they don't email me, chmod +x, and then
run "./m.sh $listname" ... consider everyone notified.
Now, it's time to make the move. Build the Mailman list, as above. A side
effect of the script is that it wipes out the existing .qmail files that
tossed mail to ezmlm. Once that is done, the new list is accepting mail, and
will bounce any incoming email from existing subscribers until you migrate
them with this simple shell script:
#!/bin/sh
# Usage: pass the list name as the first argument.
ezmlm-list /var/qmail/alias/$1 >/tmp/$1.subscribers
/usr/local/mailman/bin/add_members -r /tmp/$1.subscribers -w y -a y $1
rm /tmp/$1.subscribers
This sends every email address on the list a note with the new list details,
as if they had just subscribed, and then sends the list owner (presumably,
you) a list of all the added addresses. The note to the subscribers tells them
where to find the list's management page and their login details. It's all
the pertinent info, which they'll probably delete without reading. This is
why Mailman resends a reminder once a month by default.
At this point, subscribers continue to use the list as before at the same
email address, except they now have a nicer web interface for managing
their subscription.
You aren't done with the subscribers yet. Almost certainly you have some
undesirables in there.
Spammers and scammers tend to send email from "joe job" (that is, fake)
addresses. If you ever saw your friend's email address on a spam addressed
to you, this is what happened. Most of my mailing lists had addresses from
PayPal, eBay, and Amazon (plus other spammier-looking things).
What happened is a slimeball of some caliber sent a joe-job email to a
mailing list from, say, support@paypal.com. Not someone that works for
PayPal, mind you; that's a different kind of slimeball.
Ezmlm manages subscriptions through virtual addresses: if you wanted to be
on physfs@icculus.org's mailing list, you'd send an email (ANY email!) to
physfs-subscribe@icculus.org and it'd get you set up...namely, it'd tell you
"just reply to this email so we know it's really you and we'll start the
subscription." Yeah, you can see where this is going already.
Joe-job comes from support@paypal.com to physfs-subscribe@icculus.org. It's
just a spam email, but ezmlm doesn't care what's in your initial email at
all (we eventually started spam-filtering the *-subscribe addresses, but
SpamAssassin still gets a percentage of false negatives anyhow). Ezmlm
responds to joe-job by replying to the support address, even though it didn't
really originally come from there, with a helpful
message like, "okay, just to make sure your friend isn't playing a prank on
you by subscribing you to the list, just reply to this email without
changing the subject line, okay?"
It was a noble intention, and largely, it keeps out joe-jobbers and pranking
friends. In normal cases, this is the end of the conversation: for prankers,
the person would just delete the email. For joe-jobbers, likewise, if the
address even existed. Otherwise, ezmlm handles the bounce fine.
Some addresses, like dear support@paypal.com, however, would autoreply to the
ezmlm response with a "Thank you for contacting PayPal support! Your
business is important to us, so sit tight until we get to your email!!!!!!1"
...and they would leave the subject line intact.
Ezmlm would see this definite non-bounce with the magic subject line and
subscribe a bogus email address to the list. Not only can spammers now use
this address to post spam to the list (in practice: rare, even before we
spam-filtered all incoming list traffic, subscriber or otherwise), but now
you might have every posting going to paypal, to which it replies
KTHX4WRITING!!! ...and now you have a feedback loop.
Or worse, you don't have one. Now there are bogus addresses on the list no
one notices. Until they do.
I didn't automate this, I just went through every list and deleted email
addresses that looked suspicious (and a few that were questionable; if you
didn't look like a human chose your email address, you probably deserve to
realize one day that your subscription quietly vanished). But 9/10ths of
the culprits could be found by grepping for "amazon", "ebay" or "paypal".
At least the Mailman web interface made this easy enough to clean up by hand.
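If you'd rather get a shortlist before clicking through the web interface, the grep step can be sketched in a few lines of Python. This is only an illustration of the keyword match described above, not something that was run during the migration; the function name split_suspects is hypothetical, and the keyword list is just the three names mentioned here.

```python
SUSPECT_KEYWORDS = ("amazon", "ebay", "paypal")


def split_suspects(addresses, keywords=SUSPECT_KEYWORDS):
    """Split subscriber addresses into (keep, suspects).

    An address is a suspect if any keyword appears anywhere in it,
    case-insensitively. Blank lines are ignored.
    """
    keep, suspects = [], []
    for addr in addresses:
        addr = addr.strip()
        if not addr:
            continue
        lowered = addr.lower()
        if any(word in lowered for word in keywords):
            suspects.append(addr)
        else:
            keep.append(addr)
    return keep, suspects
```

Feed it one address per line, for example the output of ezmlm-list. Treat the suspects list as candidates for a human to review: a substring match will also flag a real person whose address happens to contain one of those words.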
Okay, now you're functional! You can quit here if you don't care about the
mailing list archives. But I did.
Converting them to Mailman archives was easy:
#!/bin/sh
ezmlm2mbox.pl --archive /var/qmail/alias/$1/archive --mbox /tmp/$1.mbox
/usr/local/mailman/bin/arch --wipe $1 /tmp/$1.mbox
rm /tmp/$1.mbox
I don't remember where I got ezmlm2mbox.pl, but
here is my copy of it.
It just builds an mbox file from your ezmlm archives. Then we use a standard
Mailman tool to import the mbox file. Done!
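For the curious, the mbox-building step can be sketched in Python with the standard mailbox module. This is not ezmlm2mbox.pl, and it rests on an assumption you should check against your own archive: that messages live as all-digit file names inside all-digit subdirectories of the archive directory, next to non-message files like "index". The function names are hypothetical.

```python
import mailbox
import os


def numeric_entries(path):
    """Names in path made only of digits, in numeric order."""
    return sorted((n for n in os.listdir(path) if n.isdigit()), key=int)


def ezmlm_archive_to_mbox(archive_dir, mbox_path):
    """Append every message file under archive_dir to an mbox file.

    Returns the number of messages written. Assumes nothing else is
    writing to mbox_path at the same time.
    """
    box = mailbox.mbox(mbox_path)
    count = 0
    try:
        for sub in numeric_entries(archive_dir):
            subdir = os.path.join(archive_dir, sub)
            if not os.path.isdir(subdir):
                continue
            for name in numeric_entries(subdir):
                with open(os.path.join(subdir, name), "rb") as fh:
                    # mailbox adds the "From " separator line for us.
                    box.add(fh.read())
                count += 1
        box.flush()
    finally:
        box.close()
    return count
```

The resulting file is what you would hand to Mailman's arch tool in the script above.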
Now, I have to confess something about myself: I am completely OCD about
preventing broken URLs. If I move something, I try to find some way to
keep the old URL redirecting to the new one if possible.
ezmlm has a cgi-bin program for web access to mailing list archives. It's
ugly, and did I mention it's some nasty C code that requires cgi-bin?
It's completely awful in every way. But there are a lot of direct links to
various list postings out there, and I didn't want them all to break. I
briefly considered making any ezmlm URL just point to
http://icculus.org/mailman/listinfo and let the user try to dig out what
they wanted, but a trivial Google search showed there were too many people
saying "this was an interesting comment over here: link!" without any
context. Which list do you go to on the page full of lists?
I also considered just leaving the cgi-bin program in place. It would keep
all the URLs functional, but at the cost of gigabytes of disk space (since
all the ezmlm archives would have to remain, despite another copy existing
for Mailman) and having it look like all conversation ceased the day we
migrated...and having it look like ezmlm-cgi's output. This would not do.
So I wrote some code. Now legacy URLs still work, but redirect to the correct
posting in the new Mailman archives. There is a small PHP script that parses
the URL and does a lookup in a 2 megabyte SQLite database, saving me
gigabytes of disk space.
Here's the gist. I wrote a script in Python (please don't laugh, it's my
first!), since Mailman "pickles" their archive indexes, and I had to use
Python to pull the data out...then I just figured, why switch programming
languages, even if the rest is all regex stuff that Perl would excel at?
Here it is: ezmlm-dump.py. Put it in /usr/local/mailman/bin and run it like
this:
./ezmlm-dump.py listname1 listname2 listnameN >/tmp/ezmlm-mappings.txt
What it eventually spits out is a lot of SQL. It looks up the Message-Id
in the Mailman archive indexes, which is fast and good about being unique
per message, then it has to read your entire mailing list archives to find
the same post in ezmlm's archives. Disk bandwidth is your enemy here, but I
couldn't find a way to do this faster without losing reliability and writing
a lot of nasty heuristic code. Once it knows where one archive maps to the
other, it builds database INSERTs that store this information.
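The join on Message-Id boils down to something like the sketch below. It is not the dump script itself: it assumes you've already pulled two dictionaries out of the archives (Message-Id to ezmlm list and message number, and Message-Id to Mailman archive URL), and the table and column names (ezmlm_map, listnum, msgnum, url) are hypothetical.

```python
def sql_quote(text):
    """Quote a string as an SQL literal by doubling single quotes."""
    return "'" + text.replace("'", "''") + "'"


def build_inserts(ezmlm_ids, mailman_urls, table="ezmlm_map"):
    """Match posts by Message-Id and emit one INSERT per matched post.

    ezmlm_ids:    Message-Id -> (list number, message number) in ezmlm
    mailman_urls: Message-Id -> URL of the same post in the Mailman archive
    Returns (statements, unmatched Message-Ids).
    """
    statements, unmatched = [], []
    for msgid in sorted(mailman_urls):
        if msgid not in ezmlm_ids:
            # e.g. a Message-Id that Mailman generated during import
            unmatched.append(msgid)
            continue
        listnum, msgnum = ezmlm_ids[msgid]
        statements.append(
            "INSERT INTO %s (listnum, msgnum, url) VALUES (%d, %d, %s);"
            % (table, listnum, msgnum, sql_quote(mailman_urls[msgid]))
        )
    return statements, unmatched
```

The unmatched list is worth counting before you decide whether dropping those posts is acceptable for your archives.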
We only had a few hundred emails that didn't have Message-Ids; some crappy
email clients neglect to supply one. Mailman generates one for you in this
case when importing the archives, but there won't be a match in the ezmlm
pile. I felt that just dropping that email was acceptable, as it was a low
percentage of the total content. Your mileage may vary. I wasn't interested
in trying to compare message bodies to find the missing emails.
Now, take ezmlm-mappings.txt and build an SQLite database:
sqlite ezmlm-mappings.sqlite </tmp/ezmlm-mappings.txt
(Protip: do things like this in one big transaction with SQLite. Tens of
thousands of INSERTs took 5 minutes to run, but wrap it all in one
BEGIN TRANSACTION and COMMIT TRANSACTION, and it takes 4 seconds.)
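The same trick from Python looks roughly like this. It's a sketch using Python's built-in sqlite3 module (which talks to SQLite 3, not the version 2 command line tool used above), with a throwaway in-memory table named demo standing in for the real mappings. Opening the connection with isolation_level=None stops the module from managing transactions for you, so the BEGIN and COMMIT are explicit.

```python
import sqlite3


def load_statements(conn, statements):
    """Run all statements inside one explicit transaction."""
    conn.execute("BEGIN TRANSACTION")
    try:
        for statement in statements:
            conn.execute(statement)
    except Exception:
        # Leave the database untouched if any statement fails.
        conn.execute("ROLLBACK")
        raise
    conn.execute("COMMIT TRANSACTION")


# isolation_level=None: no implicit transactions, we issue our own.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("CREATE TABLE demo (n INTEGER)")
load_statements(
    conn, ["INSERT INTO demo (n) VALUES (%d)" % i for i in range(10000)]
)
```

Without the explicit transaction, each INSERT is committed on its own, which on a real disk means a sync per row; that is where the minutes go.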
sqlite3 produced a file that was 25% smaller, fwiw, saving half a megabyte,
but I didn't want to bother reading the PHP manual for the cutesy
object-oriented database interface, so I used version 2. This could be
upgraded fairly easily, and I probably will do that work at some point.
Now, put this PHP script somewhere your web server can get it, put the
SQLite database in there with it, and tell Apache that the old ezmlm-cgi
URL should run this script instead.
The Apache configuration change looks like this, here:
Alias /cgi-bin/ezmlm/ezmlm-cgi "/webspace/icculus.org/ezmlm/ezmlmremap.php"
Now old broken promises like this...
http://icculus.org/cgi-bin/ezmlm/ezmlm-cgi?64:mss:57:pcgggfcpkhbeledipkkh
...will automatically redirect to the updated URL...
http://icculus.org/pipermail/ut3/2007-October/000057.html
...and you're good to go.
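The lookup behind that redirect is roughly the following, sketched in Python rather than the PHP that actually runs on the server. Several things here are guesses: that the first and third colon-separated fields of the query string are the list number and message number (inferred from the example, where 57 lands on 000057.html), a hypothetical table ezmlm_map(listnum, msgnum, url), and falling back to the listinfo page when nothing matches.

```python
import sqlite3

# Assumed fallback for URLs we can't map.
FALLBACK_URL = "http://icculus.org/mailman/listinfo"


def parse_legacy_query(query):
    """Return (listnum, msgnum) from a query like '64:mss:57:hash'.

    Returns None if the query doesn't have that shape.
    """
    parts = query.split(":")
    if len(parts) < 3:
        return None
    try:
        return int(parts[0]), int(parts[2])
    except ValueError:
        return None


def redirect_target(conn, query):
    """Look up the new archive URL for a legacy ezmlm-cgi query string."""
    key = parse_legacy_query(query)
    if key is None:
        return FALLBACK_URL
    row = conn.execute(
        "SELECT url FROM ezmlm_map WHERE listnum = ? AND msgnum = ?", key
    ).fetchone()
    return row[0] if row else FALLBACK_URL
```

A web handler would send the returned URL back in a Location header with a redirect status. The bound parameters matter, since the query string comes straight from whoever typed the URL.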
Now you're done. Test it out, make a backup of your ezmlm archives, just in
case, and delete them. Update anything you can with new subscription
information: I intentionally didn't set up listname-subscribe addresses,
since it's begging for abuse. It was best to have those break. Things out
in the wild: READMEs in source tarballs, Tweets, forum posts...they'll just
have to deal with the fallout. Update the canonical sources, like project
webpages, and move on with your life.
Otherwise, everything should be going smoothly now.
In summary: this didn't suck as much as I expected. I thought there'd be a
lot of pain, and some unfortunate tradeoffs I'd have to accept, but besides
some time spent poking around with scripting languages, it all worked out.
The next person, after reading this, will have an even better time than I
did, I think. I hope.
As email administration of any kind is sort of an unrewarding drain, next
time I migrate mailing lists, I may wire everything up through Google Groups
and call it a day. But since I'm managing my own server and lists, I feel
like moving to Mailman made my life, and my users' lives, about an order of
magnitude better. Hopefully this will encourage someone else to make the
switch, too.
--ryan.