icculus.org:

cvs.icculus.org has passed on from this world after a long battle with obsolescence. He was seven. cvs.i.o died in the comfort of his rackmount server, colocated in Chicago, in the early hours of March 31st. He was employed as a revision control manager, serving thousands of open source developers. He is survived by his children: Subversion, Mercurial, and Git. The family has requested that a donation be made to charity in lieu of flowers.

Other stuff:

This is a really really long, boring post about mailing lists. If your dog is whining, you should walk him before sitting down to read this, or he's gonna mess on the carpet by the time you're done.

So, we have a new mailing list manager now. We were using ezmlm (specifically, ezmlm-idx), and it sucked for a lot of reasons. Now we've got GNU Mailman, which resolves all my complaints with ezmlm, and adds a bunch of really nice features I didn't even know I wanted at the time.

I waited a long time to do this conversion, because I thought it would be really painful. It turns out, it wasn't, so for the sake of the next poor soul that googles for "migrate ezmlm to mailman," here's some information they might stumble upon.

First, I thought I'd have to ditch qmail. I didn't. Dropping qmail would be way more painful than dropping ezmlm. The integration with qmail was dirt simple. Basically, I just installed Mailman, from source, following the directions in the manual. Ultimately, you end up setting up a few .qmail files per-list, and reserve some URL space on your webserver ("/mailman" and "/pipermail", to be exact). Sane people will just "apt-get install mailman" or something, but this broken-ass system didn't have that luxury (we're solving that with a nice, clean install of Ubuntu in the near future, thank you very much).

Building lists is automated with a shell script.
It creates the list from the Mailman command line interface, updates /etc/aliases, tweaks some default list settings to my liking, and sets up the .qmail files, so incoming mail for the list goes to Mailman for consideration. This works nicely for my setup, but of course you can build lists from the web interface, too.

Please be careful, though: the script will delete too many .qmail files when one list's name is a prefix of another's. We had quake3@icculus.org and quake3-bugzilla@icculus.org lists, and after converting the bugzilla one ("rm .qmail-quake3-bugzilla*"), we lost some .qmail files on conversion of the main list ("rm .qmail-quake3*"), since that glob also matches the bugzilla list's files.

The real fun comes from migrating the existing mailing lists. First, we need to migrate existing subscriptions.

To limit end-user confusion about the migration, I would send a note to each list before the move, explaining what was about to happen. Some of them I had to subscribe to before doing so, so the message wouldn't bounce. After about 10 iterations of this, I decided to automate it with a hacky script. Yes, it's a shell script that calls a perl script. It uses ezmlm command line tools to manage subscribing me for one post, if necessary, and qmail-inject to send the message. I'm not proud. It's messy. Get m.sh and m.pl, EDIT THEM FIRST so they don't email me, chmod +x, and then run "./m.sh $listname" ... consider everyone notified.

Now, it's time to make the move. Build the Mailman list, as above. A side effect of the script is that it wipes out the existing .qmail files that tossed mail to ezmlm. Once that is done, the new list is accepting mail, and will bounce any incoming email from existing subscribers until you migrate them with this simple shell script:

    #!/bin/sh
    # Usage: pass the list name as the first argument ($1).
    # Dump the ezmlm subscriber list, one address per line.
    ezmlm-list "/var/qmail/alias/$1" >"/tmp/$1.subscribers"
    # Add them as regular members; welcome each one and notify the list admin.
    /usr/local/mailman/bin/add_members -r "/tmp/$1.subscribers" -w y -a y "$1"
    rm "/tmp/$1.subscribers"

This sends every email address on the list a note with the new list details, as if they had just subscribed, and then a list of all email addresses to the list owner (presumably, you). The note to the subscribers tells them where to find the list's management page and their login details. It's all the pertinent info, which they'll probably delete without reading. This is why Mailman resends a reminder once a month by default. At this point, subscribers continue to use the list as before at the same email address, except they now have a nicer web interface for managing their subscription.

You aren't done with the subscribers yet. Almost certainly you have some undesirables in there. Spammers and scammers tend to send email from "joe job" (that is, fake) addresses. If you ever saw your friend's email address on a spam addressed to you, this is what happened.

Most of my mailing lists had addresses from PayPal, eBay, and Amazon (plus other spammier-looking things). What happened is a slimeball of some caliber sent a joe-job email to a mailing list from, say, support@paypal.com. Not someone that works for PayPal, mind you; that's a different kind of slimeball.

Ezmlm manages subscriptions through virtual addresses: if you wanted to be on physfs@icculus.org's mailing list, you'd send an email (ANY email!) to physfs-subscribe@icculus.org and it'd get you set up...namely, it'd tell you "just reply to this email so we know it's really you and we'll start the subscription."

Yeah, you can see where this is going already. Joe-job comes from support@paypal.com to physfs-subscribe@icculus.org. It's just a spam email, but ezmlm doesn't care what's in your initial email at all (we eventually started spam-filtering the *-subscribe addresses, but SpamAssassin still gets a percentage of false negatives anyhow).
Ezmlm responds to the joe-job by replying to the support address, even though it didn't really originally come from there, with a helpful message like, "okay, just to make sure your friend isn't playing a prank on you by subscribing you to the list, just reply to this email without changing the subject line, okay?"

It was a noble intention, and largely, it keeps out joe-jobbers and pranking friends. In normal cases, this is the end of the conversation: for prankers, the person would just delete the email. For joe-jobbers, likewise, if the address even existed. Otherwise, ezmlm handles the bounce fine.

Some addresses, like dear support@paypal.com, however, would autoreply to the ezmlm response with a "Thank you for contacting PayPal support! Your business is important to us, so sit tight until we get to your email!!!!!!1" ...and they would leave the subject line intact. Ezmlm would see this definite non-bounce with the magic subject line and subscribe a bogus email address to the list.

Not only can spammers now use this address to post spam to the list (in practice: rare, even before we spam-filtered all incoming list traffic, subscriber or otherwise), but now you might have every posting going to paypal, to which it replies KTHX4WRITING!!! ...and now you have a feedback loop. Or worse, you don't have one. Now there are bogus addresses on the list no one notices. Until they do.

I didn't automate this, I just went through every list and deleted email addresses that looked suspicious (and a few that were questionable; if you didn't look like a human chose your email address, you probably deserve to realize one day that your subscription quietly vanished). But 9/10ths of the culprits could be found by grepping for "amazon", "ebay" or "paypal". At least the Mailman web interface made this easy enough to clean up by hand.

Okay, now you're functional! You can quit here if you don't care about the mailing list archives. But I did.
Converting them to Mailman archives was easy:

    #!/bin/sh
    # Usage: pass the list name as the first argument ($1).
    # Build one mbox file from the ezmlm archive directory.
    ezmlm2mbox.pl --archive "/var/qmail/alias/$1/archive" --mbox "/tmp/$1.mbox"
    # Wipe the list's existing Mailman archive and rebuild it from the mbox.
    /usr/local/mailman/bin/arch --wipe "$1" "/tmp/$1.mbox"
    rm "/tmp/$1.mbox"

I don't remember where I got ezmlm2mbox.pl, but here is my copy of it. It just builds an mbox file from your ezmlm archives. Then we use a standard Mailman tool to import the mbox file. Done!

Now, I have to confess something about myself: I am completely OCD about preventing broken URLs. If I move something, I try to find some way to keep the old URL redirecting to the new one if possible.

ezmlm has a cgi-bin program for web access to mailing list archives. It's ugly, and did I mention it's some nasty C code that requires cgi-bin? It's completely awful in every way. But there are a lot of direct links to various list postings out there, and I didn't want them all to break.

I briefly considered making any ezmlm URL just point to http://icculus.org/mailman/listinfo and let the user try to dig out what they wanted, but a trivial Google search showed there were too many people saying "this was an interesting comment over here: link!" without any context. Which list do you go to on the page full of lists?

I also considered just leaving the cgi-bin program in place. It would keep all the URLs functional, but at the cost of gigabytes of disk space (since all the ezmlm archives would have to remain, despite another copy existing for Mailman) and having it look like all conversation ceased the day we migrated...and having it look like ezmlm-cgi's output. This would not do.

So I wrote some code. Now legacy URLs still work, but redirect to the correct posting in the new Mailman archives. There is a small PHP script that parses the URL, and does a lookup in a 2 megabyte SQLite database, saving me gigabytes of disk space.

Here's the gist.
I wrote a script in Python (please don't laugh, it's my first!), since Mailman "pickles" their archive indexes, and I had to use Python to pull the data out...then I just figured, why switch programming languages, even if the rest is all regex stuff that Perl would excel at? Here it is: ezmlm-dump.py

Put it in /usr/local/mailman/bin and run it like this:

    ./ezmlm-dump.py listname1 listname2 listnameN >/tmp/ezmlm-mappings.txt

What it eventually spits out is a lot of SQL. It looks up the Message-Id in the Mailman archive indexes, which is fast and good about being unique per message, then it has to read your entire mailing list archives to find the same post in ezmlm's archives. Disk bandwidth is your enemy here, but I couldn't find a way to do this faster without losing reliability and writing a lot of nasty heuristic code. Once it knows where one archive maps to the other, it builds database INSERTs that store this information.

We only had a few hundred emails that didn't have Message-Ids; some crappy email clients neglect to supply one. Mailman generates one for you in this case when importing the archives, but there won't be a match in the ezmlm pile. I felt that just dropping those emails was acceptable, as they were a low percentage of the total content. Your mileage may vary. I wasn't interested in trying to compare message bodies to find the missing emails.

Now, take ezmlm-mappings.txt and build an SQLite database:

    sqlite ezmlm-mappings.sqlite </tmp/ezmlm-mappings.txt

(Protip: do things like this in one big transaction with SQLite. Tens of thousands of INSERTs took 5 minutes to run, but wrap it all in one BEGIN TRANSACTION and COMMIT TRANSACTION, and it takes 4 seconds.)

sqlite3 produced a file that was 25% smaller, fwiw, saving half a megabyte, but I didn't want to bother reading the PHP manual for the cutesy object-oriented database interface, so I used version 2. This could be upgraded fairly easily, and I probably will do that work at some point.
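
For anyone who would rather load the mappings from Python than pipe SQL into the command line tool, here is a minimal sketch of the same one-big-transaction trick. It is not the ezmlm-dump.py script or the real database layout: the table name, the column names, and the sample rows are all made up for illustration, and Python's built-in sqlite3 module speaks the SQLite 3 file format rather than the version 2 format used above.

```python
import sqlite3


def load_mappings(conn, rows):
    """Insert every (list_name, ezmlm_msgnum, mailman_url) row in ONE transaction.

    The schema here is hypothetical, chosen only for this example.
    """
    conn.execute(
        "CREATE TABLE IF NOT EXISTS mappings ("
        "list_name TEXT, ezmlm_msgnum INTEGER, mailman_url TEXT)"
    )
    # Using the connection as a context manager commits once at the end
    # (or rolls back on error), instead of paying for a commit per INSERT.
    with conn:
        conn.executemany("INSERT INTO mappings VALUES (?, ?, ?)", rows)


def lookup(conn, list_name, msgnum):
    """Return the new archive URL for an old message number, or None."""
    row = conn.execute(
        "SELECT mailman_url FROM mappings "
        "WHERE list_name = ? AND ezmlm_msgnum = ?",
        (list_name, msgnum),
    ).fetchone()
    return row[0] if row else None


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    # Made-up sample data: 1000 fake postings on a list called "ut3".
    rows = [
        ("ut3", n, f"/pipermail/ut3/2007-October/{n:06d}.html")
        for n in range(1, 1001)
    ]
    load_mappings(conn, rows)
    print(lookup(conn, "ut3", 57))
```

The only point of the sketch is the single commit wrapped around all the INSERTs; the speedup you actually see will depend on your disk and your SQLite version.
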
Now, put this PHP script somewhere your web server can get it, put the SQLite database in there with it, and tell Apache that the old ezmlm-cgi URL should run this script instead. The Apache configuration change looks like this:

    Alias /cgi-bin/ezmlm/ezmlm-cgi "/webspace/icculus.org/ezmlm/ezmlmremap.php"

Now old broken promises like this...

    http://icculus.org/cgi-bin/ezmlm/ezmlm-cgi?64:mss:57:pcgggfcpkhbeledipkkh

...will automatically redirect to the updated URL...

    http://icculus.org/pipermail/ut3/2007-October/000057.html

...and you're good to go.

Now you're done. Test it out, make a backup of your ezmlm archives, just in case, and delete them. Update anything you can with new subscription information: I intentionally didn't set up listname-subscribe addresses, since it's begging for abuse. It was best to have those break. Things out in the wild: READMEs in source tarballs, Tweets, forum posts...they'll just have to deal with the fallout. Update the canonical sources, like project webpages, and move on with your life. Otherwise, everything should be going smoothly now.

In summary: this didn't suck as much as I expected. I thought there'd be a lot of pain, and some unfortunate tradeoffs I'd have to accept, but besides some time spent poking around with scripting languages, it all worked out. The next person, after reading this, will have an even better time than I did, I think. I hope.

As email administration of any kind is sort of an unrewarding drain, next time I migrate mailing lists, I may wire everything up through Google Groups and call it a day. But since I'm managing my own server and lists, I feel like moving to Mailman made my life, and my users' lives, about an order of magnitude better. Hopefully this will encourage someone else to make the switch, too.

--ryan.
