As I’m sure others who chose to deploy a standard mongrel/rails setup have encountered, my site has never been able to auto-recover from a hardware-related restart (read: crash, blackout, etc.). Well, it seems the newest mongrel_cluster gem added a “--clean” option to the startup command that removes the stale pid files that would otherwise keep mongrel from starting.
This is great. Now I get to delete the completely non-functional cleanup script I’d written to do this at boot time. I spent a few hours one night trying to figure out the cleanest way to parse the pid_file entry out of the various cluster configuration files in /etc/mongrel_cluster so I could properly clean up after a crash, and I never really liked what I had to do to get the right values out. Worse, the dang thing never worked: it ran fine by hand but failed miserably at startup, and who really wants to keep power-cycling a server to figure out why? So I have had to log in manually after each “outage” to fix it. And without fail, I didn’t notice the site was down for at least a couple of days each time, so I doubt anybody thinks my site is even alive anymore.
Now I simply update the start section of the mongrel_cluster init script to pass --clean and I’m done. Mongrel takes care of itself now. Hopefully there’s some really elegant code somewhere in mongrel_cluster or mongrel_rails that deals with parsing the pid_file configuration line, and hopefully the dang thing works during boot. The first hope is as easy to check as reading the code, which I’ll do later; the second isn’t something I feel like testing right now, so only time will tell.
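For reference, the kind of cleanup described above boils down to roughly the following. This is only a minimal sketch, not the code from mongrel_cluster or from my old script. It assumes each cluster file is YAML with `cwd`, `port`, `servers` and `pid_file` keys, and that each server writes its pid file with the port inserted before the extension (for example `mongrel.8000.pid`). The helper names `port_pid_file`, `process_alive?` and `clean_stale_pids` are made up for illustration.

```ruby
require "yaml"

# Build the per-port pid file path, e.g. "tmp/pids/mongrel.pid" and 8000
# become "tmp/pids/mongrel.8000.pid" (assumed naming scheme).
def port_pid_file(pid_file, port)
  ext  = File.extname(pid_file)
  base = File.basename(pid_file, ext)
  File.join(File.dirname(pid_file), "#{base}.#{port}#{ext}")
end

# True if a process with this pid currently exists.
def process_alive?(pid)
  Process.kill(0, pid) # signal 0 only checks for existence
  true
rescue Errno::ESRCH
  false
rescue Errno::EPERM
  true # the process exists but belongs to another user
end

# Delete pid files whose process is gone; return the paths that were removed.
def clean_stale_pids(config_path)
  conf     = YAML.load_file(config_path)
  cwd      = conf["cwd"] || "."
  first    = conf["port"].to_i
  servers  = (conf["servers"] || 1).to_i
  pid_file = conf["pid_file"] || "tmp/pids/mongrel.pid" # assumed default

  (first...(first + servers)).each_with_object([]) do |port, removed|
    path = File.expand_path(port_pid_file(pid_file, port), cwd)
    next unless File.exist?(path)

    pid = File.read(path).to_i
    next if pid > 0 && process_alive?(pid)

    File.delete(path)
    removed << path
  end
end
```

Calling `clean_stale_pids` on every file matched by `Dir.glob("/etc/mongrel_cluster/*.yml")` before starting the cluster would be the boot-time equivalent, assuming the configuration files use the `.yml` extension.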
The funny thing about this whole story, at least what I got out of it, is that if I had just checked for updates first, I would have saved myself a night of trying to figure out how to cleanly parse the pid_file config line, plus the headache of watching the thing not work. So the moral, boys and girls, is to always check for updates: it could save you a lot of time and maybe a headache or two.
