Shlrm.org Blog

Linux, Java, Ruby, and Politics

Fun With Apache, Mod_rewrite, Mod_proxy and Lighttpd


Lighttpd does a much better job than Apache of generating an index page for lots of files: 12,049 files to be precise, at time of writing. A long time ago I set up my website as a fallback mirror for Source Mage, so that a source file would still be available if it got moved, lost, or deleted, or if the site hosting it went down, went away forever, whatever. I remember disabling Apache’s Indexes option on that file list because it took _forever_ to render anything and consumed plenty of CPU and other resources.

Well, I was using an internal lighttpd to render the same listing internally, and I never disabled its directory listing. Whilst setting up collectd to monitor my Apache scoreboards, I noticed that it could also monitor the lighttpd scoreboard. While working on the config for that, I noticed that Apache wasn’t proxying its requests through to lighttpd, and was taking a lot of time and resources to do something simple.

Welp, having ADD like I do, I got sidetracked and started rewriting my configs so that Apache no longer serves the files directly but proxies the requests to the internal lighttpd.

Originally, I had two aliases set up, “/sourcemage” and “/sourcemage/fallback”, which both pointed to the same spot on disk. This way you could reference things using “/sourcemage/file.tar.bz2” or “/sourcemage/fallback/file.tar.bz2”. The reasoning behind this is simple: I set up my fallback mirror before Source Mage standardized on a fallback URL, and I wanted to keep both paths functional. Easy to do with aliases. Oh, and I also have an alias to “/sourcemage/codex” for all my local codex needs.

Switching to a proxied setup was a bit more complicated than I anticipated. Aliases don’t work the same way everything else does: whichever one matches first wins. I needed mod_rewrite rules wired in to redirect things properly and manipulate the URL sufficiently. But I had two special cases. I couldn’t simply redirect everything under “/sourcemage/*” to “/sourcemage/fallback”, because “/sourcemage/codex” has to stay where it is. I also wanted to keep “/sourcemage/file.tar.bz2” working.

My solution is relatively simple and follows:

apache.conf
#aliases
Alias /sourcemage/codex    "/srv/webMirrors/sourcemage.org/codex"
#rewrite /sourcemage/ to /sourcemage/fallback
#some very fancy rewrite rules to keep the old alias functionality
RewriteEngine On
RewriteRule ^/sourcemage/?$ /sourcemage/fallback/ [R]
RewriteRule ^/sourcemage/codex/?$ - [L]
RewriteRule ^/sourcemage/([a-zA-Z0-9._+-]+)$ /sourcemage/fallback/$1 [R]
# do the actual passing through to the lighttpd (which handles huge
# directory listings much, much better)
ProxyPass /sourcemage/fallback http://fallback.shlrm.org/
ProxyPassReverse /sourcemage/fallback http://fallback.shlrm.org/

# old notes kept here for reference as to what the above rewrite
# rules are doing -- dkowis 2011-09-07
# going to proxy these to the lighttpd
#Alias /sourcemage/fallback "/var/spool/sorcery/"
#Alias /sourcemage          "/var/spool/sorcery/"
# directories!
#<Directory "/var/spool/sorcery/" >
#       Options none
#    AllowOverride None
#    Order allow,deny
#    Allow from all
#</Directory>

<Directory "/srv/webMirrors/sourcemage.org/codex/" >
        Options Indexes
        Order allow,deny
        Allow from all
</Directory>

# vi: set ft=apache:

The rewrite rules are the most complicated part of this. The trick was getting mod_rewrite to leave the URL alone when it matched “/sourcemage/codex”: the “-” substitution means “don’t change anything”, and the [L] flag stops any later rule from touching it. After that, the rest was really easy. Some matching logic picks up on “/sourcemage/file.tar.bz2” and redirects it to “/sourcemage/fallback/$1”, and everything worked as it did before, except way faster.
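To make the rule order easier to follow, here is a small Ruby sketch that mimics the three RewriteRule lines from the config above. The `RULES` table and the `rewrite` helper are made up for illustration; they are not how Apache works internally. The sketch also assumes each redirect ends rule processing, which is a simplification, since the rules above use [R] without [L].

```ruby
# Hypothetical model of the three RewriteRule lines; illustration only.
# Each entry is [pattern, action]. :stop stands in for "- [L]" (leave the
# URL alone and stop); a lambda builds the redirect target.
RULES = [
  [%r{\A/sourcemage/?\z},                  ->(_m) { '/sourcemage/fallback/' }],
  [%r{\A/sourcemage/codex/?\z},            :stop],
  [%r{\A/sourcemage/([a-zA-Z0-9._+-]+)\z}, ->(m) { "/sourcemage/fallback/#{m[1]}" }]
].freeze

# Returns the path a request would end up at. Assumption: the first
# matching rule wins, as if every redirect also carried the [L] flag.
def rewrite(path)
  RULES.each do |pattern, action|
    match = pattern.match(path)
    next unless match
    return path if action == :stop
    return action.call(match)
  end
  path # no rule matched, so the path passes through untouched
end

%w[
  /sourcemage
  /sourcemage/codex
  /sourcemage/file.tar.bz2
  /sourcemage/fallback/file.tar.bz2
].each { |path| puts "#{path} -> #{rewrite(path)}" }
```

The codex path and anything already under “/sourcemage/fallback/” come out unchanged, while the bare “/sourcemage” and “/sourcemage/file.tar.bz2” are sent to the fallback path. The order matters: without the codex rule in the middle, the third pattern would also match “/sourcemage/codex”.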

New Theme!


Yay there’s a new theme! It’s a bit nicer than the old one; less fancy graphics and a better layout.

Far more customizable also. I’m satisfied.

Adding Additional Recipes to Your Capistrano deploy.rb


This was unbelievably difficult to find on the internet. Perhaps I didn’t know the right things to look for.

I wanted to figure out how to require/load/import additional files into the deploy.rb file so that I would be able to drop recipes into a config/recipes directory.

Turns out this is the solution.

I ended up using this to simply load all the recipes in the recipes directory:

#load in all the other recipes
$LOAD_PATH.unshift File.join(File.dirname(__FILE__), 'recipes')
Dir['config/recipes/**/*.rb'].each { |recipe| require  File.basename(recipe, '.rb') }
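One thing to watch with the snippet above: the glob descends into subdirectories, but `File.basename` throws the subdirectory away, so a recipe nested below config/recipes would not be found on the load path. Here is a minimal sketch of an alternative that works with absolute paths instead. The `recipe_files` helper and the file names are made up for the demonstration and are not part of Capistrano.

```ruby
require 'tmpdir'
require 'fileutils'

# Hypothetical helper (not a Capistrano API): returns every .rb file
# below dir as an absolute path, sorted so the load order is predictable.
def recipe_files(dir)
  Dir[File.join(File.expand_path(dir), '**', '*.rb')].sort
end

# Each fake recipe records its name here so we can see what got loaded.
$loaded_recipes = []

# Demonstration against a throwaway directory standing in for config/recipes.
Dir.mktmpdir do |root|
  recipes = File.join(root, 'config', 'recipes')
  FileUtils.mkdir_p(File.join(recipes, 'db'))
  File.write(File.join(recipes, 'nginx.rb'), "$loaded_recipes << :nginx\n")
  File.write(File.join(recipes, 'db', 'backup.rb'), "$loaded_recipes << :backup\n")

  # Load by full path, so nested recipes work and $LOAD_PATH is not needed.
  recipe_files(recipes).each { |recipe| load recipe }
end

p $loaded_recipes
```

In a real deploy.rb the directory would presumably be `File.join(File.dirname(__FILE__), 'recipes')`. Whether `load` or `require` is the right call there depends on how the recipes themselves are written, so treat this as a starting point.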

Testing API Calls Using Cucumber and Rails 3


As it turns out, in Rails 3.x, webrat and cucumber don’t play along so well. You start getting horribly annoying, difficult-to-solve error messages. The solution in that link is to change the testing method from Rails to Rack. That works for things like visiting pages, but it broke all of my API tests, where I verify the response code and the JSON body that I get back.

I looked into upgrading cucumber-rails, and apparently they recommend using Capybara instead of webrat for Rails 3. So I made the necessary changes. Unfortunately, Capybara only has a get method. You can only visit pages; any posts are supposed to be exercised through the web forms. That is far less than useful for API tests, and it proved to be a huge pain in the butt. I spent hours trying to find out how people do this, to no avail, until I found an obscure answer at the bottom of a StackOverflow posting.

That code on github is the key to testing APIs. It appears to use some of the more internal guts of the Capybara page drivers to accomplish its goal. Works for me.

I am surprised to see how few integration test frameworks support this kind of web service test, especially since this is what we build where I work. Having Cucumber features describing those API calls makes everyone’s job easier. I find it unfortunate that it seems to be somewhat ignored by the testing community. Hopefully this post contains the right words for other people searching for the same thing, so that they find it in far less time than the hours I spent.

I Made a Ruby Gem


I forked an xmpp4-simple gem that had some issues, figured out what they were, and then committed the fixes in my own github repo.

It appears that the original gem is unmaintained, and so I just forked it and updated some parts.

Fedora, Old MaraDNS, and IPv6 == FAIL


Given the “end of IPv4”, I decided I should set up IPv6 on my network and see if I can’t start doing things over that instead. Unfortunately, it appears that the DNS server internal to my network, maradns, sucks at IPv6 until version 2.0. Fedora has it at 1.3.something. Debian has it at 1.4. WTF Fedora?

I’ve been working on building MaraDNS 2.0 RPMs for Fedora 13 and 14, but I don’t know the RPM SPEC structure very well. The 2.0 version of MaraDNS has separated the authoritative server from the recursive resolver, which is wise. But it means I need to build a spec file that produces two RPMs. I suppose I could build a separate spec file for each one, but that doesn’t seem like the right way to do things.

Well, My Site Asplode.


I have a plugin called Broken Links Checker that finds broken links within my wordpress postings. I’ve used it in the past without consequence. It would dutifully go through my posts and report back the links that no longer work. Pretty handy: I can go and disable those links, make them strike-through, etc.

I updated it along with updating wordpress to 3.0.3 the other day. I fired off the Broken Links Checker and, as it usually takes a while, ignored it and went on playing Eve and writing some Ruby code (a simple little project to organize a whole lot of files into 4.7GB disks, with manifests and md5s of all the files, for an archive. Still doesn’t work yet.) The next morning, I woke up to a large amount of email. I figured it was just spam, or something noticeably different from the normal spam. Unfortunately, it was notifications from my Nagios telling me that the box no longer responded to ssh or http. Uh oh.

Luckily, I’m running Xen, so it is trivial to get to the console over ssh. I connected to the console and tried to log in, but the box was totally wedged. The OOM killer had gone crazy with httpd and everything else on that box. So: xm destroy. I fired it back up again and waited for it to replay all the transactions; it also decided to check the disks, since they hadn’t been checked in 214 days. Everything came back up fine, and Nagios was satisfied again.

However, none of the sites worked: “Error: Cannot access database.” Oh noes. I started the normal debugging process, looking through logs and whatnot, and found a bunch of local requests from Broken Links Checker checking links. It appears that I had DoSed myself. Wonderful.

Still no luck with MySQL, so I tried to connect to the MySQL server from the command line. I had already determined that it was up; it’s on a different box, so it didn’t get OOM-killed when Apache took the web box down. I got an error telling me that MySQL was denying connections from this host due to too many errors. One flush-hosts later, it all worked again. Fun.

Guess I should set up some kind of resource limits for Apache on that box. Also, I’m removing that Broken Links Checker plugin; I don’t need to find my broken links that badly…

Userspace Cgroups in Fedora 14


There was a lot of buzz on the interblags about a 200-line kernel patch that enables per-tty cgroups automatically. Apparently, you can get the same effect trivially from your home directory, without having to patch any kernels.

This blog post talks about how to do it, but it didn’t cooperate with Fedora 14 very well. A bit of googling later, I found this mailing list post that did it for me. Now I have userspace cgroups for each terminal I open. Handy, I suppose. It might be more useful on an SSH server, to guarantee that each person logging in can’t overwhelm the system for the others.