PHP - Fugly but Fast
So wow, three weeks have gone by since the launch, just like that! Seni and I have been toiling away behind the scenes, and not many feature changes have rolled out just yet. Fear not, though: we’re going to be launching some new stuff soon (site submission for the index, database updates, etc.), so if you’re wondering why the blog is so quiet and the site is just sitting there, well, now you know. :-)
One of the changes I decided to make last week, while trying to debug some security stuff, was to start redirecting users who are viewing the site via a PC rather than showing them the adapted pages. What was happening was that a bunch of MySpace/Facebook/Bebo users were skirting around school proxies (I assume) by using Mowser to access those sites. In the long term I don’t think I have a problem with them doing that, actually, but right now I’m trying to keep a careful eye on the mobile users to see what issues they may be having, and I don’t need the logs, etc. cluttered up with a bunch of random traffic. That said, as soon as we get the publisher’s section up, publishers will need a way to check their formatting etc., so I’ll have to figure out something. But just in case you were wondering, that’s what’s going on with the adaptation. If you’re hip enough to spoof your browser’s User Agent header to look like a phone, all the functionality is still there.
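For the curious, the PC-versus-phone check described above boils down to something like the sketch below. It's in Python rather than PHP just to keep it short, and it is not Mowser's actual code: the hint list, the function names and the choice to send PC users on to the original URL are all assumptions made up for illustration.

```python
from typing import Tuple

# Hypothetical substrings that suggest a phone browser. A real list would be
# much longer and would need regular upkeep.
MOBILE_UA_HINTS = (
    "nokia", "blackberry", "windows ce", "symbian",
    "midp", "opera mini", "up.browser",
)


def looks_like_mobile(user_agent: str) -> bool:
    """Return True if the User-Agent string contains any known mobile hint."""
    ua = (user_agent or "").lower()
    return any(hint in ua for hint in MOBILE_UA_HINTS)


def route_request(user_agent: str, target_url: str) -> Tuple[str, str]:
    """Decide what to do with a request for target_url.

    Returns ("adapt", url) for phone-like browsers and ("redirect", url)
    for everything else. Redirecting to the original URL is an assumption.
    """
    if looks_like_mobile(user_agent):
        return ("adapt", target_url)
    return ("redirect", target_url)
```

A desktop Firefox User Agent would come back as a redirect, while anything containing, say, "Nokia" would get the adapted page, which is also why spoofing the header is enough to get the old behavior back.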
I doubt many of the MySpacers who were using Mowser will read this, but just in case… FYI: Mowser isn’t an “anonymous” proxy by any stretch. With every request, we send out additional headers with the original IP address and User Agent of the Mowser user to the other website, so we’re not hiding anyone’s details from the publishers. We do log every request that comes through, but we don’t log POSTed details such as usernames and passwords and we regularly flush any cached cookies.
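To make the "not anonymous" point concrete, here's a rough Python sketch of what passing along the original details and keeping POST data out of the logs could look like. The header names are assumptions: X-Forwarded-For is a common convention for the client IP, and X-Original-User-Agent and the ExampleProxy identifier are hypothetical names picked for this example, not necessarily what Mowser sends.

```python
from typing import Dict, Optional


def build_upstream_headers(client_ip: str, client_ua: str) -> Dict[str, str]:
    """Headers a proxy could add so the publisher still sees who is asking."""
    return {
        "User-Agent": "ExampleProxy/0.1",    # hypothetical proxy identifier
        "X-Forwarded-For": client_ip,        # common convention for the client IP
        "X-Original-User-Agent": client_ua,  # hypothetical header name
    }


def build_log_entry(method: str, url: str, client_ip: str,
                    post_fields: Optional[Dict[str, str]] = None) -> Dict[str, object]:
    """Build a log record for a request.

    post_fields is accepted only so we can note that a body existed.
    Its keys and values are deliberately never copied into the record.
    """
    return {
        "method": method.upper(),
        "url": url,
        "client_ip": client_ip,
        "had_body": bool(post_fields),
    }
```

The design choice worth noting is that the log function never touches the contents of the POSTed fields, so a username or password can't end up in the record by accident.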
I think I may have finally gotten a handle on the encoding stuff. There was a great post about getting PHP to play nice with UTF-8, and Joel had a great UTF-8 overview as well… The adapter is definitely not perfect yet, but I think it’s closer than it was. Every once in a while a page is requested with something I haven’t seen before, and the parsing code isn’t set up correctly so it barfs, but it seems to be doing well.
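The general shape of the encoding problem is simple enough to sketch, even if the details aren't. The Python below shows one common approach, which is an assumption on my part and not a description of what the adapter really does: trust the charset the page declares, fall back to UTF-8, and as a last resort decode as Latin-1 so the parser always gets some text to work with. The function name decode_page is made up for this example.

```python
from typing import Optional


def decode_page(raw: bytes, declared_charset: Optional[str] = None) -> str:
    """Turn fetched bytes into text without ever raising on bad input."""
    candidates = []
    if declared_charset:
        candidates.append(declared_charset)
    candidates.append("utf-8")

    for name in candidates:
        try:
            return raw.decode(name)
        except (LookupError, UnicodeDecodeError):
            # LookupError: unknown charset name; UnicodeDecodeError: wrong guess.
            continue

    # Latin-1 maps every byte to a character, so this cannot fail.
    # The result may be mojibake, but the parser gets a string.
    return raw.decode("latin-1")
```

The catch is that a wrong but "successful" decode produces garbage quietly, which is probably why pages with something unexpected still trip things up now and then.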
So that brings me to the real subject of this post, which is the Mowser platform - i.e. what we’re using to parse the pages, store data, etc. - and that’s a pretty basic LAMP stack: Debian Linux, Apache, MySQL and PHP. If you’re thinking, “zOMG! FTW! PHP?!?! Why aren’t U using Ruby?” well, I guess we’re just too old skool for that new-fangled stuff… Just kidding. No, actually, the buzz around Ruby and Rails is getting so deafening that it took me on a *two-month* circular journey of discovery before I finally realized PHP is really the best platform for web apps, hands down. Once I started doing a deep dive on what it would take in terms of resources to develop and scale a Ruby app, I decided to go with a battle-tested architecture that would scale without heroic feats of computer engineering, even if it meant working with a less than beautiful language. And a few weeks after I made that final decision, the Twitter debacle hit and I felt smugly justified. Not that we’re going to be getting Twitter-level traffic any time soon, or that PHP would have magically handled their 1600 requests a second, but going with the same architecture that runs Digg, Wikipedia, large parts of Yahoo! and other major sites is really the safe bet.
I wanted to put my voice out there in the wilderness on this topic because I feel The Hype has gotten a bit too crazy. A month or so ago, I’d tell people what I was doing and they’d say, “Hey, that’s a great idea! What are you writing it in?” and as soon as I’d say PHP, they’d look bored and wander away. That’s just nutty. PHP has its downsides, as no platform is perfect, but I actively chose PHP for development because of its inherent advantages when it comes to the web. I could have chosen Java, Ruby or Python, but I went with what I think is best. Yes, it’s fugly, but it’s easy to set up, easy to scale, battle-tested and has a pretty good community (some of whom code like complete monkeys, but lots of experts as well). When we get to a few million lines of code I may regret that decision, but until then I’m happy. Besides, for the most part what PHP is doing in Mowser’s case is simply gluing together a bunch of other well-made libraries like Curl, LibXML and HTMLTidy, or simply writing to and reading from a MySQL DB.
That’s the stuff PHP was made for - so it’s really the right tool for the right job.
Back to work!
-Russ