Dr. Scriptlove (Or How I Learned to Stop Worrying and Love Perl)
It's the end of a beautiful Labor Day Weekend and I'm sitting down with Spidering Hacks from O'Reilly. I bought it just the other day and it has had me completely fascinated. It was written by Kevin Hemenway and Tara Calishain and has a ton of really useful tips and scripts for spidering and scraping and generally grabbing info off the web programmatically.
Let me back up. Normally I wouldn't have bought the book because all the script examples are in Perl. PERL!! Urgh! Compilable Line Noise! The Ugliest Programming Language in the World. Bleh! Since 1996, when I first bought Programming Perl ("The Camel Book") and couldn't make any sense of it, I've been avoiding Perl like the plague because I just *couldn't* grok it. With simple languages like VB, portable languages like Java and cleaner languages like Python around, I have just never been able to give Perl any sort of real consideration. Until, quite by chance, just the other day.
I was looking for something to read a week or so ago and looked over to my bookshelf to see the Perl in Easy Steps that I bought a few months ago at B&N for $9.99. I sat down and ripped through it and thought "hey - that isn't so bad." In fact, I sorta dig the flexibility in the variables and the practicality of the functions. Since I had just been messing with SpamAssassin, being able to finally "crack the code" of Perl a bit was actually quite nice. I even did a search through my local and server hard drives for .pl scripts and was amazed at how a little education had made what seemed complete chaos so much clearer - okay, truth be told, the scripts were all barely understandable - but still it was progress.
So after I did this, I castigated myself for even looking at Perl, pulled out the Python interpreter and tried to get a script working against my email server's mbox, something I've been meaning to try. Let me explain: Thunderbird does a great job of catching the crap that SpamAssassin doesn't, and throws those emails into a Junk folder, which is an IMAP folder on my server. What I wanted to do was create a little script which would run "sa-learn" on that mbox and then *delete the messages*. After messing with Python's horrible documentation and running into brick wall after brick wall, I gave up. I want JavaDocs for Python. With some basic sample code and JavaDocs you can work out just about any Java library. Even crap spewed out by Axis wsdl2java, etc. Python's docs were just half-ass and bewildering to me. Am I missing something?
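For what it's worth, here is a rough sketch of the script I was after, not something I have running on the server. It assumes the Junk folder is a plain mbox file, that "sa-learn" is on the PATH and only reads the file, and that its --spam and --mbox options are the right ones for this job. The function name learn_and_empty and the runner parameter are made up for illustration (the latter just makes it possible to try the logic without SpamAssassin installed).

```python
import mailbox
import subprocess


def learn_and_empty(mbox_path, runner=subprocess.run):
    """Feed an mbox to sa-learn as spam, then delete its messages.

    Returns the number of messages removed. `runner` is a hypothetical
    hook so the command can be swapped out; it defaults to subprocess.run.
    """
    # create=False: fail loudly if the path is wrong instead of
    # silently creating an empty mailbox.
    box = mailbox.mbox(mbox_path, create=False)
    box.lock()
    try:
        count = len(box)
        if count == 0:
            return 0
        # check=True raises CalledProcessError when sa-learn exits non-zero,
        # so messages are only deleted after a successful learning run.
        runner(["sa-learn", "--spam", "--mbox", mbox_path], check=True)
        box.clear()   # mark every message for deletion
        box.flush()   # rewrite the mbox file on disk
        return count
    finally:
        box.unlock()
        box.close()
```

Calling learn_and_empty with the path to the Junk mbox from a cron job would be the idea. The mailbox is locked while this runs, which matters if the IMAP server might deliver into the folder at the same moment.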
Anyways, so I put all this scripting stuff behind me. Back to good ol' Java. But then I ran across Spidering Hacks this weekend and my previous foray into Perl let me actually grok the examples. Wow! Look at all the kick-ass functionality you can use! And all those killer modules! And wow, looking up some of the code examples online, the CPAN documentation is really well done. Suddenly it dawned on me... maybe what I was looking for all this time was Perl.
PERL!?!?! Nonono. It can't be. I mean, Python is so clean! So beautiful! Only One Way To Do It! It appeals to my sense of all things Good and Right. And Python seems to be the right way to go since Nokia is moving to incorporate it into their Series 60 phones... But Perl just seems so much more functional - and, it seems, well documented. And the Perl scripts I've seen are so practical. And the support! Perl is everywhere... There are dozens of books, groups and sites, and it's had this following of really damn cool hackers forever. And honestly, I can't live without SpamAssassin - a Perl app. There's no Python app that falls into that category.
But even still I took a step back. Perl is so fucking twisted. Have you seen the list of Special Variables? $_, $., $/, $, $", $\, etc. (it goes on for like 3 pages in the Perl Pocket Reference I bought). That sort of thing is still a real drawback to me. I honestly believe you have to pick and choose your programming tools. Most languages can do just about anything these days - but you need to get proficient in them to be productive. Java's a great language to focus on because it does so much. Verbosely, yes, but with the right jars you can do just about anything pretty quickly. But I'm getting a bit tired of a lot of the crap you have to deal with - like I wrote the other day. Being able to just run a .java file without the compiling step and without classpath issues? I'd be psyched, but that's not happening. That's why I'm looking at Python and now Perl very seriously. I'd like to start doing more rapid development with these types of scripting languages.
Tonight I went back to B&N (my favorite bookstore) and spent some time browsing the Perl section (mostly books from O'Reilly). Even in its third edition, "The Camel Book" is *still* obtuse and impenetrable. I don't know who that book is written for, but it jumps all over the place, goes off on insane tangents and basically does its utmost to make sure you don't actually understand the language. No wonder I've been completely mystified for so long. Since I have a subscription to Safari, I didn't bother buying any of the other references except the Perl Pocket Reference, which should be good to have while reading Spidering Hacks.
You know what's confusing though? When I was at the bookstore, all the Perl books were right next to the PHP books (obviously). But beyond starting with the same letter, what is Perl's "real" relationship with PHP? Some of the syntax is very similar, but they're obviously not the same. Has Perl lost all battles for real-world web development? I can't remember reading about any big commercial sites out there that are produced with mod_perl any more - not for many years in fact. It's all been Microsoft or Java or more recently PHP. But say you're using PHP and you need to do something outside the bounds of what PHP does - is Perl the weapon of choice here? When doing server-side stuff with Java, you use Java for just about everything. Where does Perl fit nowadays? The same place it always fit before? As the Unix-glue language?
Anyways, in summary, I really recommend Spidering Hacks (and that's not just because Tim invited me to Camp Foo this coming weekend). And if you have any thoughts about Perl and/or Python, I'd love to hear them.
-Russ