Programming
OSCON 2008 Reflections, Part 3
Submitted by Steve Simms on Sun, 07/27/2008 - 4:00pm.
Perl is an insane language. I love it.
I’m guessing that there are very few languages that can be made to interpret rod logic, as shown by Damian Conway during the opening night’s keynote (or Latin, or Klingon, to highlight other talks he’s done), nor are there all that many languages that let you write positronic variables that let you return results before you’ve calculated them (same talk).
But even if you discount the things you can do with filters, how many languages let you do something like this:
*{";\n"} = sub { print "something" };
In case you’re not familiar with some of the lesser used Perl syntax rules, that says “make a function called ‘semicolon newline’ and have it print ‘something’ whenever it’s called.”
That will probably result in a “so what” response from the non-techies reading this, but it should cause a certain tightening of the stomach and a feeling of low-level despair to any programmers reading this, possibly with a little jaw-dropping, once they realize what it means. (Hint: What is at the end of every statement in most languages?) And maybe some anti-Perl flaming for even allowing such a thing (don’t worry, it gets worse — for the C/C++ programmers out there, you can also name a function “\0”). But you can write Fortran in any language.
Back in 2004, I attended a tutorial called Perl Best Practices. This year, I attended Perl Worst Practices. Damian C. commented that we must be the smartest, cleverest people at the conference to have convinced our bosses to let us attend a three-hour tutorial with that title. I’m not sure what that says for me. Other talks included Perl Security and The Twilight Perl (showing that things that should be syntactically impossible really aren’t).
What’s neat about Perl’s insanity is that it lets you do practically whatever you want. Even if you shouldn’t. Even if 99.999% of the time it’s a monumentally dumb idea. And that allows best practices to be developed and codified over time, rather than being limitations imposed by the language (there is that 0.001% where it’s the perfect solution, and saves you days of writing workarounds). This is increasingly becoming a defining point of the Perl community, once you get away from the people who treat Perl like stereotypical PHP.
Learning some of these crazy tricks, while hopefully not the sort of thing you’d ever use in production, gives you a better understanding of the language. If nothing else, that can be really helpful for debugging, or if you ever have to maintain someone else’s code (or code that you’ve written more than six months ago).
Because, once you’ve spent three hours going through SelfGOL statement by statement, there’s probably not much that an inexperienced or undisciplined coder can do to scare you.
Outside of the Perl world, one of my goals for the conference was to get a better understanding of Ruby, since it’s getting a lot of attention. Well, I tried to be open-minded, but even the presenter reinforced that it’s still a relatively new and untried language that’s going through a fair bit of change as it matures, and the syntax didn’t seem all that much better when comparing the ratio of code to line noise.
The “fair bit of change” description is especially true for Ruby’s frameworks. Not that Perl doesn’t have its own problems with frameworks (it does, and I may write some thoughts on that at some point, since they’re mostly not safe to use, either), but Rails is probably the main selling point that Ruby has, to the point of being synonymous to a lot of people, and it’s still very much a moving target, without enough of an emphasis on backwards compatibility.
That makes it a very bad choice for any program that you want to stick around for a while, unless you’re willing to invest a lot of your energy in keeping up with the changes to your underlying framework, rather than enhancing your own code (that’s my big problem with Drupal as well, despite my really wanting to like it).
But lest I be accused of having a bias against anything that isn’t Perl, I did come out of the conference with an interest in learning more about the Mozilla framework, along with a renewed desire to help in at least QA with that project, with the hope of eventually getting to know it well enough to write some client software using its tools. And I went to a good session that did an introductory overview of C, since that’s a definite area of weakness in my skill set at the moment. I probably should have attended a Python session as well, but there are only so many timeslots available. Maybe I can set that as a goal for next time.
All in all, I think I was only in one session that was particularly bad. Otherwise, there was a huge range of quality, but I didn’t find myself wishing I had gone to a different session instead. So that was definitely a win.
More on speakers and presentation styles next time.
OSCON 2008 Reflections, Part 2
Submitted by Steve Simms on Sat, 07/26/2008 - 1:00pm.
Part 1 was a generic overview of my experience at OSCON this year. In short, I had been looking forward to this conference for about three years, and I wasn’t disappointed.
An event like this would probably be an anthropologist’s or sociologist’s dream study. Get over a thousand mostly highly focused, technical people, 80%+ of whom are introverted, statistically speaking, all with a fairly narrow similar interest, and put them all in one place to see what happens. It’s a lot of fun.
What happens is that a large number of them open up. They’re finally among people who understand them. They have conversations with complete strangers, almost like they were extroverted. (It helps that there are some extroverted people to act as catalysts.) They have that common understanding with these people that they lack with “normal” people, defined as roughly 99% of the rest of the world.
If you think I might be exaggerating, consider this — how many conferences have a “People” track, wherein many of the sessions in that track are related to how to get along with and interact with other people? (And some of them, based on overheard feedback, included practices that many would consider pretty fundamental, like, oh, say, the importance of showering.) That’s how bad geeks can be, in general.
One of the better talks that I did end up attending in that track (while speaker-following) had as one of its titles “Hacking Wetware.” For the uninitiated, “wetware” == “humans”. Oh, and hacking can be more or less defined as “getting to understand at a fundamental level,” not as “breaking in and destroying” or “doing evil things”.
Among other things, including using The Sims as its overarching point of reference, it featured the gem of explaining the stereotypical greeting using the TCP three-way handshake:
- Hi, how are you doing? (SYN)
- Good (ACK), and you? (SYN)
- Good, thanks for asking. (ACK)
It’s great because it lines up so perfectly with the point of the greeting — you really don’t care how people are doing, you’re just establishing communication. I admit, I poke fun at people rather often by either changing or abbreviating line two, and watching them completely miss it. Though, the best one I witnessed was actually done by Christine at a restaurant, when she answered “Wet” (it was raining outside), to which the hapless greeter replied “I’m so glad to hear it. I’m fine.”
Anyway, back to geek-watching. My flight to OSCON was a somewhat poorly thought-through one-stop flight from Manchester to Portland with a layover in Philadelphia. Timing-wise, it worked well, but I hadn’t thought about the fact that Philadelphia is further from Portland than Manchester, which made for a really long second flight.
While in Philadelphia, it occurred to me that there was a decent chance that I might be able to spot other people who would be going to OSCON. So the game became how to spot them. (I did this last year for the National Postal Forum. There was a whole group of postal service employees on my flight.)
An observation that I made some time ago with regard to Christian conferences is that you can usually tell by what t-shirts people are wearing. Christians know that they’re not going to face much of any persecution while at such a conference, so they tend to wear all their religious stuff there. (This isn’t just a religious thing — you wouldn’t walk into a bar in Boston wearing a Yankees cap and jersey without expecting some persecution, and possibly higher prices.)
So, one person made it easy. He was wearing a t-shirt that asked “What’s your uptime?” Easy tell.
Looking at another person, I got the sense that he was a geek, but initially wasn’t sure. Right demeanor, right dress, right luggage, but nothing definite. It wasn’t until we both got on the same train to get to our (same) hotel that I noticed the subtle giveaway — he had a glider from Conway’s Game of Life sewn on his messenger bag. There’s no mistaking that, but only if you’re “in”. Turns out he’s a Perl geek, too, based on the sessions we were both in.
Back to sociology. The other thing that was amusing (and which I practiced a fair bit) was that you can completely ignore people and it’s perfectly fine. As was noted while the speaker was talking about the three-way handshake, idle chit-chat isn’t a strong point of geeks, since many don’t see the point (it’s a pragmatic thing). On the other hand, you could dive right into one of the topics of the day and that was also fine. Never mind that you hadn’t introduced yourself.
One thing I didn’t practice was the habit, common among the audience, of spending the entire session on a laptop. Some were live-blogging the sessions, some were chatting on IRC about the speaker (there was apparently at least one instance of buzzword-bingo during a keynote), and who knows what others were doing. It wasn’t particularly distracting, so it’s not a complaint. Since I didn’t know anyone at the conference, I didn’t have any incentive to join in. Plus, my goal was to focus on the content of the talks, along with how it was being presented.
Let’s see, conference topics are the next thing to cover, but I’ll save that for next time.
OSCON 2008 Reflections, Part 1
Submitted by Steve Simms on Fri, 07/25/2008 - 10:00pm.
O’Reilly’s Open Source Convention is now over for 2008, and I now have the rest of the day to relax, rest, and reflect before my flight tomorrow.
I ended up doing a lot of reflecting, so in the interest of not publishing a novel in one chapter, I broke this post up into several parts, and will post them more or less daily (if I remember) over the next few days.
Christine commented that I’ve effectively been off the grid this week, and that’s been mostly true. I’ve handled a few E-Mails, mostly customer-related, but otherwise I’ve been a lot less available than even the last time I was here.
That’s mildly odd, because I’ve also been a good bit less social this time around (last time I hung out a lot with the PostgreSQL folks; this time, other than an evening with the Mozilla QA people and a Birds-of-a-Feather session on open source in churches and missions, I barely spoke with anyone).
On the other hand, I was personally paying for this conference, whereas the last one was paid for by my former employer, so I had a lot more incentive to squeeze every penny of value out of the conference that I could. Not that I was slacking off the last time, by any stretch. Which is exactly why I’ve gotten a few “Hello?” E-Mails this week, I suppose.
I don’t think I learned as much this time as I did last time, on an absolute scale. That doesn’t really surprise me — I knew a lot less last time, and I’ve spent the past four years working on expanding on what I picked up from that conference. But the sessions at this conference did do a lot to fill in gaps and refine my knowledge, and would have been worth it for that reason alone.
One thing I learned last time was that it can be a better use of time to follow certain speakers around than just picking sessions off the chart based on topic. Quite simply, this is because some speakers can make any topic a worthwhile learning experience, while other speakers haven’t spent enough time learning about speaking to be able to effectively present even an interesting topic.
That was borne out this time as well. After the first day of the conference, I decided to follow Paul Fenwick of Perl Training Australia, to the extent that I watched him give the same talk twice. He was probably the best presenter at the conference this year, at least that I saw. His presentation style is fairly similar to Damian Conway’s (another person on my “follow” list).
A mildly related observation is that you know you picked a good (or at least an esoteric) session on Perl when Larry Wall is in the audience. For any CCC people reading this, that would be roughly akin to you giving a talk on the Four Spiritual Laws and having Bill Bright sitting in the first or second row, watching and listening attentively. So, Tim Bunce (of DBI fame) made it onto my “follow” list after that session, too. His presentations weren’t of the same caliber as the other two (few are), but they were very useful.
Speaking of usefulness, I skipped nearly all of the keynotes, except when Damian C. and Paul F. were presenting. Maybe it’s that open source is more of a pragmatic consideration for me than it is an ideological one, though I suspect that at least some of the keynotes were paid advertising. In any case, they didn’t seem all that interesting, and I heard at least a few others (who did attend) say the same thing. This had the added benefit that it let me sleep in.
Oh, and speaking of sleep, having a conference in Portland, OR was really nice once I got here. It meant that an 8:30am start time was actually an 11:30am start time as far as my physiology is concerned, which is a much better time for a morning session for geeks to start, in my opinion.
And that’s probably enough for one post. Observations on geek anthropology, conference topics, and presentation styles will be coming up subsequently.
On File Naming Schemes
Submitted by Steve Simms on Mon, 06/23/2008 - 10:26am.
Here’s the main reason why my file naming scheme can never be to just use the names that my customers use:
- June08prayerltr.pub
- Response Sheet.docx
- May Letter - year in review.doc
- June 2008 prayer letter postcard.pub
- june 08.pub
- June 08 PL.pub
- June, 2008.doc
- June 2008a.doc
- May_June Prayer Letter.pdf
- Update June.pdf
These are all files that I have open right now.
This is pretty common throughout a given month. If I stuck all of those in the same directory, and tried pulling up the one I needed, it would be a disaster waiting to happen, besides the fact that I’d be constantly overwriting files.
Since I’ve started, I’ve been using lastname-YYYY-MM-DD.pdf (or .doc, or .pub, or whatever), and for the past few years, I’ve been creating a new directory for each month (named YYYY-MM), which is somewhat redundant with the filename, but it works, and I could combine the directories if I needed to, without running into problems.
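As a rough illustration of that scheme, here is a minimal Python sketch. The function name and the lowercasing of the last name are my own assumptions for the example, not a description of the actual software.

```python
from datetime import date
from pathlib import PurePosixPath


def mailing_path(last_name, submitted, extension):
    """Build YYYY-MM/lastname-YYYY-MM-DD.ext for one submitted file."""
    month_dir = submitted.strftime("%Y-%m")
    # isoformat() gives YYYY-MM-DD, so names sort chronologically
    filename = f"{last_name.lower()}-{submitted.isoformat()}.{extension.lstrip('.')}"
    return PurePosixPath(month_dir) / filename


print(mailing_path("Smith", date(2008, 6, 23), ".pub"))
```

Because the date is repeated in the filename, files from different month directories can be merged into one directory without colliding.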
It has a couple of drawbacks. It breaks down at the end of each month, when some files get submitted for a mailing on the last day of the month, and more files on the first of the next. Not a big deal, but it means going back and forth between directories, or just having files in the wrong directory.
Also, last year, I started running into the problem where two people with the same last name (Smith, which should be no surprise to anyone) would send letters on the same day. So, in those cases, I appended the first letter of the first name to the last name (so, smithj for a fictitious John Smith).
Then, there was the family where the parents are missionaries and so is at least one of the kids, and they both submitted letters on the same day, and they have the same first initial. So much for that solution.
When I got three Smiths on the same day, it was clearly time to think up a new naming scheme.
I haven’t implemented it yet, but my current plan is to have the software create and populate one directory per customer, with a “Common Files” directory (for signature images, frequently-used response cards, etc.) and one additional directory per mailing. Then, since the computer’s doing all the work and not me, I’m also planning on having it create file system links for all of the active (and maybe recently finished) mailings in another directory, so I’ll be able to work out of the one directory, and still have an easily-accessible archive.
That should help address the fairly rare case when I’m accidentally working in the wrong month’s directory, and open last month’s letter, or something like that. It’ll also mean fewer files to sort through on a daily basis, since the server will be archiving all of the finished files out of sight.
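The planned layout could be sketched along these lines in Python. Everything here is hypothetical: the function names, the "active" directory name, and the customer and mailing labels are placeholders, and the sketch assumes a file system where symbolic links are available.

```python
import os
import tempfile
from pathlib import Path


def create_mailing(root, customer, mailing):
    """Create customer/Common Files and customer/<mailing>, then link the mailing into root/active."""
    customer_dir = Path(root) / customer
    (customer_dir / "Common Files").mkdir(parents=True, exist_ok=True)
    mailing_dir = customer_dir / mailing
    mailing_dir.mkdir(exist_ok=True)

    # One flat working directory that only contains links to active mailings
    active_dir = Path(root) / "active"
    active_dir.mkdir(exist_ok=True)
    link = active_dir / f"{customer}-{mailing}"
    if not link.is_symlink():
        os.symlink(mailing_dir, link, target_is_directory=True)
    return link


def finish_mailing(link):
    """Archive a mailing by removing only its link; the real directory stays where it is."""
    Path(link).unlink()


with tempfile.TemporaryDirectory() as demo_root:
    demo_link = create_mailing(demo_root, "smith-john", "2008-06-23")
    print(demo_link.name)
    finish_mailing(demo_link)
```

Removing the link when a mailing is finished leaves the per-customer archive untouched, which is the point of working through links rather than moving files around.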
Order Tracking Misdesign
Submitted by Steve Simms on Wed, 12/12/2007 - 9:54pm.
I ordered 1180 prints from Wal-Mart’s photo center a couple of days ago (two different pictures, which will be going in some Christmas mailings), and paid for expedited shipping, so they were supposed to arrive today. Wal-Mart shipped them yesterday, sending me an E-Mail saying so, and that they were expected to arrive today. They didn’t. Or, at least, haven’t, and I’d be a little surprised if FedEx makes another delivery tonight given that it’s 9:45pm as I’m writing this.
The tracking number that Wal-Mart gave me in the shipping confirmation E-Mail doesn’t work at FedEx (which is the company listed as the carrier in the E-Mail). Neither does searching by reference number using the tracking number, invoice number, or order number.
So, I clicked on the online tracking link that was in the E-Mail.
Someone might want to communicate to the designers of that page that it’s not necessary to list the order status for each of the pictures individually.
How to create an 11GB database table with 0 rows
Submitted by Steve Simms on Wed, 09/26/2007 - 3:48pm.
There are a decent number of ZIP codes in the US (42,296 to be precise; variations bring the table size up to 79,834). When putting them into a database table, it helps to create an index, but you're not likely to run into much trouble otherwise.
The ZIP+4 database is a different story. Back in April, there were roughly 43.5 million entries in that particular table. The raw text file is 9GB, and doing a simple task like adding an index requires you to check to make sure you're not going to run out of disk space, and have plenty of time on your hands.
The September database has apparently gotten a little bigger, as my estimate was a few GB short, and I ran out of disk space while importing it. The sad thing is that it was probably really close to being done, too.
But the aftermath of that problem is where it gets amusing, in an odd sort of way:
Because of the way databases work, the import basically started a transaction, imported practically all of the data, and then stopped when it ran out of hard drive space.
What it didn't do was delete all of the data that had gotten imported. It's still there, though inaccessible because it was in an incomplete transaction.
Aborting that transaction doesn't delete the data. That doesn't happen until vacuuming, which is normally done periodically by PostgreSQL (since 8.0 or so), and which can be forced if needed (like when you import 11GB of data and immediately delete it all).
Because of Point-in-Time-Recovery, all 11GB of that non-existent data is currently being copied to my backup server, and is also taking up a rather large amount of disk space in the meantime.
As a result, "SELECT * FROM usps_zip9s" takes about 15 minutes, and returns nothing.
And now I get to start over. Or, at least I will once I get the table cleared out (11GB) and the PITR files transferred (another 11GB). This time, I'm going to import the data in chunks, just in case.
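Importing in chunks might look something like the following Python sketch. The `load_chunk` callback is hypothetical; it stands in for whatever actually loads one batch of rows (ideally in its own transaction), so that a failure only loses the current batch rather than the whole import.

```python
from itertools import islice


def chunks(lines, size):
    """Yield successive lists of at most `size` lines from any iterable."""
    iterator = iter(lines)
    while True:
        batch = list(islice(iterator, size))
        if not batch:
            return
        yield batch


def import_in_chunks(lines, size, load_chunk):
    """Hand each batch to load_chunk and return the number of rows handed over."""
    imported = 0
    for batch in chunks(lines, size):
        load_chunk(batch)  # hypothetical loader: one transaction per call
        imported += len(batch)
    return imported
```

Since `chunks` accepts any iterable, an open file object can be passed in directly, and the 9GB text file never has to fit in memory.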
Troubleshooting business day calculation in PostgreSQL
Submitted by Steve Simms on Sat, 09/15/2007 - 10:27pm.
One of the stats I keep track of for work is how quickly we get mailings out the door. It’s actually the only stat that I make public, incidentally, partly for personal accountability and partly because it’s a selling point.
It’s a stat that’s easy enough to calculate right in the database:
turnaround_time := mailed - submitted;
Catch 1: I don’t want to include time spent waiting for someone to approve a preview or pay for a mailing.
That makes the calculation a little more complicated, but not by much — just subtract the time spent waiting from the total:
turnaround_time := mailed - submitted - (GREATEST(approved, paid) - responded);
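For anyone who wants to check the arithmetic outside the database, here is the same calculation as a small Python sketch. I am assuming that `responded` is when the preview went back to the customer and that `approved` and `paid` are when the customer acted; the timestamps in the example are made up.

```python
from datetime import datetime, timedelta


def turnaround_time(submitted, responded, approved, paid, mailed):
    """Total elapsed time minus the time spent waiting on the customer."""
    # Waiting ends only when both approval and payment have arrived
    waiting = max(approved, paid) - responded
    return mailed - submitted - waiting


example = turnaround_time(
    submitted=datetime(2007, 9, 10, 9, 0),
    responded=datetime(2007, 9, 10, 11, 0),
    approved=datetime(2007, 9, 11, 11, 0),
    paid=datetime(2007, 9, 11, 8, 0),
    mailed=datetime(2007, 9, 11, 15, 0),
)
print(example)
```

In this made-up case, 30 hours pass between submission and mailing, 24 of them are spent waiting on the customer, and the remaining 6 hours count as turnaround.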
Catch 2: I don’t want to include weekends or postal holidays in the turnaround time either, since I don’t work on either, as a general rule.
This is where it gets messy — PostgreSQL doesn’t have a built-in way of calculating business days, and it certainly doesn’t have a database of postal holidays.
So, a while back, I wrote a database function to do that calculation, given two dates and a set of holidays.
Assuming, for the time being, that no two postal holidays will happen on successive days* (or on weekends), the logic looks something like this:
Part 1:
- If the start time is on a holiday, move the start time to midnight on the next day
- If the start time is on a weekend, move the start time to midnight on Monday
- If the start time is on a holiday, move the start time to midnight on the next day
Part 2:
- If the end time is on a holiday, move the end time to one second before the beginning of the day
- If the end time is on a weekend, move the end time to one second before midnight on Saturday
- If the end time is on a holiday, move the end time to one second before the beginning of the day
Part 3:
- Figure out how much time there is between the adjusted start and end times
- Subtract two days for every seven
- If the weekday of the end time is less than the weekday of the start time (i.e. there’s a weekend in between, but not a full week), subtract another two days
Part 4:
- Subtract one day for every holiday between the start time and the end time
Part 5:
- Return 0 seconds if the remaining time is negative (e.g. if start and end times are both on a weekend)
- Return the remaining time otherwise
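As a way of cross-checking results, here is a deliberately different, brute-force technique sketched in Python: instead of adjusting the endpoints and subtracting weekends, it walks through each calendar day and adds up only the time that falls on a working day. The function name and the way holidays are passed in (a set of dates) are assumptions for the example.

```python
from datetime import date, datetime, time, timedelta


def business_time(start, end, holidays=frozenset()):
    """Sum the time between start and end that falls on non-holiday weekdays."""
    total = timedelta(0)
    day = start.date()
    while day <= end.date():
        # weekday() is 0 for Monday through 6 for Sunday
        if day.weekday() < 5 and day not in holidays:
            day_start = datetime.combine(day, time.min)
            day_end = day_start + timedelta(days=1)
            overlap = min(end, day_end) - max(start, day_start)
            if overlap > timedelta(0):
                total += overlap
        day += timedelta(days=1)
    return total


# Monday to the following Monday, no holidays
print(business_time(datetime(1950, 1, 2), datetime(1950, 1, 9)))
```

This is slower than arithmetic on the endpoints, since it loops once per day, but it needs no special cases for successive holidays or end-before-start. It is meant as a cross-check only; the five-part logic above is what the database function uses.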
As far as I know, that logic is complete, and should work.
But it wasn’t working.
Specifically, when the start and end dates were exactly a week apart, I was getting seven work days instead of five, meaning that the “subtract two days for every seven” line had a bug in it.
Here’s the code that should have worked:
gap := gap - ((((EXTRACT(EPOCH FROM gap)/86400.0) / 7.0 - 0.5)::INTEGER * 2) || ' days')::INTERVAL;
That’s a mouthful, so here’s the expanded version:
Starting with gap (i.e. the end time minus the start time)…
- Figure out how many seconds there are in gap (that’s the EPOCH bit: it’s a cheat to turn a PostgreSQL interval into an integer that I can do arithmetic on).
- Figure out how many days that represents (60 seconds x 60 minutes x 24 hours = 86,400). The .0 keeps the number from becoming an integer just yet (i.e. don’t round off the number).
- Divide that number by 7 to get the number of weeks. Again, don’t round yet.
- I only want to take whole weeks into account, so turn the number into an integer now. Since PostgreSQL does this by rounding to the nearest whole number, subtract 0.5 first, which is equivalent to saying “round down”.
- Multiply that number by two, so that I get two days per week.
- Subtract that number from the initial gap.
- Turn it back into a PostgreSQL interval by appending the word “days” and type-casting it into an interval (the two vertical lines, or “pipes”, are the PostgreSQL command to append text).
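The intent of those steps can be mirrored in a few lines of Python, offered here as a sketch rather than a translation of the database function. The function name is made up. Python's floor division on time spans gives the "whole weeks only" step directly, so no rounding offset is involved.

```python
from datetime import timedelta


def subtract_weekends_in_whole_weeks(gap):
    """Remove two days from gap for every complete seven-day week it contains."""
    # timedelta // timedelta is floor division and returns an int
    whole_weeks = gap // timedelta(days=7)
    return gap - timedelta(days=2 * whole_weeks)


print(subtract_weekends_in_whole_weeks(timedelta(days=7)))
```

A gap of exactly seven days contains one whole week, so two days come off and five remain.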
On the face of it, everything looks fine. So, now comes the debugging.
Taking two arbitrary dates that are a week apart, not on weekends, and which definitely don’t have holidays in my database, I run it through each of the steps in the long line above:
select extract(epoch from ('1950-01-09'::timestamp - '1950-01-02'::timestamp)::interval);
=> 604800
select extract(epoch from ('1950-01-09'::timestamp - '1950-01-02'::timestamp)::interval) / 86400.0;
=> 7
select extract(epoch from ('1950-01-09'::timestamp - '1950-01-02'::timestamp)::interval) / 86400.0 / 7.0;
=> 1
select extract(epoch from ('1950-01-09'::timestamp - '1950-01-02'::timestamp)::interval) / 86400.0 / 7.0 - 0.5;
=> 0.5
So far, everything’s correct. But watch the next line:
select (extract(epoch from ('1950-01-09'::timestamp - '1950-01-02'::timestamp)::interval) / 86400.0 / 7.0 - 0.5)::integer;
=> 0
That’s not right — 0.5 is supposed to round to 1, not 0. See:
select 0.5::integer;
=> 1
It’s the nasty floating point math, striking again. You know, where 2 + 2 can equal 5 for sufficiently large values of 2.
In pretty much every programming language known to man, you have integers and floating point numbers. Integers are whole numbers, and are stored by being converted to binary and stored in a container that’s a specific number of bytes long. A one-byte container (i.e. eight binary digits, or “bits”) lets you store a range of 256 numbers, because that’s how many combinations of 0 and 1 you can fit into eight characters.
Floating point numbers let you get around that 1:1 limitation by storing numbers in something approximating scientific notation, dividing the bits between both parts of the number. This gives you a huge range of numbers, but it has the drawback that it’s not quite as precise, and sometimes bites you when you’re expecting one thing and get another.
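One plausible reading of the two different results, offered as an assumption rather than something verified against this particular server: 0.5 is exactly representable in binary floating point, so the difference may come from tie-breaking rules rather than lost precision. A double precision value is commonly rounded to the nearest even integer on a tie, while an exact decimal type rounds ties away from zero. Python happens to show both behaviors, which makes for a quick sketch (the function names are made up):

```python
import math
from decimal import Decimal, ROUND_HALF_UP


def cast_like_float(value):
    """Round a float the way Python does: ties go to the nearest even integer."""
    return round(value)


def cast_like_numeric(value):
    """Round an exact decimal with ties going away from zero."""
    return int(Decimal(value).quantize(Decimal("1"), rounding=ROUND_HALF_UP))


def round_down(value):
    """An explicit floor, with no 0.5 offset needed."""
    return math.floor(value)


print(cast_like_float(0.5))      # 0
print(cast_like_numeric("0.5"))  # 1
print(round_down(0.999))         # 0
```

Where an explicit floor function is available, it sidesteps the question of how ties are rounded altogether.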
I would’ve expected a warning sign to show up before the last step in my troubleshooting, but apparently I was wrong.
To fix the problem, I added a ::NUMERIC(8,2) type cast after subtracting 0.5 and before turning it into an integer. This tells PostgreSQL to turn it into a specific precision number (eight total digits, two of which are after the decimal point — eight was chosen as an arbitrary and excessively large number for this function), and then round it, which ensures that there won’t be any floating point mishaps this time.
So, now my published turnaround times are actually correct! Given how few mailings this (and two other holiday-related implementation bugs that I corrected along the way) affected, the numbers didn’t change much, but it means I’m not seeing some oddball numbers on my work list any more.
* You can adjust the function to allow for successive holidays by changing “if” to “while”. I haven’t done that yet, because I don’t need it, and because I’m afraid of accidentally creating an infinite loop inside the database on a remote production server. You can also account for whole weeks of holidays by grouping the first and second threesomes of statements in while loops.
Update: As it turns out, there was another bug in the code mentioned above. In order to fake a “round down” function, you need to subtract 0.499… (with as many nines as you need for precision), not 0.5, which I discovered when fixing the case when the two times were a week apart by day, but not by hour (e.g. noon on Monday through 9am the next Monday).
My code for that line is now:
gap := gap - ((((EXTRACT(EPOCH FROM (t2::DATE::TIMESTAMP - t1::DATE::TIMESTAMP))/86400.0) / 7.0 - 0.499)::NUMERIC(8,3)::INTEGER * 2) || ' days')::INTERVAL;
The ::DATE::TIMESTAMP bit is the equivalent of converting the dates to midnight, which means that two Mondays a week apart will always have the two days subtracted, even if it hasn’t quite been a full week.
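The midnight trick can be sketched in Python as well. This is an illustration under my own naming, not the database code: dropping the time of day before counting weeks means noon on one Monday through 9am on the next still counts as one whole week.

```python
from datetime import datetime, timedelta


def weekend_days_to_subtract(t1, t2):
    """Two days per whole week, counted between the calendar dates of t1 and t2."""
    # .date() discards the time of day, like casting to DATE in the database
    whole_weeks = (t2.date() - t1.date()).days // 7
    return timedelta(days=2 * whole_weeks)


print(weekend_days_to_subtract(datetime(1950, 1, 2, 12, 0), datetime(1950, 1, 9, 9, 0)))
```

Integer floor division takes the place of the subtract-0.499-and-round step.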
Oddities in the Importing of Mailing Lists
Submitted by Steve Simms on Tue, 08/21/2007 - 2:03pm.
The largest section of my codebase is devoted to the parsing of mailing lists. It would be a rather small portion of the overall code if people were consistent, but people are anything but consistent when it comes to how they organize their mailing lists.
For example, the first time that most people get an international address on their list, their system breaks down, because they don’t have a “country” column. The strange thing is that it doesn’t occur to most people to just add a country column, and put the name of the country in that column.
What most people do is move the city, state/province, and postal code to the “street 2” line (if they have one), and put the country in the city or zip column. If they don’t have a “street 2”, then the city, state/province, and postal code end up in the city column (best case), and country ends up in the zip column.
Then you come across the educated folks who know that in a variety of countries (e.g. Germany), the numeric postal code comes before the city, rather than after. In that case, as a favor to whoever’s doing their mailings, they put the postal code in the city column, the city in the state column, and the country in the zip code column. That way, you can still merge “City State Zip” in that order, and it shows up more or less properly.
Except that my code knows about various countries’ address formats, and already does that swap, with the result that, in this case, the country would end up in the middle of the address.
Sorting all of that out and getting things back in the right fields involves no small bit of code, especially when all of this is being done automatically, and I’ve become a whole lot more familiar with the postal quirks of dozens of countries over the past few years as a result. :-)
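To give a flavor of the country-specific swap, here is a tiny Python sketch. The set of countries is a small illustrative sample, and the function and variable names are invented for the example; the real code covers far more cases than this.

```python
# Illustrative sample only: countries that write the postal code before the city
POSTAL_CODE_FIRST = {"Germany", "France", "Netherlands"}


def city_line(city, region, postal_code, country):
    """Assemble the city line of an address in the order the country expects."""
    if country in POSTAL_CODE_FIRST:
        parts = [postal_code, city]
    else:
        parts = [city, region, postal_code]
    # Skip empty fields so missing regions do not leave stray spaces
    return " ".join(part for part in parts if part)


print(city_line("Berlin", "", "10115", "Germany"))
print(city_line("Manchester", "NH", "03101", "United States"))
```

A list where the customer has already swapped the columns by hand would be swapped a second time by logic like this, which is why the import has to detect misplaced fields before formatting anything.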
Oh, and don’t get me started on the people who know that in Chinese, you write the address from most-general to most-specific, instead of the other way around. (You don’t do that for Chinese addresses that are written in English, but most people, if they know the former, don’t know the latter.) When the name of the person who’s receiving the letter shows up in the zip code column, all hope is lost.
Still, most of the time, the software is able to import mailing lists without a hitch, nowadays. I’m quite proud of it, all told.
And yet, there’s still the occasional list that causes the program to die a horrible death. Such as the one that came in yesterday, which has a first name, spouse, and last name field (which I’m ready for — take first, add “and” and spouse, and append last).
In this list, when the person listed churches, he put the entire name of the pastor in the first name field, which also isn’t that big a deal, and the software can cope with it — split out the words in the first name field and apply them to title, last, and suffix based on patterns established by many tens of thousands of other addresses. Apply the “and spouse” rule after that.
Unfortunately, he then put the entire name of the associate pastor in the spouse field, which just doesn’t work, even beyond the mental imagery associated with that choice of columns. And never mind that some of the names are going to run off the edge of the envelope (e.g. when several members of the missions committee are listed as the last name).
Alas, some things still need to be done by hand.
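For reference, the basic "first, and spouse, last" rule mentioned above is simple enough to sketch in a few lines of Python. The function name and the example names are hypothetical.

```python
def addressee(first, last, spouse=""):
    """Join first name, optional spouse, and last name into one addressee line."""
    given = f"{first} and {spouse}" if spouse else first
    return " ".join(part for part in (given, last) if part)


print(addressee("John", "Smith", "Jane"))
# The rule applies just as happily to fields that hold the wrong kind of data
print(addressee("Rev. Tom Jones", "First Church", "Rev. Bob Lee"))
```

The second call shows why a list like yesterday's still needs a human: the rule runs without complaint, and the output is wrong anyway.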
Two Shopping Cart Tips
Submitted by Steve Simms on Fri, 08/17/2007 - 2:10pm.
For any of you who are planning on writing a web-based shopping cart tool, here are two basic tips to keep in mind:
When giving a shipping estimate, asking for state and zip code is redundant. It’s okay to have both fields, but if someone (like me) puts in the zip code, don’t give an error saying that you also need the state. Because you don’t. And be absolutely sure not to just return the user to the page without giving any sort of error at all.
When someone clicks “Add to Cart”, without entering a quantity, you can safely assume that the quantity is 1. An alert box saying “The following errors were found: — please enter a valid quantity before you add this item to the shopping cart” is something of a turn-off.
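The second tip takes only a few lines to honor. Here is a minimal Python sketch, with a made-up function name, of treating a blank quantity as 1 while still rejecting input that really is invalid:

```python
def parse_quantity(raw):
    """Return the requested quantity, treating a missing or blank field as 1."""
    text = (raw or "").strip()
    if not text:
        return 1
    quantity = int(text)  # raises ValueError for non-numeric input
    if quantity < 1:
        raise ValueError("quantity must be at least 1")
    return quantity


print(parse_quantity(""))
print(parse_quantity(" 3 "))
```

Only the genuinely ambiguous cases (letters, zero, negative numbers) need to bother the customer with an error.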
Bonus point:
People considering buying something off of a web site generally want to know what the item costs. Most web sites, if they include pricing information but don’t have free shipping, don’t actually tell you what the item really costs until you get 75% of the way through the buying process. You get bonus points for having the total cost right on the page with the item’s description.
Just ask for the shipping zip code, and provide a link for internationals or fancy shipping. Store said zip code in a cookie, and always display the ground rate on the page, giving the potential customer the option to view different rates using some sort of clickable popup.
Handling Errors: Beyond the 500 Internal Server Error
Submitted by Steve Simms on Mon, 06/18/2007 - 11:11am.
I just found out over the weekend that my snazzy new-customer-easy-signup process got broken a few weeks ago by a security update. Oops.
Normally, when something breaks, I get an E-Mail saying so, and the code presents the user with a mea culpa (just so that we’re clear that any programming glitch is my problem, not theirs).
That E-Mail is more or less my cue to drop everything and fix it, which is usually a matter of just a couple of minutes, if that. My codebase isn’t nearly complex enough that I subscribe to the “some bugs aren’t worth the cost of fixing them” philosophy, especially when it comes to your typical 404, 403, and 500 errors (file not found, forbidden, and internal server error, respectively).
The E-Mail also tells me who encountered the problem (if they’re logged in), so I can follow up with them, apologize, and let them know that the problem is fixed. It’s not quite as good as not having problems in the first place, but it comes close.
In this case, though, the cause of the problem looked to the code like it was a trivial user error (user clicked “Send Letter” without attaching any files), so it presented a helpful error message (if that were actually the problem), and didn’t bother to let me know.
Unfortunately, that wasn’t actually the problem. The problem was that there was a rule that was redirecting insecure http:// requests to their secure https:// equivalents on-the-fly (geek speak: 302 redirect), but that rule was causing the files to get lost (geek speak: POST became GET).
As a result, the user would attach the files, upload the files, wait for the upload to finish, and then get an error message saying that they hadn’t attached any files, and to try again. It did have a link to the contact form as well, which is how I eventually found out about it, but that’s still not good.
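One common way to keep a redirect from turning a POST into a GET is to answer non-GET requests with a 307 status, which tells the client to repeat the request with the same method and body. The Python sketch below illustrates that general technique; it is not necessarily how this particular problem was fixed, and the function name and host are placeholders.

```python
def https_redirect(method, host, path):
    """Pick a redirect status that preserves the request method when it matters."""
    # 302 is fine for reads; 307 asks the client to resend the same method and body
    status = 302 if method in ("GET", "HEAD") else 307
    return status, {"Location": f"https://{host}{path}"}


print(https_redirect("POST", "example.com", "/send-letter"))
```

Even with a 307, the upload is sent twice (once over plain http), so pointing the form at the https:// address in the first place avoids the redirect entirely.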
So, the problem is now fixed, and I’ve contacted everyone who I think (based on the server logs) was affected by the problem, to apologize. In some cases, rather belatedly.
And I’ve learned another lesson about error messages and error reporting. It’s not good enough to just have the code E-Mail me when there’s a problem that it couldn’t handle. I need to have it E-Mail me when there’s a problem that the user wouldn’t expect, as well. That way, even if it is a trivial error, if it’s keeping the user from accomplishing their goal, I can follow up and redesign the process.
That will be a good thing to remember as I go through the code over the next few weeks looking for ways to improve the user experience.
