I received a list of addresses in a Word file over the weekend, which is a less-than-ideal format (Excel or CSV with lots of columns is best, since it requires almost no parsing to move into my database). This file’s woes, however, were compounded by the fact that it had lots of headers, phone numbers, web sites, E-Mail addresses, comments, and miscellaneous formatting interspersed throughout the addresses.
Editing that sort of file by hand wouldn’t have been much fun, so I programmed rules into the parser to remove phone numbers (including various descriptors), E-Mail addresses, web sites, and certain types of comments. As a result, I only had to update about a dozen addresses, out of close to 200, by hand. Not bad. And this will be ready to go the next time I get this kind of file, which is even better.
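For anyone curious what rules like these can look like, here is a minimal sketch in Python. It is not my actual parser code: the three patterns, the list of descriptors, and the function name `strip_contact_details` are all made up for illustration, and the phone pattern assumes US-style numbers.

```python
import re

# Hypothetical patterns for illustration only; real-world rules need more cases.
EMAIL = re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b")
URL = re.compile(r"\b(?:https?://|www\.)\S+", re.IGNORECASE)
PHONE = re.compile(
    r"(?:\b(?:phone|tel|fax|cell|mobile)\b\.?:?\s*)?"      # optional descriptor
    r"(?:\(\d{3}\)\s*|\b\d{3}[-.\s])\d{3}[-.\s]\d{4}\b",   # assumed US-style number
    re.IGNORECASE,
)


def strip_contact_details(line: str) -> str:
    """Remove E-Mail addresses, web sites, and phone numbers from one line."""
    for pattern in (EMAIL, URL, PHONE):
        line = pattern.sub("", line)
    # Collapse leftover runs of whitespace and trim dangling separators.
    return re.sub(r"\s{2,}", " ", line).strip(" ,;")


print(strip_contact_details("Jane Doe, Phone: (555) 123-4567, jane@example.com"))
```

Lines that come back empty after stripping can simply be dropped, and anything the patterns miss is what still has to be fixed by hand.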
Adding to the fun, I finally got around to starting code for the parser that looks for patterns in names and addresses. Before tonight, any time I got a new customer, I had to review every address on their mailing list. (For existing customers who sent a new mailing list, I already had code in place to show me only the addresses that were different, which made things pretty quick, and I just updated it to tolerate minor differences.)
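The "show me only what changed, but tolerate minor differences" idea can be sketched roughly like this. This is an assumption-laden illustration rather than my real code: the names `normalize` and `needs_review` and the 0.9 similarity threshold are hypothetical, and it uses `difflib.SequenceMatcher` from the Python standard library as a stand-in for whatever comparison you prefer.

```python
import re
from difflib import SequenceMatcher


def normalize(address: str) -> str:
    """Lowercase, drop punctuation, and collapse whitespace."""
    cleaned = re.sub(r"[^\w\s]", "", address.lower())
    return " ".join(cleaned.split())


def needs_review(new_addresses, known_addresses, threshold=0.9):
    """Return the new addresses that do not closely match any known one.

    The threshold is an arbitrary starting point, not a tuned value.
    """
    known = [normalize(a) for a in known_addresses]
    flagged = []
    for address in new_addresses:
        candidate = normalize(address)
        best = max(
            (SequenceMatcher(None, candidate, k).ratio() for k in known),
            default=0.0,  # nothing on file yet: everything needs review
        )
        if best < threshold:
            flagged.append(address)
    return flagged


known = ["123 Main St., Springfield, IL 62701"]
new = ["123 Main St Springfield IL 62701", "9 Elm Ave, Dayton, OH 45402"]
print(needs_review(new, known))
```

Comparing every new address against every known one is quadratic, which is fine for lists of a few hundred addresses but would need an index for much larger ones.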
As of this evening, the parser checks whether each name and address seems reasonable according to existing patterns, and presents only the unusual cases to me (relative to the addresses it has seen). For the same Word file above, this removed over a third of the addresses from my review with only a few rules. Very nice.
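One simple way to judge an address "relative to the addresses it has seen" is to reduce each line to a coarse shape and flag shapes that are rare. The sketch below is only one possible approach and not necessarily how my rules work; `shape`, `ShapeChecker`, and the `min_count` cutoff are hypothetical names and values chosen for the example.

```python
import re
from collections import Counter


def shape(text: str) -> str:
    """Reduce a line to a coarse pattern: letter runs become A, digit runs become 9."""
    pattern = re.sub(r"[A-Za-z]+", "A", text)
    pattern = re.sub(r"\d+", "9", pattern)
    return " ".join(pattern.split())


class ShapeChecker:
    """Remembers the shapes of lines seen so far and flags rare ones."""

    def __init__(self, min_count: int = 2):
        self.counts = Counter()
        self.min_count = min_count  # assumed cutoff for "seen often enough"

    def learn(self, lines):
        self.counts.update(shape(line) for line in lines)

    def is_unusual(self, line: str) -> bool:
        return self.counts[shape(line)] < self.min_count


checker = ShapeChecker()
checker.learn(["123 Main St", "456 Oak Ave", "78 Pine Rd"])
print(checker.is_unusual("9 Elm Ct"))          # same shape as the learned lines
print(checker.is_unusual("Call after 5pm!!"))  # a shape never seen before
```

A shape this coarse will let some bad addresses through and flag some good ones, so it only narrows down what needs a human look.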
Cost: About three hours of programming, debugging, and testing.
Gain: Between 30 seconds and two minutes saved on average per mailing list for existing customers, and about ten minutes on average per new customer (depending on the type of mailing list received in each case).
Considering that I would have spent at least half an hour making adjustments to the Word file this evening alone, that investment is going to pay off pretty quickly. :-)
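To put a rough number on "pretty quickly", here is the break-even arithmetic using the estimates above: 180 minutes of cost, the half hour already saved on this file, and the per-list savings. The function name `lists_to_break_even` is just for this example, and the inputs are my own ballpark figures rather than measurements.

```python
import math


def lists_to_break_even(cost_minutes, saved_per_list_minutes, already_saved_minutes=0):
    """How many more mailing lists it takes for the time saved to cover the cost."""
    remaining = max(cost_minutes - already_saved_minutes, 0)
    return math.ceil(remaining / saved_per_list_minutes)


# 3 hours of work, 30 minutes already saved this evening
print(lists_to_break_even(180, 10, 30))   # new customers at about 10 minutes each
print(lists_to_break_even(180, 2, 30))    # existing customers, best case (2 minutes)
print(lists_to_break_even(180, 0.5, 30))  # existing customers, worst case (30 seconds)
```

With those figures, that works out to 15 new-customer lists, or somewhere between 75 and 300 existing-customer lists, or some mix of the two.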