Internet Rule Number One: Hack on Code, Not on Protocols

Recently I ran into two different cases where other people’s networks affected me directly, and negatively. Now, we all know that people make mistakes and that hardware failures can and will happen. However, in these two cases the problem wasn’t “broken code” but rather “broken as designed”. The IETF, a standards organization that I’ve spent some time working with, goes to a lot of thought and trouble to design internet protocols so that they interoperate if you follow the rules. The problem is that network administrators sometimes decide they can “hack around” the way a protocol is supposed to work in order to achieve some goal. Frequently, however, they miss critical aspects of how the protocol is supposed to work or (worse) consciously ignore them because they don’t care about the other networks they break. As long as they’re not breaking their own, of course.

But, to begin my story, I think I need to first highlight the important protocols I’ll be talking about.

The Players

  • IPv4 and IPv6: These are the big players these days when it comes to “things that are going to break on their own soon”. IP addresses are those silly strings of numbers that tell the internet who you’re actually sending packets to. Normally, the average Joe doesn’t think about these because the average Joe is lucky enough to type “Domain Names” into their web browser instead of silly strings of numbers. The thing you need to know about IP addresses is that in the near future (possibly by the time I’m done typing if I don’t hurry up) we’ll run out of IPv4 addresses to hand out to things like your cell phones, washing machines and toasters. Unfortunately, much of the world isn’t ready for the transition from IPv4 to IPv6, even though it’s been coming for a very very very long time. We all procrastinate, after all.
  • Domain Name System (DNS): The DNS is how we translate those useful names (like pontifications.hardakers.net) into silly numbers. Like 67.205.57.145. Or 2001:470:1f00:187::1 (yes, those really are all numbers if you expand your mind a bit).
  • Simple Mail Transfer Protocol (SMTP): This is the guy that is making post offices around the world quiver, wondering when their funding from selling postage stamps will dry up. Although this E-Mail thing has been catching on, we’re also finding that more and more people now rely on other services, like Facebook, for communicating instead. Interestingly enough, both of my issues below relate to communication: one with E-Mail and one with Facebook.

    Enter the Era of E-Mail

    Now, E-Mail, it turns out, gets sent around quite a bit. I know that I still get quite a bit of it these days. Unfortunately, some entrepreneurial folks have figured out that the powers from the dark side enable them to use E-Mail for negative reasons as well. I’m speaking of SPAM of course, which currently accounts for about 75% of my E-Mail. [On a side note: I suspect that spam via paper-mail (otherwise known as bulk-advertising) is the one thing keeping most of the world’s post offices still in business.]

    Now, unless you’re a protocol geek like I am, you may not know that E-Mail that needs to get sent from one server to the next also relies on DNS records that translate human-readable domain names (like hardakers.net) into IP addresses (like 168.140.236.43 and 2001:470:1f00:187::1). So, let’s say you need to email youraunt@hardakers.net: the first thing your ISP does when you ask it to deliver a letter is look up where mail for hardakers.net should go, and then the IP address of that server.

    What’s supposed to happen

    Normally when you look up where to send something you’ll get a few answers, nicely prioritized by where you should try them first:

     # dig +short hardakers.net mx
     5  mail6.hardakers.net.
     10 dns66.hardakers.net.
     20 dnsm3.hardakers.net.
    

    This tells us (or, more precisely, your ISP) to try sending the mail first to mail6.hardakers.net (priority level 5); if that fails, to try dns66.hardakers.net; and finally dnsm3.hardakers.net. Lower numbers mean higher priority. The server starts by looking up the numeric address of the first one and then trying to talk to it. If that doesn’t work, it should skip to the next one and keep trying until it has no more to try, and then it will give up. (And by “give up” I mean “keep trying for another 7 days or so at regular intervals”.)
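The selection logic described above can be sketched in a few lines of Python. This is a simplified model, not how any particular mail server implements it: the MX data is hard-coded from the dig output above, and `try_deliver` is a hypothetical stand-in for a real SMTP connection attempt. Note that a lower preference number means the server is tried first.

```python
from typing import Callable, Optional

# MX records as (preference, host) pairs, copied from the dig output above.
MX_RECORDS = [
    (10, "dns66.hardakers.net."),
    (5, "mail6.hardakers.net."),
    (20, "dnsm3.hardakers.net."),
]


def delivery_order(mx_records: list[tuple[int, str]]) -> list[str]:
    """Return MX hosts sorted so the lowest preference value comes first.

    Ties are broken by host name only to keep this sketch deterministic;
    real mail servers are expected to randomize among equal preferences.
    """
    return [host for _, host in sorted(mx_records)]


def deliver(mx_records: list[tuple[int, str]],
            try_deliver: Callable[[str], bool]) -> Optional[str]:
    """Try each MX host in preference order.

    Returns the host that accepted the message, or None if every host
    failed (a real server would then queue the message and retry later).
    """
    for host in delivery_order(mx_records):
        if try_deliver(host):
            return host
    return None


if __name__ == "__main__":
    # Hypothetical scenario: the first-choice server is unreachable.
    def fake_try(host: str) -> bool:
        return host != "mail6.hardakers.net."

    print(delivery_order(MX_RECORDS))
    print(deliver(MX_RECORDS, fake_try))
```

Passing the delivery attempt in as a function keeps the ordering and fallback logic testable without touching the network.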

    So, let’s look up the address of the first one. We’ll look up both its IPv4 and its IPv6 address:

     # dig +short mail6.hardakers.net A
     # dig +short mail6.hardakers.net AAAA
     2001:470:1f00:187::1
    

    Note how, in this case, there is no IPv4 address (the query ending with A didn’t get an answer). There is only an IPv6 address (the answer to the query ending with AAAA). This is perfectly legal, and I set it up this way intentionally. I wanted to be ready for the coming of IPv6 and to encourage mail agents around the world to try me first over IPv6. I thought that was rather good of me: exercise early, exercise often (which reminds me: I’m late for my bike ride).

    So, this has been working quite well for many years (I’ve been quite anxious for IPv6 to take off). Not only that, it likely even reduced some of my spam since many spammers don’t try the remaining listed addresses and rarely have IPv6 support. Spammers don’t even pretend to be compliant with anything. Especially morals.

    Enter BTConnect, a UK ISP

    BTConnect is (supposedly) the biggest ISP on the other side of the pond from the United States. They decided to add another rule to the SMTP protocol: every MX record MUST point to a host with a valid address. I.e., you couldn’t create an MX record pointing at bogus.hardakers.net without also adding an IP address for it. They did this to try to ensure that the remote mail server was legitimate, refusing to send mail for their customers (folks like you and me sitting at home on couches; they’re just British couches) if they couldn’t do a proper address lookup. But it turns out a lot of people (who now hate BTConnect) were intentionally putting in fake MX records with no matching A record to try to subvert spammers. The end result is that BTConnect clients are unable to send mail to any domain that fights spam this way. I’m not going to argue about which side is playing by the rules here. They’re both doing things that are “unintended”.

    But what’s worse is that BTConnect assumes that the whole world is IPv4-based, and it treated my perfectly legal, AAAA-only mail6.hardakers.net entry as bogus. This prevented an associate from being able to email me (about designing protocols, ironically). Bad, bad BTConnect! (No bone!) You need to get with the game, because the IPv4 game is about over at this point. And stop hacking protocols, because you’re affecting your clients’ ability to conduct daily business by sending legitimate E-Mail.
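To make the bug concrete, here is a hypothetical Python sketch contrasting an IPv4-only validity check with one that accepts either an A or an AAAA record. BTConnect’s actual code is not public, so the first function is only an assumed model of the behavior described above; the `DNS` dictionary is a made-up stand-in for real lookups.

```python
# Hypothetical stand-in for DNS: host -> {record type: [addresses]}.
DNS = {
    "mail6.hardakers.net.": {"AAAA": ["2001:470:1f00:187::1"]},
    "bogus.hardakers.net.": {},
}


def mx_target_ok_ipv4_only(host: str) -> bool:
    """Assumed model of the broken check: no A record means 'bogus'."""
    return bool(DNS.get(host, {}).get("A"))


def mx_target_ok(host: str) -> bool:
    """Accepts an MX target that is reachable over IPv4 *or* IPv6."""
    records = DNS.get(host, {})
    return bool(records.get("A") or records.get("AAAA"))


for host in DNS:
    print(host, mx_target_ok_ipv4_only(host), mx_target_ok(host))
```

Even the corrected check still rejects deliberately fake MX hosts like bogus.hardakers.net; it just stops punishing hosts that happen to be IPv6-only.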

    Enter the (new) Era of Facebook

    Facebook (unfortunately, IMHO) is trying to get everyone to communicate with each other solely through their website. The good news is that they’re actually trying to stay up on the IPv6 front and even have an IPv6-only version of their website available. (If you can visit it successfully, it means you and your ISP are IPv6-enabled. But you’re probably not, since 99% of the ISPs out there are not yet compliant.)

    Now, many people are actually paranoid about deploying IPv6-enabled infrastructure too quickly, and they often attempt trickery to make sure that users trying to reach them still can. Rather than trust a user’s ISP to have set up IPv6 correctly, they assume that all other ISPs out there have broken IPv6, even if they might not. To reword that in simple terms: many places intentionally try to prevent you from reaching them over IPv6, because they trust IPv4 and “just aren’t sure” about IPv6 yet. That’s why you have to go to a different domain name if you want to use IPv6 with Facebook, and why their default web page (www.facebook.com) isn’t IPv6 compliant.

    Facebook does this IPv4-only hack in a somewhat trickier, and DNS-illegal, way. Here are the nitty-gritty details that will make DNS experts cringe (though most other people won’t catch the problems). First, this all has to do with apps.facebook.com, which is where Facebook sends you to get your virtual hands dirty tending to your screen by planting green pixels in fields of brown pixels. So, let’s see what it takes to look up address records for apps.facebook.com.

     # dig @glb1.facebook.com. apps.facebook.com AAAA
     apps.facebook.com.      30      IN      CNAME   star.facebook.com.
    
     # dig @glb1.facebook.com. apps.facebook.com A
     apps.facebook.com.      30      IN      A       66.220.153.28
    

    Now, the DNS specialists here will immediately point out that what you see above is illegal in the DNS protocol world. My co-worker, who has memorized the RFCs better than I have, nicely extracted the right quote about this (from RFC 1034, section 3.6.2):

     "If a CNAME RR is present at a node, no other data should be 
     present; this ensures that the data for a canonical name and its aliases
     cannot be different.  This rule also insures that a cached CNAME can be
     used without checking with an authoritative server for other RR types."
    

    To reword that in simple terms: you can’t have a CNAME and an A record at the same name (even if they’re handed out for different query types, like A and AAAA).
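A zone-checking tool can catch this class of mistake before it is ever published. Below is a minimal Python sketch, assuming the zone has already been parsed into a list of (owner name, type, data) tuples; the first two records mirror the Facebook answers above, and `www.example.com.` is a made-up example of a legal alias. DNSSEC signature and denial records (RRSIG, NSEC) are the exception later standards allow next to a CNAME, so the sketch permits them.

```python
from collections import defaultdict

# Each record: (owner name, RR type, data). The first two mirror the answers above.
ZONE = [
    ("apps.facebook.com.", "CNAME", "star.facebook.com."),
    ("apps.facebook.com.", "A", "66.220.153.28"),
    ("www.example.com.", "CNAME", "example.com."),
]

# Types permitted at a name that holds a CNAME (DNSSEC records are the exception).
ALLOWED_WITH_CNAME = {"CNAME", "RRSIG", "NSEC"}


def cname_conflicts(zone):
    """Return {name: sorted list of offending types} for CNAME names with other data."""
    types_by_name = defaultdict(set)
    for name, rrtype, _data in zone:
        types_by_name[name].add(rrtype)
    return {
        name: sorted(types - ALLOWED_WITH_CNAME)
        for name, types in types_by_name.items()
        if "CNAME" in types and types - ALLOWED_WITH_CNAME
    }


print(cname_conflicts(ZONE))
```

This only checks for data coexisting with a CNAME; it does not look for other problems, such as two CNAMEs at the same name.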

    Now… Did this break something? Yes.

    First, I found one web-browser/DNS-stack combination that refused to go further. The instant it got a serious error with a record while searching for an IPv6 address, it gave up and didn’t try to find an IPv4 address. Not exactly wise either, but not illegal. Ironically, this was the exact sort of thing that the Facebook DNS hackery is trying to prevent: the customer not getting to the site. And some green electronic crops probably turned brown and withered. Electronically.
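A more forgiving client treats a failed AAAA lookup as “no IPv6 here” rather than “site unreachable” and carries on with the A lookup. The Python sketch below is illustrative only: `resolve(name, rrtype)` is a hypothetical callable standing in for a real resolver, and `DNSLookupFailed` is a made-up exception, so the Facebook-style failure can be simulated offline.

```python
from typing import Callable, List


class DNSLookupFailed(Exception):
    """Raised by the (hypothetical) resolver on SERVFAIL or a malformed answer."""


def addresses_for(name: str, resolve: Callable[[str, str], List[str]]) -> List[str]:
    """Collect IPv6 then IPv4 addresses, continuing past a failure in either family."""
    addresses: List[str] = []
    for rrtype in ("AAAA", "A"):
        try:
            addresses.extend(resolve(name, rrtype))
        except DNSLookupFailed:
            continue  # a broken answer for one family should not hide the other
    return addresses


def broken_resolver(name: str, rrtype: str) -> List[str]:
    """Simulates the behavior above: the AAAA query fails, the A query works."""
    if rrtype == "AAAA":
        raise DNSLookupFailed(f"SERVFAIL for {name} {rrtype}")
    return ["66.220.153.28"]


print(addresses_for("apps.facebook.com.", broken_resolver))
```

The client then ends up with only the IPv4 address, which is presumably what Facebook wanted in the first place.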

    This DNS hackery also makes the most popular recursive name server in use today equally unhappy with AAAA queries:

     # dig apps.facebook.com aaaa
     ...
     ;; ->>HEADER<<- opcode: QUERY, status: SERVFAIL, id: 31717
    

    Update: 2011-01-26

    They seem to have now realized that the above breaks things. So they’ve started doing different illegal things in the hope that it will magically start working.

    # dig @ns4.facebook.com. apps.facebook.com ns
    ;; AUTHORITY SECTION:
    apps.facebook.com.      30      IN      NS      glb2.facebook.com.
    apps.facebook.com.      30      IN      NS      glb1.facebook.com.
    
    # dig @glb2.facebook.com. apps.facebook.com ns
    ;; ANSWER SECTION:
    apps.facebook.com.      30      IN      A       69.63.189.62
    

    Yes, you read that right: query for an NS record to ensure it’s accurate and you get back an A record instead. That’s what you really wanted, right?
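A client or monitoring script can sanity-check responses like this one: the answer to a query for type T should contain only records of type T, plus any CNAMEs along an alias chain. Here is a minimal Python sketch, with answers modeled as (name, type, data) tuples copied from the output above rather than parsed from a real DNS message.

```python
def unexpected_answer_types(qtype, answers):
    """Return the RR types in an answer section that don't belong there.

    Simplified: it allows the queried type, CNAMEs (alias chains) and
    RRSIGs (present when DNSSEC is requested), and ignores special cases
    such as ANY queries and DNAME records.
    """
    allowed = {qtype, "CNAME", "RRSIG"}
    return sorted({rrtype for _name, rrtype, _data in answers} - allowed)


# The NS query above, answered with an A record.
answers = [("apps.facebook.com.", "A", "69.63.189.62")]
print(unexpected_answer_types("NS", answers))
```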

    Conclusions

    The biggest conclusion here: if you’re going to hack, do so to speed things up. Do so to make things better. Do so to make things more interoperable. But do not assume that you’ve considered all of the corner cases of a protocol when you decide to modify the rules. The result will likely be fewer customers reaching your service, not more.

    Oh. And IPv6 is coming. Please get ready. But without the hackery.


I’ve Got Mail!

Many people have asked me in the past to explain how in the world I handle so much E-Mail. Since it’s such a long story consisting of many parts, I rarely answer. Also, I think it’s easier to describe using diagrams, examples and sciency-looking graphs. In fact, it turns out that even describing how much mail I get, and why I get so much, is a story in itself. So this is part 1 of (probably) 2 describing my E-Mail setup. This first part consists entirely of a description of how much mail I get in the first place. Believe you me, it’s quite a bit.

So, how much raw E-Mail do I get?

So, before this, I actually wasn’t even sure. It turns out that the answer, simply put, is “a lot”. A whole heaping lot. Much of it is, of course, spam (I don’t have an exact percentage yet; that comes later). But even assuming that it’s 90% spam, which likely isn’t the case, I still receive a lot of mail. And it’s all my fault because, simply put again, I want that much (gasp). Ok, maybe not the spam.

So let’s start off with some (sciency) graphs showing the raw numbers of E-Mail that I attract. To really understand it all, I need to break it down into chunks and study each piece.

The Long Haul: Mail Per Month

The first graph below shows the amount of mail per month that I received over the last year-ish.

[Figure: Mail Per Month]

The important thing to notice in the above graph is that the amount of mail I receive isn’t even consistent from month to month: it ranges from 6,500 in a month to almost 13,000. Sure, February has fewer days in it, so you’d expect it to be lower, because all months were not actually created equal. But even those slight variations don’t account for the huge month-to-month swings. Some of it certainly is because my communication workload comes and goes. Some months I simply receive a lot more mail for work-related projects than others (usually as deadlines approach and panic ensues).

But the biggest reason for the fluctuation is that spam comes in waves too. Just looking at my day-to-day E-Mail, it’s always amazing how much the incoming spam varies. Some of my email addresses (I have many) are widely published and thus widely harvested by the evil address-collecting spam machines. This results in a huge amount of my mail being spam, unfortunately.

But beyond that, you can see trends in the graph where, for a while, there was a significant drop in incoming E-Mail. This was because a major spam ring was taken out of service a while back; that’s where the huge dip comes from (you should have had a spam dip in that time frame too). Unfortunately, things are back to spam-normal again. Do you feel like all of a sudden you’re getting more spam than you used to? Well, you’re not alone. Eventually the next spam kingpin took over, and we’re back to an abysmal rate of something like 90% spam on average. The peace was nice while it lasted, but now I’m back to evaluating whether my rich Nigerian uncle really did leave me a fortune or not. Fortunately, if he didn’t, it turns out I have 1094 other rich Nigerian uncles who also amassed small fortunes, if only I could pay the wire-transfer fee to get them safely into my bank account.

The Shorter Haul: Mail Per Day

The next graph shows the amount of mail per-day that I received mostly during the month of May (2009).

[Figure: Mail Per Day]

There are a couple of interesting artifacts that you can hopefully spot in this graph as well. You’ll notice that it has a definite repeating cycle. The cycle is simply this: the low spots are on the weekends. I.e., by far the most mail I receive comes during the work week. This isn’t surprising to me, since much of the mail I receive is work-related in the first place. Which begins to tell you how much mail I receive for work-related purposes.

Ok, But What Exactly Is It All Then?

There’s the real question. If I get bombarded with so much mail, how much do I actually read??? So, let’s pick a day. Ok, let me pick a day, since you couldn’t help me there. I picked June 3rd, 2009, which was a Wednesday.

On Wednesday, June 3rd, 2009 I received 4514 individual pieces of email. Now, let’s quickly do the math, shall we? If I tried to read all of that in, say, a 10-hour period (8 hours for work and 2 hours of reading just the personal mail), that would be 4514/(60*10) = 7.523 email messages per minute. Though that might be possible if they were all short, I assure you that the people I correspond with are not well known for writing short, brief messages. Long-winded rants are, unfortunately, much more common.
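The back-of-the-envelope math is easy to redo for other days or other time budgets. A tiny Python helper, using the numbers from this paragraph (the function name is just an illustration):

```python
def messages_per_minute(messages: int, hours: float) -> float:
    """Average number of messages that must be read per minute over the given hours."""
    return messages / (hours * 60)


rate = messages_per_minute(4514, 10)  # 4514 / 600
print(f"{rate:.3f} messages per minute")
```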

Weeding Out The Spam And Rich Uncles

So, the first thing we need to do is remove the auto-discarded spam and duplicate messages (I have a nice filter that removes duplicates so that I’m never bothered twice because someone put me on both the To and the CC lines, or because I’m subscribed to multiple mailing lists that the message went through). It turns out that of the 4514 messages, I auto-discarded 3163. That’s roughly 70% of them. Since most of that is likely spam, that’s probably close to the real spam percentage that I receive: 70%.
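The post doesn’t say how the duplicate filter works, so the following is an assumption rather than the author’s setup: a common approach is to key on the Message-ID header, which normally stays the same whether a copy arrives via To, CC, or a mailing list. A minimal Python sketch using the standard library `email` package, with made-up sample messages:

```python
from email import message_from_string
from email.message import Message
from typing import Iterable, List


def drop_duplicates(messages: Iterable[Message]) -> List[Message]:
    """Keep only the first copy of each Message-ID; messages without one are always kept."""
    seen = set()
    unique = []
    for msg in messages:
        msg_id = msg.get("Message-ID")
        if msg_id is not None:
            msg_id = msg_id.strip()
            if msg_id in seen:
                continue  # already delivered once via another recipient line or list
            seen.add(msg_id)
        unique.append(msg)
    return unique


# Hypothetical sample: the same message arriving twice, plus a different one.
raw = [
    "Message-ID: <1@example.org>\nTo: me@example.net\n\nHello",
    "Message-ID: <1@example.org>\nCc: me@example.net\n\nHello",
    "Message-ID: <2@example.org>\nTo: me@example.net\n\nSomething else",
]
print(len(drop_duplicates(message_from_string(r) for r in raw)))
```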

Looking At What’s Left

That leaves only 4514 – 3163 = 1351 messages to handle. And if I had 10 hours to sift through 1351 messages in my INBOX, I could do so at the leisurely rate of 2.25 messages per minute. That’s almost doable (at least if I blacklist a few of the people who send me the longest of the long-winded rants).
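The same numbers, worked through in Python, show where the 70% and 2.25-per-minute figures come from. The counts are taken from this section; the 10-hour reading budget is the hypothetical one used above.

```python
received = 4514      # messages on Wednesday, June 3rd, 2009
discarded = 3163     # auto-discarded spam and duplicates
minutes = 10 * 60    # the hypothetical 10-hour reading day

remaining = received - discarded
discard_share = discarded / received
remaining_rate = remaining / minutes

print(remaining)                 # messages left to handle
print(f"{discard_share:.0%}")    # share that was auto-discarded
print(f"{remaining_rate:.2f}")   # messages per minute for what's left
```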

But here’s the real secret. Of all those 1351 messages, only 10 actually ended up in my INBOX. That’s important, so let me repeat it. In bold. Only 10 messages actually ended up in my INBOX. And there’s the secret to my success: everything else gets filtered out and placed somewhere else. In fact, if you really look at how I treat mail, it turns out I have lots of INBOXes. The one that received only 10 is for mail sent to my personal account. My work addresses delivered only 16 to their INBOX equivalents.

Dealing With Mail in Clumps

So what is really happening, behind the scenes, is that my mail for the day actually got sorted into 44 different places. Not just 1 or 2, but 44. That lets me sort and prioritize my mail so that I can see the important stuff right away in small INBOXes, where it doesn’t get lost in the bulk of the rants.

In the rest of the mail: 638 messages went to a folder for Fedora developers, consisting of auto-generated emails describing upcoming changes to the operating system. Another 110 were long-winded rants about the same operating system that went to a discussion folder (at least I bet they were long-winded rants; I didn’t study most of them in detail). 102 were about my favorite Linux-based TV recording software: MythTV. Another 120 E-Mails were most likely spam but were placed in a folder for me to double-check, because the spam-filtering software wasn’t confident enough to throw them away without my help.

And so on. You don’t want more of a breakdown than that. Trust me.
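The actual setup runs in Gnus and isn’t described in detail here, so the following is only a hypothetical illustration of the idea in Python: an ordered list of rules, where the first matching rule decides the folder and anything unmatched falls through to the INBOX. The folder names, the `List-Id` substrings and the `X-Spam-Score` header with its thresholds are all made up for the example.

```python
from email import message_from_string
from email.message import Message
from typing import Callable, List, Tuple

# (folder, predicate) pairs; the first predicate that matches wins.
Rule = Tuple[str, Callable[[Message], bool]]


def header_contains(name: str, needle: str) -> Callable[[Message], bool]:
    """Predicate: the named header exists and contains the given substring."""
    return lambda msg: needle in (msg.get(name) or "")


def spam_score_at_least(threshold: float) -> Callable[[Message], bool]:
    """Predicate on a hypothetical X-Spam-Score header added by a spam filter."""
    def check(msg: Message) -> bool:
        try:
            return float(msg.get("X-Spam-Score", "0")) >= threshold
        except ValueError:
            return False  # unparsable score: don't treat as spam
    return check


# All folder names, list IDs and thresholds below are hypothetical.
RULES: List[Rule] = [
    ("spam.discard", spam_score_at_least(10.0)),
    ("spam.check", spam_score_at_least(5.0)),  # "not confident enough" folder
    ("lists.fedora", header_contains("List-Id", "fedora")),
    ("lists.mythtv", header_contains("List-Id", "mythtv")),
]


def choose_folder(msg: Message, rules: List[Rule] = RULES, default: str = "INBOX") -> str:
    """Return the folder of the first matching rule, or the default INBOX."""
    for folder, matches in rules:
        if matches(msg):
            return folder
    return default


msg = message_from_string("List-Id: <mythtv-users.example.org>\nSubject: hi\n\nbody")
print(choose_folder(msg))
```

Putting the spam rules first means a high-scoring message never reaches a mailing-list folder, which is one reasonable ordering; other orderings are just as valid.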

Thank You For Waiting; Your Message Is Important To Us (Er, Me)

That being said, even my real INBOX occasionally turns into a black tar pit where it seems I can never stay afloat. Even with only 10 messages going into it on a particular Wednesday, I’m not perfect, and I frequently “mean to respond later” but fail to get back to it in a timely manner.

The important thing is that the people who really matter (you) do end up in my highest-priority folder (assuming you’re not one of those long-winded ranting folks). Everyone should filter their mail to put their most important messages first and let the others stew until they’re nice and savory. I’m going to come back at some point in the not-too-distant future (I hope) to provide additional guidance for “getting ahead of your email”.

I’ve actually learned something from this long-winded analysis too, so I’m glad I wrote it up. What I’ve learned is that I now have a severe headache and should step quietly away from the computer. So I think I will.


Google Wave: it’s a big one

Anyone who’s talked with me about computers and communication knows that I have wanted to rewrite the email architecture, and that I have a lot of good ideas about what is needed to make it happen. Well, yesterday the folks at Google trumped me. And boy did they.

Now, I’m not one to generally proclaim ahead of time that something is going to be the next big something. In fact, the first time I opened a web page back in the 90s, long before most people had heard of “http”, I merely thought “yeah, that’s nice, but nothing amazingly new”. Even http was a minor improvement on other things. The famous Web 2.0, which brought us many cool web pages like Google Maps, Facebook, etc., was really just a series of minor steps forward in technology that I again thought were cool, but nothing outstanding.

For the first time, I’m here to say: Google Wave will indeed change the world. Or the way we work with it. It’s the first technology that has ever caught me completely off guard.

Learn About It

The best way to learn about it is to watch the demo video. You probably want to watch at least from about 0:05 to 0:15 to get a feel for how cool it is. The trick, I think, will be to stop watching, as it keeps rolling out new things as you go (the interesting non-geeky content is a full hour long, out of the hour and a half). The video is targeted at developers (and as a developer, it targets me perfectly), but it’s not so geeky that everyone else will be annoyed.

I May Actually Quit Using My Current Mail Reader

I’ve tried, over the years, to move away from the mail reader I use today (something 99% of the population has never heard of: Gnus). The reason I have never succeeded in finding anything else that fits the bill is that Gnus helps me manage email like nothing else can. Yesterday, on May 28th 2009, I received 4661 pieces of email. Now, certainly a large portion of that is spam. But a lot of it was stuff I needed to at least consider, and the power of Gnus lets me sort it appropriately so I can actually handle the load. But that’s a whole other subject for another time (many people wonder how I do it; I should write it up sometime too).

Google Wave, on the other hand, may finally offer a complete enough change in the way communication happens that I’ll actually be able to keep up with the level of communication that I need.

Features

It provides real-time updates, shared tagging, proper thread control, reduced bandwidth, and retroactive publishing of a conversation to a new person. All these features are likely enough to actually pull me over. There are issues as well, and I’ll probably document those later, but on the whole they’re a fantastic change in thinking. They are a lot along the lines of how I’ve wanted to revamp things, but I think they’ve succeeded in taking it a level further than I was thinking.

It’s really like mixing email, web, chat, and usenet news all together in a single form. Or it looks like it at least. Kudos on taking the best of all those highly useful worlds and actually getting them to fit together.

And they already have it working on Android and the iPhone!

The Right Developmental Path

One of the reasons that I don’t use Gmail much, or many other web-based solutions, is that I don’t necessarily think HTTP and JavaScript are always the right tools for every job. Yes, JavaScript turns websites into wonderfully interactive sites, but in the end I still prefer writing and editing text in speedy local applications (I’m saying this while typing into a web page, oddly enough).

With Wave, however, they’re opening up both the web API and the protocol definition itself to the world. The protocol is based on XMPP, which is the standardized version of Jabber, and this is huge. It means that people will be able to write import/export components for waves, so you can continue to edit in something else and publish it as a wave later.

Kudos to their forward thinking about the realm of standardization and allowing data access to other types of applications and programming languages. This is what will make it huge.

There is always a but…

I do wonder about some of the negative communication aspects that could happen. Centralized data storage about a conversation thread is a great thing when the data is generally public in the first place.

However, we still need to be careful when transmitting important information. Wave provides the ability to grant someone retroactive access to a wave. Imagine having a wave discussion, then excluding person X from a branch of it, and later intentionally or accidentally granting person X access again. Imagine how they’d feel when they realize they’ve been excluded. This happens all the time in email, but in email, when person X sees part of the conversation again, he likely doesn’t see the message that said “I’ve excluded person X because …”. This is really just a new management issue, and the benefits far outweigh the negatives.

(And there are more odd use cases, but the benefits will certainly outweigh those oddities as well.)

I Can’t Wait…

And I’m not sure I’ve ever said that before about an upcoming technology.
