Internet Rule Number One: Hack on Code, Not on Protocols

Recently I ran into two different cases of other people running other networks that affected me directly in a negative way. Now, we all know that people make mistakes and hardware failures can and will happen. However, in these two cases it wasn’t from “broken code” but rather “broken as designed”. The IETF, a standards organization that I’ve spent some time working with, goes through lots of thought and trouble to design internet protocols so they’re interoperable if you follow the rules. The problem is that sometimes network administrators decide they can “hack around” the way a protocol is supposed to work in order to achieve some goal. Frequently, however, they miss critical aspects of how the protocol is supposed to work or (worse) consciously ignore how protocols are supposed to work because they don’t care about the other networks they break. As long as they’re not breaking their own, of course.

But, to begin my story, I think I need to first highlight the important protocols I’ll be talking about.

The Players

  • IPv4 and IPv6: These are the big players these days when it comes to “things that are going to break on their own soon”. IP addresses are those silly strings of numbers that tell the internet who you’re actually sending packets to. Normally, the average Joe doesn’t think about these because the average Joe is lucky enough to type “Domain Names” into their web browser instead of silly strings of numbers. The thing you need to know about IP addresses is that in the near future (possibly by the time I’m done typing if I don’t hurry up) we’ll run out of IPv4 addresses to hand out to things like your cell phones, washing machines and toasters. Unfortunately much of the world isn’t ready for the transition from IPv4 to IPv6, even though it’s been coming for a very very very long time. We all procrastinate, after all.
  • Domain Name System (DNS): The DNS is how we translate those useful names (like pontifications.hardakers.net) into silly numbers, like 67.205.57.145 or 2001:470:1f00:187::1 (yes, those really are all numbers if you expand your mind a bit).
  • Simple Mail Transfer Protocol (SMTP): This is the guy that is making post offices around the world quiver wondering when their funding from selling postage stamps will dry up. Although this E-Mail thing has been catching on, we’re also finding that more and more people are relying on other services now, like Facebook, for communicating instead. Interestingly enough, both of my issues below relate to communication. One with E-Mail and one with Facebook.

    Enter the Era of E-Mail

    Now, E-Mail, it turns out, gets sent around quite a bit. I know that I still get quite a bit of it these days. Unfortunately, some entrepreneurial folks have figured out that the powers from the dark side enable them to use E-Mail for negative reasons as well. I’m speaking of SPAM of course, which currently accounts for about 75% of my E-Mail. [On a side note: I suspect that spam via paper-mail (otherwise known as bulk-advertising) is the one thing keeping most of the world’s post offices still in business.]

    Now, unless you’re a protocol geek like I am, you may not know that E-Mail that needs to get sent from one server to the next also uses DNS records that translate human-readable domain names (like hardakers.net) into IP addresses (like 168.140.236.43 and 2001:470:1f00:187::1). So, let’s say you need to email youraunt@hardakers.net. The first thing that your ISP does when you ask it to deliver a letter is to look up the IP address.

    What’s supposed to happen

    Normally when you look up where to send something you’ll get a few answers, nicely prioritized by where you should try them first:

     # dig +short hardakers.net mx
     5  mail6.hardakers.net.
     10 dns66.hardakers.net.
     20 dnsm3.hardakers.net.
    

    This tells us (or more appropriately, your ISP) to try and send the mail first to mail6.hardakers.net (priority level 5), and if that fails to try dns66.hardakers.net and then finally dnsm3.hardakers.net. The server starts by looking up the numeric address for the first one and then trying to talk to it. If one doesn’t work, it should skip to the next one and keep trying till it has no more to try, and then it will give up. (And by “give up” I mean, “keep trying for another 7 days or so at regular intervals”.)
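The fallback order described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not how any particular mail server is implemented: `deliver_via_mx` and the `try_host` callback are hypothetical names, the "failure" of the first host is made up for the demonstration, and the retry queue is left out entirely.

```python
from typing import Callable, Iterable, List, Optional, Tuple

MXRecord = Tuple[int, str]  # (preference, exchange host)

def deliver_via_mx(records: Iterable[MXRecord],
                   try_host: Callable[[str], bool]) -> Optional[str]:
    """Try each MX host in ascending preference order.

    Returns the first host that accepted the message, or None if every
    host failed (a real server would queue the message and retry later).
    """
    for _preference, host in sorted(records):
        if try_host(host):
            return host
    return None

# The MX list from the dig output above, deliberately out of order.
records = [(20, "dnsm3.hardakers.net."),
           (5, "mail6.hardakers.net."),
           (10, "dns66.hardakers.net.")]

attempted: List[str] = []

def fake_try(host: str) -> bool:
    """Hypothetical delivery attempt: pretend the first choice is down."""
    attempted.append(host)
    return host != "mail6.hardakers.net."

chosen = deliver_via_mx(records, fake_try)
```

Here the sender tries mail6 first, moves on to dns66 when that fails, and never needs to touch dnsm3.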

    So, let’s look up the address of the first one. We’ll look up both the IPv4 and the IPv6 address for it:

     # dig +short mail6.hardakers.net A
     # dig +short mail6.hardakers.net AAAA
     2001:470:1f00:187::1
    

    Note how, in this case, there is no IPv4 address (the line ending with an A didn’t get an answer). There is only an IPv6 address (the answer to the line ending with AAAA). This is perfectly legal, and was actually set up this way intentionally. I wanted to be ready for the coming of IPv6 and was encouraging mail agents around the world to try me first over IPv6. I thought that was rather good of me: exercise early, exercise often (which reminds me: I’m late for my bike ride).

    So, this has been working quite well for many years (I’ve been quite anxious for IPv6 to take off). Not only that, it likely even reduced some of my spam since many spammers don’t try the remaining listed addresses and rarely have IPv6 support. Spammers don’t even pretend to be compliant with anything. Especially morals.

    Enter btconnect, a UK ISP

    BTConnect is (supposedly) the biggest ISP on the other side of the pond from the United States. They decided to add in another rule to the SMTP protocol: every MX record MUST point to a valid address. That is, you couldn’t create a record for bogus.hardakers.net and use it as an MX record without adding an IP address for it. They did this to try and ensure that the remote address was legitimate, and they refuse to send mail for their customers (folks like you and me sitting at home on couches; they’re just British couches) if a proper address lookup fails. But it turns out a lot of people (who now hate BTConnect) were intentionally putting in fake MX records with no matching A record to try and subvert spammers. The end result is that BTConnect clients are unable to send mail to any domains that were fighting spam in this way. I’m not going to argue which side is being legal here. They’re both doing things that are “unintended”.

    But what’s worse is that BTConnect assumes that the whole world is IPv4-based and treats my perfectly legal AAAA-only mail6.hardakers.net entry as bogus. This prevented an associate from being able to email me (about designing protocols, ironically). Bad Bad BTConnect! (no bone!) You need to get with the game, because the IPv4 game is about over at this point. And stop hacking protocols because you’re affecting your clients’ ability to conduct daily business by sending legitimate E-Mail.
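To make the difference concrete, here is a small Python sketch contrasting an IPv4-only validity check with one that accepts either address family. I have no idea what BTConnect's code actually looks like; this only models the behaviour described above. The `FAKE_DNS` table is a hypothetical in-memory stand-in for real lookups, and the function names are invented for the example.

```python
from typing import Dict, List

# Hypothetical stand-in for DNS answers: name -> {record type: [addresses]}
FAKE_DNS: Dict[str, Dict[str, List[str]]] = {
    "mail6.hardakers.net.": {"AAAA": ["2001:470:1f00:187::1"]},
    "bogus.hardakers.net.": {},  # an MX target with no address at all
}

def lookup(name: str, rtype: str) -> List[str]:
    """Return the addresses of the given type, or an empty list."""
    return FAKE_DNS.get(name, {}).get(rtype, [])

def mx_target_ok_ipv4_only(name: str) -> bool:
    """The flawed check: only an A record counts as a valid address."""
    return bool(lookup(name, "A"))

def mx_target_ok(name: str) -> bool:
    """Address-family-neutral check: an A or an AAAA record is enough."""
    return bool(lookup(name, "A") or lookup(name, "AAAA"))
```

With this table, the IPv4-only check rejects mail6.hardakers.net along with the genuinely bogus name, while the neutral check accepts mail6 and still rejects bogus.hardakers.net.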

    Enter the (new) Era of Facebook

    Facebook (unfortunately, IMHO) is trying to get everyone to communicate with each other solely through their website. The good news is that they’re actually trying to be up on the IPv6 front and even have an IPv6-only version of their website available. (If you can visit successfully it means you and your ISP are IPv6 enabled. But you probably aren’t, since 99% of the ISPs out there are not yet compliant.)

    Now, many people are actually paranoid about deploying IPv6 enabled infrastructure too quickly and often attempt trickery to try and ensure that if some user out there is trying to get to them, they can. Rather than trust a user’s ISP to have correctly set up IPv6, they assume that all other ISPs out there are IPv6 broken even if they might not be. To reword that in simple terms: many places try and intentionally prevent you from reaching them over IPv6, because they trust IPv4 and “just aren’t sure” about IPv6 yet. Hence the reason you have to go to a different domain name if you want to use IPv6 with Facebook, and their default web page (www.facebook.com) isn’t IPv6 compliant.

    Facebook does this IPv4-only hack in a somewhat trickier, and DNS-illegal, sort of way. Here are the nitty-gritty details that will make DNS experts cringe (but most other people won’t catch the problems). First, this all has to do with apps.facebook.com, which is where Facebook sends you to get your virtual hands dirty by tending to your screen through planting green pixels into fields of brown pixels. So, let’s see what it takes to look up address records for apps.facebook.com.

     # dig @glb1.facebook.com. apps.facebook.com AAAA
     apps.facebook.com.      30      IN      CNAME   star.facebook.com.
    
     # dig @glb1.facebook.com. apps.facebook.com A
     apps.facebook.com.      30      IN      A       66.220.153.28
    

    Now, the DNS specialists here will immediately point out that what you see above is illegal in the DNS protocol world. My co-worker, who has memorized the RFCs better than I have, nicely extracted the right quote about this:

     "If a CNAME RR is present at a node, no other data should be 
     present; this ensures that the data for a canonical name and its aliases
     cannot be different.  This rule also insures that a cached CNAME can be
     used without checking with an authoritative server for other RR types."
    

    To reword that in simple terms: you can’t have a CNAME record and an A record existing at the same name (even if they are handed out for different query types, like A and AAAA).
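The rule quoted above is easy to check mechanically. The following Python sketch groups observed records by owner name and flags any name that carries a CNAME alongside other data. It is a simplification under stated assumptions: `cname_conflicts` is a hypothetical helper, the records are the two answers from the dig output above, and the DNSSEC record types that later RFCs allow next to a CNAME are ignored.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple

Record = Tuple[str, str, str]  # (owner name, record type, rdata)

def cname_conflicts(records: Iterable[Record]) -> Dict[str, List[str]]:
    """Map each name that has a CNAME plus other data to its record types."""
    types_by_name: Dict[str, Set[str]] = defaultdict(set)
    for name, rtype, _rdata in records:
        # DNS names and type mnemonics are case-insensitive.
        types_by_name[name.lower()].add(rtype.upper())
    return {name: sorted(types)
            for name, types in types_by_name.items()
            if "CNAME" in types and len(types) > 1}

# What the two queries against glb1.facebook.com returned.
observed = [
    ("apps.facebook.com.", "CNAME", "star.facebook.com."),
    ("apps.facebook.com.", "A", "66.220.153.28"),
]
conflicts = cname_conflicts(observed)
```

For the answers shown above, `conflicts` contains a single entry for apps.facebook.com. listing both A and CNAME, which is exactly the combination the RFC text rules out.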

    Now… Did this break something? Yes.

    First, I found one web-browser/DNS-stack combination that refused to go further. The instant it got a serious error with a record while searching for an IPv6 address, it gave up and didn’t try to find an IPv4 address. Not exactly wise either, but not illegal. Ironically, this was the exact sort of thing that the Facebook DNS hackery is trying to prevent: the customer not getting to the site. And some green electronic crops probably turned brown and withered. Electronically.
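For contrast, a more forgiving client would treat a failed IPv6 lookup as "no IPv6 answer" and still ask for an IPv4 address. The Python sketch below shows that idea only; it is not the behaviour of any specific browser or resolver library. `ResolveError`, `resolve_addresses` and `fake_query` are all hypothetical names, and the failing AAAA lookup is simulated.

```python
from typing import Callable, List

class ResolveError(Exception):
    """Hypothetical error raised when a lookup fails (e.g. SERVFAIL)."""

def resolve_addresses(name: str,
                      query: Callable[[str, str], List[str]]) -> List[str]:
    """Collect IPv6 then IPv4 addresses, tolerating a failure of either."""
    addresses: List[str] = []
    for rtype in ("AAAA", "A"):
        try:
            addresses.extend(query(name, rtype))
        except ResolveError:
            # A broken answer for one family must not block the other.
            continue
    return addresses

def fake_query(name: str, rtype: str) -> List[str]:
    """Simulated resolver: AAAA lookups fail, A lookups succeed."""
    if rtype == "AAAA":
        raise ResolveError("SERVFAIL")
    return ["66.220.153.28"]

found = resolve_addresses("apps.facebook.com.", fake_query)
```

With the simulated failure, `found` still ends up holding the IPv4 address, so the user would reach the site.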

    This DNS hackery also causes the most popular recursive name server in use today to be equally annoyed with AAAA queries:

     # dig apps.facebook.com aaaa
     ...
     ;; ->>HEADER<<- opcode: QUERY, status: SERVFAIL, id: 31717
    

    Update: 2011-01-26

    They seem to have now realized that the above breaks things. So they’ve started doing different illegal things in hopes that it would magically start working.

    # dig @ns4.facebook.com. apps.facebook.com ns
    ;; AUTHORITY SECTION:
    apps.facebook.com.      30      IN      NS      glb2.facebook.com.
    apps.facebook.com.      30      IN      NS      glb1.facebook.com.
    
    # dig @glb2.facebook.com. apps.facebook.com ns
    ;; ANSWER SECTION:
    apps.facebook.com.      30      IN      A       69.63.189.62
    

    Yes, you read that right: query for an NS record to ensure it’s accurate and you get back an A record instead. That’s what you really wanted, right?
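A resolver or monitoring script can catch this kind of mismatch by comparing the type of each answer record with the type that was asked for. The Python sketch below is a rough illustration: `unexpected_answers` is a hypothetical helper, the only data is the answer shown above, and it is simplified to allow just the queried type and CNAME (it ignores ANY queries and DNSSEC records).

```python
from typing import List, Tuple

Answer = Tuple[str, str, str]  # (owner name, record type, rdata)

def unexpected_answers(qtype: str, answers: List[Answer]) -> List[Answer]:
    """Return answer records whose type is neither the queried type nor CNAME."""
    wanted = qtype.upper()
    return [answer for answer in answers
            if answer[1].upper() not in (wanted, "CNAME")]

# The answer section glb2.facebook.com returned for the NS query.
answer_section = [("apps.facebook.com.", "A", "69.63.189.62")]

mismatched = unexpected_answers("NS", answer_section)
```

For the NS query above, the lone A record is reported as mismatched; the same record would pass untouched had the question been for an A record.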

    Conclusions

    The biggest conclusion here: if you’re going to hack, do so to speed things up. Do so to make things better. Do so to make things more interoperable. But do not assume that you’ve considered all of the corner cases with a protocol when you decide to modify the rules. The result will likely be fewer customers reaching your service, not more.

    Oh. And IPv6 is coming. Please get ready. But without the hackery.
