Internet Rule Number One: Hack on Code, Not on Protocols

Recently I ran into two different cases of other people running other networks that affected me directly in a negative way. Now, we all know that people make mistakes and hardware failures can and will happen. However, in these two cases it wasn’t from “broken code” but rather “broken as designed”. The IETF, a standards organization that I’ve spent some time working with, goes through lots of thought and trouble to design internet protocols so they’re interoperable if you follow the rules. The problem is that sometimes network administrators decide they can “hack around” the way a protocol is supposed to work in order to achieve some goal. Frequently, however, they miss critical aspects of how the protocol is supposed to work or (worse) consciously ignore how protocols are supposed to work because they don’t care about the other networks they break. As long as they’re not breaking their own, of course.

But, to begin my story, I think I need to first highlight the important protocols I’ll be talking about.

The Players

  • IPv4 and IPv6: These are the big players these days when it comes to “things that are going to break on their own soon”. IP addresses are those silly strings of numbers that tell the internet who you’re actually sending packets to. Normally, the average Joe doesn’t think about these because the average Joe is lucky enough to type “Domain Names” into their web browser instead of silly strings of numbers. The thing you need to know about IP addresses is that in the near future (possibly by the time I’m done typing if I don’t hurry up) we’ll run out of IPv4 addresses to hand out to things like your cell phones, washing machines and toasters. Unfortunately much of the world isn’t ready for the transition from IPv4 to IPv6, even though it’s been coming for a very very very long time. We all procrastinate, after all.
  • Domain Name System (DNS): The DNS is how we translate those useful names (like pontifications.hardakers.net) into silly numbers, like 67.205.57.145 or 2001:470:1f00:187::1 (yes, those really are all numbers if you expand your mind a bit).
  • Simple Mail Transfer Protocol (SMTP): This is the guy that is making post offices around the world quiver wondering when their funding from selling postage stamps will dry up. Although this E-Mail thing has been catching on, we’re also finding that more and more people are relying on other services now, like Facebook, for communicating instead. Interestingly enough, both of my issues below relate to communication. One with E-Mail and one with Facebook.

    Enter the Era of E-Mail

    Now, E-Mail, it turns out, gets sent around quite a bit. I know that I still get quite a bit of it these days. Unfortunately, some entrepreneurial folks have figured out that the powers from the dark side enable them to use E-Mail for negative reasons as well. I’m speaking of SPAM of course, which currently accounts for about 75% of my E-Mail. [On a side note: I suspect that spam via paper-mail (otherwise known as bulk-advertising) is the one thing keeping most of the world’s post offices still in business.]

    Now, unless you’re a protocol geek like I am, you may not know that E-Mail that needs to get sent from one server to the next also uses DNS records that translate human-readable domain names (like hardakers.net) into IP addresses (like 168.140.236.43 and 2001:470:1f00:187::1). So, let’s say you need to email youraunt@hardakers.net. The first thing that your ISP does when you ask it to deliver a letter is to look up the IP address.

    What’s supposed to happen

    Normally when you look up where to send something you’ll get a few answers, nicely prioritized by where you should try them first:

     # dig +short hardakers.net mx
     5  mail6.hardakers.net.
     10 dns66.hardakers.net.
     20 dnsm3.hardakers.net.
    

    This tells us (or more appropriately, your ISP) to try sending the mail first to mail6.hardakers.net (priority level 5), and if that fails to try dns66.hardakers.net, and then finally dnsm3.hardakers.net. The server starts by looking up the numeric address for the first one and then trying to talk to it. If one doesn’t work, it should skip to the next one and keep trying until it has no more to try, and then it will give up. (And by “give up” I mean, “keep trying for another 7 days or so at regular intervals”.)
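The fallback order described above can be sketched in a few lines of Python. This is only an illustration, not how any particular mail server is written: `deliver_via_mx` and the `try_host` callback are hypothetical names, and the "reachable" set is an invented scenario in which the first-choice host is down.

```python
from typing import Callable, Iterable, Optional, Tuple


def deliver_via_mx(
    mx_records: Iterable[Tuple[int, str]],
    try_host: Callable[[str], bool],
) -> Optional[str]:
    """Try each mail exchanger in ascending preference order.

    Returns the first host that accepted the message, or None if every
    host failed (a real server would queue the message and retry later).
    """
    for _preference, host in sorted(mx_records):
        if try_host(host):
            return host
    return None


# The MX answers from the dig output above, as (preference, host) pairs.
MX_RECORDS = [
    (20, "dnsm3.hardakers.net."),
    (5, "mail6.hardakers.net."),
    (10, "dns66.hardakers.net."),
]

# Hypothetical scenario: the first-choice host cannot be reached.
reachable = {"dns66.hardakers.net.", "dnsm3.hardakers.net."}
chosen = deliver_via_mx(MX_RECORDS, lambda host: host in reachable)
print(chosen)  # dns66.hardakers.net.
```

The lowest preference number wins, so the sender only moves on to a higher-numbered host after the better ones have failed.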

    So, let’s look up the address of the first one. We’ll look up both the IPv4 and the IPv6 address for it:

     # dig +short mail6.hardakers.net A
     # dig +short mail6.hardakers.net AAAA
     2001:470:1f00:187::1
    

    Note how, in this case, there is no IPv4 address (the line ending with an A didn’t get an answer). There is only an IPv6 address (the answer to the line ending with AAAA). This is perfectly legal, and was actually set up this way intentionally. I wanted to be ready for the cometh of IPv6 and was encouraging mail agents around the world to try me first over IPv6. I thought that was rather good of me: exercise early, exercise often (which reminds me: I’m late for my bike ride).

    So, this has been working quite well for many years (I’ve been quite anxious for IPv6 to take off). Not only that, it likely even reduced some of my spam since many spammers don’t try the remaining listed addresses and rarely have IPv6 support. Spammers don’t even pretend to be compliant with anything. Especially morals.

    Enter btconnect, a UK ISP

    BTConnect is (supposedly) the biggest ISP on the other side of the pond from the United States. They decided to add another rule to the SMTP protocol: every MX record MUST point to a valid address. I.e., you couldn’t create a record for bogus.hardakers.net and use it as an MX record without adding an IP address for it. They did this to try and ensure that the remote address was legitimate, and they refuse to send mail for their customers (folks like you and me sitting at home on couches; they’re just British couches) if the proper address lookup fails. But it turns out a lot of people (who now hate BTConnect) were intentionally putting in fake MX records with no matching A record to try and subvert spammers. The end result is that BTConnect clients are unable to send mail to any domains that were fighting spam in this way. I’m not going to argue which side is being legal here. They’re both doing things that are “unintended”.

    But what’s worse is that BTConnect assumes that the whole world is IPv4-based and treated my perfectly legal AAAA-only mail6.hardakers.net entry as bogus. This prevented an associate from being able to email me (about designing protocols, ironically). Bad Bad BTConnect! (no bone!) You need to get with the game, because the IPv4 game is about over at this point. And stop hacking protocols, because you’re affecting your clients’ ability to conduct daily business by sending legitimate E-Mail.
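To make the difference concrete, here is a small Python sketch contrasting an IPv4-only validity check with one that also accepts AAAA records. How BTConnect actually implements its check is not known here; the `LOOKUPS` table and both function names are hypothetical stand-ins, with canned data in place of real DNS queries.

```python
from typing import Dict, List

# Hypothetical lookup results standing in for real DNS queries.
# mail6 mirrors the dig output above: no A record, one AAAA record.
LOOKUPS: Dict[str, Dict[str, List[str]]] = {
    "mail6.hardakers.net.": {"A": [], "AAAA": ["2001:470:1f00:187::1"]},
    "bogus.hardakers.net.": {"A": [], "AAAA": []},
}


def valid_if_ipv4(host: str) -> bool:
    """IPv4-only check: the target counts only if it has an A record."""
    return bool(LOOKUPS.get(host, {}).get("A"))


def valid_if_any_address(host: str) -> bool:
    """Dual-stack check: an A record or an AAAA record is enough."""
    records = LOOKUPS.get(host, {})
    return bool(records.get("A") or records.get("AAAA"))


for host in LOOKUPS:
    print(host, valid_if_ipv4(host), valid_if_any_address(host))
```

The IPv4-only check lumps the AAAA-only host in with the genuinely bogus one, which matches the failure described above; the dual-stack check tells them apart.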

    Enter the (new) Era of Facebook

    Facebook (unfortunately, IMHO) is trying to get everyone to communicate with each other solely through their website. The good news is that they’re actually trying to be up on the IPv6 front and even have an IPv6-only version of their website available. (If you can visit successfully it means you and your ISP are IPv6 enabled. But you’re probably not since 99% of the ISPs out there are not yet compliant).

    Now, many people are actually paranoid about deploying IPv6 enabled infrastructure too quickly and often attempt trickery to try and ensure that if some user out there is trying to get to them that they can. Rather than trust a user’s ISP to have correctly set up IPv6, they assume that all other ISPs out there are IPv6 broken even if they might not be. To reword that in simple terms: many places try and intentionally prevent you from reaching them over IPv6. Because they trust IPv4 and “just aren’t sure” about IPv6 yet. Hence the reason you have to go to a different domain name if you want to use IPv6 with Facebook, and their default web page (www.facebook.com) isn’t IPv6 compliant.

    Facebook does this IPv4-only hack in a somewhat trickier, and DNS-illegal, sort of way. Here are the nitty-gritty details that will make DNS experts cringe (but most other people won’t catch the problems). First, this all has to do with apps.facebook.com, which is where Facebook sends you to get your virtual hands dirty by tending to your screen through planting green pixels into fields of brown pixels. So, let’s see what it takes to look up address records for apps.facebook.com.

     # dig @glb1.facebook.com. apps.facebook.com AAAA
     apps.facebook.com.      30      IN      CNAME   star.facebook.com.
    
     # dig @glb1.facebook.com. apps.facebook.com A
     apps.facebook.com.      30      IN      A       66.220.153.28
    

    Now, the DNS specialists here will immediately point out that what you see above is illegal in the DNS protocol world. My co-worker, who has memorized the RFCs better than I have, nicely extracted the right quote about this:

     "If a CNAME RR is present at a node, no other data should be 
     present; this ensures that the data for a canonical name and its aliases
     cannot be different.  This rule also insures that a cached CNAME can be
     used without checking with an authoritative server for other RR types."
    

    To reword that in simple terms: you can’t have a CNAME record and an A record existing for the same name (even when they’re returned for different query types, like A and AAAA).
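As a rough illustration of the rule quoted above, the following Python sketch flags any owner name that carries a CNAME together with some other record type. The function name `cname_conflicts` is made up for this example, and the check is deliberately simplified (it does not account for the DNSSEC-related record types that later specifications permit next to a CNAME).

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple


def cname_conflicts(records: Iterable[Tuple[str, str]]) -> List[str]:
    """Return the owner names that have a CNAME plus any other record type."""
    types_by_name: Dict[str, Set[str]] = defaultdict(set)
    for name, rtype in records:
        types_by_name[name.lower()].add(rtype.upper())
    return sorted(
        name
        for name, types in types_by_name.items()
        if "CNAME" in types and len(types) > 1
    )


# The two answers seen in the dig output above, as (owner name, type) pairs.
observed = [
    ("apps.facebook.com.", "CNAME"),  # returned for the AAAA query
    ("apps.facebook.com.", "A"),      # returned for the A query
]
print(cname_conflicts(observed))  # ['apps.facebook.com.']
```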

    Now… Did this break something? Yes.

    First, I found one web-browser/DNS-stack combination that refused to go further. The instant it got a serious error with a record while searching for an IPv6 address, it gave up and didn’t try to find an IPv4 address. Not exactly wise either, but not illegal. Ironically, this was the exact sort of thing that the Facebook DNS hackery is trying to prevent: the customer not getting to the site. And some green electronic crops probably turned brown and withered. Electronically.

    This DNS hackery also causes the most popular recursive name server in use today to be equally annoyed with AAAA queries:

     # dig apps.facebook.com aaaa
     ...
     ;; ->>HEADER<<- opcode: QUERY, status: SERVFAIL, id: 31717
    

    Update: 2011-01-26

    They seem to have now realized that the above breaks things. So they’ve started doing different illegal things in hopes that it would magically start working.

    # dig @ns4.facebook.com. apps.facebook.com ns
    ;; AUTHORITY SECTION:
    apps.facebook.com.      30      IN      NS      glb2.facebook.com.
    apps.facebook.com.      30      IN      NS      glb1.facebook.com.
    
    # dig @glb2.facebook.com. apps.facebook.com ns
    ;; ANSWER SECTION:
    apps.facebook.com.      30      IN      A       69.63.189.62
    

    Yes, you read that right: query for a NS record to ensure it’s accurate and you get back an A record instead. That’s what you really wanted, right?

    Conclusions

    The biggest conclusion here: if you’re going to hack, do so to speed things up. Do so to make things better. Do so to make things more interoperable. But do not assume that you’ve considered all of the corner cases with a protocol when you decide to modify the rules. The results will likely be fewer customers reaching your service, not more.

    Oh. And IPv6 is coming. Please get ready. But without the hackery.


My Friend’s Older Conversation With AT&T

I recently posted my both funny and depressing text message conversation with AT&T as a result of their spamming me (which, by the way, I still haven’t managed to turn off mostly because I gave up).

A friend of mine (WY0X) gave me permission to post his recap of his on-the-phone conversations with AT&T about a similar, but even worse, problem:

Be really careful with those. I recently had to deal with a scam on Karen’s phone. Apparently AT&T has made it super-easy for 3rd party “providers” to send you a text message, and if you reply AT ALL, that’s all AT&T can see in their system. The 3rd party company then uses the convenient “upload an XML file full of phone numbers and any arbitrary price we desire to extract from said phone users” file to AT&T for AT&T to handle the billing. When you call to contest this $19.99 monthly “subscription” that shows up on your AT&T cell phone bill, they say, “Well, we see you exchanged text messages with the company in our system. You must have accepted an offer from them.” Only after an hour of explaining that my wife was NOT that stupid and NEVER replied to any message that said “will you sign up?”… did they offer to refund the charges and set up “Parental Controls” (HA!) on both of our accounts so NO 3rd party could ever bill anything on them. I highly recommend to all on AT&T.

So seriously, some company could send you this message “Hey, what you doing tonight?” from a number you don’t recognize, and you could send back, “Who is this?” and AT&T would see that as “proof” that you had a business relationship with them. When I pointed this out to an AT&T supervisor they said, “I suppose that could happen — we are getting a lot of complaints right now. However I’ve refunded the fees this month.” … Okay lady, how do I stop it FOREVER, and why are you making it easier for unknown third parties to bill me, your customer, than it is for me to opt-out of such shenanigans? Oh by the way, I will be reporting this to our State Attorney General since it’s generally considered bad business to bill for another party whom you can’t prove has a business relationship of any kind with your customer. You yourselves say you can’t see the text messages for privacy reasons… so how do you know EVERY one of the bills you’re sending out isn’t a scam such as I described?

She was like a deer in headlights, and started reading from the script again. After about four attempts I said, “What would you say if this were my 12 year old’s phone?” “Oh, we have Parental Controls for that!” Well, there ya go lady… fire me up some “Parental Controls” on both lines, please… my wife’s and mine. “But you won’t be able to order any other services!” “That’s absolutely correct, and I can’t see us ever NEEDING those other services either, but my wife did enjoy a few of the Trivia questions she received once a month from these idiots.” That was pretty much the end of the conversation at that point. 30-45 minutes of my life wasted, stopping my cell carrier from billing me for other people’s scam businesses.

AT&T *did* do the “right thing” and refund it, but they were clueless about why I was upset about it. I finally got down to asking everyone I talked to there: “Please prove I have a business relationship with XYZ third party company, which allows you to bill me for their services.” They were dumbfounded. There was nothing on their (so called) customer service scripts to handle someone asking such a “tough” question.

I love the fact I have intelligent friends. I hate the fact I have less-than-intelligent companies.


Today’s Conversation With AT&T

So, AT&T has recently gotten into the habit of spamming you with “tips”. “Tips” are really “spam” when they’re trying to get you to do things that will eventually make them money (i.e., by using more of their services).

Here’s the “tip” I got today:

AT&T:
AT&T Free Tip: Get weather, movie or restaurant
tips from Google on your phone.
Text HELP to 466453 to get started.
To end Tips send no to 4436

Easy enough, I thought.

Me:
No

And a few seconds later, I got the response back:

AT&T:
Sorry, we did not understand
your response.  Reply ONLY the
word "YES" to activate the 4
channel/$6 Mobile TV plan

HUH???

Ok, I thought. Maybe it’s because my phone auto-capitalized the word “No”.

Me:
no

And a few seconds later, I got the response back:

AT&T:
Sorry, we did not understand
your response.  Reply ONLY the
word "YES" to activate the 4
channel/$6 Mobile TV plan

NOOOOOOOOOOOOOOOOO

sigh…

[UPDATE 2009/08/20: Read the follow-on story from a friend describing his conversation with AT&T]


I’ve Got Mail!

Many people have asked me in the past to explain how in the world I handle so much E-Mail. Since it’s such a long story consisting of many parts, I rarely answer it. Also because I think it’s easier to describe using diagrams, examples and sciency looking graphs. In fact, it turns out, that even describing how much mail I get, and why I get so much, is a story in itself. So this is part #1 of like 2 that describe my E-Mail setup. This first part consists entirely of a description of how much mail I get in the first place. Believe you me, it’s quite a bit.

So, how much raw E-Mail do I get?

So before this, I actually wasn’t even sure. It turns out that the answer is simply put as “a lot”. A whole heaping lot. Much of it is, of course, spam (I don’t have an exact percentage at the top of the article). But even assuming that it’s 90% spam, which likely isn’t the case, I still receive a lot of mail. And it’s all my fault because, simply put again, I want that much (gasp). Ok, maybe not the spam.

So let’s start off with some (sciency) graphs showing the raw numbers of E-Mail that I attract. To really understand it all, I need to break it down into chunks and study each piece.

The Long Haul: Mail Per Month

The first graph below shows the amount of mail per month that I received over the last year-ish.

[Graph: Mail Per Month]

The important thing to notice in the above graph is that the amount of mail I receive isn’t even consistent month to month, as it ranges from 6,500 in a month to almost 13,000. Sure, February has fewer days in it so you’d expect it to be lower because all months were not actually created equal. But even those slight variations don’t account for the huge swings from month to month. Some of it certainly is because my work-load with respect to communication comes and goes. Some months I simply receive a lot more mail for work related projects than other months (usually as deadlines approach and panic ensues).

But the biggest reason for the fluctuation is that spam comes in waves too. Just looking at my day to day E-Mail it’s always amazing how much the incoming spam varies. Some of my email addresses (I have many) are widely published and thus widely harvested by the evil address-collecting spam machines. This results in a huge amount of my mail being spam, unfortunately.

But beyond that, you can see trends in the graph where, for a while, there was a significant drop in incoming E-Mail. This was because a major spam ring was taken out of service a while back and that’s where the huge dip comes from (you should have had a spam dip in that time frame too). However things are unfortunately back to spam-normal again. Do you feel like all of a sudden you’re getting more spam than you used to? Well, you’re not alone. Eventually the next spam king-pin took over and we’re back to an abysmal spam rating of something like 90% on-average spam. The peace was nice while it lasted, but now I’m back to evaluating whether my rich Nigerian uncle really did leave me a fortune or not. Fortunately if he didn’t, it turns out I have 1094 other rich Nigerian uncles who also amassed a small fortune if only I could pay the wire-transfer fee to get it safely into my bank account.

The Shorter Haul: Mail Per Day

The next graph shows the amount of mail per-day that I received mostly during the month of May (2009).

[Graph: Mail Per Day]

There are a couple of interesting artifacts that you can hopefully spot in this graph as well. You’ll notice that it has a definite repeating cycle. The cycle is simply this: the low spots are on the weekends. I.e., by far the most mail I receive comes during the work week. This isn’t surprising to me since much of the mail I receive is work related in the first place. Which begins to tell you how much mail I receive for work-related purposes.

Ok, But What Exactly Is It All Then?

There’s the real question. If I get bombarded with so much mail, how much do I actually read??? So, let’s pick a day. Ok, let me pick a day since you couldn’t help me there. I picked June 3rd, 2009 which is a Wednesday.

On Wednesday June 3rd, 2009 I received 4514 individual pieces of email. Now, let’s quickly do the math, shall we? If I tried to read all of that and I did so in, say, a 10-hour period (8 hours for work and 2 hours of reading just the personal mail) that would be 4514/(60*10) = 7.523 email messages per minute that I would have had to read. Though that might be possible if they were all short, I assure you that the people I correspond with are not well known for writing short, brief messages. Long winded rants are, unfortunately, much more common.

Weeding Out The Spam And Rich Uncles

So, the first thing we need to do is remove the auto-discarded spam and duplicate messages (I have a nice filter that removes duplicates so that I’m never bothered twice because someone put me on both the To and the CC line or because I’m subscribed to multiple mailing lists that the message went through). It turns out that in the 4514 messages, I auto-discarded 3163 of them. That’s roughly 70% of them. Since that’s most likely spam, that’s probably close to the real spam percentage that I receive: 70%.
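A duplicate filter of the kind mentioned above could look roughly like this in Python. The post does not say how the real filter recognizes duplicates; matching on the Message-ID header is an assumption made for this sketch, and `drop_duplicates` is a hypothetical name.

```python
from email import message_from_string
from typing import Iterable, List


def drop_duplicates(raw_messages: Iterable[str]) -> List[str]:
    """Keep the first copy of each message, keyed on its Message-ID header.

    Assumption: two copies of the same message share a Message-ID.
    Messages without that header are always kept.
    """
    seen = set()
    kept = []
    for raw in raw_messages:
        message_id = message_from_string(raw).get("Message-ID")
        if message_id is not None:
            message_id = message_id.strip()
            if message_id in seen:
                continue  # already delivered once, skip this copy
            seen.add(message_id)
        kept.append(raw)
    return kept


# Invented sample mail: the same message arriving directly and via a list.
direct = "Message-ID: <rant-1@example.org>\nTo: me@example.org\n\nA long rant."
via_list = "Message-ID: <rant-1@example.org>\nList-Id: <rants.example.org>\n\nA long rant."
other = "Message-ID: <note-2@example.org>\nTo: me@example.org\n\nA short note."

unique = drop_duplicates([direct, via_list, other])
print(len(unique))  # 2
```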

Looking At What’s Left

That leaves only 4514 – 3163 = 1351 messages left to handle. And if I had 10 hours to sift through 1351 messages in my INBOX I could do so at the leisurely rate of 2.25 messages per minute. That’s almost doable (at least if I blacklist a few of the people that mail me the longest of the long winded rants).
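For anyone who wants to check the arithmetic, here it is as a few lines of Python. The numbers are the ones quoted in the text; the variable names are just for illustration.

```python
received = 4514          # messages received on June 3rd, 2009
auto_discarded = 3163    # spam and duplicates thrown away automatically
reading_minutes = 10 * 60

remaining = received - auto_discarded

print(remaining)                                  # 1351
print(round(received / reading_minutes, 3))       # 7.523 messages per minute
print(round(100 * auto_discarded / received))     # 70 (percent discarded)
print(round(remaining / reading_minutes, 2))      # 2.25 messages per minute
```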

But here’s the real secret. Of all those 1351 messages, only 10 actually ended up in my INBOX. That’s important, so let me repeat it. In bold. Only 10 messages actually ended up in my INBOX. And there’s the secret to my success: everything else gets filtered out and placed somewhere else. In fact, if you really look at how I treat mail it turns out I have lots of INBOXes. The one that only received 10 is the one that is just mail sent to my personal account. My work addresses only received 16 to the work INBOX equivalents.

Dealing With Mail in Clumps

So what is really happening, behind the scenes, is that my mail for the day actually got sorted into 44 different places. Not just 1 or 2, but 44. That lets me sort and prioritize my mail so that the important stuff I can see right away in small INBOXes and they don’t get lost in the bulk of the rants.

In the rest of the mail: 638 messages went to a folder for fedora developers consisting of auto-generated emails describing upcoming changes to the operating system. Another 110 were long winded rants about the same operating system that went to a discussion folder (at least I bet they were long winded rants; I didn’t study most of them in detail). 102 were about my favorite linux-based TV recording software: MythTV. Another 120 E-Mails were messages that were most likely spam but placed in a folder for me to double check them because the spam-filtering software wasn’t confident enough to just throw them away without my help.

And so on. You don’t want more of a breakdown than that. Trust me.
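The sorting idea can be sketched as a short list of rules applied to each message's headers. The actual filter setup is not described in this post, so everything here is hypothetical: the `RULES` table, the folder names, and the choice of matching on the List-Id header are all assumptions for illustration.

```python
from collections import Counter
from typing import Dict

# Hypothetical rules: (text to look for in the List-Id header, target folder).
RULES = [
    ("mythtv", "lists/mythtv"),
    ("fedora", "lists/fedora"),
]


def choose_folder(headers: Dict[str, str]) -> str:
    """Return the folder for a message; anything unmatched lands in INBOX."""
    list_id = headers.get("List-Id", "").lower()
    for needle, folder in RULES:
        if needle in list_id:
            return folder
    return "INBOX"


# Invented sample headers standing in for a day's mail.
messages = [
    {"List-Id": "<devel.lists.fedora.example>"},
    {"List-Id": "<mythtv-users.example>"},
    {"List-Id": "<devel.lists.fedora.example>"},
    {"From": "youraunt@hardakers.net"},
]
counts = Counter(choose_folder(headers) for headers in messages)
print(counts["lists/fedora"], counts["lists/mythtv"], counts["INBOX"])  # 2 1 1
```

Because list traffic is matched first, only mail that fits none of the rules reaches the small, high-priority INBOX.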

Thank You For Waiting; Your Message Is Important To Me

That being said even my real INBOX occasionally turns into a black tar-pit where it seems I can never stay afloat. Even with only 10 messages going into it for a particular Wednesday I’m not perfect and frequently I “mean to respond later” but fail to get back to it in a timely manner.

The important thing is that the people that really matter (you) do end up in my highest priority folder (assuming you’re not one of those long-winded ranting folks). Everyone should filter their mail to put their most important email messages first in their lives and let the others stew until they’re nice and savory. I’m going to come back at some point in the not too distant future (I hope) to provide additional guidance for “getting ahead of your email”.

I’ve actually learned something from this long winded analysis too. So I’m glad I wrote it up. What I’ve learned is that I should have a severe headache and should step quietly away from the computer. So I think I will.
