How We Broke the Internet with 12 Characters


Ok well, maybe saying we broke the internet is a bit heavy handed. But now that you are reading…

The other day we ran into POODLE. This POODLE is not to be confused with one of our four-legged friends - a Poodle. No, I am referring to this POODLE - Padding Oracle On Downgraded Legacy Encryption.

Here’s the quick rundown on POODLE: SSLv3, an old security protocol still widely used on the internet, was discovered to have a nasty security vulnerability. It arrived not long after Heartbleed, but where Heartbleed was a bug in OpenSSL, POODLE is a flaw in the SSLv3 protocol itself. Basically, no one should be using SSLv3 because it is considered legacy. Lots of parts, pieces and tools still do - including ours.

Currently my team is building a RESTful service. This service is dependent upon calls to another service - we’ll call it BadDoggie here so we don’t get confused with This and That. BadDoggie did the good and proper thing - they flipped a switch and started rejecting all calls using SSLv3. So quite naturally, our calls to their service suddenly “broke”.

Without warning, our tests started failing and our code hitting BadDoggie no longer functioned. Another dev and I spent a solid day chasing the error down (not so long really in the realm of nasty bug-land) and found POODLE as the root cause. It turned out to be a 12 character fix - switch the String ‘SSLv3_method’ to the String ‘TLSv1_method’. Easy.
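Finding those 12 characters was the hard part. Here is a minimal sketch of how one might hunt for them, assuming the protocol is pinned with an OpenSSL-style method name somewhere in source or config. The helper find_pinned_legacy_protocols, the sample config and the host api.baddoggie.example are all made up for illustration; this is not the code we actually ran.

```python
import re
from typing import List, Tuple

# OpenSSL-style method names that pin a connection to SSLv2 or SSLv3.
# 'SSLv23_method' is deliberately not matched: despite the name, it asks
# the library to negotiate a version rather than pin one.
LEGACY_METHOD = re.compile(r"\bSSLv[23]_(?:client_|server_)?method\b")


def find_pinned_legacy_protocols(source: str) -> List[Tuple[int, str]]:
    """Return (line_number, method_name) for every legacy protocol pin."""
    hits = []
    for number, line in enumerate(source.splitlines(), start=1):
        for match in LEGACY_METHOD.finditer(line):
            hits.append((number, match.group(0)))
    return hits


# Hypothetical config snippet, standing in for a year-old line of code.
sample = """\
options = {
    'host': 'api.baddoggie.example',
    'secureProtocol': 'SSLv3_method',
}
"""

print(find_pinned_legacy_protocols(sample))  # [(3, 'SSLv3_method')]
```

Run over a whole code base, something like this would at least tell you where the legacy pins live before a provider flips the switch.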

Signal to Noise Ratio

After that long intro - here are my observations. First, the internet is a much more fragile animal than people realize. 12 characters caused our service to fail. That may sound insignificant but POODLE affects more than just our silly little service. Our service is hitting one endpoint for one API and is just one application. Now take that scenario, multiply it by all the consumers hitting all of BadDoggie’s endpoints, and POODLE starts getting ugly. Now if we extend that line of thinking to every other service across the internet that has stopped or will stop supporting SSLv3, we see the potential for a very large and wide impact. Just 12 characters. Fragile. Scary.

At the time I was reminded of the not-so-tongue-in-cheek post Programming Sucks. The author called out stuff like POODLE and BadDoggie’s breaking change, and the fact that no one really knows how everything and all the parts work or how the hell they keep working. Spot on.

The second issue was a communication breakdown. There are three potential scenarios here:

  1. BadDoggie did not communicate their change
  2. BadDoggie did communicate their change but we never received it
  3. BadDoggie communicated their change, we received it but did nothing

In the case of #1: BadDoggie happens to be a very well known and reputable company. It is possible that the communication never went out - despite their reputation. People do make mistakes. When you have thousands of consuming applications it might get easy to overlook one. Or some dev team of theirs didn’t update the correct web page. Whatever. It happens.

In the case of #2: we have to hope that the change actually reached us through bureaucratic channels. I currently work in a very large organization. It is reasonable to question if that communication got swallowed by a bureaucratic machine and never was received by my team. There is a good chance that some guy is sitting at a desk right now wondering why my team didn’t subscribe to some list somewhere that would have alerted us to the change.

In the case of #3: it is far more likely that someone (or all) on my team received the communication. We received it, heck, maybe we even read it. But in all the noise, we lost the signal. Between meetings, email chains, corporate communications, policy changes, leadership changes, open enrollment, time cards and similar corporate noise (not implying good or bad) - it can be hard to find the signal. On top of that we constantly have to be changing, learning, spending time learning what not to learn, attending after hours meetups, and working on after hours projects just so that we can remain relevant. Find the signal in all the noise.

It is not unlike those annoying people on Twitter that you follow and immediately unfollow the next day:

“Oh, awesome! Dr. Joe. Glad I followed him. Wait… what? That was stupid.”
scroll, scroll, scroll
“Him again? Cheetos? Really? Cheetos?”
scroll, scroll, scroll
“wait.. what? Is this really the same guy on that conference video?”
350 completely useless self-promoting tweets in one day later…
“Again?!? What a friggin’ moron. How the hell does he have a job.”
click
unfollow
refresh

Now, pick 12 characters out of that kind of noise and be convinced that they are important to you. Then, remember that you used them in your code in the first place - a year ago. Again see Programming Sucks. Spot on.

The point I am getting at is that for us, POODLE was less of a technology issue than a communication issue (encode - send - receive - decode). This is true of many ‘issues’ and many such ‘issues’ are out there hiding in the dark.

The more I learn and the longer I work in this industry, the more I am amazed that things hold together as well as they do. It is a wonder that our internet, our web apps, and our mobile apps usually work to some degree or another - most of the time. With that much duct tape and baling wire it is truly a wonder anything works at all.

Technical Advice Regarding POODLE

  1. Turn off SSLv3 to consumers. Warn them of course, but see rant above concerning communication. It is likely to fail somewhere.
  2. Let your clients auto-negotiate the protocol version with any ‘secure’ services your apps depend upon, rather than pinning a single version in code.
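
Both pieces of advice can be sketched with Python's standard ssl module. Treat this as an illustration under assumptions rather than what we shipped: the function names are made up, and the TLS 1.2 floor is my own choice here. The idea is to set a minimum version and let the handshake pick the best protocol both sides support, instead of hard-coding a single one.

```python
import ssl


def make_client_context() -> ssl.SSLContext:
    """Client side: negotiate the protocol version, never below the floor."""
    # PROTOCOL_TLS_CLIENT negotiates the highest version both peers support
    # and turns on certificate and hostname verification by default.
    context = ssl.SSLContext(ssl.PROTOCOL_TLS_CLIENT)
    context.load_default_certs()
    # Assumed floor: anything older (SSLv3 included) is refused outright.
    context.minimum_version = ssl.TLSVersion.TLSv1_2
    return context


def make_server_context() -> ssl.SSLContext:
    """Server side: stop offering SSLv3 (and friends) to consumers."""
    context = ssl.SSLContext(ssl.PROTOCOL_TLS_SERVER)
    context.minimum_version = ssl.TLSVersion.TLSv1_2
    # A real server would also call context.load_cert_chain(...) with its
    # own certificate and key before accepting connections.
    return context


client = make_client_context()
server = make_server_context()
print(client.minimum_version.name, server.minimum_version.name)
```

With a floor instead of a pin, the next time a provider retires an old protocol the handshake simply settles on something newer, and nobody has to go digging for 12 characters.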