Good News: Ubuntu Now Ships With unattended-upgrades On By Default!

Last week, we got a strange support request. One of our users had received the following notification:

Hey! Good job.

We’ve detected that you patched some vulnerabilities.

Here’s what changed:

CVE-2016-8704

is no longer present in:

[name of server redacted]

This came as a surprise, since they knew for a fact that no one had touched the package in question, and they were certain they had not enabled unattended upgrades.

Somehow, the vulnerability magically got patched and they wanted to know: what’s going on?

The vuln is a pretty serious remote code execution vulnerability in memcached, and as far as we could tell our user was indeed using the most recent version available for their distribution — 1.4.25-2ubuntu2.1. This version was released on November 3rd, and we could see from our logs that memcached got upgraded that same day.

How did it happen without them knowing about it? The only thing unique about their configuration was that they’re running the recently released Ubuntu 16.10 (Yakkety Yak)¹.

We dug around, set up some test Yakkety boxes, and lo and behold: unattended-upgrades is enabled by default!

For those of you who are unaware, unattended-upgrades is a Debian/Ubuntu package that, well, does what it says on the tin: it automatically upgrades your packages. The most common configuration, and the one enabled in 16.10, is to upgrade any packages that have a published security patch. It does this by checking for and installing any updates from the ${distro_codename}-security repository.

Debian and Ubuntu have had this for years, but it was simply never turned on by default. After a year of many security fails, this news warmed the cockles of my heart and gave me hope for our future! And what’s even more amazing is that they turned it on without any fanfare.

It’s the quiet, simple changes that provide the biggest wins.

Of course, there are reasons why administrators don’t always want software to be upgraded without their input. And if it does get updated, there are good reasons for knowing exactly which vulnerabilities are being patched and when. Appcanary exists to notify you about security updates without automatically installing them, and to give you insight into what’s being installed if you are patching automatically.

But if you don’t have the capacity to actively manage the packages on your Linux systems (and even if you do!), we implore you: set up unattended-upgrades!

Ubuntu enabling this by default is a great sign for the future.

Not running Ubuntu 16.10?

Here’s how to turn on unattended upgrades

Ansible: jnv.unattended-upgrades
Puppet: puppet/unattended_upgrades
Chef: apt
If you’re using the server interactively:

sudo apt-get install unattended-upgrades && sudo dpkg-reconfigure unattended-upgrades

Set up manually: run sudo apt-get install unattended-upgrades, then edit the two files below.

In /etc/apt/apt.conf.d/20auto-upgrades:

APT::Periodic::Update-Package-Lists "1";
APT::Periodic::Unattended-Upgrade "1";

In /etc/apt/apt.conf.d/50unattended-upgrades:

// Automatically upgrade packages from these (origin, archive) pairs
Unattended-Upgrade::Allowed-Origins {    
// ${distro_id} and ${distro_codename} will be automatically expanded
    "${distro_id} ${distro_codename}-security";
};

// Send email to this address for problems or packages upgrades
// If empty or unset then no email is sent, make sure that you 
// have a working mail setup on your system. The package 'mailx'
// must be installed or anything that provides /usr/bin/mail.
//Unattended-Upgrade::Mail "root@localhost";

// Do automatic removal of new unused dependencies after the upgrade
// (equivalent to apt-get autoremove)
//Unattended-Upgrade::Remove-Unused-Dependencies "false";

// Automatically reboot *WITHOUT CONFIRMATION* if
// the file /var/run/reboot-required is found after the upgrade
//Unattended-Upgrade::Automatic-Reboot "false";
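To check from a script whether a box actually has the two periodic settings switched on, one option is to read them back out of 20auto-upgrades. The following is a minimal Python sketch, not part of the unattended-upgrades package: parse_apt_conf and unattended_upgrades_enabled are names made up for illustration, and the parser only understands the simple one-line `Key "value";` form shown above, not the full apt.conf grammar.

```python
import re

# Matches simple one-line settings such as:
#   APT::Periodic::Update-Package-Lists "1";
# Illustrative only: nested { ... } blocks are not handled.
SETTING = re.compile(r'^\s*([A-Za-z0-9:_-]+)\s+"([^"]*)"\s*;')


def parse_apt_conf(text):
    """Return a dict of the simple `Key "value";` settings found in text."""
    settings = {}
    for line in text.splitlines():
        if line.lstrip().startswith("//"):
            continue  # skip commented-out settings
        match = SETTING.match(line)
        if match:
            settings[match.group(1)] = match.group(2)
    return settings


def unattended_upgrades_enabled(text):
    """True when both periodic settings from 20auto-upgrades are non-zero."""
    settings = parse_apt_conf(text)
    keys = (
        "APT::Periodic::Update-Package-Lists",
        "APT::Periodic::Unattended-Upgrade",
    )
    return all(settings.get(key, "0") not in ("", "0") for key in keys)


# The same two lines as the 20auto-upgrades file above.
EXAMPLE = '''
APT::Periodic::Update-Package-Lists "1";
APT::Periodic::Unattended-Upgrade "1";
'''
```

Calling unattended_upgrades_enabled(EXAMPLE) returns True, while a file where either line is commented out or set to "0" returns False.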

¹ 16.10 is not a Long Term Support release. Regular Ubuntu releases are supported for 9 months, while April releases in even years (e.g. 14.04, 16.04) are designated LTS and are supported for 5 years. It’s thus more common to see 12.04, 14.04, and 16.04 on servers than other Ubuntu releases. This particular user has a good reason for running 16.10.

We Left Clojure. Here's 5 Things I'll Miss.

On October 11th, Appcanary relied on about 8,500 lines of Clojure code. On the 12th we were down to zero. We replaced it by adding another 5,700 lines of Ruby to our codebase. Phill will be discussing why we left, and what we learned, both here and at this year’s RubyConf. For now, I want to talk about what I’ll miss.

1) The joy of Lisp

There’s something magical about writing lisp. Alan Kay called it the greatest single programming language ever designed. Paul Graham called it a secret weapon. You can find tens of thousands of words on the elegant, mind-expanding powers of lisp¹. I don’t think my version of the Lisp wizardry blog post would be particularly original or unique, so if you want to know more about the agony and ecstasy of wielding parentheses, read Paul Graham.

What’s great about Clojure is that while Ruby might be an acceptable lisp, and lisp might not be an acceptable lisp, Clojure is a more than acceptable lisp. If we avoid the minefield of type systems, Clojure addresses the other 4 problems Steve Yegge discusses in the previous link².

2) Immutability

The core data structures in Clojure are immutable. If I define car to be "a dirty van", nothing can ever change that. I can name some other thing car later, but anything referencing that first car will always be referencing "a dirty van".

This is great for a host of reasons. For one, you get parallelization for free — since nothing will mutate your collection, mapping or reducing some function over it can be hadooped out to as many clouds as you want without changing your algorithms.

It’s also much easier to reason about your code. There’s a famous quote by Larry Wall:

[Perl] would prefer that you stayed out of its living room because you weren’t invited, not because it has a shotgun.

He was talking about private methods, but the same is true for mutability in most languages. You call some method and who knows if it mutated a value you were using? You would prefer it not to, but you have no shotgun, and frankly it’s so easy to mutate state without even knowing that you are. Consider Python:

str1 = "My name "
str2 = str1
str1 += "is Max"
print(str1)
# "My name is Max"
print(str2)
# "My name "

list1 = [1, 2, 3]
list2 = list1
list1 += [4, 5]
print(list1)
# [1, 2, 3, 4, 5]
print(list2)
# [1, 2, 3, 4, 5]

Calling += on a string returned a new one, while calling += on a list mutated it in place! I have to remember which types are mutable, and whether += will give me a new object or mutate the existing one depending on its type. Who knows what might happen when you start passing your variables by reference to somewhere else?
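One way to get some of this safety back in Python is to reach for an immutable type, so that += can only rebind the name and never touch the value an alias is looking at. A small sketch of the difference follows; the function names are made up for illustration.

```python
def aliasing_with_list():
    """+= on a list mutates in place, so the alias sees the change."""
    first = [1, 2, 3]
    alias = first
    first += [4, 5]
    return first, alias


def aliasing_with_tuple():
    """+= on a tuple builds a new tuple, so the alias keeps the old value."""
    first = (1, 2, 3)
    alias = first
    first += (4, 5)
    return first, alias
```

In the list version both names still point at the same object after the +=. In the tuple version, alias is left holding (1, 2, 3).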

Not having the choice to mutate state is as liberating as getting rid of your Facebook account.

3) Data first programming

Walking away from object-oriented languages is very freeing.

I want to design a model for the game of poker. I start by listing the nouns³: “card”, “deck”, “hand”, “player”, “dealer”, etc. Then I think of the verbs, “deal”, “bet”, “fold”, etc.

Now what? Here’s a typical StackOverflow question demonstrating the confusion that comes with designing like this. Is the dealer a kind of player or a separate class? If players have hands of cards, how does the deck keep track of what cards are left?

At the end of the day, the work of programming a poker game is codifying all of the actual rules of the game, and these will end up in a Game singleton that does most of the work anyway.

If you start by thinking about data and the functions that operate on it, there’s a natural way to solve hard problems from the top-down, which lets you quickly iterate your design (see below). You have some data structure that represents the game state, a structure representing possible actions a player can take, and a function to transform a game state and an action into the next game state. That function encodes the actual rules of poker (defined in lots of other, smaller functions).
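As a rough illustration of that shape, here is a toy sketch in Python. GameState, Action and step are hypothetical names, and the rules are a drastically simplified betting round rather than real poker. The point is only the signature: a state and an action go in, and a new state comes out.

```python
from typing import NamedTuple, Tuple


class GameState(NamedTuple):
    """Immutable snapshot of a (very simplified) betting round."""
    pot: int
    stacks: Tuple[int, ...]    # chips left per player
    folded: Tuple[bool, ...]   # who has folded
    to_act: int                # index of the player whose turn it is


class Action(NamedTuple):
    kind: str                  # "bet" or "fold"
    amount: int = 0


def step(state: GameState, action: Action) -> GameState:
    """Pure transition: returns the next state, never mutates the input."""
    player = state.to_act
    stacks, folded, pot = list(state.stacks), list(state.folded), state.pot
    if action.kind == "bet":
        if not 0 < action.amount <= stacks[player]:
            raise ValueError("illegal bet")
        stacks[player] -= action.amount
        pot += action.amount
    elif action.kind == "fold":
        folded[player] = True
    else:
        raise ValueError("unknown action: " + action.kind)
    # Advance to the next player who has not folded.
    nxt = player
    for _ in range(len(stacks)):
        nxt = (nxt + 1) % len(stacks)
        if not folded[nxt]:
            break
    return GameState(pot, tuple(stacks), tuple(folded), nxt)
```

Because step never modifies its argument, replaying a hand is just folding a list of actions over a starting state, and every intermediate state stays available for inspection.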

I find this style of programming very natural and satisfying. Of course, you can do this in any language; but I find Clojure draws me towards it, while OO languages push me away from it.

4) Unit Testing

The majority of your code is made up of pure functions. A pure function is one which always gives the same output for a given input. Doesn’t that sound easy to test? Instead of setting up test harnesses, databases, and mocks, you just write tests for your functions.

Testing the edges of your code that talk to the outside world requires mocking, of course, and integration testing is never trivial. But the first thing you want to test is the super-complicated piece of business logic deep in your codebase. The business logic your business depends on: for instance, computing whether your version of OpenSSL is vulnerable to Heartbleed.

Clojure pushes you to make that bit of code a pure function that’s testable without setting up complicated state.
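To make that concrete, here is a hedged Python sketch of such a check written as a pure function. heartbleed_vulnerable is a hypothetical name, not Appcanary code, and it only looks at upstream OpenSSL version strings (1.0.1 through 1.0.1f were affected, 1.0.1g carried the fix). Distro packages backport fixes and pre-release tags exist, so a real check needs more data than this.

```python
import re

# Upstream version strings like "1.0.1f"; pre-release tags are not handled.
_VERSION = re.compile(r"^(\d+)\.(\d+)\.(\d+)([a-z]?)$")


def heartbleed_vulnerable(version):
    """Pure function: same input string, same answer, no I/O."""
    match = _VERSION.match(version)
    if match is None:
        raise ValueError("unrecognised version: " + version)
    major, minor, patch, letter = match.groups()
    if (int(major), int(minor), int(patch)) != (1, 0, 1):
        return False
    # "" (plain 1.0.1) sorts before "a", so it counts as affected too.
    return letter <= "f"


def test_heartbleed_vulnerable():
    """No database, no mocks: just inputs and expected outputs."""
    assert heartbleed_vulnerable("1.0.1")
    assert heartbleed_vulnerable("1.0.1f")
    assert not heartbleed_vulnerable("1.0.1g")
    assert not heartbleed_vulnerable("0.9.8y")
```

The test needs no setup at all, which is the property the section is describing.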

5) Refactoring

Here’s a typical Clojure function:

(defn foo [a b]
  ;; some code here
  (let [c (some-function a b)]
    ;; a ton of
    ;; complicated code here
    ))

In lisp-speak, a parenthesized block is called a “form”. The foo form is the outer form, and it contains the let form, which ostensibly contains other forms that do complicated things.

I know that all the complicated code inside of the let form isn’t going to mutate any state, and that it’s only dependent on the a and b variables. This means that refactoring this code out into its own functions is as trivial as selecting everything between two matching parentheses and cutting and pasting it out. If you have an editor that supports paredit-style navigation of lisp forms, you can rearrange code at lightning speed.

¹ My favourite essay of this ilk is Mark Tarver’s melancholy The Bipolar Lisp Programmer. He describes lisp as a language designed by and for brilliant failures. Back in university, I ate this shit up. My grades were obvious evidence of half the requirement of being a lisp programmer.
² I’m aware that Clojure’s gensym does not a hygienic macro system make. But if you have strong opinions on hygienic macros as they relate to acceptable lisps, this article might not be for you.
³ For the record, I know that this isn’t the “right” way to design OO programs, but the fact that I have to acknowledge this proves my point.

Slippery exceptions in Clojure and Ruby

Recently I spent a couple of hours banging my head against code that looks like this:

(defn parse-file
  [contents]
  (remove nil?
          (code-that throws-an-exception)))

(defn consume-manifest
  [contents kind]
  (try+
    (parse-file contents)

    (catch java.lang.Exception e
      (throw+ {:type ::bad-parse :message "Invalid file."}))))

(defn check
  [file kind]
  (try+
    (let [artifacts (consume-manifest (slurp file) kind)]
      (if (not-empty artifacts)
        … etc

And much to my surprise, I kept getting the kind of exception parse-file generates deep within the check function, right up against (not-empty artifacts).

I’ve grown somewhat used to Clojure exceptions being unhelpful, but this was taking the cake. Coming from Ruby and pretty much every other language, this brushed up rudely against my expectations.

You can tell that exceptions in Clojure are unloved, given how cumbersome handling them natively is. We’d had some trouble in the past getting slingshot to behave properly, so I zeroed in on that. Don’t all exceptions in Java descend from Exception?

Stepping through check in the Cursive debugger, I could see that the exception generated was a plain Java exception, not a slingshot exception generated by throw+ in consume-manifest. This meant that the exception was slipping straight through uncaught. But calling consume-manifest directly in my REPL was causing it to work as intended.

What the hell was going on?

Max took one look at it and set me straight. “Oh. remove is lazy, so the exception isn’t being thrown until the lazy sequence is accessed.”

Excuse me? I had an angry expression on my face. He looked sheepish.

“How else would a lazy data structure work?”

Well. I would expect a catch java.lang.Exception to catch every exception.

“Right, well, hear me out. What if you had the following Ruby?”

def lazy_parse(filename)
  File.open(filename).each_line.each_with_index.lazy.map do |line, i|
    raise "You can't catch me, I'm the exception man" if i == 5
    line
  end
end

def consume_file
  begin
    lazy_parse("Gemfile.lock")
  rescue
    puts "Woops, an exception. Good thing we caught it."
  end
end

file = consume_file
puts file.first(10)

(Did you know that Ruby has had lazy enumerables for almost four years now? Worth reading Shaughnessy as well.)

That shut me up good. And in case you were wondering, the stack trace is also useless in Ruby; there simply isn’t any context for it to preserve. Frankly, I’ve just never had to think about lazy data structures in Rubyland; they’ve not been super popular.

It’s hard to reason about this. I want to write wrapper functions that make my code safe to consume downstream. This isn’t feasible for any functions iterating over potentially infinite lazy sequences, but fortunately for us we need to fit this file into memory anyways. In Ruby we’d have to forcibly iterate over every element of the sequence and check for exceptions, but Clojure makes this easy with doall:

(defn parse-file
  [contents]
  (doall (remove nil?
                 (code-that throws-an-exception))))

And now, things behave as intended.
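The same trap exists with Python generators, which makes for a compact way to see both the problem and the fix side by side. In this sketch, lazy_parse, consume_lazily and consume_eagerly are made-up names, and list() plays the role that doall plays above.

```python
def lazy_parse(lines):
    """Generator: the body does not run until someone iterates the result."""
    for i, line in enumerate(lines):
        if i == 5:
            raise ValueError("raised while iterating, not while calling")
        yield line


def consume_lazily(lines):
    try:
        return lazy_parse(lines)        # nothing has executed yet
    except ValueError:
        return []                       # never reached


def consume_eagerly(lines):
    try:
        return list(lazy_parse(lines))  # forces every element inside the try
    except ValueError:
        return []                       # the exception is caught here
```

consume_lazily hands back a generator that blows up later, wherever the caller happens to iterate it. consume_eagerly realizes the whole sequence inside the try block, so the error is handled where the wrapper intended.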

A Gentle Intro to Datomic

We use Datomic as one of our datastores, and have been really enjoying it so far.

I gave a talk to my local Clojure meetup that provided a gentle introduction to Datomic and highlighted some cool features.

The slides are below:

How being lazy about state management in Clojure caused us downtime

On November 10th, we suffered some downtime as our backend application mysteriously crashed and had to be restarted. After looking at our process monitoring service, I found a very suspicious graph:

Somehow, we managed to spin up more than 30,000 threads right before the application crashed. This was very likely the cause of the failure, but how did it happen?

An easy way to get an idea of where a thread leak is coming from is to look at the thread names. In Java you can do this with jstack -l $PID.

To get a list of all thread names of a Java application sorted by most common name, you can do:

jstack -l $PID | grep daemon | awk '{ print $1 }' | sort | uniq -c | sort -nr

Which on our end yielded something like this:

  30000 "Analytics"
      2 "Datomic
      2 "C2
      1 "worker-4"
      1 "worker-3"
      1 "worker-2"
      1 "worker-1"
      1 "Timer-0"
      1 "Thread-5"
      1 "Thread-4
      1 "Thread-3
      1 "Thread-2
      1 "Thread-1
      1 "Thread-0

Hmm…
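If you would rather do the counting in a script, the same tally can be sketched in a few lines of Python. It assumes, as in the jstack output above, that each thread's header line starts with its quoted name; count_thread_names is a name made up for this example.

```python
import re
from collections import Counter

# A thread header line in a jstack dump begins with the quoted thread name.
THREAD_HEADER = re.compile(r'^"([^"]*)"')


def count_thread_names(jstack_output):
    """Return (name, count) pairs, most common first."""
    names = (
        match.group(1)
        for match in map(THREAD_HEADER.match, jstack_output.splitlines())
        if match
    )
    return Counter(names).most_common()
```

Unlike the awk one-liner, which splits on whitespace and so truncates names containing spaces, this keeps the full quoted name.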

Background

Our backend is written in Clojure and we use Stuart Sierra’s component framework to manage most of our application’s state and lifecycle. Normally this should prevent runaway threads, but unfortunately for us our analytics client’s state was managed independently of the framework. To explain why, I need to first delve a little into how component works.

Regardless of how beautiful and functional it may be, any application that talks to the outside world will need to manage some state representing these external resources. We need to manage our database connections, our clients for external APIs, our background workers, etc.

One way to deal with this in Clojureland is to create a global singleton object for representing each stateful piece, possibly wrapped in an atom. This feels lacking, though. You still need a way to initialize all these singletons on startup, and having mutable singletons everywhere goes against what I would consider good Clojure style.

Component solves this problem by implementing dependency injection in a Clojurelike way. You define a graph that represents each stateful piece, how they depend on each other, and how each piece starts and stops. On system start, component boots each piece in the right order, passing along references to dependencies when they’re needed.

For example, Appcanary’s dependency graph looks (something) like this:

(defn canary-system
  []
  ;;Initialize logging
  (component/system-map
   :datomic (new-datomic (env :datomic-uri))
   :scheduler (new-scheduler)
   :mailer (component/using  (new-mailer (env :mandrill-key))
                             [:datomic :scheduler])
   :web-server (component/using (new-web-server (env :http-ip) (Integer. (env :http-port)) canary-api)
                             [:datomic])))

The mailer depends on datomic and the scheduler, the web server depends on datomic, and both datomic and the scheduler don’t depend on anything.

Like the constructors for all the other components, the new-datomic function returns a record that knows how to start and stop itself. On system start, all the components are started, and the dependencies are filled in.
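The ordering idea can be sketched outside Clojure as a topological sort over the dependency graph. The Python below is an illustration of that idea only, not a description of how component is implemented; start_order and stop_order are hypothetical names, and the dictionary simply mirrors the system map above.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Each key lists the components it depends on, mirroring the system map above.
DEPENDENCIES = {
    "datomic": [],
    "scheduler": [],
    "mailer": ["datomic", "scheduler"],
    "web-server": ["datomic"],
}


def start_order(dependencies):
    """Return component names so every dependency precedes its dependents."""
    return list(TopologicalSorter(dependencies).static_order())


def stop_order(dependencies):
    """Stopping runs in reverse, so dependents shut down first."""
    return list(reversed(start_order(dependencies)))
```

Any order returned by start_order puts datomic before both the mailer and the web server, and the scheduler before the mailer.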

Sometimes component feels like overkill

Component is great, but it didn’t fit our analytics engine use case. We use segment.io to handle our app analytics, and we needed to maintain a client to talk to it. An analytics event can potentially be triggered from anywhere in the app, but it’s cumbersome to pass an analytics client reference to every component, and into every analytics call. If every component depends on something, it feels like maybe it should be a global singleton. Furthermore, I don’t want my components to know much about the analytics client at all; I just want them to know how to trigger events.

What I want to have is an analytics namespace which contains all the events I may want to trigger, and wraps the client inside all of them. This lets me do something like (analytics/user-added-server user) inside of the code that handles server creation.

(One thing to note is that while there is a Clojure segment.io client, it’s based off a 1.x release of the underlying Java library, while we wanted to use features only available in the 2.0 release. Because of that, I wrote an analytics namespace that called the Java library directly.)

The first pass of creating the client looked something like this:

(defonce client
  (.build (Analytics/builder (env :segment-api-key))))

(defn track
  "Wrapper for analytics/track"
  [id event properties {:keys [timestamp] :as options}]
  (when (production?)
    (.enqueue client
              ;; Java interop to build the analytics message here
              )))

There’s only one problem with the above code: the segment api key is loaded from an environment variable.

We deploy Appcanary in a pretty standard way: we compile an uberjar and rsync it to the server. API keys live in environment variables, and the production API key is only going to live on the production server. So, at compile time, we have no way of knowing what the segment API key is. As a result, the analytics client needs to be built at runtime and not compile time in order to have access to the API key.

This is where I get lazy

The obvious thing to do to build the analytics client at runtime is to wrap it as a function:

(defn client
  []
  (.build (Analytics/builder (env :segment-api-key))))

(defn track
  "Wrapper for analytics/track"
  [id event properties {:keys [timestamp] :as options}]
  (when (production?)
    (.enqueue (client)
              ;; Java interop to build the analytics message here
              )))

I saw three downsides here:

We’ll lose some efficiency from not reusing the TCP connection to segment.io
It’s possible that we’ll have to waste a bit of time authenticating the analytics client on each call
We’ll spawn an extra client object per call, which will be garbage collected right away as it goes out of scope immediately after the track call

The above three things aren’t intrinsically bad, and it seemed like optimizing the performance of your analytics engine early on was a wasteful thing to do in a fast-paced startup environment.

What I didn’t consider is that the java library uses ExecutorService to manage a threadpool, and shutdown must be called explicitly. Otherwise, the threads are put on hold instead of being marked for GC (see also this stackoverflow).

Unfortunately, the fact that each client spawns threads that aren’t cleaned up by garbage collection was not documented.

Outcome

Every analytics call we made spawned another threadpool, which caused the thread count to grow proportionally with user activity. We hit 40,000 threads before our application crashed.

TL;DR:

We spawned a new client object on every analytics call, not realizing that the underlying library uses a thread pool that’s not shut down on garbage collection. This is how we ended up killing the server by hitting 40,000 threads.
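For illustration, one general way to keep construction at runtime without building a client per call is to create it lazily on first use and then share it. The Python sketch below shows only that pattern. FakeAnalyticsClient is a hypothetical stand-in for a client that owns a thread pool, and SEGMENT_API_KEY is an assumed variable name. In Clojure, wrapping the constructor in a delay would give a similar build-once effect.

```python
import os
import threading


class FakeAnalyticsClient:
    """Hypothetical stand-in for a client that owns a thread pool."""
    instances = 0  # counts how many clients were ever built

    def __init__(self, api_key):
        FakeAnalyticsClient.instances += 1
        self.api_key = api_key
        self.queue = []

    def enqueue(self, message):
        self.queue.append(message)


_client = None
_client_lock = threading.Lock()


def client():
    """Build the client on first use (at runtime), then reuse it."""
    global _client
    with _client_lock:
        if _client is None:
            # SEGMENT_API_KEY is an assumed variable name for this sketch.
            key = os.environ.get("SEGMENT_API_KEY", "")
            _client = FakeAnalyticsClient(key)
        return _client


def track(user_id, event):
    """Every call goes through the single shared client."""
    client().enqueue({"id": user_id, "event": event})
```

However many times track is called, only one client (and so only one thread pool) is ever created, and the API key is still read at runtime rather than at build time.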

Maven Central Security

The security of your package manager is very important to us at Appcanary, and it’s important to make sure the packages you’re downloading are secure in transit.

Back in the summer of 2014, I discovered that Maven Central wasn’t using TLS or any signature verification when serving up Java packages.

I gave a talk at !!con 2015 about what I did to help convince them to start using encryption.
