Programming

An 8-post collection

Improve Your Security: Port Scan Yourself

Ransomware crews are no longer satisfied with shutting down hospitals and are now also going after hip tech startups. Last week, a professional crew got involved in ransoming MongoDB instances, and some 28,000 servers were compromised and held for Bitcoin.

The “attack” is so simple it barely deserves the name. The crew scans the internet looking for open and unsecured instances of MongoDB. When they find one, they log in and steal the data. The victims must pay Bitcoin to get their data back. To save on storage costs, sometimes the crew just deletes the data without bothering to save it — and the mark ends up paying for nothing. Even honor among thieves can’t survive the harsh realities of 21st century globalized late cyber-capitalism.

When reading about these attacks, there is a strong temptation to blame the victim. Hospitals were at fault because of their antiquated IT departments: huge, sclerotic fiefdoms incapable of moving past Windows 98. We all know that MongoDB sucks, and that Real Programmers would never be caught dead using it. Why, they probably deserved to get hacked!

Don’t do it. As I’ve said before, victim-blaming in security is a trap that must be avoided. Attacks are varied and always changing; best practices can be indistinguishable from cargo cults, and default configurations seem almost designed to foil you.

In this case, MongoDB apparently ships without any authentication, listening to the whole wide world. No wonder so many instances got hacked!

HOWTO not get hacked

Basically, you should have as little as possible accessible to the outside world. Set up a firewall. This tweet sums up the idea best:

If you’re anything like me, the first thought when confronted with that tweet is: “is my firewall actually working right?” Would you bet money that you know what ports are open on a given server in your fleet?

Enter Nmap

Trinity using Nmap, as seen in The Matrix

Nmap is an incredibly powerful tool that’s synonymous with port scanning. A port scanner is a program that checks to see what network ports are open on a computer.

Here’s what scanning appcanary.com with default settings from my work computer looks like:

$ nmap appcanary.com

Starting Nmap 7.12 ( https://nmap.org ) at 2017-01-16 12:57 EST
Nmap scan report for appcanary.com (45.55.197.253)
Host is up (0.030s latency).
Not shown: 993 closed ports
PORT    STATE    SERVICE
22/tcp  open     ssh
25/tcp  filtered smtp
80/tcp  open     http
135/tcp filtered msrpc
139/tcp filtered netbios-ssn
443/tcp open     https
445/tcp filtered microsoft-ds

You’ll see that 22, 80, and 443 are open. This is because we listen on SSH, HTTP and HTTPS. There are also a bunch of filtered ports; this is because some router or firewall between me and appcanary is blocking those ports, so it’s not clear whether they’re open or closed.

Running the same scan from a random1 Digital Ocean box gives me the true result without the intermediate network filtering.

Starting Nmap 5.21 ( http://nmap.org ) at 2017-01-16 13:04 EST
Nmap scan report for appcanary.com (45.55.197.253)
Host is up (0.0012s latency).
Not shown: 997 closed ports
PORT    STATE SERVICE
22/tcp  open  ssh
80/tcp  open  http
443/tcp open  https

Nmap has a variety of scanning options, but the most important thing to understand for TCP scans is the difference between SYN scanning and connect scanning.

If you’ve ever tried to write a port scanner (which is a great exercise in network programming and concurrency), you’ve probably tried to write a connect scanner. A connect scanner tries to connect to a port, and closes the connection after success. Nmap’s most popular mode is SYN scanning. In SYN scanning, you send a SYN packet initiating the TCP handshake. If you get a SYN-ACK back, the port is open, and you send a RST instead of the ACK you would send if you were actually connecting. This is “stealthier”, which doesn’t matter as much for our purposes, and it’s more efficient. Unfortunately, it requires nmap to be run as root because it needs raw access to the network interface.
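Incidentally, a bare-bones connect scanner really is only a handful of lines. Here’s a sketch in Ruby (the host and port range are placeholders, and you should only scan machines you own or have permission to scan):

require 'socket'
require 'timeout'

host = "example.com"   # placeholder: put your own host here

(20..25).each do |port|
  begin
    Timeout.timeout(1) do
      TCPSocket.new(host, port).close   # full TCP handshake, then close it
      puts "#{port}/tcp open"
    end
  rescue Timeout::Error, Errno::ECONNREFUSED, Errno::EHOSTUNREACH
    # closed or filtered
  end
end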

You can specify a SYN scan with nmap -sS, and a connect scan with nmap -sT. By default, nmap will prefer a SYN scan if it is able to (i.e. when it’s running as root).

By default, nmap will scan the top 1000 most common ports. You can specify specific ports or a range with -p, i.e. nmap -p22 will scan only port 22, and nmap -p400-500 will scan ports 400 through 500. nmap -p- will scan the complete port range (1-65535).

Developer, port scan thyself

Regularly port scan yourself; it’s the only way to be certain that your databases aren’t listening to the outside world. Run Nmap against your servers, and make sure that only the ports you expect are open.

To make it easier, here’s a script to do it for you. This will run Nmap, compare the output with predefined ports, and ping you on Slack if there’s a mismatch. Run it on a cron job, so you can check on the regular.
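If you’d rather roll your own, here’s a minimal sketch of the idea in Ruby. The host, expected ports, and Slack webhook URL below are placeholders you’d replace with your own; it shells out to nmap, diffs the open ports against what you expect, and posts to Slack if anything looks off:

require 'net/http'
require 'json'
require 'uri'

HOST           = "example.com"                                    # placeholder
EXPECTED_PORTS = [22, 80, 443]                                     # placeholder
SLACK_WEBHOOK  = URI("https://hooks.slack.com/services/XXX/YYY")   # placeholder

# -sT: connect scan (no root needed); -p-: scan the full port range
output     = `nmap -sT -p- #{HOST}`
open_ports = output.scan(%r{^(\d+)/tcp\s+open}).flatten.map(&:to_i)

unexpected = open_ports - EXPECTED_PORTS
missing    = EXPECTED_PORTS - open_ports

unless unexpected.empty? && missing.empty?
  message = "Port scan of #{HOST}: unexpected open ports #{unexpected.inspect}, " \
            "expected but not open #{missing.inspect}"
  request = Net::HTTP::Post.new(SLACK_WEBHOOK, "Content-Type" => "application/json")
  request.body = { text: message }.to_json
  Net::HTTP.start(SLACK_WEBHOOK.host, SLACK_WEBHOOK.port, use_ssl: true) do |http|
    http.request(request)
  end
end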

How does Appcanary fit in?

Our mission is to help you do security basics right, so you can be free to worry about the hard stuff. We’re building the world’s best patch management product because we had a deep need for one ourselves; it automates a vital part of a security team’s job.

When I started writing this article, I didn’t realize how well it fit into what we’re doing with patch management. To do security right, you need to implement systems and continuously verify they are working correctly. Appcanary helps you verify your patch management program; the script above helps you verify your firewall. Both without any bullshit.

We want to hear from you

Did I convince you to port scan your systems after reading the article? Are you using the script above? Something else? Let me know: max@appcanary.com.


  1. If you’re observant, you may have noticed that this version of nmap is much older than the one on my work computer. If you’re really observant, you may have noticed that it has a vulnerability

Everything you need to know about HTTP security headers

Some physicists 28 years ago needed a way to easily share experimental data and thus the web was born. This was generally considered to be a good move. Unfortunately, everything physicists touch — from trigonometry to the strong nuclear force — eventually becomes weaponized and so too has the Hypertext Transfer Protocol.

What can be attacked must be defended, and since tradition requires all security features to be a bolted-on afterthought, things… got a little complicated.

This article explains what security headers are and how to implement them in Rails, Django, Express.js, Go, Nginx, Apache and Varnish.

Please note that some headers may be best configured on your HTTP server, while others should be set on the application layer. Use your own discretion here. You can test how well you’re doing with Mozilla’s Observatory.
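You can also spot-check a site from a script. Here’s a tiny Ruby sketch that prints the response headers a site actually sends (the URL is a placeholder):

require 'net/http'
require 'uri'

uri = URI("https://example.com/")   # placeholder: point this at your own site

response = Net::HTTP.start(uri.host, uri.port, use_ssl: true) do |http|
  http.head(uri.path)
end

# Eyeball which of the headers discussed below are actually present.
response.each_header { |name, value| puts "#{name}: #{value}" }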

Did we get anything wrong? Contact us at hello@appcanary.com.

HTTP Security Headers


X-XSS-Protection

X-XSS-Protection: 0;
X-XSS-Protection: 1;
X-XSS-Protection: 1; mode=block

Why?

Cross Site Scripting, commonly abbreviated XSS, is an attack where the attacker causes a page to load some malicious javascript. X-XSS-Protection is a feature in Chrome and Internet Explorer that is designed to protect against “reflected” XSS attacks — where an attacker is sending the malicious payload as part of the request1.

X-XSS-Protection: 0 turns it off.
X-XSS-Protection: 1 will filter out scripts that came from the request - but will still render the page
X-XSS-Protection: 1; mode=block when triggered, will block the whole page from being rendered.

Should I use it?

Yes. Set X-XSS-Protection: 1; mode=block. The “filter bad scripts” mechanism is problematic; see here for why.

How?

Platform        What do I do?
Rails 4 and 5   On by default
Django          SECURE_BROWSER_XSS_FILTER = True
Express.js      Use helmet
Go              Use unrolled/secure
Nginx           add_header X-XSS-Protection "1; mode=block";
Apache          Header always set X-XSS-Protection "1; mode=block"
Varnish         set resp.http.X-XSS-Protection = "1; mode=block";

I want to know more

X-XSS-Protection - MDN


Content Security Policy

Content-Security-Policy: <policy>

Why?

Content Security Policy can be thought of as a much more advanced version of the X-XSS-Protection header above. While X-XSS-Protection will block scripts that come from the request, it’s not going to stop an XSS attack that involves storing a malicious script on your server or loading an external resource with a malicious script in it.

CSP gives you a language to define where the browser can load resources from. You can whitelist origins for scripts, images, fonts, stylesheets, etc. in a very granular manner. You can also compare any loaded content against a hash or signature.

Should I use it?

Yes. It won’t prevent all XSS attacks, but it’s a significant mitigation against their impact, and an important aspect of defense-in-depth. That said, it can be hard to implement. If you’re an intrepid reader and went ahead and checked the headers appcanary.com returns2, you’ll see that we don’t have CSP implemented yet. There are some rails development plugins we’re using that are holding us back from a CSP implementation with an actual security impact. We’re working on it, and will write about it in the next instalment!

How?

Writing a CSP policy can be challenging. See here for a list of all the directives you can employ. A good place to start is here.
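As a purely illustrative starting point (the sources here are made up; list the origins your site actually loads assets from), a policy might look like:

Content-Security-Policy: default-src 'none'; script-src 'self'; style-src 'self'; img-src 'self' data:; connect-src 'self'

That reads: load nothing by default, and only allow scripts, stylesheets, images (plus data: URIs), and XHR/fetch requests from my own origin.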

Platform        What do I do?
Rails 4 and 5   Use secureheaders
Django          Use django-csp
Express.js      Use helmet/csp
Go              Use unrolled/secure
Nginx           add_header Content-Security-Policy "<policy>";
Apache          Header always set Content-Security-Policy "<policy>"
Varnish         set resp.http.Content-Security-Policy = "<policy>";

I want to know more


HTTP Strict Transport Security (HSTS)

Strict-Transport-Security: max-age=<expire-time>
Strict-Transport-Security: max-age=<expire-time>; includeSubDomains
Strict-Transport-Security: max-age=<expire-time>; preload

Why?

When we want to securely communicate with someone, we face two problems. The first problem is privacy; we want to make sure the messages we send can only be read by the recipient, and no one else. The other problem is that of authentication: how do we know the recipient is who they say they are?

HTTPS solves the first problem with encryption, though it has some major issues with authentication (more on this later, see Public Key Pinning). The HSTS header solves the meta-problem: how do you know if the person you’re talking to actually supports encryption?

HSTS mitigates an attack called sslstrip. Suppose you’re using a hostile network, where a malicious attacker controls the wifi router. The attacker can disable encryption between you and the websites you’re browsing. Even if the site you’re accessing is only available over HTTPS, the attacker can man-in-the-middle the HTTP traffic and make it look like the site works over unencrypted HTTP. No need for SSL certs, just disable the encryption.

Enter HSTS. The Strict-Transport-Security header solves this by letting your browser know that it must always use encryption with your site. As long as your browser has seen an HSTS header — and it hasn’t expired — it will not access the site unencrypted, and will error out if it’s not available over HTTPS.

Should I use it?

Yes. Your app is only available over HTTPS, right? Trying to browse over regular old HTTP will redirect to the secure site, right? (Hint: Use letsencrypt if you want to avoid the racket that is commercial certificate authorities.)

The one downside of the HSTS header is that it allows for a clever technique to create supercookies that can fingerprint your users. As a website operator, you probably already track your users somewhat - but try to only use HSTS for good and not for supercookies.

How?

The two options are

  • includeSubDomains - HSTS applies to subdomains
  • preload - Google maintains a service that hardcodes3 your site as being HTTPS only into browsers. This way, a user doesn’t even have to visit your site: their browser already knows it should reject unencrypted connections. Getting off that list is hard, by the way, so only turn it on if you know you can support HTTPS forever on all your subdomains.
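For example, Strict-Transport-Security: max-age=31536000; includeSubDomains tells browsers to remember, for one year (max-age is given in seconds), that your site and all of its subdomains must only be reached over HTTPS.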
Platform        What do I do?
Rails 4         config.force_ssl = true
                Does not include subdomains by default. To set it:
                config.ssl_options = { hsts: { subdomains: true } }
Rails 5         config.force_ssl = true
Django          SECURE_HSTS_SECONDS = 31536000
                SECURE_HSTS_INCLUDE_SUBDOMAINS = True
Express.js      Use helmet
Go              Use unrolled/secure
Nginx           add_header Strict-Transport-Security "max-age=31536000; includeSubDomains";
Apache          Header always set Strict-Transport-Security "max-age=31536000; includeSubDomains"
Varnish         set resp.http.Strict-Transport-Security = "max-age=31536000; includeSubDomains";

I want to know more


HTTP Public Key Pinning (HPKP)

Public-Key-Pins: pin-sha256=<base64==>; max-age=<expireTime>;
Public-Key-Pins: pin-sha256=<base64==>; max-age=<expireTime>; includeSubDomains
Public-Key-Pins: pin-sha256=<base64==>; max-age=<expireTime>; report-uri=<reportURI>

Why?

The HSTS header described above was designed to ensure that all connections to your website are encrypted. However, nowhere does it specify what key to use!

Trust on the web is based on the certificate authority (CA) model. Your browser and operating system ship with the public keys of some trusted certificate authorities, which are usually specialized companies and/or nation states. When a CA issues you a certificate for a given domain, that means anyone who trusts that CA will automatically trust the SSL traffic you encrypt using that certificate. The CAs are responsible for verifying that you actually own a domain (this can be anything from sending an email, to asking you to host a file, to investigating your company).

Two CAs can issue a certificate for the same domain to two different people, and browsers will trust both. This creates a problem, especially since CAs can be and are compromised. This allows attackers to MiTM any domain they want, even if that domain uses SSL & HSTS!

The HPKP header tries to mitigate this. This header lets you “pin” a certificate. When a browser sees the header for the first time, it will save the fingerprint. For every subsequent request, until max-age expires, the browser will fail unless at least one certificate in the chain sent from the server has a fingerprint that was pinned.

This means that you can pin to the CA or an intermediate certificate along with the leaf, in order to not shoot yourself in the foot (more on this later).

Much like HSTS above, the HPKP header also has some privacy implications. These were laid out in the RFC itself.

Should I use it?

Probably not.

HPKP is a very very sharp knife. Consider this: if you pin to the wrong certificate, or you lose your keys, or something else goes wrong, you’ve locked your users out of your site. All you can do is wait for the pin to expire.

This article lays out the case against it, and includes a fun way for attackers to use HPKP to hold their victims ransom.

One alternative is using the Public-Key-Pins-Report-Only header, which will just report that something went wrong, but not lock anyone out. This allows you to at least know your users are being MiTMed with fake certificates.

How?

The two options are

  • includeSubDomains - HPKP applies to subdomains
  • report-uri - Invalid pin validation attempts will be reported here

You have to generate a base64 encoded fingerprint for the key you pin to, and you have to use a backup key. Check this guide for how to do it.

Platform        What do I do?
Rails 4 and 5   Use secureheaders
Django          Write custom middleware
Express.js      Use helmet
Go              Use unrolled/secure
Nginx           add_header Public-Key-Pins 'pin-sha256="<primary>"; pin-sha256="<backup>"; max-age=5184000; includeSubDomains';
Apache          Header always set Public-Key-Pins 'pin-sha256="<primary>"; pin-sha256="<backup>"; max-age=5184000; includeSubDomains'
Varnish         set resp.http.Public-Key-Pins = "pin-sha256="<primary>"; pin-sha256="<backup>"; max-age=5184000; includeSubDomains";

I want to know more


X-Frame-Options

X-Frame-Options: DENY
X-Frame-Options: SAMEORIGIN
X-Frame-Options: ALLOW-FROM https://example.com/

Why?

Before we started giving dumb names to vulnerabilities, we used to give dumb names to hacking techniques. “Clickjacking” is one of those dumb names.

The idea goes like this: you create an invisible iframe, place it in focus and route user input into it. As an attacker, you can then trick people into playing a browser-based game while their clicks are being registered by a hidden iframe displaying twitter - forcing them to non-consensually retweet all of your tweets.

It sounds dumb, but it’s an effective attack.

Should I use it?

Yes. Your app is a beautiful snowflake. Do you really want some genius shoving it into an iframe so they can vandalize it?

How?

X-Frame-Options has three modes, which are pretty self-explanatory.

  • DENY - No one can put this page in an iframe
  • SAMEORIGIN - The page can only be displayed in an iframe by someone on the same origin.
  • ALLOW-FROM - Specify a specific url that can put the page in an iframe

One thing to remember is that you can stack iframes as deep as you want, and in that case, the behavior of SAMEORIGIN and ALLOW-FROM isn’t specified. That is, if we have a triple-decker iframe sandwich and the innermost iframe has SAMEORIGIN, do we care about the origin of the iframe around it, or the topmost one on the page? ¯\_(ツ)_/¯.

Platform        What do I do?
Rails 4 and 5   SAMEORIGIN is set by default. To set DENY:
                config.action_dispatch.default_headers['X-Frame-Options'] = "DENY"
Django          MIDDLEWARE = [ ... 'django.middleware.clickjacking.XFrameOptionsMiddleware', ... ]
                This defaults to SAMEORIGIN. To set DENY: X_FRAME_OPTIONS = 'DENY'
Express.js      Use helmet
Go              Use unrolled/secure
Nginx           add_header X-Frame-Options "deny";
Apache          Header always set X-Frame-Options "deny"
Varnish         set resp.http.X-Frame-Options = "deny";

I want to know more


X-Content-Type-Options

X-Content-Type-Options: nosniff;

Why?

The problem this header solves is called “MIME sniffing”, which is actually a browser “feature”.

In theory, every time your server responds to a request it is supposed to set a Content-Type header in order to tell the browser if it’s getting some HTML, a cat gif, or a Flash cartoon from 2008. Unfortunately, the web has always been broken and has never really followed a spec for anything; back in the day lots of people didn’t bother to set the content type header properly.

As a result, browser vendors decided they should be really helpful and try to infer the content type by inspecting the content itself, while completely ignoring the content type header. If it looks like a gif, display a gif, even though the content type is text/html. Likewise, if it looks like we got some HTML, we should render it as such, even if the server said it’s a gif.

This is great, except when you’re running a photo-sharing site, and users can upload photos that look like HTML with javascript included, and suddenly you have a stored XSS attack on your hands.

The X-Content-Type-Options header exists to tell the browser to shut up and set the damn content type to what I tell you, thank you.

Should I use it?

Yes, just make sure to set your content types correctly.
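For example, when serving user-uploaded files from a Rails app, be explicit about the content type instead of letting the browser guess. A sketch (the model and attribute names here are made up):

# Hypothetical controller action serving an uploaded image.
def show
  photo = Photo.find(params[:id])
  # An explicit type plus nosniff means the browser will never "helpfully"
  # render user-supplied bytes as HTML.
  send_data photo.data, type: photo.content_type, filename: photo.filename,
                        disposition: "inline"
end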

How?

Platform        What do I do?
Rails 4 and 5   On by default
Django          SECURE_CONTENT_TYPE_NOSNIFF = True
Express.js      Use helmet
Go              Use unrolled/secure
Nginx           add_header X-Content-Type-Options nosniff;
Apache          Header always set X-Content-Type-Options nosniff
Varnish         set resp.http.X-Content-Type-Options = "nosniff";

I want to know more


Referrer-Policy

Referrer-Policy: "no-referrer" 
Referrer-Policy: "no-referrer-when-downgrade" 
Referrer-Policy: "origin" 
Referrer-Policy: "origin-when-cross-origin"
Referrer-Policy: "same-origin" 
Referrer-Policy: "strict-origin" 
Referrer-Policy: "strict-origin-when-cross-origin" 
Referrer-Policy: "unsafe-url"

Why?

Ah, the Referer header. Great for analytics, bad for your users’ privacy. At some point the web got woke and decided that maybe it wasn’t a good idea to send it all the time. And while we’re at it, let’s spell “Referrer” correctly4.

The Referrer-Policy header allows you to specify when the browser will set a Referer header.

Should I use it?

It’s up to you, but it’s probably a good idea. If you don’t care about your users’ privacy, think of it as a way to keep your sweet sweet analytics to yourself and out of your competitors’ grubby hands.

Set Referrer-Policy: "no-referrer"

How?

Platform        What do I do?
Rails 4 and 5   Use secureheaders
Django          Write custom middleware
Express.js      Use helmet
Go              Write custom middleware
Nginx           add_header Referrer-Policy "no-referrer";
Apache          Header always set Referrer-Policy "no-referrer"
Varnish         set resp.http.Referrer-Policy = "no-referrer";

I want to know more


Set-Cookie: <key>=<value>; Expires=<expiryDate>; Secure; HttpOnly; SameSite=strict

This isn’t a security header per se, but there are three different options for cookies that you should be aware of.

  • Cookies marked as Secure will only be served over HTTPS. This prevents someone from reading the cookies in a MiTM attack where they can force the browser to visit a given page.

  • HttpOnly is a misnomer, and has nothing to do with HTTPS (unlike Secure above). Cookies marked as HttpOnly can not be accessed from within javascript. So if there is an XSS flaw, the attacker can’t immediately steal the cookies.

  • SameSite helps defend against Cross-Site Request Forgery (CSRF) attacks. This is an attack where a different website the user may be visiting tricks their browser into making a request against your site, i.e. by including an image to make a GET request, or using javascript to submit a form for a POST request. Generally, people defend against this using CSRF tokens. A cookie marked as SameSite won’t be sent to a different site.

It has two modes, lax and strict. Lax mode allows the cookie to be sent in a top-level navigation for GET requests (i.e. if you clicked a link). Strict doesn’t send the cookie on any cross-site request.

You should absolutely set Secure and HttpOnly. Unfortunately, as of writing, SameSite cookies are available only in Chrome and Opera, so you may want to ignore them for now.
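For example, if you’re setting a cookie by hand in a Rack-based Ruby app, these flags are just options on the cookie (a sketch; the cookie name and value are made up):

# Inside a Rack::Response (or Sinatra/Rails response) -- hypothetical cookie:
response.set_cookie("session_id",
  value:    "abc123",
  expires:  Time.now + 3600,
  secure:   true,    # only ever sent over HTTPS
  httponly: true)    # invisible to document.cookie in JavaScript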

Platform        What do I do?
Rails 4 and 5   Secure and HttpOnly enabled by default. For SameSite, use secureheaders
Django          Session cookies are HttpOnly by default. To set secure: SESSION_COOKIE_SECURE = True.
                Not sure about SameSite.
Express.js      cookie: { secure: true, httpOnly: true, sameSite: true }
Go              http.Cookie{Name: "foo", Value: "bar", HttpOnly: true, Secure: true}
                For SameSite, see this issue.
Nginx           You probably won’t set session cookies in Nginx
Apache          You probably won’t set session cookies in Apache

Thanks to @wolever for python advice.

Thanks to Guillaume Quintard for the Varnish commands.


  1. This is opposed to “stored” XSS attacks, where the attacker is storing the malicious payload somehow, i.e. in a vulnerable comment field of a message board. 

  2. If you’re going to point out in the Hacker News comments that this blog itself gets an F from the Mozilla observatory, you’re right! On the other hand, it’s serving static content, and we are comfortable avoiding XSS protection and strict SSL enforcement for static content. That, and it’s served by github pages/cloudflare, so it’s hard to get very granular about the headers we want set. 

  3. So if you’re especially paranoid, you might be thinking “what if I had some secret subdomain that I don’t want leaking for some reason?” You have DNS zone transfers disabled, so someone would have to know what they’re looking for to find it, but now that it’s in the preload list… 

  4. The Referer header is the Hampster Dance in that it’s notorious for being misspelled. It would break the web to try to backport the correct spelling, so instead the W3C decided to go for the worst of both worlds and spell it correctly in Referrer-Policy

Good News: Ubuntu Now Ships With unattended-upgrades On By Default!

Last week, we got a strange support request. One of our users had received the following notification:

Hey! Good job.

We’ve detected that you patched some vulnerabilities.

Here’s what changed:

CVE-2016-8704

is no longer present in:

[name of server redacted]

This came as a surprise, since they knew for a fact that no one had touched the package in question, and they were certain they had not enabled unattended upgrades.

Somehow, the vulnerability magically got patched and they wanted to know: what’s going on?

The vuln is a pretty serious remote code execution vulnerability in memcached, and as far as we could tell our user was indeed using the most recent version available for their distribution — 1.4.25-2ubuntu2.1. This version was released on November 3rd, and we could see from our logs that memcached got upgraded that same day.

How did it happen without them knowing about it? The only thing unique about their configuration was that they’re running the recently released Ubuntu 16.10 (Yakkety Yak)1.

We dug around, set up some test Yakkety boxes, and lo and behold: unattended-upgrades is enabled by default!

For those of you who are unaware, unattended-upgrades is a debian/ubuntu package that, well, does what it says on the tin: it automatically upgrades your packages. The most common configuration, and the one enabled in 16.10, is to upgrade any packages that have a published security patch. Unattended-upgrades does this by checking for and installing any updates from the ${distro_codename}-security repository.

Ubuntu/debian has had this for years, but it simply was never turned on by default. After a year of many security fails, this news warmed the cockles of my heart and gave me hope for our future! And what’s even more amazing is that they turned it on without any fanfare.

It’s the quiet, simple changes that provide the biggest wins.

Of course, there are reasons why administrators don’t always want software to be upgraded without their input. And if it does get updated, there are good reasons for knowing exactly what vulnerabilities are being patched, and when. Appcanary exists to let you be notified about security updates without automatically installing them, and to give you insight into what’s being installed if you are patching automatically.

But if you don’t have the capacity to actively manage the packages on your linux systems (and even if you do!), we implore you: set up unattended-upgrades!

Ubuntu enabling this by default is a great sign for the future.

Not running Ubuntu 16.10?

Here’s how to turn on unattended-upgrades:

  • Ansible: jnv.unattended-upgrades
  • Puppet: puppet/unattended_upgrades
  • Chef: apt
  • If you’re using the server interactively:

    sudo apt-get install unattended-upgrades && sudo dpkg-reconfigure unattended-upgrades

  • Set up manually: sudo apt-get install unattended-upgrades and

    • In /etc/apt/apt.conf.d/20auto-upgrades:

      APT::Periodic::Update-Package-Lists "1";
      APT::Periodic::Unattended-Upgrade "1";
      
    • In /etc/apt/apt.conf.d/50unattended-upgrades

      // Automatically upgrade packages from these (origin, archive) pairs
      Unattended-Upgrade::Allowed-Origins {    
      // ${distro_id} and ${distro_codename} will be automatically expanded
          "${distro_id} ${distro_codename}-security";
      };
      
      // Send email to this address for problems or packages upgrades
      // If empty or unset then no email is sent, make sure that you 
      // have a working mail setup on your system. The package 'mailx'
      // must be installed or anything that provides /usr/bin/mail.
      //Unattended-Upgrade::Mail "root@localhost";
      
      // Do automatic removal of new unused dependencies after the upgrade
      // (equivalent to apt-get autoremove)
      //Unattended-Upgrade::Remove-Unused-Dependencies "false";
      
      // Automatically reboot *WITHOUT CONFIRMATION* if a 
      // the file /var/run/reboot-required is found after the upgrade 
      //Unattended-Upgrade::Automatic-Reboot "false";
      

  1. 16.10 is not a Long Term Support release. Regular Ubuntu releases are supported for 9 months, while April releases on even years (i.e. 14.04, 16.04, etc…) are designated LTS, and are supported for 5 years. It’s thus more common to see 12.04, 14.04, and 16.04 in use on servers over other Ubuntu releases. This particular user has a good reason for running 16.10. 

We Left Clojure. Here's 5 Things I'll Miss.

On October 11th, Appcanary relied on about 8,500 lines of clojure code. On the 12th we were down to zero. We replaced it by adding another 5,700 lines of Ruby to our codebase. Phill will be discussing why we left, and what we learned both here and at this year’s RubyConf. For now, I want to talk about what I’ll miss.

1) The joy of Lisp

XKCD #297

There’s something magical about writing lisp. Alan Kay called it the greatest single programming language ever devised. Paul Graham called it a secret weapon. You can find tens of thousands of words on the elegant, mind-expanding powers of lisp1. I don’t think my version of the Lisp wizardry blog post would be particularly original or unique, so if you want to know more about the agony and ecstasy of wielding parentheses, read Paul Graham.

What’s great about Clojure is that while Ruby might be an acceptable lisp, and lisp might not be an acceptable lisp, Clojure is a more than acceptable lisp. If we avoid the minefield of type systems, Clojure addresses the other 4 problems Steve Yegge discusses in the previous link2.

2) Immutability

The core data structures in clojure are immutable. If I define car to be "a dirty van", nothing can ever change that. I can name some other thing car later, but anything referencing that first car will always be referencing "a dirty van".

This is great for a host of reasons. For one, you get parallelization for free — since nothing will mutate your collection, mapping or reducing some function over it can be hadooped out to as many clouds as you want without changing your algorithms.

It’s also much easier to reason about your code. There’s a famous quote by Larry Wall:

[Perl] would prefer that you stayed out of its living room because you weren’t invited, not because it has a shotgun.

He was talking about private methods, but the same is true for mutability in most languages. You call some method and who knows if it mutated a value you were using? You would prefer it not to, but you have no shotgun, and frankly it’s so easy to mutate state without even knowing that you are. Consider Python:

str1 = "My name "
str2 = str1
str1 += "is Max"
print str1
# "My name is Max"
print str2
# "My name"

list1 = [1, 2, 3]
list2 = list1
list1 += [4, 5]
print list1
# [1, 2, 3, 4, 5]
print list2
# [1, 2, 3, 4, 5]

Calling += on a string returned a new one, while calling += on a list mutated it in place! I have to remember which types are mutable, and whether += will give me a new object or mutate the existing one depending on its type. Who knows what might happen when you start passing your variables by reference to somewhere else?

Not having the choice to mutate state is as liberating as getting rid of your Facebook account.

3) Data first programming

Walking away from object-oriented languages is very freeing.

I want to design a model for the game of poker. I start by listing the nouns3: “card”, “deck”, “hand”, “player”, “dealer”, etc. Then I think of the verbs, “deal”, “bet”, “fold”, etc.

Now what? Here’s a typical StackOverflow question demonstrating the confusion that comes with designing like this. Is the dealer a kind of player or a separate class? If players have hands of cards, how does the deck keep track of what cards are left?

At the end of the day, the work of programming a poker game is codifying all of the actual rules of the game, and these will end up in a Game singleton that does most of the work anyway.

If you start by thinking about data and the functions that operate on it, there’s a natural way to solve hard problems from the top-down, which lets you quickly iterate your design (see below). You have some data structure that represents the game state, a structure representing possible actions a player can take, and a function to transform a game state and an action into the next game state. That function encodes the actual rules of poker (defined in lots of other, smaller functions).

I find this style of programming very natural and satisfying. Of course, you can do this in any language; but I find Clojure draws me towards it, while OO languages push me away from it.

4) Unit Testing

In Clojure, the majority of your code is made up of pure functions. A pure function is one which always gives the same output for a given input — doesn’t that sound easy to test? Instead of setting up test harnesses, databases, and mocks, you just write tests for your functions.

Testing the edges of your code that talk to the outside world requires mocking, of course, and integration testing is never trivial. But the first thing you want to test is the super-complicated piece of business logic deep in your codebase. The business logic your business depends on, like for instance computing whether your version of OpenSSL is vulnerable to HeartBleed.

Clojure pushes you to make that bit of code a pure function that’s testable without setting up complicated state.

5) Refactoring

Here’s a typical clojure function:

(defn foo [a b]
  ;; some code here
  (let [c (some-function a b)]
    ;; a ton of
    ;; complicated code here
    ))

In lisp-speak, a parenthesized block is called a “form”. The foo form is the outer form, and it contains the let form, which ostensibly contains other forms that do complicated things.

I know that all the complicated code inside of the let form isn’t going to mutate any state, and that it’s only dependent on the a and b variables. This means that refactoring this code out into its own functions is as trivial as selecting everything between two matching parentheses and cutting and pasting it out. If you have an editor that supports paredit-style navigation of lisp forms, you can rearrange code at lightning speed.


  1. My favourite essay of this ilk is Mark Tarver’s melancholy The Bipolar Lisp Programmer. He describes lisp as a language designed by and for brilliant failures. Back in university, I ate this shit up. My grades were obvious evidence of half the requirement of being a lisp programmer. 

  2. I’m aware that clojure’s gensym does not a hygienic macro system make. But, if you have strong opinions on hygienic macros as they relate to acceptable lisps, this article might not be for you. 

  3. For the record, I know that this isn’t the “right” way to design OO programs, but the fact that I have to acknowledge this proves my point. 

Slippery exceptions in Clojure and Ruby

Recently I spent a couple of hours banging my head against code that looks like this:

(defn parse-file
  [kind contents]
  (remove nil?
          (code-that throws-an-exception)))

(defn consume-manifest
  [contents kind]
  (try+
    (parse-file kind contents)

    (catch java.lang.Exception e
      (throw+ {:type ::bad-parse :message "Invalid file."}))))

(defn check
  [file kind]
  (try+
    (let [artifacts (consume-manifest (slurp file) kind)]
      (if (not-empty artifacts)
         etc

And much to my surprise, I kept getting the kind of exception parse-file generates deep within the check function, right up against (not-empty artifacts).

I’ve grown somewhat used to Clojure exceptions being unhelpful, but this was taking the cake. Coming from Ruby and pretty much every other language, this brushed up rudely against my expectations.

You can tell that exceptions in Clojure are unloved, given how cumbersome handling them natively is. We’d had some trouble in the past getting slingshot to behave properly, so I zeroed in on it. Don’t all exceptions in Java descend from Exception?

Stepping through check in the Cursive debugger, I could see that the exception generated was a pure java exception, not a slingshot exception generated by throw+ in consume-manifest. This meant that the exception was slipping straight through uncaught. But calling consume-manifest directly in my repl was causing it to work as intended.

What the hell was going on?

Max took one look at it and set me straight. “Oh. remove is lazy, so the exception isn’t being thrown until the lazy sequence is accessed.”

Excuse me? I had an angry expression on my face. He looked sheepish.

“How else would a lazy data structure work?”

Well. I would expect a catch java.lang.Exception to catch every exception.

“Right, well, hear me out. What if you had the following Ruby?”

def lazy_parse(filename)
  File.open(filename).each_line.each_with_index.lazy.map do |line, i|
    raise "You can't catch me, I'm the exception man" if i == 5
    line
  end
end

def consume_file
  begin
    lazy_parse("Gemfile.lock")
  rescue
    puts "Woops, an exception. Good thing we caught it."
  end
end

file = consume_file
puts file.first(10)

(Did you know that Ruby has had lazy enumerables for almost four years now? Worth reading Shaughnessy as well.)

That shut me up good. And in case you were wondering, the stack trace is also useless in Ruby; there simply isn’t any context for it to preserve. Frankly, I’ve just never had to think about lazy data structures in Rubyland; they’ve not been super popular.

It’s hard to reason about this. I want to write wrapper functions that make my code safe to consume downstream. This isn’t feasible for any functions iterating over potentially infinite lazy sequences, but fortunately for us we need to fit this file into memory anyways. In Ruby we’d have to forcibly iterate over every element of the sequence and check for exceptions, but Clojure makes this easy with doall:

(defn parse-file
  [contents]
  (doall (remove nil?
                 (code-that throws-an-exception))))

And now, things behave as intended.
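For the record, the Ruby example above has an analogous fix: force the lazy enumerator before it escapes the method that’s supposed to be safe, e.g. with Enumerator::Lazy#force (a sketch):

def lazy_parse(filename)
  File.open(filename).each_line.each_with_index.lazy.map do |line, i|
    raise "You can't catch me, I'm the exception man" if i == 5
    line
  end.force   # realize the sequence here, so the raise happens inside
              # consume_file's begin/rescue instead of at file.first(10)
end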