How To Write Code
That Doesn't Suck

wry observations from the deep end of the software cesspool


Stripe CTF, minimalist solution edition

I'd heard of the previous rounds of Stripe's Capture The Flag coding competition but had never looked into them in detail. This time a friend of mine was giving it a go, so I decided to play along. I had no vision of winning, so to keep the effort time-boxed I went for the most minimalist solution I could envision for each problem. I only made it to level 3 and then stalled out, mostly because I'm unfamiliar with Scala, so implementing my envisioned solution would have required a significant amount of time. I also wasted a significant chunk of time on level 1 (probably more than I spent coding all the other levels combined) trying to pull off what I assumed was an invited hack before trying a straightforward solution. Overall I found it a lot of fun and would encourage anyone interested in software to give it a go next time.

level 0

In this level we're given a short ruby script that "highlights" (by adding angle brackets) all of the words in an input text which aren't found in a dictionary. The script works but is slow, as it simply reads the dictionary into an array and does a naive linear search of that array for each word in the input. My solution was trivial: read the dictionary into a hash instead, and check for a key. It could have been just two changed lines if I had bothered to remember a terser array-to-hash technique.

 path = ARGV.length > 0 ? ARGV[0] : '/usr/share/dict/words'
-entries ="\n")
+entries = {}
+File.readlines(path).each do |entry|
+  entries[entry.chomp] = true
+end
 contents = $
 output = contents.gsub(/[^ \n]+/) do |word|
-  if entries.include?(word.downcase)
+  if entries.has_key? word.downcase
 print output

Result scored 117/50 required points.

level 1

On this level I wasted a bunch of time trying a hack rather than coding a solution. The task is to submit a git commit with a hash "lexicographically less than the value contained in the repository's difficulty.txt file". The fact that difficulty.txt was a local file rather than some constant or stored on a server seemed to me to be asking for a hack like submitting a difficulty.txt file with a value like "ffffff" in it. I could not make that work, but I'd be interested to hear from anyone who did. After giving up on the hack it took about 5 minutes on Google and Stack Overflow to find this utility:

beautify_git_hash: Beautify the Git commit hash! This is a little useless toy inspired by BitCoin's "proof of work" concept. It enables you to modify your Git commit to enforce a certain prefix on the Git commit hash.

I cloned this into my project and tested it by hand to see that it worked. Since it beautifies un-pushed commits via an amend, I changed prepare_index() in the miner to do a commit instead of an add, then simply replaced the default implementation's solve routine with a call to beautify_git_hash. The high-level loop didn't change at all, so here's the guts of my code:

prepare_index() {
    perl -i -pe 's/($ENV{public_username}: )(\d+)/$1 . ($2+1)/e' LEDGER.txt
    grep -q "$public_username" LEDGER.txt || echo "$public_username: 1" >> LEDGER.txt
    git commit LEDGER.txt -m 'Give me a Gitcoin'
}

solve() {
    fix=`../beautify_git_hash 000000`
    bash -c "$fix"
}

I fired up the miner and in a couple minutes I'd mined a gitcoin, which gave me a result of 50/50. I didn't have to improve the performance of the beautify_git_hash script at all, so credit for this win goes to the original author, Volker Grabsch.

level 2

Hey, finally a node level, how exciting! Well, not really--this turned out to be just a rehash of level 0. In this level we have to create a proxy that can blacklist attackers based on IP. A full implementation is provided with one key function missing: currently_blacklisted(ip). The obvious implementation proved good enough, after a couple of quick tests to figure out the right constant for detecting a "bad" number of requests.

var blacklist = {};
function currently_blacklisted(ip) {
  if (!(ip in blacklist)) blacklist[ip] = 0;
  blacklist[ip] += 1;
  return blacklist[ip] > 4;
}
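For illustration, here's the same function with a quick check of the threshold behavior (my harness, not part of the level's test suite): the fifth request from an IP is the first one rejected.

```javascript
var blacklist = {};
function currently_blacklisted(ip) {
  // count requests per ip; start rejecting once the count exceeds 4
  if (!(ip in blacklist)) blacklist[ip] = 0;
  blacklist[ip] += 1;
  return blacklist[ip] > 4;
}

var results = [];
for (var i = 0; i < 5; i++) results.push(currently_blacklisted(""));
console.log(results); // [ false, false, false, false, true ]
```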

This got me a result of 95/85. I actually spent a little more time on this later and was able to get higher scores during local testing, but never got a positive score on submitting them, so I suspect something was borked in the test environment.


a simple canonical port numbering scheme for web services

My work in the past few years has involved producing quite a few web services. Some are public facing but many are middle-tier type things accessed by other services or REST back-ends for client apps. Almost all have leveraged some sort of framework such as rails or django. Each such service must be bound to a port to be accessed, and since binding to the standard http port (80) requires root privileges most frameworks have some higher default port they run on during development. For example rails defaults to 3000, django to 8000, dropwizard uses 8080, etc. This works well enough when throwing together a single service, but what if you need a bunch of these running at once? Each framework lets you configure alternate ports, but what values should you actually use?

The answer to that question is really a function of how the services will be used and maintained. If you're part of a team developing multiple services that need to work together you will have to agree on some sort of standard, or pick arbitrary numbers and simply record them in a central place. In practice that "standard" may amount to little more than "we've been using rails, so our first service is 3000, our next is 3001, etc.". For a variety of reasons, including preserving your sanity, I suggest a somewhat more formal approach--a canonical port numbering scheme based on the name of your service.

A service's name might not be particularly well defined in all cases, but usually there is some simple short name that a team refers to a service by, and that's the one you should start with. The name of the project, app, or source code repository are all reasonable candidates. There are a variety of ways to turn that name into a number (e.g. a hash like CRC-16), but I've come to prefer simply interpreting the first few characters of the service name as a base-32 encoded number (base-32 is like hex but goes from 0 to v instead of just 0 to f). This approach typically ensures unique ports as long as you pick service names that don't start with the same three letters. It also allows you to guess the service from the port. Here's an example--assume you have a service named api. Interpreted as a base-32 number that's 11058. With a couple of tweaks to account for the valid port range, restricted ports, and characters that aren't valid base-32 digits you have a complete solution. Here's an example algorithm in javascript:
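The following is my sketch of such an algorithm (function names and the exact clamping rule are my assumptions, chosen to reproduce the example ports below): interpret the first three characters as base-32 digits, right-pad short names with '0', clamp the out-of-range letters w-z down to 'v', and add an offset of 1024 to stay clear of the privileged/restricted port range.

```javascript
var DIGITS = "0123456789abcdefghijklmnopqrstuv"; // the base-32 digit alphabet
var OFFSET = 1024; // shift past the privileged/restricted port range

function servicePort(name) {
  // take the first three characters, right-padding short names with '0'
  var chars = name.toLowerCase().slice(0, 3).split("");
  while (chars.length < 3) chars.push("0");
  var value = 0;
  chars.forEach(function (c) {
    var d = DIGITS.indexOf(c);
    if (d < 0) d = (c >= "w" && c <= "z") ? 31 : 0; // clamp w-z to 'v', others to '0'
    value = value * 32 + d;
  });
  return value + OFFSET;
}

function portName(port) {
  // invert the mapping to guess a service's name prefix from its port
  var value = port - OFFSET, name = "";
  for (var i = 0; i < 3; i++) {
    name = DIGITS[value % 32] + name;
    value = Math.floor(value / 32);
  }
  return name;
}

console.log(servicePort("api")); // 12082
console.log(portName(12082));    // api
```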

Here are a few examples of service names, the ports they yield, and what you get from converting the port number back to a string:

api         => 12082 => api...
db          => 14688 => db0... right padded with 0
foo-bar     => 17176 => foo...
aardvark    => 11611 => aar...
alligator   => 11957 => all... note that lexical sort is preserved
bee         => 12750 => bee...
cat         => 13661 => cat...
caterpillar => 13661 => cat... collides with cat
dog         => 15120 => dog...
elephant    => 16046 => ele...
vole        => 33557 => vol...
zebra       => 33227 => veb... z isn't a valid digit

Turns out this scheme easily avoids most of the common "well-known" ports, as they mostly decode to strings you'd never think to start a service name with. Some examples:

                6379 => 67b... redis
               27017 => qc9... mongodb
                3000 => 2to... rails default
                8000 => 7q0... django default
                8080 => 7sg... dropwizard et al
                2195 => 24j... apple push notifications

While just having a sane and consistent way of picking port numbers may be its own reward, this approach is actually an important component of the automated service deployment tools I've been working with for the past year. I hope to explore these further in an upcoming post, but for example consider that well-chosen service names can be part of a domain name. You can leverage this into a "config-less" approach to inter-service communication: the api service could simply assume the database service it should talk to will be found at port 14688 under the corresponding domain name. Initially the two services could be deployed on one box pointed to by both domain names. If the load later increases and the db service is moved to another box, there's no need to update the config for the api service.


Self-assigning sane domain names to EC2 Instances

Here's a quick background on the goal of this effort.

  • AWS is organized into several regions, each totally separate from the others
  • we want to be able to deploy software in any region
  • in each region we will create an EC2 instance from which to create other instances and deploy software
  • by default instances have an unfriendly public domain name e.g.
  • we want that instance to have an easy to remember domain name, e.g.
  • we want that domain name to be assigned programmatically from the instance itself

Understanding this journey will probably be clearest if we start with a peek at the end. Domain names can be assigned via Route53, which has an API. We're writing all our stuff for node.js so here's the Javascript code for the final request we want to make to Route53.

var AWS = require("aws-sdk");

AWS.config.update({
  region: "us-west-2",
  credentials: {
    secretAccessKey: "XxXxxXxxx1xXxXXXxxXXX/XXXxx1XxxXx1XxXxXX",
    sessionToken: "XXxXX ...400 more characters... XxxX="
  }
});

AWS.Route53.changeResourceRecordSets({
  "HostedZoneId": "/hostedzone/Z15D5CDO6LK4GG",
  "ChangeBatch": {
    "Changes": [ {
        "Action": "CREATE",
        "ResourceRecordSet": {
          "Name": "",
          "Type": "CNAME",
          "TTL": 300,
          "ResourceRecords": [ {
              "Value": ""
          } ]
        }
    } ]
  }
}, console.log);

Most of this is boilerplate; the interesting parts are the region and credentials config values, and the HostedZoneId, Name (our desired domain name), and Value (the automatically assigned domain name) parameters to Route53. To make this work, all we have to do is find a way for the instance to learn each of these things.

Item 1: the new domain name

This is going to be composed of 3 pieces: the fixed string "deploy", the name of the region the instance is running in (will be "us-west-2" in these examples), and our organization's domain, "". There are actually a lot of ways for EC2 instances to know their region (some hackier than others) so I'll focus on the other two fields for now. My first thought was "hey, can't you assign tags to EC2 instances when you launch them? I'll just add a tag that says what domain to use." This was easy to try out, and my tags show up nicely in the EC2 console, but how is code running on the instance going to access the tags?

Digression 1: EC2 instance meta-data

My naive hope that tags were somehow magically present on the instance as environment variables or in a config file was quickly dashed. However I knew that instances had access to some meta-data about themselves via making HTTP requests to a magic IP address ( "Oh cool," I thought, "tags will be in there somewhere."

Digression 1.1: meta-data paths

The data available through this API is organized hierarchically, that is if you fetch a URL like /latest/dynamic/ it returns a list of "sub-folders" like instance-identity/, and you can then ask for /latest/dynamic/instance-identity/ and so on. The interesting bits are actually buried pretty deep, for example region is inside a JSON document found at /latest/dynamic/instance-identity/document (you can find the availability-zone in there too, but that's also available as a plain string at /latest/meta-data/placement/availability-zone). There are command line tools available that allow access to the more commonly used values, and a variety of existing modules for node as well. None of them seemed to include tags, and I wasn't sure where those might be, so I thought "I'll write a module that starts at the root and spiders its way down!". Exactly how much code-that-sucks was involved in that effort will require another blog post, but the result was a new module: ec2-instance-data. Read the readme for details, but the basic idea is you get an object that has all of the instance's meta-data as nested objects, so you can then get the region via something like

var region = instance.latest.dynamic.instanceIdentity.document.region;

Ok, where were we? Oh, right, trying to get the value of an instance's tags from the meta-data. You can probably guess the next part: they're not in there. Well how do you get them? Googling led me to this Stack Overflow question and from the answers I discovered there's an API for that: DescribeTags. Amazon recently released an "official" node module, aws-sdk, which includes this API, so I thought, hey cool, I can use that. Except, as pointed out in some comments on Stack Overflow, you need credentials to call that API.

Digression 2: IAM Roles for EC2 Instances

Hey, didn't I read something a while ago about automagically giving EC2 instances permissions to call AWS APIs? Why yes I did, it's called IAM Roles for EC2 Instances. In short an IAM Role is a named group of permissions (a "policy" in AWS-speak), and you can specify one when you create an instance. So I did the obvious, created a role with permissions to call DescribeTags (and the Route53 APIs), named it "my-role" and assigned it to my instance. (Well I would have, except you can only assign roles when you launch an instance, so I actually terminated my instance and started over with a new one.) Ok, goody, now I can automagically call the DescribeTags API right? Um no, at least not with the AWS SDK for Node. The AWS SDK for Ruby gets some IAM love however:

If the client constructor does not find credentials in AWS.config, or in the environment, it retrieves temporary credentials that have the same permissions as those associated with the IAM role. The credentials are retrieved from the Instance Meta Data Service (IMDS).
Who has two thumbs and just wrote a client for the "IMDS" (and no, it's not called that anywhere else)? This guy. Which led to the following code:

var instance = require("ec2-instance-data");
var AWS = require("aws-sdk");

instance.init(function (error, metadata) {
    var credentials = new AWS.Credentials(metadata.iamSecurityCredentials());
    AWS.config.update({ credentials: credentials, region: metadata.region() });
    var ec2 = new AWS.EC2.Client();
    ec2.describeTags({}, console.log);
});

Which produces a whole ton of output because it gets all the tags for every instance you have in the region, not just the current instance's tags. To get just those you have to add a filter to the describeTags() call. Which leads me to a classic moment of "how-many-names-for-the-same-thing-can-you-work-into-a-single-function-call":

ec2.describeTags({
    Filters: [ {
      Name: "resource-id", // name 1
      Values: [
        data["latest"]["meta-data"]["instance-id"] // name 2
      ]
    } ]
}, console.log);

{ Tags: [
     { ResourceId: 'i-528ed860', // name 3!!!
       ResourceType: 'instance',
       Key: 'Name',
       Value: 'deploy-test' },
     { ResourceId: 'i-528ed860',
       ResourceType: 'instance',
       Key: 'HostedZone',
       Value: '' },
     { ResourceId: 'i-528ed860',
       ResourceType: 'instance',
       Key: 'purpose',
       Value: 'deploy' } ] }

Which actually has the tags! A quick transform makes it saner:

  ec2.describeTags(..., function (err, response) {
    var tags = {};
    response.Tags.forEach(function (tag) { tags[tag.Key] = tag.Value; });
    console.log(JSON.stringify(tags, null, "  "));
  });


  "Name": "deploy-test",
  "HostedZone": "",
  "purpose": "deploy"

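The transform itself is easy to check in isolation; this snippet just inlines a sample DescribeTags response shaped like the output above:

```javascript
// sample DescribeTags response, trimmed from the output shown above
// (the HostedZone value is redacted in the post, so it's blank here too)
var response = { Tags: [
  { ResourceId: 'i-528ed860', ResourceType: 'instance', Key: 'Name',       Value: 'deploy-test' },
  { ResourceId: 'i-528ed860', ResourceType: 'instance', Key: 'HostedZone', Value: '' },
  { ResourceId: 'i-528ed860', ResourceType: 'instance', Key: 'purpose',    Value: 'deploy' }
] };

// flatten the Key/Value pairs into a plain object
var tags = {};
response.Tags.forEach(function (tag) { tags[tag.Key] = tag.Value; });
console.log(JSON.stringify(tags, null, "  "));
```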
Ok, making progress. Back to our domain name, we can now get the "deploy" part from the "purpose" tag, the domain from the "HostedZone" tag, and we've already pulled the region from the meta-data. Putting them all together yields

var ResourceRecordSetName = [tags.purpose, metadata.region()].concat(HostedZone.split('.')).join('.');

Item 2: AWS Credentials

I actually already had to do this in order to call describeTags, but glossed over it a bit above. The magic is in metadata.iamSecurityCredentials(), which is defined in ec2-instance-data. The credentials can be found at /latest/meta-data/iam/security-credentials/my-role/. Except it turns out not to be as simple as one would hope, because the meta-data service sticks the name of the role into the path but the instance doesn't know its role. This means we have to get the credentials in two steps: first get all the roles, then, using the name of the first one (currently there is only one), get the credentials for that role. Here's that method, which also clocks in with 3 superfluous renamings.

// NOTE: maps meta-data names to the frustratingly similar ones expected by aws-sdk
self.iamSecurityCredentials = function () {
    var role = Object.keys(deep_get(self, "/latest/meta-data/iam/security-credentials"))[0];
    if (!role) return undefined;
    var securityCredentials = deep_get(self, "/latest/meta-data/iam/security-credentials")[role];
    return {
        accessKeyId: securityCredentials.AccessKeyId,
        secretAccessKey: securityCredentials.SecretAccessKey,
        sessionToken: securityCredentials.Token
    };
};

Item 3: HostedZoneId

While we now have the HostedZone (== domain) name, what we need for our call to Route53 is the id. Fortunately Route53 lets us look that up with AWS.Route53.listHostedZones(...), but for some inscrutable reason it doesn't accept a filter, so we're left looping over the result:

AWS.Route53.listHostedZones({}, function (err, zones) {
  var zone = null;
  // find the zone matching our HostedZone tag
  for (var i = 0; i < zones.HostedZones.length; i++) {
    if (zones.HostedZones[i].Name === tags.HostedZone) {
      zone = zones.HostedZones[i];
    }
  }
});

The matching zone looks like this:

{ Id: '/hostedzone/Z1XXXXXXXXXXGG',
  Name: '',
  CallerReference: '0A35A3DD-F107-E33C-89C9-D2F3D9967815',
  Config: { Comment: ' domain' },
  ResourceRecordSetCount: 11 }

Woot, there's our HostedZoneId.

Items 4 and 5: current domain name and region

The current domain name is readily available from the meta-data (/latest/meta-data/public-hostname), and we've already pulled the region from meta-data, so we're now good to go. Our final call looks like this:

// note: credentials were already set for the describeTags call

AWS.Route53.changeResourceRecordSets({
  "HostedZoneId": zone.Id,
  "ChangeBatch": {
    "Changes": [ {
        "Action": "CREATE",
        "ResourceRecordSet": {
          "Name": ResourceRecordSetName, // the new domain name we defined above
          "Type": "CNAME",
          "TTL": 300,
          "ResourceRecords": [ {
              "Value": metadata.latest["meta-data"]["public-hostname"]
          } ]
        }
    } ]
  }
}, console.log);

Shortly after that call we can log into our deploy server using its shiny new domain name!


automatic routing for RESTful webpy controllers

I've recently been working in Python again (after a decade-long hiatus), including some rework of middle-tier REST(-ish) services built on webpy. This is a lean-and-clean web application framework that lends itself well to writing simple web services without all the cruft a "full-service" framework like Django brings with it. There was one particular annoyance for me in the service I was working on: URL-to-controller routing was maintained separately from the controller classes themselves. Here's a stripped down example:

import web

class BaseController:
   # some cool stuff useful to all the other controllers

class RecordSearchController(BaseController):

   def GET(self, customer_id, record_code):
       """lookup records matching given code for given customer"""
       # lookup in database
       # return JSON

class AnnotateRecordController(BaseController):

   def POST(self, customer_id, record_code):
       """attach note (given in post body) to given record into database"""
       # parse form
       # write to database
       # return OK

... a couple hundred more lines of that sort of stuff ...

urls = (
    '/search/([0-9]+)/([A-Z0-9]+)/?', RecordSearchController,
    '/record/annotate/([0-9]+)/([A-Z0-9]+)/?', AnnotateRecordController,
)

web.application(urls, ...)

This structure meant that if you added a new controller class you'd have to go add its URL somewhere else, and it also made it (too) easy to have inconsistent URLs or ones with no obvious relationship to the controller (which is confusing for someone like me coming in to do maintenance). It seemed to me there had to be a better way, and in fact webpy has a built-in approach called auto_application(). Unfortunately it's got a couple of (for me annoying) limitations: for one, it uses the exact class name as the default path (for these controllers yielding paths like /RecordSearchController, and changing the names of our controllers was a non-starter). It also doesn't seem to support RESTful parameters (the bits in parens in our urls above). There is also a slightly more flexible recipe for using metaclasses to do automatic URL registration. This also didn't quite meet my needs (e.g. it doesn't even try to automatically map controller names to urls) but it did set me in the generally correct direction. Here's an example of how things would look using this recipe:

urls = []

# metaclass definition cloned from recipe
class ActionMetaClass(type):
    def __init__(klass, name, bases, attrs):
        urls.append(klass.url)
        urls.append("%s.%s" % (klass.__module__, name))

class RecordSearchController(BaseController):
   __metaclass__ = ActionMetaClass
   url = "/search/([0-9]+)/([A-Z0-9]+)/?"

   def GET(self, customer_id, record_code):
       # ...

class AnnotateRecordController(BaseController):
   __metaclass__ = ActionMetaClass
   url = '/record/annotate/([0-9]+)/([A-Z0-9]+)/?'

   def POST(self, customer_id, record_code):
       # ... 


web.application(urls, ...)

This gets the URL into the controller's code, so it solves part of my problem, but it definitely still didn't feel as clean (or DRY) as I'd like. For one thing I'd now have to add the __metaclass__ = ActionMetaClass boilerplate to every controller (I couldn't add it to BaseController for complicated reasons). It also bugged me that this required more than one line--I really wanted to be able to write something like:

class RecordSearchController(BaseController):
   __metaclass__ = ActionMetaClass("/search/([0-9]+)/([A-Z0-9]+)/?")

My first thought was to just add a URL param to ActionMetaClass.__init__. Unfortunately I quickly discovered you can't (directly) pass parameters to the value of __metaclass__ this way, since its calling signature is assumed to match the default type(name, bases, dict). I also ran into some pain around the fact that BaseController was already using a __metaclass__ to do some cool tricks like adding standard envelope fields to every response. At this point I set out some formal goals for my implementation:

  • not require changing any of the existing routes or class names
  • automatically generate canonical paths from class names which matched as many of the existing paths as possible
  • require at most one additional line of code per controller class, even those that don't use a canonical path
  • automatically add REST parameters to the URLs
  • play nicely with other metaclasses in the inheritance chain

I wound up meeting most of the goals, with a couple of pragmatic tweaks. Before I break down exactly how the solution works, here's the actual code (stripped of comments/doc strings for brevity; I'll try to get the full code up as a gist at some point soon).

import re
import inspect
from functools import partial

url_params = {
    'customer_id': '([0-9]+)',
    'record_code': '([A-Z0-9]+)',
}

urls = []

def autoroute(path=None):

    def default_path(name):
        # split CamelCase, drop "Controller", reverse to get /noun/verb, lowercase
        components = re.findall('([A-Z0-9]{0,1}[a-z]+|(?:[A-Z0-9](?![a-z]))+)', name)
        if "Controller" in components:
            components.remove("Controller")
        return '/' + '/'.join(s.lower() for s in reversed(components))

    def _new(path, name, bases, dict):
        global urls
        if not "url" in dict:
            if not path:
                path = default_path(name)
            # read the REST params off the GET (or POST) argspec, skipping self
            args = inspect.getargspec(dict.get('GET') or dict.get('POST')).args[1:]
            dict["url"] = '/'.join([path] + [url_params[arg] for arg in args] + ['?'])
        if bases and hasattr(bases[0], '__metaclass__'):
            metaclass = getattr(bases[0], '__metaclass__')
        else:
            metaclass = type
        controller = metaclass.__new__(metaclass, name, bases, dict)
        urls += [controller.url, controller]
        return controller

    return partial(_new, path)

This allowed the controller classes to look pretty much just the way I wanted them to:

class RecordSearchController(BaseController):
   __metaclass__ = autoroute("/search")

   def GET(self, customer_id, record_code):
       # ...

class AnnotateRecordController(BaseController):
   __metaclass__ = autoroute()

   def POST(self, customer_id, record_code):
       # ... 

There are three key elements to this solution: class name to URL mapping, parameter detection/injection, and metaprogramming magic.

class name to URL mapping

For the large majority of the existing URL-controller pairs the start of the URL's path could be derived from the controller's class name as follows:

  1. split the camel case class name into words using a regex (e.g. AnnotateRecordController → ["Annotate", "Record", "Controller"])
  2. remove "Controller" from the list (→ ["Annotate", "Record"])
  3. reverse the order (classes mostly followed a VerbNoun (or AdjectiveNoun) pattern, while the URLs were mostly /noun/verb) (→ ["Record", "Annotate"])
  4. lowercase and join with slashes (→ "/record/annotate")

This algorithm is implemented in the default_path() function.
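The steps can be exercised standalone; this sketch copies the regex from the listing above:

```python
import re

def default_path(name):
    # split CamelCase into words, drop "Controller", reverse, lowercase, join
    components = re.findall('([A-Z0-9]{0,1}[a-z]+|(?:[A-Z0-9](?![a-z]))+)', name)
    if "Controller" in components:
        components.remove("Controller")
    return '/' + '/'.join(s.lower() for s in reversed(components))

print(default_path("AnnotateRecordController"))  # /record/annotate
print(default_path("RecordSearchController"))    # /search/record
```

Note that RecordSearchController comes out as /search/record rather than /search, which is exactly why it gets an explicit autoroute("/search") override above.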

parameter detection/injection

Although there were dozens of controllers, there were actually fewer than a handful of distinct REST parameters appearing in the URLs, and fortunately they had very consistent names across the controllers' GET and POST methods. Unfortunately, prior to this refactoring, the patterns were repeated many times (typically two parameters per URL) and multiple parameters shared patterns, so e.g. changing the record_code from numeric ([0-9]+) to alphanumeric required dozens of replace operations, each of which had to be vetted by hand by cross-referencing with the controller's source. These I pulled out into a single short dictionary of parameter names, url_params. I then use a little bit of reflection magic to figure out which params go in which URLs by looking at the argspec of the controller's GET method (or POST if there's not one). I can then use the names of the args to look up the correct patterns to tack onto the path generated from the class name. This logic can be found in the center of _new. I also broke with my strict one-liner rule for a special url-level override; this is done via setting a url attribute in the controller rather than passing it to autoroute, to simplify the signature, and it turns out that having the URL around in the class is useful for testing/debugging as well.
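Here's a minimal sketch of that reflection trick in isolation (using getfullargspec, the Python 3 spelling of the getargspec call from the listing):

```python
import inspect

# the same pattern dictionary as in the listing above
url_params = {
    'customer_id': '([0-9]+)',
    'record_code': '([A-Z0-9]+)',
}

class AnnotateRecordController(object):
    def POST(self, customer_id, record_code):
        pass

# pull the REST parameter names off the handler's argspec, dropping self,
# then look up each one's pattern and tack it onto the path
args = inspect.getfullargspec(AnnotateRecordController.POST).args[1:]
url = '/'.join(['/record/annotate'] + [url_params[a] for a in args] + ['?'])
print(url)  # /record/annotate/([0-9]+)/([A-Z0-9]+)/?
```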

metaprogramming magic

So remember the pain I had with not being able to pass a custom path to a metaclass's __init__? For a moment I thought of generating classes dynamically to get around this, but by RTFM I discovered that the value assigned to __metaclass__ doesn't have to be a class, it can be any callable, including a function--importantly it can even be a partially evaluated function! This insight is embodied in _new(), whose signature is like type.__new__'s but with one extra parameter (the non-standard path, if any). That parameter is then partially evaluated in autoroute's return statement, the result being a functools.partial that has the correct signature.

Note that in the end there's nothing super webpy-specific about this solution other than the assumption that controllers are classes with methods named GET/POST/etc. Most of it should be adaptable to any other framework.


How to dump SQL for all views in a mysql database

I often generate a fair number of views while developing reporting SQL, and have off and on looked for a way to easily save their source code to a file. mysqldump doesn't directly support extracting only views, but with a little command line trickery and a query against INFORMATION_SCHEMA you can make it do the right thing:

mysql -u username INFORMATION_SCHEMA \
      --skip-column-names --batch \
      -e "select table_name from tables where table_type = 'VIEW' \
          and table_schema = 'database'" \
      | xargs mysqldump -u username database \
      > views.sql

The skip-column-names and batch options produce output with just one view name per line, which is what xargs needs as input. Be sure to replace both occurrences of username and database with appropriate values, and add -h for remote hosts and -p if the user requires a password. Here's a one-line example for user root with no password on localhost, with a database named "foo":

mysql -u root INFORMATION_SCHEMA --skip-column-names --batch -e "select table_name from tables where table_type = 'VIEW' and table_schema = 'foo'" | xargs mysqldump -u root foo > views.sql