This isn’t your grandmother’s API permissions control layer…

I’m guessing your grandmother didn’t have an API permissions control layer, but if she did, this wouldn’t be it.

This post is mostly about Nucleus, our name for the storage layer which drives the Total ReCal components. The only way to communicate with Nucleus is over our RESTful API. This comes as something of a shock to people who believe that the way to move data around is a batch script with direct database access, but I digress…

What I’m going to try to do here is summarise just how epically confusing our permissions handling system for Nucleus is, mostly for the benefit of Alex and me, since we will spend the next week or so trying to implement this layer without breaking anything important. It’s really, really essential that we get this done before we start promoting the service, for a few simple reasons:

  • Data security is important, and we don’t want anybody being able to read everything without permission.
  • Data security is important, and we don’t want anybody being able to write all over the place without permission.
  • Changing this kind of thing on a live service is like trying to change the engine block on a Formula 1 car whilst it’s racing.
  • We need to be able to guarantee that the system can withstand DoS attacks or runaway processes hammering the APIs.
  • People are already asking for access to this data for important things, like their final year projects.

So, where to go from here? Let’s take a look at everything which will be going on in the finished version.

Server Rate Limiting

Even before the Nucleus code kicks in, the server is fine-tuned to avoid being overloaded by any single IP address or hostname. Using a combination of the OS firewall and the web server configuration, overall request rates and bandwidth usage are kept below set thresholds so that the server is never overloaded. Because the API is RESTful (each request must represent a complete transaction), we don’t need server affinity, so if the load gets too heavy we can easily scale horizontally behind pretty much any load balancer.

To keep the pipes clear for our ‘essential’ services, we maintain a whitelist of IPs which have higher (but still capped) limits.

Key Based Access

The only way to access any data in Nucleus is with an access token issued by our OAuth system. These come in two flavours: a user token (which grants permission on behalf of a specific user) or an autonomous token (which is issued at the application level and is ‘anonymous’). The very first thing that happens with any request is that the token it presents is validated. No token, no access. Invalid token, no access. Revoked token, no access. To keep things nice and fast, we cache token lookups in memory for a few minutes, since most requests arrive in ‘bursts’.

Key Rate Limiting

Rate limiting is an important part of keeping things running. Unlike server-based rate limiting, which applies to IP addresses and hostnames, key-based limiting prevents any one application or user from going overboard with their requests. Each key is limited to a certain number of requests per hour, with the exact limit depending on the type of key: individual user keys get relatively few, while whitelisted internal applications get extremely high caps.

Like Twitter, we’re also implementing smart rate limiting, which will globally reduce API limits at times of unusually high demand, such as freshers’ week. We’d rather everybody can access some data than nobody can access any. To show people what’s going on, every response will include how many requests per hour the key is allowed and how many it has used, letting applications ‘budget’ accordingly.
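Per-key hourly limits, a global ‘demand factor’ for smart limiting, and budget information returned with each response could be sketched in Python as below. The key types, the numbers in `HOURLY_LIMITS` and the `X-RateLimit-*` header names are hypothetical placeholders, not published Nucleus values.

```python
import math
from dataclasses import dataclass, field

# Hypothetical hourly caps per key type; the real figures aren't given here.
HOURLY_LIMITS = {"user": 150, "application": 1_000, "internal_whitelisted": 50_000}


@dataclass
class KeyRateLimiter:
    # Smart limiting: scale every cap down (e.g. to 0.5) during unusually high demand.
    demand_factor: float = 1.0
    _usage: dict[tuple[str, int], int] = field(default_factory=dict, init=False, repr=False)

    def limit_for(self, key_type: str) -> int:
        return max(1, math.floor(HOURLY_LIMITS[key_type] * self.demand_factor))

    def check(self, key: str, key_type: str, now: float) -> tuple[bool, dict[str, str]]:
        hour = int(now // 3600)
        limit = self.limit_for(key_type)
        used = self._usage.get((key, hour), 0)
        allowed = used < limit
        if allowed:
            used += 1
            self._usage[(key, hour)] = used
        # Tell the client its budget so it can pace itself (header names are illustrative).
        headers = {
            "X-RateLimit-Limit": str(limit),
            "X-RateLimit-Used": str(used),
            "X-RateLimit-Remaining": str(max(0, limit - used)),
        }
        return allowed, headers
```

Because the demand factor is applied when the limit is computed rather than stored per key, lowering it takes effect on the very next request for every key at once.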

Scoping

Scoping is the next level of permissions, applied once we’ve made sure the system making the request isn’t abusing the service and has a valid token. Scoping is quite hard to wrap your brain around, but once you’ve got it, it makes a lot of sense. In short, access tokens are given ‘scopes’ which define the type of data they’re allowed to ask for, but not necessarily which items within that scope they can subsequently access. This means, for example, that a user can grant an application permission to look at their events but not other data such as their contact details. When an application is registered with us, we decide which scopes are valid for its autonomous token, and when a user grants permission using OAuth, they are clearly told which scopes the application is asking for. After all, you don’t want an automatic book-renewing application being able to look at your home contact address.

Every request to Nucleus exists within a scope, and we make sure that the access token presented actually has permission to access that scope of data. In the case of events we have a variety of scopes, including public (just the name, start/end times and location of events in publicly timetabled spaces), basic (the same details, but including personal events as well) and full (the whole set of data, including things like attendee lists for lectures). There are also scopes governing whether a key may write events, making sure that an application designed to give you a nice calendar view can’t instead fill your schedule with rubbish.
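The scope check itself is a simple gate, sketched here in Python. The dotted scope names (`events.public`, `events.basic`, `events.full`, `events.write`) and the `Token`/`require_scope` names are assumptions modelled on the event scopes described above; note that passing this check says nothing yet about which individual events the token can see.

```python
from dataclasses import dataclass

# Hypothetical scope names modelled on the event scopes described above.
EVENT_SCOPES = {"events.public", "events.basic", "events.full", "events.write"}


class ScopeError(PermissionError):
    """Raised when a token lacks the scope a request needs."""


@dataclass(frozen=True)
class Token:
    app_id: str
    scopes: frozenset[str]  # fixed at registration or OAuth consent time


def require_scope(token: Token, scope: str) -> None:
    """Reject the request unless the token was granted `scope`."""
    if scope not in EVENT_SCOPES:
        raise ValueError(f"unknown scope: {scope}")
    if scope not in token.scopes:
        raise ScopeError(f"{token.app_id} has not been granted {scope}")
```

For example, a calendar viewer granted only `events.public` and `events.basic` passes `require_scope(cal, "events.basic")` but raises `ScopeError` the moment it tries to write.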

Object Permissions

Now we’re getting into the meat and potatoes of permissions, designed to give us both the ability to set sweeping permissions at a high level and really, really fine-grained permissions at the object level. Permissions are similar to scopes in that there is a predefined list, but they differ in that they tell Nucleus how any given key can interact with any specific object. They are used to make sure that one user can’t go fiddling with another’s data, or to stop a runaway application from accidentally overwriting data it doesn’t own.

As an example, an application designed to show your upcoming events would have passed rate limiting, had its key validated, and been checked to have the ‘basic’ scope to read events. Object permissions say which of those events it can actually see, and this is where it gets really complicated.

Global Type Permissions

Some permissions are granted for every single object of a given type (such as every event), and these are known as global type permissions. An example is the ‘inherent’ permissions which are available to any valid key within the scope, such as the ability to read public data.

We can grant global type permissions either to all keys (a ‘universal’ permission) or to specific applications, groups or users. Whilst we’re unlikely to do this in practice (and it would be bad practice if we did), it does allow us to build ‘god’ applications which can perform actions across the board.

Universal Object Permissions

Alongside global type permissions, we can approach the problem of mass permissions from the object end. Universal object permissions are stored along with the object and are available to any key. This type of permission suits things such as public events, where we want to give anybody permission to see the details regardless of whether they’ve been manually added.

Specific Object Permissions

The most restrictive type of permission is a specific object permission. Each one is a one-to-one relationship between an application, a group or a user and a single object, granting permissions on that object only. An example would be a private event, which you wouldn’t want anybody else to be able to see (although if it’s occurring in a publicly timetabled space, everybody will be able to see that an event is there, just not what it is). In this case the object would have just one specific permission, granting you the ability to read and modify it.

Specific permissions also come in handy for shared events. The system will let you grant specific users the ability to see an event, for example inviting just your study group to share a room booking.
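Both object-level routes, universal permissions stored on the object and specific per-principal grants, can live on the object itself, as in this Python sketch. The `Event` fields, the `share_with` helper, the principal strings and the permission names are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Event:
    event_id: str
    # Universal object permissions: stored with the object, available to any key.
    universal: set[str] = field(default_factory=set)
    # Specific object permissions: principal -> permissions on this object only.
    specific: dict[str, set[str]] = field(default_factory=dict)

    def share_with(self, principal: str, *permissions: str) -> None:
        """Add a specific grant, e.g. inviting a study group to see a booking."""
        self.specific.setdefault(principal, set()).update(permissions)

    def permissions_for(self, principals: set[str]) -> set[str]:
        """Object-level permissions for a requester known by `principals`."""
        perms = set(self.universal)
        for principal in principals:
            perms |= self.specific.get(principal, set())
        return perms
```

A private event starts with no universal permissions and a single specific grant for its owner; sharing it with a study group is one more `share_with` call, and nobody else gains anything.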

Multiple Permissions

All permissions are standalone; there is no inherent hierarchy. They also stack neatly regardless of the route by which they are granted, so it’s entirely possible to hold a global ‘read public data’ permission, a universal ‘read details’ permission granted to your group, and a final specific permission allowing you to modify an event.
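Stacking without a hierarchy amounts to a plain set union, sketched below in Python. The `Grant` shape, principal strings and permission names are assumptions; the grants could have arrived by any of the routes above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Grant:
    principal: str   # "*" for everyone, or e.g. "group:study-group", "user:alice"
    permission: str  # e.g. "read_public", "read_details", "modify"


def effective_permissions(grants: list[Grant], principals: set[str]) -> set[str]:
    """Union every matching grant, whichever route it came by.

    Permissions are standalone: holding "modify" does not imply "read_details".
    """
    return {g.permission for g in grants if g.principal == "*" or g.principal in principals}


def can(grants: list[Grant], principals: set[str], permission: str) -> bool:
    return permission in effective_permissions(grants, principals)
```

With grants for `read_public` (everyone), `read_details` (`group:study-group`) and `modify` (`user:alice`), Alice acting as a group member holds all three; acting without the group she can modify the event but not read its details, because nothing implies anything else.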

Wow…

That’s a high-level overview of what’s going on to make sure Nucleus is solid enough to stand up to a hammering, stops people being naughty, and is still flexible enough to fine-tune permissions.

The API will prevent people from ‘orphaning’ events by leaving them without permissions. It will also provide a route for reading and setting permissions (with permissions controlling who can set permissions…), and it will set sensible permissions by default so that you can’t accidentally create an event you then can’t control.
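A minimal Python sketch of those guard rails: the creator gets sensible default permissions, only holders of a permission-managing permission may change grants, and any change that would leave nobody able to manage the event is rejected. The `manage_permissions` name, the defaults and the helper functions are hypothetical, and for brevity only specific object permissions are considered.

```python
from dataclasses import dataclass, field


class OrphanError(ValueError):
    """Raised when a change would leave an event that nobody can control."""


# Hypothetical defaults; "manage_permissions" controls who can set permissions.
DEFAULT_CREATOR_PERMISSIONS = {"read_details", "modify", "manage_permissions"}


@dataclass
class Event:
    event_id: str
    specific: dict[str, set[str]] = field(default_factory=dict)


def create_event(event_id: str, creator: str) -> Event:
    # Sensible defaults: whoever creates an event can always control it.
    return Event(event_id, {creator: set(DEFAULT_CREATOR_PERMISSIONS)})


def set_permissions(event: Event, actor: str, principal: str, permissions: set[str]) -> None:
    """Replace `principal`'s grant on `event`, refusing to orphan the event."""
    if "manage_permissions" not in event.specific.get(actor, set()):
        raise PermissionError(f"{actor} may not change permissions on {event.event_id}")
    proposed = dict(event.specific)
    if permissions:
        proposed[principal] = set(permissions)
    else:
        proposed.pop(principal, None)  # an empty grant removes the principal entirely
    if not any("manage_permissions" in perms for perms in proposed.values()):
        raise OrphanError(f"{event.event_id} would be left with nobody able to manage it")
    event.specific = proposed
```

Validating the proposed state before committing it means a rejected change leaves the event exactly as it was.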

Epic Permissions. Coming in the next couple of weeks.