This isn’t your grandmother’s API permissions control layer…

I’m guessing your grandmother probably didn’t have an API permissions control layer, but if she did this wouldn’t be it.

This post is mostly about Nucleus, our name for the storage layer which drives the Total ReCal components. The only way to communicate with Nucleus is over our RESTful API. This comes as somewhat of a shock to some people who believe that the way to move data around is a batch script with direct database access, but I digress…

What I’m going to try to do here is summarise just how epically confusing our permissions handling system for Nucleus is, mostly for the benefit of Alex and myself who (over the next week or so) will be trying to implement this layer without breaking anything important. It’s really, really essential that we get this done before we start promoting the service because of a few simple reasons:

  • Data security is important, and we don’t want anybody being able to read everything without permission.
  • Data security is important, and we don’t want anybody being able to write all over the place without permission.
  • Changing this kind of thing on a live service is like trying to change the engine block on a Formula 1 car whilst it’s racing.
  • We need to be able to guarantee the system can hold up to DoS attacks or runaway processes hammering the APIs.
  • People are already asking for access to this data for important things, like their final year projects.

So, where to go from here? Let’s take a look at everything which will be going on in the finished version.

Server Rate Limiting

Even before the Nucleus code kicks in, the server is fine-tuned to avoid overloading from any IP address or hostname. Using a combination of the OS firewall and the web server configuration overall request rates and bandwidth usage is kept below thresholds to ensure that the server is never overloaded. Due to the RESTful nature of the API (in which each request must represent a complete transaction) we have no requirement to ensure server affinity, so if the load gets too heavy we can easily scale horizontally using pretty much any load balancer.

To keep the pipes clear for our ‘essential’ services we do maintain a whitelist of IPs which have higher (but still not uncapped) limits.

Key Based Access

The only way to access any data in Nucleus is with an access token, issued by our OAuth system. These come in two flavours, either a user token (which grants permission for a specific user), or an autonomous token (which is issued at an application level, and is ‘anonymous’). The very first thing that happens with any request is that the token it gives is validated. No token, no access. Invalid token, no access. Revoked token, no access. To keep things nice and fast we store the token lookup table in memory with a cache of a few minutes, since most requests occur in ‘bursts’.

Continue reading

Update

This has been a big month for Total ReCal. We’ve now perfected our event importers for Blackboard assignments and academic timetables, and we’ve started working on the main web application (screenshots too). We’ve also launched a beta registration page for interested staff and students to sign up for early access. Finally, our Talis Keystone service that the University has recently purchased will be in place very soon meaning we can also start importing book return dates for staff and students.

After numerous code re-writes we’ve got a rock solid API for adding, updating and deleting events in our Nucleus data store. Our import code has also had many updates to support logging of changes to events which will be invaluable to students to keep them up to date. Once the main Total ReCal application has been developed we’re going to sit down and work out how we’re going to best make use of these logs.

When a lecturer calls in sick the central timetabling department isn’t informed (unless it will affect lecturers for a long period of time). Therefore based on our current nightly timetable imports we won’t find out about any changes. We’re going to develop a tool for faculty administration staff to make changes to events as they’re going to be more aware of what the situation is day to day. This means that we can then inform students of changes that day as soon as someone changes it.

In terms of the front end, I’ve forked our common web design, called it ‘common web design x’, made it fluid to adapt to browser size, made it completely semantic HTML5 based, and taken the concept of progressive enhancement to new levels. It will also make use of our new OAuth 2.0 based single sign on service that I’ve written and it will automatically adapt to mobile layouts.

How (And Why) We’re Building An API

We’ve explained what Mongo and NoSQL is, and why we’re using it. Now it’s the turn of the actual data access and manipulation methods, something we’ve termed Nucleus.

Nucleus is part of a bigger plan which Alex and I have been looking at around using SOA ((Service Oriented Architecture)) principles for data storage at Lincoln, in short building a central repository for just about anything around events, locations, people and other such ‘core’ data. We’re attempting to force any viewing or manipulation of those data sets through central, defined, secured and controlled routes more commonly known as Application Programming Interfaces, or APIs.

In the past it would be common for there to be custom code sitting between services, responsible for moving data around. Often this code would talk directly to the underlying databases and provide little in the way of sanity checking, and following the ancient principle of “Garbage In, Garbage Out” it wouldn’t be unheard of for a service to fail and the data synchronisation script to duly fill an important database with error messages, stray code snippets and other such nonsense which wasn’t valid. The applications which then relied on this data would continue as though nothing was wrong, trying to read this data and then crashing in a huge ball of flames. Inevitably this led to administrators having to manually pick through a database to put everything back in its place.

Continue reading

What We’ve Been Up To

It’s all been a bit quiet on the Total ReCal front for the past week or so, but not because we’ve been quietly doing nothing. Instead we’ve been quietly working on the supporting systems which let Total ReCal do it’s thing without needing to handle every single aspect of time/space management, user authentication and who knows what else.

The first thing we’ve got mostly complete is our new authentication system, built around the OAuth 2.0 specification (version 10). For those of you unfamiliar with OAuth, it’s a way of providing systems with authorisation to perform an action without actually giving them a user’s credentials, much as modern luxury cars come with a ‘valet key‘ which might provide a valet with limited driving range, limited top speed and no ability to open the boot. In the case of the University we’ve come up with a service whereby a user (in this case a student or staff member) issues authorisation for a service to access or modify data stored within the University on their behalf.

Taking Total ReCal the example, the user would issue a key which allows Total ReCal to read their timetable, assessments data and library data (from which it can extract various events such as lectures, hand-in dates and book due dates).What it doesn’t give is permission to read personal details, to book rooms under that person’s authority, to renew library books or indeed anything else which requires a specific permission. In addition to this, Total ReCal never sees the user’s authentication information – it simply doesn’t need to because the key it’s been given by the user is authority enough to do what it needs.

We need OAuth for a variety of reasons. First of all, we were getting bored of having to write a whole new authentication system for every single application, and this makes our lives much easier. Secondly and more relevantly we want Total ReCal to be a demonstration of the Service Oriented Architecture way, showing that it’s possible to make use of small, focussed services which we bolt together as we need rather than monolithic applications which do everything, but don’t play nicely with other monolithic applications trying to do everything. Authentication is a key example of this since it’s something in common to almost every application. Thirdly, we want to be able to explore more ways of giving the user control and this is one of them. By relying on the OAuth authorisation route, users are given crystal clear information on what Total ReCal is, what it does, and how it intends to use their information. It’s then up to the user whether they want to use Total ReCal or not, and they can revoke the permission at any time. In future we hope to see lots more applications take this route, not necessarily just from within the University but also from outside.

Continue reading