Introducing Total ReCal – the Web Application

[Screenshot: the Total ReCal interface]

Building on our experience from developing the Common Web Design (CWD), we present CWDx 1.0, designed specifically for awesome web-based applications.

CWDx is 100% HTML5 and CSS3, includes extensive WAI-ARIA mark-up for accessibility, is Ajax-driven and is wonderfully minimalistic.

The application is built using the CodeIgniter 2.0 PHP framework, and uses MongoDB as the database. The front end uses jQuery, jQuery UI and a number of functions from PHP.js.

The calendar itself is an amazing jQuery plugin called FullCalendar which emulates a lot of Google Calendar’s functionality.

Currently students can view their academic timetable and, if they have any, their Blackboard assignment deadlines. Both staff and students also have their own calendar which they can write to. Everyone can subscribe to their calendar on a mobile device or in another application such as Google Calendar.

In the next few weeks users will gain access to their library book return dates and will be able to create new calendars and share events, and we're currently developing a CalDAV system that will allow true event syncing across multiple devices and applications.

This isn’t your grandmother’s API permissions control layer…

I’m guessing your grandmother probably didn’t have an API permissions control layer, but if she did this wouldn’t be it.

This post is mostly about Nucleus, our name for the storage layer which drives the Total ReCal components. The only way to communicate with Nucleus is over our RESTful API. This comes as somewhat of a shock to some people who believe that the way to move data around is a batch script with direct database access, but I digress…

What I'm going to try to do here is summarise just how epically confusing our permissions handling system for Nucleus is, mostly for the benefit of Alex and myself, who (over the next week or so) will be trying to implement this layer without breaking anything important. It's really, really essential that we get this done before we start promoting the service, for a few simple reasons:

  • Data security is important, and we don’t want anybody being able to read everything without permission.
  • Data security is important, and we don’t want anybody being able to write all over the place without permission.
  • Changing this kind of thing on a live service is like trying to change the engine block on a Formula 1 car whilst it’s racing.
  • We need to be able to guarantee the system can hold up to DoS attacks or runaway processes hammering the APIs.
  • People are already asking for access to this data for important things, like their final year projects.

So, where to go from here? Let’s take a look at everything which will be going on in the finished version.

Server Rate Limiting

Even before the Nucleus code kicks in, the server is fine-tuned to avoid being overloaded by any single IP address or hostname. Using a combination of the OS firewall and the web server configuration, overall request rates and bandwidth usage are kept below thresholds to ensure that the server is never overloaded. Due to the RESTful nature of the API (in which each request must represent a complete transaction) we have no requirement for server affinity, so if the load gets too heavy we can easily scale horizontally using pretty much any load balancer.

To keep the pipes clear for our ‘essential’ services we do maintain a whitelist of IPs which have higher (but still not uncapped) limits.

Key Based Access

The only way to access any data in Nucleus is with an access token, issued by our OAuth system. These come in two flavours: either a user token (which grants permission for a specific user) or an autonomous token (which is issued at an application level, and is 'anonymous'). The very first thing that happens with any request is that the token it presents is validated. No token, no access. Invalid token, no access. Revoked token, no access. To keep things nice and fast we store the token lookup table in memory with a cache of a few minutes, since most requests occur in 'bursts'.
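As a rough illustration (not the actual Nucleus code), a token check of this shape might look something like the following in PHP; the class, helper and field names here are assumptions:

```php
<?php
// Minimal sketch of the token check described above. The TokenGate class,
// the find_token() lookup and the record fields are illustrative, not the
// real Nucleus implementation.

class TokenGate
{
    const CACHE_TTL = 300; // cache validated tokens for a few minutes

    private $cache = array(); // token => array('expires' => ..., 'record' => ...)
    private $store;           // wraps the OAuth token store

    public function __construct($store)
    {
        $this->store = $store;
    }

    /**
     * Returns the token record (user or autonomous) or FALSE.
     * No token, invalid token or revoked token all mean no access.
     */
    public function validate($token)
    {
        if (empty($token)) {
            return FALSE;
        }

        // Serve recent lookups from the in-memory cache, since requests
        // tend to arrive in bursts from the same client.
        if (isset($this->cache[$token]) && $this->cache[$token]['expires'] > time()) {
            return $this->cache[$token]['record'];
        }

        $record = $this->store->find_token($token); // hypothetical lookup

        if ($record === FALSE || !empty($record['revoked'])) {
            return FALSE;
        }

        $this->cache[$token] = array(
            'expires' => time() + self::CACHE_TTL,
            'record'  => $record,
        );

        return $record; // $record['type'] would be 'user' or 'autonomous'
    }
}
```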


How we make things faster.

Today we've been playing around with the connection between our timetable parser and Nucleus, trying to work out why the parsing and inserting was projected to take 19 days to finish.

This was a problem of many, many parts. First up was Alex's code, which was performing an update to the event on Nucleus for each one of the 1.76 million lines associating students with events. Great fun, since Total ReCal communicates with Nucleus over HTTP and our poor Apache server was melting. This was solved by using an intermediate table into which we could dump the 1.76 million lines (along with some extra data we'd generated, such as event IDs) and then read them back out again in the right order to make the inserts tidier. This reduced the number of calls to about 46,500, a mere 2% of the original count.
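For illustration, here's a minimal sketch of that intermediate-table approach, assuming a hypothetical scratch table and a placeholder push_event_to_nucleus() helper rather than our actual schema and client code:

```php
<?php
// Illustrative sketch: dump every student/event row into a scratch table,
// then read it back ordered by event ID so each event is pushed to Nucleus
// once, with all of its students attached. Table and column names are
// assumptions, not the real schema.

function push_event_to_nucleus(array $event, array $students)
{
    // Placeholder: in reality this would be a single HTTP call to the
    // Nucleus events API carrying the event plus its student list.
}

$db = new PDO('mysql:host=localhost;dbname=timetable_import', 'user', 'pass');

// 1. Dump the raw lines (plus the event IDs we generated) into the scratch table.
//    $parsed_rows stands in for the ~1.76 million rows produced by the parser.
$insert = $db->prepare(
    'INSERT INTO import_buffer (event_id, student_id, starts_at, ends_at, title)
     VALUES (?, ?, ?, ?, ?)'
);
foreach ($parsed_rows as $row) {
    $insert->execute(array(
        $row['event_id'], $row['student_id'],
        $row['starts_at'], $row['ends_at'], $row['title'],
    ));
}

// 2. Read the rows back grouped by event, sending one call per event.
$result = $db->query('SELECT * FROM import_buffer ORDER BY event_id');

$current_event = NULL;
$students      = array();

while ($row = $result->fetch(PDO::FETCH_ASSOC)) {
    if ($current_event !== NULL && $row['event_id'] !== $current_event['event_id']) {
        push_event_to_nucleus($current_event, $students);
        $students = array();
    }
    $current_event = $row;
    $students[]    = $row['student_id'];
}

if ($current_event !== NULL) {
    push_event_to_nucleus($current_event, $students);
}
```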

Next, we ran into an interesting problem inserting the events. The whole thing would go really quite fast until we'd inserted around 48 events, at which point it would drop to one insertion a second. Solving this involved sticking a few benchmark timers in our code to work out where the delay was happening, and after much probing it was discovered that the unique ID generation code I'd created couldn't cope with the volume of queries; since it was time-based, it was running out of available ID numbers and having to keep running through its loop until it found a new one, taking around a second per line. Changing this to use PHP's uniqid() function solved that little flaw by making the identifier a bit longer, meaning that the chance of a collision is now really, really small.
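For reference, the fix boils down to something like this (the prefix is purely illustrative):

```php
<?php
// Before: a purely time-based generator can exhaust its ID space when
// thousands of inserts land in the same second, forcing it to loop until
// a free number turns up.

// After: PHP's uniqid() produces a longer, microsecond-based identifier.
// Passing a prefix and the more_entropy flag makes a collision vanishingly
// unlikely even at tens of inserts per second.
$event_id = uniqid('event_', true); // e.g. "event_4cb5f7a3a2e8e1.23456789"
```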

At the moment we're running at about 33 inserts a second, meaning the complete inserting and updating of our entire timetable (at least the centrally managed one; the AAD faculty are off in their own little world) is done in a little over 20 minutes. We've had to turn off a couple of security checks, but even with these enabled the run time does little more than double, and we're currently not making use of any kind of caching on those checks (so we can get it back down again). There are also lots of other optimisations left to do.

A bit of quick number crunching reveals to me that we’re now running the process in a mere 0.08% of our original 19 days. Not bad.

We need faster interwebz!

Now that we've got live data being produced from Blackboard and CEMIS we can start writing scheduled jobs to insert this data into the Total ReCal database; however, in the case of CEMIS we're having a few problems.

Every day a CSV file is created of all of the timetable events for each student. This file is (currently) 157MB in size and has approximately 1.7 million rows. In his last post, Nick explained that we have now developed an events API for our Nucleus metadata service, which is going to be the repository for all of this time-space data. Currently we're able to parse each row in this CSV file and insert it into Nucleus over the API at about 0.9s per row. This isn't good enough. As my tweet the other day shows, we need to significantly speed this up:

"So our timetable import into #totalrecal (1.7m records) will currently take 19 days. Our target is 90 mins. Hmmm" (Tue Oct 12 17:24:31, via web)

At the moment we're simply streaming data out of the CSV file line by line (using PHP's fgets function) and then sending it to Nucleus over cURL. Our first problem is that the CSV file is generated one student at a time, so it ideally needs to be re-ordered to group rows by their unique event ID; that way we can send each event, with all of its associated students, as a single call to Nucleus, which would improve performance by cutting the number of calls right down. Our second problem is that parsing and building large arrays results in high memory usage, and currently our server only has 2GB of memory to play with. We've capped PHP at 1GB of memory for the moment, but that is going to impact on Apache's performance and the other processes running on the server. Ideally we don't want to just stick more memory into the machine, because that isn't really going to encourage us to fine-tune our code, so that isn't an option at the moment.
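For illustration, the current per-row approach is roughly the shape below; the column order, endpoint URL and field names are assumptions rather than the real CEMIS layout:

```php
<?php
// Rough shape of the current per-row approach: stream the CSV with fgets
// (so we never hold the whole 157MB file in memory) and fire one cURL
// request at Nucleus per line. File path, field positions and the endpoint
// URL are placeholders for illustration.

$handle = fopen('/data/cemis/timetable.csv', 'r');

while (($line = fgets($handle)) !== FALSE) {
    $fields = str_getcsv($line);

    $payload = array(
        'event_id'   => $fields[0],
        'student_id' => $fields[1],
        'starts_at'  => $fields[2],
        'ends_at'    => $fields[3],
        'title'      => $fields[4],
    );

    $ch = curl_init('https://nucleus.example.lincoln.ac.uk/events');
    curl_setopt($ch, CURLOPT_POST, TRUE);
    curl_setopt($ch, CURLOPT_POSTFIELDS, http_build_query($payload));
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, TRUE);
    curl_exec($ch); // ~0.9s per row: this round trip is the bottleneck
    curl_close($ch);
}

fclose($handle);
```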

Over the next few days we're going to explore a number of options, including altering the current script to send batched data using asynchronous cURL requests, and re-writing that script in a lower-level language. The second option is going to take a bit of time as one of us learns a new language, but both should hopefully result in significantly improved performance and a decrease in execution time.
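As a sketch of the first option, PHP's curl_multi_* functions let a batch of requests run concurrently instead of one at a time; again, the endpoint and payload shape here are placeholders:

```php
<?php
// Sketch of the batched, asynchronous cURL idea: queue a batch of requests
// on a curl_multi handle and let them run concurrently rather than waiting
// ~0.9s for each one in turn.

function send_batch(array $payloads)
{
    $multi   = curl_multi_init();
    $handles = array();

    foreach ($payloads as $i => $payload) {
        $ch = curl_init('https://nucleus.example.lincoln.ac.uk/events');
        curl_setopt($ch, CURLOPT_POST, TRUE);
        curl_setopt($ch, CURLOPT_POSTFIELDS, http_build_query($payload));
        curl_setopt($ch, CURLOPT_RETURNTRANSFER, TRUE);
        curl_multi_add_handle($multi, $ch);
        $handles[$i] = $ch;
    }

    // Run all handles until every request has completed.
    $running = NULL;
    do {
        curl_multi_exec($multi, $running);
        curl_multi_select($multi); // wait for activity rather than spinning
    } while ($running > 0);

    $responses = array();
    foreach ($handles as $i => $ch) {
        $responses[$i] = curl_multi_getcontent($ch);
        curl_multi_remove_handle($multi, $ch);
        curl_close($ch);
    }
    curl_multi_close($multi);

    return $responses;
}

// Usage: collect, say, 50 rows at a time and send them as one batch.
// $responses = send_batch($batch_of_payloads);
```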

I’ll write another post soon that explains our final solution.

How (And Why) We’re Building An API

We've explained what Mongo and NoSQL are, and why we're using them. Now it's the turn of the actual data access and manipulation methods, something we've termed Nucleus.

Nucleus is part of a bigger plan which Alex and I have been looking at around using SOA (Service Oriented Architecture) principles for data storage at Lincoln; in short, building a central repository for just about anything around events, locations, people and other such 'core' data. We're attempting to force any viewing or manipulation of those data sets through central, defined, secured and controlled routes, more commonly known as Application Programming Interfaces, or APIs.

In the past it would be common for there to be custom code sitting between services, responsible for moving data around. Often this code would talk directly to the underlying databases and provide little in the way of sanity checking, and following the ancient principle of “Garbage In, Garbage Out” it wouldn’t be unheard of for a service to fail and the data synchronisation script to duly fill an important database with error messages, stray code snippets and other such nonsense which wasn’t valid. The applications which then relied on this data would continue as though nothing was wrong, trying to read this data and then crashing in a huge ball of flames. Inevitably this led to administrators having to manually pick through a database to put everything back in its place.
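To make the contrast concrete, here's a minimal sketch of the kind of sanity checking an API layer can enforce before anything touches the database; the field names and response handling are illustrative, not the actual Nucleus code:

```php
<?php
// Minimal sketch of the sanity checking an API layer can apply that a
// direct-to-database sync script typically doesn't. Field names and the
// response shape are assumptions.

function validate_event(array $input)
{
    $errors = array();

    if (empty($input['title']) || !is_string($input['title'])) {
        $errors[] = 'title is required';
    }

    $start = strtotime(isset($input['starts_at']) ? $input['starts_at'] : '');
    $end   = strtotime(isset($input['ends_at'])   ? $input['ends_at']   : '');

    if ($start === FALSE || $end === FALSE) {
        $errors[] = 'starts_at and ends_at must be valid dates';
    } elseif ($end < $start) {
        $errors[] = 'an event cannot end before it starts';
    }

    return $errors;
}

// In the write endpoint: reject bad input with a 400 instead of letting
// error messages or stray code snippets end up in the events store.
$errors = validate_event($_POST);
if (!empty($errors)) {
    header('HTTP/1.1 400 Bad Request');
    echo json_encode(array('errors' => $errors));
    exit;
}
```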
