Integrated Service Development

The university, in common with the wider HE sector, is looking at SOA (Service Oriented Architecture) as the way forward.

A major issue is that many of our existing systems exist as discrete silos of data and functionality, yet they are for the most part sufficient for the tasks they were originally acquired for.

A prime example would be our timetabling system. With it, our timetabling team are able to meet the university’s need for an academic timetable, with all that that entails. It is an established part of the university and is supported by process and procedures that have been honed and refined over the years. To replace it would be a major undertaking.

By “the university’s need for an academic timetable” I mean that we are able to generate a timetable that enables the university to allocate its resources in such a way that it can deliver its academic program. Any functionality beyond that (for example, publishing the timetable) has had to be provided by other means, as is the case with most of our other systems. This usually entails some form of overnight batch processing.

One consequence of the way our systems currently interact is that, when they are all “butted” together as they are, it takes several days of passing data from one system to another, processing it and reprocessing it, before a change in any circumstance or environment is reflected everywhere. Accordingly, we are far from real time and are unable to give a truly accurate view of the “whole picture” at any one time, as there will always be some system with data that is lagging behind.

The above captures two of the main drivers in our move towards SOA: better management information and a near-real-time experience for our users.

The Total ReCal Plugins

A specific problem that the university faces is the aggregation, integration and publishing of ‘space-time data’; that is, data relating to the use of space (i.e. room bookings, geo-spatial location data) and time (i.e. timetables, event schedules, library book returns).

This project will address this problem by developing plugins for existing university systems that expose useful data which can then be aggregated into new web-based services. One of these web-based services will be a new calendaring system, initially for students and hopefully for staff later.

All students’ calendars will comprise three core layers: academic timetable, assignment deadlines and book return dates. We will create plugins for the three DMSs (Data Management Systems) the University uses for these: Blackboard, SirsiDynix Horizon (HiP library portal) and our in-house developed timetable system.

Because we will have developed a standard for storing space-time data from these systems we are also going to create a number of other plugins for other systems so they can add to the datastore. These systems include WordPress, and providing the University has moved to version 2007 in time, Microsoft SharePoint.
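To make the idea of a shared standard for space-time data a little more concrete, here is a minimal sketch in Python of what a single record might look like. The class name, the field names and the layer names are all assumptions made for illustration; the actual standard has not been defined yet.

```python
from dataclasses import dataclass, asdict
from datetime import datetime
from typing import Optional


@dataclass
class SpaceTimeEvent:
    """One entry in a calendar layer. All field names are hypothetical."""
    source: str                      # originating system, e.g. "blackboard"
    layer: str                       # e.g. "timetable", "deadline", "book_return"
    title: str
    start: datetime
    end: Optional[datetime] = None   # None for point-in-time events such as deadlines
    location: Optional[str] = None   # room or building, if known

    def to_document(self) -> dict:
        """Return a plain dict with ISO 8601 timestamps, ready to be stored or sent as JSON."""
        doc = asdict(self)
        doc["start"] = self.start.isoformat()
        doc["end"] = self.end.isoformat() if self.end else None
        return doc


# Example: an assignment deadline pulled from a (hypothetical) Blackboard plugin.
deadline = SpaceTimeEvent(
    source="blackboard",
    layer="deadline",
    title="Example assignment hand-in",
    start=datetime(2010, 11, 5, 16, 0),
)
doc = deadline.to_document()
```

Keeping every plugin's output in one shape like this is what would let a new system such as WordPress add to the datastore without the calendar needing to know anything about where an event came from.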

Detailed here are our initial ideas as to how we intend to develop plugins for the systems to access their data.

Blackboard

One of the big motivations behind this project is that, as students, we have no easy way of finding out hand-in deadlines for assignments, being informed if the deadlines change, and seeing the deadlines marked on a calendar alongside our academic timetables (so that we can realise that we’ve got one week, not two, until that deadline!). For example, at the moment the media faculty releases an Excel spreadsheet that mixes deadlines for every module for every year group, which isn’t very useful if I’m trying to work out what has changed when a deadline is updated.

By September all faculties will be using Blackboard for detailing assignments. Many already are, and some have been for several years. When creating an assignment, there is an optional field that the academic can fill in to specify the deadline. Unfortunately, less than 10% of assignments created on Blackboard during the last academic year had anything in this field. Another problem we have is that a number of schools and faculties are making use of the Turn It In service (via a Blackboard plugin) and we have yet to investigate how Turn It In stores the data in Blackboard.

As we understand it, and we will have this verified, the license the University has with Blackboard allows us to develop on top of the Blackboard API and also access the underlying database (which is MS-SQL based). As neither Nick nor I am particularly well versed in Java, and the API doesn’t seem to give us access to the information we need, we believe the route we should go down is to access the data straight from the database.

Therefore we will create a script that will be executed on a cron job that checks for new assignments in the Blackboard database, and verifies the date and time of existing assignments. Additionally we will try and enforce that academics must use the deadline field when creating assignments.
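The comparison step of that cron job could look roughly like the Python sketch below. It deliberately leaves out the database query, because we have not yet confirmed the Blackboard schema; the assignment IDs and the `diff_deadlines` helper are hypothetical, and the two dictionaries stand in for "what we stored last run" and "what the database says now".

```python
from datetime import datetime
from typing import Dict, List, Optional, Tuple

# Maps a (hypothetical) assignment ID to its deadline; None means the field was left blank.
Deadlines = Dict[str, Optional[datetime]]


def diff_deadlines(previous: Deadlines, current: Deadlines) -> Tuple[List[str], List[str], List[str]]:
    """Compare two snapshots and return (new, changed, missing_deadline) assignment IDs."""
    new = sorted(a for a in current if a not in previous)
    changed = sorted(a for a in current if a in previous and current[a] != previous[a])
    missing = sorted(a for a, due in current.items() if due is None)
    return new, changed, missing


previous = {
    "A1": datetime(2010, 11, 5, 16, 0),
    "A2": datetime(2010, 11, 19, 16, 0),
}
current = {
    "A1": datetime(2010, 11, 5, 16, 0),   # unchanged
    "A2": datetime(2010, 11, 12, 16, 0),  # deadline moved a week earlier
    "A3": None,                           # new assignment with no deadline set
}
new, changed, missing = diff_deadlines(previous, current)
```

The third list matters because so few assignments currently have a deadline filled in: it gives us something to report back to academics rather than silently dropping those assignments from the calendar.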

Horizon

Through work that we’re doing on our Jerome “un-project” we have a head start on accessing data from Horizon. The University has invested in Talis Keystone, which integrates with Horizon and abstracts the data behind a friendly REST/SOAP web service. Using the APIs we’re developing for Jerome we intend to access book return dates for individuals and publish these as one of the Total ReCal layers.

Academic Timetables

Back in November 2009 I was incredibly bored one night and I hacked around with our student timetables to create subscribable iCalendar feeds. The script works by screen-scraping our timetables (here is mine) and then interpreting the JavaScript on the page to produce an array of events which can then be turned into ics format.

Our timetable system was written in-house many years ago so we’ve got a lot of control over the output. For the time being we’re not going to completely replace the HTML version of the timetables but add in a new script that will generate the ics feeds along with the timetable renders (this happens on a cron job at 3am every morning).
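As a rough illustration of the "array of events to ics" step, here is a minimal Python sketch. It is not the existing script: the event dictionary keys, the `events_to_ics` function and the PRODID value are assumptions, and it skips details a production feed would need, such as folding lines longer than 75 octets and time zone handling (start and end times are written as floating local times).

```python
from datetime import datetime
from typing import Dict, List


def escape_text(value: str) -> str:
    """Escape characters that are special in iCalendar TEXT values."""
    return (value.replace("\\", "\\\\")
                 .replace(";", "\\;")
                 .replace(",", "\\,")
                 .replace("\n", "\\n"))


def events_to_ics(events: List[Dict], generated_utc: datetime) -> str:
    """Build an iCalendar document from a list of event dicts (hypothetical keys)."""
    fmt = "%Y%m%dT%H%M%S"
    lines = [
        "BEGIN:VCALENDAR",
        "VERSION:2.0",
        "PRODID:-//Example//Timetable sketch//EN",  # placeholder product identifier
    ]
    for event in events:
        lines.extend([
            "BEGIN:VEVENT",
            f"UID:{event['uid']}",
            f"DTSTAMP:{generated_utc.strftime(fmt)}Z",  # assumes the value passed in is UTC
            f"DTSTART:{event['start'].strftime(fmt)}",
            f"DTEND:{event['end'].strftime(fmt)}",
            f"SUMMARY:{escape_text(event['summary'])}",
            f"LOCATION:{escape_text(event['location'])}",
            "END:VEVENT",
        ])
    lines.append("END:VCALENDAR")
    # iCalendar content lines are terminated with CRLF.
    return "\r\n".join(lines) + "\r\n"


events = [{
    "uid": "example-0001@timetable.invalid",
    "start": datetime(2010, 10, 4, 9, 0),
    "end": datetime(2010, 10, 4, 10, 0),
    "summary": "Lecture, week 1",
    "location": "Room 101",
}]
ics = events_to_ics(events, generated_utc=datetime(2010, 10, 1, 3, 0))
```

Each event needs a UID that stays the same between nightly runs, otherwise calendar clients that subscribe to the feed may treat every event as new each morning.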

WordPress and others

A side project of mine has been developing a system that can add location awareness to our online services. When you visit one of these services your IP address is sent to this system and then matched against a list of IP ranges for the University’s wireless and wired networks. If you are on campus, the response is the building that you’re in, which campus you’re on and whether you’re on a wired or wireless connection. If you’re not on campus then it will list your closest campus and roughly where in the world you are (using the MaxMind database).
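The on-campus half of that lookup can be sketched in a few lines of Python using the standard `ipaddress` module. The ranges below are taken from address blocks reserved for documentation, not the University's real networks, and the campus and building names and the `locate` function are made up for the example. The off-campus fallback to MaxMind is left out.

```python
import ipaddress
from typing import Optional

# Hypothetical ranges (documentation-only address blocks), not real university networks.
NETWORKS = [
    (ipaddress.ip_network("192.0.2.0/25"),
     {"campus": "Campus A", "building": "Library", "connection": "wired"}),
    (ipaddress.ip_network("192.0.2.128/25"),
     {"campus": "Campus A", "building": "Library", "connection": "wireless"}),
    (ipaddress.ip_network("198.51.100.0/24"),
     {"campus": "Campus B", "building": "Main Building", "connection": "wired"}),
]


def locate(ip: str) -> Optional[dict]:
    """Return location details for an on-campus address, or None if it is off campus."""
    address = ipaddress.ip_address(ip)
    for network, info in NETWORKS:
        if address in network:
            return info
    return None


on_campus = locate("192.0.2.200")   # falls in the second (wireless) range
off_campus = locate("203.0.113.9")  # matches no range, so None
```

A linear scan is fine for a handful of ranges; with a much longer list it would be worth sorting the networks and using a binary search instead.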

We will develop a WordPress plugin that will query this system when someone creates a blog post on our blogs.lincoln.ac.uk platform and then push this information to Total ReCal. A hypothetical mashup we could then build with this data is a heat-map of blog posts tagged “research”, overlaid on Google Maps, so we can see where the most research blogging is going on at the University of Lincoln.

When we know the situation with SharePoint we can also plan for potential plugins for it too.

Why NoSQL?

After looking at the initial brief for Total ReCal, we realised that it would be necessary to build a new data storage layer to handle the time/space information which drives the project. There are many reasons for this both technical and political, but the key reason is that since we are running what is effectively an abstraction and amalgamation service we want to be able to interface directly with our own copy of the data; here’s why.

Speed is often considered a luxury when dealing with large data sets, and especially in larger institutions it’s common to think nothing of waiting a few minutes for a report to finish building or for your operation to finish processing, but we wanted to offer something where you could happily hit it with 20-30 queries a second over an API. This is particularly relevant given our larger Nucleus un-project to expose public (and some private) data over APIs to allow mashups. In short, we don’t want to have to wait for even half a second whilst another service gets the data we’re after, and we especially don’t want to have to waste more time parsing the data into a useful format.

We looked at several possibilities for how to store the data. An obvious one to take a look at is a traditional RDBMS (Relational Database Management System) such as PostgreSQL or MS-SQL. In this instance we would most likely have been using MySQL, since it fits smoothly into the almost universally supported LAMP (Linux, Apache, MySQL, PHP) stack which is available on our key development server. Alex and I are both well versed in using MySQL as a database and interfacing with it using PHP, so had we opted for an RDBMS it would have been the obvious choice, despite the rest of the University standardising on MS-SQL.


And we’re off…

I’ve just had a good chat with Alex Hawker, Programme Manager of the Flexible Delivery Programme that Total ReCal is funded under. We received some good comments about the bid that we put in and there’s a particular interest in the idea of working with ‘space-time’ data. Our (student) developers, Alex and Nick, are on leave for the next week, but we’ve already met to discuss the points that JISC have asked us to clarify and develop in order to satisfy funding requirements. They are:

  • More detailed dissemination and workplan
  • Engagement with other institutions to test the solution against other applications
  • Data gathering to be undertaken when students are available

My first task is to address these in the Project Plan, which will be submitted by the 29th. Much of the project budget has been allocated to buying Alex and Nick’s time as developers, so we’ll have the equivalent of almost one developer (30hrs) working full-time on this project for six months.

In addition to the Project Plan, our immediate ToDo list currently looks like this:

  • Set up UserVoice (Done)
  • Set up Bitbucket (Done)
  • Intro blog post – what we’re doing and why
  • MongoDB/NoSQL evaluation
  • Set up surveys for users
  • Blog post about the plugins we intend to create
  • Blog post about the work we’re doing towards integrated service development

We should have all of those done by the 6th August. Some of this has already been touched on in Alex and Nick’s personal blogs over the last few months.