A project I have been working on part-time for the last year has recently launched. The project involved building an eReader application for Sangari Global Education through a company called 8 Leaf Digital Productions. The basic idea was to take a large, pre-existing body of content, along with its associated resources, and deliver it to eReader tablets, with the content available to each user dictated by access levels. Working with Ryan Nadel from 8 Leaf Digital, we designed the system and started down the bumpy road of turning an idea into reality.
The content we had to work with was in the familiar form of books containing chapters. We shifted our thinking a little, tried not to worry about the books, and turned chapters into lessons. With this new paradigm we were free to group sets of lessons together into units, which became our substitute for the idea of a book.
My role in the project (outside of architecture) revolved around third-party integration, data processing, and the construction of an API that the accompanying Android application could communicate with to download various resources. A good friend of mine, Geoff Spears, was brought into the project quite late and did an amazing job of delivering a robust Android application to consume the data; many of the challenges I faced on this project were alleviated by having Geoff join the team.
Throughout the project, as is the case when building anything new, we faced a few challenges and as a result came up with some innovative solutions. I thought this would be a good place to talk about those challenges and their solutions.
The data processing for the Sangari project was originally supposed to be done through a GUI, with associations between data packages made by a human operator. The larger idea was that an administrator could log in to the web application, upload some data, and then use the tools we built to process it. There were some communication issues on the project, and when we finally received the data to be processed we realized it was far too complex to be handled by a human; a bulk uploading system had to be implemented. With looming deadlines and a large body of data to process, I tried to think of ways to make the task easier on myself.
The JDBC-backed paradigm I had already developed for this project presented some real challenges when we were faced with late changes to the data model. The number of data objects we had grew threefold, and the idea of writing boilerplate JDBC code for all of these objects made my stomach churn. It is a pain to do this stuff the first time around, let alone having to go back and do it all again.
I started looking at the relationships between the data objects we were processing and noticed a pattern. The data could be related in an n-tree-like manner; however, because the project was being developed with what I would call an uber-agile methodology, I was trying to make sure we did not build too many constraints into the way we put it together. I wanted to make sure we had the flexibility down the road to make associations across any of the objects without introducing any duplication.
I broke everything down so that each data object held an individual piece of data. Along with the primary piece of data, each table also included an auto-generated ID for the primary key and timestamps recording when the row was created and last updated. This increased the number of objects we were dealing with, but it allowed me to handle each object in a generic manner. All the data was related using join tables, which gave me the relational flexibility I was looking for without duplicating data. The real beauty that came out of this was that I was able to write a small set of methods that could handle all these data objects and their relationships regardless of their type.
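To make that concrete, here is roughly what a couple of those narrow tables and a join table look like. The table names, column conventions, and in-memory H2 database below are illustrative assumptions for the sake of the example, not the project's actual schema.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.Statement;

public class SchemaSketch {
    public static void main(String[] args) throws Exception {
        // An H2 in-memory database, used purely for illustration
        // (requires the h2 driver on the classpath).
        try (Connection conn = DriverManager.getConnection("jdbc:h2:mem:sangari");
             Statement stmt = conn.createStatement()) {

            // Each data object gets its own narrow table: one primary data
            // field plus an auto-generated key and created/updated timestamps.
            stmt.execute("CREATE TABLE lesson_title ("
                    + "id BIGINT GENERATED BY DEFAULT AS IDENTITY PRIMARY KEY, "
                    + "value VARCHAR(255) NOT NULL, "
                    + "created TIMESTAMP DEFAULT CURRENT_TIMESTAMP, "
                    + "updated TIMESTAMP DEFAULT CURRENT_TIMESTAMP)");

            stmt.execute("CREATE TABLE unit_name ("
                    + "id BIGINT GENERATED BY DEFAULT AS IDENTITY PRIMARY KEY, "
                    + "value VARCHAR(255) NOT NULL, "
                    + "created TIMESTAMP DEFAULT CURRENT_TIMESTAMP, "
                    + "updated TIMESTAMP DEFAULT CURRENT_TIMESTAMP)");

            // Relationships live in join tables, keeping the data tables free
            // of duplication while still allowing associations between any
            // two kinds of object.
            stmt.execute("CREATE TABLE unit_name_lesson_title ("
                    + "unit_name_id BIGINT NOT NULL, "
                    + "lesson_title_id BIGINT NOT NULL)");
        }
    }
}
```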
Without flooding this blog with too much code, and to still give you an idea of the type of flexibility this scenario gave me, look at the following method I used for inserting a join between two tables. There is more logic around how I decided when a join should happen, but this method covers the underlying principle: no matter what data field we were processing, I could use the same generic methods to handle the communication with the database. The strings passed in were used for the names of the tables and the main data field, and the individual IDs were returned from the database after the represented objects were inserted.
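A simplified sketch of those methods looks something like this. The GenericDao class name and the `<left>_<right>` join-table naming convention are assumptions made for illustration rather than the exact production code:

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

public class GenericDao {

    private final Connection connection;

    public GenericDao(Connection connection) {
        this.connection = connection;
    }

    // Inserts a single data object into its narrow table and returns the
    // auto-generated primary key so the object can later be joined to others.
    public long insertValue(String table, String field, String value) throws SQLException {
        String sql = "INSERT INTO " + table + " (" + field + ") VALUES (?)";
        try (PreparedStatement stmt =
                connection.prepareStatement(sql, Statement.RETURN_GENERATED_KEYS)) {
            stmt.setString(1, value);
            stmt.executeUpdate();
            try (ResultSet keys = stmt.getGeneratedKeys()) {
                keys.next();
                return keys.getLong(1);
            }
        }
    }

    // Inserts a row into the join table linking two previously inserted
    // objects. The join table is assumed to be named <left>_<right> with
    // <table>_id columns; only trusted internal table names should ever be
    // concatenated into SQL like this.
    public void insertJoin(String leftTable, String rightTable,
                           long leftId, long rightId) throws SQLException {
        String sql = "INSERT INTO " + leftTable + "_" + rightTable
                + " (" + leftTable + "_id, " + rightTable + "_id) VALUES (?, ?)";
        try (PreparedStatement stmt = connection.prepareStatement(sql)) {
            stmt.setLong(1, leftId);
            stmt.setLong(2, rightId);
            stmt.executeUpdate();
        }
    }
}
```

With something like this in place, processing a new field came down to a couple of insertValue calls and an insertJoin, no matter which object types were involved.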
As you can see from the following entity relationship diagram, the data model I ended up with was moderately complex, with only 'has a' dependencies. After processing the data for the project, this small set of methods had inserted approximately 80K separate pieces of data without any errors, and a similar set of methods was used for updating and extraction. Although this design could do little to alleviate the complexity of processing the data, it did make the job of interacting with the database less tedious and gave me a lot of flexibility in how I could treat the data as the project evolved.
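For illustration, extraction followed the same generic shape; a lookup along these lines (a companion to the GenericDao sketch above, with the same assumed conventions) could cover reads across every object type:

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;

public class GenericLookup {

    private final Connection connection;

    public GenericLookup(Connection connection) {
        this.connection = connection;
    }

    // Fetches the stored value for a given object table and generated ID,
    // using the same table/field parameterization as the insert methods.
    public String findValueById(String table, String field, long id) throws SQLException {
        String sql = "SELECT " + field + " FROM " + table + " WHERE id = ?";
        try (PreparedStatement stmt = connection.prepareStatement(sql)) {
            stmt.setLong(1, id);
            try (ResultSet rs = stmt.executeQuery()) {
                return rs.next() ? rs.getString(field) : null;
            }
        }
    }
}
```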
Another major problem we faced as we plodded forward on this project revolved around native PDF processing on the Android tablet. There was very little native PDF support on the Android platform, and the PDF render time was unacceptably slow. To get around this problem we decided to pre-process the PDFs and turn each page into a PNG. All the metadata normally stored with the PDF for each page, along with extra data created by our process, was included with the PNGs representing a chapter as a JSON file that mapped those PNGs to the relevant data. The resulting group of PNGs and the JSON file was then zipped up and delivered to the reader via an API. A multi-threaded downloader on the client allows for a fairly seamless experience when browsing the content. So far the reader has been distributed to 1000+ teachers without any major issues.
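For a sense of what that pre-processing step might look like, the sketch below renders each page of a chapter PDF to a PNG. Apache PDFBox and the file names here are assumptions for the example, not necessarily the tooling we used:

```java
import java.awt.image.BufferedImage;
import java.io.File;
import javax.imageio.ImageIO;
import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.rendering.PDFRenderer;

public class ChapterPreprocessor {
    public static void main(String[] args) throws Exception {
        // Hypothetical input file; PDFBox 2.x API.
        try (PDDocument document = PDDocument.load(new File("chapter.pdf"))) {
            PDFRenderer renderer = new PDFRenderer(document);
            for (int page = 0; page < document.getNumberOfPages(); page++) {
                // Render each page to a PNG at a tablet-friendly resolution.
                BufferedImage image = renderer.renderImageWithDPI(page, 150);
                ImageIO.write(image, "png", new File("page-" + page + ".png"));
            }
        }
        // In the pipeline described above, a JSON manifest mapping each PNG
        // to its page metadata would be written alongside the images, and the
        // whole set zipped for delivery through the API.
    }
}
```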