That Syncing Feeling

Yeah, the title’s derivative. It’s Sunday night, I’m tired, and I don’t get dessert ’til I finish this post. Sue me.

Working on another app in my free time. It’s primarily iOS clients to start, but it requires a Rails back-end, which will likely end up as a proper web app, given time. The issue du jour, as suggested by the title, is keeping the app’s data in sync with the server. And here are the gotchas:

  • Multiple users might edit some entries at any time
  • For at least some of the resources, all of the data should be kept in sync. For others, an “after this date” Twitter-style sync is fine.
  • The app has to cope with intermittent network access, and update when it can

No pressure.

To handle syncing a particular resource, I’ve set up a special Rails route: /resource/syncbase. This returns a JSON array of all items available to the current user. But unlike a plain GET on /resource, each entry in the array only contains id, created_at, and updated_at. This keeps the payload to a minimum while still giving the app a full snapshot of the server’s data.
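For illustration, here’s what that slim payload boils down to, sketched in Python so it stays language-neutral. It assumes the user’s full rows are already loaded as dicts with datetime timestamps. `syncbase_payload` and the `title` field are hypothetical names, and a real Rails action would more likely limit the query to those three columns than filter in memory.

```python
import json
from datetime import datetime, timezone


def syncbase_payload(records):
    """Build the slim /resource/syncbase body: id and timestamps only.

    `records` is assumed to be a list of dicts holding the full rows the
    current user can see; every field except the three below is dropped.
    """
    slim = [
        {
            "id": r["id"],
            "created_at": r["created_at"].isoformat(),
            "updated_at": r["updated_at"].isoformat(),
        }
        for r in records
    ]
    return json.dumps(slim)


# Hypothetical example row; "title" stands in for the real payload fields.
example = [
    {
        "id": 2,
        "title": "Groceries",
        "created_at": datetime(2013, 1, 1, tzinfo=timezone.utc),
        "updated_at": datetime(2013, 1, 2, tzinfo=timezone.utc),
    }
]
print(syncbase_payload(example))
```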

The app then sorts the returned array by id and fetches an id-sorted list of its local items’ id and timestamp values. Then it’s a matter of walking through both lists in a single loop, comparing ids. When the left (local) list’s id is lower, that item no longer exists on the server, so it’s been deleted there and we can delete our copy too. If the right (server) id is lower, the item is new and needs to be queued for download. If the ids match, we compare their updated_at values, with the newest one “winning”. Once one list runs out, whatever is left in the other gets the same treatment: leftover local items are deletions, leftover server items are downloads.
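That loop is a standard merge walk over two sorted lists. Here’s a minimal Python sketch, assuming each side has already been reduced to (id, updated_at) pairs and that unsynced local items (no id yet) were filtered out beforehand. `Stamp`, `diff_sync`, and the `upload` bucket for a newer local copy are my own illustrative names and interpretation of “winning”, not code from the app.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class Stamp:
    """One entry from either list: an id plus its updated_at timestamp."""
    id: int
    updated_at: datetime


def diff_sync(local, server):
    """Merge-walk two id-sorted lists and decide what each id needs.

    Returns (delete_locally, download, upload) as lists of ids.
    """
    local = sorted(local, key=lambda x: x.id)
    server = sorted(server, key=lambda x: x.id)
    delete_locally, download, upload = [], [], []
    i = j = 0
    while i < len(local) and j < len(server):
        mine, theirs = local[i], server[j]
        if mine.id < theirs.id:
            # Only exists locally: it was deleted on the server.
            delete_locally.append(mine.id)
            i += 1
        elif theirs.id < mine.id:
            # Only exists on the server: it's new, queue a download.
            download.append(theirs.id)
            j += 1
        else:
            # Same id on both sides: the newer updated_at wins.
            if theirs.updated_at > mine.updated_at:
                download.append(theirs.id)
            elif mine.updated_at > theirs.updated_at:
                upload.append(mine.id)
            i += 1
            j += 1
    # Once one list runs out, the other's leftovers have no partner.
    delete_locally.extend(x.id for x in local[i:])
    download.extend(x.id for x in server[j:])
    return delete_locally, download, upload


t0, t1 = datetime(2013, 1, 1), datetime(2013, 1, 2)
local = [Stamp(1, t0), Stamp(2, t1), Stamp(4, t0)]
server = [Stamp(2, t0), Stamp(3, t0), Stamp(4, t1), Stamp(5, t0)]
print(diff_sync(local, server))  # ([1], [3, 4, 5], [2])
```

Each list is visited once, so after sorting the comparison is linear in the total number of items.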

It’s a naive setup, but should work for a first pass with myself and another test user. I can already see some gaps in it:

  • It trusts the device clocks instead of relying on the server’s
  • By not working off a “last updated” time, we’re sending a lot in the initial /resource/syncbase call

I’m not a huge fan of it, but one way around the above is a deleted_at column in the model. You never actually delete a row – you just set deleted_at and clear everything but id, for privacy and to save space. With that in place, the client can ask for changes since the latest of the three timestamp columns, and the server can respond with just the changed rows. Deleted rows come back as id-only tombstones, so the client still finds out what to remove.
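A rough Python sketch of that soft-delete scheme, assuming rows are plain dicts with timezone-aware datetime columns. `tombstone`, `last_changed`, `changes_since`, `sync_cursor`, and the `title` field are illustrative names, not an existing API; tombstoned rows are included in what `changes_since` returns.

```python
from datetime import datetime, timezone

TIMESTAMPS = ("created_at", "updated_at", "deleted_at")


def tombstone(row, now=None):
    """Soft-delete: keep only the id and stamp deleted_at; clear everything else."""
    when = now if now is not None else datetime.now(timezone.utc)
    return {"id": row["id"], "deleted_at": when}


def last_changed(row):
    """The latest of the three timestamp columns present on a row."""
    return max(row[k] for k in TIMESTAMPS if row.get(k) is not None)


def changes_since(rows, since):
    """Server side: rows touched after `since`, tombstones included.

    A `since` of None means the client has nothing yet, so send everything.
    """
    if since is None:
        return list(rows)
    return [r for r in rows if last_changed(r) > since]


def sync_cursor(local_rows):
    """Client side: the newest timestamp it holds, sent as `since` next time."""
    return max((last_changed(r) for r in local_rows), default=None)


def day(d):
    return datetime(2013, 1, d, tzinfo=timezone.utc)


server_rows = [
    {"id": 1, "created_at": day(1), "updated_at": day(1)},
    {"id": 2, "created_at": day(1), "updated_at": day(5)},
    tombstone({"id": 3, "title": "x", "created_at": day(1), "updated_at": day(2)}, now=day(6)),
]
local_rows = [
    {"id": 1, "created_at": day(1), "updated_at": day(1)},
    {"id": 2, "created_at": day(1), "updated_at": day(3)},
    {"id": 3, "created_at": day(1), "updated_at": day(2)},
]
print([r["id"] for r in changes_since(server_rows, sync_cursor(local_rows))])  # [2, 3]
```

The client’s cursor here is built from timestamps it holds locally, so the device clock concern from the list above still applies; building the cursor only from server-issued timestamps would sidestep it.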

Offline access seems like it’ll be relatively easy. Core Data has built-in support for relationships, so while locally created, unsynced entities won’t have an id value, they and any related data can stick around until the network is available. The trick will be to upload them, then fill in whatever id the server hands back. It brings up the device clock issue again, though. Do we also pull back the created_at the server sets, or override it with the time the user added it on their device? Same goes for updated_at.
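Here’s a minimal Python sketch of that upload-then-fill-in step, under a few stated assumptions: unsynced items are simply those whose `id` is still None, `FakeServer` is a hypothetical stand-in for a create request that returns the server’s id and timestamps, and related objects would hold a reference to the item itself (as Core Data relationships do), so they see the new id without extra work. The `keep_device_times` flag is a hypothetical knob for the open created_at/updated_at question, not an answer to it.

```python
import itertools
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional


@dataclass
class LocalItem:
    title: str  # hypothetical payload field
    created_at: datetime
    updated_at: datetime
    id: Optional[int] = None  # None until the server assigns one


class FakeServer:
    """Hypothetical stand-in for the create endpoint: assigns ids and stamps."""

    def __init__(self, clock):
        self._ids = itertools.count(100)
        self._clock = clock

    def create(self, title):
        now = self._clock()
        return {"id": next(self._ids), "created_at": now, "updated_at": now}


def flush_pending(items, server, online, keep_device_times=False):
    """Upload every item that has no id yet, then adopt the id the server returns.

    Does nothing while offline, so unsynced items simply wait for the next try.
    Returns the number of items uploaded.
    """
    if not online:
        return 0
    uploaded = 0
    for item in items:
        if item.id is not None:
            continue  # already synced
        reply = server.create(item.title)
        item.id = reply["id"]
        if not keep_device_times:
            # Trust the server's clock over the device's.
            item.created_at = reply["created_at"]
            item.updated_at = reply["updated_at"]
        uploaded += 1
    return uploaded


device_time = datetime(2013, 1, 9, tzinfo=timezone.utc)
server_time = datetime(2013, 1, 10, tzinfo=timezone.utc)
server = FakeServer(clock=lambda: server_time)
draft = LocalItem("Offline note", device_time, device_time)
print(flush_pending([draft], server, online=False), draft.id)  # 0 None
print(flush_pending([draft], server, online=True), draft.id)  # 1 100
```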

Enh – more to learn and figure out.
