Disclaimer: I'm still learning CouchDB so there may be smarter ways of doing the things I propose here.

The problem

A typical requirement of any website is to enforce all registered usernames (and sometimes also email addresses) to be unique. This is trivial to do with a relational database like MySQL, but CouchDB doesn't have unique keys.

The standard solution

The usual way is to make the username (as chosen by the user) the _id of the User document. That will enforce uniqueness, but it also makes it difficult to change the username later, when there are lots of references to the document using the _id. It also doesn't solve the problem of having additional uniqueness requirements for other fields like email addresses.

Separating the unique id documents

Instead of the usual way, I'm using a separate document to hold the unique username as its _id, and another (auto-id) document that contains the actual user data. Each user would then be described in the CouchDB database by two documents like this:

{ "_id": "username/kennu",
  "type": "UsernameReservation",
  "user_ref": "71bacba712381adefg13712312312369" }

{ "_id": "71bacba712381adefg13712312312369",
  "type": "User",
  "username": "kennu",
  "password": "xxxx",
  "email": "webmaster@kfalck.net" }

This makes it easy to refer to the User document using the permanent auto-generated _id, while uniqueness is enforced by the separate UsernameReservation document. When the username is changed, a new UsernameReservation document is created to reserve the new name, and then the old document is deleted.

Additional unique fields can be added fairly trivially with documents like this:

{ "_id": "email/webmaster@kfalck.net",
  "user_ref": "71bacba712381adefg13712312312369" }

Handling the transaction

Because CouchDB doesn't have transactions, it's complicated to ensure that the username reservation happens only when the User document is also saved successfully with the corresponding username. In my code (I'm using couchdb-python and Document schema classes), I've defined a "save" method that will execute the following steps:

  1. Make sure the UsernameReservation document is created and belongs to the reserving user.
  2. Save changes to the User document.
  3. If an exception occurred, delete the UsernameReservation (unless it already existed before) and re-raise the exception.
  4. If a new User document was created, update the UsernameReservation with a reference to its _id.
  5. If the username was changed, delete the previous UsernameReservation document.

I'm pretty sure that can be made more beautiful. The basic problem is how to implement a multi-step CouchDB transaction properly in client-side code. It's impossible to make it survive server crashes and similar situations, but maybe a generic transaction could remember the _revs of the documents that were modified, and then restore them to the original state if something goes wrong.

Managing multi-master conflicts

There is one final challenge related to distributed CouchDB: what happens, if the same _id (e.g. username/kennu) is created simultaneously on two or more database servers?

In this situation, CouchDB will mark both documents as _conflicting revisions and more or less randomly choose one of them as the winner. It's up to the application to decide what to do when it encounters a username in _conflicting state. There are a few alternatives that are a bit nasty:

  • Assign a random username to both competing users and release the conflicting name to be re-acquired by whoever gets it first.
  • Put the user accounts on hold and require an administrator to manually resolve the conflict.
  • Somehow choose the winner and just apply the above to the loser.

Avoiding conflicts with a single master

Since usernames and emails are typically not changed very often, it might be more feasible to use only one master database for them. In this scenario, the application would write UsernameReservation documents only to the master, which guarantees they are never conflicting with each other.

Summary

Creating unique fields in CouchDB seems to be quite challenging compared to relational databases. It would be very helpful if CouchDB supported unique fields directly, or if it provided some kind of multi-step transactions for implementing them reliably using separate document _ids as described in this article.

Published 29.6.2009