This wiki is no longer active and is left here for historical purposes. Please visit oauth.net for up-to-date information.

Scaling OAuth providers


Building OAuth providers is a new and not widely known art, and scaling them up is even less well understood. This space is for collecting best practices for scaling OAuth providers. 

 

Smaller Deployments 

Pure RDBMS approach 

This is what most small providers use, since it is what most libraries and tutorials show. 

A typical implementation does 4 database lookups and 1 insert per signed request. 
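
Roughly, and with an illustrative schema (table and column names here are made up), the work per request looks something like this, using a DB-API cursor:

    def verify_request_lookups(cur, oauth):
        # 1. Consumer lookup by consumer key (gives us the consumer secret)
        cur.execute("SELECT id, secret FROM consumers WHERE consumer_key = %s",
                    (oauth["oauth_consumer_key"],))
        consumer = cur.fetchone()

        # 2. Access token lookup (gives us the token secret)
        cur.execute("SELECT id, secret, user_id FROM access_tokens WHERE token = %s",
                    (oauth["oauth_token"],))
        token = cur.fetchone()

        # 3. User lookup for the token
        cur.execute("SELECT id, login FROM users WHERE id = %s", (token[2],))
        user = cur.fetchone()

        # 4. Nonce lookup: a nonce seen before means a replayed request
        cur.execute("SELECT 1 FROM nonces WHERE consumer_id = %s AND nonce = %s AND timestamp = %s",
                    (consumer[0], oauth["oauth_nonce"], oauth["oauth_timestamp"]))
        if cur.fetchone():
            raise ValueError("duplicate nonce")

        # 5. Insert the nonce so it cannot be replayed later
        cur.execute("INSERT INTO nonces (consumer_id, nonce, timestamp) VALUES (%s, %s, %s)",
                    (consumer[0], oauth["oauth_nonce"], oauth["oauth_timestamp"]))
        return consumer, token, user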

The first step to improve this is to ensure the correct indexes exist for the above queries. 
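
For the queries sketched above, that might mean indexes along these lines (again illustrative, adjusted to your own schema and database):

    # Hypothetical indexes matching the lookups sketched above
    INDEX_DDL = [
        "CREATE UNIQUE INDEX idx_consumers_key ON consumers (consumer_key)",
        "CREATE UNIQUE INDEX idx_tokens_token ON access_tokens (token)",
        "CREATE UNIQUE INDEX idx_nonces ON nonces (consumer_id, nonce, timestamp)",
    ]

    def create_indexes(cur):
        for ddl in INDEX_DDL:
            cur.execute(ddl)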

This should work on most small web services, but we can definitely improve it. 

Eliminating the Nonce table 

The first step when optimizing could be to get rid of the nonce table. It is really nothing more than a lookup, and you should already have a policy in place to reject timestamps older than, say, an hour. That said, a properly indexed nonce lookup can be very fast; just remember to clear out old nonces. 

This means we could instead store nonces in some sort of shared key-value store such as MemCache or Tokyo Tyrant. These are great in that you can store an entry with an expiry time, so it is removed automatically. 

One problem with an ephemeral store such as MemCache is that it does not guarantee an entry stays around until it expires; under memory pressure entries can be evicted early. You could therefore imagine an attack aimed specifically at filling up your nonce cache. 
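
A minimal sketch of a MemCache-backed nonce check, assuming the pymemcache client (any client with an atomic add will do):

    import time
    from pymemcache.client.base import Client

    memcache = Client(("127.0.0.1", 11211))
    MAX_TIMESTAMP_AGE = 3600  # reject timestamps older than an hour

    def check_nonce(consumer_key, token, nonce, timestamp):
        # Enforce the timestamp policy first
        if abs(time.time() - int(timestamp)) > MAX_TIMESTAMP_AGE:
            return False
        # 'add' only succeeds if the key does not exist yet, so a replayed
        # nonce comes back False.  The entry expires on its own.  Note that
        # memcached may still evict it early under memory pressure.
        key = "nonce:%s:%s:%s:%s" % (consumer_key, token, timestamp, nonce)
        return memcache.add(key, b"1", expire=MAX_TIMESTAMP_AGE, noreply=False)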

Memcaching the AccessToken 

After the first use of a token, it gets cached in something like MemCache together with the consumer and possibly the user information. Doing this eliminates all database hits after the first access. 

If a token gets revoked, the revocation should also clear the cache entry for it.
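
A sketch of that cache-aside pattern, again assuming pymemcache and the illustrative schema from above; revocation deletes the cached entry so it takes effect immediately:

    import json
    from pymemcache.client.base import Client

    memcache = Client(("127.0.0.1", 11211))
    TOKEN_TTL = 3600  # how long a token stays cached

    def load_access_token(cur, token):
        cached = memcache.get("token:" + token)
        if cached is not None:
            return json.loads(cached)
        # First use: fall back to the database, then cache the result
        cur.execute(
            "SELECT t.secret, t.user_id, c.consumer_key, c.secret "
            "FROM access_tokens t JOIN consumers c ON c.id = t.consumer_id "
            "WHERE t.token = %s", (token,))
        row = cur.fetchone()
        if row is None:
            return None
        data = {"token_secret": row[0], "user_id": row[1],
                "consumer_key": row[2], "consumer_secret": row[3]}
        memcache.set("token:" + token, json.dumps(data).encode(), expire=TOKEN_TTL)
        return data

    def revoke_access_token(cur, token):
        cur.execute("DELETE FROM access_tokens WHERE token = %s", (token,))
        # Clear the cache entry so the revocation takes effect immediately
        memcache.delete("token:" + token)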

 

Larger Deployments

Large services may have many large geographically distant data centers, which makes it difficult to contact a central database for verification. 

Encrypting information within the AccessToken 

This is the approach Yahoo uses, and it is a very neat one. It eliminates most database calls by storing everything needed to verify the token inside the token itself. This information should be not only encrypted but also signed using a centrally shared key.  

Issuing the token

A central OAuth server serializes the token data: typically the consumer key, the secrets, the user identifier, and an issue or expiry time. 

In other words, the same data you would store in your database in the RDBMS approach. This blob is encrypted and signed with a key shared between your servers. 
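
Yahoo has not published its exact token format, but as a sketch of the idea, here is an issuer using Fernet from the cryptography package, which both encrypts (AES) and signs (HMAC) with a single shared key; the fields are illustrative:

    import json
    import secrets
    import time
    from cryptography.fernet import Fernet

    # Generated once and distributed to all of your servers (e.g. via config)
    SHARED_KEY = Fernet.generate_key()
    fernet = Fernet(SHARED_KEY)

    def issue_access_token(consumer_key, consumer_secret, user_id, ttl=30 * 86400):
        token_secret = secrets.token_urlsafe(32)
        payload = {
            "consumer_key": consumer_key,
            "consumer_secret": consumer_secret,
            "token_secret": token_secret,
            "user_id": user_id,
            "expires": int(time.time()) + ttl,
        }
        # Fernet both encrypts and signs the payload with the shared key
        token = fernet.encrypt(json.dumps(payload).encode()).decode()
        return token, token_secret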

Verifying the token 

The token's own signature is verified with the shared key and the token is decrypted. The secrets recovered from it are then used to verify the OAuth signature on the request itself. 
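
Continuing the sketch above, verification is the reverse of issuing:

    def verify_access_token(token):
        # Fernet checks the HMAC before decrypting; a tampered or foreign
        # token raises cryptography.fernet.InvalidToken.
        payload = json.loads(fernet.decrypt(token.encode()))
        if payload["expires"] < time.time():
            raise ValueError("token expired")
        # payload["consumer_secret"] and payload["token_secret"] are then
        # used to verify the OAuth signature on the request itself.
        return payload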

Token Revocation 

This is the tricky part of the approach, as it does require some sort of check against a central source. The good news is that the result can be cached locally, either in server memory or in a datacenter-local MemCache. The check itself can be as simple as an HTTP GET to a central server that returns 200 for a valid token and 403 for a revoked one. The response is stored locally with a short expiry time, forcing a recheck every 10 minutes or every hour. 
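
A sketch of that check, with a hypothetical central endpoint and a datacenter-local MemCache used to honor the short expiry:

    import hashlib
    import requests
    from pymemcache.client.base import Client

    local_cache = Client(("127.0.0.1", 11211))  # datacenter-local MemCache
    REVOCATION_URL = "https://oauth.example.com/check"  # hypothetical central endpoint
    RECHECK_TTL = 600  # force a recheck every 10 minutes

    def is_token_valid(token):
        # Hash the token so the key stays within memcached's size limits
        key = "revcheck:" + hashlib.sha1(token.encode()).hexdigest()
        cached = local_cache.get(key)
        if cached is not None:
            return cached == b"valid"
        # Ask the central server: 200 means valid, 403 means revoked
        resp = requests.get(REVOCATION_URL, params={"token": token})
        status = b"valid" if resp.status_code == 200 else b"revoked"
        local_cache.set(key, status, expire=RECHECK_TTL)
        return status == b"valid"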

Another approach would be to use Token Revocation Feeds, akin to the old Certificate Revocation Lists: the central OAuth server publishes a feed of revoked tokens, and the individual servers or data centers subscribe to it.
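
A sketch of what consuming such a feed could look like, assuming a hypothetical endpoint that publishes a JSON list of revoked token identifiers:

    import requests

    FEED_URL = "https://oauth.example.com/revocations.json"  # hypothetical feed
    revoked_tokens = set()

    def poll_revocation_feed():
        # Run periodically (cron job or background thread): pull the list of
        # revoked token identifiers published by the central OAuth server.
        resp = requests.get(FEED_URL)
        resp.raise_for_status()
        for token_id in resp.json():
            revoked_tokens.add(token_id)
            # A locally cached copy of the token could also be dropped here,
            # e.g. memcache.delete("token:" + token_id)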