Building an OAuth provider is still a young and not widely documented art, and scaling one up is even less so. This space is for collecting best practices for scaling OAuth providers.
Smaller Deployments
Pure RDBMS approach
This is what most small providers use, since it is what most libraries and tutorials show.
A typical implementation will do 4 database lookups and 1 insert:
- find access token based on oauth_token
- find consumer based on either oauth_consumer_key or the foreign key in the access token record
- find and store nonce
- find the token's user record
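The per-request flow above can be sketched as follows. This is a minimal illustration using an in-memory SQLite database; the table, column, and function names are all hypothetical, not from any particular OAuth library:

```python
import sqlite3

# In-memory schema standing in for the provider's RDBMS.
db = sqlite3.connect(":memory:")
db.executescript("""
CREATE TABLE consumers (id INTEGER PRIMARY KEY, consumer_key TEXT, secret TEXT);
CREATE TABLE access_tokens (id INTEGER PRIMARY KEY, token TEXT, secret TEXT,
                            consumer_id INTEGER, user_id INTEGER);
CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE nonces (nonce TEXT, timestamp INTEGER);
""")

def verify_request(oauth_token, oauth_consumer_key, nonce, timestamp):
    # 1. find access token based on oauth_token
    row = db.execute("SELECT secret, consumer_id, user_id FROM access_tokens "
                     "WHERE token = ?", (oauth_token,)).fetchone()
    if row is None:
        return None
    token_secret, consumer_id, user_id = row
    # 2. find consumer via the foreign key in the access token record
    consumer = db.execute("SELECT consumer_key, secret FROM consumers "
                          "WHERE id = ?", (consumer_id,)).fetchone()
    if consumer is None or consumer[0] != oauth_consumer_key:
        return None
    # 3. reject replayed nonces, then store this one (the insert)
    seen = db.execute("SELECT 1 FROM nonces WHERE nonce = ? AND timestamp = ?",
                      (nonce, timestamp)).fetchone()
    if seen:
        return None
    db.execute("INSERT INTO nonces VALUES (?, ?)", (nonce, timestamp))
    # 4. find the token's user record
    return db.execute("SELECT name FROM users WHERE id = ?", (user_id,)).fetchone()

# Illustrative seed data.
db.execute("INSERT INTO consumers VALUES (1, 'ck', 'consumer-secret')")
db.execute("INSERT INTO users VALUES (1, 'alice')")
db.execute("INSERT INTO access_tokens VALUES (1, 'tok', 'token-secret', 1, 1)")
```

Note that a real implementation would also check the request signature using the consumer and token secrets fetched above; this sketch only shows the database traffic.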
The first step toward improving this is to ensure correct indexing for the above queries:
- add a unique index on the token field in the access token table
- if looking the consumer up by its consumer key, add a unique index on that field as well
- add a unique index on nonce/timestamp in the nonce table
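Expressed as DDL, the indexes above might look like the following. The table and index names are illustrative; this uses SQLite via Python for concreteness, but the same statements apply to any RDBMS:

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
CREATE TABLE consumers (id INTEGER PRIMARY KEY, consumer_key TEXT, secret TEXT);
CREATE TABLE access_tokens (id INTEGER PRIMARY KEY, token TEXT, consumer_id INTEGER);
CREATE TABLE nonces (nonce TEXT, timestamp INTEGER);

-- unique index on the token field in the access token table
CREATE UNIQUE INDEX idx_access_tokens_token ON access_tokens (token);
-- unique index on consumer_key, if consumers are looked up by it
CREATE UNIQUE INDEX idx_consumers_key ON consumers (consumer_key);
-- unique index on nonce/timestamp in the nonce table
CREATE UNIQUE INDEX idx_nonces ON nonces (nonce, timestamp);
""")
```

A side benefit of the unique nonce index is that a replayed nonce fails at insert time with a constraint violation, so the check-then-store can collapse into a single guarded insert.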
This should work on most small web services, but we can definitely improve it.
Eliminating the Nonce table
The first step in optimizing this could be getting rid of the nonce table. It is really nothing more than a lookup, and you should already have a policy in place to reject timestamps older than, say, an hour. That said, a properly indexed nonce lookup can be very fast; just remember to clear out old nonces.
This means nonces could instead be stored in some sort of shared hash lookup system like MemCache or Tokyo Tyrant. These are great in that you can store an entry with an expiry time, so it is removed automatically.
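The behaviour such a store provides can be sketched with a toy in-process stand-in. This is not memcached itself, just an illustration of the set-with-expiry semantics the text relies on; all names are made up:

```python
import time

class ExpiringNonceStore:
    """Toy stand-in for a memcached-style set-with-expiry nonce store."""

    def __init__(self, ttl=3600):
        self.ttl = ttl          # matches the "no timestamps older than an hour" policy
        self._entries = {}      # (nonce, timestamp) -> expiry time

    def check_and_store(self, nonce, timestamp, now=None):
        now = time.time() if now is None else now
        # Drop expired entries; memcached would do this for us automatically.
        self._entries = {k: exp for k, exp in self._entries.items() if exp > now}
        key = (nonce, timestamp)
        if key in self._entries:
            return False        # replayed nonce, reject the request
        self._entries[key] = now + self.ttl
        return True
```

Because entries expire on the same schedule as the timestamp policy, a nonce only needs to be remembered for as long as its timestamp would be accepted anyway.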
One problem with an ephemeral store such as MemCache is that it doesn't guarantee an entry survives until its expiry time. You could thus imagine an attack specifically aimed at filling up your nonce cache, evicting legitimate nonces early so that replayed requests pass the check.
Memcaching the AccessToken
After the first use of a token, it gets cached in something like MemCache together with the consumer and possibly user information. Doing this eliminates all database hits after the first access.
If a token gets revoked this should clear the cache entry for it.
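A read-through cache with revocation might look like the sketch below. The dict stands in for memcached and `db_lookup` for the RDBMS query; all names are illustrative:

```python
class TokenCache:
    """Read-through cache for access token records (toy stand-in for memcached)."""

    def __init__(self, db_lookup):
        self._db_lookup = db_lookup   # falls back to the RDBMS on a miss
        self._cache = {}
        self.db_hits = 0              # for demonstration only

    def get(self, token):
        if token in self._cache:
            return self._cache[token]
        self.db_hits += 1
        record = self._db_lookup(token)  # token, consumer, and user information
        if record is not None:
            self._cache[token] = record
        return record

    def revoke(self, token):
        # Revocation must clear the cache entry, or the token
        # would keep validating until the cache entry expired.
        self._cache.pop(token, None)
```

The revoke path is the important part: without it, the cache turns a revoked token into one that still works.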
Larger deployments
Large services may have many large geographically distant data centers, which makes it difficult to contact a central database for verification.
Encrypting information within the AccessToken
This approach, used by Yahoo, is very neat. It eliminates most database calls by storing everything needed to verify the token in the token itself. This information should be not only encrypted but also signed, using a centrally shared key.
Issuing the token
A central OAuth server serializes the following data:
- consumer secret
- token secret
- user id
- expiry
- possibly a permissions scope.
In other words, the same data you would store in the database under the RDBMS approach. This blob is encrypted and signed with a key shared between your servers.
Verifying the token
The token's signature is verified and the token is decrypted. The embedded secrets are then used to verify the signature of the OAuth request itself.
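The issue/verify cycle can be sketched as follows. For brevity this sketch only signs the payload with the shared key (HMAC-SHA256 from the standard library); a production token, as described above, would also encrypt the payload (e.g. with AES) so that clients cannot read the embedded secrets. All names and the token format are assumptions for illustration:

```python
import base64
import binascii
import hashlib
import hmac
import json
import time

SHARED_KEY = b"key-shared-between-your-servers"  # illustrative

def issue_token(consumer_secret, token_secret, user_id, scope, lifetime=3600):
    # Serialize everything the RDBMS approach would have stored.
    payload = json.dumps({
        "consumer_secret": consumer_secret,
        "token_secret": token_secret,
        "user_id": user_id,
        "expiry": int(time.time()) + lifetime,
        "scope": scope,
    }).encode()
    sig = hmac.new(SHARED_KEY, payload, hashlib.sha256).digest()
    return (base64.urlsafe_b64encode(payload).decode() + "." +
            base64.urlsafe_b64encode(sig).decode())

def verify_token(token, now=None):
    try:
        payload_b64, sig_b64 = token.split(".")
        payload = base64.urlsafe_b64decode(payload_b64)
        sig = base64.urlsafe_b64decode(sig_b64)
    except (ValueError, binascii.Error):
        return None
    expected = hmac.new(SHARED_KEY, payload, hashlib.sha256).digest()
    if not hmac.compare_digest(sig, expected):
        return None  # tampered or signed with the wrong key
    data = json.loads(payload)
    if data["expiry"] < (time.time() if now is None else now):
        return None  # expired
    # The caller uses the embedded secrets to check the OAuth request signature.
    return data
```

Any server holding the shared key can verify a token with no database call at all, which is what makes this attractive across geographically distant data centers.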
Token Revocation
This is the tricky part of this approach, as it does require some sort of verification against a central source. The good news is that this verification can be cached locally, either in server memory or in a datacenter-local MemCache server. Verifying could be as simple as an HTTP GET to a central server that returns a response code of 200 for valid or 403 for revoked. This response can be stored locally with a short expiry time, forcing a recheck every 10 minutes or every hour.
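The locally cached revocation check can be sketched like this. Here `fetch_status` stands in for the HTTP GET to the central server (returning 200 for valid, 403 for revoked); the function names and the closure-based shape are illustrative:

```python
import time

def make_revocation_checker(fetch_status, ttl=600):
    """Build a checker that caches central revocation lookups for `ttl` seconds."""
    cache = {}  # token -> (status_code, cache entry expiry)

    def is_valid(token, now=None):
        now = time.time() if now is None else now
        cached = cache.get(token)
        if cached is not None and cached[1] > now:
            return cached[0] == 200       # served from the local cache
        status = fetch_status(token)      # stands in for the central HTTP GET
        cache[token] = (status, now + ttl)
        return status == 200

    return is_valid
```

The `ttl` is the knob described above: a 10-minute value means a revoked token keeps working for at most 10 minutes on any given server, in exchange for one central round trip per token per window.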
Another approach would be to use token revocation feeds, akin to the old Certificate Revocation Lists. These would be generated by the central OAuth server and subscribed to by the individual servers or data centers.
Comments (1)
Gregg Kellogg said
at 9:55 am on Oct 16, 2009
The proposed Session Extension (http://oauth.googlecode.com/svn/spec/ext/session/1.0/drafts/1/spec.html) provides for token expiration by reducing the lifetime of an Access Token through the oauth_expires_in parameter and creating a separate Access Token renewal workflow. This would seem to be a useful extension for handling the Token Revocation problem, if oauth_expires_in is sufficiently shorter than oauth_authorization_expires_in.
This topic has been somewhat controversial lately, but it seems like a reasonable approach for supporting decentralized service.