Few stories: development

Sunday, January 15, 2012

OAuth2 provider instruction

I decided to add OAuth2 support to my web site. Since my application is written in Perl (on the Catalyst framework), I went to CPAN first:

There is CatalystX::OAuth2::Provider, but it is too simplistic for my needs, and judging by the code it does not seem to work at all.
There is OAuth::Lite2, but it is marked as "beta" and not recommended for production use.
There is Net::OAuth2, but it is a client only.

So I decided it would be better to write my own implementation, just for my needs, and went looking for samples and instructions.

I started with the OAuth2 site. There is not much there, really, except for the link to the RFC. Searching the internet turned up more articles, mostly theory and opinions, plus some server implementations. But there was nothing like a simple sample implementation or a code stub. Of course I could read the code of other servers, but it is a bit annoying to decipher an unknown web framework written in a not very familiar language. So I read through the RFC and made my own TODO list of what must be implemented in an OAuth2 provider server.

My list is not universal; it is aimed at my particular situation. Also, where the RFC allows different approaches, I chose the one that suits me best.

Pre-register clients. Client registration includes:

client_id
password (the client secret)
uri to which the user is redirected after access is granted
logo, description
scopes the client is allowed to request
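
As a rough illustration, a client registration record could look like the sketch below. It is written in Python rather than Perl purely for illustration. The Client class, the CLIENTS registry and register_client are hypothetical names, and the in-memory dict stands in for whatever storage the site really uses.

```python
from dataclasses import dataclass, field


@dataclass
class Client:
    """One pre-registered OAuth2 client (field names are illustrative)."""
    client_id: str
    password: str          # client secret; a real site would store a hash
    redirect_uri: str      # where the user is sent after access is granted
    logo: str = ""
    description: str = ""
    allowed_scopes: frozenset = field(default_factory=frozenset)


# Hypothetical in-memory registry keyed by client_id.
CLIENTS = {}


def register_client(client: Client) -> None:
    """Store the client so the endpoints can look it up by client_id."""
    CLIENTS[client.client_id] = client


register_client(Client(
    client_id="demo-app",
    password="s3cret",
    redirect_uri="https://client.example/callback",
    description="Demo application",
    allowed_scopes=frozenset({"profile", "photos"}),
))
```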

authorize endpoint (/oauth/authorize)

Validate request:

find client by 'client_id' param - if not found - show error page 'unauthorized_client' to user
if 'redirect_uri' param is present - check that it matches the pattern registered for the client - if not - show error page to user - no redirect!
check that 'response_type' param == 'code' - if not - error = 'unsupported_response_type'
check that every space-delimited value in 'scope' param is allowed for this client - if not - error = 'invalid_scope'
if 'state' param is present - save it in session
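
The validation steps above could be sketched roughly as follows (Python, for illustration only). The two exception classes, the function name and the dict-based client records are assumptions of this sketch, and the redirect_uri "pattern" check is simplified to an exact string comparison.

```python
class ErrorPage(Exception):
    """Show an error page to the user; never redirect."""


class RedirectError(Exception):
    """Redirect back to the client with param 'error' set to the message."""


def validate_authorize_request(params, clients, session):
    """Run the checks from the list; return (client, scopes) on success."""
    client = clients.get(params.get("client_id"))
    if client is None:
        raise ErrorPage("unauthorized_client")

    # Assumption: "matches registered pattern" is an exact match here.
    redirect_uri = params.get("redirect_uri")
    if redirect_uri is not None and redirect_uri != client["redirect_uri"]:
        raise ErrorPage("redirect_uri does not match the registered one")

    if params.get("response_type") != "code":
        raise RedirectError("unsupported_response_type")

    # The scope param is a space-delimited list of values.
    scopes = params.get("scope", "").split()
    if not set(scopes) <= set(client["allowed_scopes"]):
        raise RedirectError("invalid_scope")

    if "state" in params:
        session["state"] = params["state"]

    return client, scopes
```

Keeping the two failure modes as separate exception types makes it hard to redirect by accident when the client or its redirect_uri could not be trusted.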

Authenticate user: either use the existing login if the user is already logged in, or provide a way to log in.
Show grants page with a full description of the grant request: client name, description, logo and the list of requested scopes.
If access is not granted - redirect back with error = 'access_denied'.
If granted:

generate unique code string
store code with user_id, client_id, scopes, timestamp. If redirect_uri was in request - store it too.
form redirect url: redirect_uri from request OR client uri from registration; add the generated code as 'code' param; if 'state' was in request - add it; add list of scopes.
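
A minimal sketch of the "granted" branch, again in Python for illustration. CODES and issue_code are hypothetical names, secrets.token_urlsafe is just one possible way to get a unique string, and the code is sent back in the 'code' param, which is where the RFC expects the client to find it.

```python
import secrets
import time
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Hypothetical storage: code string -> record.
CODES = {}


def issue_code(user_id, client, scopes, session, request_redirect_uri=None):
    """Store a fresh code and return the url to redirect the user to."""
    code = secrets.token_urlsafe(32)
    CODES[code] = {
        "user_id": user_id,
        "client_id": client["client_id"],
        "scopes": list(scopes),
        "timestamp": time.time(),
        "redirect_uri": request_redirect_uri,  # None if absent from request
    }

    # redirect_uri from the request wins over the registered one.
    target = request_redirect_uri or client["redirect_uri"]
    query = {"code": code, "scope": " ".join(scopes)}
    if "state" in session:
        query["state"] = session["state"]

    # Keep any query string the target uri already carries.
    parts = urlsplit(target)
    merged = parse_qsl(parts.query) + list(query.items())
    return urlunsplit(parts._replace(query=urlencode(merged)))
```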

token endpoint (/oauth/token)

Authenticate client (HTTP Basic) - get client_id, if it fails - error = 'invalid_client'
Check that 'grant_type' param == 'authorization_code', if not - error = 'unsupported_grant_type'
Find 'code' param value in DB. Check that client_id stored with code == client_id from authentication.
Ensure that code is not expired - stored timestamp > current timestamp - 60 (a code lives for 60 seconds).
If 'redirect_uri' param is present in request - ensure that it is the same as the one stored with the code. If any of the last three checks fails - error = 'invalid_grant'
Create unique access token string, store it in DB (with client_id, user_id, timestamp, scopes).
Add headers to the response: Cache-Control: no-store and Pragma: no-cache
Respond with JSON {"access_token": token string, "token_type": "Bearer"}
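
The token endpoint steps could look roughly like this in Python. This is only a sketch: parse_basic_auth, token_endpoint and the dict-based storage are hypothetical, clients are reduced to a client_id -> password mapping, and the function returns headers and a body and leaves the HTTP layer to the caller. Deleting the code after use is not in my list, but the RFC says a code must not be used more than once.

```python
import base64
import secrets
import time

CODE_LIFETIME = 60  # seconds, as in the list above


def parse_basic_auth(header):
    """Return (client_id, password) from an HTTP Basic header, or None."""
    if not header or not header.startswith("Basic "):
        return None
    try:
        decoded = base64.b64decode(header[6:], validate=True).decode("utf-8")
    except ValueError:  # covers bad base64 and bad UTF-8
        return None
    client_id, sep, password = decoded.partition(":")
    return (client_id, password) if sep else None


def token_endpoint(auth_header, params, clients, codes, tokens, now=None):
    """Return (headers, body); body holds either a token or an 'error'."""
    now = time.time() if now is None else now
    headers = {"Cache-Control": "no-store", "Pragma": "no-cache"}

    creds = parse_basic_auth(auth_header)
    if creds is None or clients.get(creds[0]) != creds[1]:
        return headers, {"error": "invalid_client"}
    client_id = creds[0]

    if params.get("grant_type") != "authorization_code":
        return headers, {"error": "unsupported_grant_type"}

    code = params.get("code")
    record = codes.get(code)
    if (record is None
            or record["client_id"] != client_id
            or record["timestamp"] <= now - CODE_LIFETIME
            or ("redirect_uri" in params
                and params["redirect_uri"] != record.get("redirect_uri"))):
        return headers, {"error": "invalid_grant"}

    del codes[code]  # single use (an addition to the list above)
    token = secrets.token_urlsafe(32)
    tokens[token] = {
        "client_id": client_id,
        "user_id": record["user_id"],
        "timestamp": now,
        "scopes": record["scopes"],
    }
    return headers, {"access_token": token, "token_type": "Bearer"}
```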

Protected resource
And now, when you want to check that a request is authorized to get information that belongs to the user with 'user_id' and falls under scope 's1', you must:

check that 'Authorization' header is present
check that it contains scheme "Bearer"
take the token string that follows the scheme as is (no base64 decoding is needed) and find this access token in DB
ensure that this token allows access to user_id and that its scopes string contains 's1'
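
A small Python sketch of this check. The function name and the dict-based token storage are assumptions; the token is treated as an opaque string taken from the header as is, and the scopes string is split on spaces so that 's1' does not accidentally match 's10'.

```python
def is_authorized(auth_header, tokens, user_id, required_scope):
    """True if the Bearer token grants required_scope for user_id."""
    if not auth_header:
        return False

    # Header format: "Bearer <token>".
    scheme, _, token = auth_header.partition(" ")
    token = token.strip()
    if scheme != "Bearer" or not token:
        return False

    record = tokens.get(token)
    if record is None:
        return False

    # Compare whole scope values, not substrings of the scopes string.
    return (record["user_id"] == user_id
            and required_scope in record["scopes"].split())
```

For example, is_authorized(request_header, TOKENS, 7, "s1") would be called at the top of a handler that serves data of user 7, where request_header and TOKENS are whatever your framework and storage provide.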

A few more words:
I assume that the site supports some kind of per-user sessions.
The scope string is a space-delimited list of arbitrary values that identify your site's sections, grants or any other kind of access restriction.
At the authorize endpoint, 'error' means that the request must be redirected to the redirection uri with param 'error'. At the token endpoint there is nothing to redirect, so the error is returned in the JSON response body with HTTP status 400 (or 401 for 'invalid_client').
How to generate unique code and token strings is beyond this article; I think everyone has their favourite method.

Further reading:
I really think it is very useful to read the RFC draft - http://tools.ietf.org/html/draft-ietf-oauth-v2-22
And also the draft for the Bearer scheme - http://tools.ietf.org/html/draft-ietf-oauth-v2-bearer-15
Once you have this flow working, adding support for other flows will be much easier.

This is not implemented yet, it is a work in progress. I hope I didn't make too many mistakes. Comments, advice and criticism are appreciated.

Tuesday, May 10, 2011

Database tables and columns naming conventions

Once again I've had to explain what I think about naming conventions for database tables and columns. This time I've decided to write my points down, so that next time I can just give a link :).

Starting with tables: the main question is whether the plural or the singular form should be used. The main argument for the plural form is something like "it is a collection of entities, so it must be named in the plural". The singular form is preferred when you want to write queries like "where user.id = " and not "where users.id = ".

When it comes to column names, there is much more arguing. The main argument is about the primary key: should it be ID or entity_id? If you use the plural form for tables, entity_id reads more pleasantly in queries: "where users.user_id = ". The next bone of contention is short forms in column names: may you name a column userGoodsCnt, or must you name it amount_of_goods_in_user_cart?
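
To put the two styles side by side, here is a tiny sketch using Python's sqlite3 module with an in-memory database. The tables and the sample row are made up for illustration only.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
    -- Style A: singular table name, short primary key.
    CREATE TABLE user  (id INTEGER PRIMARY KEY, name TEXT);
    -- Style B: plural table name, entity-prefixed primary key.
    CREATE TABLE users (user_id INTEGER PRIMARY KEY, name TEXT);
    INSERT INTO user  (id, name)      VALUES (1, 'alice');
    INSERT INTO users (user_id, name) VALUES (1, 'alice');
""")

# The same lookup written in both styles.
row_a = db.execute(
    "SELECT name FROM user WHERE user.id = ?", (1,)).fetchone()
row_b = db.execute(
    "SELECT name FROM users WHERE users.user_id = ?", (1,)).fetchone()
```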

It's funny to notice that those who love plural table names and the long form of the primary key usually prefer long column names too. Why do they love long names so much? In most cases they say it is because "a database schema must be as readable as it can be". A readable and pretty much self-explanatory schema is indeed very important, so why do others disagree? Those who like short forms say that "it is _understandable_, and it is shorter. save a byte - save a tree". One more point: the description of a column or table can be kept near it, not necessarily in the name: in a description, in the repository, in comments...

From my point of view, it is all about when and how you work with schemas. Some people usually look at a DB chart, a DB schema or some entity relationship graph. They prefer long, self-describing names, since comments are hidden in that view. And, what is usually much more important, they look at this chart only from time to time: either they are DBAs with a lot of such schemas, or it is just not their primary job and they refer to the schema occasionally, so they do not need to remember it and want to get all the information at one glance. On the other side are those who work with table and column names many times a day, those who write code and queries. They usually keep the whole DB schema in mind, and since they type all those names with their own fingers, they want them to be as short as possible.
The first group are architects who do not write code, technical directors and so on. The second are developers and coders.

Summary: if you are a leader or an architect, do not forget about those who will work with the schema every day, and do not sacrifice their comfort for a pleasant-looking DB chart. The developers' comfort is very important, no less than yours, and they spend much more time with the DB. This doesn't mean that a table may be named table1 with columns A, B, C - no way.

I myself prefer to name tables in the plural form but to keep column names as short as possible. It's because I use an ORM, so I do not usually see ugly queries like "where users.id = "; I use forms like "select from users where name like", and that is OK with me. But when it comes to column names, I try to keep them as short as possible, as long as they are understandable to those who work on the same project. I'm pretty sure that if you can't spend the time to read and understand a DB schema with shortened forms (like usr instead of user), then you will not have enough time to contribute anything good to the project's technical development.