Thursday, April 03, 2008

Sessions in PHP (and the CakePHP Session abstraction) - how it works

A session in PHP encapsulates the logic of retrieving the cookie value, mapping that cookie to data stored at the web server backend, purging the backend data when it expires, and so on.

CakePHP provides a class called Session, that attempts to hide these details a step further. If you want to identify a web user across many web pages, all you need to do is enable sessions and then you can store information of the user in a Session object which will be always available at the web backend.

CakePHP provides a "helper" and a "component" that form the next layer of abstraction over the basic Session object. If you're coding the "view" of the MVC, you use the helper - if you're coding the controller, you use the "component".

This is all good for smallish-scale apps. As you start using more than one web server, you can't keep the cookie mappings in the web server, as the web requests are now going to one of many web servers. CakePHP has a mechanism using php session callbacks to allow you to store the cookie mappings in a central database.

This article attempts to walk you through the architecture of cakephp + php that allows this customization.
I will be referring to cake/libs/session.php, so you should have it open in an editor as you go through this.

If we look in cake/libs/session.php at the Session::__construct() function, we see a call to $this->__initSession() followed by a call to session_start(). The __initSession() function sets some PHP options that configure session handling at that level. Follow the switch statement on CAKE_SESSION_SAVE, and if it is set to 'database', you will see that session_set_save_handler() is used to tell PHP about several callbacks. PHP also expects destroy and gc callbacks; the four that matter for this walkthrough are:
__open : an initialization function
__close: a cleanup function, that is generally used to purge unnecessary data (gc)
__read: php passes this function a "session id" and expects the app to return the data that is mapped to this session id, which we shall call the "session value"
__write: php passes this function a "session id" and a "session value" and expects the app to store that mapping in the db
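
To make the callback contract concrete, here is a rough sketch that keeps the mappings in a plain array instead of a database. This is not CakePHP's code: the class ArraySessionStore and its method names are made up for illustration. The only real PHP API mentioned is session_set_save_handler(), and it is left commented out so the sketch can be run from the command line.

```php
<?php
// Hypothetical in-memory stand-in for a database-backed session store.
class ArraySessionStore
{
    // session id => session value (a string, as PHP hands it to us)
    public $rows = array();

    public function open($savePath, $sessionName)
    {
        return true; // nothing to initialize
    }

    public function close()
    {
        return true; // a real handler might trigger cleanup here
    }

    public function read($id)
    {
        // PHP expects a string back; an empty string means "no data yet".
        return isset($this->rows[$id]) ? $this->rows[$id] : '';
    }

    public function write($id, $data)
    {
        $this->rows[$id] = $data;
        return true;
    }

    public function destroy($id)
    {
        unset($this->rows[$id]);
        return true;
    }

    public function gc($maxLifetime)
    {
        return true; // nothing expires in this sketch
    }
}

$store = new ArraySessionStore();

// In a real web request the callbacks would be registered before session_start():
// session_set_save_handler(
//     array($store, 'open'), array($store, 'close'),
//     array($store, 'read'), array($store, 'write'),
//     array($store, 'destroy'), array($store, 'gc')
// );

// Exercising the callbacks by hand, in the order PHP would use them:
$store->open('', 'CAKEPHP');
$firstRead = $store->read('abc123');          // unknown id, so empty string
$store->write('abc123', 'name|s:3:"Bob";');   // example session value
$secondRead = $store->read('abc123');
$store->close();
```

Swapping the array for database queries gives the general shape of what the 'database' setting does.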

So, going back to the Session constructor, after the call to set the php callbacks, session_start() is called. This is obviously a function implemented in php core. It will now look at the cookie passed by the browser.

If this is a first time web user, there will be no cookie, so session_start() will generate a "session id" - a 128-bit value that is random enough to have a very low probability of collision. Then, since session_start() knows about the callback functions that were already set, it will call __open and __read.

If you look at Session::__open() you will see it does nothing except return "true", signaling php that everything was ok on the initialization step. Now let's look at Session::__read().

As I said earlier, PHP passes a "key", which is nothing but the session id it just generated. The Session::__read() code is expected to retrieve the "session value" mapped to this "session id" by querying the db. Again, if this is a first time user, there will be no value in the db and nothing will be returned to session_start(). session_start() will then create a $_SESSION hash, which will be empty because nothing was returned from Session::__read(). It will also send the session id to the browser as a cookie, using a Set-Cookie response header. The cookie value looks like this:

CAKEPHP=session_id

Here the string CAKEPHP is defined in app/config/core.php and can be changed. For the curious, notice that this is passed into session.name in the Session::__initSession() function, that is how session_start() knows what name to use for the cookie. The session id in the cookie is of course what session_start() generated - that 128 bit random number we talked about.
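
As a small illustration of what ends up on the wire, the sketch below builds a header of that shape. PHP generates session ids internally, so makeSessionId() is only a stand-in that produces a 128-bit value as 32 hex characters; both function names are hypothetical, and real cookies usually carry extra attributes such as a path, which are left out here.

```php
<?php
// Stand-in for PHP's internal id generator: 16 random bytes = 128 bits.
function makeSessionId()
{
    return bin2hex(random_bytes(16)); // 32 hex characters
}

// Build the response header for a given cookie name and session id.
function sessionCookieHeader($cookieName, $sessionId)
{
    return 'Set-Cookie: ' . $cookieName . '=' . $sessionId;
}

$id = makeSessionId();
$header = sessionCookieHeader('CAKEPHP', $id);
echo $header, "\n";
```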

At this point your application can write values to $_SESSION. Whatever values you write will be saved to the database when session_write_close() is called. session_write_close() uses the two other callbacks, __write and __close, to do that.

So at what point should the application call session_write_close()? It should be called before control leaves the current web request. One such point is when Controller::redirect() is called - see cake/libs/controller/controller.php. This function sends a redirect to the browser, which takes the user to a different page, potentially hitting a different web server, so the current state must be saved to the database before that. The other point is the more natural path of the request simply completing. The way CakePHP is designed, we're always guaranteed to have an instance of the ConnectionManager class alive. The destructor of ConnectionManager calls session_write_close() - look in cake/libs/model/connection_manager.php.

So as you can see, if you use CakePHP, your app doesn't need to call session_write_close(), it is automatically handled for you by the framework.
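
The destructor trick can be sketched in a few lines. RequestScopedManager below is a hypothetical stand-in for an object that lives for the whole request, and the closure stands in for the session_write_close() call; unset() is used only to simulate the end of the request so the order of events is visible.

```php
<?php
// Hypothetical stand-in for an object the framework keeps alive for the whole request.
class RequestScopedManager
{
    private $onShutdown;

    public function __construct($onShutdown)
    {
        $this->onShutdown = $onShutdown;
    }

    public function __destruct()
    {
        // In the real framework, this is where session_write_close() would be called.
        call_user_func($this->onShutdown);
    }
}

$log = new ArrayObject();
$manager = new RequestScopedManager(function () use ($log) {
    $log->append('session flushed');
});

$log->append('request handled');
unset($manager); // simulate the end of the request; the destructor runs now
```

Because PHP runs destructors when the last reference goes away (or at script shutdown), the flush happens without the application asking for it.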

So what happens inside session_write_close()? It will call Session::__write(), passing the session id as the "key" and the $_SESSION hash as the "value". Session::__write() first checks whether a mapping exists for this session id. If so, it updates it with the new values from the $_SESSION object - remember these may be values you set in your application, like user name, user email etc. For a first time user, there will be no mapping in the database, so Session::__write() inserts a new row.
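
The update-or-insert decision can be sketched as below, with an array standing in for the database table. The function name and the row layout are assumptions for illustration, not CakePHP's schema. One detail worth knowing: PHP hands the write callback the session data already serialized into a string, which is why $data is a string here.

```php
<?php
// Hypothetical table layout: session id => array('data' => string, 'expires' => int)
function writeSession(array &$table, $id, $data, $expires)
{
    if (isset($table[$id])) {
        // A mapping already exists: update it in place.
        $table[$id]['data'] = $data;
        $table[$id]['expires'] = $expires;
        return 'update';
    }
    // First write for this session id: insert a new row.
    $table[$id] = array('data' => $data, 'expires' => $expires);
    return 'insert';
}

$table = array();
$now = 1207180800; // fixed timestamp so the example is repeatable
$first = writeSession($table, 'abc123', '', $now + 1200);
$second = writeSession($table, 'abc123', 'name|s:3:"Bob";', $now + 1260);
```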

Then session_write_close() will call Session::__close(), which on a coin flip (well, actually using a function with odds lower than 0.5) will clean up session mappings that have expired. How does CakePHP know if the cookies have expired? Interestingly enough, the session cleanup here has nothing to do with the actual cookie expiration in the browser. CakePHP in fact uses a long lived cookie on the browser, but sets a configurable timeout for the mappings at the server. The timeout is calculated using the formula:

$timeout = CAKE_SESSION_TIMEOUT * $factor,

where $factor is one of 10, 100 or 300 for CAKE_SECURITY settings of 'high', 'medium' and 'low' respectively. Both the CAKE_SESSION_TIMEOUT and CAKE_SECURITY settings can be changed in app/config/core.php.
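
Putting the formula and the occasional cleanup together, a minimal sketch might look like this. The factors come from the text above; everything else is an assumption for illustration: the function names, the row layout with an 'expires' timestamp, and the 3-in-150 odds, which are just an example of "lower than 0.5". The $roll parameter exists only so the example can be run deterministically.

```php
<?php
// Timeout formula from the text: base timeout times a security-dependent factor.
function sessionTimeout($baseTimeout, $security)
{
    $factors = array('high' => 10, 'medium' => 100, 'low' => 300);
    if (!isset($factors[$security])) {
        throw new InvalidArgumentException('Unknown security level: ' . $security);
    }
    return $baseTimeout * $factors[$security];
}

// Remove rows whose expiry time has passed; returns the number removed.
function purgeExpired(array &$table, $now)
{
    $removed = 0;
    foreach ($table as $id => $row) {
        if ($row['expires'] < $now) {
            unset($table[$id]);
            $removed++;
        }
    }
    return $removed;
}

// Run the purge only occasionally. The 3-in-150 odds are an assumed example value.
function closeSession(array &$table, $now, $roll = null)
{
    if ($roll === null) {
        $roll = mt_rand(1, 150);
    }
    if ($roll <= 3) {
        purgeExpired($table, $now);
    }
    return true;
}

$timeout = sessionTimeout(120, 'high');
$now = 1000000;
$table = array(
    'old' => array('data' => '', 'expires' => $now - 1),
    'new' => array('data' => '', 'expires' => $now + $timeout),
);
closeSession($table, $now, 150); // roll misses: nothing purged
$afterMiss = count($table);
closeSession($table, $now, 1);   // roll hits: the expired row is removed
$afterHit = array_keys($table);
```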

So we followed through what happens for a first time web user to your site - before the request leaves the page, a cookie is generated, sent to the browser, an empty $_SESSION object created that can be manipulated by the application, and then the $_SESSION object is saved to the database.

Now when the user visits the site again, the browser will send the cookie back to the site. This time session_start() has a cookie, and it will extract the session id from it. Then, when it calls Session::__read(), it will pass the session id as the "key". The row will be found in the table and the session value will be returned to session_start(). session_start() will now create the $_SESSION object based on the value that Session::__read() returned.
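
The first visit and the return visit can be simulated end to end with an array in place of the database. handleRequest() is a hypothetical function that compresses session_start(), the application's changes and session_write_close() into one call; serialize()/unserialize() stand in for PHP's own session encoding, and random_bytes() for its id generator.

```php
<?php
// Hypothetical simulation of one web request: start session, change it, save it.
function handleRequest(array &$db, $cookieId, array $changes)
{
    // No cookie means a first visit, so mint a new session id.
    $id = ($cookieId === null) ? bin2hex(random_bytes(16)) : $cookieId;

    // "__read": look up the stored value; start empty if nothing is stored.
    $session = isset($db[$id]) ? unserialize($db[$id]) : array();

    // Application code adds or changes session values.
    $session = array_merge($session, $changes);

    // "__write": persist before control leaves the request.
    $db[$id] = serialize($session);

    return array('id' => $id, 'session' => $session);
}

$db = array();
$first = handleRequest($db, null, array('user' => 'alice'));        // first visit
$second = handleRequest($db, $first['id'], array('visits' => 2));   // cookie sent back
```

On the second call the id from the "cookie" finds the stored row, so the session starts out with the values saved during the first request.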

Again, you're free to change, add values to the $_SESSION object and CakePHP guarantees that any changes will be written to the database before control leaves the web request logic.

CakePHP also provides some helper functions to write/read to/from the $_SESSION object. They are Session::write() and Session::read() respectively.
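
To give a feel for what such helpers do, here is a rough sketch assuming dot-separated key names such as 'User.name' that map onto nested arrays. sessionWrite() and sessionRead() are hypothetical functions written for this post, not CakePHP's implementation, and $session stands in for $_SESSION.

```php
<?php
// Write a value under a dot-separated name, creating nested arrays as needed.
function sessionWrite(array &$store, $name, $value)
{
    $ref = &$store;
    foreach (explode('.', $name) as $key) {
        if (!isset($ref[$key]) || !is_array($ref[$key])) {
            $ref[$key] = array();
        }
        $ref = &$ref[$key];
    }
    $ref = $value;
}

// Read a value by dot-separated name; returns null when any part is missing.
function sessionRead(array $store, $name)
{
    foreach (explode('.', $name) as $key) {
        if (!is_array($store) || !array_key_exists($key, $store)) {
            return null;
        }
        $store = $store[$key];
    }
    return $store;
}

$session = array(); // stand-in for $_SESSION
sessionWrite($session, 'User.name', 'alice');
sessionWrite($session, 'User.email', 'alice@example.com');
$name = sessionRead($session, 'User.name');
$missing = sessionRead($session, 'User.phone');
```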

Hope this de-mystifies how sessions work a bit and helps in using the Session object with confidence.
