Using MySQL Cluster as a Distributed PHP Session Storage

In this little text I will discuss a high available / high performance distributed session management cluster solution. Main focus is set on web sessions, PHP and MySQL Cluster.

Introduction

Nowadays web pages are very often required to deliver personalized dynamic content, handle shopping cards or just keep track of user statistics.

All this cannot be served without managing web sessions linking feature data to a particular visitor's web browser.

Usually this session data is stored in the web application' s memory or on disks. With more and more visitors adding load the solution is distributed to more servers. The solution becomes finally more complex using load balancers with sticky connections to compensate for the non distributed session information and replicating session data among web servers to achieve more availability.

A central database for session persistence often adds another failure point to the architecture or rarely delivers the performance at a reasonable cost level.

This text describes a low cost session management solution base on MySQL Cluster Database allowing a simple but high available architecture. MySQL Cluster serves as a central session storage running on multiple computers. Data within this Cluster is automatically distributed, load balanced and replicated to deliver the necessary performance and high availability.

Cluster can be accessed using MySQL Server maintaining. The architecture has proven to easily accommodate the web session management of thousands of concurrent visitors using a low price standard of the shelf hardware.

Web Sessions?

  • Web services
  • Shopping carts
  • Personalized web pages
  • Any kind of "Know / Recognize your customer"
Web services rely on HTTP for communication
  • HTTP protocol is stateless
  • No permanent connection between web client and server
  • Isolated single requests and their reply
  • No persistence of a connection or client identification from one request to next
In a shopping cart application the cart needs to be filled and associated with a customer, a delivery address as well as payment information details. Shopping cart applications and any kind of personalized web pages require the identification of a web client and/or storage of additional (state) information. Main objective of web sessions is the maintainance of a web visitor's state for more than a single web request across multiple pages of a single site.

Hypertext Transfer Protocol (HTTP) as protocol between the browser and the web server is stateless. The protocol requires no permanent connection between the browser as a client and the web server. A HTTP transaction is complete by the browser sending a request to the web server and the response as a reply.

There is no persistence of a connection to the next request. In addition no requirements have been defined for any browser to identify itself for a single request.

Because all web based services relies on HTTP for communication, maintaining state in a web application is an additional challenge.

Web Sessions!

Web sessions help to maintain a client's state from request to next request. We can finally store information attached to a web visitor and make him or her use a shopping basket. 2 locations to store the client state:
  • Client
  • Server

Both methods require the identification of client. Several methods exist to transfer the client's browser state or identity. The state can be stored on client or server side:

  • Cookies
  • URL rewriting
  • Hyperlinks
  • Forms

Cookies are an extension of HTTP that were introduced to help provide stateful HTTP transactions. Cookies can be used to identify a single browser for recurring HTTP transactions or to store state information. States can also be passed in the URL, but accidental disclosure of this information poses serious security risks.

Session Management

Using the above described methods session management is obviously a method to allow "virtual connections" to be established between browser and server. A session identifier is assigned to each user Identifier stored in a client side cookie if possible.
  • Maintain client data for each user session
  • Provide storage for session data on server side
  • Allocate & track session ids
  • Associate each session id with a client each request
  • Individual requests are matched in a session.

Advantages of Sessions

  • Simplicity for developer
  • Often transparently available
  • Often build-in
  • Less network utilization
  • Data security

Hints

Session data is usually for storing temporary data only. Session data in a usual web application is deleted after 30 minutes or even less. This can be noticed every time you enter a web shop, fill you shopping card, get distracted to do something else and ... shopping card content is gone.

Please also note that URL rewriting is a solution and even a must when handling WAP sessions.

PHP Sessions 101

PHPs session extension transparently takes care of server side session handling since PHP version 4. The respective script is quite simple. In order to make use of sessions you merly need to call the function session_start(). Session variables are stored in the $_SESSION[] array.

Session contents are stored on script termination and restored on consecutive session_start() calls. Most aspects of PHP session handling are still configurable.

Example


...
<body>
<?php
session_start();
if (!isset($_SESSION['count'])) {
    $_SESSION['count'] = 0;
} 
$_SESSION['count']++;

echo "Request count: ".$_SESSION['count']."<br>\n";
?>
<a href="#">next</a>
</body>
...

PHP Session IDs

PHP supports session ID propagation by both URL parameters and cookies. The default name of the ID parameter/cookie is PHPSESSID (settable using session_name()). The default session ID is a MD5 checksum based on date, time, the client IP and other request data (settable using session_id()). Transparent session ID rewriting can take care of adding the session ID to URLs in absence of cookies.

Challenges in Session Management

Session data even if only temporary should survive a web server crash. In order to allow customers to continue shopping even in case of a hardware or any other failure a solution should be highly available. Session data might very often be used and is highly dynamic, performance is an issue. Today some 100 visitors a day, in the near future a couple thousands. Scaleability is the issue here.
  • Session persistence
    • Reload session data after system downtime
    • Interface: Export to other parts of application
    • Survive system failure
  • Performance
    • lowest answering times required for this little part of the web application
  • Scalability
    • maintain performance even with growing system load
  • Fail-over & high availability
    • maintain continuous operation in case of system failure

Session Persistence

Several locations are suppported by PHP and other languages in order to handle session persistence.
  • Disks
    • Local disks
    • NFS, SMB
A shared file system like NFS or SMB can be used for storage. They are
  • transparent for file based sessions
  • and
  • only require a change of file save paths
But shared mounts need extra system configuration. Most shared file systems like NFS have issues with locking. A file server becomes single point of failure.
  • Shared Storage (hardware or software)
    • Database
    • Centrally and shared
    • Per web server
I regard session storage in a database as the only solution in big scale web applications. Nevertheless many problems need to be solved in these environments:
  • Very often performance issues
    • Bad response time
    • Insufficient throughput at a reasonable price level
    • doesn't scale
  • Central database
  • Another single point of failure

Session Persistence and PHP

Whereas J2EE has HTTP session or SFSB objects (including fail- over for object references) and application servers like WebLogic support a transparent (database based) session persistence PHP users have access to a variety of solutions using the PHP session API.

PHP sessions are persistent by design. PHP has a sessions module available. A "pluggable" database support exists (the default file persistence is build in) but manual programming required. The respective session management module will be simply loaded by the a function call to session_module_name ( ) at the beginning of your script.

Using the PHP build-in session file based sessions a file will be created in the configured path. The file name contains the session ID.

Session variable contents are stored in serialized (text) format:

  count|i:5;foo|s:3:"bar";

or in a more compact binary format. Customization of session storage is possible from within PHP extensions or PHP userland code.

PHP Custom Session Handlers

A user defined session handler has to implement the following functions:
  • open() - open/lock session storage
  • close() - close/unlock session storage
  • read() - read stored serialized data
  • write() - store serialized session data
  • destroy() - delete session data
  • gc() - remove expired sessions

PHP SQL Session Storage

In order to store session data in a MySQL database we need a table like this:


    CREATE TABLE `test`.`session` (
      `id` CHAR(32) PRIMARY KEY,
      `data` TEXT,
      `usetime` TIMESTAMP ,
    ) ENGINE=<...>

In the recommended and most common solution the data column will hold the session data in the same serialized format as it is used in the file based session storage.

Please note that the ENGINE clause is only available in MySQL. Using other databases you have to omit this keyword. If you intend to use MySQL you should use a storage engine allowing row level locking (such as InnoDB or Cluster like discussed later on).

PHP Session Handler functions

The following code will implement the full session management API needed to run session management in a (My)SQL database.

$_my_dbh = NULL;

function my_open($path, $name)
{
  global $_my_dbh;

  $_my_dbh = mysql_connect("localhost", "root", "");
  mysql_select_db("test", $_my_dbh);

  mysql_query("BEGIN TRANSACTION", $my_dbh);
}

function my_close()
{
  global $my_dbh;

  mysql_close($my_dbh);
}

function my_read($id)
{
  global $my_dbh;
  $res = mysql_query(
    "SELECT data FROM session WHERE id='$id' FOR UPDATE", $my_dbh);
  $row = mysql_fetch_assoc($res);
  mysql_free_result($res);
  if (is_array($row)) 
    return $row['data'];
  return "";
}

function my_write($id, $data)
{
  global $my_dbh;
  $data = mysql_escape_string($data);

  mysql_query("INSERT INTO session (id, data) VALUES ('$id', '$data') 
         ON DUPLICATE KEY UPDATE data='$data'", $my_dbh);

  mysql_query("COMMIT TRANSACTION", $my_dbh);
}

function my_destroy($id)
{
  global $my_dbh;

  return mysql_query("DELETE FROM session WHERE id = '$id'", $my_dbh);
}

// delete idle sessions
function my_gc($lifetime)
{
  global $my_dbh;

  mysql_query("DELETE FROM session WHERE
     usetime < DATE_SUB(NOW(), INTERVAL $lifetime SECOND)", $my_dbh);
}

Performance and Scalability

In order to meet the requirements of satisfying performance and scalability we have to consider the following aspects:
  • Throughput and response time
  • Some web pages have thousands of visitors
  • Each page access requires session data access
  • Every new visitor will generate new data
Solutions are discussed in this section: using distributed systems, load balancing and load sharing as well as data partioning.

Distributed System

In a distributed environment all system load (user web requests generated by a page hit causing a session call) is distributed to multiple web server instances usually located on different computers. A distributed allows to increase the overall throughput of the web system and can perform a fail-over in case of a system failure.

A load balancer or virtual server is necessary to distribute the HTTP requests to the differernt web servers. Local storage of session data requires forwarding of same client request to always same server

Load Balancing

Basically two methods to load balance your application exist:
  • Static
    • Load assigned and distributed at beginning
    • No adjustment during run-time
    • Very fast due to low overhead
  • Dynamic
    • Determines distribution at run-time
    • Distribution of work load depending on information collected
    • Many algorithms (e.g. measure net trafic)

Partitioning / "Sticky Connections"

In order to distribute load to different web servers a central session storage or local session storages are needed. In case of a central storage each single user request can be routed to any server. The session data connected to the session id will be read from the central storage. But a shared storage will introduce a single point of failure.

A usual way to "balance" distributed load is partitioning. Very often an algrithm is used on the session id to equaly distribute sessions to different servers. Each session will then be routed to a dedicated server with a local storage. Unfortunately this will require "sticky connections" since partitioning is made strickly by web users and a local web server storage is used. Session data can not be shared among different web servers. Each user is then routed to a dedicated web server. This will introduce new problems even if several server can share the load now. E.g. "bleeding techniques" are needed to move new users to a new server over a longer period of time just to shut down a single server for maintenance.

Fail-Over and Availability

Easiest minimum security against data loss is an external external storage (single point of failure).

Even if it is a single point of failure the session data is at least still available if a system goes off-line.

  • Replication to a standby server
  • Some AS / web – servers offer it build-in
  • Memory to Memory replication
Problems:

  • Failure-detection (e.g. complicated API programming of hardware load balancers)
  • often fail-over time high
  • Unused resources by an idle spare computer

MySQL Cluster & PHP for Session Management

Cluster for Session Management

  • Build-in automatic load balancing
  • Load balancer (or virtual server) of distributed (uniform) web applications
  • Web or Application Server
  • Application logic
  • Front end view
  • MySQL Server
  • Distributed SQL layer
  • MySQL Cluster data nodes
  • Shared but redundant and distributed data storage

Web Applications and Cluster

  • Single non-interacting, uniform and distributable applications accessing a shared storage
  • Standard database API to access MySQL Server
  • PHP, perl, JDBC, ODBC
  • Full set of SQL available
  • Transparent access to data
  • Native Cluster API for ultra low latency
  • Standard session library of programming language
  • Shared central session data storage

Cluster advantages

  • Independence of programming language
  • Storing gigabytes of session data
  • Low answering times (5 – 10 ms)
  • High throughput (10000 transactions per second)
  • Nothing shared architecture
  • Scalability
  • Application, MySQL Server, data storage layer
  • Off the shelf hardware
  • Fast automated fail-over (in milliseconds)

Session handling with Cluster

  • Following requests
  • From same client
  • Going to any web server
  • Even long period of time in between requests possible

PHP and Cluster

Changing the PHP session interface we've seen earlier to use MySQL only requires a change of the storage engine:

  ALTER TABLE session ENGINE=ndbcluster;

as MySQL Cluster meets all of our requirements for a session storage table:

  • transactional
  • row level locking
  • BLOB support

Solution advantages

  • Fully distributed system
  • Easy deployment due to simple architecture, usually every web server will have a local MySQL Cluster Frontend Server installed
  • Session persistence using standard libraries
  • Replication moved from application or web server to synchronous automated replication in data nodes
  • No bleeding in balance strategy necessary, standard round robin can be used
  • Automated load balancing in MySQL Cluster data nodes due to perfect automated data partitioning and distribution
  • Extreme fast fail-over ("transparent fail-over")
  • Low answering times allows web application without cache
  • Cost effective LAMP stack (Total Cost of Ownership)
  • No loss of session data if any instance fails
  • Web server, MySQL Server, Data Nodes

Literature