DBBalancer: Load balancing database connection pool for Postgres.

Latest news

02/11/2001: Version 0.4.0ALPHA allows UNIX socket connections.
25/09/2001: Preliminary Debian packages for DBBalancer are now available.
10/09/2001: Version 0.3.0ALPHA allows for a non-XML configuration file and removes the requirement for Xerces. Enhancements have been made to transaction scoping under write replication mode.
25/03/2001: Version 0.2.0ALPHA adds variable number of connections and threads. It also supports TRUST and PASSWORD authentication methods.
27/01/2001: Version 0.1.6ALPHA is the first version released.

What's this?

DBBalancer is a piece of middleware that sits between database clients (C, C++, TCL, Java JDBC, Perl DBI, and so on) and a database server. Currently the only server supported is Postgres, but the architecture is open to supporting more servers in the future. One of its strongest (IMHO ;-)) points is that it can be tried or used without changing a line of existing code, because the balancing is done at the Postgres protocol level.

DBBalancer can do different things.

It's a connection pool...
... a load balancer,
... and a database replicator.

Any combination of these can be used at the same time.

What's this good for?

As I said before, there are three main pieces of functionality:

Connection Pool

Like every connection pool, DBBalancer pre-allocates several connections, saving the computationally expensive work of opening a new connection each time some code runs. This is especially useful in web applications, where there is usually little state kept between one request and the next. The performance gain depends on the usage pattern; it is biggest for the typical connect-select-disconnect sequence used in most dynamic web pages.

This pool has a variable number of connections and execution threads. The number of connections should always be greater than or equal to the number of threads; otherwise clients could time out. Both numbers vary at run time, depending on the size of the request queue. Usually a new db connection is created along with each new thread, and the reverse happens when a thread is destroyed.

This connection pool also automatically recovers its connections after a database crash, excluding them from the pool while they are being recovered.
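To make the idea concrete, here is a minimal sketch of a pre-allocating pool that sets broken connections aside until they are recovered. It is written in Python for brevity and is not DBBalancer's actual code (which is C++ on top of ACE); the names `FakeConnection`, `Pool`, `acquire`, `release` and `recover` are made up for the illustration.

```python
import queue


class FakeConnection:
    """Hypothetical stand-in for a real database connection."""

    def __init__(self, ident):
        self.ident = ident
        self.healthy = True


class Pool:
    """Pre-allocates `size` connections and hands them out on demand."""

    def __init__(self, size, factory):
        self._factory = factory
        self._idle = queue.Queue()
        self.recovering = []
        # Pay the connection cost once, up front.
        for ident in range(size):
            self._idle.put(factory(ident))

    def acquire(self):
        """Return a healthy connection; raises queue.Empty if none is idle."""
        while True:
            conn = self._idle.get_nowait()
            if conn.healthy:
                return conn
            # Broken connections stay out of the pool until recovered.
            self.recovering.append(conn)

    def release(self, conn):
        """Give a connection back to the pool."""
        self._idle.put(conn)

    def recover(self):
        """Re-open the set-aside connections and put them back in the pool."""
        for conn in self.recovering:
            self._idle.put(self._factory(conn.ident))
        self.recovering = []

    def idle_count(self):
        return self._idle.qsize()
```

A client would call `acquire()`, run its query, and call `release()`; something in the background would call `recover()` once the database is reachable again.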

Load Balancer

When you have a connection pool, you immediately (at least, that was my case ;-) think of a load-balancing one. Why should all the connections go to the same server? If we spread them across different servers, the load balances itself. But then another problem surfaces: what about the consistency of data between the servers we're balancing across? Reads are obviously no problem, only writes. I have only been able to think of two solutions: a) having a database with replication support, or b) implementing something that generates multiple parallel writes from one. If solution a) is possible, we can use it. But so far Postgres lacks replication support, so we'll have to use the third piece of functionality to enable balancing.

Write Replicator

With write replication the client sees, as always, one server, but it is really talking to all the clustered servers at the same time. This carries a small performance penalty on writes (which should be subtracted from the pooling gain, anyway), but enables a big win on reads, which are the most common operation, especially in web applications.

How does it work?

What we have is a multithreaded daemon, called DBBalancerDaemon, that can run in two modes.

The first one, called Reader, is one in which client connections are dispatched among all the database connections available. If the configuration uses only one host, we can do reads and writes without problems, but if there are several hosts, we can only do reads, hence the name Reader.
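As an illustration of Reader-style dispatching, the sketch below hands out pooled connections in round-robin order across hosts. Round-robin is an assumption made for the example; the text above does not say which policy the daemon actually uses. The class name `ReaderDispatcher` and the host names are hypothetical, and the sketch is in Python rather than the daemon's C++.

```python
import itertools


class ReaderDispatcher:
    """Cycles through (host, connection index) slots in round-robin order."""

    def __init__(self, hosts, conns_per_host):
        # Interleave hosts so consecutive requests land on different servers.
        slots = [(host, index)
                 for index in range(conns_per_host)
                 for host in hosts]
        self._cycle = itertools.cycle(slots)

    def next_connection(self):
        """Return the slot that should serve the next client request."""
        return next(self._cycle)


dispatcher = ReaderDispatcher(["db1", "db2"], conns_per_host=2)
first_five = [dispatcher.next_connection() for _ in range(5)]
# first_five == [("db1", 0), ("db2", 0), ("db1", 1), ("db2", 1), ("db1", 0)]
```

With a single host in the list this degenerates into a plain connection pool, which is why one Reader can serve both roles.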

The second one, called Writer, allows, as I explained previously, every client session to be replicated in parallel across a series of hosts, with several connections to each one. In the absence of errors, this keeps all the databases in sync. Errors are very dangerous here because an error that happens on only one host could drive the whole system out of sync.

So now we can see how the three pieces of functionality explained before can be implemented using this daemon and its two modes:
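The danger of partial failures in Writer mode can be shown with a small sketch. It sends each statement to every host and reports whether the hosts still agree afterwards. For simplicity it loops over the hosts one after another instead of writing in parallel as the daemon does, and `FakeHost` and `replicate` are hypothetical names, not part of DBBalancer.

```python
class FakeHost:
    """Hypothetical stand-in for one Postgres server."""

    def __init__(self, name, fail_on=None):
        self.name = name
        self.fail_on = fail_on  # statement this host will reject, if any
        self.log = []           # statements applied successfully

    def execute(self, statement):
        if statement == self.fail_on:
            raise RuntimeError(f"{self.name} rejected: {statement}")
        self.log.append(statement)


def replicate(hosts, statement):
    """Send one statement to every host.

    Returns (in_sync, failed): `failed` lists the hosts that raised an error,
    and `in_sync` is True only if every host succeeded or every host failed.
    """
    failed = []
    for host in hosts:
        try:
            host.execute(statement)
        except RuntimeError:
            failed.append(host.name)
    in_sync = len(failed) in (0, len(hosts))
    return in_sync, failed
```

If `replicate` ever returns `in_sync == False`, at least one database has missed a write that the others applied, which is exactly the out-of-sync situation described above.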

Connection Pool

This is achieved with one Reader whose configuration file contains connections to only one host.

Load Balancer

We use one Reader, but now its configuration file has several hosts. If the database doesn't have replication, we won't be able to do writes.

Load Balancer & Write Replicator

These two pieces of functionality, together with the Connection Pool one, are achieved through two instances of the daemon: one Reader and one Writer, each on a different port. Reads should be done by connecting to the Reader and writes by connecting to the Writer. This is the only case in which an existing application has to be modified a little to use DBBalancer.
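The small change needed in the application could look like the sketch below, which picks the Reader or the Writer endpoint from the first word of each SQL statement. The host, the port numbers 5433 and 5434, and the function `endpoint_for` are all assumptions made for the example; use whatever ports your two daemon instances actually listen on.

```python
# Hypothetical endpoints: one daemon instance per mode, each on its own port.
READER = ("localhost", 5433)
WRITER = ("localhost", 5434)

# Statements that change data or schema must go through the Writer.
WRITE_VERBS = {"INSERT", "UPDATE", "DELETE", "CREATE", "DROP", "ALTER"}


def endpoint_for(statement):
    """Return the (host, port) a statement should be sent to."""
    words = statement.split()
    verb = words[0].upper() if words else ""
    return WRITER if verb in WRITE_VERBS else READER
```

This is deliberately naive: a transaction that mixes reads and writes, or a SELECT that calls a function with side effects, should go entirely to the Writer, so a real application would more likely choose the endpoint per unit of work than per statement.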

What are the limitations?

Right now, almost all the functionality of a direct connection to Postgres should work fine, although I have not directly tested most of it. There are some things, though, that I assume won't work. The ones I've found so far are:

Asynchronous queries. Here there are two different problems, depending on whether we are using Reader or Writer mode. I'll add more explanations to this document in the future.

Things that aren't implemented yet, but may be in the future:

The only auth methods supported by the daemon are trust and cleartext password. So the clients have to use one of them, and the postmaster on every host also has to be configured to accept one of them.
Some way of checking the daemon status online (HTTP, CORBA, SNMP or something).
Some way of changing configuration without stopping the daemon, maybe by the same method that's used to check the status.
MySQL support.

How can you get it, build it, get the docs, etc?

You can start by going to the DBBalancer SourceForge site, where you can find the sources, binaries, and Debian packages (courtesy of Andrew McMillan).

The DocBook sources of the docs are included with the packages, but you can also get or read them here in HTML or RTF format.

So far the system has only been tested on a few systems: my own, a fairly up-to-date RH6.0 Linux with glibc 2.1.2. As soon as I hear reports of it working or not working on more systems I'll note it here. Success has also been reported on Debian systems running the 'sid' and 'potato' releases.

Remember that whether you want to build it or just use the dynamic binary, you need the support libraries it uses. They are:

ACE: This is a general framework for C++ that's very useful for many things.

How was this made?

As you may have noticed from the ACE library requirement, this program is based on that library. It uses some objects and utilities that come with it. Some of them are:

ACE_Task
ACE_Thread_Mutex
ACE_Get_Opt
ACE_Method_Object
ACE_DEBUG
etc

If you want to see a UML diagram of all this, you can get it here.

Where is this going?

My basic motivation for doing this work was to strengthen my knowledge of C++ and the ACE library (which I liked a lot). After that, I felt that there was no good way of scaling with the main Open Source databases, Postgres and MySQL. As clustering and load balancing are very common techniques with web servers, I thought it could be worth trying to bring them to databases. Then came the replication problem, which led me to the write replication "solution". I know that both Postgres and MySQL have been working on replication and that right now (January 2001) there are stable versions with replication capabilities. If you are using one of these latest versions, you can just forget about the second mode of DBBalancerDaemon, the Writer mode, and use the Reader.

I don't know if this is going to be of real use to anyone, but here it is. For now I'd be happy if people gave me their impressions of the program and, especially, good suggestions and ideas on how to improve it. And of course I also accept patches.
