Load balancing in EC2 with Nginx and HAProxy

We wanted to set up a load-balanced web cluster in AWS for expansion. My first inclination was to use ELB, but I soon learned that ELB doesn't let you allocate a static IP; you can only refer to it by its DNS name. That would be fine, except that our current DNS provider, Dyn, requires IP addresses when using their GSLB (global server load balancing, i.e. geo-based load balancing) service.

Rather than let this derail the whole project, I decided to look into the software options available for load balancing in EC2. I've been a fan of hardware load balancers for a while and, without any real rationale, sort of looked down on software-based solutions. In this case I had no choice, though, so I figured I'd give it a try.

My first stop was Nginx. I've used it before as a reverse proxy and like it. The problem is that it doesn't support active polling of nodes, i.e. sending its own requests to each web server and marking the node up or down based on the response. As far as I can tell, using multiple upstream servers in Nginx lets you specify max_fails and fail_timeout, but a "fail" is only registered when a real request to that node fails. I don't want to risk losing a real request; I like active polling.
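To make the passive behavior concrete, here is a simplified Python model of it. It assumes the semantics described in the Nginx upstream documentation (max_fails failures within fail_timeout seconds take a server out of rotation for fail_timeout seconds); Nginx's real bookkeeping differs in detail, and the PassiveUpstream class is a hypothetical illustration, not Nginx code.

```python
from dataclasses import dataclass, field


@dataclass
class PassiveUpstream:
    """Simplified model of Nginx's passive max_fails/fail_timeout handling.

    Failures are only recorded when a real client request to the node fails,
    so a dead node is noticed only after it has already cost someone a request.
    """
    max_fails: int = 1          # Nginx default
    fail_timeout: float = 10.0  # seconds; Nginx default
    _failures: list = field(default_factory=list, init=False)
    _down_until: float = field(default=0.0, init=False)

    def available(self, now: float) -> bool:
        return now >= self._down_until

    def record_failure(self, now: float) -> None:
        # Forget failures that fell outside the fail_timeout window.
        self._failures = [t for t in self._failures if now - t < self.fail_timeout]
        self._failures.append(now)
        if len(self._failures) >= self.max_fails:
            # Take the node out of rotation for fail_timeout seconds.
            self._down_until = now + self.fail_timeout
            self._failures.clear()


node = PassiveUpstream(max_fails=3, fail_timeout=30)
for t in (0, 1, 2):
    node.record_failure(t)  # each of these was a real client request that failed
print(node.available(5))    # False: out of rotation until t=32
print(node.available(32))   # True: back in rotation, with no probe to confirm it is healthy
```

Note that nothing in this model checks whether the node has actually recovered; once the timeout expires, the next real request is the test.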

This led me to HAProxy. I'd never used it before, but it seemed ideally suited to the job, since it's exclusively a load balancer. Its option httpchk directive even allows active polling of nodes. Yay!
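By contrast, active polling probes every node on a timer, independently of client traffic. The sketch below is a hypothetical Python illustration, not HAProxy's implementation: http_check roughly mimics a bare option httpchk (an OPTIONS / request where a 2xx or 3xx status counts as healthy), and HealthState applies thresholds in the spirit of HAProxy's rise and fall server options (defaults 2 and 3).

```python
import http.client


def http_check(host: str, port: int, timeout: float = 2.0) -> bool:
    """One active probe: send OPTIONS / and treat 2xx/3xx as healthy."""
    conn = http.client.HTTPConnection(host, port, timeout=timeout)
    try:
        conn.request("OPTIONS", "/")
        return 200 <= conn.getresponse().status < 400
    except (OSError, http.client.HTTPException):
        # Refused connection, timeout or malformed response: the check fails.
        return False
    finally:
        conn.close()


class HealthState:
    """Up/down state for one node. It flips only after `fall` consecutive
    failed checks (while up) or `rise` consecutive passing checks (while down)."""

    def __init__(self, rise: int = 2, fall: int = 3) -> None:
        self.rise, self.fall = rise, fall
        self.up = True
        self._streak = 0  # consecutive results disagreeing with the current state

    def record(self, ok: bool) -> bool:
        if ok == self.up:
            self._streak = 0
        else:
            self._streak += 1
            if self._streak >= (self.fall if self.up else self.rise):
                self.up = ok
                self._streak = 0
        return self.up


def poll_once(states: dict) -> list:
    """Probe every (host, port) node once; return the nodes that are up."""
    return [node for node, state in states.items() if state.record(http_check(*node))]


# Example loop (not run here; needs `import time`). HAProxy's default check interval is 2s.
# states = {("192.168.1.20", 80): HealthState(), ("192.168.1.30", 80): HealthState()}
# while True:
#     print(poll_once(states))
#     time.sleep(2)
```

The rise/fall hysteresis keeps a single slow response from flapping a node in and out of rotation.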

Unfortunately, HAProxy didn't support SSL at the time of writing (native SSL support arrived later, in HAProxy 1.5). From the HAProxy site:

People often ask for SSL and Keep-Alive support. Both features will complicate the code and render it fragile for several releases. By the way, both features have a negative impact on performance:

Having SSL in the load balancer itself means that it becomes the bottleneck. When the load balancer’s CPU is saturated, the overall response times will increase and the only solution will be to multiply the load balancer with another load balancer in front of them. The only scalable solution is to have an SSL/Cache layer between the clients and the load balancer. Anyway for small sites it still makes sense to embed SSL, and it’s currently being studied. There has been some work on the CyaSSL library to ease integration with HAProxy, as it appears to be the only one out there to let you manage your memory yourself.

Poop! I did find a workaround, though: run both Nginx and HAProxy on the same instance. HAProxy listens on ports 80 and 8443. Port 8443 receives SSL traffic after Nginx has decrypted it, and HAProxy relays that traffic to the nodes on a separate port (8443 again), so the nodes know the request originally arrived over SSL. Nginx is configured as a reverse proxy, listens only on port 443, and holds the SSL certificate and key. Its upstream is simply localhost:8443, i.e. HAProxy.
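On the node side, the app has to turn "arrived on port 8443" back into "this was HTTPS". The post doesn't say what the webapp runs on, so this is a sketch assuming a Python WSGI app; PortSchemeMiddleware and ORIGINALLY_HTTPS_PORTS are hypothetical names. It also assumes your WSGI server reports the local listening port in SERVER_PORT, which is worth verifying, since some servers derive it from the Host header instead.

```python
from typing import Callable, Iterable

# Hypothetical port convention matching this setup: HAProxy forwards decrypted
# HTTPS traffic to the nodes on 8443 and plain HTTP traffic on 80.
ORIGINALLY_HTTPS_PORTS = {"8443"}


class PortSchemeMiddleware:
    """WSGI middleware (hypothetical) that restores the original URL scheme."""

    def __init__(self, app: Callable) -> None:
        self.app = app

    def __call__(self, environ: dict, start_response: Callable) -> Iterable[bytes]:
        # SERVER_PORT is always a string in WSGI.
        if environ.get("SERVER_PORT") in ORIGINALLY_HTTPS_PORTS:
            environ["wsgi.url_scheme"] = "https"
        return self.app(environ, start_response)


def app(environ, start_response):
    # Toy app that just echoes the scheme it believes the request used.
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [environ["wsgi.url_scheme"].encode()]


wrapped = PortSchemeMiddleware(app)
print(wrapped({"SERVER_PORT": "8443", "wsgi.url_scheme": "http"}, lambda s, h: None))  # [b'https']
```

Frameworks that build absolute URLs or secure-cookie flags from wsgi.url_scheme then behave as if they had received the HTTPS request directly.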

This was pretty easy to set up and works very well. I benchmarked HAProxy on an EC2 t1.micro instance (in front of two m1.large instances running our webapp) using ab -n 5000 -c 50 -t 60, and it actually performed better than one of our hardware load balancers. That was pretty eye-opening (and sad).

The HAProxy and Nginx configs are below, in the hope that they help someone. The main warning I'd give is that with this setup, the logs on your nodes will show every request as coming from the load balancer's IP. I had to rewrite some code so the app uses the X-Forwarded-For address rather than REMOTE_ADDR, but other than that this has been working out pretty well.
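A minimal sketch of that X-Forwarded-For change, assuming a Python app. With the configs below, HTTPS requests carry the client address that Nginx adds via $proxy_add_x_forwarded_for, and HAProxy adds one for plain HTTP requests (its option forwardfor skips connections from loopback, i.e. the local Nginx). Because clients can send their own X-Forwarded-For header, the function walks the list from the right and returns the first address that isn't a trusted proxy. The trusted list, including the 192.168.1.10 load balancer address, is hypothetical; substitute your own.

```python
import ipaddress
from typing import Optional

# Hypothetical trusted proxies: loopback (Nginx -> HAProxy on the same box)
# and the load balancer's private address as the nodes see it.
TRUSTED_PROXIES = [
    ipaddress.ip_network("127.0.0.0/8"),
    ipaddress.ip_network("192.168.1.10/32"),
]


def is_trusted(addr: str) -> bool:
    try:
        ip = ipaddress.ip_address(addr)
    except ValueError:  # garbage in the header is never trusted
        return False
    return any(ip in net for net in TRUSTED_PROXIES)


def client_ip(remote_addr: str, xff: Optional[str]) -> str:
    """Best guess at the real client, given REMOTE_ADDR and X-Forwarded-For."""
    # Left to right: anything the client sent, then each proxy's view of its
    # peer, and finally REMOTE_ADDR (the hop that connected to this node).
    hops = [h.strip() for h in (xff or "").split(",") if h.strip()]
    hops.append(remote_addr)
    # Walk back from the nearest hop; the first untrusted address is the client.
    for addr in reversed(hops):
        if not is_trusted(addr):
            return addr
    return hops[0]  # every hop is trusted (e.g. a local request)


print(client_ip("192.168.1.10", "203.0.113.7"))           # 203.0.113.7
print(client_ip("192.168.1.10", "1.2.3.4, 203.0.113.7"))  # 203.0.113.7 (spoofed entry ignored)
```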

/etc/nginx/nginx.conf
The main thing is to make sure the default server isn't listening on port 80, since HAProxy needs that port; here it has been moved to port 81.

user              nginx;
worker_processes  1;

error_log  /var/log/nginx/error.log;

pid        /var/run/nginx.pid;

events {
    worker_connections  1024;
}

http {
    include       /etc/nginx/mime.types;
    default_type  application/octet-stream;

    log_format  main  '$remote_addr - $remote_user [$time_local] "$request" '
                      '$status $body_bytes_sent "$http_referer" '
                      '"$http_user_agent" "$http_x_forwarded_for"';

    access_log  /var/log/nginx/access.log  main;

    sendfile        on;
    keepalive_timeout  65;

    #
    # The default server
    #
    server {
        listen       81;
        server_name  _;

        location / {
            root   /usr/share/nginx/html;
            index  index.html index.htm;
        }

        error_page  404              /404.html;
        location = /404.html {
            root   /usr/share/nginx/html;
        }

        # redirect server error pages to the static page /50x.html
        #
        error_page   500 502 503 504  /50x.html;
        location = /50x.html {
            root   /usr/share/nginx/html;
        }

    }

    # Load config files from the /etc/nginx/conf.d directory
    include /etc/nginx/conf.d/*.conf;

}

/etc/nginx/conf.d/ssl-offloader.conf

upstream haproxy {
        server localhost:8443 ;
}

server {
        listen       443 ssl;   # "ssl on;" is deprecated; enable SSL on the listen directive instead
        server_name f.q.d.n 1.2.3.4 ; # I put the FQDN and IP here, but maybe "_" will work too
#  server_name  _;

        ssl_certificate      /etc/nginx/ssl-cert/cert.pem;
        ssl_certificate_key  /etc/nginx/ssl-cert/cert.key;

        ssl_session_timeout  5m;

        # SSLv3, TLSv1 and RC4 are no longer considered secure; allow TLS 1.2+ only
        ssl_protocols  TLSv1.2 TLSv1.3;
        ssl_ciphers    HIGH:!aNULL:!MD5;
        ssl_prefer_server_ciphers   on;

        location / {
                proxy_set_header X-Real-IP $remote_addr;
                proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
                proxy_set_header Host $http_host;
                proxy_set_header X-NginX-Proxy true;

                proxy_pass http://haproxy/;
                proxy_redirect default;
                proxy_redirect http://$host/ https://$host/;
                proxy_redirect http://hostname/ https://$host/;

                proxy_read_timeout 15s;
                proxy_connect_timeout 15s;
        }

}

/etc/haproxy/haproxy.cfg

#---------------------------------------------------------------------
# Global settings
#---------------------------------------------------------------------
global
    log         127.0.0.1 local2

    chroot      /var/lib/haproxy
    pidfile     /var/run/haproxy.pid
    maxconn     4000
    user        haproxy
    group       haproxy
    daemon

    # turn on stats unix socket
    stats socket /var/lib/haproxy/stats

#---------------------------------------------------------------------
# common defaults that all the 'listen' and 'backend' sections will
# use if not designated in their block
#---------------------------------------------------------------------
defaults
    mode                    http
    log                     global
    option                  httplog
    option                  dontlognull
    option http-server-close
    option forwardfor       except 127.0.0.0/8
    option                  redispatch
    retries                 3
    timeout http-request    3s
    timeout queue           1m
    timeout connect         2s
    timeout client          5s
    timeout server          5s
    timeout http-keep-alive 1s
    timeout check           10s
    maxconn                 3000

    stats enable
    stats auth evan:change_me_brother

#---------------------------------------------------------------------
# main frontend which proxys to the backends
#---------------------------------------------------------------------
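frontend main_http
    bind *:80
    option forwardfor except 127.0.0.1
    option httpclose
    default_backend         web_http

frontend main_https
    # Plain HTTP from the local Nginx; consider binding 127.0.0.1:8443 (or
    # blocking 8443 in the security group) so clients can't bypass SSL
    bind *:8443
    option forwardfor except 127.0.0.1
    option httpclose
    default_backend         web_https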

#---------------------------------------------------------------------
# round robin balancing between the various backends
#---------------------------------------------------------------------
backend web_http
    balance     roundrobin
#   option httpchk GET / HTTP/1.1\r\nHost:\ host.com
    option httpchk
    server  node1 192.168.1.20:80 check port 80
    server  node2 192.168.1.30:80 check port 80
    server  node3 192.168.1.40:80 check port 80


backend web_https
    balance     roundrobin
#   option httpchk GET / HTTP/1.1\r\nHost:\ host.com
    option httpchk
    server  node1 192.168.1.20:8443 check port 8443
    server  node2 192.168.1.30:8443 check port 8443
    server  node3 192.168.1.40:8443 check port 8443
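Both backends combine balance roundrobin with per-server check, so only nodes that pass their health checks stay in rotation. The following hypothetical Python sketch illustrates that interaction with a plain, unweighted rotation; HAProxy's real roundrobin algorithm is weighted and adjusts dynamically, so treat this as an illustration only. When no node is up, HAProxy answers with a 503, which the sketch mirrors by raising an error.

```python
class RoundRobin:
    """Unweighted round robin that skips nodes currently marked down."""

    def __init__(self, nodes):
        self.nodes = list(nodes)
        self.up = {n: True for n in self.nodes}  # fed by health checks
        self._next = 0

    def mark(self, node, is_up: bool) -> None:
        self.up[node] = is_up

    def pick(self):
        # Try each node at most once, starting where the last pick left off.
        for _ in range(len(self.nodes)):
            node = self.nodes[self._next]
            self._next = (self._next + 1) % len(self.nodes)
            if self.up[node]:
                return node
        raise RuntimeError("no healthy backend")  # HAProxy would return a 503 here


lb = RoundRobin(["192.168.1.20:80", "192.168.1.30:80", "192.168.1.40:80"])
lb.mark("192.168.1.30:80", False)  # node2 failed its checks
print([lb.pick() for _ in range(4)])
# ['192.168.1.20:80', '192.168.1.40:80', '192.168.1.20:80', '192.168.1.40:80']
```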

4 thoughts on “Load balancing in EC2 with Nginx and HAProxy”

  1. Hi,

    If you’re using Apache on the backend, you might find mod_rpaf useful. It lets you obtain the “real” client IP address in your logs/app, instead of the reverse proxy's.

    Nice article.

  2. Thinking about it too, Nginx just for SSL, haproxy for the real work. But there is this dreaded “upload buffering problem” in nginx for which I don’t want to use a wanky third-level module or/and waste developing time. If haproxy and nginx are on the same machine – maybe there is a chance to replace proxy_pass, hm, redirects, rewrites … no, silly idea.

  3. Wondering about this too. I need to handle SSL for multiple sites among multiple servers. Currently using Nginx to handle static files as well as load balancing and reverse proxying. Need to add more servers to the mix to handle other protocols. Right now it’s kind-of a hodge-podge. We’re looking to put everything behind a load balancer. Since Nginx can only LB for HTTP, it’s HAProxy.

    I was trying to figure out if and which tier is going to handle what. This article solidified it for me. HAProxy goes to Nginx for SSL connections and static resources; LBs for the web farm for all dynamic content; and finally also LBs our other, non-HTTP TCP traffic. Nginx terminates and proxies SSL traffic to the web farm. It doesn’t sound like a big hassle. Since we already use Nginx, we can just proxy SSL to that, and configure HAProxy to handle everything else.

    What are your stats? How much did they increase when you went with this setup?
