We wanted to set up a load-balanced web cluster in AWS for expansion. My first inclination was to use ELB, but I soon learned that ELB doesn’t let you allocate a static IP, requiring you to refer to it only by DNS name. That would be fine except that our current DNS provider, Dyn, requires IP addresses when using their GSLB (geo-based load balancing) service.
Rather than let this derail the whole project, I decided to look into the software options available for load balancing in EC2. I’ve been a fan of hardware load balancers for a while, sort of looking down on software-based solutions without any real rationale, but in this case I had no choice, so I figured I’d give one a try.
My first stop was Nginx. I’ve used it before in a reverse-proxy scenario and like it. The problem I had with it is that it doesn’t support active polling of nodes – the ability to send periodic requests to each webserver and mark the node as up or down based on the response. As far as I can tell, defining multiple upstream servers in Nginx lets you specify max_fails and fail_timeout, but a “fail” is only registered when a real client request fails. I don’t want to risk losing a real request – I like active polling.
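For contrast, this is roughly what the passive approach looks like in Nginx (the upstream name is mine; the node addresses match the configs further down):

upstream web_nodes {
    # Passive checks: a node is marked down only after 3 real
    # requests to it fail, then retried after 30 seconds. Nginx
    # never sends probe traffic of its own.
    server 192.168.1.20:80 max_fails=3 fail_timeout=30s;
    server 192.168.1.30:80 max_fails=3 fail_timeout=30s;
}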
This led me to HAProxy. I’d never used HAProxy before, but it seemed ideally suited to this job (it’s exclusively a load balancer). Its httpchk option even allows for active polling of nodes – yay!
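A minimal sketch of that (my full config is further down):

backend web_http
    # Actively probe each node with an HTTP GET instead of waiting
    # for a live request to fail.
    option httpchk GET /
    server node1 192.168.1.20:80 check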
Unfortunately, HAProxy doesn’t support SSL. From the HAProxy site:
People often ask for SSL and Keep-Alive support. Both features will complicate the code and render it fragile for several releases. By the way, both features have a negative impact on performance:
Having SSL in the load balancer itself means that it becomes the bottleneck. When the load balancer’s CPU is saturated, the overall response times will increase and the only solution will be to multiply the load balancer with another load balancer in front of them. The only scalable solution is to have an SSL/Cache layer between the clients and the load balancer. Anyway for small sites it still makes sense to embed SSL, and it’s currently being studied. There has been some work on the CyaSSL library to ease integration with HAProxy, as it appears to be the only one out there to let you manage your memory yourself.
Poop! I figured out a workaround, however: run both Nginx and HAProxy on the same instance. HAProxy listens on ports 80 and 8443 (the separate port relays decrypted SSL traffic to the nodes, so the nodes know the traffic was originally SSL). Nginx is configured as a reverse proxy, listens on port 443 only, and holds the SSL cert & key. Nginx’s upstream is just localhost:8443 – HAProxy.
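So the traffic flows like this:

client --HTTP--------------------------------> HAProxy:80 --> node:80
client --HTTPS--> Nginx:443 (SSL offload) ---> HAProxy:8443 --> node:8443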
This was pretty easy to set up and works very well. I benchmarked HAProxy on an EC2 t1.micro instance (in front of two m1.large instances running our webapp) using ab -n 5000 -c 50 -t 60 and found it actually performed better than one of our hardware load balancers. That was pretty eye-opening (and sad).
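For reference, the full invocation just adds the target URL (a placeholder here):

# 5000 requests, 50 concurrent, capped at 60 seconds of wall time
ab -n 5000 -c 50 -t 60 http://my-lb.example.com/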
The HAProxy and Nginx configs are below, in the hopes that they help someone. The main warning I’d give is that with this setup, your nodes’ logs will show every request as coming from the IP of the load balancer. I had to rewrite some code to have the app use the X-Forwarded-For address rather than REMOTE_ADDR, but other than that this has been working out pretty well.
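The app-side change amounts to something like this (a minimal Python/WSGI-style sketch; the helper name is mine, and your framework’s API will differ):

def client_ip(environ):
    # Behind the proxy, REMOTE_ADDR is always the load balancer's IP.
    # X-Forwarded-For is "client, proxy1, proxy2, ..."; the first
    # entry is the original client.
    forwarded = environ.get('HTTP_X_FORWARDED_FOR', '')
    if forwarded:
        return forwarded.split(',')[0].strip()
    return environ.get('REMOTE_ADDR', '')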
/etc/nginx/nginx.conf
The main thing is to make sure this default server isn’t listening on port 80 (since HAProxy needs it).
user nginx;
worker_processes 1;

error_log /var/log/nginx/error.log;
pid /var/run/nginx.pid;

events {
    worker_connections 1024;
}

http {
    include /etc/nginx/mime.types;
    default_type application/octet-stream;

    log_format main '$remote_addr - $remote_user [$time_local] "$request" '
                    '$status $body_bytes_sent "$http_referer" '
                    '"$http_user_agent" "$http_x_forwarded_for"';

    access_log /var/log/nginx/access.log main;

    sendfile on;
    keepalive_timeout 65;

    #
    # The default server
    #
    server {
        listen 81;
        server_name _;

        location / {
            root /usr/share/nginx/html;
            index index.html index.htm;
        }

        error_page 404 /404.html;
        location = /404.html {
            root /usr/share/nginx/html;
        }

        # redirect server error pages to the static page /50x.html
        error_page 500 502 503 504 /50x.html;
        location = /50x.html {
            root /usr/share/nginx/html;
        }
    }

    # Load config files from the /etc/nginx/conf.d directory
    include /etc/nginx/conf.d/*.conf;
}
/etc/nginx/conf.d/ssl-offloader.conf
upstream haproxy {
    server localhost:8443;
}

server {
    listen 443;
    server_name f.q.d.n 1.2.3.4;  # I put the FQDN and IP here, but maybe "_" will work too
    # server_name _;

    ssl on;
    ssl_certificate /etc/nginx/ssl-cert/cert.pem;
    ssl_certificate_key /etc/nginx/ssl-cert/cert.key;
    ssl_session_timeout 5m;
    ssl_protocols SSLv3 TLSv1;
    ssl_ciphers ECDHE-RSA-AES256-SHA384:AES256-SHA256:RC4:HIGH:!MD5:!aNULL:!EDH:!AESGCM;
    ssl_prefer_server_ciphers on;

    location / {
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_set_header Host $http_host;
        proxy_set_header X-NginX-Proxy true;

        proxy_pass http://haproxy/;
        proxy_redirect default;
        proxy_redirect http://$host/ https://$host/;
        proxy_redirect http://hostname/ https://$host/;
        proxy_read_timeout 15s;
        proxy_connect_timeout 15s;
    }
}
/etc/haproxy/haproxy.cfg
#---------------------------------------------------------------------
# Global settings
#---------------------------------------------------------------------
global
    log         127.0.0.1 local2
    chroot      /var/lib/haproxy
    pidfile     /var/run/haproxy.pid
    maxconn     4000
    user        haproxy
    group       haproxy
    daemon

    # turn on stats unix socket
    stats socket /var/lib/haproxy/stats

#---------------------------------------------------------------------
# common defaults that all the 'listen' and 'backend' sections will
# use if not designated in their block
#---------------------------------------------------------------------
defaults
    mode                    http
    log                     global
    option                  httplog
    option                  dontlognull
    option                  http-server-close
    option forwardfor       except 127.0.0.0/8
    option                  redispatch
    retries                 3
    timeout http-request    3s
    timeout queue           1m
    timeout connect         2s
    timeout client          5s
    timeout server          5s
    timeout http-keep-alive 1s
    timeout check           10s
    maxconn                 3000
    stats enable
    stats auth evan:change_me_brother

#---------------------------------------------------------------------
# main frontends which proxy to the backends
#---------------------------------------------------------------------
frontend main_http *:80
    option forwardfor       except 127.0.0.1
    option httpclose
    default_backend         web_http

frontend main_https *:8443
    option forwardfor       except 127.0.0.1
    option httpclose
    default_backend         web_https

#---------------------------------------------------------------------
# round robin balancing between the various backends
#---------------------------------------------------------------------
backend web_http
    balance roundrobin
    # option httpchk GET / HTTP/1.1\r\nHost:\ host.com
    option httpchk
    server node1 192.168.1.20:80 check port 80
    server node2 192.168.1.30:80 check port 80
    server node3 192.168.1.40:80 check port 80

backend web_https
    balance roundrobin
    # option httpchk GET / HTTP/1.1\r\nHost:\ host.com
    option httpchk
    server node1 192.168.1.20:8443 check port 8443
    server node2 192.168.1.30:8443 check port 8443
    server node3 192.168.1.40:8443 check port 8443
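One last tip: sanity-check both configs before reloading (standard validation flags; the service names may differ on your distro):

# Validate the Nginx config, then reload
nginx -t && service nginx reload

# Validate the HAProxy config, then restart
haproxy -c -f /etc/haproxy/haproxy.cfg && service haproxy restart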
Hi,
If you’re using Apache on the backend, you might find mod_rpaf useful. It will let you see the “real” client IP address in your logs/app instead of the reverse proxy’s.
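A typical setup looks something like this (module path and proxy IP are placeholders for your environment):

LoadModule rpaf_module modules/mod_rpaf-2.0.so
RPAFenable On
# Trust X-Forwarded-For only when the request comes from one of
# these proxy IPs (use your load balancer's address).
RPAFproxy_ips 127.0.0.1
RPAFheader X-Forwarded-For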
Nice article.
Thinking about it too: Nginx just for SSL, HAProxy for the real work. But there is this dreaded “upload buffering problem” in Nginx, for which I don’t want to use a wanky third-party module and/or waste development time. If HAProxy and Nginx are on the same machine – maybe there is a chance to replace proxy_pass, hm, redirects, rewrites … no, silly idea.
Wondering about this too. I need to handle SSL for multiple sites across multiple servers. Currently using Nginx to handle static files as well as load balancing and reverse proxying. I need to add more servers to the mix to handle other protocols, so right now it’s kind of a hodge-podge. We’re looking to put everything behind a load balancer, and since Nginx can only load-balance HTTP, that means HAProxy.
I was trying to figure out which tier was going to handle what, and this article solidified it for me. HAProxy sends SSL connections and static resources to Nginx, load-balances the web farm for all dynamic content, and also load-balances our other, non-HTTP TCP traffic. Nginx terminates SSL and proxies that traffic to the web farm. It doesn’t sound like a big hassle: since we already use Nginx, we can just proxy SSL to that and configure HAProxy to handle everything else.
What are your stats? How much did they increase when you went with this setup?
Since version 1.5-dev12, HAProxy fully supports SSL offloading. It really rocks, I’m loving it!
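Termination is just a bind option now, something like this (the cert path is a placeholder):

frontend main_https
    # HAProxy >= 1.5-dev12 terminates SSL natively; the .pem file
    # must contain the certificate and private key concatenated.
    bind *:443 ssl crt /etc/haproxy/site.pem
    default_backend web_http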