Architecture category

Boost your nginx web server performance by rewriting favicon.ico requests correctly

Popular sites handle many concurrent visitors, and some clients request favicon.ico from the wrong path. If you are running nginx, you may not have noticed that requests such as /category/favicon.ico generate a significant number of 404 errors in your server error log, because the file exists only in the web root, e.g. /favicon.ico.

Every request for a non-existent file adds some load, and every 404 for a missing file is usually written to the error_log file unless you have disabled error logging. To forward all non-root favicon.ico requests to /favicon.ico, you can set up the following rewrite rule in the appropriate server {} block of nginx.conf:

rewrite ^/(.*)/favicon\.ico$ /favicon.ico last;

Voilà. Reload or restart the nginx daemon and you should see slightly lower load on your box.
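To see which request paths the rule above would catch, here is a minimal Python sketch of the same pattern. It is only an illustration, not part of the nginx setup: the function name rewrite_favicon is made up for this example, and the pattern escapes the dot so that it matches a literal "." only.

```python
import re

# Python rendering of the nginx pattern. The dot is escaped so that it
# matches only a literal "." (an assumption about the intended behaviour).
NON_ROOT_FAVICON = re.compile(r"^/(.*)/favicon\.ico$")


def rewrite_favicon(uri: str) -> str:
    """Return /favicon.ico for nested favicon requests, else the URI unchanged."""
    if NON_ROOT_FAVICON.match(uri):
        return "/favicon.ico"
    return uri


if __name__ == "__main__":
    for uri in ("/favicon.ico", "/category/favicon.ico",
                "/a/b/favicon.ico", "/category/index.html"):
        print(uri, "->", rewrite_favicon(uri))
```

Note that a request for /favicon.ico itself does not match the pattern, so the rule cannot loop on the root file.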

Installing high-performance Nginx for first time

I have been somewhat lazy lately (hey, it's summer!), but I thought I should post a quick guide to installing the high-performance Nginx daemon. It's a very fast web server and the best choice for serving static content at very high speeds. It needs few hardware resources and puts very little load on the machine.

Download the latest stable version 0.7.61 or fetch a newer version from nginx.net.

Installing nginx:
(Which modules you enable depends on your needs. I prefer to disable all modules that are not needed; for example, if you do not plan to host SSL pages, don't enable SSL, and so on.)

tar zxvf nginx-0.7.61.tar.gz
cd nginx-0.7.61
./configure
make
make install

(You can also pass a --prefix value to ./configure if you wish to install nginx in a different folder; make install usually needs root privileges.)

Now you need to configure the nginx.conf file before firing up the Nginx daemon.

For better security run Nginx under a non-privileged user:

user nobody;

For medium to high load sites I suggest increasing worker_processes to 5 to 15, or even more if your hardware allows it.

worker_processes 10;

To raise the maximum number of clients that can access your site, increase worker_connections. A good starting value is 256, or higher if you have more powerful hardware.

events {
worker_connections 512;
}

Please note! The maximum number of clients is bounded by worker_processes * worker_connections. With worker_processes set to 10 and worker_connections set to 512, that is 10 * 512 = 5120 connections. If you run Nginx in a reverse proxy configuration, divide this value by 4 as a rule of thumb, because a client typically opens two connections and each of them also needs a connection to the back-end: 10 * 512 / 4 = 1280 clients.
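The arithmetic can be sketched in a few lines of Python. This is only a back-of-the-envelope helper: the function name max_clients is invented for this example, and the divide-by-4 step is the reverse proxy rule of thumb mentioned above, not a hard limit enforced by nginx.

```python
def max_clients(worker_processes: int, worker_connections: int,
                reverse_proxy: bool = False) -> int:
    """Estimate the upper bound on simultaneous clients.

    Plain serving: worker_processes * worker_connections.
    Reverse proxy: the same value divided by 4 (rule of thumb).
    """
    total = worker_processes * worker_connections
    return total // 4 if reverse_proxy else total


if __name__ == "__main__":
    # Values from the configuration shown above.
    print(max_clients(10, 512))                      # 5120
    print(max_clients(10, 512, reverse_proxy=True))  # 1280
```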

Now, in the http {} section, you need to define settings such as mime.types, default_type, log_format, access_log, sendfile, tcp_nopush and keepalive_timeout. These are pretty self-explanatory. Just a quick note: Nginx needs very few resources to keep keepalive connections open, and it is well known to take less than 30 MB of RAM for 10000 active keepalive connections. Make sure you have correctly calculated the max client settings using the formulas above.

Next comes the server {} section, where you define the listen address, ports and custom settings such as redirects.

Example:

server {
    listen       10.10.10.10:80;
    server_name  localhost;

    access_log  logs/host.access.log  main;

    location / {
        root   html;
        index  index.html index.htm;
    }

    error_page  404              /404.html;

    error_page  500 502 503 504  /50x.html;
    location = /50x.html {
        root   html;
    }
}

10.10.10.10 is just an example IP; you will need to specify the IP address allocated to your server.

I hope this quick guide helped you out a little. We will soon publish a more in-depth guide to configuring Nginx.

Server virtualization vs dedicated server

I would still love to see a virtualized server in a high-bandwidth environment hosting a large site with thousands of concurrent connections. I guess very large sites have little use for virtualization unless it provides advanced features like:

  • live migration
  • fail-over
  • snapshots
  • load balancing

However, virtualization software with the above features usually costs thousands of dollars, which puts it out of reach for many folks who need to make some bucks from their own sites and still survive the occasional Digg or Yahoo front-page landing. Dedicated servers offer 100% performance and have no virtualization overhead… unless, of course, you run many servers and want to consolidate hardware.

If you have a small site, or a site that doesn't generate a regular income, don't worry about virtualization and high availability ;) You can still do high availability without virtualization, and for much less, err… almost for free: you just need to pay for the hardware and for someone to set up and run the server farm for you.

Generate static content for performance

Running a very popular, high-bandwidth website or blog? Is the load killing the server? Is the website content generated in real time using PHP, .NET or CGI scripts?

You can easily improve your website's performance by generating static files and serving them instead of generating each page in real time, which is much faster. Static files can be kept on the hard drive or in memory (for example, using memcached) for blazing-fast access.

Generating PHP static content
You can generate static PHP output in several ways: from PHP code that writes its output directly to files, by fetching a page's PHP output over HTTP, or using Perl code. If your pages depend on web-server-specific values and features, you will have to fetch each page via HTTP and write the output to a local file. This is very easy to do and doesn't require much programming skill.

Fetching a file via HTTP using wget

wget http://www.your-domain-is-here.com/page1.php -O /var/www/page1.html

Where /var/www/page1.html is the output file.

Fetching a file using curl

curl -o /var/www/page1.html http://www.your-domain-is-here.com/page1.php

Fetching a file using fetch (FreeBSD)

fetch -o /var/www/page1.html http://www.your-domain-is-here.com/page1.php

Fetching a file using lynx

lynx -source http://www.your-domain-is-here.com/page1.php > /var/www/page1.html
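If you would rather script the fetch than call a command-line tool, the same idea can be sketched in Python using only the standard library. Treat it as a rough sketch under a few assumptions: the function name save_static_copy is made up for this example, and writing to a temporary file before renaming is my own addition, so that visitors never get a half-written page.

```python
import os
import tempfile
import urllib.request


def save_static_copy(url: str, output_path: str, timeout: float = 30.0) -> int:
    """Fetch url and write the response body to output_path.

    Returns the number of bytes written.
    """
    with urllib.request.urlopen(url, timeout=timeout) as response:
        body = response.read()

    # Write to a temporary file in the target directory, then rename it
    # over the destination so the swap happens in one step.
    directory = os.path.dirname(os.path.abspath(output_path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(body)
        # mkstemp creates files readable by the owner only; make the page
        # readable by the web server user as well.
        os.chmod(tmp_path, 0o644)
        os.replace(tmp_path, output_path)
    except BaseException:
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise
    return len(body)
```

Calling save_static_copy("http://www.your-domain-is-here.com/page1.php", "/var/www/page1.html") would then do the same job as the wget line above.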

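Generating PHP output using a PHP script:

<?php
// Read the list of PHP files, one per line, skipping blank lines.
$files = file("filelist.txt", FILE_IGNORE_NEW_LINES | FILE_SKIP_EMPTY_LINES);
foreach ($files as $source) {
    // Only render files with a .php extension.
    if (substr($source, -4) !== ".php") {
        continue;
    }
    // Capture the script's output instead of sending it to the client.
    ob_start();
    include($source);
    $page = ob_get_clean();
    // Write the output next to the source, e.g. page1.php -> page1.html.
    $file = fopen(substr($source, 0, -4) . ".html", "w");
    fputs($file, $page);
    fclose($file);
}
?>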

Here filelist.txt is the list of files you want to generate, with one file per line.

Perl code for fetching a remote web page. Save the following code as fetch.pl:

#!/usr/bin/perl
use strict;
use warnings;
use LWP::Simple;

# Fetch the page and print its content to standard output.
my $url = "http://www.your-domain-is-here.com/page1.php";
my $content = get($url);
die "Could not fetch $url\n" unless defined $content;
print $content;

And then run it with the following Linux command line:

perl fetch.pl > outputfile.html

In the next article we will review a solution where website content is saved to memcached and served from the fast memory-based cache instead of being generated on the fly.

Forget LAMP, use LNMP

Welcome, folks. We have published quite a few posts praising the LAMP architecture, but from now on there is a clear and better alternative: LNMP.

LNMP stands for Linux, Nginx, MySQL and PHP.

This is clearly the winner because of its performance and the reliability you can count on. Bundle the front-end Nginx web server with Memcached and you can serve millions of daily hits with ease on simple commodity software. Stay tuned for more news!

P.S. For folks who still use LAMP: don't panic, we suggest using LAMP as well, but for the best performance just swap Apache for Nginx and voilà!

New PHP performance player: php-fpm

One of our readers suggested using php-fpm for PHP script processing. I spoke with my colleagues and we agreed to test php-fpm and, if the performance gain is really significant, probably start using it in production environments.

We will set up two test environments:

The first one will be Apache + php-fpm

and the second test environment:

nginx + php-fpm

I have read about the features of php-fpm and its architecture, and I believe performance will be much improved compared to mod_php and the like. As long as your server supports FastCGI, php-fpm should work without any problems. We will post results very shortly. Keep it cool! :)

Ps. almost forgot posting php-fpm URL..

WordPress benchmarks running on Nginx, PHP, Apache and FastCGI

This weekend we had a chance to test the configuration and performance of high-bandwidth WordPress blog set-ups. We had a customer who landed multiple Digg front-page stories, and we had to tune the server to deal with the peak-time traffic.

For best performance we usually deploy Nginx in front of Apache as a back-end for PHP processing, together with memcached and Super Cache, and add a lot of rewrite rules and other optimizations.

The server is powered by one quad-core 5430-series CPU with 2GB RAM and SATA drives, and runs on a SuperMicro server board. It's a powerful box that can handle traffic well if correctly tuned and optimized.

We were running the latest WordPress blog software with some custom rewrites done on the front-end Nginx daemon (front-end proxy). All static content was served by Nginx, and all PHP requests were forwarded to Apache 2, compiled from source, with the latest PHP 5.2.6 loaded as a module.

Nginx front end, Apache + PHP, Super Cache, some custom rewrite rules:
Requests done: 1000, concurrency 30 threads

Server Software:        nginx/0.7.11
Server Hostname:        www.neatorama.com
Server Port:            80

Document Path:          /
Document Length:        300421 bytes

Concurrency Level:      30
Time taken for tests:   0.691438 seconds
Complete requests:      1000
Failed requests:        0
Write errors:           0
Total transferred:      300687000 bytes
HTML transferred:       300421000 bytes
Requests per second:    1446.26 [#/sec] (mean)
Time per request:       20.743 [ms] (mean)
Time per request:       0.691 [ms] (mean, across all concurrent requests)
Transfer rate:          424678.70 [Kbytes/sec] received

Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:        0    0   0.3      0       3
Processing:     4   20   6.1     19      41
Waiting:        0   16   6.6     14      28
Total:          4   20   6.2     19      41

Percentage of the requests served within a certain time (ms)
50%     19
66%     24
75%     26
80%     27
90%     27
95%     29
98%     30
99%     33
100%     41 (longest request)

Nginx + FastCGI, running spawn-fcgi from the lighttpd distribution with 30 child processes; all static content served by Nginx and PHP served by the PHP 5.2.6 php-cgi binary (custom compiled from source, of course).

Server Software: nginx/0.7.11
Server Hostname: www.*****.com
Server Port: 80

Document Path: /
Document Length: 170281 bytes

Concurrency Level: 30
Time taken for tests: 103.429538 seconds
Complete requests: 1000
Failed requests: 0
Write errors: 0
Non-2xx responses: 900
Total transferred: 170518100 bytes
HTML transferred: 170281000 bytes
Requests per second: 9.67 [#/sec] (mean)
Time per request: 3102.886 [ms] (mean)
Time per request: 103.430 [ms] (mean, across all concurrent requests)
Transfer rate: 1609.99 [Kbytes/sec] received

Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:        0    0   0.1      0       1
Processing:  1748 3076 492.0   3035    4844
Waiting:      768 1673 370.9   1659    2948
Total:       1748 3076 492.0   3035    4844

Percentage of the requests served within a certain time (ms)
50% 3035
66% 3248
75% 3405
80% 3496
90% 3732
95% 3943
98% 4192
99% 4398
100% 4844 (longest request)

As you can see Nginx + Apache, PHP, Super Cache was the clear winner – no questions asked.
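If you want to double-check the headline numbers, the two derived metrics in the ab reports follow directly from the request count, the concurrency level and the total test time. Here is a small Python sketch of that arithmetic; the function names are made up for this example, and the inputs are copied from the two reports above.

```python
def requests_per_second(complete_requests: int, seconds: float) -> float:
    """Mean throughput: completed requests divided by total test time."""
    return complete_requests / seconds


def mean_time_per_request_ms(concurrency: int, complete_requests: int,
                             seconds: float) -> float:
    """Mean time per request in milliseconds, as seen by one client."""
    return concurrency * seconds / complete_requests * 1000


if __name__ == "__main__":
    # (label, complete requests, concurrency, time taken in seconds)
    runs = [
        ("Nginx + Apache + Super Cache", 1000, 30, 0.691438),
        ("Nginx + FastCGI", 1000, 30, 103.429538),
    ]
    for label, done, concurrency, seconds in runs:
        rps = requests_per_second(done, seconds)
        ms = mean_time_per_request_ms(concurrency, done, seconds)
        print(f"{label}: {rps:.2f} req/s, {ms:.3f} ms per request")
    # Nginx + Apache + Super Cache: 1446.26 req/s, 20.743 ms per request
    # Nginx + FastCGI: 9.67 req/s, 3102.886 ms per request
```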

Conclusion: we will stick with Nginx as a front-end proxy serving all static content, with PHP processing forwarded to an Apache web server running on the same machine (using worker), running Super Cache and doing as many rewrites and file checks as possible in Nginx for best performance. Apache is a big resource hog; however, our tests show it still outperforms running PHP scripts via FastCGI.

I would like to note that running PHP as FastCGI required slightly less memory; however, the CPU load shot up and I was not sure we could handle Digg traffic that easily. Enjoy!

Why LAMP (Linux Apache MySQL PHP) is the best

With more and more complex sites coming to the internet, you can create functional and attractive websites. When you receive proposals from your developer, you can choose LAMP, which is the best technology available today. The internet today offers a large number of options for creating and managing a website. You can now build a website entirely on free, open-source software with Linux, Apache, MySQL and PHP, popularly abbreviated as LAMP, a powerful and common bundle of technology components.

The components of LAMP are: Linux, a very popular operating system; Apache, a web server; MySQL, a database; and PHP, a scripting language. Choosing LAMP is the best way to gain complete control and power over your website. You can get started with the free scripts and examples available on the web, with the help of free tools and editors. A web hosting company can make a good profit margin at low cost, since LAMP is entirely open source, so you have a large number of web hosting providers to choose from.

Thanks to improvements in the installers for the LAMP components, you can install them without difficulty. When you need to debug, you can use PHP debuggers to help you write correct code. LAMP is strong in performance, security and reliability, and is considered to be the best architecture for serving web pages. It will function as an excellent platform for your web development needs. You will benefit from the major advantages of LAMP: it is open source, supported by a 'geeky' community, and low in cost compared to other technologies. PHP integrates simply with the web, and Apache is known for its security features. With LAMP you can run a dynamic server as well as a website. With this combined technology you can put together a good software distribution package, and you can acquire the components at very low cost. LAMP lets you use the web browser to execute programs and receive static and dynamic content.

Scripting languages give you efficient and easy manipulation of text streams. You can use the LAMP software bundle as an alternative to a commercial package. The popularity of LAMP is on the increase, as it is available as free software.

Avoid control panels and use LAMP

If you are planning architecture for a very large online service or website, it’s recommended you avoid control panel software from the very beginning. If you intend to run a web-based service, why do you need running email, DNS and many other services on the same server? How about scalability and performance tuning when load and visitors increase?

The fewer services you run on the server, the harder it is to exploit, and the more system resources will be available for website hosting and script processing. Most web hosting control panels run a lot of software – altering the default configuration of Apache and other software may stop the control panel from running.

For top performance and scalability, you should use Linux, Apache, MySQL and PHP (LAMP). It’s all open-source and reliable, and it runs high-load websites such as LiveJournal, Wordpress.com and many others. As with all open-source technologies, it’s much easier to scale in the future and offers great flexibility – caching, full control over headers (setting expires), application optimization and much more.

When the load grows, you can easily move the MySQL load to a fully dedicated MySQL server. Afterward, you can install a reverse proxy in front of your website to lower the load, and scale horizontally by adding multiple servers that process PHP scripts and a master-slave MySQL configuration with replication. For MySQL query caching, you can use memcached on multiple servers; this will offload the MySQL database servers greatly.
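The query-caching idea is easy to sketch. The Python example below is only an illustration of the pattern, under loud assumptions: a plain dict stands in for a memcached client, and the class name QueryCache, the method get_or_compute and the key latest_posts are all made up for this example.

```python
import time
from typing import Any, Callable, Dict, Tuple


class QueryCache:
    """Cache query results for a while; a dict stands in for memcached."""

    def __init__(self, ttl_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._ttl = ttl_seconds
        self._clock = clock
        self._store: Dict[str, Tuple[float, Any]] = {}

    def get_or_compute(self, key: str, run_query: Callable[[], Any]) -> Any:
        """Return the cached value for key, running the query only on a miss."""
        now = self._clock()
        entry = self._store.get(key)
        if entry is not None and entry[0] > now:
            return entry[1]  # cache hit: the database is not touched
        value = run_query()  # cache miss or expired entry: ask the database
        self._store[key] = (now + self._ttl, value)
        return value


if __name__ == "__main__":
    calls = []

    def slow_query():
        # Stand-in for a real MySQL query.
        calls.append(1)
        return ["post 1", "post 2"]

    cache = QueryCache(ttl_seconds=60)
    cache.get_or_compute("latest_posts", slow_query)
    cache.get_or_compute("latest_posts", slow_query)
    print(len(calls))  # 1: the second call was served from the cache
```

In a real set-up the dict would be replaced by a memcached client shared by all PHP servers, so that a result cached by one server can be reused by the others.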

We will go into more detailed tuning of LAMP servers in the near future.

Why MogileFS is the best choice

I bet you have already heard about MogileFS if you are reading this article. MogileFS is an open-source and distributed file system that offers many good properties and features that are hard to find in some of the expensive and proprietary file systems currently available.

MogileFS is a perfect choice for your next storage system if you are planning to build a high-scale service with large storage requirements that can be distributed across multiple servers and low-cost hard drives. It features excellent fail-over capabilities that can be set up using Linux open-source HA projects, and quite a few projects and solutions are available. MogileFS high-availability storage can run on simple PC hardware in a non-RAID configuration. No hardware RAID is required, because MogileFS provides full fail-over by replicating data between multiple devices. If one server dies, MogileFS continues working without problems. This saves thousands of dollars and provides horizontal scalability from small to large projects that require large storage space and high availability at the lowest possible cost.

You can set up MogileFS replication based on predefined classes and replicate only the files that are important. Files that are generated from other sources, such as resized thumbnails, need not be replicated, because they can be easily regenerated by your applications if the disk or server hardware fails.

MogileFS is not POSIX compliant and thus must be implemented in your applications from the very beginning. Multiple APIs are available for PHP, Perl and Python languages, and implementation is quite trivial.

You need a minimum of only two servers to run MogileFS, since trackers can run on the same servers as the storage nodes; however, four boxes are preferred, and the trackers need to be set up for high availability.

MogileFS can be scaled horizontally across any number of storage servers, providing high availability and load balancing, and it is a much better alternative to the widely used NFS, which has many problems.
