Archive for January, 2008

Optimize MySQL Select Query size

The less data your website pulls from the back-end MySQL server, the faster it runs. Imagine you need to display just 3 rows from a table with 2,500 records, but your query fetches all 2,500 and your code loops through them, generating unnecessary load.

For example:

SELECT name, lastname FROM UserTable;

The optimized SQL query would be:

SELECT name, lastname FROM UserTable LIMIT 3;

Or, for example, you have a blob record of an article with 550 words and you want to display only the first 50 characters for the excerpt. Why do you need to pull out all 550 words if you can write a query that is friendlier and works much faster?

SELECT title, SUBSTRING(article, 1, 50) FROM ArticleTable WHERE Id = 5;
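To see the difference in the amount of data returned, here is a minimal Python sketch of both queries. It runs against an in-memory SQLite database instead of MySQL so that it works anywhere; the sample rows are made up, and substr() is SQLite's spelling of SUBSTRING().

```python
import sqlite3

def build_demo_db():
    """Create an in-memory database with 2,500 users and one long article."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE UserTable (name TEXT, lastname TEXT)")
    conn.executemany(
        "INSERT INTO UserTable VALUES (?, ?)",
        [(f"name{i}", f"last{i}") for i in range(2500)],
    )
    conn.execute(
        "CREATE TABLE ArticleTable (Id INTEGER PRIMARY KEY, title TEXT, article TEXT)"
    )
    conn.execute(
        "INSERT INTO ArticleTable VALUES (?, ?, ?)",
        (5, "Demo title", "word " * 550),
    )
    return conn

conn = build_demo_db()

# Unbounded query: every row is handed to the application.
all_rows = conn.execute("SELECT name, lastname FROM UserTable").fetchall()

# Bounded query: the database returns only three rows.
top_rows = conn.execute("SELECT name, lastname FROM UserTable LIMIT 3").fetchall()

# Excerpt: only the first 50 characters of the article are returned.
title, excerpt = conn.execute(
    "SELECT title, substr(article, 1, 50) FROM ArticleTable WHERE Id = ?", (5,)
).fetchone()

print(len(all_rows), len(top_rows), len(excerpt))  # prints: 2500 3 50
```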

The key to optimization is writing smart code and always keeping performance in mind.

To optimize MySQL queries:

Pull only the data you need,
Limit loops,
Limit the length of returned values when needed,
Use correct indexes,
Use MySQL caching settings.

Why MogileFS is the best choice

If you are reading this article, I bet you have already heard about MogileFS. MogileFS is an open-source distributed file system that offers features that are hard to find even in some of the expensive proprietary file systems currently available.

MogileFS is a perfect choice for your next storage system if you are planning to build a high-scale service with large storage requirements that can be distributed across multiple servers and low-cost hard drives. It features excellent fail-over capabilities, which can be set up using an open-source Linux HA project; quite a few such projects and solutions are available. MogileFS high-availability storage can run on simple PC hardware in a non-RAID configuration. No hardware RAID is required, because MogileFS replicates data between multiple devices. If one server dies, MogileFS continues working without problems. This saves thousands of dollars and provides HORIZONTAL SCALABILITY, from small to large projects that require large storage space and high availability at the lowest possible cost.

You can set up MogileFS replication based on predefined classes and replicate only the files that are important. Files that are generated from other sources - for example, resized thumbnails - need fewer copies, because your applications can easily regenerate them if a disk or server fails.
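The idea of per-class replication can be illustrated with a small Python model. This is not the MogileFS API; the class names, copy counts and host names are hypothetical, and the sketch assumes no class asks for more copies than there are hosts.

```python
# Hypothetical policy: class name -> number of copies to keep.
REPLICATION_CLASSES = {"originals": 2, "thumbnails": 1}

def place(files, hosts):
    """Assign each (name, class) file to as many distinct hosts as its class requires."""
    placement = {}
    for index, (name, file_class) in enumerate(files):
        copies = REPLICATION_CLASSES[file_class]
        # Rotate the starting host so files spread across the pool.
        placement[name] = [hosts[(index + k) % len(hosts)] for k in range(copies)]
    return placement

def lost_after_failure(placement, dead_host):
    """Return the files whose every copy lived on the failed host."""
    return sorted(
        name
        for name, holders in placement.items()
        if all(host == dead_host for host in holders)
    )

files = [("photo.jpg", "originals"), ("photo_thumb.jpg", "thumbnails")]
placement = place(files, ["host-a", "host-b", "host-c"])
lost = lost_after_failure(placement, "host-b")
```

In this model only the thumbnail is lost when host-b fails, and that is exactly the file the application can regenerate from the still-available original.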

MogileFS is not POSIX compliant, so your applications must be written against its API from the very beginning. Client APIs are available for PHP, Perl and Python, and implementation is quite trivial.

You need a minimum of only two servers to run MogileFS - trackers can run on the same servers as the storage nodes; however, 4 boxes are preferred, and the trackers should be set up for high availability.

MogileFS scales horizontally across any number of storage servers, which provides high availability and load balancing, and it is a much better alternative to the widely used NFS, which has many problems.

Using IPv4 anycast for load balancing

IPv4 anycast is actually a very old technique, and it works as follows. The same block of address space is announced from multiple physical locations using BGP. All data sent to this IP address block will travel to the “nearest” location BGP hop-wise, because routers run BGP best-path selection and, all else being equal, choose the route with the shortest AS path.
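A rough Python sketch of that selection step is shown below. It models only the AS-path-length comparison; real BGP best-path selection checks other attributes (such as local preference) first. The site names and AS numbers are made up, taken from the range reserved for documentation.

```python
def nearest_site(routes):
    """Pick the announcement with the fewest AS hops (ties: first one listed)."""
    return min(routes, key=lambda route: len(route["as_path"]))

# The same prefix announced from two locations, as seen by one router.
routes = [
    {"site": "amsterdam", "as_path": [64500, 64510, 64511]},
    {"site": "new-york", "as_path": [64501, 64511]},
]

chosen = nearest_site(routes)
print(chosen["site"])  # prints: new-york
```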

This works pretty well for stateless UDP-based services - for example, DNS load balancing and distribution. This is what first-class providers such as Neustar Ultra Services (formerly UltraDNS), CriticalDNS and many root and TLD operators use to offer first-class DNS services.

Many ISPs will filter route announcements for prefixes more specific (longer) than /24, so an anycast block needs to be at least a /24.
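A quick way to check a prefix against that rule of thumb, sketched with Python's standard ipaddress module. The /24 limit is the assumption from the text above, not a universal rule, and the prefixes are documentation addresses.

```python
import ipaddress

def likely_filtered(prefix, max_len=24):
    """Return True if the prefix is more specific than /max_len."""
    return ipaddress.ip_network(prefix).prefixlen > max_len

print(likely_filtered("192.0.2.0/24"))     # prints: False
print(likely_filtered("192.0.2.0/25"))     # prints: True
print(likely_filtered("198.51.100.0/23"))  # prints: False
```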

Wordpress CSS include optimization

Who runs Wordpress without a single plug-in? Not many, I guess. If you run a popular blog using a few plug-ins, it’s time for a quick CSS file optimization.

To speed up the blog loading time a little, you should serve fewer hits - this lowers latency and the load on the server’s network stack. Find out whether any plug-ins on your blog use their own CSS files. If a plug-in has its own style sheet that is loaded on every page for web visitors, you should consider moving its CSS definitions to your main style sheet.

Remember, even the smallest performance boost pays off in the long run. Imagine you are serving 10 million hits daily, of which 100,000 are extra hits due to plug-in CSS; those can easily be eliminated by merging the plug-in CSS into your main CSS file.
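Merging can be done by hand, but a small script makes it repeatable. The sketch below is one possible approach in Python; the file names and CSS rules are invented for illustration, and it does not rewrite relative url() paths, which you would need to check by hand.

```python
def merge_stylesheets(main_css, plugin_css):
    """Append each plug-in style sheet to the main one, labelled by source."""
    parts = [main_css.strip()]
    for name, css in plugin_css.items():
        parts.append(f"/* merged from {name} */\n{css.strip()}")
    return "\n\n".join(parts) + "\n"

main = "body { margin: 0; }\n"
plugins = {
    "gallery-plugin.css": ".gallery { float: left; }\n",
    "poll-plugin.css": ".poll { border: 1px solid #ccc; }\n",
}
merged = merge_stylesheets(main, plugins)

# Share of daily hits saved in the example above: 100,000 out of 10 million.
saved_share = 100_000 / 10_000_000
print(f"{saved_share:.0%}")  # prints: 1%
```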

Caching content to disk using Tugela cache

Today many folks use memcached for fast in-memory data access because RAM is cheap. However, for some heavyweight websites, 16GB of RAM or even 32GB is not enough, and keeping a lot of data in RAM becomes expensive. It’s time to cache your content and data to hard disk using Tugela cache, a “clone” derived from memcached. It caches data in a BerkeleyDB B-Tree database on the file system. Tugela cache is fast and compatible with memcached APIs, so the transition will be smooth if you are already taking advantage of memcached.

Remember, it’s always better to run a few small inexpensive boxes rather than a big “monster.” You can always buy 5 boxes with 4GB RAM each for dedicated memcached servers. This will make a total of about 20GB for data caching. It will also provide redundancy, but may add problems with keeping state, unless you shard the data across the memcached servers so that applications can determine where to fetch each item.
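One simple way for an application to determine which box holds a key is to hash the key and take it modulo the number of servers. Here is a minimal Python sketch; the host names are placeholders. Note that with plain modulo hashing most keys move to a different server when the number of servers changes, which is why consistent hashing is a common alternative.

```python
import hashlib

# Five hypothetical 4GB cache boxes.
SERVERS = [f"cache{i}.example.internal:11211" for i in range(1, 6)]

def server_for(key, servers=SERVERS):
    """Map a cache key to one server, the same one on every call."""
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return servers[int.from_bytes(digest[:4], "big") % len(servers)]

home = server_for("user:42:profile")
```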

Scaling all your servers and services horizontally is the KEY.

Nginx performance testing

The other day I spoke with a friend who runs a large-scale Nginx-powered site that load-balances file downloads across 3 back-end boxes. The performance looks amazing, and I am sure we will set up a test environment and do some real balancing tests.

We are currently setting up our Linux-based servers with 1Gbps connectivity/switching, and tests with real results will follow shortly.

Positioning CSS and JS files in HTML content

Have you seen a webpage that starts to load somewhat strangely in your browser and after a couple of seconds the design changes? Or the page loading stalls for a few seconds and then loads fine?

It’s important to correctly position the calls to external style sheet and JavaScript files.

Rule #1. Move all external JavaScript file calls to the bottom of your HTML code.

Rule #2. Move all external style sheet file calls to the top of your HTML code.

There are of course a few exceptions - for example, if you have specific JavaScript code that calls up an ad or a specific block to display. In this case you will call the external file from the right place in your source. Another case can be a specific timer that can’t be delayed until the full page loads.

Always remember to have as few external include calls as possible. Generally, the more of your website content that can be cached, the faster it will load.
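If you want to check existing pages against the two rules, a small script can do it. The Python sketch below uses the standard html.parser module to flag external scripts inside head and style sheets inside body; the sample page and file names are made up, and it deliberately ignores the exceptions mentioned above.

```python
from html.parser import HTMLParser

class IncludeChecker(HTMLParser):
    """Flag external scripts inside <head> and style sheets inside <body>."""

    def __init__(self):
        super().__init__()
        self.section = None   # "head" or "body" once one of them has opened
        self.warnings = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag in ("head", "body"):
            self.section = tag
        elif tag == "script" and attrs.get("src") and self.section == "head":
            self.warnings.append(f"script in head: {attrs['src']}")
        elif (tag == "link" and attrs.get("rel") == "stylesheet"
              and self.section == "body"):
            self.warnings.append(f"stylesheet in body: {attrs.get('href')}")

def check(html):
    """Return a list of rule violations found in the given HTML text."""
    checker = IncludeChecker()
    checker.feed(html)
    return checker.warnings

page = (
    '<html><head><script src="menu.js"></script>'
    '<link rel="stylesheet" href="main.css"></head>'
    '<body><p>Hello</p><script src="stats.js"></script></body></html>'
)
warnings = check(page)
print(warnings)  # prints: ['script in head: menu.js']
```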

Smart caching for websites and blogs

If you are creating a new website or modifying an existing one, you should always keep in mind that site speed depends on many factors. From the very beginning, you should carefully plan your site architecture, code and overall structure to make sure your site is easily scalable in the future without huge expenses and a great deal of infrastructure changes.

One of the most important performance boosts is the use of caching. In this guide, we will review multiple caching strategies that will boost your website performance and, if done correctly, decrease your site loading times and offer a better online experience to your web visitors.

Javascript code.
It’s better to place JavaScript code into external JS files and then call them from the HTML or PHP code. This way the browser can cache the file instead of downloading the same code with every page. This works great if you have JavaScript code used on multiple pages.

Example:

<script type="text/javascript" src="http://www.yourwebsitegoeshere.com/mycode.js"></script>

Style sheets.
Move all CSS definitions to an external file, and call the file from your HTML or PHP code.

Example:

<link rel="stylesheet" type="text/css" href="http://www.yourwebsitegoeshere.com/my-own.css"/>

Database queries.
The more database queries you use to generate the website or blog, the higher the load on your database server. This adds latency, and if you get a high number of concurrent web visitors (for example, the “Slashdot” effect), it may cripple the server and bring the site to a halt. You may want to check out memcached and implement it in your website or blog code. It’s also very important to have the right indexes on the database tables. Limit SQL SELECT queries with LIMIT: you don’t need to select all table rows if you need to display only 10 records out of 1,520.

If you have millions of records, database sharding should be implemented. You can split records across multiple tables or even databases; this provides much lower query latency and greatly speeds up your MySQL performance. With database sharding, you can also lower index creation time and scale INSERT and UPDATE operations.
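The simplest form of sharding routes each record to a table by its ID. The Python sketch below shows only the routing logic; the shard count and the users_N table naming are assumptions for illustration, and a real setup also needs a plan for re-sharding and for queries that span shards.

```python
from collections import defaultdict

SHARD_COUNT = 4  # hypothetical number of tables (or databases)

def shard_table(user_id):
    """Return the table that holds this user's row."""
    return f"users_{user_id % SHARD_COUNT}"

def group_by_shard(user_ids):
    """Group IDs so that each shard is queried once, not once per ID."""
    groups = defaultdict(list)
    for user_id in user_ids:
        groups[shard_table(user_id)].append(user_id)
    return dict(groups)

print(shard_table(1520))             # prints: users_0
print(group_by_shard([1, 2, 5, 8]))  # prints: {'users_1': [1, 5], 'users_2': [2], 'users_0': [8]}
```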

Wordpress blog caching
By default, Wordpress works perfectly fine even for popular blogs, as long as there are no high bursts of concurrent connections. If you get Dugg, the standard Wordpress install will not scale, and most likely the server hosting your blog will become inaccessible. Wordpress uses PHP to generate output on the fly; thus, all pages are generated in real time, including data from the database.

If your content doesn’t change that often, why generate pages on the fly all the time? There is no need: the WP-Cache and WP Super Cache plug-ins offer Wordpress caching. In other words, the plug-in generates a static file on the server’s hard drive and serves it to visitors without querying the database at all. When the content changes, the static cache file is removed and a new file is generated.
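The same generate-once, serve-many idea can be sketched in a few lines of Python. This is not how the plug-ins themselves are implemented; the class, the file naming and the render function are all made up to show the flow: serve the static file if it exists, render and store it otherwise, and delete it when the post changes.

```python
import os
import tempfile

class PageCache:
    """Store rendered pages as static files; names here are illustrative."""

    def __init__(self, directory):
        self.directory = directory

    def _path(self, post_id):
        return os.path.join(self.directory, f"post-{post_id}.html")

    def get(self, post_id, render):
        """Serve the static file if present, otherwise render and store it."""
        path = self._path(post_id)
        if os.path.exists(path):
            with open(path, encoding="utf-8") as handle:
                return handle.read()
        html = render(post_id)
        with open(path, "w", encoding="utf-8") as handle:
            handle.write(html)
        return html

    def invalidate(self, post_id):
        """Drop the static file when the post changes."""
        try:
            os.remove(self._path(post_id))
        except FileNotFoundError:
            pass

render_calls = []

def render(post_id):
    # Stand-in for the expensive PHP + database work.
    render_calls.append(post_id)
    return f"<h1>Post {post_id}, render #{len(render_calls)}</h1>"

with tempfile.TemporaryDirectory() as cache_dir:
    cache = PageCache(cache_dir)
    first = cache.get(7, render)
    second = cache.get(7, render)  # served from disk, no render
    cache.invalidate(7)            # the post was edited
    third = cache.get(7, render)   # rendered again
```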

You can also offload static content to a Content Delivery Network (CDN) (see the Offloading static content section below), or serve static images from a different server that is optimized to serve static content only.

Memcached
Memcached is perfect for caching heavily accessed files, data and SQL query results. Caching is recommended for content that does not need to be generated from the database back-end in real time. Examples include recent comments, statistics, the list of logged-in users and news links.

Please keep in mind that there’s little benefit to caching MySQL queries or files in memcached if the output is intended for one user only - for example, a customer control panel displaying client-only settings, or dynamic content that depends on cookies or very specific GET values.
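The usual pattern for shared content such as recent comments is: look in the cache first, run the query only on a miss, and store the result with a short lifetime. In the Python sketch below a small in-process class stands in for memcached so that the example runs without a server; the key name, the lifetime and the fake query are assumptions.

```python
import time

class TTLCache:
    """In-process stand-in for memcached: values expire after ttl seconds."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock
        self._items = {}

    def get(self, key):
        entry = self._items.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if self._clock() >= expires_at:
            del self._items[key]  # expired: treat as a miss
            return None
        return value

    def set(self, key, value, ttl):
        self._items[key] = (value, self._clock() + ttl)

def cached_query(cache, key, ttl, run_query):
    """Return the cached result, running the query only on a miss."""
    value = cache.get(key)
    if value is None:
        value = run_query()
        cache.set(key, value, ttl)
    return value

query_count = 0

def recent_comments():
    # Stand-in for an expensive SQL query shared by all visitors.
    global query_count
    query_count += 1
    return ["First!", "Nice post"]

cache = TTLCache()
for _ in range(100):  # 100 page views within the lifetime
    comments = cached_query(cache, "recent_comments", 60, recent_comments)

print(query_count)  # prints: 1
```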

Offloading static content
From the very beginning, serve static content from a URL separate from your main site - for example, from a subdomain such as http://images.yourwebsiteishere.com/, so that all images are served from this URL.

At first, you can host http://images.yourwebsiteishere.com/ on the same server to save on hosting expenses. When your site or blog grows, the separate URL gives you flexibility: you can point it to a Content Delivery Network like Akamai or ValueCDN, serve images from another server and so on.
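To keep that switch painless, it helps to build every static URL through one helper instead of hard-coding the host in templates. A minimal Python sketch, where the function name and the image path are invented for illustration:

```python
# The only place that knows where static files live; change it to move to a CDN.
STATIC_BASE = "http://images.yourwebsiteishere.com/"

def static_url(path, base=STATIC_BASE):
    """Build the full URL of a static file from its site-relative path."""
    return base.rstrip("/") + "/" + path.lstrip("/")

print(static_url("/logos/site.png"))
# prints: http://images.yourwebsiteishere.com/logos/site.png
```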

Apache mod_expires
You should implement the HTTP protocol expire header feature to lower the hits for content that doesn’t change very often, including images, CSS, JavaScript and even static HTML content. You need to compile Apache with the mod_expires module for this to work. For Apache2, add the --enable-expires switch to the ./configure line.

Next, you activate it with the LoadModule expires_module modules/mod_expires.so line.

Then, add the following lines to the Apache configuration file (usually httpd.conf). You can also add them to .htaccess, but from a performance point of view it’s better to keep them in httpd.conf.

<IfModule mod_expires.c>
ExpiresActive On
ExpiresByType image/jpeg "access plus 1 week"
ExpiresDefault "access plus 1 day"
</IfModule>

ExpiresByType image/jpeg "access plus 1 week" tells the browser that files of type image/jpeg may be cached for one week after they were accessed; after that, the browser requests them from the server again.

ExpiresDefault "access plus 1 day" does the same for all other file types, with a lifetime of one day.
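To make the effect concrete, the Python sketch below computes the kind of response headers that an "access plus 1 week" rule leads to. It is only an illustration of the arithmetic, not Apache code; the function name and the access time are made up.

```python
from datetime import datetime, timedelta, timezone
from email.utils import format_datetime

def expiry_headers(accessed_at, lifetime):
    """Build Expires and Cache-Control values for a given access time."""
    expires = accessed_at + lifetime
    return {
        "Expires": format_datetime(expires, usegmt=True),
        "Cache-Control": f"max-age={int(lifetime.total_seconds())}",
    }

# A JPEG requested at midnight UTC on 1 January 2008, cached for one week.
accessed = datetime(2008, 1, 1, tzinfo=timezone.utc)
headers = expiry_headers(accessed, timedelta(weeks=1))
print(headers["Cache-Control"])  # prints: max-age=604800
```

Until the time in the Expires header passes, the browser can reuse its local copy without contacting the server.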
