Introduction
This article is the first in a series on the Apache Web server software. In it, we'll cover the very basics of Apache: what it is and what it does, getting and installing it, and properly configuring it to serve your first pages. Finally, we'll look at some basic maintenance that you must do to keep everything running smoothly.
While this is a very basic introduction to Apache, the following articles will go into much more depth on various aspects such as CGI programming, access control and security, and extending Apache functionality with modules. A special article delving into the details of mod_perl programming is also in the works.
What is Apache?
The Apache Web server, or simply "Apache" as it has become known, is an HTTP daemon: software that handles requests from Internet users for the documents located on your server that make up your Web site. Apache is like other Web server software you may have heard about, such as Microsoft's "Internet Information Server" (IIS), iPlanet, Roxen, Zeus, or AOLserver, just to name a few.
What makes Apache so special, and probably why you've heard it mentioned all over the Internet, is actually several things. The one thing that many people mention is the fact that it's free as in Open Source, meaning that you can download the full source code to the software free of charge (see the Apache license for more). This would certainly explain its widespread use, and in fact, Apache is the world's number one Web server according to the Netcraft survey, beating out all others including its nearest competitor, Microsoft's IIS.
More than just cost, which is a nice perk, it is the performance, stability and feature set of Apache that make it very nice indeed. Apache has never necessarily been known for its raw speed, as the Apache developer team has always stressed "correctness before speed," meaning that Apache adheres to Web standards even at the expense of all-out performance, and there are faster servers out there such as Zeus and the Linux-related project, "Tux." However, there is more to performance than just the number of hits you can crank out, such as stability, and Apache is no slouch on either count. It is one of my favorite pieces of software and ranks among those offering "set it and forget it" operation. Rest assured, however, that Apache can handle very busy Web sites indeed and do so in style (the site slashdot.org as well as the one you're currently browsing come to mind). Once the new version of Apache, dubbed the "2.x series," is released from beta, which should be soon, Apache will have an entirely new engine under the hood and should hold its own just fine (with enhancements like true multi-threading in addition to the current prefork method of Apache process handling).
Features are another area where Apache shines, and it certainly covers all the basics that you would foreseeably need. Should those not be enough, Apache's widespread acceptance and developer base mean you have numerous projects, patches and plugins to choose from that extend Apache's abilities beyond those "out of the box," such as LDAP or database authentication, tighter user-session tracking and ad serving, to name a few. Apache is quite extensible, and you could even write your own code that interacts with the Apache API at several stages of the request-and-delivery process.
You're probably here because you're running a Unix-based, if not Solaris-based server and wish to start serving Web pages. You're in luck as Apache was designed first and foremost for the Unix operating system, and works great with Solaris. Compilation and installation can be done in only a few minutes once you know what you're looking at.
Getting and Installing Apache
If you're familiar with compiling and installing software from source code, you can safely skip this section and move on to the "Configuring Your Server" section below. This section is for those who are new to adding custom software to their systems. It should be noted that later versions of Solaris come with the Apache Web server on the Companion Software CD, but as is the case with many binary distributions, you could be one or more versions behind the latest, which may have important security or feature enhancements, if not just bug fixes. We'll take the approach of installing Apache from scratch to remain compatible with older versions of Solaris as well as to give you an example of how it should be done. One other important thing to consider: if you are planning on running extensions to Apache down the line such as PHP or mod_perl, you will most likely need to compile from source anyway, as those extensions won't necessarily work with the version included with Solaris. Finally, presented this way, the procedure is cross-platform and will work on Linux just as well as on MacOS X, without relying on Solaris package tools. Those of you who are interested can read about the Solaris freeware available on the Get Solaris Freeware page of Sun's Web site.
Compiling Apache is normally fairly easy, provided of course that you have a C compiler and any special libraries that may be needed. The method is fairly standard now and most often completes without any issues. Unless you are installing additional modules like PHP or mod_perl, simply follow the instructions below. Subsequent articles on Apache will include more detailed instructions on how to modify this basic procedure if necessary.
You will start with a "tarball," a "tar" archive of the source code tree, which may or may not have been further compressed with "gzip." If it has been compressed, look for an ending such as ".gz" or ".tgz"; if not, it will simply have ".tar" at the end. Your best bet is to have both the GNU "tar" and "gzip" utilities installed on your system. If you do, the easiest way to extract the source code tree from the tarball is to issue the command:
tar -xzf apache-1.3.23.tar.gz
This will automatically uncompress and untar the tarball and generate the source code tree in the directory that you're currently in. The tree will usually (but not always) have the same name as the tarball, minus the ".tar," ".tgz," ".gz" or other extension.
Change into the source code directory and look for a file named "configure." One thing you'll need to decide (or have decided) at the outset is where you wish to stash the Apache installation. The perennial favorite is the /usr/local/ filesystem, which clearly differentiates things that you've installed on your Solaris box from what comes with or has been installed along with Solaris. To do this, you run the configure script with the option "--prefix=/usr/local," which causes that filesystem to serve as the root for the install. Apache binaries would then be located under /usr/local/bin, while the man pages, for instance, would be installed under /usr/local/man, and so on. However, Apache is often installed in its own directory just to keep everything neat and together, and you really don't need to integrate Apache as you might with other applications. So for most people, the prefix specified with "--prefix=/usr/local/apache" is the right way to go, and you can use the following command to get the ball rolling:
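If GNU tar is not available, the "z" flag may not be understood by the stock tar. A minimal sketch of the usual workaround follows: let gzip write the uncompressed archive to standard output and have tar read it from standard input. The "demo-1.0" tarball is a throwaway stand-in built on the spot so the commands can be tried anywhere; substitute your real Apache tarball.

```shell
# Build a tiny gzip-compressed tarball so the example is self-contained.
workdir=$(mktemp -d)
mkdir "$workdir/demo-1.0"
echo "hello" > "$workdir/demo-1.0/README"
(cd "$workdir" && tar cf - demo-1.0 | gzip -c > demo-1.0.tar.gz && rm -r demo-1.0)

# Extract it without tar's -z flag: gzip decompresses to stdout,
# and tar reads the archive from stdin ("-" as the file name).
(cd "$workdir" && gzip -dc demo-1.0.tar.gz | tar xf -)

cat "$workdir/demo-1.0/README"
```

The same pipeline should work unchanged for ".tgz" files, since those are simply gzip-compressed tar archives under a shorter name.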
./configure --prefix=/usr/local/apache
You'll see a lot scroll by as configure automatically determines various locations of libraries and such in your environment. Everything should complete successfully and usually there won't be any specific message indicating such, but it should be fairly obvious. Now you need to actually compile the software, and this is done with the "make" command. Again, the GNU replacement for the stock Solaris make might be something you should look into, as it supports multiple CPUs through the use of the "-j X" flag, the "X" being your number of CPUs plus one. This will compile software much, much faster on average. In any event, simply run the make commands as follows:
make            # with GNU make on a dual-CPU system: make -j3
This will compile the software using any options that you've specified. If all goes well, you can then install the software and begin trying it out. To do this, supply the "install" option to make thusly:
make install
That's it! Your Apache Web server and all its related files can be found under the /usr/local/apache directory, including your Web site, CGI scripts and log files.
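The "number of CPUs plus one" value for GNU make's "-j" flag can be computed rather than typed by hand. The sketch below assumes your system's getconf understands the _NPROCESSORS_ONLN name, which is common but not universal; where it doesn't, the script falls back to a single CPU. On Solaris, counting the lines printed by "psrinfo" is one alternative. The script only prints the command it would suggest and does not run make.

```shell
# Count online CPUs; fall back to 1 if getconf cannot report the value.
cpus=$(getconf _NPROCESSORS_ONLN 2>/dev/null || echo 1)
case "$cpus" in
  ''|*[!0-9]*) cpus=1 ;;   # guard against empty or non-numeric output
esac

# "CPUs plus one" is the rule of thumb from the compile step above.
njobs=$((cpus + 1))
echo "make -j$njobs"
```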
Configuring Your Server
This part of the installation could either be the easiest, or the toughest. If you don't have any special requirements or just want to take the quickest route to getting a page to come up in your browser, you could actually skip this section entirely and move on to the "Serving Your First Pages" section. Chances are though that you'd like to customize a few settings on your Apache server and we'll go into a few of the important ones here.
Apache is configured through a flat text file made up of comments and directives with values, some of which can be nested. This file and method of configuration is a joy to work with and quite simple once you know the ropes. Forgetting which GUI panel a particular setting was located on, or plodding through several Web pages such as those found on some servers' "admin panels," doesn't apply to Apache, which is both its strong point and the bane of new users. Much like everything else on a Unix operating system, Apache is controlled by a text configuration file that can be edited with any text editor, even remotely through an "ssh" or "telnet" session from halfway around the world. There are no special requirements such as a browser, and no special software to install or launch.
This file used to be broken up into three parts, but has merged into one over the years and is known as the httpd.conf file, located in /usr/local/apache/conf/ if you used the prefix suggested above. Make a copy of this file for safe keeping and set about editing it with your favorite editor as we go through some of the important options. Most of them can be left just the way they are, while others might require slight tweaking. Any line starting with a pound sign ("#") is a comment and looks similar to the following snippet:
# This is a comment
#
ServerAdmin webmaster@domain.com
ServerName www.domain.com
The first thing you'll see as you scroll through the file is the ServerType directive, which selects between standalone mode and running Apache via inetd/xinetd. Standalone mode, the default, means that Apache is started up, usually at boot time, and stays running, ready to answer queries, until the system is shut down. This ensures that a page can be served up as soon as a request comes in, rather than incurring the overhead of starting up Apache, serving the page and then having the process exit for every request. It is strongly suggested that you leave this setting as-is. Only if you have really, really low demand for Web serving (such as maybe a home LAN or just stashing documentation somewhere) or have very little RAM to spare should you consider running Apache via inetd/xinetd, as other stock Solaris services might be.
Scroll down a bunch more in the file, and you'll notice that things like ServerRoot are set correctly already. This information was taken from your "--prefix" specification when running the "configure" script. Several other values are also listed. You'll want to scroll down to the User and Group directives.
A recurring theme that you'll notice in the articles on this site is software that allows you to run your daemons as a particular user and/or group. Postfix and ProFTPD are two such examples. This is an important security feature that keeps daemons running at a low privilege rather than as root, so that in the event the daemon is exploited, access and damage can be restricted greatly. I prefer to create unique users for each of these types of services for additional, finer-grained control. Also, if you do a "ps" listing, for example, you can clearly see these services by the users they run as, and can sort these listings accordingly. Remember that if you use "apache" as the user, an account by that name must actually exist. Give it a shell of "/bin/false" and do not enable a password on that account for further "no login, ever" protection. So, for User you can use "apache" or simply use "nobody." For Group you'll want to use "nobody" in either case rather than the default of "#-1," which is neither descriptive nor useful.
The next directive we want to check and likely modify is the ServerAdmin value, which is the address shown as the contact in error responses that Apache generates. By default, the value is set to "root@your.machine.com," and often, as is the case with business servers vs. personal ones, the address "webmaster@domain.com" is used instead. If nothing else, you should probably create an alias that points the address "webmaster@domain.com" to your address. Later on, you can redirect this address at will without having to worry about who has what address. Also, the Web crawling spiders that spammers often run can pick up your "root@your.machine.com" address, and before you know it, you'll be getting amazing offers in root's inbox.
The next important directive is ServerName, and its value is what your visitors see in self-referencing URLs that Apache generates and also in errors, much like the Email address above. You might notice that the value has already been generated for you, containing your hostname, but commented out. The most common value here is usually in the form "www.domain.com," but if you don't use the "www" prefix, leave that out here as well. This name should be a valid and registered DNS name or you might run into trouble, or you'll need to make certain other modifications (e.g. /etc/hosts) for it to work properly. If you don't have a valid DNS name, you can use an IP address instead of a name, or if you simply want to use Apache for development and/or testing, use the loopback address, "127.0.0.1."
For the intended scope of this article, these are the only directives you really need to pay much attention to in order to get up and running properly. In future articles, we'll discuss other directives and what they do, but for the impatient, see the Run-time Configuration Directives section of the Apache documentation on the Apache Web site. Most directives and their use are fairly obvious from the well-commented httpd.conf file, however. If you have a RAID drive where all your content is located, you may wish to tweak the DocumentRoot directive, for example. This involves several other necessary changes and can get messy quickly if you don't know what you're doing. Again, see the above documentation if you're curious, impatient or have a special need.
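Before naming an account in the User directive, it is worth confirming that the account is really there. The following is a small sketch using the standard "id" command; the helper name user_exists is made up for this illustration, and "apache" is only the example account name used in this article.

```shell
# Return success if the named account exists on this system.
user_exists() {
  id "$1" >/dev/null 2>&1
}

# Check the two candidate accounts discussed above.
for account in apache nobody; do
  if user_exists "$account"; then
    echo "$account: exists"
  else
    echo "$account: missing, create it before naming it in httpd.conf"
  fi
done
```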
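After editing, a quick way to review the directives covered here is to filter them out of the file while skipping commented-out lines. The sketch below assumes nothing about your real configuration: it runs against a small stand-in file, and the function name show_directives is made up for this example. Point it at /usr/local/apache/conf/httpd.conf (or wherever your prefix put the file) to check the real thing.

```shell
# Print the active (uncommented) values of the directives covered here.
show_directives() {
  grep -E '^[[:space:]]*(User|Group|ServerAdmin|ServerName)[[:space:]]' "$1"
}

# A small stand-in for httpd.conf so the sketch runs anywhere.
sample=$(mktemp)
cat > "$sample" <<'EOF'
# This is a comment
#ServerName new.host.name
User apache
Group nobody
ServerAdmin webmaster@domain.com
ServerName www.domain.com
EOF

show_directives "$sample"
```

Because the pattern requires the directive name to be the first non-blank text on the line, the commented-out ServerName line is not reported.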
Serving Your First Pages
Now comes the big moment... You've retrieved the source code, compiled it and subsequently installed it on your system. Now what?
Change into your Apache installation's bin/ directory (/usr/local/apache/bin) and look for a file called "apachectl." This is the main script you will use to interact with Apache, such as starting it up and shutting it down. To start Apache, and hence make your Web site public to the world, simply execute the following at the command line:
./apachectl start
If everything in your httpd.conf file jibes, Apache will start up and you're all set. However, you might see various warnings or errors that prevent Apache from starting up, and you'll need to address those. Unfortunately, there are a great number of reasons why things might not work, and they are beyond the scope of this article, but in nearly all cases the errors are very straightforward and easy to understand and fix. Careful editing of the httpd.conf file lessens the chance of errors cropping up, so be sure to follow the comments in the file, or the Apache documentation, for clarification.
If you're working with an already running instance of Apache and wish to test changes to your config file before restarting the server, which could leave your Web site down, you should issue the following command. It will scan your httpd.conf file and report on any errors without affecting the running server:
./apachectl configtest
They say "all good things must come to an end," and you will need to either restart or shut down your Apache server for one reason or another. You could simply kill your master httpd process, but this is ugly, and not the way to go. Instead, use the same "apachectl" script as above, but specify "stop" instead of "start." You can even combine the two with the "restart" or "graceful" options; "graceful" lets requests already in progress finish before the configuration is reloaded.
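The "configtest" and "graceful" options combine naturally: check the configuration first, and reload only if the check passes. Here is a minimal sketch of that idea. The function name safe_restart is made up for this example, and the default path assumes the "--prefix=/usr/local/apache" install used in this article.

```shell
# Reload Apache only if the configuration passes its syntax check.
# $1 is the path to apachectl; the default assumes the prefix used above.
safe_restart() {
  ctl=${1:-/usr/local/apache/bin/apachectl}
  if "$ctl" configtest; then
    "$ctl" graceful
  else
    echo "configtest failed; server left untouched" >&2
    return 1
  fi
}
```

Once the function is defined, running "safe_restart" after each edit to httpd.conf means a typo in the file produces an error message instead of a downed Web site.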
Once you've successfully started the Apache Web server, you can test it either at the command line, or in a browser. Simply fire up your browser of choice, and enter your domain name and you should be presented with the default Apache start page indicating success!
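For the command-line test, one option is to send the server a bare HTTP request by hand. The sketch below only builds the request text; the function name head_request is made up for this example. Actually sending it requires a network tool such as "telnet" or "nc" (netcat), which is an assumption about what is installed on your system.

```shell
# Emit a minimal HTTP/1.0 HEAD request for the given path (default "/").
# HTTP requires CRLF line endings and a blank line to end the headers.
head_request() {
  printf 'HEAD %s HTTP/1.0\r\n\r\n' "${1:-/}"
}

# Show the request text (carriage returns stripped for display).
head_request /index.html | tr -d '\r'
```

If netcat is available, something like "head_request / | nc 127.0.0.1 80" would send the request to a server on the local machine. A working server should answer with a status line such as "HTTP/1.1 200 OK" followed by its response headers.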
Now that you know that you can serve your Web pages, you'll most likely want to create your own content and build up a Web site. Simply place your files in the document root (as indicated by the DocumentRoot directive in your httpd.conf file as above) and you should see the results immediately in your browser. By default, Apache looks for a file named "index.html," whereas Microsoft's IIS looks for names such as "default.htm." You can use anything you want, and inform Apache that these other variations are also defaults by modifying the DirectoryIndex directive (simply add your filenames, separated by a space, and keep them to an absolute minimum!).
At this point, if you don't know anything about Web site design and authoring or HTML coding, you'd be best off catching up on that and experimenting first before continuing on to the follow-up articles. They will delve into aspects of Apache above and beyond HTML coding, and assume that you already know the basics.
Basic Maintenance
As alluded to above, Apache is among the "set it and forget it" kinds of software in that once it's running, it usually stays that way for a very long time, often as long as the uptime of your server. Changing Web site content doesn't require anything special as far as Apache is concerned. The only time you really have to restart Apache is when you make changes to the httpd.conf file or wish to upgrade to a newer version.
There is some regular maintenance that must be done with Apache to keep things running smoothly, however. Log files can grow to immense sizes alarmingly fast, depending on how busy your Web site is or whether you have buggy code somewhere in your CGI applications spitting out errors. I've seen instances where a bad script spit out one line to the error log about every second, non-stop. This is something you'll want to avoid. Once you have your server and Web site set up and the dust has settled, your log files can handle themselves automatically, either through a regularly scheduled script or through some modification of your httpd.conf file and the use of the "rotatelogs" utility that comes bundled with Apache.
By default, however, Apache is set to log to two files, access_log and error_log, with the former usually being much, much larger than the latter. Both are in your log directory as specified in the httpd.conf file. There are several other files like the scoreboard and PID files, but they're self-maintaining and never grow beyond a few bytes in total. The log files are all we're really interested in, and if you don't watch these two, you could end up filling up your hard drive and losing access and error information or, worse, bringing down the server itself if you're logging to a critical filesystem like /var.
If you have a quiet site or lots of room on your filesystem to burn, just check these files from time to time and rotate them. The simplest way to do this is to copy the contents to a meaningfully named file for archival, if you wish to keep them, and then clear out the current file. You can do this with or without cycling the Apache server (that is, temporarily shutting it down and then starting it up again). The method I like to use is as follows:
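Keeping an eye on these two files can be scripted. The following is a rough sketch that warns when a log passes a size limit. The function name log_too_big, the LOGDIR variable and the 100 MB threshold are all choices made for this illustration, and the default directory assumes the "--prefix=/usr/local/apache" install.

```shell
# Succeed (exit 0) if the file is larger than the given number of kilobytes.
log_too_big() {
  file=$1
  limit_kb=$2
  size_bytes=$(wc -c < "$file" | tr -d ' ')
  [ "$size_bytes" -gt $((limit_kb * 1024)) ]
}

# The path and 100 MB limit are illustrative; adjust to your installation.
logdir=${LOGDIR:-/usr/local/apache/logs}
for log in "$logdir/access_log" "$logdir/error_log"; do
  if [ -f "$log" ] && log_too_big "$log" 102400; then
    echo "$log is over 100 MB; time to rotate it"
  fi
done
```

The loop simply prints nothing if the log files are missing or still under the limit, so it is safe to run from a scheduled job.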
cd /usr/local/apache/logs
cat access_log > access_log.old ; cat /dev/null > access_log
gzip -9 access_log.old
This example copies everything in the access_log file into the access_log.old file, empties access_log, and then compresses the copy, all while the server is still happily running. If you have a busy site, you might miss a second or two of information between the time you copy the log to the new file and the time you nullify the original. Putting the two commands on the same command line reduces that brief moment to be practically inconsequential.
The other method would require you to shut down the Apache Web server with the "apachectl" script, either nullify or delete the log files and then start the server up again with the same script. During the time that you're clearing out log files though, your Web server is effectively down and visitors cannot get to your pages - which is why this method is not recommended.
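The copy-and-clear approach described earlier lends itself to a regularly scheduled script. Below is one possible sketch that adds a date stamp to the archive name so older archives are not overwritten. The function name rotate_log is made up for this example, and the demonstration runs against a scratch file rather than a live log.

```shell
# Copy a log to a date-stamped archive, empty the live file, then compress
# the archive. Each step runs only if the previous one succeeded.
rotate_log() {
  log=$1
  stamp=$(date +%Y%m%d)
  cat "$log" > "$log.$stamp" && : > "$log" && gzip -9 "$log.$stamp"
}

# Demonstrate on a scratch file rather than a live log.
demo=$(mktemp)
echo '127.0.0.1 - - "GET / HTTP/1.0" 200 1456' > "$demo"
rotate_log "$demo"
ls "$demo"*
```

With a daily stamp, a second run on the same day finds the compressed archive already in place, and gzip will refuse to overwrite it. Use a finer-grained stamp if you expect to rotate more than once a day.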