Friday, December 08, 2006

HTML--A Brief Overview

The information superhighway is often mentioned in the mainstream media these days. When the media use that term, they are often referring to the World Wide Web and describing it as if it were something being hardwired together. In reality, the World Wide Web (often referred to as just "the Web") is a collection of systems on the Internet that run software that communicates using a common protocol.
This may sound like a description of the Internet in general because most systems use a common communications protocol (TCP/IP). That is because the model is similar. But instead of people having to write down or remember the addresses, locations, or names of the resources they need, the software provides the links.
The user starts at one location and then connects to other locations and resources. There are three categories of software required to perform these tasks: the server (providing the information), the Web page, and client software (known as a browser). Major corporations run their own Web servers; smaller companies and individuals use Internet Service Providers (ISPs) to hold their Web pages. The Web browser, which is the client portion of the equation, can be GUI (most are) or CUI (Character User Interface, most frequently used by UNIX users).
It is the Web page that provides the programming flexibility of the Web itself. Although the language looks complex in the beginning, new material can be created quite easily and modified quickly. With ISPs providing inexpensive or even free Web services to their customers, many people are setting up their own pages. The top-level Web page of an individual, company, or organization is referred to as the home page because it is the starting point when looking at their Web pages. Each Web page can contain many links or connections to other Web pages and resources.
What Are URLs?
The links between Web pages (or the means of accessing resources through the Web) are specified with the Uniform Resource Locator (URL for short). The URL specifies the protocol, user name and password (often omitted), system name, location, and name of the desired file. When working with a Web page, the typical URL looks like the following:
http://www.host.domain/directory/file.html
Several protocols are available, as shown in Table 15.1:
Table 15.1. Available World Wide Web protocols.
Protocol   Description
file       Get file on current system (client)
ftp        File Transfer Protocol
gopher     Information Service protocol superseded by http
http       HyperText Transfer Protocol
mailto     Send e-mail
news       Network News Transfer Protocol (NNTP)
telnet     Terminal session communications
With the exception of the http protocol, these have been available on the Internet for several years. Only http is new with the Web.
Chapter 21, "Introducing HyperText Transfer Protocol (HTTP)," provides much more detail on http itself.
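As a rough sketch of how these protocols show up in a page, the fragment below uses the anchor tag (<a>) with its href attribute to link to resources through several of the URL types in Table 15.1. The host names, user name, and file paths are placeholders invented for illustration, not real addresses.

```html
<!-- Each link names a protocol, then a system and a file or mailbox. -->
<!-- All hosts, users, and paths here are hypothetical placeholders. -->
<a href="http://www.host.domain/directory/file.html">A Web page</a><br>
<a href="ftp://ftp.host.domain/pub/file.txt">A file fetched by FTP</a><br>
<a href="mailto:user@host.domain">Send e-mail to a user</a><br>
<a href="file:///home/user/public_html/index.html">A file on the client system</a>
```

Only the part before the colon changes the protocol; the rest of each URL still identifies the system and the resource.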
What Is Hypertext?
Hypertext is the description applied to any document that contains links to other portions of the document or other documents. Instead of reviewing the document in a linear manner (reading a book from beginning to end), it is possible to jump around to other areas. Normal documents often have hypertext-like entries--the reference to Chapter 21 (for more information on http) in the previous section is a link to another portion of this book. The primary difference between a reference and a hypertext link is the effort involved to get to the other area.
With book references, it is up to the user to find the page that the reference is on (through the table of contents or index), and then physically move to it. With hypertext links, the link is executed (by selecting it via mouse or hotkey), and the software gets the material for the reader.
With many tools, you are able to jump to new material via the hypertext link and then back to your original location. With a book, you have to keep your finger or a bookmark at the original location.
Hypertext does not provide any new capability; it just makes following references so much easier.
Description of HTML
The programming of individual Web pages is done through HTML (HyperText Markup Language), which is an application of SGML (Standard Generalized Markup Language). The HTML code describes what the page should look like to the client software (Web browser) and describes links to other pages.
The language itself defines a set of codes or tags (requests in troff terminology) that tell the Web browser how to display text, images, and links. Like troff requests, HTML tags are ASCII text. The language standard provides guidelines on how these items should be displayed, but it is up to the client software to determine the final form.
When coding HTML, you will encounter WYSIPWYG (What You See Is Probably What You Get). When working with a GUI-based word processor, you have the ability to work in WYSIWYG (What You See Is What You Get) mode--the image on the screen is exactly how it will appear on paper. Because the individual Web browsers interpret the HTML slightly differently, the results will vary between products. The HTML specifications only provided general guidelines on displaying elements, so there can be wide variation.
The HTML language elements, also known as markup tags or just tags, begin with the less than symbol (<) and end with a greater than symbol (>). Immediately following the less than symbol is the command name (which is not case sensitive). Many commands are followed by attributes and their assigned values. Be careful with the assigned values because they may be case sensitive.
The tags describe document elements (document parts or sections). Like the pic request .PS, which requires a matching .PE, some tags require a closure tag; others do not. A closure tag consists of the less than symbol, a slash (/), the command name without any attributes, and the greater than symbol. When working with tags that require closure, be very careful when nesting them, because a closure tag closes the most recent open command of that type.
Some of the elements include:
<title>Title text goes here</title>
<h1>First level of heading text goes here</h1>
<hr>
Notice that the <h1> tag requires a closure tag in the form of </h1>. The <hr> (horizontal rule) tag does not.
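The nesting caution can be illustrated with a small sketch that uses the block quotation tag inside itself. The wording of the text is made up for the example; the point is only which opening tag each closure tag matches.

```html
<blockquote>
Outer quotation begins.
<blockquote>
Inner quotation.
</blockquote>
<!-- The closure tag above ended the inner (most recent) block quote. -->
Still inside the outer quotation.
</blockquote>
<!-- Only now is the outer block quote closed. -->
```

Leaving out one of the two closure tags would leave the outer quotation open, and how the rest of the page is then displayed is up to the browser.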
There are several versions of HTML. The original was, of course, version 1. Every browser available should be able to recognize version 1 HTML elements. All but the oldest browsers will support version 2 elements. Most browsers should support version 3, which introduces HTML elements to support tables. Like any standard, HTML is always evolving and growing.
Several browser vendors (Netscape and Microsoft, for example) have added their own non-standard elements to HTML. When a Web page is coded using the extensions of a particular browser, you will often see a message similar to:
This page optimized for the XYZ browser.
This message is often followed by a graphical representation of the browser's trademark.
________________________________________
NOTE: Most Web browsers will simply ignore any HTML tags that they do not recognize. If you code a tag incorrectly or use a newer HTML version than the browser supports, you will get odd results but no error message; if you are unlucky, the Web browser itself will crash. Some authoring tools can verify the syntax of your HTML code.
________________________________________
My personal suggestion is that you code for the majority of the Web browsers to enable the most people to view your page. The official standard is maintained by the World Wide Web Consortium. You can get more information on the standard HTML at the following Web page:
http://www.w3.org/

Using a Web Browser
Your operating system may come with a Web browser, you may have received a copy with other software, or you may have to download one from the Internet. However you obtain one, there are two basic types of browser: GUI and CUI.
When the Web began, most of the users were connected through UNIX systems with character (or text) interfaces. This precluded the use of pretty graphics to represent links and limited the way that text could be represented. As usage has progressed, the majority of users have GUI interfaces that provide much more capability.
The individual Web browsers all behave a little differently, so you will have to learn how yours works. In general, they all have a location for you to enter a URL and provide some status information on the transfer of data between the host and your client. A good place to start is the home page for your browser. Most browsers have a button or menu option that will fill in the URL for you and go right to that page.
Most will also have a back button or menu option. This should take you to the page you previously visited. This is equivalent to your finger in the book when you look at another section. Most browsers will support multiple levels of previous pages so you can follow a link completely away from your original location and get back there again.
As shown in the section for URLs, one of the types is file. By using this type, you can create HTML files on your client system and look at them before placing them on a server for the world to see.
Some vendors are taking advantage of the file URL type when distributing documentation or other materials (sales literature, for instance). Instead of having to provide a tool for you to look at their information or coding to a proprietary standard (like the Microsoft Windows help facility), they code in HTML. To use their documents, you start up your Web browser and point to their files.
Your machine is not cluttered with different viewers and the vendor's material can be viewed on many different types of machines.
Coding HTML
Coding HTML documents has traditionally been a manual process, just like with troff. With the increased popularity (consumer demand) and business use of the Web, GUI-based Web authoring tools have become available. Although these tools are available and relatively inexpensive (often free or included with other software), there is still value to being able to code basic HTML. Even though there are GUI word processors, troff is still used in some applications.
This chapter provides an introduction to HTML only--it covers the important language elements and provides examples of their usage.
________________________________________
NOTE: In general, you can give your HTML file any name, but it should have a suffix of .htm or .html. Check with your system administrator for the location to place your Web pages; many servers look for them in a directory called public_html under your home directory. If you want people to be able to get your top-level page automatically, name it index.htm (index.html) or welcome.htm (welcome.html). Your system administrator can tell you the exact form it should be in.
________________________________________
See the section on GUI tools later in this chapter for more information.
A Minimal HTML Document
The minimum reasonable HTML document contains four elements:
• <html> pair that contains the entire document
• <head> pair that contains heading information
• <title> pair contained in the heading
• <body> pair that contains the body of the document
Figure 15.1 shows the output of the minimal HTML document using the Mosaic Web browser from the NCSA (National Center for Supercomputing Applications). Listing 15.1 shows the source for it.
Figure 15.1.
Minimal HTML document viewed through Mosaic.
________________________________________
NOTE: You will notice that the activity indicator (the square postage-stamp sized box near the upper right corner of the browser) is black in the Mosaic examples. This is because I ran the browser locally (on my PC) instead of connected to the net. It was much faster that way and the results are the same.
The activity indicator in the Netscape examples shows the AT&T "World" logo instead of the Netscape "N" logo because I use a version from the AT&T Worldnet service (the software and the service were free).
________________________________________
Listing 15.1. Source for minimal HTML document.
<html>
<head>
<title>This is the Title</title>
</head>
<body>
This is the body of the text
</body>
</html>
The text enclosed in the <title> tag is displayed at the top of the window. There may be only one title; if you include more than one in the <head> section, usually only the last one will actually display. The block contained within the <head> tag is used to set up the document and show the title. The block contained within the <body> tag is where most tags and text are placed.
As you see from the URL in the figure, this HTML document was displayed from a file on my system; it was not placed on a Web server for the world to see.
This minimal HTML document demonstrates the portions of the document, but really is not very useful. Many more tags and much more text are required.
Font Control
Within the body of the document, you can control the fonts that your text is displayed in. To start off, there are six levels of headings available, specified using tags <h1> through <h6>.
Figure 15.2 shows the behavior of the heading tags using the Mosaic Web browser. Listing 15.2 shows the source for it.
Figure 15.2.
Heading tags viewed through Mosaic.
Listing 15.2. Source for heading tags.
<html>
<head>
<title>Heading Font Control</title>
</head>
<body>
<h1>Heading Level 1 - ABCDEF abcdef &lt;H1&gt;</h1>
<h2>Heading Level 2 - ABCDEF abcdef &lt;H2&gt;</h2>
<h3>Heading Level 3 - ABCDEF abcdef &lt;H3&gt;</h3>
<h4>Heading Level 4 - ABCDEF abcdef &lt;H4&gt;</h4>
<h5>Heading Level 5 - ABCDEF abcdef &lt;H5&gt;</h5>
<h6>Heading Level 6 - ABCDEF abcdef &lt;H6&gt;</h6>
<h7>Heading Level 7 - ABCDEF abcdef &lt;H7&gt;</h7>
<h8>Heading Level 8 - ABCDEF abcdef &lt;H8&gt;</h8>
<h9>Heading Level 9 - ABCDEF abcdef &lt;H9&gt;</h9>
<h10>Heading Level 10 - ABCDEF abcdef &lt;H10&gt;</h10>
</body>
</html>

Looking at Figure 15.2, you will notice that the lines start to get weird after heading level 6. After you go beyond what the standard allows, then things can get odd.
Because the less than and greater than signs have special meaning to HTML, in order to print them, you have to use special character representations. These are in the form of an ampersand (&) followed by a mnemonic followed by a semicolon (;) to complete the special character. In Listing 15.2, &lt; and &gt; were used. If you wanted to print an ampersand, you would use &amp;.
Using a version of Netscape Navigator, the same source will produce a slightly different screen as shown in Figure 15.3.
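A minimal sketch of the special character representations follows. The page title and sentences are invented for the example; the three entities used are the ones for less than, greater than, and ampersand.

```html
<html>
<head>
<title>Special Characters</title>
</head>
<body>
<!-- &lt; and &gt; print the angle brackets instead of starting a tag -->
To start a top-level heading, type &lt;H1&gt; before the text.<br>
<!-- &amp; prints a literal ampersand -->
The company name AT&amp;T contains an ampersand.
</body>
</html>
```

A browser should display the first line with a visible <H1> in the text rather than treating it as a heading tag, and the second line with a plain & in the company name.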
Figure 15.3.
Heading tags viewed through Netscape Navigator.
For the other fonts, there are logical and physical style tags. With logical font tags, it is up to the browser to decide how to display them. Logical tags include <em> for emphasis (usually displayed in italics), <strong> for important text (usually displayed in bold), and others.
Figure 15.4 shows the behavior of the logical font style tags using the Mosaic Web browser. Figure 15.5 shows the behavior of the logical font style tags using the Netscape Navigator browser. Listing 15.3 shows the source for it.
Figure 15.4.
Logical font styles viewed through Mosaic.
It is not very obvious that the different font types are really different in Figure 15.4. It is much more obvious in Figure 15.5 what the different fonts are (they are better supported).
Figure 15.5.
Logical font styles viewed through Netscape Navigator.
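Listing 15.3. Source for logical font styles.
<html>
<head>
<title>Logical Font Styles</title>
</head>
<body>
<address>Postal or E-mail address - ABCDEF abcdef &lt;ADDRESS&gt;</address>
<cite>Citations - ABCDEF abcdef &lt;CITE&gt;</cite><br>
<code>Program Code - ABCDEF abcdef &lt;CODE&gt;</code><br>
<em>Emphasis - ABCDEF abcdef &lt;EM&gt;</em><br>
<kbd>Keyboard Input - ABCDEF abcdef &lt;KBD&gt;</kbd><br>
<samp>Literal (Sample) Characters - ABCDEF abcdef &lt;SAMP&gt;</samp><br>
<strong>Strong or Important - ABCDEF abcdef &lt;STRONG&gt;</strong><br>
<var>Variable Name - ABCDEF abcdef &lt;VAR&gt;</var>
</body>
</html>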
With the exception of the invalid heading tags, all of the headings appeared on their own lines (by definition, a heading gets its own line). When specifying font types, it is necessary to tell the browser to go to a new line through the <br> (line break) tag.
Physical tags include <i> for italics, <b> for bold, and others.
Figure 15.6 shows the behavior of the physical font style tags using the Mosaic Web browser. Figure 15.7 shows the behavior of the physical font style tags using the Netscape Navigator browser. Listing 15.4 shows the source for it.
Figure 15.6.
Physical font styles viewed through Mosaic.
Mosaic supports the standard physical font styles, but treats the Netscape extensions as plain text. Netscape Navigator supports the standard physical font styles and its own extensions. Although not obvious from the screen in Figure 15.7, the <blink> line does actually blink.
Figure 15.7.
Physical font styles viewed through Netscape Navigator.
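Listing 15.4. Source for physical font styles.
<html>
<head>
<title>Physical Font Styles</title>
</head>
<body>
<b>Bold - ABCDEF abcdef &lt;B&gt;</b><br>
<i>Italics - ABCDEF abcdef &lt;I&gt;</i><br>
<s>Strike Out - ABCDEF abcdef &lt;S&gt;</s><br>
<u>Underline - ABCDEF abcdef &lt;U&gt;</u><br>
<tt>Typewriter Text - ABCDEF abcdef &lt;TT&gt;</tt><br>
<blink>Blink (Netscape extension) - ABCDEF abcdef &lt;BLINK&gt;</blink><br>
<font size=1>font size 1 (Netscape Extension) - ABCDEF abcdef &lt;FONT SIZE=1&gt;</font><br>
<font size=3>font size 3 (Netscape Extension) - ABCDEF abcdef &lt;FONT SIZE=3&gt;</font><br>
<font size=5>font size 5 (Netscape Extension) - ABCDEF abcdef &lt;FONT SIZE=5&gt;</font><br>
<font size=7>font size 7 (Netscape Extension) - ABCDEF abcdef &lt;FONT SIZE=7&gt;</font><br>
</body>
</html>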
Physical font styles can be combined to produce multiple effects like bold italics or bold underlined.
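Combining styles is a matter of nesting one physical style tag inside another, as in this short sketch. The sample text is invented, and the exact appearance still depends on the browser.

```html
<!-- Close the inner tag before the outer one so the pairs stay nested -->
<b><i>Bold italics - ABCDEF abcdef</i></b><br>
<b><u>Bold underlined - ABCDEF abcdef</u></b><br>
<tt><b>Bold typewriter text - ABCDEF abcdef</b></tt>
```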
Formatting Text
When text appears in an HTML document, the browser decides how to display it. You can control the fonts and you can also control how it is formatted. By default, you enter your text in free format and the browser arranges it automatically to fit the window.
A new paragraph starts with the <p> tag, and if you want to force a line break, you use the <br> tag. The browser decides how to format the text, except that it always starts a new paragraph at the beginning of a line (with a blank line above it) and starts text on a new line (without a blank line above it) when you use the line break.
If you have text that is a quotation, put it between <blockquote> tags--it will normally appear indented, the same way that quotations appear in books. If you have text that requires very specific formatting, you can contain it within a <pre> (preformatted) block--it will appear the way you entered it.
Figure 15.8 demonstrates these text formatting tags with the Mosaic Web browser. Figure 15.9 shows the same HTML document with the Netscape Navigator browser. Listing 15.5 shows the source for it.
Figure 15.8.
Text formatting tags viewed through Mosaic.
Mosaic does not support the <br> tag and could not fit the first paragraph entirely on the first line. Netscape Navigator handled these properly.
Figure 15.9.
Text formatting tags viewed through Netscape Navigator.
Listing 15.5. Source for text formatting tags.
<html>
<head>
<title>Text Formatting</title>
</head>
<body>
This is normal text that was typed in on two lines. It will show
as one line if the window is wide enough
<p>
This paragraph breaks right here<br>
and then continues on the next
line.
<p>
A horizontal rule (line) appears below this line
<hr>
and above this one.
<address>
this is an address field that is its own paragraph<br>
that takes 2 lines
</address>
<blockquote>
This text is treated as a block quote and is usually
indented
</blockquote>
<pre>
   This text is preformatted.
   I've put 3 spaces before each of 2 lines.
</pre>
</body>
</html>
The heading and paragraph tags were extended as part of HTML version 3. In the new version, the text can be aligned to the left (default), center, or right. Netscape also supports the <center> tag to center text.
Figure 15.10 demonstrates the extended text formatting tags using the Mosaic Web browser. The Netscape Navigator browser behaves the same way and is not shown. Listing 15.6 shows the source for it.
Figure 15.10.
Extended text formatting tags viewed through Mosaic.
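Listing 15.6. Source for extended text formatting tags.
<html>
<head>
<title>Extended Text Formatting</title>
</head>
<body>
<h1 align=left>This heading is left aligned</h1>
<h1 align=center>This heading is centered</h1>
<h1 align=right>This heading is right aligned</h1>
<p align=left>
This text is left aligned<br>
even on a second line
</p>
<p align=center>
This text is centered<br>
even on a second line
</p>
<p align=right>
This text is right aligned<br>
even on a second line
</p>
<center>
This text is centered<br>
on multiple lines using the<br>
Netscape extensions
</center>
</body>
</html>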
Lists
HTML supports the following five different types of lists:
• Unordered
• Ordered
• Directory
• Menu
• Glossary
With the exception of glossary (or definition) lists, each element within the list is specified by the <li> tag (list item).
Unordered lists are specified using the <ul> tag and appear with bullets. At the end of the list, the </ul> tag is used. If another