Website communications and definitions
Web pages, such as the one you are now reading, "live" on
web servers all over the
Internet. Those pages are written in
HTML (HyperText Markup Language), a simple language that allows us to create
hypertext links from one page to another. Designers put the content of the page (the text of the page) in the HTML file, and usually determine what the page will look like with a separate "style sheet" file. Style sheets are written in a language called
CSS (Cascading Style Sheets).
Web browsers, such as the program you are using right now on your computer to see this web page, speak to
web servers in a language (a "protocol") called
HTTP (HyperText Transfer Protocol).
But before the web browser can talk to the
web server, it needs to know the web server's
IP address on the
Internet - just as you have to know your friend's phone number before you can call him. So how do web browsers translate a friendly name like
www.google.com into an IP address? By talking to a
DNS (Domain Name System) server.
Once the web browser knows the
IP address of the server, it can make an
HTTP protocol connection and ask for the page you want to see.
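As a rough illustration of the two steps just described, the Python sketch below splits a web address into the part that must be looked up through DNS (the host name) and the part that is later requested over HTTP (the path). The function name split_for_browser is an illustrative assumption, and the sketch assumes a plain http address.

```python
from urllib.parse import urlsplit

def split_for_browser(url):
    """Return (host, port, path) roughly as a simple browser might see a URL."""
    parts = urlsplit(url)
    host = parts.hostname        # the name handed to DNS for lookup
    port = parts.port or 80      # plain HTTP uses port 80 unless told otherwise
    path = parts.path or "/"     # an empty path means the site's home page
    return host, port, path

if __name__ == "__main__":
    print(split_for_browser("http://www.google.com/"))
```

The host goes to the DNS server first; only after an IP address comes back are the port and path used for the HTTP request.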
Web page
Every website is made up of one or more web pages -- like the one you are looking at right now! This text is part of a web page, and is written in the
HyperText Markup Language (HTML). In addition to text with hyperlinks, tables, and other formatting, web pages can also contain images. Less commonly, web pages may contain Flash animations, Java applets, or MPEG video files.
Web server
Web servers are the computers that actually run websites. The term "web server" also refers to the piece of software that runs on those computers, accepting
HTTP connections from
web browsers and delivering web pages and other files to them, as well as processing form submissions. The most common web server software is
Apache, followed by
Microsoft Internet Information Server. Many, many other web server programs also exist.
The Internet
"The Internet" refers to the worldwide network of interconnected computers, all of which use a common protocol known as TCP/IP to communicate with each other. Every publicly accessible website is hosted by a web server computer, which is a part of the Internet. Every personal computer, cell phone or other device that people use to look at websites is also a part of the Internet. The Internet also makes possible email, games and other applications unrelated to the World Wide Web.
XHTML
Although all modern word processors and many specialized tools can be used to make web pages without learning XHTML at all, learning XHTML itself is a useful way to learn more about the web and provides more control over the results. Luckily, XHTML is very simple and quite easy to learn.
What's this XHTML stuff? What happened to HTML? XHTML is the latest generation of HTML. HTML was originally intended to be an instance of SGML, a general-purpose markup language. But many HTML pages do not comply with the requirements of SGML, which makes HTML tougher for computers to work with in useful ways.
In more recent years, the
World Wide Web Consortium has taken steps to correct the problem. SGML has been largely replaced by XML (Extensible Markup Language), a new general-purpose markup language that is easier to work with than SGML. And XHTML, which replaces HTML, is a newer standard which complies fully with the requirements of XML but remains compatible with older web browsers.
A Simple Example
Here is a simple example of a valid XHTML document. To try this out for yourself, simply create a new file called mypage.html with any text editor, such as Windows Notepad. Paste in the HTML below, make any changes that please you, and save the document. Then pick "open" from the File menu of your web browser, locate the file you have just made, and open it. If you make further changes, you will need to "save" again and then click "reload" or "refresh" in your browser to see the results.
Of course, this is just a simple example. XHTML can do far, far more than this. A complete tutorial can be found at
Dave's HTML Guide.
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<title>Title of My Page Goes Here</title>
</head>
<body>
<h1>Heading Of My Page Goes Here</h1>
<p><a href="http://news.google.com/">Follow this link to Google News</a>
</p>
<p>Here is a picture of my cat:</p>
<p><img src="cat.jpg" alt="Photograph of my cat"/></p>
</body>
</html>
The DOCTYPE tells the web browser what version of XHTML we're using. In this case I've specified XHTML 1.0 Strict, because this code is 100% compliant with the rules of XHTML. You don't need to understand this line in detail - just know that you should include it if you plan to write standards-compliant web pages. And you should.
Those who must use HTML elements that aren't included in strict XHTML can use the "transitional DTD" (Document Type Definition) instead:
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
Understanding XHTML:
The XHTML elements in the page above are nearly self-explanatory. All elements that describe the page but are not actually part of the content appear inside the head element. All of the elements that actually make up the visible page itself are part of the body element. Everything between the opening <head> "tag" and the closing </head> "tag" is considered a part of the head element. The same goes for body. And everything should be contained within a single html element.
The text between
<h1> and
</h1> is displayed as a "level one heading," which is typically a very large, bold font.
The
p element encloses a paragraph. In strict XHTML, most elements such as images and links must be enclosed in a paragraph or another "block-level" element.
The text between the opening and closing
<a> and
</a> "tags" becomes a link to another web page; the URL of the web page to be linked to is found in the href attribute of the
<a> element as shown in the example above.
The
<img> element includes an image in the page; the image is displayed at that point in the page, as long as the image file specified by the URL in the
src attribute actually exists. Since the
src attribute I used here contains a simple filename, the cat picture will be shown as long as the file cat.jpg is in the same directory as the page. The same trick can be used in href attributes in
<a> elements, to conveniently link to pages in the same directory.
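To make the "same directory" rule concrete, here is a small Python sketch of how a simple filename in a src or href attribute is combined with the address of the page that contains it. The page address www.example.com/pets/mypage.html is a made-up assumption; only cat.jpg comes from the example above.

```python
from urllib.parse import urljoin

# Hypothetical address of the page that contains the <img> element.
PAGE_URL = "http://www.example.com/pets/mypage.html"

def resolve(reference, base=PAGE_URL):
    """Resolve an href or src value against the address of the current page."""
    return urljoin(base, reference)

if __name__ == "__main__":
    print(resolve("cat.jpg"))                  # a file next to the page
    print(resolve("http://news.google.com/"))  # a full URL is left unchanged
```

A bare filename replaces only the last part of the page's address, which is why cat.jpg must sit in the same directory as the page.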
The
alt attribute of the
img element contains text to be presented in place of the image to users who cannot see it, such as blind users relying on a screen reader. XHTML requires it, and since this text is also read by search engines like Google, it's important to include it - Google probably won't know your page is about cats if there is no text about cats on the page!
The "alt text" should describe the image in a useful way for those (including both computers and people) who cannot otherwise see it.
The
<img> element has a / before the > to signify that it is not a container and that no closing
</img> is expected.
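Because XHTML follows the rules of XML, an ordinary XML parser can read a page like the one above. The Python sketch below is one way to see that, using a trimmed copy of the example page (the DOCTYPE line is left out to keep the sketch short). The function name summarize is an assumption made for illustration.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the example page, without the DOCTYPE line.
PAGE = """<html>
<head>
<title>Title of My Page Goes Here</title>
</head>
<body>
<h1>Heading Of My Page Goes Here</h1>
<p><a href="http://news.google.com/">Follow this link to Google News</a>
</p>
<p>Here is a picture of my cat:</p>
<p><img src="cat.jpg" alt="Photograph of my cat"/></p>
</body>
</html>"""

def summarize(page_source):
    """Parse the page as XML and pull out the pieces discussed above."""
    root = ET.fromstring(page_source)  # raises ET.ParseError if not well-formed
    return {
        "title": root.findtext("head/title"),
        "heading": root.findtext("body/h1"),
        "links": [a.get("href") for a in root.iter("a")],
        "images": [(img.get("src"), img.get("alt")) for img in root.iter("img")],
    }

if __name__ == "__main__":
    print(summarize(PAGE))
```

If the / before the > in the img element is removed, the parser rejects the page, which shows why that slash matters in XHTML.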
Hypertext
Hypertext is text that contains
hyperlinks. The
HTML documents we see on the
World Wide Web are the best-known example of a hypertext system, but it is not the only one. Hypertext doesn't necessarily have to include links to documents in other places; a simple hypertext system can link to places within a document or system.
Web browser
When you sit down and look at web pages, you are using a web browser. This is the piece of software that communicates with
web servers for you via the
HTTP protocol, translates
HTML pages and image data into a nicely formatted on-screen display, and presents this information to your eyeballs -- or to your other senses, in the case of browsers for the vision-impaired and other alternative interface technologies. Web browsers also appear in simpler devices such as Internet-connected cell phones, like many Nokia models, and PDAs (Personal Digital Assistants) such as the Palm Pilot.
The most common web browser is
Microsoft Internet Explorer, followed by
Google Chrome and
Firefox browser. Apple's
Safari browser is now the standard on Macs, and the
Opera shareware browser has a loyal following. The
Lynx browser is the most frequently used text-only browser and has been adapted to serve the needs of the vision-impaired.
HTTP
In order to fetch a
web page for you, your
web browser must "talk" to a
web server somewhere else. When web browsers talk to web servers, they speak a language known as HTTP, which stands for HyperText Transfer Protocol. This language is actually very simple and understandable and is not difficult for the human eye to follow.
A Simple HTTP Example
The browser says:
GET / HTTP/1.0
Host: www.boutell.com

And the server replies:
HTTP/1.0 200 OK
Content-Type: text/html

<head>
<title>Welcome to Boutell.Com, Inc.!</title>
</head>
<body>
The rest of Boutell.Com's home page appears here
</body>
The first line of the browser's request, GET / HTTP/1.0, indicates that the browser wants to see the home page of the site, and that the browser is using version 1.0 of the HTTP protocol. The second line, Host: www.boutell.com, indicates the website that the browser is asking for. This is required because many websites may share the same IP address on the Internet and be hosted by a single computer. The Host: line was added a few years after the original release of HTTP 1.0 in order to accommodate this.
The first line of the server's reply, HTTP/1.0 200 OK, indicates that the server is also speaking version 1.0 of the HTTP protocol, and that the request was successful. If the page the browser asked for did not exist, the response would read HTTP/1.0 404 Not Found. The second line of the server's reply, Content-Type: text/html, tells the browser that the object it is about to receive is a web page. This is how the browser knows what to do with the response from the server. If this line were Content-Type: image/png, the browser would know to expect a PNG image file rather than a web page, and would display it accordingly.
A modern web browser would say a bit more using the HTTP 1.1 protocol, and a modern web server would respond with a bit more information, but the differences are not dramatic and the above transaction is still perfectly valid; if a browser made a request exactly like the one above today, it would still be accepted by any web server, and the response above would still be accepted by any browser. This simplicity is typical of most of the protocols that grew up around the Internet.
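The exchange above is simple enough to build and take apart with a few lines of code. The following Python sketch assembles the request text and splits a reply into status, headers and body. The function names build_request and parse_response are illustrative assumptions, and the sketch assumes lines end with a carriage return and line feed, as HTTP specifies.

```python
def build_request(host, path="/"):
    """Build the two-line HTTP/1.0 request shown above, ending with a blank line."""
    lines = [f"GET {path} HTTP/1.0", f"Host: {host}", "", ""]
    return "\r\n".join(lines).encode("ascii")

def parse_response(raw):
    """Split a raw reply into (status code, reason, headers, body)."""
    head, _, body = raw.partition(b"\r\n\r\n")   # a blank line ends the headers
    status_line, *header_lines = head.decode("iso-8859-1").split("\r\n")
    _version, code, reason = status_line.split(" ", 2)
    headers = {}
    for line in header_lines:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return int(code), reason, headers, body

if __name__ == "__main__":
    reply = b"HTTP/1.0 200 OK\r\nContent-Type: text/html\r\n\r\n<head></head>"
    print(build_request("www.boutell.com"))
    print(parse_response(reply))
```

A browser would look at the content-type entry of the parsed headers to decide whether to render the body as a web page or display it as an image.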
Human Beings Can Speak HTTP
In fact, you can try being a web browser yourself, if you are a patient typist. If you are using Windows, click the Start menu, select "Run," and type "telnet www.mywebsitename.com 80" in the dialog that appears. Then click OK. Users of other operating systems can do the same thing; just start your own telnet program and connect to your website as the host and 80 as the port number. When the connection is made, type:
GET / HTTP/1.0
Host: www.mywebsitename.com
Make sure you press ENTER twice after the Host: line to end your HTTP headers. Your telnet program probably will not show you what you are typing, but after you press ENTER the second time, you should receive your website's home page in HTML after a short pause. Congratulations, you have carried out your very own simple HTTP transaction.
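If no telnet program is at hand, much the same experiment can be sketched in Python with a plain network socket. The function name fetch_home_page and its connect parameter are assumptions made for illustration (the parameter exists only so the network connection can be swapped out), and www.mywebsitename.com remains a placeholder for your own site.

```python
import socket

def fetch_home_page(host, port=80, connect=socket.create_connection):
    """Send the hand-typed request from above and return the raw reply bytes."""
    # Two line endings in a row play the role of pressing ENTER twice.
    request = f"GET / HTTP/1.0\r\nHost: {host}\r\n\r\n".encode("ascii")
    sock = connect((host, port))
    try:
        sock.sendall(request)
        chunks = []
        while True:
            chunk = sock.recv(4096)
            if not chunk:          # an HTTP/1.0 server closes the connection when done
                break
            chunks.append(chunk)
    finally:
        sock.close()
    return b"".join(chunks)

if __name__ == "__main__":
    # Placeholder host name; substitute the name of a real website.
    print(fetch_home_page("www.mywebsitename.com").decode("iso-8859-1"))
```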
IP address
An IP address (Internet Protocol address) is a unique identifier that distinguishes one device from any other on a TCP/IP-based computer network, such as the
Internet. The IP address provides enough information to route data to that specific computer from any other computer on the network. In the case of the Internet, this enables you to communicate with
web servers, instant messaging servers and other computers all over the world.
IP addresses are usually not entered directly by end users. Instead,
DNS servers are used to map permanent and user-friendly names like boutell.com to unfriendly and impermanent IP addresses, such as 64.246.52.10.
An IP address of the most familiar kind (IPv4) is made up of four numbers, each between 0 and 255. For instance, as of this writing, the IP address of boutell.com is:
64.246.52.10
The most general information is conveyed by the first number, and the specific identification of a single computer within a single network is usually made by the last number. In general, delegation of responsibility for various portions of the IP address space is carried out by the Asia Pacific Network Information Centre (
APNIC), the American Registry for Internet Numbers (
ARIN), the Latin-American And Caribbean Internet Addresses Registry (
LACNIC), and the RIPE Network Coordination Centre (
RIPE NCC).
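The "four numbers, each between 0 and 255" rule can be checked mechanically. The Python sketch below splits a dotted address into its four numbers and packs them into the single 32-bit value that routers work with, the first number being the most significant. The function names parse_ipv4 and to_integer are illustrative assumptions.

```python
def parse_ipv4(text):
    """Split a dotted address into its four numbers, checking each is 0-255."""
    parts = text.split(".")
    if len(parts) != 4:
        raise ValueError(f"expected four numbers, got {len(parts)}")
    numbers = []
    for part in parts:
        if not part.isdigit():
            raise ValueError(f"not a number: {part!r}")
        value = int(part)
        if value > 255:
            raise ValueError(f"out of range: {value}")
        numbers.append(value)
    return tuple(numbers)

def to_integer(numbers):
    """Pack the four numbers into one 32-bit value, first number most significant."""
    result = 0
    for value in numbers:
        result = (result << 8) | value
    return result

if __name__ == "__main__":
    numbers = parse_ipv4("64.246.52.10")
    print(numbers, to_integer(numbers))
```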
DNS Server
Every time you follow a link or type in the name of a website, such as www.boutell.com, that name must be translated into an
IP address on the
Internet. This translation is done by the domain name system. A DNS server is a program that participates in the task of providing this service. Some DNS servers respond to queries from
web browsers and other programs, make further inquiries, and return IP addresses, such as 208.27.35.236, which is the current IP address of www.boutell.com. Other DNS servers have primary responsibility for answering DNS inquiries about names within a particular domain, such as the boutell.com domain. Every time a new domain is registered, a DNS server must be configured to give out address information for that domain, so that users can actually find websites in that domain. In most cases, web hosting companies provide this service for the domains that they host; it is rare for webmasters to run their own DNS servers.
How DNS Usually Works
Let's say you want to visit www.google.com. Either your computer hasn't looked up www.google.com since it was turned on, or it has kept that information long enough that it considers it appropriate to check again. So your computer asks the DNS server of your ISP (Internet Service Provider, the company that sells you an Internet connection, such as BT or Virgin).
The DNS server of your ISP first talks to one of thirteen "root" DNS servers. The root DNS servers answer questions at the highest level possible: the top-level domain. For instance, "who is in charge of DNS for the com domain?"
In practice, your ISP's DNS server caches (remembers) this information for a significant period of time, and does not constantly harass the root servers just in case responsibility for com has changed in the last five seconds. Similarly, your ISP's DNS server remembers other information for appropriate lengths of time as well to avoid extra queries. But let's assume, just for fun, that no one has ever asked your ISP for the IP address of www.google.com before!
Now your ISP's DNS server knows which DNS servers are responsible for the com top-level domain. So your ISP's DNS server reaches out and contacts one of those servers and asks the next question: who is responsible for DNS in the google.com domain?
The response will list two or more DNS servers that have authority over the google.com domain.
Finally, your ISP's DNS server contacts one of those DNS servers and asks for the address of www.google.com, and hands the response back to your computer.
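The chain of questions above can be imitated with a toy model. In the Python sketch below, three pretend servers either refer the asker onward or give a final answer. Every server name and the 192.0.2.10 address are invented for illustration, and no real network traffic is involved.

```python
# Toy data: each pretend server maps a question to a referral or an answer.
# All server names and the 192.0.2.10 address are invented for illustration.
SERVERS = {
    "root": {"com": ("referral", "com-server")},
    "com-server": {"google.com": ("referral", "google-server")},
    "google-server": {"www.google.com": ("answer", "192.0.2.10")},
}

def lookup(name, start="root"):
    """Follow referrals from the root until some server gives an answer."""
    labels = name.split(".")
    server = start
    trail = [server]
    # Ask about ever-longer endings: "com", "google.com", "www.google.com".
    for i in range(len(labels) - 1, -1, -1):
        question = ".".join(labels[i:])
        kind, value = SERVERS[server][question]
        if kind == "answer":
            return value, trail
        server = value            # a referral names the next server to ask
        trail.append(server)
    raise LookupError(f"no answer for {name}")

if __name__ == "__main__":
    print(lookup("www.google.com"))
```

The returned trail lists the servers consulted in order, matching the root, com, and google.com steps described above.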
As mentioned above, in real life your ISP's DNS server will remember all of this information. That means that a typical user will get an immediate response when asking for the address of a frequently-visited site like Google.
But how long is it safe to remember that information? After all, the IP addresses of servers do change, though usually not often. Fortunately, your ISP's DNS server doesn't have to guess! The DNS records that come back from the "upstream" DNS servers include a "time to live" (TTL) field that indicates how long the information can be kept before the authoritative server should be asked again.
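A cache that honors the time limit carried by each record (usually called its time to live, or TTL) might be sketched as follows. The class name DnsCache, its clock parameter, and the 192.0.2.10 address are assumptions made for illustration. A real DNS server keeps far more information per record.

```python
import time

class DnsCache:
    """Remember name-to-address answers until their time limit runs out."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock      # replaceable so the passage of time can be simulated
        self._entries = {}       # name -> (address, time at which the entry goes stale)

    def store(self, name, address, ttl_seconds):
        self._entries[name] = (address, self._clock() + ttl_seconds)

    def lookup(self, name):
        """Return the remembered address, or None if it must be asked for again."""
        entry = self._entries.get(name)
        if entry is None:
            return None
        address, stale_at = entry
        if self._clock() >= stale_at:
            del self._entries[name]   # too old: forget it and ask upstream again
            return None
        return address

if __name__ == "__main__":
    cache = DnsCache()
    cache.store("www.google.com", "192.0.2.10", ttl_seconds=300)
    print(cache.lookup("www.google.com"))
```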