Lab #14
Introduction to XML
14.1 What is XML?
XML stands for Extensible Markup Language and it is used to create structured
data files which can be easily created or read by a computer program. XML files
are quite often used to exchange data between two computers (especially two programs)
where the format of the data (its appearance) is not as important as its structure
(how the data is represented).
XML looks quite a bit like HTML. Take, for example, the following fragment of
an XML file used in a program I am working on:
sample.xml
<?xml version="1.0"?>
<Level>
  <Category name="Expressions">
    <Description>Alterations to variables</Description>
    <LanguageItem name="prefix">
      <Description>Before the variable</Description>
    </LanguageItem>
    <LanguageItem name="postfix">
      <Description>After the variable</Description>
    </LanguageItem>
    <LanguageItem name="typeCast">
      <Description>Cast variable to a type</Description>
    </LanguageItem>
  </Category>
  <Category name="Multiplicative">
    <Description>Non-Arithmetic operations</Description>
    <LanguageItem name="multiplication">
      <Description>Multiplication of two variables</Description>
    </LanguageItem>
    <LanguageItem name="division">
      <Description>Division of two variables</Description>
    </LanguageItem>
    <LanguageItem name="modulus">
      <Description>Remainder of division operation</Description>
    </LanguageItem>
  </Category>
  .... more follows but was edited for brevity...
The file starts with the following line:
<?xml version="1.0"?>
Here, we declare that the file is an XML file, and that the version of XML we
used to create the file is 1.0. (An XML 1.1 recommendation also exists, but
version 1.0 remains the one in near-universal use.) Following the version
line, the remainder of the file contains the data we wish to exchange in a highly
structured format.
In the example above, we show a sample XML file which is created in a human readable
(text) format, with tags (Level, Category, LanguageItem,
and Description). The structure of the document dictates that a
Level tag can contain one or more Category tags; and that Category
tags can contain a Description and one or more LanguageItem tags.
Like HTML, tags may contain attributes which are associated with values. For
example, the last LanguageItem tag shown above has a name attribute
value of modulus. The names of the tags and the attributes can be entirely
up to the programmer creating the file. That is, we chose the names of our tags
when we designed the file you see above. However, in doing so we needed to ensure
that the program reading in the XML file would not only understand the names of
the tags and their possible attributes, but also the structure (rules which define
the sub-contents or sub-elements of each tag) of the file.
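As a rough illustration of a program that "understands" these tag and attribute names, the following Python sketch reads a shortened copy of the sample with the standard library's xml.etree.ElementTree module. The function name summarize and the shortened SAMPLE text (with the closing Level tag added) are assumptions made for this example, not part of the original project.

```python
import xml.etree.ElementTree as ET

# A shortened copy of the sample file, with the closing Level tag added.
SAMPLE = """<?xml version="1.0"?>
<Level>
  <Category name="Expressions">
    <Description>Alterations to variables</Description>
    <LanguageItem name="prefix">
      <Description>Before the variable</Description>
    </LanguageItem>
    <LanguageItem name="postfix">
      <Description>After the variable</Description>
    </LanguageItem>
  </Category>
</Level>"""


def summarize(xml_text):
    """Return (category name, [language item names]) pairs."""
    root = ET.fromstring(xml_text)
    summary = []
    for category in root.findall("Category"):
        # The name attribute is read with get(); sub-elements with findall().
        items = [item.get("name") for item in category.findall("LanguageItem")]
        summary.append((category.get("name"), items))
    return summary


print(summarize(SAMPLE))
```

The reader only works because it was written with the same tag names (Category, LanguageItem) and attribute name (name) that the file uses.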
XML files are really not meant for human consumption. Programs, not humans, are
expected to read XML files, so XML enforces its rules far more strictly than
HTML does. If you have worked with HTML,
you may have forgotten (on occasion) to insert a closing tag (e.g. </p>).
Many browsers will overlook these human errors and carry on despite
the problem, generally guessing at the location of
the missing closing tag (based on other nearby tags).
XML, however, strictly enforces a closing tag for each opening tag. As such, you
will notice that each LanguageItem opening tag has a corresponding closing
tag, as does each Description tag. This is a deliberate design decision by the
creators of XML: it removes the need for a program
to guess where a tag may end. If the ending tag is missing, the
file is not well-formed and cannot be processed.
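The strictness described here can be observed with a few lines of Python. This is a minimal sketch using the standard library parser; the helper name is_well_formed and the two test strings are invented for the illustration.

```python
import xml.etree.ElementTree as ET


def is_well_formed(xml_text):
    """Return True if the parser accepts the text, False otherwise."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        # Raised, for example, when a closing tag is missing or mismatched.
        return False


complete = ("<LanguageItem name='prefix'>"
            "<Description>Before the variable</Description>"
            "</LanguageItem>")
# Same text, but the closing Description tag has been left out.
missing_close = ("<LanguageItem name='prefix'>"
                 "<Description>Before the variable"
                 "</LanguageItem>")

print(is_well_formed(complete))
print(is_well_formed(missing_close))
```

Unlike a forgiving HTML browser, the parser makes no attempt to guess where the Description element was meant to end; it simply rejects the second string.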
XML has led to the development of the next language for hypertext documents on
the World Wide Web, XHTML. XHTML (which stands for the Extensible HyperText Markup
Language) merges HTML and XML to create what is anticipated to be the successor
to HTML.
Lab Activity #1:
- Locate the XHTML recommendation by the World Wide Web Consortium.
What is the URL of the recommendation document? ____________________
What is the latest version of the document? _______________________
- Find an XML tutorial on the web.
What is the URL of the tutorial that you located?
________________________
- Type in the sample file above and add the closing Level tag
to the end of the file. Save the file to your computer and attempt
to load the file using your web browser.
Describe what happens when you load the file. ___________________
Which web browser did you use to load the file? _______________
14.2 Examining XML Structure Definitions
The structure of the data contained in an XML document is defined by another
document known as a DTD (Document Type Definition). The DTD file (generally
named with a .dtd extension) provides the supporting information so that the
corresponding XML file can be checked for validity. The DTD file is not required
by the XML file; however, its presence is strongly encouraged. Let us take
a closer look at XML by creating a sample file around the Pete's Pet Store
example used in earlier labs.
Assume that we wish to export a listing of all of the products in our store
to another application. Ideally, we want to provide the data about each product,
but not include any formatting information, since we don't know how that application
might wish to display our information. This is actually a common use for XML:
exporting data from one site to another.
As an example, take a look at the popular technology web site http://www.slashdot.org.
Slashdot (as it is more commonly known) creates a variety of news stories
based on categories and headlines submitted by their users. The article listing
from Slashdot's main page is available as an XML formatted file (partially
shown via the link below) by downloading http://slashdot.org/slashdot.xml.
Please click here to see an example of slashdot XML.
As you can see, the XML file starts with the top level backslash tag.
This tag can contain one or more story tags, and so on. The structure
of the XML file is defined in another file named backslash.dtd. This
DTD file is located at the URL specified in the second line of the XML file
and is also available over the World Wide Web. backslash.dtd is shown
in the following table:
backslash.dtd
<!DOCTYPE backslash [
  <!ELEMENT backslash (story*)>
  <!ELEMENT story (title, url, time, author, department, topic, comments, section, image)>
  <!ELEMENT title (#PCDATA)>
  <!ELEMENT url (#PCDATA)>
  <!ELEMENT time (#PCDATA)>
  <!ELEMENT author (#PCDATA)>
  <!ELEMENT department (#PCDATA)>
  <!ELEMENT topic (#PCDATA)>
  <!ELEMENT comments (#PCDATA)>
  <!ELEMENT section (#PCDATA)>
  <!ELEMENT image (#PCDATA)>
]>
The DTD file tells us the following about our XML document:
- The DTD describes the structure of the backslash node (element)
and those elements beneath it (its sub-elements).
- The backslash element can contain zero or more story elements
(defined by using the story* notation described later)
- Any story element must contain the following sub elements
- title,
- url,
- time,
- author,
- department,
- topic,
- comments,
- section, and
- image.
- None of the sub-elements of story contains sub-elements of its own; each is
of the #PCDATA type. #PCDATA stands for parsed character
data (in other words, text data). Elements can also be of the EMPTY
type (note that there is no # symbol), signifying that the element has no
content, but may contain attributes (which we will discuss later).
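The rules in the list above can be approximated in code. The sketch below is a hand-written check in Python, not real DTD validation (the standard library parser used here does not validate against a DTD). The helper names and the "placeholder" text values are invented for the example; only the element names come from backslash.dtd.

```python
import xml.etree.ElementTree as ET

# Child order required by <!ELEMENT story (...)> in backslash.dtd.
STORY_CHILDREN = ["title", "url", "time", "author", "department",
                  "topic", "comments", "section", "image"]


def story_matches_dtd(story):
    """True if story has exactly the required children, in order,
    and none of those children contains nested elements."""
    tags = [child.tag for child in story]
    text_only = all(len(child) == 0 for child in story)
    return tags == STORY_CHILDREN and text_only


def build_sample(children):
    """Build a backslash document with one story (placeholder text only)."""
    body = "".join(f"<{tag}>placeholder</{tag}>" for tag in children)
    return ET.fromstring(f"<backslash><story>{body}</story></backslash>")


good = build_sample(STORY_CHILDREN)
bad = build_sample(STORY_CHILDREN[:-1])  # the image element is left out

print(story_matches_dtd(good.find("story")))
print(story_matches_dtd(bad.find("story")))
```

Because the story definition lists its sub-elements separated by commas with no expression characters, every one of them must appear exactly once and in the stated order, which is why a simple list comparison is enough here.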
Let's return to our pet store example. If we wanted to create an XML file which
will become a listing of each item our store carries, we would need to create
a DTD structure that can accommodate such a file. In designing the DTD, it should
be clear that we would need to place the following information in the XML file:
- item name,
- item product number,
- item description, and
- item price
For now, we will create the DTD from the first two items in the list (you'll be
finishing the DTD in Lab Activity #2). Thus, it's likely we would want to create
a DTD that organized the data in the following format:
<!DOCTYPE store [
  <!ELEMENT store (products*)>
  <!ELEMENT products (itemName, itemProductNumber)>
  <!ELEMENT itemName (#PCDATA)>
  <!ELEMENT itemProductNumber (#PCDATA)>
]>
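To see what data shaped by this DTD might look like, here is a small Python sketch that builds a store document with the two fields defined so far. The function name build_store and the product values are placeholders invented for this example; they are not the actual pet store products.

```python
import xml.etree.ElementTree as ET


def build_store(products):
    """Build a <store> tree from (item name, product number) pairs."""
    store = ET.Element("store")
    for name, number in products:
        # One <products> element per item, with children in DTD order.
        product = ET.SubElement(store, "products")
        ET.SubElement(product, "itemName").text = name
        ET.SubElement(product, "itemProductNumber").text = number
    return store


# Placeholder data, not taken from the pet store labs.
store = build_store([("Example item", "0001")])
print(ET.tostring(store, encoding="unicode"))
```

Note that the output carries no formatting information at all, only the data and its structure, which is the point of exporting in XML.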
Lab Activity #2:
- Modify our example DTD file (store.dtd) to capture the following
data fields for each product in the store. Note that we added
a few fields not previously seen in our pet store example.
- item description
- item price (in US Dollars)
- quantity on hand
- supplier name
- restock time (number of days to resupply)
- Next, create the corresponding product file (product.xml)
which refers to the store.dtd file you just created. Include the
three products we currently offer and add an additional three products
of your choosing.
14.3 Moving the DTD file into the XML file
On some occasions it may be preferable to bundle the XML file and the DTD file
together, so that everything is contained in one file. In the examples in Section
14.2 we kept them separate. However, with a little more effort we can bundle our files
together. First, let's start with our XML file. We will use the example from
my recent project. Here we have the start of the XML file (tableStructure.xml).
tableStructure.xml
<?xml version="1.0"?>
<!-- File Name: tableStructure.xml -->
<Level>
  <Category name="Expressions">
    <Description>Alterations to variables</Description>
    <LanguageItem name="prefix">
      <Description>Before the variable</Description>
    </LanguageItem>
    <LanguageItem name="postfix">
      <Description>After the variable</Description>
    </LanguageItem>
    <LanguageItem name="typeCast">
      <Description>Cast variable to a type</Description>
    </LanguageItem>
  </Category>
  <Category name="Multiplicative">
    <Description>Non-Arithmetic operations</Description>
    <LanguageItem name="multiplication">
      <Description>Multiplication of two variables</Description>
    </LanguageItem>
    <LanguageItem name="division">
      <Description>Division of two variables</Description>
    </LanguageItem>
    <LanguageItem name="modulus">
      <Description>Remainder of division operation</Description>
    </LanguageItem>
  </Category>
(file truncated for brevity)
In the example above, we have also included a comment line. Comments in XML (as
in HTML) start with the characters <!-- and terminate with the characters
-->.
Next, we want to include the DTD in the .xml file so that the .xml file contains
both the data (what's already there) and the definition of the structure of the
document. To do this, we take the contents of our DTD file and embed the DTD
as shown in the next table.
tableStructure.xml (revised)
<?xml version="1.0"?>
<!-- File Name: tableStructure.xml -->
<!DOCTYPE Level
[
<!ELEMENT Level (Category*, Description?)>
<!ELEMENT Category (Description?, LanguageItem+)>
<!ELEMENT LanguageItem (Description?, SubItem*)>
<!ELEMENT SubItem (Description?)>
<!ELEMENT Description (#PCDATA)>
<!-- Higher levels must have names -->
<!ATTLIST Category name CDATA #REQUIRED>
<!ATTLIST LanguageItem name CDATA #REQUIRED>
<!ATTLIST SubItem name CDATA #REQUIRED>
]
>
<Level>
  <Category name="Expressions">
    <Description>Alterations to variables</Description>
    <LanguageItem name="prefix">
      <Description>Before the variable</Description>
    </LanguageItem>
    <LanguageItem name="postfix">
      <Description>After the variable</Description>
    </LanguageItem>
    <LanguageItem name="typeCast">
      <Description>Cast variable to a type</Description>
    </LanguageItem>
  </Category>
  <Category name="Multiplicative">
    <Description>Non-Arithmetic operations</Description>
    <LanguageItem name="multiplication">
      <Description>Multiplication of two variables</Description>
    </LanguageItem>
    <LanguageItem name="division">
      <Description>Division of two variables</Description>
    </LanguageItem>
    <LanguageItem name="modulus">
      <Description>Remainder of division operation</Description>
    </LanguageItem>
  </Category>
(file truncated for brevity)
Note that we have placed the DTD after the initial <?xml version="1.0"?>
statement, and prior to the root tag (Level). It is also likely that you
have noticed we added a few other features to our DTD file. We'll discuss several
of them now, and complete the discussion in the following section.
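A program reading the bundled file still sees the same data. The Python sketch below parses a shortened version of the revised file (one Category only, with the closing Level tag added; both are assumptions made so the example is complete). Be aware that the standard library parser used here accepts the embedded DTD but does not validate the data against it.

```python
import xml.etree.ElementTree as ET

# Shortened copy of the revised file: the DTD sits between the
# XML declaration and the root tag.
DOCUMENT = """<?xml version="1.0"?>
<!DOCTYPE Level
[
<!ELEMENT Level (Category*, Description?)>
<!ELEMENT Category (Description?, LanguageItem+)>
<!ELEMENT LanguageItem (Description?, SubItem*)>
<!ELEMENT SubItem (Description?)>
<!ELEMENT Description (#PCDATA)>
<!ATTLIST Category name CDATA #REQUIRED>
<!ATTLIST LanguageItem name CDATA #REQUIRED>
<!ATTLIST SubItem name CDATA #REQUIRED>
]
>
<Level>
<Category name="Expressions">
<Description>Alterations to variables</Description>
<LanguageItem name="prefix">
<Description>Before the variable</Description>
</LanguageItem>
</Category>
</Level>"""

root = ET.fromstring(DOCUMENT)
category_names = [c.get("name") for c in root.iter("Category")]

print(root.tag)
print(category_names)
```

The root element found by the parser is Level, matching the name given after <!DOCTYPE.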
Our DTD section appears as:
<!DOCTYPE Level
[
<!ELEMENT Level (Category*, Description?)>
<!ELEMENT Category (Description?, LanguageItem+)>
<!ELEMENT LanguageItem (Description?, SubItem*)>
<!ELEMENT SubItem (Description?)>
<!ELEMENT Description (#PCDATA)>
<!-- Higher levels must have names -->
<!ATTLIST Category name CDATA #REQUIRED>
<!ATTLIST LanguageItem name CDATA #REQUIRED>
<!ATTLIST SubItem name CDATA #REQUIRED>
]
>
The DTD defines five elements (Level, Category, LanguageItem,
SubItem, and Description). Of these five, only the Description
node does not contain another element (it only contains a textual value). Following
the definition of the elements, attribute definitions are also provided. We will
discuss attribute creation and use in the DTD in the next section of this lab.
However, there is one final point to make with regard to the definition of each
element, and that is the new notation we used: the "*", "?", and "+" characters.
These characters are borrowed from what computer scientists call regular
expressions. In a nutshell,
they dictate the number of sub-elements that may appear.
For example, the definition of the Level element (<!ELEMENT
Level (Category*, Description?)>) states that a Level element can
contain zero or more Category sub-elements, followed by zero or one Description
sub-elements. In the table that follows, we define each of these new expression
characters. If an expression character is not present, then the element shown
must occur exactly once as a sub-element.
Expression Character | Meaning
* (asterisk)         | The element that precedes the character may appear zero or more times.
? (question mark)    | The element that precedes the character may appear zero or one time.
+ (plus sign)        | The element that precedes the character must appear one or more times.
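Because these characters come from regular expressions, the content models in our DTD can be imitated with Python's re module. The sketch below is only an illustration: the translation of each content model into a pattern was done by hand, and the names CONTENT_MODELS and children_allowed are invented for this example.

```python
import re
import xml.etree.ElementTree as ET

# Content models from the DTD, rewritten by hand as regular expressions.
# In the string being matched, each child tag name is followed by a space.
CONTENT_MODELS = {
    "Level": r"(Category )*(Description )?",
    "Category": r"(Description )?(LanguageItem )+",
    "LanguageItem": r"(Description )?(SubItem )*",
    "SubItem": r"(Description )?",
}


def children_allowed(element):
    """True if the element's direct children match its content model."""
    sequence = "".join(child.tag + " " for child in element)
    return re.fullmatch(CONTENT_MODELS[element.tag], sequence) is not None


ok = ET.fromstring(
    "<Category name='Expressions'><Description>d</Description>"
    "<LanguageItem name='prefix'/></Category>")
# A Category with no LanguageItem violates the + (one or more) rule.
empty = ET.fromstring("<Category name='Expressions'/>")

print(children_allowed(ok))
print(children_allowed(empty))
```

The first Category passes because Description is optional (?) and at least one LanguageItem is present (+); the second fails because + does not allow zero occurrences.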
Lab Activity #3:
- For each of the following elements (based on the revised
tableStructure.xml file shown above), write out which sub-elements
may appear in the XML file, and how many times each may appear.
- Category
- LanguageItem
- SubItem
- Description
- Modify your XML file from Lab Activity #2 to include the
DTD file.
- Rewrite your DTD to include at least two of the expression
characters in your DTD.
14.4 DTD Attributes
As we briefly mentioned earlier, the second half of our DTD file contained attributes
for the tags we created. These attributes were created with the following lines
in the DTD:
<!ATTLIST Category name CDATA #REQUIRED>
<!ATTLIST LanguageItem name CDATA #REQUIRED>
<!ATTLIST SubItem name CDATA #REQUIRED>
Each attribute is created by first naming the tag that it is associated with (Category
for the first attribute created), then the name of the attribute (in each case the
attribute is called name), followed by the type of data the attribute
will contain (CDATA represents character data). Finally, the #REQUIRED
keyword indicates that a value must be provided for this attribute.
In practice, the first Category and LanguageItem we define in the
body of the XML document use these attributes to provide additional information
about the tags.
<Category name="Expressions">
<Description>Alterations to variables</Description>
<LanguageItem name="prefix">
<Description>Before the variable</Description>
</LanguageItem>
In the sample above, we define a Category node named "Expressions" which
then contains a description of the node, as well as the sub-element (or sub-node)
named "prefix" (which happens to be a LanguageItem). The "prefix" element
also contains a description. By using attributes we can provide additional information
about each element or node, just as we did with HTML tags.
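The effect of #REQUIRED can be mimicked with a short hand-written check in Python. This is a sketch under the assumption that we only care about the three ATTLIST lines shown above; the names REQUIRES_NAME and missing_names are invented for the example, and the standard library parser itself does not enforce the DTD.

```python
import xml.etree.ElementTree as ET

# Elements whose name attribute is declared #REQUIRED in the DTD.
REQUIRES_NAME = ("Category", "LanguageItem", "SubItem")


def missing_names(root):
    """Return the tags of elements that lack the required name attribute."""
    return [el.tag for el in root.iter()
            if el.tag in REQUIRES_NAME and el.get("name") is None]


# The LanguageItem below deliberately omits its name attribute.
doc = ET.fromstring(
    "<Level><Category name='Expressions'>"
    "<LanguageItem><Description>Before the variable</Description></LanguageItem>"
    "</Category></Level>")

print(missing_names(doc))
```

Description never appears in the result, since the DTD declares no attributes for it.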
Lab Activity #4:
- Edit both the store XML file and its DTD (from Lab Activity #3) to incorporate
no fewer than 5 new attributes. Which ones did you create?
- Refer to the World Wide Web Consortium's recommendation on
XML and determine the other types of fields that can be used in
creating attributes. What are they and what do they accomplish?
About the labs:
These labs were developed in conjunction with the Jones
and Bartlett textbook Computer Science
Illuminated by Nell Dale and John Lewis.
ISBN: 0-7637-1760-6
Lab content developed by Pete DePasquale and John Lewis.