7.1 XML
XML is a commonly used data communication format in web services. Today, it's assuming a more and more important role in web development. In this section, we're going to introduce how to work with XML through Go's standard library.
I will not make any attempts to teach XML's syntax or conventions. For that, please read more documentation about XML itself. We will only focus on how to encode and decode XML files in Go.
Suppose you work in IT, and you have to deal with the following XML configuration file:
<?xml version="1.0" encoding="utf-8"?>
<servers version="1">
<server>
<serverName>Shanghai_VPN</serverName>
<serverIP>127.0.0.1</serverIP>
</server>
<server>
<serverName>Beijing_VPN</serverName>
<serverIP>127.0.0.2</serverIP>
</server>
</servers>
The above XML document contains two kinds of information about your server: the server name and IP. We will use this document in our following examples.
Parse XML
How do we parse this XML document? We can use the Unmarshal
function in Go's xml
package to do this.
func Unmarshal(data []byte, v interface{}) error
the data
parameter receives a data stream from an XML source, and v
is the structure you want to output the parsed XML to. It is an interface, which means you can convert XML to any structure you desire. Here, we'll only talk about how to convert from XML to the struct
type since they share similar tree structures.
Sample code:
package main
import (
"encoding/xml"
"fmt"
"io/ioutil"
"os"
)
type Recurlyservers struct {
XMLName xml.Name `xml:"servers"`
Version string `xml:"version,attr"`
Svs []server `xml:"server"`
Description string `xml:",innerxml"`
}
type server struct {
XMLName xml.Name `xml:"server"`
ServerName string `xml:"serverName"`
ServerIP string `xml:"serverIP"`
}
func main() {
file, err := os.Open("servers.xml") // For read access.
if err != nil {
fmt.Printf("error: %v", err)
return
}
defer file.Close()
data, err := ioutil.ReadAll(file)
if err != nil {
fmt.Printf("error: %v", err)
return
}
v := Recurlyservers{}
err = xml.Unmarshal(data, &v)
if err != nil {
fmt.Printf("error: %v", err)
return
}
fmt.Println(v)
}
XML is actually a tree data structure, and we can define a very similar structure using structs in Go, then use xml.Unmarshal
to convert from XML to our struct object. The sample code will print the following content:
{{ servers} 1 [{{ server} Shanghai_VPN 127.0.0.1} {{ server} Beijing_VPN 127.0.0.2}]
<server>
<serverName>Shanghai_VPN</serverName>
<serverIP>127.0.0.1</serverIP>
</server>
<server>
<serverName>Beijing_VPN</serverName>
<serverIP>127.0.0.2</serverIP>
</server>
}
We use xml.Unmarshal
to parse the XML document to the corresponding struct object. You should see that we have something like xml:"serverName"
in our struct. This is a feature of structs called struct tags
for helping with reflection. Let's see the definition of Unmarshal
again:
func Unmarshal(data []byte, v interface{}) error
The first argument is an XML data stream. The second argument is storage type and supports the struct, slice and string types. Go's XML package uses reflection for data mapping, so all fields in v should be exported. However, this causes a problem: how can it know which XML field corresponds to the mapped struct field? The answer is that the XML parser parses data in a certain order. The library will try to find the matching struct tag first. If a match cannot be found then it searches through the struct field names. Be aware that all tags, field names and XML elements are case sensitive, so you have to make sure that there is a one to one correspondence for the mapping to succeed.
Go's reflection mechanism allows you to use this tag information to reflect XML data to a struct object. If you want to know more about reflection in Go, please read the package documentation on struct tags and reflection.
Here are some rules when using the xml
package to parse XML documents to structs:
If the field type is a string or []byte with the tag
",innerxml"
,Unmarshal
will assign raw XML data to it, likeDescription
in the above example:Shanghai_VPN127.0.0.1Beijing_VPN127.0.0.2
If a field is called
XMLName
and its type isxml.Name
, then it gets the element name, likeservers
in above example.- If a field's tag contains the corresponding element name, then it gets the element name as well, like
servername
andserverip
in the above example. - If a field's tag contains
",attr"
, then it gets the corresponding element's attribute, likeversion
in above example. - If a field's tag contains something like
"a>b>c"
, it gets the value of the element c of node b of node a. - If a field's tag contains
"="
, then it gets nothing. - If a field's tag contains
",any"
, then it gets all child elements which do not fit the other rules. - If the XML elements have one or more comments, all of these comments will be added to the first field that has the tag that contains
",comments"
. This field type can be a string or []byte. If this kind of field does not exist, all comments are discard.
These rules tell you how to define tags in structs. Once you understand these rules, mapping XML to structs will be as easy as the sample code above. Because tags and XML elements have a one to one correspondence, we can also use slices to represent multiple elements on the same level.
Note that all fields in structs should be exported (capitalized) in order to parse data correctly.
Produce XML
What if we want to produce an XML document instead of parsing one. How do we do this in Go? Unsurprisingly, the xml
package provides two functions which are Marshal
and MarshalIndent
, where the second function automatically indents the marshalled XML document. Their definition as follows:
func Marshal(v interface{}) ([]byte, error)
func MarshalIndent(v interface{}, prefix, indent string) ([]byte, error)
The first argument in both of these functions is for storing a marshalled XML data stream.
Let's look at an example to see how this works:
package main
import (
"encoding/xml"
"fmt"
"os"
)
type Servers struct {
XMLName xml.Name `xml:"servers"`
Version string `xml:"version,attr"`
Svs []server `xml:"server"`
}
type server struct {
ServerName string `xml:"serverName"`
ServerIP string `xml:"serverIP"`
}
func main() {
v := &Servers{Version: "1"}
v.Svs = append(v.Svs, server{"Shanghai_VPN", "127.0.0.1"})
v.Svs = append(v.Svs, server{"Beijing_VPN", "127.0.0.2"})
output, err := xml.MarshalIndent(v, " ", " ")
if err != nil {
fmt.Printf("error: %v\n", err)
}
os.Stdout.Write([]byte(xml.Header))
os.Stdout.Write(output)
}
The above example prints the following information:
<?xml version="1.0" encoding="UTF-8"?>
<servers version="1">
<server>
<serverName>Shanghai_VPN</serverName>
<serverIP>127.0.0.1</serverIP>
</server>
<server>
<serverName>Beijing_VPN</serverName>
<serverIP>127.0.0.2</serverIP>
</server>
</servers>
As we've previously defined, the reason we have os.Stdout.Write([]byte(xml.Header))
is because both xml.MarshalIndent
and xml.Marshal
do not output XML headers on their own, so we have to explicitly print them in order to produce XML documents correctly.
Here we can see that Marshal
also receives a v parameter of type interface{}
. So what are the rules when marshalling to an XML document?
- If v is an array or slice, it prints all elements like a value.
- If v is a pointer, it prints the content that v is pointing to, printing nothing when v is nil.
- If v is a interface, it deal with the interface as well.
- If v is one of the other types, it prints the value of that type.
So how does xml.Marshal
decide the elements' name? It follows the proceeding rules:
- If v is a struct, it defines the name in the tag of XMLName.
- The field name is XMLName and the type is xml.Name.
- Field tag in struct.
- Field name in struct.
- Type name of marshal.
Then we need to figure out how to set tags in order to produce the final XML document.
- XMLName will not be printed.
- Fields that have tags containing
"-"
will not be printed. - If a tag contains
"name,attr"
, it uses name as the attribute name and the field value as the value, likeversion
in the above example. - If a tag contains
",attr"
, it uses the field's name as the attribute name and the field value as its value. - If a tag contains
",chardata"
, it prints character data instead of element. - If a tag contains
",innerxml"
, it prints the raw value. - If a tag contains
",comment"
, it prints it as a comment without escaping, so you cannot have "--" in its value. - If a tag contains
"omitempty"
, it omits this field if its value is zero-value, including false, 0, nil pointer or nil interface, zero length of array, slice, map and string. If a tag contains
"a>b>c"
, it prints three elements where a contains b and b contains c, like in the following code:FirstName string
xml:"name>first"
LastName stringxml:"name>last"
Asta Xie
You may have noticed that struct tags are very useful for dealing with XML, and the same goes for the other data formats we'll be discussing in the following sections. If you still find that you have problems with working with struct tags, you should probably read more documentation about them before diving into the next section.
Links
- Directory
- Previous section: Text files
- Next section: JSON