Unmarshalling XML

Go's encoding/xml package provides a function Unmarshal and a method func (*Decoder) Decode to unmarshal XML into Go data structures. The unmarshalling is not perfect: Go and XML are different languages, and their data models do not line up exactly.

We consider a simple example before looking at the details. We take the XML document given earlier:

  <person>
    <name>
      <family> Newmarch </family>
      <personal> Jan </personal>
    </name>
    <email type="personal">
      jan@newmarch.name
    </email>
    <email type="work">
      j.newmarch@boxhill.edu.au
    </email>
  </person>

We would like to map this onto the Go structures

  type Person struct {
      Name  Name
      Email []Email
  }

  type Name struct {
      Family   string
      Personal string
  }

  type Email struct {
      Type    string
      Address string
  }

This requires several comments:

  1. Unmarshalling uses the Go reflection package. This requires that all fields be public, i.e. start with a capital letter. Earlier versions of Go used case-insensitive matching to match fields such as the XML string “name” to the field Name. Now, though, case-sensitive matching is used. To perform a match, the structure fields must be tagged to show the XML string that will be matched against. This changes Person to

       type Person struct {
           Name  Name    `xml:"name"`
           Email []Email `xml:"email"`
       }

  2. While tagging of fields can attach XML strings to fields, it can’t do so with the names of the structures. To do that, an additional field is required, with field name “XMLName” and type xml.Name. This only affects the top-level struct, Person

       type Person struct {
           XMLName xml.Name `xml:"person"`
           Name    Name     `xml:"name"`
           Email   []Email  `xml:"email"`
       }

  3. Repeated tags in the XML map to a slice in Go.

  4. Attributes within tags will match to fields in a structure only if the Go field has the tag “,attr”. This occurs with the field Type of Email, where matching the attribute “type” of the “email” tag requires `xml:"type,attr"`.

  5. If an XML tag has no attributes and only has character data, then it matches a string field by the same name (case-sensitive, though). So the XML tag “family” with character data “Newmarch” maps to the string field Family, which carries the tag `xml:"family"`.

  6. But if the tag has attributes, then it must map to a structure. Go assigns the character data to the field with tag ,chardata. This occurs with the “email” data and the field Address with tag ,chardata.
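
The first two points can be checked with a small experiment. The sketch below is illustrative only: the types Untagged and Tagged and the helper decodeNames are hypothetical names, not part of the program in this section. It assumes the XMLName field is declared with type xml.Name, as the encoding/xml documentation describes.

```go
package main

import (
	"encoding/xml"
	"fmt"
)

// Untagged (hypothetical) relies on the field name alone, so it can only
// match an element spelled <Name>, not <name>.
type Untagged struct {
	Name string
}

// Tagged (hypothetical) names the child element explicitly and uses
// XMLName to state which root element it expects.
type Tagged struct {
	XMLName xml.Name `xml:"person"`
	Name    string   `xml:"name"`
}

// decodeNames unmarshals the same document into both struct shapes and
// returns the name each one ends up holding.
func decodeNames(doc string) (untagged, tagged string, err error) {
	var u Untagged
	if err = xml.Unmarshal([]byte(doc), &u); err != nil {
		return "", "", err
	}
	var tg Tagged
	if err = xml.Unmarshal([]byte(doc), &tg); err != nil {
		return "", "", err
	}
	return u.Name, tg.Name, nil
}

func main() {
	u, tg, err := decodeNames(`<person><name>Jan</name></person>`)
	fmt.Printf("untagged=%q tagged=%q err=%v\n", u, tg, err)

	// A root element other than <person> is rejected because of XMLName.
	_, _, err = decodeNames(`<people><name>Jan</name></people>`)
	fmt.Println("mismatched root:", err)
}
```

Run as written, the untagged field should come back empty while the tagged one holds “Jan”, and the second call should report an error because the root element is not “person”.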

A program to unmarshal the document above is

  /* Unmarshal
   */
  package main

  import (
      "encoding/xml"
      "fmt"
      "os"
  )

  type Person struct {
      XMLName xml.Name `xml:"person"` // checks the name of the root element
      Name    Name     `xml:"name"`
      Email   []Email  `xml:"email"` // repeated <email> tags fill the slice
  }

  type Name struct {
      Family   string `xml:"family"`
      Personal string `xml:"personal"`
  }

  type Email struct {
      Type    string `xml:"type,attr"` // the "type" attribute
      Address string `xml:",chardata"` // the text inside <email>
  }

  func main() {
      str := `<?xml version="1.0" encoding="utf-8"?>
  <person>
    <name>
      <family> Newmarch </family>
      <personal> Jan </personal>
    </name>
    <email type="personal">
      jan@newmarch.name
    </email>
    <email type="work">
      j.newmarch@boxhill.edu.au
    </email>
  </person>`

      var person Person
      err := xml.Unmarshal([]byte(str), &person)
      checkError(err)

      // now use the person structure e.g.
      fmt.Println("Family name: \"" + person.Name.Family + "\"")
      fmt.Println("Second email address: \"" + person.Email[1].Address + "\"")
  }

  func checkError(err error) {
      if err != nil {
          fmt.Println("Fatal error ", err.Error())
          os.Exit(1)
      }
  }

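Note that the spaces inside the quotes in the output are correct: Unmarshal keeps character data exactly as it appears in the document, including any surrounding whitespace. The strict rules are given in the encoding/xml package documentation.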
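
Where that surrounding whitespace is not wanted, it can be stripped after decoding. The following is a minimal sketch, not the only way to do it. It reads from an io.Reader through xml.NewDecoder and Decode rather than calling Unmarshal on a byte slice, and then applies strings.TrimSpace to each address. The type Contact and the helper decodeContact are hypothetical names introduced for this illustration; Email is the same shape as in the program above.

```go
package main

import (
	"encoding/xml"
	"fmt"
	"strings"
)

// Email mirrors the Email type used in the main program.
type Email struct {
	Type    string `xml:"type,attr"`
	Address string `xml:",chardata"`
}

// Contact (hypothetical) is a cut-down Person holding only the emails.
type Contact struct {
	XMLName xml.Name `xml:"person"`
	Email   []Email  `xml:"email"`
}

// decodeContact reads one <person> element from a stream and trims the
// whitespace that surrounds each address.
func decodeContact(doc string) (Contact, error) {
	var c Contact
	dec := xml.NewDecoder(strings.NewReader(doc))
	if err := dec.Decode(&c); err != nil {
		return c, err
	}
	for i := range c.Email {
		c.Email[i].Address = strings.TrimSpace(c.Email[i].Address)
	}
	return c, nil
}

func main() {
	c, err := decodeContact(`<person>
  <email type="work">
    j.newmarch@boxhill.edu.au
  </email>
</person>`)
	if err != nil {
		fmt.Println("decode failed:", err)
		return
	}
	// The address no longer carries the newlines and indentation.
	fmt.Printf("%s: %q\n", c.Email[0].Type, c.Email[0].Address)
}
```

Trimming after the fact keeps the struct tags unchanged; whether to trim at all depends on whether the whitespace is meaningful in the documents being read.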