GLib Reference Manual
#include <glib.h>

enum GMarkupError;
#define G_MARKUP_ERROR
enum GMarkupParseFlags;
GMarkupParseContext;
GMarkupParser;
gchar* g_markup_escape_text (const gchar *text, gssize length);
gchar* g_markup_printf_escaped (const char *format, ...);
gchar* g_markup_vprintf_escaped (const char *format, va_list args);
gboolean g_markup_parse_context_end_parse (GMarkupParseContext *context, GError **error);
void g_markup_parse_context_free (GMarkupParseContext *context);
void g_markup_parse_context_get_position (GMarkupParseContext *context, gint *line_number, gint *char_number);
const gchar* g_markup_parse_context_get_element (GMarkupParseContext *context);
const GSList* g_markup_parse_context_get_element_stack (GMarkupParseContext *context);
gpointer g_markup_parse_context_get_user_data (GMarkupParseContext *context);
GMarkupParseContext* g_markup_parse_context_new (const GMarkupParser *parser, GMarkupParseFlags flags, gpointer user_data, GDestroyNotify user_data_dnotify);
gboolean g_markup_parse_context_parse (GMarkupParseContext *context, const gchar *text, gssize text_len, GError **error);
void g_markup_parse_context_push (GMarkupParseContext *context, GMarkupParser *parser, gpointer user_data);
gpointer g_markup_parse_context_pop (GMarkupParseContext *context);
enum GMarkupCollectType;
gboolean g_markup_collect_attributes (const gchar *element_name, const gchar **attribute_names, const gchar **attribute_values, GError **error, GMarkupCollectType first_type, const gchar *first_attr, ...);
The "GMarkup" parser is intended to parse a simple markup format that's a subset of XML. This is a small, efficient, easy-to-use parser. It should not be used if you expect to interoperate with other applications generating full-scale XML. However, it's very useful for application data files, config files, etc. where you know your application will be the only one writing the file. Full-scale XML parsers should be able to parse the subset used by GMarkup, so you can easily migrate to full-scale XML at a later time if the need arises.
GMarkup is not guaranteed to signal an error on all invalid XML; the parser may accept documents that an XML parser would not. However, XML documents which are not well-formed[5] are not considered valid GMarkup documents.
Simplifications to XML include:
Only UTF-8 encoding is allowed.
No user-defined entities.
Processing instructions, comments and the doctype declaration are "passed through" but are not interpreted in any way.
No DTD or validation.
The markup format does support:
Elements
Attributes
5 standard entities: &amp; &lt; &gt; &quot; &apos;
Character references
Sections marked as CDATA
typedef enum
{
  G_MARKUP_ERROR_BAD_UTF8,
  G_MARKUP_ERROR_EMPTY,
  G_MARKUP_ERROR_PARSE,
  /* The following are primarily intended for specific GMarkupParser
   * implementations to set.
   */
  G_MARKUP_ERROR_UNKNOWN_ELEMENT,
  G_MARKUP_ERROR_UNKNOWN_ATTRIBUTE,
  G_MARKUP_ERROR_INVALID_CONTENT,
  G_MARKUP_ERROR_MISSING_ATTRIBUTE
} GMarkupError;
Error codes returned by markup parsing.
G_MARKUP_ERROR_BAD_UTF8 : text being parsed was not valid UTF-8
G_MARKUP_ERROR_EMPTY : document contained nothing, or only whitespace
G_MARKUP_ERROR_PARSE : document was ill-formed
G_MARKUP_ERROR_UNKNOWN_ELEMENT : error should be set by GMarkupParser functions; element wasn't known
G_MARKUP_ERROR_UNKNOWN_ATTRIBUTE : error should be set by GMarkupParser functions; attribute wasn't known
G_MARKUP_ERROR_INVALID_CONTENT : error should be set by GMarkupParser functions; content was invalid
G_MARKUP_ERROR_MISSING_ATTRIBUTE : error should be set by GMarkupParser functions; a required attribute was missing
#define G_MARKUP_ERROR g_markup_error_quark ()
Error domain for markup parsing. Errors in this domain will be from the GMarkupError enumeration. See GError for information on error domains.
typedef enum
{
  G_MARKUP_DO_NOT_USE_THIS_UNSUPPORTED_FLAG = 1 << 0,
  G_MARKUP_TREAT_CDATA_AS_TEXT              = 1 << 1,
  G_MARKUP_PREFIX_ERROR_POSITION            = 1 << 2
} GMarkupParseFlags;
Flags that affect the behaviour of the parser.
G_MARKUP_DO_NOT_USE_THIS_UNSUPPORTED_FLAG : flag you should not use.
G_MARKUP_TREAT_CDATA_AS_TEXT : When this flag is set, CDATA marked sections are not passed literally to the passthrough function of the parser. Instead, the content of the section (without the <![CDATA[ and ]]>) is passed to the text function. This flag was added in GLib 2.12.
G_MARKUP_PREFIX_ERROR_POSITION : Normally, errors caught by GMarkup itself have line/column information prefixed to them to let the caller know the location of the error. When this flag is set, the location information is also prefixed to errors generated by the GMarkupParser implementation functions.
typedef struct _GMarkupParseContext GMarkupParseContext;
A parse context is used to parse a stream of bytes that you expect to contain marked-up text. See g_markup_parse_context_new(), GMarkupParser, and so on for more details.
typedef struct
{
  /* Called for open tags <foo bar="baz"> */
  void (*start_element) (GMarkupParseContext  *context,
                         const gchar          *element_name,
                         const gchar         **attribute_names,
                         const gchar         **attribute_values,
                         gpointer              user_data,
                         GError              **error);

  /* Called for close tags </foo> */
  void (*end_element)   (GMarkupParseContext  *context,
                         const gchar          *element_name,
                         gpointer              user_data,
                         GError              **error);

  /* Called for character data */
  /* text is not nul-terminated */
  void (*text)          (GMarkupParseContext  *context,
                         const gchar          *text,
                         gsize                 text_len,
                         gpointer              user_data,
                         GError              **error);

  /* Called for strings that should be re-saved verbatim in this same
   * position, but are not otherwise interpretable.  At the moment
   * this includes comments and processing instructions.
   */
  /* text is not nul-terminated. */
  void (*passthrough)   (GMarkupParseContext  *context,
                         const gchar          *passthrough_text,
                         gsize                 text_len,
                         gpointer              user_data,
                         GError              **error);

  /* Called on error, including one set by other
   * methods in the vtable. The GError should not be freed.
   */
  void (*error)         (GMarkupParseContext  *context,
                         GError               *error,
                         gpointer              user_data);
} GMarkupParser;
Any of the fields in GMarkupParser can be NULL, in which case they will be ignored. Except for the error function, any of these callbacks can set an error; in particular the G_MARKUP_ERROR_UNKNOWN_ELEMENT, G_MARKUP_ERROR_UNKNOWN_ATTRIBUTE, and G_MARKUP_ERROR_INVALID_CONTENT errors are intended to be set from these callbacks. If you set an error from a callback, g_markup_parse_context_parse() will report that error back to its caller.
start_element () : Callback to invoke when the opening tag of an element is seen.
end_element () : Callback to invoke when the closing tag of an element is seen. Note that this is also called for empty tags like <empty/>.
text () : Callback to invoke when some text is seen (text is always inside an element). Note that the text of an element may be spread over multiple calls of this function. If the G_MARKUP_TREAT_CDATA_AS_TEXT flag is set, this function is also called for the content of CDATA marked sections.
passthrough () : Callback to invoke for comments, processing instructions and doctype declarations; if you're re-writing the parsed document, write the passthrough text back out in the same position. If the G_MARKUP_TREAT_CDATA_AS_TEXT flag is not set, this function is also called for CDATA marked sections.
error () : Callback to invoke when an error occurs.
gchar* g_markup_escape_text (const gchar *text, gssize length);
Escapes text so that the markup parser will parse it verbatim. Less than, greater than, ampersand, etc. are replaced with the corresponding entities. This function would typically be used when writing out a file to be parsed with the markup parser.
Note that this function doesn't protect whitespace and line endings from being processed according to the XML rules for normalization of line endings and attribute values.
Note also that if given a string containing them, this function will produce character references in the range of &#x1; .. &#x1f; for all control sequences except for tabstop, newline and carriage return. The character references in this range are not valid XML 1.0, but they are valid XML 1.1 and will be accepted by the GMarkup parser.
text : some valid UTF-8 text
length : length of text in bytes, or -1 if the text is nul-terminated
Returns : a newly allocated string with the escaped text
gchar* g_markup_printf_escaped (const char *format, ...);
Formats arguments according to format, escaping all string and character arguments in the fashion of g_markup_escape_text(). This is useful when you want to insert literal strings into XML-style markup output, without having to worry that the strings might themselves contain markup.
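const char *store = "Fortnum & Mason";
const char *item = "Tea";
char *output;

output = g_markup_printf_escaped ("<purchase>"
                                  "<store>%s</store>"
                                  "<item>%s</item>"
                                  "</purchase>",
                                  store, item);
/* output is "<purchase><store>Fortnum &amp; Mason</store><item>Tea</item></purchase>" */

g_free (output);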
format : printf() style format string
... : the arguments to insert in the format string
Returns : newly allocated result from formatting operation. Free with g_free().
Since 2.4
gchar* g_markup_vprintf_escaped (const char *format, va_list args);
Formats the data in args according to format, escaping all string and character arguments in the fashion of g_markup_escape_text(). See g_markup_printf_escaped().
format : printf() style format string
args : variable argument list, similar to vprintf()
Returns : newly allocated result from formatting operation. Free with g_free().
Since 2.4
gboolean g_markup_parse_context_end_parse (GMarkupParseContext *context, GError **error);
Signals to the GMarkupParseContext that all data has been fed into the parse context with g_markup_parse_context_parse().
This function reports an error if the document isn't complete, for example if elements are still open.
context : a GMarkupParseContext
error : return location for a GError
Returns : TRUE on success, FALSE if an error was set
void g_markup_parse_context_free (GMarkupParseContext *context);
Frees a GMarkupParseContext. Can't be called from inside one of the GMarkupParser functions. Can't be called while a subparser is pushed.
context : a GMarkupParseContext
void g_markup_parse_context_get_position (GMarkupParseContext *context, gint *line_number, gint *char_number);
Retrieves the current line number and the number of the character on that line. Intended for use in error messages; there are no strict semantics for what constitutes the "current" line number other than "the best number we could come up with for error messages."
context : a GMarkupParseContext
line_number : return location for a line number, or NULL
char_number : return location for a char-on-line number, or NULL
const gchar* g_markup_parse_context_get_element (GMarkupParseContext *context);
Retrieves the name of the currently open element.
If called from the start_element or end_element handlers this will give the element_name as passed to those functions. For the parent elements, see g_markup_parse_context_get_element_stack().
context : a GMarkupParseContext
Returns : the name of the currently open element, or NULL
Since 2.2
const GSList* g_markup_parse_context_get_element_stack (GMarkupParseContext *context);
Retrieves the element stack from the internal state of the parser. The returned GSList is a list of strings where the first item is the currently open tag (as would be returned by g_markup_parse_context_get_element()) and the next item is its immediate parent.
This function is intended to be used in the start_element and end_element handlers where g_markup_parse_context_get_element() would merely return the name of the element that is being processed.
context : a GMarkupParseContext
Returns : the element stack, which must not be modified
Since 2.16
gpointer g_markup_parse_context_get_user_data (GMarkupParseContext *context);
Returns the user_data associated with context. This will either be the user_data that was provided to g_markup_parse_context_new() or to the most recent call of g_markup_parse_context_push().
context : a GMarkupParseContext
Returns : the provided user_data. The returned data belongs to the markup context and will be freed when g_markup_parse_context_free() is called.
Since 2.18
GMarkupParseContext* g_markup_parse_context_new (const GMarkupParser *parser, GMarkupParseFlags flags, gpointer user_data, GDestroyNotify user_data_dnotify);
Creates a new parse context. A parse context is used to parse marked-up documents. You can feed any number of documents into a context, as long as no errors occur; once an error occurs, the parse context can't continue to parse text (you have to free it and create a new parse context).
parser : a GMarkupParser
flags : one or more GMarkupParseFlags
user_data : user data to pass to GMarkupParser functions
user_data_dnotify : user data destroy notifier called when the parse context is freed
Returns : a new GMarkupParseContext
gboolean g_markup_parse_context_parse (GMarkupParseContext *context, const gchar *text, gssize text_len, GError **error);
Feed some data to the GMarkupParseContext. The data need not be valid UTF-8; an error will be signaled if it's invalid. The data need not be an entire document; you can feed a document into the parser incrementally, via multiple calls to this function. Typically, as you receive data from a network connection or file, you feed each received chunk of data into this function, aborting the process if an error occurs. Once an error is reported, no further data may be fed to the GMarkupParseContext; all errors are fatal.
context : a GMarkupParseContext
text : chunk of text to parse
text_len : length of text in bytes
error : return location for a GError
Returns : FALSE if an error occurred, TRUE on success
void g_markup_parse_context_push (GMarkupParseContext *context, GMarkupParser *parser, gpointer user_data);
Temporarily redirects markup data to a sub-parser.
This function may only be called from the start_element handler of a GMarkupParser. It must be matched with a corresponding call to g_markup_parse_context_pop() in the matching end_element handler (except in the case that the parser aborts due to an error).
All tags, text and other data between the matching tags are redirected to the subparser given by parser. user_data is used as the user_data for that parser. user_data is also passed to the error callback in the event that an error occurs. This includes errors that occur in subparsers of the subparser.
The end tag matching the start tag for which this call was made is handled by the previous parser (which is given its own user_data), which is why g_markup_parse_context_pop() is provided to allow "one last access" to the user_data provided to this function. In the case of error, the user_data provided here is passed directly to the error callback of the subparser and g_markup_parse_context_pop() should not be called. In either case, if user_data was allocated then it ought to be freed from both of these locations.
This function is not intended to be directly called by users interested in invoking subparsers. Instead, it is intended to be used by the subparsers themselves to implement a higher-level interface.
As an example, see the following implementation of a simple parser that counts the number of tags encountered.
typedef struct
{
  gint tag_count;
} CounterData;

static void
counter_start_element (GMarkupParseContext  *context,
                       const gchar          *element_name,
                       const gchar         **attribute_names,
                       const gchar         **attribute_values,
                       gpointer              user_data,
                       GError              **error)
{
  CounterData *data = user_data;

  data->tag_count++;
}

static void
counter_error (GMarkupParseContext *context,
               GError              *error,
               gpointer             user_data)
{
  CounterData *data = user_data;

  /* On error, pop is never called, so the subparser frees its data here. */
  g_slice_free (CounterData, data);
}

static GMarkupParser counter_subparser =
{
  counter_start_element,
  NULL,
  NULL,
  NULL,
  counter_error
};
In order to allow this parser to be easily used as a subparser, the following interface is provided:
void
start_counting (GMarkupParseContext *context)
{
  CounterData *data = g_slice_new (CounterData);

  data->tag_count = 0;
  g_markup_parse_context_push (context, &counter_subparser, data);
}

gint
end_counting (GMarkupParseContext *context)
{
  CounterData *data = g_markup_parse_context_pop (context);
  int result;

  result = data->tag_count;
  g_slice_free (CounterData, data);

  return result;
}
The subparser would then be used as follows:
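#include <string.h>  /* for strcmp() */

static void
start_element (GMarkupParseContext  *context,
               const gchar          *element_name,
               const gchar         **attribute_names,
               const gchar         **attribute_values,
               gpointer              user_data,
               GError              **error)
{
  if (strcmp (element_name, "count-these") == 0)
    start_counting (context);

  /* else, handle other tags... */
}

static void
end_element (GMarkupParseContext  *context,
             const gchar          *element_name,
             gpointer              user_data,
             GError              **error)
{
  if (strcmp (element_name, "count-these") == 0)
    g_print ("Counted %d tags\n", end_counting (context));

  /* else, handle other tags... */
}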
context : a GMarkupParseContext
parser : a GMarkupParser
user_data : user data to pass to GMarkupParser functions
Since 2.18
gpointer g_markup_parse_context_pop (GMarkupParseContext *context);
Completes the process of a temporary sub-parser redirection.
This function exists to collect the user_data allocated by a matching call to g_markup_parse_context_push(). It must be called in the end_element handler corresponding to the start_element handler during which g_markup_parse_context_push() was called. You must not call this function from the error callback; the user_data is provided directly to the callback in that case.
This function is not intended to be directly called by users interested in invoking subparsers. Instead, it is intended to be used by the subparsers themselves to implement a higher-level interface.
context : a GMarkupParseContext
Returns : the user_data passed to g_markup_parse_context_push().
Since 2.18
typedef enum
{
  G_MARKUP_COLLECT_INVALID,
  G_MARKUP_COLLECT_STRING,
  G_MARKUP_COLLECT_STRDUP,
  G_MARKUP_COLLECT_BOOLEAN,
  G_MARKUP_COLLECT_TRISTATE,

  G_MARKUP_COLLECT_OPTIONAL = (1 << 16)
} GMarkupCollectType;
A mixed enumerated type and flags field. You must specify one type (string, strdup, boolean, tristate). Additionally, you may optionally bitwise OR the type with the flag G_MARKUP_COLLECT_OPTIONAL.
It is likely that this enum will be extended in the future to support other types.
G_MARKUP_COLLECT_INVALID : used to terminate the list of attributes to collect.
G_MARKUP_COLLECT_STRING : collect the string pointer directly from the attribute_values[] array. Expects a parameter of type (const char **). If G_MARKUP_COLLECT_OPTIONAL is specified and the attribute isn't present then the pointer will be set to NULL.
G_MARKUP_COLLECT_STRDUP : as with G_MARKUP_COLLECT_STRING, but expects a parameter of type (char **) and g_strdup()s the returned pointer. The pointer must be freed with g_free().
G_MARKUP_COLLECT_BOOLEAN : expects a parameter of type (gboolean *) and parses the attribute value as a boolean. Sets FALSE if the attribute isn't present. Valid boolean values consist of (case insensitive) "false", "f", "no", "n", "0" and "true", "t", "yes", "y", "1".
G_MARKUP_COLLECT_TRISTATE : as with G_MARKUP_COLLECT_BOOLEAN, but in the case of a missing attribute a value is set that compares equal to neither FALSE nor TRUE. G_MARKUP_COLLECT_OPTIONAL is implied.
G_MARKUP_COLLECT_OPTIONAL : can be bitwise ORed with the other fields. If present, allows the attribute not to appear. A default value is set depending on what value type is used.
gboolean g_markup_collect_attributes (const gchar *element_name, const gchar **attribute_names, const gchar **attribute_values, GError **error, GMarkupCollectType first_type, const gchar *first_attr, ...);
Collects the attributes of the element from the data passed to the GMarkupParser start_element function, dealing with common error conditions and supporting boolean values.
This utility function is not required to write a parser but can save a lot of typing.
The element_name, attribute_names, attribute_values and error parameters passed to the start_element callback should be passed unmodified to this function.
Following these arguments is a list of "supported" attributes to collect. It is an error to specify multiple attributes with the same name. If any attribute not in the list appears in the attribute_names array then an unknown attribute error will result.
The GMarkupCollectType field allows specifying the type of collection to perform and whether a given attribute must appear or is optional.
The attribute name is simply the name of the attribute to collect.
The pointer should be of the appropriate type (see the descriptions under GMarkupCollectType) and may be NULL in case a particular attribute is to be allowed but ignored.
This function deals with issuing errors for missing attributes (of type G_MARKUP_ERROR_MISSING_ATTRIBUTE), unknown attributes (of type G_MARKUP_ERROR_UNKNOWN_ATTRIBUTE) and duplicate attributes (of type G_MARKUP_ERROR_INVALID_CONTENT) as well as parse errors for boolean-valued attributes (again of type G_MARKUP_ERROR_INVALID_CONTENT). In all of these cases FALSE will be returned and error will be set as appropriate.
element_name : the current tag name
attribute_names : the attribute names
attribute_values : the attribute values
error : a pointer to a GError or NULL
first_type : the GMarkupCollectType of the first attribute
first_attr : the name of the first attribute
... : a pointer to the storage location of the first attribute (or NULL), followed by more type, name and pointer triples, ending with G_MARKUP_COLLECT_INVALID.
Returns : TRUE if successful
Since 2.16