This document describes how to configure and use the HTMLDoc object.
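The constructor is called when a new htmldoc object is created:

    $doc = new \hexydec\html\htmldoc($config);

The default options are set up for general use, but they can be overridden through the `$config` array. The options in the first table below are set under its `elements` key, and those in the second table under its `attributes` key. These match the `htmldoc::$config['elements']` and `htmldoc::$config['attributes']` paths referred to elsewhere in this document:
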
| option | Description | Defaults |
|---|---|---|
| inline | HTML elements that are considered inline | Array('b', 'big', 'i', 'small', 'ttspan', 'em', 'a', 'strong', 'sub', 'sup', 'abbr', 'acronym', 'cite', 'code', 'dfn', 'em', 'kbd', 'strong', 'samp', 'var', 'span') |
| singleton | HTML elements that are singletons | Array('area', 'base', 'br', 'col', 'command', 'embed', 'hr', 'img', 'input', 'keygen', 'link', 'meta', 'param', 'source', 'track', 'wbr') |
| closeoptional | HTML elements that don't have to be closed | Array('head', 'body', 'p', 'dt', 'dd', 'li', 'option', 'thead', 'th', 'tbody', 'tr', 'td', 'tfoot', 'colgroup') |
| pre | HTML elements that contain pre-formatted content | Array('textarea', 'pre', 'code') |
| plugins | HTML elements that have a custom handler class | Array('script', 'style') |

| option | Description | Defaults |
|---|---|---|
| boolean | HTML attributes that are boolean values | Array('allowfullscreen', 'allowpaymentrequest', 'async', 'autofocus', 'autoplay', 'checked', 'contenteditable', 'controls', 'default', 'defer', 'disabled', 'formnovalidate', 'hidden', 'indeterminate', 'ismap', 'itemscope', 'loop', 'multiple', 'muted', 'nomodule', 'novalidate', 'open', 'readonly', 'required', 'reversed', 'scoped', 'selected', 'typemustmatch') |
| default | Default attributes that can be removed | Array( |
| empty | Attributes to remove if empty | Array('id', 'class', 'style', 'title') |
| urls | Attributes that contain URLs | Array('href', 'src', 'action', 'poster') |
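
As a sketch, a custom `$config` array might look like the one below. This example assumes that the options from the first table sit under an `elements` key and those from the second under an `attributes` key, following the `htmldoc::$config[...]` paths used in this document. The document does not say whether the constructor merges a partial array with the defaults or replaces whole lists. For that reason, each list changed here is written out in full. The extra `samp` and `cite` entries are illustrative choices only.

```php
<?php
// Hypothetical custom configuration. Key paths follow the
// htmldoc::$config['elements'][...] and htmldoc::$config['attributes'][...]
// references in this document. Each list is given in full because the
// constructor's merge behaviour is an assumption here.
$config = [
    'elements' => [
        // Documented default list plus 'samp' (illustrative choice)
        'pre' => ['textarea', 'pre', 'code', 'samp'],
    ],
    'attributes' => [
        // Documented default list plus 'cite', which also holds a URL
        'urls' => ['href', 'src', 'action', 'poster', 'cite'],
    ],
];

// With the library installed and autoloaded, the array would be passed in:
// $doc = new \hexydec\html\htmldoc($config);

echo json_encode($config, JSON_PRETTY_PRINT), "\n";
```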

Opens an HTML document from a URL.

Note that the charset of the document is determined by the charset directive of the `Content-Type` header. If the header is not present, the charset is detected using the method described for `load()`.

    $doc = new \hexydec\html\htmldoc();
    $doc->open($url, $context, $error);

| Parameter | Type | Description |
|---|---|---|
| $url | String | The URL of the HTML document to be opened |
| $context | Resource | A stream context resource created with `stream_context_create()`. Optional, defaults to null |
| $error | String | Passed by reference and populated with a description of any error that is generated. Optional, defaults to null |
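
The charset rule described above for `open()` can be sketched as follows. `charsetFromContentType()` is a hypothetical helper written for this example and is not part of htmldoc. It only shows how a charset directive is typically read from a `Content-Type` header value. The stream context is built with PHP's own `stream_context_create()`, and its header and timeout values are illustrative.

```php
<?php
/**
 * Hypothetical helper: extract the charset directive from a Content-Type
 * header value, e.g. "text/html; charset=UTF-8" yields "UTF-8".
 * Returns null when no charset directive is present, in which case the
 * charset would have to be detected from the document itself.
 */
function charsetFromContentType(string $contentType): ?string
{
    // Parameters follow the media type, separated by semicolons
    foreach (array_slice(explode(';', $contentType), 1) as $param) {
        $parts = explode('=', $param, 2);
        if (count($parts) === 2 && strtolower(trim($parts[0])) === 'charset') {
            // The value may be quoted, e.g. charset="utf-8"
            $value = trim(trim($parts[1]), '"\'');
            return $value === '' ? null : $value;
        }
    }
    return null;
}

// A context like this could be passed as the $context argument of open().
$context = stream_context_create([
    'http' => [
        'header' => "User-Agent: example-bot/1.0\r\n",
        'timeout' => 10,
    ],
]);

var_dump(charsetFromContentType('text/html; charset=UTF-8'));
var_dump(charsetFromContentType('text/html'));
```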

Parses the supplied HTML string into the document.

    $doc = new \hexydec\html\htmldoc();
    $doc->load($html, $charset);

| Parameter | Type | Description |
|---|---|---|
| $html | String | The HTML to be parsed into the object |
| $charset | String | The charset of the document, or null (the default) to auto-detect |

Finds elements within the document using a CSS selector.

    $doc = new \hexydec\html\htmldoc();
    if ($doc->load($html, $charset)) {
        $found = $doc->find($selector);
    }

The `$selector` argument is a CSS selector defining the nodes to find within the document. The following selectors can be used:

| Selector | Example |
|---|---|
| Any element | * |
| Tag | div |
| ID | #foo |
| Class | .foo |
| Attribute | [href] |
| Attribute equals | [href=/foo/bar/] |
| Attribute begins with | [href^=/foo] |
| Attribute contains | [href*=foo] |
| Attribute ends with | [href$=bar/] |
| First Child | :first-child |
| Last Child | :last-child |
| Child selector | > |
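
The attribute operators in the table follow their usual CSS meanings. The sketch below expresses those semantics in plain PHP. `attributeMatches()` is a hypothetical helper, and this is not necessarily how htmldoc implements matching internally. Following the CSS Selectors specification, an empty value never matches the begins-with, contains or ends-with operators.

```php
<?php
/**
 * Hypothetical helper showing the standard CSS meaning of the attribute
 * operators listed above. $actual is the element's attribute value
 * (null if absent); $expected is the value written in the selector.
 */
function attributeMatches(?string $actual, string $operator, string $expected = ''): bool
{
    if ($actual === null) {
        return false; // attribute not present on the element
    }
    switch ($operator) {
        case '':   // [href] - attribute exists
            return true;
        case '=':  // [href=/foo/bar/] - exact match
            return $actual === $expected;
        case '^=': // [href^=/foo] - begins with
            return $expected !== '' && strncmp($actual, $expected, strlen($expected)) === 0;
        case '*=': // [href*=foo] - contains
            return $expected !== '' && strpos($actual, $expected) !== false;
        case '$=': // [href$=bar/] - ends with
            return $expected !== '' && substr($actual, -strlen($expected)) === $expected;
        default:
            throw new InvalidArgumentException("Unsupported operator: $operator");
    }
}

var_dump(attributeMatches('/foo/bar/', '^=', '/foo'));
```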

Selectors can be combined, and multiple comma-separated selectors can be used in a single query:

    $found = $doc->find('div.foo');
    $found = $doc->find('a.foo[href^=/foo]');
    $found = $doc->find('div.foo[data-attr*=foo]:first-child');
    $found = $doc->find('table.list th');
    $found = $doc->find('ul.list > li');
    $found = $doc->find('form a.button, form label.button');

Returns an HTMLDoc object containing the matched nodes.

Builds a new HTMLDoc collection containing only the node at the requested index.

    $doc = new \hexydec\html\htmldoc();
    if ($doc->load($html, $charset)) {
        $found = $doc->find($selector)->eq($index);
    }

The `$index` argument is an integer giving the zero-based index of the element to return. A negative value counts back from the end of the collection.

Returns an HTMLDoc collection containing the element at the requested index, or an empty HTMLDoc collection if the index is out of range.

Returns a new HTMLDoc collection containing the first element in the collection.
Returns a new HTMLDoc collection containing the last element in the collection.
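
The indexing rules described for `eq()` can be illustrated with a plain array. This sketch assumes that `-1` refers to the last element, as in jQuery's `.eq()`. Under that assumption, the first and last elements correspond to indexes `0` and `-1`. `eqSketch()` is hypothetical and works on strings rather than real htmldoc nodes.

```php
<?php
/**
 * Hypothetical illustration of eq() indexing: zero-based, with a negative
 * index counting back from the end (assumed: -1 is the last element).
 * Returns a one-element array, or an empty array when out of range.
 */
function eqSketch(array $items, int $index): array
{
    $items = array_values($items);
    $count = count($items);
    if ($index < 0) {
        $index += $count; // e.g. -1 becomes $count - 1
    }
    if ($index < 0 || $index >= $count) {
        return []; // out of range: empty collection
    }
    return [$items[$index]];
}

$nodes = ['li#a', 'li#b', 'li#c'];
print_r(eqSketch($nodes, 1));   // second element
print_r(eqSketch($nodes, 0));   // first element
print_r(eqSketch($nodes, -1));  // last element
print_r(eqSketch($nodes, 5));   // out of range
```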
Extracts an array of tag objects from an HTMLDoc collection.
Minifies the HTML document with the supplied or default options.

    $doc = new \hexydec\html\htmldoc();
    $doc->load($html);
    $doc->minify($options);

The optional `$options` array configures the minifier output. It is recursively merged with the default config and accepts the following options:

| Parameter | Type | Options | Description | Default |
|---|---|---|---|---|
| lowercase | Boolean | | Lowercase tag and attribute names | true |
| whitespace | Boolean | | Strip whitespace from text nodes (preserves whitespace between inline items defined in `htmldoc::$config['elements']['inline']`) | true |
| comments | Array | | Remove comments; set to false to preserve comments | Array() |
| | | ie | Whether to preserve Internet Explorer specific comments | true |
| urls | Array | | Minify internal URLs | Array() |
| | | absolute | Process absolute URLs to make them relative to the current document | true |
| | | host | Remove the host from URLs that point to the document's own domain | true |
| | | scheme | Remove the scheme from URLs that have the same scheme as the current document | true |
| attributes | Array | | Minify attributes | Array() |
| | | default | Remove default attributes as defined in `htmldoc::$config['attributes']['default']` | true |
| | | empty | Remove attributes with empty values; the attributes processed are defined in `htmldoc::$config['attributes']['empty']` | true |
| | | option | Remove the value attribute from option tags where the text node has the same value | true |
| | | style | Remove whitespace and the last semicolon from the style attribute | true |
| | | class | Sort class names | true |
| | | sort | Sort attributes | true |
| | | boolean | Minify boolean attributes to render only the attribute name and not the value. Boolean attributes are defined in `htmldoc::$config['attributes']['boolean']` | true |
| singleton | Boolean | | Removes the space and slash from singleton tags, e.g. `<br />` becomes `<br>` | true |
| quotes | Boolean | | Removes quotes from attribute values where possible | true |
| close | Boolean | | Removes closing tags for elements defined in `htmldoc::$config['elements']['closeoptional']` where possible | true |
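
The recursive merge described above can be illustrated with PHP's built-in `array_replace_recursive()`. Using that function here is an assumption for illustration. The document only states that options are recursively merged with the defaults, not which function htmldoc uses. The example shows how a partial `$options` array overrides a single nested setting. It also shows that a scalar such as `'comments' => false` replaces the whole `comments` group.

```php
<?php
// Illustrative subset of the default minify options from the table above.
$defaults = [
    'lowercase' => true,
    'comments' => ['ie' => true],
    'urls' => ['absolute' => true, 'host' => true, 'scheme' => true],
    'quotes' => true,
];

// Caller options: preserve comments, and leave URL schemes intact.
$options = [
    'comments' => false,
    'urls' => ['scheme' => false],
];

// Nested arrays are merged key by key; a scalar value (false here)
// replaces the corresponding default sub-array entirely.
$merged = array_replace_recursive($defaults, $options);
print_r($merged);
```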

Compiles the document into an HTML string, and either saves it to the specified location or returns it as a string.

    $doc = new \hexydec\html\htmldoc();
    $doc->load($html);
    $doc->save($file, $options);

| Parameter | Type | Options | Description | Default |
|---|---|---|---|---|
| $file | String | | The location to save the HTML to, or null to return the HTML as a string | null |
| $options | Array | | An array of output options; the input is merged with `htmldoc::$config['output']`. *Note that for most scenarios, specifying this argument is not required* | Array() |
| | | charset | The charset the output should be converted to. The default null prevents any charset conversion | null |
| | | quotestyle | Defines how attributes are quoted in the output: "double", "single" or "minimal". Minifying with the option `'quotes' => true` changes the default to "minimal" | "double" |
| | | singletonclose | A string defining how singleton tags are closed. Minifying with the option `'singleton' => true` changes the default to `>` | `" />"` |
| | | closetags | A boolean specifying whether to force elements to render a closing tag. If false, the renderer follows the value of `tag::$close`, which is set according to whether the tag had a closing tag when the document was parsed, and may be set to false if the document has been minified with `minify()` | false |

Returns the HTML document as a string if `$file` is null, or true if the file was saved successfully. On error, the method returns false.
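
To show what the three `quotestyle` values mean for the output, here is a sketch of rendering a single attribute. `renderAttribute()` is hypothetical and is not htmldoc's renderer. It also assumes that `$value` is already HTML-encoded apart from quote characters. The rule for when a value may be left unquoted comes from the HTML specification's unquoted attribute value syntax.

```php
<?php
/**
 * Hypothetical renderer for one attribute, illustrating the quotestyle
 * output option: "double", "single" or "minimal".
 * Assumes $value is already HTML-encoded apart from quote characters.
 */
function renderAttribute(string $name, string $value, string $quotestyle = 'double'): string
{
    // An unquoted HTML attribute value must be non-empty and must not
    // contain whitespace, quotes, =, <, > or backticks.
    if ($quotestyle === 'minimal' && preg_match('/^[^\s"\'=<>`]+$/', $value)) {
        return $name . '=' . $value;
    }
    if ($quotestyle === 'single') {
        return $name . "='" . str_replace("'", '&#39;', $value) . "'";
    }
    // "double", and the fallback for minimal values that need quoting
    return $name . '="' . str_replace('"', '&quot;', $value) . '"';
}

echo renderAttribute('class', 'foo', 'minimal'), "\n";
echo renderAttribute('class', 'foo bar', 'minimal'), "\n";
echo renderAttribute('title', "it's", 'single'), "\n";
```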