HTMLDoc API

This document describes how to configure and use the HTMLDoc object.

Methods

__construct()

Called when a new htmldoc object is created.

$doc = new \hexydec\html\htmldoc($config);

$config

The default configuration is set up for general use, but it can be overridden with the following options:

elements

| Option | Description | Defaults |
|---|---|---|
| `inline` | HTML elements that are considered inline | `Array('b', 'big', 'i', 'small', 'ttspan', 'em', 'a', 'strong', 'sub', 'sup', 'abbr', 'acronym', 'cite', 'code', 'dfn', 'em', 'kbd', 'strong', 'samp', 'var', 'span')` |
| `singleton` | HTML elements that are singletons | `Array('area', 'base', 'br', 'col', 'command', 'embed', 'hr', 'img', 'input', 'keygen', 'link', 'meta', 'param', 'source', 'track', 'wbr')` |
| `closeoptional` | HTML elements that don't have to be closed | `Array('head', 'body', 'p', 'dt', 'dd', 'li', 'option', 'thead', 'th', 'tbody', 'tr', 'td', 'tfoot', 'colgroup')` |
| `pre` | HTML elements that contain pre-formatted content | `Array('textarea', 'pre', 'code')` |
| `plugins` | HTML elements that have a custom handler class | `Array('script', 'style')` |

attributes

| Option | Description | Defaults |
|---|---|---|
| `boolean` | HTML attributes that are boolean values | `Array('allowfullscreen', 'allowpaymentrequest', 'async', 'autofocus', 'autoplay', 'checked', 'contenteditable', 'controls', 'default', 'defer', 'disabled', 'formnovalidate', 'hidden', 'indeterminate', 'ismap', 'itemscope', 'loop', 'multiple', 'muted', 'nomodule', 'novalidate', 'open', 'readonly', 'required', 'reversed', 'scoped', 'selected', 'typemustmatch')` |
| `default` | Default attributes that can be removed | `Array('style' => Array('type' => 'text/css'), 'script' => Array('type' => 'text/javascript', 'language' => true), 'form' => Array('method' => 'get'), 'input' => Array('type' => 'text'))` |
| `empty` | Attributes to remove if empty | `Array('id', 'class', 'style', 'title')` |
| `urls` | Attributes that contain URLs | `Array('href', 'src', 'action', 'poster')` |
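
As a rough illustration of overriding these defaults, the sketch below builds a `$config` array that treats `samp` as pre-formatted content and `data-src` as a URL attribute. The nesting (section name, then option name) follows the `htmldoc::$config['elements'][...]` references used elsewhere in this document. The added values are hypothetical, and because this document does not say whether a supplied list replaces or extends the default list, the default lists are restated in full.

```php
<?php
// Hypothetical overrides: section name first, then option name
$config = [
    'elements' => [
        // default list restated in full, plus 'samp' (illustrative addition)
        'pre' => ['textarea', 'pre', 'code', 'samp'],
    ],
    'attributes' => [
        // default list restated in full, plus 'data-src' (illustrative addition)
        'urls' => ['href', 'src', 'action', 'poster', 'data-src'],
    ],
];
```

The array would then be passed to the constructor as `new \hexydec\html\htmldoc($config)`.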

open()

Open an HTML file from a URL.

Note that the charset of the document is determined by the charset directive of the Content-Type header. If the header is not present, the charset is detected in the same way as in the load() method.

$doc = new \hexydec\html\htmldoc();
$doc->open($url, $context, $error); // $context and $error are optional and default to null

| Parameter | Type | Description |
|---|---|---|
| `$url` | String | The URL of the HTML document to be opened |
| `$context` | Resource | A stream context resource created with `stream_context_create()` |
| `$error` | String | A reference to a description of any error that is generated |
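
One way to supply `$context` is sketched below, using PHP's `stream_context_create()` to set a timeout and a user agent for the HTTP request. The option values are illustrative assumptions, not recommendations from the library.

```php
<?php
// Build an HTTP stream context; the values are illustrative only
$context = stream_context_create([
    'http' => [
        'timeout' => 10,                   // give up after 10 seconds
        'user_agent' => 'example-bot/1.0', // hypothetical user agent string
    ],
]);

// Read the options back to confirm what the context holds
$options = stream_context_get_options($context);
```

The resulting resource would be passed as the second argument, e.g. `$doc->open($url, $context, $error)`, and `$error` can be inspected afterwards if the document could not be opened.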

load()

Loads the supplied HTML as a document.

$doc = new \hexydec\html\htmldoc();
$doc->load($html, $charset = null);

| Parameter | Type | Description |
|---|---|---|
| `$html` | String | The HTML to be parsed into the object |
| `$charset` | String | The charset of the document, or `null` to auto-detect |

find()

Find elements within the document using a CSS selector.

$doc = new \hexydec\html\htmldoc();
if ($doc->load($html, $charset)) {
	$found = $doc->find($selector);
}

$selector

A CSS selector defining the nodes to find within the document. The following selectors can be used:

| Selector | Example |
|---|---|
| Any element | `*` |
| Tag | `div` |
| ID | `#foo` |
| Class | `.foo` |
| Attribute | `[href]` |
| Attribute equals | `[href=/foo/bar/]` |
| Attribute begins with | `[href^=/foo]` |
| Attribute contains | `[href*=foo]` |
| Attribute ends with | `[href$=bar/]` |
| First child | `:first-child` |
| Last child | `:last-child` |
| Child selector | `>` |

Selectors can be put together in combinations, and multiple selectors can be used:

$found = $doc->find('div.foo');
$found = $doc->find('a.foo[href^=/foo]');
$found = $doc->find('div.foo[data-attr*=foo]:first-child');
$found = $doc->find('table.list th');
$found = $doc->find('ul.list > li');
$found = $doc->find('form a.button, form label.button');

Returns

An HTMLDoc object containing the matched nodes.

eq()

Builds a new HTMLDoc collection containing only the node at the index requested.

$doc = new \hexydec\html\htmldoc();
if ($doc->load($html, $charset)) {
	$found = $doc->find($selector)->eq($index);
}

$index

An integer indicating the zero-based index of the element to return. A negative value counts back that many items from the end of the collection.

Returns

An HTMLDoc collection containing the element at the index requested, or an empty HTMLDoc collection if the index is out of range.
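
The index rules can be illustrated on a plain PHP array. The helper below is a hypothetical sketch, not part of the library: it resolves a zero-based or negative index in the way the description above suggests (assuming `-1` refers to the last item) and returns an empty array when the index is out of range.

```php
<?php
// Hypothetical helper, not part of htmldoc: picks one item by index,
// counting back from the end of the list when the index is negative
function pickByIndex(array $items, int $index): array {
    if ($index < 0) {
        $index += count($items); // -1 becomes the last position
    }
    return array_key_exists($index, $items) ? [$items[$index]] : [];
}

$items = ['first', 'second', 'third'];
$a = pickByIndex($items, 0);  // ['first']
$b = pickByIndex($items, -1); // ['third']
$c = pickByIndex($items, 5);  // [] (out of range)
```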

first()

Returns a new HTMLDoc collection containing the first element in the collection.

last()

Returns a new HTMLDoc collection containing the last element in the collection.

get()

Extracts an array of tag objects from an HTMLDoc collection.

minify()

Minifies the HTML document with the supplied or default options.

$doc = new \hexydec\html\htmldoc();
$doc->load($html);
$doc->minify($options);

The optional $options array configures the minifier output. The supplied options are recursively merged with the default config, and are as follows:

| Parameter | Type | Options | Description | Default |
|---|---|---|---|---|
| `lowercase` | Boolean | | Lowercase tag and attribute names | `true` |
| `whitespace` | Boolean | | Strip whitespace from text nodes (preserves whitespace between inline items defined in `htmldoc::$config['elements']['inline']`) | `true` |
| `comments` | Array | | Remove comments, set to `false` to preserve comments | `Array()` |
| | | `ie` | Whether to preserve Internet Explorer specific comments | `true` |
| `urls` | Array | | Minify internal URLs | `Array()` |
| | | `absolute` | Process absolute URLs to make them relative to the current document | `true` |
| | | `host` | Remove the host for own domain | `true` |
| | | `scheme` | Remove the scheme from URLs that have the same scheme as the current document | `true` |
| `attributes` | Array | | Minify attributes | `Array()` |
| | | `default` | Remove default attributes as defined in `htmldoc::$config['attributes']['default']` | `true` |
| | | `empty` | Remove attributes with empty values; the attributes processed are defined in `htmldoc::$config['attributes']['empty']` | `true` |
| | | `option` | Remove the value attribute from option tags where the text node has the same value | `true` |
| | | `style` | Remove whitespace and the last semicolon from the style attribute | `true` |
| | | `class` | Sort class names | `true` |
| | | `sort` | Sort attributes | `true` |
| | | `boolean` | Minify boolean attributes to render only the attribute name and not the value. Boolean attributes are defined in `htmldoc::$config['attributes']['boolean']` | `true` |
| `singleton` | Boolean | | Removes the space and slash in singleton elements, e.g. `<br />` becomes `<br>` | `true` |
| `quotes` | Boolean | | Removes quotes from attribute values where possible | `true` |
| `close` | Boolean | | Removes closing tags for elements defined in `htmldoc::$config['elements']['closeoptional']` where possible | `true` |
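
Because the options are merged recursively, only the values that differ from the defaults need to be supplied. The sketch below imitates that merge with PHP's `array_replace_recursive()` on a trimmed-down copy of the defaults. Whether the library uses this exact function is an assumption, and `$defaults` here is an abbreviated stand-in for the real configuration.

```php
<?php
// Abbreviated stand-in for the minifier defaults listed above
$defaults = [
    'lowercase' => true,
    'comments' => ['ie' => true],
    'urls' => ['absolute' => true, 'host' => true, 'scheme' => true],
    'quotes' => true,
];

// Only the settings to change are supplied
$options = [
    'urls' => ['scheme' => false], // keep the scheme on URLs
    'quotes' => false,             // keep quotes around attribute values
];

// Assumed merge behaviour: nested keys are overridden individually
$merged = array_replace_recursive($defaults, $options);
```

Here `$merged['urls']` keeps `absolute` and `host` set to `true` while `scheme` becomes `false`. With the library, the same `$options` array would be passed as `$doc->minify($options)`.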

save()

Compile the document into an HTML string and save to the specified location, or return as a string.

$doc = new \hexydec\html\htmldoc();
$doc->load($html);
$doc->save($file, $options);

Arguments

| Parameter | Type | Options | Description | Default |
|---|---|---|---|---|
| `$file` | String | | The location to save the HTML, or `null` to return the HTML as a string | `null` |
| `$options` | Array | | An array of output options; the input is merged with `htmldoc::$config['output']`. *Note that for most scenarios, specifying this argument is not required* | `Array()` |
| | | `charset` | The charset the output should be converted to. The default `null` will prevent any charset conversion. | `null` |
| | | `quotestyle` | Defines how to quote the attributes in the output, either `double`, `single`, or `minimal`. Note that calling minify() with the option `'quotes' => true` will change the default setting to `minimal` | `"double"` |
| | | `singletonclose` | A string defining how singleton tags will be closed. Note that calling minify() with the option `'singleton' => true` will change the default setting to `>` | `" />"` |
| | | `closetags` | A boolean specifying whether to force elements to render a closing tag. If `false`, the renderer will follow the value defined in `tag::$close` (which is set according to whether the tag had no closing tag when the document was parsed, or may be set to `false` if the document has been minified with minify()) | `false` |

Return Value

Returns the HTML document as a string if $file is null, or true if the file was successfully saved to the specified file. On error the method will return false.
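
As a small illustration of the output options, the fragment below requests single-quoted attributes, bare singleton tags, and forced closing tags. The chosen values are only an example; as noted above, most scenarios do not need `$options` at all.

```php
<?php
// Example output options, using only the keys documented above
$options = [
    'quotestyle' => 'single',  // quote attribute values with single quotes
    'singletonclose' => '>',   // render <br> rather than <br />
    'closetags' => true,       // always render closing tags
];
```

The array would be passed as `$doc->save($file, $options)`. Since the method returns `false` on error, comparing the result with `=== false` distinguishes a failure from an empty string.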