Commit 1d003ea

Reinstated file
1 parent 07525cf commit 1d003ea

1 file changed

+283
-4
lines changed

README.md

Lines changed: 283 additions & 4 deletions
@@ -1,6 +1,285 @@
1-
# Dennis AMP Library
1+
[![Build Status](https://travis-ci.org/Lullabot/amp-library.svg?branch=master)](https://travis-ci.org/Lullabot/amp-library)
2+
# AMP PHP Library
23

3-
This is a fork of Lullabot AMP Library. The main purpose is to add support for AMP tags and attributes that are not supported on the origin.
4-
This branch needs to be maintained until the library is updated, see <https://github.com/Lullabot/amp-library/issues/231>
4+
An open source PHP library and console utility to convert HTML to [AMP HTML](https://www.ampproject.org/) and report HTML compliance with the AMP HTML specification.
5+
6+
### What is the AMP PHP Library?
7+
8+
The AMP PHP Library is an open source and pure PHP Library that:
9+
- Works with whole or partial HTML documents (or strings). Specifically, the AMP PHP Library:
10+
- Reports compliance of a whole/partial HTML document with the [AMP HTML specification](https://www.ampproject.org/). We implement an AMP HTML validator in pure PHP to report compliance of an arbitrary HTML document / HTML fragment with the AMP HTML standard. This validator is a ported subset of the [canonical validator](https://github.com/ampproject/amphtml/tree/master/validator) that is implemented in JavaScript
11+
- Specifically, the PHP validator supports tag specification validation, attribute specification validation, CDATA validation, CSS validation, layout validation, template validation and attribute property-value pair validation. It will report tags and attributes that are missing, illegal, mandatory according to spec but not present, unique according to spec but multiply present, having wrong parents or ancestors or children and so forth.
12+
- _Note_: while the AMP PHP library (already) supports many of the features and capabilities of the canonical validator, it is not intended to achieve parity in _every_ respect with the canonical validator. Even _within_ the features we support (e.g. CSS validation) there may be certain validation issues that we don't flag but the canonical validator does.
13+
- Using the feedback given by the in-house PHP validator, the AMP PHP library tries to "correct" some issues found in the HTML to make it more AMP HTML compliant. This would, for example, involve:
14+
- Removing illegal attributes e.g. `style` attribute within `<body>` tag
15+
- Removing all kinds of illegal tags e.g. `<script>` within `<body>` tag, a tag with a disallowed ancestor, a duplicate unique tag etc.
16+
- Removing illegal property value pairs e.g. removing `minimum-scale=hello` from `<meta name="viewport" content="width=device-width,minimum-scale=hello">`
17+
- Adding or correcting the tags necessary for a minimally valid AMP document:
18+
- `<head>`, `<body>`, `meta viewport`, `meta charset`, `<style>` and `<noscript>` tags
19+
- The `link rel=canonical` tag if you let the library know the canonical path of the document
20+
- Javascript `<script>` tags for the various AMP components and generic AMP Javascript `<script>` tag
21+
- Boilerplate CSS
22+
- If there are mutually exclusive attributes for a tag, removing all but one of them
23+
- Fixing issues with `amp-img` tags that have problems like inconsistent units, invalid attributes, missing mandatory attributes, invalid implied or specified layouts.
24+
- _Notes_:
25+
- The library does a decent job of _removing_ bad things and in a few cases makes some corrections/additions to the HTML. As the library cannot understand the true _intention_ of the user, a lot of the validation problems in the HTML may eventually need to be fixed manually.
26+
- In general, the library will try to fix validation errors in `<head>` and, if it's not successful in doing so, _remove_ those tags from `<head>`. Within `<body>` the AMP PHP library is less aggressive and in most cases will _not_ remove the tag from the document if the tag does not validate after it attempts any fixes on it.
27+
- The library needs to be provided with well-formed HTML / HTML5. Please don't give it malformed HTML (e.g. unclosed `<div>` tags). The corrections it makes relate to AMP HTML standard issues only. Use an HTML tidying library first if you expect your HTML to be malformed.
28+
- Converts some non-amp elements to their AMP equivalents automatically
29+
- An `<img>` tag is converted to an `<amp-img>` tag
30+
- An `<iframe>` tag is converted to an `<amp-iframe>` tag
31+
- An [`<audio>`](https://github.com/Lullabot/amp-library/blob/master/tests/test-data/fragment-html/audio-to-amp-audio-conversion-fragment.html) tag is converted to an `<amp-audio>` tag
32+
- A [`<video>`](https://github.com/Lullabot/amp-library/blob/master/tests/test-data/fragment-html/video-fragment-and-placeholder-test.html) tag is converted to an `<amp-video>` tag
33+
- [Twitter embed code](https://github.com/Lullabot/amp-library/blob/master/tests/test-data/fragment-html/twitter-fragment.html) for tweets is converted to an `<amp-twitter>` tag.
34+
- [Instagram embed code](https://github.com/Lullabot/amp-library/blob/master/tests/test-data/fragment-html/instagram-fragment.html) for instagrams is converted to an `<amp-instagram>` tag.
35+
- [Youtube embed code](https://github.com/Lullabot/amp-library/blob/master/tests/test-data/fragment-html/youtube-fragment.html) for videos is converted to an `<amp-youtube>` tag
36+
- [Dailymotion embed code](https://github.com/Lullabot/amp-library/blob/master/tests/test-data/fragment-html/dailymotion-fragment.html) for videos is converted to an `<amp-dailymotion>` tag
37+
- [Pinterest embed code](https://github.com/Lullabot/amp-library/blob/master/tests/test-data/fragment-html/pinterest-fragment.html) for pins is converted to an `<amp-pinterest>` tag
38+
- [Soundcloud embed code](https://github.com/Lullabot/amp-library/blob/master/tests/test-data/fragment-html/soundcloud-fragment.html) for audio music is converted to an `<amp-soundcloud>` tag
39+
- [Vimeo embed code](https://github.com/Lullabot/amp-library/blob/master/tests/test-data/fragment-html/vimeo-fragment.html) for videos is converted to an `<amp-vimeo>` tag
40+
- [Vine embed code](https://github.com/Lullabot/amp-library/blob/master/tests/test-data/fragment-html/vine-fragment.html) for videos is converted to an `<amp-vine>` tag
41+
- Facebook [iframe](https://github.com/Lullabot/amp-library/blob/master/tests/test-data/fragment-html/facebook-iframe-fragment.html) and [Javascript SDK](https://github.com/Lullabot/amp-library/blob/master/tests/test-data/fragment-html/facebook-non-iframe-fragment.html) embed code for posts and videos is converted to an `<amp-facebook>` tag
42+
- _Notes_:
43+
- Some of these embed code conversions may not have the advanced features you may require. File an issue if you need enhancements to the functionality already provided or new embed code conversions
44+
- Some of the embed codes have an associated `<script>` tag. These conversions will work even if no `<script>` tag was added to your HTML document. The AMP library will add the appropriate AMP component `<script>` tag to the `<head>` if it is provided a full html document.
45+
- You may experiment with the command line utility `amp-console` on the above HTML fragments to see how the converted HTML looks
46+
- Provides both a console and programmatic interface with which to call the library. It works like this: the developer first provides some HTML. After processing it, the library returns:
47+
- The AMPized HTML
48+
- A list of validation errors in the HTML provided
49+
- A description of fixes and embed code conversions made to the HTML
50+
51+
### Use Cases
52+
53+
- Currently the AMP PHP Library is used by the [Drupal AMP Module](https://www.drupal.org/project/amp) to report issues with user entered, arbitrary HTML (originating from Rich Text Editors) and converting the HTML to AMPized HTML (as much as possible)
54+
- The AMP PHP Library command line validator can be used for experimentation and to do HTML to AMP HTML conversion of HTML files. While the [canonical validator](https://github.com/ampproject/amphtml/tree/master/validator) only validates, our library tries to make corrections too. As noted above, our validator is a subset of the canonical validator but already covers a lot of cases
55+
- The AMP PHP Library can be used in any other PHP project to "convert" HTML to AMP HTML and report validation issues. It does not have any non-PHP dependencies and will work in PHP 5.5 and higher. It will also work in recent versions of [HHVM](http://hhvm.com/).
56+
57+
### Setup
58+
59+
The project uses a [composer](https://getcomposer.org/) workflow. If you're not familiar with composer then please read up on it before trying to set this up.
60+
61+
Using this in Drupal requires some specific steps. Please refer to the [Drupal AMP Module](https://www.drupal.org/project/amp) documentation.
62+
63+
For all other scenarios, continue reading.
64+
65+
#### Setup for command line console
66+
67+
`git clone` this repo, `cd` into it and type in `$ composer install` at the command prompt to get all the dependencies of the library. Now you'll be able to use the command line AMP HTML converter `amp-console` (or equivalently `amp-console.php`).
68+
69+
##### Running phpunit tests
70+
71+
After doing a `$ composer install` for setting up the command line console, you can run some [phpunit](https://phpunit.de/) tests
72+
73+
```bash
74+
$ vendor/bin/phpunit tests
75+
```
76+
77+
##### Looking at test coverage
78+
79+
To see test coverage data, first ensure you have the xdebug extension installed in your PHP installation.
80+
81+
```bash
82+
$ php -m | grep xdebug # should output xdebug
83+
$ vendor/bin/phpunit tests --coverage-html=coverage-data
84+
$ cd coverage-data
85+
$ firefox index.html
86+
```
87+
88+
#### Setup for your composer based PHP project
89+
90+
To use this in your composer based PHP project, refer to [composer docs here](https://getcomposer.org/doc/05-repositories.md#loading-a-package-from-a-vcs-repository) to make changes to your `composer.json`
91+
92+
Or you can simply do `$ composer require lullabot/amp:"^1.0.0"` to fetch the library from [here](https://packagist.org/packages/lullabot/amp) and automatically update your `composer.json`
93+
94+
##### Advanced
95+
Should you wish to follow the bleeding edge you can do `$ composer require lullabot/amp:"dev-master"`. Note that this will create a `.git` folder in `vendor/lullabot/amp`. If you want to avoid that, do `$ composer require lullabot/amp:"dev-master" --prefer-dist`
96+
97+
### Using the command line `amp-console`
98+
99+
```bash
100+
$ cd <amp-php-library-repo-cloned-location>
101+
# Do this if you haven't already
102+
$ composer install
103+
$ ./amp-console amp:convert --help
104+
$ ./amp-console amp:convert <name-of-html-document> <options>
105+
```
106+
107+
Please note that the `--help` command line option is your friend. Use that when confused!
108+
109+
A few example HTML files are available in the `sample-html` folder for you to test drive so that you can get a flavor of the AMP PHP library.
110+
111+
```bash
112+
$ ./amp-console amp:convert sample-html/sample-html-fragment.html
113+
$ ./amp-console amp:convert sample-html/several_errors.html --full-document
114+
```
115+
Note that you need to provide `--full-document` if you're providing a full html document file for conversion.
116+
117+
Let's see the output of the first example command above. The first few lines are the AMPized HTML produced by our library. The rest of the headings are self-explanatory.
118+
119+
```html
120+
$ cd <amp-php-library-repo-cloned-location>
121+
$ ./amp-console amp:convert sample-html/sample-html-fragment.html
122+
Line 1: <p><a>Run</a></p>
123+
Line 2: <p><a href="http://www.cnn.com">CNN</a></p>
124+
Line 3: <amp-img src="http://i2.cdn.turner.com/cnnnext/dam/assets/160208081229-gaga-superbowl-exlarge-169.jpg" width="780" height="438" layout="responsive"></amp-img>
125+
Line 4: <p><a href="http://www.bbcnews.com" target="_blank">BBC</a></p>
126+
Line 5: <p></p>
127+
Line 6: <p>This is a <!-- test comment -->sample </p><div>sample</div> paragraph
128+
Line 7: <amp-iframe height="315" width="560" sandbox="allow-scripts allow-same-origin" layout="responsive" src="https://www.reddit.com"></amp-iframe>
129+
Line 8:
130+
Line 9:
131+
Line 10:
132+
133+
134+
ORIGINAL HTML
135+
---------------
136+
Line 1: <p><a style="color: red;" href="javascript:run();">Run</a></p>
137+
Line 2: <p><a style="margin: 2px;" href="http://www.cnn.com" target="_parent">CNN</a></p>
138+
Line 3: <img src="http://i2.cdn.turner.com/cnnnext/dam/assets/160208081229-gaga-superbowl-exlarge-169.jpg">
139+
Line 4: <p><a href="http://www.bbcnews.com" target="_blank">BBC</a></p>
140+
Line 5: <p><INPUT type="submit" value="submit"></p>
141+
Line 6: <p>This is a <!-- test comment -->sample <div onmouseover="hello();">sample</div> paragraph</p>
142+
Line 7: <iframe src="https://www.reddit.com"></iframe>
143+
Line 8: <script src="https://ajax.googleapis.com/ajax/libs/jquery/1.12.0/jquery.min.js"></script>
144+
Line 9: <style></style>
145+
Line 10:
146+
147+
148+
Transformations made from HTML tags to AMP custom tags
149+
-------------------------------------------------------
150+
151+
<img src="http://i2.cdn.turner.com/cnnnext/dam/assets/160208081229-gaga-superbowl-exlarge-169.jpg"> at line 3
152+
ACTION TAKEN: img tag was converted to the amp-img tag.
153+
154+
<iframe src="https://www.reddit.com"> at line 7
155+
ACTION TAKEN: iframe tag was converted to the amp-iframe tag.
156+
157+
158+
AMP-HTML Validation Issues and Fixes
159+
-------------------------------------
160+
FAIL
161+
162+
<a style="color: red;" href="javascript:run();"> on line 1
163+
- The attribute 'style' may not appear in tag 'a'.
164+
[code: DISALLOWED_ATTR category: DISALLOWED_HTML]
165+
ACTION TAKEN: a.style attribute was removed due to validation issues.
166+
- Invalid URL protocol 'javascript:' for attribute 'href' in tag 'a'.
167+
[code: INVALID_URL_PROTOCOL category: DISALLOWED_HTML]
168+
ACTION TAKEN: a.href attribute was removed due to validation issues.
169+
170+
<a style="margin: 2px;" href="http://www.cnn.com" target="_parent"> on line 2
171+
- The attribute 'style' may not appear in tag 'a'.
172+
[code: DISALLOWED_ATTR category: DISALLOWED_HTML]
173+
ACTION TAKEN: a.style attribute was removed due to validation issues.
174+
- The attribute 'target' in tag 'a' is set to the invalid value '_parent'.
175+
[code: INVALID_ATTR_VALUE category: DISALLOWED_HTML]
176+
ACTION TAKEN: a.target attribute was removed due to validation issues.
177+
178+
<input type="submit" value="submit"> on line 5
179+
- The tag 'input' is disallowed.
180+
[code: DISALLOWED_TAG category: DISALLOWED_HTML]
181+
ACTION TAKEN: input tag was removed due to validation issues.
182+
183+
<div onmouseover="hello();"> on line 6
184+
- The attribute 'onmouseover' may not appear in tag 'div'.
185+
[code: DISALLOWED_ATTR category: DISALLOWED_HTML]
186+
ACTION TAKEN: div.onmouseover attribute was removed due to validation issues.
187+
188+
<script src="https://ajax.googleapis.com/ajax/libs/jquery/1.12.0/jquery.min.js"> on line 8
189+
- The tag 'script' is disallowed except in specific forms.
190+
[code: GENERAL_DISALLOWED_TAG category: CUSTOM_JAVASCRIPT_DISALLOWED]
191+
ACTION TAKEN: script tag was removed due to validation issues.
192+
193+
<style> on line 9
194+
- The parent tag of tag 'style' is 'body', but it can only be 'head'.
195+
[code: WRONG_PARENT_TAG category: DISALLOWED_HTML see: https://www.ampproject.org/docs/reference/spec.html#required-markup]
196+
ACTION TAKEN: style tag was removed due to validation issues.
197+
```
198+
199+
### Using the library in a composer based PHP project
200+
201+
First, follow the setup steps above if you're using this in a composer based project.
202+
203+
Sample code to get started:
204+
205+
```php
206+
<?php
207+
208+
use Lullabot\AMP\AMP;
209+
use Lullabot\AMP\Validate\Scope;
210+
211+
// Create an AMP object
212+
$amp = new AMP();
213+
214+
// Notice this is a HTML fragment, i.e. anything that can appear below <body>
215+
$html =
216+
'<p><a href="javascript:run();">Run</a></p>' . PHP_EOL .
217+
'<p><a style="margin: 2px;" href="http://www.cnn.com" target="_parent">CNN</a></p>' . PHP_EOL .
218+
'<p><a href="http://www.bbcnews.com" target="_blank">BBC</a></p>' . PHP_EOL .
219+
'<p><INPUT type="submit" value="submit"></p>' . PHP_EOL .
220+
'<p>This is a <div onmouseover="hello();">sample</div> paragraph</p>';
221+
222+
// Load up the HTML into the AMP object
223+
// Note that we only support UTF-8 or ASCII string input and output. (UTF-8 is a superset of ASCII)
224+
$amp->loadHtml($html);
225+
226+
// If you're feeding it a complete document use the following line instead
227+
// $amp->loadHtml($html, ['scope' => Scope::HTML_SCOPE]);
228+
229+
// If you want some performance statistics (see https://github.com/Lullabot/amp-library/issues/24)
230+
// $amp->loadHtml($html, ['add_stats_html_comment' => true]);
231+
232+
// Convert to AMP HTML and store output in a variable
233+
$amp_html = $amp->convertToAmpHtml();
234+
235+
// Print AMP HTML
236+
print($amp_html);
237+
238+
// Print validation issues and fixes made to HTML provided in the $html string
239+
print($amp->warningsHumanText());
240+
241+
// warnings that have been passed through htmlspecialchars() function
242+
// print($amp->warningsHumanHtml());
243+
244+
// You can do the above steps all over again without having to create a fresh object
245+
// $amp->loadHtml($another_string)
246+
// ...
247+
// ...
248+
249+
```
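
The steps in the sample above (load, convert, collect warnings) can be wrapped in one small function. This is only a sketch: `ampize()` is a hypothetical helper name and is not part of the library. It assumes it is handed an object exposing the `loadHtml()`, `convertToAmpHtml()` and `warningsHumanText()` methods shown in the sample, and it passes the options array through only when one is supplied.

```php
<?php

/**
 * Hypothetical convenience wrapper (not provided by the library).
 *
 * @param object $amp     An object with loadHtml(), convertToAmpHtml() and
 *                        warningsHumanText(), e.g. a Lullabot\AMP\AMP instance.
 * @param string $html    UTF-8 HTML fragment or document.
 * @param array  $options Options forwarded to loadHtml(), e.g. a 'scope' entry.
 *
 * @return array ['html' => converted markup, 'warnings' => human readable report]
 */
function ampize($amp, $html, array $options = [])
{
    // Forward the options only when the caller supplied some.
    if ($options) {
        $amp->loadHtml($html, $options);
    } else {
        $amp->loadHtml($html);
    }

    // Convert first, then collect the report, in the same order as the sample.
    $ampHtml  = $amp->convertToAmpHtml();
    $warnings = $amp->warningsHumanText();

    return ['html' => $ampHtml, 'warnings' => $warnings];
}
```

Because the same object can be reused for several inputs, one instance can be created once and passed to this helper repeatedly.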
250+
251+
### Tips
252+
- It's probably not a good idea to run the library on your HTML dynamically on _every_ page view. You should try caching the results of `$amp->convertToAmpHtml()` once the library has run. If you're using the library from a CMS then you should consider using the caching facilities provided by the CMS.
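
A minimal sketch of the caching idea, assuming PHP 7 or later. `convertWithCache()` is a hypothetical name, and the plain array stands in for whatever cache backend your application or CMS provides. The conversion itself is passed in as a callable, so the sketch does not depend on any particular library call.

```php
<?php

/**
 * Return the cached conversion for $html, running $convert only on a miss.
 *
 * @param string   $html    Input HTML.
 * @param callable $convert Function that takes HTML and returns AMP HTML.
 * @param array    $cache   Illustrative in-memory cache, keyed by content hash.
 */
function convertWithCache(string $html, callable $convert, array &$cache): string
{
    // Key on the content so that edited HTML is converted again.
    $key = hash('sha256', $html);

    if (!isset($cache[$key])) {
        $cache[$key] = $convert($html);
    }

    return $cache[$key];
}
```

With the library, the callable would typically call `$amp->loadHtml($html)` and return `$amp->convertToAmpHtml()`. An in-memory array only lasts for one request, so a persistent store is needed to avoid reconverting on every page view.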
253+
254+
### Caveats and Known issues
255+
- We only support UTF-8 string input and output from the library. If you're using ASCII, then you don't need to worry as UTF-8 is a superset of ASCII. If you're using another encoding like Latin-1 (etc.) you'll need to convert to UTF-8 strings before you use this library
256+
- If you have `<img>`s with `https` urls _and_ they don't have height/width attributes _and_ you are using PHP 5.6 or higher _and_ you have not listed any certificate authorities (`cafile`) in your `php.ini` file _then_ the library may have problems converting these to `<amp-img>`. This is because of the OpenSSL changes described at http://php.net/manual/en/migration56.openssl.php . That page also describes a workaround.
257+
- If your `<amp-pinterest>` pins are appearing "chopped off" (after pinterest embed code conversion) try the workaround [here](https://github.com/Lullabot/amp-library/issues/46#issuecomment-230424580)
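
For the encoding caveat above, one possible way to normalize input before handing it to the library is sketched below. `ensureUtf8()` is a hypothetical helper, not part of the library. It assumes PHP 7 or later with the `iconv` extension available, and it assumes you know the source encoding of any input that is not already UTF-8 (Latin-1 is used as the default purely for illustration).

```php
<?php

/**
 * Return $html as UTF-8, converting from $sourceEncoding only when needed.
 *
 * @throws InvalidArgumentException if the input cannot be converted.
 */
function ensureUtf8(string $html, string $sourceEncoding = 'ISO-8859-1'): string
{
    // An empty pattern with the /u modifier matches only if the subject is valid UTF-8.
    // ASCII is valid UTF-8, so plain ASCII input is returned unchanged.
    if (preg_match('//u', $html) === 1) {
        return $html;
    }

    // Not valid UTF-8: treat the bytes as $sourceEncoding and convert them.
    $converted = iconv($sourceEncoding, 'UTF-8', $html);
    if ($converted === false) {
        throw new InvalidArgumentException('Could not convert input to UTF-8.');
    }

    return $converted;
}
```

The validity check cannot detect which legacy encoding the bytes are in, so the caller still has to supply the right `$sourceEncoding`.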
258+
259+
### Useful Links
260+
- [Composer homepage](https://packagist.org/packages/lullabot/amp) for the AMP PHP Library on [Packagist](https://packagist.org/), the PHP package repository
261+
- AMP Project [Homepage](https://www.ampproject.org/)
262+
- AMP Project [code repository](https://github.com/ampproject/amphtml) on Github
263+
- [AMP HTML JavaScript validator subtree](https://github.com/ampproject/amphtml/tree/master/validator) on Github within the AMP Project code repository
264+
- [Technical Specification](https://github.com/ampproject/amphtml/blob/master/validator/validator-main.protoascii) of AMP HTML in [Protocol Buffers](https://developers.google.com/protocol-buffers/) ASCII message format. See [here](https://github.com/ampproject/amphtml/blob/master/validator/validator.proto) for the Schema definition of the technical specification
265+
266+
### Useful Links for amp-library developers
267+
268+
- [Notes](https://github.com/Lullabot/amp-library/blob/master/src/Spec/README.md) on the contents of the `src/Spec` folder
269+
- [Notes](https://github.com/Lullabot/amp-library/blob/master/src/Validate/README.md) on the contents of the `src/Validate` folder
270+
271+
You can ignore these links if you simply plan to _use_ this library and not develop for it
272+
273+
### Third-party libraries
274+
275+
- Symfony:
276+
- [takeit/amp-html-bundle](https://github.com/takeit/AmpHtmlBundle)
277+
278+
- Drupal:
279+
- [Drupal AMP Module](https://www.drupal.org/project/amp)
280+
281+
### Sponsored by
282+
283+
- Google for creating the AMP Project and sponsoring development
284+
- Lullabot for development of the module, theme, and library to work with the specifications
5285
6-
This branch should not be merged with Master.
