  # he [![Build status](https://travis-ci.org/mathiasbynens/he.svg?branch=master)](https://travis-ci.org/mathiasbynens/he) [![Code coverage status](https://codecov.io/github/mathiasbynens/he/coverage.svg?branch=master)](https://codecov.io/github/mathiasbynens/he?branch=master) [![Dependency status](https://gemnasium.com/mathiasbynens/he.svg)](https://gemnasium.com/mathiasbynens/he)
  
  _he_ (for “HTML entities”) is a robust HTML entity encoder/decoder written in JavaScript. It supports [all standardized named character references as per HTML](https://html.spec.whatwg.org/multipage/syntax.html#named-character-references), handles [ambiguous ampersands](https://mathiasbynens.be/notes/ambiguous-ampersands) and other edge cases [just like a browser would](https://html.spec.whatwg.org/multipage/syntax.html#tokenizing-character-references), has an extensive test suite, and — contrary to many other JavaScript solutions — _he_ handles astral Unicode symbols just fine. [An online demo is available.](https://mothereff.in/html-entities)
  
  ## Installation
  
  Via [npm](https://www.npmjs.com/):
  
  ```bash
  npm install he
  ```
  
  Via [Bower](http://bower.io/):
  
  ```bash
  bower install he
  ```
  
  Via [Component](https://github.com/component/component):
  
  ```bash
  component install mathiasbynens/he
  ```
  
  In a browser:
  
  ```html
  <script src="he.js"></script>
  ```
  
  In [Node.js](https://nodejs.org/), [io.js](https://iojs.org/), [Narwhal](http://narwhaljs.org/), and [RingoJS](http://ringojs.org/):
  
  ```js
  var he = require('he');
  ```
  
  In [Rhino](http://www.mozilla.org/rhino/):
  
  ```js
  load('he.js');
  ```
  
  Using an AMD loader like [RequireJS](http://requirejs.org/):
  
  ```js
  require(
    {
      'paths': {
        'he': 'path/to/he'
      }
    },
    ['he'],
    function(he) {
      console.log(he);
    }
  );
  ```
  
  ## API
  
  ### `he.version`
  
  A string representing the semantic version number.
  
  ### `he.encode(text, options)`
  
  This function takes a string of text and encodes (by default) any symbols that aren’t printable ASCII symbols, as well as `&`, `<`, `>`, `"`, `'`, and `` ` ``, replacing them with character references.
  
  ```js
  he.encode('foo © bar ≠ baz 𝌆 qux');
  // → 'foo &#xA9; bar &#x2260; baz &#x1D306; qux'
  ```
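
To make the default behavior concrete, here is a minimal sketch of hexadecimal escaping for non-ASCII symbols. It is not _he_’s implementation: `hexEscapeNonAscii` is a hypothetical name, and the sketch leaves `&`, `<`, `>`, `"`, `'`, and `` ` `` untouched, which `he.encode` would escape. What it illustrates is that matching whole code points (the regex `u` flag plus `codePointAt`) keeps an astral symbol such as `𝌆` as a single reference.

```js
// Simplified sketch, not he's actual implementation.
// `hexEscapeNonAscii` is a hypothetical helper name.
function hexEscapeNonAscii(text) {
  // The `u` flag makes the character class match whole code points, so an
  // astral symbol is one match rather than two separate surrogate halves.
  return text.replace(/[^\x00-\x7F]/gu, function(symbol) {
    return '&#x' + symbol.codePointAt(0).toString(16).toUpperCase() + ';';
  });
}

console.log(hexEscapeNonAscii('foo © bar ≠ baz 𝌆 qux'));
// prints: foo &#xA9; bar &#x2260; baz &#x1D306; qux
```

Without the `u` flag, the same character class would match each surrogate half separately and produce two references for `𝌆`.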
  
  As long as the input string contains [allowed code points](https://html.spec.whatwg.org/multipage/parsing.html#preprocessing-the-input-stream) only, the return value of this function is always valid HTML. Any [(invalid) code points that cannot be represented using a character reference](https://html.spec.whatwg.org/multipage/syntax.html#table-charref-overrides) are left unencoded when they appear in the input:
  
  ```js
  he.encode('foo \0 bar');
  // → 'foo \0 bar'
  ```
  
  However, enabling [the `strict` option](https://github.com/mathiasbynens/he#strict) causes invalid code points to throw an exception. With `strict` enabled, `he.encode` either throws (if the input contains invalid code points) or returns a string of valid HTML.
  
  The `options` object is optional. It recognizes the following properties:
  
  #### `useNamedReferences`
  
  The default value for the `useNamedReferences` option is `false`. This means that `encode()` will not use any named character references (e.g. `&copy;`) in the output — hexadecimal escapes (e.g. `&#xA9;`) will be used instead. Set it to `true` to enable the use of named references.
  
  **Note that if compatibility with older browsers is a concern, this option should remain disabled.**
  
  ```js
  // Using the global default setting (defaults to `false`):
  he.encode('foo © bar ≠ baz 𝌆 qux');
  // → 'foo &#xA9; bar &#x2260; baz &#x1D306; qux'
  
  // Passing an `options` object to `encode`, to explicitly disallow named references:
  he.encode('foo © bar ≠ baz 𝌆 qux', {
    'useNamedReferences': false
  });
  // → 'foo &#xA9; bar &#x2260; baz &#x1D306; qux'
  
  // Passing an `options` object to `encode`, to explicitly allow named references:
  he.encode('foo © bar ≠ baz 𝌆 qux', {
    'useNamedReferences': true
  });
  // → 'foo &copy; bar &ne; baz &#x1D306; qux'
  ```
  
  #### `decimal`
  
  The default value for the `decimal` option is `false`. If the option is enabled, `encode` will generally use decimal escapes (e.g. `&#169;`) rather than hexadecimal escapes (e.g. `&#xA9;`). Apart from this replacement, the basic behavior remains the same when combined with other options. For example, if both `useNamedReferences` and `decimal` are enabled, named references (e.g. `&copy;`) take precedence over decimal escapes, and symbols without a named reference are encoded using decimal escapes.
  
  ```js
  // Using the global default setting (defaults to `false`):
  he.encode('foo © bar ≠ baz 𝌆 qux');
  // → 'foo &#xA9; bar &#x2260; baz &#x1D306; qux'
  
  // Passing an `options` object to `encode`, to explicitly disable decimal escapes:
  he.encode('foo © bar ≠ baz 𝌆 qux', {
    'decimal': false
  });
  // → 'foo &#xA9; bar &#x2260; baz &#x1D306; qux'
  
  // Passing an `options` object to `encode`, to explicitly enable decimal escapes:
  he.encode('foo © bar ≠ baz 𝌆 qux', {
    'decimal': true
  });
  // → 'foo &#169; bar &#8800; baz &#119558; qux'
  
  // Passing an `options` object to `encode`, to explicitly allow named references and decimal escapes:
  he.encode('foo © bar ≠ baz 𝌆 qux', {
    'useNamedReferences': true,
    'decimal': true
  });
  // → 'foo &copy; bar &ne; baz &#119558; qux'
  ```
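
The two numeric forms name the same code point in different bases. The following sketch uses only built-in string methods and does not call _he_; it derives both forms for `𝌆` (U+1D306), and the variable names are illustrative.

```js
// Both numeric references point at the same code point; only the base differs.
var codePoint = '𝌆'.codePointAt(0);

var decimalReference = '&#' + codePoint.toString(10) + ';';
var hexReference = '&#x' + codePoint.toString(16).toUpperCase() + ';';

console.log(decimalReference); // prints: &#119558;
console.log(hexReference);     // prints: &#x1D306;
```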
  
  #### `encodeEverything`
  
  The default value for the `encodeEverything` option is `false`. This means that `encode()` will not use any character references for printable ASCII symbols that don’t need escaping. Set it to `true` to encode every symbol in the input string. When set to `true`, this option takes precedence over `allowUnsafeSymbols` (i.e. setting the latter to `true` in such a case has no effect).
  
  ```js
  // Using the global default setting (defaults to `false`):
  he.encode('foo © bar ≠ baz 𝌆 qux');
  // → 'foo &#xA9; bar &#x2260; baz &#x1D306; qux'
  
  // Passing an `options` object to `encode`, to explicitly encode all symbols:
  he.encode('foo © bar ≠ baz 𝌆 qux', {
    'encodeEverything': true
  });
  // → '&#x66;&#x6F;&#x6F;&#x20;&#xA9;&#x20;&#x62;&#x61;&#x72;&#x20;&#x2260;&#x20;&#x62;&#x61;&#x7A;&#x20;&#x1D306;&#x20;&#x71;&#x75;&#x78;'
  
  // This setting can be combined with the `useNamedReferences` option:
  he.encode('foo © bar ≠ baz 𝌆 qux', {
    'encodeEverything': true,
    'useNamedReferences': true
  });
  // → '&#x66;&#x6F;&#x6F;&#x20;&copy;&#x20;&#x62;&#x61;&#x72;&#x20;&ne;&#x20;&#x62;&#x61;&#x7A;&#x20;&#x1D306;&#x20;&#x71;&#x75;&#x78;'
  ```
  
  #### `strict`
  
  The default value for the `strict` option is `false`. This means that `encode()` will encode any HTML text content you feed it, even if it contains any symbols that cause [parse errors](https://html.spec.whatwg.org/multipage/parsing.html#preprocessing-the-input-stream). To throw an error when such invalid HTML is encountered, set the `strict` option to `true`. This option makes it possible to use _he_ as part of HTML parsers and HTML validators.
  
  ```js
  // Using the global default setting (defaults to `false`, i.e. error-tolerant mode):
  he.encode('\x01');
  // → '&#x1;'
  
  // Passing an `options` object to `encode`, to explicitly enable error-tolerant mode:
  he.encode('\x01', {
    'strict': false
  });
  // → '&#x1;'
  
  // Passing an `options` object to `encode`, to explicitly enable strict mode:
  he.encode('\x01', {
    'strict': true
  });
  // → Parse error
  ```
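
When strict mode guards untrusted input, the thrown error usually needs handling. A minimal sketch, assuming the encoder is passed in as an argument so the helper stays independent of how _he_ is loaded; `tryStrictEncode` and the shape of its return value are hypothetical, not part of _he_’s API:

```js
// Hypothetical helper, not part of he. `encode` is expected to behave like
// `he.encode`: return a string, or throw when `strict` rejects the input.
function tryStrictEncode(encode, text) {
  try {
    return { ok: true, html: encode(text, { 'strict': true }) };
  } catch (error) {
    return { ok: false, message: error.message };
  }
}
```

With _he_ loaded, `tryStrictEncode(he.encode, '\x01')` would take the `catch` branch, since strict mode throws for that input as shown above; the exact error message is up to _he_.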
  
  #### `allowUnsafeSymbols`
  
  The default value for the `allowUnsafeSymbols` option is `false`. This means that characters that are unsafe for use in HTML content (`&`, `<`, `>`, `"`, `'`, and `` ` ``) will be encoded. When set to `true`, only non-ASCII characters will be encoded. If the `encodeEverything` option is set to `true`, this option will be ignored.
  
  ```js
  he.encode('foo © and & ampersand', {
    'allowUnsafeSymbols': true
  });
  // → 'foo &#xA9; and & ampersand'
  ```
  
  #### Overriding default `encode` options globally
  
  The global default setting can be overridden by modifying the `he.encode.options` object. This saves you from passing in an `options` object for every call to `encode` if you want to use the non-default setting.
  
  ```js
  // Read the global default setting:
  he.encode.options.useNamedReferences;
  // → `false` by default
  
  // Override the global default setting:
  he.encode.options.useNamedReferences = true;
  
  // Using the global default setting, which is now `true`:
  he.encode('foo © bar ≠ baz 𝌆 qux');
  // → 'foo &copy; bar &ne; baz &#x1D306; qux'
  ```
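
Because `he.encode.options` is shared by every caller, a changed default stays in effect until it is reset. One way to limit that, sketched here under the assumption that only a temporary override is wanted, is to restore the previous value afterwards. `withOption` is a hypothetical helper that works on any plain options object:

```js
// Hypothetical helper, not part of he: set `options[name]` for the duration
// of `callback`, then restore the previous value even if `callback` throws.
function withOption(options, name, value, callback) {
  var previous = options[name];
  options[name] = value;
  try {
    return callback();
  } finally {
    options[name] = previous;
  }
}
```

For example, `withOption(he.encode.options, 'useNamedReferences', true, function() { return he.encode('©'); })` would encode with named references enabled and then put the global default back to whatever it was before the call.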
  
  ### `he.decode(html, options)`
  
  This function takes a string of HTML and decodes any named and numerical character references in it using [the algorithm described in section 12.2.4.69 of the HTML spec](https://html.spec.whatwg.org/multipage/syntax.html#tokenizing-character-references).
  
  ```js
  he.decode('foo &copy; bar &ne; baz &#x1D306; qux');
  // → 'foo © bar ≠ baz 𝌆 qux'
  ```
  
  The `options` object is optional. It recognizes the following properties:
  
  #### `isAttributeValue`
  
  The default value for the `isAttributeValue` option is `false`. This means that `decode()` will decode the string as if it were used in [a text context in an HTML document](https://html.spec.whatwg.org/multipage/syntax.html#data-state). HTML has different rules for [parsing character references in attribute values](https://html.spec.whatwg.org/multipage/syntax.html#character-reference-in-attribute-value-state) — set this option to `true` to treat the input string as if it were used as an attribute value.
  
  ```js
  // Using the global default setting (defaults to `false`, i.e. HTML text context):
  he.decode('foo&ampbar');
  // → 'foo&bar'
  
  // Passing an `options` object to `decode`, to explicitly assume an HTML text context:
  he.decode('foo&ampbar', {
    'isAttributeValue': false
  });
  // → 'foo&bar'
  
  // Passing an `options` object to `decode`, to explicitly assume an HTML attribute value context:
  he.decode('foo&ampbar', {
    'isAttributeValue': true
  });
  // → 'foo&ampbar'
  ```
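
The difference comes from a rule in the HTML spec: inside an attribute value, a named reference that lacks its closing `;` and is followed by `=` or an ASCII alphanumeric is left as literal text. A minimal sketch of that rule for `&amp` alone (it is not _he_’s implementation, handles no other reference, and `decodeAmp` is a hypothetical name):

```js
// Simplified sketch covering only `&amp` / `&amp;`; not he's implementation.
function decodeAmp(html, isAttributeValue) {
  return html.replace(/&amp;?/g, function(match, offset) {
    var terminated = match.charAt(match.length - 1) === ';';
    var next = html.charAt(offset + match.length);
    // Attribute values keep an unterminated reference as-is when the next
    // character is `=` or an ASCII alphanumeric.
    if (isAttributeValue && !terminated && /[=a-zA-Z0-9]/.test(next)) {
      return match;
    }
    return '&';
  });
}

console.log(decodeAmp('foo&ampbar', false)); // prints: foo&bar
console.log(decodeAmp('foo&ampbar', true));  // prints: foo&ampbar
```

The rule exists so that attribute values such as query strings, where `&` is commonly followed by a parameter name, are not silently rewritten.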
  
  #### `strict`
  
  The default value for the `strict` option is `false`. This means that `decode()` will decode any HTML text content you feed it, even if it contains any entities that cause [parse errors](https://html.spec.whatwg.org/multipage/syntax.html#tokenizing-character-references). To throw an error when such invalid HTML is encountered, set the `strict` option to `true`. This option makes it possible to use _he_ as part of HTML parsers and HTML validators.
  
  ```js
  // Using the global default setting (defaults to `false`, i.e. error-tolerant mode):
  he.decode('foo&ampbar');
  // → 'foo&bar'
  
  // Passing an `options` object to `decode`, to explicitly enable error-tolerant mode:
  he.decode('foo&ampbar', {
    'strict': false
  });
  // → 'foo&bar'
  
  // Passing an `options` object to `decode`, to explicitly enable strict mode:
  he.decode('foo&ampbar', {
    'strict': true
  });
  // → Parse error
  ```
  
  #### Overriding default `decode` options globally
  
  The global default settings for the `decode` function can be overridden by modifying the `he.decode.options` object. This saves you from passing in an `options` object for every call to `decode` if you want to use a non-default setting.
  
  ```js
  // Read the global default setting:
  he.decode.options.isAttributeValue;
  // → `false` by default
  
  // Override the global default setting:
  he.decode.options.isAttributeValue = true;
  
  // Using the global default setting, which is now `true`:
  he.decode('foo&ampbar');
  // → 'foo&ampbar'
  ```
  
  ### `he.escape(text)`
  
  This function takes a string of text and escapes it for use in text contexts in XML or HTML documents. Only the following characters are escaped: `&`, `<`, `>`, `"`, `'`, and `` ` ``.
  
  ```js
  he.escape('<img src=\'x\' onerror="prompt(1)">');
  // → '&lt;img src=&#x27;x&#x27; onerror=&quot;prompt(1)&quot;&gt;'
  ```
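
The mapping is small enough to sketch in full. The version below is an illustration rather than _he_’s source: `escapeHtml` and `ESCAPES` are hypothetical names, and the references chosen for `&` and `` ` `` are assumptions, since the example above only shows the output for `<`, `>`, `'`, and `"`.

```js
// Illustrative sketch, not he's source code.
var ESCAPES = {
  '&': '&amp;',  // assumed; not shown in the example output above
  '<': '&lt;',
  '>': '&gt;',
  '"': '&quot;',
  '\'': '&#x27;',
  '`': '&#x60;'  // assumed; any valid reference for U+0060 would do
};

function escapeHtml(text) {
  // A single pass over the input, so an `&` produced by one replacement
  // is never escaped a second time.
  return text.replace(/[&<>"'`]/g, function(symbol) {
    return ESCAPES[symbol];
  });
}

console.log(escapeHtml('<img src=\'x\' onerror="prompt(1)">'));
// prints: &lt;img src=&#x27;x&#x27; onerror=&quot;prompt(1)&quot;&gt;
```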
  
  ### `he.unescape(html, options)`
  
  `he.unescape` is an alias for `he.decode`. It takes a string of HTML and decodes any named and numerical character references in it.
  
  ### Using the `he` binary
  
  To use the `he` binary in your shell, simply install _he_ globally using npm:
  
  ```bash
  npm install -g he
  ```
  
  After that you will be able to encode/decode HTML entities from the command line:
  
  ```bash
  $ he --encode 'föo ♥ bår 𝌆 baz'
  f&#xF6;o &#x2665; b&#xE5;r &#x1D306; baz
  
  $ he --encode --use-named-refs 'föo ♥ bår 𝌆 baz'
  f&ouml;o &hearts; b&aring;r &#x1D306; baz
  
  $ he --decode 'f&ouml;o &hearts; b&aring;r &#x1D306; baz'
  föo ♥ bår 𝌆 baz
  ```
  
  Read a local text file, encode it for use in an HTML text context, and save the result to a new file:
  
  ```bash
  $ he --encode < foo.txt > foo-escaped.html
  ```
  
  Or do the same with an online text file:
  
  ```bash
  $ curl -sL "http://git.io/HnfEaw" | he --encode > escaped.html
  ```
  
  Or, the opposite — read a local file containing a snippet of HTML in a text context, decode it back to plain text, and save the result to a new file:
  
  ```bash
  $ he --decode < foo-escaped.html > foo.txt
  ```
  
  Or do the same with an online HTML snippet:
  
  ```bash
  $ curl -sL "http://git.io/HnfEaw" | he --decode > decoded.txt
  ```
  
  See `he --help` for the full list of options.
  
  ## Support
  
  _he_ has been tested in at least:
  
  * Chrome 27–50
  * Firefox 3–45
  * Safari 4–9
  * Opera 10–12, 15–37
  * IE 6–11
  * Edge
  * Narwhal 0.3.2
  * Node.js v0.10, v0.12, v4, v5
  * PhantomJS 1.9.0
  * Rhino 1.7RC4
  * RingoJS 0.8–0.11
  
  ## Unit tests & code coverage
  
  After cloning this repository, run `npm install` to install the dependencies needed for _he_ development and testing. You may want to install Istanbul _globally_ using `npm install istanbul -g`.
  
  Once that’s done, you can run the unit tests in Node using `npm test` or `node tests/tests.js`. To run the tests in Rhino, Ringo, Narwhal, and web browsers as well, use `grunt test`.
  
  To generate the code coverage report, use `grunt cover`.
  
  ## Acknowledgements
  
  Thanks to [Simon Pieters](https://simon.html5.org/) ([@zcorpan](https://twitter.com/zcorpan)) for the many suggestions.
  
  ## Author
  
  | [![twitter/mathias](https://gravatar.com/avatar/24e08a9ea84deb17ae121074d0f17125?s=70)](https://twitter.com/mathias "Follow @mathias on Twitter") |
  |---|
  | [Mathias Bynens](https://mathiasbynens.be/) |
  
  ## License
  
  _he_ is available under the [MIT](https://mths.be/mit) license.