You cannot select more than 25 topics Topics must start with a letter or number, can include dashes ('-') and can be up to 35 characters long.

112 lines
3.7 KiB
Markdown

text-encoding
==============
This is a polyfill for the [Encoding Living
Standard](https://encoding.spec.whatwg.org/) API for the Web, allowing
encoding and decoding of textual data to and from Typed Array buffers
for binary data in JavaScript.
By default it adheres to the spec and does not support *encoding* to
legacy encodings, only *decoding*. It is also implemented to match the
specification's algorithms, rather than for performance. The intended
use is within Web pages, so it has no dependency on server frameworks
or particular module schemes.
Basic examples and tests are included.
### Install ###
There are a few ways you can get and use the `text-encoding` library.
### HTML Page Usage ###
Clone the repo and include the files directly:
```html
<!-- Required for non-UTF encodings -->
<script src="encoding-indexes.js"></script>
<script src="encoding.js"></script>
```
This is the only use case the developer cares about. If you want those
fancy module and/or package manager things that are popular these days
you should probably use a different library.
#### Package Managers ####
The package is published to **npm** and **bower** as `text-encoding`.
Use through these is not really supported, since they aren't used by
the developer of the library. Using `require()` in interesting ways
probably breaks. Patches welcome, as long as they don't break the
basic use of the files via `<script>`.
### API Overview ###
Basic Usage
```js
var uint8array = new TextEncoder().encode(string);
var string = new TextDecoder(encoding).decode(uint8array);
```
Streaming Decode
```js
var string = "", decoder = new TextDecoder(encoding), buffer;
while (buffer = next_chunk()) {
string += decoder.decode(buffer, {stream:true});
}
string += decoder.decode(); // finish the stream
```
### Encodings ###
All encodings from the Encoding specification are supported:
utf-8 ibm866 iso-8859-2 iso-8859-3 iso-8859-4 iso-8859-5 iso-8859-6
iso-8859-7 iso-8859-8 iso-8859-8-i iso-8859-10 iso-8859-13 iso-8859-14
iso-8859-15 iso-8859-16 koi8-r koi8-u macintosh windows-874
windows-1250 windows-1251 windows-1252 windows-1253 windows-1254
windows-1255 windows-1256 windows-1257 windows-1258 x-mac-cyrillic
gb18030 hz-gb-2312 big5 euc-jp iso-2022-jp shift_jis euc-kr
replacement utf-16be utf-16le x-user-defined
(Some encodings may be supported under other names, e.g. ascii,
iso-8859-1, etc. See [Encoding](https://encoding.spec.whatwg.org/) for
additional labels for each encoding.)
Encodings other than **utf-8**, **utf-16le** and **utf-16be** require
an additional `encoding-indexes.js` file to be included. It is rather
large (596kB uncompressed, 188kB gzipped); portions may be deleted if
support for some encodings is not required.
### Non-Standard Behavior ###
As required by the specification, only encoding to **utf-8** is
supported. If you want to try it out, you can force a non-standard
behavior by passing the `NONSTANDARD_allowLegacyEncoding` option to
TextEncoder and a label. For example:
```js
var uint8array = new TextEncoder(
'windows-1252', { NONSTANDARD_allowLegacyEncoding: true }).encode(text);
```
But note that the above won't work if you're using the polyfill in a
browser that natively supports the TextEncoder API natively, since the
polyfill won't be used!
You can force the polyfill to be used by using this before the polyfill:
```html
<script>
window.TextEncoder = window.TextDecoder = null;
</script>
```
To support the legacy encodings (which may be stateful), the
TextEncoder `encode()` method accepts an optional dictionary and
`stream` option, e.g. `encoder.encode(string, {stream: true});` This
is not needed for standard encoding since the input is always in
complete code points.