# BFJ
Big-Friendly JSON. Asynchronous streaming functions for large JSON data sets.
* [Why would I want those?](#why-would-i-want-those)
* [Is it fast?](#is-it-fast)
* [What functions does it implement?](#what-functions-does-it-implement)
* [How do I install it?](#how-do-i-install-it)
* [How do I read a JSON file?](#how-do-i-read-a-json-file)
* [How do I write a JSON file?](#how-do-i-write-a-json-file)
* [How do I parse a stream of JSON?](#how-do-i-parse-a-stream-of-json)
* [How do I create a JSON string?](#how-do-i-create-a-json-string)
* [How do I create a stream of JSON?](#how-do-i-create-a-stream-of-json)
* [What other methods are there?](#what-other-methods-are-there)
  * [bfj.walk (stream, options)](#bfjwalk-stream-options)
  * [bfj.eventify (data, options)](#bfjeventify-data-options)
* [What options can I specify?](#what-options-can-i-specify)
  * [Options for parsing functions](#options-for-parsing-functions)
  * [Options for serialisation functions](#options-for-serialisation-functions)
* [Is it possible to pause parsing or serialisation from calling code?](#is-it-possible-to-pause-parsing-or-serialisation-from-calling-code)
* [Can it handle newline-delimited JSON (NDJSON)?](#can-it-handle-newline-delimited-json-ndjson)
* [Why does it default to bluebird promises?](#why-does-it-default-to-bluebird-promises)
* [Can I specify a different promise implementation?](#can-i-specify-a-different-promise-implementation)
* [Is there a change log?](#is-there-a-change-log)
* [How do I set up the dev environment?](#how-do-i-set-up-the-dev-environment)
* [What versions of Node.js does it support?](#what-versions-of-nodejs-does-it-support)
* [What license is it released under?](#what-license-is-it-released-under)
## Why would I want those?
If you need
to parse huge JSON strings
or stringify huge JavaScript data sets,
doing so synchronously
monopolises the event loop
and can lead to out-of-memory exceptions.
BFJ implements asynchronous functions
and uses pre-allocated fixed-length arrays
to try and alleviate those issues.
## Is it fast?
No.
BFJ yields frequently
to avoid monopolising the event loop,
interrupting its own execution
to let other event handlers run.
The frequency of those yields
can be controlled with the [`yieldRate` option](#what-options-can-i-specify),
but fundamentally it is not designed for speed.
Furthermore,
when serialising data to a stream,
BFJ uses a fixed-length buffer
to avoid exhausting available memory.
Whenever that buffer is full,
serialisation is paused
until the receiving stream processes some more data,
regardless of the value of `yieldRate`.
You can control the size of the buffer
using the [`bufferLength` option](#options-for-serialisation-functions)
but really,
if you need quick results,
BFJ is not for you.
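
The trade-off behind `yieldRate` can be sketched in plain Node.js. The helper below is only an illustration of the idea, not BFJ's implementation: `processWithYield` and its parameters are hypothetical names, and the default of `16384` simply mirrors the documented default.

```js
// Illustrative only: this is not BFJ's implementation.
// Processes `items` in batches and yields to the event loop after every
// `yieldRate` items, so other handlers get a turn between batches.
function processWithYield (items, handleItem, yieldRate = 16384) {
  return new Promise((resolve, reject) => {
    let index = 0;

    function runBatch () {
      try {
        const batchEnd = Math.min(index + yieldRate, items.length);
        while (index < batchEnd) {
          handleItem(items[index], index);
          index += 1;
        }
      } catch (error) {
        reject(error);
        return;
      }

      if (index < items.length) {
        // Schedule the next batch for a later turn of the event loop.
        setImmediate(runBatch);
      } else {
        // Resolve with the number of items processed.
        resolve(index);
      }
    }

    runBatch();
  });
}

// A small yieldRate means more yields and a longer total running time.
processWithYield([1, 2, 3, 4, 5], item => console.log(item), 2)
  .then(count => console.log(`processed ${count} items`));
```

A smaller `yieldRate` keeps each batch short, at the cost of more scheduling overhead overall, which is the same trade-off described above.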
## What functions does it implement?
Eight functions
are exported.
Four are
concerned with
parsing, or
turning JSON strings
into JavaScript data:
* [`read`](#how-do-i-read-a-json-file)
asynchronously parses
a JSON file from disk.
* [`parse` and `unpipe`](#how-do-i-parse-a-stream-of-json)
are for asynchronously parsing
streams of JSON.
* [`walk`](#bfjwalk-stream-options)
asynchronously walks
a stream,
emitting events
as it encounters
JSON tokens.
Analogous to a
[SAX parser][sax].
The other four functions
handle the reverse transformations,
serialising
JavaScript data
to JSON:
* [`write`](#how-do-i-write-a-json-file)
asynchronously serialises data
to a JSON file on disk.
* [`stringify`](#how-do-i-create-a-json-string)
asynchronously serialises data
to a JSON string.
* [`streamify`](#how-do-i-create-a-stream-of-json)
asynchronously serialises data
to a stream of JSON.
* [`eventify`](#bfjeventify-data-options)
asynchronously traverses
a data structure
depth-first,
emitting events
as it encounters items.
By default
it coerces
promises, buffers and iterables
to JSON-friendly values.
## How do I install it?
If you're using npm:
```
npm i bfj --save
```
Or if you just want
the git repo:
```
git clone git@github.com:philbooth/bfj.git
```
## How do I read a JSON file?
```js
const bfj = require('bfj');

bfj.read(path, options)
  .then(data => {
    // :)
  })
  .catch(error => {
    // :(
  });
```
`read` returns a [bluebird promise][promise] and
asynchronously parses
a JSON file
from disk.
It takes two arguments:
the path to the JSON file
and an [options](#options-for-parsing-functions) object.
If there are
no syntax errors,
the returned promise is resolved
with the parsed data.
If syntax errors occur,
the promise is rejected
with the first error.
## How do I write a JSON file?
```js
const bfj = require('bfj');

bfj.write(path, data, options)
  .then(() => {
    // :)
  })
  .catch(error => {
    // :(
  });
```
`write` returns a [bluebird promise][promise]
and asynchronously serialises a data structure
to a JSON file on disk.
The promise is resolved
when the file has been written,
or rejected with the error
if writing failed.
It takes three arguments:
the path to the JSON file,
the data structure to serialise
and an [options](#options-for-serialisation-functions) object.
## How do I parse a stream of JSON?
```js
const fs = require('fs');
const bfj = require('bfj');

// By passing a readable stream to bfj.parse():
bfj.parse(fs.createReadStream(path), options)
  .then(data => {
    // :)
  })
  .catch(error => {
    // :(
  });

// ...or by passing the result from bfj.unpipe() to stream.pipe()
// (`request` stands for any call that returns a readable stream):
request({ url }).pipe(bfj.unpipe((error, data) => {
  if (error) {
    // :(
  } else {
    // :)
  }
}));
```
* `parse` returns a [bluebird promise][promise]
and asynchronously parses
a stream of JSON data.
It takes two arguments:
a [readable stream][readable]
from which
the JSON
will be parsed
and an [options](#options-for-parsing-functions) object.
If there are
no syntax errors,
the returned promise is resolved
with the parsed data.
If syntax errors occur,
the promise is rejected
with the first error.
* `unpipe` returns a [writable stream][writable]
that can be passed to [`stream.pipe`][pipe],
then parses JSON data
read from the stream.
It takes two arguments:
a callback function
that will be called
after parsing is complete
and an [options](#options-for-parsing-functions) object.
If there are no errors,
the callback is invoked
with the result as the second argument.
If errors occur,
the first error is passed
to the callback
as the first argument.
## How do I create a JSON string?
```js
const bfj = require('bfj');

bfj.stringify(data, options)
  .then(json => {
    // :)
  })
  .catch(error => {
    // :(
  });
```
`stringify` returns a [bluebird promise][promise] and
asynchronously serialises a data structure
to a JSON string.
The promise is resolved
to the JSON string
when serialisation is complete.
It takes two arguments:
the data structure to serialise
and an [options](#options-for-serialisation-functions) object.
## How do I create a stream of JSON?
```js
const bfj = require('bfj');

const stream = bfj.streamify(data, options);

// Get data out of the stream with event handlers
stream.on('data', chunk => { /* ... */ });
stream.on('end', () => { /* ... */ });
stream.on('dataError', () => { /* ... */ });

// ...or you can pipe it to another stream
stream.pipe(someOtherStream);
```
`streamify` returns a [readable stream][readable]
and asynchronously serialises
a data structure to JSON,
pushing the result
to the returned stream.
It takes two arguments:
the data structure to serialise
and an [options](#options-for-serialisation-functions) object.
## What other methods are there?
### bfj.walk (stream, options)
```js
const fs = require('fs');
const bfj = require('bfj');

const emitter = bfj.walk(fs.createReadStream(path), options);

emitter.on(bfj.events.array, () => { /* ... */ });
emitter.on(bfj.events.object, () => { /* ... */ });
emitter.on(bfj.events.property, name => { /* ... */ });
emitter.on(bfj.events.string, value => { /* ... */ });
emitter.on(bfj.events.number, value => { /* ... */ });
emitter.on(bfj.events.literal, value => { /* ... */ });
emitter.on(bfj.events.endArray, () => { /* ... */ });
emitter.on(bfj.events.endObject, () => { /* ... */ });
emitter.on(bfj.events.error, error => { /* ... */ });
emitter.on(bfj.events.end, () => { /* ... */ });
```
`walk` returns an [event emitter][eventemitter]
and asynchronously walks
a stream of JSON data,
emitting events
as it encounters
tokens.
It takes two arguments:
a [readable stream][readable]
from which
the JSON
will be read
and an [options](#options-for-parsing-functions) object.
The emitted events
are defined
as public properties
of an object,
`bfj.events`:
* `bfj.events.array`
indicates that
an array context
has been entered
by encountering
the `[` character.
* `bfj.events.endArray`
indicates that
an array context
has been left
by encountering
the `]` character.
* `bfj.events.object`
indicates that
an object context
has been entered
by encountering
the `{` character.
* `bfj.events.endObject`
indicates that
an object context
has been left
by encountering
the `}` character.
* `bfj.events.property`
indicates that
a property
has been encountered
in an object.
The listener
will be passed
the name of the property
as its argument
and the next event
to be emitted
will represent
the property's value.
* `bfj.events.string`
indicates that
a string
has been encountered.
The listener
will be passed
the value
as its argument.
* `bfj.events.number`
indicates that
a number
has been encountered.
The listener
will be passed
the value
as its argument.
* `bfj.events.literal`
indicates that
a JSON literal
(either `true`, `false` or `null`)
has been encountered.
The listener
will be passed
the value
as its argument.
* `bfj.events.error`
indicates that
an error has occurred.
The error may be due to
invalid syntax on the incoming stream
or caught from one of the event handlers
in user code.
The listener
will be passed
the `Error` instance
as its argument.
* `bfj.events.end`
indicates that
the end of the input
has been reached
and the stream is closed.
* `bfj.events.endLine`
indicates that a root-level newline character
has been encountered in an [NDJSON](#can-it-handle-newline-delimited-json-ndjson) stream.
Only emitted if the `ndjson` [option](#options-for-parsing-functions) is set.
If you are using `bfj.walk`
to sequentially parse items in an array,
you might also be interested in
the [bfj-collections] module.
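
As a rough sketch of how the events listed above fit together, the hypothetical helper below rebuilds a value from them. It assumes `emitter` behaves like the object returned by `bfj.walk` and that `events` has the same properties as `bfj.events`; both are passed in, and the demonstration uses a stand-in emitter with placeholder event names rather than BFJ itself.

```js
const EventEmitter = require('events');

// Hypothetical helper: collects walk-style events into a single value.
function collect (emitter, events) {
  return new Promise((resolve, reject) => {
    const stack = []; // containers that are currently open
    const keys = [];  // pending property name for each open container
    let root;

    function addValue (value) {
      if (stack.length === 0) {
        root = value;
        return;
      }
      const parent = stack[stack.length - 1];
      if (Array.isArray(parent)) {
        parent.push(value);
      } else {
        parent[keys[keys.length - 1]] = value;
      }
    }

    function open (container) {
      addValue(container);
      stack.push(container);
      keys.push(undefined);
    }

    function close () {
      stack.pop();
      keys.pop();
    }

    emitter.on(events.array, () => open([]));
    emitter.on(events.object, () => open({}));
    // A property event names the value carried by the next event.
    emitter.on(events.property, name => { keys[keys.length - 1] = name; });
    emitter.on(events.string, addValue);
    emitter.on(events.number, addValue);
    emitter.on(events.literal, addValue);
    emitter.on(events.endArray, close);
    emitter.on(events.endObject, close);
    emitter.on(events.error, reject);
    emitter.on(events.end, () => resolve(root));
  });
}

// Stand-in emitter and placeholder event names, for demonstration only.
const names = {
  array: 'array', endArray: 'endArray', object: 'object', endObject: 'endObject',
  property: 'property', string: 'string', number: 'number', literal: 'literal',
  error: 'failure', end: 'end'
};
const standIn = new EventEmitter();
collect(standIn, names).then(value => console.log(JSON.stringify(value)));

standIn.emit(names.object);
standIn.emit(names.property, 'tags');
standIn.emit(names.array);
standIn.emit(names.string, 'a');
standIn.emit(names.number, 1);
standIn.emit(names.endArray);
standIn.emit(names.endObject);
standIn.emit(names.end);
```

With BFJ installed, the equivalent call would presumably be `collect(bfj.walk(stream, options), bfj.events)`, although for that use case `bfj.parse` already does the same job.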
### bfj.eventify (data, options)
```js
const bfj = require('bfj');
const emitter = bfj.eventify(data, options);
emitter.on(bfj.events.array, () => { /* ... */ });
emitter.on(bfj.events.object, () => { /* ... */ });
emitter.on(bfj.events.property, name => { /* ... */ });
emitter.on(bfj.events.string, value => { /* ... */ });
emitter.on(bfj.events.number, value => { /* ... */ });
emitter.on(bfj.events.literal, value => { /* ... */ });
emitter.on(bfj.events.endArray, () => { /* ... */ });
emitter.on(bfj.events.endObject, () => { /* ... */ });
emitter.on(bfj.events.error, () => { /* ... */ });
emitter.on(bfj.events.end, () => { /* ... */ });
```
`eventify` returns an [event emitter][eventemitter]
and asynchronously traverses
a data structure depth-first,
emitting events as it
encounters items.
By default it coerces
promises, buffers and iterables
to JSON-friendly values.
It takes two arguments:
the data structure to traverse
and an [options](#options-for-serialisation-functions) object.
The emitted events
are defined
as public properties
of an object,
`bfj.events`:
* `bfj.events.array`
indicates that
an array
has been encountered.
* `bfj.events.endArray`
indicates that
the end of an array
has been encountered.
* `bfj.events.object`
indicates that
an object
has been encountered.
* `bfj.events.endObject`
indicates that
the end of an object
has been encountered.
* `bfj.events.property`
indicates that
a property
has been encountered
in an object.
The listener
will be passed
the name of the property
as its argument
and the next event
to be emitted
will represent
the property's value.
* `bfj.events.string`
indicates that
a string
has been encountered.
The listener
will be passed
the value
as its argument.
* `bfj.events.number`
indicates that
a number
has been encountered.
The listener
will be passed
the value
as its argument.
* `bfj.events.literal`
indicates that
a JSON literal
(either `true`, `false` or `null`)
has been encountered.
The listener
will be passed
the value
as its argument.
* `bfj.events.error`
indicates that
an error has occurred.
The error may be due to
a circular reference
encountered in the data
or caught from one of the event handlers
in user code.
The listener
will be passed
the `Error` instance
as its argument.
* `bfj.events.end`
indicates that
the end of the data
has been reached and
no further events
will be emitted.
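
To make the depth-first ordering concrete, here is a small synchronous sketch that lists the order in which events would be expected for a given value. It is an illustration under assumptions: `eventOrder` is a hypothetical helper, the labels are placeholders rather than the actual values of `bfj.events`, and coercion of promises, buffers and iterables is left out.

```js
// Synchronous illustration of event order; bfj.eventify itself is asynchronous.
function describeEvents (data, out) {
  if (Array.isArray(data)) {
    out.push('array');
    data.forEach(item => describeEvents(item, out));
    out.push('endArray');
  } else if (data !== null && typeof data === 'object') {
    out.push('object');
    Object.keys(data).forEach(key => {
      // Each property event is followed by the event(s) for its value.
      out.push(`property ${key}`);
      describeEvents(data[key], out);
    });
    out.push('endObject');
  } else if (typeof data === 'string') {
    out.push(`string ${data}`);
  } else if (typeof data === 'number') {
    out.push(`number ${data}`);
  } else if (data === null || typeof data === 'boolean') {
    out.push(`literal ${data}`);
  }
  // Other types (undefined, functions, symbols) are skipped in this sketch.
  return out;
}

function eventOrder (data) {
  return describeEvents(data, []).concat('end');
}

console.log(eventOrder({ a: [1, 'x'], b: null }));
```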
## What options can I specify?
### Options for parsing functions
* `options.reviver`:
Transformation function,
invoked depth-first
against the parsed
data structure.
This option
is analogous to the
[reviver parameter for JSON.parse][reviver].
* `options.yieldRate`:
The number of data items to process
before yielding to the event loop.
Smaller values yield to the event loop more frequently,
meaning less time will be consumed by bfj per tick
but the overall parsing time will be slower.
Larger values yield to the event loop less often,
meaning slower tick times but faster overall parsing time.
The default value is `16384`.
* `options.Promise`:
Promise constructor that will be used
for promises returned by all methods.
If you set this option,
please be aware that some promise implementations
(including native promises)
may cause your process to die
with out-of-memory exceptions.
Defaults to [bluebird's implementation][promise],
which does not have that problem.
* `options.ndjson`:
If set to `true`,
newline characters at the root level
will be treated as delimiters between
discrete chunks of JSON.
See [NDJSON](#can-it-handle-newline-delimited-json-ndjson) for more information.
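
Because `options.reviver` is described as analogous to the `JSON.parse` reviver, the shape of such a function can be shown with `JSON.parse` alone. The sketch below assumes the same function could be passed as `options.reviver`; `reviveDates` and `ISO_DATE` are hypothetical names.

```js
// Matches UTC timestamps such as 2020-01-02T03:04:05Z or 2020-01-02T03:04:05.678Z.
const ISO_DATE = /^\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}(\.\d+)?Z$/;

// Reviver: receives each key and value after parsing and returns
// the value that should be stored in the result.
function reviveDates (key, value) {
  if (typeof value === 'string' && ISO_DATE.test(value)) {
    return new Date(value);
  }
  return value;
}

const parsed = JSON.parse(
  '{"name":"job","createdAt":"2020-01-02T03:04:05Z"}',
  reviveDates
);

console.log(parsed.createdAt instanceof Date);
```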
### Options for serialisation functions
* `options.space`:
Indentation string
or the number of spaces
to indent
each nested level by.
This option
is analogous to the
[space parameter for JSON.stringify][space].
* `options.promises`:
By default,
promises are coerced
to their resolved value.
Set this property
to `'ignore'`
for improved performance
if you don't need
to coerce promises.
* `options.buffers`:
By default,
buffers are coerced
using their `toString` method.
Set this property
to `'ignore'`
for improved performance
if you don't need
to coerce buffers.
* `options.maps`:
By default,
maps are coerced
to plain objects.
Set this property
to `'ignore'`
for improved performance
if you don't need
to coerce maps.
* `options.iterables`:
By default,
other iterables
(i.e. not arrays, strings or maps)
are coerced
to arrays.
Set this property
to `'ignore'`
for improved performance
if you don't need
to coerce iterables.
* `options.circular`:
By default,
circular references
will cause the write
to fail.
Set this property
to `'ignore'`
if you'd prefer
to silently skip past
circular references
in the data.
* `options.bufferLength`:
The length of the write buffer.
Smaller values use less memory
but may result in a slower serialisation time.
The default value is `1024`.
* `options.yieldRate`:
The number of data items to process
before yielding to the event loop.
Smaller values yield to the event loop more frequently,
meaning less time will be consumed by bfj per tick
but the overall serialisation time will be slower.
Larger values yield to the event loop less often,
meaning slower tick times but faster overall serialisation time.
The default value is `16384`.
* `options.Promise`:
Promise constructor that will be used
for promises returned by all methods.
If you set this option,
please be aware that some promise implementations
(including native promises)
may cause your process to die
with out-of-memory exceptions.
Defaults to [bluebird's implementation][promise],
which does not have that problem.
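
The default coercions for buffers, maps and other iterables can be pictured with a short sketch. This is illustrative only and is not BFJ's code: `coerce` is a hypothetical helper, it runs synchronously, and it leaves out promises and circular references.

```js
// Illustrative only: mirrors the default coercions described above.
function coerce (value) {
  if (Buffer.isBuffer(value)) {
    // Buffers are coerced using their toString method.
    return value.toString();
  }
  if (value instanceof Map) {
    // Maps become plain objects.
    const object = {};
    value.forEach((item, key) => { object[key] = coerce(item); });
    return object;
  }
  if (Array.isArray(value)) {
    return value.map(item => coerce(item));
  }
  if (value !== null && typeof value === 'object') {
    if (typeof value[Symbol.iterator] === 'function') {
      // Other iterables (for example sets) become arrays.
      return Array.from(value, item => coerce(item));
    }
    const object = {};
    Object.keys(value).forEach(key => { object[key] = coerce(value[key]); });
    return object;
  }
  return value;
}

const data = {
  lookup: new Map([['a', 1]]),
  tags: new Set(['x', 'y']),
  raw: Buffer.from('hi')
};

// The third argument to JSON.stringify plays the role of `options.space`.
console.log(JSON.stringify(coerce(data), null, 2));
```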
## Is it possible to pause parsing or serialisation from calling code?
Yes it is!
Both [`walk`](#bfjwalk-stream-options)
and [`eventify`](#bfjeventify-data-options)
decorate their returned event emitters
with a `pause` method
that will prevent any further events being emitted.
The `pause` method itself
returns a `resume` function
that you can call to indicate
that processing should continue.
For example:
```js
const fs = require('fs');
const bfj = require('bfj');

const emitter = bfj.walk(fs.createReadStream(path), options);

// Later, when you want to pause parsing:
const resume = emitter.pause();

// Then when you want to resume:
resume();
```
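
One possible use of `pause` is to hold back further events while asynchronous work for the current one is still in flight. The sketch below assumes only what is described above, namely that the emitter has a `pause` method returning a `resume` function; `handleOneAtATime` and `onError` are hypothetical names.

```js
// Runs `handler` for every `eventName` event and keeps the emitter paused
// while the promise returned by `handler` is pending.
// `onError` is a hypothetical callback that receives handler failures.
function handleOneAtATime (emitter, eventName, handler, onError) {
  emitter.on(eventName, value => {
    const resume = emitter.pause();

    Promise.resolve()
      .then(() => handler(value))
      .catch(onError)
      .then(() => resume());
  });
}
```

With BFJ this might be called as `handleOneAtATime(emitter, bfj.events.string, saveString, console.error)`, where `saveString` is whatever promise-returning function your code provides.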
## Can it handle [newline-delimited JSON (NDJSON)](http://ndjson.org/)?
Yes.
If you pass the `ndjson` [option](#options-for-parsing-functions)
to `bfj.walk` or `bfj.parse`,
newline characters at the root level
will act as delimiters between
discrete JSON values:
* `bfj.walk` will emit a `bfj.events.endLine` event
each time it encounters a newline character.
* `bfj.parse` will resolve with the first value
and pause the underlying stream.
If it's called again with the same stream,
it will resume processing
and resolve with the second value.
To parse the entire stream,
calls should be made sequentially one-at-a-time
until the returned promise
resolves to `undefined`
(`undefined` is not a valid JSON token).
`bfj.unpipe` and `bfj.read` will not parse NDJSON.
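
The sequential calls described above can be wrapped in a small loop. This is a sketch under assumptions: `parseAllValues` is a hypothetical helper, `parse` is expected to behave like `bfj.parse` with the `ndjson` option, and `async`/`await` requires a newer Node.js release than the minimum version BFJ supports.

```js
// Calls `parse` repeatedly on the same stream, one call at a time,
// until it resolves to undefined, and collects the values in order.
async function parseAllValues (parse, stream, options) {
  const ndjsonOptions = Object.assign({}, options, { ndjson: true });
  const values = [];

  for (;;) {
    const value = await parse(stream, ndjsonOptions);
    if (value === undefined) {
      // undefined signals that the stream has been fully consumed.
      return values;
    }
    values.push(value);
  }
}
```

With BFJ this might be called as `parseAllValues(bfj.parse, fs.createReadStream(path))`. Note that it holds every value in memory at once, so for very large streams it may be preferable to handle each value inside the loop instead.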
## Why does it default to bluebird promises?
Until version `4.2.4`,
native promises were used.
But they were found
to cause out-of-memory errors
when serialising large amounts of data to JSON,
due to [well-documented problems
with the native promise implementation](https://alexn.org/blog/2017/10/11/javascript-promise-leaks-memory.html).
So in version `5.0.0`,
bluebird promises were used instead.
In version `5.1.0`,
an option was added
that enables callers to specify
the promise constructor to use.
Use it at your own risk.
## Can I specify a different promise implementation?
Yes.
Just pass the `Promise` option
to any method.
If you get out-of-memory errors
when using that option,
consider changing your promise implementation.
## Is there a change log?
[Yes][history].
## How do I set up the dev environment?
The development environment
relies on [Node.js][node],
[ESLint],
[Mocha],
[Chai],
[Proxyquire] and
[Spooks].
Assuming that
you already have
Node.js and npm
set up,
you just need
to run
`npm install`
to install
all of the dependencies
as listed in `package.json`.
You can
lint the code
with the command
`npm run lint`.
You can
run the tests
with the command
`npm test`.
## What versions of Node.js does it support?
Versions 4 and later.
## What license is it released under?
[MIT][license].
[ci-image]: https://secure.travis-ci.org/philbooth/bfj.png?branch=master
[ci-status]: http://travis-ci.org/#!/philbooth/bfj
[sax]: http://en.wikipedia.org/wiki/Simple_API_for_XML
[promise]: http://bluebirdjs.com/docs/api-reference.html
[bfj-collections]: https://github.com/hash-bang/bfj-collections
[eventemitter]: https://nodejs.org/api/events.html#events_class_eventemitter
[readable]: https://nodejs.org/api/stream.html#stream_readable_streams
[writable]: https://nodejs.org/api/stream.html#stream_writable_streams
[pipe]: https://nodejs.org/api/stream.html#stream_readable_pipe_destination_options
[reviver]: https://developer.mozilla.org/en/docs/Web/JavaScript/Reference/Global_Objects/JSON/parse#Using_the_reviver_parameter
[space]: https://developer.mozilla.org/en/docs/Web/JavaScript/Reference/Global_Objects/JSON/stringify#The_space_argument
[history]: HISTORY.md
[node]: https://nodejs.org/en/
[eslint]: http://eslint.org/
[mocha]: https://mochajs.org/
[chai]: http://chaijs.com/
[proxyquire]: https://github.com/thlorenz/proxyquire
[spooks]: https://github.com/philbooth/spooks.js
[license]: COPYING