mirror of https://github.com/dart-lang/sdk synced 2024-07-05 09:20:04 +00:00
dart-sdk/benchmarks/Utf8Decode
Aske Simon Christensen b59217721b [benchmark] Benchmark for UTF-8 decoding with typical data.
This is in preparation for upcoming optimizations to the UTF-8 decoding.

The data files are extracts from Wikipedia with markup stripped. They
cover 6 representative cases of typical input data:
- English text, only ASCII
- Danish text, mostly ASCII, only Latin-1
- Slovak text, mostly ASCII, not only Latin-1
- Russian text, max 2 bytes per character
- Nepali text, max 3 bytes per character
- Chinese text, full character range
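
As a rough illustration of the byte widths listed above, the sketch below
encodes one sample character per language with `utf8.encode` from
`dart:convert` and reports its encoded length. The sample characters and the
helper name `utf8Length` are chosen for illustration and are not taken from the
benchmark data; in particular, reading "full character range" as including
4-byte (supplementary-plane) characters is an assumption.

```dart
import 'dart:convert';

/// Number of bytes the UTF-8 encoding of [s] occupies.
/// (Hypothetical helper, not part of the benchmark.)
int utf8Length(String s) => utf8.encode(s).length;

void main() {
  // Illustrative sample characters, one per language case.
  const samples = <String, String>{
    'English (ASCII)': 'a', // U+0061
    'Danish (Latin-1)': 'ø', // U+00F8
    'Slovak (beyond Latin-1)': 'č', // U+010D
    'Russian': 'я', // U+044F
    'Nepali': 'न', // U+0928
    'Chinese (supplementary plane)': '\u{20000}', // U+20000
  };

  samples.forEach((label, ch) {
    print('$label: ${utf8Length(ch)} byte(s)');
  });
}
```

The ASCII sample takes 1 byte, the Danish, Slovak and Russian samples take 2
bytes each, the Nepali sample takes 3, and the supplementary-plane sample takes 4.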

Each language is benchmarked with small (average 10 bytes),
medium (10 000 bytes) and large (10 000 000 bytes) inputs.

Only allowMalformed: false is benchmarked.
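
For context, a minimal sketch of what the `allowMalformed` flag of
`utf8.decode` changes: with `allowMalformed: false` an invalid byte sequence
raises a `FormatException`, while `allowMalformed: true` substitutes U+FFFD.
The helper names `decodeStrict` and `decodeLenient` and the sample inputs are
hypothetical and are not part of the benchmark itself.

```dart
import 'dart:convert';

/// Strict decoding, the mode this benchmark measures.
/// Returns null when the input is not well-formed UTF-8.
String? decodeStrict(List<int> bytes) {
  try {
    return utf8.decode(bytes, allowMalformed: false);
  } on FormatException {
    return null;
  }
}

/// Lenient decoding: malformed sequences become U+FFFD.
String decodeLenient(List<int> bytes) =>
    utf8.decode(bytes, allowMalformed: true);

void main() {
  final valid = utf8.encode('blåbærgrød');
  final invalid = <int>[0x61, 0xFF]; // 'a' followed by a byte that is never valid in UTF-8

  print(decodeStrict(valid)); // round-trips the original string
  print(decodeStrict(invalid) == null); // strict mode rejects the input
  print(decodeLenient(invalid) == 'a\uFFFD'); // lenient mode substitutes U+FFFD
}
```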

Change-Id: I72e6959c49388f2aebf33da0c582b7729be6297c
Reviewed-on: https://dart-review.googlesource.com/c/sdk/+/140870
Commit-Queue: Aske Simon Christensen <askesc@google.com>
Reviewed-by: Martin Kustermann <kustermann@google.com>
2020-03-27 13:02:07 +00:00
dart [benchmark] Benchmark for UTF-8 decoding with typical data. 2020-03-27 13:02:07 +00:00