add json lines

2024-12-02 10:45:29 +01:00 · c465fd16f5
commit c465fd16f5
parent 4183941c78
1 changed files with 84 additions and 0 deletions
--- a/technology/files/JSON
+++ b/technology/files/JSON
@@ -0,0 +1,84 @@
+---
+obj: format
+website: https://jsonlines.org
+extension: "jsonl"
+mime: "application/jsonl"
+rev: 2024-12-02
+---
+
+# JSON Lines
+This page describes the JSON Lines text format, also called newline-delimited JSON. JSON Lines is a convenient format for storing structured data that may be processed one record at a time. It works well with Unix-style text processing tools and shell pipelines, suits log files, and is a flexible way to pass messages between cooperating processes.
+
+The JSON Lines format has three requirements:
+- **UTF-8 Encoding**: JSON allows Unicode strings to be encoded using only ASCII escape sequences, but those escapes are hard to read in a text editor. The author of a JSON Lines file may still choose to escape characters so that the file is plain ASCII. Text in other encodings is very unlikely to be valid when decoded as UTF-8, so the chance of accidentally misinterpreting characters in a JSON Lines file is low.
+- **Each Line is a Valid JSON Value**: The most common values will be objects or arrays, but any JSON value is permitted.
+- **Line Separator is `\n`**: `\r\n` is also supported, because surrounding whitespace is implicitly ignored when parsing JSON values.
+
+## Better than CSV
+```json
+["Name", "Session", "Score", "Completed"]
+["Gilbert", "2013", 24, true]
+["Alexa", "2013", 29, true]
+["May", "2012B", 14, false]
+["Deloise", "2012A", 19, true] 
+```
+
+CSV seems so easy that many programmers have written code to generate it themselves, and almost every implementation is different. Handling broken CSV files is a common and frustrating task. CSV has no standard encoding, no standard column separator and multiple character escaping standards. String is the only type supported for cell values, so some programs attempt to guess the correct types.
+
+JSON Lines handles tabular data cleanly and without ambiguity. Cells may use the standard JSON types.
+
+The biggest missing piece is an import/export filter for popular spreadsheet programs so that non-programmers can use this format.
+
+## Self-describing data
+```json
+{"name": "Gilbert", "session": "2013", "score": 24, "completed": true}
+{"name": "Alexa", "session": "2013", "score": 29, "completed": true}
+{"name": "May", "session": "2012B", "score": 14, "completed": false}
+{"name": "Deloise", "session": "2012A", "score": 19, "completed": true} 
+```
+
+JSON Lines lets applications read objects line by line, with each line fully describing one JSON object. This example contains the same data as the tabular example in the previous section, but because every record carries its own field names, there is no ambiguity if fields are omitted or re-ordered. Applications can also split files on newline boundaries for parallel loading.
+
+## Easy Nested Data
+```json
+{"name": "Gilbert", "wins": [["straight", "7♣"], ["one pair", "10♥"]]}
+{"name": "Alexa", "wins": [["two pair", "4♠"], ["two pair", "9♠"]]}
+{"name": "May", "wins": []}
+{"name": "Deloise", "wins": [["three of a kind", "5♣"]]}
+```
+
+JSON Lines' biggest strength is in handling lots of similar nested data structures. One `.jsonl` file is easier to work with than a directory full of XML files.
+
+If you have large nested structures, reading the JSON Lines text directly isn't recommended. Use the `jq` tool to make viewing large structures easier:
+
+```
+grep pair winning_hands.jsonl | jq .
+
+{
+  "name": "Gilbert", 
+  "wins": [
+    [
+      "straight", 
+      "7♣"
+    ], 
+    [
+      "one pair", 
+      "10♥"
+    ]
+  ]
+}
+{
+  "name": "Alexa", 
+  "wins": [
+    [
+      "two pair", 
+      "4♠"
+    ], 
+    [
+      "two pair", 
+      "9♠"
+    ]
+  ]
+}
+```
+
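
The three requirements listed in the added page (UTF-8 text, one JSON value per line, `\n` as the separator) can be illustrated with a minimal Python sketch. The helper names `dump_jsonl` and `load_jsonl` are hypothetical and belong to no library. Skipping blank lines is a leniency of this sketch, not something the page states about the format.

```python
import io
import json


def dump_jsonl(records, fp):
    """Write each record as one JSON value per line, separated by "\n"."""
    for record in records:
        # ensure_ascii=False keeps characters such as "♣" as readable text
        # instead of \uXXXX escapes; without indent, json.dumps never emits
        # a raw newline inside a value, so one record stays on one line.
        fp.write(json.dumps(record, ensure_ascii=False) + "\n")


def load_jsonl(fp):
    """Yield one decoded JSON value per non-blank line."""
    for line in fp:
        line = line.strip()  # removes "\n" and the "\r" left over from "\r\n"
        if line:  # assumption of this sketch: blank lines are skipped
            yield json.loads(line)


records = [
    {"name": "Gilbert", "wins": [["straight", "7♣"], ["one pair", "10♥"]]},
    {"name": "May", "wins": []},
]

buffer = io.StringIO()
dump_jsonl(records, buffer)
text = buffer.getvalue()

# Round trip with "\n" separators, then again with "\r\n" separators.
round_tripped = list(load_jsonl(io.StringIO(text)))
crlf_tripped = list(load_jsonl(io.StringIO(text.replace("\n", "\r\n"))))
```

When working with real files rather than `io.StringIO`, opening them with `open(path, encoding="utf-8")` makes the UTF-8 requirement explicit instead of relying on the platform default.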