 
 This page is a step-by-step introduction of how to write an omniparser schema (specifically tailor
 for the latest `"omni.2.1"` schema version) and how to ingest and transform inputs programmatically
-and by the cli tool.
+and by the CLI tool.
 
 ## Prerequisites and Notes
 
@@ -47,7 +47,6 @@ transform each of the data line into the following JSON output:
4747 "wind": "South East 4.97 mph"
4848 }
4949]
50-
5150```
5251As you can see, in the desired output, we'd like to standardize all the input temperatures into the
5352same fahrenheit unit; we'd also like to do some translation such that the wind direction and wind
@@ -56,7 +55,7 @@ into [RFC-3339](https://tools.ietf.org/html/rfc3339) standard format.
 
 ## CLI (command line interface)
 
-Before we get into schema writing, let's first get familiar with omniparser cli so that we can easily
+Before we get into schema writing, let's first get familiar with omniparser CLI so that we can easily
 and incrementally test our schema writing.
 
 Assuming you have the git repo cloned at `~/dev/jf-tech/omniparser/`, simply run this bash script:
@@ -77,7 +76,7 @@ $ touch input.csv
 $ touch schema.json
 ```
 Use any editor to cut & paste the CSV content from [The Input](#the-input) into `input.csv`, and
-now run omniparser cli from `~/Downloads/omniparser/guide/`:
+now run omniparser CLI from `~/Downloads/omniparser/guide/`:
 ```
 $ ~/dev/jf-tech/omniparser/cli.sh transform -i input.csv -s schema.json
 Error: unable to perform schema validation: EOF
@@ -99,7 +98,7 @@ This is the common part of all omniparser schemas, the header `parser_settings`:
   }
 }
 ```
-It's self-explanatory. Now let's run the cli again:
+It's self-explanatory. Now let's run the CLI again:
 ```
 $ ~/dev/jf-tech/omniparser/cli.sh transform -i input.csv -s schema.json
 Error: schema 'schema.json' validation failed: (root): transform_declarations is required
@@ -121,7 +120,7 @@ transformation. Let's add an empty `transform_declarations` for now:
121120 "transform_declarations": {}
122121}
123122```
124- Run the cli we get another error:
123+ Run the CLI we get another error:
125124```
126125$ ~/dev/jf-tech/omniparser/cli.sh transform -i input.csv -s schema.json
127126Error: schema 'schema.json' validation failed: transform_declarations: FINAL_OUTPUT is required
@@ -143,7 +142,7 @@ the output. Given the section is called `transform_declarations` you might have
 multiple templates defined in it. Each template can reference other templates. There must be one
 and only one template called `FINAL_OUTPUT`.
 
-Run the cli we get a new error:
+Run the CLI we get a new error:
 ```
 $ ~/dev/jf-tech/omniparser/cli.sh transform -i input.csv -s schema.json
 Error: schema 'schema.json' validation failed: (root): file_declaration is required
@@ -193,7 +192,7 @@ Let's add these:
  }
 ```
 
-Run the cli again:
+Run the CLI again:
 ```
 $ ~/dev/jf-tech/omniparser/cli.sh transform -i input.csv -s schema.json
 [
@@ -279,7 +278,7 @@ Let's make small modifications to our schema:
 }
 ```
 
-Rerun the cli to ensure everything is still working. Now the IDR and its imaginary converted XML
+Rerun the CLI to ensure everything is still working. Now the IDR and its imaginary converted XML
 equivalent look like this:
 ```
 <>
@@ -339,7 +338,7 @@ Remember for the first data line, its corresponding IDR (or the IDR's equivalent
 Thus, an XPath query `"xpath": "DATE"` on the root of the IDR would return `01/31/2019 12:34:56-0800`, which is
 used as the value for the field `date`. So on and so forth for all other fields.
 
-Run the cli, we have:
+Run the CLI, we have:
 ```
 $ ~/dev/jf-tech/omniparser/cli.sh transform -i input.csv -s schema.json
 [
@@ -388,7 +387,7 @@ built-in function to achieve this:
388387 }
389388```
390389
391- Run cli we have:
390+ Run CLI we have:
392391```
393392$ ~/dev/jf-tech/omniparser/cli.sh transform -i input.csv -s schema.json
394393[
@@ -508,7 +507,7 @@ Here we introduce two new things: 1) template and 2) custom_func `javascript`.
  value `10.5`, `"type": "float"` is used. However when the script is done, the result is already
  in float, there is no need to specify `"type": "float"` for the `custom_func` directive.
 
-Now let's run cli:
+Now let's run CLI:
 ```
 $ ~/dev/jf-tech/omniparser/cli.sh transform -i input.csv -s schema.json
 [
@@ -562,7 +561,7 @@ numeric value. That should be an easy fix:
 Basically changing `"low_temperature_fahrenheit": { "xpath": "LOW_TEMP_F" }` to
 `"low_temperature_fahrenheit": { "xpath": "LOW_TEMP_F", "type": "float" }`.
 
-Run cli again, we have:
+Run CLI again, we have:
 ```
 $ ~/dev/jf-tech/omniparser/cli.sh transform -i input.csv -s schema.json
 [
@@ -804,3 +803,21 @@ code snippet of showing how to achieve this:
  // output contains a []byte of the ingested and transformed record.
  }
 ```
+
+### The Output
+```
+[
+  {
+    "date": "2019-01-31T12:34:56-08:00",
+    "high_temperature_fahrenheit": 50.9,
+    "low_temperature_fahrenheit": 30.2,
+    "wind": "North 20.5 mph"
+  },
+  {
+    "date": "2020-07-31T01:23:45-05:00",
+    "high_temperature_fahrenheit": 102.2,
+    "low_temperature_fahrenheit": 95,
+    "wind": "South East 4.97 mph"
+  }
+]
+```
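The final hunk adds the guide's expected output. In it, each `date` is normalized to RFC-3339 and each temperature is expressed in Fahrenheit. You can sanity-check those two conversions independently of omniparser with the minimal Go sketch below, which uses only the standard library.

This is not omniparser's implementation. `toRFC3339` and `celsiusToFahrenheit` are hypothetical helpers. The input layout is inferred from the first data line's `01/31/2019 12:34:56-0800` timestamp. Treating the non-Fahrenheit readings as Celsius is an assumption.

```go
package main

import (
	"fmt"
	"time"
)

// inputLayout is Go's reference-time layout for timestamps shaped like
// "01/31/2019 12:34:56-0800": month/day/year, 24-hour clock, and a numeric
// zone offset without a colon. (Assumption: all input dates share this shape.)
const inputLayout = "01/02/2006 15:04:05-0700"

// toRFC3339 is a hypothetical helper (not part of omniparser) that parses an
// input timestamp and re-formats it as RFC-3339, keeping its original offset.
func toRFC3339(s string) (string, error) {
	t, err := time.Parse(inputLayout, s)
	if err != nil {
		return "", err
	}
	return t.Format(time.RFC3339), nil
}

// celsiusToFahrenheit is a hypothetical helper applying F = C*9/5 + 32.
func celsiusToFahrenheit(c float64) float64 {
	return c*9/5 + 32
}

func main() {
	d, err := toRFC3339("01/31/2019 12:34:56-0800")
	if err != nil {
		panic(err)
	}
	fmt.Println(d) // 2019-01-31T12:34:56-08:00

	// Round to one decimal place for display.
	fmt.Printf("%.1f\n", celsiusToFahrenheit(10.5)) // 50.9
}
```

`time.Parse` keeps the parsed zone offset, so formatting with `time.RFC3339` yields `-08:00` rather than converting to UTC. To emit UTC instead, call `t.UTC()` before formatting.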