SBSV: square bracket separated values

A flexible, schema-driven structured log data format. Human readable, easy to write (you can write it without any dependencies: simple print() works fine), and easy to parse.

Install

python3 -m pip install sbsv

C library (experimental)

libsbsv is a C library for parsing SBSV files. It provides a C API for loading and querying SBSV data, and can be used in C/C++ projects.

Use

You can read this log-like data:

[meta-data] [id 1] [format string]
[meta-data] [id 2] [format token]
[data] [string] [id 1] [actual some long string...]
[data] [token] [id 2] [actual [some] [multiple] [tokens]]
[stat] [rows 2]

import sbsv

parser = sbsv.parser()
parser.add_schema("[meta-data] [id: int] [format: str]")
parser.add_schema("[data] [string] [id: int] [actual: str]")
parser.add_schema("[data] [token] [id: int] [actual: list[str]]")
parser.add_schema("[stat] [rows: int]")
with open("testfile.sbsv", "r") as f:
  result = parser.load(f)

The result looks like:

{
  "meta-data": [{"id": 1, "format": "string"}, {"id": 2, "format": "token"}],
  "data": {
    "string": [{"id": 1, "actual": "some long string..."}],
    "token": [{"id": 2, "actual": ["some", "multiple", "tokens"]}]
  },
  "stat": [{"rows": 2}]
}
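
Writing data in this format needs no library. The following is a minimal sketch of a formatting helper; `sbsv_line` is a hypothetical name used only for illustration and is not part of the sbsv package.

```python
def sbsv_line(*elements):
    """Format one SBSV line.

    Each element is either a bare name (str) or a (name, value) pair.
    Hypothetical helper for illustration; not part of the sbsv package.
    """
    parts = []
    for element in elements:
        if isinstance(element, str):
            parts.append(f"[{element}]")
        else:
            name, value = element
            parts.append(f"[{name} {value}]")
    return " ".join(parts)


lines = [
    sbsv_line("meta-data", ("id", 1), ("format", "string")),
    sbsv_line("data", "string", ("id", 1), ("actual", "some long string...")),
    sbsv_line("stat", ("rows", 1)),
]
for line in lines:
    print(line)
```

Values that contain square brackets are not handled by this sketch; they would need escaping first.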

Details

Basic schema

A schema consists of a schema name followed by elements, each with a variable name and a type annotation.

[schema-name] [var-name: type]

You can use [A-Za-z0-9-_] for names.
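
If you generate names programmatically, a check like the following may help. It is only a sketch: the character class comes from the line above, while requiring at least one character and the helper name `is_valid_name` are assumptions.

```python
import re

# Character class from the text above. Requiring a non-empty name is an assumption.
NAME_PATTERN = re.compile(r"[A-Za-z0-9_-]+")


def is_valid_name(name: str) -> bool:
    """Return True if every character of name is in [A-Za-z0-9-_]."""
    return NAME_PATTERN.fullmatch(name) is not None
```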

Sub schema

[my-schema] [sub-schema] [some: int] [other: str] [data: bool]

You can add any number of sub schemas. However, once a schema name has a sub schema, you cannot add another schema with the same schema name that has no sub schema.

[my-schema] [no: int] [sub: str] [schema: str]
# this will cause error

Ignore (TODO)

[2024-03-04 13:22:56] [DEBUG] [necessary] [from] [this part]

Regular log files may contain data you do not need. You can tell the parser to ignore the [2024-03-04 13:22:56] [DEBUG] part.

# This will ignore first two elements for all lines.
parser.ignore_prefix("[$timestamp] [$log_level]")
parser.add_schema("[necessary] [from] [this: str]")

You can use parser.ignore_prefix("[$timestamp] [$log_level]", save_ignored=True) to save ignored data in result.

Duplicating names

Sometimes you may want to use the same name multiple times in one schema. You can distinguish the occurrences with additional tags.

[my-schema] [node 1] [node 2] [node 3]

A tag is appended to the name after $, as in node$some-tag. Tags appear only in the schema; the data itself must not contain them.

parser.add_schema("[my-schema] [node$0: int] [node$1: int] [node$2: int]")
result = parser.loads("[my-schema] [node 1] [node 2] [node 3]\n")
result["my-schema"][0]["node$0"] == 1
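
The relationship between tagged schema names and the names in the data can be sketched as below. `data_name` is a hypothetical helper, and this is not necessarily how the library does it internally.

```python
def data_name(schema_var: str) -> str:
    """Return the name that appears in the data for a tagged schema variable."""
    return schema_var.split("$", 1)[0]


schema_vars = ["node$0", "node$1", "node$2"]
values = [1, 2, 3]  # values of the three [node ...] elements, in order

# Every tagged variable refers to the same data name ...
assert all(data_name(var) == "node" for var in schema_vars)
# ... but each one gets its own key in the parsed row.
row = dict(zip(schema_vars, values))
```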

Name matching

Additional elements in the data that are not in the schema are ignored. The names must appear in the data in the same order as in the schema.

parser.add_schema("[my-schema] [node: int] [value: int]")
data = "[my-schema] [node 1] [unknown element] [value 3]\n"
result = parser.loads(data)
result["my-schema"][0] == { "node": 1, "value": 3 }
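
The matching rule can be sketched in plain Python as follows. This is an illustration of the rule under simplifying assumptions (elements are already split into name/value pairs, and type conversion is omitted), not the library's implementation; `match_in_order` is a hypothetical name.

```python
def match_in_order(schema_names, elements):
    """Match schema names against (name, value) elements, keeping order.

    Elements whose name is not the next expected schema name are skipped.
    Returns a dict on success, or None if some schema name was never found.
    """
    result = {}
    position = 0
    for name, value in elements:
        if position < len(schema_names) and name == schema_names[position]:
            result[name] = value
            position += 1
    return result if position == len(schema_names) else None


elements = [("node", "1"), ("unknown", "element"), ("value", "3")]
matched = match_in_order(["node", "value"], elements)    # unknown element skipped
reordered = match_in_order(["value", "node"], elements)  # order differs: no match
```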

Ordering

Results are stored per schema, so you may need the global ordering of lines across schemas.

parser.add_schema("[data] [string] [id: int] [actual: str]")
parser.add_schema("[data] [token] [id: int] [actual: list[str]]")
result = parser.load(f)
# This returns all elements in order
elems_all = parser.get_result_in_order()
# This returns elements matching names in order
# If it contains sub-schema, use $
# For example, [data] [string] [id: int] -> "data$string"
elems = parser.get_result_in_order(["[data] [string]", "[data] [token]"])
# You can also use ["data$string", "data$token"]

Or, you can get schema id (data$string and data$token) like this:

sbsv.get_schema_id("node") == "node"
sbsv.get_schema_id("data", "string") == "data$string"
# this is equal to 
sbsv.get_schema_id("data", "string") == '$'.join(["data", "string"])
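
The bracketed form and the $ form name the same schema. A minimal sketch of converting one to the other, using a hypothetical helper `schema_id_from_header` that is not part of the sbsv package:

```python
import re


def schema_id_from_header(header: str) -> str:
    """Turn '[data] [string]' into 'data$string'."""
    names = re.findall(r"\[([A-Za-z0-9_-]+)\]", header)
    return "$".join(names)
```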

Group

[data] [begin]
[block] [data 1]
[block] [data 2]
[data] [end]
[data] [begin]
[block] [data 3]
[block] [data 4]
[data] [end]

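You can group the lines between each [data] [begin] and [data] [end] pair, so that blocks 1 and 2 form one group and blocks 3 and 4 form another.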

# First, add all to schema
parser.add_schema("[data] [begin]")
parser.add_schema("[data] [end]")
parser.add_schema("[block] [data: int]")
# Second, add group name, group start, group end
parser.add_group("data", "[data] [begin]", "[data] [end]")
parser.load(sbsv_file)
# Iterate groups
for block in parser.iter_group("data"):
  print("group start")
  for block_data in block:
    if block_data.schema_name == "block":
      print(block_data["data"])
# Or, use index
block_indices = parser.get_group_index("data")
for index in block_indices:
  print("use index")
  for block in parser.get_result_by_index("[block]", index):
    print(block["data"])

Output:

group start
1
2
group start
3
4
use index
1
2
use index
3
4
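
The grouping idea itself can be sketched without the library. The function below assumes rows are already parsed into (schema id, data) pairs in file order and that marker rows belong to their group; both points are assumptions, and `split_groups` is a hypothetical name.

```python
def split_groups(rows, start_id, end_id):
    """Collect rows into groups delimited by distinct start and end markers.

    Rows outside any group are dropped.
    """
    groups = []
    current = None
    for schema_id, data in rows:
        if schema_id == start_id and current is None:
            current = [(schema_id, data)]
        elif schema_id == end_id and current is not None:
            current.append((schema_id, data))
            groups.append(current)
            current = None
        elif current is not None:
            current.append((schema_id, data))
    return groups


rows = [
    ("data$begin", {}),
    ("block", {"data": 1}),
    ("block", {"data": 2}),
    ("data$end", {}),
    ("data$begin", {}),
    ("block", {"data": 3}),
    ("block", {"data": 4}),
    ("data$end", {}),
]
groups = split_groups(rows, "data$begin", "data$end")
# Keep only the [block] rows of each group
values = [[d["data"] for sid, d in group if sid == "block"] for group in groups]
```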

You can also define a group without a closing schema by using the same schema as both group start and group end.

[group-wo-closing] [new-group a]
[some] [data 9]
[some] [data 8]
[some] [data 7]
[group-wo-closing] [new-group b]
[some] [data 6]
[some] [data 5]
[group-wo-closing] [new-group c]
[some] [data 4]

# First, add all to schema
parser.add_schema("[group-wo-closing] [new-group: str]")
parser.add_schema("[some] [data: int]")
# Second, add group name, group start == group end
parser.add_group("new-group", "[group-wo-closing]", "[group-wo-closing]")
parser.load(sbsv_file)
# Iterate groups
for block in parser.iter_group("new-group"):
  print("group start")
  for block_data in block:
    if block_data.schema_name == "some":
      print(block_data["data"])
# Or, use index
block_indices = parser.get_group_index("new-group")
for index in block_indices:
  print("use index")
  for block in parser.get_result_by_index("[some]", index):
    print(block["data"])

Output:

group start
9
8
7
group start
6
5
group start
4
use index
9
8
7
use index
6
5
use index
4
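
When the start and end markers are the same, each marker simply opens a new group. A plain-Python sketch of that behavior, assuming rows are (schema id, data) pairs in file order and using the hypothetical name `split_on_start`:

```python
def split_on_start(rows, start_id):
    """Start a new group at every start marker; rows before the first marker are dropped."""
    groups = []
    for schema_id, data in rows:
        if schema_id == start_id:
            groups.append([])
        if groups:
            groups[-1].append((schema_id, data))
    return groups


rows = [
    ("group-wo-closing", {"new-group": "a"}),
    ("some", {"data": 9}),
    ("some", {"data": 8}),
    ("some", {"data": 7}),
    ("group-wo-closing", {"new-group": "b"}),
    ("some", {"data": 6}),
    ("some", {"data": 5}),
    ("group-wo-closing", {"new-group": "c"}),
    ("some", {"data": 4}),
]
groups = split_on_start(rows, "group-wo-closing")
values = [[d["data"] for sid, d in group if sid == "some"] for group in groups]
```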

Primitive types

Primitive types are str, int, float, bool, null.

Complex types

nullable

[car] [id 1] [speed 100] [power 2] [price]
[car] [id 2] [speed 120] [power 3] [price 33000]

parser.add_schema("[car] [id: int] [speed: int] [power: int] [price?: int]")

Note: nullable is currently not supported for the first element, so the following schema does not work:

parser.add_schema("[car] [id?: int] [speed: int] [power: int] [price: int]")

list

[data] [token] [id 2] [actual [some] [multiple] [tokens]]

parser.add_schema("[data] [token] [id: int] [actual: list[str]]")
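
A nested value such as [some] [multiple] [tokens] can be split by tracking bracket depth. The following is a sketch of that idea only; it ignores escaped brackets, and `split_top_level` is a hypothetical name rather than a function of the sbsv package.

```python
def split_top_level(text: str):
    """Split '[a] [b [c]]' into ['a', 'b [c]'] by tracking bracket depth."""
    items = []
    depth = 0
    start = None
    for index, char in enumerate(text):
        if char == "[":
            if depth == 0:
                start = index + 1  # content begins after the opening bracket
            depth += 1
        elif char == "]":
            depth -= 1
            if depth < 0:
                raise ValueError("unbalanced ']'")
            if depth == 0:
                items.append(text[start:index])
    if depth != 0:
        raise ValueError("unbalanced '['")
    return items


tokens = split_top_level("[some] [multiple] [tokens]")
elements = split_top_level("[data] [token] [id 2] [actual [some] [multiple] [tokens]]")
```

Applying the function once splits a line into its elements; applying it again to the value of the last element yields the list items.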

Custom types

You can define your own types by providing a converter function that takes a string and returns a value (x: str -> custom_type).

parser = sbsv.parser()

# Define a custom type "hex" to parse hexadecimal numbers
parser.add_custom_type("hex", lambda x: int(x, 16))

# Use the custom type in schema
parser.add_schema("[data] [id: hex] [val: hex]")

result = parser.loads("""
[data] [id ff] [val deadbeef]
""")

# result["data"][0]["id"] == 255
# result["data"][0]["val"] == 3735928559

Notes:

Register custom types before adding schemas that reference them for best performance.
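
Any function from str to a value can serve as a converter. As a further sketch, a timestamp converter built on the standard library might look like this; the type name "timestamp" and the [event] schema in the comments are made up for illustration.

```python
from datetime import datetime


def parse_timestamp(x: str) -> datetime:
    """Convert an ISO 8601 string such as '2024-03-04T13:22:56' to a datetime."""
    return datetime.fromisoformat(x)


# Registration would follow the same pattern as the hex example above:
#   parser.add_custom_type("timestamp", parse_timestamp)
#   parser.add_schema("[event] [at: timestamp]")
parsed = parse_timestamp("2024-03-04T13:22:56")
```

`datetime.fromisoformat` raises ValueError on malformed input, so invalid values surface as errors from the converter.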

Utilities

Line parser (stateless) (TODO)

If you want to parse a single line at a time, you can use line_parser:

parser = sbsv.line_parser()
parser.add_schema("[node] [id: int] [value: int]")
parser.add_schema("[edge] [src: int] [dst: int] [value: int]")
result = parser.loads("[node] [id 1] [value 2]")
# result == SbsvData(schema_name="node", data={"id": 1, "value": 2})
# Note: result is not dict, but SbsvData object.

You cannot use parser.get_result() or parser.get_result_in_order() with line_parser: it does not store results in the parser but returns them directly. This is useful when parsing log lines one by one without keeping them all in memory.

Body parser (TODO)

parser = sbsv.body_parser("[id: int] [value: int]")
result = parser.loads("[id 1] [value 2]")
# result == {"id": 1, "value": 2}

This takes only the schema body, without a schema name. It is useful when you want to parse data without caring about the schema name. For example, it can be used in a custom type that implements a nested type.

parser = sbsv.parser()
body_parser = sbsv.body_parser("[id: int] [value: int]")
def custom_type_converter(x: str):
    return body_parser.loads(x)
parser.add_custom_type("mytype", custom_type_converter)
parser.add_schema("[data] [val: mytype]")
result = parser.loads("[data] [val [id 1] [value 2]]")
# result["data"][0]["val"] == {"id": 1, "value": 2}

Escape sequences for string

[car] [id 1] [name "\[name with square bracket\]"]
f"[car] [id {id}] [name {sbsv.escape_str('[name with square bracket]')}]"

Use sbsv.escape_str() to escape a string and sbsv.unescape_str() to recover the original string from an escaped one.
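
To illustrate the general idea of bracket escaping, here is a round-trip sketch in plain Python. It is not the library's implementation: the exact output of sbsv.escape_str() (for example, whether it adds quotes) may differ, escaping backslashes is an assumption, and both function names are hypothetical.

```python
def escape_brackets(text: str) -> str:
    """Backslash-escape backslashes and square brackets."""
    return text.replace("\\", "\\\\").replace("[", "\\[").replace("]", "\\]")


def unescape_brackets(text: str) -> str:
    """Reverse escape_brackets by dropping each escaping backslash."""
    result = []
    chars = iter(text)
    for char in chars:
        if char == "\\":
            # Keep the character after the backslash as-is
            result.append(next(chars, ""))
        else:
            result.append(char)
    return "".join(result)


original = "[name with square bracket]"
escaped = escape_brackets(original)
```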

Contribute

Install uv

# Linux
curl -LsSf https://astral.sh/uv/install.sh | sh
uv sync

Run the black formatter before committing.

uv run black .

Before implementing new features or fixing bugs, add new tests in tests/.

uv run pytest

Build and update

uv build
uv publish
