Merged
2 changes: 1 addition & 1 deletion .github/workflows/ci.yml
@@ -50,7 +50,7 @@ jobs:
- name: golangci-lint
uses: golangci/golangci-lint-action@v6
with:
version: latest
version: v1.64.8

vulncheck:
runs-on: ubuntu-latest
13 changes: 10 additions & 3 deletions README.md
@@ -157,6 +157,13 @@ err := ops.Reindex(ctx, "input.mkv", "output.mkv")

`reader.ReadStream` parses metadata and returns a `*BlockReader` from an `io.Reader` without ever calling Seek. `writer.NewStreamWriter` writes a live MKV stream to an `io.Writer` using unknown-size Segment and Clusters. See [docs/library.md](docs/library.md) for details.
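
To illustrate what "unknown-size" means at the byte level: EBML reserves the size value whose data bits are all ones to mean "length not known", which is what lets a writer emit a Segment or Cluster header before it knows how much data follows. The sketch below builds that marker for a given field width. `unknownSize` is a hypothetical helper written for this example and is not part of mkvgo's API.

```go
package main

import "fmt"

// unknownSize returns the EBML "unknown size" marker for a size field of the
// given width (1..8 octets): the length-descriptor bit followed by all-ones
// data bits. Hypothetical helper for illustration only; not mkvgo's API.
func unknownSize(width int) []byte {
	if width < 1 || width > 8 {
		return nil
	}
	out := make([]byte, width)
	// First octet: marker bit at position (8 - width) and every lower bit set.
	out[0] = byte(0xFF >> (width - 1))
	// Remaining octets carry only data bits, all set to one.
	for i := 1; i < width; i++ {
		out[i] = 0xFF
	}
	return out
}

func main() {
	fmt.Printf("% X\n", unknownSize(1))
	fmt.Printf("% X\n", unknownSize(8))
}
```

A reader that meets this marker cannot skip the element by length; it has to keep parsing children until it reaches an element that cannot belong to the current parent.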

**Remux a file to WebM:**
```go
// Validates the codecs (VP8/VP9/AV1, Vorbis/Opus, WebVTT), copies the media
// verbatim into a webm-DocType container, rejects non-WebM codecs.
err := matroska.RemuxToWebM(ctx, "in.mkv", "out.webm")
```

**Edit metadata with custom FS (S3, HTTP, etc.):**
```go
s3fs := &matroska.FS{
@@ -178,12 +185,12 @@ err := matroska.EditMetadata(ctx, "s3://bucket/movie.mkv", "s3://bucket/out.mkv"
cmd/mkvgo/ CLI binary
commands/ one file per command group

matroska/ facade -- re-exports everything, backward compat
matroska/ facade -- stable public API, re-exports everything

mkv/ core types, FS port, EBML IDs
mkv/ core types, FS port, EBML IDs (experimental, may change)
reader/ parse MKV → Container
writer/ Container → MKV bytes
ops/ high-level operations (mux, split, merge, edit...)
ops/ high-level operations (mux, split, merge, edit, remux-webm...)
subtitle/ SRT/ASS parsing

ebml/ low-level EBML encoding/decoding (no Matroska knowledge)
4 changes: 2 additions & 2 deletions docs/cli.md
@@ -205,7 +205,7 @@ mkvgo edit <file.mkv> -o <out.mkv> -
The JSON is a partial `Container` struct. Only fields you include are changed.

```bash
mkvgo edit movie.mkv -o out.mkv '{"title":"New Title"}}'
mkvgo edit movie.mkv -o out.mkv '{"title":"New Title"}'
cat patch.json | mkvgo edit movie.mkv -o out.mkv -
```

@@ -255,7 +255,7 @@ mkvgo edit-inplace <file.mkv> '<json>'
```

```bash
mkvgo edit-inplace movie.mkv '{"title":"Quick Fix"}}'
mkvgo edit-inplace movie.mkv '{"title":"Quick Fix"}'
```

### remove-track
44 changes: 38 additions & 6 deletions docs/library.md
@@ -12,7 +12,9 @@
| `matroska` | `github.com/gravity-zero/mkvgo/matroska` | Facade -- re-exports everything |
| `ebml` | `github.com/gravity-zero/mkvgo/ebml` | Low-level EBML codec |

For most use cases, import `matroska` (the facade). Import sub-packages directly when you need fine-grained control.
`matroska` is the stable public API -- import it for most use cases. The `mkv`, `mkv/reader`, `mkv/writer`, `mkv/ops` and `mkv/subtitle` packages are lower-level and experimental: their APIs may change between minor versions. Import them directly when you need capabilities the facade does not expose (streaming, `NewWebMStreamWriter`).

Operations process the container incrementally -- they read and write block by block (or cluster by cluster) and never hold the whole file in memory, so multi-gigabyte inputs run with bounded memory.

---

@@ -71,6 +73,34 @@ err := writer.Write(&buf, container)

---

## WebM Output

WebM is a constrained Matroska profile: the `webm` DocType and a small codec set (VP8/VP9/AV1 video, Vorbis/Opus audio, WebVTT subtitles).

Check whether a `Container` can be written as WebM:

```go
if err := matroska.ValidateWebM(container); err != nil {
// Names each track whose codec is outside the WebM subset, or which is
// missing mandatory init data (Opus OpusHead, Vorbis headers, AV1 av1C).
return err
}
```

`matroska.WriteWebM` writes the `webm` DocType (version 4 when an AV1 track is present, else 2) plus Info and Tracks. Like `writer.Write`, it writes metadata only -- no clusters.

For a complete, playable WebM with frames, remux a source file:

```go
err := matroska.RemuxToWebM(ctx, "in.mkv", "out.webm")
```

`RemuxToWebM` validates the codecs, copies every block verbatim into time-bounded `webm` clusters, and rejects sources with non-WebM codecs. Elements outside the WebM subset (Chapters, Attachments, Tags) are dropped; list them beforehand with `matroska.WebMNonSubsetElements(container)`.

To write a WebM stream live (no source file), use `writer.NewWebMStreamWriter` -- the WebM counterpart of `NewStreamWriter`.

---

## Mux / Demux

**Mux** -- combine tracks from multiple sources:
@@ -344,16 +374,16 @@ err := matroska.RemoveTrack(ctx, "in.mkv", "out.mkv", []uint64{3}, opts)
```go
import "github.com/gravity-zero/mkvgo/mkv/subtitle"

entries, err := subtitle.ParseSRT(srtReader)
// []subtitle.SRTEntry{Index, StartMs, EndMs, Text}
entries, err := subtitle.ParseSRT("subs.srt")
// []subtitle.SRTEntry{StartMs, EndMs, Text}
```

### ASS/SSA

```go
assFile, err := subtitle.ParseASS(assReader)
// assFile.ScriptInfo, assFile.Styles, assFile.Events
// Each event: Layer, Start, End, Style, Name, Text
assFile, err := subtitle.ParseASS("subs.ass")
// assFile.Header (raw [Script Info] + [V4+ Styles] block), assFile.Events
// Each subtitle.ASSEvent{StartMs, EndMs, Fields}
```

### Extract from MKV
@@ -382,6 +412,8 @@ err := matroska.MergeASS(ctx, "movie.mkv", "subs.ass", "out.mkv", "jpn", "Japane

All functions return `error`. No panics, no logging.

The reader tolerates corrupted bodies: a zeroed or padded region between clusters (seen in some real-world rips) does not abort the read -- the parser resyncs to the next valid Cluster and returns the metadata gathered so far. A damaged EBML/Segment header still returns an error. Malformed input never panics.
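
One way to picture the resync step is a forward scan for the 4-octet Cluster element ID (`0x1F43B675`). The sketch below does this over an in-memory slice for brevity. It is an assumption about the general technique, not the library's actual code, and `nextCluster` is a hypothetical name.

```go
package main

import (
	"bytes"
	"fmt"
)

// clusterID is the 4-octet Matroska Cluster element ID.
var clusterID = []byte{0x1F, 0x43, 0xB6, 0x75}

// nextCluster returns the offset of the first Cluster ID at or after from,
// or -1 if none is found. Hypothetical helper; a real parser would also
// check that a plausible size and child elements follow the ID, because the
// same four bytes can occur by chance inside frame payloads.
func nextCluster(data []byte, from int) int {
	if from < 0 || from > len(data) {
		return -1
	}
	i := bytes.Index(data[from:], clusterID)
	if i < 0 {
		return -1
	}
	return from + i
}

func main() {
	// One cluster header, a zeroed (damaged) region, then the next cluster.
	data := append([]byte{}, clusterID...)
	data = append(data, 0x84, 0xAA, 0xBB, 0xCC, 0xDD) // size + 4 payload bytes
	data = append(data, make([]byte, 16)...)          // zero padding
	data = append(data, clusterID...)

	// Resume the scan just past the first cluster's ID.
	fmt.Println(nextCluster(data, len(clusterID)))
}
```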

```go
c, err := matroska.Open(ctx, path)
if err != nil {
42 changes: 29 additions & 13 deletions ebml/primitives.go
@@ -53,13 +53,36 @@ func ReadFloat(r io.Reader, size int64) (float64, error) {
return math.Float64frombits(binary.BigEndian.Uint64(buf)), nil
}

// ReadString reads a UTF-8/ASCII string, trimming trailing nulls.
func ReadString(r io.Reader, size int64) (string, error) {
// readExact reads exactly size bytes. For sizes above a small threshold it
// grows the buffer incrementally (via io.LimitReader) instead of allocating
// size bytes upfront, so a malformed element that declares a huge size but
// supplies little data cannot force a giant allocation (memory DoS).
func readExact(r io.Reader, size int64) ([]byte, error) {
if err := checkSize(size); err != nil {
return "", err
return nil, err
}
buf := make([]byte, size)
if _, err := io.ReadFull(r, buf); err != nil {
const maxUpfront = 1 << 20 // 1 MiB: allocate exactly for the common small case
if size <= maxUpfront {
buf := make([]byte, size)
if _, err := io.ReadFull(r, buf); err != nil {
return nil, err
}
return buf, nil
}
buf, err := io.ReadAll(io.LimitReader(r, size))
if err != nil {
return nil, err
}
if int64(len(buf)) != size {
return nil, io.ErrUnexpectedEOF
}
return buf, nil
}

// ReadString reads a UTF-8/ASCII string, trimming trailing nulls.
func ReadString(r io.Reader, size int64) (string, error) {
buf, err := readExact(r, size)
if err != nil {
return "", err
}
for len(buf) > 0 && buf[len(buf)-1] == 0 {
@@ -70,12 +93,5 @@ func ReadString(r io.Reader, size int64) (string, error) {

// ReadBytes reads raw bytes.
func ReadBytes(r io.Reader, size int64) ([]byte, error) {
if err := checkSize(size); err != nil {
return nil, err
}
buf := make([]byte, size)
if _, err := io.ReadFull(r, buf); err != nil {
return nil, err
}
return buf, nil
return readExact(r, size)
}
24 changes: 24 additions & 0 deletions ebml/primitives_test.go
@@ -104,6 +104,30 @@ func TestReadBytes_ExceedsMaxSize(t *testing.T) {
}
}

// TestReadBytes_HugeDeclaredSizeNoAlloc guards against a memory DoS: a tiny
// input that declares a near-MaxElementSize element must fail with the data
// available, NOT allocate the declared size upfront. (If it over-allocated, this
// test would spike ~256 MB / be killed under the race detector's memory limits.)
func TestReadBytes_HugeDeclaredSizeNoAlloc(t *testing.T) {
r := bytes.NewReader([]byte{1, 2, 3, 4, 5})
_, err := ReadBytes(r, 256*1024*1024) // 256 MB declared, 5 bytes available
if err == nil {
t.Fatal("expected error for truncated huge element")
}
}

func TestCheckSizeBoundary(t *testing.T) {
if err := checkSize(MaxElementSize); err != nil {
t.Errorf("checkSize(MaxElementSize) = %v, want nil (cap is inclusive)", err)
}
if err := checkSize(MaxElementSize + 1); err == nil {
t.Error("checkSize(MaxElementSize+1) = nil, want error")
}
if err := checkSize(-1); err == nil {
t.Error("checkSize(-1) = nil, want error")
}
}

func TestReadBytes_NegativeSize(t *testing.T) {
r := strings.NewReader("x")
_, err := ReadBytes(r, -1)
11 changes: 8 additions & 3 deletions ebml/reader.go
@@ -26,11 +26,11 @@ func ReadVINT(r io.Reader) (uint64, int, error) {

val := uint64(b)
if width > 1 {
rest := make([]byte, width-1)
if _, err := io.ReadFull(r, rest); err != nil {
var rest [7]byte // width is 1..8, so width-1 fits; avoids a heap alloc per VINT
if _, err := io.ReadFull(r, rest[:width-1]); err != nil {
return 0, 0, err
}
for _, rb := range rest {
for _, rb := range rest[:width-1] {
val = (val << 8) | uint64(rb)
}
}
@@ -44,6 +44,11 @@ func ReadElementID(r io.Reader) (uint32, int, error) {
if err != nil {
return 0, 0, err
}
if n > 4 {
// EBML element IDs are 1-4 octets; a wider VINT would silently truncate
// into uint32. Reject it rather than corrupt the parse.
return 0, n, fmt.Errorf("invalid element ID: %d-octet VINT exceeds 4-octet limit", n)
}
return uint32(val), n, nil
}

14 changes: 14 additions & 0 deletions ebml/reader_test.go
@@ -102,3 +102,17 @@ func TestReadVINT_MultiByteValues(t *testing.T) {
}
}
}

func TestReadElementID_RejectsWideVINT(t *testing.T) {
// A 5-octet VINT (first set bit at position 3 -> width 5) is not a valid
// element ID and would silently truncate into uint32.
_, _, err := ReadElementID(bytes.NewReader([]byte{0x08, 0x00, 0x00, 0x00, 0x01}))
if err == nil {
t.Fatal("expected error for 5-octet element ID")
}
// A valid 4-octet ID (EBML header) must still parse.
id, n, err := ReadElementID(bytes.NewReader([]byte{0x1A, 0x45, 0xDF, 0xA3}))
if err != nil || n != 4 || id != IDEBMLHeader {
t.Fatalf("4-octet ID: id=0x%X n=%d err=%v", id, n, err)
}
}
82 changes: 71 additions & 11 deletions matroska/matroska.go
@@ -1,5 +1,12 @@
// Package matroska provides backward-compatible access to the mkvgo toolkit.
// New code should import mkv, mkv/reader, mkv/writer, mkv/ops, mkv/subtitle directly.
// Package matroska is the stable, supported public API of mkvgo: a small,
// curated facade over the lower-level building blocks in mkv and its
// subpackages. Prefer it for application code — its exported surface is the one
// kept backward-compatible.
//
// The mkv, mkv/reader, mkv/writer, mkv/ops and mkv/subtitle packages are
// lower-level and EXPERIMENTAL: their APIs may change between minor versions.
// Import them directly only for capabilities this facade does not expose yet
// (e.g. streaming readers/writers, NewWebMStreamWriter).
package matroska

import (
@@ -79,18 +86,71 @@ func Write(w io.Writer, c *Container) error {
return writer.Write(w, c)
}

// WriteWebM writes c as a WebM file (validates WebM codec compatibility, then
// writes with the "webm" DocType). See mkv.ValidateWebM.
func WriteWebM(w io.Writer, c *Container) error {
return writer.WriteWebM(w, c)
}

// ValidateWebM reports whether c can be written as WebM, naming any track whose
// codec falls outside the WebM subset (VP8/VP9/AV1, Vorbis/Opus, WebVTT).
func ValidateWebM(c *Container) error {
return mkv.ValidateWebM(c)
}

// IsWebMCodec reports whether a codec (short name "vp9" or Matroska id "V_VP9")
// is permitted in WebM.
func IsWebMCodec(codec string) bool {
return mkv.IsWebMCodec(codec)
}

// WebMDocTypeVersion returns the EBML DocTypeVersion needed for c as WebM
// (4 when an AV1 track is present, else 2).
func WebMDocTypeVersion(c *Container) uint64 {
return mkv.WebMDocTypeVersion(c)
}

// RemuxToWebM reads srcPath and writes a complete, playable WebM file to
// dstPath, copying the media verbatim. Rejects sources with non-WebM codecs.
// Non-subset elements (Chapters/Attachments/Tags) are dropped — see
// WebMNonSubsetElements to detect that loss beforehand.
func RemuxToWebM(ctx context.Context, srcPath, dstPath string, opts ...Options) error {
return ops.RemuxToWebM(ctx, srcPath, dstPath, opts...)
}

// WebMNonSubsetElements lists the elements in c (Chapters/Attachments/Tags) that
// a WebM remux will drop; empty means nothing is lost.
func WebMNonSubsetElements(c *Container) []string {
return mkv.WebMNonSubsetElements(c)
}

// --- Operations ---

func Mux(ctx context.Context, opts MuxOptions) error { return ops.Mux(ctx, opts) }
func Demux(ctx context.Context, opts DemuxOptions) error { return ops.Demux(ctx, opts) }
func Split(ctx context.Context, opts SplitOptions) ([]string, error) { return ops.Split(ctx, opts) }
func Join(ctx context.Context, sources []string, dstPath string) error {
return ops.Join(ctx, sources, dstPath)
func Mux(ctx context.Context, opts MuxOptions, extra ...Options) error {
return ops.Mux(ctx, opts, extra...)
}
func Demux(ctx context.Context, opts DemuxOptions, extra ...Options) error {
return ops.Demux(ctx, opts, extra...)
}
func Split(ctx context.Context, opts SplitOptions, extra ...Options) ([]string, error) {
return ops.Split(ctx, opts, extra...)
}
func Join(ctx context.Context, sources []string, dstPath string, opts ...Options) error {
return ops.Join(ctx, sources, dstPath, opts...)
}
func Merge(ctx context.Context, opts MergeOptions) error { return ops.Merge(ctx, opts) }
func Validate(ctx context.Context, path string) ([]Issue, error) { return ops.Validate(ctx, path) }
func Compare(ctx context.Context, pathA, pathB string) ([]Diff, error) {
return ops.Compare(ctx, pathA, pathB)
func Merge(ctx context.Context, opts MergeOptions, extra ...Options) error {
return ops.Merge(ctx, opts, extra...)
}
func Validate(ctx context.Context, path string, opts ...Options) ([]Issue, error) {
return ops.Validate(ctx, path, opts...)
}
func Compare(ctx context.Context, pathA, pathB string, opts ...Options) ([]Diff, error) {
return ops.Compare(ctx, pathA, pathB, opts...)
}

// Reindex rebuilds the seek index (Cues) of a file. See ops.Reindex.
func Reindex(ctx context.Context, srcPath, dstPath string, opts ...Options) error {
return ops.Reindex(ctx, srcPath, dstPath, opts...)
}

func RemoveTrack(ctx context.Context, srcPath, dstPath string, removeIDs []uint64, opts ...Options) error {
10 changes: 10 additions & 0 deletions mkv/doc.go
@@ -0,0 +1,10 @@
// Package mkv holds the core Matroska model (Container, Track, Block, …) shared
// by its subpackages reader, writer, ops and subtitle, which implement the
// parsing, serialisation and high-level operations.
//
// STABILITY: mkv and its subpackages are lower-level building blocks and are
// considered EXPERIMENTAL — their exported APIs may change between minor
// versions. For a stable, backward-compatible surface, use the top-level
// matroska package; reach for these packages directly only when you need
// capabilities matroska does not expose (streaming, custom operations).
package mkv