feat(*): ODict V2 #639

Draft
wants to merge 178 commits into base: main


Nickersoft (Member) commented Feb 12, 2024

ODict V2 is a major rewrite of the ODict compiler and format, moving all of the core logic to Rust and introducing several new features. Rewriting it in Rust makes ODict more versatile: it can be called more easily from other languages without having the CLI installed, and it is compatible with Rust-based platforms like Tauri.

Breaking Changes

As ODict V2 is a complete rewrite of the compiler, it is no longer backwards-compatible with files compiled with ODict V1.

  • The FlatBuffers schema has been replaced with the Rust-specific rkyv serialization library, which doesn't need external schema DSL files (+ faster lookups w/ built-in HashMap support)
  • Snappy compression has been replaced with LZ4 (smaller files)
  • bleve is now replaced with the faster tantivy library and uses the charabia tokenizer for better multilingual tokenization
  • The built-in server was moved from the core library to the CLI, with multi-dictionary support and a new endpoint schema

What's New

  • odict new command for initializing new ODXML files with the correct schema
  • odict merge now has the ability to merge multiple dictionaries into a single base dictionary
  • Custom tokenizers via the Tantivy SDK
  • New Rust crate (still unnamed) and Rust SDK, complete with a more intuitive way to read/create/write dictionaries in code
  • Node bindings via NAPI-RS, removing the need for having the CLI installed + odict service
  • Native Python bindings via PyO3
  • odict info command for printing out the top-level information of a file (format, name, etc.) in different formats (JSON, XML, etc.)
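
To make the merge behavior above concrete, here is a minimal sketch of folding several dictionaries into one base dictionary. It is not the actual ODict crate API: `Entry`, `Dictionary`, and `merge_all` are hypothetical names, the structures are heavily simplified, and the conflict policy (entries already in the base win) is an assumption made for illustration only.

```rust
use std::collections::HashMap;

/// Hypothetical, simplified entry: a headword plus its definitions.
#[derive(Debug, Clone, PartialEq)]
pub struct Entry {
    pub term: String,
    pub definitions: Vec<String>,
}

/// Hypothetical in-memory dictionary keyed by headword.
#[derive(Debug, Clone, Default, PartialEq)]
pub struct Dictionary {
    pub name: String,
    pub entries: HashMap<String, Entry>,
}

impl Dictionary {
    pub fn new(name: &str) -> Self {
        Dictionary {
            name: name.to_string(),
            entries: HashMap::new(),
        }
    }

    /// Insert (or overwrite) a single entry under its headword.
    pub fn insert(&mut self, term: &str, definitions: &[&str]) {
        let entry = Entry {
            term: term.to_string(),
            definitions: definitions.iter().map(|d| d.to_string()).collect(),
        };
        self.entries.insert(term.to_string(), entry);
    }

    /// Merge every dictionary in `others` into `self`.
    /// Assumed policy: an entry already present in the base is kept as-is;
    /// only headwords missing from the base are copied over.
    pub fn merge_all(&mut self, others: &[Dictionary]) {
        for other in others {
            for (term, entry) in &other.entries {
                self.entries
                    .entry(term.clone())
                    .or_insert_with(|| entry.clone());
            }
        }
    }
}
```

With this policy, merging dictionaries `a` and `b` into `base` leaves every existing `base` entry untouched and only adds the headwords that `base` was missing. A real implementation might instead combine definitions for shared headwords.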

What's Deprecated

  • The Go SDK no longer exists, as no good interop between Go and Rust really exists, but it can be revisited if there is strong community demand
  • The hidden odict service command no longer exists

Remaining Todos for Core Library

  • SQL dumping
  • Built-in dictionary server

@Nickersoft Nickersoft marked this pull request as draft February 12, 2024 09:14
renovate bot added 30 commits June 18, 2024 02:40