MBCS converter using Rustler with encoding crate

enpedasi/mbcs_rs

MbcsRs

Character encoding support for Elixir, built with Rustler and the "rust-encoding" crate. Supports Shift_JIS, EUC-JP, Big5, and the other encodings defined by WHATWG.

Installation

If available in Hex, the package can be installed by adding mbcs_rs to your list of dependencies in mix.exs:

def deps do
  [
    {:mbcs_rs, "~> 0.1"}
  ]
end

Usage:

iex> MbcsRs.encode!("日本語", "SJIS") |> MbcsRs.decode!("SJIS")
"日本語"

iex> MbcsRs.encode!("你好,世界", "BIG5") |> MbcsRs.decode!("BIG5") 
"你好,世界"

iex> MbcsRs.encode!("한국어", "EUC-KR") |> MbcsRs.decode!("EUC-KR")
"한국어"

iex> File.stream!("KEN_ALL.CSV") \
 |> Stream.map(&MbcsRs.decode!(&1,"SJIS")) \
 |> Stream.filter(&String.contains?(&1,"福岡市中央区")) \
 |> Enum.to_list
["40133,\"810  \",\"8100000\",\"フクオカケン\",\"フクオカシチユウオウク\",\"イカニケイサイガナイバアイ\",\"福岡県\",\"福岡市中央区\",\"以下に掲載がない場合\",0,0,0,0,0,0\n",
...
 "40133,\"810  \",\"8100037\",\"フクオカケン\",\"フクオカシチユウオウク\",\"ミナミコウエン\",\"福岡県\",\"福岡市中央区\",\"南公園\",0,0,0,0,0,0\n",
 "40133,\"810  \",\"8100022\",\"フクオカケン\",\"フクオカシチユウオウク\",\"ヤクイン\",\"福岡県\",\"福岡市中央区\",\"薬院\",0,0,1,0,0,0\n",
 ...]
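
The examples above mostly decode existing data. Going the other way, a minimal sketch of writing Shift_JIS bytes to a file and reading them back might look like the following. It relies only on `MbcsRs.encode!/2` and `MbcsRs.decode!/2` as used above; the file name is a hypothetical one chosen for illustration.

```elixir
# Hypothetical output path, used only for this illustration.
path = Path.join(System.tmp_dir!(), "mbcs_rs_example_sjis.txt")

# Encode a UTF-8 string to Shift_JIS bytes and write them out unchanged.
sjis = MbcsRs.encode!("日本語", "SJIS")
File.write!(path, sjis)

# Read the raw bytes back and decode them into a UTF-8 string again.
decoded = path |> File.read!() |> MbcsRs.decode!("SJIS")
IO.puts(decoded)
```

Because the encoded value is a plain binary, it can be passed to `File.write!/2` directly without any further conversion.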

Other encodings are supported as well. See the WHATWG Encoding Standard for the available encodings and their labels.

Requirements

Rust compiler and Cargo

Example for Alpine Linux:

apk add musl-dev rust cargo