Reads UTF-8 on stdin and prints out the raw Unicode codepoints. Useful for seeing exactly what a string consists of.

lunasorcery/utf8info

utf8info

utf8info is a small utility that reads a UTF-8 stream and prints out the raw codepoint information. It's useful for spotting invisible control characters like U+202E RIGHT-TO-LEFT OVERRIDE, and for interrogating complex zero-width joiner (ZWJ) sequences like 👨‍👩‍👧‍👦, which is composed of 7 codepoints: four emoji joined by three U+200D ZERO WIDTH JOINER characters!
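To make the codepoint breakdown concrete, here is a minimal Python sketch of the same idea: decode UTF-8 bytes and list each codepoint with its name and raw bytes. This is not utf8info's implementation or its output format. The `describe` helper is a hypothetical name, and the character names come from Python's bundled `unicodedata` module, which may track an older Unicode version than the tool does.

```python
import unicodedata


def describe(data: bytes):
    """Return (codepoint, name, utf8_bytes) tuples for a UTF-8 byte string.

    Hypothetical helper for illustration only; not part of utf8info.
    """
    rows = []
    for ch in data.decode("utf-8"):
        # Control characters have no formal name, so fall back to a placeholder.
        name = unicodedata.name(ch, "<unnamed>")
        rows.append((f"U+{ord(ch):04X}", name, ch.encode("utf-8").hex(" ")))
    return rows


# The family emoji, written with escapes so the invisible joiners are explicit.
family = "\U0001F468\u200D\U0001F469\u200D\U0001F467\u200D\U0001F466"

for codepoint, name, raw in describe(family.encode("utf-8")):
    print(f"{codepoint:<8} {name:<20} {raw}")
```

Running the sketch prints seven rows: MAN, WOMAN, GIRL and BOY, each pair separated by a ZERO WIDTH JOINER row. The third column is the kind of raw-byte detail that the `-v` option described under Options adds to utf8info's own output.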

This tool supports codepoints from the latest published version of the Unicode Standard, sourcing data from the Unicode Character Database.

Building & Installing

On macOS and Linux, it should be as simple as running the following inside the utf8info directory:

make && make install

When a new version of the standard is released, you can fetch the latest UCD with make update, and then build as before.

Windows is not officially supported, but it'll likely work under WSL.

Note: Building utf8info requires curl, unzip, and a C++17-compatible compiler.

Options:

-v, --verbose       Enable verbose output. This prints the raw UTF-8 bytes next to the codepoint info.
-d, --definitions   Display definitions for CJK Unified Ideographs.
-a, --all           List all known codepoints and exit.
