Arg to define glyphset without checking an existing font #86

Open
RosaWagner opened this issue Jun 24, 2022 · 5 comments
Labels: documentation (Improvements or additions to documentation), wontfix (This will not be worked on)

Comments

@RosaWagner

It would be nice to be able to create or define glyph sets based on the number of speakers or on orthography status.
For example, listing the glyphs necessary to support languages with more than 20,000 speakers.

kontur added the documentation and wontfix labels on Jun 27, 2022
kontur self-assigned this on Jun 27, 2022

kontur (Contributor) commented Jun 27, 2022

For the time being, I'd prefer to keep the CLI focused on working with font files so as not to muddy its purpose. For building a charset, you could use the library in a script, for example:

from hyperglot.languages import Languages
from hyperglot.language import Language, Orthography

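glyphs = []
for iso, info in Languages().items():
    lang = Language(info, iso)

    # Skip languages with no speaker count or fewer than 20,000 speakers
    if "speakers" not in lang or lang["speakers"] < 20000:
        continue

    # Skip languages for which no orthography is available
    orth = lang.get_orthography()
    if not orth:
        continue

    # Wrap the raw orthography data to get parsed character lists
    orth = Orthography(orth)
    glyphs.extend(orth.base_chars)
    glyphs.extend(orth.required_base_marks)

# De-duplicate and sort the collected characters
print(sorted(set(glyphs)))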

You may want to look at the constructor options for Languages (particularly the validity level) and at the Orthography attributes you are interested in (for example, to ignore particular scripts entirely). I know there are a couple of quirks with these objects, but much of this exists to augment, validate and decompose the YAML data into actual codepoints. For example, a missing speaker count in the data is itself information, as is a missing orthography.

For use of the library, we should add more documentation in the form of concrete "How do I..." examples like this 👍
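
To illustrate the point above that a missing speaker count is itself information, here is a minimal sketch that sorts languages into three groups instead of silently dropping the ones without a count. It works on plain dictionaries rather than on Hyperglot objects; the function name `partition_by_speakers`, the sample records and their `speakers` key layout are assumptions made for illustration only, not real Hyperglot data or API.

```python
def partition_by_speakers(records, threshold):
    """Split ISO codes into: at or above threshold, below it, unknown count."""
    at_or_above, below, unknown = [], [], []
    for iso, info in records.items():
        speakers = info.get("speakers")
        if speakers is None:
            # No count recorded: keep these visible instead of discarding them
            unknown.append(iso)
        elif speakers >= threshold:
            at_or_above.append(iso)
        else:
            below.append(iso)
    return at_or_above, below, unknown


# Hypothetical sample records (placeholder codes, not real Hyperglot data)
sample = {
    "aaa": {"speakers": 50000},
    "bbb": {"speakers": 1200},
    "ccc": {},
}

above, below, unknown = partition_by_speakers(sample, 20000)
print(above, below, unknown)  # ['aaa'] ['bbb'] ['ccc']
```

Keeping the "unknown" group separate makes it a deliberate choice whether such languages end up in the glyph set or not.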

iandoug commented Apr 19, 2023

Similar: given a language (/ code), print out characters in language. Assumes terminal is using font with appropriate glyphs.

Otherwise, one can just browse the hyperglot.yaml file :-)

Thanks, Ian

kontur (Contributor) commented Apr 19, 2023

> Similar: given a language (/ code), print out characters in language. Assumes terminal is using font with appropriate glyphs.

One of the latest updates introduced a feature like this. It is split from the main hyperglot command: use, for example, hyperglot-data eng or hyperglot-data Suomi. It will show info by ISO code or attempt to find the language by name.
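
The "by ISO code, otherwise by name" lookup described above can be sketched roughly as follows. This is not how hyperglot-data is implemented; the function `find_language`, the `name` and `autonym` fields and the sample records are assumptions used only to show the fallback order.

```python
def find_language(records, query):
    """Return (iso, info) for an exact ISO code match, else a name match, else None."""
    # 1. An exact ISO code match wins
    if query in records:
        return query, records[query]

    # 2. Otherwise compare case-insensitively against the known names
    wanted = query.casefold()
    for iso, info in records.items():
        names = (info.get("name", ""), info.get("autonym", ""))
        if wanted in (name.casefold() for name in names):
            return iso, info

    # 3. Nothing matched
    return None


# Hypothetical sample records (field names are assumptions)
sample = {
    "eng": {"name": "English"},
    "fin": {"name": "Finnish", "autonym": "Suomi"},
}

print(find_language(sample, "eng"))
print(find_language(sample, "Suomi"))
```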

frankrolf commented Jul 25, 2024

I was looking for documentation on how to use Hyperglot to extract characters for a given language (basically, what’s visible on https://hyperglot.rosettatype.com). This was helpful – however, the API seems to have changed:

from hyperglot.languages import Languages
from hyperglot.language import Language

for iso, info in Languages().items():
    lang = Language(info, iso)
Traceback (most recent call last):
  File "/Users/fg/Desktop/hyper_test.py", line 6, in <module>
    lang = Language(info, iso)
  File "site-packages/hyperglot/language.py", line 56, in __init__
    data[key] = default
TypeError: 'str' object does not support item assignment

It seems that info is no longer needed. What worked is this:

from hyperglot.languages import Languages
from hyperglot.language import Language

for iso, info in Languages().items():
    lang = Language(iso)

kontur (Contributor) commented Jul 25, 2024

Yes, this is correct. We wanted to make it easier (and more efficient) to interact with the data, so you can now simply call Language(xxx). This was mentioned in the release notes, but of course individual comments, issues and the like are hard to keep up to date. A more comprehensive "how to" would be valuable, I agree.
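
Putting the thread together, the speaker-threshold script from the first reply could be adapted to the single-argument `Language(iso)` form roughly like this. It is a sketch, not a maintained recipe: the function name `glyphs_for_min_speakers` is made up, the language and orthography classes are passed in as parameters so the logic does not depend on one library version, and the attribute names (`get_orthography`, `base_chars`, `required_base_marks`) are taken from the snippets above rather than checked against a current release.

```python
def glyphs_for_min_speakers(isos, min_speakers, language_cls, orthography_cls):
    """Collect base characters and required marks for languages with enough speakers."""
    glyphs = set()
    for iso in isos:
        # Newer API as reported in this thread: only the ISO code is passed
        lang = language_cls(iso)

        # Skip languages with no speaker count or too few speakers
        if "speakers" not in lang or lang["speakers"] < min_speakers:
            continue

        # Skip languages for which no orthography is available
        orth_data = lang.get_orthography()
        if not orth_data:
            continue

        orth = orthography_cls(orth_data)
        glyphs.update(orth.base_chars)
        glyphs.update(orth.required_base_marks)
    return sorted(glyphs)


# With Hyperglot installed, the call might look like this (import paths are
# copied from the snippets in this thread and may differ between versions):
#
#     from hyperglot.languages import Languages
#     from hyperglot.language import Language, Orthography
#     print(glyphs_for_min_speakers(Languages().keys(), 20000, Language, Orthography))
```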
