CLDR-15101 Support for Ethiopic-Latin (Morse Code) Transliteration #1608

Merged (1 commit) on Feb 2, 2022

dyacob
Contributor

@dyacob dyacob commented Nov 16, 2021

CLDR-15101

  • This PR completes the ticket.

@srl295
Member

srl295 commented Nov 17, 2021

Is this the right way to represent morse code in Unicode? We might want to consider that a little further.

@dyacob
Contributor Author

dyacob commented Nov 17, 2021

The most current specification that I can locate for Morse Code is ITU-R M.1677-1, which has an "in force" status. The recommendation does not address Unicode representation directly, but it does appear to use U+002E consistently for "dot" and U+2212 primarily for "dash", though U+2013 also appears. The authors may have just left it to the word processor to format:

https://www.itu.int/dms_pubrec/itu-r/rec/m/R-REC-M.1677-1-200910-I!!PDF-E.pdf
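
For reference, the three code points mentioned above can be inspected with Python's standard `unicodedata` module. This is only an illustrative check; the `candidates` grouping and the `describe` helper are hypothetical names and not part of this PR.

```python
import unicodedata

# Code points observed in ITU-R M.1677-1, per the comment above.
# The grouping into "dot" and "dash" roles is for illustration only.
candidates = {
    "dot": ["\u002E"],
    "dash": ["\u2212", "\u2013"],
}


def describe(ch: str) -> str:
    """Return the code point and official Unicode name of one character."""
    return f"U+{ord(ch):04X} {unicodedata.name(ch)}"


for role, chars in candidates.items():
    for ch in chars:
        print(role, describe(ch))
```

The two dash candidates are distinct characters (a mathematical minus sign and an en dash), which is why the choice matters for round-tripping.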

@srl295
Member

srl295 commented Nov 22, 2021

probably blocked until CLDR-15191 is discussed

@jira-pull-request-webhook

Hooray! The files in the branch are the same across the force-push. 😃

~ Your Friendly Jira-GitHub PR Checker Bot

@dyacob
Contributor Author

dyacob commented Nov 23, 2021

@srl295, understood about the blocking. I've pushed an update that uses am-Ethi-t-d0-morse as a forward alias, so the transliteration is ready when CLDR-15191 is ready and can be easily tweaked if needed.

@srl295 srl295 self-assigned this Nov 30, 2021
@srl295 srl295 self-requested a review November 30, 2021 21:18
Member

@srl295 srl295 left a comment

LG so far

Comment on lines +665 to +671
# TODO: Seek a better way to handle this. The conversion adds a space to the end of every sequence to terminate it.
# The CLDR "TestTransforms" unit test will strip off the space at the end of a line in the am-Latn-t-am-Ethi-m0-morse-code.txt
# test file thus breaking the comparison with generated words. The following will strip off the space at the end of
# a converted string to facilitate compatibility with the unit tests.

Member

It makes some sense not to generate an extra space. However, this will make 73 -> --... ...-- whereas 7+3 = --......--
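
To illustrate the concern, here is a minimal Python sketch. It assumes a hypothetical two-entry digit table and a hypothetical `convert` helper that mimics the "append a space after every sequence, then strip the final one" behaviour described in the TODO; it is not the CLDR rule set itself, and it uses ASCII `.` and `-` for readability.

```python
# Hypothetical two-entry table; the real rules take Ethiopic input.
DIGITS = {"7": "--...", "3": "...--"}


def convert(text: str) -> str:
    """Terminate each sequence with a space, then strip the trailing one."""
    out = "".join(DIGITS[ch] + " " for ch in text)
    return out.rstrip(" ")


whole = convert("73")                  # converted in one pass
pieces = convert("7") + convert("3")   # converted separately, then joined

print(repr(whole))   # '--... ...--'
print(repr(pieces))  # '--......--'
```

Because the trailing separator is stripped, converting a string in pieces and concatenating the results no longer matches converting it whole.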

Contributor Author

@dyacob dyacob Dec 1, 2021

Converted letters would run together too. For example, "at" is read clearly as ".- -". Without the " " added after ".-", the conversion would be ".--", which would be misinterpreted as "w". The issue is the space between characters, not between words. At the end of a word, though, the last character still gets the " ", which is now superfluous since a word boundary token is naturally expected.

Thinking about it again now though... "/" is the normal word boundary in ASCII form (the conversion of " "), which this transliterator is not supporting. I should add it if everything else looks ok.
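
A rough Python sketch of both points (the inter-character space and "/" as a word boundary) follows. The `MORSE` mini-table, the `encode` helper, and the choice of " / " as the word separator are assumptions made for illustration; the actual transliterator is written as CLDR transform rules and may handle separators differently.

```python
# Hypothetical mini-table of standard Morse sequences (ASCII "." and "-").
MORSE = {"a": ".-", "t": "-", "w": ".--"}


def encode(text: str, char_sep: str = " ", word_sep: str = " / ") -> str:
    """Encode whitespace-separated words with explicit separators."""
    words = text.split()
    return word_sep.join(
        char_sep.join(MORSE[ch] for ch in word) for word in words
    )


print(encode("at"))               # .- -
print(encode("at", char_sep=""))  # .--  (same as "w")
print(encode("w"))                # .--
print(encode("at ta"))            # .- - / - .-
```

Joining with separators, rather than appending one after every sequence, is one way to avoid the superfluous trailing space, at the cost of the concatenation behaviour raised earlier in this thread.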

Contributor Author

@dyacob dyacob Feb 1, 2022

@srl295, the mapping of 73 -> --... ...-- might be OK. The https://morsedecoder.com/ service produces this output (taking a leap of faith here to assume the service is valid).

CLDR-15101 Support for Ethiopic-Latin (Morse Code) Transliteration
@srl295 srl295 merged commit 6b91866 into unicode-org:main Feb 2, 2022