
[Bug]: disassembleHangul is incorrect for double vowel and double consonant #71

Closed
roeniss opened this issue Apr 20, 2024 · 3 comments
Labels
bug Something isn't working

Comments

@roeniss

roeniss commented Apr 20, 2024

Bug description

In the official docs, disassembleHangul is described as "한글 문자열을 글자별로 초성/중성/종성 단위로 완전히 분리하여" (in English, "completely separates each character of a Korean string into onset/nucleus/coda units"), which is not what the function actually does.

  1. Double consonants (e.g., ㄳ, ㄽ) should be treated as a single unit. Currently they are not.
  2. Double vowels (e.g., ㅐ, ㅘ) should be treated as a single unit. Currently only some of them are: ㅐ stays whole, but ㅘ is split into ㅗㅏ.

Expected behavior

h.disassembleHangul('개') // I think this would be 'ㄱㅐ', and it is.
h.disassembleHangul('과') // I think this would be 'ㄱㅘ', but it isn't: it returns 'ㄱㅗㅏ'.
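For comparison, here is a minimal sketch of the expected behavior. It assumes only the standard Unicode layout of precomposed Hangul syllables (U+AC00 to U+D7A3), where every compound vowel (ㅘ) and every final cluster (ㄳ, ㅄ) already has its own single index. `disassembleToJamo` and the table names are hypothetical; this is not the library's implementation.

```typescript
// Hypothetical sketch (not the library's code): split precomposed Hangul syllables
// into compatibility jamo, keeping compound vowels (ㅘ) and final clusters (ㄳ) whole.
const ONSETS = 'ㄱㄲㄴㄷㄸㄹㅁㅂㅃㅅㅆㅇㅈㅉㅊㅋㅌㅍㅎ'; // 19 initial consonants
const NUCLEI = 'ㅏㅐㅑㅒㅓㅔㅕㅖㅗㅘㅙㅚㅛㅜㅝㅞㅟㅠㅡㅢㅣ'; // 21 vowels; ㅘ etc. are single entries
// 28 finals: index 0 means "no final consonant"; ㄳ, ㅄ etc. are single entries.
const CODAS = [''].concat('ㄱㄲㄳㄴㄵㄶㄷㄹㄺㄻㄼㄽㄾㄿㅀㅁㅂㅄㅅㅆㅇㅈㅊㅋㅌㅍㅎ'.split(''));

const SYLLABLE_BASE = 0xac00; // '가'
const SYLLABLE_COUNT = 19 * 21 * 28; // 11172 precomposed syllables

function disassembleToJamo(text: string): string {
  let out = '';
  for (let i = 0; i < text.length; i++) {
    const offset = text.charCodeAt(i) - SYLLABLE_BASE;
    if (offset < 0 || offset >= SYLLABLE_COUNT) {
      out += text[i]; // not a precomposed syllable: copy unchanged
      continue;
    }
    // Invert S = 0xAC00 + (L * 21 + V) * 28 + T.
    out +=
      ONSETS[Math.floor(offset / (21 * 28))] +
      NUCLEI[Math.floor((offset % (21 * 28)) / 28)] +
      CODAS[offset % 28];
  }
  return out;
}

console.log(disassembleToJamo('과')); // 'ㄱㅘ'
console.log(disassembleToJamo('값')); // 'ㄱㅏㅄ'
```

Because the Unicode tables already treat ㅘ and ㄳ as single units, this version passes every assertion in the test list below without any special cases.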

To Reproduce

[email protected]

Possible Solution

Skipped, because I'm unsure whether this behavior is intentional.

etc.

Here are the full test cases. I think every assertion should pass.

// double consonant 1 (ㄲ, ㄸ, ㅃ, ㅆ, ㅉ)
// onset 
h.disassembleHangul('까') == 'ㄲㅏ'
h.disassembleHangul('따') == 'ㄸㅏ'
h.disassembleHangul('빠') == 'ㅃㅏ'
h.disassembleHangul('싸') == 'ㅆㅏ'
h.disassembleHangul('짜') == 'ㅉㅏ'

// coda
h.disassembleHangul('갂') == 'ㄱㅏㄲ'
h.disassembleHangul('갔') == 'ㄱㅏㅆ'

// double consonant 2 (ㄳ, ㄵ, ㄶ, ㄺ, ㄻ, ㄼ, ㄽ, ㄾ, ㄿ, ㅀ, ㅄ)
// coda
h.disassembleHangul('갃') == 'ㄱㅏㄳ' // false
h.disassembleHangul('갅') == 'ㄱㅏㄵ' // false
h.disassembleHangul('갆') == 'ㄱㅏㄶ' // false
h.disassembleHangul('갉') == 'ㄱㅏㄺ' // false
h.disassembleHangul('갊') == 'ㄱㅏㄻ' // false
h.disassembleHangul('갋') == 'ㄱㅏㄼ' // false
h.disassembleHangul('갌') == 'ㄱㅏㄽ' // false
h.disassembleHangul('갍') == 'ㄱㅏㄾ' // false
h.disassembleHangul('갎') == 'ㄱㅏㄿ' // false
h.disassembleHangul('갏') == 'ㄱㅏㅀ' // false
h.disassembleHangul('값') == 'ㄱㅏㅄ' // false

// single vowel (ㅏ, ㅑ, ㅓ, ㅕ, ㅗ, ㅛ, ㅜ, ㅠ, ㅡ, ㅣ)
// nucleus
h.disassembleHangul('가') == 'ㄱㅏ'
h.disassembleHangul('갸') == 'ㄱㅑ'
h.disassembleHangul('거') == 'ㄱㅓ'
h.disassembleHangul('겨') == 'ㄱㅕ'
h.disassembleHangul('고') == 'ㄱㅗ'
h.disassembleHangul('교') == 'ㄱㅛ'
h.disassembleHangul('구') == 'ㄱㅜ'
h.disassembleHangul('규') == 'ㄱㅠ'
h.disassembleHangul('그') == 'ㄱㅡ'
h.disassembleHangul('기') == 'ㄱㅣ'

// double vowel (ㅐ, ㅒ, ㅔ, ㅖ, ㅘ, ㅙ, ㅚ, ㅝ, ㅞ, ㅟ, ㅢ)
// nucleus
h.disassembleHangul('개') == 'ㄱㅐ'
h.disassembleHangul('걔') == 'ㄱㅒ'
h.disassembleHangul('게') == 'ㄱㅔ'
h.disassembleHangul('계') == 'ㄱㅖ'
h.disassembleHangul('과') == 'ㄱㅘ' // false
h.disassembleHangul('괘') == 'ㄱㅙ' // false
h.disassembleHangul('괴') == 'ㄱㅚ' // false
h.disassembleHangul('궈') == 'ㄱㅝ' // false
h.disassembleHangul('궤') == 'ㄱㅞ' // false
h.disassembleHangul('귀') == 'ㄱㅟ' // false
h.disassembleHangul('긔') == 'ㄱㅢ' // false
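The expected results line up with how Unicode itself encodes these syllables. In the syllable arithmetic of the Unicode Standard (section 3.12, Conjoining Jamo Behavior), a syllable's code point is built from an onset index $L$, a vowel index $V$ and a final index $T$, and ㅘ ($V = 9$) and ㅄ ($T = 18$) each occupy a single index. Worked for 과 (U+ACFC) and 값 (U+AC12):

```latex
\begin{aligned}
S &= \mathrm{AC00}_{16} + (L \cdot 21 + V) \cdot 28 + T, && 0 \le L \le 18,\ 0 \le V \le 20,\ 0 \le T \le 27 \\
\text{U+ACFC} &= \mathrm{AC00}_{16} + (0 \cdot 21 + 9) \cdot 28 + 0 = \mathrm{AC00}_{16} + 252 && (L = 0,\ V = 9,\ T = 0) \\
\text{U+AC12} &= \mathrm{AC00}_{16} + (0 \cdot 21 + 0) \cdot 28 + 18 && (L = 0,\ V = 0,\ T = 18)
\end{aligned}
```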
@roeniss roeniss added the bug Something isn't working label Apr 20, 2024
@roeniss
Author

roeniss commented Apr 21, 2024

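I guess this is closely related to the common Korean keyboard layout (2-set, Dubeolsik), which makes sense in some ways: ㅐ, ㅔ and ㄲ are each typed with a single key (ㄲ with Shift), while ㅘ is typed as ㅗ + ㅏ and ㄳ as ㄱ + ㅅ.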

I just wanted to point out that this inconsistency might add cognitive load for users.
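A minimal sketch of that keyboard-style splitting, assuming the standard 2-set layout: compound vowels and final clusters that need two keystrokes are expanded, while single-key jamo (ㅐ, ㅒ, ㅔ, ㅖ, ㄲ, ㄸ, ㅃ, ㅆ, ㅉ) are left alone. `KEYSTROKES` and `toKeystrokes` are hypothetical names, not part of the library. Applied to compound-preserving jamo, it reproduces the current results reported in this issue.

```typescript
// Hypothetical sketch: expand jamo that take two keystrokes on a 2-set (Dubeolsik) keyboard.
// Jamo with their own key (ㅐ, ㅔ, ㄲ, ...) are intentionally absent, so they pass through.
const KEYSTROKES: Record<string, string> = {
  // compound vowels
  'ㅘ': 'ㅗㅏ', 'ㅙ': 'ㅗㅐ', 'ㅚ': 'ㅗㅣ', 'ㅝ': 'ㅜㅓ', 'ㅞ': 'ㅜㅔ', 'ㅟ': 'ㅜㅣ', 'ㅢ': 'ㅡㅣ',
  // final consonant clusters
  'ㄳ': 'ㄱㅅ', 'ㄵ': 'ㄴㅈ', 'ㄶ': 'ㄴㅎ', 'ㄺ': 'ㄹㄱ', 'ㄻ': 'ㄹㅁ', 'ㄼ': 'ㄹㅂ',
  'ㄽ': 'ㄹㅅ', 'ㄾ': 'ㄹㅌ', 'ㄿ': 'ㄹㅍ', 'ㅀ': 'ㄹㅎ', 'ㅄ': 'ㅂㅅ',
};

function toKeystrokes(jamo: string): string {
  return jamo
    .split('')
    .map((c) => KEYSTROKES[c] ?? c) // unknown or single-key jamo are kept as-is
    .join('');
}

console.log(toKeystrokes('ㄱㅘ')); // 'ㄱㅗㅏ'
console.log(toKeystrokes('ㄱㅐ')); // 'ㄱㅐ'
```

This makes the inconsistency explicit: the split is driven by keystrokes rather than by the onset/nucleus/coda structure the docs describe.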

@KangYunHo1221 KangYunHo1221 self-assigned this May 14, 2024
@KangYunHo1221

KangYunHo1221 commented May 14, 2024

Perhaps some use cases re-assemble the disassembled output. We might need an option to control how double consonants are handled.

export function assembleHangul(words: string[]) {
  const disassembled = disassembleHangul(words.join('')).split('');
  return disassembled.reduce(binaryAssembleHangul);
}

assembleHangul(['값', 'ㅣ ', '너무 ', '빘', 'ㅏ']) // useful for producing "갑시 너무 비싸"
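The reassembly in this example relies on the final cluster being splittable: when a syllable ending in ㅄ is followed by a bare vowel ㅣ, the ㅅ moves to start the next syllable (값 + ㅣ → 갑시), while a double consonant such as ㅆ moves as a whole (빘 + ㅏ → 비싸). A minimal sketch of that single step, assuming standard Unicode syllable arithmetic; `attachVowel`, `compose` and `CLUSTERS` are hypothetical names, and this is not the library's `binaryAssembleHangul`.

```typescript
// Hypothetical sketch: attach a bare vowel to a precomposed syllable, moving its final consonant.
const BASE = 0xac00;
const ONSETS = 'ㄱㄲㄴㄷㄸㄹㅁㅂㅃㅅㅆㅇㅈㅉㅊㅋㅌㅍㅎ';
const NUCLEI = 'ㅏㅐㅑㅒㅓㅔㅕㅖㅗㅘㅙㅚㅛㅜㅝㅞㅟㅠㅡㅢㅣ';
const CODAS = [''].concat('ㄱㄲㄳㄴㄵㄶㄷㄹㄺㄻㄼㄽㄾㄿㅀㅁㅂㅄㅅㅆㅇㅈㅊㅋㅌㅍㅎ'.split(''));
// Final clusters and the two consonants they are built from.
const CLUSTERS: Record<string, [string, string]> = {
  'ㄳ': ['ㄱ', 'ㅅ'], 'ㄵ': ['ㄴ', 'ㅈ'], 'ㄶ': ['ㄴ', 'ㅎ'], 'ㄺ': ['ㄹ', 'ㄱ'],
  'ㄻ': ['ㄹ', 'ㅁ'], 'ㄼ': ['ㄹ', 'ㅂ'], 'ㄽ': ['ㄹ', 'ㅅ'], 'ㄾ': ['ㄹ', 'ㅌ'],
  'ㄿ': ['ㄹ', 'ㅍ'], 'ㅀ': ['ㄹ', 'ㅎ'], 'ㅄ': ['ㅂ', 'ㅅ'],
};

// Build a syllable from S = 0xAC00 + (L * 21 + V) * 28 + T.
function compose(onset: string, nucleus: string, coda: string): string {
  const l = ONSETS.indexOf(onset);
  const v = NUCLEI.indexOf(nucleus);
  const t = CODAS.indexOf(coda);
  if (l < 0 || v < 0 || t < 0) throw new Error(`cannot compose ${onset}${nucleus}${coda}`);
  return String.fromCharCode(BASE + (l * 21 + v) * 28 + t);
}

function attachVowel(syllable: string, vowel: string): string {
  const offset = syllable.charCodeAt(0) - BASE;
  if (syllable.length !== 1 || offset < 0 || offset >= 11172 || NUCLEI.indexOf(vowel) < 0) {
    return syllable + vowel; // not a syllable + vowel pair: plain concatenation
  }
  const onset = ONSETS[Math.floor(offset / 588)];
  const nucleus = NUCLEI[Math.floor((offset % 588) / 28)];
  const coda = CODAS[offset % 28];
  if (coda === '') return syllable + vowel; // no final consonant to move
  // Cluster: keep its first consonant, move the second. Otherwise move the whole final.
  const cluster = CLUSTERS[coda];
  const [kept, moved] = cluster ? cluster : ['', coda];
  return compose(onset, nucleus, kept) + compose(moved, vowel, '');
}

console.log(attachVowel('값', 'ㅣ')); // '갑시'
console.log(attachVowel('빘', 'ㅏ')); // '비싸'
```

Note that the cluster split happens only when a vowel follows, so an option along these lines could keep ㄳ and ㅄ whole in `disassembleHangul` output and still support this kind of reassembly.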

@okinawaa
Member

okinawaa commented Jun 1, 2024

Thank you for the good suggestion.
I'm closing this issue since there has been no further discussion. If you'd like to discuss it further, please feel free to open an issue.

@okinawaa okinawaa closed this as completed Jun 1, 2024