Title case may not be right when an en-dash exists in the title #9068

Closed
zousiyu1995 opened this issue Aug 17, 2022 · 11 comments · Fixed by #9102
Labels
bib(la)tex good first issue An issue intended for project-newcomers. Varies in difficulty. type: enhancement

Comments

@zousiyu1995

When an en-dash exists in the title (see "Enzyme-Catalyzed" below),

Kinetic Studies on Enzyme-Catalyzed Reactions: Oxidation of Glucose, Decomposition of Hydrogen Peroxide and Their Combination

I ran "title case" and noticed that the initial letter of the word after the en-dash is lowercased (JabRef generated Enzyme-catalyzed). Is this correct?
I want it to be capitalized (what I want is Enzyme-Catalyzed). Is this possible?

image

The full bib text is as follows:

@Article{Tao2009-ping-pong,
  author       = {Tao, Zhimin and Raffel, Ryan A. and Souid, Abdul-Kader and Goodisman, Jerry},
  title        = {Kinetic Studies on Enzyme-Catalyzed Reactions: Oxidation of Glucose, Decomposition of Hydrogen Peroxide and Their Combination},
  number       = {7},
  pages        = {2977--2988},
  volume       = {96},
  date         = {2009},
  doi          = {10.1016/j.bpj.2008.11.071},
  journaltitle = {Biophysical Journal},
  type         = {Journal article},
  url          = {https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2711289/},
}
@Siedlerchr
Member

Thanks for the feedback. If I understand the code correctly, JabRef currently only checks words separated by whitespace; dashes and other characters are not considered. I think it should be easy to adapt it to also consider words directly following a dash: one needs to check whether the current char is a dash as well.

The TitleParser splits words by whitespace:

    for (char c : title.toCharArray()) {
        if (Character.isWhitespace(c)) {
            createWord(isProtected).ifPresent(words::add);
        } else {
            if (wordStart == -1) {
                wordStart = index;
            }
            buffer.append(c);
        }
        index++;
    }
    createWord(isProtected).ifPresent(words::add);
    return words;
}
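
The suggestion above (also checking whether the current char is a dash) could look roughly like the following. This is a minimal, self-contained sketch and not JabRef's actual code: the class `DashAwareTitleSplitter` and the helpers `isWordBoundary` and `splitWords` are hypothetical names, and the handling of protected (braced) text from the real `TitleParser` is left out on purpose.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for the splitting loop; not part of JabRef.
class DashAwareTitleSplitter {

    // Assumption: only whitespace and the ASCII hyphen-minus end a word here.
    static boolean isWordBoundary(char c) {
        return Character.isWhitespace(c) || c == '-';
    }

    // Same shape as the loop above, with the boundary check extended to dashes.
    static List<String> splitWords(String title) {
        List<String> words = new ArrayList<>();
        StringBuilder buffer = new StringBuilder();
        for (char c : title.toCharArray()) {
            if (isWordBoundary(c)) {
                // Close the current word, if any, and start a new one.
                if (buffer.length() > 0) {
                    words.add(buffer.toString());
                    buffer.setLength(0);
                }
            } else {
                buffer.append(c);
            }
        }
        if (buffer.length() > 0) {
            words.add(buffer.toString());
        }
        return words;
    }

    public static void main(String[] args) {
        // prints [Kinetic, Studies, on, Enzyme, catalyzed, Reactions]
        System.out.println(splitWords("Kinetic Studies on Enzyme-catalyzed Reactions"));
    }
}
```

With the dash treated as a boundary, "catalyzed" becomes a word of its own, so a later capitalization step would see it like any other word.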

@Siedlerchr Siedlerchr added good first issue An issue intended for project-newcomers. Varies in difficulty. type: enhancement labels Aug 18, 2022
@Siedlerchr Siedlerchr added this to Normal priority in Features & Enhancements via automation Aug 18, 2022
@jvsdurso

Hi. Can you describe to me, step by step, how I can reach the part of JabRef you are talking about? I want to help code this issue.

Thanks ;)

@zousiyu1995
Author

Hi. Can you describe to me, step by step, how I can reach the part of JabRef you are talking about? I want to help code this issue.

Thanks ;)

image

@ThiloteE
Member

Changing this to ALWAYS capitalize might be wrong as well if one follows certain grammars or citation styles:

Capitalize each word that is a word on its own, as in "Protein–Protein Interaction". If the hyphen adds a prefix that doesn't stand on its own, like "Non-protein Elements", then capitalize the prefix. But I've seen it both ways.

Refs. https://english.stackexchange.com/questions/26964/capitalization-of-words-with-dashes-in-titles

But to be honest ... does it really matter? Just make "capitalize" to make it all capital, including after a dash. That's it. These grammars are too complicated. Grammars should make life easier, not more complicated...

@Siedlerchr
Member

Agree with Thilo. We capitalize the word after the dash always.

@zousiyu1995 Codewise, take a look at the TitleCaseFormatter and the corresponding test. Add an example with a dash to the test case so you can directly check whether your code works.

https://github.com/JabRef/jabref/blob/bb011c9313367a28990ae213b3920fe6cd10d1dc/src/main/java/org/jabref/logic/formatter/casechanger/TitleCaseFormatter.java

@ryan-carpenter

Add an example with a dash to the test case so you can directly check if your code works

Also the hyphen (U+2010), en-dash (U+2013), em-dash (U+2014), and possibly other dash-like characters. I have also encountered the minus sign (U+2212) used as a hyphen.

Some of these characters may be surrounded by zero width spaces. These will probably be recognized as white spaces—I am not sure about Java—but can still cause unexpected results if, for example, you write a regular expression that relies on a specific number of characters. Likewise, the input string may contain one or more unexpected (visible) spaces, as in "Non -protein elements".
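
To make the two points above concrete, here is a small sketch using only standard `java.lang.Character` calls. The class `DashCharacters` and the helper `isDashLike` are hypothetical names for illustration. It assumes a reasonably recent JDK, where U+200B (zero width space) is classified as a format character, so as far as I can tell `Character.isWhitespace` does not report it as whitespace and it would need separate handling.

```java
// Hypothetical helper class; not part of JabRef.
class DashCharacters {

    // The Unicode category DASH_PUNCTUATION covers hyphen-minus (U+002D),
    // hyphen (U+2010), en-dash (U+2013) and em-dash (U+2014).
    // The minus sign (U+2212) is a math symbol, so it is checked explicitly.
    static boolean isDashLike(char c) {
        return Character.getType(c) == Character.DASH_PUNCTUATION || c == '\u2212';
    }

    public static void main(String[] args) {
        System.out.println(isDashLike('-'));       // prints true
        System.out.println(isDashLike('\u2010'));  // prints true
        System.out.println(isDashLike('\u2013'));  // prints true
        System.out.println(isDashLike('\u2014'));  // prints true
        System.out.println(isDashLike('\u2212'));  // prints true
        // Zero width space: not whitespace according to Character.isWhitespace.
        System.out.println(Character.isWhitespace('\u200B')); // prints false
    }
}
```

A category-based check avoids listing every dash by hand, at the cost of also matching less common dash characters that may or may not be wanted.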

@ryan-carpenter

But to be honest ... does it really matter? Just make "capitalize" to make it all capital, including after a dash.

Those of us who think so choose sentence case whenever possible! Eager capitalization is a good choice for title case.

@jvsdurso

One more question: what should I do about words with two or more hyphens, like "brother-in-law", "face-to-face", etc.? And how should I handle malformed words like "computer--based", or does that never happen?

@mlep
Contributor

mlep commented Aug 26, 2022

Could the "simple rule" be:
"Capitalize as if hyphens and dashes were space characters, except for the em-dash (---) which is the equivalent of a period (.)"
?

@jvsdurso

jvsdurso commented Aug 27, 2022

Could the "simple rule" be: "Capitalize as if hyphens and dashes were space characters, except for the em-dash (---) which is the equivalent of a period (.)" ?

What do you mean by equivalent of a period? Found this guide https://www.thepunctuationguide.com/em-dash.html

@mlep
Contributor

mlep commented Aug 28, 2022

Your guide is right about the detailed rules, but I believe it will be difficult (and not that useful) to develop an algorithm that matches the cases.
By period (.), I meant "put a capital letter after an em-dash".
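
A rough sketch of this "simple rule", eagerly capitalizing the first letter after any whitespace or dash, might look like the following. It is an illustration under stated assumptions, not the fix that was merged: `EagerDashCapitalizer` and `capitalizeAfterBoundaries` are hypothetical names, and unlike a real title-case formatter the sketch capitalizes every word (including small words such as "in" or "of") and ignores BibTeX brace protection.

```java
// Hypothetical illustration of eager capitalization; not JabRef's implementation.
class EagerDashCapitalizer {

    static String capitalizeAfterBoundaries(String title) {
        StringBuilder result = new StringBuilder(title.length());
        boolean atWordStart = true;
        for (char c : title.toCharArray()) {
            if (Character.isWhitespace(c)
                    || Character.getType(c) == Character.DASH_PUNCTUATION) {
                // Spaces and dashes are copied as-is and start a new word.
                atWordStart = true;
                result.append(c);
            } else if (atWordStart) {
                result.append(Character.toUpperCase(c));
                atWordStart = false;
            } else {
                result.append(c);
            }
        }
        return result.toString();
    }

    public static void main(String[] args) {
        // prints Enzyme-Catalyzed Reactions
        System.out.println(capitalizeAfterBoundaries("enzyme-catalyzed reactions"));
        // prints Brother-In-Law
        System.out.println(capitalizeAfterBoundaries("brother-in-law"));
        // prints Computer--Based
        System.out.println(capitalizeAfterBoundaries("computer--based"));
    }
}
```

Because each dash simply resets the word-start flag, words with several hyphens and doubled hyphens need no special casing, which speaks to the earlier question about "brother-in-law" and "computer--based".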
