Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Markdown comments dissapear on new start when having # chars #7010

Closed
manurare opened this issue Oct 13, 2020 · 33 comments · Fixed by #8382
Closed

Markdown comments dissapear on new start when having # chars #7010

manurare opened this issue Oct 13, 2020 · 33 comments · Fixed by #8382
Labels
bug Confirmed bugs or reports that are very likely to be bugs entry-editor good first issue An issue intended for project-newcomers. Varies in difficulty.
Projects

Comments

@manurare
Copy link

JabRef version 5.1--2020-08-30--e023aa0 on Ubuntu 20.04

Hi all!
I am trying to add comments in the comment tab to different entries to get myself an own insight of what I am reading. I've read that from version 5.1 on the markdown style is available. What I am trying to do is create sections in the comment: goal, achievement, method, failure... like this:

Goal (#### Goal)

Lorem ipsum dolor sit amet, consectetur adipiscing elit,

Achievement (#### Achievement)

Lorem ipsum dolor sit amet, consectetur adipiscing elit,

Method (#### Method)

Lorem ipsum dolor sit amet, consectetur adipiscing elit,

I save the changes and everything looks to work correctly but everytime I reopen JabRef the # are gone and so the sections are not longer highlighted.

Goal Lorem ipsum dolor sit amet, consectetur adipiscing elit,
Achievement Lorem ipsum dolor sit amet, consectetur adipiscing elit,
Method Lorem ipsum dolor sit amet, consectetur adipiscing elit,

It is a pain to add the hashes everytime I reopen the software. Do you have any clue how to fix this?

@mlep
Copy link
Contributor

mlep commented Oct 13, 2020

Indeed, a basic markdown support was added (#6232).
But maybe it is too basic: while markdown syntax such as **bold** is preserved, using the pound (#) character affects JabRef behavior. I believe this is because JabRef uses # for BibTeX strings.

@Siedlerchr Siedlerchr changed the title Markdown comments dissapear on new start Markdown comments dissapear on new start when having # chars Oct 13, 2020
@Siedlerchr
Copy link
Member

Yep, @mlep is right. The # is a special character which indicates a bibtex string.
See https://docs.jabref.org/advanced/strings

I am thinking if it would be possible to escape this somehow for the comment field. Related problem in #7012

@koppor
Copy link
Member

koppor commented Oct 14, 2020

Maybe the internal handling of # takes place at a wrong thing. Currently, it is from the data model to writing and reading. Not from the data model to the presentation. The decision back then was that BibTeX data is presented as key/value and the value is a simple string (sequence of characters). When using BibTeX strings, the data gets a list of strings (character sequences) and BibTeX strings.

See the comment org.jabref.model.entry.BibEntry#toString:

    /**
     * This returns a canonical BibTeX serialization. Special characters such as "{" or "&" are NOT escaped, but written
     * as is. In case the JabRef "hack" for distinguishing "field = value" and "field = {value}" (in .bib files) is
     * used, it is output as "field = {#value#}", which may cause headaches in debugging. We nevertheless do it this way
     * to a) enable debugging the internal representation and b) save time at this method.
     * <p>
     * Serializes all fields, even the JabRef internal ones. Does NOT serialize "KEY_FIELD" as field, but as key.
     */
    @Override
    public String toString() {
        return CanonicalBibEntry.getCanonicalRepresentation(this);
    }

The handling of # during writing is desribed at org.jabref.logic.bibtex.FieldWriter#formatAndResolveStrings

    /**
     * This method handles # in the field content to get valid bibtex strings
     * <p>
     * For instance, <code>#jan# - #feb#</code> gets  <code>jan #{ - } # feb</code> (see @link{org.jabref.logic.bibtex.LatexFieldFormatterTests#makeHashEnclosedWordsRealStringsInMonthField()})
     */

I see following solutions

  • Solution 1: Add tests for the markdown case and try to fix org.jabref.logic.bibtex.FieldWriter#formatAndResolveStrings.
  • Solution 2: Quick solution is to replace the internal # in JabRef by a non-used characters such as §. Can be done seamlessly as the .bib file is not affected by this change; only our internal code and all documentation
  • Solution 3: Change the data type of a field value. From String to FieldValue. Meaning org.jabref.model.entry.BibEntry#setField(org.jabref.model.entry.field.Field, java.lang.String) changes to org.jabref.model.entry.BibEntry#setField(org.jabref.model.entry.field.Field, org.jabref.model.entry.field.Value) (org.jabref.model.entry.BibEntry#setField(org.jabref.model.entry.field.Field, java.lang.String)). This would bring JabRef's internal data model even more close to BibTeX

@teertinker
Copy link
Contributor

@Siedlerchr I am thinking if it would be possible to escape this somehow for the comment field. Related problem in #7012

I was just thinking, users may apply the markdown formatter to other fields (e.g. like abstract) to use the new markdown function. So just escaping the comment field may not be a real solution

@koppor
Copy link
Member

koppor commented Nov 10, 2020

Quick recap:

left: BibTeX; right: Entry Editor

     field1 = value     -> #value#
     field2 = {value}   -> value
     
     field1 = value # value --> #value# # #value#

Solution 1

# test

## test

Just check for # at the beginning of a line followed by a space. If yes, do not do any replacement magic.

Regex: $#+

  • Good, because no documentation needs to be changed
  • Bad, because "quick handling" is introduced

Solution 2

 field1 = value     -> %value%
 field2 = {value}   -> value
 
 field1 = value # value --> %value% # %value%
  • Using % is bad, because % is a LaTeX command for comments. However, a user does not write LaTeX comments inside a BibTeX string

  • Using & is bad, because & is a LaTeX command for tables

  • Using $ is bad, because $ is a LaTeX command for math mode

  • Using § is bad, because % is not found on a US keyboard

  • Bad, because % has a different meaning in JabRef's entry editor than in BibTeX.

Solution 3

This is domain-driven design to the max ^^.

Decision

We opt for Solution 1, because does not introduce any changes in the documentation and usage of JabRef

@koppor koppor added good first issue An issue intended for project-newcomers. Varies in difficulty. bug Confirmed bugs or reports that are very likely to be bugs and removed status: devcall labels Nov 10, 2020
@k3KAW8Pnf7mkmdSMPHz27
Copy link
Sponsor Member

Some thoughts regarding solution 1.

  1. Should issue Error when adding file containing # to entry with character #7012 also be quick fixed?
  2. Are linking to anchors allowed? (e.g., (a link)[#an-anchor] )
  3. Is it an option to attempt to identify a String constant instead? There is already code that I believe is to this end (it was part of the initial commit so I can't find any associated issue/PR),
    private static final Pattern RESOLVE_CONTENT_PATTERN = Pattern.compile(".*#[^#]+#.*");

    (?<!#)#\p{isL}+# might be an option, but at that point it might be better to not use regexp.

Nitpicking:

Regex: $#+

^ = beginning of a line

@koppor
Copy link
Member

koppor commented Nov 10, 2020

Some thoughts regarding solution 1.

1. Should issue #7012 also be quick fixed?

Not though thourghly, but yes, on the one hand sounds good.

On the other hand, I am thinking whether % as character would be better. OK, maybe, there are also fiels with # out there

2. Are linking to anchors allowed? (e.g., `(a link)[#an-anchor]`  )

Damn it. Sure thing. Also URLs having fragments. #.

Disable bibtex field resolving completly in abstract? Maybe, this refs your solution proposed at #7012 (comment).

In the devcall, we agreed that only power users use the string power of BibTeX strings (https://docs.jabref.org/advanced/strings). My take was that our UI in the entry editor currently is not that good to support it well (the improvment of the entry editor is Solution 3).

3. Is it an option to attempt to identify a String constant instead? There is already code that I believe is to this end (it was part of the initial commit so I can't find any associated issue/PR),
   https://github.com/JabRef/jabref/blob/f5c52a2ef6aafa3536eb4e3f93974c8219c790f3/src/main/java/org/jabref/model/database/BibDatabase.java#L48
   `(?<!#)#\p{isL}+#` might be an option, but at that point it might be better to not use regexp.

I don't fully understand the RegEx. Naivly, I would have searched for #[a-zA-Z1-9._-]+#, as only certain letters are allowed in BibTeX strings.

Nitpicking:

Regex: $#+

^ = beginning of a line

Ups 😇

@k3KAW8Pnf7mkmdSMPHz27
Copy link
Sponsor Member

k3KAW8Pnf7mkmdSMPHz27 commented Nov 11, 2020

I don't fully understand the RegEx. Naivly, I would have searched for #[a-zA-Z1-9._-]+#, as only certain letters are allowed in BibTeX strings.

Hum, yup, my regex was not thought through enough. I was going for Unicode support and thought that a # before a String would either be consumed or be "illegal". #[a-zA-Z1-9._-]+# is still better than .*#[^#]+#.*, if there is no Unicode support. I don't know enough about BibTeX/biblatex to find good edge cases.

I suppose my argument is that it might be easier to match the strings rather than the markdown, especially if the allowed characters are more limited than [^#].

  • However, a user does not write LaTeX comments inside a BibTeX string

Are you sure you want to jinx it? 😉
There is also url percent encoding.

Disable bibtex field resolving completly in abstract? Maybe, this refs your solution proposed at #7012 (comment).

Well, disable string resolution for most fields by default. But, I don't know enough about what goes on in the background of JabRef to even know if the settings below would make the comments save correctly.


Skärmavbild 2020-11-10 kl  19 48 35


Edit:
I tested the settings and they did not work as I'd expect.

@Siedlerchr
Copy link
Member

I think that kind of setting seems to be a viable solution
I have seen comments with latex code.

@Siedlerchr
Copy link
Member

You might need to spearate the fields by semicolon like for the wrap fields

@k3KAW8Pnf7mkmdSMPHz27
Copy link
Sponsor Member

@Siedlerchr it appears to have been a user bug, it is now working as expected 😳

Add a third option? Along the lines of "Only resolve strings in..." with some defaults (author, title, date, etc.?), perhaps add categories?
I guess this could be considered "Solution 1.b", as it attempts to avoid the issue rather than resolving anything.

@luis-valdez
Copy link
Contributor

Hello, is this issue still open?

If it is, I would like to work on it.

@k3KAW8Pnf7mkmdSMPHz27
Copy link
Sponsor Member

I think this issue is still open, but I do not know how it should be solved.

@sauliusg
Copy link
Contributor

sauliusg commented Apr 3, 2021

I think this issue is still open, but I do not know how it should be solved.

A quick lookup in http:https://www.bibtex.org/Format/ hints that "When quotation-marks are used, string concatenation using # is possible, but not when braces are used.". This can be interpreted in such a way that hash characters ('#') SHOULD be left unmodified in values delimited by '{}'. Apparently this was the behaviour of the previous JabRef version, wan't it?

If so, values in '{' could be parsed without modification until the matching closing '}', counting only unescaped instances of '{}'; and all other values should be string-expanded. Is this the way bibtex and biber handle them?

(I admit that I never use bibtex string definitions, so I can not say for sure how the string definitions behave).

@sauliusg
Copy link
Contributor

sauliusg commented Apr 3, 2021

Edit:
I tested the settings and they did not work as I'd expect.

Just tested: disabling string resolution for the 'abstract' field works; however, the fields in the text entry field need to be separated by semicolon, not by space:
Screenshot from 2021-04-03 12-51-24
Tested on the commit 1d1156f in https://github.com/JabRef/jabref.git, and on v5.3.

@Siedlerchr
Copy link
Member

So it's sufficient to add comment and abstract to the list ofdefault fields

@sauliusg
Copy link
Contributor

sauliusg commented Apr 3, 2021

So it's sufficient to add comment and abstract to the list of default fields

Seems like this would be a functional workaround. I'm testing right now the setting with 'abstract' and 'note' included; will add 'comment' as well. So far everything is working as needed, at least for me.

But is it correct to process strings in '{}' delimited bibtex fields? What you think about the interpretation of the bibtex.org reference above?

@sauliusg
Copy link
Contributor

sauliusg commented Apr 3, 2021

So it's sufficient to add comment and abstract to the list of default fields

Confirmed: when 'comment' is added to the 'Resolve strings for all fields except:' list, Markdown is saved and read in correctly.

@sauliusg
Copy link
Contributor

sauliusg commented Apr 3, 2021

So it's sufficient to add comment and abstract to the list of default fields

The suggested workaround helps against tampering with # symbols, but entering text with curly braces into a comment or notes text box causes parser issues again:

  1. If I add an unbalanced curly brace into comment, a syntax error is reported;
  2. If I add balanced but reciprocating curly braces (like in }{), they are written to the bibtexfile unescaped and then lost upon second parsing;
  3. If a string },{ is entered into a comment field as a text, the entry is saved (again unescaped) into the file, and this file causes error on the second reading; JabRef then looses this entry and "glues" its fragments to the previous one.

Saving arbitrary text entered by a user should escape/unescape or encode/decode all special characters used in the format when serialising/deserialising embedded text.

@Siedlerchr
Copy link
Member

Curly braces are more complicated as JabRef uses them to detect fields and entry boundaries etc.

@sauliusg
Copy link
Contributor

sauliusg commented Apr 3, 2021

Curly braces are more complicated as JabRef uses them to detect fields and entry boundaries etc.

BibLaTeX forum seems to have an answer (https://tex.stackexchange.com/questions/230750/open-brace-in-bibtex-fields): you can encode { and } characters by using either \{\vphantom{\}} and \vphantom{\{}\} sequences, or by using \textbraceleft{} and \textbraceright{} commands. The backslash itself should be encoded as \textbackslash{} in both cases. The Biber bug mentioned under the above link seems to be fixed already, both methods work for bibtex and for biber+biblatex (at least on Linux, see attached .zip).
curly-braces-in-bibliographies.zip

The encoding method is completely general. JabRef could use any of the encoding/decoding methods to serialise/deserialise user text.

Another character that needs special treatment is $ (http:https://www.bibtex.org/SpecialSymbols/); I did not check yet how JabRef handles it. And chars % and _ probably as well... AFAIK these can be simply escaped by a backslash.

@sauliusg
Copy link
Contributor

sauliusg commented Apr 3, 2021

The encoding method is completely general. JabRef could use any of the encoding/decoding methods to serialise/deserialise user text.

Except that now I see one problem – what if a user wants to put a LaTeX command into the comment? Escaping would prevent this. Two solutions are possible:

  1. Have as special interface gesture (e.g. right-click-menu-"insert TeX command") to insert unquoted TeX commands, and render them differently (say on a grey background); this is consistent and "user-friendly" but complicated;
  2. Do not escape the text fields at all, and tell users that they must insert a properly formatted BibTeX input, but detect all wrong inputs and parsing errors (for JabRef) before saving and at the parse time; also make sure that all imported BIbTeX entries are properly formatted... This seems simpler to implement but LaTeX beginners will be baffled for a while.

@Siedlerchr
Copy link
Member

We have a cleanup formatter for escaping Latex, Latex Cleanup, that does several things, for example escaping a % sign-in a latex compatible way

@sauliusg
Copy link
Contributor

sauliusg commented Apr 3, 2021

We have a cleanup formatter for escaping Latex, Latex Cleanup, that does several things, for example escaping a % sign-in a latex compatible way

That's good news :). Could the Latex Cleaner also escape lone },{ instances so that they do not cause corrupted database files when saving?

@k3KAW8Pnf7mkmdSMPHz27
Copy link
Sponsor Member

k3KAW8Pnf7mkmdSMPHz27 commented Apr 3, 2021

Quick recap:

left: BibTeX; right: Entry Editor

     field1 = value     -> #value#
     field2 = {value}   -> value
     
     field1 = value # value --> #value# # #value#

I thought this referred to that the #string# syntax is JabRef specific syntax and not part of BibTeX? Isn't # only the "concatenation character" symbol for BibTeX?
As usual, I can't find any good reference for this but BibTeXing by Oren Patashnik mentions that,

You may concatenate as many strings as you like (except that there’s a limit to the overall length of the resulting field); just be sure to put the concatenation character ‘#’, surrounded by optional spaces or newlines, between each successive pair of strings.

@Siedlerchr
Copy link
Member

There are some examples https://docs.jabref.org/advanced/strings

@k3KAW8Pnf7mkmdSMPHz27
Copy link
Sponsor Member

But is it correct to process strings in '{}' delimited bibtex fields? What you think about the interpretation of the bibtex.org reference above?

It is not clear to me if I am misunderstanding something, but isn't the answer to this question "no"? I.e., BibTeX strings/constants inside of {/} are not processed/expanded, but JabRef strings are.

There are some examples https://docs.jabref.org/advanced/strings

I believe that supports my claim, but I am not sure. 😛

@Siedlerchr
Copy link
Member

Siedlerchr commented Apr 3, 2021

My proposed solution would be:

  1. Add comment and abstract and maybe notes? to the list of fields that should not use BibtexString escaping by default.
  2. Create a prefs migration
  3. Document the changes

I would not dig into the curly brace stuff, that might be a latex command or whatever the user adds.

@koppor
Copy link
Member

koppor commented Aug 2, 2021

Note that org.jabref.logic.importer.fileformat.BibtexParser#parseFieldContent does not respect JabRef's preferences of field resolving. Thus, the solution proposed at #7010 (comment) does not work.

@koppor
Copy link
Member

koppor commented Aug 3, 2021

But is it correct to process strings in '{}' delimited bibtex fields?

When reading a bib file it is not correct.

I'll try to expand my comment from #7010 (comment)

We DO NOT have a "good" model for BibTeX strings in JabRef. The currently (bad) model is that each value field of BibTeX is a String. One sees it best at author = {First Author and Second Author}. A good model would be

author = List(Author.of("First", "Author"), Author.of("Second", "Author"))

(Assuming Author.of(String firstName, String lastName).

Currently we have only

author = "First Author and Second Author"

JabRef does some parsing at the entry editor and at all formatters.

The same issue happens with Strings.

author = first # last

should be

author = List.of(BibTeXString.of(first), BibTeXString.of(last))

but we have

author = "#first# #last#"

internally and deal with it during processing of the field values.

For a really good fix, we need to introduce BibFieldValue as JavaClass. That class offers an object model as outlined above. I think, we need to derive from it with BibAuthors (modeling authors), BibFile for modeling a file link, and so on. All unknown types get BibFieldValue without any subclassing. We already have this kind of classification at org.jabref.model.entry.field.FieldProperty with fields already classified: org.jabref.model.entry.field.StandardField. For a first version, one can use the three proposed classes (Bib...) to see the consequences of that idea. One could start with just BibFieldValue without any sub classes. The custom parsing at each place would remain; however, the handling of BibTeX strings would work.

and change org.jabref.model.entry.BibEntry#getField from

  public Optional<String> getField(Field field) {
        return Optional.ofNullable(fields.get(field));
  }

to

  public Optional<BibFieldValue> getField(Field field) {
        return Optional.ofNullable(fields.get(field));
  }

And do the hundreds other related changes.

@Siedlerchr
Copy link
Member

Siedlerchr commented Dec 21, 2021

I stumbled across this issue again and tested it. The solution in the comment works fine: #7010 (comment)

So I will go forward and add this to the list of default fields:+

Curly braces inside will also work, but only if you have a pair. A single one only breaks

@Article{,
  abstract = {this is a test string

#FFFFF should not},
  comment  = {this is a test string

## bibtex stringor not

### sub heading},
}

Siedlerchr added a commit that referenced this issue Jan 3, 2022
Bugs automation moved this from Normal priority to Done Jan 13, 2022
Siedlerchr added a commit that referenced this issue Jan 13, 2022
* Change default behavior of resolve bibtex strings

Fixes #7010
Fixes #7012
Fixes #8303

* Renaming of fields

* fix prefs, remove migration

* Fix gui properties and l10n

* adjust defaults, fix bst tests

* remove obsolete test
Fix changelog

* fix checkstyle

* fix another test

* fix comment

* Fix typos

* add institution

* Sort fields alphabetically

* Group prefernces at "File": BibTeX strings, Loading, Saving

* Remove unused imports

* Add ADR-0024

* Add test for comment field

* fix tests

* Fix markdown in ADR-0024

* ADR-0024: Fix filename and addr to adr.md

* ADR-0019: Fix typo

* Remove obsolete empty line

* Introduce BIBTEX_STRING_START_END_SYMBOL and remove negation

- Add org.jabref.logic.bibtex.FieldWriter#BIBTEX_STRING_START_END_SYMBOL
- !doNotResolveStrings -> resolveStrings

* Remove deprecated constructor FieldWriterPreferences()

* Fix test on Windows (CRLF issue)

* Add missing context information

* Add more tests

* Fix negation

Co-authored-by: Oliver Kopp <[email protected]>
@Siedlerchr
Copy link
Member

We now changed the whole logic and enable BibTeX string resolving only for a couple of fields. This can be adjusted in the options

@ilippert
Copy link

Hi,

am on
JabRef 5.10--2023-02-25--b1abf01
Linux 6.1.13-200.fc37.x86_64 amd64
Java 19.0.2
JavaFX 19+11

sorry, before I open a new issue, just to check: this issue was concerned with a field that includes the # character. Would you confirm that this did not cover how a line with # is re-presented, say in the preview?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Confirmed bugs or reports that are very likely to be bugs entry-editor good first issue An issue intended for project-newcomers. Varies in difficulty.
Projects
Archived in project
Bugs
  
Done
Development

Successfully merging a pull request may close this issue.

9 participants