Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[INCOMPATIBLE][SELF INCOMPATIBLE] main,Tmain: revise acceptable characters for kind letters and kind names #1737

Merged
merged 12 commits into from
Sep 15, 2019

Conversation

masatake
Copy link
Member

@masatake masatake commented Apr 15, 2018

(This initial comment is completely rewritten after updating the changes.)

Neither whitespace characters nor characters other than [a-zA-Z0-9] are not allowed as a kind name defined with --regex- option.

A kind definition consists of three parts: a letter, a name, and a description.
--kinddef- and --regex- options are for defining a kind.

The pull request changes the acceptable characters for defining a kind.

If merging this pull request, ctags accepts alphabetical characters
other than 'F' as a kind letter: [a-zA-EG-Z] in a regex. ctags
accepts alphabetical characters as the first character of a kind
name. ctags accepts alphabetical and numerical characters as the rest
characters.

Exuberant-ctags accepts any characters. As the result Exuberant-ctags
can generate a broken tags file.

Here is an example of generating a broken tags file:

$ cat /tmp/foo.c
int main(void) {return 0;}

$ /usr/bin/ctags -o - --regex-C='/^(.).*$/\1/|,|||||,wrong kind/' /tmp/foo.c
i	/tmp/foo.c	/^int main(void) {return 0;}$/;"	|
main	/tmp/foo.c	/^int main(void) {return 0;}$/;"	f

$ /usr/bin/ctags -o - --fields=+K --regex-C='/^(.).*$/\1/|,|||||,wrong kind/' /tmp/foo.c
i	/tmp/foo.c	/^int main(void) {return 0;}$/;"	|||||
main	/tmp/foo.c	/^int main(void) {return 0;}$/;"	function

With --regex-C, a kind with a kind letter '|' and a kind name "|||||" is defined.

v2 tags format expects alphabetical characters for a kind name which uses in
value parts of a tagfield. '|' in "|||||" violates this. The v2 tags format
doesn't say much about kind letters. However, as far as reading the tags file
loader of vim, only alphabetical characters are acceptable.

Universal-ctags with this pull request, ctags can generate a tag file
following the v2 tags format with one exception. Universal-ctags
allows using numerical characters in a kind name unless it is used as
the initial character of the name. This compromise is for acceptable
heading1, heading2, and heading3 kinds of HTML parser. I guess there
will be more cases that users (and developers) want to use numerical characters
in their kind name. Though, accepting numerical characters violates
the v2 tags format, the tags file loader of vim accepts a name as
far as the name starts from an alphabetical character. In other hands,
vim rejects a symbol character like '|'.

Close #597.

@coveralls
Copy link

coveralls commented Apr 15, 2018

Coverage Status

Coverage increased (+0.008%) to 86.15% when pulling e6d75a1 on masatake:strict-naming-rules into 6adbc59 on universal-ctags:master.

@masatake
Copy link
Member Author

I must update ctags-incompatibilities.7.rst.in, too.

@masatake
Copy link
Member Author

The kinds of HTML parser includes number letter ([0-9]) in their name:

[yamato@master]~/var/ctags-github% ./ctags --list-kinds-full | awk '{print $1, $3}' | grep '[0-9]'
./ctags --list-kinds-full | awk '{print $1, $3}' | grep '[0-9]'
HTML heading1
HTML heading2
HTML heading3

Though HTML parser doesn't do so but the name of kind can be used in scope field in abbreviated form.

    scope:heading1:...

This is o.k. However,

    heading1:...

this abbreviated from violates a rule written in format.rst:

A tagfield has a name, a colon, and a value: "name:value".

* The name consist only out of alphabetical characters.  Upper and lower case
  are allowed.  Lower case is recommended.  Case matters ("kind:" and "Kind:
  are different tagfields).

1 is not an alphabetical character.

tag.c in vim uses "isalpha". For a kind letter, use isalpha for recognizing it. This is simple. We should not use a number character as a kind letter. For a kind name, use isalpha for recognizing it. However, vim checks the only the first character of the kind name. The rest of characters can be any. The exception is : and `\t'. It seems that scope field is not used in tag.c.

Using "heading1" in HTML parser doesn't cause a trouble in vim users as far as they don't use extra plugins.

I wonder what I should do. Maybe accepting "[a-zA-Z][a-zA-Z0-9]*" as a kind name is better.
It must be written in version the v3 format spec.

About "kind letter", the code is updated. ctags-optlib.7 man page is updated. ctags-incompatible.7 must be updated.

About "kind name", all are not ready.

About the other named things like field, I will write in #1711.

Comments are really welcome.

@masatake masatake mentioned this pull request May 18, 2018
@masatake masatake changed the title [INCOMPATIBLE][SELF INCOMPATIBLE] main,Tmain: reject other than an is alpha char as a kind letter [WIP] [INCOMPATIBLE][SELF INCOMPATIBLE] main,Tmain: reject other than an is alpha char as a kind letter Jun 23, 2018
@masatake masatake changed the title [WIP] [INCOMPATIBLE][SELF INCOMPATIBLE] main,Tmain: reject other than an is alpha char as a kind letter [INCOMPATIBLE][SELF INCOMPATIBLE] main,Tmain: reject other than an is alpha char as a kind letter Jun 29, 2018
@masatake
Copy link
Member Author

All documents are updated. Kind letters and kind names related incompatibilities are documented.

@masatake
Copy link
Member Author

masatake commented Jul 2, 2018

I have to clean up the initial comment and I have to ask people reviewing the man pages.

@masatake masatake changed the title [INCOMPATIBLE][SELF INCOMPATIBLE] main,Tmain: reject other than an is alpha char as a kind letter [INCOMPATIBLE][SELF INCOMPATIBLE] main,Tmain: revise acceptable characters for kind letters and kind names Jul 7, 2018
@masatake
Copy link
Member Author

masatake commented Jul 7, 2018

@techee, @k-takata, and other, can I ask you to review the changes for following documents in this documents?

  • docs/format.rst
  • man/ctags-incompatibilities.7.rst.in
  • man/ctags-optlib.7.rst.in

docs/format.rst Outdated Show resolved Hide resolved
main/parse.c Outdated Show resolved Hide resolved
main/lregex.c Outdated Show resolved Hide resolved
man/ctags-optlib.7.rst.in Outdated Show resolved Hide resolved
@masatake
Copy link
Member Author

@ploxiln, thank you very much. I will reflect your comment to the changes.

@masatake
Copy link
Member Author

I reflected all the suggestions from @ploxiln. I will rebase this branch to master.

@masatake
Copy link
Member Author

Rebased. When merging, I will squash all "!squash" commits.

@masatake
Copy link
Member Author

I forgot updating the expected output of test cases.

@masatake
Copy link
Member Author

masatake commented Aug 6, 2019

  • Updating expected output of Tmain test cases, and
  • Writing about this big change on README.md

@codecov-io
Copy link

Codecov Report

Merging #1737 into master will increase coverage by <.01%.
The diff coverage is 100%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master    #1737      +/-   ##
==========================================
+ Coverage      86%   86.01%   +<.01%     
==========================================
  Files         175      175              
  Lines       35161    35177      +16     
==========================================
+ Hits        30240    30257      +17     
+ Misses       4921     4920       -1
Impacted Files Coverage Δ
main/lregex.c 90.25% <100%> (+0.21%) ⬆️
main/parse.c 94.52% <100%> (+0.01%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 9787a6c...03cc742. Read the comment docs.

@masatake
Copy link
Member Author

masatake commented Aug 9, 2019

Adding the description of this change to README.md is needed because the impact is so big.

…alpha char as a kind letter

Exuberant-ctags accepts any character given via --regex-<lang>=/pattern/name/L,...
option.

Universal-ctags accepted isalnum character given via --regex-<lang>=/pattern/name/L,...
option and --kinddef-<lang>=L,... option.

Though what kind of character we can use as kind letter is not written
in FORMAT file, vim accepts only "isalpha" char as kind letter. So
ctags should not allow users to define a kind using non-isalpha char.

Signed-off-by: Masatake YAMATO <[email protected]>
…ical char as the first letter in a kind name

SELF-INCOMPATIBLE aspect:
When defining a kind with --kinddef-<LANG>, Universal-ctags accepted
both an alphabetical and a numerical characters as the initial letter.
With this change, Universal-ctags accepts only an alphabetical.

INCOMPATIBLE aspect:
Universal-ctags accepts numerical characters as the rest letters
(other than initial letter) in a kind name. This violates the tags v2
format. This commit documents it as an EXCEPTION.
We must convert the EXCEPTION to planed v3 format.

Exuberant-ctags doesn't have --kinddef-<LANG>.
However, a kind can be defined with --regex-<LANG>. This commit doesn't deal with
the option.

Re-wording is suggsted by @ploxiln.
Revising the error message is suggested by @ploxiln.
A wrong sentence is pointed out by @ploxiln.

Signed-off-by: Masatake YAMATO <[email protected]>
…n kind name defined with --regex-<LANG>

This change has big impact.
As far as browsing .ctags files in github, people users
whitespace in kind names defined with --regex-<LANG>.

This change rejects them.

A kind name is used for "value" part of tagfield.
In this case, there is no trouble in using
a whitespace (and characters other than alphabetical char).

However, a kind name can be used "name" pat of tagfield
when printing scope field. In this case, using characters
other than alphabetical char in the kind can be a cause of
trouble.

(After long thinking I decide to add an exception; using a numerical
char as parts of the body of a kind name is allowed. Here "body" means
non-initial letters of the kind name.)

Revising the error message is suggested by @ploxiln.

Signed-off-by: Masatake YAMATO <[email protected]>
Signed-off-by: Masatake YAMATO <[email protected]>
Wrong English usage is pointed out by @ploxiln.

Signed-off-by: Masatake YAMATO <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Reject a space in kind name in regex parser
4 participants