Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fixed parsing of tags #382

Merged
merged 2 commits into from
Nov 27, 2020
Merged

Fixed parsing of tags #382

merged 2 commits into from
Nov 27, 2020

Conversation

riccardoferretti
Copy link
Collaborator

fixes problem referred to in #302 (see when reopened)

@riccardoferretti riccardoferretti mentioned this pull request Nov 26, 2020
6 tasks
@riccardoferretti riccardoferretti added this to the 0.7.1 milestone Nov 26, 2020
.map(m => m[2].trim())
.filter(tag => !isNumeric(tag))
Array.from(text.matchAll(HASHTAG_REGEX), m => m[2].trim()).filter(
tag => !isNumeric(tag)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the regex could be more restrictive as to avoid this check

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should work const HASHTAG_REGEX = /(^|[ ])#([a-zA-Z][\w_-]+\b)/gm;

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree that more regex magic could do the trick, but after spending so much time on it I got pragmatic (tired) and settled here.

This is not the best tag parsing code out there, I have seen something by twitter https://github.com/twitter/twitter-text/ but it's much more (I have also pulled out their tag code, it just feels a bit heavier and doesn't quite match the tags cases we want).
I have not been able to find a simple tag extraction library/gist/file out there, would love pointers if people are aware of some.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The addition I made is simple enough to work in "every" case, you just force the first character to be a letter

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thing is, it can start with a number, it just can't be all numbers:

      62 |     expect(
      63 |       extractHashtags('this #123 tag should be ignore, but not #123four')
    > 64 |     ).toEqual(new Set(['123four']));

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was merged, have you tried with the latest version?

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, using v0.77. The code is a bit different know.. But i don't know if i'm doing something wrong...

export const extractHashtags = (text: string): Set<string> => {
  return isSome(text)
    ? new Set(Array.from(text.matchAll(HASHTAG_REGEX), m => m[2].trim()))
    : new Set();
};

Vscode highlight it as a tag but it's not recognized as one in tag explorer. Maybe the problem is with tag explorer itself?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@riccardoferretti do you have any idea what this might be?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the meantime, please submit an issue @bserrao

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

interesting.. I reproduced it as well, that's odd. yes let's start by filing an issue

@riccardoferretti riccardoferretti merged commit 9f17b1f into master Nov 27, 2020
@riccardoferretti riccardoferretti deleted the fix/302-tag-parsing branch November 27, 2020 12:35
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants