Discord data parsing error #63

Open
namanko opened this issue Feb 11, 2021 · 6 comments

Comments

@namanko

namanko commented Feb 11, 2021

Hi, I am having some difficulty creating the dictionary of my friends' and my messages. There seems to be a problem with the regex used in this code:

response_sets = re.findall(r'[.+] (?!' + re.escape(personName) + r').+\n(.+)\n{2}(?:[.+] ' + re.escape(personName) + r'\n(.+)\n{2})', data)

This is what has been used, but it returns a blank dictionary.

[08-Oct-20 02:40 PM] ShadowRanger5#3348
hello

[08-Oct-20 03:00 PM] sai#2795
Hi wassup

This is what my data looks like after I have formatted it, but using the above regex I can't seem to create a dictionary to extract my friends' and my conversations.

It is possible that this was written with older versions of the Discord chat parser in mind, like several other things in this repository that are a little outdated (the seq2seq model and some of the word2vec code).

I would appreciate it if anybody could come up with a solution for this.

@TotallyNotChase
Collaborator

It should work fine, as seen here.

Note that the regex you pasted in the issue is rendered incorrectly, here's the actual regex just for context-

r'\[.+\] (?!' + re.escape(personName) + r').+\n(.+)\n{2}(?:\[.+\] ' + re.escape(personName) + r'\n(.+)\n{2})'

Note that the dataset training works on response sets: the regex captures your (personName) responses to another person's statements. In practice, in your short snippet sai#2795 is the responder, so the regex will only capture a response set from that snippet when personName is sai#2795.

If the text history looked like-

[08-Oct-20 02:40 PM] sai#2795
hello

[08-Oct-20 03:00 PM] ShadowRanger5#3348
Hi wassup

you would instead need to use ShadowRanger5#3348 as personName to get any response sets out of it, since in this snippet ShadowRanger5#3348 is the only person responding.
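
The effect of personName on the first snippet can be checked directly in Python. This is a minimal sketch, not the repository's script: the helper name `find_response_sets` and the in-memory `chat` string are assumptions made for illustration, and only the regex itself comes from this thread. Note that the pattern also expects a blank line after the final message, so the sample string ends with two newlines.

```python
import re


def find_response_sets(data, person_name):
    """Return (statement, response) pairs where person_name is the responder.

    Hypothetical helper wrapping the regex quoted in this thread.
    """
    name = re.escape(person_name)
    pattern = (
        r'\[.+\] (?!' + name + r').+\n(.+)\n{2}'   # someone else's one-line message
        r'(?:\[.+\] ' + name + r'\n(.+)\n{2})'     # immediately followed by person_name's reply
    )
    return re.findall(pattern, data)


# The snippet from the original report, with a blank line after each message.
chat = (
    "[08-Oct-20 02:40 PM] ShadowRanger5#3348\n"
    "hello\n"
    "\n"
    "[08-Oct-20 03:00 PM] sai#2795\n"
    "Hi wassup\n"
    "\n"
)

print(find_response_sets(chat, "sai#2795"))            # [('hello', 'Hi wassup')]
print(find_response_sets(chat, "ShadowRanger5#3348"))  # []
```

With sai#2795 as personName the single response set is found; with ShadowRanger5#3348 nothing matches, because that user never replies to anyone in this snippet.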

If you're still having trouble, it is highly likely the regex is not the issue - please provide a small (and preferably censored) version of the chatlog you're trying to parse, along with your input for personName.

@namanko
Author

namanko commented Feb 12, 2021

It is working, but it gives only 1 response pair even though I have approximately 70 DMs. Do I need to change the chat format in some way? Please tell me what I should do to retrieve the chats.

@TotallyNotChase
Collaborator

As long as they are in the format of a response - as in, another person's message followed by your message - it should be parsed correctly. Ensure you set personName correctly when prompted to.

@namanko
Author

namanko commented Feb 12, 2021

https://imgur.com/a/aeTAZUW
This is the kind of data I am working with.

Also, did you take into account that one person could have sent more than one message in one go?

@TotallyNotChase
Collaborator

Unfortunately, the training is only capable of working with atomic response sets - that is, one reply to one statement. But as long as there are multiple response sets in your chatlog - it should still work.
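
To illustrate what "atomic" means here, the sketch below runs the regex from this thread over a made-up chatlog in which both users send consecutive messages. The helper `find_response_sets` and the sample messages are assumptions for illustration only; the actual script may process the data differently.

```python
import re


def find_response_sets(data, person_name):
    """Hypothetical helper wrapping the regex quoted in this thread."""
    name = re.escape(person_name)
    pattern = (
        r'\[.+\] (?!' + name + r').+\n(.+)\n{2}'
        r'(?:\[.+\] ' + name + r'\n(.+)\n{2})'
    )
    return re.findall(pattern, data)


# Invented chatlog: each user sends two messages in a row at some point.
chat = (
    "[08-Oct-20 02:40 PM] ShadowRanger5#3348\nhello\n\n"
    "[08-Oct-20 02:41 PM] ShadowRanger5#3348\nyou there?\n\n"
    "[08-Oct-20 03:00 PM] sai#2795\nHi wassup\n\n"
    "[08-Oct-20 03:01 PM] sai#2795\nsorry, was away\n\n"
    "[08-Oct-20 03:02 PM] ShadowRanger5#3348\nno worries\n\n"
    "[08-Oct-20 03:05 PM] sai#2795\ncool\n\n"
)

pairs = find_response_sets(chat, "sai#2795")
print(pairs)  # [('you there?', 'Hi wassup'), ('no worries', 'cool')]
```

Only a statement that is directly followed by a reply is paired with it: "hello" and "sorry, was away" are dropped, so six messages yield two response sets. A chatlog with many multi-message bursts will therefore produce fewer pairs than its message count suggests.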

Also, the screenshot does not show your inputs to the script, so I'm unsure where the dictionary length of 733 is coming from. Are you exporting messages from multiple sources?

As a side note, please do not post images of debug text data. I cannot really copy text from an image. You may try pasting your chatlog in a regex tester, such as the link I posted before, and checking the matches. (make sure to alter the person name if you need to).

I don't think it's a problem with the regex per se - perhaps there's something I'm missing. But this is the first time this issue has been encountered.

@namanko
Author

namanko commented Feb 12, 2021

Actually, the 733 comes from the fact that I am also using WhatsApp data, which has no problem: it was extracted successfully and its length is 732, while only 1 response set was extracted from the Discord chats. Even I am confused as to what the problem is. It works fine on the regex tester but stops working when implemented.
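
The thread does not establish a cause for this difference. One way to narrow it down is to inspect the exact text the script receives, since text pasted into a regex tester may not be byte-for-byte identical to the file on disk. The sketch below is a hypothetical diagnostic, not part of the repository: `normalize_newlines`, `find_response_sets` and `diagnose` are invented names, and the idea that line endings could matter is an unverified assumption. It only shows that the pattern, which expects `\n` separators, finds nothing when the same text uses `\r\n`.

```python
import re


def normalize_newlines(text):
    """Convert Windows (\\r\\n) and old Mac (\\r) line endings to \\n."""
    return text.replace("\r\n", "\n").replace("\r", "\n")


def find_response_sets(data, person_name):
    """Hypothetical helper wrapping the regex quoted in this thread."""
    name = re.escape(person_name)
    pattern = (
        r'\[.+\] (?!' + name + r').+\n(.+)\n{2}'
        r'(?:\[.+\] ' + name + r'\n(.+)\n{2})'
    )
    return re.findall(pattern, data)


def diagnose(raw, person_name):
    """Summarize what the regex sees before and after newline normalization."""
    normalized = normalize_newlines(raw)
    return {
        "has_carriage_returns": "\r" in raw,
        # Rough count of "[timestamp] user" header lines (an approximation).
        "header_lines": len(re.findall(r'^\[.+\] .+$', normalized, flags=re.MULTILINE)),
        "sets_raw": len(find_response_sets(raw, person_name)),
        "sets_normalized": len(find_response_sets(normalized, person_name)),
    }


sample = (
    "[08-Oct-20 02:40 PM] ShadowRanger5#3348\nhello\n\n"
    "[08-Oct-20 03:00 PM] sai#2795\nHi wassup\n\n"
)

# Simulate text that kept Windows line endings.
report = diagnose(sample.replace("\n", "\r\n"), "sai#2795")
print(report)
```

If `header_lines` is large while `sets_raw` stays near zero on the real chatlog, the message layout (line endings, multi-line messages, or a missing blank line after the last message) is worth checking before the regex itself.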
