Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Cannot read some spreadsheet cells #1016

Open
jsd3 opened this issue Sep 7, 2021 · 7 comments
Open

Cannot read some spreadsheet cells #1016

jsd3 opened this issue Sep 7, 2021 · 7 comments
Labels
needs more info This issue can't reproduce, need more info

Comments

@jsd3
Copy link

jsd3 commented Sep 7, 2021

Description

I have a spreadsheet (attached) that has 81 columns and 13 rows, yet some of the rows read only 79 or 80 columns . I have tried Excelize and it also has the same trouble with two of the spreadsheet columns when the cells are empty (see the columns "Customer request date" and "Target date" and the last two rows in the attached spreadsheet).

sample_parse_failure.xlsx

Steps to reproduce the issue:

  1. Use Excelize to read the attached spreadsheet and count the number of cells/columns/values in each row.
  2. Notice that the last two rows have cells that are not read and so the column total count is different!

Describe the results you received:

There are two columns in the spreadsheet that do not seem to get read properly if they are empty: "Customer request date"
and "Target date"

Describe the results you expected:

I expect to have 81 columns/values in each row, but am getting only 79 or 80 in the last two rows.

Output of go version:

go version go1.17 darwin/amd64

Excelize version or commit ID:

V2

Environment details (OS, Microsoft Excel™ version, physical, etc.):

MacOS

@xuri
Copy link
Member

xuri commented Sep 8, 2021

As the documentations of GetRows says, it fetches the rows with value or formula cells, the tail continuously empty cell will be skipped.

@xuri xuri added the wontfix This will not be worked on label Sep 8, 2021
@jsd3
Copy link
Author

jsd3 commented Sep 10, 2021

Thank you for your response! I don't think it is just continuously empty tail cells that are skipped, but rather certain empty non-string cells (in this case "date" cells). If I duplicate one of the fully copied rows in the spreadsheet (say row 10) and place it at the end (as row 14), it copies fully, yet the same cells in rows 12 and 13 are skipped. Anyway, thanks so much again!

@xuri
Copy link
Member

xuri commented Sep 12, 2021

Did you find out which cell has been skipped, could you provide a sample code to reproduce this issue? In addition, please try to upgrade to the latest version or using the master branch code.

@xuri xuri added needs more info This issue can't reproduce, need more info and removed wontfix This will not be worked on labels Sep 12, 2021
@jsd3
Copy link
Author

jsd3 commented Dec 21, 2021

The spreadsheet I attached will show you the problem if you convert it to csv. I am a Golang newbie so my code is a mess and isn't worth sharing. I believe this has to do with certain raw cell "Style" values (33 and 34) for dates. In another library I "fixed" for this same problem, I just had to adjust it to handle those style values since they were unmapped. So basically, it looks like there are date formats (styles) that Microsoft uses that excelize doesn't recognize and so it returns nil instead of empty string for the value and it effectively skips the cell entirely. I'm attaching the same spreadsheet with a row added at the end that has complete values. If you convert it to a csv file you will see that rows 12 and 13 do not have 81 columns/cells, but rows 10 and 14 do (showing that excelize is not just skipping cells for columns that are empty for the rest of the rows). For me, this issue was a problem since having a csv file with different numbers of columns/cells in different rows doesn't allow me to work with file data as needed. I hope this helps.
sample_parse_failure2.xlsx

@valevisca
Copy link

Hi guys,

A question about GetRows() and "the tail continuously empty cell will be skipped" thing. Is there a way to get empty cells as empty strings? I have to read sheets where the first line (row #1) contains a table header, but not all the cells are always non-empty. My current problem is that when I read the sheet I end up with rows of different length and this complicates the code.
Thanks!

@xuri
Copy link
Member

xuri commented Jun 26, 2023

Hi @valevisca, the cells of each row maybe not follow the header (first) row, because the header row may be longer or shorter than other rows, and the library should not understand which user use which row as the header, and should not fill empty cell automatically, it should be return cells of each row as is, just skip cells in the tail of the row to avoid a much more blank cells (not zero cells or just looks like an empty cell) memory allocation.

Please check the length of each row before getting the cell element in the slice. I have made a figure in issue #1245 to explain what kind of cells will be skipped for your reference. You can compare and fill each row's cell in your need easily.

@valevisca
Copy link

Hi @xuri,

thanks, that is what I ended up doing. It was a matter of understanding the way GetRows works and apply the right fix on my side. Thank also for the pointer to issue #1245. That helped a lot.
You put together a great library: thanks for that!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
needs more info This issue can't reproduce, need more info
Projects
None yet
Development

No branches or pull requests

3 participants