While working on the Azure converter issue, one of the unit tests that passed on the 1.x branch kept failing. The failing test is `test_azure_converter_with_multicolumn_header_table`, and it is available on this PR.
Upon further investigation, @sjrl and I traced this failure back to how we calculate the Document id. A possible solution might be to "allow duplicate column names. Looking at the example in the PDF, it is very common for tables in things like financial reports to have multi-column headers. And I think the best way to represent that in a dataframe is to have duplicate column names. I think it would be better to update the call to the `to_json()` method to work with dataframes that have duplicate column names."
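To illustrate the suggestion above: pandas `DataFrame.to_json()` rejects duplicate column names for its default orient (`"columns"`), which is presumably where the serialization breaks. A minimal sketch (the table contents here are hypothetical, not taken from the actual test fixture) showing the failure and one possible workaround, `orient="split"`, which serializes the columns as a plain list and therefore preserves duplicates:

```python
import pandas as pd

# Hypothetical financial-report table whose multi-column header was
# flattened into duplicate column names ("2021" appears twice).
df = pd.DataFrame([[100, 1, 2]], columns=["Revenue", "2021", "2021"])

# The default orient ("columns") builds a dict keyed by column name,
# so duplicate columns raise a ValueError:
try:
    df.to_json()
except ValueError as err:
    print(f"default orient failed: {err}")

# orient="split" keeps columns as a list, so duplicates survive intact:
json_str = df.to_json(orient="split")
print(json_str)
```

Whether switching the orient is acceptable depends on how the Document id hash consumes the JSON; any orient that is stable and duplicate-tolerant would do, as long as it is applied consistently.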
I'm not sure how deep the consequences of this change would be, but the PR resolving the Azure converter issue is blocked by this issue.