-
Notifications
You must be signed in to change notification settings - Fork 4.9k
Initial commit of System.IO.Packaging #1818
Conversation
Hi @EricWhiteDev, I'm your friendly neighborhood .NET Foundation Pull Request Bot (You can call me DNFBOT). Thanks for your contribution! TTYL, DNFBOT; |
@EricWhiteDev, Thanks for signing the contribution license agreement so quickly! Actual humans will now validate the agreement and then evaluate the PR. |
@dotnet-bot test this please |
/// accompanied by a '\n' in the string, else its an error. | ||
/// </summary> | ||
/// <param name="contentType"></param> | ||
private static void ValidateCarriageReturns(string contentType) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this going to be correct on platforms where newlines are just "\n"?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a very interesting issue. RFC-2616 (definition of content types, as referred to by OPC) does not allow for CRLF (or CR or LF) in a content type. Further, I did a check in my sample documents, and no Open XML documents contain a CR, LF, or CRLF in a content type. I suspect that this code has never been executed in a real scenario.
However, I am loath to change this code, as changing it could potentially impact existing applications. There may be some scenario where programs are setting content types with CRLF in them, although I can't think of what that would be.
One of my goals with this module is to implement as closely as possible the exact same functionality as the existing implementation. There are certainly aspects of System.IO.Packaging that could be improved upon, such as adding new functionality to make it more LINQ friendly, and so on. However, I propose that this is a separate project from the project of getting an implementation that is as compatible as possible as the existing one.
I propose that we leave the code as-is for now, and if there are issues that crop up in the Linux or Mac world, we address those as a separate work item.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok.
@EricWhiteDev, just following up... what's the status of this? |
Two tests are also failing on Unix:
The relevant files aren't showing up in the diff, but I believe the issue is that the source is using verbatim string literals that span lines, such that the newline on the platform on which compilation is happening is being embedded in the string, rather than the newline on which the test is running. The code should be changed to avoid verbatim strings spanning newlines. |
@EricWhiteDev, any updates on this? |
@stephentoub I've been on vacation for the past few days. I have made progress on making appropriate changes, but have not finished. I have some questions on things, and will contact you or Matt regarding those. |
Ok, thanks for the update. |
Working around a build issue by setting the versions of all the dependencies of System.IO.Packaging to explicit version -beta-23008. Otherwise, we are unable to resolve references to System.Object and other System.Runtime types.
@stephentoub Fixed the issue with tests that have strings that span newlines, as those tests failed on Unix. |
return true; | ||
else | ||
return false; | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This could just be:
return !s.CanRead && !s.CanSeek && !s.CanWrite;
Thanks, Eric. I left some additional comments. As they're mostly minor issues, I'd be ok with us merging this and then addressing such issues subsequently; up to you. I do think some/most of the commits should be squashed down before it's merged, e.g. the initial commit from dotnet-bot and then one or a few commits from you addressing feedback. We try to avoid commits in the history that don't build or pass tests, that are just undoing changes from previous commits in the same PR, etc. |
|
||
// Flush all the parts that are currently open. | ||
// This will flush part relationships. | ||
DoOperationOnEachPart(DoFlush); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
DoFlush is an instance method; the delegate allocation can't be avoided with the current signatures (or it would need to be manually cached on the instance, which comes with its own tradeoffs), so addressing that would be a larger change. Using p => DoFlush(p)
would actually be worse, as it'd force a closure allocation as well in order to capture the implicit this
reference.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yeap, I was wrong initially but we can
DoOperationOnEachPart((thisRef, packagePart) => thisRef.DoFlush(packagePart));
and pass this
as parameter inside DoOperationOnEachPart
.
All these changes related to private/internal API.
One more thing: I just noticed the test files in the PR (GitHub was hiding them from view). It appears you've Base64-encoded various test assets (e.g. .docx files) and included them in a .cs file for use by the tests. That might be the right approach, but it's not how we've handled test assets elsewhere, e.g. in System.IO.Compression, where test data has been provided in a separate NuGet package: |
@stephentoub Thanks for doing this detailed review. Good points all. To a large extent, these code issues are in code that I inherited from the original System.IO.Packaging. In general, I opted to make as few changes as possible to existing code, given that this code is used in mission critical applications where stability of the application that uses this library is of the highest importance. Regarding performance issues, the majority of processing time occurs in System.IO.Compression, and no matter how much we optimize the code in System.IO.Packaging, we can't impact the performance of System.IO.Compression. In the process of developing the new System.IO.Packaging, I did some measurements on performance, and our biggest issues are in the Open XML SDK proper, not System.IO.Packaging, so it seems good to address those first. In any case, I have metrics that show that the new System.IO.Packaging has better performance than the old one (mainly due to the better perf of System.IO.Compression when compared to MS.Internal.IO.Compression). Performance in this area is not a concern of the users of the library. I would like to go with your suggested plan of merging now, and then addressing code issues such as these in a subsequent phase of work. The code in its current state is the code that I "hardened" by testing on 370,000 Open XML files, and making even simple changes requires that I again run the code through the my massive test. This test is run using OxRunner (https://www.youtube.com/watch?v=AcZFexDxL0U), which requires setting up 8 quad-core machines, all properly updated, with correct software installed, git repos properly installed on all machines, and so on. It is not possible to test efficiently on such a massive document library without taking an approach that uses parallelism. Further, I have a super-high priority project of making the Open XML SDK ready for Office 2016 (time is ticking on this one), so I propose that I address these issues after I make headway on the Open XML SDK. |
@EricWhiteDev: that's fine, thanks. Please do just verify about the test assets with @FiveTimesTheFun, and squash as mentioned. |
@EricWhiteDev, just checking in on this. I think this is good to go except for the test assets and then squashing. Any ETA for when you plan to update this accordingly? |
@EricWhiteDev, @FiveTimesTheFun, how are we doing on this? |
@stephentoub We are meeting in a couple of hours so that I can understand the details of how to put together the test assets. I'll put together the changes tomorrow. Should be good to go in short order. |
Great! Thanks. |
@stephentoub We rescheduled our meeting for Thursday, so I expect to have this in good shape early next week. |
@EricWhiteDev, checking in again 😄 Any updates? |
@stephentoub Yes! Matt and I got the test assets moved to the corefx-testdata repo, and I have most of the tests working using those test assets. I have a small amount of cleanup left, and then will push latest commit, and in theory, should be good to go. I'll ping you hopefully before the end of week, you can take a final look. I have been buried last couple of days on another project, but should have time to finalize this today or tomorrow. |
Thanks, Eric 😄 |
…epo instead of from TestFileLib.cs
@stephentoub Ok, it is ready once again for you to look at. All tests now pull from the corefx-testdata repo. TestFileLib.cs (which had the base64 encoded test assets) has been deleted. Further, some tests were leaving behind temporary files. Now all tests clean up after themselves. The temporary files were named with Guids, so no chance of conflicting with other tests, but in any case, was messy and wrong to leave them around. So take a look, and let me know if there is anything else I need to do. |
if (contentType == null) | ||
throw new ArgumentNullException("contentType"); | ||
|
||
if (String.CompareOrdinal(contentType, String.Empty) == 0) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if (contentType.Length == 0)
?
Thanks, Eric. I took a quick skim through and left a few more minor comments. At this point, here's what I think you should do:
Once that's in, I can do a final pass through, and we can get this merged. Thanks! |
@stephentoub I have consolidated the code cleanup into a single issue: |
@stephentoub Sorry, I am out of my depth here with regards to squashing commits in this scenario. (I am able to do this with my own super-simple repos, but worry about doing this one.) Don't want to mess it up, so I will get with @FiveTimesTheFun to do the squashing. I will schedule a short meeting with him. |
Thanks, @EricWhiteDev. I can help; I'll squash things down now, make sure it all builds/tests appropriately, and put up a replacement PR that contains your commits. |
@stephentoub thank you, appreciate it. As far as I am concerned, don't need any of the commit history. |
Replaced by #2701 |
New version of System.IO.Packaging, rewritten to use System.IO.Compression. This assembly is used by the Open XML SDK. The code was written to work with both .NET 4.5.* and COREFX/CORECLR. I will be adding additional tests in the near future. In addition, I have a massive amount of tests that will go into "outer-loop" testing after some cleanup.