Remove or document Direct Output Committers limitations #54
Comments
Hi @gerashegalov, I was able to reproduce this issue, and the following are my observations.
Also, can you please expand a bit more on the Expected behavior section? Are you referring to changing the behavior of the DirectOutputCommitter class in some way?
Hi @ajayborra, thank you for checking out the issue. Just for reference: s3n is deprecated in Hadoop 2.x and removed in Hadoop 3.x. I would like to point out that the issue is related to this but is not about FileSystem URIs. And yes, the S3 configuration is missing from the default Hadoop libraries outside EMR. This issue is primarily about Hadoop output committers. I propose dropping the Direct Output Committer (DOC) classes from this project. We should definitely not hardcode them as in TransmogrifAI/utils/src/main/scala/com/salesforce/op/utils/io/avro/AvroInOut.scala Line 159 in f379975
The expected behavior, I think, should be the default output committer as in Hadoop 3, FileOutputCommitter v2: https://docs.databricks.com/spark/latest/faq/append-slow-with-spark-2.0.0.html The FileOutputCommitter class will already be used when we stop overriding it via
@gerashegalov Thanks for clarifying this. +1 on leaving it to the users to choose the committer class. Raised a PR for this: #86. Please review it when you get a chance.
Describe the bug
The direct output committer and the vanilla committers are susceptible to list inconsistency on S3. They only work with layers on top of S3, such as S3Guard or EMRFS, as long as one relies on _SUCCESS files.
To Reproduce
Run on pure s3 :)
Expected behavior
Use the default committer. Just refer users to the options out there and how to configure them.
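As a rough illustration of what "how to configure them" could look like on the user side, here is a minimal sketch that leaves the committer class at its default and only selects the FileOutputCommitter algorithm version through Spark's `spark.hadoop.*` passthrough. It assumes Spark is on the classpath; the object name, app name, and local master are hypothetical and not taken from this project.

```scala
import org.apache.spark.sql.SparkSession

// Hypothetical example object; not part of TransmogrifAI.
object DefaultCommitterExample {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("default-committer-example") // illustrative name
      .master("local[*]")                   // assumption: local run for demonstration
      // No committer class is overridden here, so the default FileOutputCommitter is used.
      // Properties prefixed with "spark.hadoop." are copied into the Hadoop Configuration.
      .config("spark.hadoop.mapreduce.fileoutputcommitter.algorithm.version", "2")
      .getOrCreate()

    // Read the effective value back from the Hadoop configuration Spark will use.
    val version = spark.sparkContext.hadoopConfiguration
      .get("mapreduce.fileoutputcommitter.algorithm.version")
    println(s"FileOutputCommitter algorithm version: $version")

    spark.stop()
  }
}
```

The same property can be passed at submit time with `--conf spark.hadoop.mapreduce.fileoutputcommitter.algorithm.version=2`, which keeps the choice out of library code. Note that this does not by itself address the S3 listing inconsistency described above.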
Logs or screenshots
N/A
Additional context
N/A