Can I preprocess with PCH and compile the job remotely? #26

chengjiaozyl · 2021-11-26T03:27:26Z

As the official docs describe, a PCH can be used during preprocessing, which inserts #pragma GCC pch_preprocess "filename" into the output. The path can be absolute or relative to the working directory. In other words, it should be possible to use PCH in a distributed build, which could improve compilation efficiency. I skimmed your code and found that, during preparation, yadcc skips the PCH and just uses -E, -fno-working-directory and -fdirectives-only to produce the preprocessed output. I have two questions and hope you can reply.

1. As far as I know, the -E option preprocesses the included headers directly without using the PCH, which should take more time both during preprocessing and during compilation. Have you attempted to use PCH for distributed builds in yadcc, and if so, why was it not adopted?

2. As the GCC docs describe, -fworking-directory is the default setting. However, yadcc turns it off with -fno-working-directory, so some debug information, such as the file path, is not inserted into the preprocessed output. I assume the purpose is to avoid absolute paths in the output, which could cause cache misses even for identical source files. I have tried processing the preprocessed output of the MSVC compiler and found that, without absolute paths, the cache is hit when the source file is the same on different computers. That saves a lot of compilation time. However, with the path information turned off, I can run the resulting executables but cannot use the resulting obj, exp or pdb files for debugging. So I wonder whether yadcc has run into similar problems after turning the path off.

I look forward to your reply. Thank you very much.
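
For readers following along, the preprocessing step described above can be sketched roughly as follows. This is a minimal illustration, not yadcc's actual code: the function name `preprocess_command` and its argument handling are hypothetical, and only the three GCC flags named in the question are taken from the thread.

```python
from typing import List


def preprocess_command(compiler: str, args: List[str], source: str) -> List[str]:
    """Turn a compile invocation into a preprocess-only invocation.

    Hypothetical helper: drops `-c` and `-o <file>` from the original
    arguments, then appends the flags mentioned in the question.
    """
    kept: List[str] = []
    skip_next = False
    for arg in args:
        if skip_next:
            # This is the file name that followed `-o`; drop it too.
            skip_next = False
            continue
        if arg == "-c":
            continue
        if arg == "-o":
            skip_next = True
            continue
        kept.append(arg)
    # -E: stop after preprocessing; -fdirectives-only: handle directives
    # without expanding macros; -fno-working-directory: do not emit the
    # current working directory as a linemarker.
    return [compiler, *kept, "-E", "-fdirectives-only",
            "-fno-working-directory", source]


if __name__ == "__main__":
    print(preprocess_command("g++", ["-O2", "-c", "-o", "a.o", "-Iinc"], "a.cc"))
```

The resulting list could be passed to `subprocess.run` to capture the preprocessed text from standard output.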

0x804d8000 · 2021-11-26T04:55:35Z

If my memory serves me well, a PCH can be quite large if many headers are included when building it (which is usually how it’s done). Transferring it can be quite costly even on an enterprise-grade network. CMIIW, I haven’t used PCH for a while.

This can be mitigated to a certain degree by caching the PCH on the compile servers. Internally this is done to support Java, by uploading dependencies to central storage and asking the compile servers to cache them. It seems to work adequately well.

However, our internal codebase is way too large to use a single PCH that includes all headers. Meanwhile, I don’t think GCC supports using multiple PCHs when compiling a single TU, so splitting the PCH is not an option.

That’s why we decided not to use it.

Regarding the second question, I have no experience with MSVC, sorry. But it does look strange to me that MSVC would accept arguments with a hyphen instead of a slash.

chengjiaozyl · 2021-11-29T07:19:27Z

Thank you for the reply. To clarify, MSVC certainly does not accept -fno-working-directory, and it has no option that does the same job as -fno-working-directory does for GCC. I do this work manually on the preprocessed output. The second question is just whether I can debug the output with -fworking-directory turned off, if I use yadcc to compile my source code in a distributed way.

chengjiaozyl · 2021-11-29T07:32:51Z

As for the first question, maybe in most projects the PCH is very large. But I use FASTBuild (another distributed compilation tool) to compile my UE4 project remotely on several cloud machines. I found that if the number of worker machines is not large enough, the time spent transferring data, plus the extra time from splitting the original compilation into preprocessing and compilation, may exceed the time of simply compiling locally. After studying this, I found that the PCH may have an influence on it. So if the PCH is not very large (UE has its own reflection and build system and splits its PCHs well, so the size of a PCH is always just beyond 10MB), theoretically, could I use PCH? I look forward to your reply. Thank you very much.

0x804d8000 · 2021-11-30T05:38:40Z

I can’t recall many issues debugging with GDB, either because:

I didn’t need the source code much while debugging, or
blade always used paths relative to the workspace root and I usually ran my binary there, so GDB was able to find the source files correctly.

I don’t have a development environment at hand so I can’t check which was the case, but the latter seems more likely.

Adding that argument is not strictly necessary; it’s just a hack for lazy people (me) to make caching work. Although not tested, I think caching is also possible without that argument, by ignoring that line in the preprocessed file when generating the cache key, and post-editing the resulting ELF file (DWARF). The relevant attribute here should be DW_AT_comp_dir. (Maybe we could do only the post-editing to add that attribute, without removing the argument.)
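
The "ignore that line when generating the cache key" idea could look roughly like the sketch below. It is untested against yadcc and rests on one assumption taken from the GCC documentation: with -fworking-directory, the preprocessor emits a linemarker whose file name is the current working directory followed by two slashes. The names `cache_key` and `_CWD_MARKER` are hypothetical, and SHA-256 is just a convenient hash choice.

```python
import hashlib
import re
from typing import Sequence

# Assumed shape of the working-directory linemarker: # 1 "/abs/cwd//"
_CWD_MARKER = re.compile(r'^# \d+ "[^"]*//"(?: \d+)*\s*$')


def cache_key(preprocessed: str, compiler_args: Sequence[str]) -> str:
    """Hash compiler arguments and preprocessed source, skipping the
    working-directory linemarker so the key does not depend on where
    the build was run."""
    digest = hashlib.sha256()
    for arg in compiler_args:
        digest.update(arg.encode("utf-8") + b"\0")
    for line in preprocessed.splitlines():
        if _CWD_MARKER.match(line):
            continue  # differs per machine; leave it out of the key
        digest.update(line.encode("utf-8") + b"\n")
    return digest.hexdigest()


if __name__ == "__main__":
    a = '# 1 "foo.c"\n# 1 "/home/alice/proj//"\nint x;\n'
    b = '# 1 "foo.c"\n# 1 "/srv/build/proj//"\nint x;\n'
    print(cache_key(a, ["-O2"]) == cache_key(b, ["-O2"]))
```

Note that this only neutralizes the working-directory marker. Absolute paths that appear in other linemarkers (for example headers included by absolute path) would still change the key, and the debug information would still need DW_AT_comp_dir fixed up afterwards as described above.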

You can test whether removing that argument helps your debugger find the source file. It probably works.

Regarding PCH, yes, I think it should be possible to compile in a distributed fashion with it. Though I don’t think 10MB is an acceptable size for a relatively large number of translation units*, unless some sort of caching is done. Transferring it once per second can easily reach 100Mbps, while in real-world scenarios we usually build a dozen files or more per second.

By the way, I’m not sure caching would make much sense in this case, as a change in one of the headers would likely invalidate a large number of cache entries (if not all).

*: I don’t have a sense of how well a PCH compresses. Preprocessed source compresses well and the result is usually 1/10 or 1/20 of the original size (or even less), so I hardly ever worried about its size.
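
The bandwidth estimate above can be checked with a few lines of arithmetic. This is only a back-of-the-envelope sketch: `required_mbps` is a hypothetical helper, 1 MB is taken as 10^6 bytes, protocol overhead is ignored, and the compression ratio applied to a PCH is purely an assumption, since how well a PCH compresses is not known here.

```python
def required_mbps(payload_mb: float, files_per_second: float,
                  compression_ratio: float = 1.0) -> float:
    """Sustained bandwidth in Mbps needed to ship one payload per file.

    payload_mb: uncompressed payload size in megabytes (10^6 bytes).
    compression_ratio: uncompressed size divided by compressed size.
    """
    megabits_per_file = payload_mb * 8 / compression_ratio
    return megabits_per_file * files_per_second


if __name__ == "__main__":
    # One 10MB PCH per second, uncompressed.
    print(required_mbps(10, 1))        # 80.0
    # A dozen translation units per second, uncompressed.
    print(required_mbps(10, 12))       # 960.0
    # Same rate, if the payload happened to compress 10x (assumption).
    print(required_mbps(10, 12, 10))   # 96.0
```

At a dozen files per second an uncompressed 10MB payload already approaches a saturated gigabit link, which is why caching the PCH on the compile servers matters more than the per-file size alone.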

chengjiaozyl · 2021-12-06T03:30:45Z

Thank you for your reply. Let me test it and discuss it with you once I have results. Besides, my work is based only on the compilers themselves, including gcc, clang and cl.exe, so I am not able to do things like post-editing... But thanks for your advice; maybe I can run a test on this to verify my guess.
