Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ThreadAwareModule Logging #441

Closed
ramo-j opened this issue Oct 8, 2021 · 9 comments · Fixed by #464
Closed

ThreadAwareModule Logging #441

ramo-j opened this issue Oct 8, 2021 · 9 comments · Fixed by #464
Assignees

Comments

@ramo-j
Copy link
Collaborator

ramo-j commented Oct 8, 2021

Right now, when a thread aware module that uses self.logger to log does so, there is no indication which thread the log comes from. This should be improved to make it clearer what logs belong to which thread.

@ramo-j ramo-j self-assigned this Oct 8, 2021
@ramo-j
Copy link
Collaborator Author

ramo-j commented Nov 5, 2021

Current possible implementation looks like this (sample is from unittest output)

<snip>
[2021-11-05 03:13:11,309] [TurbiniaGCPProcessor] CRITICAL project or turbinia_zone are not all specified, bailing out
[2021-11-05 03:13:11,309] [TurbiniaGCPProcessor] CRITICAL project or turbinia_zone are not all specified, bailing out
[2021-11-05 03:13:11,309] [TurbiniaGCPProcessor] CRITICAL project or turbinia_zone are not all specified, bailing out
[2021-11-05 03:13:11,309] [TurbiniaGCPProcessor] CRITICAL project or turbinia_zone are not all specified, bailing out
[2021-11-05 03:13:11,309] [TurbiniaGCPProcessor] CRITICAL project or turbinia_zone are not all specified, bailing out
..[2021-11-05 03:13:11,311] [TurbiniaGCPProcessorThreaded] [MainThread] SUCCESS  Downloaded gs:https://hashes.json to /fake/local/hashes.json
[2021-11-05 03:13:11,311] [TurbiniaGCPProcessorThreaded] [MainThread] SUCCESS  Downloaded gs:https://results.plaso to /fake/local/results.plaso
..Turbinia log file: /tmp/tmpniwp26k7/turbinia-disk-1.log
[2021-11-05 03:13:11,315] [TurbiniaGCPProcessorThreaded] [MainThread] SUCCESS  Creating Turbinia request b5a1f2ed4edd4b60afbcd708f0d87d67 with Evidence <MagicMock name='GoogleCloudDisk().name' id='139818074580688'>
[2021-11-05 03:13:11,315] [TurbiniaGCPProcessorThreaded] [MainThread] INFO     Waiting for Turbinia request b5a1f2ed4edd4b60afbcd708f0d87d67 to complete
[2021-11-05 03:13:11,316] [TurbiniaGCPProcessorThreaded] [MainThread] INFO     <MagicMock name='get_turbinia_client().format_task_status()' id='139818074533792'>
[2021-11-05 03:13:11,316] [TurbiniaGCPProcessorThreaded] [MainThread] SUCCESS  Downloaded gs:https://BinaryExtractorTask.tar.gz to /local/BinaryExtractorTask.tar.gz
[2021-11-05 03:13:11,316] [TurbiniaGCPProcessorThreaded] [MainThread] INFO     Collected 3 results
[2021-11-05 03:13:11,316] [TurbiniaGCPProcessorThreaded] [MainThread] SUCCESS  Found plaso result: /fake/data.plaso
[2021-11-05 03:13:11,316] [TurbiniaGCPProcessorThreaded] [MainThread] SUCCESS  Found plaso result: /fake/data2.plaso
[2021-11-05 03:13:11,316] [TurbiniaGCPProcessorThreaded] [MainThread] SUCCESS  Found BinaryExtractorTask result: /local/BinaryExtractorTask.tar.gz
..[2021-11-05 03:13:11,318] [TurbiniaGCPProcessorThreaded] [MainThread] CRITICAL Specified project turbinia-wrong-project does not match Turbinia configured project turbinia-project. Use gcp_turbinia_disk_copy_ts recipe to copy the disk into the same project.
.[2021-11-05 03:13:11,319] [TurbiniaGCPProcessorThreaded] [MainThread] CRITICAL project or turbinia_zone are not all specified, bailing out
[2021-11-05 03:13:11,320] [TurbiniaGCPProcessorThreaded] [MainThread] CRITICAL project or turbinia_zone are not all specified, bailing out
..........[2021-11-05 03:13:11,321] [dftimewolf          ] DEBUG    Loading module DummyModule1 from tests.test_modules.modules
[2021-11-05 03:13:11,321] [dftimewolf          ] DEBUG    Loading module DummyModule2 from tests.test_modules.modules
[2021-11-05 03:13:11,321] [dftimewolf          ] DEBUG    Loading module DummyPreflightModule from tests.test_modules.modules
..[2021-11-05 03:13:11,322] [dftimewolf          ] DEBUG    Loading module DummyModule1 from tests.test_modules.modules
[2021-11-05 03:13:11,322] [dftimewolf          ] DEBUG    Loading module DummyModule2 from tests.test_modules.modules
[2021-11-05 03:13:11,322] [dftimewolf          ] DEBUG    Loading module DummyPreflightModule from tests.test_modules.modules
.[2021-11-05 03:13:11,323] [dftimewolf          ] DEBUG    Loading module DummyModule1 from tests.test_modules.modules
<snip>
[2021-11-05 03:13:11,376] [dftimewolf          ] INFO     Running module: ThreadAwareConsumerModule
[2021-11-05 03:13:11,376] [ThreadAwareConsumerModule] [Thread-40] INFO     ThreadAwareConsumerModule Static Pre Process
[2021-11-05 03:13:11,376] [dftimewolf          ] INFO     Running 3 threads, max 2 simultaneous for module ThreadAwareConsumerModule
[2021-11-05 03:13:11,376] [ThreadAwareConsumerModule] [ThreadPoolExecutor-1_0] INFO     ThreadAwareConsumerModule Process!
[2021-11-05 03:13:11,376] [ThreadAwareConsumerModule] [ThreadPoolExecutor-1_1] INFO     ThreadAwareConsumerModule Process!
[2021-11-05 03:13:12,378] [ThreadAwareConsumerModule] [ThreadPoolExecutor-1_0] INFO     ThreadAwareConsumerModule Process!
[2021-11-05 03:13:13,379] [ThreadAwareConsumerModule] [Thread-40] INFO     ThreadAwareConsumerModule Static Post Process
[2021-11-05 03:13:13,380] [dftimewolf          ] INFO     Module ThreadAwareConsumerModule finished execution
.[2021-11-05 03:13:13,381] [dftimewolf          ] DEBUG    Loading module ContainerGeneratorModule from tests.test_modules.thread_aware_modules
[2021-11-05 03:13:13,381] [dftimewolf          ] DEBUG    Loading module ThreadAwareConsumerModule from tests.test_modules.thread_aware_modules
<snip>

@tomchop What do you think? I don't love it, but I'm not sure how I would improve it either.

@tomchop
Copy link
Collaborator

tomchop commented Nov 5, 2021

Interesting, why are Thread-40 and ThreadPoolExecutor-1_0 following a different naming convention?

@ramo-j
Copy link
Collaborator Author

ramo-j commented Nov 8, 2021

For

[2021-11-05 03:13:11,376] [ThreadAwareConsumerModule] [Thread-40] INFO     ThreadAwareConsumerModule Static Pre Process

This is being executed by the thread created by state for the module - The modules main thread, but not the programs main thread. The only module method that is run in parallel is Process which is done by the ThreadPoolExecutor.

Perhaps an better scenario is only logging thread ids within Process logging calls - as opposed to this first attempt, which is all logging calls from within the module.

@ramo-j
Copy link
Collaborator Author

ramo-j commented Nov 8, 2021

Second attempt sample. An improvement I think.

[2021-11-08 03:19:35,485] [ThreadAwareConsumerModule] INFO     ThreadAwareConsumerModule Static Pre Set Up
ThreadAwareConsumerModule SetUp!
[2021-11-08 03:19:35,485] [ThreadAwareConsumerModule] INFO     ThreadAwareConsumerModule Static Post Set Up
[2021-11-08 03:19:35,486] [dftimewolf          ] INFO     Running module: ContainerGeneratorModule
ContainerGeneratorModule Process!
[2021-11-08 03:19:35,486] [dftimewolf          ] INFO     Module ContainerGeneratorModule finished execution
[2021-11-08 03:19:35,486] [dftimewolf          ] INFO     Running module: ThreadAwareConsumerModule
[2021-11-08 03:19:35,486] [ThreadAwareConsumerModule] INFO     ThreadAwareConsumerModule Static Pre Process
[2021-11-08 03:19:35,486] [dftimewolf          ] INFO     Running 3 threads, max 2 simultaneous for module ThreadAwareConsumerModule
[2021-11-08 03:19:35,487] [ThreadAwareConsumerModule] INFO     [ThreadPoolExecutor-1_0] ThreadAwareConsumerModule Process!
[2021-11-08 03:19:35,487] [ThreadAwareConsumerModule] INFO     [ThreadPoolExecutor-1_1] ThreadAwareConsumerModule Process!
[2021-11-08 03:19:36,488] [ThreadAwareConsumerModule] INFO     [ThreadPoolExecutor-1_0] ThreadAwareConsumerModule Process!
[2021-11-08 03:19:37,490] [ThreadAwareConsumerModule] INFO     ThreadAwareConsumerModule Static Post Process
[2021-11-08 03:19:37,491] [dftimewolf          ] INFO     Module ThreadAwareConsumerModule finished execution
.[2021-11-08 03:19:37,492] [dftimewolf          ] DEBUG    Loading module ContainerGeneratorModule from tests.test_modules.thread_aware_modules
[2021-11-08 03:19:37,492] [dftimewolf          ] DEBUG    Loading module ThreadAwareConsumerModule from tests.test_modules.thread_aware_modules
[2021-11-08 03:19:37,493] [dftimewolf          ] INFO     Setting up module: ContainerGeneratorModule

@tomchop
Copy link
Collaborator

tomchop commented Nov 8, 2021

Yeah, that looks much better! I don't know if we need the full ThreadPoolExecutor-1_0 or if we could just call it something shorter to save some screen info (like Thread1, I don't know how involved this is)

@ramo-j
Copy link
Collaborator Author

ramo-j commented Nov 8, 2021

The name comes from threading.current_thread().getName(). We can get a numerical ID, but if we want other names, we'd have to create and track them manually, or s/PoolExecutor-//g.

@tomchop
Copy link
Collaborator

tomchop commented Nov 9, 2021

Seeing #464, changing the name to something shorter (like what you suggest doing s/PoolExecutor-//g.) would be straightforward enough no?

@ramo-j
Copy link
Collaborator Author

ramo-j commented Nov 9, 2021

I think I prefer having it with PoolExecutor - We can see from that sample that threadpoolexecutor reuses threads, and combining that with the knowledge that non pool'd thread names aren't reused like that, I think makes it clearer what is happening.

@tomchop
Copy link
Collaborator

tomchop commented Nov 10, 2021

Cool, sounds good!

tomchop pushed a commit that referenced this issue Nov 10, 2021
* First iteration of fix for #441

* 2nd iteration for #441

* Test addition

* linter appeasement

* PR suggestions
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants