
Design and create a test suite #55

Closed
abhijithnraj opened this issue Feb 11, 2023 · 99 comments
Labels: enhancement (New feature or request)

Comments

@abhijithnraj

abhijithnraj commented Feb 11, 2023

Hi, I have been following this project for some time since its release. Great Work.

But with support for several git features coming in, are there any plans or ideas for testing them? I would like to work on this.

For Manim-based tests, we can have a look at the way ManimCE tests its own code: https://docs.manim.community/en/stable/contributing/testing.html and https://github.com/ManimCommunity/manim/tree/main/tests

Hi @initialcommit-io, is there any test design you have in mind?

Some of my ideas:

  1. Some basic robustness tests to ensure no failures: object creation, function calls, running from outside a repo, support for the available Python versions, etc.
  2. Some functional test cases built as a story: create a basic git repo, run the command, and compare output; make the first commit, run the command, and compare output again. But with changes happening to the output representation itself, we would need to inspect the Manim MObject being rendered and assert on its properties.
  3. Checking that command-line flags are properly dispatched, ensuring no regressions from new modifications, etc.

Once done we can add an actions pipeline to ensure continuous integration.

I am working on this; let me know if there are any existing plans.

Once again, Thank You for this wonderful project.

@abhijitnathwani
Contributor

I was about to create an issue to track this @abhijithnraj. This is a good point to start integrating tests NOW. @initialcommit-io, even I was worried while adding support for the stash subcommand that I might break something in other commands while overriding the GitSimBaseCommand methods. I think we need to start designing the test plan aggressively now. The problems you caught with the Typer integration could easily be caught with some functional test plans.

For Manim-based tests, we can have a look at the way ManimCE tests its own code: https://docs.manim.community/en/stable/contributing/testing.html and https://github.com/ManimCommunity/manim/tree/main/tests

I didn't know how to test the images that will be generated but this looks promising.

Once done we can add an actions pipeline to ensure continuous integration.

Exactly. This will ensure great test coverage, and using tools like pytest we will be in good shape to support this for the long term, adding new features while guaranteeing support for the already existing ones.

Python version is already an issue with the new Typer implementation which I have discussed here.

initialcommit-io added the "enhancement" (New feature or request) label on Feb 11, 2023
@initialcommit-io
Contributor

@abhijithnraj Thank you for creating this issue and for your proposed contributions! Yes, it's clear we need to start on this now, as @abhijitnathwani brought up some breaking changes across Python versions that we had not been testing for.

For context - I didn't have any tests going into the project because I wanted to gauge user interest before spending too much time on it, but now that we know the project has a solid user base, we need to prioritize a robust test suite for the reasons you both mentioned.

@abhijithnraj I don't have any specific test design in mind, but I agree with all of your points, and good find on the Manim test option. Maybe a good place to start is for you to put together a PR with a simplified version of an initial test suite that covers a single aspect of the codebase - probably the git-sim log subcommand since that is the simplest - from all perspectives (1), (2), and (3) that you mentioned.

Then we can all discuss the PR and see if it will work for the whole codebase/program.

initialcommit-io changed the title from "Code Testing" to "Design and create a test suite" on Feb 11, 2023
@initialcommit-io
Contributor

@abhijitnathwani @abhijithnraj FYI I meant to mention - I will be working on a small tool that can generate dummy Git repos with specific numbers of commits / branch structures / etc. I am thinking we can use this in functional tests to create the structure we need for testing certain commands, like merge/rebase, etc.

@abhijithnraj
Author

@initialcommit-io Yes, that can help with generating test data.

@initialcommit-io
Contributor

@abhijithnraj Do you have a timeline for an initial version of the test suite that provides coverage for the git log feature?

@initialcommit-io
Contributor

@abhijithnraj @abhijitnathwani @paketb0te Here is the first version of a tool I created to generate dummy Git repos:

https://github.com/initialcommit-com/git-dummy

I'm thinking we can use this in our functional tests.

@abhijithnraj
Author

abhijithnraj commented Feb 13, 2023

@initialcommit-io I am studying the graphical test cases for Manim. I think I can raise the first PR in at most 2 days. We will add more test cases once we finalize the framework in the PR. Meanwhile I will check out git-dummy; currently, for testing, I wrote some simple code to generate commits.

@initialcommit-io
Contributor

@abhijithnraj Sounds great. FYI I added a --branches=N option to git-dummy, so now it can easily generate dummy Git repos with an arbitrary number of branches. This will be good for simulating commands that show multiple branches, like merge, rebase, etc.

@initialcommit-io
Contributor

@abhijithnraj @abhijitnathwani @paketb0te Hi again - last update for a bit on git-dummy: I added a --diverge-at flag to set the commit number at which the branches will diverge from main.

I also added --merge=x,y,...,n so the user can select which branches to merge back into main if desired.

It's not much, but I think at this point git-dummy has enough so that we can use it for basic functional test coverage, to set up the automated scenarios/repos we need for the tests.

I am also planning to add hooks for git-dummy into Git-Sim, with something like a --generate flag. This will allow users to generate a test repo automatically before running a Git-Sim command.

@initialcommit-io
Contributor

@abhijithnraj Hi again! Just wanted to check in and see if this is still something you were working on?

@abhijithnraj
Author

Hi, Still working on it.

@initialcommit-io
Contributor

Hi @abhijithnraj @abhijitnathwani @paketb0te ! Hope you all are doing great =)

Just a note I released a significant refactor in git-sim v0.2.6 which overhauls the way it parses the commit history and adds several important new options/flags. Git-Sim now properly traces parent/child relationships, which means it can display more accurate and complex graph structures, with multiple branches and the relationships between them.

See the release notes for more details: https://github.com/initialcommit-com/git-sim/releases/tag/v0.2.6

I figured I'd mention it here since it might affect the test suite, depending on how it was being written.

Let me know if you get the chance to try it out. It's likely there are some bugs since we still don't have the test suite set up yet, but I really wanted to get this released since I feel it's an important milestone for the project, will make the tool much more useful for users, and will make future development smoother.

@abhijitnathwani
Contributor

Hey @initialcommit-io
These are some great features 👌🏻
I'll try these out over the weekend and let you know if I find something broken.

@ehmatthes
Contributor

Hi @abhijithnraj, @abhijitnathwani. This is such a cool project! Do either of you have the work you've done public at all? I saw your forks, but don't see any additional tests. I would like to jump in, but don't want to step on your toes if you're about to share some progress.

@abhijitnathwani
Contributor

Hey @ehmatthes
Thanks for taking the time to check out the project. Sadly, I don't have any Python testing experience, and given the nature of the project we need to design a testing framework for the images it renders. AFAIK, we don't have any major progress on the testing side. We are all open to contributions on this front!
You can present your thoughts on how we can achieve this and we can review them with @initialcommit-io :)

@ehmatthes
Contributor

Sure, here's a quick take on how I'd start this.

Because this is still an evolving project, I'd focus on end-to-end tests and not worry too much about unit tests. I think I can write a demo test suite this week that would do the following:

  • Create a temp dir for the test run.
  • Create a simple git repo, perhaps using git-dummy.
  • Call git-sim status against the test project.
  • Compare the resulting image file against a reference file.
  • Repeat that same process for a number of other commands that are already supported by git-sim.

pytest handles this kind of setup really nicely. If everything passes, you never even see the temp dir. If the tests fail, you can drop into that temp dir and see what happened. In that temp dir you can then run the command manually, and do a number of other troubleshooting actions.

If that sounds good to you all, I should be able to share a demo by the end of the week.
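For concreteness, a single test following the steps above might look something like this sketch. The tmp_repo fixture and compare_images() helper are placeholders for pieces the suite would define, the reference_files directory name is illustrative, and the git-sim flags shown are the image-output flags discussed later in this thread:

import os
import subprocess
from pathlib import Path

def test_status(tmp_repo):
    """Sketch: run `git-sim status` in a throwaway repo and compare against a reference image."""
    os.chdir(tmp_repo)  # tmp_repo: pytest fixture that builds a sample repo in a temp dir

    # -d suppresses auto-opening the image; --output-only-path prints just the generated file path
    cmd = "git-sim -d --output-only-path --img-format=png status"
    output = subprocess.run(cmd.split(), capture_output=True)

    fp_generated = Path(output.stdout.decode().strip())
    fp_reference = Path(__file__).parent / "reference_files" / "git-sim-status.png"
    assert compare_images(fp_generated, fp_reference)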

@initialcommit-io
Contributor

initialcommit-io commented Jun 12, 2023

@ehmatthes This sounds great. I completely agree that end-to-end tests make more sense for the project at this point, to provide baseline insurance that code changes won't break functionality.

I think the tests should catch 2 types of errors caused by code changes (let me know if you have other ideas):

  1. Unhandled Python exceptions
  2. Changes that cause deviation from the existing reference image for each test case

The nice thing about comparing to a reference image is that it is comprehensive - the image captures all elements drawn by git-sim, including commits, arrows, text, etc. So any unintended change to those would cause a test failure. The other benefit is that the default git-sim output is an image file, and this corresponds to the final frame of git-sim video output if the user uses the --animate flag, so verifying the images should be sufficient to guarantee a good video as well.

One downside of comparing images might be performance, so I'm curious how that turns out. Another consideration is that reference files will need to be updated/maintained as functionality changes over time.

For setting up the Git repos to use in each test case, I recommend (again :D) using git-dummy for reproducibility and flexibility. Certain test cases require a Git repo to be in a certain state, and the ability to generate a consistent repo to meet that criteria is why I created git-dummy. Ideally the dummy repos would just be created/cleaned up as a part of the test cases. If you run into a scenario that git-dummy can't currently generate a repo for, let me know and I can add that functionality into git-dummy.

Let us know if you have any other thoughts! Looking forward to seeing what you put together as a demo 😄
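As a rough sketch of that repo setup, a pytest fixture could build the sample repo once per session in a temp dir. Plain git commands are shown here for brevity; a git-dummy invocation could replace the commit loop:

import subprocess
import pytest

@pytest.fixture(scope="session")
def tmp_repo(tmp_path_factory):
    """Sketch: build a small throwaway Git repo for the end-to-end tests."""
    repo_dir = tmp_path_factory.mktemp("sample_repo")

    def run(*args):
        subprocess.run(args, cwd=repo_dir, check=True, capture_output=True)

    run("git", "init")
    run("git", "config", "user.email", "test@example.com")
    run("git", "config", "user.name", "Test User")
    for n in range(1, 6):
        (repo_dir / f"file_{n}.txt").write_text(f"Content of file {n}\n")
        run("git", "add", f"file_{n}.txt")
        run("git", "commit", "-m", f"Add file_{n}.txt")

    return repo_dir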

@ehmatthes
Contributor

The nice thing about comparing to a reference image is that it is comprehensive - the image captures all elements drawn by git-sim, including commits, arrows, text, etc. So any unintended change to those would cause a test failure. The other benefit is that the default git-sim output is an image file, and this corresponds to the final frame of git-sim video output if the user uses the --animate flag, so verifying the images should be sufficient to guarantee a good video as well.

Yes, this matches my thinking as well.

One downside of comparing images might be performance, so I'm curious how that turns out. Another consideration is that reference files will need to be updated/maintained as functionality changes over time.

I have done this in another project, and performance was not a significant issue. For end to end tests you have to generate the images that the project depends on. Once they're generated, the comparison itself does not take long.

One really nice thing about this approach is that it makes it really straightforward to update the reference files. When you run a test, you can pass a flag to stop after the first failed test. You can then drop into the test's temp dir, and look at the generated image that doesn't match the reference file. If this image is correct, you can simply copy that into the reference file folder, and the test will run. You don't have to generate the test file in a separate process; running the tests and failing becomes a way to keep the test suite up to date. The test suite becomes much more than just a pass/fail test, it shows you exactly what the current code would do for an end user.

For setting up the Git repos to use in each test case, I recommend (again :D) using git-dummy for reproducibility and flexibility. Certain test cases require a Git repo to be in a certain state, and the ability to generate a consistent repo to meet that criteria is why I created git-dummy. Ideally the dummy repos would just be created/cleaned up as a part of the test cases. If you run into a scenario that git-dummy can't currently generate a repo for, let me know and I can add that functionality into git-dummy.

Does git-dummy generate the same hashes each time it's run? That seems like an issue for testing.

@initialcommit-io
Contributor

Does git-dummy generate the same hashes each time it's run? That seems like an issue for testing.

Great point. The hashes would break that, since they depend on the commit timestamps, which are regenerated each time git-dummy is executed.

Git-Sim actually has a global flag called --highlight-commit-messages which hides the hash values. We could use this; however, I just noticed there is a visual bug that makes the commit message text too big and often causes it to overlap between commits, which would give us ugly reference images. One option is just to fix the overlapping and use the existing flag.

Another option is that I can create a new global option for Git-Sim called --show-hashes, which is true by default. That way in our test scripts we can supply something like git-sim --show-hashes=false log so that we avoid the non-matching hash issue you mentioned.

Alternatively, we might be able to update git-dummy to use hardcoded, expected timestamps via environment variables GIT_AUTHOR_DATE and GIT_COMMITTER_DATE. These could be set and then unset during each run of Git-Dummy, so that we always get the expected SHA1s on our commits...

Do you have any thoughts/preferences?
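For reference, the environment-variable approach might look like this sketch: pinning GIT_AUTHOR_DATE and GIT_COMMITTER_DATE (plus a fixed author identity) makes the commit SHAs reproducible, as long as the content and messages are also fixed. The helper name is illustrative:

import os
import subprocess

def commit_with_fixed_dates(repo_dir, message, date="2023-02-11T12:00:00+00:00"):
    """Sketch: create a commit whose SHA is stable across runs."""
    env = {
        **os.environ,
        "GIT_AUTHOR_NAME": "Test Author",
        "GIT_AUTHOR_EMAIL": "test@example.com",
        "GIT_COMMITTER_NAME": "Test Author",
        "GIT_COMMITTER_EMAIL": "test@example.com",
        "GIT_AUTHOR_DATE": date,
        "GIT_COMMITTER_DATE": date,
    }
    subprocess.run(
        ["git", "commit", "--allow-empty", "-m", message],
        cwd=repo_dir, env=env, check=True, capture_output=True,
    )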

@ehmatthes
Contributor

I don't think any of this is an issue. I think it's possible to create a single git repo to use in all tests. There will be new git-sim commands run, but not new git commands run. So git-dummy might help to build the initial test suite, but it doesn't need to be used on each test run.

@initialcommit-io
Contributor

Hmm, we could do that, but using a single repo to cover all test cases could get increasingly complex as more and more test cases need to be covered. To eventually get full scenario coverage we may need to jump around in the Git repo to do things like checkouts/switches/merges/rebases from different starting states. There are also certain scenarios captured by git-sim logic that are based on other factors, like the number of commits in the repo / on the active branch. For example, in a repo with fewer than 5 commits, there is special logic to handle the display.

My thinking with git-dummy is that it would be used to create a tailor-made repo for each test case with the desired number of commits, branches, merges, HEAD location, etc., and then just clean it up after each test run. This guarantees a minimally complex repo that suits the specific goals of each test case. Of course it does add time since git-dummy would need to repeatedly create/clean up the reference repos, but it runs pretty fast and has the benefit that the git-dummy command for each test case could be hard-coded into the test cases, instead of having to ship a test repo around with the codebase.

Maybe there is a happy medium where we can use git-dummy to create a handful of test repos that are each suited to subsets of test cases, instead of generating and cleaning up a new repo for each test case... Thoughts?
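One way to express that happy medium in pytest would be a handful of session-scoped fixtures, each generating one shared repo shape with git-dummy. Only flags mentioned earlier in this thread are used; other invocation details, such as where git-dummy creates the repo, are assumptions:

import subprocess
import pytest

def _make_repo(tmp_path_factory, name, flags):
    """Sketch: generate one shared dummy repo for a subset of test cases."""
    repo_dir = tmp_path_factory.mktemp(name)
    # Assumes git-dummy creates the repo under the current working directory.
    subprocess.run(["git-dummy", *flags], cwd=repo_dir, check=True, capture_output=True)
    return repo_dir

@pytest.fixture(scope="session")
def linear_repo(tmp_path_factory):
    # Repo shape for simple subcommands (log, status, add, ...)
    return _make_repo(tmp_path_factory, "linear_repo", ["--branches=1"])

@pytest.fixture(scope="session")
def branched_repo(tmp_path_factory):
    # Repo shape for merge/rebase-style subcommands
    return _make_repo(tmp_path_factory, "branched_repo", ["--branches=3", "--diverge-at=2"])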

@ehmatthes
Contributor

Is there any way to prevent the image from opening automatically when calling something like git-sim log?

I tried --quiet, but that just seems to suppress CLI output. I didn't see an obvious place in the code where the generated image is opened; maybe that's something from manim?

@ehmatthes
Contributor

Never mind, I just found it: git-sim -d log

@initialcommit-io
Contributor

Yes! Use the -d flag to suppress the image auto-open, like: git-sim -d log

You can use it in conjunction with --quiet to suppress the CLI output as well.

@ehmatthes
Contributor

Okay, I think the test for git-sim log is working. @initialcommit-io, are you on macOS? If so, I think it will be pretty straightforward for you to run this demo in a little bit.

@initialcommit-io
Contributor

Awesome - I actually switched to PC last year, but I do have a Mac as well that I can try it on... Feel free to submit a PR to the dev branch and I'll take a look.

@ehmatthes
Contributor

ehmatthes commented Jul 5, 2023

I did not run into any file permission issues during development, and I don't see any now.

I just modified test_log() to show the path to the generated file:

def test_log(tmp_repo):
    """Test a simple `git-sim log` command."""
    ...

    print("fp_generated:", fp_generated)
    assert compare_images(fp_generated, fp_reference)

Here's the output of running just this test on macOS:

(.venv)$ pytest -s -k "test_log"
...
fp_generated: /private/var/folders/gk/y2n2jsfj23g864pdr38rv4ch0000gn/T/pytest-of-eric/pytest-414/sample_repo0/sample_repo/git-sim_media/sample_repo/images/git-sim-log_07-04-23_15-33-49.png

And here's the same output on my Windows VM:

(.venv)> pytest -s -k "test_log"
...
fp_generated: C:\Users\eric\AppData\Local\Temp\pytest-of-eric\pytest-71\sample_repo0\git-sim_media\sample_repo0\images\git-sim-log_07-04-23_16-07-22.png

I get similar output in a Git Bash shell on my Windows VM. What output do you get for the path to the generated image file?


The path to the generated file is being read from the output of the git-sim log command. You can cd into the sample repo generated by the test suite and run git-sim log there manually. (Make sure you are in an active venv, where git-sim is available):

> cd C:\Users\eric\AppData\Local\Temp\pytest-of-eric\pytest-71\sample_repo0
> git-sim log
Output image location: C:\Users\eric\AppData\Local\Temp\pytest-of-eric\pytest-71\sample_repo0\git-sim_media\sample_repo0\images\git-sim-log_07-04-23_16-11-38.jpg
> git-sim -d --output-only-path --img-format=png log
C:\Users\eric\AppData\Local\Temp\pytest-of-eric\pytest-71\sample_repo0\git-sim_media\sample_repo0\images\git-sim-log_07-04-23_16-16-55.png

The first command here shows that git-sim log runs in the repo generated by the test. The second command shows that the full command run by the test suite, git-sim -d --output-only-path --img-format=png log, runs as well.

I would be curious to know if git-sim is generating different output in your environment, or if the test code is not parsing the output properly in your environment. If fp_generated is not a file path, I'd add one more line to test_log():

def test_log(tmp_repo):
    """Test a simple `git-sim log` command."""
    ...

    print("output:", output.stdout.decode().strip())
    print("fp_generated:", fp_generated)
    assert compare_images(fp_generated, fp_reference)

@initialcommit-io
Contributor

Sorry about that. It looks like the changes I was making to git-sim and trying to test caused an error that led to the output not being generated at all, hence the confusion.

After tweaking my change, the tests now run, but still fail due to a bad ratio diff. I did confirm it is using the Windows reference files to compare, but it looks like the fonts differ between my output and the Windows reference files. I think this happened to you at one point as well, but we thought it was across OSes. This font difference pushes the ratio_diff up to 5%.

Attaching images of the git-sim log Windows reference file and my output showing the differing fonts (see below). On my system it uses Monospace Bold which is the font that git-sim is configured to use for all output. I assume the reference files were generated on a system where Monospace was not available as a system font (or maybe just not in the correct bold weight), so it used the default system font as a fallback.

This could get hairy as devs try to test across different systems... I wonder if we could get around this by adding a global font argument to git-sim, which we can specifically set in tests to a default font that we know exists in each respective environment.

[Attached images: git-sim-log_07-04-23_18-22-52 and git-sim-log_windows]

@ehmatthes
Contributor

Sorry about that. It looks like the changes I was making to git-sim and trying to test caused an error that led to the output not being generated at all, hence the confusion.

Okay, that's helpful. I made a note to add a line to compare_images() to check that the image file exists, before actually doing the comparison. We can make a similar assertion about the output of each call, so in the future you'll know more quickly whether there was an issue with the output message or the actual image file.

After tweaking my change, the tests now run, but still fail due to a bad ratio diff. I did confirm it is using the Windows reference files to compare, but it looks like the fonts differ between my output and the Windows reference files. I think this happened to you at one point as well, but we thought it was across OSes. This font difference pushes the ratio_diff up to 5%.

Attaching images of the git-sim log Windows reference file and my output showing the differing fonts (see below). On my system it uses Monospace Bold which is the font that git-sim is configured to use for all output. I assume the reference files were generated on a system where Monospace was not available as a system font (or maybe just not in the correct bold weight), so it used the default system font as a fallback.

This was really helpful to know as well. I think there's a straightforward solution that makes testing much easier across systems. I changed all the instances of "Monospace" in git_sim_base_command.py to "Courier New". Here's the output on macOS:
[image: git-sim-log_07-04-23_20-38-49]

Here's the output on Windows:
[image: git-sim-log_07-04-23_20-45-08]

I believe this works better because "Monospace" just causes Manim to use the system's default monospace font. "Courier New" is a more specific font, which happens to be available on most macOS and Windows systems. I would be curious to see what the git-sim log output looks like on your system if you change Monospace to Courier New.

@ehmatthes
Contributor

ehmatthes commented Jul 5, 2023

I think the most reliable fix for this would be to bundle a specific font with the test suite. I added a .ttf file to the test suite, and modified construct() in log.py to use the bundled font. This is a more complex change than just making the font a setting, because from my brief reading Manim expects bundled fonts to be used in a context manager.

To experiment with different fonts in the git-sim log output, I changed all references to "Monospace" to self.font, and modified __init__() in GitSimBaseCommand:

class GitSimBaseCommand(m.MovingCameraScene):
    def __init__(self):
        super().__init__()
        self.init_repo()

        self.fontColor = m.BLACK if settings.light_mode else m.WHITE
        self.font = "Monospace"
        self.drawnCommits = {}
        ...

With this change, all tests still pass. If I set self.font = "Courier New", tests fail but I get the output shown above.

To use a bundled font, I had to make the following change to construct() in Log:

    def construct(self):
        with m.register_font("/Users/eric/projects/git-sim/tests/e2e_tests/ProggyClean.ttf"):
            self.font = "ProggyCleanTT"
            if not settings.stdout and not settings.output_only_path and not settings.quiet:
                print(
                    f"{settings.INFO_STRING} {type(self).__name__.lower()}{' --all' if self.all_subcommand else ''}{' -n ' + str(self.n) if self.n_subcommand else ''}"
                )
            self.show_intro()
            ...

This made fairly ugly output, but I believe the output would be identical on all systems:
[image: git-sim-log_07-04-23_21-51-00]

I don't think I'd bother using a tiny font like this. I was experimenting with a tiny font, and that's one that showed up in a quick search. Also, I'm guessing you wouldn't have to add a context manager to every command's construct() method. I imagine with a better understanding of the codebase there's one place to put that context manager that all files can use.

@initialcommit-io
Contributor

Thanks for all the deets! Yes, using Courier New in git_sim_base_command.py makes the tests work.

The quickest and least invasive solution for now seems to be adding a global flag for the user to specify the font, and then hardcoding it to Courier New in the tests' raw commands.

This will have the added benefit of letting the user customize the font. I will try this out and update here.
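For illustration, if the CLI wiring uses Typer, a global option along these lines could expose the font. This is only a sketch; the settings object here is a stand-in, not git-sim's actual implementation:

import typer
from types import SimpleNamespace

settings = SimpleNamespace(font="Monospace")  # stand-in for git-sim's real settings object

app = typer.Typer()

@app.callback()
def global_options(
    font: str = typer.Option("Monospace", "--font", help="Font family for all rendered text."),
):
    # Scenes would then read settings.font instead of hardcoding "Monospace".
    settings.font = font

The test suite could then prepend something like --font "Courier New" to each raw command.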

I made a note to add a line to compare_images() to check that the image file exists, before actually doing the comparison. We can make a similar assertion about the output of each call, so in the future you'll know more quickly whether there was an issue with the output message or the actual image file.

And oh yes - this would be great!

@initialcommit-io
Contributor

Ok - this is implemented on the dev branch. I added a --font global option which is now set by the test suite. I also updated the reference files for macOS since the test suite on Mac and Windows now uses the Courier New font.

Wanna try it out and lemme know if it works for you?

@ehmatthes
Contributor

Okay, the dev branch works for me. I did have to rebuild my virtual environment. That wasn't obvious until I tried to run a git-sim command manually.

I removed the Windows-specific files, and the code that modified the reference file path to look for a Windows-specific file.


I also tried using a bundled font at the test-only level:

def test_log(tmp_repo):
    """Test a simple `git-sim log` command."""
    raw_cmd = "git-sim log"
    cmd_parts = get_cmd_parts(raw_cmd)

    os.chdir(tmp_repo)

    with m.register_font("/Users/eric/projects/git-sim/tests/e2e_tests/ProggyClean.ttf"):
        output = subprocess.run(cmd_parts, capture_output=True)

    fp_generated = Path(output.stdout.decode().strip())
    ...

This didn't work, and I wasn't too surprised by that. I believe subprocess.run() starts a new process, which doesn't see the changes that register_font() made to the environment. That said, I think there's a way to make this work if we end up running into the same environment-specific testing issues. I think you could make the test font .ttf file available in a shell, and then make the test's git-sim calls run in that same shell. If you run into an issue with different people generating slightly different but correct images, please ping me and I'll be happy to try building this out.

@ehmatthes
Contributor

I made a PR so you can pull these small improvements if you want. Feel free to reject the PR and incorporate those changes whenever appropriate.

@initialcommit-io
Contributor

Sure, I'll merge the PR, but can you target the dev branch instead of main? It looks like it's including 2 of the commits I made, since those don't exist on main yet... but those should go away if you change the target to the dev branch.

@ehmatthes
Contributor

I'm newer to committing to others' projects, so some of my Git habits are unclear. I forgot to run Black before submitting the PR, so now the PR would include three commits:

9e0b88 (HEAD -> dev, update_tests) Use Black formatting conventions.
6d0582 Remove Windows-specific test files.
65050d Additional assertions about filepaths that are generated during testing.
7c9c32 (upstream/dev) Update e2e test reference files for macOS

Should I submit a PR with these 3 commits? Do you want me to squash them? Do you squash them when you merge?

@initialcommit-io
Copy link
Contributor

Thanks for running Black - this is my first time using it, as suggested by other contributors. On my Mac I was able to get Black to run automatically when exiting Vim (via a Vim plugin), but I had issues setting that up on my current Windows machine, so now it's manual again and I often forget to run it myself too 😸.

You can just submit the PR to the dev branch with the 3 commits, I'll squash and merge them in the GitHub interface.

@ehmatthes
Contributor

Sure thing, I'm trying to use Black more consistently in my own work as well. I resubmitted the PR.

@initialcommit-io
Contributor

Merged it and added tests for the remaining non-networked subcommands. I regenerated all the reference images on my Windows machine and then tested on my Mac, where 12/18 tests passed and the other 6 failed due to bad ratio diff. Most of these were barely above the threshold, but the biggest difference was about 0.0126.

I'm thinking this is just due to how the image files are generated on different platforms (regardless of font). Anyway, we can just boost the acceptable ratio diff from 0.0075 to 0.015 to account for these subtle differences.

@initialcommit-io
Contributor

(the updated reference files are pushed up in the dev branch if you want to test on both your systems to compare)

@ehmatthes
Contributor

I got the same results on my Mac. The highest ratio came from the log command, but I opened the generated image and the reference image and couldn't see any difference at all. I think a ratio of 0.015 sounds perfectly reasonable. I'll be a little embarrassed, but mostly curious, if someone more familiar with working with images comes along and has a much simpler way to test image output.

The test suite is slower now, as expected when running 6 times as many tests. It takes 80s to run the test suite on my system. I was tempted to focus on the compare_images() function, but then I remembered to profile. I added profiling code to the compare_images() function, and around each git-sim call. It takes less than 0.1s to run compare_images(), but about 3-6s to run each git-sim command:

Spent 3.334 seconds running test_add.
Spent 0.085 seconds in compare_images().
.Spent 4.103 seconds running test_branch.
Spent 0.087 seconds in compare_images().
.Spent 5.484 seconds running test_checkout.
Spent 0.088 seconds in compare_images().
.Spent 5.732 seconds running test_cherrypick.
Spent 0.088 seconds in compare_images().
.Spent 3.235 seconds running test_clean.
Spent 0.085 seconds in compare_images().
.Spent 3.205 seconds running test_commit.
Spent 0.086 seconds in compare_images().
.Spent 3.837 seconds running test_log.
Spent 0.091 seconds in compare_images().
.Spent 5.776 seconds running test_merge.
Spent 0.088 seconds in compare_images().
.Spent 3.617 seconds running test_mv.
Spent 0.086 seconds in compare_images().
.Spent 6.985 seconds running test_rebase.
Spent 0.090 seconds in compare_images().
.Spent 4.160 seconds running test_reset.
Spent 0.086 seconds in compare_images().
.Spent 3.301 seconds running test_restore.
Spent 0.086 seconds in compare_images().
.Spent 3.348 seconds running test_revert.
Spent 0.086 seconds in compare_images().
.Spent 3.518 seconds running test_rm.
Spent 0.086 seconds in compare_images().
.Spent 3.328 seconds running test_stash.
Spent 0.086 seconds in compare_images().
.Spent 3.205 seconds running test_status.
Spent 0.085 seconds in compare_images().
.Spent 5.336 seconds running test_switch.
Spent 0.089 seconds in compare_images().
.Spent 4.065 seconds running test_tag.
Spent 0.091 seconds in compare_images().

If you want to profile the current suite on your system, it's on my profile_tests branch.
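For anyone reproducing the profiling, the instrumentation can be as simple as an autouse fixture built on time.perf_counter(); this is a rough sketch, and the actual hooks on the profile_tests branch may differ:

import time
import pytest

@pytest.fixture(autouse=True)
def _time_each_test(request):
    """Sketch: print how long each test takes, similar to the timings above."""
    start = time.perf_counter()
    yield
    print(f"Spent {time.perf_counter() - start:.3f} seconds running {request.node.name}.")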

There are a couple ways to speed up tests...

@ehmatthes
Contributor

Parallel tests

The main way to speed up tests is to run them in parallel. pytest has a really good plugin called pytest-xdist which handles all of the overhead of distributing tests for many test suites. It works for our suite. Install pytest-xdist from PyPI, and then run pytest -n auto. It will figure out the appropriate amount of distribution to do on your system. On my system, it reduces the total test suite time from 80s to 20s.

Non-parallel output:

$ pytest
============== test session starts ===========================
platform darwin -- Python 3.11.2, pytest-7.4.0, pluggy-1.2.0
rootdir: /Users/eric/projects/git-sim
plugins: xdist-3.3.1
collected 18 items                                                                                                  

tests/e2e_tests/test_core_commands.py ..................[100%]

============== 18 passed in 79.52s (0:01:19) =================

Parallel output:

$ pytest -n auto
============== test session starts ===========================
platform darwin -- Python 3.11.2, pytest-7.4.0, pluggy-1.2.0
rootdir: /Users/eric/projects/git-sim
plugins: xdist-3.3.1
10 workers [18 items]     
..................                                      [100%]
============== 18 passed in 22.69s ===========================

The -s flag doesn't work when running tests in parallel, so if you're examining output it's better not to run the tests in parallel. Also, the benefits of running parallel tests are highly dependent on the system architecture. Some people will see more benefit than others from running parallel tests; some people may see worse performance.

@ehmatthes
Contributor

Running selected tests

You can organize the test suite into multiple files. For example you could pick a set of core commands you always want to test, and put those in test_core_commands.py. Then other tests go in test_noncore_commands.py. You'd run pytest tests/e2e_tests/test_core_commands.py most of the time, and then pytest or pytest -n auto when you want to run all the tests. Of course it doesn't need to be core and noncore, it could be simple and complex, or however you think about the overall set of commands.

pytest has a way of marking tests, so you could just mark some tests as core tests. Add the decorator @pytest.mark.core, and then run pytest -m core. This would only run the tests you've marked as core tests.
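A minimal sketch of that marking approach (registering the marker avoids pytest's unknown-marker warning; the names here are illustrative):

# conftest.py
def pytest_configure(config):
    # Register the custom marker so `pytest -m core` runs without warnings.
    config.addinivalue_line("markers", "core: core git-sim subcommand tests")

# test_core_commands.py
import pytest

@pytest.mark.core
def test_log(tmp_repo):
    """Included when running `pytest -m core` as well as in a full run."""
    ...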

@initialcommit-io
Contributor

I think a ratio of 0.015 sounds perfectly reasonable.

Sweet! I'll bump that up.

The main way to speed up tests is to run them in parallel. pytest has a really good plugin called pytest-xdist which handles all of the overhead of distributing tests for many test suites. It works for our suite. Install pytest-xdist from PyPI, and then run pytest -n auto. It will figure out the appropriate amount of distribution to do on your system. On my system, it reduces the total test suite time from 80s to 20s.

Uhhh..... this is awesome!!! On my PC this speeds things up from 45s to 14s. I think it's a good compromise to gain that kind of speedup by running in parallel, and if something fails, to rerun in series to get the output.

Random note on parallelism - I spent a lot of time using the multiprocessing module to try to get git-sim to render individual animations in parallel (by rendering independent segments of each video in parallel) to speed up performance, as I feel that Manim's performance is a major limitation of git-sim.

Unfortunately I couldn't get it working in the end as I had issues passing data (specifically file handles/references) in between processes. It was pretty weird behavior and I was kinda surprised I couldn't get it working - seemed so close. Maybe one day as Python's multiprocessing and pickling tools get more robust...

pytest has a way of marking tests, so you could just mark some tests as core tests. Add the decorator @pytest.mark.core, and then run pytest -m core. This would only run the tests you've marked as core tests.

Good to know!

@ehmatthes
Contributor

I have one more thought at the moment, not about performance. You can set custom messages to display when a test fails. For example you're seeing "bad pixel ratio: " at times in the output, but if there are multiple test failures you kind of have to hunt for which tests had which ratios. You can write code like this:

msg = f"{path_gen} image comparison failed: {ratio_diff}"
assert ratio_diff < 0.015, msg

Then you'd see that information for failing tests even without using the -s flag.

A lot of the decisions like this just come down to how you want to use your test suite. What kind of information do you want to see in all test runs? What kind of information do you want to see only when looking at results in detail?
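Putting those pieces together, compare_images() could end up looking something like this sketch; the pixel-ratio computation shown here uses Pillow and is an assumption about the helper's internals, not the actual implementation:

from pathlib import Path
from PIL import Image, ImageChops  # Pillow

def compare_images(path_gen: Path, path_ref: Path, threshold: float = 0.015):
    """Sketch: assert that a generated image matches its reference within a pixel-diff ratio."""
    assert path_gen.exists(), f"Generated image not found: {path_gen}"
    assert path_ref.exists(), f"Reference image not found: {path_ref}"

    img_gen = Image.open(path_gen).convert("RGB")
    img_ref = Image.open(path_ref).convert("RGB")
    assert img_gen.size == img_ref.size, "Generated and reference image sizes differ"

    # Fraction of pixels that differ at all between the two images.
    diff = ImageChops.difference(img_gen, img_ref)
    changed = sum(1 for px in diff.getdata() if px != (0, 0, 0))
    ratio_diff = changed / (img_ref.width * img_ref.height)

    msg = f"{path_gen.name} image comparison failed ({path_ref.stem}): {ratio_diff}"
    assert ratio_diff < threshold, msg
    return True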

@initialcommit-io
Contributor

initialcommit-io commented Jul 6, 2023

Funny you mention that - as a part of the threshold update I just committed I updated that very same printed message to specify the subcommand that failed:

print(f"bad pixel ratio ({path_ref.stem[8:]}): {ratio_diff}")

😸

@ehmatthes
Contributor

On my PC this speeds things up from 45s to 14s.

Do you mind if I ask about the specs on your system? I think I have a reasonably well-specced system, but your tests are twice as fast without parallel, and still significantly faster when run in parallel. I wonder if there's anything about my setup that's slowing things down for me.

Random note on parallelism - I spent a lot of time using the multiprocessing module to try to get git-sim to render individual animations in parallel (by rendering independent segments of each video in parallel) to speed up performance, as I feel that Manim's performance is a major limitation of git-sim.

I've got other things going on for a while so I'm going to stick to a focus on testing for the time being, but I would love to look into this at some point.

Funny you mention that - as a part of the threshold update I just committed I updated that very same printed message to specify the subcommand that failed:

print(f"bad pixel ratio ({path_ref.stem[8:]}): {ratio_diff}")

You might give the assert message a try, because I think this is the only remaining reason to use the -s flag.

@initialcommit-io
Contributor

Do you mind if I ask about the specs on your system? I think I have a reasonably well-specced system, but your tests are twice as fast without parallel, and still significantly faster when run in parallel. I wonder if there's anything about my setup that's slowing things down for me.

I upgraded my PC's CPU to the 13600K late last year; it runs a lot faster on my PC now than on my Intel Mac.

I've got other things going on for a while so I'm going to stick to a focus on testing for the time being, but I would love to look into this at some point.

Oh definitely - I'm grateful for the testing help; this has been incredibly good for the project. Happy to share what I've done on the performance front at some point if you get some extra time!

You might give the assert message a try, because I think this is the only remaining reason to use the -s flag.

Oh gotcha I'll take a look - I missed that assert statement in your note.

@ehmatthes
Contributor

Oh gotcha I'll take a look - I missed that assert statement in your note.

To be clear, that would move the assertion from the individual test functions to the compare_images() function. That seems like a slightly better structure anyway.

I'll go quiet for a while, but if you ever reply somewhere here and don't hear from me, feel free to reach out directly over email. I think I'm going to write a newsletter series on Git at some point, and I'll definitely use git-sim for the visualizations for that series.

@ehmatthes
Contributor

One more quick thing, you might want to remove the stub file test.py from the top level. It's not causing any problems, but with pytest in use it's just taking up space and it might confuse someone looking at the project.

@initialcommit-io
Contributor

I think I'm going to write a newsletter series on Git at some point, and I'll definitely use git-sim for the visualizations for that series.

That's awesome. Please drop me a note when it's ready - I'd love to take a look.

One more quick thing, you might want to remove the stub file test.py from the top level.

Oh yeah - I thought about that but decided to leave it as a stub for unit tests. But you're right - now that we have the tests/ folder, it probably at least makes sense to move it in there...

@initialcommit-io
Contributor

Closing this since related work to refine the test suite will be handled through #99.
