
Pinecone testing strategy #2644

Closed
ZanSara opened this issue Jun 8, 2022 · 8 comments

@ZanSara
Contributor

ZanSara commented Jun 8, 2022

Problem

  • Pinecone tests are heavily flaky. They regularly fail on CI due to 404s, missing indexes, 500s from the server, etc.
  • Such failures are very noisy and make it very hard for us to tell whether there's a real issue with the tests or not.

Analysis

  • The Pinecone tests seem to all run against a single live instance.
  • As we run many CI suites in parallel, this instance receives concurrent requests that conflict with each other and cause these failures.
  • In addition, the real HTTP calls performed by those runners make the Pinecone tests a negative outlier in speed: each test takes about 30 seconds to run, with peaks of about a minute. This issue should also be addressed.

Solution
At first glance I could not identify a suitable test strategy for Pinecone. Possible solutions could be:

  • A proper test mock library provided by Pinecone (seems not to exist)
  • A mock based on responses (good for unit tests, but it somewhat defeats the point of an integration test)
  • A way to instantiate a brand new, isolated Pinecone instance for every runner (assuming it can be done reasonably fast). This would still need a review of the tests to separate unit and integration tests though, as it would not help with speed.
  • Some form of cross-runner, daily cache of the responses.

@jamescalam: can you recommend a general direction here? I will be happy to do the heavy lifting once we decide how to proceed, but I am unsure which way would be best.

@ZanSara ZanSara added type:bug Something isn't working topic:tests ignore-for-release-notes PRs with this flag won't be included in the release notes. topic:pinecone labels Jun 8, 2022
@ZanSara ZanSara self-assigned this Jun 8, 2022
@jamescalam
Contributor

@ZanSara I think that, of all the solutions you suggested, only instantiating a new index for each runner could work, but the index creation time of ~20-30 seconds may be problematic?

Alternatively, we have pods, which increase the throughput of the index, and replicas, which duplicate the index, giving higher availability and throughput. I'd imagine this could help. At the moment the PineconeDocumentStore only accepts an argument for replicas; I will be working on an updated doc store this/next week, so I will add pods then.

But for now, does increasing the replicas value help?

@ZanSara
Contributor Author

ZanSara commented Jun 9, 2022

Hello @jamescalam, thanks for your reply. We definitely can't consider adding 20-30s to each test: such a change would make the current suite of just 30 tests take more than half an hour to run.

On the other hand, I don't see why you believe no mocking or caching is viable. Remember we're not testing Pinecone here, we're only testing the Haystack code: one or two integration tests calling a real Pinecone instance can stay, but all the others should be just fine with a mocked response. Can you explain a bit more what makes you think it's a bad idea?

Regarding your suggestion about pods and replicas, I'm afraid they wouldn't help. It's not really about availability or throughput (often it's the GitHub runners that lose internet access, so I believe Pinecone's availability is alright already), but about the concurrent and conflicting requests. Making the test instance more powerful won't help prevent this. Unless replicas can be used to create isolated duplicate indexes? But I guess that is not the case.

@jamescalam
Contributor

jamescalam commented Jun 9, 2022

@ZanSara Sure, I assumed that would make tests too long. I take it the primary failures here are due to multiple runs being performed in parallel from different places, all calling the same Pinecone instance? I didn't realize this before. Would it be possible to add a unique identifier from each runner to the index name to avoid overlap between these different instances? You can host many indexes in Pinecone, so each runner would be calling a unique index, avoiding the concurrent and conflicting requests to the same index. This would not solve the issue of slow testing if an index were created for each test.
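Roughly, I'm imagining something like this for the index naming (just a sketch; the helper and the environment-variable fallback are made up, nothing that exists in the doc store today):

import os
import uuid


def unique_test_index_name(prefix: str = "haystack-tests") -> str:
    # Use the CI run id when available, so a leftover index can be traced back to its run;
    # otherwise fall back to a short random suffix for local runs.
    run_id = os.environ.get("GITHUB_RUN_ID") or uuid.uuid4().hex[:8]
    return f"{prefix}-{run_id}"


# A session-scoped fixture could then create the index once per runner, e.g.
# pinecone.create_index(unique_test_index_name(), dimension=768), and delete it
# again with pinecone.delete_index(...) during teardown.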

I do see your point on mocking/caching now. I'm not against doing that, and if the above doesn't help then, as you suggest, a mock Pinecone client could be best. What would this need to look like exactly? Like a pinecone-client-mock that replicates pinecone-client responses without calling Pinecone itself?

You're correct that replicas cannot be used to create isolated duplicate indexes.

@ZanSara
Contributor Author

ZanSara commented Jun 9, 2022

Actually both of your suggestions would be great! 😊 In practice I propose:

  • We can have a few (2-3) integration tests that run against the live Pinecone instance, simulating the typical process. Here we should definitely find a way to add the unique index identifier: assuming creating an index is a fast operation, that would be ideal and would most likely solve the problem of concurrent requests.

  • For the unit tests instead, the mock you describe is perfect: just a class that returns fake responses without calling the real Pinecone instance. That would cut the execution time down to much more reasonable levels. No need for a complex mock here: something that simply returns predictable responses is sufficient. Let's just not forget that we will also need this mock to test error responses.

How do we split this task? I understand you are going to submit a PR soon, so if you prefer to take care of these tests I'll leave them to you. Otherwise I will give it a try myself in the coming days, but I might need some help along the way, especially for the mock.

Thanks again for your time!

@jamescalam
Contributor

I'm not too familiar with Haystack's testing suite, so it may be faster for you to work on, but I can build the mock Pinecone client. I've submitted #2657 for the unary queries; we do need this to be merged before modifying/adding the integration tests, as the structure of the responses has changed as per pinecone-client 2.0.11.

Are there any mock clients in Haystack already that I can refer to? If not, is there a preferred location for something like that?

@ZanSara
Contributor Author

ZanSara commented Jun 13, 2022

Sounds like a plan! I'll take care of adapting the tests once the mock is merged 👍

Unfortunately, as far as I know we have no similar mocks (I think Pinecone is the only SaaS vector db we support), so feel free to take the approach you're most familiar with. On another occasion we mocked HTTP requests with responses, but you shouldn't imitate that if it feels forced in your situation. If you have doubts, let's discuss on the PR 🙂 Here is an example of such fixtures:

haystack/test/conftest.py

Lines 471 to 498 in 54518ac

@pytest.fixture
def deepset_cloud_fixture():
    if MOCK_DC:
        responses.add(
            method=responses.GET,
            url=f"{DC_API_ENDPOINT}/workspaces/default/indexes/{DC_TEST_INDEX}",
            match=[responses.matchers.header_matcher({"authorization": f"Bearer {DC_API_KEY}"})],
            json={"indexing": {"status": "INDEXED", "pending_file_count": 0, "total_file_count": 31}},
            status=200,
        )
        responses.add(
            method=responses.GET,
            url=f"{DC_API_ENDPOINT}/workspaces/default/pipelines",
            match=[responses.matchers.header_matcher({"authorization": f"Bearer {DC_API_KEY}"})],
            json={
                "data": [
                    {
                        "name": DC_TEST_INDEX,
                        "status": "DEPLOYED",
                        "indexing": {"status": "INDEXED", "pending_file_count": 0, "total_file_count": 31},
                    }
                ],
                "has_more": False,
                "total": 1,
            },
        )
    else:
        responses.add_passthru(DC_API_ENDPOINT)

About the location, AFAIK you don't have a test_pinecone.py, so feel free to put it in test/conftest.py. We will move it out later if necessary.

@jamescalam
Contributor

jamescalam commented Jul 5, 2022

@ZanSara I'll prioritize this to hopefully have it ready for the next release. In the example you're using responses, but this wouldn't (AFAIK) apply to Pinecone, as it uses the Pinecone client.

I'm not familiar with the tests, so I'm not sure about the best approach here. I thought the best option may be to create a mock Pinecone client class with the same methods as the real client, but with mock responses. Does this seem reasonable to you?

My only thought is: how will the real Pinecone client in the doc store be replaced with this mock client? Unless we add an optional argument at init to pass the Pinecone client into the doc store; maybe you have a better idea. Either way, I've started work on the mock client.
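As a rough sketch of what I have in mind (the method names mirror the real client, but the return shapes here are simplified plain dicts rather than the actual response objects):

class MockPineconeIndex:
    """Stand-in for pinecone.Index that keeps vectors in a plain dict instead of calling Pinecone."""

    def __init__(self):
        self._vectors = {}

    def upsert(self, vectors, namespace=""):
        # `vectors` is a list of (id, values, metadata) tuples, as the real client accepts.
        for _id, values, metadata in vectors:
            self._vectors[_id] = (values, metadata)
        return {"upserted_count": len(vectors)}

    def describe_index_stats(self):
        return {"dimension": 768, "namespaces": {"": {"vector_count": len(self._vectors)}}}

    def query(self, vector, top_k=10, include_values=False, include_metadata=False, namespace=""):
        # Return predictable fake matches instead of doing a real similarity search.
        matches = []
        for _id, (values, metadata) in list(self._vectors.items())[:top_k]:
            match = {"id": _id, "score": 1.0}
            if include_values:
                match["values"] = values
            if include_metadata:
                match["metadata"] = metadata
            matches.append(match)
        return {"matches": matches, "namespace": namespace}

    def delete(self, ids=None, delete_all=False, namespace=""):
        if delete_all:
            self._vectors.clear()
        else:
            for _id in ids or []:
                self._vectors.pop(_id, None)
        return {}

Error cases could then be simulated by making the mock raise the same exceptions the real client raises, behind a simple flag.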

@ZanSara
Contributor Author

ZanSara commented Jul 5, 2022

@ZanSara I'll prioritize this to hopefully have it ready for the next release

Thank you! 😊

I'm not familiar with the tests, so I'm not sure about the best approach here. I thought the best option may be to create a mock Pinecone client class with the same methods as the real client, but with mock responses. Does this seem reasonable to you?

Yes, that's how I would do it 👍

My only thought is: how will the real Pinecone client in the doc store be replaced with this mock client?

With monkeypatching. If you're not familiar with it, I can totally help! The concept is that right before the test we reassign pinecone-client (or whatever class/method we imported from it) to its mock. Have a look: https://docs.pytest.org/en/6.2.x/monkeypatch.html
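For example, something along these lines (the import paths and patch targets are just a guess at where the doc store imports the client from, and MockPineconeIndex stands for whatever the mock client ends up being called):

import pytest

# Hypothetical import paths: wherever the doc store does `import pinecone`, and wherever the mock lives.
from haystack.document_stores import pinecone as pinecone_doc_store_module
from test.mocks import MockPineconeIndex


@pytest.fixture
def mock_pinecone(monkeypatch):
    # Replace the pieces of pinecone-client that the doc store actually calls,
    # so no request ever leaves the test runner.
    monkeypatch.setattr(pinecone_doc_store_module.pinecone, "init", lambda *args, **kwargs: None)
    monkeypatch.setattr(pinecone_doc_store_module.pinecone, "create_index", lambda *args, **kwargs: None)
    monkeypatch.setattr(pinecone_doc_store_module.pinecone, "list_indexes", lambda: ["haystack-tests"])
    monkeypatch.setattr(pinecone_doc_store_module.pinecone, "Index", lambda index_name: MockPineconeIndex())

A unit test would then just request the mock_pinecone fixture and instantiate PineconeDocumentStore as usual; every call should end up in the mock instead of the live service.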
