
adding paging to get repositories from the database #1793

Closed
wants to merge 6 commits

Conversation

@thiago commented Sep 13, 2016

Resolves #1776.
Related PRs: #1777, #1783

@lu4nation commented

Running it here, this solution solves the 999 limit for me.

@josmo commented Sep 14, 2016

Thanks @thiago, LGTM and works for me as well. I do think we should consider moving away from an "IN" clause and using the temp inner join like what's mentioned in https://www.xaprb.com/blog/2006/06/28/why-large-in-clauses-are-problematic/; it would probably speed things up for orgs that have a large number of repos :)
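For reference, a minimal sketch of the two query shapes under discussion, using database/sql directly; the repos table and the tmp_names temp table are illustrative assumptions, not the actual drone store code:

package store

import (
	"database/sql"
	"fmt"
	"strings"
)

// queryWithIn is the current approach: one "?" placeholder per
// repository name inside a single IN clause.
func queryWithIn(db *sql.DB, names []string) (*sql.Rows, error) {
	placeholders := make([]string, len(names))
	args := make([]interface{}, len(names))
	for i, name := range names {
		placeholders[i] = "?"
		args[i] = name
	}
	query := fmt.Sprintf(
		"SELECT * FROM repos WHERE repo_full_name IN (%s)", // "repos" table name is assumed
		strings.Join(placeholders, ","),
	)
	return db.Query(query, args...)
}

// queryWithJoin is the alternative from the linked post: load the
// names into a temporary table once, then inner join against it.
// "tmp_names" is a hypothetical table for illustration.
func queryWithJoin(db *sql.DB, names []string) (*sql.Rows, error) {
	if _, err := db.Exec("CREATE TEMPORARY TABLE tmp_names (name TEXT)"); err != nil {
		return nil, err
	}
	for _, name := range names {
		if _, err := db.Exec("INSERT INTO tmp_names (name) VALUES (?)", name); err != nil {
			return nil, err
		}
	}
	return db.Query("SELECT repos.* FROM repos INNER JOIN tmp_names ON repos.repo_full_name = tmp_names.name")
}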

@@ -8,6 +8,7 @@ type Feed struct {
 	Name     string `json:"name" meddler:"repo_name"`
 	FullName string `json:"full_name" meddler:"repo_full_name"`
 
+	Id int `json:"id,omitempty" meddler:"build_id,zeroisnull"`


Perhaps we sort on Created instead, since we already have the field available?

@thiago (Author) replied:

Yes, I don't know why I did it! 😊

@bradrydzewski commented Sep 14, 2016

> I do think we should consider moving away from an "IN" clause and using the temp inner join like what's mentioned in

To be fair, this is from 2006, which may not account for 10 years of MySQL or Postgres optimizations.

> it would probably speed things up for orgs that have a large number of repos

Is there any indication that this is currently a performance bottleneck? From my testing with pretty large datasets on a relatively small database server, the query still serves hundreds of feed queries per second with 999 items in the IN statement. I can't think of any real-world install that would execute hundreds of feed queries per second.

You can see my performance testing and results at #1234 (comment)

@bradrydzewski commented

Also relevant, from the comments section of that post: if you are having perf issues, try running ANALYZE on the database in Postgres or SQLite (not sure about MySQL).

> My DBA had a really hard time believing my solution in the first email in this thread should be faster. So after digging some, he ran an ANALYZE on the tables in question.
>
> This helps the PostgreSQL planner out to create more optimal query plans. It turns out that after analyze, the original query that uses an "IN" takes 1ms.
>
> So we went from:
> IN Clause: ~4500ms
> Inner Join: ~300ms
> Analyze with IN Clause: ~1ms
>
> The lesson is: use explain, use analyze, and consult your DBAs. They are your friends.
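A minimal sketch of running that maintenance step from Go; the function name is illustrative, and MySQL would need ANALYZE TABLE <name> per table instead:

package store

import (
	"database/sql"
	"log"
)

// analyze refreshes the planner statistics so queries such as the
// feed's IN clause get an optimal plan. The bare ANALYZE statement
// works on PostgreSQL and SQLite.
func analyze(db *sql.DB) {
	if _, err := db.Exec("ANALYZE"); err != nil {
		log.Println("analyze failed:", err)
	}
}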

@@ -5,6 +5,7 @@ import (
 
 	"github.com/drone/drone/model"
 	"github.com/franela/goblin"
+	"fmt"


import grouping and sorting :)
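For reference, the conventionally grouped and sorted form (per gofmt/goimports style) would put the standard library first, separated from third-party imports:

import (
	"fmt"

	"github.com/drone/drone/model"
	"github.com/franela/goblin"
)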

@bradrydzewski commented Sep 14, 2016

The multiple float64 allocations could be avoided with some slight refactoring, as mentioned in #1793 (comment).

This code correctly calculates the number of pages:

const limit = 999

total := len(listof)
if total == 0 {
	return feed
}

pages := total / limit
if total%limit != 0 {
	pages++
}
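For context, a minimal sketch of how that page count could drive the batched queries; queryPage is a hypothetical helper, not code from the PR:

func getFeedPaged(listof []*model.RepoLite) ([]*model.Feed, error) {
	const limit = 999

	var feed []*model.Feed
	total := len(listof)
	if total == 0 {
		return feed, nil
	}

	pages := total / limit
	if total%limit != 0 {
		pages++
	}

	// query one batch of at most limit repositories per page;
	// queryPage is hypothetical and stands in for the real query.
	for page := 0; page < pages; page++ {
		start := page * limit
		end := start + limit
		if end > total {
			end = total
		}
		batch, err := queryPage(listof[start:end])
		if err != nil {
			return nil, err
		}
		feed = append(feed, batch...)
	}
	return feed, nil
}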

To be honest, I find the changes made to toList a bit complex and difficult to follow. At first glance it looks buggy, but I can't tell. I would prefer a simpler approach, as noted in #1793 (comment).

I also think it would be easier to unit test ...

func toList(listof []*model.RepoLite, start, limit int) ([]string, []interface{}) { ... }

// could be tested with
list := []*model.RepoLite{
	{ ... },
	{ ... },
	{ ... },
	{ ... },
}
placeholders, parameters := toList(list, 0, 2)
g.Assert(len(placeholders)).Equal(2)
g.Assert(len(parameters)).Equal(2)
g.Assert(list[0].FullName).Equal(parameters[0])
g.Assert(list[1].FullName).Equal(parameters[1])
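A minimal sketch of what such a toList could look like, assuming model.RepoLite exposes the FullName field used in the assertions above; this is illustrative, not the PR's implementation:

// toList returns one "?" placeholder and one query parameter per
// repository in the window [start, start+limit). FullName is the
// field shown in the test assertions above.
func toList(listof []*model.RepoLite, start, limit int) ([]string, []interface{}) {
	end := start + limit
	if end > len(listof) {
		end = len(listof)
	}
	if start > end {
		start = end
	}
	placeholders := make([]string, 0, end-start)
	parameters := make([]interface{}, 0, end-start)
	for _, repo := range listof[start:end] {
		placeholders = append(placeholders, "?")
		parameters = append(parameters, repo.FullName)
	}
	return placeholders, parameters
}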

{FullName: "bradrydzewski/drone"},
{FullName: "drone/drone"},
}
_repo := []*model.Repo{


Perhaps repoList instead of _repo, since it is an array. When I see _repo I think it is a single repo.

@thiago (Author) commented Sep 15, 2016

Thank you @bradrydzewski for the details in the comments and for your patience. I think I addressed all the points; let me know if not.

I decided to create a new branch so that the changes against the master branch are cleaner.

If you confirm, I'll send a new pull request and close this.

@thiago (Author) commented Sep 15, 2016

I have not found another way to test the paging in the GetRepoListOf function without using a loop, since the limit is not configurable.

Perhaps it makes sense to make the limit a global setting like DATABASE_DRIVER and DATABASE_CONFIG, but I don't know exactly how to do this. I think it's better solved in another pull request.
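A minimal sketch of what such a setting could look like, following the DATABASE_DRIVER / DATABASE_CONFIG pattern; the DATABASE_QUERY_LIMIT name is hypothetical:

package store

import (
	"os"
	"strconv"
)

// queryLimit reads a hypothetical DATABASE_QUERY_LIMIT environment
// variable, falling back to 999 (the limit discussed above) when it
// is unset or invalid.
func queryLimit() int {
	if s := os.Getenv("DATABASE_QUERY_LIMIT"); s != "" {
		if n, err := strconv.Atoi(s); err == nil && n > 0 {
			return n
		}
	}
	return 999
}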

@thiago (Author) commented Sep 16, 2016

Did I do something wrong? Sorry for my anxiety...

@bradrydzewski commented Sep 16, 2016

EDIT: retracting comment. Looks like you made the requested changes in a different branch. Do you want to open a new PR from that branch? Thanks! :)

@thiago (Author) commented Sep 16, 2016

Yes, after going over the previous reviews I thought it best to start a new branch to make the changes against the master branch clearer.

But I'd like you to validate it before I submit a new PR. Is that possible? Thanks! =D

@josmo mentioned this pull request Feb 5, 2017
@Adron commented Feb 22, 2017

Just checking in on this PR. I was speaking with @josmo a few days ago about its status and ways to test and verify that the paging and everything is working. Is there anything I could prospectively do to help push the priority up on this PR? (It's one we'd get great use out of for some customers, since they're in the gazillion-repo range.) // @thiago @bradrydzewski

Thanks all! 👍

@bradrydzewski commented Feb 23, 2017

> Is there anything I could prospectively do to help push the priority up on this PR

You just did, thanks for the reminder ... just merged #1783 and #1929.

@harness locked and limited the conversation to collaborators Feb 23, 2017