
Strange slowdown of parser function #220

Open
PhilippRue opened this issue Dec 12, 2022 · 7 comments
@PhilippRue (Member)

A function used in the KKR parsers to search output files slows down unexpectedly as the output file grows. This can be seen in the attached screenshot: the search alone completes in 2 ms and reading the output file alone also takes 2 ms, but when both steps are combined the parser takes 300 ms.

[Screenshot 2022-12-12 at 08:20:42 — timing of the individual steps vs. the combined parser]

@PhilippRue (Member, Author)

The files to reproduce this issue can be found here.

@janssenhenning (Contributor)

@PhilippRue To identify performance problems I often use the line-profiler package.
You can install it with

pip install line-profiler

You select the functions you want to profile by adding the decorator

@profile  # nothing has to be imported; kernprof injects the decorator
def foo():
    ...

Then execute an example script with kernprof -l script_to_profile.py (I'm sure there is also something for Jupyter in this package, but I have never used it).
Finally, you get a line-by-line breakdown with python -m line_profiler script_to_profile.py.lprof
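The steps above can be sketched in one self-contained script (script name and the parse_numbers function are made up for illustration; the try/except fallback lets the same file run without kernprof, which normally injects @profile as a builtin):

```python
# profile_example.py -- run with:  kernprof -l profile_example.py
# inspect results with:            python -m line_profiler profile_example.py.lprof

try:
    profile  # kernprof injects @profile as a builtin when run under it
except NameError:
    def profile(func):
        # no-op fallback so the script also runs as plain Python
        return func

@profile
def parse_numbers(lines):
    # toy stand-in for a parser hot spot
    return [float(line.split('=')[1]) for line in lines if '=' in line]

data = ['E= 1.0', 'junk', 'E= 2.5'] * 1000
result = parse_numbers(data)
```

Under kernprof, the .lprof report then shows how much time each line inside parse_numbers consumed.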

@janssenhenning (Contributor)

One thing I would be suspicious of is the line

tmpval = tmptxt.pop(itmp)

since pop(i) is O(n): removing an element from anywhere but the end forces every subsequent element of the list to be shifted.
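A hypothetical micro-benchmark (not code from the parser) makes the scaling visible: popping from the front shifts all remaining elements on every call, while popping from the end is constant time:

```python
import timeit

n = 100_000

# pop(0) shifts all remaining elements on every call -> O(n) per pop
t_front = timeit.timeit('lst.pop(0)',
                        setup=f'lst = list(range({n}))', number=10_000)

# pop() from the end just shrinks the list -> O(1) per pop
t_back = timeit.timeit('lst.pop()',
                       setup=f'lst = list(range({n}))', number=10_000)

print(f'pop(0): {t_front:.4f} s   pop(): {t_back:.4f} s')
# On a typical machine pop(0) is orders of magnitude slower here.
```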

@PhilippRue (Member, Author)

Thanks, I'll give line-profiler a try. But I don't think it is actually the search: if I load the data first and then time only the search function (cell 41 in the screenshot), it is fast (2 ms). Only when both steps (read-in and then search) are wrapped in one function does it slow down.

@janssenhenning (Contributor)

I don't know why the combination of reading and extracting values is much slower than the separated functions, but I tried using a generator in search_string to reduce duplicated work. This reduces the time spent in search from 0.730617 s to 0.0073027 s (on iffaiida-test):

def search_string(searchkey, txt):
    for index, line in enumerate(txt):
        if searchkey in line:
            yield index, line

The search function then looks like this:

from numpy import array  # the results are returned as a numpy array

def search(tmptxt, searchstring, splitinfo, replacepair=None, debug=False):
    res = []
    # the generator only yields matching lines, so no itmp >= 0 check is needed
    for itmp, tmpval in search_string(searchstring, tmptxt):
        if debug:
            print(('in search (itmp, searchstring, tmpval):', itmp, searchstring, tmpval))
        if replacepair is not None:
            tmpval = tmpval.replace(replacepair[0], replacepair[1])
        if splitinfo[0] == 1:
            tmpval = float(tmpval.split(splitinfo[1])[splitinfo[2]])
        elif splitinfo[0] == 2:
            tmpval = float(tmpval.split(splitinfo[1])[splitinfo[2]].split()[splitinfo[3]])
        else:
            raise ValueError('splitinfo[0] has to be either 1 or 2')
        res.append(tmpval)
    return array(res)
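A small usage sketch of search_string on its own (the sample lines and the 'TOTAL ENERGY' key are made up for illustration; the real KKR output files are linked above). The generator scans the text once, lazily, and never mutates it:

```python
def search_string(searchkey, txt):
    # lazily yield (index, line) for every line containing the key
    for index, line in enumerate(txt):
        if searchkey in line:
            yield index, line

lines = ['iteration 1', 'TOTAL ENERGY = -1.234',
         'noise', 'TOTAL ENERGY = -1.235']

# materialize the matches; the input list is left untouched
matches = list(search_string('TOTAL ENERGY', lines))
energies = [float(line.split('=')[1]) for _, line in matches]
```

Because nothing is removed from the list, calling the search a second time on the same text gives the same result.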

@janssenhenning (Contributor)

Oh wait, I think I know why the separation is much faster.

In search the tmptxt variable is mutated by pop, so after one run the list no longer contains the search phrase. On all subsequent runs the result should therefore be empty.
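A toy illustration of that mutation (hypothetical helper and data, mimicking the old pop-based approach): the first pass removes every matching line from the list, so a second pass over the "same" text finds nothing:

```python
def pop_search(tmptxt, searchkey):
    # old, mutating approach: pop each matching line out of the list
    res = []
    while True:
        for itmp, line in enumerate(tmptxt):
            if searchkey in line:
                break
        else:
            return res          # no match left in the (shrunken) list
        res.append(tmptxt.pop(itmp))  # mutates tmptxt!

lines = ['a = 1', 'b = 2', 'a = 3']
first = pop_search(lines, 'a =')   # finds both 'a' lines
second = pop_search(lines, 'a =')  # list was mutated -> nothing left
```

This also explains the timing puzzle above: timing the search alone after a previous run measures a search over a list that no longer contains any matches.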

@PhilippRue (Member, Author)

> Oh wait, I think I know why the separation is much faster.
>
> In search the tmptxt variable is mutated by pop, so after one run the list no longer contains the search phrase. On all subsequent runs the result should therefore be empty.

Ah, yes, that makes sense. Thanks a lot for your help. We'll use your generator-based fix :)
