
Strange slowdown of parser function #220

Open
PhilippRue opened this issue Dec 12, 2022 · 7 comments
@PhilippRue (Member)

A function used in the KKR parsers to search output files slows down unexpectedly as the output file grows. This can be seen in the attached screenshot: the search alone completes in 2 ms and reading the output file alone also takes 2 ms, but when both steps are combined the parser takes 300 ms.

[Screenshot 2022-12-12 at 08:20:42 — timing of the individual steps vs. the combined parser]

@PhilippRue (Member, Author)

The files to reproduce this issue can be found here.

@janssenhenning (Contributor)

@PhilippRue To identify performance problems I often use the line-profiler package.
You can install it with

pip install line-profiler

You select the functions you want to profile by adding the decorator

@profile  # nothing has to be imported; kernprof injects the decorator
def foo():
    ...

Then execute an example script with kernprof -l script_to_profile.py (I'm sure there is also something for Jupyter in this package, but I have never used it).
Finally, you get a line-by-line breakdown with python -m line_profiler script_to_profile.py.lprof
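The steps above can be sketched in one self-contained script (script name and the parse_numbers function are made up for illustration; the try/except fallback lets the same file run without kernprof, which normally injects @profile as a builtin):

```python
# profile_example.py -- run with:  kernprof -l profile_example.py
# inspect results with:            python -m line_profiler profile_example.py.lprof

try:
    profile  # kernprof injects @profile as a builtin when run under it
except NameError:
    def profile(func):
        # no-op fallback so the script also runs as plain Python
        return func

@profile
def parse_numbers(lines):
    # toy stand-in for a parser hot spot
    return [float(line.split('=')[1]) for line in lines if '=' in line]

data = ['E= 1.0', 'junk', 'E= 2.5'] * 1000
result = parse_numbers(data)
```

Under kernprof, the .lprof report then shows how much time each line inside parse_numbers consumed.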

@janssenhenning (Contributor)

One thing I would be suspicious of is the line

tmpval = tmptxt.pop(itmp)

since pop(i) is O(n): removing an element from anywhere but the end forces every subsequent element of the list to be shifted.
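A hypothetical micro-benchmark (not code from the parser) makes the scaling visible: popping from the front shifts all remaining elements on every call, while popping from the end is constant time:

```python
import timeit

n = 100_000

# pop(0) shifts all remaining elements on every call -> O(n) per pop
t_front = timeit.timeit('lst.pop(0)',
                        setup=f'lst = list(range({n}))', number=10_000)

# pop() from the end just shrinks the list -> O(1) per pop
t_back = timeit.timeit('lst.pop()',
                       setup=f'lst = list(range({n}))', number=10_000)

print(f'pop(0): {t_front:.4f} s   pop(): {t_back:.4f} s')
# On a typical machine pop(0) is orders of magnitude slower here.
```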

@PhilippRue (Member, Author)

Thanks, I'll give line-profiler a try. But I don't think it is actually the search: if I load the data first and then time only the search function (cell 41 in the screenshot), it is fast (2 ms). Only when both steps (read-in and then search) are wrapped in one function does it slow down.

@janssenhenning (Contributor)

I don't know why the combination of reading and extracting values is much slower than the separated functions, but I tried using a generator in search_string to reduce duplicated work. This reduces the time spent in search from 0.730617 s to 0.0073027 s (on iffaiida-test):

def search_string(searchkey, txt):
    for index, line in enumerate(txt):
        if searchkey in line:
            yield index, line

The search function then looks like this:

from numpy import array  # the results are returned as a numpy array

def search(tmptxt, searchstring, splitinfo, replacepair=None, debug=False):
    res = []
    # the generator only yields matching lines, so no itmp >= 0 check is needed
    for itmp, tmpval in search_string(searchstring, tmptxt):
        if debug:
            print(('in search (itmp, searchstring, tmpval):', itmp, searchstring, tmpval))
        if replacepair is not None:
            tmpval = tmpval.replace(replacepair[0], replacepair[1])
        if splitinfo[0] == 1:
            tmpval = float(tmpval.split(splitinfo[1])[splitinfo[2]])
        elif splitinfo[0] == 2:
            tmpval = float(tmpval.split(splitinfo[1])[splitinfo[2]].split()[splitinfo[3]])
        else:
            raise ValueError('splitinfo[0] has to be either 1 or 2')
        res.append(tmpval)
    return array(res)
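A small usage sketch of search_string on its own (the sample lines and the 'TOTAL ENERGY' key are made up for illustration; the real KKR output files are linked above). The generator scans the text once, lazily, and never mutates it:

```python
def search_string(searchkey, txt):
    # lazily yield (index, line) for every line containing the key
    for index, line in enumerate(txt):
        if searchkey in line:
            yield index, line

lines = ['iteration 1', 'TOTAL ENERGY = -1.234',
         'noise', 'TOTAL ENERGY = -1.235']

# materialize the matches; the input list is left untouched
matches = list(search_string('TOTAL ENERGY', lines))
energies = [float(line.split('=')[1]) for _, line in matches]
```

Because nothing is removed from the list, calling the search a second time on the same text gives the same result.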

@janssenhenning (Contributor)

Oh wait, I think I know why the separation is much faster.

In search the tmptxt variable is mutated by pop, so after one run the list no longer contains the search phrase. On all subsequent runs the result should therefore be empty.
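A toy illustration of that mutation (hypothetical helper and data, mimicking the old pop-based approach): the first pass removes every matching line from the list, so a second pass over the "same" text finds nothing:

```python
def pop_search(tmptxt, searchkey):
    # old, mutating approach: pop each matching line out of the list
    res = []
    while True:
        for itmp, line in enumerate(tmptxt):
            if searchkey in line:
                break
        else:
            return res          # no match left in the (shrunken) list
        res.append(tmptxt.pop(itmp))  # mutates tmptxt!

lines = ['a = 1', 'b = 2', 'a = 3']
first = pop_search(lines, 'a =')   # finds both 'a' lines
second = pop_search(lines, 'a =')  # list was mutated -> nothing left
```

This also explains the timing puzzle above: timing the search alone after a previous run measures a search over a list that no longer contains any matches.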

@PhilippRue (Member, Author)

> Oh wait, I think I know why the separation is much faster.
>
> In search the tmptxt variable is mutated by pop, so after one run the list no longer contains the search phrase. On all subsequent runs the result should therefore be empty.

Ah, yes, that makes sense. Thanks a lot for your help. We'll use your generator-based fix :)
