Aspects of accessing PP and UM fields file data have sometimes been very slow, for quite a while. I had previously always assumed that this was a cf.aggregation issue, which it very much sometimes was! ... but I think aggregation now performs pretty well.
@theabro kindly raised a case of reading a CF field from a 16 GB PP file. The CF Field itself comprised 2040 (= 24 x 85) 2-d PP fields:
Accessing the full data array with a = f.array takes ~11,000 seconds - far too long!
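For reference, the timing above comes from reading the PP file with cf-python and then asking for the field's full numpy array. Something along these lines reproduces it (the filename is a placeholder, not the actual file from the report):

```python
import time

import cf

fl = cf.read("big_run.pp")   # placeholder path to the 16 GB PP file
f = fl[0]                    # the field built from the 2040 2-d PP fields

start = time.perf_counter()
a = f.array                  # realise the full data array in memory
print(f"f.array took {time.perf_counter() - start:.1f} s, shape {a.shape}")
```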
Investigations showed that the reason for this was that the whole PP file was being parsed (i.e. all headers read and processed) for every 2-d PP field that contributes to the array, i.e. 2040 times in this case.
Stopping this parsing reduces the time taken to get the full array, on the same machine, to ~2 seconds (!). The entire 16 GB can be read from disk in ~3.5 minutes.
The size of the file per se is not the cause of the problem, but rather the large number of individual lookup headers in the file: 162,888 in this case. For my small test cases with fewer than 5 PP fields, the slowdown is invisible :(
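A minimal sketch of the general idea behind the fix, not the actual cf-python code: memoise the expensive per-file header parse so that all of the per-field reads share a single pass over the file. The helper names and the fixed-stride header layout here are assumptions for illustration only; real PP files interleave Fortran record markers and data payloads between the lookup headers.

```python
import functools
import struct

HEADER_WORDS = 64  # assumed fixed-size lookup header, for illustration
WORD_SIZE = 8

@functools.lru_cache(maxsize=None)
def lookup_headers(filename):
    """Parse every lookup header in *filename* exactly once and cache the result."""
    headers = []
    with open(filename, "rb") as fh:
        while True:
            raw = fh.read(HEADER_WORDS * WORD_SIZE)
            if len(raw) < HEADER_WORDS * WORD_SIZE:
                break
            headers.append(struct.unpack(f">{HEADER_WORDS}q", raw))
            # ... a real reader would also skip the data record that follows ...
    return tuple(headers)

def read_2d_field(filename, index):
    """Each 2-d field read reuses the cached headers instead of re-parsing the file."""
    header = lookup_headers(filename)[index]
    # ... seek to, and read, the data described by `header` ...
    return header
```

With caching like this, the file is parsed once however many of the 2040 2-d fields are subsequently read, which is where the large speedup comes from.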
Long overdue PR to follow.
Well done - that's magnificent, @davidhassell! David reports a speedup of x700 in reading the data from one of my PP directories. I am looking forward to it in the next release.