Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fileio\private\ft_getopt MEX file much slower than .m file #1379

Open
NirOfir opened this issue Apr 18, 2020 · 10 comments
Open

fileio\private\ft_getopt MEX file much slower than .m file #1379

NirOfir opened this issue Apr 18, 2020 · 10 comments

Comments

@NirOfir
Copy link
Contributor

NirOfir commented Apr 18, 2020

Hi,
I'm using fieldtrip version 07/04/2020 on Windows 10, i7 2.5 GHz processor with 8 GB RAM.
I used ft_preprocessing to rereference and demean EEG data from the hard drive, using this code:

cfg = [];
cfg.dataset = FILE_PATH;
cfg.channel = 'EEG';
cfg.trl = trl{:, 1:3};
cfg.demean = 'yes';
cfg.reref = 'yes';
cfg.refchannel = 'all';
data = ft_preprocessing(cfg);

This took about 245 seconds for 420 trials, each on average 3 seconds long, with 64 channels and 512 Hz sampling rate. I thought this was slower than expected so I ran the same code (after using clear all to make sure there is nothing that will affect run time) with the profiler and got this result:
image
The function that takes most time (~85%) is the MEX of ft_getopt, so I tried removing the MEX file from fileio/private to force fieldtrip to use the m file. This run only took 9 seconds (about 27 times faster), and these are the profiler results:
image
Note that the m version of fileio/private/ft_getopt takes less than 0.3 seconds in total here. There is another, different, ft_getopt MEX at the end of this part of the list which sits in utilities, but this one is zippy.

I tried the same thing on a much stronger computer (100 GB RAM, 16 core processor), running centos 7 and got basically the same results.

I also tried loading the entire dataset to memory and rereferencing all of it as one trial, and this took less than 15 seconds, so I'm pretty sure the bottleneck is the ft_getopt MEX.

For now I've deleted the MEX files from my fieldtrip version, but I wonder if there's something in the code that can be fixed to make it faster, or that the MEX file should be removed? Are there situations in which the MEX provides a speedup?

Thanks!

@robertoostenveld
Copy link
Contributor

robertoostenveld commented Apr 18, 2020

that is surprising! The idea of the mex files is that (in most cases) they should be faster rather than slower compared to the original m-file implementation.

I wonder whether this is specific to your combination of OS and MATLAB version.

Could you try this:

>> addpath your_fieldtrip_path/test
>> tic; for i=1:280; test_ft_getopt; end; toc
Elapsed time is 0.280271 seconds.

I used the profiler and observed that the test script calls ft_getopt 26 times, hence I adjusted it to 280 times the loop to get approximately to the same number of calls that you have in your first profile summary.

When I use the m-file and 1000 times 26 calls, I get

>> profile clear
>> profile on
>> tic; for i=1:1000; test_ft_getopt; end; toc
Elapsed time is 0.751258 seconds.
>> profile report

and the profile shows 0.546s total time for ft_getopt.

When I use the mex file I get

>> profile report
>> profile clear
profile on
tic; for i=1:1000; test_ft_getopt; end; toc
Elapsed time is 0.733025 seconds.
>> profile report

and a reported total time of 0.540, almost the same.

I recall that some checks in ft_getopt take more time than others, I wonder whether that might play a role here as well.

@robertoostenveld
Copy link
Contributor

I just tried it on a Windows 10 computer, with MATLAB 2020a. Running the test script 280 times (and hence ft_getopt 7280 times) takes between the tic and toc 0.5 seconds, and according to the profiler a total time of 0.311 seconds. Quite a difference from your 207 seconds.

@NirOfir
Copy link
Contributor Author

NirOfir commented Apr 18, 2020

Thanks for the quick response! I tested using your code and I get basically the same results as you. However, I noticed that the profiler says that the tested ft_getopt MEX is the one from the utilities folder, which is very fast also on my machine (see the last item in the second profiler list). I compared the two ft_getopts and they are identical as far as I can tell. So I wanted to check what you wrote, that some checks take more time than others, and you were absolutely right! I looked more deeply into the profiler report from before and I saw that of the million different lines which call ft_getopt, line 127 in ft_read_data takes the cake, and this seems to be the source of all my woes. This is line 127:
hdr = ft_getopt(varargin, 'header');
Here's my guess: The data I use is gtec_mat format, in which the header is basically the whole dataset (loaded in ft_read_header). Since the MEX files require transferring the arguments to them outside of the matlab environment (as far as I could understand from the docs), this means that each time line 127 is called, the entire EEG file is sent to this externel MEX file. In comparison, when ft_getopt.m is used, no movement of large variables is required and the test is fast. Does that make sense?

@robertoostenveld
Copy link
Contributor

thanks for further debugging; yes that makes sense.

When entering the varargin (with the header, i.e. full data) in the mex file, it is not yet copied. The c-code in the mex file and the MATLAB workspace share the same memory. But inside the mex file, it loops over the elements of the cell-array and on line 94 it duplicates the array. At that moment a "deep copy" is made, which is returned.

See https://nl.mathworks.com/help/matlab/apiref/mxduplicatearray.html

In normal cases, it would not make a deep copy but only a shallow copy. So in the example

a = randn(1000);
b = a;

the array b is a shallow copy and would actually point to the same memory segment as a. MATLAB keeps track of how many references there are to the memory segment and is smart enough to only make a deep copy (=real copy) when you would modify b (or a). So usually you don't notice the difference between deep and shallow copies, but in the MEX file c-code MATLAB is not there to keep references of the memory block and you have to take care of it yourself.

In this case, I don't see the actual reason why a deep copy is needed; a shallow copy (which is very fast to make) would also be fine.

Could you give it a try to try and update the mex file such that it makes a shallow copy?

@schoffelen
Copy link
Contributor

Is there any update here @NirOfir?

@NirOfir
Copy link
Contributor Author

NirOfir commented Jun 10, 2020

Hi, I tried using mxgetcell/mxgetfield instead of mxduplicatearray, but that just resulted in the mex crashing matlab. I tried searching whether it is possible to set pointers as the output arguments, and I couldn't find anything like that. It seems to be possible to avoid copying when the output is exactly the input, but that's not the case here... I couldn't find any other solution.

@NirOfir
Copy link
Contributor Author

NirOfir commented Jul 11, 2021

I'm adding this here for when I or someone else has the time to deal with this:
https://www.mathworks.com/help/matlab/matlab_external/avoid-copies-of-large-arrays.html
Not sure if the current version is suitable for this.

@schoffelen
Copy link
Contributor

Hi all, I came across this issue while working on a functional implementation for epoched .fif files. Like with the dataformat referenced above by @NirOfir, the hdr(.orig) contains a copy of the data. This appreciably slows down the mex version of ft_getopt. I tried to look into Robert's @robertoostenveld suggestion to make return a shallow copy of the variables, but according to what I could find online, this does not seem to be possible out of principle, and any hints at hacky solutions might be highly matlab version dependent, hence undesirable.

Could we consider a workaround, by caching the hdr in ft_read_data according to the logic that we can use any cached header provided the input filename is unchanged from the previous call of ft_read_data to the current call?

@robertoostenveld
Copy link
Contributor

Would it not work if the mex file were to make a shadow copy? Like this

plhs[0] = field;

@robertoostenveld
Copy link
Contributor

Would it not work if the mex file were to make a shadow copy? Like this ...

No, I don't think it would work. But search for mxCreateReference and see https://nl.mathworks.com/matlabcentral/answers/231824-issue-with-mex-function

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

3 participants