Short Question About Implementation #359

adam2392 · 2022-05-13T18:18:08Z

Just pinging the ppl that seemed to touch these specific LOC.

I know you guys don't maintain this code anymore and have moved on, but I had a quick question about what a specific line is doing. If you happened to write this part, could you give a quick answer so I can make sure I'm interpreting it correctly? FYI: I have ported the code to Cython, and once this issue is resolved, I think we can safely move on :)

In SPORF/packedForest/src/forestTypes/binnedTree/processingNodeBin.h (lines 99 to 113 in a7a3c7e):

inline void randMatTernary(std::vector<weightedFeature>& featuresToTry){
    int rndMtry;
    int rndFeature;
    int rndWeight;
    int mtryDensity = (int)((double)fpSingleton::getSingleton().returnMtry() * fpSingleton::getSingleton().returnMtryMult());
    for (int i = 0; i < mtryDensity; ++i){
        rndMtry = randNum->gen(fpSingleton::getSingleton().returnMtry());
        rndFeature = randNum->gen(fpSingleton::getSingleton().returnNumFeatures());
        featuresToTry[rndMtry].returnFeatures().push_back(rndFeature);
        rndWeight = (randNum->gen(2)%2) ? 1 : -1;
        assert(rndWeight==1 || rndWeight==-1);
        featuresToTry[rndMtry].returnWeights().push_back(rndWeight);
    }
}

Are you sampling the feature indices without replacement? It looks like rndFeature = randNum->gen(fpSingleton::getSingleton().returnNumFeatures()); generates a fresh random feature index on every iteration, so is it possible for the same index to show up twice in one projection (i.e., in the same featuresToTry[rndMtry])?
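To check whether duplicates can actually occur, here is a minimal standalone sketch of the loop above. Assumptions: Projection is a hypothetical stand-in for weightedFeature, std::mt19937 with std::uniform_int_distribution replaces randNum->gen() (assumed to return a uniform integer in [0, n)), and mtry, numFeatures and mtryMult are passed in directly instead of being read from fpSingleton.

```cpp
#include <cassert>
#include <cstddef>
#include <random>
#include <vector>

// Hypothetical stand-in for SPORF's weightedFeature: one sparse projection,
// stored as parallel vectors of feature indices and +/-1 weights.
struct Projection {
    std::vector<int> features;
    std::vector<int> weights;
};

// Mirrors the loop in randMatTernary(): each of the mtryDensity nonzeros is
// assigned to a random projection and a random feature, independently,
// i.e. with replacement. std::mt19937 stands in for randNum->gen().
std::vector<Projection> sampleLikeRandMatTernary(int mtry, int numFeatures,
                                                 double mtryMult,
                                                 std::mt19937& rng) {
    assert(mtry >= 1 && numFeatures >= 1);
    std::vector<Projection> featuresToTry(mtry);
    const int mtryDensity = static_cast<int>(mtry * mtryMult);
    std::uniform_int_distribution<int> pickProj(0, mtry - 1);
    std::uniform_int_distribution<int> pickFeature(0, numFeatures - 1);
    std::uniform_int_distribution<int> pickSign(0, 1);
    for (int i = 0; i < mtryDensity; ++i) {
        const int rndMtry = pickProj(rng);
        featuresToTry[rndMtry].features.push_back(pickFeature(rng));
        featuresToTry[rndMtry].weights.push_back(pickSign(rng) ? 1 : -1);
    }
    return featuresToTry;
}

// True if any projection lists the same feature index more than once.
bool hasDuplicateFeature(const std::vector<Projection>& projections) {
    for (const Projection& p : projections) {
        for (std::size_t a = 0; a < p.features.size(); ++a)
            for (std::size_t b = a + 1; b < p.features.size(); ++b)
                if (p.features[a] == p.features[b]) return true;
    }
    return false;
}
```

With mtry = 1, numFeatures = 2 and mtryMult = 3.0, all three nonzeros land in the single projection and only two feature indices are available, so hasDuplicateFeature() must return true by the pigeonhole principle. With wider data, duplicates become rarer but nothing in the loop rules them out.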

For example, say you have data with 4 columns; SPORF might then sample a projection of:

indices = [0, 2, 0]
weights = [1, -1, 1]

Note that this is then no longer a sparse linear combination with only +/-1 weights: feature 0 effectively gets a weight of +2 and feature 2 a weight of -1. Or is this function guaranteed not to produce duplicates when sampling the projection matrix?
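The weight-merging effect can be made concrete by collapsing a sparse (indices, weights) pair into the dense weight vector it actually applies to a sample. The function names below (effectiveWeights, projectSparse, projectDense) are hypothetical, for illustration only.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Collapse a sparse projection (parallel index/weight vectors) into the dense
// weight vector it effectively applies; repeated indices have their weights summed.
std::vector<int> effectiveWeights(const std::vector<int>& indices,
                                  const std::vector<int>& weights,
                                  int numFeatures) {
    assert(indices.size() == weights.size());
    std::vector<int> dense(numFeatures, 0);
    for (std::size_t k = 0; k < indices.size(); ++k)
        dense[indices[k]] += weights[k];
    return dense;
}

// Project one sample x using the sparse form: sum of weights[k] * x[indices[k]].
double projectSparse(const std::vector<int>& indices,
                     const std::vector<int>& weights,
                     const std::vector<double>& x) {
    double sum = 0.0;
    for (std::size_t k = 0; k < indices.size(); ++k)
        sum += weights[k] * x[indices[k]];
    return sum;
}

// Project the same sample using the collapsed dense weights.
double projectDense(const std::vector<int>& dense, const std::vector<double>& x) {
    double sum = 0.0;
    for (std::size_t j = 0; j < dense.size(); ++j)
        sum += dense[j] * x[j];
    return sum;
}
```

For the example above, effectiveWeights({0, 2, 0}, {1, -1, 1}, 4) gives {2, 0, -1, 0}, and both projections of x = {1, 2, 3, 4} equal -1. A duplicate can also cancel out: indices {1, 1} with weights {1, -1} collapse to an all-zero vector, so that part of the nonzero budget is wasted.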


MrAE · 2022-05-13T20:26:37Z

Hey @adam2392, I worked mainly on the R side of things, although I do remember having a similar issue with this chunk (lots of whiteboarding). I never figured out whether this block samples in accordance with the SPORF paper, and given your example, I'd say it doesn't.

In that case, the indices should be sampled without replacement (going from memory).
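One way to get sampling without replacement, sketched under the same kind of assumptions (a hypothetical Projection struct, std::mt19937 in place of randNum, parameters passed directly, C++17 for std::sample): treat the projection matrix as an mtry x numFeatures grid and draw mtryDensity distinct cells, so a feature can appear at most once per projection. This is not necessarily how the SPORF paper or the R implementation samples the matrix, and capping mtryDensity at the number of cells is an assumption of this sketch.

```cpp
#include <algorithm>
#include <cassert>
#include <iterator>
#include <numeric>
#include <random>
#include <vector>

// Hypothetical stand-in for weightedFeature: parallel index/weight vectors.
struct Projection {
    std::vector<int> features;
    std::vector<int> weights;
};

// Sketch: choose mtryDensity distinct (projection, feature) cells of an
// mtry x numFeatures matrix without replacement, then give each chosen
// cell a random +/-1 weight. No projection can hold the same feature twice.
std::vector<Projection> sampleWithoutReplacement(int mtry, int numFeatures,
                                                 double mtryMult,
                                                 std::mt19937& rng) {
    assert(mtry >= 1 && numFeatures >= 1);
    const int totalCells = mtry * numFeatures;
    // Assumption: never request more nonzeros than there are cells.
    const int mtryDensity =
        std::min(static_cast<int>(mtry * mtryMult), totalCells);

    // Flat cell id = projection * numFeatures + feature.
    std::vector<int> cells(totalCells);
    std::iota(cells.begin(), cells.end(), 0);
    std::vector<int> chosen;
    chosen.reserve(mtryDensity);
    std::sample(cells.begin(), cells.end(), std::back_inserter(chosen),
                mtryDensity, rng);

    std::vector<Projection> featuresToTry(mtry);
    std::uniform_int_distribution<int> pickSign(0, 1);
    for (int cell : chosen) {
        Projection& p = featuresToTry[cell / numFeatures];
        p.features.push_back(cell % numFeatures);
        p.weights.push_back(pickSign(rng) ? 1 : -1);
    }
    return featuresToTry;
}
```

Materializing every cell costs O(mtry * numFeatures) memory; for very wide data, Floyd's sampling algorithm or simply redrawing a cell that was already picked would avoid that while keeping the no-duplicates guarantee.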

I did tinker around in the C++ code, but the base functions came from James.
I know James had some code in his own repo, which may have some tests in it 🤷🏼‍♂️; he'd be the one with the most knowledge about how it works.
