Short Question About Implementation #359

Open
adam2392 opened this issue May 13, 2022 · 1 comment

adam2392 commented May 13, 2022

Hi @MrAE, @jbrowne6 and @falkben

Just pinging the people who seem to have touched these specific lines of code.

I know you guys don't maintain this code anymore and have moved on, but I have a quick question about what a specific line is doing. I was wondering if you could provide a quick answer (if you happened to write this part) to make sure I'm interpreting it correctly. FYI: I have ported the code to Cython, and once this issue is resolved, I think we can safely move on :)

In

    inline void randMatTernary(std::vector<weightedFeature>& featuresToTry){
        int rndMtry;
        int rndFeature;
        int rndWeight;
        int mtryDensity = (int)((double)fpSingleton::getSingleton().returnMtry() * fpSingleton::getSingleton().returnMtryMult());
        for (int i = 0; i < mtryDensity; ++i){
            rndMtry = randNum->gen(fpSingleton::getSingleton().returnMtry());
            rndFeature = randNum->gen(fpSingleton::getSingleton().returnNumFeatures());
            featuresToTry[rndMtry].returnFeatures().push_back(rndFeature);
            rndWeight = (randNum->gen(2)%2) ? 1 : -1;
            assert(rndWeight==1 || rndWeight==-1);
            featuresToTry[rndMtry].returnWeights().push_back(rndWeight);
        }
    }
are you sampling the feature index without replacement? It looks like rndFeature = randNum->gen(fpSingleton::getSingleton().returnNumFeatures()); generates a random feature index, but can the same index be drawn more than once for a single projection?
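To make the question concrete, here is a minimal sketch that mirrors the loop above using only the standard library. It assumes that randNum->gen(n) returns a uniform integer in [0, n), which the snippet does not confirm. SketchProjection, sketchRandMatTernary, and hasDuplicateFeature are hypothetical names introduced for illustration and are not part of the SPORF code base.

```cpp
#include <cassert>
#include <random>
#include <set>
#include <vector>

// Hypothetical stand-in for one column of the projection matrix.
struct SketchProjection {
    std::vector<int> features;
    std::vector<int> weights;
};

// Mirrors the loop in randMatTernary, ASSUMING randNum->gen(n) draws a
// uniform integer in [0, n). Each draw is independent of all earlier draws.
inline std::vector<SketchProjection> sketchRandMatTernary(
        int mtry, int numFeatures, double mtryMult, std::mt19937& rng) {
    assert(mtry > 0 && numFeatures > 0);
    std::vector<SketchProjection> projections(mtry);
    std::uniform_int_distribution<int> pickProjection(0, mtry - 1);
    std::uniform_int_distribution<int> pickFeature(0, numFeatures - 1);
    std::uniform_int_distribution<int> pickSign(0, 1);
    const int mtryDensity = static_cast<int>(mtry * mtryMult);
    for (int i = 0; i < mtryDensity; ++i) {
        SketchProjection& p = projections[pickProjection(rng)];
        p.features.push_back(pickFeature(rng));
        p.weights.push_back(pickSign(rng) ? 1 : -1);
    }
    return projections;
}

// True if any single projection lists the same feature index more than once.
inline bool hasDuplicateFeature(const std::vector<SketchProjection>& projections) {
    for (const SketchProjection& p : projections) {
        std::set<int> seen(p.features.begin(), p.features.end());
        if (seen.size() != p.features.size()) return true;
    }
    return false;
}
```

Under that assumption nothing in the loop remembers earlier draws, so duplicates are possible. With mtry = 1, numFeatures = 2, and mtryMult = 3.0, three indices are drawn from two features into a single projection, so a repeat is unavoidable.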

For example, say you have data with 4 columns; SPORF might then sample a projection such as:

indices = [0, 2, 0]
weights = [1, -1, 1]

Note that this is no longer a sparse linear combination with only +/-1 weights: feature 0 appears twice, so the effective weights are +2 on feature 0 and -1 on feature 2. Or is this function guaranteed not to produce duplicates when sampling the projection matrix?
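The arithmetic behind that example can be checked with a small helper that sums the weights of repeated indices into one dense weight vector. This is only an illustrative sketch; effectiveWeights is a hypothetical name, not a function from the SPORF code.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Collapse a sparse (indices, weights) projection into a dense weight vector.
// Repeated indices accumulate, which is how a +2 (or a cancelling 0) can arise.
inline std::vector<int> effectiveWeights(const std::vector<int>& indices,
                                         const std::vector<int>& weights,
                                         int numFeatures) {
    assert(indices.size() == weights.size());
    std::vector<int> dense(numFeatures, 0);
    for (std::size_t i = 0; i < indices.size(); ++i) {
        assert(indices[i] >= 0 && indices[i] < numFeatures);
        dense[indices[i]] += weights[i];
    }
    return dense;
}
```

For indices = [0, 2, 0] and weights = [1, -1, 1] with 4 features, this yields [2, 0, -1, 0]. A repeated index with opposite signs would instead cancel to 0 and silently drop that feature from the projection.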

MrAE (Collaborator) commented May 13, 2022

Hey @adam2392, I worked mainly on the R side of things, although I do remember having a similar issue with this chunk (lots of whiteboarding). I never did figure out whether this block samples in accordance with the SPORF paper -- and given your example, I'd say it doesn't.

In that case the indices should be sampled without replacement -- going from memory.
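For reference, one way to draw the feature indices of a single projection without replacement is a partial Fisher-Yates shuffle. The sketch below is not the repository's implementation, and the thread does not settle whether it matches the paper. SparseProjection and sampleDistinctTernary are hypothetical names, and the number of non-zeros k is passed in explicitly here, whereas the original loop lets it vary at random per projection.

```cpp
#include <cassert>
#include <numeric>
#include <random>
#include <utility>
#include <vector>

// Hypothetical container for one sparse projection.
struct SparseProjection {
    std::vector<int> features;
    std::vector<int> weights;
};

// Draw k DISTINCT feature indices from [0, numFeatures) with a partial
// Fisher-Yates shuffle, and give each one a random weight of +1 or -1.
inline SparseProjection sampleDistinctTernary(int numFeatures, int k,
                                              std::mt19937& rng) {
    assert(k >= 0 && k <= numFeatures);
    std::vector<int> pool(numFeatures);
    std::iota(pool.begin(), pool.end(), 0);  // pool = 0, 1, ..., numFeatures-1
    std::uniform_int_distribution<int> pickSign(0, 1);
    SparseProjection p;
    for (int i = 0; i < k; ++i) {
        // Pick from the not-yet-chosen tail [i, numFeatures) and move it to slot i.
        std::uniform_int_distribution<int> pickIndex(i, numFeatures - 1);
        std::swap(pool[i], pool[pickIndex(rng)]);
        p.features.push_back(pool[i]);
        p.weights.push_back(pickSign(rng) ? 1 : -1);
    }
    return p;
}
```

Because each chosen index is swapped out of the remaining pool, every projection built this way has effective weights restricted to +1, -1, or 0.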

I did tinker around in the C++ code, but the base functions came from James.
I know James had some code in his own repo, which may have some tests in it 🤷🏼‍♂️ -- he'd be the one with the most knowledge about how it works.
