active_adaptation

Rough Plan

  • Implement adversarial L_avg -> L_max
  • Robust optimization experiment
  • Active learning experiment
  • ICML Feb 24!!!!
  • Figure out the theory part of domain adaptation stuff
  • Domain adaptation as a discriminator
  • Domain adaptation experiment
  • Active domain adaptation experiment
  • ICCV March 17
  • RL experiment with robots
  • Counter-Factual learning
  • NIPS May 19

Step Baseline 1

  • Cifar10 training and test (full data, nothing new, using tf_base)
  • Set the seed training set and save it (num_images = 5000)
  • Random active learning test at budgets 0.1/0.2/0.3/0.4/0.5/0.6/0.7/0.8/0.9/1.0 (see the sketch after this list)
  • Plot the accuracy vs. training-set size (baseline 1); see acc_vs_size.png
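
A minimal sketch of the random baseline, assuming the indices are drawn once with a fixed seed and that each budget fraction includes the saved 5000-image seed set (the helper name and that assumption are illustrative, not part of the plan):

```python
import numpy as np

def build_random_subsets(num_train=50000, seed_size=5000, seed=0):
    """Fix a 5000-image seed set, then grow it with random extra images
    for each budget fraction 0.1 ... 1.0 of the CIFAR-10 training set."""
    rng = np.random.RandomState(seed)
    perm = rng.permutation(num_train)
    seed_idx, pool_idx = perm[:seed_size], perm[seed_size:]
    subsets = {}
    for frac in np.arange(0.1, 1.01, 0.1):
        budget = int(round(frac * num_train))
        extra = pool_idx[:max(budget - seed_size, 0)]
        subsets[round(frac, 1)] = np.concatenate([seed_idx, extra])
    return subsets

# Each subset is then used to train the tf_base CIFAR-10 model, and the
# resulting test accuracies give the accuracy-vs-size curve (baseline 1).
```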

Step Baseline 2

  • L_avg -> L_max experiment with the same setup
  • No random selection, just sampling with replacement (see the sketch after this list)
  • Refactor the code
  • Re-run the baseline 1
  • Run the baseline 2
  • Plot the accuracy vs training data (baseline 2)
  • Consider unbiasing the gradients; the samples are pretty uniform
  • Consider dropping one fc layer; the reason is the distribution difference
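
One way to read the L_avg -> L_max item is to draw the minibatch with replacement, with probability proportional to each example's current loss, and to re-weight if an unbiased gradient is wanted; a minimal sketch under that assumption (the helper name is illustrative):

```python
import numpy as np

def sample_batch_by_loss(per_sample_losses, batch_size, unbias=False):
    """Draw a minibatch with replacement, proportionally to per-sample
    loss, so training emphasizes the worst-loss examples (L_max-like).
    With unbias=True, also return importance weights that undo the
    biased sampling so the gradient estimates L_avg again."""
    probs = per_sample_losses / per_sample_losses.sum()
    idx = np.random.choice(len(probs), size=batch_size, replace=True, p=probs)
    weights = ((1.0 / len(probs)) / probs[idx]) if unbias else np.ones(batch_size)
    return idx, weights
```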

Step Active Learning

  • Consider a few tricks
    • Re-initialize everything; effective, but not by much
    • Keep a validation set and learn the adversarial part only on the validation set (no contamination since the actual network never sees it); effective, but not by much
  • Sample new data with the learned model
  • Maybe a diversity trick? (still a valid thing) It does not seem necessary since the t-SNE is pretty diverse
    • Diversity is a submodular function if defined as the sum of the total probability covered around each ball
    • Theory suggests a covering ball, so let's use that
  • Combinatorial algorithm: start with the greedy 2-OPT solution, then refine it using integer programming and binary search if feasible. This is actually pretty feasible; somehow Gurobi is more efficient than the greedy one at improving the solution
  • To match theory and practice, put feature learning in both players
  • Include a gradient reversal layer (see the sketch after this list)
    • Seems like the best option for now
    • Step 1: vanilla reversal. Note: Adam uses second-moment estimates, which behave badly in the adversarial setting, so use momentum instead
    • Step 1.5: Implement reversal with a single output (so it can learn the data distribution)
    • Step 2: vanilla (so) reversal + loss_rescale
    • Step 3: Reversal (so) domain estimate + sampling
    • Step 4: Reversal (so and not/so) + combinatorial sampling (this is desired simply because of the theory)
  • Try with the oracle loss; still worse than random, maybe it is introducing some sort of bias
  • Look at the t-SNE plot and check whether it is a diversity issue; it is pretty diverse
  • Exploration works, so test different degrees of exploration; 0.2 seems like a good value, maybe 0.25
  • Consider normalizing the features since their magnitudes blow up (maybe remove batch norm)
  • Use BiGAN or ALI as the semi-supervised algorithm
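
A minimal sketch of the gradient reversal layer referenced above, using the standard stop_gradient identity trick in TensorFlow; the lambda schedule and the domain-discriminator head are omitted, and the usage lines are illustrative:

```python
import tensorflow as tf

def gradient_reversal(x, lam=1.0):
    """Identity in the forward pass; multiplies the incoming gradient by
    -lam in the backward pass, so the feature extractor is trained
    adversarially against the domain discriminator."""
    # The first term is constant w.r.t. backprop (stop_gradient), so the
    # forward value is x while the only gradient path is through -lam * x.
    return tf.stop_gradient((1.0 + lam) * x) - lam * x

# Illustrative usage: features -> reversal -> single-output domain head
# domain_logit = domain_head(gradient_reversal(features, lam=0.1))
```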

Baselines

  • k-k^\prime
  • maximum uncertainty
  • uncertainty-based sampling (see the sketch after this list)
  • oracle uncertainty-based sampling
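
A minimal sketch of one common form of the uncertainty-based baselines, entropy sampling over the model's softmax outputs on the unlabeled pool (the function name is illustrative):

```python
import numpy as np

def entropy_uncertainty_sampling(softmax_probs, budget):
    """Pick the `budget` pool points with the highest predictive entropy.
    softmax_probs: (num_pool, num_classes) model outputs on the pool."""
    entropy = -np.sum(softmax_probs * np.log(softmax_probs + 1e-12), axis=1)
    return np.argsort(-entropy)[:budget]
```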

Step Domain Adaptation

  • Modify the loss/adversary to be applicable to domain adaptation

Step Active Learning with DA

  • Combine everything

Device Assignment

  • 109 0/5k/10k/15k
  • 110 20k/25k/30k/35k
  • 106 40k/45k

Results

50,55,62,68,70,73,76,78,79,81

Way to sample active datapoints

  • Get the top 5000 points by expected loss; set those to 1 and the rest to 0
  • Combine with gamma = 0.01 or 0.02 (see the sketch below)
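
A minimal sketch of one possible reading of this recipe: a 0/1 mask over the pool with 1 on the top-5000 expected-loss points, mixed with a uniform distribution using weight gamma so every point keeps nonzero probability. That gamma enters as a uniform mixing weight is an assumption, not something stated in the plan:

```python
import numpy as np

def active_sampling_distribution(expected_loss, top_k=5000, gamma=0.01):
    """Mass 1 on the top_k points by expected loss, 0 elsewhere, then mix
    with the uniform distribution using weight gamma (assumed mixing)."""
    mask = np.zeros_like(expected_loss, dtype=float)
    mask[np.argsort(-expected_loss)[:top_k]] = 1.0
    greedy = mask / mask.sum()
    uniform = np.ones_like(greedy) / len(greedy)
    return (1.0 - gamma) * greedy + gamma * uniform
```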

Report

  • Describe the pool-based active learning problem; state that it is a weakly supervised problem and needs to be treated like one
  • Discuss p(x), p_n(x), p_\hat{n}(x); give the basic idea behind loss re-scaling and show how it can be framed as alternating minimization (Adversarial Weak Supervision)
  • Discuss the theoretical aspect of active learning
    • Review robustness and generalization
    • Lemma (VGG is robust)
    • Theorem: Any robust algorithm is robust with fewer samples if ()
  • Discuss the empirical setup and explain the two concepts (fixed budget, single step)
    • Representations should be as close as possible so it is easier to cover the same space with fewer points
      • Gradient reversal layer
    • The bound depends solely on \gamma; hence, solve the combinatorial optimization to get the minimum covering ball
      • Binary search over a submodular problem (see the sketch after this list)
  • Experiments
    • MNIST
    • Cifar 10 / Cifar 100 on VGG
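
A minimal sketch of the covering-ball step: the greedy 2-OPT k-center solution gives the initial centers and radius, which the plan then refines with integer programming (Gurobi) and a binary search over the radius; only the greedy part is sketched here, and the function name is illustrative:

```python
import numpy as np

def greedy_k_center(points, k, first=0):
    """Greedy 2-OPT k-center: repeatedly add the point farthest from the
    current centers. Returns center indices and the covering radius,
    which can seed the binary-search / integer-programming refinement."""
    centers = [first]
    dists = np.linalg.norm(points - points[first], axis=1)
    while len(centers) < k:
        nxt = int(np.argmax(dists))
        centers.append(nxt)
        dists = np.minimum(dists, np.linalg.norm(points - points[nxt], axis=1))
    return centers, float(dists.max())
```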

Current TODO

  • Get the features and look at the t-SNE (see the sketch after this list)
  • Sample far-away points and try this
  • Implement the combinatorial algorithm for N-D; try it with the 2D t-SNE points
  • Run the active learning experiment
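
A minimal sketch for the t-SNE step, assuming the network features (e.g. penultimate-layer activations) have already been exported as a NumPy array; uses scikit-learn, and the function name and file path are illustrative:

```python
import numpy as np
from sklearn.manifold import TSNE

def embed_features_2d(features, seed=0):
    """Project high-dimensional features to 2-D with t-SNE so diversity
    can be inspected and the combinatorial algorithm tried in 2-D first."""
    return TSNE(n_components=2, random_state=seed).fit_transform(features)

# xy = embed_features_2d(np.load("features.npy"))  # illustrative path
# xy can be plotted and fed to the greedy k-center sketch above.
```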
