
5615_P03B

Structure of code

your_preferred_package_name/
|-- README.md
|-- __init__.py
|-- Influ.py
|-- influence/
|     |-- smooth_hinge.py
|     |-- image_utils.py
|     |-- hessians.py
|     |-- genericNeuralNet.py
|     |-- inception_v3.py
|     |-- dataset.py
|     |-- __init__.py
|-- unittest/
|     |-- test.py
|-- scripts/
|     |-- data_transform.py
|     |-- rbf_test_fig.py
|     |-- load_animals.py
|     |-- rbf_test.py
|     |-- __init__.py
|-- source_datasets/
|     |-- supermarket_600.csv

Prerequisites

  • NumPy
  • SciPy
  • scikit-learn
  • Pandas
  • TensorFlow (v1.1.1)
  • Keras (v2.0.4)
  • spaCy
  • h5py (v2.7.0)
  • Matplotlib
  • Seaborn

If you find it hard to build the environment, you can also use the Docker image provided by the author of the paper here: https://hub.docker.com/r/pangwei/tf1.1/. For TensorFlow, the CPU version should be enough, but if you need to train on a large amount of data it is better to use the GPU version, in which case you will also need the cuDNN and CUDA toolkits from NVIDIA as well as a GPU.
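
As a quick sanity check of the environment, the sketch below (an illustration, not part of the original project) only verifies that the key pinned packages import and reports their versions so you can compare them against the list above:

# Minimal environment sanity check (assumes the packages above are installed).
import tensorflow as tf
import keras
import h5py

print("TensorFlow:", tf.__version__)  # expected: a 1.1.x release
print("Keras:", keras.__version__)    # expected: 2.0.4
print("h5py:", h5py.__version__)      # expected: 2.7.0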

Usage of code

To use the code,

  1. Download the code.
  2. Put the code in a directory with the name you prefer.
  3. Now you can import Influ into your scripts.

The structure of this package and your scripts should look like:

a_directory
|-- project_code
|-- your_scripts
|-- your_dataset

*project_code is the directory that stores the code from this project.

In your scripts:

# Import the main class from the project code directory.
from project_code.Influ import Influ

influ = Influ()
influ.load_data(your_dataset)                           # path to your CSV/TXT/XLSX file
influ.convert(features_you_choose, label_you_choose)    # list of chosen features, chosen label
influ.cal_influe()                                      # train, compute influences, and plot

Documentation

In the Influ class:

  • load_data(filename)
    Load the data from your dataset. Please note that, due to the design, the values of your label must be 1 and 2. The dataset format can be CSV, TXT, or XLSX.

  • convert(feature, label)
    'feature' is a list of the features you choose; 'label' is your chosen label. This method converts your dataset into a format that TensorFlow can read and generates a compressed file called 'fake_data', which is read later during training.

  • cal_influe(test_idx, gamma)
    This starts training, computes the influence function, and automatically displays a plot for visualization. test_idx: the test point you choose as the reference for computing the Euclidean distances used in the visualization. gamma: the parameter of the RBF kernel. A usage sketch follows this list.

  • visualization(scale)
    This regenerates the plot if you are not satisfied with the automatically generated one. The reason is that the default scale for both the x axis and the y axis is 0.03; if you want a better view of the distribution, you can set the scale yourself.
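
As a concrete illustration, a full run might look like the sketch below. The file name, feature and label names, test index, gamma, and scale values are hypothetical placeholders chosen only for this example; substitute your own.

from project_code.Influ import Influ

influ = Influ()

# Hypothetical dataset and column names; replace them with your own.
influ.load_data("your_dataset.csv")                      # labels must take the values 1 and 2
influ.convert(["feature_a", "feature_b"], "your_label")  # writes the compressed 'fake_data' file

# Hypothetical parameters: test point 0 as the reference, gamma=0.05 for the RBF kernel.
influ.cal_influe(0, 0.05)

# Optionally redraw the plot with a larger axis scale than the default 0.03.
influ.visualization(0.1)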

Test

The test code is located in the unittest directory. If you want to run the test, you have to copy test.py out and place it like this:

a_directory
|-- project_code
|-- your_scripts
|-- test.py (this is the test code)
|-- your_dataset
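
One way to run the copied test, sketched here with Python's standard unittest runner (an assumption; the test file may also be runnable directly), is:

# Run test.py from inside a_directory with the standard-library unittest runner.
import unittest

suite = unittest.defaultTestLoader.discover(".", pattern="test.py")
unittest.TextTestRunner(verbosity=2).run(suite)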

Reference

The code for computing the influence function, which is stored in the influence directory, is adapted from the work at https://github.com/kohpangwei/influence-release/.
