czk23 committed Oct 25, 2023
1 parent 2b8d6bc commit aa64cb9
Showing 4,745 changed files with 2,537,580 additions and 0 deletions.
The diff you're trying to view is too large. We only load the first 3000 changed files.
3 changes: 3 additions & 0 deletions ML_for_SQL/.idea/.gitignore

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

1 change: 1 addition & 0 deletions ML_for_SQL/.idea/.name

11 changes: 11 additions & 0 deletions ML_for_SQL/.idea/ML_for_SQL.iml

7 changes: 7 additions & 0 deletions ML_for_SQL/.idea/inspectionProfiles/profiles_settings.xml

4 changes: 4 additions & 0 deletions ML_for_SQL/.idea/misc.xml

8 changes: 8 additions & 0 deletions ML_for_SQL/.idea/modules.xml

7 changes: 7 additions & 0 deletions ML_for_SQL/README
@@ -0,0 +1,7 @@
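This project uses machine learning algorithms to classify SQL injection statements versus normal statements:
SVM, AdaBoost, decision tree, random forest, logistic regression, KNN, Bayes, and other algorithms are each used to classify SQL injection statements and normal statements.
data holds the collected sample data.
file stores the trained models.
featurepossess.py preprocesses the raw samples and extracts features.
sqlsvm.py and the other .py files train the models.
testsql tests the trained models, using accuracy to measure model performance.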
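featurepossess.py is not shown in this diff, so the features it actually computes are unknown. As a hedged illustration only, the sketch below turns raw query strings into fixed-length numeric rows with the class label appended as the last column, since adaboost.py strips the last column off the combined matrix to get its features. The seven features, the keyword list, and the names `extract_features` and `build_rows` are all hypothetical, not the project's code.

```python
import re

# Hypothetical keyword list; NOT taken from featurepossess.py.
SQL_KEYWORDS = {"select", "union", "insert", "update", "delete", "drop",
                "or", "and", "sleep", "benchmark"}


def extract_features(query: str) -> list:
    """Map one query string to seven illustrative numeric features."""
    q = query.lower()
    tokens = re.findall(r"[a-z_]+", q)
    return [
        len(q),                                               # total length
        sum(t in SQL_KEYWORDS for t in tokens),               # SQL keyword hits
        q.count("'") + q.count('"'),                          # quote characters
        q.count("--") + q.count("#") + q.count("/*"),         # comment markers
        q.count("="),                                         # equals signs
        q.count("(") + q.count(")"),                          # parentheses
        sum(not c.isalnum() and not c.isspace() for c in q),  # all symbols/punctuation
    ]


def build_rows(queries, label: int) -> list:
    """Feature rows with the class label (1 = injection, 0 = normal) as the last column."""
    return [extract_features(s) + [label] for s in queries]


if __name__ == "__main__":
    print(build_rows(["id=42"], 0))          # → [[5, 0, 0, 0, 1, 0, 1, 0]]
    print(build_rows(["1' or 1=1 -- "], 1))  # → [[13, 1, 1, 1, 1, 0, 4, 1]]
```

Hand-picked lexical counts like these are cheap to compute; whatever the real feature set is, keeping the label in the last column is what lets the training script separate features from target with a single column slice.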
Binary file not shown.
Binary file not shown.
63 changes: 63 additions & 0 deletions ML_for_SQL/adaboost.py
@@ -0,0 +1,63 @@
# -*- coding: utf-8 -*-
"""
Created on Mon Nov 20 19:06:57 2017
@author: wf
"""
import numpy as np
import pandas as pd
from sklearn import metrics
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import train_test_split
from featurepossess import generate
import joblib

sql_matrix=generate("./data/sqlnew.csv","./data/sql_matrix.csv",1)
nor_matrix=generate("./data/normal_less.csv","./data/nor_matrix.csv",0)

df = pd.read_csv(sql_matrix)
df.to_csv("./data/all_matrix.csv",encoding="utf_8_sig",index=False)
df = pd.read_csv( nor_matrix)
df.to_csv("./data/all_matrix.csv",encoding="utf_8_sig",index=False, header=False, mode='a+')

feature_max = pd.read_csv('./data/all_matrix.csv')
arr=feature_max.values
data = np.delete(arr, -1, axis=1) # drop the last (label) column
#print(arr)
target=arr[:,-1] # the label is the last column
# Randomly split into training and test sets
train_data,test_data,train_target,test_target = train_test_split(data,target,test_size=0.3,random_state=3)
# Models
model1=DecisionTreeClassifier(max_depth=5)
model2=GradientBoostingClassifier(n_estimators=100)
model3=AdaBoostClassifier(model1,n_estimators=100)
model1.fit(train_data,train_target)# train the model
model2.fit(train_data,train_target)# train the model
model3.fit(train_data,train_target)# train the model
joblib.dump(model2, './file/GBDT.model')# gradient boosted decision trees (GBDT)
print("GBDT.model has been saved to 'file/GBDT.model'")

joblib.dump(model3, './file/Adaboost.model')
print("Adaboost.model has been saved to 'file/Adaboost.model'")
#clf = joblib.load('svm.model')
y_pred1=model2.predict(test_data)# predict
print("y_pred:%s"%y_pred1)
print("test_target:%s"%test_target)
#Verify
print("GBDT:")
print('Precision:%.3f' %metrics.precision_score(y_true=test_target,y_pred=y_pred1))# precision
print('Recall:%.3f' %metrics.recall_score(y_true=test_target,y_pred=y_pred1))# recall
print(metrics.confusion_matrix(y_true=test_target,y_pred=y_pred1))# confusion matrix

y_pred2=model3.predict(test_data)# predict
print("y_pred:%s"%y_pred2)
print("test_target:%s"%test_target)
#Verify
print("Adaboost:")
print('Precision:%.3f' %metrics.precision_score(y_true=test_target,y_pred=y_pred2))# precision
print('Recall:%.3f' %metrics.recall_score(y_true=test_target,y_pred=y_pred2))# recall
print(metrics.confusion_matrix(y_true=test_target,y_pred=y_pred2))# confusion matrix
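For comparison, here is a minimal sketch of the same pipeline that keeps everything in memory. `pd.concat` stacks the injection and normal matrices instead of writing all_matrix.csv and appending to it, the label is read from the last column, and both GBDT and AdaBoost are scored with precision, recall, and accuracy (the metric the README names for testsql). With label 1 (SQL injection) as the positive class, precision is TP/(TP+FP) and recall is TP/(TP+FN). Because `featurepossess.generate` is not shown, `make_demo_frames` builds synthetic stand-in data; it and the names `combine_matrices` and `evaluate` are hypothetical.

```python
import numpy as np
import pandas as pd
from sklearn import metrics
from sklearn.ensemble import AdaBoostClassifier, GradientBoostingClassifier
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier


def make_demo_frames(n: int = 200, seed: int = 0):
    """Hypothetical stand-in for featurepossess.generate: synthetic feature rows."""
    rng = np.random.default_rng(seed)
    cols = [f"f{i}" for i in range(7)] + ["label"]
    # Injection rows (label 1) are shifted away from normal rows (label 0).
    sql = np.column_stack([rng.normal(2.0, 1.0, (n, 7)), np.ones(n)])
    nor = np.column_stack([rng.normal(0.0, 1.0, (n, 7)), np.zeros(n)])
    return pd.DataFrame(sql, columns=cols), pd.DataFrame(nor, columns=cols)


def combine_matrices(sql_df: pd.DataFrame, nor_df: pd.DataFrame) -> pd.DataFrame:
    """Stack injection and normal rows in memory instead of via a CSV round-trip."""
    return pd.concat([sql_df, nor_df], ignore_index=True)


def evaluate(df: pd.DataFrame, seed: int = 3) -> dict:
    arr = df.values
    data, target = arr[:, :-1], arr[:, -1].astype(int)  # label = last column
    X_tr, X_te, y_tr, y_te = train_test_split(
        data, target, test_size=0.3, random_state=seed)
    models = {
        "GBDT": GradientBoostingClassifier(n_estimators=100),
        # Base tree passed positionally, as in adaboost.py.
        "Adaboost": AdaBoostClassifier(DecisionTreeClassifier(max_depth=5),
                                       n_estimators=100),
    }
    results = {}
    for name, model in models.items():
        model.fit(X_tr, y_tr)
        y_pred = model.predict(X_te)
        results[name] = {
            "precision": metrics.precision_score(y_te, y_pred),  # TP / (TP + FP)
            "recall": metrics.recall_score(y_te, y_pred),        # TP / (TP + FN)
            "accuracy": metrics.accuracy_score(y_te, y_pred),
            "confusion": metrics.confusion_matrix(y_te, y_pred),
        }
    return results


if __name__ == "__main__":
    sql_df, nor_df = make_demo_frames()
    for name, r in evaluate(combine_matrices(sql_df, nor_df)).items():
        print(f"{name}: precision={r['precision']:.3f} "
              f"recall={r['recall']:.3f} accuracy={r['accuracy']:.3f}")
        print(r["confusion"])
```

Passing the decision tree positionally to `AdaBoostClassifier`, as adaboost.py also does, avoids depending on whether the installed scikit-learn names that parameter `base_estimator` (older releases) or `estimator` (newer releases).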
