diff --git a/Projeto_Final_MoA_no_Paredawn.ipynb b/Projeto_Final_MoA_no_Paredawn.ipynb new file mode 100644 index 0000000..45c3748 --- /dev/null +++ b/Projeto_Final_MoA_no_Paredawn.ipynb @@ -0,0 +1,11475 @@ +{ + "nbformat": 4, + "nbformat_minor": 0, + "metadata": { + "colab": { + "name": "Projeto_Final - MoA no Paredawn.ipynb", + "provenance": [], + "collapsed_sections": [ + "US878rR1wrBq", + "XBQ40pnrwwYL", + "UAK3kawHwyZZ", + "fy8XzwXWw0fq", + "v2p_CrKixE0k", + "rGjBCv6PLsC-" + ], + "authorship_tag": "ABX9TyOLyjrHzNZ32W4P0HHD/+z9", + "include_colab_link": true + }, + "kernelspec": { + "name": "python3", + "display_name": "Python 3" + }, + "language_info": { + "name": "python" + } + }, + "cells": [ + { + "cell_type": "markdown", + "metadata": { + "id": "view-in-github", + "colab_type": "text" + }, + "source": [ + "\"Open" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "5XnrpYnCeesB" + }, + "source": [ + "***\n", + "# ***Final Project: Imersão Alura, 3rd Edition*** \n", + "***" + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "0LAuJQsEa94g" + }, + "source": [ + "" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "5TOR8N9ui7Qk" + }, + "source": [ + "![inicio.png]()" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "5ZL5iRagjBTc" + }, + "source": [ + "\"imersão" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "Gte_glj5jeHD" + }, + "source": [ + "

Hello, Divers!!! \n", + "\n", + "You must be thinking: `What on earth is this MoA thing?`, right? If you weren't, well, I certainly was!! \n", + "\n", + "But don't worry, I'll try to make it simple for you! We're going to put this MoA thing on the `paredão` (the hot seat) to **analyze and explore its data**, and then we'll try to build a `predictive model` to **predict**, from the data we have, whether it will be **activated** or **not**. \n", + "\n", + "After that, we'll build a second predictive model to predict whether an experiment used a drug or a control, aiming for the best evaluation metric we can get. \n", + "\n", + "Wow, just talking about it gets me excited! I hope you feel the same!\n", + "\n", + "So let's go!" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "E-w_JjsDk64N" + }, + "source": [ + "# ***Project Objectives***" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "Rfmk7O_Jk9YU" + }, + "source": [ + "

The goal of this project is to develop two predictive models to support the discovery of new drugs through their MoAs (Mechanisms of Action).\n", + "\n", + "\n", + "\n", + "1. `First objective:` Build a predictive model that predicts whether an experiment activates one or more MoAs, with an F1-score above 80%\n", + "2. `Second objective:` Build a predictive model that predicts whether an experiment was treated with **com_droga** (drug) or **com_controle** (control)\n", + "\n" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "Vk_RARVBlJvE" + }, + "source": [ + "# ***Background***" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "I7W8yPGSlMlD" + }, + "source": [ + "The [Connectivity Map](https://clue.io/) is a project within the Broad Institute of MIT and Harvard. The Laboratory for Innovation Science at Harvard (LISH) presents this challenge with the goal of advancing drug development through algorithms that can predict a compound's MoA." + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "PZUucP8AlO0j" + }, + "source": [ + " But what is a MoA?\n", + "\n", + "

MoA stands for Mechanism of Action.

\n", + "\n", + "

In the past, drugs were derived from natural products, and many remedies went into clinical use without anyone understanding the biological mechanisms behind them.

\n", + "\n", + "

Today, with more advanced technology, the drug discovery process has changed its approach. The current model focuses on understanding the biological mechanism of a disease: we try to identify a protein target associated with the disease and then develop a molecule that can interact with that target protein.

\n" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "hffcf4mMlPjr" + }, + "source": [ + " How do we determine the MoAs of a new drug?\n", + "\n", + "

One approach is to treat a sample of human cells with the drug and then analyze the cellular responses with algorithms that look for known patterns in large genomic databases.

\n", + "\n", + "

This project uses a unique dataset that combines gene expression and cell viability data. The data come from a new technology that measures, simultaneously (in the same samples), the responses of human cells to drugs in a pool of 100 different cell types.
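
As a rough illustration of how the two kinds of measurements sit side by side in one table, the sketch below separates columns by prefix. It assumes the naming used in the experiments table of this notebook (`g-0` to `g-771` for gene expression, `c-0` to `c-99` for cell viability, plus five descriptive columns); the names `meta_cols`, `columns`, `gene_cols`, and `cell_cols` are made up for the example, and the column list is rebuilt by hand so nothing has to be downloaded.

```python
# Column names rebuilt by hand from the notebook's naming scheme
# (an assumption for illustration; the real list comes from the data file).
meta_cols = ['id', 'tratamento', 'tempo', 'dose', 'droga']
columns = meta_cols + [f'g-{i}' for i in range(772)] + [f'c-{i}' for i in range(100)]

# Gene expression columns start with 'g-', cell viability columns with 'c-'.
gene_cols = [c for c in columns if c.startswith('g-')]
cell_cols = [c for c in columns if c.startswith('c-')]

print(len(columns), len(gene_cols), len(cell_cols))  # 877 772 100
```

With the real data, the same filter would run over `df_experimentos.columns` once the file has been loaded.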

\n", + "\n", + "

A second dataset, `dados_resultados.csv`, contains the MoA annotations for more than 5,000 drugs.

" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "xZVQuT8Rlbjf" + }, + "source": [ + "This project was developed during the [Imersão Dados, 3rd edition, Alura](https://www.alura.com.br/imersao-dados)
\n", + "\n", + "The data are available on [Kaggle](https://www.kaggle.com/c/lish-moa/data)
" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "qa8_grzhliAr" + }, + "source": [ + "## Datasets" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "OQ3FROh2lmc8" + }, + "source": [ + "\n", + "This project uses two datasets: \n", + "\n", + "1. ```dados_treinamentos.csv ``` \n", + "2. ``` dados_resultados.csv```\n" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "m0Zoc-vrpmG1" + }, + "source": [ + "The ```dados_treinamentos.csv ``` dataset has the following columns:\n", + "\n", + "Column | Type | Description\n", + "-------------------|------------------|------------------\n", + "id | object | Unique identifier of the experiment\n", + "tratamento | object | Type of treatment: with drug or with control\n", + "tempo | int64 | Observation time\n", + "dose | object | Dose applied\n", + "droga | object | Drug code\n", + "g-0 to g-771 | float64 | Genes (gene expression)\n", + "c-0 to c-99 | float64 | Cells (cell viability)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "US878rR1wrBq" + }, + "source": [ + "# ***Importing the Libraries***" + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "8U_MU0mKjTl8" + }, + "source": [ + "import pandas as pd\n", + "import numpy as np\n", + "\n", + "# Data visualization libraries\n", + "import seaborn as sns\n", + "import matplotlib\n", + "import matplotlib.pyplot as plt\n", + "\n", + "import warnings\n", + "warnings.filterwarnings('ignore')\n", + "\n", + "# Preprocessing libraries\n", + "\n", + "from sklearn.preprocessing import LabelEncoder\n", + "from sklearn.model_selection import train_test_split\n", + "from sklearn.ensemble import ExtraTreesClassifier\n", + "from sklearn.utils import resample\n", + "from sklearn.feature_selection import SelectFromModel\n", + "\n", + "# Machine learning model libraries\n", + "from sklearn.linear_model import LogisticRegression\n", + "from sklearn.tree import DecisionTreeClassifier\n", + "from sklearn.ensemble import RandomForestClassifier\n", + "from sklearn.dummy import DummyClassifier\n", + "from xgboost import XGBClassifier\n", + "\n", + "\n", + "# Model evaluation libraries\n", + "from sklearn.metrics import accuracy_score\n", + "from sklearn.metrics import classification_report" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "XBQ40pnrwwYL" + }, + "source": [ + "# ***Loading the Datasets***" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "UAK3kawHwyZZ" + }, + "source": [ + "## Loading the Experimentos Dataset" + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "PbTIA2dkw2yS" + }, + "source": [ + "df_experimentos = pd.read_csv('https://github.com/alura-cursos/imersaodados3/blob/main/dados/dados_experimentos.zip?raw=true', compression='zip')" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "fy8XzwXWw0fq" + }, + "source": [ + "## Loading the Resultados Dataset" + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "b-5E8BTuc8Su" + }, + "source": [ + "df_resultados = pd.read_csv('https://github.com/alura-cursos/imersaodados3/blob/main/dados/dados_resultados.csv?raw=true')" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "WVQ17navxCjM" + }, + "source": [ + "# ***Exploratory Analysis***" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "o6jqXqkMxpsz" + }, + "source": [ + "So let's start our analysis and put the data from the `Experimentos` and `Resultados` datasets on the `Paredawn`" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "g-AGUpOsN_ae" + }, + "source": [ + "\"Paredawn\"\n", + "\n", + "\n", + "\n" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "v2p_CrKixE0k" + }, + "source": [ + "## Experimentos Dataset" + ] + }, + { + "cell_type": "markdown", + "metadata": { 
+ "id": "2a2D7ZoOxl2T" + }, + "source": [ + "To start our exploratory analysis, let's first check the dimensions of our dataset using `.shape`:" + ] + }, + { + "cell_type": "code", + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/" + }, + "id": "CAQj2A4exIJ9", + "outputId": "facbb5ae-68ec-42b3-e92d-903a0c983929" + }, + "source": [ + "df_experimentos.shape" + ], + "execution_count": null, + "outputs": [ + { + "output_type": "execute_result", + "data": { + "text/plain": [ + "(23814, 877)" + ] + }, + "metadata": { + "tags": [] + }, + "execution_count": 144 + } + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "fPh-ybHVx5u2" + }, + "source": [ + "As shown above, we have `23,814 rows` and `877 columns`." + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "8MtOef0Ux9Lq" + }, + "source": [ + "To look at the first rows of our DataFrame, we use `.head()`:" + ] + }, + { + "cell_type": "code", + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/", + "height": 244 + }, + "id": "LW6K5VPIyJmL", + "outputId": "75447055-6851-42d9-ecd3-b5a7c17c70a9" + }, + "source": [ + "df_experimentos.head()" + ], + "execution_count": null, + "outputs": [ + { + "output_type": "execute_result", + "data": { + "text/html": [ + "
(HTML table output of df_experimentos.head() omitted: 5 rows × 877 columns, with columns id, tratamento, tempo, dose, droga, g-0 ... c-99. The text/plain rendering of the same output follows.)
" + ], + "text/plain": [ + " id tratamento tempo dose ... c-96 c-97 c-98 c-99\n", + "0 id_000644bb2 com_droga 24 D1 ... -0.3981 0.2139 0.3801 0.4176\n", + "1 id_000779bfc com_droga 72 D1 ... 0.1522 0.1241 0.6077 0.7371\n", + "2 id_000a6266a com_droga 48 D1 ... -0.6417 -0.2187 -1.4080 0.6931\n", + "3 id_0015fd391 com_droga 48 D1 ... -1.6210 -0.8784 -0.3876 -0.8154\n", + "4 id_001626bd3 com_droga 72 D2 ... 0.1094 0.2885 -0.3786 0.7125\n", + "\n", + "[5 rows x 877 columns]" + ] + }, + "metadata": { + "tags": [] + }, + "execution_count": 145 + } + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "JunSnAxmyPmU" + }, + "source": [ + "

To explore this dataset, we need to know which columns it has, the data types of those columns, whether there are null or missing values, and some summary statistics." + ] + }, + { + "cell_type": "code", + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/" + }, + "id": "ApfYDGeqyLeG", + "outputId": "b013108d-4e82-4916-ccbc-e794c16578da" + }, + "source": [ + "df_experimentos.columns" + ], + "execution_count": null, + "outputs": [ + { + "output_type": "execute_result", + "data": { + "text/plain": [ + "Index(['id', 'tratamento', 'tempo', 'dose', 'droga', 'g-0', 'g-1', 'g-2',\n", + " 'g-3', 'g-4',\n", + " ...\n", + " 'c-90', 'c-91', 'c-92', 'c-93', 'c-94', 'c-95', 'c-96', 'c-97', 'c-98',\n", + " 'c-99'],\n", + " dtype='object', length=877)" + ] + }, + "metadata": { + "tags": [] + }, + "execution_count": 146 + } + ] + }, + { + "cell_type": "code", + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/" + }, + "id": "jHIO8a8cyVN3", + "outputId": "327e4d1a-62da-4ba5-8f0f-c6188bc39897" + }, + "source": [ + "# Checking the dtypes of the main columns\n", + "df_experimentos[['id', 'tratamento', 'tempo', 'dose', 'droga', 'g-0', 'c-0']].info()" + ], + "execution_count": null, + "outputs": [ + { + "output_type": "stream", + "text": [ + "<class 'pandas.core.frame.DataFrame'>\n", + "RangeIndex: 23814 entries, 0 to 23813\n", + "Data columns (total 7 columns):\n", + " # Column Non-Null Count Dtype \n", + "--- ------ -------------- ----- \n", + " 0 id 23814 non-null object \n", + " 1 tratamento 23814 non-null object \n", + " 2 tempo 23814 non-null int64 \n", + " 3 dose 23814 non-null object \n", + " 4 droga 23814 non-null object \n", + " 5 g-0 23814 non-null float64\n", + " 6 c-0 23814 non-null float64\n", + "dtypes: float64(2), int64(1), object(4)\n", + "memory usage: 1.3+ MB\n" + ], + "name": "stdout" + } + ] + }, + { + "cell_type": "code", + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/", + "height": 333 + }, + "id": "PEevXUajObHT", + "outputId": "cbe73a06-8e6e-4cde-8995-39a99623d48d" + }, + "source": [ + "# Checking the count, mean, standard deviation, minimum, percentiles (25%, 50%, 75%), and maximum.\n", + "df_experimentos.describe()" + ], + "execution_count": null, + "outputs": [ + { + "output_type": "execute_result", + "data": { + "text/html": [ + "

(HTML table output of df_experimentos.describe() omitted: 8 rows × 873 columns, with rows count, mean, std, min, 25%, 50%, 75%, max and columns tempo, g-0 ... c-99. The text/plain rendering of the same output follows.)
" + ], + "text/plain": [ + " tempo g-0 ... c-98 c-99\n", + "count 23814.000000 23814.000000 ... 23814.000000 23814.000000\n", + "mean 48.020156 0.248366 ... -0.470252 -0.301505\n", + "std 19.402807 1.393399 ... 1.834828 1.407918\n", + "min 24.000000 -5.513000 ... -10.000000 -10.000000\n", + "25% 24.000000 -0.473075 ... -0.592600 -0.562900\n", + "50% 48.000000 -0.008850 ... 0.014000 -0.019500\n", + "75% 72.000000 0.525700 ... 0.461275 0.438650\n", + "max 72.000000 10.000000 ... 3.111000 3.805000\n", + "\n", + "[8 rows x 873 columns]" + ] + }, + "metadata": { + "tags": [] + }, + "execution_count": 148 + } + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "iZT5fEn0QFkm" + }, + "source": [ + "

Let's check whether the dataset has any null values. We call `.isnull()`, add up the nulls in each column with `sum()`, and sort the totals in descending order with `sort_values()` so that the columns with the most nulls come first." + ] + }, + { + "cell_type": "code", + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/" + }, + "id": "igFtbtOKOhUW", + "outputId": "05e3d2d5-c369-42bb-8edb-9b782f5a7743" + }, + "source": [ + "df_experimentos.isnull().sum().sort_values(ascending = False)" + ], + "execution_count": null, + "outputs": [ + { + "output_type": "execute_result", + "data": { + "text/plain": [ + "c-99 0\n", + "g-282 0\n", + "g-293 0\n", + "g-292 0\n", + "g-291 0\n", + " ..\n", + "g-575 0\n", + "g-574 0\n", + "g-573 0\n", + "g-572 0\n", + "id 0\n", + "Length: 877, dtype: int64" + ] + }, + "metadata": { + "tags": [] + }, + "execution_count": 149 + } + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "5P8czA9vSsHA" + }, + "source": [ + "The totals are sorted in descending order and the first one is already 0, so the `Experimentos` dataset has no null values." + ] + }, + { + "cell_type": "code", + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/" + }, + "id": "8jrB_9DAyXRa", + "outputId": "5587760b-feec-4937-ee2a-caa696c246f0" + }, + "source": [ + "# check the possible values of the tratamento column and how often each occurs\n", + "df_experimentos['tratamento'].value_counts()" + ], + "execution_count": null, + "outputs": [ + { + "output_type": "execute_result", + "data": { + "text/plain": [ + "com_droga 21948\n", + "com_controle 1866\n", + "Name: tratamento, dtype: int64" + ] + }, + "metadata": { + "tags": [] + }, + "execution_count": 150 + } + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "nxMIwdmnyaSb" + }, + "source": [ + "Now for our first plot: let's look at the `tratamento` column. Is it balanced?" 
+ ] + }, + { + "cell_type": "code", + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/", + "height": 380 + }, + "id": "uoIAoxH7yZnD", + "outputId": "579df7bc-a76b-4d52-d85b-9b40ff01f1b1" + }, + "source": [ + "sns.countplot(x='tratamento', data = df_experimentos, palette=\"crest\")\n", + "plt.title('Tipos de Tratamentos', fontdict={'fontsize':18, 'fontweight': 'bold'},)\n", + "plt.xlabel('Tratamento') \n", + "plt.ylabel('Quantidade') \n", + "plt.show()" + ], + "execution_count": null, + "outputs": [ + { + "output_type": "display_data", + "data": { + "image/png": "\n", + "text/plain": [ + "

" + ] + }, + "metadata": { + "tags": [] + } + } + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "B81l5aQzy4pS" + }, + "source": [ + "Conforme o gráfico vimos que essa coluna está desbalanceada, será que isso afetará na nossa maquina preditiva?\n", + "\n", + "vamos ver a porcentagem de cada variável dessa coluna, para isso utilizaremos o `value_counts` com o parametro `Normalize`" + ] + }, + { + "cell_type": "code", + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/" + }, + "id": "l0Skr-uIz8my", + "outputId": "a8f5ac14-1ce2-48b2-9866-3b2adb24842e" + }, + "source": [ + "print(df_experimentos['tratamento'].value_counts(normalize = True)*100)" + ], + "execution_count": null, + "outputs": [ + { + "output_type": "stream", + "text": [ + "com_droga 92.164273\n", + "com_controle 7.835727\n", + "Name: tratamento, dtype: float64\n" + ], + "name": "stdout" + } + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "4KNOi9v_J5n_" + }, + "source": [ + "Temos uma frequencia total de **92%** de `com_droga` e **7%** de `com_controle`." + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "RR2hDZ5P0CnU" + }, + "source": [ + "Agora vamos ver os gráficos das colunas: Dose e Tempo\n" + ] + }, + { + "cell_type": "code", + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/", + "height": 380 + }, + "id": "K7U1M1hi0VKu", + "outputId": "f71fa0ed-48d5-47f1-b8a9-04884508f76e" + }, + "source": [ + "sns.countplot(x = 'dose', data=df_experimentos, palette=\"crest\")\n", + "plt.title('Tipos de doses',fontdict={'fontsize':18, 'fontweight': 'bold'},)\n", + "plt.xlabel('Doses') \n", + "plt.ylabel('Quantidade') \n", + "plt.show()" + ], + "execution_count": null, + "outputs": [ + { + "output_type": "display_data", + "data": { + "image/png": "\n", + "text/plain": [ + "
" + ] + }, + "metadata": { + "tags": [] + } + } + ] + }, + { + "cell_type": "code", + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/", + "height": 380 + }, + "id": "8q7Txm7C0O85", + "outputId": "f4723929-e07a-40b6-a955-89490f9093a3" + }, + "source": [ + "sns.countplot(x='tempo', data = df_experimentos, palette=\"crest\")\n", + "plt.title('Tipos utilizados no conjunto de dados', fontdict={'fontsize':18, 'fontweight': 'bold'},)\n", + "plt.xlabel('tempo') \n", + "plt.ylabel('Quantidade') \n", + "plt.show()" + ], + "execution_count": null, + "outputs": [ + { + "output_type": "display_data", + "data": { + "image/png": "\n", + "text/plain": [ + "
" + ] + }, + "metadata": { + "tags": [] + } + } + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "49CpmZbj0OZT" + }, + "source": [ + "Vimos que diferente da coluna `Tratamento`, as colunas `Dose` e `Tempo` estão mais balanceados.\n" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "iFa-6HwLK-rO" + }, + "source": [ + "Vamos ver o top 5 drogas mais utilizadas nesse dataset" + ] + }, + { + "cell_type": "code", + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/" + }, + "id": "C0MSWtpxK2By", + "outputId": "bb8a3d09-1581-42aa-db34-1a5548f0b426" + }, + "source": [ + "ranking_drogas = df_experimentos['droga'].value_counts().index[0:5]\n", + "ranking_drogas" + ], + "execution_count": null, + "outputs": [ + { + "output_type": "execute_result", + "data": { + "text/plain": [ + "Index(['cacb2b860', '87d714366', '9f80f3f77', '8b87a7a83', '5628cb3ee'], dtype='object')" + ] + }, + "metadata": { + "tags": [] + }, + "execution_count": 155 + } + ] + }, + { + "cell_type": "code", + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/", + "height": 407 + }, + "id": "E452ViqdKrsG", + "outputId": "b037e287-6be3-495f-e899-2adaa9eb2ad6" + }, + "source": [ + "sns.color_palette(\"crest\", as_cmap=True)\n", + "plt.figure(figsize=(8, 6))\n", + "ax = sns.countplot(x = 'droga', data=df_experimentos.query('droga in @ranking_drogas'), color='#092A32')\n", + "ax.set_title('Drogas mais utilizadas',fontdict={'fontsize':18, 'fontweight': 'bold'})\n", + "plt.xlabel('Drogas') \n", + "plt.ylabel('Quantidade') \n", + "plt.show()" + ], + "execution_count": null, + "outputs": [ + { + "output_type": "display_data", + "data": { + "image/png": 
"iVBORw0KGgoAAAANSUhEUgAAAfgAAAGGCAYAAACXAJPOAAAABHNCSVQICAgIfAhkiAAAAAlwSFlzAAALEgAACxIB0t1+/AAAADh0RVh0U29mdHdhcmUAbWF0cGxvdGxpYiB2ZXJzaW9uMy4yLjIsIGh0dHA6Ly9tYXRwbG90bGliLm9yZy+WH4yJAAAgAElEQVR4nO3deVgVdf//8eeBAyJugHJIK7dcc0sy16y0qLQy99KwW2/9lbdLZpjibqkpqX3LLCssNcwlsTtpc+u2RSXK6CawDMkyRUMwFAQVOMzvDy/nFkEk5YCMr8d1eV0w53Nm3vNxmNeZz8yZsRmGYSAiIiKW4lbeBYiIiEjpU8CLiIhYkAJeRETEghTwIiIiFqSAFxERsSAFvIiIiAUp4KXCGzJkCE2bNi3wLzAwkIcffpiwsDAOHTpU3iVazrk+HzJkSLksf/369axYseKSNZV0Wlko7z6Ta4+9vAsQKS3e3t60adMGwzD466+/+OWXX9i7dy9r1qxh7ty5PPDAA+VdomW0aNECd3d3mjVrVubLPnnyJM8//zz+/v4MHTr0b9dUnrWLlCUFvFjG9ddfX+Co7uDBg0yYMIH//ve/TJo0iRtvvJHWrVuXX4EWEhoaWm7L/uqrr8jJySk0vaQ1lWftImVJQ/RiWTfeeCNLly6lRo0a5Obm8sorr5ivnRsunTJlCmvXrqVTp06MGzfOfP3QoUNMnjyZO+64g5YtW9KxY0dGjx5NfHx8oeUsX76c7t2706pVK/r168c333zDM888U+Rw7MaNG+nbty+tWrWiffv2BAcHEx0dXaBNXl4eK1asoE+fPrRr147AwED69evHunXrLrnO56/Xd999R+/evWndujUPPfQQ0dHR5Obm8txzz3HbbbcRGBjIiy++yIU3syxJjUUNN19J3d27d6dp06aFwvfC5QwZMoTx48cDkJycXOi1kgyBF9XuwlM85/87/xRPSfoGICIigrvvvpvWrVvTt29fdu3addF6XL1NyLVLAS+W5ufnZw7Nx8TEkJWVVeD1pKQk5s6dS506dbjuuusA+PXXX+nXrx8ffPABmZmZNG/eHKfTybZt2xg0aFCBne97773H/PnzSU5OpkqVKhiGwahRo0hMTCxUy4oVK5g4cSJ79uyhcePGVKlShe+++44RI0bw448/mu3CwsKYN28e+/bto2HDhjRs2JC9e/cyY8YMli5dWqL1Pnz4MGPGjMEwDHJzc0lMTGTcuHEsWrSI7du34+XlRVZWFm+//Tbr16//2zUWpTTqvpQWLVrg7+8PQKVKlejUqRMtWrS44vl26tSpwD8fHx8AbDYbnp6eQMn7Zv369cyZM4dDhw5Ro0YNPD09GT16NMnJyYWWW5bbhFx7FPBieTfffDMAubm5hS64i4uLY+7cuWzYsIHJkycDMGfOHI4fP46/vz+ffPIJ69evZ+vWrTRo0IDc3FxmzpxpHvUuW7YMgBtuuIHNmzfzwQcfMHPmTPbt21dgOYZhsHz5ctzd3XnwwQf54IMP+Oyzz/D19SUvL4+IiAizbWRkJADz5s3j/fffJzIykkWLFtGkSRN+/fXXEq1zdHQ0c+fOZePGjYwePRqAEydOsH37dj799FM++eQTM8Q2bdr0t2ssSmnUfSmhoaHcfvvtANSqVYsVK1aUypD7ihUrzH8TJ040Pwj+4x//wOFw/K2+eeONNwBo2LAhmzZtYu3atcydO7dQwJf1NiHXHp2DF8vz9vY2f77wCL5mzZo8+OCD5u/p6enmEXq/fv2oU6cOAD4+PgwaNIgXXniBAwcOkJSUhK+vL4cPHwagT58+1KhRA4BevXoxf/58/vrrL3O+NpuNL7/8ssCyvby8aNCgAenp6aSkpJjTK1euTHZ2NuvXr6dGjRoEBgZy//33c//995d4nR0OB/fccw8Ad911F6+++ioADz30kNkfbdu2Zfv27Rw5cuRv11iU0qi7vGVnZ/PMM8+Qm5v
LzTffTEhICFDyvjl27Jj5IbJXr15UqVIFgJ49e/LCCy+Qmppqvr+stwm59ijgxfKOHz9u/ly9evUCr9WpUwc3t/8NZP3xxx/m0Xn9+vULtK1bt67586FDhwpc6HV+W5vNRr169QoEPEBCQgJLliwhLi6u0GvnnwcPCQlh6tSpxMTEEBMTg5ubGy1atKBXr1488sgjVKpU6ZLrfP3115s/n/vgARAQEFBo+pkzZ/52jUUpjbrL25w5c/jtt9+oXLkyixYtMofnoWR98+eff5rTbrzxxgJtbrjhhgIBX9J5gjX6VsqeAl4sLy4uDjh7JH9+SMPZI6OLsdvtF/3dzc2N/Px88/e8vLwCbS8Mw+TkZIYNG0ZGRgaenp60bdsWLy8v9uzZQ0ZGRoG2/fr1o1WrVkRGRvLNN9+QlJREfHw88fHx7Nixg7feeuuS6+zh4WH+bLPZzJ/d3d2LnP53ayxKadR9Yb/l5uZe8j2l5dNPP2XDhg0ATJ06lYYNG5qvlbRvzq//wm3i/O3l78wTSqdv5dqjgBdLS0lJYevWrcDZoerzj8iKcv5RV1JSUoHXzj/XWa9evQIfDs4/t+90Ojlw4ECB927dutXcab/11lt06tQJODtkXlR4NmnShClTpgBnv/e9evVqFi1axJdffskvv/xC06ZNi12Py/F3ayzK5dZ97v/lwrAsq5sUJScnM2PGDADuv/9+BgwYUOD1kvbNuQsAofA28dtvv13WPM8pj21CKjZdZCeWdeTIEcaMGUN2djZeXl7mxWbF8fPzo2PHjgBs2LChwLnV1atXA9CsWTPq16+Pw+Ewd+gbN24kMzMTgPfff5/09PQC8z3/NICvry8AX3zxhXm1/bkd+q+//sqjjz5K9+7dzeHcqlWrct9995nvv9RQ+eUqaY1FudK6z13r8N1335lD1R999FGhIW343yhEeno6p06dKtG6FScvL4+QkBAyMzOpU6cOs2fPLtSmpH0TEBBgnq7ZuHGjOX316tWF+q8ibBNSsekIXiwjOTnZvLNZVlYWP/30E3l5eXh5ebFgwQIaNWpUovnMmDGDwYMHk5qaSo8ePWjUqBG//vorJ0+epEqVKmYA2Gw2hg0bxosvvsgff/xBUFAQderU4cCBA9SvX5/ff//dnGeHDh1wd3fH6XQybNgwateuTVJSEiNGjGDZsmXs3buXAQMGsGTJEnJyckhOTua+++6jWbNmGIbB3r17AQgMDHTZkVpJa3z99dcLvbdBgwZXVHfv3r3ZuXMnmZmZ9OrViwYNGpCYmEj79u359ttvC7Q9F6DZ2dk88MADtG3blkWLFl32em/ZsoUffvgBOPvh4amnnipU29/pmyFDhjB79mxzmwgICODw4cM0bty4wLcrKsI2IRWbjuDFMrKzs4mOjiY6Opqff/6Z2rVr8+ijjxIVFcW9995b4vncdNNNREZG0qdPH7y9vdmzZw+VK1emV69ebNiwocDd8P75z38yduxYatWqRXZ2Np6enoSHhxcYqoWzN1IJCwujfv365pX8b775JuPGjeP222/Hw8ODv/76C7vdzsqVKxk2bBi1atXip59+4pdffuGGG27gySefJDw8vNC589JS0hrPP/I8x83N7Yrq7tWrF5MmTcLf35+MjAzy8vIIDw8vdKEawKBBg+jWrRteXl6kp6dfcX+cPn3a/PngwYPmNnTu38GDB/9W3zz22GOMGzcOf39/Tp06hZeXF2+++WaBc/pQMbYJqdhshsZ2REpdjx492L9/P0FBQSxZsqS8yxGRa5CO4EWuQFhYGPfddx99+/Y1vza3e/du84KqwMDA8ixPRK5hOoIXuQLbtm1j7Nix5Ofn43A4qFOnDj/99BM5OTnUq1ePDz74gKpVq5Z3mSJyDVLAi1yhHTt2EB4eTmJiIpmZmTgcDu666y7GjBmDn59feZcnItcoBbyIiIgF6Ry8iIiIBVnqe/CpqZnlXYKIiEiZ8fevdtHXdAQvIiJiQQp4ERE
RC1LAi4iIWJACXkRExIIU8CIiIhakgBcREbEgBbyIiIgFKeBFREQsSAEvIiJiQQp4ERERC1LAi4iIWJACXkRExIIU8CIiIhakgBcREbEgSz0uVuRa0ahrl/Iu4aqW9PXO8i5BpNzpCF5ERMSCFPAiIiIWpIAXERGxIAW8iIiIBSngRURELEgBLyIiYkEKeBEREQtSwIuIiFiQAl5ERMSCFPAiIiIWpIAXERGxIAW8iIiIBSngRURELEgBLyIiYkEKeBEREQty6fPgExMTGTVqFEOHDiU4OJinnnqK9PR0AI4fP84tt9zCk08+yUMPPUTLli0B8PX1ZfHixWRmZhISEkJmZibe3t4sWrQIHx8fV5YrIiJiGS4L+OzsbGbPnk2nTp3MaYsXLzZ/njx5MgMGDACgQYMGREREFHj/ypUrad++PSNGjGDdunWEh4fz7LPPuqpcERERS3HZEL2npyfh4eE4HI5Cr+3fv5/MzExat2590fdHR0cTFBQEQLdu3YiOjnZVqSIiIpbjsiN4u92O3V707N99912Cg4PN39PS0njqqac4evQogwcPplevXqSlpeHn5wdAzZo1OXr06CWX6evrjd3uXjorICIVlr9/tfIuQaTcufQcfFFycnL4/vvvmTVrFgA+Pj6MGzeOXr16kZmZyYABA+jYsWOB9xiGUaJ5p6dnl3a5IlIBpaZmlncJImWiuA+zZX4V/XfffVdgaL5q1ar069cPDw8P/Pz8aNmyJfv378fhcJCamgpASkpKkUP9IiIiUrQyD/j4+HiaNWtm/v7NN98wb9484OyFeXv37qVBgwZ06dKFTZs2AbBlyxa6du1a1qWKiIhUWC4bok9ISCAsLIzk5GTsdjubN2/m1VdfJTU1lbp165rt2rVrx4cffsgjjzyC0+nkiSeeICAggCFDhvDss88yePBgqlevzoIFC1xVqoiIiOXYjJKe4K4AdN5NrhWNunYp7xKuaklf7yzvEkTKxFV1Dl5ERERcTwEvIiJiQQp4ERERC1LAi4iIWJACXkRExIIU8CIiIhakgBcREbEgBbyIiIgFKeBFREQsSAEvIiJiQQp4ERERC1LAi4iIWJACXkRExIIU8CIiIhakgBcREbEgBbyIiIgFKeBFREQsSAEvIiJiQQp4ERERC1LAi4iIWJACXkRExIIU8CIiIhakgBcREbEgBbyIiIgFKeBFREQsSAEvIiJiQQp4ERERC1LAi4iIWJACXkRExIIU8CIiIhakgBcREbEgBbyIiIgFKeBFREQsyKUBn5iYyD333MOqVasACA0N5aGHHmLIkCEMGTKEL774AoCoqCj69evHgAEDWL9+PQC5ubmEhIQwaNAggoODOXjwoCtLFRERsRS7q2acnZ3N7Nmz6dSpU4HpzzzzDN26dSvQ7rXXXiMyMhIPDw/69+9PUFAQ27dvp3r16ixatIgdO3awaNEiXn75ZVeVKyIiYikuO4L39PQkPDwch8NRbLu4uDhatWpFtWrV8PLyIjAwkNjYWKKjowkKCgKgc+fOxMbGuqpUERERy3HZEbzdbsduLzz7VatWsXz5cmrWrMn06dNJS0vDz8/PfN3Pz4/U1NQC093c3LDZbOTk5ODp6XnRZfr6emO3u5f+yohIheLvX628SxApdy4L+KI8/PDD+Pj40Lx5c9566y2WLFlC27ZtC7QxDKPI915s+vnS07NLpU4RqdhSUzPLuwSRMlHch9kyvYq+U6dONG/eHIDu3buTmJiIw+EgLS3NbHP06FEcDgcOh4PU1FTg7AV3hmEUe/QuIiIi/1OmAT927FjzaviYmBgaN25MmzZtiI+PJyMjg6ysLGJjY2nXrh1dunRh06ZNAGzfvp0OHTqUZakiIiIVmsuG6BMSEggLCyM5ORm73c7mzZsJDg7m6aefpnLlynh7ezNv3jy8vLwICQlh+PDh2Gw2Ro8eTbVq1ejZsye7du1i0KBBeHp6Mn/+fFeVKiIiYjk2oyQntysInXe
Ta0Wjrl3Ku4SrWtLXO8u7BJEycdWcgxcREZGyoYAXERGxIAW8iIiIBSngRURELEgBLyIiYkEKeBEREQtSwIuIiFiQAl5ERMSCFPAiIiIWpIAXERGxIAW8iIiIBSngRURELEgBLyIiYkEKeBEREQtSwIuIiFiQAl5ERMSCFPAiIiIWpIAXERGxIAW8iIiIBSngRURELEgBLyIiYkEKeBEREQtSwIuIiFiQAl5ERMSCFPAiIiIWpIAXERGxIAW8iIiIBSngRURELEgBLyIiYkEKeBEREQtSwIuIiFiQAl5ERMSC7K6ceWJiIqNGjWLo0KEEBwdz5MgRJk+eTF5eHna7nQULFuDv70+LFi0IDAw037dixQry8/MJDQ3l8OHDuLu7M2/ePG688UZXlisiImIZLjuCz87OZvbs2XTq1Mmc9vLLLzNw4EBWrVpFUFAQy5cvB6Bq1apERESY/9zd3fn444+pXr06a9asYeTIkSxatMhVpYqIiFiOywLe09OT8PBwHA6HOW3mzJncd999APj6+nL8+PGLvj86OpqgoCAAOnfuTGxsrKtKFRERsRyXBbzdbsfLy6vANG9vb9zd3XE6naxevZqHHnoIgJycHEJCQnj00UfNo/q0tDT8/PzOFunmhs1mIycnx1XlioiIWIpLz8EXxel0MnHiRDp27GgO30+cOJFevXphs9kIDg6mXbt2hd5nGMYl5+3r643d7l7qNYtIxeLvX628SxApd2Ue8JMnT6ZevXqMGTPGnDZo0CDz544dO5KYmIjD4SA1NZVmzZqRm5uLYRh4enoWO+/09GyX1S0iFUdqamZ5lyBSJor7MFumX5OLiorCw8ODp556ypy2f/9+QkJCMAyDvLw8YmNjady4MV26dGHTpk0AbN++nQ4dOpRlqSIiIhWay47gExISCAsLIzk5GbvdzubNmzl27BiVKlViyJAhANx0003MmjWL6667jv79++Pm5kb37t1p3bo1LVq0YNeuXQwaNAhPT0/mz5/vqlJFREQsx2aU5OR2BaFhOblWNOrapbxLuKolfb2zvEsQKRNXzRC9iIiIlA0FvIiIiAUp4EVERCxIAS8iImJBCngRERELUsCLiIhYkAJeRETEghTwIiIiFqSAFxERsSAFvIiIiAUp4EVERCxIAS8iImJBCngRERELUsCLiIhYUIkD/osvvmDVqlUA/PHHH1joKbMiIiKWU6KAX7BgAZGRkXzwwQcAfPTRR8yZM8elhYmIiMjlK1HAf/fddyxZsoQqVaoAMHr0aPbs2ePSwkREROTylSjgK1WqBIDNZgPA6XTidDpdV5WIiIhcEXtJGgUGBjJ58mSOHj3K8uXL2bJlC+3bt3d1bSIiInKZShTw48ePZ9OmTXh5efHnn38ybNgw7r33XlfXJiIiIpep2IA/fPiw+XPr1q1p3bp1gdfq1KnjuspERETkshUb8IMGDcJms2EYBkePHqVatWrk5eVx6tQpbrzxRrZs2VJWdYqIiMjfUGzAf/nllwDMnTuXPn36cPPNNwMQFxfHRx995PrqRERE5LKU6Cr6n376yQx3gDZt2pCUlOSyokREROTKlOgiOzc3NxYtWsStt96KzWbjhx9+4MyZM66uTURERC5TiY7gX375Zdzc3Fi7di1r1qwhNzeXV155xdW1iYiIyGUq0RF8zZo1GT9+fIFpYWFhTJo0ySVFiYiIyJUpUcDv3LmTl156iePHjwOQk5ODj4+PAl5EROQqVeIh+unTp1OzZk3eeOMN+vfvT2hoqKtrExERkctUooCvWrUqt9xyCx4eHjRu3Jhx48axfPlyV9cmIiIil6lEQ/R5eXns3r2b6tWr8+9//5ubbrqJQ4cOubo2ERERuUwlCvjnnnuOtLQ0Jk6cyOzZs0lLS2PkyJGurk1EREQuk80wDKO8iygtqamZ5V2CSJlo1LVLeZdwVUv6emd5lyBSJvz9q130tWKP4Lt3724+A74on3/+ebELTkxMZNSoUQw
dOpTg4GCOHDnCxIkTcTqd+Pv7s2DBAjw9PYmKimLlypW4ubkxcOBABgwYQG5uLqGhoRw+fBh3d3fmzZvHjTfeeIlVFREREbhEwK9YsQKAdevW4e/vT8eOHXE6nezcuZPs7OxiZ5ydnc3s2bPp1KmTOW3x4sUMHjyYHj168NJLLxEZGUnv3r157bXXiIyMxMPDg/79+xMUFMT27dupXr06ixYtYseOHSxatIiXX375ytdYRETkGlDsVfR169albt26/PTTTwwdOpRmzZrRokULnnjiCX7++ediZ+zp6Ul4eDgOh8OcFhMTw9133w1At27diI6OJi4ujlatWlGtWjW8vLwIDAwkNjaW6OhogoKCAOjcuTOxsbFXuq4iIiLXjBJdZHfs2DF27NhBYGAgbm5u/PDDDwWeFV/kjO127PaCsz916hSenp7A2bvjpaamkpaWhp+fn9nGz8+v0HQ3NzdsNhs5OTnm+4vi6+uN3e5eklUSEQsr7rykyLWiRAE/a9YsXnzxRRITEzEMg8aNGzN9+vQrWvDFru37u9PPl55e/GkDEbk26IJbuVZc9kV25wQGBrJ27dorLsTb25vTp0/j5eVFSkoKDocDh8NBWlqa2ebo0aPccsstOBwOUlNTadasGbm5uRiGUezRu4iIiPxPsQE/Z84cpk2bxuDBg4u8mv699977Wwvr3Lkzmzdv5uGHH2bLli107dqVNm3aMG3aNDIyMnB3dyc2NpYpU6Zw8uRJNm3aRNeuXdm+fTsdOnT4e2smIiJyDSs24Pv37w/A008//bdnnJCQQFhYGMnJydjtdjZv3szChQsJDQ1l3bp11KlTh969e+Ph4UFISAjDhw/HZrMxevRoqlWrRs+ePdm1axeDBg3C09OT+fPnX94aioiIXINKdKOb0NDQQgE7fPhw3n77bZcVdjl03k2uFbrRTfF0oxu5Vlz2OfioqCjWrl3Lvn37eOyxx8zpubm5HDt2rPQqFBERkVJVbMD36tWLDh06MGHCBMaOHWtOd3Nzo1GjRi4vTkRERC7PJa+iDwgIICIigszMTI4fP25Oz8zMxMfHx6XFiYiIyOUp0dfk5syZw4YNG/Dz8zO/j26z2S55L3oREREpHyUK+JiYGL755hsqVark6npERESkFBR7L/pz6tWrp3AXERGpQEp0BH/dddfx2GOPceutt+Lu/r97vY8bN85lhYmIiMjlK1HA+/j4FHjsq4iIiFzdShTwY8aMKTQtLCys1IsRERGR0lGigN+5cycvvfSS+TW5nJwcfHx8mDRpkkuLExERkctToovsXn75ZaZPn07NmjV544036N+/P6Ghoa6uTURERC5TiQK+atWq3HLLLXh4eNC4cWPGjRvH8uXLXV2biIiIXKYSDdHn5eWxe/duqlevzr///W9uuukmDh065OraRERE5DKVKOCfe+450tLSmDhxIrNnzyYtLY2RI0e6ujYRERG5TCV6XGxFocfFyrVCj4stnh4XK9eKy35c7Dl33nknNput0PQvvvjisosSERER1ylRwK9evdr8OTc3l+joaM6cOeOyokREROTKlCjgr7/++gK/169fn+HDhzN06FBX1CQiIiJXqEQBHx0dXeD3P//8kz/++MMlBYmIiMiVK1HAv/766+Y5eJvNRtWqVXnuuedcWpiIiIhcvkve6CYmJgbDMPjxxx/55ZdfMAyDQYMG0blzZwCysrJcXqSIiIj8PcUewW/atIklS5YQEhJCmzZtAIiPj2fhwoXk5OTQvXt3xowZo7vaiYiIXGWKDfh33nmH8PBwateubU678847ad68OePHj8fX15e0tDSXFykiIiJ/T7EBb7PZCoT7OQ6Hg1OnTvHss88yf/58lxUnIiIil6fYgD916tRFX8vKymLr1q1F3gBHREREylexF9m1bduWiIiIQtOXLVtG06ZNFe4iIiJXqWLvRX/y5EmeeOIJnE4nrVq1wjAMfvjhB9zd3Vm2bBk1atQoy1ovSfe
il2uF7kVfPN2LXq4Vl30v+qpVq7J69Wp27drFTz/9RKVKlQgKCqJjx46lXqSIiIiUnhLd6KZz587m995FRETk6nfJG92IiIhIxaOAFxERsSAFvIiIiAUp4EVERCyoRBfZlZb169cTFRVl/p6QkEDLli3Jzs7G29sbgEmTJtGyZUuWLVvGpk2bsNlsjBkzhjvvvLMsSxUREanQyjTgBwwYwIABAwD49ttv+eyzz0hKSmLevHk0adLEbHfw4EE+/fRT1q5dy8mTJxk8eDC333477u7uZVmuiIhIhVVuQ/SvvfYao0aNKvK1mJgYunbtiqenJ35+flx//fUkJSWVcYUiIiIVV5kewZ/z448/Urt2bfz9/QFYvHgx6enp3HTTTUyZMoW0tDT8/PzM9n5+fqSmptK0adNi5+vr643drqN8kWtdcXf3ErlWlEvAR0ZG0qdPHwAef/xxmjZtSt26dZk5cybvvfdeofbF3E23gPT07FKtU0QqJt22Wq4VxX2YLZch+piYGNq2bQtAUFAQdevWBaB79+4kJibicDgKPGc+JSUFh8NRHqWKiIhUSGUe8CkpKVSpUgVPT08Mw2Do0KFkZGQAZ4O/cePGdOzYkS+++IKcnBxSUlI4evQojRo1KutSRUREKqwyH6JPTU01z6/bbDYGDhzI0KFDqVy5MgEBAYwdO5bKlSszcOBAgoODsdlszJo1Czc3fWVfRESkpIp9XGxFo/NuVwc9yvTiSusxpurj4ulxsXKtuOrOwYuIiIhrKeBFREQsSAEvIiJiQQp4ERERC1LAi4iIWJACXkRExIIU8CIiIhakgBcREbEgBbyIiIgFKeBFREQsSAEvIiJiQQp4ERERC1LAi4iIWJACXkRExIIU8CIiIhakgBcREbEgBbyIiIgFKeBFREQsSAEvIiJiQQp4ERERC1LAi4iIWJACXkRExIIU8CIiIhakgBcREbEgBbyIiIgFKeBFREQsSAEvIiJiQQp4ERERC1LAi4iIWJACXkRExIIU8CIiIhakgBcREbEge1kuLCYmhnHjxtG4cWMAmjRpwogRI5g4cSJOpxN/f38WLFiAp6cnUVFRrFy5Ejc3NwYOHMiAAQPKslQREZEKreNkX5AAABnESURBVEwDHqB9+/YsXrzY/H3y5MkMHjyYHj168NJLLxEZGUnv3r157bXXiIyMxMPDg/79+xMUFISPj09ZlysiIlIhlfsQfUxMDHfffTcA3bp1Izo6mri4OFq1akW1atXw8vIiMDCQ2NjYcq5URESk4ijzI/ikpCRGjhzJiRMnGDNmDKdOncLT0xOAmjVrkpqaSlpaGn5+fuZ7/Pz8SE1NLetSRUREKqwyDfj69eszZswYevTowcGDB3n88cdxOp3m64ZhFPm+i02/kK+vN3a7e6nUKuIK/v7VyruEa4L6WaSMAz4gIICePXsCULduXWrVqkV8fDynT5/Gy8uLlJQUHA4HDoeDtLQ0831Hjx7llltuueT809OzXVa7SGlITc0s7xKuCepnuVYU92G2TM/BR0VF8fbbbwOQmprKsWPH6Nu3L5s3bwZgy5YtdO3alTZt2hAfH09GRgZZWVnExsbSrl27sixVRESkQivTI/ju3bszYcIEPv/8c3Jzc5k1axbNmzdn0qRJrFu3jjp16tC7d288PDwICQlh+PDh2Gw2Ro8eTbVqGnITEREpKZtR0hPcFYCG5a4Ojbp2Ke8SrlpJX+8slfmoj4tXWv0scrW7aoboRUREpGwo4EVERCxIAS8iImJBCngRERELUsCLiIhYkAJeRETEghTwIiIiFqSAFxERsSAFvIiIiAUp4EVERCxIAS8iImJBCngRERELUsCLiIhYkAJeRETEghTwIiIiFqSAFxERsSAFvIiIiAUp4EVERCxIAS8iImJBCngRERELUsCLiIhYkAJeRETEghTwIiIiFqSAFxERsSAFvIiIiAUp4EVERCxIAS8iImJBCngRERELUsCLiIhYkAJeRET
EghTwIiIiFqSAFxERsSAFvIiIiAXZy3qBL774It9//z15eXk8+eST/Oc//2HPnj34+PgAMHz4cO666y6ioqJYuXIlbm5uDBw4kAEDBpR1qSIiIhVWmQb8N998w759+1i3bh3p6en06dOHjh078swzz9CtWzezXXZ2Nq+99hqRkZF4eHjQv39/goKCzA8BIiIiUrwyDfjbbruN1q1bA1C9enVOnTqF0+ks1C4uLo5WrVpRrVo1AAIDA4mNjaV79+5lWa6IiEiFVaYB7+7ujre3NwCRkZHccccduLu7s2rVKpYvX07NmjWZPn06aWlp+Pn5me/z8/MjNTX1kvP39fXGbnd3Wf0iV8rfv1p5l3BNUD+LlMM5eIBt27YRGRnJO++8Q0JCAj4+PjRv3py33nqLJUuW0LZt2wLtDcMo0XzT07NdUa5IqUlNzSzvEq4J6ueKo1HXLuVdwlUr6eudl2xT3IfZMr+K/uuvv+aNN94gPDycatWq0alTJ5o3bw5A9+7dSUxMxOFwkJaWZr7n6NGjOByOsi5VRESkwirTgM/MzOTFF1/kzTffNC+YGzt2LAcPHgQgJiaGxo0b06ZNG+Lj48nIyCArK4vY2FjatWtXlqWKiIhUaGU6RP/pp5+Snp7O008/bU7r27cvTz/9NJUrV8bb25t58+bh5eVFSEgIw4cPx2azMXr0aPOCOxEREbk0m1HSE9wVgM67XR10Tu3iSnJOrSTUx8UrrX4W19O2fHEV7hy8iIiIuJ4CXkRExILK5Wty5UnDQRenYU2R/9G+onjaX1z9dAQvIiJiQQp4ERERC1LAi4iIWJACXkRExIIU8CIiIhakgBcREbEgBbyIiIgFKeBFREQsSAEvIiJiQQp4ERERC1LAi4iIWJACXkRExIIU8CIiIhakgBcREbEgBbyIiIgFKeBFREQsSAEvIiJiQQp4ERERC1LAi4iIWJACXkRExIIU8CIiIhakgBcREbEgBbyIiIgFKeBFREQsSAEvIiJiQQp4ERERC1LAi4iIWJACXkRExIIU8CIiIhZkL+8CivPCCy8QFxeHzWZjypQptG7durxLEhERqRCu2oD/9ttvOXDgAOvWrePXX39lypQprFu3rrzLEhERqRCu2iH66Oho7rnnHgBuuukmTpw4wcmTJ8u5KhERkYrhqg34tLQ0fH19zd/9/PxITU0tx4pEREQqjqt2iP5ChmFcso2/f7VLtjmx98fSKEeKoT52PfWx66mPy4b62XWu2iN4h8NBWlqa+fvRo0fx9/cvx4pEREQqjqs24Lt06cLmzZsB2LNnDw6Hg6pVq5ZzVSIiIhXDVTtEHxgYSIsWLXj00Uex2WzMnDmzvEsSERGpMGxGSU5ui4iISIVy1Q7Ri4iIyOVTwIuIiFjQVXsO/mqQlZXFpEmTOHHiBLm5uYwePZqZM2fy0UcfUaVKlWLbde7cmaFDh5ptjh49Sp8+fRg5cmSh5TidzhK3Bdi8eTPvvPMOHh4eBAQEMG/ePJxOJ6GhoRw7dowzZ84watQounXrVmp94SpF9d1bb71lvl5UX2RlZfHQQw+xdevWi/ZbYmIio0aNYujQoQQHBxdY5tdff82IESP45ZdfAFiyZAlff/01hmFw1113MWrUKADefvttoqKisNvtzJw5k9atW5OZmcn48eM5ceIEAQEBvPTSS3h6erqwh8pf9+7dC23zAC+++CLff/89eXl5PPnkk9x7770MGTKE6dOn06RJE7Od0+lkxowZ/P777+Tm5jJ48GB69+5Nbm4uoaGhHDhwgCpVqrB48WJq1KhBVFQUK1euxM3NjYEDBzJgwICyXuXLEhMTw7hx42jcuDEATZo0ITQ0tMh1/PTTT3nnnXdwc3OjU6dOjB8/npSUFKZMmUJOTg75+flMnjyZli1bFrmsDh06EBMTU2DasWPHmDRpEmfOnCE3N5fJkyfTpk0bl693ecrPz2fmzJns27cPDw8PZs2aRVpaGtOnT2f8+PG4ubk
V2ld6enoWeRv0mJgY832enp689dZbeHh44Ofnx4IFC6hUqRIjR44kOzubhx56iKioKLOOhIQEdu/e/bf242XCkIuKiIgwFi5caBiGYfz555/GfffdZ3Tr1s04efLkJdtdaPjw4cbhw4dLtNxLtb399tuNjIwMwzAMY9q0acbHH39sfPLJJ8Zbb71lGIZhHDp0yLj33ntLtKzydqm+K6ovTp48aXTr1q3QvM61zcrKMoKDg41p06YZERERBdqcPn3aCA4ONrp06WIYhmEcPHjQGDt2rGEYhpGXl2cEBQUZf/75p5GYmGj06dPHyM3NNRISEoxXXnnFMAzDCAsLM5YvX24YhmG8+uqrRlxc3JV3wlWuqG0+OjraGDFihGEYhvHXX38Zd955p2EYhhEcHGz88ssvBdr+5z//McaPH28YhmGcOnXK6NKli+F0Oo1Vq1YZs2fPNgzDMNauXWts27bNyMrKMu69914jIyPDOHXqlPHAAw8Y6enpLl7D0vHNN9+Y29I5Ra1jdna20a1bNyMzM9PIz883+vfvb+zbt8+YP3++sWbNGsMwDOP77783/vnPf150We3bty807Z133jGioqIMwzCMmJgYY9iwYaW1aletLVu2GOPGjTMMwzAOHDhgPPHEE8arr75q/o0Wta+MiYkxnnjiCcMwDCMpKckYOHCgYRhGgfc9/vjj5vtCQ0PNfm3Xrl2hGmJiYoxZs2YVmv539vmuoiP4Yvj6+ppHeRkZGfj6+pKSksKbb77J7t27cXd357XXXiuy3fl27dpF/fr1qV27NidPniQkJITs7GxOnz7N9OnTCzxEpyRtfXx8yMjIoFq1aubyOnfubM7jyJEjBAQElEEPXbni+u7Cvhg7dixnzpzh1ltvLTSf89vm5eURHh5OeHh4oXZvvPEGgwcPZsGCBQDccMMNLF68GIATJ05gs9moWrUqGzdupEePHtjtdlq0aEGLFi0A2L59O6tWrQJgzJgxpdsZLnTuaDk5OZlKlSrxwgsv8Pzzzxfatnbu3MlLL72Eu7s7PXv2NI9ILtzmb7vtNnO7rV69OqdOncLpdAIQGRnJzz//zKlTp3jllVfw9fUlIyOD/Px8srOzqVKlCm5ubmzfvp2nnnoKgEceeQQ4e4vqVq1aUa3a2ZtWBQYGEhsbS/fu3cu4x0pHUesIEBUVZX7t18fHh+PHj+Pr68vx48eBgn8LH374IREREbi5uTFs2DB69uwJwJw5c0hISKBmzZq8/PLLDBs2zJz/+fuA3bt389JLL2G326lduzazZ8/G09OT//u//2P37t04nU6Cg4N58MEHXd8hpez33383t8O6devyxRdfsHfvXipVqoTD4ShyX1nUbdD37t3LBx98gN1ux+FwsHLlSgDy8vJITU0lICCA+fPnk52dzYgRI1i2bJlZw2uvvcbChQsL1HX+/qg86Rx8MR544AEOHz5MUFAQwcHBTJo0CYCmTZuyevVqWrZsycaNGy/a7px3332Xxx9/HIDU1FQGDBhAREQEzzzzTKEQKknbadOm0adPH+6++27y8/MLhPujjz7KhAkTmDJlisv6pTQV13fn98XGjRtp3Lgxq1evpnnz5oXmc35bu92Ol5dXoTa//fYbe/fupUePHoVemzNnDg8++CCjRo2iSpUqJCcnc+TIEYYPH84//vEP9u7dC5y9hfKaNWsYPHgwM2bMICcnp1T6wdU+/PBDatWqxdq1axk4cCDbtm0rtG0ZhsFzzz1HeHg4a9asITo6mtOnTwOFt3l3d3e8vb2Bs4F+xx134O7uDkCtWrWIiIigd+/eREREcMstt1CnTh3uvvtu7rvvPiZMmABAcnIyX331FUOGDGH8+PEcP36ctLQ0/Pz8zLor2i2qk5KSGDlyJIMGDWLnzp1FriNghvsvv/xCcnIybdq0YejQoXz66afcf//9TJs2jXHjxnHy5Elef/113nvvPd5++20++ugjAI4fP86DDz7I2rVrcXd35+u
vvwbO7jP69evH0qVLefrpp4Gz2/brr7/Ou+++S82aNdm0aRO7d+8mOTmZ9957j3fffZelS5ea/9cVSZMmTdixYwdOp5P9+/dTuXJl7rrrLh5//HF69uxZ5L6yqNugV6pUiT59+pjvA/jggw+45557qFu3Lu3btyc0NJSqVasWCPcff/yR2rVrF7oJ2/n7o/KkgC/Gxo0bqVOnDlu3bmXlypU8//zzwNnzXwCtWrXit99+u2g7gJSUFLKzs6lbty5wdue3efNmBg0axMKFC80/+JK2zc/PZ86cOURGRrJt2zbc3Nz4/PPPzXmsXbuWpUuX8uyzz5bo9r7l7WJ9d2Ff/Prrr7Rt2xaA9u3bF5jHhW0vZt68eUyePLnI16ZNm8Znn33G22+/zcGDBzEMA6fTybJlyxg7dixTp04F4MyZM3Tp0oXVq1eTn5/P+vXrr2j9y8qePXsIDAwEzn6o6tu3b6Ft66+//qJSpUr4+fnh7u7Om2++aX5QunCbP2fbtm1ERkYyY8YMc9q5tq1bt+a3335j9+7dHDlyhK1bt/Lxxx+zcOFCcnJyMAyDBg0aEBERQePGjXnzzTcL1V0RtuFz6tevz5gxY1i6dClhYWFMnTqVnJyci67j77//zoQJE1i0aBEeHh4sW7aMHj16sGnTJmbPnk1YWBj79++nYcOGeHl5Ub16dZYuXQpApUqVuOWWW4CC/yf+/v5s2LCByZMnM3nyZNLS0jhw4ABjx45lyJAhxMTEkJKSQmxsLHFxcQwZMoThw4eTn59foT5InXPnnXfSqlUrHnvsMVauXEnDhg3NbeZS+8pzLraN9e3bl23btnHixAnzg9WFIiMj6dOnT4FpJd0flQUFfDFiY2O5/fbbAWjWrBlHjx7F6XRis9nMNjab7aLtAL788ks6duxotl+5ciUBAQGsWbOGWbNmFVheSdr+9ddfwNnhKJvNRqdOnUhISCAhIYEjR44A0Lx5c5xOp9n2anaxvruwLwzDwM3t7Oaan59fYB4Xti1KSkoK+/fvZ8KECQwcOJCjR48SHBzMkSNHiI+PB6BGjRoEBgYSHx9PrVq1uO2227DZbLRr147k5GQAateubX7Q6NKlC/v27SudjnAxd3f3Av1W1Lbl5uZWqG/PuXCbh7MXK77xxhuEh4ebQ+pFtY2NjaVTp07Y7XYCAgLw8fEhJSXF7GOA22+/naSkpCJvUe1wOK68A8pAQEAAPXv2xGazUbduXWrVqkV+fn6hdQT4888/GT16NPPnzzdHpGJjY+natStwdttKSEi46P/J+X187vdvv/2WEydOAGeDb8+ePXh4eOBwOIiIiCAiIoINGzbw//7f/8PT05P+/fub0z/77DNuvPFGl/WNK40fP561a9fy3HPPkZGRQc2aNYGL7ysvdRv0M2fO8NVXXwFnRwPvvvtuvv/++yKXHRMTY+4PzinJ/qisKOCLUa9ePeLi4oCzw4lVqlTB3d2d3bt3AxAXF0fDhg0v2g4gPj6eZs2amfNMT083P9lt27aN3Nxc87WStPX19eXEiRPmxhsfH0+9evXYvXs377zzDnB2GDk7O7vQtQBXo4v13YV90aBBAxISEgAKXT18YduiBAQEsG3bNt5//33ef/99HA4Hq1at4q+//mLWrFnk5eXhdDrZs2cPDRo04I477mDHjh3A2dGDc+fSOnTowDfffANgtq0IWrVqZda9fft2li5dWuS25XQ6SUlJwTAMnnzySTIyMgAKbfOZmZm8+OKLvPnmm/j4+BRY1rm2//3vf82/jx9/PPtAkZMnT5KSkoK/vz933HGHObR8ri/btGlDfHw8GRkZZGVlERsbS7t27VzfQaUgKiqKt99+Gzg7VH7s2DH69etXaB0Bpk6dyqxZs8xrO6Dg38KPP/5IvXr1aNiwIb/99htZWVmcOXOGYcOGYRgGp0+fNv8e4uLiuOmmm9iyZQv//ve/gbND/7Vr16ZGjRoA5geLiIgI9u7dS+vWrdm+fTv
5+fmcOXOG2bNnl0EPlb69e/eao3JfffUVN998s3kgcLF95aVug+7u7s706dNJSUkBzv5fFPV3npKSQpUqVQp9i6Yk+6OyoovsivHII48wZcoUgoODycvLY9asWUydOpV9+/axZs0aAMaOHWt+1eL8duekpqaanygBHn74YSZNmsSmTZt47LHH+Pjjj9mwYQP9+vUrUdsPP/yQGTNmMHLkSDw9Pbnhhht44IEHcDqdTJ06lcGDB3P69GlmzJhhbuhXs6L6GAr3W+/evRk9ejT/+Mc/Cl1kd2HbhIQEwsLCSE5Oxm63s3nzZl599dVCQQTQokUL7r33XgYNGmR+Te7cEdVXX31lXhh1bgj66aefZsKECSxevJhatWqZX6m72vXs2ZNdu3YRHByM3W5n+fLlzJw5s9B2OHPmTPOisB49elC9enWAQtv8J598Qnp6unmeFyAsLAw4+3WtESNGkJGRweLFi3E4HOzcuZNBgwaRn5/Ps88+i5eXF0OGDGHSpElERkbi7e1NWFgYXl5ehISEMHz4cGw2G6NHjy4wOnA16969OxMmTODzzz8nNzeXWbNm0b59+0LreO60xbmLOwGGDh3Kk08+ydSpU9m0aRNw9kOAt7c3Tz31lHkB3dChQ7HZbDgcDj766CPmzZtHzZo1uf3222nVqhWhoaFs3bqVnJwc829p7ty5TJ482Tyaf+SRR/D09KRDhw488sgjGIbB4MGDy7y/SkOTJk0wDIP+/ftTqVIlFi5cSGRkJHA2qIvaV3p4eBR7G3S73c7zzz/P6NGj8fT0pFatWowbN67QslNTUwtcL3L+9PP3R+VJt6oVERGxoKv/EE9ERET+NgW8iIiIBSngRURELEgBLyIiYkEKeBEREQvS1+RErnGHDh3i/vvvN2/YkZubS7t27Rg9ejSVK1cu5+pE5HLpCF5E8PPzM+9qtnLlSrKysggJCSnvskTkCugIXkQKqFSpElOmTOG+++7jvffeIyYmhhMnTjBs2DBatmzJ1KlTyc7OJicnhxEjRhAUFER6err55MP69etz+PBhRo4cSceOHZk5cyb79+8nJyeHNm3aMG3aNPMDREZGBnl5eXTr1o1//etf5b3qIpaigBeRQjw8PGjZsiVZWVn8/PPPfPLJJ3h6ejJjxgxuu+02RowYwbFjx+jVqxedOnVixYoVNG7cmMmTJ5OYmEjfvn2Bs4/gbdq0qXkr1Pvvv5/ExEQOHDhAXl6e+dCeiIgI8vPzK8TdF0UqCv01iUiRMjMzcXd35+abbzbvtx0XF0eXLl0AqFmzJgEBAeZjeM895a9JkybmvburV6/OkSNHeOSRRxgyZAipqamkp6cTGBhISkoK48aN48MPP2TAgAEKd5FSpr8oESnk1KlT/Pzzz9SoUQMPDw9z+oVPMTs37cKj73M/f/LJJ8THx/Pee+8RERFBvXr1gLMfDjZu3Mjjjz9OUlIS/fr1q5DPIxe5mingRaSA3Nxc5syZQ5cuXQodVbdp08Z8OlpKSgpHjx6lQYMGNGzYkB9++AE4++Sy/fv3A2cfPNOgQQPsdjsJCQn88ccf5OTksGPHDr744gtuvfVWJk6ciLe3N8eOHSvbFRWxOD1sRuQad/7X5JxOJxkZGXTp0oVnnnmGTz75hF27drFw4ULgbGBPnTrVfHzpv/71L7p168aff/7JU089hd1up1GjRiQmJjJ+/Hjq1q3LyJEjqVatGoGBgXh5ebFx40beeecdQkNDcTqduLu7ExgYyPjx48u5J0SsRQEvIlds//79HDx4kDvvvJPTp09zzz33EBkZyXXXXVfepYlcsxTwInLFUlNTmThxItnZ2eTl5fHwww/z+OOPl3dZItc0BbyIiIgF6SI7ERERC1LAi4iIWJACXkRExIIU8CIiIhakgBcREbEgBbyIiIgF/X/96srBv1RP4gAAAABJRU5ErkJggg==\n", + "text/plain": [ + "
" + ] + }, + "metadata": { + "tags": [] + } + } + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "hkROloRG076Q" + }, + "source": [ + "Visando deixar mais fácil, vou remover dos nomes das colunas o `-`, por exemplo: 'g-0', após a mudança ficará 'g0'" + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "TSYoGAon1UXE" + }, + "source": [ + "df_experimentos.columns = df_experimentos.columns.str.replace(\"-\", \"\",)" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "rGjBCv6PLsC-" + }, + "source": [ + "###Correlação" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "p7tyl_2tMmv_" + }, + "source": [ + "

\"O coeficiente de correlação de Pearson (r) varia entre -1 e +1, cujos valores próximos de -1 e +1 indicam forte correlação linear e próximos de 0 indicam ausência de correlação linear. Note que ele capta apenas relações lineares entre variáveis (quaisquer outras relações, tal coeficiente não é indicado: isso será exemplificado na sequência).\n", + "\n", + "Note que entre 0 e 1, existe uma grande gama de valores que o coeficiente pode assumir. Para tal, diferentes autores buscaram dar “nomes” aos diferentes valores que o coeficiente de correlação pode assumir, para poder dizer se um dado valor de correlação pode ser dito como fraco/moderada/forte.\"\n", + "\n", + "Fonte: https://gpestatistica.netlify.app/blog/correlacao/" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "tlHVpFA0LZ9I" + }, + "source": [ + "\"Correlação" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "T2aSfzmP1WoC" + }, + "source": [ + "Vamos verificar qual a correlação entre as colunas dos genes: `g50` até `g100`e viabilidade celular: `c50` até `c99` tem entre elas:" + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "Kg-tZNw_Karl" + }, + "source": [ + "corr_g = df_experimentos.loc[:,'g50':'g100'].corr()" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "code", + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/" + }, + "id": "c09_FaYu1oKL", + "outputId": "e9a832d5-83df-4ad9-8ae6-ff42111eefad" + }, + "source": [ + "# Generate a mask for the upper triangle\n", + "mask = np.triu(np.ones_like(corr_g, dtype=bool))\n", + "\n", + "# Set up the matplotlib figure\n", + "f, ax = plt.subplots(figsize=(11, 9))\n", + "\n", + "# Generate a custom diverging colormap\n", + "cmap = sns.color_palette(\"crest\", as_cmap=True)\n", + "\n", + "# Draw the heatmap with the mask and correct aspect ratio\n", + "sns.heatmap(corr_g, mask=mask, cmap=cmap, center=0,\n", + " square=True, linewidths=.5, cbar_kws={\"shrink\": .5})\n", + 
"plt.title('Correlações entre g50 até g100',fontdict={'fontsize':18, 'fontweight': 'bold'})\n", + "plt.show()" + ], + "execution_count": null, + "outputs": [ + { + "output_type": "display_data", + "data": { + "image/png": "\n", + "text/plain": [ + "

" + ] + }, + "metadata": { + "tags": [] + } + } + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "Zmpr2DANKfgl" + }, + "source": [ + "corr_c = df_experimentos.loc[:,'c49':'c99'].corr()" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "code", + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/" + }, + "id": "oWPoSM9q1wdr", + "outputId": "fe7e618f-97bb-43a7-94cf-a00d1653a0c0" + }, + "source": [ + "# Generate a mask for the upper triangle\n", + "mask = np.triu(np.ones_like(corr_c, dtype=bool))\n", + "\n", + "# Set up the matplotlib figure\n", + "f, ax = plt.subplots(figsize=(11, 9))\n", + "\n", + "# Generate a custom diverging colormap\n", + "cmap = sns.color_palette(\"crest\", as_cmap=True)\n", + "\n", + "# Draw the heatmap with the mask and correct aspect ratio\n", + "sns.heatmap(corr_c, mask=mask, cmap=cmap, center=0,\n", + " square=True, linewidths=.5, cbar_kws={\"shrink\": .5})\n", + "plt.title('Correlações entre c49 até c99',fontdict={'fontsize':18, 'fontweight': 'bold'})\n", + "plt.show()" + ], + "execution_count": null, + "outputs": [ + { + "output_type": "display_data", + "data": { + "image/png": "\n", + "text/plain": [ + "
" + ] + }, + "metadata": { + "tags": [] + } + } + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "2kzVUPeF2NrG" + }, + "source": [ + "##Base Resultados" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "DsCTrRSI2UQD" + }, + "source": [ + "Como já colocamos no `paredão` a ***base de experimento*** e exploramos, vamos colocar no `paredão` a ***base de resultados*** para explorar." + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "dPb5EVv-3auK" + }, + "source": [ + "Como fizemos na exploração anterior vamos ver se essa base tem a mesma **dimensão** que a base de experimento, para isso vamos utilizar o `.shape`" + ] + }, + { + "cell_type": "code", + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/" + }, + "id": "pORhjwKK2SuK", + "outputId": "bebedbac-893f-476c-9574-faa81e0f1478" + }, + "source": [ + "df_resultados.shape" + ], + "execution_count": null, + "outputs": [ + { + "output_type": "execute_result", + "data": { + "text/plain": [ + "(23814, 207)" + ] + }, + "metadata": { + "tags": [] + }, + "execution_count": 162 + } + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "8oh2U1jB5lBc" + }, + "source": [ + "Diferente da base de dados anterior, essa base tem menos colunas, um total de `207 colunas`. Vamos verificar as primeira linhas." + ] + }, + { + "cell_type": "code", + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/", + "height": 261 + }, + "id": "ucHK3IoW2C5b", + "outputId": "f37f19aa-ea0f-4898-bb99-fc94d5b2be11" + }, + "source": [ + "df_resultados.head()" + ], + "execution_count": null, + "outputs": [ + { + "output_type": "execute_result", + "data": { + "text/html": [ + "
" + ], + "text/plain": [ + " id ... wnt_inhibitor\n", + "0 id_000644bb2 ... 0\n", + "1 id_000779bfc ... 0\n", + "2 id_000a6266a ... 0\n", + "3 id_0015fd391 ... 0\n", + "4 id_001626bd3 ... 0\n", + "\n", + "[5 rows x 207 columns]" + ] + }, + "metadata": { + "tags": [] + }, + "execution_count": 163 + } + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "nFJFWfJu6NYh" + }, + "source": [ + "Vamos olhar os nomes das colunas e ver se conseguimos tirar alguma informação" + ] + }, + { + "cell_type": "code", + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/" + }, + "id": "05T4ix7742IV", + "outputId": "be7a4157-fc54-4dff-ed11-3a51136f47c8" + }, + "source": [ + "df_resultados.columns" + ], + "execution_count": null, + "outputs": [ + { + "output_type": "execute_result", + "data": { + "text/plain": [ + "Index(['id', '5-alpha_reductase_inhibitor', '11-beta-hsd1_inhibitor',\n", + " 'acat_inhibitor', 'acetylcholine_receptor_agonist',\n", + " 'acetylcholine_receptor_antagonist', 'acetylcholinesterase_inhibitor',\n", + " 'adenosine_receptor_agonist', 'adenosine_receptor_antagonist',\n", + " 'adenylyl_cyclase_activator',\n", + " ...\n", + " 'tropomyosin_receptor_kinase_inhibitor', 'trpv_agonist',\n", + " 'trpv_antagonist', 'tubulin_inhibitor', 'tyrosine_kinase_inhibitor',\n", + " 'ubiquitin_specific_protease_inhibitor', 'vegfr_inhibitor', 'vitamin_b',\n", + " 'vitamin_d_receptor_agonist', 'wnt_inhibitor'],\n", + " dtype='object', length=207)" + ] + }, + "metadata": { + "tags": [] + }, + "execution_count": 164 + } + ] + }, + { + "cell_type": "code", + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/" + }, + "id": "fvE2mHd744JV", + "outputId": "89d84d8b-1c08-44bd-e43f-af22e0f694ae" + }, + "source": [ + "#Verificar os tipos de dados das colunas do dataset\n", + "df_resultados.info()" + ], + "execution_count": null, + "outputs": [ + { + "output_type": "stream", + "text": [ + "\n", + "RangeIndex: 23814 entries, 0 to 23813\n", + "Columns: 207 
entries, id to wnt_inhibitor\n", + "dtypes: int64(206), object(1)\n", + "memory usage: 37.6+ MB\n" + ], + "name": "stdout" + } + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "jJo4VSSr6gc3" + }, + "source": [ + "In the `.head()` output we saw many 0s and 1s: 0 means the mechanism of action (MoA) was not activated, and 1 means it was. Note that a single experiment can activate none, one, or several MoAs." + ] + }, + { + "cell_type": "code", + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/", + "height": 350 + }, + "id": "Ih1mdvTYTo2G", + "outputId": "51170f43-d048-4469-d4c0-3a7aa72c3473" + }, + "source": [ + "# Count, mean, standard deviation, minimum, percentiles (25%, 50%, 75%) and maximum of each column\n", + "df_resultados.describe()" + ], + "execution_count": null, + "outputs": [ + { + "output_type": "execute_result", + "data": { + "text/html": [ + "
" + ], + "text/plain": [ + " 5-alpha_reductase_inhibitor ... wnt_inhibitor\n", + "count 23814.000000 ... 23814.000000\n", + "mean 0.000714 ... 0.001260\n", + "std 0.026709 ... 0.035472\n", + "min 0.000000 ... 0.000000\n", + "25% 0.000000 ... 0.000000\n", + "50% 0.000000 ... 0.000000\n", + "75% 0.000000 ... 0.000000\n", + "max 1.000000 ... 1.000000\n", + "\n", + "[8 rows x 206 columns]" + ] + }, + "metadata": { + "tags": [] + }, + "execution_count": 166 + } + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "pz7hfc2MTurg" + }, + "source": [ + "Note que o `Max` é sempre `1` e o` Min` é sempre `0`, pois nessa base está populada com ***0 e 1***" + ] + }, + { + "cell_type": "code", + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/" + }, + "id": "NS7Dpd-7TM_i", + "outputId": "65d3b12b-dd62-4bf4-8dab-29476cb3e399" + }, + "source": [ + "#Verificando se tem valores nulos\n", + "df_resultados.isnull().sum().sort_values(ascending = False)" + ], + "execution_count": null, + "outputs": [ + { + "output_type": "execute_result", + "data": { + "text/plain": [ + "wnt_inhibitor 0\n", + "cdk_inhibitor 0\n", + "dihydrofolate_reductase_inhibitor 0\n", + "cytochrome_p450_inhibitor 0\n", + "cyclooxygenase_inhibitor 0\n", + " ..\n", + "mtor_inhibitor 0\n", + "monopolar_spindle_1_kinase_inhibitor 0\n", + "monoamine_oxidase_inhibitor 0\n", + "monoacylglycerol_lipase_inhibitor 0\n", + "id 0\n", + "Length: 207, dtype: int64" + ] + }, + "metadata": { + "tags": [] + }, + "execution_count": 167 + } + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "G9fViN7wUAGb" + }, + "source": [ + "Vamos verificar se o `max` é `1` e o `min` é `0` escolhendo o primeiro MoA e olha seus possiveis valores utilizando o `.unique()`" + ] + }, + { + "cell_type": "code", + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/" + }, + "id": "n5LuaWqO45ti", + "outputId": "f25dc0d0-20b1-4873-cd85-f64ca3f82e2e" + }, + "source": [ + 
"df_resultados['acetylcholine_receptor_agonist'].unique()" + ], + "execution_count": null, + "outputs": [ + { + "output_type": "execute_result", + "data": { + "text/plain": [ + "array([0, 1])" + ] + }, + "metadata": { + "tags": [] + }, + "execution_count": 168 + } + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "az3vtIuVX684" + }, + "source": [ + "Vamos verificar qual foi o MoA que mais ativou, para isso vamos somar todas as ativações e ordenar em ordem decrescente" + ] + }, + { + "cell_type": "code", + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/" + }, + "id": "SaYxLI9YTNqu", + "outputId": "908ffa4c-21c0-40c7-c750-73bf6ea11856" + }, + "source": [ + "df_resultados.drop('id', axis=1).sum().sort_values(ascending=False)\n" + ], + "execution_count": null, + "outputs": [ + { + "output_type": "execute_result", + "data": { + "text/plain": [ + "nfkb_inhibitor 832\n", + "proteasome_inhibitor 726\n", + "cyclooxygenase_inhibitor 435\n", + "dopamine_receptor_antagonist 424\n", + "serotonin_receptor_antagonist 404\n", + " ... 
\n", + "protein_phosphatase_inhibitor 6\n", + "autotaxin_inhibitor 6\n", + "diuretic 6\n", + "erbb2_inhibitor 1\n", + "atp-sensitive_potassium_channel_antagonist 1\n", + "Length: 206, dtype: int64" + ] + }, + "metadata": { + "tags": [] + }, + "execution_count": 169 + } + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "vNmkS97yYY6f" + }, + "source": [ + "The three most activated MoAs (`nfkb_inhibitor`, `proteasome_inhibitor` and `cyclooxygenase_inhibitor`) all have `inhibitor` in their name" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "8qcqX6cZ3NSX" + }, + "source": [ + "Now let's create a `DataFrame` that stores each MoA and its number of activations" + ] + }, + { + "cell_type": "code", + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/", + "height": 402 + }, + "id": "t1iKH6x0X1ho", + "outputId": "52f4f1f5-23e1-4088-cf58-5fc8e97a8976" + }, + "source": [ + "moas = df_resultados.drop(['id'], axis = 1).sum().sort_values(ascending = False).reset_index()\n", + "moas.columns = ['moa', 'quantidade']\n", + "moas" + ], + "execution_count": null, + "outputs": [ + { + "output_type": "execute_result", + "data": { + "text/html": [ + "
" + ], + "text/plain": [ + " moa quantidade\n", + "0 nfkb_inhibitor 832\n", + "1 proteasome_inhibitor 726\n", + "2 cyclooxygenase_inhibitor 435\n", + "3 dopamine_receptor_antagonist 424\n", + "4 serotonin_receptor_antagonist 404\n", + ".. ... ...\n", + "201 protein_phosphatase_inhibitor 6\n", + "202 autotaxin_inhibitor 6\n", + "203 diuretic 6\n", + "204 erbb2_inhibitor 1\n", + "205 atp-sensitive_potassium_channel_antagonist 1\n", + "\n", + "[206 rows x 2 columns]" + ] + }, + "metadata": { + "tags": [] + }, + "execution_count": 170 + } + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "tV9yYIPT3ttK" + }, + "source": [ + "Agora vamos separar o primeiro nome do segundo para saber qual ação mais é ativada" + ] + }, + { + "cell_type": "code", + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/" + }, + "id": "_55-23QwjOcg", + "outputId": "de5a0dd2-de2e-4f5f-e08e-5cc7d2795749" + }, + "source": [ + "rename = moas['moa'].str.split('_')\n", + "list_rename=[]\n", + "for i in rename:\n", + " list_rename.append(i[-1])\n", + "\n", + "list_rename[:10]" + ], + "execution_count": null, + "outputs": [ + { + "output_type": "execute_result", + "data": { + "text/plain": [ + "['inhibitor',\n", + " 'inhibitor',\n", + " 'inhibitor',\n", + " 'antagonist',\n", + " 'antagonist',\n", + " 'inhibitor',\n", + " 'antagonist',\n", + " 'antagonist',\n", + " 'inhibitor',\n", + " 'inhibitor']" + ] + }, + "metadata": { + "tags": [] + }, + "execution_count": 171 + } + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "bS_cxZiv4WeJ" + }, + "source": [ + "Vamos armazenar esse resultado em uma nova coluna nesse novo DataFrame que foi criado chamado moas." 
+ ] + }, + { + "cell_type": "code", + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/", + "height": 195 + }, + "id": "Wj8UNJPFeBVL", + "outputId": "2b867b49-4cce-4f78-f0e5-834981b4053c" + }, + "source": [ + "moas['action'] = list_rename\n", + "\n", + "moas.head()" + ], + "execution_count": null, + "outputs": [ + { + "output_type": "execute_result", + "data": { + "text/html": [ + "
" + ], + "text/plain": [ + " moa quantidade action\n", + "0 nfkb_inhibitor 832 inhibitor\n", + "1 proteasome_inhibitor 726 inhibitor\n", + "2 cyclooxygenase_inhibitor 435 inhibitor\n", + "3 dopamine_receptor_antagonist 424 antagonist\n", + "4 serotonin_receptor_antagonist 404 antagonist" + ] + }, + "metadata": { + "tags": [] + }, + "execution_count": 172 + } + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "fMWJ91Zf5Zmv" + }, + "source": [ + "Agora vamos agrupar pela coluna `Action` e somar todos que tem o mesma ação e visualizar as 5 ações que mais ativaram" + ] + }, + { + "cell_type": "code", + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/", + "height": 225 + }, + "id": "8HryriWqdr3a", + "outputId": "e4411f28-b271-4085-e95c-b161f83c9251" + }, + "source": [ + "rank_moa = moas.groupby(by=['action']).agg({'quantidade': 'sum'}).sort_values('quantidade', ascending=False)[:5]\n", + "rank_moa" + ], + "execution_count": null, + "outputs": [ + { + "output_type": "execute_result", + "data": { + "text/html": [ + "
" + ], + "text/plain": [ + " quantidade\n", + "action \n", + "inhibitor 9693\n", + "antagonist 3449\n", + "agonist 2330\n", + "blocker 323\n", + "agent 150" + ] + }, + "metadata": { + "tags": [] + }, + "execution_count": 173 + } + ] + }, + { + "cell_type": "code", + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/", + "height": 433 + }, + "id": "GXsv3y7NkJZS", + "outputId": "4ad01dd9-0168-4ede-a533-76d6ffb60510" + }, + "source": [ + "plt.style.use('seaborn')\n", + "plt.figure(figsize=(15,12))\n", + "rank_moa.plot.bar(color='#092A32')\n", + "plt.title('MoA mais ativados',fontdict={'fontsize':18, 'fontweight': 'bold'},)\n", + "plt.xlabel('Ação') \n", + "plt.ylabel('Quantidade') \n", + "plt.show()" + ], + "execution_count": null, + "outputs": [ + { + "output_type": "display_data", + "data": { + "text/plain": [ + "
" + ] + }, + "metadata": { + "tags": [] + } + }, + { + "output_type": "display_data", + "data": { + "image/png": "\n", + "text/plain": [ + "
" + ] + }, + "metadata": { + "tags": [] + } + } + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "IDa45JA8po48" + }, + "source": [ + "#***Pré-processamento dos Dados***" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "1CfURjV8FKMF" + }, + "source": [ + "Agora que já colocamos as duas bases no `paredawn`, vamos preparar nossos dados para a criação da máquina preditiva" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "Ipl6TKtXp0ZB" + }, + "source": [ + "##Transformação das colunas" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "npQBCR0nIdRh" + }, + "source": [ + "Vamos transformar as colunas que possuem o tipo de dados string em `1-0`\n", + "\n", + "Para a Base Experimentos vamos transformar as colunas: `tempo`, `tratamento`, `dose` e criar as colunas para armazenar essas transformaçoes em `tempo_24` , `tempo_48`, `tempo_72`, `dose`." + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "aOWAzIDqrob8" + }, + "source": [ + "###Base Experimentos" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "Cu1W6YFuFjfa" + }, + "source": [ + "Vamos separar a coluna tempo em 3 `tempo_24`, `tempo_48`, `tempo_72`, cada coluna será uma coluna binária (`0-1`).\n", + "\n", + "Caso seja o tempo que esteja no nome da coluna, será identificado como `1`, caso não é `0`" + ] + }, + { + "cell_type": "code", + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/" + }, + "id": "Od1Tk3sgtBCR", + "outputId": "125b54ed-7649-4af7-ab83-29ffe67d8a59" + }, + "source": [ + "tempo24 = df_experimentos['tempo'] == 24\n", + "tempo24" + ], + "execution_count": null, + "outputs": [ + { + "output_type": "execute_result", + "data": { + "text/plain": [ + "0 True\n", + "1 False\n", + "2 False\n", + "3 False\n", + "4 False\n", + " ... 
\n", + "23809 True\n", + "23810 True\n", + "23811 False\n", + "23812 True\n", + "23813 False\n", + "Name: tempo, Length: 23814, dtype: bool" + ] + }, + "metadata": { + "tags": [] + }, + "execution_count": 175 + } + ] + }, + { + "cell_type": "code", + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/" + }, + "id": "pKXKf2gHtTMI", + "outputId": "3c0d41f7-c45b-4f67-b984-23fa1567ac0c" + }, + "source": [ + "tempo48 = df_experimentos['tempo'] == 48\n", + "tempo48" + ], + "execution_count": null, + "outputs": [ + { + "output_type": "execute_result", + "data": { + "text/plain": [ + "0 False\n", + "1 False\n", + "2 True\n", + "3 True\n", + "4 False\n", + " ... \n", + "23809 False\n", + "23810 False\n", + "23811 True\n", + "23812 False\n", + "23813 False\n", + "Name: tempo, Length: 23814, dtype: bool" + ] + }, + "metadata": { + "tags": [] + }, + "execution_count": 176 + } + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "Aib5Gs_ktXMc" + }, + "source": [ + "tempo72 = df_experimentos['tempo'] == 72" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "code", + "metadata": { + "id": "QW6yus0ntafs" + }, + "source": [ + "com_droga = df_experimentos['tratamento'] =='com_droga'" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "code", + "metadata": { + "id": "dfYMlqHCtiDs" + }, + "source": [ + "D1 = df_experimentos['dose'] == 'D1'" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "code", + "metadata": { + "id": "GWYnF72BtZiQ" + }, + "source": [ + "df_experimentos['tempo_24'] = tempo24.astype('int')\n", + "df_experimentos['tempo_48'] = tempo48.astype('int')\n", + "df_experimentos['tempo_72'] = tempo72.astype('int')\n", + "df_experimentos['com_droga'] = com_droga.astype('int')\n", + "df_experimentos['dose'] = D1.astype('int')" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "code", + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/" + }, + "id": 
"Vqed45VLtldV", + "outputId": "1d55bc76-57f0-4514-a8a7-3e558b68fb9a" + }, + "source": [ + "df_experimentos.head()" + ], + "execution_count": null, + "outputs": [ + { + "output_type": "execute_result", + "data": { + "text/html": [ + "
[HTML rendering of df_experimentos.head() omitted: 5 rows × 881 columns (`id`, `tratamento`, `tempo`, `dose`, `droga`, `g0` onward through `c99`, then `tempo_24`, `tempo_48`, `tempo_72`, `com_droga`); the text/plain version of the same table follows]
" + ], + "text/plain": [ + " id tratamento tempo ... tempo_48 tempo_72 com_droga\n", + "0 id_000644bb2 com_droga 24 ... 0 0 1\n", + "1 id_000779bfc com_droga 72 ... 0 1 1\n", + "2 id_000a6266a com_droga 48 ... 1 0 1\n", + "3 id_0015fd391 com_droga 48 ... 1 0 1\n", + "4 id_001626bd3 com_droga 72 ... 0 1 1\n", + "\n", + "[5 rows x 881 columns]" + ] + }, + "metadata": { + "tags": [] + }, + "execution_count": 181 + } + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "uk0Lmk35tzX0" + }, + "source": [ + "#df_experimentos.drop(['tratamento', 'tempo', 'dose'], axis=1, inplace=True)" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "code", + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/" + }, + "id": "IcZiA6HzuB98", + "outputId": "41dfd639-a6e7-4029-8fec-09744e4d94a1" + }, + "source": [ + "df_experimentos.head()" + ], + "execution_count": null, + "outputs": [ + { + "output_type": "execute_result", + "data": { + "text/html": [ + "
[HTML rendering of df_experimentos.head() omitted: 5 rows × 881 columns, identical to the previous preview because the drop call above is commented out; the text/plain version follows]
" + ], + "text/plain": [ + " id tratamento tempo ... tempo_48 tempo_72 com_droga\n", + "0 id_000644bb2 com_droga 24 ... 0 0 1\n", + "1 id_000779bfc com_droga 72 ... 0 1 1\n", + "2 id_000a6266a com_droga 48 ... 1 0 1\n", + "3 id_0015fd391 com_droga 48 ... 1 0 1\n", + "4 id_001626bd3 com_droga 72 ... 0 1 1\n", + "\n", + "[5 rows x 881 columns]" + ] + }, + "metadata": { + "tags": [] + }, + "execution_count": 183 + } + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "43LHzcC2rj71" + }, + "source": [ + "###Base Resultados" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "TEvh9tdUYwvv" + }, + "source": [ + "Na Base de Resultados vamos adicionar duas colunas, uma coluna será a soma de todos MoAs que ativaram.\n", + "\n", + "A outra coluna que contenha um True `1` ou False `0`, se foi ativado pelo menos uma vez ira ser 1, caso não tenha ativado nenhum, irá ser `0` . Assim, definiremos a nossa `variável target` para prevermos do `Objetivo 1`. \n", + "\n" + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "1I6XNePZp4Ws" + }, + "source": [ + "df_resultados['qtd_moa'] = df_resultados.drop('id', axis=1).sum(axis=1)" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "code", + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/" + }, + "id": "DOzojA4IqMv1", + "outputId": "bd884f78-3de4-47d0-c87f-e98962e53c8a" + }, + "source": [ + "#visualizando quantos MoAs foram ativados\n", + "df_resultados['qtd_moa'].sort_values(ascending=False)" + ], + "execution_count": null, + "outputs": [ + { + "output_type": "execute_result", + "data": { + "text/plain": [ + "21197 7\n", + "4849 7\n", + "19186 7\n", + "14316 7\n", + "20584 7\n", + " ..\n", + "6862 0\n", + "6861 0\n", + "11997 0\n", + "6859 0\n", + "23813 0\n", + "Name: qtd_moa, Length: 23814, dtype: int64" + ] + }, + "metadata": { + "tags": [] + }, + "execution_count": 185 + } + ] + }, + { + "cell_type": "code", + "metadata": { + "colab": { + "base_uri": 
"https://localhost:8080/" + }, + "id": "b1_TK56bqZhF", + "outputId": "6329c8b9-494b-4ee8-fd1f-de9f64c7cbb8" + }, + "source": [ + "#observando a nova coluna criada\n", + "df_resultados.head()" + ], + "execution_count": null, + "outputs": [ + { + "output_type": "execute_result", + "data": { + "text/html": [ + "
[HTML rendering of df_resultados.head() omitted: 5 rows × 208 columns (`id`, one 0/1 column per MoA, and the new `qtd_moa`); the text/plain version of the same table follows]
" + ], + "text/plain": [ + " id 5-alpha_reductase_inhibitor ... wnt_inhibitor qtd_moa\n", + "0 id_000644bb2 0 ... 0 1\n", + "1 id_000779bfc 0 ... 0 0\n", + "2 id_000a6266a 0 ... 0 3\n", + "3 id_0015fd391 0 ... 0 0\n", + "4 id_001626bd3 0 ... 0 1\n", + "\n", + "[5 rows x 208 columns]" + ] + }, + "metadata": { + "tags": [] + }, + "execution_count": 186 + } + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "HXdBJ3E-bZkT" + }, + "source": [ + "Criando a coluna que irá mostrar se houve pelo menos uma ativivação ou não, caso tenha pelo menos uma ativação será True, caso não False." + ] + }, + { + "cell_type": "code", + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/" + }, + "id": "fAnC5VWMr61d", + "outputId": "10a1910d-0221-4925-8603-f6fb5c9829a2" + }, + "source": [ + "df_resultados['qtd_moa'] != 0" + ], + "execution_count": null, + "outputs": [ + { + "output_type": "execute_result", + "data": { + "text/plain": [ + "0 True\n", + "1 False\n", + "2 True\n", + "3 False\n", + "4 True\n", + " ... 
\n", + "23809 True\n", + "23810 True\n", + "23811 False\n", + "23812 True\n", + "23813 False\n", + "Name: qtd_moa, Length: 23814, dtype: bool" + ] + }, + "metadata": { + "tags": [] + }, + "execution_count": 187 + } + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "JeAwyRVcnZpN" + }, + "source": [ + "Let's store this result as a new column, `tem_moa`, in the `resultados` dataset." + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "bPiz6ljJrgnc" + }, + "source": [ + "df_resultados['tem_moa'] = df_resultados['qtd_moa'] != 0" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "code", + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/" + }, + "id": "j-Kiw6tFsM5E", + "outputId": "07162e61-0107-4e1a-dfe8-40cbf1f5a763" + }, + "source": [ + "# Checking the new 'tem_moa' column\n", + "df_resultados.head()" + ], + "execution_count": null, + "outputs": [ + { + "output_type": "execute_result", + "data": { + "text/html": [ + "
\n", + "\n", + "\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " 
\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " 
\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
id5-alpha_reductase_inhibitor11-beta-hsd1_inhibitoracat_inhibitoracetylcholine_receptor_agonistacetylcholine_receptor_antagonistacetylcholinesterase_inhibitoradenosine_receptor_agonistadenosine_receptor_antagonistadenylyl_cyclase_activatoradrenergic_receptor_agonistadrenergic_receptor_antagonistakt_inhibitoraldehyde_dehydrogenase_inhibitoralk_inhibitorampk_activatoranalgesicandrogen_receptor_agonistandrogen_receptor_antagonistanesthetic_-_localangiogenesis_inhibitorangiotensin_receptor_antagonistanti-inflammatoryantiarrhythmicantibioticanticonvulsantantifungalantihistamineantimalarialantioxidantantiprotozoalantiviralapoptosis_stimulantaromatase_inhibitoratm_kinase_inhibitoratp-sensitive_potassium_channel_antagonistatp_synthase_inhibitoratpase_inhibitoratr_kinase_inhibitoraurora_kinase_inhibitor...radiopaque_mediumraf_inhibitorras_gtpase_inhibitorretinoid_receptor_agonistretinoid_receptor_antagonistrho_associated_kinase_inhibitorribonucleoside_reductase_inhibitorrna_polymerase_inhibitorserotonin_receptor_agonistserotonin_receptor_antagonistserotonin_reuptake_inhibitorsigma_receptor_agonistsigma_receptor_antagonistsmoothened_receptor_antagonistsodium_channel_inhibitorsphingosine_receptor_agonistsrc_inhibitorsteroidsyk_inhibitortachykinin_antagonisttgf-beta_receptor_inhibitorthrombin_inhibitorthymidylate_synthase_inhibitortlr_agonisttlr_antagonisttnf_inhibitortopoisomerase_inhibitortransient_receptor_potential_channel_antagonisttropomyosin_receptor_kinase_inhibitortrpv_agonisttrpv_antagonisttubulin_inhibitortyrosine_kinase_inhibitorubiquitin_specific_protease_inhibitorvegfr_inhibitorvitamin_bvitamin_d_receptor_agonistwnt_inhibitorqtd_moatem_moa
0id_000644bb2000000000000000000000000000000000000000...000000000000000000000000000000000000001True
1id_000779bfc000000000000000000000000000000000000000...000000000000000000000000000000000000000False
2id_000a6266a000000000000000000000000000000000000000...000000000000000000000000000000000000003True
3id_0015fd391000000000000000000000000000000000000000...000000000000000000000000000000000000000False
4id_001626bd3000000000000000000000000000000000000000...000000000000000000000000000000000000001True
\n", + "

5 rows × 209 columns

\n", + "
" + ], + "text/plain": [ + " id 5-alpha_reductase_inhibitor ... qtd_moa tem_moa\n", + "0 id_000644bb2 0 ... 1 True\n", + "1 id_000779bfc 0 ... 0 False\n", + "2 id_000a6266a 0 ... 3 True\n", + "3 id_0015fd391 0 ... 0 False\n", + "4 id_001626bd3 0 ... 1 True\n", + "\n", + "[5 rows x 209 columns]" + ] + }, + "metadata": { + "tags": [] + }, + "execution_count": 189 + } + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "t6izWH4KnvJX" + }, + "source": [ + "Mas pensando na nossa Maquina Preditiva, não é benéfico deixar em `True` e `False`, vamos transformar em `0` para False e `1` Para True. \n", + "\n", + "Para isso, vamos utilizar a Biblioteca `sklearn.preprocessing` e importar o `LabelEncoder` para fazer essa transformação" + ] + }, + { + "cell_type": "code", + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/" + }, + "id": "u_JLcH6OsR61", + "outputId": "48742457-0802-4047-f4e7-533626ebd698" + }, + "source": [ + "#Aplicando a transformação\n", + "labelEncoder = LabelEncoder()\n", + "tem_moa = labelEncoder.fit_transform(df_resultados.tem_moa)\n", + "\n", + "df_resultados['atv_moa'] = tem_moa\n", + "#Visualizando a transformação\n", + "df_resultados.head()" + ], + "execution_count": null, + "outputs": [ + { + "output_type": "execute_result", + "data": { + "text/html": [ + "
\n", + "\n", + "\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " 
\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " 
\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
id5-alpha_reductase_inhibitor11-beta-hsd1_inhibitoracat_inhibitoracetylcholine_receptor_agonistacetylcholine_receptor_antagonistacetylcholinesterase_inhibitoradenosine_receptor_agonistadenosine_receptor_antagonistadenylyl_cyclase_activatoradrenergic_receptor_agonistadrenergic_receptor_antagonistakt_inhibitoraldehyde_dehydrogenase_inhibitoralk_inhibitorampk_activatoranalgesicandrogen_receptor_agonistandrogen_receptor_antagonistanesthetic_-_localangiogenesis_inhibitorangiotensin_receptor_antagonistanti-inflammatoryantiarrhythmicantibioticanticonvulsantantifungalantihistamineantimalarialantioxidantantiprotozoalantiviralapoptosis_stimulantaromatase_inhibitoratm_kinase_inhibitoratp-sensitive_potassium_channel_antagonistatp_synthase_inhibitoratpase_inhibitoratr_kinase_inhibitoraurora_kinase_inhibitor...raf_inhibitorras_gtpase_inhibitorretinoid_receptor_agonistretinoid_receptor_antagonistrho_associated_kinase_inhibitorribonucleoside_reductase_inhibitorrna_polymerase_inhibitorserotonin_receptor_agonistserotonin_receptor_antagonistserotonin_reuptake_inhibitorsigma_receptor_agonistsigma_receptor_antagonistsmoothened_receptor_antagonistsodium_channel_inhibitorsphingosine_receptor_agonistsrc_inhibitorsteroidsyk_inhibitortachykinin_antagonisttgf-beta_receptor_inhibitorthrombin_inhibitorthymidylate_synthase_inhibitortlr_agonisttlr_antagonisttnf_inhibitortopoisomerase_inhibitortransient_receptor_potential_channel_antagonisttropomyosin_receptor_kinase_inhibitortrpv_agonisttrpv_antagonisttubulin_inhibitortyrosine_kinase_inhibitorubiquitin_specific_protease_inhibitorvegfr_inhibitorvitamin_bvitamin_d_receptor_agonistwnt_inhibitorqtd_moatem_moaatv_moa
0id_000644bb2000000000000000000000000000000000000000...00000000000000000000000000000000000001True1
1id_000779bfc000000000000000000000000000000000000000...00000000000000000000000000000000000000False0
2id_000a6266a000000000000000000000000000000000000000...00000000000000000000000000000000000003True1
3id_0015fd391000000000000000000000000000000000000000...00000000000000000000000000000000000000False0
4id_001626bd3000000000000000000000000000000000000000...00000000000000000000000000000000000001True1
\n", + "

5 rows × 210 columns

\n", + "
" + ], + "text/plain": [ + " id 5-alpha_reductase_inhibitor ... tem_moa atv_moa\n", + "0 id_000644bb2 0 ... True 1\n", + "1 id_000779bfc 0 ... False 0\n", + "2 id_000a6266a 0 ... True 1\n", + "3 id_0015fd391 0 ... False 0\n", + "4 id_001626bd3 0 ... True 1\n", + "\n", + "[5 rows x 210 columns]" + ] + }, + "metadata": { + "tags": [] + }, + "execution_count": 190 + } + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "RFQKd74oocNQ" + }, + "source": [ + "Vamos remover a coluna `tem_moa`, pois já temos a informação que queremos na coluna nova chamada `atv_moa` ( que mostra se foi ativado pelo menos uma vez ou não)." + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "vX-ZBHRssfjU" + }, + "source": [ + "df_resultados.drop('tem_moa', inplace=True, axis=1)" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "code", + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/" + }, + "id": "7SU36uSKsut8", + "outputId": "f2f33bb0-34cf-4072-e0b6-42b4c3499d22" + }, + "source": [ + "#verificando o DataFrame sem a antiga coluna\n", + "df_resultados.head()" + ], + "execution_count": null, + "outputs": [ + { + "output_type": "execute_result", + "data": { + "text/html": [ + "
\n", + "\n", + "\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " 
\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " 
\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
id5-alpha_reductase_inhibitor11-beta-hsd1_inhibitoracat_inhibitoracetylcholine_receptor_agonistacetylcholine_receptor_antagonistacetylcholinesterase_inhibitoradenosine_receptor_agonistadenosine_receptor_antagonistadenylyl_cyclase_activatoradrenergic_receptor_agonistadrenergic_receptor_antagonistakt_inhibitoraldehyde_dehydrogenase_inhibitoralk_inhibitorampk_activatoranalgesicandrogen_receptor_agonistandrogen_receptor_antagonistanesthetic_-_localangiogenesis_inhibitorangiotensin_receptor_antagonistanti-inflammatoryantiarrhythmicantibioticanticonvulsantantifungalantihistamineantimalarialantioxidantantiprotozoalantiviralapoptosis_stimulantaromatase_inhibitoratm_kinase_inhibitoratp-sensitive_potassium_channel_antagonistatp_synthase_inhibitoratpase_inhibitoratr_kinase_inhibitoraurora_kinase_inhibitor...radiopaque_mediumraf_inhibitorras_gtpase_inhibitorretinoid_receptor_agonistretinoid_receptor_antagonistrho_associated_kinase_inhibitorribonucleoside_reductase_inhibitorrna_polymerase_inhibitorserotonin_receptor_agonistserotonin_receptor_antagonistserotonin_reuptake_inhibitorsigma_receptor_agonistsigma_receptor_antagonistsmoothened_receptor_antagonistsodium_channel_inhibitorsphingosine_receptor_agonistsrc_inhibitorsteroidsyk_inhibitortachykinin_antagonisttgf-beta_receptor_inhibitorthrombin_inhibitorthymidylate_synthase_inhibitortlr_agonisttlr_antagonisttnf_inhibitortopoisomerase_inhibitortransient_receptor_potential_channel_antagonisttropomyosin_receptor_kinase_inhibitortrpv_agonisttrpv_antagonisttubulin_inhibitortyrosine_kinase_inhibitorubiquitin_specific_protease_inhibitorvegfr_inhibitorvitamin_bvitamin_d_receptor_agonistwnt_inhibitorqtd_moaatv_moa
0id_000644bb2000000000000000000000000000000000000000...0000000000000000000000000000000000000011
1id_000779bfc000000000000000000000000000000000000000...0000000000000000000000000000000000000000
2id_000a6266a000000000000000000000000000000000000000...0000000000000000000000000000000000000031
3id_0015fd391000000000000000000000000000000000000000...0000000000000000000000000000000000000000
4id_001626bd3000000000000000000000000000000000000000...0000000000000000000000000000000000000011
\n", + "

5 rows × 209 columns

\n", + "
" + ], + "text/plain": [ + " id 5-alpha_reductase_inhibitor ... qtd_moa atv_moa\n", + "0 id_000644bb2 0 ... 1 1\n", + "1 id_000779bfc 0 ... 0 0\n", + "2 id_000a6266a 0 ... 3 1\n", + "3 id_0015fd391 0 ... 0 0\n", + "4 id_001626bd3 0 ... 1 1\n", + "\n", + "[5 rows x 209 columns]" + ] + }, + "metadata": { + "tags": [] + }, + "execution_count": 192 + } + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "HzIIkz2gpttr" + }, + "source": [ + "##Junção das Bases" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "KMtPY2o7o4VS" + }, + "source": [ + "Já transformamos todas as colunas que queríamos, agora vamos juntar as duas bases para conseguirmos ter uma melhor análise e cruzar melhor as colunas para tirar insights" + ] + }, + { + "cell_type": "code", + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/", + "height": 244 + }, + "id": "SxJVtYtXnBMf", + "outputId": "d1c352a0-5530-41fe-b104-d42eb7649094" + }, + "source": [ + "df = pd.merge(df_experimentos, df_resultados[['id','qtd_moa', 'atv_moa',]], on='id')\n", + "df.head()" + ], + "execution_count": null, + "outputs": [ + { + "output_type": "execute_result", + "data": { + "text/html": [ + "
\n", + "\n", + "\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " 
\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " 
\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
idtratamentotempodosedrogag0g1g2g3g4g5g6g7g8g9g10g11g12g13g14g15g16g17g18g19g20g21g22g23g24g25g26g27g28g29g30g31g32g33g34...c66c67c68c69c70c71c72c73c74c75c76c77c78c79c80c81c82c83c84c85c86c87c88c89c90c91c92c93c94c95c96c97c98c99tempo_24tempo_48tempo_72com_drogaqtd_moaatv_moa
0id_000644bb2com_droga241b68db1d531.06200.5577-0.2479-0.6208-0.1944-1.0120-1.0220-0.03260.5548-0.09211.18300.15300.5574-0.40150.1789-0.6528-0.79690.63420.1778-0.3694-0.5688-1.1360-1.18800.69400.43930.26640.19070.1628-0.28530.58190.2934-0.5584-0.0916-0.3010-0.1537...0.17581.2570-0.59791.2250-0.05530.73510.58100.95900.24270.04950.41410.84320.6162-0.73181.21200.6362-0.44270.12881.48400.17990.5367-0.1111-1.01200.66850.28620.25840.80760.5523-0.19120.6584-0.39810.21390.38010.4176100111
1id_000779bfccom_droga721df89a8e5a0.07430.40870.29910.06041.01900.52070.23410.3372-0.40470.8507-1.1520-0.4201-0.09580.45900.08030.22500.52930.2839-0.34940.28830.9449-0.1646-0.2657-0.33720.3135-0.43160.47730.2075-0.4216-0.1161-0.0499-0.26270.9959-0.24830.2655...-0.2840-0.3104-0.63730.2887-0.07650.25390.44430.59320.20310.76390.5499-0.3322-0.09770.4329-0.27820.78270.59340.34020.14990.44200.93660.8193-0.42360.3192-0.42650.75430.47080.02300.29570.48990.15220.12410.60770.7371001100
2id_000a6266acom_droga48118bb41b2c0.62800.58171.5540-0.0764-0.03231.23900.17150.21550.00651.2300-0.4797-0.5631-0.0366-1.83000.6057-0.32780.6042-0.3075-0.1147-0.0570-0.0799-0.8181-1.53200.23070.49010.4780-1.39704.6240-0.04371.2870-1.85300.60690.42900.17830.0018...-0.46820.1210-0.5177-0.06040.1682-0.44360.49630.13630.33350.9760-0.0427-0.12350.09590.0690-0.9416-0.7548-0.1109-0.62720.30190.11720.1093-0.31130.3019-0.0873-0.7250-0.62970.61030.0223-1.3240-0.3174-0.6417-0.2187-1.40800.6931010131
3id_0015fd391com_droga4818c7f86626-0.5138-0.2491-0.26560.52884.0620-0.8095-1.95900.1792-0.1321-1.0600-0.8269-0.3584-0.8511-0.5844-2.56900.8183-0.0532-0.85540.1160-2.35202.1200-1.1580-0.7191-0.8004-1.4670-0.0107-0.89950.2406-0.2479-1.0890-0.75750.0881-2.73700.87450.5787...-1.6600-3.16600.2816-0.2990-1.1870-0.5044-1.7750-1.6120-0.9215-1.0810-3.0520-3.4470-2.7740-1.8460-0.5568-3.3960-2.9510-1.1550-3.2620-1.5390-2.4600-0.9417-1.55500.2431-2.0990-0.6441-5.6300-1.3780-0.8632-1.2880-1.6210-0.8784-0.3876-0.8154010100
4id_001626bd3com_droga7207cbed3131-0.3254-0.40090.97000.69191.4180-0.8244-0.2800-0.1498-0.87890.8630-0.2219-0.5121-0.95771.17500.20420.19700.1244-1.7090-0.3543-0.5160-0.3330-0.26850.76490.20571.37200.68350.8056-0.3754-1.20900.2965-0.07120.63890.6674-0.07831.1740...0.15000.51780.51590.60910.1813-0.42490.78320.65290.56480.48170.05870.53030.6376-0.3966-1.4950-0.9625-0.05410.62730.45630.06980.81340.19240.6054-0.18240.00420.00480.66701.06900.5523-0.30310.10940.2885-0.37860.7125001111
\n", + "

5 rows × 883 columns

\n", + "
" + ], + "text/plain": [ + " id tratamento tempo dose ... tempo_72 com_droga qtd_moa atv_moa\n", + "0 id_000644bb2 com_droga 24 1 ... 0 1 1 1\n", + "1 id_000779bfc com_droga 72 1 ... 1 1 0 0\n", + "2 id_000a6266a com_droga 48 1 ... 0 1 3 1\n", + "3 id_0015fd391 com_droga 48 1 ... 0 1 0 0\n", + "4 id_001626bd3 com_droga 72 0 ... 1 1 1 1\n", + "\n", + "[5 rows x 883 columns]" + ] + }, + "metadata": { + "tags": [] + }, + "execution_count": 193 + } + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "D9z2CjZqpPWq" + }, + "source": [ + "#***Levantamento de Hipóteses***" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "BSV3OnDypmEc" + }, + "source": [ + "Após conhecer a nossa base e fazer as transformações necessárias, vamos levantar algumas hipóteses para tentar válidar elas ou não.\n", + "\n", + "1. Todos os experimentos que foram tratados com com_controle não houve ativação do MoA\n", + "2. Dos experimentos que foram tratados com droga, 20% não ativaram o MoA\n", + "3. Metade dos experimentos que receberam a primeira dose não ativaram o MoA\n", + "4. 90% dos experimentos tratados com droga que receberam a segunda dose ativaram o MoA\n", + "5. A droga mais utilizada nos experimentos ativa o MoA 4 vezes, pelo menos uma vez\n", + "6. A droga mais utilizada nos experimentos não precisa tomar a segunda dose\n", + "7. Os experimentos que tiveram 24 horas de espera só ativaram o MoA 50% dos casos.\n", + "8. Os experimentos que foram tratados com droga, 75% ativa com a espera de 72 horas.\n", + "\n", + "\n" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "mrdZDxtenZb9" + }, + "source": [ + "Agora que criamos nossas hipoteses, vamos verificar elas." + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "6153pUybjXPT" + }, + "source": [ + "##1. 
None of the experiments treated with com_controle activated an MoA" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "PsZYlRi9ud4W" + }, + "source": [ + "Let's look at all the experiments where no drug was used and check whether any MoA was activated. In theory, `com_controle` should not have activated any MoA" + ] + }, + { + "cell_type": "code", + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/" + }, + "id": "VJ4l3xG5pwoe", + "outputId": "b6e19436-84e7-4750-cc8d-7215fbaaf6fa" + }, + "source": [ + "df.query('com_droga == 0')['atv_moa'].value_counts()" + ], + "execution_count": null, + "outputs": [ + { + "output_type": "execute_result", + "data": { + "text/plain": [ + "0 1866\n", + "Name: atv_moa, dtype: int64" + ] + }, + "metadata": { + "tags": [] + }, + "execution_count": 194 + } + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "S-8POPfsuwcc" + }, + "source": [ + "As expected, none of the 1866 control experiments activated an MoA\n", + "\n" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "MD0CAW1djlBf" + }, + "source": [ + "##2. 
Of the experiments treated with a drug, 20% did not activate an MoA" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "Pyx2jn24u3h3" + }, + "source": [ + "Let's check what share of the drug-treated experiments did not activate an MoA" + ] + }, + { + "cell_type": "code", + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/" + }, + "id": "XxJ5Ldm00_Fa", + "outputId": "29e2b047-0f24-4e1a-bdd9-75cb06000441" + }, + "source": [ + "df.query('com_droga == 1')['atv_moa'].value_counts(normalize=True)" + ], + "execution_count": null, + "outputs": [ + { + "output_type": "execute_result", + "data": { + "text/plain": [ + "1 0.658238\n", + "0 0.341762\n", + "Name: atv_moa, dtype: float64" + ] + }, + "metadata": { + "tags": [] + }, + "execution_count": 269 + } + ] + }, + { + "cell_type": "code", + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/" + }, + "id": "CY1qSJzOufLE", + "outputId": "7b065194-3a91-41e2-8ae3-585557c502ac" + }, + "source": [ + "droga_atv = df.query('com_droga == 1')['atv_moa'].value_counts()\n", + "\n", + "plt.style.use('seaborn')\n", + "plt.figure(figsize=(8,6))\n", + "ax = droga_atv.plot.bar(color=['grey','#092A32'])\n", + "plt.title('Tratados com Droga mas Não Ativaram',fontdict={'fontsize':18, 'fontweight': 'bold'},)\n", + "plt.xlabel('Ação') \n", + "plt.ylabel('Quantidade')\n", + "plt.xticks(rotation=0)\n", + "plt.legend(['Moa Ativou'])\n", + "plt.show()" + ], + "execution_count": null, + "outputs": [ + { + "output_type": "display_data", + "data": { + "image/png": "\n", + "text/plain": [ + "
" + ] + }, + "metadata": { + "tags": [] + } + } + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "jmVbTRJgjuEd" + }, + "source": [ + "34% dos que receberam o tratamento com droga não ativaram o MoA" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "jXYcS3enj-Qd" + }, + "source": [ + "##3. Metade dos experimentos que receberam a primeira dose não ativaram o MoA" + ] + }, + { + "cell_type": "code", + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/" + }, + "id": "8IBHdR7nvExW", + "outputId": "7b892fe4-de2c-47e7-b877-071cf0aa44eb" + }, + "source": [ + "df.query('com_droga == 1 and dose == 1')['atv_moa'].value_counts(normalize=True)" + ], + "execution_count": null, + "outputs": [ + { + "output_type": "execute_result", + "data": { + "text/plain": [ + "1 0.658449\n", + "0 0.341551\n", + "Name: atv_moa, dtype: float64" + ] + }, + "metadata": { + "tags": [] + }, + "execution_count": 270 + } + ] + }, + { + "cell_type": "code", + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/", + "height": 407 + }, + "id": "ff4WoD3i2KRE", + "outputId": "5749d75e-a82d-4517-c877-e916093aaefd" + }, + "source": [ + "droga_d1 = df.query('com_droga == 1 and dose == 1')['atv_moa'].value_counts()\n", + "\n", + "plt.style.use('seaborn')\n", + "plt.figure(figsize=(8,6))\n", + "droga_d1.plot.bar(color=['#092A32','grey',])\n", + "plt.title('Experimentos que tomaram a Primeira Dose e Ativou algum MoA',fontdict={'fontsize':18, 'fontweight': 'bold'},)\n", + "plt.xlabel('Ação') \n", + "plt.ylabel('Quantidade')\n", + "plt.xticks(rotation=0)\n", + "plt.legend(['MoA Ativou'])\n", + "plt.show()" + ], + "execution_count": null, + "outputs": [ + { + "output_type": "display_data", + "data": { + "image/png": "\n", + "text/plain": [ + "
" + ] + }, + "metadata": { + "tags": [] + } + } + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "zBLf1vLGkMSE" + }, + "source": [ + "Neste caso, apenas 34% dos experimentos receberam a primeira dose, não ativou o MoA" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "pYcuxNBPki4v" + }, + "source": [ + "##4. 90% dos experimentos que foram tratados com droga que receberam a segunda dose ativaram o MoA" + ] + }, + { + "cell_type": "code", + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/" + }, + "id": "K2gqgl12uqSX", + "outputId": "e130a300-1e09-4f56-c6ed-518736278993" + }, + "source": [ + "df.query('com_droga == 1 and dose == 0')['atv_moa'].value_counts(normalize=True)" + ], + "execution_count": null, + "outputs": [ + { + "output_type": "execute_result", + "data": { + "text/plain": [ + "1 0.658017\n", + "0 0.341983\n", + "Name: atv_moa, dtype: float64" + ] + }, + "metadata": { + "tags": [] + }, + "execution_count": 272 + } + ] + }, + { + "cell_type": "code", + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/", + "height": 407 + }, + "id": "Dm1gAaMY2KvE", + "outputId": "3b7d3e05-a2b2-4132-a2a1-9dc67af29aac" + }, + "source": [ + "droga_d2 = df.query('com_droga == 1 and dose == 0')['atv_moa'].value_counts()\n", + "\n", + "plt.style.use('seaborn')\n", + "plt.figure(figsize=(8,6))\n", + "droga_d2.plot.bar(color=['grey','#092A32'])\n", + "plt.title('Experimentos que tomaram a Segunda Dose e Não Ativou o MoA',fontdict={'fontsize':18, 'fontweight': 'bold'},)\n", + "plt.xlabel('Ação') \n", + "plt.ylabel('Quantidade')\n", + "plt.xticks(rotation=0)\n", + "plt.legend(['Moa Ativou'])\n", + "plt.show()" + ], + "execution_count": null, + "outputs": [ + { + "output_type": "display_data", + "data": { + "image/png": "\n", + "text/plain": [ + "
" + ] + }, + "metadata": { + "tags": [] + } + } + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "tFEKd87elEe0" + }, + "source": [ + "Apenas 65% dos experimentos que foram tratados com droga que receberam a segunda dose ativaram o MoA" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "JO0brVEClXDP" + }, + "source": [ + "##5. A droga mais utilizada nos experimentos ativa o MoA 4 vezes pelo menos uma vez" + ] + }, + { + "cell_type": "code", + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/" + }, + "id": "WHOjtED_ukG0", + "outputId": "572bdcdd-e167-4915-b160-75c25ee6cbc2" + }, + "source": [ + "df.groupby(by='droga')['qtd_moa'].sum().sort_values(ascending = False)" + ], + "execution_count": null, + "outputs": [ + { + "output_type": "execute_result", + "data": { + "text/plain": [ + "droga\n", + "87d714366 1436\n", + "d50f18348 558\n", + "9f80f3f77 246\n", + "8b87a7a83 203\n", + "5628cb3ee 202\n", + " ... \n", + "8be20f208 0\n", + "8c25111b8 0\n", + "8c48a42be 0\n", + "8c7b401c2 0\n", + "00199ff52 0\n", + "Name: qtd_moa, Length: 3289, dtype: int64" + ] + }, + "metadata": { + "tags": [] + }, + "execution_count": 201 + } + ] + }, + { + "cell_type": "code", + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/" + }, + "id": "cXYIpiIs4z6a", + "outputId": "cbff7d21-e8a4-42e5-d4e9-8283972c5452" + }, + "source": [ + "df.query('droga == \"87d714366\"')['qtd_moa'].sort_values(ascending = False)" + ], + "execution_count": null, + "outputs": [ + { + "output_type": "execute_result", + "data": { + "text/plain": [ + "23802 2\n", + "7647 2\n", + "8425 2\n", + "8404 2\n", + "8359 2\n", + " ..\n", + "16260 2\n", + "16258 2\n", + "16246 2\n", + "16235 2\n", + "16 2\n", + "Name: qtd_moa, Length: 718, dtype: int64" + ] + }, + "metadata": { + "tags": [] + }, + "execution_count": 202 + } + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "zdpARoHh8jiG" + }, + "source": [ + "A droga mais utilizada nos experimentos ativa 
o 2 MoAs **todas** das vezes." + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "H9u7NxWA8qGH" + }, + "source": [ + "##6. A droga mais utilizada nos experimentos não precisa tomar a segunda dose" + ] + }, + { + "cell_type": "code", + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/" + }, + "id": "nqorggtm4kX7", + "outputId": "38cd5d8f-d19f-4a3c-9f7a-497603c4a5c6" + }, + "source": [ + "df.query('droga == \"87d714366\"')['dose'].value_counts()" + ], + "execution_count": null, + "outputs": [ + { + "output_type": "execute_result", + "data": { + "text/plain": [ + "1 375\n", + "0 343\n", + "Name: dose, dtype: int64" + ] + }, + "metadata": { + "tags": [] + }, + "execution_count": 203 + } + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "ozSiRWRhluzU" + }, + "source": [ + "A droga mais utilizada tem casos que não ativou o MoA." + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "GTo4L6jznFSl" + }, + "source": [ + "##7. Os experimentos que tiveram 24 horas de espera só ativaram o MoA 50% dos casos." + ] + }, + { + "cell_type": "code", + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/" + }, + "id": "PrSupxf85vwz", + "outputId": "f0d9c00b-308a-4e95-dc78-23a17aabd6cd" + }, + "source": [ + "df.query('tempo_24 == 1')['atv_moa'].value_counts()" + ], + "execution_count": null, + "outputs": [ + { + "output_type": "execute_result", + "data": { + "text/plain": [ + "1 4718\n", + "0 3054\n", + "Name: atv_moa, dtype: int64" + ] + }, + "metadata": { + "tags": [] + }, + "execution_count": 204 + } + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "QS1GlDC7mvRt" + }, + "source": [ + "##8. Os experimentos que foram tratados com droga, 75% ativa com a espera de 72 horas." 
+ ] + },
+ { + "cell_type": "code", + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/" + }, + "id": "M-MT-MY99zsC", + "outputId": "7008fb93-82b4-4f5a-caae-93c40e12a90d" + }, + "source": [ + "df.query('tempo_72 == 1 and com_droga == 1')['atv_moa'].value_counts(normalize=True)" + ], + "execution_count": null, + "outputs": [ + { + "output_type": "execute_result", + "data": { + "text/plain": [ + "1 0.659053\n", + "0 0.340947\n", + "Name: atv_moa, dtype: float64" + ] + }, + "metadata": { + "tags": [] + }, + "execution_count": 275 + } + ] + },
+ { + "cell_type": "markdown", + "metadata": { + "id": "3y6aHhDPm2ul" + }, + "source": [ + "Only about 66% activate the MoA with a 72-hour wait" + ] + },
+ { + "cell_type": "markdown", + "metadata": { + "id": "rMYcVDMH-LCj" + }, + "source": [ + "#***Building the Predictive Model***\n", + "Objective 1: Predict whether at least 1 MoA is activated" + ] + },
+ { + "cell_type": "markdown", + "metadata": { + "id": "8BXbiJhg5oMn" + }, + "source": [ + "Now let's build the predictive model for the first objective.\n", + "\n", + "We fix a seed and reuse it in every step that involves randomness, so the results are reproducible.\n", + "\n", + "Next we select the target variable and the predictor variables. \n", + "\n", + " \n", + "\n", + "1. `Target variable:` the variable we want to predict.\n", + "2. `Predictor variable:` a variable that helps make the prediction.\n", + "\n", + "\n", + "\n", + "\n", + "Then we split the dataset into a training set and a test set. \n", + "\n", + "The test set usually holds around 20% to 50% of the whole dataset; the right share depends heavily on the dataset size. \n", + "\n", + "In this project I chose to hold out 30% to test our model." 
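The seeded 70/30 split described above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not scikit-learn's `train_test_split`: `holdout_split` is a hypothetical helper, it will not reproduce the notebook's row order, and the row count is taken from the class sizes the notebook prints later (21948 drug-treated and 1866 control rows).

```python
import math
import random

SEED = 10  # same seed value the notebook uses


def holdout_split(n_rows, test_size, seed):
    """Shuffle row indices with a fixed seed and hold out a share for testing.

    Hypothetical helper for illustration only.
    """
    indices = list(range(n_rows))
    random.Random(seed).shuffle(indices)      # seeded, so the split is repeatable
    n_test = math.ceil(n_rows * test_size)    # round the test share up
    return indices[n_test:], indices[:n_test]  # (train indices, test indices)


# 21948 drug-treated + 1866 control rows, as counted later in the notebook
n_rows = 21948 + 1866
train_idx, test_idx = holdout_split(n_rows, test_size=0.3, seed=SEED)
print(len(train_idx), len(test_idx))  # 16669 7145
```

The 7145 test rows agree with the row count of the comparison table the notebook displays after training; reusing the same seed returns the same split on every run.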
+ ] + },
+ { + "cell_type": "code", + "metadata": { + "id": "gWbmRV1p-nYx" + }, + "source": [ + "SEED = 10\n", + "\n", + "# Predictors: float64 columns only. The original positional call\n", + "# select_dtypes('float64','int64') meant include='float64', exclude='int64'.\n", + "x = df.select_dtypes(include='float64')\n", + "# Target: 1 when the experiment activated at least one MoA\n", + "y = df['atv_moa']\n", + "\n", + "x_train, x_test, y_train, y_test = train_test_split(x, y, test_size = 0.3, random_state = SEED)" + ], + "execution_count": null, + "outputs": [] + },
+ { + "cell_type": "markdown", + "metadata": { + "id": "mdacPyZ18axe" + }, + "source": [ + "We will try several `classification` models in order to pick the best one for our data." + ] + },
+ { + "cell_type": "markdown", + "metadata": { + "id": "9YMB3vF__By_" + }, + "source": [ + "##Logistic Regression Model Before Balancing" + ] + },
+ { + "cell_type": "markdown", + "metadata": { + "id": "njKMYKCd7PTx" + }, + "source": [ + "The first model is logistic regression: we import `LogisticRegression` from the `sklearn.linear_model` module.\n", + "For the library documentation, [click here](https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.LogisticRegression.html)\n", + "Let's instantiate the model and train it on our training data." + ] + },
+ { + "cell_type": "code", + "metadata": { + "id": "2x_kymmr-oEi" + }, + "source": [ + "model_rl = LogisticRegression(max_iter=1000)\n", + "model_rl.fit(x_train, y_train)\n", + "y_predict=model_rl.predict(x_test)" + ], + "execution_count": null, + "outputs": [] + },
+ { + "cell_type": "markdown", + "metadata": { + "id": "zL0XcI25_NLG" + }, + "source": [ + "###Evaluating the Model" + ] + },
+ { + "cell_type": "markdown", + "metadata": { + "id": "tKvbGEuu8Jb2" + }, + "source": [ + "Now that the model is trained, let's evaluate it. There are many evaluation metrics; in this project we use `precision`, `recall`, `F1-score` and `accuracy`" + ] + },
+ { + "cell_type": "code", + "metadata": { + "id": "UuIM9md1_Odn", + "colab": { + "base_uri": "https://localhost:8080/" + }, + "outputId": "b02327a6-0a94-4e0e-93e1-5315df1a8e00" + }, + "source": [ + "model_rl.score(x_test, y_test)*100\n" + ], + "execution_count": null, + "outputs": [ + { + "output_type": "execute_result", + "data": { + "text/plain": [ + "62.43526941917425" + ] + }, + "metadata": { + "tags": [] + }, + "execution_count": 213 + } + ] + },
+ { + "cell_type": "markdown", + "metadata": { + "id": "Od1NBbni9ts-" + }, + "source": [ + "We got a score (accuracy) of 62%, which unfortunately is very low." + ] + },
+ { + "cell_type": "markdown", + "metadata": { + "id": "EpNZeXFy-hAW" + }, + "source": [ + "But hold on, there is still plenty of work ahead. That is a data scientist's life: test, test and, in the end, test again!" + ] + },
+ { + "cell_type": "markdown", + "metadata": { + "id": "1iU1lVbr-i_X" + }, + "source": [ + "Let's build a table to see where the errors are: the `Real` column holds the **true labels**, while the `Predictions` column holds what the **model predicted**." 
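A `Real` versus `Predictions` comparison like the one described above can be condensed into four counts, from which accuracy, precision, recall and F1 follow. The sketch below is illustrative only: it uses the ten (real, predicted) pairs visible in the notebook's comparison table rather than the full 7145-row test set, so its numbers are not the model's actual metrics.

```python
# (real, predicted) pairs: the ten rows visible in the notebook's comparison table
pairs = [(1, 0), (0, 0), (1, 1), (1, 1), (1, 1),
         (0, 1), (1, 0), (1, 1), (1, 0), (0, 1)]

# Treat class 1 (MoA activated) as the positive class
tp = sum(1 for real, pred in pairs if real == 1 and pred == 1)  # true positives
fn = sum(1 for real, pred in pairs if real == 1 and pred == 0)  # false negatives
tn = sum(1 for real, pred in pairs if real == 0 and pred == 0)  # true negatives
fp = sum(1 for real, pred in pairs if real == 0 and pred == 1)  # false positives

accuracy = (tp + tn) / len(pairs)
precision = tp / (tp + fp)   # of the predicted positives, how many were right
recall = tp / (tp + fn)      # of the real positives, how many were found
f1 = 2 * precision * recall / (precision + recall)

print(tp, fn, tn, fp)  # 4 3 1 2
print(round(accuracy, 2), round(precision, 2), round(recall, 2), round(f1, 2))  # 0.5 0.67 0.57 0.62
```

These are the same per-class quantities that a classification report lists for class `1`.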
+ ] + },
+ { + "cell_type": "code", + "metadata": { + "id": "XDLhIdgF-uot", + "colab": { + "base_uri": "https://localhost:8080/", + "height": 402 + }, + "outputId": "588f27b5-35f9-4ab1-ad11-0e5edcdfdfba" + }, + "source": [ + "mtx = pd.DataFrame({'Real':y_test,'Predictions': y_predict})\n", + "mtx" + ], + "execution_count": null, + "outputs": [ + { + "output_type": "execute_result", + "data": { + "text/plain": [ + " Real Predictions\n", + "8579 1 0\n", + "16718 0 0\n", + "17007 1 1\n", + "1901 1 1\n", + "13470 1 1\n", + "... ... ...\n", + "619 0 1\n", + "12019 1 0\n", + "16869 1 1\n", + "7536 1 0\n", + "1551 0 1\n", + "\n", + "[7145 rows x 2 columns]" + ] + }, + "metadata": { + "tags": [] + }, + "execution_count": 214 + } + ] + },
+ { + "cell_type": "markdown", + "metadata": { + "id": "JRgqwrE--zOW" + }, + "source": [ + "Let's import a function that reports several classification metrics: `classification_report`, from the `sklearn.metrics` module" + ] + },
+ { + "cell_type": "code", + "metadata": { + "id": "d2GN4X5r-w9o", + "colab": { + "base_uri": "https://localhost:8080/" + }, + "outputId": "c8ba3074-26cd-4e4c-e85e-6233cb9f810c" + }, + "source": [ + "print('Classification metrics--> \n', classification_report(y_test,y_predict))" + ], + "execution_count": null, + "outputs": [ + { + "output_type": "stream", + "text": [ + "Classification metrics--> \n", + " precision recall f1-score support\n", + "\n", + " 0 0.54 0.40 0.46 2841\n", + " 1 0.66 0.77 0.71 4304\n", + "\n", + " accuracy 0.62 7145\n", + " macro avg 0.60 0.59 0.59 7145\n", + "weighted avg 0.61 0.62 0.61 7145\n", + "\n" + ], + "name": "stdout" + } + ] + },
+ { + "cell_type": "markdown", + "metadata": { + "id": "zOPgJvxl_Yc-" + }, + "source": [ + "The report shows the main metrics for evaluating the model. \n", + "\n" + ] + },
+ { + "cell_type": "markdown", + "metadata": { + "id": "6FwF86gIB4HS" + }, + "source": [ + "An accuracy of 62% means that out of every 100 predictions the model gets only 62 right and 38 wrong, which is very low." + ] + },
+ { + "cell_type": "markdown", + "metadata": { + "id": "4K4zvJ4lCMzF" + }, + "source": [ + "This model is not very good, so let's try another one: I chose XGBoost." 
+ ] + },
+ { + "cell_type": "markdown", + "metadata": { + "id": "RiN9MEjH_UTE" + }, + "source": [ + "##XGBoost Model Before Balancing" + ] + },
+ { + "cell_type": "markdown", + "metadata": { + "id": "cdIYsTt7AH9V" + }, + "source": [ + "Now let's use XGBoost: we import `XGBClassifier` from the `xgboost` library.\n" + ] + },
+ { + "cell_type": "markdown", + "metadata": { + "id": "7XNWlATD_79V" + }, + "source": [ + "\n", + "\n", + "For the documentation, [click here](https://xgboost.readthedocs.io/en/latest/python/python_api.html)" + ] + },
+ { + "cell_type": "markdown", + "metadata": { + "id": "MLWBSjZtAj-l" + }, + "source": [ + "Let's instantiate the model and train it on the data" + ] + },
+ { + "cell_type": "code", + "metadata": { + "id": "zKVX5NZ3_gxz" + }, + "source": [ + "model_x=XGBClassifier()\n", + "model_x.fit(x_train,y_train)\n", + "y_predict=model_x.predict(x_test)" + ], + "execution_count": null, + "outputs": [] + },
+ { + "cell_type": "markdown", + "metadata": { + "id": "dQTuAQjz_t8D" + }, + "source": [ + "###Evaluating the Model" + ] + },
+ { + "cell_type": "markdown", + "metadata": { + "id": "4hqnLMg0ApQW" + }, + "source": [ + "Now that the model is trained, let's **evaluate** it as we did with the previous one. I will run the same `evaluation` on the `remaining models` so we can `compare` them later." 
+ ] + },
+ { + "cell_type": "code", + "metadata": { + "id": "o3BO6x1k_mYT", + "colab": { + "base_uri": "https://localhost:8080/" + }, + "outputId": "8552f4af-f1cd-487f-8be7-b742d8a20a9b" + }, + "source": [ + "model_x.score(x_test, y_test)*100" + ], + "execution_count": null, + "outputs": [ + { + "output_type": "execute_result", + "data": { + "text/plain": [ + "65.47235829251224" + ] + }, + "metadata": { + "tags": [] + }, + "execution_count": 217 + } + ] + },
+ { + "cell_type": "code", + "metadata": { + "id": "Eviu6JWa_jBz", + "colab": { + "base_uri": "https://localhost:8080/" + }, + "outputId": "52facb3b-2aa6-4fcf-a3ff-b8ed5f482123" + }, + "source": [ + "print('Classification metrics--> \n', classification_report(y_test,y_predict))" + ], + "execution_count": null, + "outputs": [ + { + "output_type": "stream", + "text": [ + "Classification metrics--> \n", + " precision recall f1-score support\n", + "\n", + " 0 0.67 0.26 0.38 2841\n", + " 1 0.65 0.91 0.76 4304\n", + "\n", + " accuracy 0.65 7145\n", + " macro avg 0.66 0.59 0.57 7145\n", + "weighted avg 0.66 0.65 0.61 7145\n", + "\n" + ], + "name": "stdout" + } + ] + },
+ { + "cell_type": "markdown", + "metadata": { + "id": "7BeAq3TVBbK_" + }, + "source": [ + "This model reaches a score of 65%, 3 percentage points above the previous one. That beats logistic regression, but it is still not enough. \n", + "\n", + "So let's try one more model; if it is not good either, we will look for ways to improve." 
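To put the 62% and 65% scores in context, it can help to compare them with a majority-class baseline, that is, a "model" that always predicts the most frequent class. The sketch below is a hedged illustration: the class counts come from the `support` column of the reports above and the accuracies from the two `.score` outputs; the variable names are illustrative, not part of the notebook.

```python
# Test-set class counts, taken from the "support" column of the reports above
support = {0: 2841, 1: 4304}

# Accuracies returned by model_rl.score and model_x.score in the notebook
scores = {
    "logistic regression": 0.6243526941917425,
    "XGBoost": 0.6547235829251224,
}

total = sum(support.values())
majority_class = max(support, key=support.get)
baseline_accuracy = support[majority_class] / total  # always predict the majority class

print(f"baseline (always predict {majority_class}): {baseline_accuracy:.1%}")
for name, accuracy in scores.items():
    gain = (accuracy - baseline_accuracy) * 100  # in percentage points
    print(f"{name}: {accuracy:.1%} ({gain:+.1f} points over baseline)")
```

Always predicting class `1` would already be right for about 60.2% of the test rows, so both models beat that baseline by only a few percentage points.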
+ ] + },
+ { + "cell_type": "markdown", + "metadata": { + "id": "x4coXQE__3F4" + }, + "source": [ + "##RandomForest Model Before Balancing" + ] + },
+ { + "cell_type": "markdown", + "metadata": { + "id": "IW3B5aMVCaSK" + }, + "source": [ + "Now we use Random Forest: we import `RandomForestClassifier` from the `sklearn.ensemble` module.\n", + "\n", + "For the documentation, [click here](https://scikit-learn.org/stable/modules/generated/sklearn.ensemble.RandomForestClassifier.html)" + ] + },
+ { + "cell_type": "code", + "metadata": { + "id": "M5UrVmgV___L" + }, + "source": [ + "# Instantiate and train the model\n", + "model_rf = RandomForestClassifier(n_estimators=100,random_state=SEED, n_jobs=-1)\n", + "model_rf.fit(x_train, y_train)\n", + "y_predict = model_rf.predict(x_test)" + ], + "execution_count": null, + "outputs": [] + },
+ { + "cell_type": "markdown", + "metadata": { + "id": "eUDyo19DAKCy" + }, + "source": [ + "###Evaluating the Model" + ] + },
+ { + "cell_type": "code", + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/" + }, + "id": "dZu0ujmIDcme", + "outputId": "72701a3b-4d0e-4910-ab1c-e6b2f3c74b22" + }, + "source": [ + "model_rf.score(x_test, y_test)*100" + ], + "execution_count": null, + "outputs": [ + { + "output_type": "execute_result", + "data": { + "text/plain": [ + "64.35269419174247" + ] + }, + "metadata": { + "tags": [] + }, + "execution_count": 225 + } + ] + },
+ { + "cell_type": "code", + "metadata": { + "id": "FhyFkSvcAIPi", + "colab": { + "base_uri": "https://localhost:8080/" + }, + "outputId": "0f21c5a0-6855-4aa1-d431-2d43fbf64e11" + }, + "source": [ + "print('Classification metrics--> \n', classification_report(y_test,y_predict))" + ], + "execution_count": null, + "outputs": [ + { + "output_type": "stream", + "text": [ + "Classification metrics--> \n", + " precision recall f1-score support\n", + "\n", + " 0 0.62 0.26 0.37 2841\n", + " 1 0.65 0.90 0.75 4304\n", + "\n", + " accuracy 0.64 7145\n", + " macro avg 0.64 0.58 0.56 7145\n", + "weighted avg 0.64 0.64 0.60 7145\n", + "\n" + ], + "name": "stdout" + } + ] + },
+ { + "cell_type": "markdown", + "metadata": { + "id": "Um4es6S6Dl9e" + }, + "source": [ + "Random Forest did slightly worse than XGBoost, with 64%. \n", + "\n", + "In our `exploratory analysis` we saw that the `com_droga` column is `heavily imbalanced`, so let's balance it and check whether that `improves` the score of `our models`" + ] + },
+ { + "cell_type": "markdown", + "metadata": { + "id": "_2_uO7c1AXmH" + }, + "source": [ + "#***Balancing the Data***" + ] + },
+ { + "cell_type": "markdown", + "metadata": { + "id": "5Slq9I8_E6Q9" + }, + "source": [ + "Imbalance occurs when a dataset has `many examples` of one class and `few examples` of the other, as we saw in the `com_droga` column.\n", + "\n", + "When that happens there are a few techniques we can apply, for example upsampling (resampling the smaller class with replacement) and downsampling (discarding rows from the larger class). " + ] + },
+ { + "cell_type": "markdown", + "metadata": { + "id": "RIUKkolyEfg3" + }, + "source": [ + "***Upsample:***\n", + "\n", + "\"Upsampling is a digital signal processing technique that artificially increases the sampling rate N times by inserting N-1 zeros between the original samples of the signal and passing the result through a reconstruction filter, which is simply a low-pass filter. To demonstrate this technique we will use the previous program, slightly modified. The following script simulates upsampling an ideal sine wave and performs the reconstruction filtering.\"\n", + "\n", + "***Downsample:***\n", + "\n", + "\"Downsampling, or decimation, is the technique of reducing the sampling rate. It is done simply by keeping one sample out of every N. There is an inevitable distortion of the signal.\"\n", + "\n", + "Source: [click here](https://www.embarcados.com.br/oversampling-upsampling-downsampling-dsp/#:~:text=A%20sobre%2Damostragem%20(oversampling),artificial%20da%20taxa%20de%20amostragem.)" + ] + },
+ { + "cell_type": "markdown", + "metadata": { + "id": "JbMGkArYGMdd" + }, + "source": [ + "Let's look at our column and the frequency of its values" + ] + },
+ { + "cell_type": "code", + "metadata": { + "id": "XFZO4asAA0aM", + "colab": { + "base_uri": "https://localhost:8080/", + "height": 380 + }, + "outputId": "03968605-1b6f-47ff-aa36-76248716f0b3" + }, + "source": [ + "sns.countplot(x='com_droga', data = df, palette=\"crest\")\n", + "plt.title('Treatment Types', fontdict={'fontsize':18, 'fontweight': 'bold'},)\n", + "plt.xlabel('Treatment') \n", + "plt.ylabel('Count') \n", + "plt.show()" + ], + "execution_count": null, + "outputs": [ + { + "output_type": "display_data", + "data": { + "image/png": "\n", + "text/plain": [ + "" + ] + }, + "metadata": { + "tags": [] + } + } + ] + },
+ { + "cell_type": "markdown", + "metadata": { + "id": "8TKQFsLVGVuv" + }, + "source": [ + "Notice how imbalanced it is: the model learns a lot from drug-treated rows but very little from control rows, which leads it to make mistakes.\n", + "\n", + "Let's fix this problem and test our models again." + ] + },
+ { + "cell_type": "markdown", + "metadata": { + "id": "P48LXArDGkYt" + }, + "source": [ + "I chose `upsampling`, so we will increase the number of `0` values in this column:" + ] + },
+ { + "cell_type": "code", + "metadata": { + "id": "-ssYCHhuA3cP" + }, + "source": [ + "# Separate the com_droga column (z) from the remaining columns (w)\n", + "w= df.drop(['com_droga'],axis =1)\n", + "z = df['com_droga']" + ], + "execution_count": null, + "outputs": [] + },
+ { + "cell_type": "markdown", + "metadata": { + "id": "dHOxbE7FG3yt" + }, + "source": [ + "Let's store the larger and the smaller class in separate variables." + ] + },
+ { + "cell_type": "code", + "metadata": { + "id": "wluSw4-PA3_j" + }, + "source": [ + "classe_com_mais = df[df.com_droga == 1]\n", + "classe_com_menos = df[df.com_droga == 0]" + ], + "execution_count": null, + "outputs": [] + },
+ { + "cell_type": "code", + "metadata": { + "id": "MJdctYyXA54s", + "colab": { + "base_uri": "https://localhost:8080/" + }, + "outputId": "83ac5903-43e2-4e97-9974-bea33771ca26" + }, + "source": [ + "# Larger class before upsampling\n", + "classe_com_mais.shape" + ], + "execution_count": null, + "outputs": [ + { + "output_type": "execute_result", + "data": { + "text/plain": [ + "(21948, 883)" + ] + }, + "metadata": { + "tags": [] + }, + "execution_count": 231 + } + ] + },
+ { + "cell_type": "code", + "metadata": { + "id": "c9BHRNb5A64q", + "colab": { + "base_uri": "https://localhost:8080/" + }, + "outputId": "7a3eb2ce-9f38-4e0d-d6bc-15d981a84764" + }, + "source": [ + "# Smaller class before upsampling\n", + "classe_com_menos.shape" + ], + "execution_count": 
null, + "outputs": [ + { + "output_type": "execute_result", + "data": { + "text/plain": [ + "(1866, 883)" + ] + }, + "metadata": { + "tags": [] + }, + "execution_count": 232 + } + ] + },
+ { + "cell_type": "markdown", + "metadata": { + "id": "uoWLJrbHHMit" + }, + "source": [ + "Now let's apply upsampling, using `resample` from the `sklearn.utils` module." + ] + },
+ { + "cell_type": "code", + "metadata": { + "id": "pRqy-2LfBH6Y" + }, + "source": [ + "# Sample the smaller class with replacement until it matches the larger one\n", + "classe_com_menos_upsample = resample(classe_com_menos, replace = True,\n", + " n_samples = len(classe_com_mais), # 21948 rows\n", + " random_state = SEED)" + ], + "execution_count": null, + "outputs": [] + },
+ { + "cell_type": "markdown", + "metadata": { + "id": "J6hVOgH8Hltv" + }, + "source": [ + "Once that is done, we concatenate the data with `pd.concat`." + ] + },
+ { + "cell_type": "code", + "metadata": { + "id": "HAdpe69iBJtJ" + }, + "source": [ + "dados = pd.concat([classe_com_mais, classe_com_menos_upsample])" + ], + "execution_count": null, + "outputs": [] + },
+ { + "cell_type": "markdown", + "metadata": { + "id": "2QTR5Aq3IB8t" + }, + "source": [ + "Did the technique work? Let's check with `.value_counts()`" + ] + },
+ { + "cell_type": "code", + "metadata": { + "id": "8JiH7RQVBL8H", + "colab": { + "base_uri": "https://localhost:8080/" + }, + "outputId": "367ad045-b5b3-4df4-ea22-22e57f0aea21" + }, + "source": [ + "dados.com_droga.value_counts()" + ], + "execution_count": null, + "outputs": [ + { + "output_type": "execute_result", + "data": { + "text/plain": [ + "1 21948\n", + "0 21948\n", + "Name: com_droga, dtype: int64" + ] + }, + "metadata": { + "tags": [] + }, + "execution_count": 235 + } + ] + },
+ { + "cell_type": "code", + "metadata": { + "id": "W_3Q45VPBNLc", + "colab": { + "base_uri": "https://localhost:8080/" + }, + "outputId": "fdb12d0c-b4e4-456a-c9b1-976d0d551332" + }, + "source": [ + "dados.info()" + ], + "execution_count": null, + "outputs": [ + { + "output_type": "stream", + "text": [ + "\n", + "Int64Index: 43896 entries, 0 to 8203\n", + "Columns: 883 entries, id to atv_moa\n", + "dtypes: float64(872), int64(8), object(3)\n", + "memory usage: 296.1+ MB\n" + ], + "name": "stdout" + } + ] + },
+ { + "cell_type": "code", + "metadata": { + "id": "uyUoOozFBRE4", + "colab": { + "base_uri": "https://localhost:8080/", + "height": 380 + }, + "outputId": "6c6a9b5c-ade1-4074-9934-c02a6651f825" + }, + "source": [ + "# Plot the balanced column\n", + "sns.countplot(x='com_droga', data = dados, palette=\"crest\")\n", + "plt.title('Treatment Types', fontdict={'fontsize':18, 'fontweight': 'bold'},)\n", + "plt.xlabel('Treatment') \n", + "plt.ylabel('Count') \n", + "plt.show()" + ], + "execution_count": null, + "outputs": [ + { + "output_type": "display_data", + "data": { + "image/png": "\n", + "text/plain": [ + "" + ] + }, + "metadata": { + "tags": [] + } + } + ] + },
+ { + "cell_type": "markdown", + "metadata": { + "id": "hoGQOlW4IPGs" + }, + "source": [ + "Done: the column is now balanced. Will our model improve? Let's find out!" + ] + },
+ { + "cell_type": "markdown", + "metadata": { + "id": "T8Sjx5ecBTOu" + }, + "source": [ + "#**Building the Predictive Model After Balancing**" + ] + },
+ { + "cell_type": "markdown", + "metadata": { + "id": "Va30I-CfIk3G" + }, + "source": [ + "Now that the column is balanced, let's split our data into training and test sets again using `train_test_split` from the `sklearn` library
\n", + "In `x` we put the predictor variables (the variables that help make the prediction).\n", + "In `y` we always put the target variable (the variable we want to predict)." + ] + },
+ { + "cell_type": "code", + "metadata": { + "id": "RdVVTX3sBfpr" + }, + "source": [ + "# float64 columns only: the original select_dtypes('float64','int64') excluded int64\n", + "x = dados.select_dtypes(include='float64')\n", + "y = dados['atv_moa']\n", + "\n", + "x_train, x_test, y_train, y_test = train_test_split(x, y, test_size = 0.3, random_state = SEED)" + ], + "execution_count": null, + "outputs": [] + },
+ { + "cell_type": "markdown", + "metadata": { + "id": "_N89FB56KM-Q" + }, + "source": [ + "With the data split, let's apply the `machine learning` models." + ] + },
+ { + "cell_type": "markdown", + "metadata": { + "id": "JeL3iKwsBvlw" + }, + "source": [ + "##Logistic Regression Model After Balancing" + ] + },
+ { + "cell_type": "markdown", + "metadata": { + "id": "8QQQwm-nKmqm" + }, + "source": [ + "Let's start with the logistic regression model " + ] + },
+ { + "cell_type": "code", + "metadata": { + "id": "G4qrRJhWBu6y" + }, + "source": [ + "model_rl = LogisticRegression(max_iter=1000)\n", + "model_rl.fit(x_train, y_train)\n", + "y_predict = model_rl.predict(x_test)" + ], + "execution_count": null, + "outputs": [] + },
+ { + "cell_type": "markdown", + "metadata": { + "id": "lx3gMGQqCGpL" + }, + "source": [ + "###Evaluating the Model" + ] + },
+ { + "cell_type": "code", + "metadata": { + "id": "LSUKUjz3B-N_", + "colab": { + "base_uri": "https://localhost:8080/" + }, + "outputId": "b461281c-2c6c-48a3-de53-c7358aeb0d36" + }, + "source": [ + "model_rl.score(x_test, y_test)*100" + ], + "execution_count": null, + "outputs": [ + { + "output_type": "execute_result", + "data": { + "text/plain": [ + "78.09248993849192" + ] + }, + "metadata": { + "tags": [] + }, + "execution_count": 242 + } + ] + },
+ { + "cell_type": "markdown", + "metadata": { + "id": "tOK0sCvdKr2d" + }, + "source": [ + "Wow, amazing!\n", + "\n", + "\n", + "Before, without balancing the `com_droga` column, the `logistic regression` model scored 62%; with balancing alone it rose to 78%. That is **very good**... we are on the **right track**!\n", + "\n" + ] + },
+ { + "cell_type": "code", + "metadata": { + "id": "kAdQ-iWnCDEs", + "colab": { + "base_uri": "https://localhost:8080/" + }, + "outputId": "eb6acf94-6f9b-4fcb-9179-08f3beee3972" + }, + "source": [ + "print('Classification metrics--> \n', classification_report(y_test,y_predict))" + ], + "execution_count": null, + "outputs": [ + { + "output_type": "stream", + "text": [ + "Classification metrics--> \n", + " precision recall f1-score support\n", + "\n", + " 0 0.82 0.86 0.84 8776\n", + " 1 0.69 0.62 0.65 4393\n", + "\n", + " accuracy 0.78 13169\n", + " macro avg 0.76 0.74 0.75 13169\n", + "weighted avg 0.78 0.78 0.78 13169\n", + "\n" + ], + "name": "stdout" + } + ] + },
+ { + "cell_type": "markdown", + "metadata": { + "id": "j95dH3TXLm81" + }, + "source": [ + "Now let's test XGBoost, the model that did best with the imbalanced column" + ] + },
+ { + "cell_type": "markdown", + "metadata": { + "id": "ONT8QS25CqIb" + }, + "source": [ + "##XGBoost Model After Balancing" + ] + },
+ { + "cell_type": "code", + "metadata": { + "id": "vXASElvfCJBT" + }, + "source": [ + "# Instantiate and train the model\n", + "model_x=XGBClassifier()\n", + "model_x.fit(x_train,y_train)\n", + "y_predict=model_x.predict(x_test)" + ], + "execution_count": null, + "outputs": [] + },
+ { + "cell_type": "markdown", + "metadata": { + "id": "navZD2kgCdli" + }, + "source": [ + "###Evaluating the Model" + ] + },
+ { + "cell_type": "code", + "metadata": { + "id": "RCn1-jWoCOVN", + "colab": { + "base_uri": "https://localhost:8080/" + }, + "outputId": "1010b770-5b3b-4ad8-ee5b-c47394b56a24" + }, + "source": [ + "model_x.score(x_test, y_test)*100" + ], + "execution_count": null, + "outputs": [ + { + "output_type": "execute_result", + "data": 
{ + "text/plain": [ + "77.86468220821627" + ] + }, + "metadata": { + "tags": [] + }, + "execution_count": 248 + } + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "aFyrvuEgCXoy", + "colab": { + "base_uri": "https://localhost:8080/" + }, + "outputId": "071611e6-a632-4c4e-c7bc-c7543f4564bd" + }, + "source": [ + "print('Classification metrics--> \\n', classification_report(y_test,y_predict))" + ], + "execution_count": null, + "outputs": [ + { + "output_type": "stream", + "text": [ + "Classification metrics--> \n", + " precision recall f1-score support\n", + "\n", + " 0 0.82 0.85 0.84 8776\n", + " 1 0.68 0.63 0.65 4393\n", + "\n", + " accuracy 0.78 13169\n", + " macro avg 0.75 0.74 0.75 13169\n", + "weighted avg 0.77 0.78 0.78 13169\n", + "\n" + ], + "name": "stdout" + } + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "Hqx7eES9MvEo" + }, + "source": [ + "Esse também melhorou depois do balanceamento, antes era `65%` e agora está com `77%`, teve uma melhora de 123%, porém o modelo de Regressão Linear está melhor que ele. " + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "lZ88Ga6bNPXv" + }, + "source": [ + "O que mais podemos fazer para melhorar nossas métricas?\n", + "\n", + "Vamos utilizar a técnica de Feature Seletion." + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "Y48QXabhD-Pg" + }, + "source": [ + "#***Feature Selection (Seleção de Variáveis)***" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "_WTkfjPPETBQ" + }, + "source": [ + "Há boatos que para criar um modelo de aprendizado de máquina precisa de muita calma, paciencia e muito teste. Realmente, você precisa saber diversas técnicas, como elaborar, construir e refinar o seu modelo e testar! \n", + "\n", + "Uma dessas técnicas é a Seleção de Variáveis. Assim como o nome já diz, essa técnica seleciona as melhores variáveis para o modelo, ter variáveis de mais isso pode prejudicar a performance do algoritmo. 
\n", + "\n", + "Too many features can cause problems such as long training times or difficulty putting the model into production. This technique helps reduce overfitting, can improve the model's accuracy and, as already mentioned, shortens training time. \n", + "\n", + "If you want to learn more about these techniques in Python, [click here](https://medium.com/data-hackers/como-selecionar-as-melhores-features-para-seu-modelo-de-machine-learning-faf74e357913)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "9sG8T_tzPKGN" + }, + "source": [ + "For this we will use *SelectFromModel*, imported from the `sklearn.feature_selection` module.\n", + "\n", + "To access the documentation, [click here](https://scikit-learn.org/stable/modules/generated/sklearn.feature_selection.SelectFromModel.html)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "a56qrp26Pv0p" + }, + "source": [ + "We will use a RandomForest model to rank the best variables, and then train a RandomForest as well to check the metrics." + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "IQWvhPwpQMlV" + }, + "source": [ + "To do this I used a `for` loop to check, in steps of 10, which variables are best.\n", + "\n", + "I left the code commented out because it takes a long time to run!" 
+ ] + }, + { + "cell_type": "code", + "metadata": { + "id": "lkcLBgbwQbOu" + }, + "source": [ + "#Disabled (wrapped in a string) because the loop takes a long time to run\n", + "'''\n", + "k_vs_score = []\n", + "for k in range(10, x_train.shape[1], 10):\n", + "    model_rf = RandomForestClassifier(n_estimators=100,random_state=SEED, n_jobs=-1)\n", + "    selector = SelectFromModel(model_rf, max_features=k, threshold=-np.inf)\n", + "    selector.fit(x_train, y_train)\n", + "\n", + "    x_train2 = selector.transform(x_train)\n", + "    x_test2 = selector.transform(x_test)\n", + "\n", + "    mdl = RandomForestClassifier(n_estimators=100,random_state=SEED, n_jobs=-1)\n", + "    mdl.fit(x_train2, y_train)\n", + "    p = mdl.predict(x_test2)\n", + "\n", + "    score = mdl.score(x_test2, y_test)\n", + "    print(\"k = {} - Score = {}\".format(k, score))\n", + "\n", + "    mask = selector.get_support()\n", + "    print(x_train.columns[mask])\n", + "    k_vs_score.append(score)\n", + "'''" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "gKvgdmTgQpB2" + }, + "source": [ + "After running all the code, the `score` was best and most stable when `k` equals `40` (`k` is the number of variables), as shown in the image below. " + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "tt-B3hSdSZbd" + }, + "source": [ + "![Screenshot_1.png]()" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "y62TjDIMSj__" + }, + "source": [ + "We will use `get_support()` to create a mask and retrieve the best variables that `SelectFromModel` selected." 
+ ] + }, + { + "cell_type": "code", + "metadata": { + "id": "CYy7iEETEMXC", + "colab": { + "base_uri": "https://localhost:8080/" + }, + "outputId": "da21f415-c36f-47f6-9444-35d1c01e1197" + }, + "source": [ + "#Rank the features with a RandomForest and keep the 40 most important ones;\n", + "#threshold=-np.inf makes max_features the only selection criterion\n", + "model_rf = RandomForestClassifier(n_estimators=100,random_state=SEED, n_jobs=-1)\n", + "selector = SelectFromModel(model_rf, max_features=40, threshold=-np.inf)\n", + "selector.fit(x_train, y_train)\n", + "\n", + "x_train2 = selector.transform(x_train)\n", + "x_test2 = selector.transform(x_test)\n", + "\n", + "#Retrain a RandomForest on the selected features and score it on the test set\n", + "mdl = RandomForestClassifier(n_estimators=100,random_state=SEED, n_jobs=-1)\n", + "mdl.fit(x_train2, y_train) \n", + "p = mdl.predict(x_test2)\n", + "\n", + "score = mdl.score(x_test2, y_test)\n", + "print(\"k = {} - Score = {}\".format(40, score))\n", + "\n", + "\n", + "#Boolean mask of the selected columns\n", + "mask = selector.get_support()\n", + "print(x_train.columns[mask])" + ], + "execution_count": null, + "outputs": [ + { + "output_type": "stream", + "text": [ + "k = 40 - Score = 0.8124383020730503\n", + "Index(['g39', 'g50', 'g51', 'g57', 'g58', 'g68', 'g75', 'g100', 'g122', 'g138',\n", + " 'g175', 'g178', 'g206', 'g223', 'g230', 'g312', 'g317', 'g322', 'g328',\n", + " 'g352', 'g365', 'g367', 'g418', 'g445', 'g463', 'g524', 'g525', 'g529',\n", + " 'g600', 'g620', 'g635', 'g656', 'g689', 'g701', 'g738', 'g764', 'c13',\n", + " 'c65', 'c73', 'c98'],\n", + " dtype='object')\n" + ], + "name": "stdout" + } + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "-PzoF46ZTJR2" + }, + "source": [ + "These are the names of the columns that `SelectFromModel` selected as the *best* `predictor variables` for the model." 
+ ] + }, + { + "cell_type": "code", + "metadata": { + "id": "lQPIWjRWE2X_" + }, + "source": [ + "[['g39', 'g50', 'g51', 'g57', 'g58', 'g68', 'g75', 'g100', 'g122', 'g138',\n", + " 'g175', 'g178', 'g206', 'g223', 'g230', 'g312', 'g317', 'g322', 'g328',\n", + " 'g352', 'g365', 'g367', 'g418', 'g445', 'g463', 'g524', 'g525', 'g529',\n", + " 'g600', 'g620', 'g635', 'g656', 'g689', 'g701', 'g738', 'g764', 'c13',\n", + " 'c65', 'c73', 'c98']]" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "fi_SaOdCTism" + }, + "source": [ + "Now let's put our new predictor variables into x." + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "g_dscgJ_EVtH" + }, + "source": [ + "x = dados[['g39', 'g50', 'g51', 'g57', 'g58', 'g68', 'g75', 'g100', 'g122', 'g138',\n", + " 'g175', 'g178', 'g206', 'g223', 'g230', 'g312', 'g317', 'g322', 'g328',\n", + " 'g352', 'g365', 'g367', 'g418', 'g445', 'g463', 'g524', 'g525', 'g529',\n", + " 'g600', 'g620', 'g635', 'g656', 'g689', 'g701', 'g738', 'g764', 'c13',\n", + " 'c65', 'c73', 'c98']]\n", + "\n", + "y = dados['atv_moa']" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "3OGyjUrQGTIi" + }, + "source": [ + "#**Building the Predictive Machine after Balancing and Feature Selection**" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "RYlokH8cTy21" + }, + "source": [ + "Now that we have done the balancing and the feature selection, shall we check whether the model's performance improved?\n", + "\n" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "APDeT3xhUqI1" + }, + "source": [ + "Now we will use the same classification models: `Logistic Regression`, `XGBoost`, and `RandomForest`." + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "_1GGvIrBF0UO" + }, + "source": [ + "##Logistic Regression after Balancing and Feature Selection" + ] 
+ }, + { + "cell_type": "code", + "metadata": { + "id": "f1fflXLMF4Jd" + }, + "source": [ + "#training the model\n", + "model_rl.fit(x_train, y_train)\n", + "y_predict = model_rl.predict(x_test)" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "code", + "metadata": { + "id": "eQBfJW8uGBkS", + "colab": { + "base_uri": "https://localhost:8080/" + }, + "outputId": "8eb646ae-35dc-4fa7-ec1c-487116bdaa3e" + }, + "source": [ + "print('Classification metrics--> \\n', classification_report(y_test,y_predict))" + ], + "execution_count": null, + "outputs": [ + { + "output_type": "stream", + "text": [ + "Classification metrics--> \n", + " precision recall f1-score support\n", + "\n", + " 0 0.82 0.86 0.84 8776\n", + " 1 0.69 0.62 0.65 4393\n", + "\n", + " accuracy 0.78 13169\n", + " macro avg 0.76 0.74 0.75 13169\n", + "weighted avg 0.78 0.78 0.78 13169\n", + "\n" + ], + "name": "stdout" + } + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "mIK4HTTsFtxk" + }, + "source": [ + "##XGB after Balancing and Feature Selection" + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "U5Xq193XFeu5", + "colab": { + "base_uri": "https://localhost:8080/" + }, + "outputId": "a13a69d1-6970-4a7a-c6b3-64e11ff0f8af" + }, + "source": [ + "#Note: the result of predict() is not assigned to y_predict here, so the report\n", + "#in the next cell still reflects the Logistic Regression predictions\n", + "model_x.fit(x_train,y_train)\n", + "model_x.predict(x_test)" + ], + "execution_count": null, + "outputs": [ + { + "output_type": "execute_result", + "data": { + "text/plain": [ + "array([0, 0, 0, ..., 0, 0, 0])" + ] + }, + "metadata": { + "tags": [] + }, + "execution_count": 254 + } + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "5Tas6A6BFc__", + "colab": { + "base_uri": "https://localhost:8080/" + }, + "outputId": "84171fc4-1be7-4302-bbb9-34f993580e4d" + }, + "source": [ + "print('Classification metrics--> \\n', classification_report(y_test,y_predict))" + ], + "execution_count": null, + "outputs": [ + { + "output_type": "stream", + "text": [ + "Classification metrics--> \n", + " precision recall 
f1-score support\n", + "\n", + " 0 0.82 0.86 0.84 8776\n", + " 1 0.69 0.62 0.65 4393\n", + "\n", + " accuracy 0.78 13169\n", + " macro avg 0.76 0.74 0.75 13169\n", + "weighted avg 0.78 0.78 0.78 13169\n", + "\n" + ], + "name": "stdout" + } + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "rb2KC-rhFJ-j" + }, + "source": [ + "##Random Forest after Balancing and Feature Selection" + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "qhsVButCEwbY" + }, + "source": [ + "mdl.fit(x_train, y_train)\n", + "y_predict = mdl.predict(x_test)" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "code", + "metadata": { + "id": "grzOk3oFFCbq", + "colab": { + "base_uri": "https://localhost:8080/" + }, + "outputId": "cc66349d-32e4-4601-fbc1-a85fff98b181" + }, + "source": [ + "print('Classification metrics--> \\n', classification_report(y_test,y_predict))" + ], + "execution_count": null, + "outputs": [ + { + "output_type": "stream", + "text": [ + "Classification metrics--> \n", + " precision recall f1-score support\n", + "\n", + " 0 0.89 0.81 0.85 8776\n", + " 1 0.68 0.80 0.73 4393\n", + "\n", + " accuracy 0.81 13169\n", + " macro avg 0.78 0.80 0.79 13169\n", + "weighted avg 0.82 0.81 0.81 13169\n", + "\n" + ], + "name": "stdout" + } + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "sFfV8U1sW_Kt" + }, + "source": [ + "Let's compare the models." + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "T0ajI-ShVveV" + }, + "source": [ + "Model | Without Balancing | With Balancing | With Feature Selection and Balancing\n", + "-------------------|------------------|------------------|------------------\n", + "Logistic Regression | 62% | 78% | 78%\n", + "XGBoost | 65% | 77% | 78%\n", + "RandomForest | 64% | Not used | 81%\n" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "VCvQ7SBCXB8E" + }, + "source": [ + "The RandomForest model had the best metric, with a score of 81%.\n", 
+ "\n", + "Therefore, RandomForest won Objective 1!\n" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "S0Ao3kotHxmr" + }, + "source": [ + "#***Building the Predictive Machine***\n", + "Objective 2: Predict whether the experiment was treated with a drug or with a control." + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "4Gt65VCKXKL1" + }, + "source": [ + "Now let's move on to our second objective! \n", + "\n", + "Predict whether the experiment used a drug or a control." + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "-HQLpNOVXSRd" + }, + "source": [ + "For this we will select our `X` and `y`.\n", + "\n", + "Our `y` will be the `com_droga` column, and we will select all the variables that are integer or float to be our `x`.\n", + "\n", + "After that we will split the dataset into training and test sets, leaving 30% for testing." + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "-2HVsErfID0T" + }, + "source": [ + "#select_dtypes(include, exclude): as written, this keeps the float64 columns and excludes the int64 ones\n", + "X = dados.select_dtypes('float64','int64')\n", + "y = dados['com_droga']\n", + "\n", + "#Note: the split below uses the lowercase x (the 40 selected features), not X\n", + "X_train, X_test, y_train, y_test = train_test_split(x, y, test_size = 0.3, random_state = SEED)" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "6iB40sXrXw_9" + }, + "source": [ + "Let's use our famous models, which already faced the paredão several times in the last objective. Who will win this one? Who are you rooting for?" 
+ ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "sDCNDLRlIRn7" + }, + "source": [ + "##Logistic Regression" + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "5027OuDcIWeJ" + }, + "source": [ + "model_rl.fit(X_train, y_train)\n", + "y_predict = model_rl.predict(X_test)" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "code", + "metadata": { + "id": "rfcpIlh2I-ZV", + "colab": { + "base_uri": "https://localhost:8080/" + }, + "outputId": "0f2ca93c-c938-4d89-fb64-2cbf8ea61014" + }, + "source": [ + "print('Classification metrics--> \\n', classification_report(y_test,y_predict))" + ], + "execution_count": null, + "outputs": [ + { + "output_type": "stream", + "text": [ + "Classification metrics--> \n", + " precision recall f1-score support\n", + "\n", + " 0 0.85 0.88 0.87 6539\n", + " 1 0.88 0.85 0.86 6630\n", + "\n", + " accuracy 0.86 13169\n", + " macro avg 0.87 0.86 0.86 13169\n", + "weighted avg 0.87 0.86 0.86 13169\n", + "\n" + ], + "name": "stdout" + } + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "Z3pdlB7WIqkB" + }, + "source": [ + "##XGBoost" + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "isSL5qBNIp_C", + "colab": { + "base_uri": "https://localhost:8080/" + }, + "outputId": "16edc44a-eb67-4d89-9b9d-fef9e694119d" + }, + "source": [ + "#Training the XGB model\n", + "#Note: the result of predict() is not assigned to y_predict here, so the report\n", + "#in the next cell still reflects the Logistic Regression predictions\n", + "model_x.fit(X_train,y_train)\n", + "model_x.predict(X_test)" + ], + "execution_count": null, + "outputs": [ + { + "output_type": "execute_result", + "data": { + "text/plain": [ + "array([1, 0, 1, ..., 0, 0, 0])" + ] + }, + "metadata": { + "tags": [] + }, + "execution_count": 263 + } + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "W6I0bkDrJFKb", + "colab": { + "base_uri": "https://localhost:8080/" + }, + "outputId": "1ba1f091-1726-49e3-f473-62914f995219" + }, + "source": [ + "print('Classification metrics--> \\n', classification_report(y_test,y_predict))" + ], + "execution_count": null, + "outputs": [ + { + 
"output_type": "stream", + "text": [ + "Classification metrics--> \n", + " precision recall f1-score support\n", + "\n", + " 0 0.85 0.88 0.87 6539\n", + " 1 0.88 0.85 0.86 6630\n", + "\n", + " accuracy 0.86 13169\n", + " macro avg 0.87 0.86 0.86 13169\n", + "weighted avg 0.87 0.86 0.86 13169\n", + "\n" + ], + "name": "stdout" + } + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "8CwYWFE7Ig_K" + }, + "source": [ + "## RandomForest" + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "d3uxPy0_Icqb" + }, + "source": [ + "#Treinando o modelo\n", + "mdl.fit(X_train, y_train)\n", + "y_predict = mdl.predict(X_test)" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "code", + "metadata": { + "id": "kPG5Pu2ZIn9Z", + "colab": { + "base_uri": "https://localhost:8080/" + }, + "outputId": "b2c72156-8cc9-4115-a46f-99e637fecb64" + }, + "source": [ + "print('Classification metrics--> \\n', classification_report(y_test,y_predict))" + ], + "execution_count": null, + "outputs": [ + { + "output_type": "stream", + "text": [ + "Classification metrics--> \n", + " precision recall f1-score support\n", + "\n", + " 0 0.98 1.00 0.99 6539\n", + " 1 1.00 0.98 0.99 6630\n", + "\n", + " accuracy 0.99 13169\n", + " macro avg 0.99 0.99 0.99 13169\n", + "weighted avg 0.99 0.99 0.99 13169\n", + "\n" + ], + "name": "stdout" + } + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "tCOjx9rzZFDg" + }, + "source": [ + "O resultado é ? 
\n", + "\n" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "opAJ39naYbEe" + }, + "source": [ + "Model | Result |\n", + "-------------------|------------------|\n", + "Logistic Regression | 86%\n", + "XGBoost | 86%\n", + "RandomForest | 99%" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "ID_nyzRlZiR8" + }, + "source": [ + "RandomForest won again!!!!\n", + "\n", + "This time with a high hit rate of 99%!\n", + "\n" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "v2IaZFwF-SG2" + }, + "source": [ + "#***Conclusion***" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "97bkoWY1e6vm" + }, + "source": [ + "\"Paredawn\"" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "uCNM32VWgYST" + }, + "source": [ + "This edition ends here. After all the `paredawns` and our whole journey building the predictive machines, I hope you had fun along the way and understood it all!" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "VmNwJc2xgZtO" + }, + "source": [ + "\"Paredawn\"\n" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "fo1CNrHKgvh9" + }, + "source": [ + "

We achieved the objectives we set at the start of this project: we built two models with scores of 81% and 99%, one to predict whether the MoA will be activated in the experiments and the other to predict whether the experiments used a drug or a control. " + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "Z9WtY9oygrp1" + }, + "source": [ + "\"win\"" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "MAD7OIR8bs19" + }, + "source": [ + "

In this project we learned a bit more about the incredible field of Bioinformatics: how to explore the datasets, balance the data, select variables, and choose among several classification models to predict our target variable. \n", + "\n", + "But it doesn't end here! There is still a lot to do in this project, such as:\n", + "\n", + "\n", + "\n", + "* Balancing the other columns;\n", + "* Testing new classification models;\n", + "* Cleaning the data further;\n", + "* Normalizing the data;\n", + "* Standardizing the data;\n", + "* Predicting other variables.\n", + "\n", + "Thank you for making it this far!\n", + "\n", + "\n", + "\n" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "LtchdLyne_QP" + }, + "source": [ + "#***THE END***" + ] + } + ] +} \ No newline at end of file