sumeshi/snip-snap-csv
snip-snap-csv


A tool for rapid CSV file processing and filtering, designed specifically for log analysis.

Description

Note

This project is in the early stages of development. Please be aware that frequent changes and updates are likely to occur.

Usage

$ sscsv {{initializer}} {{Arguments}} - {{chainable}} {{Arguments}} - {{chainable}} {{Arguments}} - {{finalizer}} {{Arguments}}

For example, the command below reads a CSV file, extracts the rows that contain 4624 in the 'Event ID' column, sorts them by the 'Date and Time' column, and displays the first 3 rows.

$ sscsv load Security.csv - isin 'Event ID' 4624 - sort 'Date and Time' - head 3
2024-06-26T17:29:19+0000 [DEBUG] 1 files are loaded. Security.csv
2024-06-26T17:29:19+0000 [DEBUG] filter condition: 4624 in Event ID
2024-06-26T17:29:19+0000 [DEBUG] sort by Date and Time (asc).
2024-06-26T17:29:19+0000 [DEBUG] heading 3 lines.
shape: (3, 5)
┌─────────────┬───────────────────────┬─────────────────────────────────┬──────────┬───────────────┐
│ Level       ┆ Date and Time         ┆ Source                          ┆ Event ID ┆ Task Category │
│ ---         ┆ ---                   ┆ ---                             ┆ ---      ┆ ---           │
│ str         ┆ str                   ┆ str                             ┆ i64      ┆ str           │
╞═════════════╪═══════════════════════╪═════════════════════════════════╪══════════╪═══════════════╡
│ Information ┆ 10/6/2016 01:00:55 PM ┆ Microsoft-Windows-Security-Aud… ┆ 4624     ┆ Logon         │
│ Information ┆ 10/6/2016 01:04:05 PM ┆ Microsoft-Windows-Security-Aud… ┆ 4624     ┆ Logon         │
│ Information ┆ 10/6/2016 01:04:10 PM ┆ Microsoft-Windows-Security-Aud… ┆ 4624     ┆ Logon         │
└─────────────┴───────────────────────┴─────────────────────────────────┴──────────┴───────────────┘

Architecture

This tool processes CSV data by connecting three kinds of processes: an initializer, chainable steps, and a finalizer.
For example, the initializer reads the file, the data passes through one or more chainable processing steps, and the finalizer outputs the result.

Also, each process is explicitly separated from the others by "-".
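
To illustrate the "-" separator, here is a minimal plain-Python sketch of how a flat argument list could be split into stages. It is not sscsv's actual implementation; the function name split_stages is hypothetical and exists only for this example.

```python
from typing import List


def split_stages(argv: List[str], sep: str = "-") -> List[List[str]]:
    """Split a flat argument list into stages at each standalone separator token.

    Hypothetical helper for illustration only; sscsv may parse its
    command line differently.
    """
    stages: List[List[str]] = [[]]
    for token in argv:
        if token == sep:
            # A bare "-" closes the current stage and opens the next one.
            stages.append([])
        else:
            stages[-1].append(token)
    # Drop empty stages caused by leading, trailing, or doubled separators.
    return [stage for stage in stages if stage]


# Arguments of the usage example above, as a shell would pass them.
argv = ["load", "Security.csv", "-", "isin", "Event ID", "4624", "-",
        "sort", "Date and Time", "-", "head", "3"]
stages = split_stages(argv)
for name, *args in stages:
    print(name, args)
```

Only a token that is exactly "-" acts as a separator in this sketch, so an option such as --timezone_from=UTC stays inside its stage.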

initializer

load

Loads the specified CSV files.

Arguments:
  path*: str

examples

$ sscsv load ./Security.csv
$ sscsv load ./logs/*.csv

chainable manipulation

select

Displays the specified columns.

Arguments:
  columns: Union[str, tuple[str]]

examples

$ sscsv load ./Security.csv - select 'Event ID'
$ sscsv load ./Security.csv - select "Date and Time-Event ID"
$ sscsv load ./Security.csv - select "'Date and Time,Event ID'"

isin

Displays rows in which the specified column contains any of the specified values.

Arguments:
  colname: str
  values: list

examples

$ sscsv load ./Security.csv - isin 'Event ID' 4624,4634

contains

Displays rows that contain the specified string.

Arguments:
  colname: str
  regex: str

examples

$ sscsv load ./Security.csv - contains 'Date and Time' '10/6/2016'
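
The difference between isin and contains can be sketched in plain Python. This is an illustration under assumptions, not sscsv's code: it assumes isin means exact membership in a list of values and contains means a regular-expression search, as the argument names suggest. The sample rows reuse values that appear in the outputs in this README.

```python
import re

# Two columns taken from rows shown elsewhere in this README.
rows = [
    {"Date and Time": "10/6/2016 01:00:55 PM", "Event ID": 4624},
    {"Date and Time": "10/6/2016 01:04:05 PM", "Event ID": 4624},
    {"Date and Time": "10/7/2016 06:38:24 PM", "Event ID": 4658},
    {"Date and Time": "10/7/2016 06:38:24 PM", "Event ID": 4656},
]


def isin(rows, colname, values):
    """Keep rows whose column value equals one of the given values (assumed semantics)."""
    wanted = set(values)
    return [row for row in rows if row[colname] in wanted]


def contains(rows, colname, regex):
    """Keep rows whose column value, as text, matches the pattern somewhere (assumed semantics)."""
    pattern = re.compile(regex)
    return [row for row in rows if pattern.search(str(row[colname]))]


logons = isin(rows, "Event ID", [4624, 4634])
oct6 = contains(rows, "Date and Time", r"10/6/2016")
print(len(logons), len(oct6))
```

In this sketch isin compares whole values, while contains matches any substring, so a pattern such as 465 would select both 4658 and 4656.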

head

Displays the first N rows of the data (5 by default).

Options:
  number: int = 5

examples

$ sscsv load ./Security.csv - head 10

tail

Displays the last N rows of the data (5 by default).

Options:
  number: int = 5

examples

$ sscsv load ./Security.csv - tail 10

sort

Sorts the data by the values of the specified column.

Arguments:
  columns: str

Options:
  desc: bool = False

examples

$ sscsv load ./Security.csv - sort 'Date and Time'
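
The sample outputs in this README list 'Date and Time' with type str. Whether sscsv parses dates before sorting is not stated here, so the plain-Python sketch below only illustrates the general pitfall: text order and chronological order can differ for this timestamp format. The format string is an assumption (month/day/year with a 12-hour clock).

```python
from datetime import datetime

FMT = "%m/%d/%Y %I:%M:%S %p"  # assumption: month/day/year with a 12-hour clock

# Timestamps that appear in the sample outputs of this README.
values = [
    "10/7/2016 06:38:24 PM",
    "10/6/2016 01:00:55 PM",
    "10/7/2016 12:59:59 AM",
    "10/6/2016 01:00:35 PM",
]


def parse(value: str) -> datetime:
    """Parse one timestamp string using the assumed format."""
    return datetime.strptime(value, FMT)


as_text = sorted(values)             # character-by-character comparison
as_time = sorted(values, key=parse)  # chronological comparison

print(as_text[-1])  # latest by text order
print(as_time[-1])  # latest by time order
```

By text order the last value is "10/7/2016 12:59:59 AM", which matches the max shown by stats, while chronologically the last value is "10/7/2016 06:38:24 PM".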

changetz

Changes the timezone of the specified date column.

Arguments:
  columns: str

Options:
  timezone_from: str = "UTC"
  timezone_to: str = "Asia/Tokyo"
  new_colname: str = None

examples

$ sscsv load ./Security.csv - changetz 'Date and Time' --timezone_from=UTC --timezone_to=Asia/Tokyo --new_colname='Date and Time(JST)'
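
As a rough sketch of what a timezone change involves, the plain-Python example below converts one timestamp from UTC to Japan Standard Time. It is not sscsv's implementation. The input format is an assumption (month/day/year with a 12-hour clock), the function name change_tz is hypothetical, and a fixed UTC+9 offset stands in for Asia/Tokyo, which observes no daylight saving time. The output format sscsv writes is not documented here, so ISO 8601 is used for illustration.

```python
from datetime import datetime, timedelta, timezone

FMT = "%m/%d/%Y %I:%M:%S %p"  # assumption: month/day/year with a 12-hour clock
JST = timezone(timedelta(hours=9), "JST")  # fixed offset standing in for Asia/Tokyo


def change_tz(value: str, tz_from=timezone.utc, tz_to=JST) -> str:
    """Interpret a naive timestamp in tz_from and express it in tz_to (hypothetical helper)."""
    naive = datetime.strptime(value, FMT)
    aware = naive.replace(tzinfo=tz_from)  # attach the source timezone
    return aware.astimezone(tz_to).isoformat()


print(change_tz("10/6/2016 01:00:55 PM"))  # 2016-10-06T22:00:55+09:00
```

For zones with daylight saving time, a named zone from the standard zoneinfo module would be needed instead of a fixed offset.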

finalizer

headers

Displays the column names of the data.

Options:
  plain: bool = False

examples

$ sscsv load ./Security.csv - headers
2024-06-30T13:17:53+0000 [DEBUG] 1 files are loaded. Security.csv
┏━━━━┳━━━━━━━━━━━━━━━┓
┃ #  ┃ Column Name   ┃
┑━━━━╇━━━━━━━━━━━━━━━┩
│ 00 │ Level         │
│ 01 │ Date and Time │
│ 02 │ Source        │
│ 03 │ Event ID      │
│ 04 │ Task Category │
└────┴───────────────┘

stats

Displays the statistical information of the data.

examples

$ sscsv load ./Security.csv - stats
2024-06-30T13:25:53+0000 [DEBUG] 1 files are loaded. Security.csv
shape: (9, 6)
┌────────────┬─────────────┬───────────────────────┬─────────────────────────────────┬─────────────┬─────────────────────────┐
│ statistic  ┆ Level       ┆ Date and Time         ┆ Source                          ┆ Event ID    ┆ Task Category           │
│ ---        ┆ ---         ┆ ---                   ┆ ---                             ┆ ---         ┆ ---                     │
│ str        ┆ str         ┆ str                   ┆ str                             ┆ f64         ┆ str                     │
╞════════════╪═════════════╪═══════════════════════╪═════════════════════════════════╪═════════════╪═════════════════════════╡
│ count      ┆ 62031       ┆ 62031                 ┆ 62031                           ┆ 62031.0     ┆ 62031                   │
│ null_count ┆ 0           ┆ 0                     ┆ 0                               ┆ 0.0         ┆ 0                       │
│ mean       ┆ null        ┆ null                  ┆ null                            ┆ 5058.625897 ┆ null                    │
│ std        ┆ null        ┆ null                  ┆ null                            ┆ 199.775419  ┆ null                    │
│ min        ┆ Information ┆ 10/6/2016 01:00:35 PM ┆ Microsoft-Windows-Eventlog      ┆ 1102.0      ┆ Credential Validation   │
│ 25%        ┆ null        ┆ null                  ┆ null                            ┆ 5152.0      ┆ null                    │
│ 50%        ┆ null        ┆ null                  ┆ null                            ┆ 5156.0      ┆ null                    │
│ 75%        ┆ null        ┆ null                  ┆ null                            ┆ 5157.0      ┆ null                    │
│ max        ┆ Information ┆ 10/7/2016 12:59:59 AM ┆ Microsoft-Windows-Security-Aud… ┆ 5158.0      ┆ User Account Management │
└────────────┴─────────────┴───────────────────────┴─────────────────────────────────┴─────────────┴─────────────────────────┘

showquery

Displays the data processing query.

examples

$ sscsv load Security.csv - showquery
2024-06-30T13:26:54+0000 [DEBUG] 1 files are loaded. Security.csv
naive plan: (run LazyFrame.explain(optimized=True) to see the optimized plan)

  Csv SCAN Security.csv
  PROJECT */5 COLUMNS

show

Outputs the processing results to the standard output.

examples

$ sscsv load Security.csv - show
2024-06-30T13:27:34+0000 [DEBUG] 1 files are loaded. Security.csv
2024-06-30T13:27:34+0000 [DEBUG] heading 5 lines.
Level,Date and Time,Source,Event ID,Task Category
Information,10/7/2016 06:38:24 PM,Microsoft-Windows-Security-Auditing,4658,File System
Information,10/7/2016 06:38:24 PM,Microsoft-Windows-Security-Auditing,4656,File System
Information,10/7/2016 06:38:24 PM,Microsoft-Windows-Security-Auditing,4658,File System
Information,10/7/2016 06:38:24 PM,Microsoft-Windows-Security-Auditing,4656,File System
Information,10/7/2016 06:38:24 PM,Microsoft-Windows-Security-Auditing,4658,File System

dump

Outputs the processing results to a CSV file.

Options:
  path: str = yyyymmdd-HHMMSS_{QUERY}.csv

examples

$ sscsv load Security.csv - dump ./Security-sscsv.csv

Planned Features:

  • CSV cache (.pkl)
  • Filtering based on specific conditions (OR, AND conditions)
  • Grouping for operations like count
  • Joining with other tables
  • Config Batch
  • Export Config

Installation

from PyPI

$ pip install sscsv

from GitHub Releases

A standalone binary compiled with Nuitka is also available.

Ubuntu

$ chmod +x ./sscsv
$ ./sscsv {{options...}}

Windows

> sscsv.exe {{options...}}

License

snip-snap-csv is released under the MIT License.
