Semantic Structure Identification using Image Detection Algorithms

#Python #TensorFlow #DeepLearning #UnsupervisedLearning #Tesseract #OpenCV

Project Paper - https://www.dropbox.com/s/q4ssacdievrmu52/Semantic%20Structure%20Identification%20using%20Image%20Detection%20Algorithms.docx?dl=0

Project Poster - https://www.dropbox.com/s/az6d160ao9zcv09/Informs%20BA%20Table%20Cell%20Detection%20Poster.pptx?dl=0

Abstract

In this project, we aim to develop a tool capable of identifying useful semantic structures in various file formats, such as PDFs and images. The tool attempts to identify cells of semantic continuity (e.g., table cells) within files using image recognition and table identification from images. It first converts each page of the document to an image in order to detect these cells. The second stage detects semantic structures and linkages within the data (such as data present in tables).

To perform this task, we collected over 400 PDF files that contain textual as well as tabular data. These files are used as part of the training data for our model. The files are converted to JPEG format using OpenCV. Tools such as PyTesseract are used to identify cells and create bounding boxes around them, and machine learning frameworks such as TensorFlow and Luminoth are used to extract tables from the images by training deep learning models. Finally, arrays of these bounding boxes are analyzed to detect semantic structures.
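The abstract does not spell out how the arrays of bounding boxes are analyzed, so the following is only a minimal sketch of one plausible approach, not the project's actual method. It assumes each detected cell is an (x, y, width, height) tuple in pixels, groups boxes into rows by vertical centre, and flags a region as table-like when consecutive rows share the same cell count. The names `group_boxes_into_rows`, `looks_like_table`, and the `y_tolerance` parameter are hypothetical and chosen for illustration.

```python
from typing import List, Tuple

# Assumed box layout: (x, y, width, height) in pixels.
Box = Tuple[int, int, int, int]


def group_boxes_into_rows(boxes: List[Box], y_tolerance: int = 10) -> List[List[Box]]:
    """Group boxes into rows by vertical centre.

    A box joins the current row when its centre lies within y_tolerance
    pixels of the centre of the first box placed in that row. Each row
    is returned sorted left to right.
    """
    rows = []
    for box in sorted(boxes, key=lambda b: b[1] + b[3] / 2):
        centre_y = box[1] + box[3] / 2
        if rows and abs(centre_y - rows[-1]["centre_y"]) <= y_tolerance:
            rows[-1]["boxes"].append(box)
        else:
            rows.append({"centre_y": centre_y, "boxes": [box]})
    return [sorted(row["boxes"], key=lambda b: b[0]) for row in rows]


def looks_like_table(rows: List[List[Box]], min_rows: int = 2, min_cols: int = 2) -> bool:
    """Heuristic (an assumption, not the paper's rule): a table is present
    when at least min_rows consecutive rows have the same number of cells
    and that number is at least min_cols."""
    run = 0
    previous_count = None
    for row in rows:
        count = len(row)
        if count >= min_cols and count == previous_count:
            run += 1
        elif count >= min_cols:
            run = 1
        else:
            run = 0
        previous_count = count
        if run >= min_rows:
            return True
    return False


# Illustrative input: two rows of three cells, then one full-width text line.
sample_boxes = [
    (10, 10, 50, 20), (70, 12, 50, 20), (130, 9, 50, 20),
    (10, 40, 50, 20), (70, 41, 50, 20), (130, 42, 50, 20),
    (10, 80, 200, 20),
]
sample_rows = group_boxes_into_rows(sample_boxes)
print([len(row) for row in sample_rows])
print(looks_like_table(sample_rows))
```

In practice the input boxes would come from the detection stage (for example PyTesseract or a trained TensorFlow model), and the tolerance would need tuning to the page resolution.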

Repository files

README.md
Table_cell_detection.ipynb
Tensor_Flow_Table _Detection.ipynb
data.zip

kapitsa2811/table_cell_detection
