Information Extraction from Unstructured Document
Keywords: Information Extraction; XY Cut Algorithm; Ordering Problem; Page Segmentation; Data Mining
Abstract
Nowadays, PDF (Portable Document Format) is widely used in industry as a common format for
data exchange. Extracting information from unstructured documents makes it possible to analyze that information and represent it
in a structured format. In this paper we present a system for discovering knowledge from PDF files and representing it in
Excel format. For this conversion, the strings contained in the PDF are first extracted, and different components are then applied to
express them in Excel (the logically structured document).
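
The keywords name the XY cut algorithm, but the abstract does not say how the system applies it. As a rough illustration only, the following Python sketch shows one common form of recursive XY cut: text bounding boxes are split wherever their projection onto an axis has a wide enough gap, and the pieces are read out in order. The names `Box`, `_split`, `xy_cut`, and `min_gap`, the sample coordinates, and the choice to cut into rows before columns (which suits table-like pages destined for a spreadsheet) are all assumptions made for this sketch, not details taken from the paper.

```python
from typing import List, Tuple

# Hypothetical box layout: (x0, y0, x1, y1, text), with y growing downward.
Box = Tuple[float, float, float, float, str]


def _split(boxes: List[Box], lo: int, hi: int, min_gap: float) -> List[List[Box]]:
    """Group boxes along one axis, starting a new group at each gap >= min_gap."""
    ordered = sorted(boxes, key=lambda b: b[lo])
    groups = [[ordered[0]]]
    reach = ordered[0][hi]  # farthest extent covered so far on this axis
    for box in ordered[1:]:
        if box[lo] - reach >= min_gap:
            groups.append([box])
        else:
            groups[-1].append(box)
        reach = max(reach, box[hi])
    return groups


def xy_cut(boxes: List[Box], min_gap: float = 5.0) -> List[str]:
    """Return the box texts in reading order using recursive XY cuts."""
    if len(boxes) <= 1:
        return [b[4] for b in boxes]
    # Assumption: try to cut into rows (y axis) first, then into columns (x axis).
    for lo, hi in ((1, 3), (0, 2)):
        groups = _split(boxes, lo, hi, min_gap)
        if len(groups) > 1:
            return [text for group in groups for text in xy_cut(group, min_gap)]
    # No gap on either axis: fall back to top-to-bottom, left-to-right order.
    return [b[4] for b in sorted(boxes, key=lambda b: (b[1], b[0]))]


# Made-up boxes for a 2x2 table, deliberately listed out of order.
boxes = [
    (100, 10, 150, 20, "Qty"),
    (10, 10, 60, 20, "Item"),
    (10, 30, 60, 40, "Pen"),
    (100, 30, 150, 40, "3"),
]
print(xy_cut(boxes))  # ['Item', 'Qty', 'Pen', '3']
```

In a real pipeline the boxes would come from a PDF text extractor and the ordered cells would be written to a spreadsheet; both steps are left out here because the abstract does not specify them.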