The aim of this project is to develop a Python program that uses image processing to identify and extract information from a set of engineering drawings with different layout designs. The information falls into two main categories: the drawing itself, and the title block, which contains information related to the drawing or the project (e.g. drawing number, author, status, etc.).
You are required to extract all of this information and save it in two separate files. The drawing should be saved as a separate image file, and the contents extracted from the title block (with their respective field titles) should be saved in an Excel file. You may create your own naming convention for these two files, but do not rename the engineering drawings (e.g. 01.png, 02.png, …) given to you.
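As one possible naming convention, the output names can be derived from the input file name so that the original drawing is never renamed. The sketch below is only an illustration: the function name `output_paths`, the `output` folder, and the `_drawing` and `_title_block` suffixes are all assumptions, not part of the brief.

```python
from pathlib import Path


def output_paths(drawing_path, out_dir="output"):
    """Derive the two output file names for one input drawing.

    The input file itself is left untouched; only new names are built.
    The folder name and suffixes are hypothetical choices.
    """
    src = Path(drawing_path)
    out = Path(out_dir)
    # e.g. 01.png gives <out_dir>/01_drawing.png and <out_dir>/01_title_block.xlsx
    image_path = out / f"{src.stem}_drawing.png"
    excel_path = out / f"{src.stem}_title_block.xlsx"
    return image_path, excel_path


if __name__ == "__main__":
    for name in ["01.png", "02.png"]:
        image_path, excel_path = output_paths(name)
        print(name, "->", image_path, "and", excel_path)
```

Keeping the input stem (01, 02, …) in both output names makes it easy to trace each image and Excel file back to its source drawing.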
Your program should meet the following requirements:
- Able to identify and extract the drawing number (e.g. SU BOL E 01 27 09 10 B) from each of the engineering drawings given to you and save it into an Excel file.
- Able to also extract other information (e.g. drawing title, author, status, etc.) from the title block and save it into an Excel file. The contents should match the respective field titles extracted from the engineering drawing. For example, the content “CWC” should be placed beside the “DRAWN BY” field title in the Excel file.
- Able to identify and extract the drawing (only the drawing) and save it into a separate image file with a .png extension.
- Able to handle an additional engineering drawing whose layout design is not known to you in advance (it will not be “very different” from the engineering drawings given to you). This evaluates how well your program copes with engineering drawings that have different layout designs.
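The requirement that each content sits beside its field title could be approached by matching text boxes on their positions rather than on a fixed layout. The following is a minimal sketch under several assumptions: OCR output has already been merged into phrase-level boxes with `text`, `left` and `top` keys, the set of field titles is known, and a content appears either to the right of its title on the same row or on the row below it. The function name `pair_fields`, the `row_tol` parameter, and the sample coordinates are hypothetical.

```python
def pair_fields(boxes, field_titles, row_tol=8):
    """Pair each field title with the closest content box.

    boxes: list of dicts with "text", "left" and "top" (pixel coordinates).
    field_titles: set of upper-case title strings (an assumed, known list).
    row_tol: how many pixels two boxes may differ in "top" and still
             count as being on the same text row (assumed value).
    """
    titles = [b for b in boxes if b["text"].upper() in field_titles]
    contents = [b for b in boxes if b["text"].upper() not in field_titles]
    pairs = {}
    for t in titles:
        # Contents on the same row, to the right of the title.
        right = [c for c in contents
                 if abs(c["top"] - t["top"]) <= row_tol and c["left"] > t["left"]]
        # Contents on a lower row, starting at or right of the title's left edge.
        below = [c for c in contents
                 if c["top"] - t["top"] > row_tol
                 and c["left"] >= t["left"] - row_tol]
        if right:
            match = min(right, key=lambda c: c["left"] - t["left"])
        elif below:
            match = min(below,
                        key=lambda c: (c["top"] - t["top"], c["left"] - t["left"]))
        else:
            match = None
        pairs[t["text"]] = match["text"] if match else ""
    return pairs


if __name__ == "__main__":
    # Made-up positions; the texts are the examples quoted in the requirements.
    sample = [
        {"text": "DRAWN BY", "left": 10, "top": 10},
        {"text": "CWC", "left": 120, "top": 11},
        {"text": "DRAWING NUMBER", "left": 10, "top": 40},
        {"text": "SU BOL E 01 27 09 10 B", "left": 10, "top": 60},
    ]
    print(pair_fields(sample, {"DRAWN BY", "DRAWING NUMBER"}))
```

Because the matching depends only on relative positions, the same rule may carry over to a layout that was not seen before, although one content can still be claimed by two titles, so real drawings would likely need extra checks.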
You are allowed to use the following packages or libraries to complete the project:
- Python Standard Library
- Google Tesseract (PyTesseract)
- OpenCV
- NumPy
- Matplotlib
- Openpyxl
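To illustrate how Openpyxl from the list above might be used for the Excel output, the sketch below writes field titles and their contents as two-column rows, so each content ends up beside its title. The function name `save_fields`, the sheet title, and the header row are assumptions made for this example; the input dictionary stands in for whatever your title block extraction produces.

```python
import os
import tempfile

from openpyxl import Workbook, load_workbook


def save_fields(fields, excel_path):
    """Write field title / content pairs to an Excel file, one pair per row."""
    wb = Workbook()
    ws = wb.active
    ws.title = "Title block"          # hypothetical sheet name
    ws.append(["Field", "Content"])   # hypothetical header row
    for title, content in fields.items():
        # Column A holds the field title, column B the content beside it.
        ws.append([title, content])
    wb.save(excel_path)


if __name__ == "__main__":
    # Write to a temporary folder so no project files are touched.
    path = os.path.join(tempfile.mkdtemp(), "01_title_block.xlsx")
    save_fields({"DRAWN BY": "CWC"}, path)
    ws = load_workbook(path).active
    for row in ws.iter_rows(values_only=True):
        print(row)
```

Reading the file back with `load_workbook`, as in the usage lines, is a simple way to check that every content landed in the row of its field title.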