Data mining geo-drilling reports with semantic analysis
Energie Beheer Nederland (EBN) is a non-operating energy company that manages the interests of the Dutch government in the subsurface energy value chain. In this role, EBN puts great effort into providing all stakeholders with insights and data from historical energy projects. EBN’s goal is to make future projects safer and more efficient.
Activating the geo-drilling reports archive
EBN has a large archive of geo-drilling reports. Each report describes the progress and findings of a specific exploration well. The information in these reports contains many insights into the expected results of drilling a well within a certain geological formation. If these insights can be shared across all parties, the costs of future projects can be reduced and potential incidents prevented. The problem, however, is that most of these reports are scanned versions of typed reports without any meta data or look-up function. Initially, a set of about 800 documents was labelled manually, but this is a very labor-intensive process. That’s why EBN asked Helin to train semantic algorithms to label the historical reports and extract specific meta data.
To extract valuable information from the archive a couple of main challenges had to be overcome. These challenges mostly centered around the quality of the available documents and the events described in these reports.
Poor data quality
Because the archive contained documents that were sometimes 30 or more years old, many of them had been created on typewriters. These documents had been digitized via optical character recognition (OCR), but their digital quality was often not up to standard. This variance in data quality made it all the more important to include techniques that retrieve as much information from the documents as possible.
Lack of naming convention
The main goal of the project was to label each document with multiple conclusions and retrieve specific meta data from it. However, the underlying documents, i.e. the geo-drilling reports, had no uniform structured text format. Nor was there a standardized naming convention for indicating specific events during the drilling process. These events only became apparent to a reader who understood the context, which means that domain expertise was needed to interpret the texts.
Solution: semantic data mining
A powerful document processing tool was used to label all the documents and collect all the required information, along with dedicated semantic analysis algorithms.
Viewport for document enhancement
The first step was the enhancement of the available documents. These documents were entered in the Viewport document solution. Its document processing engine used advanced OCR and text analysis techniques to get much better results than regular OCR techniques. This allowed us to retrieve more information from the digitized documents and to cleanse the document data set effectively. In addition, Viewport created a relational database linking all the documents and allowed for fast search functions.
Semantic algorithms were trained on a data set of 200 documents that had previously been labelled manually. These Naive Bayes learning algorithms were able to label the remaining documents automatically with an accuracy comparable to that of human reviewers. Occurrences of specific events in the documents were labelled, and the prescribed meta data was collected in generated reports. This allowed the document reviewers to reduce their workload by more than 90 percent.
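To illustrate the labelling step, here is a minimal sketch of a multinomial Naive Bayes text classifier in plain Python. It is not the implementation used in the project: the class name, the two event labels, and the example sentences are hypothetical, and the real label set, features, and preprocessing are not described in this case. The sketch only shows the general idea of training on manually labelled text and then labelling unseen text.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


class NaiveBayesLabeller:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, texts, labels):
        # Count documents per label (for priors) and words per label (for likelihoods).
        self.label_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)
        for text, label in zip(texts, labels):
            self.word_counts[label].update(tokenize(text))
        self.vocabulary = {w for counts in self.word_counts.values() for w in counts}
        return self

    def predict(self, text):
        # Words never seen during training carry no evidence, so they are skipped.
        tokens = [t for t in tokenize(text) if t in self.vocabulary]
        total_docs = sum(self.label_counts.values())
        best_label, best_score = None, -math.inf
        for label, doc_count in self.label_counts.items():
            counts = self.word_counts[label]
            denominator = sum(counts.values()) + len(self.vocabulary)
            # Work in log space: log prior plus the sum of smoothed log likelihoods.
            score = math.log(doc_count / total_docs)
            for token in tokens:
                score += math.log((counts[token] + 1) / denominator)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Hypothetical, hand-written training snippets standing in for labelled reports.
train_texts = [
    "lost circulation observed while drilling, mud losses to formation",
    "severe mud losses, circulation lost at depth",
    "drill string stuck, pipe could not be freed",
    "stuck pipe after connection, jarring required to free string",
]
train_labels = ["lost_circulation", "lost_circulation", "stuck_pipe", "stuck_pipe"]

model = NaiveBayesLabeller().fit(train_texts, train_labels)
predicted = model.predict("partial mud losses reported, circulation regained")
print(predicted)
```

Because the model only compares word frequencies per label, it tolerates some OCR noise: a misread word is simply ignored if it was never seen in training, while the correctly recognized words still contribute to the decision.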
The main result of this project was a workload reduction of over 90%. This meant the archive could be processed about 10 times as fast, making the required insights available to the stakeholders much earlier. In addition, the relational setup of the reports in Viewport made any further search for information both fast and intuitive.
Automatic labels to indicate
Meta data retrieved from
Fast search functions within the Viewport setup