markdown formatting fix: whitespace only; headings are always followed by an empty line.
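
The rule stated above (every heading is followed by an empty line) could be enforced mechanically. The following Python sketch is not the tool used for this commit; the function name `ensure_blank_after_headings` and both regular expressions are illustrative assumptions. It handles only ATX headings (`#` to `######`), including a heading inside a list item as in `Planning.md`, and it skips fenced code blocks so that shell comments are not mistaken for headings.

```python
import re

# ATX heading: up to 3 spaces of indent, an optional list bullet,
# 1-6 '#' characters, at least one space, then some text.
HEADING = re.compile(r"^ {0,3}(?:[*+-] +)?#{1,6} +\S")
# Opening or closing line of a fenced code block.
FENCE = re.compile(r"^ {0,3}(```|~~~)")


def ensure_blank_after_headings(text: str) -> str:
    """Return text with an empty line inserted after each heading that lacks one."""
    lines = text.split("\n")
    out = []
    in_fence = False
    for i, line in enumerate(lines):
        out.append(line)
        if FENCE.match(line):
            # Toggle on every fence line; content in between is left untouched.
            in_fence = not in_fence
            continue
        if in_fence or not HEADING.match(line):
            continue
        nxt = lines[i + 1] if i + 1 < len(lines) else None
        # Only insert when a non-empty line follows the heading directly.
        if nxt is not None and nxt.strip() != "":
            out.append("")
    return "\n".join(out)
```

Lines indented by four or more spaces are treated as code, so an indented comment such as `# Clone the Tesseract source tree:` is left alone. The sketch does not add empty lines before headings or strip trailing whitespace, both of which this commit also does in a few places.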

GerHobbelt · stweil · commit 40cd874cf99c · 2023-12-05T18:42:09.000+01:00
diff --git a/Compiling-–-GitInstallation.md b/Compiling-–-GitInstallation.md
@@ -14,6 +14,7 @@
 These are the instructions for installing Tesseract from the git repository. You should be ready to face unexpected problems.
 
 ## Installing With Autoconf Tools
+
 In order to do this, you must have automake, libtool, leptonica, make and pkg-config installed. In addition, you need Git and a C++ compiler.
 
 On Debian or Ubuntu, you can probably install all required packages like this:
@@ -143,6 +144,7 @@ If you want to put the traineddata files in a different directory than the direc
 1. Place any language training data you need into this `tessdata` folder as well. For example, the English one is called `eng.traineddata`. Download it [from the tessdata repository here](https://github.com/tesseract-ocr/tessdata), and move it to your `tessdata` directory you just specified in your `TESSDATA_PREFIX` variable above. 
 
 ### Build with TensorFlow
+
 Building with TensorFlow requires additional packages for Protocol Buffers and TensorFlow.
 On Debian or Ubuntu, you can probably install them like this:
 
@@ -160,6 +162,7 @@ Build support with TensorFlow is a new feature in Git master. The resulting code
 
 
 ### Unit test builds
+
 Such builds can be used to run the automated regression tests, which have additional requirements. This includes the additional dependencies for the training tools (as mentioned above), and downloading all git submodules, as well as the model repositories (`*.traineddata`):
 
     # Clone the Tesseract source tree:
@@ -190,6 +193,7 @@ Failed tests will show prominently as segfaults or SIGILL handlers (depending on
 
 
 ### Debug Builds
+
 Such builds produce Tesseract binaries which run very slowly. They are not useful for production, but good to find or analyze software problems. This is a proven build sequence:
 
     cd tesseract
@@ -227,6 +231,7 @@ GNU gprof is used to show the profiling information from that file.
 
 
 ### Release Builds for Mass Production
+
 The default build creates a Tesseract executable which is fine for processing of single images. Tesseract then uses 4 CPU cores to get an OCR result as fast as possible.
 
 For mass production with hundreds or thousands of images that default is bad because the multi threaded execution has a very large overhead. It is better to run single threaded instances of Tesseract, so that every available CPU core will process a different image.
@@ -246,6 +251,7 @@ This disabled OpenMP (multi threading), does not use a shared Tesseract library
 disables setting of `errno` for mathematical functions (faster execution!) and enables lots of compiler warnings.
 
 ### Builds for fuzzing
+
 Fuzzing is used to test the Tesseract API for bugs. Tesseract uses [OSS-Fuzz](https://oss-fuzz.com/),
 but fuzzing can also run locally. A newer Clang++ compiler is required.
 
@@ -273,4 +279,5 @@ Example (Run the fuzzer to find new bugs):
     nice bin/fuzzer/fuzzer-api -jobs=16 -workers=16
 
 ## Building using Windows Visual Studio
+
 See [Compiling for Windows](Compiling.md#windows).
diff --git a/Data-Files-in-tessdata_best.md b/Data-Files-in-tessdata_best.md
@@ -15,6 +15,7 @@ network.
 There are two sections below: 125 languages, followed by 37 scripts.
 
 ### Languages (123 + osd + eq)
+
 All language and script models have the same values for the following parameters which have been removed from the
 individual descriptions: `int_mode=0, recoding=1, learning_rate=0.001, momentum=0.5, adam_beta=0.999 `
 
diff --git a/Docker-Containers.md b/Docker-Containers.md
@@ -1,18 +1,23 @@
 ## Tesseract 4 OCR Compilation - Docker Container
+
 [This Github repository](https://github.com/tesseract-shadow/tesseract-ocr-compilation) contains scripts and the definition of a Docker container that helps to compile Tesseract 4. 
 
 Automated build Docker image: [`docker pull tesseractshadow/tesseract4cmp`](https://hub.docker.com/r/tesseractshadow/tesseract4cmp/)
 
 ## Tesseract 4 OCR Runtime Environment - Docker Container
+
 If you are looking for a ready-to-use Tesseract 4 Runtime Environment container (and don't want to compile it), please take a look at [this Github repository](https://github.com/tesseract-shadow/tesseract-ocr-re). The repository also contains some examples of usage.
 
 Automated build Docker image: [`docker pull tesseractshadow/tesseract4re`](https://hub.docker.com/r/tesseractshadow/tesseract4re/).
 
 ## Tesseract 4 OCR with OpenCV Environment - Docker Container
+
 Automated build Docker image: [`docker pull mylamour/tesseract-ocr:opencv`]
 
 ## Building for Android with Docker
+
 [This Github repository](https://github.com/rhardih/bad/tree/master/tesseract) contains Docker images for Tesseract 4.0 and earlier.
 
 ## Docker - Get Started
+
 If you are not familiar with Docker please read [Docker - Get Started](https://docs.docker.com/get-started/).
diff --git a/Examples_C++.md b/Examples_C++.md
@@ -4,34 +4,49 @@ title: C++ API Examples
 ## C++ Examples
 
 ### Basic_example
+
 ```
 {% include_relative examples/Basic_example.cc %}
 ```
+
 ### SetRectangle_example 
+
 ```
 {% include_relative examples/SetRectangle_example.cc %}
 ```
+
 ### GetComponentImages_example
+
 ```
 {% include_relative examples/GetComponentImages_example.cc %}
 ```
+
 ### ResultIterator_example 
+
 ```
 {% include_relative examples/ResultIterator_example.cc %}
 ```
+
 ### OSD_example
+
 ```
 {% include_relative examples/OSD_example.cc %}
 ```
+
 ### LSTM_Choices_example
+
 ```
 {% include_relative examples/LSTM_Choices_example.cc %}
 ```
+
 ### OpenCV_example
+
 ```
 {% include_relative examples/OpenCV_example.cc %}
 ```
+
 ### UserPatterns_example
+
 ```
 {% include_relative examples/UserPatterns_example.cc %}
 ```
diff --git a/Fonts.md b/Fonts.md
@@ -108,6 +108,7 @@ The installed fonts are shown by the command `fc-list`. See also the [Debian wik
 * http://www.steffmann.de/wordpress/test-2/
 
 #### Arabic Fonts
+
 * https://fonts.google.com/?subset=arabic
 
 #### Devanagari Fonts
@@ -160,6 +161,7 @@ The installed fonts are shown by the command `fc-list`. See also the [Debian wik
 * http://www.morscher.com/3r/fonts/fraktur.htm
 
 #### Hebrew Fonts
+
 * [A list of Hebrew fonts from the Open Siddur Project](http://opensiddur.org/tools/fonts/)
 
 #### Collections of fonts
diff --git a/Planning.md b/Planning.md
@@ -89,16 +89,19 @@ Depending on available resources and opinions, these suggestions will either be
 
 * #### Add option to optionally select implementation for dot product (CPU, SSE, AVX, ...)
 
-* #### Relative includes for traineddata 
+* #### Relative includes for traineddata
+
   tessedit_load_sublangs should search for the sublangs relative to the parent, not starting in tessdata dir.
 
 * #### More fixes for compiler warnings and issues reported by Coverity Scan
 
 * #### Add a simple bash script for building tesseract
 
 * #### New traineddata format 
+
   In addition to the current proprietary format Tesseract could also support ZIP archives (see [discussion](https://github.com/tesseract-ocr/tesseract/pull/911)).
-A possible implementation using libarchive is [available](https://github.com/stweil/tesseract/tree/libarchive), but needs more testing.
+
+  A possible implementation using libarchive is [available](https://github.com/stweil/tesseract/tree/libarchive), but needs more testing.
 
 * #### "Training light" - Learning by doing (see [issue](https://github.com/tesseract-ocr/tesseract/issues/1442))
 
@@ -143,8 +146,11 @@ Here we collect important issues and features for the release(s) following 4.0.0
   This does not include OpenCL or the old Tesseract engine.
 
 * #### Tesseract creates output for missing input (see [issue 1023](https://github.com/tesseract-ocr/tesseract/issues/1023)).
+
   Mostly solved, but could be improved.
 
 
 * ####  Issue 1353: Patch for /training/tessopt.cpp (see [pull request 13](https://github.com/tesseract-ocr/tesseract/pull/13))
-  It looks like it is not possible to run more than one training in the same process. The pull request describes a possible fix, but does not include a complete implementation (low priority).
+
+  It looks like it is not possible to run more than one training in the same process. The pull request describes a possible fix, but does not include a complete implementation (low priority).
+
diff --git a/README.md b/README.md
@@ -153,11 +153,13 @@ Please use scripts from [tesseract-ocr/tesstrain](https://github.com/tesseract-o
 - [Training LSTM Tesseract 5](tess5/TrainingTesseract-5.md) - based on [detailed Tesseract 4 tutorial and guide by Ray Smith](tess4/TrainingTesseract-4.00.md)
 
 ### Testing
+
 - [Benchmarks](Benchmarks.md)
 - [TestingTesseract](TestingTesseract.md)
 - [UNLV Testing of Tesseract](UNLV-Testing-of-Tesseract.md)
 
 ### External Projects
+
 - [AddOns](AddOns.md)
 - [User Projects - 3rdParty](User-Projects-–-3rdParty.md)
 
diff --git a/User-Projects-–-3rdParty.md b/User-Projects-–-3rdParty.md
@@ -57,6 +57,7 @@
     * [Tesseract-OCR-iOS](https://github.com/gali8/Tesseract-OCR-iOS) - Tesseract OCR iOS is a Framework for iOS7+, compiled also for armv7s and arm64.
     * [OCR-iOS-Example](https://github.com/robmathews/OCR-iOS-Example) - a simple example of how to do optical character recognition (OCR) on iOS.
     * [Tesseract-iPhone-Demo ](https://github.com/nolanbrown/Tesseract-iPhone-Demo) - example based on tesseract 2.04.
+
   * _More OS_:
     * [ScanBizCards](http://www.scanbizcards.com): Mobile solution for business card scanning. _Requirements:_ iPhone 4/iPhone 3/Android 2.0
 
@@ -66,13 +67,15 @@
 ## 4. Others (Utilities, Tools, Command-Line Interfaces [CLI], etc)
 
 ### A. PDF to Searchable PDF tools 
+
 (ie: any tool which can also handle a non-searchable PDF as an input):
 
   1. [OCRmyPDF](https://github.com/jbarlow83/OCRmyPDF) - Adds OCR text layer to scanned PDF files and images, allowing them to be searched. Processes pages in parallel on multi-core CPUs. Keeps exact resolution of original embedded images without recompressing JPEGs, when possible. Includes several image preprocessing options, detailed documentation, and support for many exotic PDFs.
   1. [pdf2pdfocr](https://github.com/LeoFCardoso/pdf2pdfocr) is a tool to OCR a PDF (or supported images) and add a text layer in the original file making it a searchable PDF. It is a python script that uses tesseract and other open source tools. Linux, macOS and Windows supported.
   1. [pdf2searchablepdf](https://github.com/ElectricRCAircraftGuy/PDF2SearchablePDF) - a tool which allows converting any non-searchable PDF, OR any entire directory of images, to a searchable PDF
   
 ### B. Others:
+
   1. [ocr-fileformat](https://github.com/UB-Mannheim/ocr-fileformat) - Validate and transform between OCR file formats (hOCR, ALTO, PAGE, FineReader)
   1. [Tess4J](https://github.com/nguyenq/tess4j) - A Java JNA wrapper for Tesseract OCR API.
   1. [Traineddata inspector](https://mazoea.com/te/traineddata/) -  to inspect some of the internals of traineddata files 
diff --git a/tess3/Training-Tesseract-3.00–3.02.md b/tess3/Training-Tesseract-3.00–3.02.md
@@ -465,4 +465,4 @@ tesseract image.tif output -l [lang]
 
 More options of `combine_tessdata` can be found on its [Manual Page](https://github.com/tesseract-ocr/tesseract/blob/3.02.02/doc/combine_tessdata.1.asc) or in a comment in its [source code](https://github.com/tesseract-ocr/tesseract/blob/3.02.02/training/combine_tessdata.cpp#L23).
 
-You can inspect some of the internals of traineddata files  in 3rd party online [Traineddata inspector](https://te-traineddata-ui.herokuapp.com).
+You can inspect some of the internals of traineddata files in 3rd party online [Traineddata inspector](https://te-traineddata-ui.herokuapp.com).
