Commit 1ebaa74

Merge pull request #1904 from syncfusion-content/997098-dev
997098-dev: Added volume 4 UG changes in PDF library.
2 parents 1e446e8 + 4b5610f commit 1ebaa74

4 files changed

+280
-23
lines changed

Document-Processing/PDF/PDF-Library/NET/Supported-and-Unsupported-Features.md

Lines changed: 6 additions & 6 deletions
Original file line numberDiff line numberDiff line change
@@ -1,11 +1,11 @@
11
---
2-
title: Supported and Unsupported Features | Syncfusion
2+
title: Supported and Unsupported PDF Features | Syncfusion
33
description: This section explains about features available in Essential PDF and their availability in different platforms.
44
platform: document-processing
55
control: PDF
66
documentation: UG
77
---
8-
# Supported and Unsupported Features
8+
# Supported and Unsupported PDF Features
99

1010
The following table shows the various features available in the Essential<sup>&reg;</sup> PDF and their availability in different platforms.
1111

@@ -660,10 +660,10 @@ Yes<br/><br/></td>
660660
PDF/x1a: 2001 Compliance<br/><br/></td><td>
661661
Yes<br/><br/></td><td>
662662
Yes<br/><br/></td><td>
663-
No<br/><br/></td>
664-
<td>No<br/><br/></td>
665-
<td>No<br/><br/></td>
666-
<td>No<br/><br/></td></tr>
663+
Yes<br/><br/></td>
664+
<td>Yes<br/><br/></td>
665+
<td>Yes<br/><br/></td>
666+
<td>Yes<br/><br/></td></tr>
667667
<tr>
668668
<td>
669669
ZUGFeRD Invoice<br/><br/></td><td>

Document-Processing/PDF/PDF-Library/NET/Working-with-OCR/Features.md

Lines changed: 235 additions & 4 deletions
Original file line numberDiff line numberDiff line change
@@ -152,7 +152,7 @@ You can downloaded a complete working sample from [GitHub](https://github.com/Sy
152152

153153
## Performing OCR with tesseract version 3.05
154154

155-
The [TesseractVersion](https://help.syncfusion.com/cr/document-processing/Syncfusion.OCRProcessor.OCRSettings.html#Syncfusion_OCRProcessor_OCRSettings_TesseractVersion) property is used to switch the tesseract version between 3.02 and 3.05. By default, OCR works with tesseract version 3.02.
155+
The [TesseractVersion](https://help.syncfusion.com/cr/document-processing/Syncfusion.OCRProcessor.OCRSettings.html#Syncfusion_OCRProcessor_OCRSettings_TesseractVersion) property is used to switch the tesseract version between 3.02 and 3.05. By default, OCR works with tesseract version 5.0.
156156

157157
N> The starting supported version of tesseract in ASP.NET Core is 4.0. So the lower tesseract versions 3.02 and 3.05 are not supported and we don't have the property called ``TesseractVersion`` in ASP.NET Core platform.
158158

@@ -216,9 +216,7 @@ End Using
216216
217217
## Performing OCR with Tesseract Version 4.0
218218
219-
The [TesseractVersion](https://help.syncfusion.com/cr/document-processing/Syncfusion.OCRProcessor.OCRSettings.html#Syncfusion_OCRProcessor_OCRSettings_TesseractVersion) property is used to switch the tesseract version to 4.0. By default, OCR will be performed with tesseract version 3.02.
220-
221-
N> In ASP.NET Core platform, the default and starting supported version of tesseract is 4.0. So we did not have the property called ``TesseractVersion`` in ASP.NET Core platform.
219+
The [TesseractVersion](https://help.syncfusion.com/cr/document-processing/Syncfusion.OCRProcessor.OCRSettings.html#Syncfusion_OCRProcessor_OCRSettings_TesseractVersion) property is used to switch the tesseract version to 4.0. By default, OCR will be performed with tesseract version 5.0.
222220
223221
The following code sample explains the OCR processor with Tesseract version 4.0 for PDF documents.
224222
@@ -277,6 +275,67 @@ End Using
277275

278276
{% endtabs %}
279277

278+
## Performing OCR with Tesseract Version 5.0
279+
280+
The [TesseractVersion](https://help.syncfusion.com/cr/document-processing/Syncfusion.OCRProcessor.OCRSettings.html#Syncfusion_OCRProcessor_OCRSettings_TesseractVersion) property is used to switch the tesseract version to 5.0. By default, OCR will be performed with tesseract version 5.0.
281+
282+
The following code sample explains the OCR processor with Tesseract version 5.0 for PDF documents.
283+
284+
{% tabs %}
285+
286+
{% highlight c# tabtitle="C# [Cross-platform]" %}
287+
288+
using Syncfusion.OCRProcessor;
289+
using Syncfusion.Pdf.Parsing;
290+
291+
//Initialize the OCR processor.
292+
using (OCRProcessor processor = new OCRProcessor())
293+
{
294+
//Load an existing PDF document.
295+
PdfLoadedDocument document = new PdfLoadedDocument("Input.pdf");
296+
297+
//Set OCR language.
298+
processor.Settings.Language = Languages.English;
299+
//Set tesseract OCR Engine.
300+
processor.Settings.TesseractVersion = TesseractVersion.Version5_0;
301+
//Perform OCR on the loaded PDF document.
302+
processor.PerformOCR(document);
303+
304+
//Save the PDF document.
305+
document.Save("Output.pdf");
306+
//Close the document.
307+
document.Close(true);
308+
}
309+
310+
{% endhighlight %}
311+
312+
{% highlight vb.net tabtitle="VB.NET [Windows-specific]" %}
313+
314+
Imports Syncfusion.OCRProcessor
315+
Imports Syncfusion.Pdf.Parsing
316+
317+
'Initialize the OCR processor with tesseract binaries folder path.
318+
Using processor As OCRProcessor = New OCRProcessor("TesseractBinaries/5.0/")
319+
'Load an existing PDF document.
320+
Dim document As PdfLoadedDocument = New PdfLoadedDocument("Input.pdf")
321+
322+
'Set OCR language.
323+
processor.Settings.Language = Languages.English
324+
'Set tesseract OCR Engine.
325+
processor.Settings.TesseractVersion = TesseractVersion.Version5_0
326+
'Perform OCR on the loaded PDF document.
327+
processor.PerformOCR(document)
328+
329+
'Save the PDF document.
330+
document.Save("Output.pdf")
331+
'Close the document.
332+
document.Close(True)
333+
End Using
334+
335+
{% endhighlight %}
336+
337+
{% endtabs %}
338+
280339
## Performing OCR on image
281340
282341
The below code example illustrates how to perform OCR on image file using [PerformOCR](https://help.syncfusion.com/cr/document-processing/Syncfusion.OCRProcessor.OCRProcessor.html#Syncfusion_OCRProcessor_OCRProcessor_PerformOCR_System_Drawing_Bitmap_System_String_) method in [OCRProcessor](https://help.syncfusion.com/cr/document-processing/Syncfusion.OCRProcessor.OCRProcessor.html) class.
@@ -1005,6 +1064,178 @@ End Using
10051064

10061065
N> The OCR Engine Mode is supported only in the Tesseract version 4.0 and above.
10071066

1067+
## Performing OCR with different OCR Image Enhancement Mode
1068+
1069+
The `ImageEnhancementMode` property sets the OCR image enhancement mode. By default, OCR works with the `EnhanceForRecognitionOnly` mode. Refer to the following code example to perform OCR with a different image enhancement mode.
1070+
1071+
The following table describes the available OCR image enhancement modes and their respective purposes.
1072+
1073+
<table>
1074+
<thead>
1075+
<tr>
1076+
<th>
1077+
OCR Image Enhancement Mode<br/><br/></th><th>
1078+
Description<br/><br/></th></tr>
1079+
</thead>
1080+
<tbody>
1081+
<tr>
1082+
<td>
1083+
EnhanceForRecognitionOnly<br/><br/></td><td>
1084+
Image is enhanced internally to improve OCR accuracy, but the original image is retained in the output.<br/><br/></td></tr>
1085+
<tr>
1086+
<td>
1087+
EnhanceAndIncludeInOutput<br/><br/></td><td>
1088+
Image is enhanced and the enhanced version is used in the output document.<br/><br/></td></tr>
1089+
<tr>
1090+
<td>
1091+
None<br/><br/></td><td>
1092+
No image enhancement is performed. The original image is used for OCR processing.<br/><br/></td></tr>
1093+
</tbody>
1094+
</table>
1095+
1096+
{% tabs %}
1097+
1098+
{% highlight c# tabtitle="C# [Cross-platform]" %}
1099+
1100+
using Syncfusion.OCRProcessor;
1101+
using Syncfusion.Pdf.Parsing;
1102+
1103+
// Initialize the OCR processor
1104+
using (OCRProcessor processor = new OCRProcessor())
1105+
{
1106+
// Load an existing PDF document
1107+
PdfLoadedDocument document = new PdfLoadedDocument("Input.pdf");
1108+
// Set the OCR language to English for text recognition.
1109+
processor.Settings.Language = Languages.English;
1110+
// Set the OCR image enhancement mode to improve recognition accuracy.
1111+
processor.ImageEnhancementMode = OcrImageEnhancementMode.EnhanceForRecognitionOnly;
1112+
// Perform OCR with input document and tessdata (Language packs)
1113+
processor.PerformOCR(document);
1114+
// Save the processed PDF document
1115+
document.Save("Output.pdf");
1116+
// Close the document
1117+
document.Close(true);
1118+
}
1119+
1120+
{% endhighlight %}
1121+
1122+
{% highlight vb.net tabtitle="VB.NET [Windows-specific]" %}
1123+
1124+
Imports Syncfusion.OCRProcessor
1125+
Imports Syncfusion.Pdf.Parsing
1126+
1127+
' Initialize the OCR processor
1128+
Using processor As New OCRProcessor()
1129+
1130+
' Load an existing PDF document
1131+
Dim document As New PdfLoadedDocument("Input.pdf")
1132+
' Set the OCR language to English for text recognition
1133+
processor.Settings.Language = Languages.English
1134+
' Set the OCR image enhancement mode to improve recognition accuracy
1135+
processor.ImageEnhancementMode = OcrImageEnhancementMode.EnhanceForRecognitionOnly
1136+
' Perform OCR with input document and tessdata (Language packs)
1137+
processor.PerformOCR(document)
1138+
' Save the processed PDF document
1139+
document.Save("Output.pdf")
1140+
' Close the document
1141+
document.Close(True)
1142+
1143+
End Using
1144+
1145+
{% endhighlight %}
1146+
{% endtabs %}
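
As a variation on the sample above, the following sketch selects the `EnhanceAndIncludeInOutput` mode from the table, so the enhanced image would be the one written to the output document. It assumes that `ImageEnhancementMode` is set directly on the `OCRProcessor` instance, as in the sample above, and that a file named `Input.pdf` is available. Treat it as an illustration and not as verified API usage.

```csharp
using Syncfusion.OCRProcessor;
using Syncfusion.Pdf.Parsing;

// Initialize the OCR processor.
using (OCRProcessor processor = new OCRProcessor())
{
    // Load an existing PDF document (hypothetical file name).
    PdfLoadedDocument document = new PdfLoadedDocument("Input.pdf");
    // Set the OCR language to English.
    processor.Settings.Language = Languages.English;
    // Assumption: the mode is set on the processor, mirroring the sample above.
    // EnhanceAndIncludeInOutput is the mode described in the table as using
    // the enhanced image in the output document.
    processor.ImageEnhancementMode = OcrImageEnhancementMode.EnhanceAndIncludeInOutput;
    // Perform OCR on the loaded document.
    processor.PerformOCR(document);
    // Save and close the document.
    document.Save("Output.pdf");
    document.Close(true);
}
```

Per the table, the other two values, `EnhanceForRecognitionOnly` and `None`, leave the original image in the output or skip enhancement entirely.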
1147+
1148+
## Performing OCR with different OCR Image Enhancement options
1149+
1150+
The `OcrImageEnhancementOptions` type exposes the individual OCR image enhancement options. Refer to the following code example to perform OCR with different image enhancement options.
1151+
1152+
The following table describes the available OCR image enhancement options and their respective purposes.
1153+
1154+
<table>
1155+
<thead>
1156+
<tr>
1157+
<th>
1158+
OCR Image Enhancement options<br/><br/></th><th>
1159+
Description<br/><br/></th></tr>
1160+
</thead>
1161+
<tbody>
1162+
<tr>
1163+
<td>
1164+
IsGrayscaleEnabled<br/><br/></td><td>
1165+
Simplifies image data by removing color information, making text easier to detect.<br/><br/></td></tr>
1166+
<tr>
1167+
<td>
1168+
IsDeskewEnabled<br/><br/></td><td>
1169+
Corrects tilted or rotated text for proper alignment.<br/><br/></td></tr>
1170+
<tr>
1171+
<td>
1172+
IsDenoiseEnabled<br/><br/></td><td>
1173+
Removes speckles and artifacts that can interfere with character recognition.<br/><br/></td></tr>
1174+
<tr>
1175+
<td>
1176+
IsContrastEnabled<br/><br/></td><td>
1177+
Enhances text visibility against the background.<br/><br/></td></tr>
1178+
<tr>
1179+
<td>
1180+
IsBinarizeEnabled<br/><br/></td><td>
1181+
Converts images to black-and-white for sharper text edges, using advanced thresholding methods.<br/><br/></td></tr>
1182+
</tbody>
1183+
</table>
1184+
1185+
{% tabs %}
1186+
1187+
{% highlight c# tabtitle="C# [Cross-platform]" %}
1188+
1189+
using Syncfusion.OCRProcessor;
1190+
using Syncfusion.Pdf.Parsing;
1191+
1192+
// Initialize the OCR processor
1193+
using (OCRProcessor processor = new OCRProcessor())
1194+
{
1195+
// Load an existing PDF document
1196+
PdfLoadedDocument document = new PdfLoadedDocument("Input.pdf");
1197+
// Set the OCR language to English for text recognition.
1198+
processor.Settings.Language = Languages.English;
1199+
// Set the options for image enhancement during the OCR process.
1200+
OcrImageEnhancementOptions options = new OcrImageEnhancementOptions();
1201+
// Enable grayscale conversion to improve OCR accuracy by reducing color noise.
1202+
options.IsGrayscaleEnabled = true;
1203+
// Perform OCR with input document and tessdata (Language packs)
1204+
processor.PerformOCR(document);
1205+
// Save the processed PDF document
1206+
document.Save("Output.pdf");
1207+
// Close the document
1208+
document.Close(true);
1209+
}
1210+
1211+
{% endhighlight %}
1212+
1213+
{% highlight vb.net tabtitle="VB.NET [Windows-specific]" %}
1214+
1215+
Imports Syncfusion.OCRProcessor
1216+
Imports Syncfusion.Pdf.Parsing
1217+
1218+
' Initialize the OCR processor inside a Using block to ensure proper disposal.
1219+
Using processor As New OCRProcessor()
1220+
' Load an existing PDF document.
1221+
Dim document As New PdfLoadedDocument("Input.pdf")
1222+
' Set the OCR language to English for text recognition.
1223+
processor.Settings.Language = Languages.English
1224+
' Set the options for image enhancement during the OCR process.
1225+
Dim options As New OcrImageEnhancementOptions()
1226+
' Enable grayscale conversion to improve OCR accuracy by reducing color noise.
1227+
options.IsGrayscaleEnabled = True
1228+
' Perform OCR on the input document using tessdata (language packs).
1229+
processor.PerformOCR(document)
1230+
' Save the processed PDF document.
1231+
document.Save("Output.pdf")
1232+
' Close the document and release resources.
1233+
document.Close(True)
1234+
End Using
1235+
1236+
{% endhighlight %}
1237+
{% endtabs %}
1238+
10081239
## White List
10091240
10101241
The [WhiteList](https://help.syncfusion.com/cr/document-processing/Syncfusion.OCRProcessor.OCRSettings.html#Syncfusion_OCRProcessor_OCRSettings_WhiteList) property specifies a list of characters that the OCR engine is only allowed to recognize. If a character is not on the white list, it will not be included in the output OCR results. For more information, refer to the following code sample.

Document-Processing/PDF/PDF-Library/NET/Working-with-OCR/Working-with-OCR.md

Lines changed: 15 additions & 12 deletions
Original file line numberDiff line numberDiff line change
@@ -11,7 +11,15 @@ keywords: Assemblies
1111

1212
Optical character recognition (OCR) is a technology used to convert scanned paper documents in the form of PDF files or images into searchable and editable data.
1313

14-
The [Syncfusion<sup>&reg;</sup> OCR processor library](https://www.syncfusion.com/document-processing/pdf-framework/net/pdf-library/ocr-process) has extended support to process OCR on scanned PDF documents and images with the help of Google’s [Tesseract](https://github.com/tesseract-ocr/tesseract) Optical Character Recognition engine.
14+
The [Syncfusion<sup>&reg;</sup> OCR processor library](https://www.syncfusion.com/document-processing/pdf-framework/net/pdf-library/ocr-process) has extended support to process OCR on scanned PDF documents and images with the help of Google’s [Tesseract](https://github.com/tesseract-ocr/tesseract) Optical Character Recognition engine.
15+
16+
A built-in image preprocessor has been added to the OCR processor to prepare images for recognition. This step provides cleaner input and reduces OCR errors. The preprocessor supports the following enhancements:
17+
18+
* **Convert to Grayscale** – Simplifies image data by removing color information, making text easier to detect.
19+
* **Deskew** – Corrects tilted or rotated text for proper alignment.
20+
* **Denoise** – Removes speckles and artifacts that can interfere with character recognition.
21+
* **Apply Contrast Adjustment** – Enhances text visibility against the background.
22+
* **Apply Binarize** – Converts images to black-and-white for sharper text edges, using advanced thresholding methods.
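
To make two of the steps above concrete, here is a minimal, self-contained sketch of grayscale conversion followed by binarization. It is not the Syncfusion preprocessor's implementation: the class and method names (`PreprocessSketch`, `ToGray`, `Binarize`) are hypothetical, the luminance weights are the common ITU-R BT.601 values, and the fixed threshold of 128 is an assumption standing in for the more advanced thresholding methods mentioned in the list.

```csharp
using System;

public static class PreprocessSketch
{
    // Convert one RGB pixel to a single luminance value (BT.601 weights).
    public static byte ToGray(byte r, byte g, byte b) =>
        (byte)Math.Round(0.299 * r + 0.587 * g + 0.114 * b);

    // Binarize a grayscale buffer with a fixed threshold:
    // values at or above the threshold become white (255), the rest black (0).
    public static byte[] Binarize(byte[] gray, byte threshold = 128)
    {
        var result = new byte[gray.Length];
        for (int i = 0; i < gray.Length; i++)
        {
            result[i] = gray[i] >= threshold ? (byte)255 : (byte)0;
        }
        return result;
    }

    public static void Main()
    {
        // Three sample pixels: near-white paper, dark ink, and a red stamp.
        byte[] gray =
        {
            ToGray(250, 250, 250),
            ToGray(30, 30, 30),
            ToGray(200, 40, 40)
        };

        // prints "255,0,0"
        Console.WriteLine(string.Join(",", Binarize(gray)));
    }
}
```

The red stamp pixel maps to a luminance of about 88, so it falls below the threshold and is treated as ink. A fixed threshold behaves poorly on unevenly lit scans, which is one reason production preprocessors tend to use adaptive thresholding.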
1523

1624
The Syncfusion<sup>&reg;</sup> OCR processor library works seamlessly in various platforms: Azure App Services, Azure Functions, AWS Textract, Docker, WinForms, WPF, Blazor, ASP.NET MVC, ASP.NET Core with Windows, MacOS and Linux.
1725

@@ -85,14 +93,6 @@ ASP.NET
8593
</tr>
8694
<tr>
8795
<td>
88-
ASP.NET MVC4
89-
</td>
90-
<td>
91-
{{'[Syncfusion.Pdf.OCR.AspNet.Mvc4.nupkg](https://www.nuget.org/packages/Syncfusion.Pdf.OCR.AspNet.Mvc4)'| markdownify }}
92-
</td>
93-
</tr>
94-
<tr>
95-
<td>
9696
ASP.NET MVC5
9797
</td>
9898
<td>
@@ -133,6 +133,7 @@ Windows Forms, WPF, ASP.NET, and ASP.NET MVC
133133
<li>Syncfusion.OCRProcessor.Base.dll</li>
134134
<li>Syncfusion.Pdf.Base.dll</li>
135135
<li>Syncfusion.Compression.Base.dll</li>
136+
<li>Syncfusion.ImagePreProcessor.Base.dll</li>
136137
</ul>
137138
</td>
138139
</tr>
@@ -146,21 +147,23 @@ Windows Forms, WPF, ASP.NET, and ASP.NET MVC
146147
<li>Syncfusion.PdfImaging.Portable.dll</li>
147148
<li>Syncfusion.Pdf.Portable.dll</li>
148149
<li>Syncfusion.Compression.Portable.dll</li>
149-
<li>{{'[SkiaSharp](https://www.nuget.org/packages/SkiaSharp/2.88.0-preview.232)'| markdownify }} package</li>
150+
<li>{{'[SkiaSharp](https://www.nuget.org/packages/SkiaSharp/3.119.1)'| markdownify }} package</li>
151+
<li>Syncfusion.ImagePreProcessor.Portable.dll</li>
150152
</ul>
151153
</td>
152154
</tr>
153155
<tr>
154156
<td>
155-
.NET 8/.NET 9
157+
.NET 8/.NET 9/.NET 10
156158
</td>
157159
<td>
158160
<ul>
159161
<li>Syncfusion.OCRProcessor.NET.dll</li>
160162
<li>Syncfusion.PdfImaging.NET.dll</li>
161163
<li>Syncfusion.Pdf.NET.dll</li>
162164
<li>Syncfusion.Compression.NET.dll</li>
163-
<li>{{'[SkiaSharp](https://www.nuget.org/packages/SkiaSharp/2.88.0-preview.232)'| markdownify }} package</li>
165+
<li>{{'[SkiaSharp](https://www.nuget.org/packages/SkiaSharp/3.119.1)'| markdownify }} package</li>
166+
<li>Syncfusion.ImagePreProcessor.NET.dll</li>
164167
</ul>
165168
</td>
166169
</tr>

Document-Processing/PDF/PDF-Library/NET/Working-with-PDF-Conformance.md

Lines changed: 24 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -1220,7 +1220,30 @@ You can create a PDF/X-1a document by specifying the conformance level as ```Pdf
12201220

12211221
{% highlight c# tabtitle="C# [Cross-platform]" %}
12221222

1223-
//Creating PDF/X-conformance documents is not supported on C#.NET cross-platform environments.
1223+
using Syncfusion.Pdf;
1224+
using Syncfusion.Pdf.Graphics;
1225+
1226+
//Create a new document with the PDF/X-1a:2001 conformance level.
1227+
PdfDocument document = new PdfDocument(PdfConformanceLevel.Pdf_X1A2001);
1228+
//Add a page.
1229+
PdfPage page = document.Pages.Add();
1230+
//Set color space.
1231+
document.ColorSpace = PdfColorSpace.CMYK;
1232+
1233+
//Create Pdf graphics for the page.
1234+
PdfGraphics graphics = page.Graphics;
1235+
//Create a solid brush.
1236+
PdfBrush brush = new PdfSolidBrush(Color.Black);
1237+
//Load the TrueType font from the local file.
1238+
FileStream fontStream = new FileStream("Arial.ttf", FileMode.Open, FileAccess.Read);
1239+
//Set the font.
1240+
PdfFont font = new PdfTrueTypeFont(fontStream, 14);
1241+
//Draw the text.
1242+
graphics.DrawString("Hello world!", font, brush, new PointF(20, 20));
1243+
1244+
//Save and close the document.
1245+
document.Save("Output.pdf");
1246+
document.Close(true);
12241247

12251248
{% endhighlight %}
12261249
