Class PdfExtractor
Namespace: Aspose.Pdf.Plugins
Assembly: Aspose.PDF.dll
Represents base functionality to extract text, images, and other types of content that may occur on the pages of PDF documents.
public abstract class PdfExtractor : IPlugin, IDisposable
Inheritance
object → PdfExtractor
Derived
ImageExtractor, TextExtractor
Implements
IPlugin, IDisposable
Inherited Members
object.GetType(), object.MemberwiseClone(), object.ToString(), object.Equals(object?), object.Equals(object?, object?), object.ReferenceEquals(object?, object?), object.GetHashCode()
Examples
The example demonstrates how to extract the text content of a PDF document.
// namespace that contains the extractor, options, and result types
using Aspose.Pdf.Plugins;

// inputPath must hold the path to the source PDF file
// create TextExtractor object to extract PDF contents
using (TextExtractor extractor = new TextExtractor())
{
    // create TextExtractorOptions object to set instructions
    TextExtractorOptions textExtractorOptions = new TextExtractorOptions();
    // add input file path to data sources
    textExtractorOptions.AddInput(new FileDataSource(inputPath));
    // perform extraction process
    ResultContainer resultContainer = extractor.Process(textExtractorOptions);
    // get the extracted text from the ResultContainer object
    string textExtracted = resultContainer.ResultCollection[0].ToString();
}
Remarks
Use an Aspose.Pdf.Plugins.TextExtractor object to extract text, or an Aspose.Pdf.Plugins.ImageExtractor object to extract images.
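The image case can be sketched along the same lines as the text example. This is a minimal sketch, not a definitive implementation: ImageExtractorOptions and ToStream() are not described on this page and are assumed here to parallel TextExtractorOptions and ToString(). The class name ImageExtractionSketch, the helper ExtractFirstImage, and the inputPath parameter are hypothetical names introduced for illustration.

```csharp
using System.IO;
using Aspose.Pdf.Plugins;

public static class ImageExtractionSketch
{
    // Hypothetical helper: returns the first image found in the PDF as a stream.
    public static Stream ExtractFirstImage(string inputPath)
    {
        using (ImageExtractor extractor = new ImageExtractor())
        {
            // Assumption: ImageExtractorOptions mirrors TextExtractorOptions.
            ImageExtractorOptions options = new ImageExtractorOptions();
            // add input file path to data sources
            options.AddInput(new FileDataSource(inputPath));
            // perform extraction process
            ResultContainer resultContainer = extractor.Process(options);
            // Assumption: each result exposes ToStream() for binary content.
            // Indexing [0] fails if the document contains no images.
            return resultContainer.ResultCollection[0].ToStream();
        }
    }
}
```

The helper indexes the first result only, as the text example does. A caller that cannot guarantee the PDF contains an image should check the collection before indexing.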
Constructors
PdfExtractor()
protected PdfExtractor()
Methods
Dispose()
Implements IDisposable. Calling it is not actually necessary for PdfExtractor.
public void Dispose()
Process(IPluginOptions)
Starts PdfExtractor processing with the specified parameters.
public ResultContainer Process(IPluginOptions pdfExtractorOptions)
Parameters
pdfExtractorOptions
IPluginOptions
An options object containing instructions for the PdfExtractor.
Returns
A ResultContainer object containing the result of the extraction.
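Because Process is declared on the base class and accepts any IPluginOptions, calling code can be written against PdfExtractor rather than a concrete extractor. The sketch below illustrates this under stated assumptions: it treats ResultCollection as enumerable (this page only shows it being indexed), and the class name ExtractorSketch and the helper RunExtractor are hypothetical names introduced for illustration.

```csharp
using System.Collections.Generic;
using Aspose.Pdf.Plugins;

public static class ExtractorSketch
{
    // Hypothetical helper: runs any PdfExtractor and returns each result as a string.
    public static List<string> RunExtractor(PdfExtractor extractor, IPluginOptions options)
    {
        // Process is declared on PdfExtractor, so any derived extractor works here.
        ResultContainer container = extractor.Process(options);

        List<string> results = new List<string>();
        // Assumption: ResultCollection can be enumerated with foreach.
        foreach (var result in container.ResultCollection)
        {
            // ToString() is the conversion used in the text extraction example.
            results.Add(result.ToString());
        }
        return results;
    }
}
```

With the objects from the Examples section, a call might look like RunExtractor(extractor, textExtractorOptions). The string form is meaningful for text results; results from other extractors may need a different conversion.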