Class PdfExtractor

Namespace: Aspose.Pdf.Plugins
Assembly: Aspose.PDF.dll (25.4.0)

Represents the base functionality for extracting text, images, and other types of content that can appear on the pages of PDF documents.

public abstract class PdfExtractor : IPlugin, IDisposable

Examples

The example shows how to extract text content from a PDF document.

// create TextExtractor object to extract PDF contents
using (TextExtractor extractor = new TextExtractor())
{
    // create TextExtractorOptions object to set instructions
    TextExtractorOptions textExtractorOptions = new TextExtractorOptions();

    // add input file path to data sources
    textExtractorOptions.AddInput(new FileDataSource(inputPath));

    // perform extraction process
    ResultContainer resultContainer = extractor.Process(textExtractorOptions);

    // get the extracted text from the ResultContainer object
    string textExtracted = resultContainer.ResultCollection[0].ToString();
}

Remarks

Use an Aspose.Pdf.Plugins.TextExtractor object to extract text, or an Aspose.Pdf.Plugins.ImageExtractor object to extract images.
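For images, a comparable sketch might look like the following. It assumes that ImageExtractor is used the same way as TextExtractor in the example above; the ImageExtractorOptions type, the ToStream() call on the result, and both file paths are assumptions, not something this page documents.

```csharp
using System.IO;
using Aspose.Pdf.Plugins;

// Hypothetical paths, assumed for illustration only.
string inputPath = "sample.pdf";
string outputPath = "extracted-image";

// Assumption: ImageExtractor follows the same pattern as TextExtractor.
using (ImageExtractor extractor = new ImageExtractor())
{
    // Assumption: ImageExtractorOptions carries the instructions, as TextExtractorOptions does for text.
    ImageExtractorOptions imageExtractorOptions = new ImageExtractorOptions();

    // add the input file path to the data sources
    imageExtractorOptions.AddInput(new FileDataSource(inputPath));

    // perform the extraction process
    ResultContainer resultContainer = extractor.Process(imageExtractorOptions);

    // Assumption: the first result exposes the extracted image as a Stream via ToStream().
    Stream imageStream = resultContainer.ResultCollection[0].ToStream();

    // copy the image bytes to the output file
    using (FileStream fileStream = File.Create(outputPath))
    {
        imageStream.CopyTo(fileStream);
    }
}
```

Only the first result is written here; a document with several images would presumably produce several entries in ResultCollection.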

Constructors

PdfExtractor()

protected PdfExtractor()

Methods

Dispose()

Not actually required for PdfExtractor; it is present because the class implements IDisposable.

public void Dispose()

Process(IPluginOptions)

Starts PdfExtractor processing with the specified parameters.

public ResultContainer Process(IPluginOptions pdfExtractorOptions)

Parameters

pdfExtractorOptions IPluginOptions

An options object that contains instructions for the PdfExtractor.

Returns

ResultContainer

A ResultContainer object that contains the result of the extraction.