John Davidson

laravel - read pdf with php and PDF2Text or pdf-to-text (spatie)

Message:


I'm trying to extract text from my PDF with the PDF2Text class or spatie/pdf-to-text, but I can't get it to work. I tried first with:


PDF2Text


if($request->file('adjunto') != ""){
    $fileName = $request->file('adjunto')[0]->getClientOriginalName();
    $ext = pathinfo($fileName, PATHINFO_EXTENSION);

    $name = $fileName;
    $result = $request->file('adjunto')[0]->storeAs('importaciones', $name, 'public');

    $route = public_path('storage/importaciones/'.$name);
}else{
    return "Error al subir fichero, consulte al administrador del sistema";
}

if ($ext == "pdf" || $ext == "PDF"){
    $a = new PDF2Text();
    $a->setFilename($route);
    $a->decodePDF();
    echo $a->output();
}else{

The else branch is for reading CSV files. The PDF branch of this code returns an empty result. If I try instead with the

spatie library:


if($request->file('adjunto') != ""){
    $fileName = $request->file('adjunto')[0]->getClientOriginalName();
    $ext = pathinfo($fileName, PATHINFO_EXTENSION);

    $name = $fileName;
    $result = $request->file('adjunto')[0]->storeAs('importaciones', $name, 'public');

    $route = public_path('storage/importaciones/'.$name);
}else{
    return "Error al subir fichero, consulte al administrador del sistema";
}

if ($ext == "pdf" || $ext == "PDF"){
    $text = (new Pdf())
        ->setPdf($route)
        ->text();
    echo $text;
}else{

it returns this:


Error Output:
================

El sistema no puede encontrar la ruta especificada.
{"userId":60,"exception":"[object] (Spatie\\PdfToText\\Exceptions\\CouldNotExtractText(code: 0): The command \"\"/usr/bin/pdftotext\" \"C:\\xampp\\htdocs\\gdsRepository\\public\\storage/importaciones/VIZNAR.PDF\" -\" failed.

Exit Code: 1(General error)

Working directory: C:\\xampp\\htdocs\\gdsRepository\\public

Output:
================

If I try with $fileName instead of $route, it returns this:

local.ERROR: could not find or read pdf 'x.pdf'

My application uses Laravel 5.6.


From my PDF I need 'nombre' (name), 'domicilio' (address), and 'telefonos' (phone).


This link has an example PDF:


https://drive.google.com/file/d/14uz4_sRphrOVvziv6n6_illb778WkETl/view?usp=sharing


Thanks for reading, and thanks in advance for any help.


Searching on Google, I found one solution to my problem:


PDF Parser

With this library (https://www.pdfparser.org/documentation) I managed to read my whole PDF file, and this is the result:


Código Nombre Domicilio Teléfonos Observacion
1740756 ESPIGARES CALVO MARIA LUISA CONSTITUCION 8 958 54 91 23
1769291 GOMEZ ESPIGARES ANGEL LUIS CONSTITUCION 9 958 54 31 04
1781409 JIMENEZ MORENO MARCO ANTONIO CORAZON DE JESUS 10 958 49 99 01

But now I need to get only nombre (name), domicilio (address), and teléfonos (phone).


How can I do this?



My Answer:
To extract only the 'nombre', 'domicilio', and 'telefonos' fields from the PDF using the PDF Parser library, you can use regular expressions to match the specific patterns in the text. Here's an example of how you can achieve this:

<?php
use Smalot\PdfParser\Parser;

// Load the PDF file
$parser = new Parser();
$pdf = $parser->parseFile('path/to/your/pdf/file.pdf');

// Get the text content of the PDF
$text = $pdf->getText();

// Pattern for one data row: code, name, address, phone, optional observation.
// Assumption: the name is always four words and the phone is grouped as
// "958 54 91 23", as in the sample output above. The extracted text has no
// separator between name and address, so the split has to rely on a rule like this.
$rowPattern = '/^(\d+)\s+((?:\S+\s+){3}\S+)\s+(.+?)\s+(\d{3}(?:\s\d{2}){3})(?:\s.*)?$/u';

// Extract the fields row by row; lines that do not match (such as the header) are skipped
$data = [];
foreach (preg_split('/\R/u', $text) as $line) {
    if (preg_match($rowPattern, trim($line), $matches)) {
        $data[] = [
            'nombre'    => $matches[2],
            'domicilio' => $matches[3],
            'telefonos' => $matches[4],
        ];
    }
}

// Output the extracted data
foreach ($data as $row) {
    echo 'Nombre: ' . $row['nombre'] . PHP_EOL;
    echo 'Domicilio: ' . $row['domicilio'] . PHP_EOL;
    echo 'Teléfonos: ' . $row['telefonos'] . PHP_EOL;
}


This code snippet reads the PDF text line by line and keeps the 'nombre', 'domicilio', and 'telefonos' fields of every row that matches the pattern. The pattern assumes a four-word name and a phone number grouped as 3-2-2-2 digits, as in the sample output, so adjust it if your PDF contains rows that look different.
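
About the spatie/pdf-to-text error in the question: the log shows the library calling "/usr/bin/pdftotext" while the project lives under C:\xampp, and Windows answers that it cannot find the specified path. That library only wraps the pdftotext command-line tool, so the binary has to be installed and its location passed to the Pdf constructor. Below is a minimal sketch of that idea. The helper findPdftotextBinary() and the Windows path are hypothetical names chosen for illustration, not part of the library.

```php
<?php

/**
 * Return the first candidate path that points to an existing file,
 * or null when none of them does.
 * Hypothetical helper, not part of spatie/pdf-to-text.
 */
function findPdftotextBinary(array $candidates): ?string
{
    foreach ($candidates as $candidate) {
        if (is_file($candidate)) {
            return $candidate;
        }
    }

    return null;
}

// Assumed install locations; adjust them to wherever pdftotext lives on your machine.
$binary = findPdftotextBinary([
    'C:/poppler/bin/pdftotext.exe', // hypothetical Windows path
    '/usr/bin/pdftotext',           // the path that appears in the error log
]);

if ($binary === null) {
    echo 'pdftotext binary not found, install it first' . PHP_EOL;
} else {
    echo 'Using pdftotext at ' . $binary . PHP_EOL;
    // With spatie/pdf-to-text installed, pass the path to the constructor:
    // $text = (new \Spatie\PdfToText\Pdf($binary))->setPdf($route)->text();
}
```

The second error ("could not find or read pdf") most likely comes from passing only the file name: $fileName is not a full path, so keep passing $route once the binary is found.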
