I'm trying to extract text from my PDF with the PDF2Text class
or spatie/pdf-to-text,
but I can't get either to work. I tried first with
PDF2Text:
if($request->file('adjunto') != ""){
    $fileName = $request->file('adjunto')[0]->getClientOriginalName();
    $ext = pathinfo($fileName, PATHINFO_EXTENSION);
    $name = $fileName;
    $result = $request->file('adjunto')[0]->storeAs('importaciones', $name, 'public');
    $route = public_path('storage/importaciones/'.$name);
}else{
    return "Error al subir fichero, consulte al administrador del sistema";
}
if ($ext == "pdf" || $ext == "PDF"){
    $a = new PDF2Text();
    $a->setFilename($route);
    $a->decodePDF();
    echo $a->output();
}else{
The else branch is for reading CSV files. The PDF branch of this code returns an empty result. If I try with the
spatie library instead:
if($request->file('adjunto') != ""){
    $fileName = $request->file('adjunto')[0]->getClientOriginalName();
    $ext = pathinfo($fileName, PATHINFO_EXTENSION);
    $name = $fileName;
    $result = $request->file('adjunto')[0]->storeAs('importaciones', $name, 'public');
    $route = public_path('storage/importaciones/'.$name);
}else{
    return "Error al subir fichero, consulte al administrador del sistema";
}
if ($ext == "pdf" || $ext == "PDF"){
    $text = (new Pdf())
        ->setPdf($route)
        ->text();
    echo $text;
}else{
it returns this error (the Spanish line means "The system cannot find the path specified."):
Error Output:
================
El sistema no puede encontrar la ruta especificada.
{"userId":60,"exception":"[object] (Spatie\\PdfToText\\Exceptions\\CouldNotExtractText(code: 0): The command \"\"/usr/bin/pdftotext\" \"C:\\xampp\\htdocs\\gdsRepository\\public\\storage/importaciones/VIZNAR.PDF\" -\" failed.
Exit Code: 1(General error)
Working directory: C:\\xampp\\htdocs\\gdsRepository\\public
Output:
================
If I try with $fileName
instead of $route,
it returns this: local.ERROR: could not find or read pdf 'x.pdf'
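The error output above shows the library invoking `/usr/bin/pdftotext`, a Linux-style default path, on a Windows XAMPP install, which would explain the "path not found" message. The second error suggests the bare file name does not resolve to a readable file. Below is a minimal sketch of one way to guard against both, assuming the `pdftotext` binary is installed somewhere on the machine. The helper names `firstExistingBinary` and `extractPdfText` are hypothetical; the only library call used is the `Pdf` constructor of spatie/pdf-to-text, which accepts the binary path.

```php
<?php
use Spatie\PdfToText\Pdf;

/**
 * Return the first candidate path that exists as a file, or null if none does.
 * (Hypothetical helper, not part of spatie/pdf-to-text.)
 */
function firstExistingBinary(array $candidates): ?string
{
    foreach ($candidates as $candidate) {
        if (is_file($candidate)) {
            return $candidate;
        }
    }
    return null;
}

/**
 * Extract text from a PDF, passing the pdftotext binary path explicitly
 * instead of relying on the library's default location.
 * (Hypothetical helper.)
 */
function extractPdfText(string $pdfPath, array $binaryCandidates): string
{
    // Fail early with a clear message if the PDF path is wrong
    if (!is_readable($pdfPath)) {
        throw new RuntimeException("PDF not found or not readable: {$pdfPath}");
    }

    $binary = firstExistingBinary($binaryCandidates);
    if ($binary === null) {
        throw new RuntimeException('pdftotext binary not found in any candidate path');
    }

    // The constructor argument overrides the default binary path
    return (new Pdf($binary))->setPdf($pdfPath)->text();
}
```

It could be called as `extractPdfText($route, ['C:/xampp/pdftotext/pdftotext.exe', '/usr/bin/pdftotext'])`, where the Windows path is only a placeholder for wherever the binary actually lives. Passing the full `$route` rather than `$fileName` matters because a bare file name is resolved relative to the working directory.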
My application uses Laravel 5.6.
From my PDF
I need 'nombre' (name), 'domicilio' (address) and 'telefonos' (phones).
An example PDF is available at this link:
https://drive.google.com/file/d/14uz4_sRphrOVvziv6n6_illb778WkETl/view?usp=sharing
Thanks for reading, and thanks in advance for any help.
Searching on Google, I found one solution to my problem:
PDF Parser.
With this library -> https://www.pdfparser.org/documentation
I was able to read my whole PDF file, and this is the result:
Código Nombre Domicilio Teléfonos Observacion
1740756 ESPIGARES CALVO MARIA LUISA CONSTITUCION 8 958 54 91 23
1769291 GOMEZ ESPIGARES ANGEL LUIS CONSTITUCION 9 958 54 31 04
1781409 JIMENEZ MORENO MARCO ANTONIO CORAZON DE JESUS 10 958 49 99 01
But now I need to get only nombre (name), domicilio (address) and teléfonos (phones).
How can I do this?
My Answer:
To extract only the 'nombre', 'domicilio', and 'telefonos' fields from the text returned by the PDF Parser library, you can match each data row with a regular expression. The flattened text has no delimiter between the name and the address, so the pattern below assumes every name is exactly four words, as in the three sample rows; adjust that rule if your data differs. Here's an example of how you can achieve this:
use Smalot\PdfParser\Parser;
// Load the PDF file
$parser = new Parser();
$pdf = $parser->parseFile('path/to/your/pdf/file.pdf');
// Get the text content of the PDF
$text = $pdf->getText();
// One data row: numeric code, name, address, then a phone like "958 54 91 23".
// Assumption: the name is exactly four words, as in the sample rows.
$rowPattern = '/^(\d+)\s+((?:\S+\s+){3}\S+)\s+(.+?)\s+(\d{3}(?:\s\d{2}){3})\s*$/';
// Extract the fields from every line that looks like a data row
$rows = [];
foreach (preg_split('/\R/', $text) as $line) {
    // The header line and blank lines do not match and are skipped
    if (preg_match($rowPattern, trim($line), $matches)) {
        $rows[] = [
            'nombre'    => $matches[2],
            'domicilio' => $matches[3],
            'telefonos' => $matches[4],
        ];
    }
}
// Output the extracted data
foreach ($rows as $row) {
    echo 'Nombre: ' . $row['nombre'] . PHP_EOL;
    echo 'Domicilio: ' . $row['domicilio'] . PHP_EOL;
    echo 'Teléfonos: ' . $row['telefonos'] . PHP_EOL;
}
This code snippet skips the header line, extracts the 'nombre', 'domicilio', and 'telefonos' fields from every row that matches, and outputs them. Rows with text in the Observacion column, or with a phone number in a different format, will not match as written, so adjust the regular expression to the specific patterns in your PDF file.