John Davidson

mysql - PHP search engine for text files with indexing


I have some text files inside a directory (and its subdirectories). There will be 50000+ text files, and the directory is outside 'public_html':

|-- 001
|   |-- text0003.txt
|   |-- text0004.txt
|   |-- text0005.txt
|-- 001_a
|   |-- text0006.txt
|   |-- text0007.txt
|   |-- text0008.txt
|-- text0001.txt
|-- text0002.txt

The text file details are saved in a MySQL table (with the 'art_textfile' column storing the text file name and the 'art_path' column storing the file path):

CREATE TABLE `stxt_articles` (
`art_id` INT UNSIGNED NOT NULL AUTO_INCREMENT, /* ASSUMED DEFINITION: the primary key column is missing from the listing */
`art_title` VARCHAR(127) NOT NULL,
`art_author` VARCHAR(255) NOT NULL,
`art_textfile` VARCHAR(255) NOT NULL, /* TEXT FILE NAME */
`art_path` VARCHAR(255) NOT NULL, /* TEXT FILE PATH */
PRIMARY KEY(`art_id`)
) ENGINE=MyISAM DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_unicode_ci;

I am using PHP/MySQL (LAMP) and want to do a string search on the text files (with regular expressions if possible). The methods that will work logically are:

  1. Storing the contents in the MySQL database and performing a search with a MySQL query (LIKE 's%')

  2. Scanning the directory with PHP and searching within each text file for the search expression.

But with a large dataset of 50000+ files (which tends to grow over time), the above options are not practical. They would be too slow to use.

What I am looking for is a PHP/MySQL search approach that creates an index for the text files and searches against it, pretty much what Lucene does in Java. Maybe I can describe it as a Lucene alternative in PHP with MySQL.

Thanks for reading this far. Also thanks for your thoughts.

My Answer:
One possible solution for your scenario is to use a full-text search engine like Elasticsearch or Apache Solr. These search engines are specifically designed for indexing and searching large amounts of text data efficiently.

Here's how you can implement this solution:

1. Install and configure Elasticsearch or Apache Solr on your server.
2. Write a script in PHP to index the text files in your directory into the search engine. This script should read each text file, extract the content, and send it to the search engine for indexing.
3. Implement a search functionality in your PHP application that sends search queries to the search engine and retrieves the results.
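
Steps 2 and 3 can be sketched roughly as follows, using Elasticsearch's REST API through PHP's curl extension. This is a minimal sketch, not a definitive implementation. It assumes Elasticsearch is reachable at http://localhost:9200, that the index is named "articles", and that each row carries `art_id`, `art_title`, `art_path` and `art_textfile` from `stxt_articles`. The function names (`buildBulkBody`, `buildSearchBody`, `esPost`) are hypothetical helpers, not part of any library.

```php
<?php
// Assumptions (not from the original post): Elasticsearch runs on
// http://localhost:9200 and the index is called "articles".

/** Build the newline-delimited JSON body expected by the _bulk endpoint. */
function buildBulkBody(array $rows, string $index = 'articles'): string
{
    $flags = JSON_INVALID_UTF8_SUBSTITUTE | JSON_UNESCAPED_UNICODE;
    $lines = [];
    foreach ($rows as $row) {
        $file = rtrim($row['art_path'], '/') . '/' . $row['art_textfile'];
        if (!is_readable($file)) {
            continue; // skip missing or unreadable files
        }
        // Action line: index this document under the article's primary key.
        $lines[] = json_encode(
            ['index' => ['_index' => $index, '_id' => (string) $row['art_id']]],
            $flags
        );
        // Source line: the fields that become searchable.
        $lines[] = json_encode(
            ['title' => $row['art_title'], 'content' => file_get_contents($file)],
            $flags
        );
    }
    // The bulk API requires the final line to end with a newline as well.
    return $lines ? implode("\n", $lines) . "\n" : '';
}

/** Build a full-text "match" query against the content field. */
function buildSearchBody(string $terms, int $size = 10): string
{
    return json_encode([
        'size'  => $size,
        'query' => ['match' => ['content' => $terms]],
    ]);
}

/** POST a body to Elasticsearch and return the decoded JSON response. */
function esPost(string $path, string $body, string $contentType): array
{
    $ch = curl_init('http://localhost:9200' . $path);
    curl_setopt_array($ch, [
        CURLOPT_POST           => true,
        CURLOPT_POSTFIELDS     => $body,
        CURLOPT_HTTPHEADER     => ['Content-Type: ' . $contentType],
        CURLOPT_RETURNTRANSFER => true,
    ]);
    $response = curl_exec($ch);
    curl_close($ch);
    return $response === false ? [] : (json_decode($response, true) ?? []);
}

// Usage (needs a running Elasticsearch, so it is left commented out):
// $rows = rows from "SELECT art_id, art_title, art_path, art_textfile FROM stxt_articles"
// esPost('/_bulk', buildBulkBody($rows), 'application/x-ndjson');
// $hits = esPost('/articles/_search', buildSearchBody('some words'), 'application/json');
```

With 50000+ files it is probably safer to send the rows in batches (for example with `array_chunk`) than to build one very large bulk request. For the regular-expression searches mentioned in the question, Elasticsearch also offers a `regexp` query that could replace `match` in `buildSearchBody`, though it matches against indexed terms rather than the whole file.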

A full-text search engine builds its index once, as files are added, and answers each query from that index, so searches stay fast across a large set of text files. This is much more efficient than running a LIKE query over contents stored in MySQL or scanning the directory on every search.
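
To make the indexing idea concrete, here is a toy inverted index in plain PHP: a map from each term to the documents that contain it, which is the core structure behind Lucene-style engines. It is an illustration only, assuming the mbstring extension is available. The function names are hypothetical, and real engines add text analysis, relevance scoring and on-disk storage on top of this.

```php
<?php
// Toy inverted index: term => list of IDs of the documents containing it.

/** Lowercase the text and split it into unique word tokens. */
function tokenize(string $text): array
{
    $tokens = preg_split(
        '/[^\p{L}\p{N}]+/u',       // split on anything that is not a letter or digit
        mb_strtolower($text),
        -1,
        PREG_SPLIT_NO_EMPTY
    ) ?: [];                        // preg_split returns false on invalid UTF-8
    return array_values(array_unique($tokens));
}

/** Build the index from an array of document ID => text. */
function buildIndex(array $docs): array
{
    $index = [];
    foreach ($docs as $id => $text) {
        foreach (tokenize($text) as $term) {
            $index[$term][] = $id;
        }
    }
    return $index;
}

/** Return the IDs of documents that contain every term in the query. */
function search(array $index, string $query): array
{
    $result = null;
    foreach (tokenize($query) as $term) {
        $postings = $index[$term] ?? [];
        $result = $result === null
            ? $postings
            : array_values(array_intersect($result, $postings));
    }
    return $result ?? [];
}

$index = buildIndex([
    'text0001.txt' => 'The quick brown fox',
    'text0002.txt' => 'A quick search engine',
    'text0003.txt' => 'Brown bears',
]);
print_r(search($index, 'quick brown')); // only text0001.txt contains both terms
```

A query only touches the lists for its own terms, so the lookup cost depends on how many documents match rather than on how many files exist.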

I hope this helps! Let me know if you have any questions.
