I've written a PHP script to parse my payslip PDFs automatically and want to store the extracted data in a database for ease of reference (preferably MySQL, for familiarity).
The data I've ended up with per payslip looks like this, for example (plus around 40 other columns, most of which don't change regularly):
Array
(
    [assignment] => 123456
    [name] => JOE BLOGS
    [location] => 42 WEST STREET
    [department] => Lancashire
    [job_title] => Imaginary
    [payscale_description] => Generic description
    [salary] => 12345.00
    [hours] => 40
    [pt_salary] => 12345.00
    [pay] => Array
        (
            [Basic Pay] => 1234.00
            [Additional Hours (OOH)] => 567.00
            [Night Allowance] => 891.10
            [Weekend Allowance] => 112.13
        )
    [deductions] => Array
        (
            [PAYE] => 123.00
            [National Insurance] => 456.00
            [Pension] => 789.00
            [Student Loan] => 101.12
        )
)
Now the [pay] and [deductions] entries can change from month to month (e.g. sometimes there are expenses, subsistence, car parking, etc.). The script just builds the associative arrays, but there are roughly 20 variable additions / deductions on top of these common ones. To avoid creating another 20 columns in the MySQL table, I've so far resorted to serialising these two fields as JSON and storing them that way, but that makes them difficult to search or modify dynamically.
I'm just wondering if there's an obvious solution that I've missed for dynamic columns (some of which I might not have predicted yet).
I wondered further about dynamically creating a table for each pay / deduction per month and then doing a JOIN when searching but this seemed excessive.
Any help gratefully appreciated!
My Answer:
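In this case, it would be better to store the data in a normalized format in the database rather than serializing the arrays. Keep the fields that are always present in the main payslip table, and create separate tables for pay and deductions in which each row holds one item (a label and an amount) plus a foreign key back to the payslip. A new kind of allowance or deduction is then just another row rather than another column, and you can query and search the items with ordinary SQL.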
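The normalized layout described above can be sketched roughly as follows. This is only an illustration under stated assumptions: it uses Python with an in-memory SQLite database so that it runs standalone, whereas in MySQL you would more likely use DECIMAL(10,2) for amounts and InnoDB foreign keys. The table names (payslip, payslip_pay, payslip_deduction), the column names, and the helper functions are all hypothetical, and only three of the fixed fields are shown.

```python
import sqlite3
from decimal import Decimal

# Hypothetical schema: one row per payslip, one row per pay/deduction item.
SCHEMA = """
CREATE TABLE payslip (
    id          INTEGER PRIMARY KEY,
    assignment  TEXT NOT NULL,
    name        TEXT NOT NULL,
    job_title   TEXT NOT NULL
);
CREATE TABLE payslip_pay (
    payslip_id   INTEGER NOT NULL REFERENCES payslip(id),
    label        TEXT NOT NULL,       -- e.g. 'Basic Pay'
    amount_pence INTEGER NOT NULL,    -- integer pence avoids float rounding
    PRIMARY KEY (payslip_id, label)
);
CREATE TABLE payslip_deduction (
    payslip_id   INTEGER NOT NULL REFERENCES payslip(id),
    label        TEXT NOT NULL,       -- e.g. 'PAYE'
    amount_pence INTEGER NOT NULL,
    PRIMARY KEY (payslip_id, label)
);
"""

ITEM_TABLES = {"pay": "payslip_pay", "deductions": "payslip_deduction"}


def to_pence(amount):
    """Convert an amount such as '891.10' to an integer number of pence."""
    return int(Decimal(str(amount)) * 100)


def save_payslip(conn, slip):
    """Insert one parsed payslip and its variable items; return the new id."""
    cur = conn.execute(
        "INSERT INTO payslip (assignment, name, job_title) VALUES (?, ?, ?)",
        (slip["assignment"], slip["name"], slip["job_title"]),
    )
    payslip_id = cur.lastrowid
    for key, table in ITEM_TABLES.items():
        # Table names come from the fixed mapping above, never from user input.
        conn.executemany(
            f"INSERT INTO {table} (payslip_id, label, amount_pence) VALUES (?, ?, ?)",
            [(payslip_id, label, to_pence(amt)) for label, amt in slip[key].items()],
        )
    return payslip_id


def total_for_label(conn, key, label):
    """Sum one item (e.g. 'Night Allowance') across all stored payslips."""
    table = ITEM_TABLES[key]
    row = conn.execute(
        f"SELECT COALESCE(SUM(amount_pence), 0) FROM {table} WHERE label = ?",
        (label,),
    ).fetchone()
    return row[0]


conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite does not enforce FKs by default
conn.executescript(SCHEMA)

slip = {
    "assignment": "123456",
    "name": "JOE BLOGS",
    "job_title": "Imaginary",
    "pay": {"Basic Pay": "1234.00", "Night Allowance": "891.10"},
    "deductions": {"PAYE": "123.00", "Student Loan": "101.12"},
}
payslip_id = save_payslip(conn, slip)
print(total_for_label(conn, "pay", "Night Allowance"))  # prints 89110
```

With this layout, a month that includes car parking or subsistence needs no schema change: the script simply inserts one more row with a new label.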
If you need more flexibility than that, you could also consider a schemaless store such as MongoDB (a document database) or Redis (a key-value store). These let you store records in a more flexible format without defining a schema upfront.
Overall, for data like this, a normalized layout in a relational database is likely to be the easiest option to query and maintain, and it lets you stay with MySQL.