I need to normalize some texts (product descriptions) in regard to the correct usage of the ".", "," and ":" symbols (no space before and one space after).
The regex I've come up with is this:
$variation['DESCRIPTION'] = preg_replace('#\s*([:,.])\s*(?!<br />)#', '$1 ', $variation['DESCRIPTION']);
The problem is that this matches four cases it shouldn't touch:
- Any decimal number, like 5.5
- Any thousand separator, like 4,500
- A "fixed" phrase in Greek, ό,τι
- The ellipsis symbol, ...
Basically, the ellipsis is a totally special case that I'm thinking should maybe be taken care of in a separate preg_replace. I mean, the three dots should be treated as one thing, meaning that "some text ..." should indeed be matched and converted to "some text...", but not to "some text. . .".
Especially for the numeric exception, I know it can be achieved with some negative lookahead/lookbehind but unfortunately I can't combine them in my current pattern.
This is a fiddle for you to check (the cases that shouldn't be matched are in lines 2, 3, 4).
EDIT: Both of the solutions posted below work fine, but end up adding a space after the last full stop of the description. This is not much of a problem, as earlier in my code I was already taking care of the <br />s and spaces at the beginning and end of the description, so I moved this preg_replace before that one.
So, the final code I ended up using is this:
$variation['DESCRIPTION'] = preg_replace('#\s*([:,.])(?!(?<=\d.)\d)(?!(?<=ό,)τι)\s*#ui', '$1 ', $variation['DESCRIPTION']);
$variation['DESCRIPTION'] = preg_replace('#^\s*(<br />)*\s*|\s*(<br />)*\s*$#', '', $variation['DESCRIPTION']);
So the only thing left to achieve is to alter this code so that it treats the ellipsis the way I describe above.
Any help with this last requirement will be very much appreciated! TIA
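One possible way to get the ellipsis behaviour described above, as a minimal sketch rather than a definitive fix: let the capture group try three consecutive dots before it tries a single punctuation character, so that "..." is consumed as one unit. The function name normalizePunctuation is hypothetical, the pattern is otherwise the one from the final code above, and the sketch assumes the ellipsis is always exactly three ASCII dots and that the source file is saved as UTF-8 (needed for the "u" modifier and the Greek literal).

```php
<?php
// Hypothetical helper: wraps the pattern from the question, with one change.
// The capture group now tries "\.{3}" (an ellipsis) before "[:,.]",
// so three dots are treated as a single symbol.
function normalizePunctuation(string $text): string
{
    $pattern = '#\s*(\.{3}|[:,.])(?!(?<=\d.)\d)(?!(?<=ό,)τι)\s*#ui';

    // preg_replace() returns null on error; fall back to the input then.
    return preg_replace($pattern, '$1 ', $text) ?? $text;
}

// Illustrative inputs only, not taken from real product data.
$samples = [
    'some text ...',          // ellipsis: space before removed, one space after
    'Wait ... what',          // ellipsis in the middle of a sentence
    'Hello ,world :ok .End',  // ordinary punctuation
    'Price is 5.5 euros',     // decimal number, left alone
    '4,500 items',            // thousand separator, left alone
    'ό,τι',                   // fixed Greek phrase, left alone
];

foreach ($samples as $sample) {
    echo '[', normalizePunctuation($sample), "]\n";
}
```

As with the original pattern, a match at the very end of the text leaves a trailing space, so this would still need to run before the preg_replace that trims the description.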
My Answer:
match a valid email address:
$email = "example@example.com";
if (preg_match("/^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$/", $email)) {
    echo "Valid email address";
} else {
    echo "Invalid email address";
}
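This regex pattern checks the format of an email address: a local part made of alphanumeric characters and the special characters ".", "_", "%", "+", and "-", followed by "@", a domain made of alphanumeric characters, "." and "-", and a final part of at least 2 letters. It only checks the shape of the string, not whether the address actually exists.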