Enigma Group's Articles
Regular Expressions in PHP4 - Submitted By: Paradox 2008-08-18 00:48:41
Regular Expressions are a big part of PHP that people often skip because they look or seem too difficult; even advanced programmers often do not attempt them. Regular Expressions (also known as Regexp or Regex) are used in many other languages, such as Perl and Python.

What Exactly are Regular Expressions?

Regular Expressions are a way of finding patterns inside a string. In this tutorial we are going to go over the ereg function in PHP. Its format is ereg(pattern, string, array to put matches in);

Why Use Regular Expressions?

Once you get used to them, they are a more professional, neater, and easier way to code. Look at the following code -

<?php
$days = "monday, tuesday, wednesday, thursday, friday, saturday, sunday";
$datearray = explode(" ", $days);
foreach ($datearray as $day) {
    if ($day == "sunday") echo "Found string 'sunday'";
}
?>

Alright, that might be okay when you first start, and it does work to some extent, but there is a MUCH easier way to do this. If you are reading this, you most likely know what explode() is; if not, stop reading and read up on PHP :p What it does is cut the string up at each space and put the pieces into an array ($datearray[0], $datearray[1], etc.). The catch is that every piece except the last keeps its trailing comma: "monday" is stored as "monday,", so a search for it would fail, and "sunday" is only found because it happens to be last. Now we have a difficult problem here, don't we? Well, take a look at what it would be with Regex -

<?php
$days = "monday, tuesday, wednesday, thursday, friday, saturday, sunday";
if (ereg("sunday", $days)) echo "Found string 'sunday'";
?>

Much simpler, and it is correct for every day in the list! Now let's break it down a bit -

<?php - the starting tag.
$days = "monday, tuesday, wednesday, thursday, friday, saturday, sunday"; - the string the days are held in.
if (ereg("sunday", $days)) echo "Found string 'sunday'"; - this checks whether "sunday" is matched anywhere in the $days string. If so, ereg() returns a true value; if not, it returns FALSE.
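ereg() was deprecated in PHP 5.3 and removed in PHP 7, so the check above cannot be run as written on a current PHP install. The following is a minimal sketch of the same idea using preg_match(), which is a swapped-in function and not the one this tutorial covers. The main visible difference is that the pattern needs delimiters (here "/"). The variable names $found and $missing are only illustrative.

```php
<?php
// Sketch only: preg_match() (PCRE) stands in for ereg(), which no longer exists in PHP 7+.
$days = "monday, tuesday, wednesday, thursday, friday, saturday, sunday";

// preg_match() returns 1 if the pattern matches, 0 if it does not, false on error.
$found = preg_match("/sunday/", $days);
if ($found === 1) {
    echo "Found string 'sunday'\n";
}

// The trailing commas do not matter: "monday" is found too.
$foundMonday = preg_match("/monday/", $days);

// A word that is not in the string gives 0.
$missing = preg_match("/someday/", $days);
```

Unlike the explode() loop, nothing here depends on where the commas and spaces fall, because the pattern is searched for anywhere inside the string.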
We can also specify a third argument in ereg(): the name of an array, which is used to store the successfully matched expressions. We can modify the last example to make use of it like this:

<?php
$words = "one, two, three, four, five, six";
if (ereg("one", $words, $reg)) echo "Found string '$reg[0]'";
?>

Literal text is the simplest regular expression of all to look for, but we need not look for just the one word; we could look for any particular phrase. However, we need to make sure that we match all the characters exactly: words (with correct capitalization), numbers, punctuation, and even whitespace:

<?php
$words = "It's life Jim, but not as we know it!";
if (ereg("jim but", $words, $reg)) echo "Found string '$reg[0]'";
?>

This pattern won't match, because the string has a capital "J" and a comma after "Jim", and the pattern has neither. Similarly, spaces inside the pattern are significant:

<?php
$words1 = "The dog is in the kennel...";
$words2 = "...but the sheepdog is in the field";
$regexp = " dog";
if (ereg($regexp, $words1, $reg)) echo "Found string '$reg[0]'";
if (ereg($regexp, $words2, $reg)) echo "Found string '$reg[0]'";
?>

This will only find the first dog, as both ereg() calls are specifically looking for a space followed by the three letters "d", "o", and "g".

Now this is just basic Regex. Let's get into the more difficult stuff: the special characters.

Special Characters

A "." is a special character: each dot matches any single character. In the next example there are three dots after "kennel", so the match takes in the next 3 characters after that word.

<?php
$words1 = "The dog is in the kennel but the sheepdog is in the field.";
$regexp = "kennel...";
if (ereg($regexp, $words1, $reg)) echo "Found string '$reg[0]'";
?>

This will return the following:

Found string 'kennel bu'

These are the characters that are given special meaning within a regular expression, which you will need to backslash if you want to use them literally:

. * ? + [ ] ( ) { } ^ $ | \

Any other characters automatically assume their literal meanings.
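As a hedged illustration of the dot and of the special-character list above, here is a short sketch using preg_match() in place of ereg() (ereg() was removed in PHP 7). It also uses preg_quote(), a standard PHP function that adds the backslashes for you; it is not mentioned in the original tutorial and is shown only as one convenient option.

```php
<?php
// Sketch only: preg_match() stands in for ereg(); patterns are wrapped in "/" delimiters.
$words = "The dog is in the kennel but the sheepdog is in the field.";

// Each "." matches one character, so "..." takes the three characters after "kennel".
preg_match("/kennel.../", $words, $reg);
echo "Found string '{$reg[0]}'\n";

// preg_quote() backslashes every special character, giving a purely literal pattern.
$literal = preg_quote("kennel...", "/");

// The literal pattern needs three real dots, which this string does not have after "kennel".
$literalFound = preg_match("/" . $literal . "/", $words);
```

Here $reg[0] holds 'kennel bu', while the quoted pattern finds nothing because "kennel" is followed by " but" and not by three full stops.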
But wait a minute, what if I want to match a * or a . in a string, what would I do? Well, in regular PHP, if you want to put a quote (") inside a quoted string you have to put a backslash (\) in front of it, so it is included in the text without ending the string. The same goes for this...

<?php
$words1 = "The dog is in the kennel...";
$regexp = "kennel\.\.\.";
if (ereg($regexp, $words1, $reg)) echo "Found string '$reg[0]'";
?>

would be the correct coding.

Character Classes: [xyz]

These signify that any one of a set of characters is acceptable; we put the acceptable characters inside square brackets. For example, the regexp "w[ao]nder" will match against both the words "wander" and "wonder".

Conversely, we can say that everything is acceptable except a given set of characters: we can "negate the character class". To do so, the character class should start with a ^. For example, the regexp "[^1234567890]" will match against any character that isn't a digit.

If, like here, the characters we want to match form a sequence in the ASCII character set, we can use a hyphen to specify a range of characters, rather than spelling out the entire range. For instance, our last example can be rewritten as [^0-9]. Similarly, a lower case letter can be matched with [a-z]. We can use one or more of these ranges alongside each other, so if you wanted to match a single hexadecimal digit, you could write [0-9A-F]. Note that the brackets contain the whole expression. If we used [0-9][A-F] instead, we'd match a digit followed by a letter from A to F.

Some character classes are going to come up again and again, like digits, letters, and various types of whitespace. There are some neat shortcuts for these. Note that they are Perl-style: the preg_ functions understand them, but ereg() uses POSIX syntax, where the equivalents are bracket expressions such as [[:digit:]] and [[:space:]]. Here are the most common ones, and what they represent:

Shortcut  Expansion       Description
\d        [0-9]           Digits 0 to 9
\w        [0-9A-Za-z_]    A "word" character
\s        [ \t\n\r]       A whitespace character: a space, a tab, a newline or a return
Also the negative forms of the above:

Shortcut  Expansion       Description
\D        [^0-9]          Any non-digit
\W        [^0-9A-Za-z_]   A non-"word" character
\S        [^ \t\n\r]      A non-blank character

Anchors

So far, our patterns have all tried to find a match anywhere in the string. We can dictate where the match must occur; that is, we can say "these characters must match the beginning of the string" or "this text must be at the end of the string", by anchoring the match to either end. The two anchors we have are ^, which appears at the beginning of the pattern, anchoring a match to the beginning of the string, and $, which appears at the end of the pattern, anchoring it to the end of the string.

So, to see if a string ends with a full stop (and remember that the full stop is a special character) we could use a regexp like this: "\.$". Likewise, we can use "^I" to tell us if we have a capital "I" at the beginning of the string.

Word Boundaries

As we saw above, one problem we can have with trying to match against text is that words don't always sit neatly between two spaces. They may be followed or preceded by punctuation, or appear at the beginning or end of a string, or otherwise next to non-word characters. To help us properly search for words in such cases, we can use the special \b metacharacter. Like the anchors above, it doesn't actually match any character in particular; rather, it matches the point between something that isn't a word character (either \W, or one end of the string) and something that is, hence \b for boundary. For example, we could look for one-letter words using the regexp "\b\w\b", whereas "\s\w\s" would miss a one-letter word at the start or end of the string or next to punctuation. Note that \b is Perl-style syntax, understood by the preg_ functions rather than by ereg().

Alternatives

Instead of just giving a single series of acceptable characters, you may want to say "match either this or that". The "either-or" operator in a regular expression uses the same symbol as the bitwise "or" operator: |. So to match either "yes" or "maybe" we'd just use the regexp "yes|maybe".
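The character classes, shortcuts, anchors, word boundaries and alternatives described above can be tried out with a sketch along these lines. It uses preg_match() in place of ereg() (which was removed in PHP 7 and does not understand the Perl-style \w and \b), and the sample strings and variable names are made up for illustration. Patterns are written in single quotes so that PHP leaves the backslashes alone.

```php
<?php
// Sketch only: preg_match() stands in for ereg(); sample strings are invented.

// Character class: "a" or "o" is accepted in the second position.
$wonder = preg_match('/w[ao]nder/', "I wonder why");      // matches "wonder"
$winder = preg_match('/w[ao]nder/', "a window winder");   // "i" is not in the class

// Negated class with a range: is there any character that is not a digit?
$nonDigit = preg_match('/[^0-9]/', "12345");

// Anchors: "^" ties the match to the start, "$" to the end.
$startsWithI = preg_match('/^I/', "It's life Jim, but not as we know it!");
$endsWithStop = preg_match('/\.$/', "The dog is in the kennel.");

// Word boundary: "dog" as a whole word, not as part of "sheepdog".
$wholeDog = preg_match('/\bdog\b/', "...but the sheepdog is in the field");

// One-letter word using \b\w\b.
preg_match('/\b\w\b/', "It is a dog", $oneLetter);

// Alternatives: either "yes" or "maybe".
preg_match('/yes|maybe/', "maybe later", $alt);
```

With these inputs, $oneLetter[0] is "a" and $alt[0] is "maybe"; the "sheepdog" string gives 0 for the whole-word search because there is no boundary between "p" and "d".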
Qualifiers

What if we want to match against a set of characters that may occur once, may occur more than once, or may even not occur at all? Call in the qualifiers! The easiest of these is ?, which matches the immediately preceding character or metacharacter if it appears either once or not at all. It's a good way of saying that a particular character or group is optional. To match either "he" or "she", you can therefore use "s?he". To make a series of characters (or metacharacters) optional, group them in parentheses: you can match either "man" or "woman" with the regexp "(wo)?man".

As well as matching something one or zero times, you can match something one or more times. We do this with the plus sign, +. To match an entire word without specifying how long it should be, you can use "\w+". If, on the other hand, you have something which may occur any number of times but might not be there at all (that is, zero or one or many) you need what's called "Kleene's star" (the * quantifier). So, to find a capital letter after any (but possibly no) spaces at the start of the string, we could use "^\s*[A-Z]".

Let's review the three qualifiers:

bea?t   Matches either "beat" or "bet"
bea+t   Matches "beat", "beaat", "beaaat", ...
bea*t   Matches "bet", "beat", "beaat", ...

Novice programmers tend to go to town on combinations of dots and stars, and the results often surprise them. Bear the following in mind: a regular expression should seldom, if ever, start or finish with a starred character. You should also consider the fact that .* and .+ in the middle of a regular expression will match as much of your string as they possibly can.

Quantifiers

Now say we want to match against a specific quantity of characters: three digits in a row, for example. The metacharacters we use to handle such situations are called quantifiers.
If you want to be more precise about how many times a sequence of characters is repeated, you can specify minimum and maximum numbers of repeats in curly brackets: "\s{2,3}" will match against "2 or 3 spaces". Omitting the maximum signifies "or more": for example, {2,} denotes "2 or more". For "3 or fewer", {,3} is accepted by some regex engines but not all of them, so {0,3} is the safer spelling. In these cases, the same warnings apply as for the star operator.

Finally, we can specify exactly how many things are to be in a row by putting just that number inside the curly brackets. For example, "\b\w{5}\b" will match a five-letter word.

So, as an example, to check whether something is in an e-mail format you would use -

^[^@ ]+@[^@ ]+\.[^@ \.]+$

That would make sure it was of the form x@x.x; x x@x.x wouldn't be accepted, @@x.x wouldn't be accepted, etc.

Written by Paradox. Bits referenced from a book by Wrox.
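As a postscript to the quantifiers and the e-mail pattern above, here is a minimal sketch of how they might be exercised on a current PHP version. It assumes PHP 7 or later and uses preg_match() in place of ereg(); the function name looksLikeEmail() is hypothetical and chosen only for this illustration. The pattern checks the rough x@x.x shape described above and is not a full e-mail validator.

```php
<?php
// Sketch only: preg_match() stands in for ereg(); looksLikeEmail() is a made-up helper name.
function looksLikeEmail(string $candidate): bool
{
    // Same pattern as in the article, wrapped in "/" delimiters.
    return preg_match('/^[^@ ]+@[^@ ]+\.[^@ \.]+$/', $candidate) === 1;
}

$plain = looksLikeEmail("x@x.x");        // fits the x@x.x shape
$withSpace = looksLikeEmail("x x@x.x");  // space before the @ is rejected
$doubleAt = looksLikeEmail("@@x.x");     // nothing before the first @

// Counted repetition: a word of exactly five word characters.
preg_match('/\b\w{5}\b/', "a quick brown fox", $five);

// Between two and three whitespace characters in a row.
$gap = preg_match('/\s{2,3}/', "one  two");
```

With these inputs only the first address is accepted, $five[0] is "quick" (the first five-character word), and the double space in "one  two" satisfies \s{2,3}.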