What is regex and why is it important?
In this article, we examine regex, which every developer should know.
What Are Regular Expressions (Regex) and Why Are They Important?
Regular expressions (regex) are a powerful tool for identifying and searching for patterns within text. Put another way, a regex is a special sequence of characters that describes a pattern and helps us find the pieces of text that match it.
So why should we learn regex?
Actually, the reason is very simple... Regex is a parser that you can use anywhere you work with string data. The best part is that it is available in every popular programming language you can think of right now. For this reason, someone who knows regex well will have no problem handling string data.
How does Regex work?
The engine walks through the text you give it character by character, tries to match the whole expression at each position, and returns the matching data (treating the parts that match as “true” and the parts that don't as “false”).
Let's look at the examples.
First, you can define a regex between 2 slash (/) characters, i.e. /blablabla/.
Some programming languages let you write it exactly this way. In others, you pass it to a function as input, like RegExp('regex', 'flag'). By looking at the language you are working in and its regex function, you can quickly understand how it works.
In regex, we sometimes want to enable certain features, for example matching more than once, ignoring uppercase and lowercase, working line by line, or including Turkish characters. We call the field where we specify these features the “flag”. For example: /blablabla/gmis.
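Here is a minimal sketch of both styles, assuming JavaScript (the language the RegExp('regex', 'flag') form comes from). The sample text is made up for illustration.

```javascript
// Two equivalent ways to build the same pattern in JavaScript.
const literal = /blablabla/g;                     // literal between two slashes
const constructed = new RegExp("blablabla", "g"); // pattern and flags as strings

const text = "one blablabla two"; // hypothetical sample text

const literalHit = literal.test(text);         // true
const constructedHit = constructed.test(text); // true

// Both objects carry the same pattern source and the same flags.
const sameSource =
  literal.source === constructed.source && literal.flags === constructed.flags;

console.log(literalHit, constructedHit, sameSource);
```

The constructor form is handy when the pattern has to be assembled from a string at runtime; the literal form is easier to read when the pattern is fixed.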
Now let's quickly explain the flags, after which we can start describing the contents of the regex.
g (global): The most commonly used flag. It selects every part of the text that matches (returns “true”) instead of stopping at the first one. A word of caution: regex works about 95% the same everywhere, but the remaining 5% can cost you an hour or even a day. For that reason, it is usually better not to turn this flag on for strings that contain a single block of data.
m (multiline): Treats each line as a separate string value, so ^ and $ match at the start and end of every line.
i (insensitive): Ignores the difference between uppercase and lowercase letters.
u (unicode): Makes the engine treat the text as Unicode, so that characters outside the English alphabet, such as Turkish letters, are handled properly. That is, if you are cleaning Turkish data, you should use the unicode flag, otherwise characters such as “ç” may not match properly. How far this goes depends on the engine: in JavaScript, for example, \w stays limited to A-Z, a-z, 0-9 and _ even with this flag, and \p{L} is used to match letters from any language.
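To see the flags in action, here is a small sketch assuming JavaScript semantics. The sample text and variable names are made up for illustration, and other engines may treat the u flag differently.

```javascript
const text = "Regex\nregex\nREGEX"; // three lines, three spellings

const first = text.match(/regex/)[0];     // no flags: first exact-case match only
const all = text.match(/regex/gi);        // g + i: ["Regex", "regex", "REGEX"]
const everyLine = text.match(/^r/gim);    // m: ^ works on every line -> ["R", "r", "R"]
const firstLineOnly = text.match(/^r/gi); // without m: only the very start -> ["R"]

// Assumption: JavaScript behavior. \w stays ASCII-only even with the u flag,
// while \p{L} (which requires the u flag) matches letters of any language.
const word = "\u00E7ay"; // "çay"
const asciiWord = /^\w+$/u.test(word);    // false
const anyLetter = /^\p{L}+$/u.test(word); // true

console.log(first, all, everyLine, firstLineOnly, asciiWord, anyLetter);
```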
You will rarely need flags other than these, because the rest are only used in specific cases. Now let's move on to the meta characters.
Yes, it's time to fill the inside of the regex. So, what are we going to fill it with? With meta characters... “Meta” here means that the character does not stand for itself, but acts as a marker or condition. Let's take a look at these characters!
.: The dot character represents any character. Let's say we have a text like “dana mana bana” and we want to get all of those words. To express that the first character can be any character, we use the dot. For example, we write /.ana/g.
(): Grouping collects a sequence of characters containing one or more conditions into a single group, so that the sequence behaves as if it were a single character. For example, let's group 4 characters of any kind: (....).
With these expressions, we can easily match parts of the text such as “dana”, “mana” and “bana”.
+: Repeats the character, character class, or group it follows one or more times. It keeps selecting until the next character no longer matches, and counts everything it has selected as a single match. For example, to take the text four characters at a time all the way to the end: (....)+.
?: Makes the character or group it follows optional. So, if our string data “dana mana bana bela” might have the word “lila” at the beginning and we want to include it when it is there, we can write /(lila)?(....)+/.
*: A combination of + and ?, that is, zero or more repetitions. Congratulations, now you can quickly extract any group of simple words.
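The dot, grouping, and the three quantifiers can be tried out with a short sketch like the one here. It assumes JavaScript, and the patterns are simplified variants written for illustration (for example, the group is (.ana ?) rather than four dots) so that the results are easy to read.

```javascript
const text = "dana mana bana";

// Dot: any single character followed by "ana".
const words = text.match(/.ana/g); // ["dana", "mana", "bana"]

// Group + plus: one or more repetitions of "<any char>ana" with an optional space.
const run = text.match(/(.ana ?)+/)[0]; // "dana mana bana"

// Question mark: the "lila " prefix is optional.
const withPrefix = "lila dana mana".match(/(lila )?(.ana ?)+/)[0]; // "lila dana mana"
const withoutPrefix = "dana mana".match(/(lila )?(.ana ?)+/)[0];   // "dana mana"

// Star vs plus vs question mark on the letter "a".
const samples = ["dna", "dana", "daaana"];
const star = samples.map((s) => /^da*na$/.test(s));     // zero or more: [true, true, true]
const plus = samples.map((s) => /^da+na$/.test(s));     // one or more:  [false, true, true]
const optional = samples.map((s) => /^da?na$/.test(s)); // zero or one:  [true, true, false]

console.log(words, run, withPrefix, withoutPrefix, star, plus, optional);
```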
[]: You can think of it as a dot that matches only the characters we choose rather than any character. For example, to select the 4-character words whose 2nd character is “e” from the text “dana hera bela”, we can use /(.[e]..)+/g. But the features of this friend do not end there.
Square brackets also accept ranges of letters and digits if you write expressions such as a-z, A-Z, 0-9.
Note: Almost every character inside square brackets counts as a normal character; ranges such as A-Za-z0-9 are the main exception.
Let's say that in the string value “mana bina bena” the 1st character can be any character, the 2nd character can only be “a” or “e”, and the other 2 characters are “na”. We can use /.[ea]na/ to express this situation.
[^]: This meta character selects any character other than the ones listed inside it. For example, in the string value “mana bina bena”, let's say the 1st character can be any character and the 2nd character can be anything except “e” and “a”. To express this situation, we can use /(.[^ea]na)/g.
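Here is a quick sketch of character classes, assuming JavaScript. The “Room 42B” text is a made-up sample for the range example.

```javascript
const text = "mana bina bena";

// 2nd character must be "e" or "a".
const eOrA = text.match(/.[ea]na/g); // ["mana", "bena"]

// 2nd character must be anything except "e" and "a".
const notEOrA = text.match(/.[^ea]na/g); // ["bina"]

// Ranges: runs of letters and digits only.
const alnum = "Room 42B".match(/[A-Za-z0-9]+/g); // ["Room", "42B"]

console.log(eOrA, notEOrA, alnum);
```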
{}: Curly braces let us repeat a character or group a fixed number of times, a range of times, or an unlimited number of times. For example, if we want to select exactly 3 digits, it is enough to write \d{3}. If we want to select at least 3 and at most 11 word characters, we can use \w{3,11}. And if we want to select a run of at least 3 word characters with no upper limit, such as “yipyipyipyip...”, we can use \w{3,}.
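The three forms of curly braces can be checked with a sketch like this one, assuming JavaScript. The sample strings are made up for illustration, and ^ and $ are added to the middle pattern so that the length limits apply to the whole string.

```javascript
// Exactly 3 digits in a row.
const threeDigits = "a1 b22 c333".match(/\d{3}/g); // ["333"]

// Between 3 and 11 word characters, checked against the whole string.
const bounded = ["hi", "regex", "internationalization"].map((s) =>
  /^\w{3,11}$/.test(s)
); // [false, true, false]

// At least 3 word characters, no upper limit.
const openEnded = "yipyipyipyip!".match(/\w{3,}/)[0]; // "yipyipyipyip"

console.log(threeDigits, bounded, openEnded);
```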
^: The caret checks a condition at the beginning of the string data. It is normally written at the beginning of the regex or at the beginning of a group. For example, let's say the text “RocknRoll” is written on 3 lines, one below the other, and we want to get only the first word. We can use /^\w+/g; this expression takes only the first word on the top line. If we want to do the same operation for all lines, we can use /^\w+/mg.
$: The dollar meta character checks a condition at the end of the string data. For example, if we want to get only the last word of the text “RocknRoll”, we can use /\w+$/ to express this situation.
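A small sketch of both anchors, assuming JavaScript. The three-line text is a made-up sample with an extra word after “RocknRoll” on each line so that the first and last words differ.

```javascript
const text = "RocknRoll one\nRocknRoll two\nRocknRoll three";

// ^ without m: only the first word of the whole text.
const firstWord = text.match(/^\w+/g); // ["RocknRoll"]

// ^ with m: the first word of every line.
const firstWords = text.match(/^\w+/mg); // ["RocknRoll", "RocknRoll", "RocknRoll"]

// $ without m: only the last word of the whole text.
const lastWord = text.match(/\w+$/)[0]; // "three"

// $ with m: the last word of every line.
const lastWords = text.match(/\w+$/mg); // ["one", "two", "three"]

console.log(firstWord, firstWords, lastWord, lastWords);
```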
|: The “or” meta character means that any one of several alternatives can satisfy the condition. For example, if we want to select the whole phrase “I love cats but hate snakes” whether it contains the word “cats” or “dogs”, we can use /I love (cats|dogs) but hate snakes/g.
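Here is the same pattern as a runnable sketch, assuming JavaScript. The “birds” sentence is a made-up counterexample.

```javascript
const pattern = /I love (cats|dogs) but hate snakes/;

const cats = pattern.test("I love cats but hate snakes");   // true
const dogs = pattern.test("I love dogs but hate snakes");   // true
const birds = pattern.test("I love birds but hate snakes"); // false

// The group also tells us which alternative was found.
const which = "I love dogs but hate snakes".match(pattern)[1]; // "dogs"

console.log(cats, dogs, birds, which);
```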
\: The backslash meta character is used to search for meta characters themselves inside string data. For example, to search for special meta characters such as +, *, ? as normal characters, we can write \+, \*, \?. In the other direction, adding a backslash before certain letters calls a special meta character (such as only word characters, or only digits). I will show you these special meta characters now. First of all, let's take the link “https://app.patika” as our sample text.
\w: The word meta character selects all characters in A-Z, a-z, 0-9 and _.
/\w+/g
\W: Selects non-word characters, the opposite of the word meta character.
/\W+/g
\d: Selects only digit characters.
\D: Selects all characters that are not digits.
\s: Selects only whitespace characters.
\S: Selects all characters that are not whitespace.
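These shorthand classes can be compared side by side with a sketch like this, assuming JavaScript. The link is the one from the text; the other sample strings are made up for illustration.

```javascript
const link = "https://app.patika";

const wordParts = link.match(/\w+/g);    // ["https", "app", "patika"]
const nonWordParts = link.match(/\W+/g); // ["://", "."]

const order = "Order 66, room 7";
const digits = order.match(/\d+/g);    // ["66", "7"]
const nonDigits = order.match(/\D+/g); // ["Order ", ", room "]

const spaced = "a b\tc";
const pieces = spaced.split(/\s+/);  // split on whitespace: ["a", "b", "c"]
const solid = spaced.match(/\S+/g);  // runs of non-whitespace: ["a", "b", "c"]

console.log(wordParts, nonWordParts, digits, nonDigits, pieces, solid);
```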
Since you've come this far, you can solve a test like this :)
\b: When the boundary meta character comes before the expression we write, it says that the character preceding the expression must not be a word (\w) character. When it comes at the end of the expression, it says that the character after it must not be a word character, i.e. any of A-Za-z0-9_.
Let's say we want to find the “s” characters that are not followed by a word character in the string data “word boundaries are odd win to war happiness again”. We can use /s\b/g to express this situation.
\B: The non-boundary meta character is literally the opposite of boundary (\b). It matches only where there is no word boundary, for example where the neighboring character on that side is also a word character.
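To make the difference visible, this sketch (assuming JavaScript) replaces each matched “s” with an uppercase “S” in the sample sentence from the text.

```javascript
const text = "word boundaries are odd win to war happiness again";

// s\b: an "s" that ends a word.
const atBoundary = text.replace(/s\b/g, "S");
// "word boundarieS are odd win to war happinesS again"

// s\B: an "s" that is followed by another word character.
const insideWord = text.replace(/s\B/g, "S");
// "word boundaries are odd win to war happineSs again"

console.log(atBoundary);
console.log(insideWord);
```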
\K: Once this meta character has matched the expression or group before it, the reported match starts from the end of that part instead of from its beginning. Let's say that in our string data “123,456,789” we want to find the first 3 digits and then select only the digits and commas after them. To express this situation, we can use /^\d{3}\K[,0-9]+/g. This expression finds the first three digits and then selects only the numbers and commas that follow. Note that \K comes from PCRE-style engines such as Perl and PHP and is not available in every language.
\1, \2: A backreference recalls the exact text matched by a numbered group. It is useful when we cannot write out in advance what a group will match, but need that same text to appear again later in the pattern. These meta characters are very effective in rare cases, such as finding words that repeat one after another.
Let's say you're trying to pull the tagged part out of “Testing <B><I>bold italic</I></B> text.” in an HTML file. If you are trying to capture the text “bold italic” together with its <B> and <I> elements, you can use /<(B|I)>(.*?)<\/\1>/g to express this situation. This expression selects content that starts with <B> or <I> and ends with the same tag. Yes, it must be admitted that it was difficult at the beginning...
Now that we have also learned the backslash meta characters, let's leave a test like this here.
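Here is the HTML example as a runnable sketch, assuming JavaScript. The doubled-word example at the end is an extra illustration that is not part of the original text.

```javascript
const html = "Testing <B><I>bold italic</I></B> text.";

// \1 must repeat whatever group 1 (B or I) matched in the opening tag.
const tagged = html.match(/<(B|I)>(.*?)<\/\1>/);
const whole = tagged[0]; // "<B><I>bold italic</I></B>"
const tag = tagged[1];   // "B"
const inner = tagged[2]; // "<I>bold italic</I>"

// Extra illustration: a word immediately followed by itself.
const doubled = "this is is a test".match(/\b(\w+) \1\b/)[0]; // "is is"

console.log(whole, tag, inner, doubled);
```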
We've come to parenthesized meta characters...
(?=...): Lookahead says that the expression before it is selected only if the condition inside the lookahead matches right after it; the condition itself is not included in the match. Let's say we want to get the word “foo” in the string data “foobar foobaz”, but only the “foo” that has “bar” after it. To express this situation, we can use /foo(?=bar)/g.
(?!...): Negative lookahead is the opposite of lookahead. Let's say we want to get the word “foo” in the string data “foobar foobaz”, but only the “foo” that does not have “bar” after it. To express this situation, we can use /foo(?!bar)/.
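Both lookaheads can be compared with this sketch, assuming JavaScript. Replacing the match with uppercase “FOO” shows which “foo” was selected and that “bar” or “baz” is not part of the match.

```javascript
const text = "foobar foobaz";

// Positive lookahead: "foo" only when "bar" follows.
const followedByBar = text.replace(/foo(?=bar)/g, "FOO"); // "FOObar foobaz"

// Negative lookahead: "foo" only when "bar" does not follow.
const notFollowedByBar = text.replace(/foo(?!bar)/g, "FOO"); // "foobar FOObaz"

// The matched text is just "foo"; the lookahead consumes nothing.
const matched = text.match(/foo(?=bar)/g); // ["foo"]

console.log(followedByBar, notFollowedByBar, matched);
```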
(?<=...): Lookbehind is used when we want to get a character or group of characters only if a certain character or group of characters comes right before it. Let's say we want to get the word “bar” that has “fuu” before it in our string data “foobar fuubar”. To express this situation, we can use /(?<=fuu)bar/.
(?<!...): Negative lookbehind is the opposite of lookbehind. Let's say we want to get the word “bar” that does not have “fuu” before it in our string data “foobar fuubar”. To express this situation, we can use /(?<!fuu)bar/.
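The same trick works for lookbehind. This sketch assumes a JavaScript engine with lookbehind support (for example, a recent Node.js) and marks the selected “bar” in uppercase.

```javascript
const text = "foobar fuubar";

// Positive lookbehind: "bar" only when "fuu" comes right before it.
const afterFuu = text.replace(/(?<=fuu)bar/g, "BAR"); // "foobar fuuBAR"

// Negative lookbehind: "bar" only when "fuu" does not come right before it.
const notAfterFuu = text.replace(/(?<!fuu)bar/g, "BAR"); // "fooBAR fuubar"

console.log(afterFuu, notAfterFuu);
```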
(?:...): This meta character groups an expression without capturing it: the group takes part in the match but is not stored as a separate numbered group. Let's say we want to treat “match that” in the string “match this match that” as a group while matching, but have it returned only as part of the overall match together with “match this”. To express this situation, we can use /match this (?:match that)/g. This expression matches “match this” followed by “match that” as a single result, without creating a capture group.
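The difference from an ordinary group is easiest to see in the result array. This sketch assumes JavaScript and drops the g flag so that match() returns the groups as well.

```javascript
const text = "match this match that";

// Ordinary group: the result has the whole match plus one captured group.
const capturing = text.match(/match this (match that)/);
const capturedCount = capturing.length - 1; // 1
const captured = capturing[1];              // "match that"

// Non-capturing group: same overall match, but nothing is captured.
const nonCapturing = text.match(/match this (?:match that)/);
const nonCapturedCount = nonCapturing.length - 1; // 0
const whole = nonCapturing[0];                    // "match this match that"

console.log(capturedCount, captured, nonCapturedCount, whole);
```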
(?<HelloWorld>...): This meta character is used to name groups. The angle brackets are written at the beginning of the group, and inside them we write whatever name we want to give the group. For example, if we want to divide a 9-digit number into three parts and get each of them in a separate group:
/(?<milyonlar>\d{3})[ ]?(?<binler>\d{3})[ ]?(?<yuzler>\d{3})/
This expression forms a millions group (milyonlar), a thousands group (binler), and a hundreds group (yuzler).
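Here is how the named groups can be read back, as a sketch assuming JavaScript and assuming the optional separator between the parts is a single space. The sample numbers are made up for illustration.

```javascript
// Assumption: [ ]? stands for one optional space between the three parts.
const pattern = /(?<milyonlar>\d{3})[ ]?(?<binler>\d{3})[ ]?(?<yuzler>\d{3})/;

const spaced = "123 456 789".match(pattern).groups;
const compact = "123456789".match(pattern).groups;

console.log(spaced.milyonlar, spaced.binler, spaced.yuzler);    // 123 456 789
console.log(compact.milyonlar, compact.binler, compact.yuzler); // 123 456 789
```

Reading spaced.binler is much easier to follow later than remembering that the thousands were group number 2.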
Below are some widely used regex examples:
Selecting phone numbers:
1234567890
123-456-7890
123,456,7890
(123) 456-7890
+1 123,456,7890
regex = /((?<area>\+\d{1,2})[ -])?\(?(?<operator>\d{3})\)?[ ,-]?(?<main>\d{3})[ ,-]?(?<number>\d{4})/gm
Selecting dates:
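Here is a sketch that runs a phone pattern over the five sample numbers, assuming JavaScript. One loud assumption: the separators between the parts are taken to be a space, a comma, or a hyphen (written as [ ,-]), chosen so that every listed sample matches. The pattern is applied to one number at a time, so the g and m flags are left out.

```javascript
const samples = [
  "1234567890",
  "123-456-7890",
  "123,456,7890",
  "(123) 456-7890",
  "+1 123,456,7890",
];

// Assumption: separators are space, comma, or hyphen.
const phone =
  /((?<area>\+\d{1,2})[ -])?\(?(?<operator>\d{3})\)?[ ,-]?(?<main>\d{3})[ ,-]?(?<number>\d{4})/;

const parsed = samples.map((s) => {
  const m = s.match(phone);
  return m ? { ...m.groups } : null; // copy the named groups into a plain object
});

for (const p of parsed) {
  console.log(p.area, p.operator, p.main, p.number);
}
```

The area group is undefined for every sample except the last one, where it is “+1”.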
14/02/2018
14-02-2018
February 14, 2018
2/14/18
regex = /(?<day>([0-9]{2}))([\/\-\.])(?<month>([0-9]{2}))([\/\-\.])(?<year>([0-9]{2,4}))/mg
Selecting URLs:
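Running a date pattern of this shape over the four samples shows what it does and does not cover. This is a sketch assuming JavaScript, with the pattern applied to one line at a time (so the g and m flags are left out). Because the day and month are required to be exactly two digits separated by /, - or ., only the first two samples match; the written-out month and the single-digit form are left unmatched.

```javascript
const samples = ["14/02/2018", "14-02-2018", "February 14, 2018", "2/14/18"];

const date =
  /(?<day>([0-9]{2}))([\/\-\.])(?<month>([0-9]{2}))([\/\-\.])(?<year>([0-9]{2,4}))/;

const parsed = samples.map((s) => {
  const m = s.match(date);
  return m ? [m.groups.day, m.groups.month, m.groups.year] : null;
});

console.log(parsed);
// [["14", "02", "2018"], ["14", "02", "2018"], null, null]
```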
[https://www.patika.dev~reactogreninogretin]
[https://www.klasikyazilimci.com,php-oldu-abi-artik]
[https://www.youtube.com/kodluyoruz|patika youtube channel]
regex = /(?<url>(?<=\[)(.*)(?=[~,\|]))[~|,|\|](?<title>(?<=[~|,|\|])(.*)(?=\]))?/gm
For more examples, you can solve the regex tests on Codewars.
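Here is a sketch that splits each sample line into its url and title groups, assuming a JavaScript engine with lookbehind support. The pattern is applied to one line at a time, so the g and m flags are left out.

```javascript
const lines = [
  "[https://www.patika.dev~reactogreninogretin]",
  "[https://www.klasikyazilimci.com,php-oldu-abi-artik]",
  "[https://www.youtube.com/kodluyoruz|patika youtube channel]",
];

// url: everything after "[" up to the separator (~ , or |).
// title: everything after the separator up to "]".
const link =
  /(?<url>(?<=\[)(.*)(?=[~,\|]))[~|,|\|](?<title>(?<=[~|,|\|])(.*)(?=\]))?/;

const parsed = lines.map((line) => {
  const { url, title } = line.match(link).groups;
  return { url, title };
});

for (const { url, title } of parsed) {
  console.log(url, "->", title);
}
// https://www.patika.dev -> reactogreninogretin
// https://www.klasikyazilimci.com -> php-oldu-abi-artik
// https://www.youtube.com/kodluyoruz -> patika youtube channel
```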
We thank Mehmet Yagiz Maktav for writing this text. You can check out our student's blog: https://academy.patika.dev/tr/blogs/detail/her-yazilimcinin-bir-gun-ihtiyaci-olan-sey-regex-nedir