6th December 2022

What Is Regex? How to Use It? Regex Examples

Regex (Regular Expression) is a pattern language that lets us find a phrase in text easily. With the right pattern, we can pick out exactly the phrase we want from long sentences. We use regular expressions in the programming world to be able to do what we think of in a short time without a crowd of code. People who deal with log and DLP work in particular have to use these expressions to reach the values they will parse. You can use regex in almost all modern programming languages. The examples in this article should help you picture the regex patterns that you can use in your next projects. Now let's try to understand the subject with our examples. If you want to see a pattern laid out visually, like a map, you can type it into https://regex101.com.

HOW DOES REGEX WORK IN PYTHON?

In Python, the regex functions live in the built-in “re” module, which you load with “import re”. The re module offers us several functions. Their purpose is to search a string for the text or characters we want and then give us access to what was found. The main functions of the re module are listed below.

  • findall()
  • search()
  • split()
  • sub()

Findall() Function

The findall() function returns a list of all matches, in the order in which they are found. If no match is found, it returns an empty list.
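As a minimal sketch of findall(), assuming a sample sentence made up for this illustration:

```python
import re

# Sample text invented for this illustration.
text = "The rain in Spain stays mainly in the plain"

# All non-overlapping matches, in the order they are found.
matches = re.findall(r"ain", text)
print(matches)

# A pattern that never matches gives an empty list, not None.
no_matches = re.findall(r"xyz", text)
print(no_matches)
```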


Search() Function

The search() function scans the string for a match and, if it finds one, returns a match object. If there is more than one match, it returns only the first one found. A typical use is looking for the first whitespace character in a string. If no match is found, it returns None.
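A small sketch of search() looking for the first whitespace character (the sample text is an assumption):

```python
import re

# Sample text invented for this illustration.
text = "Regex makes searching easy"

# Only the first match is returned, wrapped in a match object.
first_space = re.search(r"\s", text)
print(first_space.start())  # index of the first whitespace character

# There are no digits in the text, so the result is None.
missing = re.search(r"\d", text)
print(missing)
```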


Split() Function

The split() function returns a list in which the string has been split at every match. This list makes our writing or printing operations easier. A typical use is splitting the string at each whitespace character.
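Splitting at each whitespace character could look roughly like this (sample text assumed):

```python
import re

# Sample text invented for this illustration.
text = "Regex makes searching easy"

# The string is cut at every whitespace match.
words = re.split(r"\s", text)
print(words)
```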


Maxsplit Parameter

maxsplit is not a separate function but an optional parameter of split(). It limits the number of splits that are performed; whatever is left of the string comes back as the last item of the list.
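A short sketch of maxsplit, assuming the same kind of sample sentence and a limit of one split:

```python
import re

# Sample text invented for this illustration.
text = "Regex makes searching easy"

# Split only at the first whitespace; the rest stays in one piece.
parts = re.split(r"\s", text, maxsplit=1)
print(parts)
```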


Sub() Function

The sub() function replaces every match with the character or text you want. A typical example replaces each whitespace character with 9.
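Replacing each whitespace character with 9 might be sketched as follows (sample text assumed):

```python
import re

# Sample text invented for this illustration.
text = "Regex makes searching easy"

# Every whitespace match is replaced with the string "9".
replaced = re.sub(r"\s", "9", text)
print(replaced)
```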


Match Object

A match object gives us information about the search and its result. If there is no match, the function returns None instead of a match object.
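The sketch below reads a few details from a match object and shows the None case. The sample text and the patterns are assumptions chosen for illustration:

```python
import re

# Sample text invented for this illustration.
text = "The rain in Spain"

# Look for a word that starts with an uppercase S.
match = re.search(r"\bS\w+", text)
if match is not None:
    print(match.span())    # (start, end) positions of the match
    print(match.group())   # the matched text
    print(match.string)    # the string that was searched

# No word starts with Z, so there is no match object at all.
no_match = re.search(r"\bZ\w+", text)
print(no_match)
```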


Flags of the Regex Library

Flags, also called options, are the settings of our regex patterns. The flags are similar in each programming language, but there are also quite a few differences. Therefore, check the options of the regex library of the platform you are using.


“Global” Flag

With this flag, the engine returns every match. When it is not used, only the first result is returned and the other matches are never reported, so we do not need an array structure: only index zero would have a value. Python's re module has no global flag; there, search() returns the first match and findall() returns all of them.

 

“Unicode” Flag

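It solves the problem with Turkish characters. With it, the “\w” pattern automatically recognizes characters such as Ç, Ş, Ğ, Ö, even though we do not write them in our pattern. It is an important flag. Its effect shows up in shorthand patterns such as “\w”; it does not change explicit ranges like [a-z]. In Python 3, Unicode matching is already the default for text patterns.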
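In Python 3 there is nothing to switch on, so a sketch can only show the opposite direction: re.ASCII turns Unicode matching off. The sample words are an assumption:

```python
import re

# Sample text invented for this illustration (Turkish words).
text = "Çay Şeker"

# Default in Python 3: \w is Unicode-aware and takes Ç and Ş.
unicode_words = re.findall(r"\w+", text)
print(unicode_words)

# With re.ASCII, \w is limited to [a-zA-Z0-9_] and skips Ç and Ş.
ascii_words = re.findall(r"\w+", text, flags=re.ASCII)
print(ascii_words)
```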


Regex Meta Characters

You can try out the following characters on the https://regex101.com site. You can combine more than one of them to shape a pattern to your own needs.

[abc] Meta Character

Matches any one of the letters a, b, or c listed inside the square brackets. You can type any letters or digits you want here. The pattern finds results throughout the whole text. Assuming we keep them in array format, the results for the example text would be

array [0] = b

array [1] = c

and so on.


[^abc] Meta Character

Matches any character except the letters or digits written inside the square brackets. With [^abc], it takes every character except a, b, or c. If we store the results in an array as in the first example, index 0 gives B and index 1 gives u for the example text.
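To compare [abc] with [^abc], here is a tiny sketch on an assumed six-letter text (not the text from the original example):

```python
import re

# Sample text invented for this illustration.
text = "abcdef"

# Any one of a, b or c.
listed = re.findall(r"[abc]", text)
print(listed)

# Any character that is NOT a, b or c.
not_listed = re.findall(r"[^abc]", text)
print(not_listed)
```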


[a-z] Meta Character

Matches every character in the range between the two characters in the square brackets, including the endpoints. Here it takes the lowercase letters from a to z. It does not take uppercase letters, Turkish characters, or digits.


[a-zA-Z] Meta Character

It matches all letters from lowercase a to z and from uppercase A to Z, but not Turkish characters.
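The two ranges can be compared on one assumed sample text that mixes uppercase letters, digits and a Turkish character:

```python
import re

# Sample text invented for this illustration.
text = "Regex 101 Çay"

# Lowercase ASCII letters only.
lower = re.findall(r"[a-z]", text)
print(lower)

# Lowercase and uppercase ASCII letters; Ç and the digits are skipped.
letters = re.findall(r"[a-zA-Z]", text)
print(letters)
```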


“.” Meta Character

Matches any character (including spaces) except a newline. Applied to a text, it selects every character separately.

 

“\s” Meta Character

Matches a whitespace character such as a space or a tab. You can remember this pattern from the initial of the word space. Applied to a text, it marks each whitespace character separately.

 

“\S” Meta Character

Matches every character except whitespace such as spaces or tabs, so it gives the opposite result of “\s”. For example, we can use it when we want to pick a sentence apart word by word.
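A quick sketch of \s next to \S, on an assumed text that contains one space and one tab:

```python
import re

# Sample text invented for this illustration: a space and a tab.
text = "a b\tc"

# Whitespace characters, one match each.
spaces = re.findall(r"\s", text)
print(spaces)

# Everything that is not whitespace, one match each.
non_spaces = re.findall(r"\S", text)
print(non_spaces)
```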

 

“+” Meta Character

Means that the expression to its left occurs one or more times. Combined with “\S”, it selects each word individually: “\S” matches every character except whitespace, and with the + (plus) operator it keeps matching until it reaches a space, which gives us a whole word.
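Picking out whole words with \S+ might look like this (sample text assumed):

```python
import re

# Sample text invented for this illustration.
text = "Regex makes searching easy"

# One or more non-whitespace characters in a row form one match.
words = re.findall(r"\S+", text)
print(words)
```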

 

“\d” Meta Character

You can probably guess: the letter d stands for digit. This pattern matches a digit character. Applied to a text, it selects each digit separately.

 

Now we will use the \d pattern, which matches a digit, together with the plus (+) operator. With this combination, each run of digits is selected as a separate match.

\d+

“\D” Meta Character

If you want the opposite, use \D. The \D pattern matches any character that is not a digit, and \D+ matches each run of such characters.

\D

\D+
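The digit patterns can be tried side by side in a short sketch. The sample text is an assumption:

```python
import re

# Sample text invented for this illustration.
text = "Order 66 shipped in 2022"

# Each digit is its own match.
digits = re.findall(r"\d", text)
print(digits)

# Each run of digits is one match.
numbers = re.findall(r"\d+", text)
print(numbers)

# Each run of non-digit characters is one match.
non_numbers = re.findall(r"\D+", text)
print(non_numbers)
```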


“\w” Meta Character

Matches letters, digits, and the underscore, much like [a-zA-Z0-9_]. The only difference is that \w also takes Turkish characters, while [a-zA-Z0-9_] does not. It does not take any other characters. The Unicode flag takes effect on this pattern.

\w

 

“\W” Meta Character

Matches every character except letters, digits, and the underscore. It is the opposite of “\w”.

\W
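A small sketch of \w against \W, using an assumed text with an underscore and some punctuation:

```python
import re

# Sample text invented for this illustration.
text = "user_1 @ home!"

# Runs of letters, digits and underscores.
word_chars = re.findall(r"\w+", text)
print(word_chars)

# Every single character that \w does not take.
other_chars = re.findall(r"\W", text)
print(other_chars)
```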

 

“\v” Meta Character

Matches line breaks and vertical tabs. It works with Unicode. You can add vertical tabs in some word processors using CMD/CTRL+ENTER. It is not a very common pattern. Note that in Python's re module, \v matches only the vertical tab character.

\v

 

“\ddd” Meta Character

Matches an eight-bit character by its assigned octal value. Look up the code of the character you want in the table and write it in place of ddd. You can get help from the “Octal Character Table” at the link below. https://www.utf8-chartable.de/unicode-utf8-table.pl?utf8=oct

For example, the pattern below matches a period (.), whose octal code is 056.

\056

 

“\b” Meta Character

This pattern marks a word boundary. Placed after an expression, it requires the word to end with the expression to its left. The pattern below matches words that end in r.

[a-z]+r\b
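Here is how the [a-z]+r\b pattern might behave on an assumed sample sentence:

```python
import re

# Sample text invented for this illustration.
text = "the user ran faster"

# Lowercase words that end in r; "ran" only starts with r, so it is skipped.
ends_with_r = re.findall(r"[a-z]+r\b", text)
print(ends_with_r)
```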

“\” Meta Character

This character removes the special meaning of a metacharacter or delimiter, so that it is matched literally. For example, the pattern below matches a real period instead of any character.

\.
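The difference between an escaped and an unescaped dot can be sketched like this (sample text assumed):

```python
import re

# Sample text invented for this illustration.
text = "a.b"

# Unescaped dot: any character.
any_char = re.findall(r".", text)
print(any_char)

# Escaped dot: only a literal period.
literal_dot = re.findall(r"\.", text)
print(literal_dot)
```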

 

“(a|b)” Meta Character

Matches either a or b, whichever of the two subexpressions is found, and captures it.

(a|b)
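A minimal sketch of (a|b) on an assumed three-letter text. Because the pattern contains one capture group, findall() returns what the group captured:

```python
import re

# Sample text invented for this illustration.
text = "cab"

# Either alternative counts as a match; c matches neither.
found = re.findall(r"(a|b)", text)
print(found)
```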

 

“?” Meta Character

Makes the character before it optional: it may appear once or not at all. In the pattern below, s is required and a is optional.

sa?
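Trying sa? on an assumed sample text shows that at most one a is taken:

```python
import re

# Sample text invented for this illustration.
text = "s sa saa"

# s is required, a is optional; a second a is never included.
optional_a = re.findall(r"sa?", text)
print(optional_a)
```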

 

“(?#…)” Meta Character

Any text inside this group is treated as a comment and ignored by the regex engine. Another option is to set the x flag, which allows # comments. This flag also causes the regex to ignore whitespace.

(?#...)
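Both comment styles can be sketched in Python, where the x flag is spelled re.VERBOSE. The patterns and sample texts are assumptions:

```python
import re

# Inline comment group: ignored by the engine.
inline = re.search(r"\d{4}(?#four-digit year)", "Posted in 2022")
print(inline.group())

# Verbose mode: whitespace is ignored and # starts a comment.
verbose = re.compile(r"""
    \d{4}   # four-digit year
    -       # literal hyphen
    \d{2}   # two-digit month
""", re.VERBOSE)
found = verbose.search("Posted on 2022-12")
print(found.group())
```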

 

“(?:…)” Meta Character

This structure is very similar to the (…) group, but it does not capture anything, so it cannot be referred to later the way a capture group can.

(?:al)
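One way to see the difference is findall(), which returns the group content when a pattern has a capture group and the whole match when it has none. The sample word is an assumption:

```python
import re

# Capturing group: findall() returns what the group captured.
captured = re.findall(r"(al)pha", "alpha")
print(captured)

# Non-capturing group: nothing is captured, so the whole match is returned.
not_captured = re.findall(r"(?:al)pha", "alpha")
print(not_captured)
```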

 

“(?P<name>…)” Meta Character

With this construct, we can refer to a capture group by the name we give it instead of by a number. Alternative forms are (?<name>…) and (?'name'…), which you can use in PCRE. Python's re module accepts only the (?P<name>…) form.

(?P<name>Ömer)
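Reading a named group in Python could be sketched as follows. The surrounding sample text is an assumption:

```python
import re

# Sample text invented for this illustration.
text = "Author: Ömer"

match = re.search(r"(?P<name>Ömer)", text)

# The group is available by its name.
print(match.group("name"))

# groupdict() returns all named groups at once.
print(match.groupdict())
```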

“(?imsxXU)” Meta Character

This construct sets regex flags inside the expression itself. You can also turn a flag off with a minus sign, as in (?-i). Which flag letters are accepted depends on the regex engine.
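As a hedged sketch in Python, the inline (?i) flag at the start of the pattern makes the match case-insensitive (sample text assumed):

```python
import re

# Sample text invented for this illustration.
text = "Regex REGEX regex"

# (?i) at the start of the pattern switches on case-insensitive matching.
any_case = re.findall(r"(?i)regex", text)
print(any_case)
```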

“(?(1)yes|no)” Meta Character

This construct is a conditional. If capture group 1 took part in the match, the engine tries the pattern on the left of the | (yes); otherwise it tries the one on the right (no).

(a short)?(?(1) a crowd|of code)
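The conditional pattern above also runs in Python's re module. The two test strings below are assumptions built from the words in the pattern:

```python
import re

pattern = r"(a short)?(?(1) a crowd|of code)"

# Group 1 matched, so the "yes" branch (" a crowd") is required.
with_group = re.search(pattern, "a short a crowd").group(0)
print(with_group)

# Group 1 did not match, so the "no" branch ("of code") is used.
without_group = re.search(pattern, "of code").group(0)
print(without_group)
```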

“(?P=name)” Meta Character

This syntax comes from Python. It matches the same text that was matched earlier by the named capture group. It can be very useful when combined with other constructs, as in the pattern below.

(?P<named_group>systemconf.com)[a-z ]+(?P=named_group)
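A sketch of the named backreference, using the pattern above on an assumed sample sentence:

```python
import re

pattern = r"(?P<named_group>systemconf.com)[a-z ]+(?P=named_group)"

# Sample text invented for this illustration.
text = "systemconf.com is the same as systemconf.com"

match = re.search(pattern, text)

# The backreference must repeat exactly what the named group matched.
print(match.group("named_group"))
print(match.group(0))
```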

“(?=…)” Meta Character

This is a positive lookahead. It is usually written after another expression, as in (…(?=…)). It checks that the given subpattern can be matched at the current position without consuming any characters. For example, the pattern below matches “system” only when it is followed by “conf”.

(system(?=conf))

 

“(?!…)” Meta Character

This is a negative lookahead: the match succeeds only if the given pattern does not match starting from the current position in the expression. It does not consume any characters. It is the opposite of (?=…).
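The two lookaheads can be compared by the positions at which “system” is found. The sample text is an assumption:

```python
import re

# Sample text invented for this illustration.
text = "systemconf system"

# Positive lookahead: "system" only when "conf" follows.
followed = [m.start() for m in re.finditer(r"system(?=conf)", text)]
print(followed)

# Negative lookahead: "system" only when "conf" does not follow.
not_followed = [m.start() for m in re.finditer(r"system(?!conf)", text)]
print(not_followed)

# The lookahead itself is not part of the matched text.
matched_text = re.search(r"system(?=conf)", text).group()
print(matched_text)
```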

“(?<=…)” Meta Character

This is a positive lookbehind, usually used in a form such as (?<=system)conf. It succeeds only if the specified pattern matches text that ends at the current position. The pattern should have a fixed width. It does not consume any characters.

(?<=system)conf


“(?<!…)” Meta Character

This is a negative lookbehind. It succeeds only if the specified pattern does not match text that ends at the current position in the expression. The pattern should have a fixed width. A typical usage is “(?<!not)conf”.

(?<!not)conf
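Both lookbehinds can be sketched on one assumed sample text, again by listing where “conf” is found:

```python
import re

# Sample text invented for this illustration.
text = "systemconf notconf myconf"

# Positive lookbehind: "conf" only when "system" comes right before it.
after_system = [m.start() for m in re.finditer(r"(?<=system)conf", text)]
print(after_system)

# Negative lookbehind: "conf" unless "not" comes right before it.
not_after_not = [m.start() for m in re.finditer(r"(?<!not)conf", text)]
print(not_after_not)
```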

“a?” Meta Character

Matches the character you type in place of a if it is there, and still succeeds if it is not (zero or one time).

 

“a+” Meta Character

Matches the character we type in place of a when it appears one or more times in a row.

 

“a{3}” Meta Character

Matches the character to the left of the braces repeated exactly as many times in a row as the number in the braces. Here it matches exactly three consecutive a's.

 

“a{3,}” Meta Character

This pattern is similar to the previous one, but here the number you write in the braces is a minimum. In this example, it matches three or more consecutive a's.

 

“a{3,6}” Meta Character

The characters shown at the beginning and the end of the pattern are not part of the pattern itself; that is just how the system displays it. If you also fill in a number after the comma, the pattern matches the character repeated anywhere between the two numbers, including the numbers themselves.
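The three brace forms can be compared in a short sketch. The sample text is an assumption, and \b is added on both sides only so that each run of a's is judged as a whole word:

```python
import re

# Sample text invented for this illustration: runs of 2, 3, 4 and 7 a's.
text = "aa aaa aaaa aaaaaaa"

# Exactly three.
exactly = re.findall(r"\ba{3}\b", text)
print(exactly)

# Three or more.
at_least = re.findall(r"\ba{3,}\b", text)
print(at_least)

# Between three and six, inclusive.
between = re.findall(r"\ba{3,6}\b", text)
print(between)
```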

 

“^” Meta Character

Matches the beginning of the string without consuming any characters. In multi-line mode, it also matches right after each newline character. Outside multi-line mode, it does the same thing as \A.


“$” Meta Character

Matches the end of the string without consuming any characters. In multi-line mode, it also matches right before each newline character. In short, it gives the end of the line. Outside multi-line mode, it is similar to \Z.
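A sketch of ^ and $ with and without multi-line mode, on an assumed two-line text:

```python
import re

# Sample text invented for this illustration: two lines.
text = "first line\nsecond line"

# Without re.MULTILINE, ^ and $ refer to the whole string.
first_word = re.findall(r"^\w+", text)
last_word = re.findall(r"\w+$", text)
print(first_word, last_word)

# With re.MULTILINE, they refer to every line.
first_words = re.findall(r"^\w+", text, flags=re.MULTILINE)
last_words = re.findall(r"\w+$", text, flags=re.MULTILINE)
print(first_words, last_words)
```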

 

“g” Flag

This flag (global) tells the engine not to stop after the first match has been found, but to continue until it finds no more matches.

“m” Flag

This flag (multi-line) changes how the line-start (^) and line-end ($) anchors work: they match at the beginning and end of every line instead of only at the beginning and end of the whole string. This lets us match each line in turn from beginning to end and get the whole line.

“\0” Meta Character

In a replacement string, this inserts the complete text of the match.

“$1” Meta Character

In a replacement string, this inserts the content of the first capture group. The number 1 can be any number, as long as it corresponds to a valid capture group.
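The spelling of these references depends on the engine. In Python's re.sub(), the whole match is written \g<0> and a numbered group is written \1 (or \g<1>) instead of $1. A minimal sketch with assumed sample texts:

```python
import re

# Swap the two numbers by referring to the capture groups.
swapped = re.sub(r"(\d+)-(\d+)", r"\2-\1", "10-20")
print(swapped)

# Wrap every number in brackets by referring to the whole match.
wrapped = re.sub(r"\d+", r"[\g<0>]", "a1b22")
print(wrapped)
```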

“\t” Meta Character

In a replacement string, this inserts one tab character.

“\x20” Meta Character

You can use a hexadecimal escape in the standard syntax to insert any character into the replacement string; \x20, for example, is a space.

“*” Meta Character

This operator means that the expression to its left may be absent, but if it is present, even several times in a row, all of the repetitions are selected (zero or more times). For example, in a pattern where * follows the second character, the character s is required and the character after it may or may not appear.

Note: The difference between the * character and the + character is that with + the character to its left must appear at least once.
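The difference between * and + can be sketched on one assumed sample text; the patterns sa* and sa+ are chosen for illustration:

```python
import re

# Sample text invented for this illustration.
text = "s sa saa"

# Zero or more a's after s: a bare s also matches.
star = re.findall(r"sa*", text)
print(star)

# One or more a's after s: a bare s does not match.
plus = re.findall(r"sa+", text)
print(plus)
```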

 

Regex Examples

Phone Number Format Example

You can match a phone number in the form “5xx-xxx-xx-xx” with the following pattern.

5[0-9]{2}-[0-9]{3}-[0-9]{2}-[0-9]{2}
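Trying the phone pattern in Python might look like this. Both numbers in the sample text are made up:

```python
import re

pattern = r"5[0-9]{2}-[0-9]{3}-[0-9]{2}-[0-9]{2}"

# Sample text invented for this illustration.
text = "Call 532-123-45-67 or 212-555-00-00"

# Only the number that starts with 5 fits the format.
phones = re.findall(pattern, text)
print(phones)
```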

Serial Number Format Example

You can match a serial number in the form “b34f12345” with the pattern below.

(?i)[a-z]{1}[0-9]{2}[a-z]{1}[0-9]{5}
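A short sketch of the serial number pattern; the sample text is an assumption. The (?i) flag lets the same pattern accept uppercase letters:

```python
import re

pattern = r"(?i)[a-z]{1}[0-9]{2}[a-z]{1}[0-9]{5}"

# Sample text invented for this illustration.
text = "Serials: b34f12345 and B34F12345"

serials = re.findall(pattern, text)
print(serials)
```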

Blood Type Format Example

You can match a blood type written in the form “Blood type: a rh +” with the pattern below. It covers all blood groups.

\b(?i)(blood)(\s)(type){1}(\s|\:\s|\-|\-\s)(A|B|AB|B|0)(| )(R|r)(H|h)(| )(\+|-)
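To run this pattern in Python, one adjustment is assumed here: the (?i) flag is moved to the very start, because Python 3.11 and later reject a global inline flag that comes after \b. The rest of the pattern is unchanged:

```python
import re

# Same pattern as above, with (?i) moved to the front for Python.
pattern = (
    r"(?i)\b(blood)(\s)(type){1}(\s|\:\s|\-|\-\s)"
    r"(A|B|AB|B|0)(| )(R|r)(H|h)(| )(\+|-)"
)

# group(0) is the whole match; the pattern itself has many capture groups.
first = re.search(pattern, "Blood type: a rh +").group(0)
print(first)

second = re.search(pattern, "blood type-AB Rh-").group(0)
print(second)
```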