Regex Beginner Tutorial: Master Regular Expressions

Published: June 7, 2026 · 14 min read

What is Regex and Why Learn It?
Basic Pattern Matching
Character Classes
Quantifiers: Repetition Made Easy
Anchors and Boundaries
Groups and Capturing
Lookahead and Lookbehind
Common Regex Patterns
Practical Examples

What is Regex and Why Learn It?

Regular expressions, commonly called regex or regexp, are patterns used to match character combinations in text. They are useful to developers, data scientists, and anyone else who processes text.

From validating email addresses and form inputs to parsing log files and extracting data from documents, regex enables sophisticated text manipulation that would otherwise require complex parsing code. Once you master the basics, you'll find countless opportunities to automate repetitive text tasks.

Most major programming languages support regex, including JavaScript, Python, Java, C#, Ruby, PHP, and Go. Syntax details vary between languages, and some engines omit features (Go's regexp package, for example, has no lookarounds or backreferences), but the core concepts covered here carry over.

Basic Pattern Matching

At its simplest, regex matches literal characters exactly. The pattern hello matches the string "hello" anywhere in the text.

Literal Characters

Pattern: cat
Matches: "cat", "catalog", "concatenate", "scattered"
Does not match: "Cat" (case-sensitive by default), "dog"

Special Characters (Metacharacters)

Certain characters have special meaning in regex:

. ^ $ * + ? { } [ ] \ | ( )

To match these literally, escape them with a backslash:

Pattern: 3\.14
Matches: "3.14"
Does not match: "3x14", "3,14"
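
When the literal text comes from a variable or user input, escaping by hand is easy to get wrong. The following is a minimal Python sketch, assuming the standard re module; the sample strings are illustrative only.

```python
import re

# An unescaped dot matches any character, so the pattern also accepts "3x14".
unescaped = re.compile(r"3.14")
escaped = re.compile(r"3\.14")

print(bool(unescaped.search("3x14")))  # True
print(bool(escaped.search("3x14")))    # False

# re.escape() backslash-escapes the metacharacters in arbitrary literal text.
literal = "1+1=2?"
safe_pattern = re.compile(re.escape(literal))
print(bool(safe_pattern.search("Is 1+1=2? Yes.")))  # True
```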

Case-Insensitive Matching

Most regex implementations support flags to modify behavior:

// JavaScript
/hello/i  // matches hello, Hello, HELLO, HeLLo

// Python
import re
pattern = re.compile(r'hello', re.IGNORECASE)

Character Classes

Character classes let you match one character from a set of possibilities. They're created with square brackets.

Basic Character Class

Pattern: [aeiou]
Matches: any vowel (a, e, i, o, u)
Matches in: "hello" → e, o
Matches in: "sky" → (none)

Character Ranges

[a-z]    matches any lowercase letter
[A-Z]    matches any uppercase letter
[0-9]    matches any digit
[a-zA-Z] matches any letter
[a-zA-Z0-9] matches any alphanumeric character

Negated Character Class

Use ^ inside brackets to negate:

[^0-9]   matches any non-digit
[^aeiou] matches any character that is not a lowercase vowel (consonants, but also digits, spaces, and punctuation)
[^a-zA-Z] matches any non-letter
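
A short Python sketch of how plain and negated classes behave, assuming the standard re module. The sample inputs and the explicit consonant class are illustrative choices, not part of the original examples.

```python
import re

# Plain and negated classes each match exactly one character at a time.
vowels = re.findall(r"[aeiou]", "hello")
no_vowels = re.findall(r"[aeiou]", "sky")

# A negated class matches ANY character outside the set, not only letters.
non_vowels = re.findall(r"[^aeiou]", "hi 5!")

# To match only lowercase consonants, list the allowed ranges explicitly.
consonants = re.findall(r"[b-df-hj-np-tv-z]", "hi 5!")

print(vowels)      # ['e', 'o']
print(no_vowels)   # []
print(non_vowels)  # ['h', ' ', '5', '!']
print(consonants)  # ['h']
```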

Shorthand Character Classes

\d  matches any digit [0-9]
\D  matches any non-digit [^0-9]
\w  matches word character [a-zA-Z0-9_]
\W  matches non-word character
\s  matches whitespace (space, tab, newline)
\S  matches non-whitespace
.   matches any character except newline

Practical Example: Phone Number

Pattern: \d\d\d-\d\d\d-\d\d\d\d
Matches: 555-123-4567
Matches: 800-555-1234

Pattern: \(\d{3}\) \d{3}-\d{4}
Matches: (555) 123-4567
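
Both phone formats can be tried against a sample sentence. This is a minimal Python sketch; the input text is made up for illustration, and \d{3} is used as the shorter equivalent of \d\d\d.

```python
import re

text = "Call 555-123-4567 or (800) 555-1234 before 5pm."

# Dashed format: three digits, three digits, four digits.
dashed = re.findall(r"\d{3}-\d{3}-\d{4}", text)

# Parenthesized area code: the parentheses are escaped to match literally.
parenthesized = re.findall(r"\(\d{3}\) \d{3}-\d{4}", text)

print(dashed)         # ['555-123-4567']
print(parenthesized)  # ['(800) 555-1234']
```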

Quantifiers: Repetition Made Easy

Quantifiers specify how many times a character or group should match.

Basic Quantifiers

*    zero or more
+    one or more
?    zero or one (optional)

Pattern: colou?r
Matches: "color", "colour" (u is optional)

Pattern: go+gle
Matches: "gogle", "google", "gooogle", "goooogle"
Does not match: "ggle"

Specific Count

{n}    exactly n times
{n,m}  between n and m times (inclusive)
{n,}   at least n times

Pattern: \d{4}
Matches: "2026", "1999", "1234"

Pattern: \d{3,5}
Matches: "123", "1234", "12345"
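Does not match: "12" (too short)
Note: in "123456" an unanchored search still matches the first five digits, "12345". Use ^\d{3,5}$ to reject longer runs.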
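
Whether a counted quantifier rejects a longer input depends on whether the pattern has to cover the whole string. A small Python sketch, assuming the standard re module, compares searching anywhere with matching the full string:

```python
import re

pattern = re.compile(r"\d{3,5}")

# search() finds the pattern anywhere, so a longer run of digits still matches.
partial = pattern.search("123456").group()
print(partial)  # 12345

# fullmatch() requires the entire string to fit the pattern.
print(pattern.fullmatch("12345") is not None)   # True
print(pattern.fullmatch("123456") is not None)  # False
print(pattern.fullmatch("12") is not None)      # False
```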

Greedy vs. Non-Greedy

By default, quantifiers are greedy—they match as much as possible. Add ? for non-greedy (lazy) matching:

Input: <div>Hello</div><div>World</div>

Greedy: <div>.*</div>
Result: Matches entire string (too much!)

Non-greedy: <div>.*?</div>
Result: Matches "<div>Hello</div>" and "<div>World</div>"
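
The difference is easy to see by running both patterns over the same input. A minimal Python sketch using re.findall:

```python
import re

html = "<div>Hello</div><div>World</div>"

# Greedy: .* runs to the end, then backs off only as far as the LAST </div>.
greedy = re.findall(r"<div>.*</div>", html)

# Lazy: .*? stops at the FIRST </div> it can reach, so each element matches separately.
lazy = re.findall(r"<div>.*?</div>", html)

print(greedy)  # ['<div>Hello</div><div>World</div>']
print(lazy)    # ['<div>Hello</div>', '<div>World</div>']
```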

Anchors and Boundaries

Anchors don't match characters—they match positions in the string.

Start and End Anchors

^    start of string (or line with /m flag)
$    end of string (or line with /m flag)

Pattern: ^Hello
Matches: "Hello World"
Does not match: "Say Hello"

Pattern: World$
Matches: "Hello World"
Does not match: "World News"
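
The effect of the multiline flag can be sketched in Python, where it is spelled re.MULTILINE. The three-line sample string is an assumption made for illustration.

```python
import re

log = "Hello World\nSay Hello\nHello again"

# Without MULTILINE, ^ only matches at the very start of the string.
start_only = re.findall(r"^Hello", log)

# With MULTILINE, ^ also matches right after each newline.
each_line = re.findall(r"^Hello", log, re.MULTILINE)

# Likewise, $ matches at the end of every line only with MULTILINE.
end_of_string = re.findall(r"World$", log)
end_of_line = re.findall(r"World$", log, re.MULTILINE)

print(start_only)     # ['Hello']
print(each_line)      # ['Hello', 'Hello']
print(end_of_string)  # []
print(end_of_line)    # ['World']
```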

Word Boundaries

\b    word boundary
\B    non-word boundary

Pattern: \bcat\b
Matches: "the cat sat" → "cat"
Does not match: "concatenate", "scat"

Pattern: \bcat
Matches: "category", "cat", "caterpillar"

Practical Example: Whole Word Match

// Match only complete word "function"
Pattern: \bfunction\b
Matches: "function", "function()", "my function is"
Does not match: "functions", "functionality", "defunction"
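
The whole-word behaviour can be checked against the sample strings from this section. A minimal Python sketch:

```python
import re

whole_word = re.compile(r"\bfunction\b")

samples = [
    "function", "function()", "my function is",
    "functions", "functionality", "defunction",
]

# Keep only the samples in which "function" appears as a complete word.
matched = [s for s in samples if whole_word.search(s)]
print(matched)  # ['function', 'function()', 'my function is']
```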

Groups and Capturing

Parentheses create groups that capture matched text for later use.

Basic Groups

Pattern: (\d{4})-(\d{2})-(\d{2})
Input: "2026-06-15"
Captures: group(1)="2026", group(2)="06", group(3)="15"
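
In Python, numbered groups are read from the match object and can be reused in a replacement string. The sketch below assumes the standard re module; the surrounding sentence and the day/month/year output order are illustrative.

```python
import re

date_pattern = re.compile(r"(\d{4})-(\d{2})-(\d{2})")

match = date_pattern.search("Due on 2026-06-15.")
year, month, day = match.groups()
print(match.group(0))    # 2026-06-15 (group 0 is the whole match)
print(year, month, day)  # 2026 06 15

# In a replacement string, \1, \2, \3 refer to the captured groups.
reordered = date_pattern.sub(r"\3/\2/\1", "Due on 2026-06-15.")
print(reordered)         # Due on 15/06/2026.
```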

Named Capture Groups

// JavaScript
const pattern = /(?<year>\d{4})-(?<month>\d{2})-(?<day>\d{2})/;
const match = "2026-06-15".match(pattern);
console.log(match.groups.year);   // "2026"
console.log(match.groups.month);  // "06"
console.log(match.groups.day);    // "15"

// Python
import re
pattern = r'(?P<year>\d{4})-(?P<month>\d{2})-(?P<day>\d{2})'
match = re.match(pattern, '2026-06-15')
print(match.group('year'))   # "2026"

Non-Capturing Groups

Use (?:...) for grouping without capturing:

// Group the month alternatives without capturing them
Pattern: (\d{4})-(?:0[1-9]|1[0-2])-(\d{2})
Captures: group(1)=year, group(2)=day (the month group is non-capturing)
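
A quick Python check shows that the non-capturing month group still constrains the match but does not take up a group number. The invalid month 13 is an illustrative test input.

```python
import re

pattern = re.compile(r"(\d{4})-(?:0[1-9]|1[0-2])-(\d{2})")

valid = pattern.search("2026-06-15")
print(valid.groups())  # ('2026', '15') - only two groups are captured

# The month alternatives still apply, so month 13 is rejected.
invalid = pattern.search("2026-13-15")
print(invalid)         # None
```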

Alternation with Groups

Pattern: (cat|dog|bird)
Matches: "cat", "dog", "bird"

Pattern: file\.(jpg|png|gif)
Matches: "file.jpg", "file.png", "file.gif"

Backreferences

Reference captured groups within the pattern:

Pattern: (\w+)\s+\1
Matches: "the the", "word word" (repeated words)
Does not match: "the cat"

Pattern: <(\w+)>.*?<\/\1>
Matches: "<div>content</div>", "<span>text</span>"
Does not match: "<div>content</span>" (mismatched tags)
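
One caveat when searching inside longer text is that (\w+)\s+\1 also matches "the the" within "the then". The Python sketch below adds \b guards as one possible refinement; the guarded pattern and the sample sentences are assumptions for illustration.

```python
import re

plain = re.compile(r"(\w+)\s+\1")
guarded = re.compile(r"\b(\w+)\s+\1\b")

# Both patterns find a genuinely repeated word.
found = guarded.search("Paris in the the spring").group()
print(found)  # the the

# Without \b, the backreference can match the start of a longer word.
print(plain.search("the then") is not None)    # True
print(guarded.search("the then") is not None)  # False

# Collapse any run of one repeated word into a single occurrence.
deduped = re.sub(r"\b(\w+)(\s+\1\b)+", r"\1", "it is is is fine")
print(deduped)  # it is fine
```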

Lookahead and Lookbehind

Lookarounds assert patterns without consuming characters. They're zero-width assertions that check surrounding context.

Positive Lookahead (?=...)

Pattern: \d+(?= dollars)
Matches: "100 dollars" → "100" (only if followed by " dollars")
Does not match: "100 euros"

Negative Lookahead (?!...)

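Pattern: \b\d+\b(?! dollars)
Matches: "100 euros" → "100" (only if NOT followed by " dollars")
Does not match: "100 dollars"
Note: without the \b guards, \d+(?! dollars) backtracks and matches "10" in "100 dollars", because "10" is followed by "0", not " dollars".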

Positive Lookbehind (?<=...)

Pattern: (?<=\$)\d+
Matches: "$100" → "100" (only if preceded by "$")
Does not match: "100 dollars"

Negative Lookbehind (?<!...)

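Pattern: (?<!\$)\b\d+
Matches: "100 dollars" → "100" (only if NOT preceded by "$")
Does not match: "$100"
Note: without \b, (?<!\$)\d+ matches "00" in "$100", because the first "0" is preceded by "1", not "$".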

Real-World Example: Extract Price Values

// Extract numbers NOT already prefixed with currency symbol
Pattern: (?<!\$)\b\d+\.?\d{0,2}\b

Input: "Price is $50, or 50 dollars, or just 100"
Matches: "50" (from "50 dollars"; the "50" in "$50" is skipped), "100"
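
Negative lookarounds are easy to get subtly wrong, because the engine may settle for a shorter piece of a number that passes the assertion. The Python sketch below compares unguarded patterns with \b-guarded variants; the guarded patterns and sample inputs are one possible approach, not the only one.

```python
import re

# Positive lookarounds: the context is checked but not included in the match.
before_dollars = re.findall(r"\d+(?= dollars)", "100 dollars, 200 euros")
after_sign = re.findall(r"(?<=\$)\d+", "$100 and 200")

# Unguarded negative lookarounds can still succeed on part of a number.
naive_ahead = re.findall(r"\d+(?! dollars)", "100 dollars")
naive_behind = re.findall(r"(?<!\$)\d+", "$100")

# Word boundaries force the match to cover the whole number.
guarded_ahead = re.findall(r"\b\d+\b(?! dollars)", "100 dollars")
guarded_behind = re.findall(r"(?<!\$)\b\d+", "$100")

prices = re.findall(
    r"(?<!\$)\b\d+\.?\d{0,2}\b",
    "Price is $50, or 50 dollars, or just 100",
)

print(before_dollars, after_sign)      # ['100'] ['100']
print(naive_ahead, naive_behind)       # ['10'] ['00']
print(guarded_ahead, guarded_behind)   # [] []
print(prices)                          # ['50', '100']
```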

Common Regex Patterns

Pattern	Regex	Use Case
Email	`[\w.-]+@[\w.-]+\.\w+`	Basic email validation
URL	`https?://[\w.-]+(?:/[\w./-]*)?`	HTTP/HTTPS URLs
Phone (US)	`\(?\d{3}\)?[-.\s]?\d{3}[-.\s]?\d{4}`	US phone numbers
Date (ISO)	`\d{4}-\d{2}-\d{2}`	YYYY-MM-DD format
IPv4	`\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}`	IPv4-shaped addresses (does not check that each octet is 0-255)
Hex Color	`#[0-9A-Fa-f]{6}`	6-digit hex color codes
Strong Password	`(?=.*[A-Z])(?=.*[a-z])(?=.*\d).{8,}`	8+ chars, upper, lower, digit
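
Two of these patterns deserve a closer look when used for validation. The password pattern relies on each lookahead scanning ahead with .* before its character class, and the IPv4 pattern checks only the shape of the address. The Python sketch below is one way to apply them; the helper names is_strong and is_ipv4 are hypothetical, and the numeric range check is an addition not expressed by the regex itself.

```python
import re

# Each lookahead scans ahead with .* for one required character type.
STRONG_PASSWORD = re.compile(r"(?=.*[A-Z])(?=.*[a-z])(?=.*\d).{8,}")
IPV4_SHAPE = re.compile(r"\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}")


def is_strong(password: str) -> bool:
    """Hypothetical helper: the whole password must satisfy the pattern."""
    return STRONG_PASSWORD.fullmatch(password) is not None


def is_ipv4(address: str) -> bool:
    """Hypothetical helper: regex checks the shape, code checks each octet."""
    if IPV4_SHAPE.fullmatch(address) is None:
        return False
    return all(int(part) <= 255 for part in address.split("."))


print(is_strong("Passw0rd"))   # True
print(is_strong("password1"))  # False (no uppercase letter)
print(is_ipv4("192.168.1.1"))  # True
print(is_ipv4("999.1.1.1"))    # False (shape is fine, octet is out of range)
```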

Practical Examples

1. Validate Email Address

function isValidEmail(email) {
    const regex = /^[^\s@]+@[^\s@]+\.[^\s@]+$/;
    return regex.test(email);
}

isValidEmail("user@example.com");  // true
isValidEmail("invalid@");          // false
isValidEmail("@example.com");      // false

2. Extract All URLs from Text

const text = "Visit https://example.com or http://test.org for more info";

const urlRegex = /https?:\/\/[\w.-]+(?:\/[\w./-]*)?/g;
const urls = text.match(urlRegex);
// ["https://example.com", "http://test.org"]

3. Replace Numbers with Formatted Version

const text = "The price is 1000 dollars";

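// The g flag replaces every number in the text, not just the first one
const formatted = text.replace(/\d+/g, (num) => {
    // Pass an explicit locale so the separator does not depend on the runtime
    return Number(num).toLocaleString("en-US");
});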
// "The price is 1,000 dollars"

4. Split and Parse CSV-like Data

const data = "John,Doe,35,New York";
const parts = data.split(/,\s*/);
// ["John", "Doe", "35", "New York"]

5. Remove HTML Tags

const html = "<p>Hello, <strong>World</strong>!</p>";
const text = html.replace(/<[^>]+>/g, '');
// "Hello, World!"

Test your regex patterns instantly with the JieBang Regex Tester—perfect for debugging and experimenting with complex patterns.
