Regex Beginner Tutorial: Master Regular Expressions
What is Regex and Why Learn It?
Regular expressions, commonly called regex or regexp, are powerful patterns used to match character combinations in text. They're indispensable tools for any developer, data scientist, or anyone who works with text processing.
From validating email addresses and form inputs to parsing log files and extracting data from documents, regex enables sophisticated text manipulation that would otherwise require complex parsing code. Once you master the basics, you'll find countless opportunities to automate repetitive text tasks.
Most major programming languages support regex, including JavaScript, Python, Java, C#, Ruby, PHP, and Go. Syntax details and feature sets vary between engines (Go's standard regexp package, for example, supports neither backreferences nor lookarounds), but the core concepts covered here carry over.
Basic Pattern Matching
At its simplest, regex matches literal characters exactly. The pattern hello matches the string "hello" anywhere in the text.
Literal Characters
Pattern: cat
Matches: "cat", "catalog", "concatenate", "scattered"
Does not match: "Cat" (case-sensitive by default), "dog"
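The literal-match behavior above can be checked with Python's `re.search`, which looks for the pattern anywhere in the string. This is a minimal sketch; the word list comes from the example and the variable names are illustrative.

```python
import re

words = ["cat", "catalog", "concatenate", "scattered", "Cat", "dog"]

# re.search scans the whole string, so "cat" is also found inside longer words.
matched = [w for w in words if re.search(r"cat", w)]
print(matched)  # ['cat', 'catalog', 'concatenate', 'scattered']
```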
Special Characters (Metacharacters)
Certain characters have special meaning in regex:
. ^ $ * + ? { } [ ] \ | ( )
To match these literally, escape them with a backslash:
Pattern: 3\.14
Matches: "3.14"
Does not match: "3x14", "3,14"
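To see why the escape matters, the sketch below compares the unescaped and escaped forms of the same pattern. It also uses `re.escape`, which builds an escaped pattern from a literal string; the sample list and variable names are assumptions made for illustration.

```python
import re

samples = ["3.14", "3x14", "3,14"]

# Unescaped, the dot is a wildcard, so all three samples match.
unescaped_hits = [s for s in samples if re.fullmatch(r"3.14", s)]

# re.escape backslash-escapes the metacharacters in a literal string.
escaped_pattern = re.escape("3.14")
escaped_hits = [s for s in samples if re.fullmatch(escaped_pattern, s)]

print(unescaped_hits)  # ['3.14', '3x14', '3,14']
print(escaped_hits)    # ['3.14']
```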
Case-Insensitive Matching
Most regex implementations support flags to modify behavior:
// JavaScript
/hello/i // matches hello, Hello, HELLO, HeLLo
// Python
import re
pattern = re.compile(r'hello', re.IGNORECASE)
Character Classes
Character classes let you match one character from a set of possibilities. They're created with square brackets.
Basic Character Class
Pattern: [aeiou]
Matches: any vowel (a, e, i, o, u)
Matches in: "hello" → e, o
Matches in: "sky" → (none)
Character Ranges
[a-z] matches any lowercase letter
[A-Z] matches any uppercase letter
[0-9] matches any digit
[a-zA-Z] matches any letter
[a-zA-Z0-9] matches any alphanumeric character
Negated Character Class
Use ^ inside brackets to negate:
[^0-9] matches any non-digit
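[^aeiou] matches any character that is not a lowercase vowel (consonants, but also uppercase letters, digits, spaces, and punctuation)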
[^a-zA-Z] matches any non-letter
Shorthand Character Classes
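\d matches any digit ([0-9]; some engines, such as Python 3 on text strings, also accept other Unicode digits)
\D matches any non-digit [^0-9]
\w matches word character ([a-zA-Z0-9_]; Unicode-aware engines also accept other letters)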
\W matches non-word character
\s matches whitespace (space, tab, newline)
\S matches non-whitespace
. matches any character except newline
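The classes above can be tried out with `re.findall`, which returns every non-overlapping match. This is a small sketch; the sample strings are made up for illustration.

```python
import re

sample = "Room 42, floor_3!"

vowels = re.findall(r"[aeiou]", "hello")       # ['e', 'o']
digits = re.findall(r"\d", sample)             # ['4', '2', '3']
words = re.findall(r"\w+", sample)             # ['Room', '42', 'floor_3']
spaces = re.findall(r"\s", sample)             # [' ', ' ']

# A negated class matches anything outside the set, not only letters.
non_vowels = re.findall(r"[^aeiou]", "a1-B")   # ['1', '-', 'B']
```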
Practical Example: Phone Number
Pattern: \d\d\d-\d\d\d-\d\d\d\d
Matches: 555-123-4567
Matches: 800-555-1234
Pattern: \(\d{3}\) \d{3}-\d{4}
Matches: (555) 123-4567
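As a sketch of how the two phone patterns might be used for validation, the snippet below joins them with alternation and checks the whole string with `re.fullmatch`. The `is_phone` helper and its name are hypothetical, not part of any library.

```python
import re

# Accepts either 555-123-4567 or (555) 123-4567.
PHONE = re.compile(r"\d{3}-\d{3}-\d{4}|\(\d{3}\) \d{3}-\d{4}")

def is_phone(text: str) -> bool:
    # fullmatch requires the entire string to fit the pattern.
    return PHONE.fullmatch(text) is not None

print(is_phone("555-123-4567"))    # True
print(is_phone("(555) 123-4567"))  # True
print(is_phone("555-123-45678"))   # False
```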
Quantifiers: Repetition Made Easy
Quantifiers specify how many times a character or group should match.
Basic Quantifiers
* zero or more
+ one or more
? zero or one (optional)
Pattern: colou?r
Matches: "color", "colour" (u is optional)
Pattern: go+gle
Matches: "gogle", "google", "gooogle", "goooogle"
Does not match: "ggle"
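The three basic quantifiers can be compared side by side. The sketch below uses `re.fullmatch` so that each candidate has to fit the pattern from start to end; the candidate lists are illustrative.

```python
import re

spellings = ["color", "colour", "colouur"]
candidates = ["ggle", "gogle", "google", "gooogle"]

optional_u = [s for s in spellings if re.fullmatch(r"colou?r", s)]
one_or_more = [s for s in candidates if re.fullmatch(r"go+gle", s)]
zero_or_more = [s for s in candidates if re.fullmatch(r"go*gle", s)]

print(optional_u)    # ['color', 'colour']
print(one_or_more)   # ['gogle', 'google', 'gooogle']
print(zero_or_more)  # ['ggle', 'gogle', 'google', 'gooogle']
```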
Specific Count
{n} exactly n times
{n,m} between n and m times (inclusive)
{n,} at least n times
Pattern: \d{4}
Matches: "2026", "1999", "1234"
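Pattern: \d{3,5}
Matches: "123", "1234", "12345"
Does not match: "12"
Note: in "123456" an unanchored search still finds "12345". Use ^\d{3,5}$ to reject strings longer than five digits.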
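One detail worth checking with counted quantifiers is the difference between finding a run of digits somewhere in a string and requiring the whole string to fit. The sketch below contrasts `re.search` with `re.fullmatch`; the candidate list is illustrative.

```python
import re

candidates = ["12", "123", "12345", "123456"]

# search: is there a run of 3 to 5 digits anywhere in the string?
found = {s: bool(re.search(r"\d{3,5}", s)) for s in candidates}

# fullmatch: is the whole string made of 3 to 5 digits?
exact = {s: bool(re.fullmatch(r"\d{3,5}", s)) for s in candidates}

print(found["123456"])                          # True
print(re.search(r"\d{3,5}", "123456").group())  # 12345
print(exact["123456"])                          # False
```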
Greedy vs. Non-Greedy
By default, quantifiers are greedy—they match as much as possible. Add ? for non-greedy (lazy) matching:
Input: <div>Hello</div><div>World</div>
Greedy: <div>.*</div>
Result: Matches entire string (too much!)
Non-greedy: <div>.*?</div>
Result: Matches "<div>Hello</div>" and "<div>World</div>"
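The same comparison can be reproduced in Python with `re.findall`, using the input string from the example above.

```python
import re

html = "<div>Hello</div><div>World</div>"

# Greedy: .* runs to the last </div> in the string.
greedy = re.findall(r"<div>.*</div>", html)

# Lazy: .*? stops at the first </div> it can reach.
lazy = re.findall(r"<div>.*?</div>", html)

print(greedy)  # ['<div>Hello</div><div>World</div>']
print(lazy)    # ['<div>Hello</div>', '<div>World</div>']
```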
Anchors and Boundaries
Anchors don't match characters—they match positions in the string.
Start and End Anchors
^ start of string (or line with /m flag)
$ end of string (or line with /m flag)
Pattern: ^Hello
Matches: "Hello World"
Does not match: "Say Hello"
Pattern: World$
Matches: "Hello World"
Does not match: "World News"
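The effect of the multiline flag on ^ can be sketched in Python, where the flag is spelled `re.MULTILINE`. The three-line sample text is made up for illustration.

```python
import re

log = "Hello World\nSay Hello\nHello again"

# Without MULTILINE, ^ matches only at the very start of the string.
start_only = re.findall(r"^Hello", log)

# With MULTILINE, ^ also matches right after each newline.
per_line = re.findall(r"^Hello", log, flags=re.MULTILINE)

print(start_only)  # ['Hello']
print(per_line)    # ['Hello', 'Hello']
```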
Word Boundaries
\b word boundary
\B non-word boundary
Pattern: \bcat\b
Matches: "the cat sat" → "cat"
Does not match: "concatenate", "scat"
Pattern: \bcat
Matches: "category", "cat", "caterpillar"
Practical Example: Whole Word Match
// Match only complete word "function"
Pattern: \bfunction\b
Matches: "function", "function()", "my function is"
Does not match: "functions", "functionality", "defunction"
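Word boundaries are easiest to understand by running them over a sentence with near-misses. The sketch below does that with `re.findall`; the sample sentences are illustrative.

```python
import re

text = "the cat sat near a concatenate scat category"

# \b on both sides: only the standalone word.
whole_word = re.findall(r"\bcat\b", text)

# \b on the left only: any word that starts with "cat".
word_start = re.findall(r"\bcat\w*", text)

# "function()" still counts, because "(" is not a word character.
keyword = re.findall(r"\bfunction\b", "function() functions defunction")

print(whole_word)  # ['cat']
print(word_start)  # ['cat', 'category']
print(keyword)     # ['function']
```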
Groups and Capturing
Parentheses create groups that capture matched text for later use.
Basic Groups
Pattern: (\d{4})-(\d{2})-(\d{2})
Input: "2026-06-15"
Captures: group(1)="2026", group(2)="06", group(3)="15"
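In Python, numbered groups are read from the match object. A minimal sketch using the date from the example above:

```python
import re

match = re.fullmatch(r"(\d{4})-(\d{2})-(\d{2})", "2026-06-15")

# groups() returns every captured group as a tuple, in order.
year, month, day = match.groups()

print(match.group(0))    # 2026-06-15 (the whole match)
print(year, month, day)  # 2026 06 15
```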
Named Capture Groups
// JavaScript
const pattern = /(?<year>\d{4})-(?<month>\d{2})-(?<day>\d{2})/;
const match = "2026-06-15".match(pattern);
console.log(match.groups.year); // "2026"
console.log(match.groups.month); // "06"
console.log(match.groups.day); // "15"
// Python
import re
pattern = r'(?P<year>\d{4})-(?P<month>\d{2})-(?P<day>\d{2})'
match = re.match(pattern, '2026-06-15')
print(match.group('year')) # "2026"
Non-Capturing Groups
Use (?:...) for grouping without capturing:
// Capture year and day; the month alternatives are grouped but not captured
Pattern: (\d{4})-(?:0[1-9]|1[0-2])-(\d{2})
Captures: group(1)=year, group(2)=day (the month group is non-capturing)
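The effect of the non-capturing group shows up in the number of groups returned. The sketch below runs the pattern above in Python; the sample dates are illustrative.

```python
import re

pattern = re.compile(r"(\d{4})-(?:0[1-9]|1[0-2])-(\d{2})")

valid = pattern.fullmatch("2026-06-15")
invalid = pattern.fullmatch("2026-13-15")  # month 13 fits neither alternative

# Only two groups come back: the month was grouped but not captured.
print(valid.groups())  # ('2026', '15')
print(invalid)         # None
```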
Alternation with Groups
Pattern: (cat|dog|bird)
Matches: "cat", "dog", "bird"
Pattern: file\.(jpg|png|gif)
Matches: "file.jpg", "file.png", "file.gif"
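A short sketch of alternation inside a group, using the file pattern above. The list of file names is made up for illustration.

```python
import re

names = ["file.jpg", "file.png", "file.gif", "file.bmp", "filexjpg"]

# The group limits the alternation to the extension; the dot is escaped.
image_pattern = re.compile(r"file\.(jpg|png|gif)")

images = [n for n in names if image_pattern.fullmatch(n)]
extensions = [image_pattern.fullmatch(n).group(1) for n in images]

print(images)      # ['file.jpg', 'file.png', 'file.gif']
print(extensions)  # ['jpg', 'png', 'gif']
```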
Backreferences
Reference captured groups within the pattern:
Pattern: (\w+)\s+\1
Matches: "the the", "word word" (repeated words)
Does not match: "the cat"
Pattern: <(\w+)>.*?<\/\1>
Matches: "<div>content</div>", "<span>text</span>"
Does not match: "<div>content</span>" (mismatched tags)
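Backreferences can be exercised in Python as sketched below. The repeated-word pattern is wrapped in \b here, which is an addition to the pattern above: without it, "this is" would match, because "this" ends in "is". The sample sentences are illustrative.

```python
import re

# \1 must repeat exactly what group 1 captured.
doubled = re.findall(r"\b(\w+)\s+\1\b", "it was the the best of of times")

# Without \b the match can start in the middle of a word.
loose = re.search(r"(\w+)\s+\1", "this is fine")
strict = re.search(r"\b(\w+)\s+\1\b", "this is fine")

# The closing tag must reuse the name captured from the opening tag.
tag = re.compile(r"<(\w+)>.*?</\1>")

print(doubled)        # ['the', 'of']
print(loose.group())  # is is
print(strict)         # None
print(bool(tag.fullmatch("<div>content</div>")))   # True
print(bool(tag.fullmatch("<div>content</span>")))  # False
```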
Lookahead and Lookbehind
Lookarounds assert patterns without consuming characters. They're zero-width assertions that check surrounding context.
Positive Lookahead (?=...)
Pattern: \d+(?= dollars)
Matches: "100 dollars" → "100" (only if followed by " dollars")
Does not match: "100 euros"
Negative Lookahead (?!...)
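Pattern: \d+\b(?! dollars)
Matches: "100 euros" → "100" (only if NOT followed by " dollars")
Does not match: "100 dollars"
Note: without the \b, \d+(?! dollars) still matches "10" in "100 dollars", because the engine backtracks one digit and "10" is followed by "0", not " dollars".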
Positive Lookbehind (?<=...)
Pattern: (?<=\$)\d+
Matches: "$100" → "100" (only if preceded by "$")
Does not match: "100 dollars"
Negative Lookbehind (?<!...)
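Pattern: (?<![$\d])\d+
Matches: "100 dollars" → "100" (only if NOT preceded by "$")
Does not match: "$100"
Note: the simpler (?<!\$)\d+ still matches "00" in "$100", because a match can start at the second digit, which is preceded by "1" rather than "$".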
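Negative lookarounds interact with backtracking in a way that is easy to miss, so it is worth running all four kinds against real input. The sketch below does that in Python. The "strict" variants are one possible way to make the assertion apply to the whole number and are not the only fix.

```python
import re

# Positive lookarounds: the context must be present but is not part of the match.
before_dollars = re.findall(r"\d+(?= dollars)", "100 dollars, 100 euros")
after_sign = re.findall(r"(?<=\$)\d+", "$100 and 250")

# Negative lookarounds can succeed on just part of a number.
partial_ahead = re.findall(r"\d+(?! dollars)", "100 dollars")
partial_behind = re.findall(r"(?<!\$)\d+", "$100")

# Stricter variants: make the assertion apply to the whole number.
strict_ahead = re.findall(r"\d+\b(?! dollars)", "100 dollars, 100 euros")
strict_behind = re.findall(r"(?<![$\d])\d+", "$100 and 250")

print(before_dollars, after_sign)     # ['100'] ['100']
print(partial_ahead, partial_behind)  # ['10'] ['00']
print(strict_ahead, strict_behind)    # ['100'] ['250']
```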
Real-World Example: Extract Price Values
// Extract numbers NOT already prefixed with currency symbol
Pattern: (?<!\$)\b\d+\.?\d{0,2}\b
Input: "Price is $50, or 50 dollars, or just 100"
Matches: "50" (from "50 dollars"; the "$50" is skipped), "100"
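The price example can be run directly in Python. In this sketch the \b after the lookbehind is what stops a match from starting at the "0" of "$50".

```python
import re

text = "Price is $50, or 50 dollars, or just 100"

# Numbers that are not directly preceded by a dollar sign.
unprefixed = re.findall(r"(?<!\$)\b\d+\.?\d{0,2}\b", text)

print(unprefixed)  # ['50', '100']
```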
Common Regex Patterns
| Pattern | Regex | Use Case |
|---|---|---|
| Email | [\w.-]+@[\w.-]+\.\w+ | Basic email validation |
| URL | https?://[\w.-]+(?:/[\w./-]*)? | HTTP/HTTPS URLs |
| Phone (US) | \(?\d{3}\)?[-.\s]?\d{3}[-.\s]?\d{4} | US phone numbers |
| Date (ISO) | \d{4}-\d{2}-\d{2} | YYYY-MM-DD format |
| IPv4 | \d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3} | IPv4-shaped addresses (does not check that each octet is 0-255) |
| Hex Color | #[0-9A-Fa-f]{6} | Hex color codes |
| Strong Password | (?=.*[A-Z])(?=.*[a-z])(?=.*\d).{8,} | 8+ chars, upper, lower, digit |
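Patterns like these check the shape of the input and often need a second step. The sketch below pairs the IPv4 regex with a numeric range check and runs the password pattern with `re.fullmatch`. The helper names `is_ipv4` and `is_strong_password` are hypothetical, and the `re.ASCII` flag is an added assumption that restricts \d to 0-9.

```python
import re

IPV4_SHAPE = re.compile(r"\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}", re.ASCII)
STRONG = re.compile(r"(?=.*[A-Z])(?=.*[a-z])(?=.*\d).{8,}")

def is_ipv4(text: str) -> bool:
    # The regex checks the shape; the numeric test rejects octets above 255.
    if not IPV4_SHAPE.fullmatch(text):
        return False
    return all(int(octet) <= 255 for octet in text.split("."))

def is_strong_password(text: str) -> bool:
    # Each lookahead checks one requirement; .{8,} enforces the length.
    return STRONG.fullmatch(text) is not None

print(is_ipv4("192.168.1.1"))           # True
print(is_ipv4("999.1.1.1"))             # False
print(is_strong_password("Passw0rdX"))  # True
print(is_strong_password("password1"))  # False
```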
Practical Examples
1. Validate Email Address
function isValidEmail(email) {
  // Non-space, non-@ characters, then @, a domain, a dot, and a suffix
  const regex = /^[^\s@]+@[^\s@]+\.[^\s@]+$/;
  return regex.test(email);
}
isValidEmail("user@example.com"); // true
isValidEmail("invalid@"); // false
isValidEmail("@example.com"); // false
2. Extract All URLs from Text
const text = "Visit https://example.com or http://test.org for more info";
const urlRegex = /https?:\/\/[\w.-]+(?:\/[\w./-]*)?/g;
const urls = text.match(urlRegex);
// ["https://example.com", "http://test.org"]
3. Replace Numbers with Formatted Version
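const text = "The price is 1000 dollars";
// The g flag formats every number in the string, not only the first
const formatted = text.replace(/\d+/g, (num) => {
  // Pass an explicit locale so the separator does not depend on the machine
  return Number(num).toLocaleString("en-US");
});
// "The price is 1,000 dollars"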
4. Split and Parse CSV-like Data
const data = "John,Doe,35,New York";
const parts = data.split(/,\s*/);
// ["John", "Doe", "35", "New York"]
5. Remove HTML Tags
const html = "<p>Hello, <strong>World</strong>!</p>";
const text = html.replace(/<[^>]+>/g, '');
// "Hello, World!"
Test your regex patterns instantly with the JieBang Regex Tester—perfect for debugging and experimenting with complex patterns.