Title: Match URLs
Slug: match_urls
Summary: Match URLs
Date: 2016-05-01 12:00
Category: Regex
Tags: Basics
Authors: Chris Albon

Source: StackOverflow

Preliminaries


In [1]:
# Load regex package
import re

Create some text


In [14]:
# Create a variable containing a text string
text = 'My blog is http://www.chrisalbon.com and not http://chrisalbon.com'

Apply regex


In [18]:
# Find any URLs (each match is returned as a tuple of scheme, host, and path)
re.findall(r'(http|ftp|https):\/\/([\w\-_]+(?:(?:\.[\w\-_]+)+))([\w\-\.,@?^=%&:/~\+#]*[\w\-\@?^=%&/~\+#])?', text)


Out[18]:
[('http', 'www.chrisalbon.com', ''), ('http', 'chrisalbon.com', '')]
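
Because the pattern contains three capturing groups, re.findall returns one tuple per match (scheme, host, path) rather than the whole URL. The following is a minimal sketch, assuming the same pattern and text as above, of one way to recover the full matched strings with re.finditer. The names url_pattern and urls are illustrative and not part of the original post.

```python
# Load regex package
import re

# Same text string as above
text = 'My blog is http://www.chrisalbon.com and not http://chrisalbon.com'

# Same pattern as above, compiled once so it can be reused
# (url_pattern is a hypothetical name chosen for this sketch)
url_pattern = re.compile(
    r'(http|ftp|https):\/\/([\w\-_]+(?:(?:\.[\w\-_]+)+))'
    r'([\w\-\.,@?^=%&:/~\+#]*[\w\-\@?^=%&/~\+#])?'
)

# group(0) is the entire match, regardless of the capturing groups
urls = [match.group(0) for match in url_pattern.finditer(text)]

print(urls)
```

An alternative would be to turn every group into a non-capturing one with (?:...), in which case re.findall returns the full strings directly.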