Title: Match URLs
Slug: match_urls
Summary: Match URLs in a text string using a regular expression in Python
Date: 2016-05-01 12:00
Category: Regex
Tags: Basics
Authors: Chris Albon

Source: StackOverflow

Preliminaries



In [1]:

    
# Load regex package
import re

Create some text



In [14]:

    
# Create a variable containing a text string
text = 'My blog is http://www.chrisalbon.com and not http://chrisalbon.com'

Apply regex



In [18]:

    
# Find any URLs, capturing the scheme, the host, and the optional path/query
re.findall(r'(http|ftp|https):\/\/([\w\-_]+(?:(?:\.[\w\-_]+)+))([\w\-\.,@?^=%&:/~\+#]*[\w\-\@?^=%&/~\+#])?', text)









Out[18]:

[('http', 'www.chrisalbon.com', ''), ('http', 'chrisalbon.com', '')]
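
Because the pattern contains three capturing groups, `re.findall` returns a tuple of (scheme, host, path) for each match instead of the full URL. If the whole URL is wanted, one option is to iterate with `re.finditer` and read `group(0)`, which holds the entire match. The sketch below reuses the same pattern and text; the names `URL_PATTERN` and `urls` are illustrative and not part of the original post.

```python
import re

# Same pattern as above, compiled once so it can be reused (URL_PATTERN is an illustrative name)
URL_PATTERN = re.compile(
    r'(http|ftp|https):\/\/([\w\-_]+(?:(?:\.[\w\-_]+)+))([\w\-\.,@?^=%&:/~\+#]*[\w\-\@?^=%&/~\+#])?'
)

text = 'My blog is http://www.chrisalbon.com and not http://chrisalbon.com'

# group(0) is the entire matched URL rather than the tuple of captured groups
urls = [match.group(0) for match in URL_PATTERN.finditer(text)]
print(urls)
# → ['http://www.chrisalbon.com', 'http://chrisalbon.com']
```

Individual parts remain available on each match object: `match.group(1)` is the scheme, `match.group(2)` the host, and `match.group(3)` the path, which is `None` when the URL has no path.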