


Diving headlong into data sets is part of the job for anyone working in data science. Often, this means number-crunching, but what do we do when our data set is primarily text-based? We can use regular expressions. In this tutorial, we're going to take a closer look at how to use regular expressions (regex) in Python.

Regular expressions are essentially text patterns that you can use to automate searching through and replacing elements within strings of text. This can make cleaning and working with text-based data sets much easier, saving you the trouble of having to search through mountains of text by hand.

Regular expressions can be used across a variety of programming languages, and they've been around for a very long time! In this tutorial, though, we'll be learning about regular expressions in Python, so basic familiarity with key Python concepts like if-else statements, while and for loops, etc., is required. (If you need a refresher on any of this stuff, our introductory Python courses cover all of the relevant topics interactively, right in your browser!)

By the end of the tutorial, you'll be familiar with how Python regex works, and be able to use the basic patterns and functions in Python's regex module, re, to analyze text strings. You'll also get an introduction to how regex can be used in concert with pandas to work with large text corpuses (a corpus is a data set of text).

(To work through the pandas section of this tutorial, you will need to have the pandas library installed. The easiest way to do this is to download Anaconda and work through this tutorial in a Jupyter notebook. For other options, check out the pandas installation guide.)
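
To make the idea of "searching through and replacing elements within strings of text" concrete, here is a minimal sketch using Python's built-in re module. The sample sentence and the patterns are invented for illustration and are not part of the tutorial's data set; the email pattern in particular is deliberately simplistic.

```python
import re

# Hypothetical sample text, made up for this illustration.
text = "Contact alice@example.com or bob@example.org before 2019-03-01."

# Search: find the first substring shaped like a date (YYYY-MM-DD).
match = re.search(r"\d{4}-\d{2}-\d{2}", text)
first_date = match.group() if match else None

# Find all: collect every substring that looks like a simple email address.
# This pattern is an assumption for the demo and will not match every valid address.
email_pattern = r"[\w.]+@[\w.]+\.\w+"
emails = re.findall(email_pattern, text)

# Replace: mask each email-like substring with a placeholder.
masked = re.sub(email_pattern, "<email>", text)

print(first_date)
print(emails)
print(masked)
```

The same three operations (search, find all, replace) are the ones that would otherwise mean scanning the text by hand.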
