Join me on my ML Journey from novice to expert! with some nomadic meanderings along the way.

Datacamp, Machine Learning, Online Courses

How to Learn SQL

Here’s a comprehensive guide on how to go from a SQL beginner to a seasoned pro.

SQL overview

    • What is SQL?
    • Benefits of SQL
    • SQL vs. Excel
    • Who uses SQL?
    • SQL dialects

Assess

    • Take an online SQL skill assessment

Learn

    • Learn basic SQL syntax
    • Understand SQL query structure
    • Learn more advanced SQL syntax
    • Get started with SQL—without even needing to set up a database

Practice

    • Guided vs. unguided practice
    • SQL guidelines

Apply

    • Become production-ready
    • Import real-world data
    • Prepare for a SQL job interview

SQL overview

What is SQL?

SQL, also known as Structured Query Language, is a data programming language that was originally developed in the 1970s to isolate and return data from a database, but it’s come a long way since then! SQL (pronounced either as “sequel” or as “ess-que-ell”) has branched into a variety of dialects that all share basic syntax.

Many organizations use a relational database to store and process large amounts of data. With SQL, you can “query,” or ask questions of, the data in a relational database. SQL is the programming language used to communicate with these databases, and each database uses its own SQL dialect. So, the SQL dialect you learn depends on which database your company uses. Knowing how to work with SQL and relational databases is crucial for anyone aspiring to be a data analyst, data engineer, or data scientist, and helpful in many other roles such as web development or marketing.

Benefits of SQL

SQL has many benefits that contribute to its popularity:

  • It’s the lingua franca of everything related to data.
  • It’s easy to understand semantically.
  • It can be used to directly access large amounts of data without opening other applications.
  • Data analysis done in SQL is easy to audit and replicate, especially when compared to spreadsheet tools.

Additional resources:

  • Structured Query Language – Importance of Learning SQL | CodingSight
  • Advantages and Disadvantages of SQL | GeeksforGeeks

SQL vs. Excel

It’s important to note that SQL and Excel aren’t competitors—in fact, many business analysts link SQL in Excel to extract raw data. But SQL can handle big data much more easily than Excel. The two are very similar in a lot of ways—but SQL often requires less steps to accomplish the same tasks.

Similarities between SQL and Excel:

  • Both have the ability to create and organize data tables.
  • Both have the ability to filter your data based on certain criteria to only view what you need.
  • Both have functions to clean up messy data and prepare it for analysis.
  • Both can join multiple tables together. SQL uses JOIN statements to achieve this, whereas in Excel, you would use a lookup function such as VLOOKUP or INDEX/MATCH.

Here are some of the major differences between SQL and Excel:

SQL
Excel
Can handle many millions of rows
Can only handle a little over a million rows
Can handle calculations for large datasets with ease
Can lag and run slowly when performing calculations over large datasets
Is just a query language, not a visualization tool—extracted data must be exported to another program to create visualizations
Can be used to create charts, graphs, and plots

Additional resources:

  • Excel to SQL: Why You Should Make the Switch | Codecademy
  • How to Switch from Excel to SQL | Mode Analytics

Who uses SQL?

Anyone who works with data, particularly data stored in relational databases, can benefit from learning SQL. Common roles include:

  • Data scientist
  • Data engineer
  • Data analyst
  • Database administrator
  • Quality assurance tester
  • Web developer

SQL dialects

Although all SQL languages share a basic structure, some of the specific commands and styles can differ (kind of like how different regions of the U.S. can refer to a carbonated sugary beverage as either a “soda,” a “pop,” or a “coke”). With so many SQL dialects, it can be hard to know where to start. Knowing the different dialects and their common uses can help guide how you begin your SQL learning journey. Here’s a breakdown of some of the more common SQL dialects.

PostgreSQL

Also known just as “Postgres,” PostgreSQL is a free, open-source object-relational database management system. It’s often considered a good starting point to learn SQL, as it is easy to learn and access, and the closest to standard SQL syntax, which means it is easily adapted to other dialects. If you know you want to learn SQL, but don’t know which database you will end up using, we recommend starting with PostgreSQL.

MySQL

Another open-source and free dialect, MySQL has slight syntax differences than Postgres, and is commonly used with web applications and social media sites such as Twitter, Facebook, and YouTube. This is a good dialect to begin with if you want to become a web analyst.

SQLite

Another free, open-source SQL system. As the name implies, SQLite is intended to be lightweight and easy to install and use. SQLite is one of the most widely used database systems in the world, and another good choice to start with if you plan to go into web development.

Microsoft SQL Server

Like many Microsoft products, MS SQL Server has been adapted to use a Windows based filing system. This is a popular choice for many large businesses and industries, including computer software companies, finance, information system technologies, and healthcare.

Oracle SQL

Used by over 97% of Fortune 1000 companies, Oracle is both a product and a dialect, developed by Oracle Corporation and used with their Oracle SQL Server. This is another popular choice for many businesses.

Additional resources:

  • SQLite vs MySQL vs PostgreSQL: A Comparison Of Relational Database Management Systems | DigitalOcean
  • What Is a SQL Dialect, and Which One Should I Learn? | LearnSQL.com
  • The Many Flavors of SQL | Training-NYC
  • SQL Dialects Reference | Wikimedia

Assess

The first step in DataCamp’s learning methodology is to take an honest assessment of where your skills are right now.

Take an online SQL skill assessment

If you have some prior experience with SQL, a great starting point to gauge your competence is by taking an online assessment. Knowing how your skills compare can help you figure out the next steps in your SQL learning journey. Even if you are starting entirely from scratch, taking an assessment can help you figure out what you need to learn.

Additional resources:

  • Data Analysis in SQL Skill Assessment | DataCamp
  • SQL Quiz | W3Schools
  • SQL Online Test | TestDome

Learn

Once you assess your current skills, then you can focus your learning on where you want to improve. Learning SQL requires understanding how queries are executed and how to write queries with the proper syntax.

Learn basic SQL syntax

You’ll need to understand the basics of SQL syntax, including the functions SELECT, FROM, WHERE, GROUP BY, HAVING, ORDER BY, and LIMIT. The different SQL dialects use very similar syntax, with some notable exceptions.

Additional resources:

  • What is SQL? A Beginner’s Guide to the SQL Language
  • SQL Syntax | W3Schools
  • SQL Basics – Difference between WHERE, GROUP BY and HAVING clause | SQL with Manoj
  • Getting Started in SQL Live Training | DataCamp
  • Tutorial for Data Analysis | Mode

Understand SQL query structure

With SQL, the order of written code differs from the order of execution. So you’ll need to understand proper query structure and the order in which queries must be written. Queries are processed differently from how they’re written.

SQL order of execution

What the query looks like
How it’s executed
Why SQL works this way
1.SELECT
1.FROM
SQL starts with which table your query is taking data from.
2.FROM
2.WHERE
This is how SQL filters on rows.
3.WHERE
3.GROUP BY
This is where your SQL query checks if you have an aggregation.
4.GROUP BY
4.HAVING
HAVING requires a GROUP BY statement.
5.HAVING
5.SELECT
Only after all these calculations have been made will SQL “SELECT” which columns you want to see returned.
6.ORDER BY
6.ORDER BY
This sorts the data returned.
7.LIMIT
7.LIMIT
Lastly, you can limit the number of rows returned.

In terms of execution, a SQL query starts with the FROM statement, as you have to start with which table your query will be taking the data from. It will then filter on the rows using the WHERE statement, where you can choose from a variety of functions to narrow down and isolate your data.

If you are aggregating your data in any way, the next statement will be GROUP BY, (in many instances, you “group” your quantitative columns “by” a categorical one. After that, if you need to filter on your aggregated data, you would use the HAVING statement.

Only after all these calculations have been made will SQL “SELECT” which columns you want to see returned, and if you want them modified or aggregated in any way.

The last statements to execute will be ORDER BY, which will sort the data returned either in ascending or descending order, and LIMIT, which limits the number of rows returned.

TL;DR: Your query will always need a SELECT and a FROM statement (to identify which columns you want returned from which table)—the others are optional.

Additional resources:

  • SQL Query Order of Execution | Sisense
  • A Beginner’s Guide to the True Order of SQL Operations | jOOQ

Learn more advanced syntax

Basic SQL concepts can be picked up pretty quickly, but mastery of complicated concepts may take weeks to develop. In some roles, basic SQL syntax may be all you need to get the job done—but in other roles, you may need to dive deeper!

Aggregations

Similar to Excel, you can use SQL to aggregate your data by sum, average, maximum, minimum, and count. You can do these in the SELECT statement to return a simple aggregation, or use a GROUP BY statement to create an Excel “pivot table” out of your data.

For example, using the ‘FRUIT’ table below:

Product Quantity
Apple 3
Banana 5
Pear 2

You could use aggregate functions in the SELECT statement like this:

Query
Output
SELECT sum(quantity) FROM fruit
The sum of all values in the Quantity column – 10
SELECT avg(quantity) FROM fruit
The average of all values in the Quantity column – 3.33
SELECT max(quantity FROM fruit
The highest value in the Quantity column – 5
SELECT min(quantity) FROM fruit
The lowest value in the Quantity column – 2
SELECT count(quantity) FROM fruit
The count of all values in the Quantity column – 3

Additional resources:

  • SQL | String Functions | GeeksforGeeks
  • SQL Aggregate Functions | Mode

Unions and joins

A lot of the time, the data you need will not be stored in one giant table but will be spread out across multiple tables. Using SQL, you can merge these tables together to combine all the data you need in one place. Since UNION and JOIN deal with tables, they go in the FROM statement.

A UNION will basically stack two tables with identical columns on top of each other. This is useful for sales transaction data that may be broken into different tables by month, quarter, or year.

There are several kinds of commonly used JOINS—Inner, Outer, Left, and Exception. To put it simply, these will return different combinations of rows from the tables being joined together.

Additional resources:

  • SQL Joins | Mode
  • Inner vs. Outer Joins | Diffen

Case statements

A SQL CASE statement is similar to an Excel IF() function—if the data in a column matches a set criteria, then return “this”. This can be useful when reclassifying quantitative data as categorical, such as when breaking down a set of values into “High”, “Medium”, or “Low”. CASE statements go in the SELECT statement.

See also:

  • Case statements | Mode

Subqueries

Subqueries may be used in several different ways, but are usually found in either the FROM or the WHERE statement. They create a small, temporary table from your data that you can then use as a new table to query from in a different way.

See also:

  • Subqueries | Essential SQL

Dates and times

As with many other data languages, dealing with dates and times can be tricky. Sometimes dates act like “strings”, or pieces of information. Sometimes they can be treated as actual dates, with SQL syntax that can break the data down into units such as months or years. Since there are many different ways to do this, these functions are considered more advanced SQL syntax.

See also:

  • Date and Time Functions | SQLite

Get started with SQL—without even needing to set up a database

You can learn how to use SQL in production even before accessing a database. Here are some hands-on courses that introduce SQL concepts.

  • Learn SQL | Codecademy
  • Introduction to SQL | DataCamp
  • Intro to SQL: Querying and managing data | Khan Academy
  • Data Analyst with SQL Server career track | DataCamp
  • SQL Fundamentals skill track | DataCamp
  • Database Design | DataCamp

Practice

Practice makes perfect! You’ll want to find ways to practice in a safe SQL environment to determine if you’ve retained what you’ve learned before going out to apply your skills in the real world.

Guided vs. unguided practice

One of the best ways to learn any coding language is to find a “sandbox” to play around in. You’ll be able to test out your code, see if it runs, modify it, run it again, over and over until you feel comfortable with the coding principles and syntax. To start out, you may want a little more guidance—try finding a project with clear instructions and solution code to check your work.

Guided resources:

  • Solve SQL | HackerRank
  • Analyze International Debt Statistics | DataCamp

Unguided:

  • CodingGround
  • SQL Fiddle

Additional resources:

  • 12 Ways to Practice SQL Online | LearnSQL
  • The SQL Murder Mystery | Knightlab

SQL guidelines

When you’re working in the real world, business questions won’t likely translate neatly into SQL queries. That’s why you need to carefully assess what’s needed before even writing a single line of code. Here are some good guidelines to follow when practicing SQL.

Rewrite your business question as a comment first

Use code commenting to “brainstorm” your query first—what is the question you’re trying to answer? What pieces and elements do you need to focus on? This will guide your SQL query.

There are two kinds of commenting, inline and multi-line.

–Inline is just two hyphens and it lives on one line of code—good for short, quick notes

/*Multi-line commenting looks like this.

It’s good for creating banners that explain a query’s purpose, author, etc*/

Sketch out your query

Before you even start typing, you should figure out what pieces you’ll need for your query. You will always need a SELECT and FROM statement—but what else might you need? Do you need to filter the data using a WHERE statement? Will you need to use aggregate functions?

Understand the different ways to achieve your objective

When possible, try to have multiple ways of doing the same thing. That way, you can quality check yourself and validate if you can get the same results. For example, you can find the highest value in a column either by using the MAX() function in the STATEMENT, or by using ORDER BY to sort the data and then LIMIT to limit so only one row is returned.

Test your query

Build your query one line at a time and run it frequently to check that it works. Adding pieces one at a time also helps prevent you from having to go back through your code line by line in case it doesn’t work.

Profile your table

In order to start performing queries, you first must understand the data elements within it. Start out by viewing the entire database; then, a few columns at a time. Find the range of values, both categorical and quantitative, to get an idea of what information is contained in the table.

Keep a list of assumptions

When writing any kind of data code, you should always make a list of assumptions to easily track your limitations and dependencies.

Keep a data dictionary

Compile a central repository for the data elements you’re using, containing descriptions of the different tables and fields used within the analysis. This can include the different data types in each column (character, integer, money, date, etc), and a brief description of the column itself.

Always allocate time to perform data quality checks

The analysis doesn’t end when you’ve finished writing your query. You still have to understand your results and perform quality checks to make sure your analysis is correct. Try different ways of achieving the same result to see if you get something different—this is another time when inline commenting comes in handy!

Additional resources:

  • SQLZOO
  • Learn SQL | Codecademy
  • Data Analysis in SQL Live Training | DataCamp
  • String Manipulation in SQL Live Training | DataCamp

Apply

Now you’re ready to find some use cases for your new SQL skills. Become real-world application-ready by trying out real world scenarios and preparing to interview.

Become production-ready

To apply your SQL skills to real business scenarios, you’ll need access to a database.

Get access to your company’s preferred database

If your company already uses a database, you should go through the proper channels to install it on your system. Your SQL environment needs to be set up correctly in order for you to access and query company data. You (or your company) may be more comfortable working in a duplicate database that is not connected to live data at first—and that’s ok!

Install a free database

The best way to master SQL is by practicing in your own environment. If you don’t yet have a database, there are many free database vendors to choose from.

Options include:

  • PostgreSQL (PGadmin)
  • SQLite
  • MySQL
  • Microsoft SQL
  • Teradata Database
  • MongoDB

Import real-world data

Practice on your own company data

There’s no better place to start than on datasets that you already have. Find a business use case relevant to your role, like segmenting customers or merging datasets. You can use your own database, or get permission to download the data as a .csv and upload to your own SQL database.

Import open-source datasets

Public datasets are widely available for data scientists looking to explore and build models.

  • Kaggle
  • FiveThirtyEight
  • Data.gov
  • freeCodeCamp Open Data
  • Google Public Data Explorer
  • World Bank Open Data
  • TinyLetter—Subscribe for a weekly newsletter of datasets

Prepare for a SQL job interview

If you’re preparing for a job interview or cramming for an exam, you should review key SQL concepts. This means knowing the basic syntax and structure of SQL, as well as data cleaning and manipulation. In many cases, you will only need to learn how to query and extract data from a database, but some employers may ask you to create tables. Focus on practicing real world business scenarios, such as profiling a database, cleaning messy data, and creating new tables.

Additional resources:

  • SQL Interview Questions | InterviewBit
  • 41 Essential SQL Interview Questions | Toptal

Content retrieved from:
https://www.datacamp.com/community/blog/how-to-learn-sql
.

Leave a Reply