SQL: Structured Query Language
SQL Basics
All good relational databases speak SQL - Structured Query Language. This is how you tell the database to do things like create and destroy tables, insert and update data and so on. There aren't that many SQL commands, as there's a limited number of things you can do with data. There's an ANSI standard covering the SQL language.
| This tutorial covers the basic commands and concepts. Refer to the MySQL Reference Manual for the complete syntax of any feature. |
Here are the basic SQL commands:
The first three are data definition language (DDL) commands - for changing the database structure...
- create: creates (databases), tables, indexes
- alter: change the structure of tables and indexes. Alter database settings (e.g. turn logging on.)
- drop: delete (databases), tables, indexes
The following data manipulation language (DML) commands change data.
- truncate: delete all rows from a table
- insert: insert whole rows of data
- update: change a value in a column in one or more rows.
- delete: delete rows
- select: get data from the database.
The data manipulation commands have the following basic syntax:
- insert into table (list of columns) values (list of values);
e.g. insert into customers (customer_id, name, email) values (4, "Fred B", "fredb@mymail.com");
- update table set column = value where (list of conditions);
e.g. update customers set name = "Fred Bloggs" where customer_id = 1;
- delete from table where (list of conditions);
e.g. delete from customers where name = "Fred Bloggs";
- select (list of columns you want) from (list of tables) where (list of conditions) group by (list of conditions);
e.g. select name from customers where customer_id = 1;
There are a few commands dedicated to setting up user access permissions, called privileges. You won't need to use this feature to program most websites. Only if you're setting up a company database that lots of people will access directly (i.e. not through web pages that you've written yourself) will you need the set up the privilege system.
The syntax and use of most of the commands is simple. 'select' is the most interesting command as you use it to retrieve data. A query always starts with 'select'...
Here's some examples of queries you could write for our simple 2 table example (orders and customers)...
If you want to see all the orders you'd got, you'd write:
select * from orders;
To see the everything about orders that 'Bob' has made:
select * from orders where customer_id=1;
To see how much you've made in total:
select sum(price) from orders;
To see how much you made in March
select sum(price) from orders
where timestamp between '01-03-01' and '31-03-01';
To just see the number of orders in March:
select count(*) from orders
where timestamp between '01-03-01' and '31-03-01';
To see the products bought by 'Bob' in March:
select product_id from orders
where timestamp between '01-03-01' and '31-03-01'
and customer_id=1;
To see which products you have sold:
select distinct(product_id) from orders;
The distinct keyword means you only want unique values returned. In the above query, we want to see each product_id we've sold just once. If there were no 'distinct' keyword, you'd get one row returned for every sale you made. i.e. you'd get the same product_id returned lots of times.
Group Commands
To see how many of each product we sold, and how much money it made:
select product_id, count(product_id), sum(price)
from orders
group by product_id;
This leads us onto the concept of aggregation:
The following group commands which determine which order the data is brought back in.
- group by: group the data together as long as this value stays the same.
(e.g. sum the sales for a particular salesperson, or year, or area).
- order by: order the rows returned by the values in a column / columns.
(e.g. order by time, number, alphabetically). ASC and DESC define whether to return the data in ascending or descending order.
- having: only include groups of data that meet these criteria (e.g. more than 10 items in the group).
These come after the where clause of the select command. They allow sets of data to be ordered, and grouped together or aggregated. For example, you would use these commands for a query which gets the total sales, per month for the whole year:
select sum(price) from orders
where date between '01-01-2001' and '31-12-2001'
group by month(date);
Note, 'sum' is a function which adds lots of values together, 'month' is a function which returns just the month in the date supplied.
Or, to find all customers who have made more than 6 orders with you.
select customer_id from customers c, orders o
where c.customer_id = o.customer_id
group by o.customer_id
having count(*) > 6;
Other aggregate functions are:
- avg: Average (Mean)
- min: Minimum
- max: Maximum
- std: Standard Deviation.
Table Joins
Table joins are how relational databases attain their flexibility and power. You join tables when the results you want need information out of two or more tables.
If you wanted to know the email addresses of everyone who bought product '1', you couldn't get it straight from 'orders' as email address isn't in the orders table, it's in the customers table...
Well, we have a product_id in the orders table; that also contains the id of the customer who bought that product. The 'customers' table contains the id and the email address of the customer...
So, we'd issue a query with a table join in it to get the data we want:
select email from customers c, orders o
where c.customer_id = o.customer_id and o.product_id=1;
When you need data from more than one table, you have to join them. You do this by telling the database which are the common columns. In the above example it's ... where c.customer_id = o.customer_id... - the 'customer_id' column in both tables.
Note, we're using table aliases in the SQL - the 'c' & 'c.', & 'o' & 'o.' - here to clarify exactly which table the columns in the query are referring to.)
You use an outer join if you want the database to fill in blank rows in a query.
If we want to see a list of all of the customers with the number of orders they made this month, then this query will not give us that:
select customer_id, count(product_id)
from orders
where date between '01-01-2001' and '31-01-2001'
group by product_id;
What it gives us is a list of the customers who bought something, but not the ones who didn't...
What we need is to outer join the orders table to the customers table:
select c.customer_id, count(o.product_id)
from orders o LEFT JOIN customers c
ON o.customer_id = c.customer_id
where o.date between '01-01-2001' and '31-01-2001'
group by o.customer_id;
Database Design
During the design of a database, you have to decide how to store your data. Each table should ideally contain data that's never repeated elsewhere. This is a process called normalisation.
Another way of looking at this is to say that each individual entity has it's own table.
You could also say that the data in a table must be directly related to the primary key column. Anything that's sometimes not related to the key shouldn't be in there. This is why, in the examples, we don't have all the customer details in the orders table, but in a separate customers table.
It is appropriate to have the customer details in the orders table if we only ever sell one product, once to each customer. If you want to have a website where customers only ever pay once, this is perfectly adequate.
If you also need to track things other than orders for your customers. (i.e. incoming communications such as complaints, outgoing communications such as marketing promotions) then it's much simpler if you have a 'customers' table with all the details (address, company, age etc...). Then you just refer to the customer's unique key (such as an email address or even a unique customer id) in the related tables.
The power and flexibility comes from the normalisation of the data. As the data is separated into logical entities, it takes less space because there's less redundancy, and you can perform arbitrarily complex queries on it... You don't have to know all the questions you're going to ask before you design it, you just organise it in the most logical way so when you want to interrogate the database - it's easy.
For example, consider the following database:
Orders
| Column Name |
Type |
| order_id |
integer |
| timestamp |
date |
| product_id |
integer |
| customer_id |
integer |
| product_id |
integer |
|
Customers
| Column Name |
Type |
| customer_id |
integer |
| name |
string |
| addr1 |
string |
| addr2 |
string |
| etc... |
... |
|
Products
| Column Name |
Type |
| product_id |
integer |
| name |
string |
| description |
string |
| price |
decimal |
|
Complaints
| Column Name |
Type |
| complaint_id |
integer |
| customer_id |
integer |
| product_id |
integer |
| status |
character |
| complaint |
text |
| date |
date |
|
To see the products which people have complained about, and the complaint:
select p.product_id, p.name, c.complaint
from products p, complaints c
where c.product_id = p.product_id;
This query would give an output something like:
| Product_id |
Name |
Complaint |
| 1 |
M. Jones. |
product was faulty... |
| 5 |
Mrs J. Fox |
I ordered it in pink, but... |
To see the product details for everything each customer has bought:
select c.name, p.product_id, p.name
from products p, customers c
where c.product_id = p.product_id
order by c.customer_id;
This query would give an output something like:
| Customer |
Product ID |
Product |
| Bob |
3 |
Book 3 |
| Fred |
1 |
Book 1 |
| Bill |
1 |
Book 1 |
We could even write a query which joins all four tables:
A customer called 'Andrews' or similar has rung to complain that his previous complaint hasn't been answered, so you run the following query to find out what's in the database...
select o.order_id, o.timestamp, c.name, p.name, x.complaint, x.status
from orders o, products p, customers c, complaints x
where UPPER(c.name) like "%ANDREW%"
and o.customer_id = c.customer_id
and p.product_id = o.product_id
and x.customer_id = c.customer_id;
The UPPER function changes text to upper case. This ensures you get the data no matter what case the name was entered in.
The 'like' operator matches text with wildcards in it... The '%' wildcard symbol means 'any number of any characters'. So this match will find any customers whose names have the text 'andrew' in them.
The remainder of the where clause is simply joining one table to the next, until all the relationships are established. The result of the query could look like this:
| OrderID |
Date |
Customer |
Product |
Complaint |
Status |
| 25 |
3/4/2000 |
John F. Andrews |
Book 3 |
Missing pages... |
O |
| 29 |
3/4/2000 |
Andrew Frank |
Report Pack |
Late Delivery... |
C |
You see from the result of the query that the first entry is the one you want, and that the complaint is still at status 'O'pen...
As you can see from the examples, database design and use is a fairly straightforward process. You now know what databases do, why they do it, and how they do it. You can now design, build and use your own databases.
Here are some general hints:
Try to keep your designs as simple and intuitive as possible. If you find yourself building a complicated design, stop and analyse the problem again. There's always a simple and elegant solution to every problem.
Try to give things meaningful, but short names. When programming, you will have to type table, column, and variable names all the time. It's easier to type 'desc' than 'description'. Do aim for consistency in your naming schemes.
Start small and test everything as you go. There are always unforeseen complications.
SQL Resources
Books (from Amazon.com)
See the PHP page for books on using MySQL with PHP. |
MySQL Pocket Reference by George Reese |
An excellent reference. If you use MySQL, you know why it's the world's most popular open source database: small, fast, and powerful. There are always some little-known commands and clauses that could help you become more efficient in your work if you could just keep them in mind. The MySQL Pocket Reference can lead you straight to the answers you need to know to get the job done. |
| Online Resources |
| SQL.org |
A website devoted to SQL - in all it's glory! SQL is so powerful and so intuitive, that once you get the hang of it - I guarantee you'll find it fun! |
| A gentle introduction to SQL |
A great site with beginners tutorials etc. SQL doesn't have to be scary! |
| About.com |
A series of articles on SQL. About.com provide fairly good info on all sorts of topics. |
| A Brief History Of SQL |
An online history... |
|
|