Company Database Creation
In the previous blog we created a simple table. Let’s dive deeper in this blog.
A primary key is an identifier that references a column in which each value is unique. In the example below, we shall create an employee table with employee details. We will set the employee ID as the primary key. This is unique and is used to identify each employee thus cannot be repeated or assigned to multiple employees.
CREATE TABLE employee(
emp_id INT PRIMARY KEY,
first_name VARCHAR(40),
last_name VARCHAR(40),
birth_day DATE,
sex VARCHAR(1),
salary INT,
super_id INT,
branch_id INT
);
The above command creates emp_id as the primary key. A value once entered under it, cannot be repeated as such would result in an error.
A foreign key on the other hand is a field within a table that is a primary key in another table. This key acts as a link connecting one table to another. This implies that we should have multiple tables.
I’m going to create tables with each containing different data. If you watched and enjoyed The Office as much as I did, you’re going to find this data relatable.
By now you’ve realized that SQL commands are not case sensitive, thus create table and CREATE TABLE give the same results. Let us now create another table and call it branch.
create table branch(
branch_id int primary key,
branch_name varchar(40),
mgr_id int,
mgr_start_date DATE,
foreign key(mgr_id) references employee(emp_id) on delete set null
);
In the above table, branch_id is the primary key and mgr_id(manager id) is the foreign key and it references emp_id, a primary key in the employee table. This simply implies that an employee is also a manager. The mgr_start_date uses the date data type. In the code above, we’ve used ON DELETE SET NULL. This is an important SQL command that implies if an entry set as a foreign key is deleted, NULL replaces its position as the new entry. The other command we’ll see shortly is ON DELETE CASCADE. This implies that if an entry in a row is deleted then the entire row is deleted.
Below we’re linking the two tables by adding foreign keys by using the ALTER TABLE command. We’re altering the employee table by making the branch_id a foreign key referencing the branch_id in the branch table and the super_id (supervisor id) referencing the emp_id in employee table. We could only do this after creating the two tables first.
--we're linking the two tables created
alter table employee
add foreign key(branch_id)
references branch(branch_id) on delete set null;
alter table employee
add foreign key(super_id)
references employee(emp_id) on delete set null;
Next we’ll create another table client. This will store the clients’ information. Here we’re able to make branch_id the foreign key because the two tables, client and branch, already exist.
create table client(
client_id int primary key,
client_name varchar(40),
branch_id int,
foreign key(branch_id) references branch(branch_id) on delete set NULL
);
Let’s create the last two tables, works_with and branch_supplier;
--this table shows which clients an employee works with
CREATE TABLE works_with(
emp_id int,
client_id int,
total_sales int,
primary key(emp_id, client_id),
foreign key(emp_id) references employee(emp_id) on delete cascade,
foreign key(client_id) references client(client_id) on delete cascade
);
--this table shows what each branch is supplied with and by whom
create table branch_supplier(
branch_id int,
supplier_name varchar(40),
supply_type varchar(40),
primary key(branch_id, supplier_name),
foreign key(branch_id) references branch(branch_id) on delete cascade
);
To create comments in SQL, we can use -- (2 dashes) or /**/.
--this is a comment
/*this is also a comment*/
In the previous blog we had added info to a table using INSERT INTO table VALUES(). Let’s now added info into our tables created above. We’re going to do this in order because the database tables are linked and interact with each other. Step one is to enter the first employee’s details, who’s a branch manager at branch_id 1 called Corporate. We’ll do this in the order below;
insert into employee values(100, 'David', 'Wallace', '1967-11-17', 'M', 250000, NULL, NULL);
insert into branch values(1, 'Corporate', 100, '2006-02-09');
Once we’ve entered this, we then update the employee table under the branch_id column with the details above. We use the command below. This simply says in the employee table we have the emp_id column which has a row with the entry value of 100. Somewhere in this row we have an entry for branch_id with an entry value. It is this entry we are updating from its current entry (NULL) to 1. This is done as below;
update employee
set branch_id = 1
where emp_id = 100;
We can now update the other entry values in our tables as below;
insert into employee values(101, 'Jan', 'Levinson', '1961-05-11', 'F', 110000, 100, 1);
INSERT INTO employee VALUES(102, 'Michael', 'Scott', '1964-03-15', 'M', 75000, 100, NULL);
INSERT INTO branch VALUES(2, 'Scranton', 102, '1992-04-06');
UPDATE employee
SET branch_id = 2
WHERE emp_id = 102;
INSERT INTO employee VALUES(103, 'Angela', 'Martin', '1971-06-25', 'F', 63000, 102, 2);
INSERT INTO employee VALUES(104, 'Kelly', 'Kapoor', '1980-02-05', 'F', 55000, 102, 2);
INSERT INTO employee VALUES(105, 'Stanley', 'Hudson', '1958-02-19', 'M', 69000, 102, 2);
-- Stamford
INSERT INTO employee VALUES(106, 'Josh', 'Porter', '1969-09-05', 'M', 78000, 100, NULL);
--branch
INSERT INTO branch VALUES(3, 'Stamford', 106, '1998-02-13');
UPDATE employee
SET branch_id = 3
WHERE emp_id = 106;
INSERT INTO employee VALUES(107, 'Andy', 'Bernard', '1973-07-22', 'M', 65000, 106, 3);
INSERT INTO employee VALUES(108, 'Jim', 'Halpert', '1978-10-01', 'M', 71000, 106, 3);
-- BRANCH SUPPLIER
INSERT INTO branch_supplier VALUES(2, 'Hammer Mill', 'Paper');
INSERT INTO branch_supplier VALUES(2, 'Uni-ball', 'Writing Utensils');
INSERT INTO branch_supplier VALUES(3, 'Patriot Paper', 'Paper');
INSERT INTO branch_supplier VALUES(2, 'J.T. Forms & Labels', 'Custom Forms');
INSERT INTO branch_supplier VALUES(3, 'Uni-ball', 'Writing Utensils');
INSERT INTO branch_supplier VALUES(3, 'Hammer Mill', 'Paper');
INSERT INTO branch_supplier VALUES(3, 'Stamford Labels', 'Custom Forms');
-- CLIENT
INSERT INTO client VALUES(400, 'Dunmore Highschool', 2);
INSERT INTO client VALUES(401, 'Lackawana Country', 2);
INSERT INTO client VALUES(402, 'FedEx', 3);
INSERT INTO client VALUES(403, 'John Daly Law, LLC', 3);
INSERT INTO client VALUES(404, 'Scranton Whitepages', 2);
INSERT INTO client VALUES(405, 'Times Newspaper', 3);
INSERT INTO client VALUES(406, 'FedEx', 2);
-- WORKS_WITH
INSERT INTO works_with VALUES(105, 400, 55000);
INSERT INTO works_with VALUES(102, 401, 267000);
INSERT INTO works_with VALUES(108, 402, 22500);
INSERT INTO works_with VALUES(107, 403, 5000);
INSERT INTO works_with VALUES(108, 403, 12000);
INSERT INTO works_with VALUES(105, 404, 33000);
INSERT INTO works_with VALUES(107, 405, 26000);
INSERT INTO works_with VALUES(102, 406, 15000);
INSERT INTO works_with VALUES(105, 406, 130000);
We can run various commands to view our tables’ data.
select * from employee;
This returns the employee table with all columns and rows as below;
Now let’s see what our other tables look like;
select * from branch;
This returns all the info under the branch table;
select * from client;
select * from works_with;
select * from branch_supplier;
Awesome! Our sample company database is complete. We can now use our database to answer questions or seek to display specific data that we are interested in.
The following are examples of questions we might ask ourselves;
What are the total sales achieved by each employee? To answer this we shall use the works_with table. We can write the query below;
select emp_id, sum(total_sales),
from works_with group by emp_id;
We’re selecting the emp_id and the sum of their total sales from total_sales column, from the works_with table, and grouping it by emp_id. This returns;
The other question is to find all employees ordered by salary.
We’ll use the employees table for this;
select * from employee order by salary desc;
order by command organizes the data by the salary column. The desc command orders the salary in descending order. To have the data in ascending order we run the command without the desc part. The above query returns;
Show all employees that are female.
select * from employee,
where sex = "F";
This selects all the columns from the employee table but it will show only show results where under the sex column the entry is “F” for female;
If we need the employee id, first name, salary and sex salary ordered by employee id, then we can query this as follows;
select emp_id, first_name, salary, sex,
from employee,
order by emp_id;
The above query is similar to the query below;
select emp_id, sex, first_name, salary from employee order by emp_id;
This returns;
The emp_id column is in ascending order and the displayed table is ordered by employee number.
We can run a query and limit the number of returns we get using LIMIT. The question being ‘Select the first 6 employees from the employee table’;
select * from employee limit 6;
This returns the 1st 6 employees on the employee table;
We can order for info but with alias column names. Let’s call emp_id employee_number and first_name forename and last name surname. This is achievable using AS. This is only temporary and doesn’t rename the columns in the table;
select emp_id as empoloyee_number, first_name as forename, last_name as surname from employee;
This returns the following. Note the column names have changed;
This is important in data cleaning as it enables us to name columns in ways we can easily understand for a given task without altering the parent database.
To find out all the different genders we use DISTINCT as follows;
select distinct sex from employee;
This returns the different sexes;
This is important and can be used to identify the different values stored in a given column.
We can use the COUNT() formula as well to count the number of employees. This can be done by the following query sequence;
select count(emp_id) from employee;
This returns 9, which is the number of employees in the employee table;
COUNT() is used to give a count of entries in a database table that actually have values in them.
We can use count() and group by to know the exact number of male and female employees. We can do this as follows;
select count(sex), sex from employee group by sex;
This returns the number of female and male employees grouped by the sex column;
Wildcards
Wildcards are different ways we can use to define patterns when we need to identify specific data. There are two wildcards in SQL;
% - this represents any number of characters
_ - this is represents a single character.
An example would be looking up employees with a first name starting with J. Here the first name starts with J followed any number of characters (unspecified).
LIKE is a special SQL keyword that’s used together with wildcards. The query would be;
select * from employee where first_name like "J%";
This returns;
To get employees with first name starting with J but with 2 characters after we run the following query;
select * from employee where first_name like "J__";
This returns;
Union
UNION is a special SQL operator used to combine results of multiple SELECT statements into one. For it work, we MUST use equal number of columns and the columns should use the same data type, e.g. int, string.
From our table we can run the query below;
select first_name from employee
union
select branch_name from branch;
This returns a list of all employee first names and branch names;
The query combines both first name and branch name under one column titled firts_name. This is because from the first SELECT statement the column included is first_name.
A more applicable use of UNION would be to select the sum of money spent by the company.
We can run the following query;
select sum(salary) from employee
union
select sum(total_sales) from works_with;
This returns;
The fist figure is the total money spent on salaries and the second the sum total of sales.
JOIN
This used to combine two or more rows from different tables based on a related column between them.
select employee.emp_id, employee.first_name, employee.last_name, branch.branch_name
from employee
join branch
on employee.emp_id = branch.mgr_id;
The above code uses employee.emp_id, employee.first_name, employee.last_name, branch.branch_name format. This is to make our query understandable by attaching the table from which a column is selected to avoid confusion as we’re selecting columns from more than one table.
This returns;
From the above query we’re able to join employee and branch tables and get to know which branches employees work at. The query is selecting the employee id, first name and last name from employee table. This is then joined with the branch table where the employee id matches the manager id in the branch table returning the above image results. We’re basically finding all branches and their manager’s names.
There are different types of JOIN: RIGHT JOIN, LEFT JOIN and FULL OUTER JOIN.
LEFT JOIN includes all data in a column from the table referred to in the FROM statement.
select employee.emp_id, employee.first_name, employee.last_name, branch.branch_name
from employee
left join branch
on employee.emp_id = branch.mgr_id;
This returns the results below. We can see that the results include all info from the employee table captured in the SELECT statement.
RIGHT JOIN includes all the data from the other table not included in the FROM statement. From the same query above, let’s just replace LEFT with RIGHT.
select employee.emp_id, employee.first_name, employee.last_name, branch.branch_name
from employee
right join branch
on employee.emp_id = branch.mgr_id;
This returns the entire column branch_name from the branch table;
FULL OUTER JOIN includes info from both tables as it combines both LEFT and RIGHT JOINs.
In the next blog we’ll cover nested queries and much more.