Hello, SQL!

SQL, or Structured Query Language, is the industry-standard language for working with an relational database management system (RDBMS), including Oracle. It is not the database itself but a language used to design relational data models, manage database objects, and perform CRUD operations (Create, Read, Update, Delete) on data tables, within an RDBMS. SQL commands can be run independently or invoked by applications during runtime. Many popular programming languages, such as Java or Python, can embed SQL statements and issue them to interact with database.

Essential Syntax of SQL

Let's get started with some "Hello, World!" example. In the figure shown below, an SQL query is submitted, and a 1x1 table is returned. Because the statement starts with the keyword SELECT, it retrieves data from a specified table[1] and is called the SELECT statement. Likewise, every SQL statement begins with a specific keyword that declares what to do.


SQL statements are composed of clauses, each beginning with its own keyword. In this example, the SELECT statement includes two clauses: SELECT and FROM. The SELECT clause specifies the columns in the resulting dataset; in this case, it contains a single dummy column with a row containing the string 'Hello, SQL!'. 

The FROM clause defines the tables from which the columns listed in the SELECT clause are retrieved. This can involve a single table or view, or multiple tables joined together. The SELECT and FROM clauses are two mandatory components of a SELECT statement. Additional clauses, such as WHERE, GROUP BY, HAVING, and ORDER BY, can further extend the statement to refine the result set.

SQL syntax is case-insensitive, meaning the query, select 'Hello, SQL!' from dual;, works just like its uppercase equivalents. However, it is conventional to write keywords in uppercase and other elements, such as column or table names, in lowercase. This practice improves readability by clearly distinguishing keywords from identifiers.

In the example, notice that the statement is closed by a semicolon. Every SQL statement must end with a semicolon (;). Omitting semicolons is a common error, even experienced SQL developers often commit. Luckily, when you run a batch of SQL statements, Oracle does not raise an error for the very last closing semicolon. However, it is considered good practice to include semicolons no matter which statement in your script, as it helps avoiding potential errors and improves readability of your code.

Adding Comments

Comments are lines of code that are ignored by the computer but are intended for human readers. In an SQL script, and code scripts written in any other programming languages, comments act as explanatory notes that provide clear explanation about program's logic. In particular, in a team environment, adding comments is an essential tool for effective communication.

In Oracle SQL, there are two ways to add comments. One way is to use two consecutive dashes (--) at the beginning of the text; everything placed after the dashes within a line will be treated as a comment and ignored by Oracle. Another way to add comments is by enclosing comment lines between /* and */, which serve as opening and closing delimiters of a comment block. A comment block can span multiple lines and everything between /* and */ will be considered as a comment. Here's an example demonstrating both:


Lexical Components of SQL Statements

An SQL statement is consists of four lexical components: literals, operators, reserved words, and identifiers.

Literals

Literals are raw values that are directly provided by developers in the SQL scripts. They can be numbers, strings, or NULL:

  • Number Literals: Numerical values, such as 5, -20, or 3.14
  • String Literals: Values that are enclosed in single quotes (e.g., 'Hello', 'John Doe'). String literals are case-sensitive. So, for example, 'hello' and 'Hello' are treated differently.
    • Datetime Strings: These are special kinds of strings that represent time-related values. Unlike other character strings, they can be added or subtracted to each other. To be recognized as a datetime value, a string must be in a specific formats defined by nls_date_format (e.g., 'YYYY-MM-DD', 'YYYY-MM-DD HH:MM:SS')
  • NULL: Literal that is specific to Oracle's SQL dialect, used to represent unknown values. It differs from numeric zero or an empty string and requires the keywords IS and IS NOT for comparison.
    • Example: SELECT * FROM hr.employees WHERE commission_pct IS NULL;

Literals are used to represent constant values in SQL queries, and they are usually found in conditions, expressions, or as a part of data being inserted or updated to the database.

Identifiers

Identifier are names that are assigned to database objects. When create a new database object, you must adhere to the following naming rules:

  • Length: Names must be between 1 and 30 characters long.
  • First Character: The first character must be a letter.
  • Subsequent Characters: Allowed characters include letters, numbers, $, _, and #. No other special characters may be used.
  • Case Sensitivity: Names are not case-sensitive.
  • Reserved Words: Reserved words cannot be used as identifiers.

To override the rules described above, you could use double quotations. Quoted names differ from non-quoted names in several ways: they can include any characters, have to be double quoted for retrieval, and are case-sensitive. However, Oracle strongly discourages overriding the naming rules, as it may lead to compatibility issues; not all database tools and applications support quoted identifiers.

Operators

Operators are special symbols that perform specific actions on the values (or sets of values) placed on left and right sides (called operands), and produce a result. An operator, along with its operands, forms an expression that can filter data, perform calculations, or combine multiple conditions.

  • Arithmetic Operators
    • Used to perform mathematical operations on numeric values.
    • Examples: + (addition), - (subtraction), * (multiplication), / (division), % (modulo).
  • Comparison Operators
    • Used to compare two values and return true, false, or null.
    • Examples: = (equal to), != (not equal to), > (greater than), < (less than), >= (greater than or equal to), <= (less than or equal to).
  • Logical Operators
    • Combine multiple conditions, often placed in WHERE clauses, to form complex queries.
    • Examples: AND (both conditions are true), OR (at least one condition is true), NOT (reverse the result of a condition).
  • Set Operators
    • Used to combine the result sets of two or more SELECT statements.
    • Examples: UNION (combines results from two queries and removes duplicates), UNION ALL (combines results from two queries and retains duplicates), INTERSECT (returns only rows found in both queries), MINUS (returns only rows from the first query not in the second).
  • Other Common Operators
    • IN: Checks if a values exists in a list of values.
    • BETWEEN a AND b: Checks if a value falls within a range given by a and b (inclusive).
    • LIKE: Checks if a string matches a specified pattern using wild cards (% for multiple characters, _ for a single character).
    • IS NULL: Checks if a value is NULL.

Reserved Words

Reserved words are predefined terms in SQL that carry specific meanings and serve syntactic roles. They are integral to the language's structure and are used to enable operations such as data definition, transaction control, and database access management. Due to their predefined roles in SQL syntax, reserved words cannot be used as identifiers-such as table names, column names, or user names-unless enclosed in double quotes. However, overring naming rules in this way is strongly discouraged, as it can lead to reduced readability, portability issues, and potential conflicts across different SQL dialects.

In Oracle, the V$RESERVED_WORDS view contains the full list of the reserved words for the current version. So, you can query the view to check whether a particular word is reserved like this:


Types of SQL Statements

SQL statements can be broadly categorized into Data Definition Language (DDL), Data Manipulation Language (DML), and Transaction Control Language (TCL). 

DDL Statements[2]: Used to create and manage database objects, such as tables, views, and users.

  • CREATE: Creates a user, table, view, index, synonym or other objects in the database.
  • ALTER: Used to modify the structure, name, or other attributes of an existing database object[3].
  • DROP: Removes a database object from the database that has already been created with the CREATE statement.
  • RENAME: Changes the name of an existing database object.
  • TRUNCATE: Removes all records from an existing table. 
  • GRANT: Provides privileges to user objects and enables them to perform specific tasks in the database.
  • REVOKE: Removes privileges that have been issued with the GRANT statement.
  • FLASHBACK: Restores an earlier version of a table or database.
  • PURGE: Irrevocably removes database objects from the recycle bin.
  • COMMENT: Add comments to the data dictionary for an existing database object.

DML Statements: Retrieve and manipulate data records through tables and views. 

  • SELECT: Retrieves data from a table or view
  • INSERT: Adds data records to a database table, either directly or through a view.
  • UPDATE: Modifies existing data records in a table, either directly or through a view.
  • DELETE: Removes existing data records in a table, either directly or through a view. 
  • MERGE: Performs a combinations of INSERT, UPDATE, and DELETE statements in a single statement, depending on some condition.

TCL Statements: Manage transactions, ensuring data integrity and avoiding data anomalies.

  • COMMIT: Saves a set of DML modifications performed in the current database session.
  • ROLLBACK: Undoes a set of DML modifications performed during the current database session.
  • SAVEPOINT: Marks a point in a session to prepare for a future ROLLBACK to enable ROLLBACK to restore data at the selected point in a session

Key Characteristics of SQL

SQL is a foundational tool for managing and analyzing data, widely regarded as the industry-standard for the purpose. Its broad adoption stem from several defining characteristics that make it exceptionally powerful for tasks such as querying, updating, and organizing data. 

Set-Based Language

SQL operates on entire sets of data simultaneously, rather than processing data row by row. For example, consider an SQL query like:

SELECT salary * (1 + commission_pct) AS total_salary
FROM employees
WHERE EXTRACT(YEAR FROM hire_date) = 2018;

This query returns a table containing a column with total salary of each employees whose hire dates is the year of 2018. When Oracle handles this query, it follows these steps:

  1. Identify the Data Table: Oracle identifies the employees table and temporarily holds all its rows in the result set.
  2. Apply the WHERE Clause: Using the EXTRACT(YEAR FROM hire_date) = 2018 condition, Oracle filters out rows in the result set where hire_date column value does not fall in the year 2018. Only rows matching the condition remain in the result set..
  3. Perform Column Operation: Apply the formula salary * (1 + commission_pct) to all rows in the result set.
  4. Return the Result: Finally, Oracle returns the processed result set as a table with a single column, total_salary, containing the calculated values for all filtered rows.

In SQL's set-based paradigm, the database engine processes the dataset as a whole and applies filtering and calculations to the entire set simultaneously. This behavior is very different from procedural approach, where an explicit iteration over rows is required to achieve the same result. For example, using Python, similar work could be done as follows:

# Example dataset: List of employee dictionaries
employees = [
{'hire_date': '2018-01-15', 'salary': 5000, 'commission_pct': 0.1},
{'hire_date': '2017-06-20', 'salary': 6000, 'commission_pct': 0.15},
{'hire_date': '2018-08-03', 'salary': 7000, 'commission_pct': 0.2},
{'hire_date': '2016-12-11', 'salary': 8000, 'commission_pct': 0.25}
]

# Step 1: Filter employees hired in 2018
filtered_employees = []
for employee in employees:
if employee['hire_date'].startswith('2018'):
filtered_employees.append(employee)

# Step 2: Calculate the average salary
total_salaries = []
for employee in filtered_employees:
total_salary = employee['salary'] * (1 + employee['commission_pct'])
total_salaries.append(total_salary)

# Output the results
for total_salary in total_salaries:
print(f"Total Salary: {total_salary:.2f}")

Declarative Language

SQL is also a declarative language, meaning it specifies what we want to achieve without detailing how the database should go about performing the task. For example, let's consider another example query as follows:

SELECT
dp.department_id AS dept_id,
dp.department_name AS dept_name,
ROUND(AVG(ee.salary),2) AS avg_salary
FROM
employees ee
JOIN departments dp ON ee.department_id = dp.department_id
GROUP BY
dp.department_id,
dp.department_name;

In this query:

  • The SELECT clause specifies desired result: a table with three columns--dept_id, containing unique identifier for each department, dept_name containing the name of the department, and avg_salary, containing the average salary of the department.
  • The FROM and JOIN clause declare the tables involved (employees and departments) and condition linking them (ee.department_id = dp.department_id).
  • GROUP BY clause groups the data by department ID and name, enabling the calculation of average salaries per department.

The query does not specify how the database should retrieve or combine the data, such as which indexes to use, the order of operations, or the join algorithm. These details are abstracted, and the database engine determines the optimal execution plan to retrieve and process the data efficiently.

On the other hand, in a procedural approach, the task would involve explicitly iterating through the data to calculate the average salary for each department. Here's an example of how this could be done in Python:

# Example dataset: List of employee and department dictionaries
employees = [
{'employee_id': 1, 'department_id': 101, 'salary': 5000},
{'employee_id': 2, 'department_id': 102, 'salary': 6000},
{'employee_id': 3, 'department_id': 101, 'salary': 5500},
{'employee_id': 4, 'department_id': 103, 'salary': 7000}
]

departments = [
{'department_id': 101, 'department_name': 'HR'},
{'department_id': 102, 'department_name': 'Finance'},
{'department_id': 103, 'department_name': 'Engineering'}
]

# Step 1: Join employees with departments based on department_id
joined_data = []
for employee in employees:
for department in departments:
if employee['department_id'] == department['department_id']:
joined_data.append({
'dept_id': department['department_id'],
'dept_name': department['department_name'],
'salary': employee['salary']
})

# Step 2: Group data by department_id and calculate average salary
department_totals = {}
for record in joined_data:
dept_id = record['dept_id']
if dept_id not in department_totals:
department_totals[dept_id] = {'dept_name': record['dept_name'], 'total_salary': 0, 'count': 0}
department_totals[dept_id]['total_salary'] += record['salary']
department_totals[dept_id]['count'] += 1

# Step 3: Calculate the average salary for each department
result = []
for dept_id, data in department_totals.items():
avg_salary = round(data['total_salary'] / data['count'], 2)
result.append({'dept_id': dept_id, 'dept_name': data['dept_name'], 'avg_salary': avg_salary})

# Output the result
for row in result:
print(row)

In this example, notice the explicit iteration through the dataset, with each operation--joining, grouping, and calculating--carried out step by step. The developer is responsible for implementing all the data processing logic, resulting in longer and more complex code.

In contrast, SQL abstracts these operations into a concise declarative statement, delegating the implementation of the business logic to the database engine. Developers only need to specify what they want, rather than how to achieve it. This abstraction makes SQL particularly well-suited for efficiently handling large datasets, enabling simpler, more efficient, and more maintainable solutions compared to the verbose manual nature of procedural approaches.


[1] In this example, observe that a dummy table DUAL is used as a placeholder. This table is specific to Oracle's SQL dialect and fulfills the syntax requirement of including a table name in the FROM clause. This table always returns a table with the calculation results and is commonly used for testing SQL syntax and functions.  
[2] Note that these keywords are not really statements or commands by themselves, but become one when combined with other reserved keywords. For example, CREATE TABLE or CREATE SEQUENCE is an SQL statement. In practice, however, many professionals call them CREATE statement. It is also worth noting that the term command and statement are used interchangeably by Oracle's official documentation. Either is fine!  
[3] With exceptions for SESSION and SYSTEM. ALTER SESSION changes settings for the current session, lasting only until the connection ends. ALTER SYSTEM modifies system-wide parameters, effective until the database restarts. Technically, neither ALTER SESSION nor ALTER SYSTEM is a DDL statement; they fall under a different category.  

Post a Comment

0 Comments