Essential Concepts for PostgreSQL

Chapter 2: Building blocks, Keys, Indexes and DataTypes

·

13 min read

Essential Concepts for PostgreSQL
Play this article

This is Chapter 2 of the series PostgreSQL Distilled, where I teach only the most important parts of PostgreSQL to aspiring Fullstack Engineers.

Here is the Chapter 1: Getting Started with PostgreSQL

Understanding the Building Blocks: Tables, Rows, and Columns

When it comes to databases, especially in PostgreSQL, as we've begun exploring in Chapter 1, the basic building blocks are tables, rows, and columns.

To truly grasp these concepts, we'll break them down one by one.

Tables: The Framework of Data Storage

A table is a collection of related data organized within a database. Think of a table as a neatly structured grid, like a spreadsheet. It's a space where you can store information about a specific topic, such as employees in a company.

In our ongoing exploration, we're working with an employees table. This table is designed to hold all the information about employees in a particular organization.

An employee table inside PostgreSQL

Columns: Defining the Characteristics

Columns in a table define what kind of information will be stored. Each column has a unique name and a specific data type, which sets the rules for the data that can be stored in that column.

In the employees table, here are some of the columns:

  • id: The unique identifier for each employee.

  • first_name: The first name of the employee.

  • gender: The gender of the employee.

  • start_date: The date the employee started working.

CREATE TABLE employees (
  id VARCHAR(21) PRIMARY KEY,
  first_name VARCHAR(50),
  gender VARCHAR(10),
  start_date DATE,
  ...
);

These columns provide structure and enforce consistency within the data.

A column in a table

Rows: Individual Entries of Data

Rows are the individual records within a table. Each row contains a unique set of data, corresponding to the columns defined in the table.

Imagine each row in the employees table as different employees. The id column will contain that employee's unique ID, the first_name column will contain their first name, and so on.

For example, a row might look like this:

  • id: '12345'

  • first_name: 'John'

  • gender: 'Male'

  • start_date: '2022-08-17'

Types of Tables in PostgreSQL

Imagine a rapidly growing tech company, TechCorp, that needs to build a robust HR System to manage its expanding workforce.

The HR System must handle various aspects such as employee details, temporary analysis, fast data processing, historical data, and remote data access.

Let's see how different types of PostgreSQL tables can be utilized to meet these requirements.

1. Regular Tables: Storing Employee Details

Requirement: TechCorp needs to store detailed information about its employees, including their names, salaries, positions, and departments.

Solution: Regular tables can be used to store this structured data.

CREATE TABLE employees (
    id SERIAL PRIMARY KEY,
    name VARCHAR(50),
    salary DECIMAL(10, 2),
    position VARCHAR(30),
    department VARCHAR(30)
);

2. Temporary Tables: Conducting Temporary Analysis

Requirement: The HR team often conducts temporary analysis on employee performance and needs a place to store this transient data.

Solution: Temporary tables are perfect for this purpose.

CREATE TEMP TABLE temp_performance AS
SELECT * FROM employees WHERE performance_rating > 8;

3. Unlogged Tables: Fast Data Processing

Requirement: TechCorp's HR System requires real-time processing of attendance data, and speed is crucial.

Solution: Unlogged tables, which skip Write-Ahead Logging, can be used for fast data processing.

CREATE UNLOGGED TABLE attendance (
    id SERIAL PRIMARY KEY,
    employee_id INT,
    date DATE,
    status VARCHAR(10)
);

4. Partitioned Tables: Managing Historical Data

Requirement: The company wants to maintain historical records of employee promotions and salary changes.

Solution: Partitioned tables can be used to divide this large dataset into manageable pieces.

CREATE TABLE promotions (
    promotion_id SERIAL PRIMARY KEY,
    employee_id INT,
    promotion_date DATE,
    new_position VARCHAR(30)
) PARTITION BY RANGE (promotion_date);

CREATE TABLE promotions_2021 PARTITION OF promotions FOR VALUES FROM ('2021-01-01') TO ('2022-01-01');

5. Foreign Tables: Accessing Remote Data

Requirement: TechCorp has a subsidiary company, and the HR System needs to access employee data from the subsidiary's remote database.

Solution: Foreign tables enable this remote data access.

CREATE EXTENSION postgres_fdw;
CREATE SERVER remote_server
    FOREIGN DATA WRAPPER postgres_fdw
    OPTIONS (host 'remote-host', dbname 'remote-db');

CREATE FOREIGN TABLE remote_employees (
    id INT PRIMARY KEY,
    name VARCHAR(50)
) SERVER remote_server;

Summation

The real power of tables, rows, and columns comes when you understand how they interconnect. A table like employees is not just a random collection of data; it's a meticulously structured arrangement that allows for efficient querying, updating, and management of information.

If a database is a library, then tables are the shelves, columns are the different sections of each shelf, and rows are the individual books. Together, they allow the librarian (that's you, the database manager) to find exactly what's needed, quickly and efficiently.

As we continue to explore more complex concepts they are the foundation on which everything else is built.

Everything that we will learn about PostgreSQL will be around how to put and retrieve data from these tables.

Setup your DB for the Next Steps

Let’s set up our DB that we need to follow along:

Step 1: Create the Necessary Tables

Initiate your adventurous exploration by executing this specific SQL query in your PgAdmin Query Tool for the HR Department Database. This script will build the structural framework of our database, laying the groundwork for our real-world examples.

Link : SQL Query for Table Creation - Chapter 02

Step 2: Inject Life with Data

Import all the data from these folders to breathe life into our tables.

The information will mirror the intricacies of a real-world HR Department, enriching our learning experience.

You can use the PgAdmin tool to import the datasets as shown in the previous chapter or from here

Link: Mock Data for Chapter 02

A Simple Challenge

Don’t ignore this

You're most likely to stumble upon an obstacle while uploading the data. Fear not, for this challenge is by design.

Hint: Your path may be blocked by a foreign key CONSTRAINT. Examine the object explorer closely, and you'll find the key to unlock the door.

I want you to figure out the reason, analyze the obstacle, and triumph over it. In the world of databases, problems are puzzles, and solutions are achievements share on Twitter / X using #pgdistilled and tag me in your post.

Hint:

Link: Stack Overflow

Summation

By following these steps, you have not only set up a database but engaged in a journey filled with learning, exploration, and triumph. It's more than technical execution; it's a metaphor for the real-world challenges you'll conquer in your career to handle databases.

Now, with the database ready, our adventure in understanding PostgreSQL continues!

The Power of Primary Keys and Indexes

In the world of databases, primary keys and indexes are the most important ideas.

They are the linchpins holding the structure together, ensuring data integrity, and optimizing access paths.

Indexing is a major strategy for the performance optimisation of the database and we’ll cover that in the eighth chapter in greater detail.

Now, using our HR Department's database, we will delve into the power of these essential components.

Understanding Primary Keys

What is a Primary Key?

A primary key is a specific field or a combination of fields that uniquely identify a record within a table. It's like the unique employee ID in a corporation; no two employees can have the same ID.

The Importance of Primary Keys

  • Uniqueness: Ensures that no two records have the same primary key value.

  • Data Integrity: By maintaining uniqueness, primary keys prevent accidental duplication.

  • Relationships: Facilitates the creation of relationships between tables through foreign key constraints.

Example from Our HR Database:

In our employees table, the field employee_id is the primary key. It uniquely identifies each employee.

More keys in PostgreSQL

There are other keys in SQL that are important when you design large-scale databases.

  1. Candidate Key

  2. Super Key

  3. Foreign Key

  4. Composite Key

  5. Alternate Key

  6. Unique Key

You can use this below prompt in ChatGPT to learn about them

Prompt #1:

Roleplay a Senior Engineer who is an expert in PostgreSQL and verbose while explaining

With 10+ years of experience you are now asked to teach the basic of SQL to the new hires or less experienced colleagues.

You're teaching them about these topics

Primary Key
Candidate Key
Super Key
Foreign Key
Composite Key
Alternate Key
Unique Key

Can you explain them with real world examples?

Prompt #2:

Let's dive into super key/candidate key and create a dummy table and query to explain the idea

Indexes

Unindexed vs. Indexed Queries

Imagine you're in a vast library filled with shelves of data. You're searching for a specific piece of information, and it feels like searching for a needle in a haystack.

Let's take our employees table:

CREATE TABLE employees (
    employee_id integer,
    first_name varchar,
    last_name varchar,
    ...
);

When you run a query like:

SELECT first_name FROM employees WHERE employee_id = constant;

Without an index on the employee_id column, your database would be like a reader without a guide, having to scan every page (row) in the employees table to find the desired data. A monumental task, indeed!

With Index: The Guided Path

But what if we give our reader a map?

By creating an index on the employee_id column, we're building a pathway to the information, like a book's index guiding a reader to the right pages.

The search becomes:

CREATE INDEX employees_id_index ON employees (employee_id);

The name employees_id_index is like the title of the chapter – choose something that resonates with the purpose of the index.

With this, our database no longer roams through the employees table. It quickly navigates to the desired data, just like scanning an alphabetic index at the end of a book.

The Power of Index: Dynamic and Versatile

Indexes are like living pathways, evolving with your data. Once created, they automatically update with the table modifications, breathing alongside your data's rhythm.

Need to remove an index? Use the DROP INDEX command, like pruning a branch that's no longer needed.

And it's not just about searching. Indexes fuel UPDATE and DELETE commands, and they accelerate join searches too. They're the jet engines to your queries, propelling them to the right destinations at breakneck speed.

Remember to run the ANALYZE command regularly. It's like fine-tuning your navigation system, helping the query planner make well-informed decisions.

You'll learn more about this in Chapter 8.

Dig Deeper

  1. Now that you have a basic understanding of what it is, delve into the documentation to read even further.

Chapter 11. Indexes

You can explore all the strategies of indexing in PostgreSQL using this mindmap

Navigating PostgreSQL's Data Types and Their Use Cases

A glance of all the data types and their subtypes used in PostgreSQL

PostgreSQL Data Types

If you are now overwhelmed let's learn 20% of the most important data types that will help you achieve 80% of the use cases.

1. Numeric Types

a. Integer

  • Smallint: Used for small ranges, such as age, rating, etc.

  • Integer: Commonly used for IDs, quantities, counts, etc.

  • Bigint: Used for very large numbers, like population, views, etc.

b. Floating-Point

  • Real: Suitable for scientific calculations, measurements, etc.

  • Double Precision: Used for precise calculations, financial data, etc.

c. Arbitrary Precision

  • Numeric: Ideal for monetary values, exact arithmetic, etc.

2. Character Types

a. Character Varying (Varchar)

  • Used for names, addresses, emails, etc.

b. Text

  • Suitable for long texts, comments, descriptions, etc.

3. Date/Time Types

a. Timestamp

  • Used for recording events, logging, tracking changes, etc.

b. Date

  • Suitable for birthdates, anniversaries, deadlines, etc.

c. Time

  • Used for scheduling, opening hours, etc.

4. Boolean Type

  • Boolean: Ideal for flags, status (active/inactive), etc.

5. Binary Data Types

a. Bytea

  • Used for storing binary data like images, files, etc.

6. UUID Type

  • UUID: Suitable for unique identifiers across distributed systems, etc.

Example: Employee Table

Here's an example of an employee table that utilizes all the above data types:

CREATE TABLE employees (
    employee_id SERIAL PRIMARY KEY,
    first_name VARCHAR(50),
    last_name VARCHAR(50),
    age SMALLINT,
    salary NUMERIC(10, 2),
    is_manager BOOLEAN,
    hire_date DATE,
    last_login TIMESTAMP,
    profile_picture BYTEA,
    precise_rating DOUBLE PRECISION,
    unique_token UUID
);

Understanding the specific use cases for each data type helps in designing efficient and meaningful database schemas.

By selecting the appropriate data types for each field, developers can ensure data integrity, optimize storage, and enhance query performance.

Whether you're designing a simple application or a complex system, these core PostgreSQL data types are the building blocks that will support your data needs.

Dig Deeper

Official Docs: https://www.postgresql.org/docs/current/datatype.html

Practising SQL Basics: SELECT, INSERT, UPDATE, DELETE

SELECT, INSERT, UPDATE, and DELETE: Let's explore these keywords and then provide some practice questions to hone your skills.

1. SELECT

The SELECT statement is used to query data from one or more tables in the database. You can specify the columns you want to retrieve and apply various conditions to filter the results.

Example:

SELECT first_name, last_name FROM employees WHERE work_type = "Remote";

2. INSERT

The INSERT statement is used to add new rows to a table. You can specify the values for each column or use a subquery to insert data from another table.

Example:

INSERT INTO employees (first_name, last_name, age) VALUES ('John', 'Doe', 25);

3. UPDATE

The UPDATE statement is used to modify existing data in a table. You can change the values of one or more columns based on specific conditions.

Example:

UPDATE compensation SET salary = salary * 1.10 WHERE bonus > 100;

4. DELETE

The DELETE statement is used to remove rows from a table. You can delete all rows or apply conditions to delete specific records.

Example:

DELETE FROM employees WHERE hire_data < 2015-12-12;

Practice Questions

  1. Easy: Write a SELECT query to retrieve all the employee names and ages from the "employees" table.

  2. Intermediate: Write an INSERT query to add a new employee to the "employees" table with the following details: First Name - "Jane", Last Name - "Smith", Age - 28.

  3. Intermediate: Write an UPDATE query to increase the salary of all employees who are managers (is_manager = TRUE) by 5%.

  4. Advanced: Write a DELETE query to remove all employees who have not logged in since the year 2021.

  5. Advanced: Write a SELECT query to retrieve the first name, last name, and salary of the top 5 highest-paid employees, ordered by salary in descending order.

Summation

Understanding and practising the basic SQL operations of SELECT, INSERT, UPDATE, and DELETE is foundational for anyone working with relational databases.

These commands allow you to retrieve, add, modify, and delete data, providing the flexibility to manage and analyze information effectively.

Don’t skip unless you want half-baked knowledge for your career.

TLDR;

Pheww!! Congratulations on making it this far. Here's a quick look at what we have learnt

Here's a concise summary for those looking to get the core insights quickly:

  1. The Building Blocks: Tables, rows, and columns are the foundation of PostgreSQL databases. They align together to store and organize data effectively.

  2. Setting up your Database: Here we set up your entire database to follow along with the advanced concepts that are coming in the future blogs.

  3. Primary Keys & Indexes: They are the guiding stars and the fast tracks of the database. Primary keys uniquely identify records, while indexes speed up searches, just like an index in a book.

  4. Data Types: PostgreSQL's various data types are the colours of our data canvas. From integers to JSON, each type serves a unique purpose, helping to shape the data landscape.

  5. SQL Basics: The brush strokes of database management. SELECT retrieves data, INSERT adds it, UPDATE modifies, and DELETE removes. Mastery of these commands empowers control over the database.

What's next

I am writing this series on PostgreSQL to teach frontend engineers full-stack engineering.

The next chapter is Chapter 3: The way of PostgreSQL

So, make sure to follow me on X/ Twitter sagar_codes to learn full-stack engineering.

Did you find this article valuable?

Support Sagar by becoming a sponsor. Any amount is appreciated!