A database is an organized collection of related data stored electronically in a computer system.
It is designed to store, retrieve, and manage data efficiently.
Example:
Student database containing Roll No, Name, Course, Marks.
A Database Management System (DBMS) is a software system that allows users to define, create, store, retrieve, update, and control access to a database.
Examples: MySQL, Oracle, SQL Server, PostgreSQL.
A database system consists of:
-
Database
-
DBMS software
-
Application programs
-
Users
The database approach has the following important characteristics:
A DBMS contains not only the data but also the description of the data, known as metadata.
-
Metadata includes:
-
Data type
-
Size
-
Structure
-
Constraints
-
This information is stored in a system catalog (data dictionary).
👉 Hence, the database is self-describing.
In DBMS, application programs are independent of data storage structure.
-
Changes in data structure do not require changes in application programs.
-
Achieved through:
-
Logical data independence
-
Physical data independence
-
👉 This makes the system flexible and easy to maintain.
DBMS hides complex internal details from users and provides different levels of abstraction.
-
Physical Level – How data is stored internally
-
Logical Level – What data is stored and relationships
-
View Level – User-specific view of data
👉 Data abstraction simplifies database usage.
A DBMS allows different users to have different views of the same database.
-
Each user sees only the data relevant to them.
-
Improves security and usability.
Example:
-
Student sees marks
-
Admin sees all student records
A DBMS allows multiple users to access the database simultaneously.
-
Supports:
-
Concurrency control
-
Transaction management
-
-
Prevents data inconsistency.
👉 Enables efficient data sharing in organizations.
A file system stores data in separate files managed by the operating system, whereas a Database Management System (DBMS) stores data in an integrated database and provides controlled access to it.
| Basis of Comparison | File System | Database System (DBMS) |
|---|---|---|
| Data Redundancy | High data redundancy due to duplicate files | Minimal redundancy due to centralized database |
| Data Consistency | Data inconsistency may occur | Data consistency is maintained |
| Data Sharing | Data sharing is difficult | Data sharing is easy among multiple users |
| Security | Less security, handled by OS | High security using authorization and access control |
| Data Integrity | Integrity constraints are difficult to enforce | Integrity constraints are easily enforced |
| Backup & Recovery | Manual and difficult | Automatic backup and recovery mechanisms |
| Data Independence | No data independence | Logical and physical data independence supported |
| Concurrency Control | Not supported | Supported using transaction management |
| Query Processing | No query language support | SQL support for efficient querying |
| Maintenance Cost | High maintenance cost | Lower maintenance cost in long term |
A DBMS is better than a file system because it reduces redundancy, maintains consistency, ensures data security and integrity, supports multi-user access, provides backup and recovery, and offers data independence, making data management more efficient and reliable.
A Database Management System (DBMS) provides a systematic way to store, manage, and retrieve data efficiently. It offers several advantages over traditional file systems, but also has some limitations.
-
Data is stored in a centralized database.
-
Duplicate data is minimized.
👉 This saves storage space.
-
Since data is not duplicated, updates are reflected everywhere.
-
Prevents conflicting data values.
-
Multiple users can access the same database simultaneously.
-
Supports multi-user environment.
-
DBMS provides authorization and authentication.
-
Only authorized users can access data.
-
Automatic backup and recovery mechanisms are available.
-
Protects data from system failure or crashes.
-
Changes in data structure do not affect application programs.
-
Supports:
-
Logical data independence
-
Physical data independence
-
-
DBMS software is expensive.
-
Requires costly hardware and maintenance.
-
DBMS is complex to design and manage.
-
Requires careful configuration.
-
For small applications, DBMS may be slower than file systems.
-
Extra processing for security and concurrency.
-
Requires trained database administrators (DBA).
-
Users need technical knowledge.
A DBMS hides the complex details of data storage and provides users with a simplified view of data. This is achieved through data abstraction and data independence.
Data abstraction is the process of hiding internal implementation details of data and showing only the relevant information to the user.
A DBMS provides three levels of abstraction:
-
Lowest level of abstraction.
-
Describes how data is stored on disk.
-
Includes file structures, indexes, and storage details.
👉 Used by database administrators.
-
Middle level of abstraction.
-
Describes what data is stored and the relationships among data.
-
Does not include storage details.
👉 Used by database designers.
-
Highest level of abstraction.
-
Shows only a part of the database to the user.
-
Different users can have different views.
👉 Improves security and simplicity.
Data independence is the ability to change the database schema at one level without affecting the schema at the next higher level.
-
Ability to change the logical schema without changing the view level.
-
Examples:
-
Adding a new field
-
Adding a new table
-
👉 Harder to achieve.
-
Ability to change the physical schema without changing the logical schema.
-
Examples:
-
Changing file organization
-
Adding indexes
-
👉 Easier to achieve.
| Data Abstraction | Data Independence |
|---|---|
| Hides complexity of data | Allows schema changes without affecting programs |
| Concerned with levels | Concerned with schema changes |
| Improves usability | Improves flexibility |
Data abstraction simplifies database usage by hiding internal details, while data independence allows changes in database structure without affecting applications, making DBMS flexible and scalable.
DBMS architecture defines the logical design of a database system that determines how data is stored, accessed, and managed by users.
In 1-tier architecture, the database, DBMS, and application all reside on the same system.
User
|
Application + DBMS + Database
-
Used for local applications
-
No network involved
MS Access on a single computer
In 2-tier architecture, the client communicates directly with the database server.
Client (Application)
|
DBMS Server
|
Database
-
Client handles presentation and logic
-
Server handles database processing
Client–server applications using MySQL
In 3-tier architecture, the application is divided into three layers: presentation, application, and database.
Client (Presentation Layer)
|
Application Server (Business Logic)
|
Database Server (DBMS + Database)
-
Client sends request to application server
-
Application server processes logic
-
Database server stores data
- Database is not directly accessible to users
- Easy to add more users and servers
- Load is distributed across layers
- Changes in one layer do not affect others
- Failure in one layer does not crash entire system
Among all architectures, three-tier architecture is most widely used in modern DBMS applications due to its security, scalability, and maintainability.
A database system is accessed and managed by different types of users, and its overall control is handled by a Database Administrator (DBA).
-
Use database through predefined applications.
-
Do not have knowledge of DBMS.
Example: Bank customers using ATM.
-
Develop application programs to access database.
-
Use programming languages like Java, C++, and SQL.
Example: Software developers.
-
Directly interact with the database using SQL queries.
-
Have good knowledge of DBMS.
Example: Data analysts.
A Database Administrator (DBA) is a person responsible for the overall management, control, and maintenance of the database system.
-
Defines database structure using DDL.
-
Creates tables, views, and constraints.
-
Controls user access using authorization.
-
Prevents unauthorized access.
-
Takes regular database backups.
-
Restores data after system failure.
-
Optimizes queries and indexing.
-
Improves system efficiency.
-
Ensures data accuracy and consistency.
-
Enforces integrity constraints.
Database users interact with the DBMS in different ways, while the DBA plays a critical role in ensuring security, performance, and reliability of the database system.
⭐⭐⭐⭐ (10 Marks)
The Entity–Relationship (E-R) Model is a conceptual data model used to represent the structure of a database using entities, attributes, and relationships.
An entity is a real-world object that has an independent existence and can be uniquely identified.
Example: Student, Employee, Course
An attribute describes a property of an entity.
Example:
Student → Roll_No, Name, Age
A relationship represents an association between two or more entities.
Example:
Student enrolls in Course
A strong entity:
-
Has a primary key
-
Exists independently
Example:
Student (Roll_No is primary key)
A weak entity:
-
Does not have a primary key
-
Depends on a strong entity for identification
Dependent depends on Employee
-
Employee (Emp_ID) → Strong entity
-
Dependent (Name, Age) → Weak entity
Dependent is identified using Emp_ID + Name
| Strong Entity | Weak Entity |
|---|---|
| Has primary key | No primary key |
| Exists independently | Depends on strong entity |
| Represented by single rectangle | Represented by double rectangle |
The E-R model provides a clear and simple way to design databases by representing real-world data using entities, attributes, and relationships.
⭐⭐⭐ (Short Answer / 5–7 Marks)
Data modeling is the process of creating a conceptual representation of data and its relationships before designing a database.
-
High-level design using E-R diagrams
-
Independent of DBMS
-
Converts conceptual model into relational schema
-
Defines tables, keys, and relationships
-
Defines storage structure
-
Indexes, file organization, and memory allocation
-
Improves data understanding
-
Reduces redundancy
-
Ensures data consistency
-
Saves development time
Data modeling helps in designing a well-structured, efficient, and scalable database system.
⭐⭐ (Short Answer)
Database design is the process of organizing data structure by defining tables, attributes, relationships, and constraints to efficiently store and retrieve data.
-
Minimize data redundancy
-
Ensure data consistency and integrity
-
Improve data security
-
Efficient data retrieval
-
Easy maintenance and scalability
⭐⭐
-
Simple and easy to understand
-
Graphical representation of data
-
Helps in database planning
-
Improves communication between users and designers
-
Independent of DBMS
-
Not suitable for very complex systems
-
No standard notation universally followed
-
Cannot represent procedural logic
-
Large diagrams become confusing
-
Needs conversion to relational model
The Relational Model is a widely used logical database model in which data is stored in tables (relations) consisting of rows and columns.
It was proposed by E.F. Codd in 1970.
A relation is a table with rows and columns.
-
Rows → Represent individual records (tuples)
-
Columns → Represent attributes of data
Example: Student Table
| Roll_No | Name | Age | Course |
|---|---|---|---|
| 101 | Rohan | 20 | BTech |
| 102 | Anjali | 21 | BSc |
-
A tuple is a row in a relation.
-
Represents a single record of the table.
Example: (101, Rohan, 20, BTech)
-
An attribute is a column in a relation.
-
Represents a property of the entity.
Example: Roll_No, Name, Age, Course
- A domain is a set of permissible values for an attribute.
Example: Age → {18, 19, 20, 21…}
Course → {BTech, BSc, BCom}
-
A relation schema defines the name of the relation and its attributes.
-
Example: Student(Roll_No, Name, Age, Course)
| Term | Meaning | Example |
|---|---|---|
| Degree | Number of attributes in a relation | Student table → 4 |
| Cardinality | Number of tuples (rows) in a relation | Student table → 2 |
-
Uniquely identifies each tuple in a relation
-
Cannot have NULL values
Example: Roll_No in Student table
-
Set of attributes that can uniquely identify a tuple
-
One candidate key is chosen as primary key
Example: Roll_No, Email (both can uniquely identify Student)
-
Attribute in one table that refers to primary key of another table
-
Ensures referential integrity
Example:
Enrollment(Student_Roll_No → Student.Roll_No)
-
One or more attributes that can uniquely identify a tuple
-
May include extra attributes
Example: (Roll_No), (Roll_No, Name)
The relational model provides a simple, structured, and mathematical approach to store and manage data.
Keys ensure uniqueness and relationships, making the database reliable and consistent.
Codd’s Rules are a set of 13 rules proposed by E.F. Codd to define a fully relational database system.
They ensure data integrity, consistency, and reliability in relational databases.
- The system must qualify as a relational database and support relational capabilities fully.
-
All data is represented as values in tables (relations).
-
Everything, including metadata, must be stored in tables.
-
Every value in the database can be accessed by specifying table name, primary key, and column name.
-
Ensures no hidden data.
-
The DBMS must support NULLs for missing or inapplicable data.
-
NULLs are handled consistently in all operations.
- Changes in physical storage (e.g., indexing, file structure) do not affect application programs.
- Changes in logical schema (e.g., adding attributes or tables) do not affect user views or applications.
-
Integrity constraints (e.g., primary key, foreign key) should be stored in the catalog and not in application programs.
-
Allows constraints to be modified without affecting programs.
-
Low-level access (like pointers or record-level operations) cannot bypass integrity rules.
-
Ensures all database rules are enforced.
-
All views that are theoretically updatable must be updatable by the system.
-
Guarantees data consistency.
-
The database must contain a catalog (data dictionary) that describes all the database objects in relational terms.
-
Users can query the catalog using the same relational language (SQL).
Codd’s rules define a fully relational DBMS.
They ensure data consistency, integrity, independence, and accessibility, forming the foundation for modern relational databases.
-
Simplicity
- Data is stored in tables (relations), making it easy to understand and use.
-
Structural Independence
- Changes in table structure do not affect applications if keys and relationships are maintained.
-
Easy Data Manipulation
- Supports powerful query languages like SQL for inserting, updating, deleting, and retrieving data.
-
Data Integrity
- Enforces primary key, foreign key, and other constraints to maintain accuracy and consistency.
-
Flexibility
- Easily accommodates new data and changing relationships without redesigning the database.
-
Security
- Access can be controlled at table or column level to prevent unauthorized access.
-
Data Consistency
- Minimizes redundancy and ensures all data is consistent across tables.
-
Multi-user Support
- Multiple users can simultaneously access and update the database safely using transaction management.
Mapping an E-R model to a relational model is the process of converting entities, attributes, and relationships into tables (relations) in a database.
This is an essential step in database design.
-
Each strong entity becomes a relation (table).
-
Attributes of the entity become columns.
-
Primary key of the entity becomes the primary key of the table.
Example:
Entity: Student(Roll_No, Name, Age) → Table: Student(Roll_No PK, Name, Age)
-
Weak entity becomes a relation.
-
Primary key of the strong entity (owner) is included as foreign key.
-
Combination of weak entity’s partial key + owner key becomes the primary key.
Example:
Dependent(Name, Age) → Table: Dependent(Emp_ID FK, Name, Age, PK = (Emp_ID, Name))
-
Include primary key of one entity as a foreign key in the other entity.
-
Choose the side that frequently accesses the relationship.
Example:
Employee ↔ Passport (1:1)
Table: Employee(Emp_ID PK, Name, Passport_No FK)
- Include primary key of “one” side as foreign key in “many” side.
Example:
Department(Dept_ID PK, Name)
Employee(Emp_ID PK, Name, Dept_ID FK)
-
Create a separate relation for the relationship.
-
Include primary keys of both entities as foreign keys.
-
Combined keys become primary key of the relationship table.
Example:
Student ↔ Course (M:N)
Enrollment(Student_Roll_No FK, Course_ID FK, PK = (Student_Roll_No, Course_ID))
- Directly becomes a column in the table.
Example: Name → Name
- Break into component attributes and add each as a column.
Example: Full_Name(First, Last) → First_Name, Last_Name
- Create a separate table with foreign key referencing the main entity.
Example:
Phone_Numbers → Table: Student_Phone(Roll_No FK, Phone_No, PK=(Roll_No, Phone_No))
Mapping E-R model to relational model ensures that the conceptual design is correctly implemented in the relational database, maintaining keys, relationships, and constraints.
Data integrity refers to the accuracy, consistency, and correctness of data in a database over its entire lifecycle.
It ensures that the database remains reliable and error-free.
-
Ensures that values stored in a column are valid and within a specified domain.
-
Implemented using data types, constraints, and validation rules.
Example:
-
Age attribute in Student table must be between 18 and 30.
-
Course can only be {BTech, BSc, BCom}
-
Ensures that each row (tuple) in a table is uniquely identifiable.
-
Achieved by primary key constraint.
-
Primary key cannot be NULL.
Example:
- Roll_No in Student table is a primary key → cannot be NULL and must be unique
-
Ensures that foreign key values in one table must match primary key values in another table.
-
Maintains consistency among related tables.
Example:
-
Employee.Department_ID in Employee table must exist in Department.Dept_ID
-
Prevents orphan records
Data integrity constraints ensure that the database remains accurate, consistent, and reliable, which is crucial for relational database management systems.
Data manipulation refers to the process of adding, deleting, modifying, or retrieving data in a relational database.
It is performed using Data Manipulation Language (DML), but here we focus on the concepts, not SQL syntax.
-
Used to add new tuples (rows) to a relation (table).
-
Ensures that new data obeys integrity constraints.
Example:
Adding a new student record in Student table.
-
Used to remove existing tuples from a relation.
-
Must ensure referential integrity is not violated.
Example:
Removing a student who has left the college.
-
Used to modify existing data in one or more tuples.
-
Must maintain data integrity and consistency.
Example:
Changing a student’s course from BSc to BTech.
-
Refers to retrieving specific tuples from a table based on conditions.
-
Helps in querying only relevant data.
Example:
Select all students with Age > 20 from the Student table.
Data manipulation operations allow users to manage and maintain data efficiently in a relational database while ensuring integrity, consistency, and relevance of information.
SQL is a standard computer language used for managing and manipulating relational databases.
SQL is a high-level language that allows users to define, manipulate, control, and query data in a relational database.
-
To create and modify database structures
-
To insert, update, delete, and retrieve data
-
To control access and security of data
-
To manage integrity and consistency of the database
-
Used to define and modify database structures
-
Commands: CREATE, ALTER, DROP
Example: CREATE TABLE Student(...)
-
Used to manipulate data stored in tables
-
Commands: INSERT, UPDATE, DELETE, SELECT
Example: INSERT INTO Student VALUES(...)
-
Used to control access and permissions
-
Commands: GRANT, REVOKE
Example: GRANT SELECT ON Student TO User1
-
Used to manage transactions
-
Commands: COMMIT, ROLLBACK, SAVEPOINT
SQL provides a comprehensive set of commands for creating, manipulating, controlling, and querying databases, making it the core language for relational database management.
DDL (Data Definition Language) is used to define and manage database structures, such as tables, schemas, and indexes.
It focuses on the structure of the database, not the data itself.
-
Purpose: To create new database objects like tables, views, or databases.
-
Use: Defines the structure of tables and columns.
-
Example: Creating a table named
Student.
-
Purpose: To delete existing database objects permanently.
-
Use: Removes the table or database along with its structure and data.
-
Example: Dropping the
Studenttable.
-
Purpose: To modify the structure of an existing table.
-
Use: Can add, delete, or modify columns and constraints.
-
Example: Adding a new column
Courseto theStudenttable.
-
Purpose: To delete all data from a table while keeping its structure intact.
-
Use: Quickly removes all rows without logging individual deletions.
-
Example: Removing all records from the
Studenttable.
DML (Data Manipulation Language) is used to manipulate data stored in database tables.
It allows users to insert, update, delete, and retrieve data, but does not affect the table structure.
-
Purpose: To retrieve data from one or more tables.
-
Optional Clauses:
-
WHERE: Selects rows based on a condition.
Example: Select students with Age > 20. -
DISTINCT: Retrieves unique values only.
Example: Select unique courses from Student table. -
ORDER BY: Sorts the result ascending or descending.
Example: Order students by Name in ascending order.
-
-
Purpose: To add new rows (tuples) into a table.
-
Use: Adds new records while maintaining integrity constraints.
-
Example: Adding a new student to the Student table.
-
Purpose: To modify existing data in one or more rows.
-
Use: Updates values while maintaining data consistency.
-
Example: Changing the Course of a student from BSc to BTech.
-
Purpose: To remove existing rows from a table.
-
Use: Deletes records based on conditions, maintaining referential integrity.
-
Example: Removing students who have left the college.
-
INT (Integer)
-
Stores whole numbers without decimals.
-
Example: 10, 250, -5
-
-
FLOAT (Floating Point)
-
Stores decimal numbers (approximate values).
-
Example: 3.14, 0.5, -2.75
-
-
CHAR (Character)
-
Stores fixed-length text.
-
Example: CHAR(5) → “Hello”
-
-
VARCHAR (Variable Character)
-
Stores variable-length text.
-
Example: VARCHAR(20) → “Database”
-
-
DATE
-
Stores date values in YYYY-MM-DD format.
-
Example: 2025-12-20
-
-
BOOLEAN
-
Stores TRUE or FALSE values.
-
Example: TRUE, FALSE
-
Aggregate functions perform calculations on a set of values and return a single summary value.
They are commonly used in reporting, grouping, and analytics.
| Function | Purpose | Example |
|---|---|---|
| COUNT() | Counts the number of rows | COUNT(*) → Number of students |
| SUM() | Adds values of a column | SUM(Marks) → Total marks of all students |
| AVG() | Calculates average value | AVG(Marks) → Average marks |
| MIN() | Returns minimum value | MIN(Age) → Youngest student |
| MAX() | Returns maximum value | MAX(Age) → Oldest student |
-
GROUP BY is used to group rows with same values in one or more columns.
-
Aggregate functions can then summarize each group separately.
Example:
- Total marks obtained by students in each course.
SELECT Course, SUM(Marks)
FROM Student
GROUP BY Course;
- HAVING is used to filter groups based on aggregate values (like WHERE but for groups).
Example:
- Courses with total marks > 200.
SELECT Course, SUM(Marks)
FROM Student
GROUP BY Course
HAVING SUM(Marks) > 200;Aggregate functions allow summarizing and analyzing data efficiently, especially when combined with GROUP BY and HAVING clauses.
JOINs are used to combine rows from two or more tables based on a related column between them.
They help in retrieving related data from multiple tables efficiently.
-
Definition: Returns only the matching rows from both tables.
-
Example: Retrieve students with their course details where enrollment exists.
SELECT Student.Name, Course.Course_Name
FROM Student
INNER JOIN Enrollment
ON Student.Roll_No = Enrollment.Student_Roll_No;-
Definition: Returns all rows from the left table, and matching rows from the right table.
-
Example: List all students and their courses, including students who are not enrolled.
SELECT Student.Name, Course.Course_Name
FROM Student
LEFT JOIN Enrollment
ON Student.Roll_No = Enrollment.Student_Roll_No;-
Definition: Returns all rows from the right table, and matching rows from the left table.
-
Example: List all courses and students enrolled, including courses with no students.
SELECT Student.Name, Course.Course_Name
FROM Student
RIGHT JOIN Enrollment
ON Student.Roll_No = Enrollment.Student_Roll_No;-
Definition: Returns all rows when there is a match in either left or right table.
-
Example: List all students and courses, including unmatched students or courses.
SELECT Student.Name, Course.Course_Name
FROM Student
FULL JOIN Enrollment
ON Student.Roll_No = Enrollment.Student_Roll_No;JOIN expressions allow combining related data from multiple tables.
-
INNER JOIN: Only matches
-
LEFT JOIN: All left + matches
-
RIGHT JOIN: All right + matches
-
FULL JOIN: All rows from both tables
Normalization is the process of organizing data in a database to reduce redundancy and avoid anomalies during insertion, deletion, or update.
-
Remove redundancy – avoids duplicate data.
-
Prevent anomalies – eliminates insert, update, and delete anomalies.
-
Improve data consistency – ensures data is accurate and reliable.
-
Definition: Table is in 1NF if all columns have atomic (indivisible) values and there are no repeating groups.
-
Example:
| Student_ID | Name | Courses |
|------------|---------|---------------|
| 101 | Rohan | BTech, Math | → Not 1NF
Converted to 1NF:
| Student_ID | Name | Course |
|---|---|---|
| 101 | Rohan | BTech |
| 101 | Rohan | Math |
-
Definition: Table is in 2NF if it is in 1NF and no non-prime attribute is partially dependent on part of a composite primary key.
-
Example:
| Student_ID | Course_ID | Course_Name | Grade |
Partial dependency: Course_Name depends only on Course_ID → Not 2NF
Converted to 2NF:
-
Student_Course(Student_ID, Course_ID, Grade)
-
Course(Course_ID, Course_Name)
-
Definition: Table is in 3NF if it is in 2NF and no transitive dependency exists.
-
Example:
| Emp_ID | Emp_Name | Dept_ID | Dept_Name |
Dept_Name depends on Dept_ID, not Emp_ID → Transitive dependency
Converted to 3NF:
-
Employee(Emp_ID, Emp_Name, Dept_ID)
-
Department(Dept_ID, Dept_Name)
-
Definition: Stronger version of 3NF.
-
Every determinant must be a candidate key.
-
Resolves anomalies that 3NF cannot handle in some rare cases.
Normalization ensures efficient database design, reduces redundancy, prevents anomalies, and improves data consistency.
-
1NF → Atomic values
-
2NF → Remove partial dependency
-
3NF → Remove transitive dependency
-
BCNF → Stronger than 3NF
A functional dependency (FD) is a relationship between two sets of attributes in a relation where the value of one attribute (or set of attributes) uniquely determines the value of another attribute (or set of attributes).
It is a fundamental concept in normalization.
- Represented as:
A→BA \rightarrow BA→B
-
Meaning: Attribute B is functionally dependent on attribute A.
-
Example:
-
In a Student table:
-
Roll_No → Name
-
This means Roll_No uniquely determines the Name.
-
-
-
Ensures atomic values, which reduces repeating groups.
-
Functional dependencies are identified to remove multivalued attributes.
-
Eliminates partial dependency.
-
A non-prime attribute should be fully dependent on the whole primary key, not part of it.
-
FD Example:
-
Composite Key (Student_ID, Course_ID) → Grade
-
Grade depends on the whole key, not just Student_ID.
-
-
Eliminates transitive dependency.
-
FD Example:
-
Emp_ID → Dept_ID → Dept_Name
-
Dept_Name depends transitively on Emp_ID → move Dept_Name to Department table.
-
-
Ensures that every determinant is a candidate key.
-
FD Example:
- If Course → Instructor, but Course is not a candidate key, BCNF requires decomposition.
Decomposition is the process of breaking a single relation (table) into two or more relations to eliminate redundancy and anomalies while maintaining data integrity.
-
Achieve higher normal forms (1NF, 2NF, 3NF, BCNF)
-
Reduce redundancy and update, insert, delete anomalies
-
Ensure lossless join and dependency preservation
-
A decomposition is lossless if the original table can be reconstructed exactly by joining the decomposed tables.
-
Ensures no data is lost after decomposition.
Example:
Original Table: Student_Course(Student_ID, Course_ID, Grade)
-
Decompose into:
-
Student(Student_ID, Name)
-
Enrollment(Student_ID, Course_ID, Grade)
-
Join: Student ⋈ Enrollment → Original table
-
A decomposition preserves functional dependencies if all original dependencies can be enforced without joining the tables.
-
Ensures data integrity rules remain intact in decomposed tables.
Original Table:
| Emp_ID | Emp_Name | Dept_ID | Dept_Name |
- Problem: Dept_Name depends on Dept_ID → transitive dependency (not 3NF)
Decomposition into 3NF:
-
Employee(Emp_ID, Emp_Name, Dept_ID)
-
Department(Dept_ID, Dept_Name)
-
Now, redundancy and anomalies are removed
-
Lossless: Joining Employee ⋈ Department reconstructs original table
-
Dependency preserved: Emp_ID → Dept_ID and Dept_ID → Dept_Name
Decomposition is essential for efficient, anomaly-free database design, ensuring lossless join and dependency preservation while achieving higher normal forms.
A transaction is a logical unit of work in a database that performs one or more operations (like read, write, update) on the database.
-
A transaction is atomic, meaning it is either fully executed or not executed at all.
-
Transactions ensure data consistency, integrity, and reliability.
-
Definition: A transaction is atomic, meaning it is indivisible.
-
Either all operations succeed or none are applied.
-
Example:
-
Transferring $100 from Account A to Account B:
-
Debit A by $100
-
Credit B by $100
-
-
If any step fails, rollback occurs and no money is transferred.
-
-
Definition: Transaction must take the database from one valid state to another, maintaining all integrity constraints.
-
Example:
-
Inserting a student record with a valid Roll_No and Age.
-
Database rules like primary key uniqueness remain intact after transaction.
-
-
Definition: Transactions are executed independently.
-
Intermediate results of a transaction are not visible to other transactions until commit.
-
Example:
-
Two transactions update the same account simultaneously.
-
Isolation ensures final balance is correct, preventing dirty reads.
-
-
Definition: Once a transaction is committed, its changes are permanent, even if there is a system crash.
-
Example:
- After transferring money, even if the system fails, the updated balances remain in the database.
Transactions ensure reliable, consistent, and concurrent operations in a database.
The ACID properties guarantee atomicity, consistency, isolation, and durability, which are essential for a robust DBMS.
Concurrency in a database occurs when multiple transactions are executed simultaneously.
-
It improves system efficiency and resource utilization.
-
However, it can cause anomalies or inconsistencies if not controlled properly.
-
Definition: Occurs when two transactions simultaneously update the same data, and one update overwrites the other.
-
Example:
-
Transaction T1 reads Account A = 100 and adds 50 → 150
-
Transaction T2 reads Account A = 100 and subtracts 30 → 70
-
Without control, final balance may become 70 instead of 120, losing T1’s update.
-
-
Definition: Occurs when intermediate data produced by one transaction is read by another transaction, leading to wrong results.
-
Example:
-
T1 transfers $100: Debit A = 500 → 400, Credit B = 200 → 300
-
T2 reads B = 200 during transfer → temporary wrong balance
-
-
Definition: Occurs when a transaction reads data written by another uncommitted transaction.
-
Example:
-
T1 updates Employee salary to 50,000 but not yet committed
-
T2 reads salary as 50,000
-
If T1 rolls back, T2 has inconsistent data (dirty read).
-
Concurrent execution improves performance, but can lead to lost updates, temporary inconsistency, and dirty reads.
- To prevent these problems, DBMS uses concurrency control mechanisms like locking, timestamps, and isolation levels.
A deadlock occurs when two or more transactions are waiting indefinitely for resources held by each other, so none of them can proceed.
-
Mutual Exclusion
-
At least one resource must be held in a non-shareable mode.
-
Only one transaction can use it at a time.
-
-
Hold and Wait
- A transaction holds at least one resource and waits for additional resources held by other transactions.
-
No Preemption
- Resources cannot be forcibly taken from a transaction; they are released only voluntarily.
-
Circular Wait
- A set of transactions exists such that each transaction is waiting for a resource held by the next transaction in a cycle.
-
DBMS monitors resource allocation graph or uses wait-for graph.
-
If a cycle is detected, a deadlock exists.
-
Action: Abort one or more transactions to break the cycle.
-
Eliminate one or more necessary conditions:
-
Mutual exclusion: Make resources sharable if possible
-
Hold and wait: Require transactions to request all resources at once
-
No preemption: Preempt resources from transactions
-
Circular wait: Impose an ordering of resource requests
-
Deadlocks block concurrent transactions, reducing system performance.
Detection and prevention techniques ensure smooth transaction execution in DBMS.
A deadlock occurs when two or more transactions are waiting indefinitely for resources held by each other, so none of them can proceed.
-
Mutual Exclusion
-
At least one resource must be held in a non-shareable mode.
-
Only one transaction can use it at a time.
-
-
Hold and Wait
- A transaction holds at least one resource and waits for additional resources held by other transactions.
-
No Preemption
- Resources cannot be forcibly taken from a transaction; they are released only voluntarily.
-
Circular Wait
- A set of transactions exists such that each transaction is waiting for a resource held by the next transaction in a cycle.
-
DBMS monitors resource allocation graph or uses wait-for graph.
-
If a cycle is detected, a deadlock exists.
-
Action: Abort one or more transactions to break the cycle.
-
Eliminate one or more necessary conditions:
-
Mutual exclusion: Make resources sharable if possible
-
Hold and wait: Require transactions to request all resources at once
-
No preemption: Preempt resources from transactions
-
Circular wait: Impose an ordering of resource requests
-
Deadlocks block concurrent transactions, reducing system performance.
Detection and prevention techniques ensure smooth transaction execution in DBMS.
A transaction is a logical unit of work that performs one or more database operations (like read, write, update) as a single unit.
-
Either the transaction completes fully (commit) or does not execute at all (abort/rollback).
-
Ensures consistency, integrity, and reliability of the database.
-
Atomicity: All operations of a transaction succeed or none succeed.
-
Consistency: Transaction moves database from one valid state to another.
-
Isolation: Transactions execute independently without affecting each other.
-
Durability: Committed changes are permanent, even if system fails.
-
Read-only Transaction
-
Only reads data and performs no updates.
-
Example: Viewing account balance.
-
-
Update Transaction
-
Reads and modifies data.
-
Example: Transferring money between accounts.
-
-
Single-user Transaction
-
Only one user executes the transaction at a time.
-
Example: Updating personal profile in a local database.
-
-
Multi-user Transaction
-
Multiple users concurrently access and modify data.
-
Requires concurrency control to avoid anomalies.
-
-
Short Transactions
-
Execute quickly, often read/write a few records.
-
Example: Updating a single student's marks.
-
-
Long Transactions
-
Take more time and may involve multiple operations.
-
Example: Processing end-of-month payroll for all employees.
-
| State | Description |
|---|---|
| Active | Transaction is executing. |
| Partially Committed | Final operation executed, but not yet committed. |
| Committed | All operations successfully completed; changes permanent. |
| Failed | Some operation failed; transaction cannot commit. |
| Aborted/Rolled Back | Transaction rolled back to initial state; no changes saved. |
| Terminated | Transaction ends after commit or rollback. |
-
System Crash: Hardware or software failure during transaction.
-
Transaction Failure: Logical error or constraint violation.
-
Media Failure: Disk or storage corruption.
- Recovery mechanisms like logs, checkpoints, and rollbacks restore consistency.
-
Ensures data consistency and integrity
-
Prevents lost updates, dirty reads, and temporary inconsistencies
-
Supports concurrent multi-user environments
-
Guarantees durability of committed changes
Concurrency occurs when multiple transactions are executed simultaneously in a database system.
-
Improves system throughput and resource utilization.
-
Without proper control, concurrency can lead to data anomalies and inconsistencies.
-
Read-Only Concurrency
-
Transactions only read data, no updates.
-
Example: Multiple users viewing the same report.
-
-
Update Concurrency
-
Transactions read and write data simultaneously.
-
Requires concurrency control to avoid conflicts.
-
Example: Two users updating the same account balance.
-
-
Definition: Two transactions simultaneously update the same data, and one update overwrites the other.
-
Example:
-
T1 reads Account A = 100 and adds 50 → 150
-
T2 reads Account A = 100 and subtracts 30 → 70
-
Final balance = 70 (T1 update lost)
-
-
Definition: A transaction reads intermediate data of another transaction that has not yet committed, causing inconsistency.
-
Example:
-
T1 transfers $100: Debit A → 400, Credit B → 300
-
T2 reads B = 200 during transfer → temporary wrong balance
-
-
Definition: A transaction reads data written by another uncommitted transaction.
-
Example:
-
T1 updates Employee salary to 50,000 (not committed)
-
T2 reads salary = 50,000
-
T1 rolls back → T2 read invalid data (dirty read)
-
-
Concurrency improves database performance, but may cause lost updates, temporary inconsistencies, and dirty reads.
-
To prevent these problems, DBMS uses locks, 2PL, timestamps, and isolation levels.