ERD vs. Star Schema Analysis. Exploring Database Designs through Sales Data
ERD vs. Star Schema Analysis. Exploring Database Designs through Sales Data In this assignment, you have an RDBMS that represents sales information for a retail company. Your task is to explore this dataset through two different database designs: Entity-Relationship Diagram (ERD) model and Star Schema. You’ll write SQL queries for both models, focusing on the implementation of various SQL operations and how they differ between these database architectures. Assignment Objectives • • • Learn to populate data and establish relationships in database tables. Understand and apply SQL queries in two distinct database designs. Analyze the similarities and differences in query construction and execution between the ERD and Star Schema models. Although each one has its different design, the ERD and STAR SCHEMA in the assignment are equivalent when you perform the tasks. Part 1: ERD Model Structure The ERD model consists of the following tables and columns: Customers Table • • • • • • • CustomerID (PK, int) FirstName (varchar/string) LastName (varchar/string) Email (varchar/string) Address (varchar/string) City (varchar/string) Country (varchar/string) Orders Table • • • • OrderID (PK, int) CustomerID (FK, int, referencing Customers.CustomerID) OrderDate (date) TotalAmount (decimal/float) Products Table • • • • ProductID (PK, int) ProductName (varchar/string) Price (decimal/float) Category (varchar/string) OrderDetails Table • • • • • OrderDetailID (PK, int) OrderID (FK, int, referencing Orders.OrderID) ProductID (FK, int, referencing Products.ProductID) Quantity (int) UnitPrice (decimal/float) Second, the Star Schema Tables FactSales Table • OrderDetailID (int, PK): Reflects the unique identifier for each sale, • • • • • • • mirroring OrderDetails.OrderDetailID. OrderID (int, FK): Links to the Orders table in the ERD model. CustomerID (int, FK): Links to DimCustomers.CustomerID. ProductID (int, FK): Links to DimProducts.ProductID. OrderDate (date, FK): Links to DimTime.OrderDate. Quantity (int): Represents the quantity of products sold. UnitPrice (decimal/float): Price per unit of the product. TotalAmount (decimal/float): Calculated as Quantity * UnitPrice. DimCustomers Table • Mirrors the Customers table from the ERD Model with CustomerID as the primary key. DimProducts Table • Mirrors the Products table from the ERD Model with ProductID as the primary key. DimTime Table • OrderDate (date, PK): Unique dates from Orders in the ERD Model. • Year (int): The year extracted from OrderDate. • Month (int): The month extracted from OrderDate. • Day (int): The day extracted from OrderDate. The assignmnt is modifed to include this. ———————————————————————– Queries Assignment Your task is to write SQL queries for both the ERD model and the Star Schema that perform the following operations. Both queries should be equivalent. Total Sales by Category • Calculate the total sales amount for each product category. • • • • • • • • • • • • • This covers: Grouping, aggregation. Total Sales by Category (must be documented) #Total Sales by Category #SQL query for ERD model #Calculates total sales by category in the ERD model. #Execution Time: X milliseconds #SQL query for Star Schema #Calculates total sales by category in the Star Schema. #Execution Time: Y milliseconds #The ERD Model query is (X – Y) milliseconds faster/slower than the Star Schema query. Average Order Value by City • Determine the average order value for customers in each city. • This covers: Joining tables, grouping, calculating averages. List Unique Products Sold • Generate a list of unique products sold. • This covers: Distinct selections, potentially joining tables. Monthly Order Count • Count the number of orders placed in each month. • This covers: Date functions (extracting month), counting, grouping. The goal is to measure the execution time for each query, providing insight into their performance in different database models (ERD vs. Star Schema). Execute Queries Using Python: Utilize Python scripting to execute these queries against your database. The script should be structured to measure and record the time taken to execute each query. Record Execution Time: After executing each query, record the time taken directly below the query for easy reference and comparison. For example: 1. Total Sales by Category • ERD Model: • [SQL Query] • Execution Time: [Recorded Time] • Star Schema: • [SQL Query] • Execution Time: [Recorded Time] 2. Average Order Value by City 3. List Unique Products Sold 4. etc. Attached is an example script, similar to the one we reviewed earlier, now including a timing module for accurate performance tracking. This script, refined for simplicity and effectiveness, can be used as a straightforward template. Just replace the SQL queries within this script with those required for each of your tasks, using the same structure as demonstrated in the attachment. Should you choose to, you can simply copy the attached .py file to record all query timings. Just insert the queries you’ve prepared as answers into the designated sections of the script. ———————– FURTHER INFORMATION AND EXPLANATION TIPS ————————Please include a screenshot of the design and the relationship. You can use XAMPP or any other software. Data Generation • • • • Ensure the data is realistic (i.e., valid email formats, reasonable names). Include a range of values to adequately test queries (i.e., orders across different months and years, a variety of product categories) Best way is always to use simple Python generator to fill the tables according to their stricture. This is the easiest and most efficient /popular way if you can’t find a fit dataset. • If you use this method, please attach the Python file. Once you write the code (like 8 lines), it can generate any number. Please make the dataset large enough. Like 10k or more (you can go for large dataset using the same script, if you prefer). Summarization and deliverables: • • • • • • • • • Begin by creating the ERD, following the guidelines in the page above. Put the commands in a text file so that I can copy it and paste it in the interpreter (like the text file that has all commands to create ur three databases and table). Next, develop the Star Schema as described in the second text file. Put the commands so that I can copy it and paste it in the interpreter. Populate the database tables using a Python generator script, targeting a dataset of at least 50,000 records for effective comparison. Larger datasets is better for querying big data. I recommend using the library Faker in Python. It’s designed for such situations. For ease, use the attached Python script for recording query execution times. Attach it in .py format with comments and results for each. Only include a portion of the generated data in your final submission (in the text file), but ensure the whole dataset is referenced and in separate file (in whatever format you prefer). You can find a way to reference the dataset file to the queries and put them all in the text file. Write the SQL queries for each task in the two text files (ERD and Star Schema), and then run them in Python code to measure the execution time, and place the timing beneath them (Like the shown Record Execution Time example). Attach the Python file in .py format only and document the code. You can place them in a text file as well. Assemble all documents into a single Zip file. Name your Zip file as YourLastname_Assignment2_10.zip The assignment is straightforward. You only need to populate the data, write the queries, and measure the time. Make sure about writing equivalent queries in each design and test the performance for each and record the time. Due date: The assignment is due on April 4 by 11:50 PM, giving you two weeks. Please don’t hesitate to contact either Brunda or myself if you have any questions. Also, Please ensure your emails focus on seeking guidance rather than direct solutions to questions. The goal is for you to independently arrive at these answers. While I’m here to assist with other matters, I cannot provide the exact solutions for writing queries. Additionally, I encourage you not to wait until the last day to submit your work. Further tips: All relationships in ERD are One-to-Many Start schema, as fact table with it dimension, is many-to-one ERD Model Relationships • • • Customers and Orders: One-to-Many Relationship Orders and OrderDetails: One-to-Many Relationship Products and OrderDetails: One-to-Many Relationship Star Schema Relationships In the Star Schema, the relationships are centered around the FactSales table, with dimension tables providing descriptive attributes: • FactSales and DimCustomers: Many-to-One Relationship • FactSales and DimProducts: Many-to-One Relationship • FactSales and DimTime: Many-to-One Relationship In the Star Schema, FactSales acts as the central fact table, capturing transactional data. The dimension tables (DimCustomers, DimProducts, DimTime) provide the context for these transactions, such as who made the purchase, what was purchased, and when it was purchased. You have 3 attempts to upload the assignment and I’ll count the last one. All the best-
Collepals.com Plagiarism Free Papers
Are you looking for custom essay writing service or even dissertation writing services? Just request for our write my paper service, and we'll match you with the best essay writer in your subject! With an exceptional team of professional academic experts in a wide range of subjects, we can guarantee you an unrivaled quality of custom-written papers.
Get ZERO PLAGIARISM, HUMAN WRITTEN ESSAYS
Why Hire Collepals.com writers to do your paper?
Quality- We are experienced and have access to ample research materials.
We write plagiarism Free Content
Confidential- We never share or sell your personal information to third parties.
Support-Chat with us today! We are always waiting to answer all your questions.