Skip to content

davideFerigato/BachelorThesis

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

24 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

Query Optimization for Business Intelligence Systems: Advanced Techniques to Improve Performance and Decision Support

License: MIT GitHub last commit Compilation status LaTeX SQL Server Azure Synapse

Author: Davide Maria Ferigato
Supervisor: Prof. Domenico Beneventano
University: University of Modena and Reggio Emilia, Department of Engineering "Enzo Ferrari"
Degree: Bachelor's Degree in Computer Engineering (L-08)
Academic Year: 2024/2025

This repository contains the complete project for my bachelor's thesis, which addresses the topic of query performance optimization in Business Intelligence systems. The work analyzes the impact of ETL process evolution, physical design techniques (indexing, partitioning, compression, caching), and the implementation of materialized views on Microsoft platforms (SQL Server and Azure Synapse).

"A ship in port is safe, but that's not what ships are built for." — Grace Murray Hopper

📥 Direct Download

📋 Table of Contents

📄 Abstract

This thesis analyzes advanced techniques for query optimization in Business Intelligence systems, considering Data Warehouse and ETL architectures. The work integrates theory and practice: from physical design (indexing, partitioning, compression, caching) to view materialization, up to measurable performance metrics and governance. The goal is to define reproducible procedures to reduce latency and I/O while preserving data quality and freshness.

The discussion follows a progressive path: fundamentals of BI and OLTP/OLAP differences, ETL quality and structural metrics, physical design choices for analytical workloads, and the role of materialized views in reducing query costs. The applicative chapter analyzes SQL Server and Azure Synapse: creation requirements, automatic query rewrite, data distribution, statistics, use of CTAS and temporary tables, diagnostic tools, and overhead management.

The results converge into operational guidelines: columnstore and partitioning as a physical base, views designed on shared joins and aggregations with consistent distribution, validation through automatic rewrite, and overhead thresholds for rebuild. Overall, the coordinated adoption of these techniques stably reduces latencies and improves the overall efficiency of the Data Warehouse, providing applicable and replicable guidelines.

🗂️ Repository Structure

The structure follows a standard LaTeX project organization, with separate chapters and supplementary materials logically organized.

├── main.tex                                    # Main LaTeX file
├── main.bib                                     # Bibliography
├── thesis.pdf                                    # Thesis in PDF format (final version)
├── .gitignore                                    # Git ignore file
├── LICENSE                                       # MIT License
│
├── chapters/                                     # Thesis chapters (included by main.tex)
│   ├── Abstract.tex
│   ├── introduction.tex
│   ├── Fondamenti_della_Business_Intelligence_e_Data_Warehouse.tex
│   ├── Evoluzione_e_Qualità_dei_Processi_ETL.tex
│   ├── Viste_Materializzate.tex
│   ├── Ottimizzazione_Fisica_del_Data_Warehouse.tex
│   ├── Viste_materializzate_OLAP_su_SQL_Server_e_Synapse.tex
│   └── conclusion.tex
│
├── adds/                                         # Supplementary materials (images, snippets, tables)
│   ├── cover-page/                               # Cover images
│   ├── Evoluzione_e_Qualità_dei_Processi_ETL/    # ETL chapter assets
│   ├── Fondamenti_della_Business_Intelligence_e_Data_Warehouse/
│   ├── Ottimizzazione_Fisica_del_Data_Warehouse/
│   ├── Viste_Materializzate/
│   └── Viste_materializzate_OLAP_su_SQL_Server_e_Synapse/
│
├── code/                                          # All SQL scripts and code examples
│   ├── sql-server/                                 # SQL Server examples
│   ├── synapse/                                    # Azure Synapse examples
│   ├── etl-example/                                # ETL-related snippets
│   └── README.md                                   # Detailed code documentation
│
└── presentation/                                  # Defense materials
    └── slides.pdf                                  # Presentation slides

💻 Technologies Used

  • Databases & Data Warehousing: Microsoft SQL Server 2019, Azure Synapse Analytics (Dedicated SQL Pool)
  • Languages: T-SQL, LaTeX
  • Key Concepts: Data Warehousing, Dimensional Modeling, ETL, Materialized Views (Indexed Views / Materialized Views), Query Optimization, OLAP, Columnstore Indexes, Partitioning, Compression, Caching, Query Hints (NOEXPAND/EXPAND VIEWS), CTAS, TPC-DS Benchmark
  • Tools: Overleaf, SQL Server Management Studio (SSMS), Azure Data Studio, Git

🔍 Code Examples

All SQL scripts discussed in the thesis are available in the code/ folder, organized by platform. Each script is commented and ready to run in a suitable environment.

Platform Focus Area Key Scripts
SQL Server Indexed Views, Query Hints, Columnstore Monitoring 01_creazione_vista_indicizzata.sql
02_NOEXPAND_EXPAND_VIEWS.sql
03_columnstore_rowgroup_stats.sql
Azure Synapse Materialized Views, CTAS, Optimization Cycle, TPC-DS Evaluation 01_create_materialized_view.sql
02_optimization_loop_ctas_mv.sql
03_operational_cycle_explain.sql
04_tpcds_evaluation.sql
ETL Examples Miscellaneous snippets codice_partizione.sql

For a detailed description of each script and instructions on how to use them, please refer to the code/README.md file.

🚀 How to Compile the Thesis (LaTeX)

To compile the thesis locally and generate the PDF:

pdflatex main.tex
biber main
pdflatex main.tex
pdflatex main.tex

Alternatively, you can import the entire folder into Overleaf and compile online (make sure to also upload the chapters/ and adds/ folders).

📜 License

The source code in this repository is distributed under the MIT License.
The thesis text is copyright © 2025 Davide Maria Ferigato. All rights reserved.

📧 Contact

Davide Maria Ferigato


⭐ If you find this project interesting or useful, please leave a star!

About

Bachelor thesis on query optimization for Business Intelligence systems. Analysis of ETL, physical design (indexing, partitioning, compression, caching), and materialized views on SQL Server and Azure Synapse. Includes SQL scripts and TPC-DS evaluations.

Topics

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Contributors