capstone4ds · Rahulreddyputta · Jan 21, 2025 · Jan 29, 2025 · Jan 29, 2025 · Feb 3, 2025
diff --git a/Capstone_final.html b/Capstone_final.html
diff --git a/Capstone_final.qmd b/Capstone_final.qmd
diff --git a/CreditCardFraudDetection.html b/CreditCardFraudDetection.html
diff --git a/CreditCardFraudDetection.qmd b/CreditCardFraudDetection.qmd
diff --git a/Online Payment Fraud Detection using Machine Learning draft.pdf b/Online Payment Fraud Detection using Machine Learning draft.pdf
diff --git a/Online Payment Fraud Detection using Machine Learning_Final.pdf b/Online Payment Fraud Detection using Machine Learning_Final.pdf
diff --git a/Online Payment Fraud Detection using Machine Learning__Final.pdf b/Online Payment Fraud Detection using Machine Learning__Final.pdf
diff --git a/Online Payment Fraud Detection using Machine Learning__final.pdf b/Online Payment Fraud Detection using Machine Learning__final.pdf
diff --git a/Online Payment Fraud Detection using Machine Learning_draft.pdf b/Online Payment Fraud Detection using Machine Learning_draft.pdf
diff --git a/index.html b/index.html
diff --git a/index.qmd b/index.qmd
diff --git a/week 2 review.md b/week 2 review.md
@@ -0,0 +1,73 @@
+      Supply Chain Optimization by using Gen AI
+
+Introduction:
+
+
+In light of the growing complexity of global supply chains in the age of Industry 4.0, there is a pressing need for sophisticated technologies to help increase operational efficiency and resilience. Supply chain optimization in any supply chain management is meant at reducing costs, increasing the delivery time and being able to adjust to the disruptions. Nevertheless, the achievement of these goals is difficult because the data is limited, there are privacy concerns and the supply chain environments are dynamic.
+
+In this study, we attempt to answer these challenges via the integration of artificial intelligence (AI), in particular synthetic data generation. Through use of the Gaussian Copula within the Synthetic Data Vault (SDV) framework, the study creates high fidelity synthetic datasets that resemble real world data while maintaining the integrity of sensitive observations. Then these datasets are enriched with key variables - supplier reliability, season demand, and external disruptions - to train machine learning models for predicting shipping delays and facilitating supply chain operations.
+
+The predictions are also validated through simulation testing under dynamic conditions inspired by the idea of digital supply chain twins. The results highlight the feasibility of combining generative AI with machine learning to architect scalable and privacy preserving supply chain management solutions in the modern world. The concept of this approach is in line with the rising degree of importance of resilience, sustainability, and ethics in data practices at the global supply chain.
+
+
+Background/Motivation
+
+Context: It studies the growing complexity of the global supply chain while addressing their needs for efficiency, resilience, and their ability to adapt to constraints such as data scarcity, privacy concerns and dynamic environments.
+
+Research Gap: Real world supply chain data is often limited and sensitive, and the traditional optimization methods are not particularly good tools for real time monitoring and predictive analysis. Using AI driven approaches such as synthetic data generation and machine learning, the study seeks to fill this gap.
+
+Significance: The research achieves better data availability without compromising the privacy of the underlying information by leveraging synthetic datasets in training the models and performing the experimentation. This is important because of the supply chain resilience and decision making that is key in the age of Industry 4.0.
+
+Methods Used
+
+Synthetic Data Generation:
+Replicated real world data properties using the Synthetic Data Vault (SDV) framework with a Gaussian Copula model while preserving privacy.
+Features revealing operationally meaningful insights (e.g., supplier reliability, seasonal demand) are added to the enriched dataset.
+Descriptive statistics and visual comparisons towards valided data fidelity.
+
+Machine Learning Model:
+Random Forest Classifier was used to predict shipping delays on enriched synthetic data.
+Car price, sales, shipping duration and one hot encoded seasonal demand variables were the features.
+Precison, Recall, F1-score, and AUC ROC were used to evalute the perfomance of the model.
+
+Simulation Testing:
+Supplier reliability and external disruptions were varied in dynamic real world scenarios.
+Modeling supplier reliability fluctuations was achieved through the use of random noise and through probabilistic disruption.
+Simulated outcomes were validated against predictions of the model.
+
+Visualization and Dashboarding:
+Created an interactive dashboard with Python’s Dash framework.
+Supplied metrics like reliability, delay duration and resulting impact of interruptions to help make decisions.
+
+Significance of the Work
+
+Key Findings: With 1,000 to 87,700 records, the synthetic dataset was near perfect with respect to fidelity to the original data. In static conditions, the Random Forest Classifier performed with 100% accuracy, precision, and recall. Delay was found to be highly sensitive to external disruptions and low supplier reliability using simulations.
+
+Broader Impact: It validates the use of generative AI to optimize supply chain whilst addressing the ethical concerns of data privacy. It offers actionable insights to stakeholders who can predict and reduce risks in line with sustainable supply chain management principles.
+
+Implications: The case study demonstrates the scalability of the combination of synthetic data with AI to enable resolution of complex supply chain challenges. It shows how digital supply chain twins can be integrated to generate proactive risk management and resilience.
+
+Relevance to other work
+References: 
+                        It expands beyond Min’s (2010) focus on AI tools for supply chain analytics and Dubey et. al’s (2020) talk of AI driven supply chain resilience. It uses the concept of digital supply chain twins put forward by Ivanov and Dolgui (2020) for disturbance management.
+
+Differentiation:  Specific to the generation of synthetic data to address data limitations, and a less explored area in prior studies. It introduces simulation testing as a dynamic validation technique connecting the theoretical models with the real world applications.
+
+Cited Seminal Works: References foundational texts like Chopra & Meindl (2007) for supply chain strategies and Sánchez-Flores et al. (2020) for supply chain management which is sustainable.
+
+Relevance to Capstone Project 
+
+Applicability: The method that the study takes to generate synthetic data will help you in deciding on how to pre process and what do to for privacy preservation of the data that you use. In similar contexts such as fraud detection or loan prediction, machine learning models like Random Forests offer robust baseline results that can be used for predictive modelling.
+
+Incorporation: We can adapt the methodology of enriching datasets with the operational features to use financial or customer behavior attributes in your project. Dash boarding techniques inspire real time visualization of monitoring and analyzing of prediction.
+
+Expansion or Divergence: Your capstone could try out other machine learning algorithms such as Gradient Boosting or Neural Networks to handle overfitting that you observed in this study. Combine real world data validation or compare synthetic and real datasets, to verify generalizability.
+
+
+Conclusion:
+
+Integrating generative AI techniques and machine learning models into supply chain optimization are the focus of this research, which contributors significantly advance the state of the art considering topical challenges in this area. The study deals with the data scarcity and privacy problem in real world, by using Gaussian Copula model for generating synthetic data. A statistical fidelity validated synthetic dataset allowed robust machine learning models which in turn predict shipping delays and improve operations. Industry 4.0 principles were embodied by the innovative combination of predictive analytics with simulation testing under dynamic conditions for the resilience and adaptability of the framework.
+
+The real-world implementation of this was further emphasized by the use of interactive dashboards for visualization and real-time insights, for making data driven decisions. Nevertheless, some of these limitations include potential overfitting, simplified assumptions in simulations, and the inherent issues of synthetic data. To extend the applicability of the proposed framework, addressing these concerns with real world validation, expanded simulation scenarios, real time data integration, and ethical safeguards can further refine the proposed framework.
+
+The implications of the findings for supply chain management are particularly important in terms of improving resilience to disruptions and attaining operational sustainability. Moreover, this research also helps advance the literature on AI based supply chain management and is a foundation for next wave of innovations like agent based modeling, real time streaming capability and scaled computational infrastructure. This study armed with digital supply chain twinning, and resolved ethical and representational concerns about synthetic data generation, made a pathway for a transformative impact on global supply chains.