---
layout: page-steps
language: Python
title: Perform customer clustering
permalink: /python/customerclustering/
redirect_from:
---
In this tutorial, you will get familiar with clustering. Clustering organizes data into groups whose members are similar in some way. We will use the k-means algorithm to cluster customers. You can use the result, for example, to target a specific group of customers in a marketing campaign. K-means is an unsupervised learning algorithm that groups data based on similarity. Unsupervised learning means that there is no outcome to predict; the algorithm only looks for patterns in the data. You will learn how to perform clustering with k-means and how to analyze the results. We will also cover how you can deploy a clustering solution using SQL Server. You can copy the code as you follow the tutorial; all code is also available on GitHub.
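To make the idea of "grouping by similarity" concrete, here is a minimal plain-Python sketch of k-means (Lloyd's algorithm: assign each point to its nearest centroid, then move each centroid to the mean of its points, and repeat until assignments stop changing). It is only an illustration, not the tutorial's SQL Server implementation. The `kmeans` function and the customer values are hypothetical, invented for this example.

```python
import math
import random


def kmeans(points, k, iterations=100, seed=0):
    """Cluster points into k groups with Lloyd's algorithm (basic k-means).

    Hypothetical helper for illustration only.
    """
    rng = random.Random(seed)
    # Pick k distinct points at random as the initial centroids.
    centroids = rng.sample(points, k)
    labels = None
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        if new_labels == labels:
            break  # Assignments did not change, so the algorithm has converged.
        labels = new_labels
        # Update step: move each centroid to the mean of its assigned points.
        new_centroids = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                new_centroids.append(
                    tuple(sum(dim) / len(members) for dim in zip(*members))
                )
            else:
                # An empty cluster keeps its previous centroid.
                new_centroids.append(centroids[c])
        centroids = new_centroids
    return labels, centroids


# Hypothetical customers: (annual spend, visits per month). Not real data.
customers = [(100, 1), (120, 2), (110, 1), (900, 10), (950, 12), (880, 11)]
labels, centroids = kmeans(customers, k=2)
print(labels)
print(centroids)
```

The low-spend and high-spend customers end up in different clusters. Because annual spend dominates the Euclidean distance here, real customer data is usually scaled before clustering.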
{% include partials/install_sql_server_windows_ML.md %}
Download and install SQL Server Management Studio (SSMS).
Now you have a tool you can use to easily manage your database objects and scripts.
Run SSMS and open a new query window. Then execute the script below to enable your instance to run Python scripts in SQL Server.
```sql
EXEC sp_configure 'external scripts enabled', 1;
RECONFIGURE WITH OVERRIDE;
```

You can read more about configuring Machine Learning Services here. Don't forget to restart your SQL Server instance after the configuration! You can restart it in SSMS by right-clicking the instance name in Object Explorer and choosing Restart.
Optional: If you want, you can also download the SSMS custom reports available on GitHub. For example, the report "R Services - Configuration.rdl" provides an overview of the R runtime parameters and lets you configure your instance with a button click. To import a report in SSMS, right-click Server Objects in the SSMS Object Explorer, choose Reports -> Custom Reports, and upload the .rdl file.
Now you have enabled external script execution so that you can run Python code inside SQL Server!
1. You need to install a Python IDE. Here are some suggestions:
   - Python Tools for Visual Studio (PTVS): Download
   - VS Code (download) with the Python extension and the mssql extension
   - PyCharm: Download
Note: To use some of the functions in this tutorial, you need the revoscalepy package.
Follow the instructions here to install the Python client libraries for remote execution against SQL Server ML Services:
How to install Python client libraries
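After installing the client libraries, you may want to confirm that the interpreter your IDE uses can actually see `revoscalepy`. The sketch below uses the standard library's `importlib.util.find_spec`, which returns `None` when a top-level module cannot be found. The helper name `is_installed` is hypothetical, and the check only covers the interpreter that runs the script, so it assumes your IDE is configured to use the same Python environment where you installed the libraries.

```python
import importlib.util


def is_installed(package_name):
    """Return True if package_name is importable by the current interpreter.

    Hypothetical helper for illustration only.
    """
    # find_spec returns None when no top-level module with this name is found.
    return importlib.util.find_spec(package_name) is not None


if __name__ == "__main__":
    if is_installed("revoscalepy"):
        print("revoscalepy is available")
    else:
        print("revoscalepy not found; install the Python client libraries first")
```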
Terrific! Your SQL Server instance can now host and run Python code, and you have the necessary development tools installed and configured. The next section walks you through how to do clustering using Python.