Hey babies! 😀
As with any project, the key to success is a clear and organized methodology! You may have worked on some projects before, so you have an idea of how to start and which steps to follow: explaining your problem, setting the goals of the project, studying the market and estimating the chances of success of your idea.
Data science is a little different: in addition to the market study and the other steps, there is an important part that needs your full attention and concentration. We focus here on DATA!
So how is a data science project done? A consortium of companies defined a very structured methodology in the late 1990s, and it is still one of the most widely followed in data science projects.
Ready to explore the CRISP-DM Methodology? Let’s gooooooooooo!
CRISP-DM stands for Cross-Industry Standard Process for Data Mining. It consists of 6 steps that bring you closer to the end of a professional project.
Step 1: Business understanding
You have to understand what you are doing. You have to set your objectives, make your project plan, and list your requirements, your constraints and the risks that may delay your work. In short, study your project thoroughly, as you would for any other project. At the end, you will present a product! So make sure to give this product the chance to shine among others.
Step 2: Data understanding
What is the core of our work here? It’s data! If you don’t understand your data, your project will never succeed. First of all, you have to determine which data you need. You understood your business very well and you know what you need, so you will know which data you have to collect.
This step has three important stages:
- Describe the data: Find the format of the data and its quantity, identify the different fields in it and evaluate whether this data is useful in your case.
- Explore the data: Here you add your data scientist's touch! You will analyze your data through data visualization (representing the collected numbers as charts), by asking questions of your data, and by applying statistical techniques.
- Verify data quality: Are there any illogical results? Any incorrect data? Any missing values? All of that should be identified at this stage, so it can be fixed during data preparation.
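To make these three stages a bit more concrete, here is a minimal sketch in plain Python. The dataset, the field names (`age`, `city`, `income`) and the three helper functions are all invented for illustration; a real project would probably use a dedicated library, but the idea stays the same.

```python
from collections import Counter
from statistics import mean

# Hypothetical toy dataset: each row is one customer record.
rows = [
    {"age": 25, "city": "A", "income": 1200.0},
    {"age": 31, "city": "B", "income": None},     # missing value
    {"age": -4, "city": "A", "income": 1500.0},   # illogical age
    {"age": 42, "city": "C", "income": 2100.0},
]

def describe(rows):
    """Describe the data: how many rows, and which fields exist."""
    fields = sorted({key for row in rows for key in row})
    return {"n_rows": len(rows), "fields": fields}

def explore(rows):
    """Explore the data: one simple statistic and one frequency count."""
    incomes = [r["income"] for r in rows if r["income"] is not None]
    return {
        "mean_income": mean(incomes),
        "rows_per_city": Counter(r["city"] for r in rows),
    }

def verify_quality(rows):
    """Verify data quality: count missing values and illogical ages."""
    missing = sum(1 for r in rows for v in r.values() if v is None)
    illogical = sum(1 for r in rows if r["age"] is not None and r["age"] < 0)
    return {"missing_values": missing, "illogical_ages": illogical}

summary = describe(rows)
exploration = explore(rows)
quality = verify_quality(rows)
print(summary)
print(exploration)
print(quality)
```

Notice that nothing is corrected here: the sketch only reports what it finds, which is what you want before deciding how to handle each problem.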
Step 3: Data preparation
Here, you will make your final decision about which data to use for your analysis. Based on what you learned in the previous step, you will keep only the data needed to solve your problem.
After that, you have to “clean” your data. Don’t expect to be lucky enough to find beautiful, organized and structured data. Your data could be a mess! And a lot of valuable information is extracted from unorganized data (we call it unstructured data).
So organize your data, make it beautiful and clean, give it a structure and go to the next step!
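Here is a small sketch of what "cleaning" could look like in plain Python. The records and the cleaning policy (drop rows with missing or negative values, normalize text and number formats) are assumptions made up for this example; your own project may call for a different policy, such as filling in missing values instead of dropping rows.

```python
def clean(rows):
    """Keep only usable rows and give every field a consistent format."""
    cleaned = []
    for row in rows:
        age = row.get("age")
        income = row.get("income")
        city = row.get("city")
        # Assumed policy: drop rows with missing values.
        if age is None or income is None or not city:
            continue
        # Assumed policy: a negative age is illogical, so drop the row.
        if age < 0:
            continue
        cleaned.append({
            "age": int(age),
            "city": city.strip().title(),   # "  tunis " becomes "Tunis"
            "income": float(income),        # "1200" becomes 1200.0
        })
    return cleaned

# Hypothetical messy records, as they might arrive from different sources.
raw_rows = [
    {"age": 25, "city": "  tunis ", "income": "1200"},
    {"age": 31, "city": "Sfax", "income": None},
    {"age": -4, "city": "Tunis", "income": 1500},
    {"age": 42, "city": "SOUSSE", "income": 2100.5},
]

clean_rows = clean(raw_rows)
for row in clean_rows:
    print(row)
```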
Step 4: Modelling
This step is about testing algorithms. There are always many solutions to a problem, but your mission is to choose the best one. That’s why at this stage you will test many algorithms to obtain the model that gives you the best results.
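As an illustration of "testing many algorithms", the sketch below compares two very simple candidates on the same held-out data and keeps the one with the smaller error. The data, the two candidate models and the choice of mean absolute error as the score are all assumptions for this example; real projects usually compare richer models with a machine learning library.

```python
from statistics import mean

def fit_mean(xs, ys):
    """Candidate 1: always predict the average of the training targets."""
    avg = mean(ys)
    return lambda x: avg

def fit_line(xs, ys):
    """Candidate 2: least-squares straight line y = a * x + b."""
    x_bar, y_bar = mean(xs), mean(ys)
    a = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sum(
        (x - x_bar) ** 2 for x in xs
    )
    b = y_bar - a * x_bar
    return lambda x: a * x + b

def mean_absolute_error(model, xs, ys):
    """Average distance between predictions and true values (lower is better)."""
    return mean(abs(model(x) - y) for x, y in zip(xs, ys))

# Hypothetical data, split into a training part and a held-out test part.
train_x, train_y = [1, 2, 3, 4, 5, 6], [2.1, 3.9, 6.2, 8.1, 9.8, 12.2]
test_x, test_y = [7, 8], [14.1, 15.9]

candidates = {"mean baseline": fit_mean, "straight line": fit_line}
scores = {
    name: mean_absolute_error(fit(train_x, train_y), test_x, test_y)
    for name, fit in candidates.items()
}
best_name = min(scores, key=scores.get)
print(scores)
print("Best candidate:", best_name)
```

Scoring every candidate on data it did not see during training is what keeps the comparison fair.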
Step 5: Evaluation
Here you will evaluate your model! You will make sure that you made the right choice by checking that it produces the right results. If you reached the goals set in the first step, your model is the one!
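One way to picture this check: measure a score on data the model has never seen and compare it with a target agreed during business understanding. In the sketch below, the 80% target, the labels and the helper names are hypothetical, and accuracy is just one possible metric.

```python
def accuracy(predictions, actual):
    """Share of predictions that match the true labels."""
    correct = sum(1 for p, a in zip(predictions, actual) if p == a)
    return correct / len(actual)

def meets_goal(score, target):
    """Compare the measured score with the target from Step 1."""
    return score >= target

# Hypothetical target agreed during business understanding.
TARGET_ACCURACY = 0.80

# Hypothetical true labels and model predictions on unseen data.
actual = ["yes", "no", "yes", "yes", "no", "no", "yes", "no", "yes", "no"]
predicted = ["yes", "no", "yes", "no", "no", "no", "yes", "no", "yes", "no"]

score = accuracy(predicted, actual)
goal_reached = meets_goal(score, TARGET_ACCURACY)
print(f"accuracy = {score:.2f}, goal reached: {goal_reached}")
```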
Step 6: Deployment
Here, you need a professional context. Your models are put into production so that people can use them and try them out.
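Deployment looks very different from one company to another, so take the following only as a toy sketch: it saves the parameters of a hypothetical straight-line model to a JSON file, then loads them back the way a production service might before answering requests. The file name, the parameter names (`slope`, `intercept`) and the helpers are all invented for this example.

```python
import json
import os
import tempfile

def save_model(params, path):
    """Write the model parameters to disk so another program can reuse them."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump(params, f)

def load_model(path):
    """Read the parameters back and return a ready-to-use prediction function."""
    with open(path, encoding="utf-8") as f:
        params = json.load(f)
    return lambda x: params["slope"] * x + params["intercept"]

# Hypothetical parameters produced by the modelling step.
params = {"slope": 2.0, "intercept": 0.5}

# A temporary folder stands in for the production environment.
model_path = os.path.join(tempfile.mkdtemp(), "model.json")
save_model(params, model_path)

predict = load_model(model_path)
prediction = predict(10)
print(prediction)
```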
My goal here was just to give you an idea of how data science projects work. Before diving into these steps, we need to learn more and more about data science.
Writing about the format of data gave me the topic of the next article. Next time, babies, we will talk about structured, semi-structured and unstructured data! 😀
Stay tuned! 😉