How to succeed in data science projects?

Hey babies!  😀

Like any project, the key to success is a clear and organized methodology! You may have worked on some projects before, so you have an idea about how to start and which steps to follow: explaining your problem, setting the goals of the project, studying the market and estimating the chances of success of your idea.

In data science it is not quite the same. In addition to the market study and the other steps, there is an important part that needs your attention and concentration. We focus here on DATA!

So how is a data science project done? A consortium of companies (including SPSS, which is now part of IBM) defined a very structured methodology in the late 1990s, and it is still one of the most widely followed in data science projects.

Ready to explore the CRISP-DM Methodology? Let’s gooooooooooo!

CRISP-DM stands for Cross-Industry Standard Process for Data Mining. It consists of 6 steps that bring you closer to the end of a professional project.

Step 1: Business understanding

You have to understand what you are doing. You have to set your objectives, make your project plan, and list your requirements, your constraints and the risks that may delay your work. In short, study your project thoroughly, as you would for any other project. At the end, you will present a product! So make sure to give this product the chance to shine among others.

Step 2: Data understanding

What is the core of our work here? It’s data! If you don’t understand your data, your project will never succeed. First of all, you have to determine which data you need. You understood your business very well and you know what you need, so you will know which data you have to collect.

There are three important stages in this step:

  • Describe the data: Find the format of the data and its quantity, identify the different fields in it and evaluate whether this data is useful in your case.
  • Explore the data: Here you add your data scientist touch! You analyze your data through data visualization (representing the collected numbers as charts), by asking questions of your data, and by applying statistical techniques.
  • Verify data quality: Are there any illogical results, incorrect data or missing values? All of that should be identified at this stage so it can be fixed.
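
To make these three stages a little more concrete, here is a minimal sketch in Python (standard library only). The sample records, the field names and the rule "an age cannot be negative" are all assumptions invented for the illustration; a real project would load its own data and define its own checks.

```python
from collections import Counter

# Hypothetical sample records; in a real project they come from your data source.
rows = [
    {"customer": "A", "age": "34", "city": "Tunis"},
    {"customer": "B", "age": "", "city": "Paris"},
    {"customer": "C", "age": "29", "city": "Tunis"},
    {"customer": "D", "age": "-5", "city": "Lyon"},
]


def describe(rows):
    """Describe the data: how many records there are and which fields they have."""
    fields = sorted({key for row in rows for key in row})
    return {"n_rows": len(rows), "fields": fields}


def explore(rows, field):
    """Explore the data: count how often each value of a field appears."""
    return Counter(row[field] for row in rows)


def verify_quality(rows):
    """Verify data quality: flag missing ages and illogical (negative) ages."""
    missing = [r["customer"] for r in rows if r["age"] == ""]
    illogical = [r["customer"] for r in rows if r["age"] != "" and int(r["age"]) < 0]
    return {"missing_age": missing, "illogical_age": illogical}


summary = describe(rows)
city_counts = explore(rows, "city")
issues = verify_quality(rows)
print(summary, city_counts, issues)
```

The counts from `explore` are exactly the kind of numbers you would then turn into a chart.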

Step 3: Data preparation

Here, you make your final decision about which data you will use for your analysis. Based on what you understood in the previous step, you keep only the data needed to solve your problem.

After that, you have to “clean” your data. Don’t expect to be lucky and find beautiful, organized and structured data. Your data could be a mess! And very often valuable information is extracted from unorganized data (we call it unstructured data).

So organize your data, make it beautiful and clean, give it a structure and go to the next step!
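
Here is a small sketch of what "selecting and cleaning" can look like in Python. Everything in it is hypothetical: the records, the list of needed fields, and the cleaning policy (dropping rows with a missing age is only one possible choice, not the rule).

```python
# Hypothetical messy records: extra spaces, mixed case, a missing age, an unused field.
raw_rows = [
    {"customer": " a ", "age": "34", "city": "tunis", "notes": "n/a"},
    {"customer": "B", "age": "", "city": "Paris", "notes": ""},
    {"customer": "C", "age": "29", "city": "TUNIS ", "notes": "vip"},
]

# Assumption: only these fields are needed to solve the problem.
NEEDED_FIELDS = ("customer", "age", "city")


def prepare(rows, needed_fields):
    """Keep only the needed fields and give every record the same clean structure."""
    cleaned = []
    for row in rows:
        record = {field: row[field].strip() for field in needed_fields}
        if not record["age"]:
            continue  # one possible policy: drop records with a missing age
        record["age"] = int(record["age"])
        record["customer"] = record["customer"].upper()
        record["city"] = record["city"].title()
        cleaned.append(record)
    return cleaned


clean_rows = prepare(raw_rows, NEEDED_FIELDS)
print(clean_rows)
```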

Step 4: Modelling

It’s about testing algorithms. There are always many solutions to a problem, but your mission is to choose the best one, the most suitable one. That’s why at this stage you will test many algorithms to obtain the best model: the model that gives you the best results.
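
As an illustration of "testing many algorithms", the sketch below fits two very simple candidate models on the same training data and keeps the one with the smallest error on data it has not seen. The data points and the two candidates (a constant average and a least-squares straight line) are assumptions chosen to keep the example tiny; real projects compare much richer models in the same spirit.

```python
from statistics import mean

# Hypothetical data: y is roughly 2 * x + 1.
train = [(1, 3.1), (2, 4.9), (3, 7.2), (4, 8.8), (5, 11.1)]
test = [(6, 13.0), (7, 15.1)]


def fit_mean(data):
    """Candidate 1: always predict the average of the training targets."""
    avg = mean(y for _, y in data)
    return lambda x: avg


def fit_line(data):
    """Candidate 2: a straight line fitted by ordinary least squares."""
    xs = [x for x, _ in data]
    ys = [y for _, y in data]
    x_bar, y_bar = mean(xs), mean(ys)
    slope = sum((x - x_bar) * (y - y_bar) for x, y in data) / sum(
        (x - x_bar) ** 2 for x in xs
    )
    intercept = y_bar - slope * x_bar
    return lambda x: slope * x + intercept


def mean_abs_error(model, data):
    """Average distance between the predictions and the true values."""
    return mean(abs(model(x) - y) for x, y in data)


candidates = {"mean baseline": fit_mean, "straight line": fit_line}
scores = {name: mean_abs_error(fit(train), test) for name, fit in candidates.items()}
best_name = min(scores, key=scores.get)
print(scores, "best:", best_name)
```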

Step 5: Evaluation

Here you will evaluate your model! You will make sure that you made the right choice by checking that it gives the right results. If you reached the goals set in the first step, your model is the one!
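
A quick sketch of this check in Python: compare the model's predictions with the real answers and see whether the score reaches the goal from step 1. The target of 75% accuracy and the two lists of answers are made-up values for the illustration, and accuracy is only one of many possible measures.

```python
# Hypothetical goal fixed during business understanding (step 1).
TARGET_ACCURACY = 0.75

# Hypothetical real answers and model predictions for ten customers.
actual = ["yes", "no", "yes", "yes", "no", "no", "yes", "no", "yes", "yes"]
predicted = ["yes", "no", "no", "yes", "no", "no", "yes", "yes", "yes", "yes"]


def accuracy(actual, predicted):
    """Share of predictions that match the real answers."""
    correct = sum(a == p for a, p in zip(actual, predicted))
    return correct / len(actual)


score = accuracy(actual, predicted)
goal_reached = score >= TARGET_ACCURACY
print(f"accuracy = {score:.2f}, goal reached: {goal_reached}")
```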

Step 6: Deployment

Here, you need a professional context. Your models are put into production so that people can actually use them.

We finished!

My goal here was just to give you an idea about how data science projects work. Before going through these steps, we need to learn more and more about data science.

Writing about the format of data gave me the topic of the next article. Next time, babies, we will talk about structured, semi-structured and unstructured data! 😀

Stay tuned! 😉

Jobs in the joblessness crisis…

Hi Babies,

I promised to post an article about the different jobs of the data world and heere we are!

You surely dreamt about a job when you were a child, but I don’t think that you dreamt about one of these jobs, because at that time they did not exist!

Do you want to choose one of the sexiest, most needed and most fun jobs in the world? (People working in other sectors, don’t be jealous: I said “one of”. Your job is also sexy, needed and fun if you are doing it with love ❤ )

Well, there are many jobs in data science, and a data science project cannot be done if one of them is missing! Every hero has a mission and a role in getting the project done!

The list is long, but it always revolves around these 4 main roles. This is the list that I propose to you:

Data scientist: 

We talked about magic? Ladies and gentlemen, let me introduce you to the Magician! 😀

I like this definition of data scientists:

Person who is better at statistics than any software engineer and better at software engineering than any statistician.

So your role as a data scientist is to collect data, explore it and transform it into useful information. Data scientists work closely with clients; they need to be creative thinkers and propose innovative ways to look at problems by using data mining. What is data mining? 😮 It’s simply the process of discovering patterns in messy data and turning them into information.

Data analyst:

I liked this sentence: “A data analyst translates numbers into plain text.” This means that a data analyst is someone who analyzes numbers related to a business or a market and transforms them into information.

A data analyst needs to be a good “translator”: he should make sense of numbers and represent them as graphs and text so that any user can understand what is behind them, and any decision maker can make his decision with confidence.

Data Engineer: 

Here it’s different! A data engineer does not explore data! He helps other people do that!

How? By implementing database requirements, analyzing performance, and troubleshooting any existing issues.

Data Architect: 

He develops the data architecture to organize and maintain data. So he prepares the best environment for data to be stored.

A data architect creates the architecture, and a data engineer tests and maintains it to keep data accessible. It is not a very big difference, but there are surely some details that only data engineers and data architects know about.

What does an employer want you to learn if you apply one day for one of these jobs?

Let’s forget about employers that want you to learn anything and everything and to be a superhero at all of it. Normal employers usually look for these skills in each job:

  • Data scientist: R, Python, SQL, data mining tools, Hadoop and Spark (sometimes), an idea about NoSQL databases…
  • Data analyst: R, Python, SQL, an idea about NoSQL databases, and… HTML, JavaScript
  • Data engineer: Hadoop, Spark, NoSQL Databases
  • Data Architect: Hive, Pig, Spark, SQL, and… XML, database architectures

 

I hope that you learned some new information…

The next article will be…

I don’t know! You will discover that soon! Stay tuned, see you 😀

 

What is Data Science…

Data is everything: this article is data, your tweets are data, your snaps are data, the photos you post on Instagram are data, but also the places you visit are data, the things you buy at the supermarket are data and your favorite drink is data.

Yes! Everything is stored, everything is analysed, and this is data science. It’s about analysing every small and silly piece of data about you; it’s about transforming this data into information.

Data science is the art of collecting data, storing it, analysing it, and extracting useful information from it to help make your life better, or maybe to make their lives better.

I love data science and I am aware of the importance of this field in the world, but let me also tell you the sad side of data analytics. I don’t think that you will be happy to know that another person somewhere in this world knows everything about you, and maybe knows things that you don’t.

Solution: Don’t post everything about your personal life on social media. I know, you are happy and you want to share your best moments with your friends, but always remember that you are not only sharing that with your friends.

Well, data science is good and powerful. Analysing data is magic, and unfortunately not all magicians are good people! So let’s make a commitment together that you will never manipulate data and exploit it only for your own needs. Think about helping people and helping yourself at the same time. Think about solving problems… O:)

The question now is: what do data scientists use to do this magic? It’s statistics. For those who hate maths, please remember my first article: you are in love, and when we are in love with someone, we love every little thing about them.

Statistics will be implemented in algorithms, so you will really enjoy that! Trust me 😀

So let’s recap! Data science is about transforming data into information. How? With statistics! Need a use case to understand?

Yes, of course! A simple one! I have a new product that I’ve just put on the market, and I want to know whether people like it or not, and what they really like and what they don’t! It would be no fun to ask people one by one! So, because (thank GOD!) people post everything on social media, I will automatically extract every tweet talking about my product and get a statistical review of what people think. Then I will be able to build my business strategy according to the market!

I will maybe change the product if nobody likes it xD
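
Here is a toy version of that use case in Python. The product name, the tweets and the two word lists are all invented for the example, and counting positive and negative words is only a very rough stand-in for real sentiment analysis, but it shows the idea: filter the tweets that mention the product, classify each one, and count.

```python
import re
from collections import Counter

# Hypothetical product name and hand-picked word lists (assumptions for the demo).
PRODUCT = "supersoda"
POSITIVE = {"love", "great", "tasty"}
NEGATIVE = {"hate", "awful", "expensive"}

# Hypothetical tweets; a real project would collect them from the social network.
tweets = [
    "I love SuperSoda, so tasty!",
    "SuperSoda is awful and expensive",
    "Great weather today",
    "Tried supersoda yesterday",
    "SuperSoda is great",
]


def classify(tweet):
    """Label a tweet by comparing how many positive and negative words it contains."""
    words = set(re.findall(r"[a-z]+", tweet.lower()))
    positive_hits = len(words & POSITIVE)
    negative_hits = len(words & NEGATIVE)
    if positive_hits > negative_hits:
        return "positive"
    if negative_hits > positive_hits:
        return "negative"
    return "neutral"


# Keep only the tweets that talk about the product, then count the labels.
mentions = [t for t in tweets if PRODUCT in t.lower()]
review = Counter(classify(t) for t in mentions)
print(len(mentions), "mentions:", dict(review))
```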

I tried to explain a little bit but this is not enough! Stay tuned for other articles!

The next one, I think, will be about….. data science jobs… Maybe you will choose one of them!

See you! 😀 :*

 

Why you should fall in love with Data Science!

Hey you! 😀
The first article and Heeere we are! 😀

I was thinking about how to start, which technology we should talk about, which skill, which topic…
And then I said to myself that before talking about anything… you should first of all:

FALL IN LOVE ❤ ❤ ❤

Yes my dears, it’s all about love! And to fall in love you have to know why you should!
Well, there are only three main reasons!

1- Because it’s magic!

Yes it is!
What is data science?
It’s about that big amount of data, extracted from everywhere.
It seems to make no sense, but with a superpower called Statistics, you can reveal hidden information and make decisions that can help you grow your business.
Isn’t that MAGIC? Growing your business, or deciding whether to start it or not, with messy data!!

2- Because it has no limits!

Because data is everywhere and is generated every second, data science has no limits and will never, never DIE!

3- Because everybody is here!

Whatever your field is (Agriculture, Politics, Marketing, Health, Gaming, Telecommunications, or anything else),
data science is for YOU!
It’s the huge field that helps all the others make sense of their stored data and then… solve their problems!

If this is the first time that you read something about data science, you may not understand what I am talking about, but I think these three reasons can get you excited to know more about it.
Don’t worry, the next article will explain what data science is! See you! 😀