1. Data Cleaning:
Data scientists can spend up to 80% of their time on a new project cleaning data. This is a major pain point for teams. If you can show that you're experienced at cleaning data, you'll immediately be more valuable. To create a data cleaning project, find some messy data sets and start cleaning.
If you're working with Python, pandas is a great library to use, and if you're working with R, you can use the dplyr package. Make sure you can showcase the following skills:
- Importing data
- Joining multiple datasets
- Detecting missing values
- Detecting anomalies
- Imputing missing values
- Data quality assurance
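As a rough illustration, the skills above could be combined in one small pandas script. This is only a sketch: the two inline CSV tables, the column names, the `clean_orders` helper, and the choice of the 1.5 * IQR rule and median imputation are all assumptions made for the example, not a prescribed method.

```python
import io

import pandas as pd

# Hypothetical messy data, inlined so the sketch is self-contained.
ORDERS_CSV = """order_id,customer_id,amount
1,101,25.0
2,102,
3,103,30.0
4,101,27.5
5,104,5000.0
6,102,22.0
"""
CUSTOMERS_CSV = """customer_id,region
101,North
102,South
103,
104,North
"""


def clean_orders(orders_csv: str, customers_csv: str) -> pd.DataFrame:
    # Importing data
    orders = pd.read_csv(io.StringIO(orders_csv))
    customers = pd.read_csv(io.StringIO(customers_csv))

    # Joining multiple datasets (a left join keeps every order)
    df = orders.merge(customers, on="customer_id", how="left")

    # Detecting missing values: count of NaN per column
    print(df.isna().sum())

    # Detecting anomalies with the 1.5 * IQR rule (one common heuristic)
    q1, q3 = df["amount"].quantile([0.25, 0.75])
    iqr = q3 - q1
    df["is_outlier"] = (df["amount"] < q1 - 1.5 * iqr) | (df["amount"] > q3 + 1.5 * iqr)

    # Imputing missing values: median of the non-outlier amounts,
    # and an explicit placeholder for a missing region
    median_amount = df.loc[~df["is_outlier"], "amount"].median()
    df["amount"] = df["amount"].fillna(median_amount)
    df["region"] = df["region"].fillna("Unknown")

    # Data quality assurance: fail loudly if basic expectations are violated
    assert df["order_id"].is_unique
    assert df[["amount", "region"]].notna().all().all()
    return df


cleaned = clean_orders(ORDERS_CSV, CUSTOMERS_CSV)
print(cleaned)
```

In a real project the data would come from files or a database rather than inline strings, and the outlier and imputation rules should be justified by what you know about the data.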
2. Exploratory Data Analysis:
Another important aspect of data science is exploratory data analysis (EDA). This is the process of generating questions and investigating them with visualizations. EDA allows an analyst to draw conclusions from data to drive business impact. It might include interesting insights based on customer segments, or sales trends based on seasonal effects. Often you can make interesting discoveries that weren't initial considerations. Some useful Python libraries for exploratory analysis are pandas and matplotlib. For R users, the ggplot2 package will be useful. An EDA project should show the following skills:
- Ability to formulate relevant questions for investigation
- Identifying trends
- Identifying covariation between variables
- Communicating results effectively using visualizations
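The skills above might look like this in a short pandas and matplotlib session. Treat it as a sketch under assumptions: the `sales` table, its columns, and the two questions being asked are invented for the example.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical monthly data for one year
sales = pd.DataFrame({
    "month": pd.date_range("2023-01-01", periods=12, freq="MS"),
    "ad_spend": [10, 12, 11, 15, 18, 20, 22, 21, 17, 14, 25, 30],
    "revenue": [100, 115, 108, 140, 170, 185, 210, 200, 160, 135, 240, 290],
})

# Question 1: does revenue follow a seasonal pattern? (identifying trends)
sales["quarter"] = sales["month"].dt.quarter
revenue_by_quarter = sales.groupby("quarter")["revenue"].mean()
print(revenue_by_quarter)

# Question 2: do ad spend and revenue move together? (covariation)
correlation = sales["ad_spend"].corr(sales["revenue"])
print(f"Pearson correlation: {correlation:.2f}")

# Communicating the findings: one panel per question
fig, (ax_trend, ax_scatter) = plt.subplots(1, 2, figsize=(10, 4))
ax_trend.plot(sales["month"], sales["revenue"], marker="o")
ax_trend.set_title("Monthly revenue")
ax_trend.set_xlabel("Month")
ax_trend.set_ylabel("Revenue")
ax_trend.tick_params(axis="x", labelrotation=45)

ax_scatter.scatter(sales["ad_spend"], sales["revenue"])
ax_scatter.set_title(f"Ad spend vs. revenue (r = {correlation:.2f})")
ax_scatter.set_xlabel("Ad spend")
ax_scatter.set_ylabel("Revenue")
fig.tight_layout()
# fig.savefig("eda_overview.png") would write the figure to disk;
# in a notebook it renders inline instead.
```

A strong correlation here only shows that the two columns move together; it does not by itself show that ad spend causes revenue.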
3. Interactive Data Visualizations:
Interactive data visualizations include tools such as dashboards. These tools are useful for both data science teams and more business-oriented end users. Dashboards allow data science teams to collaborate and draw insights together. Even more important, they provide an interactive tool for business-oriented customers. These individuals focus on strategic goals rather than technical details. Often the deliverable for a data science project to a client will be in the form of a dashboard. A dashboard project should showcase the following skills:
- Including metrics relevant to your customer’s needs
- Creating useful features
- A logical layout
- Creating an optimum refresh rate
- Generating reports
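A dashboard is usually built with a dedicated tool, but the layer underneath it is often just a table of metrics. The following sketch shows that layer only, using pandas. The `orders` table, the `build_metrics` and `render_report` helpers, and the chosen metrics (total revenue, order count, average order value) are assumptions for illustration; a real dashboard would pick metrics from the customer's needs and hand the table to whatever dashboard tool is in use.

```python
import pandas as pd

# Hypothetical order data
orders = pd.DataFrame({
    "order_date": pd.to_datetime(
        ["2024-01-05", "2024-01-20", "2024-02-03", "2024-02-14", "2024-02-28"]
    ),
    "segment": ["Retail", "Wholesale", "Retail", "Retail", "Wholesale"],
    "revenue": [120.0, 900.0, 80.0, 150.0, 1100.0],
})


def build_metrics(df: pd.DataFrame) -> pd.DataFrame:
    """Aggregate raw orders into per-month, per-segment metrics."""
    metrics = (
        df.assign(month=df["order_date"].dt.to_period("M"))
        .groupby(["month", "segment"])["revenue"]
        .agg(total_revenue="sum", orders="count")
        .reset_index()
    )
    # A derived feature that is often more useful than the raw totals
    metrics["avg_order_value"] = metrics["total_revenue"] / metrics["orders"]
    return metrics


def render_report(metrics: pd.DataFrame) -> str:
    """Render the metrics table as plain text, e.g. for an emailed report."""
    return metrics.to_string(index=False)


metrics = build_metrics(orders)
print(render_report(metrics))
```

Re-running `build_metrics` on a schedule is one simple way to think about refresh rate: the metrics are only as fresh as the last run.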
4. Machine Learning:
A machine learning project is another important piece of your data science portfolio. Before you run off and start building a deep learning project, take a step back for a minute. Rather than building a complex machine learning model, start with linear regression or logistic regression. These models are easier to interpret and to communicate to upper-level management.
If you're a Python user, use the scikit-learn library. For R users, use the caret package. Your machine learning project should convey the following skills:
- Reasoning for why you chose a specific machine learning model
- Splitting data into training and test sets
- Selecting the right evaluation metrics
- Feature engineering and selection
- Hyperparameter tuning
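A minimal scikit-learn sketch of these skills is shown below. It makes several assumptions for the sake of the example: the breast cancer dataset bundled with scikit-learn stands in for your own data, ROC AUC is taken as the evaluation metric, and the hyperparameter grid values are arbitrary. Logistic regression is used because it is one of the interpretable models recommended above.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Example data bundled with scikit-learn (binary classification)
X, y = load_breast_cancer(return_X_y=True)

# Splitting data into training and test sets; the test set is held out
# until the very end so the final score is not tuned against
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# Feature scaling and selection live inside the pipeline, so they are
# fitted on the training folds only during cross-validation
pipeline = make_pipeline(
    StandardScaler(),
    SelectKBest(score_func=f_classif),
    LogisticRegression(max_iter=1000),
)

# Hyperparameter tuning: number of features kept and regularization strength
param_grid = {
    "selectkbest__k": [5, 10, 20],
    "logisticregression__C": [0.01, 0.1, 1.0, 10.0],
}
search = GridSearchCV(pipeline, param_grid=param_grid, scoring="roc_auc", cv=5)
search.fit(X_train, y_train)

# Evaluation on the held-out test set with the chosen metric
test_auc = roc_auc_score(y_test, search.predict_proba(X_test)[:, 1])
print("Best parameters:", search.best_params_)
print(f"Test ROC AUC: {test_auc:.3f}")
```

In a portfolio write-up, the code matters less than the explanation around it: why this model, why this metric, and what the selected features suggest.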
5. Communication:
Communication is an important aspect of data science. Effectively communicating results is what separates the good data scientists from the great ones. It doesn't matter how fancy your model is: if you can't explain it to teammates or customers, you won't get their buy-in. Slides and notebooks are both great communication tools. Take one of your machine learning projects and put it into slide format. You could also use a Jupyter Notebook or R Markdown file for a communication project.