Feature Engineering is an important term in data science and machine learning. Data scientists reportedly spend about 80% of their time on the processing tasks of Feature Engineering and only 20% on training machine learning (ML) models (*3). To elaborate, it is the crucial process of selecting, transforming, extracting, combining, and manipulating raw data to generate the variables needed for analysis or predictive modeling (*2).
By building an accurate predictive model, we can use it to predict specific business trends, such as consumers' repeat-buying behaviour in e-Commerce (*4). The company can then grasp a clearer picture of the market, make more precise decisions, and earn more profit. There are five foundational steps that help decision-makers understand where data scientists lend their help (*1 and *2): Feature Creation, Data Cleansing, Data Transformation, Feature Selection and Extraction, and Feature Iteration.
Feature Creation
Raw data can be stored in many different formats, such as images, text files, videos, and photos. At the very beginning, during data labeling, we need to make these kinds of data identifiable to the model.
Data Cleansing
To cope with today's complex business environment, many companies have accumulated a wide variety of data sets over the years. Deleting irrelevant data (such as outliers) and correcting faulty records is the first basic action, making the data more readable and more valuable to the model. A small sketch of this step follows below.
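To make this concrete, here is a minimal sketch of one common cleansing step: dropping missing values and extreme outliers with the interquartile-range rule. It assumes tabular data in pandas; the DataFrame and the "order_amount" column are hypothetical examples, not Datacube client data.

```python
import pandas as pd

# Hypothetical order records, including a missing value and one extreme outlier.
df = pd.DataFrame({"order_amount": [120, 95, 130, 110, 9_999, 105, 90, None]})

# Drop rows with missing values, then keep only values within 1.5 * IQR of the quartiles.
df = df.dropna(subset=["order_amount"])
q1, q3 = df["order_amount"].quantile([0.25, 0.75])
iqr = q3 - q1
mask = df["order_amount"].between(q1 - 1.5 * iqr, q3 + 1.5 * iqr)
clean_df = df[mask]
print(clean_df)
```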
Data Transformation
In the experience of Datacube’s clients, they are often troubled by data stored in different systems and in inconsistent forms for many years (*1), with no prior consolidation by a data expert. So in the next step, we must standardize or convert the data set into a uniform format, for example from categorical variables to numerical ones, so that all of this valuable data can be fully utilized under one umbrella to help the business (see the sketch below).
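As a minimal sketch of that conversion, the example below turns categorical columns into numerical indicator columns with one-hot encoding; the column names ("region", "channel", "spend") are hypothetical placeholders.

```python
import pandas as pd

# Hypothetical raw table mixing categorical and numeric columns.
raw = pd.DataFrame({
    "region": ["HK", "SG", "HK", "TW"],
    "channel": ["online", "retail", "online", "online"],
    "spend": [250.0, 310.0, 180.0, 420.0],
})

# get_dummies converts each category into a 0/1 indicator column,
# yielding one uniform numeric table the model can consume.
encoded = pd.get_dummies(raw, columns=["region", "channel"])
print(encoded)
```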
Feature Selection and Extraction
Factors or variables can be vast, raw, abstract, or even confusing. In this process, data scientists apply statistical and analytical techniques to combine many related variables into a single feature, ultimately identifying a small number of prominent features for the model, as in the sketch below.
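One widely used extraction technique is principal component analysis (PCA), which compresses many raw variables into a few combined features. The sketch below uses randomly generated data purely for illustration; it is not tied to any particular client data set.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 12))          # 200 rows, 12 raw variables

# Scale first so no single variable dominates, then keep the 3 strongest combined features.
X_scaled = StandardScaler().fit_transform(X)
pca = PCA(n_components=3)
features = pca.fit_transform(X_scaled)

print(features.shape)                   # (200, 3)
print(pca.explained_variance_ratio_)    # how much signal each extracted feature retains
```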
Feature Iteration
Features can be extracted and grouped into multiple subsets. By running the ML algorithm on these subsets with a wrapper method, model performance becomes measurable through ranked scores and can be visualized for insight (*1). We can then add, remove, or retain the selected features that genuinely boost the model's predictive power; a small sketch follows.
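As one illustration of the wrapper idea, recursive feature elimination with cross-validation scores each candidate subset and keeps the features that actually help prediction. The dataset and estimator here are illustrative stand-ins, not the specific approach cited in (*1).

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import RFECV
from sklearn.linear_model import LogisticRegression

# Synthetic data: 15 candidate features, only 5 of which carry real signal.
X, y = make_classification(n_samples=300, n_features=15, n_informative=5, random_state=0)

selector = RFECV(
    estimator=LogisticRegression(max_iter=1000),
    step=1,                 # drop one feature per iteration
    cv=5,                   # score each subset with 5-fold cross-validation
    scoring="accuracy",
)
selector.fit(X, y)

print("Selected features:", selector.support_)
print("Feature ranking:", selector.ranking_)
print("Mean CV score per subset size:", selector.cv_results_["mean_test_score"])
```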
After learning about the processes of feature engineering, some might consider it purely technical work that should be left in the hands of data scientists, but that is not the case. As a boss, a decision-maker, or part of the management team, you can actually be empowered to take part in data management. Let us discuss more in Part 2.
Further Readings (*):
- *1 https://aws.amazon.com/what-is/feature-engineering/
- *2 https://corporatefinanceinstitute.com/resources/data-science/feature-engineering/
- *3 https://www.youtube.com/watch?v=DkLQtGqQedo
- *4 https://www.researchgate.net/publication/366279094_A_Feature_Engineering_and_Ensemble_Learning_Based_Approach_for_Repeated_Buyers_Prediction
About Our Capability in Machine Learning:
https://www.datacube.hk/aibook/
#Big_data #data_management #feature_engineering #artificial_intelligence #predictive_model #AIBook #machinelearning #decision_maker