Weather is often a matter of life and death. The mission of weather science and prediction is not merely to help us understand the environment and its changing nature, such as global warming, but also to devise proactive strategies for improving disaster preparedness, mitigating economic losses and loss of life (*1), and enhancing the overall well-being of citizens.
With this mission in mind, both weather scientists and data scientists devote enormous effort to gathering physical, live and historical meteorological data (*2), and use machine learning (ML) to capture weather patterns in advance for preventive purposes (*6).
Datacube won an Excellence Award in 2022 for a weather solution delivered to a municipal government in mainland China (*10). Drawing on that experience and other references, we discuss some practical processes from the data scientist's point of view, set out as bullet points by STAGE:
What Is the Role of Data Scientists at Each Stage?
Pre-stage –
- Identify which situations should be measured, e.g. rain, snow, hail, flood (*3, 6)
- Define clearly the levels of each feature, e.g. “moderate”, “heavy”, “severe” (*3)
- Identify the regions most likely to be affected, e.g. coasts exposed to tropical cyclones; transportation hubs (*4); farmland under drought (*1); areas with weather-related health issues such as asthma (*4, 6, 7); areas prone to forest fires (*7)
- Identify the radius of the affected regions (geospatial consideration), e.g. 10 km / 50 km (*4)
- Identify the time intervals and stratification at which data are clearly recorded, e.g. seasonal, monthly, weekly, hourly (*8), or data cycled at intervals of minutes (multivariate time series) (*5, 6)
- Identify which datasets (physical behaviors of weather) are crucial to collect, e.g. carbon dioxide, arctic sea ice, mountain glaciers, ocean heat, sea levels, spring snow, surface temperatures, incoming sunlight, humidity, air quality (*1, 4)
- Identify and select impactful features / variables, while screening out irrelevant ones (*2)
- Avoid including too many variables at the outset, as this can make forecast errors harder to reduce (*8)
- Then, make sure the datasets are unambiguously labelled and partitioned into target classes (*5)
- Deploy enough nodes / sensors / buoys / weather balloons / barometers / thermometers / satellites as an ambient network to collect sufficient data and increase the model’s accuracy (*3, 7, 8)
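As a minimal sketch of the "define the levels" and "unambiguously labelled" points above, a raw rainfall reading can be mapped to a named level with fixed cut-offs. The thresholds, the extra "light" level and the function name `rainfall_level` are illustrative assumptions only; the real definitions must be agreed with weather scientists.

```python
from bisect import bisect_right

# ASSUMPTION: illustrative cut-offs in mm per hour, not an official standard.
THRESHOLDS_MM_PER_HOUR = [2.5, 10.0, 50.0]
LABELS = ["light", "moderate", "heavy", "severe"]


def rainfall_level(mm_per_hour: float) -> str:
    """Map an hourly rainfall reading to one unambiguous level label."""
    if mm_per_hour < 0:
        raise ValueError("rainfall cannot be negative")
    # bisect_right counts how many thresholds are <= the reading,
    # which is exactly the index of the matching label.
    return LABELS[bisect_right(THRESHOLDS_MM_PER_HOUR, mm_per_hour)]


readings = [0.4, 2.5, 12.0, 75.0]
labelled = [(reading, rainfall_level(reading)) for reading in readings]
print(labelled)
```

Keeping the cut-offs in one place means every dataset is labelled by the same rule, so the target classes stay consistent when the definitions are revised.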
Preparation Stage –
- Liaise with model providers, e.g. AccuWeather, Met Office, OpenWeatherMap, HARP, NOAA (*1, 5)
- Refer to existing up-to-date weather models, e.g. ECMWF, GFS, UM, WRF (*2, 8, 9)
- Import datasets into NumPy / GeoPandas in Python to help identify patterns (*4)
- Refer to third-party datasets, e.g. SSW Latest Events and the Hinode Flare Catalog, as auxiliary data sources if missing or null values are extensive (*5)
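To judge whether missing or null values are extensive enough to justify an auxiliary source, one rough approach is to measure the share of missing entries per variable. The sketch below uses plain Python; the 20% cut-off, the column names and the sample values are hypothetical.

```python
import math


def missing_fraction(values):
    """Return the share of entries that are None or NaN."""
    if not values:
        return 0.0
    missing = sum(
        1 for v in values
        if v is None or (isinstance(v, float) and math.isnan(v))
    )
    return missing / len(values)


def needs_auxiliary_source(columns, max_missing=0.2):
    """List variables whose missing share exceeds max_missing.

    ASSUMPTION: the 0.2 default is an arbitrary illustrative cut-off.
    """
    return sorted(
        name for name, values in columns.items()
        if missing_fraction(values) > max_missing
    )


# Hypothetical readings from a single station.
station = {
    "temperature_c": [21.5, 22.0, None, 23.1, 22.8],
    "humidity_pct": [None, None, 64.0, float("nan"), 66.0],
}
flagged = needs_auxiliary_source(station)
print(flagged)
```

Variables that are flagged are candidates for patching from a third-party dataset; the others can usually be handled with ordinary gap filling.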
Action Stage –
- Make sure the data are of high quality, well defined and consistent, for better forecasts (*5, 8)
- Check whether each variable is truly targeted (*2)
- Check whether the datasets have missing values or anomalies (*2)
- Guide ML by framing the target variables as classification (discrete) or regression (continuous) problems, according to the prediction requirement (*5)
- Use secondary data from satellites and sensors to patch missing values or to validate forecasting performance (*5, 8)
- Check whether complementary datasets should be added to improve model building (*2)
- Check whether small changes in the atmosphere make the model less predictable (*3)
- Identify constantly changing atmospheric factors that would complicate prediction (*7)
- Apply artificial neural networks (ANNs) to help build algorithms (*8, 9)
- Test and compare results on the datasets against common benchmark models (*2)
- When comparing across time, make clear whether the data are cross-sectional or point-in-time (*5)
- Split a large dataset into subsets to predict outcomes under different scenarios (*3)
- Pinpoint any types of datasets that do not help forecasting (*3)
- Extract historical data to track long-term climate patterns and the correlations between them (*1, 4)
- Consider synthesizing huge amounts of weather information every day (*3)
- Compare with satellite imagery to aid short-term forecasts or point predictions (*7, 8), and use the comparison to decide whether the existing model is validated (*6)
- Continuously record live data, which helps improve forecast accuracy (*7)
- Feed the datasets into the ML engine to train and generate various models
- Transform tedious data into maps, graphics and other visuals
- Analyze vast streams of real-time weather data, spot trends, draw conclusions, make accurate and timely weather forecasts, and make data-driven predictions (*1)
- Draw insights from these visuals, e.g. identify at-risk regions under severe climate impact, and offer them to officials
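For the points above on splitting data and comparing against benchmark models, one common reference point in forecasting is the persistence baseline, which predicts that the next value equals the last observed one. The sketch below is only an illustration: it splits a made-up temperature series chronologically and scores the baseline with mean squared error. The function names and data are assumptions, and any trained model would need to beat this score to be worth keeping.

```python
def chronological_split(series, train_fraction=0.8):
    """Split a time series without shuffling, so the test set is the future."""
    cut = int(len(series) * train_fraction)
    return series[:cut], series[cut:]


def mean_squared_error(actual, predicted):
    """Average squared difference between actual and predicted values."""
    if not actual or len(actual) != len(predicted):
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)


def persistence_forecast(history, test):
    """Predict each test value as the observation immediately before it."""
    if not history:
        raise ValueError("history must contain at least one observation")
    return [history[-1]] + test[:-1]


# Hypothetical daily temperatures in degrees Celsius.
temps = [20.0, 21.0, 23.0, 22.0, 24.0, 25.0, 27.0, 26.0, 28.0, 30.0]
train, test = chronological_split(temps)
baseline_mse = mean_squared_error(test, persistence_forecast(train, test))
print(baseline_mse)
```

Splitting by time rather than at random avoids leaking future observations into training, which would make the model look better than it is.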
Build up Teams to Help –
- ML can help improve physically grounded models. By combining physical models with insights from measured data, data scientists can obtain more accurate predictions (*6, 9)
- Bear in mind that weather changes are physical phenomena constrained by physical laws, whereas ML models are not, especially in extreme cases; the resulting errors degrade the models’ forecasts (*3, 8, 9)
- Compare nearby sensors and their datasets within the same region to detect hidden errors, thereby minimizing errors and keeping the datasets unbiased (*8)
- Forming a team of domain experts helps in handling meteorological, data and technical issues, and generates insight from different perspectives (*3, 9)
- Share and deepen understanding of physical weather phenomena, and agree on definitions and terms (*5)
- Working with weather scientists, data scientists have to decide which sampling method(s) to adopt, e.g. under-sampling or over-sampling (*5)
- Refer to established algorithms such as backpropagation, e.g. to minimize the mean squared error (MSE) between the predicted and actual values of the target variable (*8)
- By building effective structure and parameter estimation methods and algorithms, data scientists can discover weather patterns, minimize errors and improve forecasts in warning systems (*8)
- Weather and data scientists have to agree on what levels of predictive accuracy are acceptable, to make sure the model(s) are effective and practical in real life (*8)
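To illustrate the sampling decision mentioned above, here is a rough sketch of random over-sampling, in which rare events (for example storms) are duplicated until every class has as many examples as the largest one. The function name, labels and feature values are hypothetical, and other sampling methods may suit a given dataset better.

```python
import random
from collections import Counter


def random_oversample(samples, labels, seed=0):
    """Duplicate minority-class samples at random until all classes match
    the size of the largest class. Returns new lists; inputs are untouched."""
    rng = random.Random(seed)  # fixed seed keeps the sketch reproducible
    counts = Counter(labels)
    target = max(counts.values())
    out_samples, out_labels = list(samples), list(labels)
    for label, count in counts.items():
        pool = [s for s, l in zip(samples, labels) if l == label]
        for _ in range(target - count):
            out_samples.append(rng.choice(pool))
            out_labels.append(label)
    return out_samples, out_labels


# Hypothetical one-feature dataset with a rare "storm" class.
features = [[0.1], [0.3], [0.2], [0.4], [9.5]]
labels = ["no_storm", "no_storm", "no_storm", "no_storm", "storm"]
balanced_x, balanced_y = random_oversample(features, labels)
print(Counter(balanced_y))
```

Over-sampling is normally applied to the training partition only, so that duplicated events do not inflate the scores measured on held-out data.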
Post Stage –
- Weather forecast models can simulate weather processes numerically, but may fail to describe them accurately in physical terms. When numerical errors are found, they should be removed promptly so that the physical predictions remain reliable (*8)
- Retrain the model frequently, to make sure weather trends are always captured (*2)
- Once large errors have been removed from the model, additional input variables can be tested to enhance accuracy (*8)
- Compare the models’ predictive performance with established external models, e.g. the Global Forecast System (GFS) run by the US National Centers for Environmental Prediction (NCEP), part of NOAA (*8, 9)
- Enable ML to learn and adapt without manual programming, so that it can better identify patterns and generalize, making more accurate predictions in areas humans have not examined (*8)
- Focus on the quality of predictions rather than the scores on the models’ dashboards (*9)
- Keep asking the question: “Are our ML-based models producing physically consistent and meteorologically meaningful forecasts?” (*9)
- Present insights to government officials for further action
- Keep obtaining weather data and seek support from data providers or funding bodies, as data acquisition is generally difficult and costly (*2)
- Deploy more sensors to collect data in the future, enhancing the quantity and quality of datasets
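One simple way to act on the question of physical consistency raised above is to screen each forecast against basic plausibility bounds before it reaches a dashboard. The sketch below is an assumption-laden illustration: the variable names and the numeric bounds are hypothetical, and a real check would be defined together with weather scientists.

```python
# ASSUMPTION: illustrative plausibility bounds, not official limits.
BOUNDS = {
    "relative_humidity_pct": (0.0, 100.0),
    "precipitation_mm": (0.0, float("inf")),
    "temperature_c": (-90.0, 60.0),
}


def physical_violations(forecast):
    """Return the names of forecast variables that fall outside their bounds.

    Variables without a defined bound are ignored. NaN values are flagged,
    because every comparison with NaN evaluates to False.
    """
    violations = []
    for name, value in forecast.items():
        if name not in BOUNDS:
            continue
        low, high = BOUNDS[name]
        if not (low <= value <= high):
            violations.append(name)
    return sorted(violations)


# Hypothetical model output for one location and time step.
model_output = {
    "relative_humidity_pct": 104.2,
    "precipitation_mm": -0.3,
    "temperature_c": 18.0,
}
print(physical_violations(model_output))
```

A forecast can score well on average and still produce impossible values such as negative rainfall, so a check like this complements the accuracy scores rather than replacing them.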
Further Reading and References (*):
- (*1) https://www.heavy.ai/blog/how-big-data-on-weather-patterns-can-help-us-respond-to-the-climate-crisis
- (*2) https://www.analyticsvidhya.com/blog/2023/07/machine-learning-models/
- (*3) https://theconversation.com/ai-and-machine-learning-are-improving-weather-forecasts-but-they-wont-replace-human-experts-182498
- (*4) https://www.analyticsinsight.net/7-ways-to-harness-the-power-of-a-weather-api-for-data-science/
- (*5) https://www.nature.com/articles/s41597-020-0548-x
- (*6) https://data-flair.training/blogs/data-science-for-weather-prediction/
- (*7) https://www.nobledesktop.com/classes-near-me/blog/data-analytics-for-weather-forecasting
- (*8) https://www.mdpi.com/2673-4931/26/1/49
- (*9) https://www.ecmwf.int/en/about/media-centre/science-blog/2023/rise-machine-learning-weather-forecasting
- (*10) https://www.datacube.hk/news/datacube-won-the-excellence-award-of-the-data-innovative-application-competition-by-weather-city-operation-and-management-solution-in-baiyun-guangzhou-2022/
Datacube newsletter on the award:
About our capability in machine learning:
https://www.datacube.hk/aimanager/
https://www.datacube.hk/aibook/
#Datacube #Big_data #data_management #weather_science #weather_forecast #predictive_weather_model #climate_change #machinelearning #government_sectors #disaster_prevention