The importance of wind in smart grids with a large number of renewable energy resources is increasing. With the growing infrastructure of wind turbines and the availability of time-series data with high spatial and temporal resolution, the application of data mining techniques comes into play. The windML framework provides easy-to-use access to wind data sources within the Python world, building upon numpy, scipy, sklearn, and matplotlib. As a machine learning module, it provides versatile tools for various learning tasks like time-series prediction, classification, clustering, dimensionality reduction, and related tasks.
For an installation guide, an overview of the architecture, and the functionalities of windML, please visit the Getting Started page. For a formal description of the applied techniques, see section techniques. The examples gallery illustrates the main functionalities.
In the following, we give a brief example of wind time-series forecasting based on K nearest neighbors (KNN) regression. For a further list of examples with plots, we refer to the examples page.
from windml.datasets.nrel import NREL
from windml.mapping.power_mapping import PowerMapping
from sklearn.neighbors import KNeighborsRegressor
import math

# Load the Tehachapi wind park from the NREL dataset and pick its target turbine.
windpark = NREL().get_windpark(NREL.park_id['tehachapi'], 3, 2004, 2005)
target = windpark.get_target()

# Map the time series to patterns (X) and labels (Y).
feature_window, horizon = 3, 3
mapping = PowerMapping()
X = mapping.get_features_park(windpark, feature_window, horizon)
Y = mapping.get_labels_turbine(target, feature_window, horizon)

# Recent scikit-learn versions accept `weights` as a keyword argument only.
reg = KNeighborsRegressor(n_neighbors=10, weights='uniform')

# Train on the first half and predict on the second half, using every 5th sample.
train_to, test_to = int(math.floor(len(X) * 0.5)), len(X)
train_step, test_step = 5, 5
reg = reg.fit(X[0:train_to:train_step], Y[0:train_to:train_step])
y_hat = reg.predict(X[train_to:test_to:test_step])
The model predicts wind power exclusively from past wind power measurements. For this task, the prediction can be formulated as a regression problem, shown here as an example for a single turbine. The wind power measurement (pattern) is mapped to the power production at the target time (label). For the regression model, we assume a set of such pattern-label pairs that form the basis of our training set and allow the regression to predict the label for unknown patterns. The model can be expected to yield better predictions if more information from the time series is employed. For this reason, we extend the patterns with past measurements. The implementation of this approach is called Power Mapping.
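The pattern-label construction described above can be sketched with plain numpy. This is a minimal illustration, not the windML Power Mapping implementation: the function name `make_patterns` is hypothetical, and it assumes that a pattern holds the current measurement plus `feature_window` preceding ones, and that the label lies `horizon` steps ahead.

```python
import numpy as np

def make_patterns(power, feature_window, horizon):
    """Build (pattern, label) pairs from a 1-D wind power series.

    Assumption: each pattern is the measurement at time t together with
    the `feature_window` measurements before it; the label is the
    measurement `horizon` steps after t.
    """
    power = np.asarray(power, dtype=float)
    patterns, labels = [], []
    # Start where a full window exists, stop where a label still exists.
    for t in range(feature_window, len(power) - horizon):
        patterns.append(power[t - feature_window:t + 1])
        labels.append(power[t + horizon])
    return np.array(patterns), np.array(labels)

# Toy series standing in for measured power values.
series = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0]
X, y = make_patterns(series, feature_window=2, horizon=3)
print(X.shape)  # one row per pattern, feature_window + 1 columns
print(y)
```

The resulting `X` and `y` have the shape that a scikit-learn regressor such as `KNeighborsRegressor` expects in `fit(X, y)`.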
Furthermore, we test whether taking differences of measurements into account further improves the results. The absolute values and their differences result in patterns of correspondingly higher dimension; see Power Diff Mapping. Most prediction tasks require the construction of a pattern that consists of the wind power time series of turbines in the neighborhood of the target turbine. See the corresponding figure below. A wind park is defined by a target wind turbine and a certain radius. Wind power values can be aggregated to a single value or used separately in the pattern vector.
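To illustrate how differences of measurements can enlarge a pattern, here is a small numpy sketch for a single turbine. It is not the windML Power Diff Mapping implementation: the function name `make_diff_patterns` is hypothetical, and the layout (absolute values followed by the differences of consecutive values) is an assumption made for this example.

```python
import numpy as np

def make_diff_patterns(power, feature_window, horizon):
    """Build patterns from absolute values and their successive differences.

    Assumption: a window holds feature_window + 1 absolute values, which
    yields feature_window differences, so each pattern in this sketch has
    2 * feature_window + 1 entries.
    """
    power = np.asarray(power, dtype=float)
    patterns, labels = [], []
    for t in range(feature_window, len(power) - horizon):
        window = power[t - feature_window:t + 1]  # absolute values
        diffs = np.diff(window)                   # changes between neighbors
        patterns.append(np.concatenate([window, diffs]))
        labels.append(power[t + horizon])
    return np.array(patterns), np.array(labels)

# Toy series standing in for measured power values.
series = [0.0, 1.0, 3.0, 6.0, 10.0, 15.0, 21.0]
X, y = make_diff_patterns(series, feature_window=2, horizon=1)
print(X.shape)
print(X[0])
```

For several turbines in a wind park, the same per-turbine vectors could be concatenated into one pattern or aggregated first, mirroring the two options named in the text.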
The windML framework was initially developed by the Computational Intelligence Group of the University of Oldenburg. The contributors are Nils André Treiber, Jendrik Poloczek, Oliver Kramer, Justin Philipp Heinermann, and Fabian Gieseke. For questions and feedback, contact us via email.
The windML framework is released under the open source BSD 3-clause license. The LICENSE file is available here.
Oliphant, T. E. (2007). Python for Scientific Computing. Computing in Science & Engineering 9, IEEE Soc., pp. 10-20.
Hunter, J. D. (2007). Matplotlib: A 2D Graphics Environment. Computing in Science & Engineering 9, IEEE Soc., pp. 90-95.
Pedregosa et al. (2011). Scikit-learn: Machine Learning in Python. Journal of Machine Learning Research (JMLR) 12, pp. 2825-2830.