The selected task was linked to a real-world problem and to the company behind it. The competitors submitted research papers, which can be downloaded in the previous chapter, and these publications can support the development of prediction systems in general.
It must be said that under the "Equal information access policy" the results are excellent, and we can argue that these results represent the worst achievable case, because there was no interaction with the electricity distribution center and the competition covered only one part of the development of an adaptive system. Generally, we can state that developing an intelligent adaptive system needs (if possible) two sources of information: one from data and one from an expert. The competition fulfilled the task of extracting information from data, and the results could possibly be improved further if expert knowledge were also incorporated.
From this point of view, only the data-driven part of building the intelligent system was investigated, and the results of around 2 percent error are very promising. The error could drop below 2 percent if additional information from an expert were incorporated into the complex adaptive system.
Prediction is a process of extrapolating an unknown function. The problem is thus to identify this unknown function; once the assumed function can be approximated with acceptable accuracy, it can be extrapolated. A predictor is therefore a highly adaptive system which is able to continuously adapt and identify the observed system (approximate the function). A system which exhibits adaptability is said to be adaptive. According to Haykin, biological systems are adaptive systems; animals, for example, can adapt to changes in their environment through a learning process. The basic concept of such an adaptive system is shown in the following figure (according to Mandic).
This figure presents supervised learning. It consists of the following parts:
- a set of adjustable parameters (weights) within the filter structure, together with an error calculation block (the difference between the desired output and the filter output); the structure of the filter can be linear, e.g. finite impulse response (FIR) or infinite impulse response (IIR) filters, or nonlinear, such as Volterra filters, neural networks, fuzzy inference systems, SVMs or other structures able to approximate the unknown function which has to be extrapolated
- a control (learning) algorithm for the adaptation of the weights
So choosing the filter is the key issue in building an adaptive filter with prediction functionality (see more in Mandic).
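The supervised scheme above can be sketched with the classic LMS algorithm, in which the weights of a linear FIR filter are the adjustable parameters and the prediction error drives the adaptation. This is an illustrative sketch only, not any participant's method; the test signal and the step size are invented for the example.

```python
import numpy as np

def lms_predict(signal, n_taps=4, mu=0.05):
    """One-step-ahead prediction with an adaptive FIR filter trained by LMS.

    The weights are the adjustable parameters; the error between the
    desired output (the true next sample) and the filter output drives
    the weight update -- the 'control algorithm' of the scheme above.
    """
    w = np.zeros(n_taps)                  # adjustable parameters (weights)
    predictions = []
    for t in range(n_taps, len(signal)):
        x = signal[t - n_taps:t][::-1]    # most recent samples first
        y = w @ x                         # filter output (prediction)
        e = signal[t] - y                 # error calculation block
        w += mu * e * x                   # LMS weight adaptation
        predictions.append(y)
    return np.array(predictions), w

# The filter gradually learns to track a slow sinusoid: the prediction
# error shrinks as the weights adapt.
t = np.arange(200)
load = np.sin(2 * np.pi * t / 50)
preds, weights = lms_predict(load)
```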
The basic approach is an iterative search for an appropriate filter design, following principles similar to the learning procedure used for classification. The procedure was iterative: the participants successively refined the filter in order to provide the best prediction results.
All the participants used the typical scheme for prediction.
The above figure can be described as follows:
The scheme consists of procedure blocks and a number of feedback loops. The overall approach is highly iterative and adaptive. The feedback loops come into operation when the results are not accepted; when the results of prediction are insufficient, the loop reaches deeper and deeper into the procedure. If there is still no acceptable solution, the feedback triggers a change of inputs and a change of filter design as well.
The overall process is adaptive, with the aim of reaching a situation in which the results are accepted.
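The nested feedback loops can be sketched as control flow. Every stage function here is a hypothetical placeholder (the names `analyze_data`, `select_inputs`, `fit_filter` and `evaluate` are invented for the sketch), and the toy `evaluate` simply pretends that each filter redesign improves the error.

```python
# Trivial stand-ins so the sketch runs; real stages would replace these.
def analyze_data(data, round_):       # data analysis block
    return data

def select_inputs(features, round_):  # input/output determination block
    return features

def fit_filter(inputs, round_):       # filter identification + training
    return {"round": round_}

def evaluate(model, data):            # toy: each redesign improves the error
    return 8.0 / (model["round"] + 1)

def build_predictor(data, target_mape=2.0):
    """Sketch of the iterative scheme: the inner loop retunes the filter
    design; if that fails, a deeper loop changes the inputs; the deepest
    loop redoes the data analysis itself."""
    model = None
    for analysis_round in range(3):                 # deepest feedback loop
        features = analyze_data(data, analysis_round)
        for input_round in range(3):                # change of inputs
            inputs = select_inputs(features, input_round)
            for design_round in range(10):          # change of filter design
                model = fit_filter(inputs, design_round)
                if evaluate(model, data) <= target_mape:
                    return model                    # results accepted
    return model                                    # best effort
```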
A more detailed description of the particular blocks follows:
- The Data Analysis (measurements) block - this part is based on statistical or empirical analysis of the provided data, including correlation analysis and examination of all data provided for the prediction procedure. In this part it is necessary to make assumptions about the length of the data to be predicted and about the history needed for prediction. All considerations must take into account the fact that output and input are related - in fact there is a functional relation between output and input. Data analysis should focus on this problem, including case-dependent aspects. A priori knowledge can be generated in the data analysis part of the procedure.
Relating to the competition, the most important problem for all participants was the initial analysis of the data, followed by the problem of the influence of temperature on the load in general. All participants considered a de-seasoning procedure, and the majority of them took the temperature from the data into consideration. The impact of the temperature was observed in both years, as can be seen in the following graphs.
The research strategy of the majority of participants was to make two predictions as follows:
a) prediction of the average temperature for January 1999 in Eastern Slovakia
b) prediction of load for 31 days of January 1999 in Eastern Slovakia.
Some attempts to obtain proper temperature data used data available on the Internet, but these cover only major cities such as Vienna and Budapest, and some participants studied the estimation using this information. Other approaches were based on predicting the temperature for the expected days, so in fact the task was doubled: temperature as well as load had to be predicted.
The holiday impact was also studied in a number of approaches, and it has a significant influence on the load as well. It is clear, however, that further discussion with an expert from the electricity distribution center would help to predict the holiday days more precisely, with respect to the type of the holiday; unexpected events can also change the load in a particular region (TV influence, sport events, etc.).
This part also includes some rule generation from data, e.g. "The lower the temperature, the higher the electricity load" (see Esp) or "If there is a holiday on Monday, then the load is smaller" (see Brockmann), and some others.
A part of data analysis is the so-called data enrichment procedure, in which we try to enrich the data to get the best prediction results. In this problem it meant, for instance, additional information about the illumination on the prediction days (see Esp).
Data analysis also includes data transformation for better prediction. Usually this transformation is non-linear, creating a new description of the process so that a better extrapolation of the function becomes possible.
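As one concrete example of such a transformation (a hypothetical sketch, not a participant's method), the load can be log-transformed to stabilise its variance and then stripped of its mean weekly profile; the inverse transformation is applied after prediction.

```python
import numpy as np

def deseason_log(load, period=7):
    """Nonlinear transformation: take the log to stabilise variance,
    then subtract the mean weekly profile (de-seasoning).
    Returns the residual series and the profile needed to invert it."""
    x = np.log(load)
    n = len(x) - len(x) % period
    profile = x[:n].reshape(-1, period).mean(axis=0)  # mean value per weekday
    tiled = np.tile(profile, len(x) // period + 1)[:len(x)]
    return x - tiled, profile

def reseason_exp(residual, profile):
    """Invert the transformation after prediction."""
    tiled = np.tile(profile, len(residual) // len(profile) + 1)[:len(residual)]
    return np.exp(residual + tiled)
```

The round trip is exact: predicting in the residual domain and mapping back with `reseason_exp` reproduces the original scale.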
- Input & Output data determination - this part builds on the above considerations and assumptions and precisely defines, mainly, the input to the filter (prediction) system. The output is usually given by the project and by the requested prediction results. In this respect the question "How much history is needed to be able to predict the defined future?" is interesting. The length of the "history" can be defined empirically or by means of a linear adaptive filter such as FIR or AFIR (see Sincak). The main role of this block is to find the input-output relation in the form of a function that can be approximated. The relation between the output and the time-window input is illustrated in the enclosed figure.
Generally, prediction can be auto-associative or hetero-associative. In the auto-associative approach we usually do time-series prediction, meaning that the input consists of the same type of data as the output. In the hetero-associative case this does not hold: the inputs and the output are completely different types of data.
The figure above shows a non-functional relation between A(t) and the inputs, which means that the inputs A(t-τ1) and A(t-τ2) alone cannot be sufficient for predicting the value A(t). Therefore the precise determination of the inputs for a given output is a crucial part of the prediction procedure.
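The time-window construction of input-output pairs can be sketched as follows. The lag set (1, 2, 7) is an arbitrary illustration; choosing lags that actually carry a functional relation to A(t) is precisely the determination problem discussed above.

```python
import numpy as np

def make_windows(series, lags=(1, 2, 7)):
    """Build (input, output) pairs from a time series: for each target
    A(t), the inputs are the delayed values A(t - tau) for each lag tau."""
    max_lag = max(lags)
    X, y = [], []
    for t in range(max_lag, len(series)):
        X.append([series[t - tau] for tau in lags])
        y.append(series[t])
    return np.array(X), np.array(y)

# On the toy series 0..9, the first usable target is A(7), with inputs
# A(6), A(5) and A(0).
X, y = make_windows(np.arange(10.0))
```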
Relating to the competition, the determination of inputs varied, e.g.:
1. Lin uses the following 16 inputs:
- seven maximal loads for the past seven days
- seven binary attributes indicating the day in the week
- one binary attribute indicating the holiday or not
- one attribute with temperature (assumed)
The output was one number, the predicted load, which was then used in predicting the following loads. So he in fact uses recursive prediction.
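Recursive prediction in the spirit of Lin's approach can be sketched as follows. The `model` here is a toy stand-in mapping the last seven loads to one value; the full 16-input encoding (weekday flags, holiday flag, temperature) is omitted for brevity, so this is an illustrative sketch only.

```python
import numpy as np

def recursive_forecast(history, model, horizon=31):
    """Recursive prediction: each predicted load is appended to the
    history and re-used as an input for the following day."""
    loads = list(history)
    out = []
    for _ in range(horizon):
        x = np.array(loads[-7:])  # seven most recent (possibly predicted) loads
        y = model(x)
        out.append(y)
        loads.append(y)           # feed the prediction back as an input
    return out

# Toy model: tomorrow's load is the mean of the last week, so a flat
# history stays flat.
preds = recursive_forecast([100.0] * 7,
                           model=lambda x: float(x.mean()),
                           horizon=5)
```

Note that in recursive prediction any early error propagates into later inputs, which is the main risk of this design.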
2. Esp uses different inputs to predict the same task:
- day of the year
- day of the week
- time of the day
- information about illumination
- indication if it is a holiday or not
The output was one number, the predicted load. Interesting in this approach is that there was no load among the inputs, so it was a hetero-associative prediction, fully different from the previous one.
3. Brockmann uses an extremely simple approach with the following inputs to his prediction system (when predicting the load for the i-th day of 1999):
- the 1997 load for day (i+2) and the 1998 load for day (i+1), for days 2 to 5 and 7 to 31
- the 1997 load for day (i) and the 1998 load for day (i)
The output was a single value for a given day in 1999.
4. Zivcak uses the inputs as follows:
- type of day in the week (8 inputs)
- indication of day type (3 inputs) (holiday, working day..)
- 8 inputs with indication of maximum load in previous days
The output consists of 31 values of the predicted maximum load.
The input determination depends on the user and also on the previous step, the data analysis. Generally, various sets of inputs can yield similar prediction quality.
- The filter identification is the most important part of the prediction approach and deals with the choice of techniques for making the prediction. The general aim is to build a system which behaves like the observed system; the prediction can then be tested by simulating the behavior of this system. The foundations of linear predictors lie in the works of Yule (1927), Kolmogorov (1941) and Wiener (1949). Later studies by Box and Jenkins (1970) developed these ideas further. Such approaches are very well established in branches such as signal processing, where they are defined as finite impulse response (FIR) filters or infinite impulse response (IIR) filters. In statistical signal modeling, FIR approaches are identified as moving average (MA) structures and IIR approaches as autoregressive (AR) structures; combined, they form ARMA models. The nonlinear version later developed into NARMA models, which are widely used in many applications related to prediction problems.
Non-linear filters can be represented by various structures which are able to approximate the unknown function given by the data. These technologies include a wide range of tools such as neural networks, fuzzy systems, evolutionary programming, support vector machines, and many combinations of them. The proper choice of these tools is data- and case-dependent and also user-dependent. It can be stated that similar results can be achieved on the same data by different tools.
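The autoregressive side of the ARMA family mentioned above can be illustrated with a plain least-squares fit. This is a generic sketch under invented data, not any participant's implementation.

```python
import numpy as np

def fit_ar(series, order=3):
    """Least-squares fit of an AR(p) model, A(t) ~ sum_k w_k * A(t-k),
    the autoregressive (IIR-like) half of the ARMA family."""
    X = np.column_stack([series[order - k - 1:len(series) - k - 1]
                         for k in range(order)])
    y = series[order:]
    w, *_ = np.linalg.lstsq(X, y, rcond=None)
    return w

def ar_predict(series, w):
    """One-step-ahead prediction from the last len(w) values."""
    order = len(w)
    recent = series[-1:-order - 1:-1]  # most recent value first
    return float(w @ recent)

# On a noiseless AR(1) series A(t) = 0.9 * A(t-1), the fit recovers the
# coefficient 0.9 exactly (up to floating-point error).
series = np.array([0.9 ** i for i in range(30)])
w = fit_ar(series, order=1)
```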
Relating to the competition, these technologies were used for the filter design as follows:
- Lin uses a support vector machine approach and machine learning tools. He makes recursive predictions, i.e. he uses predicted values to predict the next values.
- Esp uses an adaptive logic network, which is in fact a modified neural network with linear activation functions and a special training approach. He uses hetero-associative training, which means he has no loads in the input of the predictor.
- Brockmann uses the simplest approach, averaging over certain days which he determined experimentally. He uses a special simple averaging for two types of periods in January 1999.
- Zivcak uses a simple neural network based on back-propagation with appropriate learning to predict the 31 values for January 1999.
- Training and testing - this is a very important part of the prediction scheme and fully depends on the filter design. Simple filters with no learning abilities do not need this step; on the other hand, tools based on machine learning or computational intelligence need a training and testing procedure.
This part also includes a proper division of the representative set into a training and a testing set. In this process, various types of changes are useful to test that the system is robust and ready to learn the various situations represented in the data.
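A minimal sketch of such a division, assuming a chronological split (for load data the test set should mimic genuinely unseen future situations, so the data are not shuffled):

```python
import numpy as np

def chronological_split(X, y, test_fraction=0.2):
    """Split a representative set into training and testing parts,
    keeping the most recent fraction of samples for testing."""
    n_test = max(1, int(len(X) * test_fraction))
    return (X[:-n_test], y[:-n_test]), (X[-n_test:], y[-n_test:])

# Ten samples, 20 % held out: the last two samples form the test set.
(X_tr, y_tr), (X_te, y_te) = chronological_split(
    np.arange(10).reshape(10, 1), np.arange(10))
```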
- Evaluation of results - the quality of solutions was measured by two error functions. The first is the mean absolute percentage error,

MAPE = (100/N) * Σ_{i=1..N} |A_i - P_i| / A_i ,

and the second is the maximal error,

MAX = max_{i=1..N} |A_i - P_i| ,

which describes the stability of the system. Here A_i and P_i denote the real and predicted values, respectively. Although these two values are correlated, it is useful to follow them both, to avoid situations which could lead to a false acceptance of the results and an improper prediction tool.
Sometimes the first criterion is sufficient, but there are cases (e.g. load forecasting) where it is extremely important to follow the maximum error, whose value could indicate possible damage or a dangerous situation.
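Both criteria can be computed directly; A_i and P_i follow the definitions above, and the sample numbers are invented for illustration.

```python
import numpy as np

def mape(actual, predicted):
    """Mean absolute percentage error, the competition's first criterion."""
    a = np.asarray(actual, dtype=float)
    p = np.asarray(predicted, dtype=float)
    return 100.0 * np.mean(np.abs(a - p) / a)

def max_error(actual, predicted):
    """Maximal absolute error -- the stability criterion; a single large
    miss can matter even when the average error looks acceptable."""
    a = np.asarray(actual, dtype=float)
    p = np.asarray(predicted, dtype=float)
    return float(np.max(np.abs(a - p)))

a = [500.0, 520.0, 480.0, 510.0]
p = [510.0, 520.0, 470.0, 515.0]
# mape(a, p) is about 1.27 %, max_error(a, p) is 10.0
```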
Conclusion of the comments:
The scientific comments on the competition are an attempt at a synthesis of theoretical and practical considerations toward a best-practice guideline for building prediction systems. A general scheme was presented which MUST be used, and which serves as a guideline for making a prediction system of any kind. The particular steps of this scheme were commented on, and these comments could motivate feedback for enhancing them toward a broader overview of the problem of building highly adaptive systems for prediction procedures.
S. Haykin: Neural Networks - A Comprehensive Foundation, Macmillan Press, 1994
D. Mandic et al.: Recurrent Neural Networks for Prediction, Wiley, 2001
P. Sincak et al.: FIR Neural Networks for Prediction Problems, EUNITE meeting, December 2001
Chih-Jen Lin et al.: EUNITE Network Competition: Electricity Load Forecasting, EUNITE meeting, December 2001
D. Esp: Adaptive Logic Networks for East Slovakian Electrical Load Forecasting, EUNITE meeting, December 2001
W. Brockmann et al.: Different Models to Forecast Electricity Loads, EUNITE meeting, December 2001
D. Živčák: Electricity Load Forecasting Using ANN, EUNITE competition, December 2001