These results are entirely consistent with how classification trees work. Simply rescaling each of the inputs by multiplying it by some coefficient should have no effect on the tree.
For exactly why this is, I'd recommend Breiman's book (which is referenced in the doc), but the short answer is that trees sort each predictor's observations and try a candidate split within each of the gaps. The tree will then select the split that gives the "best" splitting criterion (and that's an entirely different discussion). Scaling the predictor only serves to scale this process, but it doesn't fundamentally change the results.
As an example: suppose we have a simple set of observations where the predictor has been measured at 1, 2, 4, and 10. The tree will try splits at 1.5, 3, and 7. Let's say that the "best" split is at 7.
Now we go ahead and rescale this input-- multiply it by 100 or some other coefficient. Now, the tree tries splits at 150, 300, and 700, and it will still select the split at 700. Rescaling doesn't change anything.
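A quick sketch of the midpoint-candidate-split idea above (the variable names here are just illustrative):

```matlab
% Candidate splits for a single predictor are the midpoints of the gaps
% between its sorted values:
x = [1 2 4 10];                                        % sorted predictor
splits = (x(1:end-1) + x(2:end))/2                     % 1.5, 3, 7

xScaled = 100*x;                                       % rescale the input
splitsScaled = (xScaled(1:end-1) + xScaled(2:end))/2   % 150, 300, 700
% Every candidate split scales by the same factor, so the same observations
% end up on each side of the chosen split either way.
```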
Now, if we were to cleverly create _new_ predictors out of a well-chosen combination (linear or otherwise) of our existing predictors, then that certainly would change the tree's performance. For instance, make a 6th predictor in your X from Altman's coefficients times your original X-- then you might get some interesting results.
I'm afraid that I don't understand what histograms will do for you in this case. One typically matches the "actual" outputs to the model's "predicted" outputs and compares the difference between them in some way to assess the model's performance. Confusion matrices, ROC curves, and other techniques are commonly used to do this. Histograms are not used because they can hide a lot of information. Consider this simple case of classifying data that can take values of either "1" or "2":
% The "actual" data:
Y = [1 2 1 2 2 1];
% The "predicted" data, arrived at through some model:
Y_Pred = [2 1 2 1 1 2];
Most would argue that this "model" is terrible: it has a 100% misclassification rate! In spite of this, hist(Y) and hist(Y_Pred) give the exact same plot.
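To make that concrete, here is the same toy data pushed through a confusion matrix (using CONFUSIONMAT from Statistics Toolbox), which immediately exposes what the histograms hide:

```matlab
Y      = [1 2 1 2 2 1];   % "actual" classes
Y_Pred = [2 1 2 1 1 2];   % "predicted" classes
% hist(Y) and hist(Y_Pred) look identical: three 1s and three 2s each.
% The confusion matrix, by contrast, puts every count off the diagonal:
C = confusionmat(Y, Y_Pred)
%   C = [0 3
%        3 0]    <- all six observations misclassified
```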
That visualization concern aside, I think there is some confusion about how Bagging works. In this example, the "actual" ratings are Y, and every observation is used for training in some way. So, Y also represents the training ratings. One of the strengths of ensemble methods like Bagging is that it's not necessary to manually split the data into training and validation sets: you can have your proverbial cake and eat it, too. The out-of-bag errors in this case, though, have a special significance that is too much to explain here-- you should check the doc or (better yet) Breiman's original article for more details on that. Suffice to say, you should not use the OOB errors in the way that you seem to be using them here. If you're looking for the ensemble's predicted ratings, they are simply found by
Y_Pred = predict(b,X);
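Putting that together, a minimal sketch of the whole flow might look like the following (assuming X and Y already hold your predictors and ratings; the number of trees and option names are just illustrative):

```matlab
% Train a bagged ensemble of classification trees:
b = TreeBagger(100, X, Y, 'Method', 'classification', 'OOBPred', 'on');

% The ensemble's predicted ratings for the training data:
Y_Pred = predict(b, X);   % cell array of predicted class labels

% OOB error as a function of the number of grown trees -- a measure of
% generalization error, NOT the predictions themselves:
oobErr = oobError(b);
```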
Philip: On the top of this page we list the required products to run this code. R2010b of MATLAB should be fine as long as you have all of the needed toolboxes as well.
Within this package is a README file that provides step-by-step instructions on how to define the data sources (which is what seems to be going wrong in your error message above), how to get a copy of the MCR, and when you might need to recompile the code. I'm always looking for ways to improve those instructions, so let me know where you find them lacking.
Kostas: You're basically correct. PRBYZERO quotes clean prices; to worry about the partial coupon period, you'd need to use a dirty price convention or (equivalently) calculate the accrued interest. The ACCRFRAC function is useful in this case, as is the (more robust) BONDBYZERO function within Financial Derivatives Toolbox.
You're also correct that our choice of clean or dirty prices doesn't really affect the result as long as we're consistent: if present, the accrued interests from the original bond prices and the simulated bond prices just end up cancelling each other out.
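For concreteness, a hedged sketch of the clean-to-dirty conversion with ACCRFRAC; the dates, coupon, and CleanPrice below are placeholders, and I'm assuming the default semiannual coupon convention:

```matlab
Settle     = '15-Jun-2011';    % illustrative dates
Maturity   = '15-Jan-2015';
CouponRate = 0.05;             % annual coupon rate, semiannual payments
Face       = 100;
CleanPrice = 98.50;            % placeholder clean price

% ACCRFRAC returns the fraction of the coupon period elapsed at settlement:
Fraction  = accrfrac(Settle, Maturity);
AccrInt   = Fraction * (CouponRate/2) * Face;   % accrued interest per 100 face
DirtyPrice = CleanPrice + AccrInt;
```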
The webinar and scripts illustrate various ways to implement portfolio optimization under an assumption that you already have the data you need (gathering, managing, scrubbing, and forming total returns data can be a messy or easy process depending upon which data sources you might be able to access). The MATLAB Datafeed Toolbox has a good selection of sources, although some sources require that you have a license to obtain such data.
The BlueChipStocks file contains monthly total return data for 44 stocks, 1 market index, and 1 cash index. These data are in the variable Data with corresponding asset identifiers in the variable Asset and (month-end) dates in the variable Date. The last variable Map contains indicators that match the pattern of Data to identify which assets were in the universe on a given date.
If you have monthly total return data, you should be able to modify these four variables to suit your requirements and the scripts ought to work with minor modification (for example, you would have to specify the time period for backtests and so forth).
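As a hedged sketch of what building your own version of that file might look like (the variable names follow the description above; the toy identifiers, dates, and returns are purely illustrative):

```matlab
% Your own asset identifiers, e.g. 44 stocks + market index + cash index:
Asset = {'AAA', 'BBB', 'Market', 'Cash'};

% Month-end dates as serial date numbers, one per row of Data:
Date  = datenum({'31-Jan-2010'; '28-Feb-2010'});

% Monthly total returns: one row per date, one column per asset:
Data  = [ 0.010  0.020  0.015  0.001;
         -0.020  0.010  0.005  0.001];

% Indicators matching the pattern of Data, marking which assets were in
% the universe on each date (here: all of them, all the time):
Map   = true(size(Data));

save BlueChipStocks Asset Date Data Map
```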
The question is how to create the file named BlueChipStocks. Please help. I am stuck at this point, and without it I cannot define my own stocks and time period. Any help will be highly appreciated. You can also send email at email@example.com