THRESHOLDING

Image binarization can be used to remove unwanted information from a mammogram, and it was straightforward for us to implement and test. We first applied histogram equalization to enhance the image and bring out the white masses inside the breast tissue. We then set an arbitrary threshold and replotted the image, keeping only the breast tissue and discarding the background. After running these processed images through the classifiers in MATLAB, classification accuracy did not improve; in fact, this preprocessing made the classifiers perform worse than before.
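The pipeline described above can be sketched as follows. The original work was done in MATLAB; this is a Python/NumPy approximation, and the threshold value, the number of histogram bins, and the ordering (masking before equalizing) are illustrative assumptions rather than the project's exact parameters.

```python
import numpy as np

def threshold_and_equalize(img, thresh=0.1):
    """Binarize away the dark background, then histogram-equalize the
    remaining (breast-tissue) pixels to bring out bright masses.
    `thresh` is an arbitrary cutoff, as in the text."""
    mask = img > thresh                       # keep only pixels above threshold
    vals = img[mask]
    # Histogram equalization: map each retained intensity to its CDF value.
    hist, edges = np.histogram(vals, bins=256, range=(0.0, 1.0))
    cdf = hist.cumsum().astype(float)
    cdf /= cdf[-1]
    out = np.zeros_like(img, dtype=float)     # background pixels stay zero
    out[mask] = np.interp(vals, edges[:-1], cdf)
    return out

# Toy "mammogram": dark background, mid-gray tissue, one bright mass.
img = np.full((64, 64), 0.02)
img[16:48, 16:48] = 0.5
img[28:36, 28:36] = 0.9
out = threshold_and_equalize(img)
```

The equalization stretches the retained intensities across the full range, so the bright mass stands out from the surrounding tissue while the background is suppressed entirely.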

[Figure: image binarization (thresholding) example]

ADDING GLCM FEATURES

We attempted to add a number of GLCM features to our model, including correlation, entropy, autocorrelation, difference variance and entropy, and more. This drastically reduced our model's accuracy, possibly because of overfitting. Since Vasantha et al. confirmed that the five GLCM features we originally used work well with tree classifiers for mammograms, we decided to keep just those five.
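For illustration, here is a pure-NumPy sketch of how GLCM texture features are computed. The specific feature set below (contrast, energy, homogeneity, entropy, correlation) and the single horizontal pixel offset are assumptions for demonstration, not necessarily the five features the project kept.

```python
import numpy as np

def glcm_features(img, levels=8):
    """Build a gray-level co-occurrence matrix over horizontally adjacent
    pixels and derive a few common Haralick-style texture features."""
    glcm = np.zeros((levels, levels))
    for a, b in zip(img[:, :-1].ravel(), img[:, 1:].ravel()):
        glcm[a, b] += 1                   # count co-occurring gray levels
    glcm = glcm + glcm.T                  # make the matrix symmetric
    p = glcm / glcm.sum()                 # normalize to a joint distribution
    i, j = np.indices(p.shape)
    mu_i, mu_j = (i * p).sum(), (j * p).sum()
    sd_i = np.sqrt(((i - mu_i) ** 2 * p).sum())
    sd_j = np.sqrt(((j - mu_j) ** 2 * p).sum())
    return {
        "contrast":    ((i - j) ** 2 * p).sum(),
        "energy":      (p ** 2).sum(),
        "homogeneity": (p / (1.0 + np.abs(i - j))).sum(),
        "entropy":     -(p[p > 0] * np.log2(p[p > 0])).sum(),
        "correlation": ((i - mu_i) * (j - mu_j) * p).sum() / (sd_i * sd_j),
    }

rng = np.random.default_rng(0)
patch = rng.integers(0, 8, size=(32, 32))   # stand-in patch with 8 gray levels
feats = glcm_features(patch)
```

Each extra feature adds a dimension to the classifier's input, which is why piling on many correlated GLCM statistics can invite overfitting on a small data set.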

BANDPASSING

Because the middle Fourier coefficients gave us increased accuracy, we thought that bandpassing the images could yield a better algorithm. Unfortunately, it did not; the filter was likely removing other crucial information from which our other features were being computed.
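A band-pass filter of this kind can be sketched with a 2-D FFT and an annular mask. The radial cutoffs below are illustrative assumptions; the project's actual band is not specified.

```python
import numpy as np

def bandpass(img, low=4, high=16):
    """Zero all 2-D Fourier coefficients outside an annulus of radial
    frequencies [low, high), keeping only the 'middle' coefficients."""
    F = np.fft.fftshift(np.fft.fft2(img))
    h, w = img.shape
    yy, xx = np.indices((h, w))
    r = np.hypot(yy - h // 2, xx - w // 2)   # distance from the DC term
    mask = (r >= low) & (r < high)           # annular band-pass mask
    return np.real(np.fft.ifft2(np.fft.ifftshift(F * mask)))

# Smooth ramp image: mostly low-frequency content, which the filter removes.
img = np.add.outer(np.linspace(0, 1, 64), np.linspace(0, 1, 64))
out = bandpass(img)
```

Note that the mask discards the DC term and all low and high frequencies outright, which is exactly how such a filter can strip information that other features depend on.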

NORMALIZATION

In machine learning, the range of each feature is often normalized because many classifiers calculate the Euclidean distance between points; if a particular feature has a large spread or variance, it may dominate the algorithm. When we normalized the features, however, the two classifiers that give the best results (tree and ensemble) lost accuracy, and the others did not improve. This suggests that the features with the highest variance may be the most predictive, so we decided not to normalize the data.
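Z-score standardization is one common normalization scheme of the kind described (the project does not say which scheme was tried, so this is an assumption). A minimal sketch:

```python
import numpy as np

def zscore(X):
    """Rescale each feature (column) to zero mean and unit variance so no
    single feature dominates Euclidean-distance computations."""
    mu = X.mean(axis=0)
    sd = X.std(axis=0)
    sd[sd == 0] = 1.0          # leave constant features unscaled
    return (X - mu) / sd

# Two features on very different scales.
X = np.array([[1.0, 100.0],
              [2.0, 300.0],
              [3.0, 200.0]])
Xn = zscore(X)
```

After scaling, both columns contribute equally to distances, which is precisely what hurts if the large-variance feature was the informative one.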

INDEPENDENT COMPONENT ANALYSIS

Independent Component Analysis (ICA) is a technique for separating multivariate data into subcomponents that are non-Gaussian and statistically independent of one another. ICA constructs a linear, non-orthogonal basis for the data, with the directions of the axes determined by first-, second-, and higher-order statistics. The difficulty with ICA is that because the basis is non-orthogonal and the components are merely independent, there is no guarantee that the extracted features are useful for classification. This is why we struggled to extract helpful features from the images using ICA: the maximum accuracy we saw with the ICA features was about 70%.
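To illustrate what ICA computes, here is a minimal FastICA sketch in Python/NumPy. The project's actual ICA implementation is not shown anywhere on this page, so the tanh nonlinearity, the fixed iteration count, and the toy sources below are all assumptions for demonstration.

```python
import numpy as np

def fastica(X, n_iter=200, seed=0):
    """Minimal FastICA (tanh nonlinearity, deflation) for illustration only;
    a library implementation would normally be used instead."""
    rng = np.random.default_rng(seed)
    Xc = X - X.mean(axis=0)
    d, E = np.linalg.eigh(np.cov(Xc, rowvar=False))
    Z = Xc @ E / np.sqrt(d)                  # whiten: unit variance, uncorrelated
    n = Z.shape[1]
    W = np.zeros((n, n))
    for k in range(n):                       # estimate one component at a time
        w = rng.normal(size=n)
        w /= np.linalg.norm(w)
        for _ in range(n_iter):
            g = np.tanh(Z @ w)
            w = (Z * g[:, None]).mean(axis=0) - (1 - g ** 2).mean() * w
            w -= W[:k].T @ (W[:k] @ w)       # deflate against found components
            w /= np.linalg.norm(w)
        W[k] = w
    return Z @ W.T

# Two independent non-Gaussian sources, linearly mixed.
rng = np.random.default_rng(1)
t = np.linspace(0, 8, 2000)
S = np.c_[np.sign(np.sin(3 * t)), rng.uniform(-1.0, 1.0, 2000)]
X = S @ np.array([[1.0, 0.5], [0.4, 1.0]])
recovered = fastica(X)
```

The recovered axes are driven by statistical independence rather than by class labels, which is why components separated this way carry no guarantee of being discriminative features.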
