-
Multiple features (variables)
Notation:
- $n$ = number of features
- $x^{(i)}$ = input (features) of $i^{th}$ training example
- $x^{(i)}_j$ = value of feature $j$ in $i^{th}$ training example
Hypothesis for multivariate linear regression $$ h_\theta(x) = \theta_0 + \theta_1 x_1 + \theta_2 x_2 + \dots + \theta_n x_n $$
which can be written in vector form as $$ h_\theta(x) = \theta^T x $$
Remark: for convenience of notation, define $x_0 = 1$, so that $x$ is an $(n+1) \times 1$ vector and $\theta$ is also an $(n+1) \times 1$ vector.
Gradient descent for multiple variables
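The update rule, applied simultaneously to every parameter $\theta_j$ and repeated until convergence, is:
$$ \theta_j := \theta_j - \alpha \frac{1}{m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right) x_j^{(i)} \qquad (j = 0, 1, \dots, n) $$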
-
Gradient descent in practice
Feature Scaling
Idea: Make sure features are on a similar scale
Target: Get every feature into approximately a $-1 \leq x_i \leq 1$ range
Mean normalization
Replace $x_i$ with $\frac{x_i - \mu_i}{s_i}$ (where $\mu_i$ is the mean and $s_i$ the range or standard deviation of feature $i$) so features have approximately zero mean
Remark: do not apply to $x_0 = 1$
Learning rate
- If $\alpha$ is too small: slow convergence.
- If $\alpha$ is too large: $J(\theta)$ may not decrease on every iteration and may not converge; slow convergence is also possible.
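A minimal Octave sketch of mean normalization; the variable names (`X`, `mu`, `sigma`, `X_norm`) are illustrative assumptions, not part of the course code:

```
% X is an m x n matrix of raw feature values (one example per row)
mu = mean(X);                %1 x n vector of feature means
sigma = std(X);              %1 x n vector of feature standard deviations
X_norm = (X - mu) ./ sigma;  %scale each feature to roughly zero mean, unit spread
```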
-
Features and polynomial regression
$h_\theta(x) = \theta_0 + \theta_1 (size) + \theta_2 (size)^2$
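A sketch of how the squared term becomes an extra column of the design matrix; `size_ft` is a hypothetical vector of house sizes (feature scaling matters here, since $(size)^2$ has a much larger range):

```
size_ft = data(:, 1);                   %hypothetical: first column holds house sizes
X = [ones(m, 1), size_ft, size_ft.^2];  %features: 1, size, size^2
```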
Normal Equation
Method to solve for $\theta$ analytically.
Given $m$ examples $(x^{(1)}, y^{(1)}), \dots, (x^{(m)}, y^{(m)})$ and $n$ features:
$$ \theta = (X^TX)^{-1}X^Ty $$
Octave: `pinv(X'*X)*X'*y`
-
Normal equation and non-invertibility (optional)
$X^TX$ being non-invertible is usually caused by:
- Redundant features (linearly dependent)
- Too many features (e.g. $m \leq n$): delete some features, or use regularization
-
Basic operations / Moving data around
```
1 ~= 2         %not equal
1 && 0         %AND
1 || 0         %OR
xor(0, 1)      %XOR
a = 3
v = [1 2 3]    %row vector
v = [1; 2; 3]  %column vector
v = 1:0.1:2    %create a row vector [1, 1.1, 1.2, ..., 2]
help ones      %display the help text for the ones function
ones(2, 3)     %create a 2x3 matrix, all elements equal to 1
rand(3, 3)     %create a random 3x3 matrix, elements uniformly drawn from (0, 1)
w = -6 + sqrt(10)*(rand(1, 10))
hist(w)        %plot a histogram of the values in w
hist(w, 5)     %plot a histogram of the values in 5 bins
eye(4)         %return a 4x4 identity matrix
zeros(10, 1)   %return a 10x1 matrix whose elements are all 0
size(A)        %return the number of rows and columns of A
size(X, 1)     %return the number of rows of X
size(X, 2)     %return the number of columns of X
[nr, nc] = size(A);
length(A)      %for an empty object: 0; for scalars: 1; for vectors: number of elements
               %for a matrix: the larger of the row and column counts
V + ones(length(V), 1);
data = load('featuresX.dat');
X = data(:, 1:3);
y = data(:, 4);
m = length(y);
A(2, :)        %return 2nd row of matrix A
A(:, 2)        %return 2nd column of matrix A
A(:)           %put all elements of A into a single column vector
A(:, 3:5) = B(:, 1:3)  %replace columns 3 to 5 of A with columns 1 to 3 of B
A(:, 2:end)    %return elements from 2nd to last column of A
C = [A A]      %columns doubled
C = [A, A]     %columns doubled
C = [A; A]     %rows doubled
A .* B         %multiply corresponding elements of A and B
abs(A)         %absolute value of each element
%max
max(A)         %column-wise maximum of matrix A (max element of a vector)
[val, ind] = max(A, [], 1)  %max value of each column and its index, as row vectors
[val, ind] = max(A, [], 2)  %max value of each row and its index, as column vectors
pinv(A)        %calculate the (pseudo)inverse of matrix A
A'             %calculate the transpose of matrix A
%Logical arrays
a = 1:10; b = 3;  %create a vector a and a scalar b
a == b;        %return a vector the same size as a, with ones at positions where the
               %elements of a equal b, and zeros where they differ
```
-
Computing on data
```
A = magic(3)     %magic square: sums of each row, column, and diagonal are equal
sum(A)           %sum of each column
prod(A)          %product of each column
floor(A)         %round each element down to the nearest integer
max(A, [], 1)    %max value of each column
max(A, [], 2)    %max value of each row
max(max(A))      %max value over all elements
sum(A, 1)        %sum along columns, returns a row vector
sum(A, 2)        %sum along rows, returns a column vector
flipud(eye(10))  %flip the matrix upside down
```
-
Plotting data
```
t = [0:0.01:0.98];
y1 = sin(2*pi*4*t);
plot(t, y1)
hold on               %overlay subsequent plots on the same figure
y2 = cos(2*pi*4*t);
plot(t, y2, 'r')      %display in a red line
xlabel('time')        %xlabel, ylabel, title, etc.
legend('sin', 'cos')  %display a legend using the specified strings as labels
print -dpng 'myPlot.png'  %save as a picture
close                 %close the figure
figure(1); plot(t, y1)
subplot(2, 10, 1)     %divide the figure into a 2x10 grid, activate the 1st subplot
axis([0 1 1 2])       %change axis ranges: x from 0 to 1, y from 1 to 2
clf                   %clear the figure
```
-
Control statements: for, while, if statements
```
v = zeros(10, 1);
for i = 1:10,
  v(i) = 2^i;
end

i = 1;
while i <= 5,
  v(i) = 100;
  i = i + 1;
end

if i >= 3,
  xxx
else,
  xxx
end

fprintf('now you have almost finished basic octave learning.\n');
pause;  %suspend execution until a key is pressed; pause(10) waits 10 seconds
```
-
Vectorized implementation
- vectorized computations are handled efficiently by the linear algebra libraries built into coding environments
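A sketch contrasting the two styles for computing the hypothesis $h_\theta(x) = \theta^T x$; the variables `theta`, `x`, and `n` are illustrative:

```
% unvectorized: accumulate theta_j * x_j in a loop
prediction = 0.0;
for j = 1:n+1,
  prediction = prediction + theta(j) * x(j);
end

% vectorized: a single inner product
prediction = theta' * x;
```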
-
Part i: linear regression with one variable
ex1.m
- load data
```
data = load('ex1data1.txt');
X = data(:, 1); y = data(:, 2);  %load data into variables X and y
m = length(y);                   %number of training examples
```
- plot data
```
function plotData(x, y)
figure;
plot(x, y, 'rx', 'MarkerSize', 10);
ylabel('Profit in $10,000s');
xlabel('Population of City in 10,000s');
end
```
plotData(X, y)
- cost
```
X = [ones(m, 1), data(:, 1)];  %add a column of ones to X
theta = zeros(2, 1);           %initialize fitting parameters
iterations = 1500;
alpha = 0.01;
```
- computing the cost
$J(\theta)$
```
function J = computeCost(X, y, theta)
m = length(y);
J = 1/(2*m) * sum((X*theta - y).^2);
end
```
J = computeCost(X, y, theta)
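The function above implements the squared-error cost from the lectures:
$$ J(\theta) = \frac{1}{2m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right)^2 $$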
- Gradient descent
```
function theta = gradientDescent(X, y, theta, alpha, num_iters)
m = length(y);
J_history = zeros(num_iters, 1);
for iter = 1:num_iters
  a = 1/m * (X(:,1)' * (X*theta - y));  %partial derivative w.r.t. theta_0
  b = 1/m * (X(:,2)' * (X*theta - y));  %partial derivative w.r.t. theta_1
  delta = [a; b];
  theta = theta - alpha*delta;
  J_history(iter) = computeCost(X, y, theta);
end
end
```
theta = gradientDescent(X, y, theta, alpha, iterations); %run gradient descent
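The two per-parameter derivatives inside the loop can also be collapsed into a single vectorized update; this sketch works for any number of features:

```
for iter = 1:num_iters
  theta = theta - alpha/m * X' * (X*theta - y);  %update all parameters simultaneously
  J_history(iter) = computeCost(X, y, theta);
end
```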
- Plot the linear fit
```
hold on;  %keep previous plot visible
plot(X(:,2), X*theta, '-')
legend('Training data', 'Linear regression')
hold off  %don't overlay any more plots on this figure
```
- Predict values for population sizes of 35,000
```
predict1 = [1, 3.5] * theta;
fprintf('For population = 35,000, we predict a profit of %f\n', predict1*10000);
```
- Visualize
$J(\theta)$
```
% Grid over which we will calculate J
theta0_vals = linspace(-10, 10, 100);
theta1_vals = linspace(-1, 4, 100);

% Initialize J_vals to a matrix of 0's
J_vals = zeros(length(theta0_vals), length(theta1_vals));

% Fill out J_vals
for i = 1:length(theta0_vals)
  for j = 1:length(theta1_vals)
    t = [theta0_vals(i); theta1_vals(j)];
    J_vals(i, j) = computeCost(X, y, t);
  end
end

% Because of the way meshgrids work in the surf command, we need to
% transpose J_vals before calling surf, or else the axes will be flipped
J_vals = J_vals';

% Surface plot
figure;
surf(theta0_vals, theta1_vals, J_vals)
xlabel('\theta_0'); ylabel('\theta_1');

% Contour plot
figure;
% Plot J_vals as 20 contours spaced logarithmically between 0.01 and 1000
contour(theta0_vals, theta1_vals, J_vals, logspace(-2, 3, 20))
xlabel('\theta_0'); ylabel('\theta_1');
hold on;
plot(theta(1), theta(2), 'rx', 'MarkerSize', 10, 'LineWidth', 2);
```
-
Part ii: linear regression with multiple variables
n.a.
- 17.10.02 detail octave tutorial
- 17.09.10 init create @draachen