Fast Optimization Methods for L1 Regularization: A Comparative Study and Two New Approaches
Supplemental Material
Mark Schmidt 1, Glenn Fung 2, Romer Rosales 2
1 Department of Computer Science, University of British Columbia; 2 IKM, Siemens Medical Solutions, USA
1 Smooth L1-norm approximation
In order to deal with the non-differentiable penalty, we propose a smooth approximation to the L1 penalty based on the following:
(i) $|x| = (x)_+ + (-x)_+$, where the plus function is $(x)_+ = \max\{x, 0\}$; (ii) the plus function can be approximated smoothly [2] by the integral of a smooth approximation of the sigmoid function:

$$(x)_+ \approx p(x, \alpha) = x + \frac{1}{\alpha}\log(1 + \exp(-\alpha x)) \quad (1)$$
Combining these, we arrive at the following smooth approximation of the absolute value function, consisting of the sum of the integrals of two sigmoid functions (Fig. 1 plots this approximation near 0 for different values of $\alpha$):

$$|x| = (x)_+ + (-x)_+ \approx p(x,\alpha) + p(-x,\alpha) = \frac{1}{\alpha}\left[\log(1 + \exp(-\alpha x)) + \log(1 + \exp(\alpha x))\right] \triangleq |x|_\alpha \quad (2)$$
In practice, $\alpha = 10^6$ yields results that are within a small tolerance of the results produced by (optimal) constrained optimization methods. As opposed to the L1 penalty, this approximation is amenable to standard unconstrained optimization methods since it is twice differentiable:

$$\nabla |x|_\alpha = (1 + \exp(-\alpha x))^{-1} - (1 + \exp(\alpha x))^{-1} \quad (3)$$

$$\nabla^2 |x|_\alpha = 2\alpha \exp(\alpha x)/(1 + \exp(\alpha x))^2 \quad (4)$$
This approximation can be used in conjunction with any general likelihood or loss function. Next we show that, for optimization problems derived from learning methods with L1 regularization, the solutions of the smoothed problems approach the solutions of the original problems as $\alpha$ approaches infinity. We start by proposing and proving a simple lemma, similar to one proposed in [1] for the plus function, which gives us a bound relating $|x|$ and $|x|_\alpha$.
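Equations (2)–(4) translate directly into code. The sketch below is our own illustration (not the authors' implementation); it uses algebraically equivalent, numerically stable forms, since evaluating $\exp(\alpha x)$ naively overflows for the large $\alpha$ values used in practice:

```python
import numpy as np

def abs_smooth(x, alpha):
    """Smooth |x| approximation (eq. 2): (1/a)[log(1+e^{-ax}) + log(1+e^{ax})].

    Rewritten as |x| + (2/a) * log(1 + e^{-a|x|}) to avoid overflow for large a.
    """
    x = np.asarray(x, dtype=float)
    return np.abs(x) + (2.0 / alpha) * np.log1p(np.exp(-alpha * np.abs(x)))

def abs_smooth_grad(x, alpha):
    """Gradient (eq. 3): sigmoid(ax) - sigmoid(-ax), which equals tanh(ax/2)."""
    return np.tanh(alpha * np.asarray(x, dtype=float) / 2.0)

def abs_smooth_hess(x, alpha):
    """Second derivative (eq. 4): 2a e^{ax}/(1+e^{ax})^2 = (a/2) sech^2(ax/2)."""
    t = np.tanh(alpha * np.asarray(x, dtype=float) / 2.0)
    return (alpha / 2.0) * (1.0 - t * t)
```

Note that at $x = 0$ the approximation evaluates to $\frac{2}{\alpha}\log 2$, which is exactly the worst-case gap given by the lemma below.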
[Figure 1: two panels plotting the approximation $|x|_\alpha$ near 0, over $x \in [-0.3, 0.3]$ and $x \in [-1, 1]$, for $\alpha \in \{1, 10, 100, 1000, 10000\}$]

Fig. 1. Approximation near 0 for different settings of the parameter $\alpha$
Lemma 1 (Approximation bound). For any $x$ and for any $\alpha > 0$:

$$\big| |x|_\alpha - |x| \big| \le \frac{2}{\alpha}\log 2 \quad (5)$$
Proof: Let us consider two cases. For $x > 0$:

$$p(x,\alpha) - (x)_+ = x + \frac{1}{\alpha}\log(1 + \exp(-\alpha x)) - x = \frac{1}{\alpha}\log(1 + \exp(-\alpha x)) \le \frac{\log 2}{\alpha} \quad (6)$$
For $x \le 0$:

$$0 \le p(x,\alpha) - (x)_+ = p(x,\alpha) \le p(0,\alpha) = \frac{\log 2}{\alpha}, \quad (7)$$

where the last inequality derives from the fact that $p$ is a monotonically increasing function. Hence, from equations (6) and (7), and because $p(x,\alpha)$ always dominates $(x)_+$, we conclude that:

$$|p(x,\alpha) - (x)_+| \le \frac{\log 2}{\alpha} \quad (8)$$
Since $|x| = (x)_+ + (-x)_+$, using equation (1) we have:

$$\big| |x|_\alpha - |x| \big| = |p(x,\alpha) + p(-x,\alpha) - ((x)_+ + (-x)_+)| \le |p(x,\alpha) - (x)_+| + |p(-x,\alpha) - (-x)_+| \quad (9)$$

$$\le \frac{\log 2}{\alpha} + \frac{\log 2}{\alpha} = \frac{2}{\alpha}\log 2. \qquad \square$$
Let us now define $\|x\|_{(1,\alpha)}$ as a smooth approximation to the 1-norm $\|x\|_1$ of a vector $x \in \mathbb{R}^n$ in the following way: $\|x\|_{(1,\alpha)} = \sum_{i=1}^n |x_i|_\alpha$. Then, using Lemma 1, we have that:

$$\big| \|x\|_{(1,\alpha)} - \|x\|_1 \big| \le \frac{2n}{\alpha}\log 2 \quad (10)$$

Hence, we can conclude that:

$$\lim_{\alpha \to \infty} \|x\|_{(1,\alpha)} = \|x\|_1 \quad \forall x \in \mathbb{R}^n \quad (11)$$
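The bound (10) is easy to verify numerically. The snippet below is our own sanity check (not from the paper); it evaluates the smoothed norm in the same stable form as equation (2) and compares it against the exact 1-norm on a random vector:

```python
import numpy as np

def norm1_smooth(x, alpha):
    """Smoothed 1-norm ||x||_{(1,alpha)} = sum_i |x_i|_alpha (stable form)."""
    a = np.abs(np.asarray(x, dtype=float))
    return float(np.sum(a + (2.0 / alpha) * np.log1p(np.exp(-alpha * a))))

rng = np.random.default_rng(0)
n, alpha = 50, 1e4
x = rng.normal(size=n)
gap = abs(norm1_smooth(x, alpha) - np.sum(np.abs(x)))
bound = 2.0 * n * np.log(2.0) / alpha  # right-hand side of (10)
assert gap <= bound
```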
Let $L : \mathbb{R}^n \to \mathbb{R}$ be any continuous loss function, and define $f(x) = L(x) + \|x\|_1$ and $f_\alpha(x) = L(x) + \|x\|_{(1,\alpha)}$. Let us also define $x^* = \arg\min_x f(x)$ and $x^*_\alpha = \arg\min_x f_\alpha(x)$. By the definitions of $f$ and $f_\alpha$ and by equation (11), it follows that:

$$\lim_{\alpha \to \infty} f_\alpha(x) = f(x) \quad \forall x \in \mathbb{R}^n \quad (12)$$

In addition, we know that $f(x^*) \le f(x)\ \forall x$. In particular, $f(x^*) \le f(x^*_\alpha)$; then:

$$f(x^*) \le f(x^*_\alpha) = L(x^*_\alpha) + \|x^*_\alpha\|_1 = L(x^*_\alpha) + \|x^*_\alpha\|_{(1,\alpha)} + \|x^*_\alpha\|_1 - \|x^*_\alpha\|_{(1,\alpha)} = f_\alpha(x^*_\alpha) + \|x^*_\alpha\|_1 - \|x^*_\alpha\|_{(1,\alpha)} \quad (13)$$
This implies $f(x^*) - f_\alpha(x^*_\alpha) \le \frac{2n}{\alpha}\log 2$ (using equation (10)). Similarly, we can prove that $f(x^*) - f_\alpha(x^*_\alpha) \ge -\frac{2n}{\alpha}\log 2$; hence $\lim_{\alpha \to \infty} f_\alpha(x^*_\alpha) = f(x^*)$. Furthermore:

$$|f(x^*) - f(x^*_\alpha)| = |f(x^*) - f_\alpha(x^*_\alpha) + f_\alpha(x^*_\alpha) - f(x^*_\alpha)| \le |f(x^*) - f_\alpha(x^*_\alpha)| + |f_\alpha(x^*_\alpha) - f(x^*_\alpha)| \quad (14)$$

This implies that $\lim_{\alpha \to \infty} f(x^*_\alpha) = f(x^*)$. Moreover, if $L$ is strictly convex, it is easy to prove that $\lim_{\alpha \to \infty} x^*_\alpha = x^*$.
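As a concrete illustration of this convergence (our own hypothetical example, not from the paper), take the strictly convex loss $L(x) = (x-1)^2$ in one dimension; the exact L1-regularized minimizer of $f(x) = (x-1)^2 + |x|$ is $x^* = 1/2$ (setting $2(x-1) + 1 = 0$). Minimizing the smoothed objective $f_\alpha$ by plain gradient descent, using the gradient from equation (3), shows $x^*_\alpha$ approaching $x^*$ as $\alpha$ grows:

```python
import numpy as np

def minimize_f_alpha(alpha, steps=20000, lr=1e-3):
    """Minimize f_alpha(x) = (x-1)^2 + |x|_alpha by gradient descent.

    The gradient is 2(x-1) + tanh(alpha*x/2), using the stable form of eq. (3).
    """
    x = 0.0
    for _ in range(steps):
        x -= lr * (2.0 * (x - 1.0) + np.tanh(alpha * x / 2.0))
    return x

# exact minimizer of the unsmoothed objective (x-1)^2 + |x| is x* = 0.5
x_star = 0.5
errors = {a: abs(minimize_f_alpha(a) - x_star) for a in (10.0, 100.0, 1000.0)}
```

The dictionary of errors shrinks monotonically with $\alpha$, matching the limit $\lim_{\alpha\to\infty} x^*_\alpha = x^*$ for strictly convex $L$.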
References
1. Chunhui Chen and O. L. Mangasarian. A class of smoothing functions for nonlinear and mixed complementarity problems. Computational Optimization and Applications, 5(2):97-138, 1996.
2. Y.-J. Lee and O. L. Mangasarian. SSVM: A smooth support vector machine. Computational Optimization and Applications, 20:5-22, 2001.