Proceedings of the IASTED International Conference SIGNAL PROCESSING AND COMMUNICATIONS, September 19-22, 2000, Marbella, Spain

Discussion of Simple Algorithms and Methods to Separate Nonstationary Signals.

Ali MANSOUR and Noboru OHNISHI
Bio-Mimetic Control Research Center (RIKEN), 2271-130, Anagahora, Shimoshidami, Moriyama-ku, Nagoya 463 (JAPAN)

\begin{abstract} In the last decade, many researchers have investigated the blind separation of sources, and many algorithms have been proposed to solve this problem for the case of an instantaneous (memoryless) mixture \cite{mansour-ieice-2000}. In general, high-order statistics (typically fourth-order) are used. However, it has been shown that algorithms and criteria can be simplified by adding special assumptions \cite{jutten-intsymp-95}. In this paper, we outline the investigation of the separation of nonstationary signals using only second-order statistics. For the case of independent nonstationary (at least at second order) sources, such as speech signals, where the power of the signals is considered time variant, we prove, using geometrical information, that the decorrelation of the output signals at any time leads to the separation of the independent sources. In other words, for these kinds of sources, any algorithm can separate the sources if, at the convergence of this algorithm, the covariance matrix of the output signals becomes a diagonal matrix at any time. Finally, some algorithms are proposed and the experimental results are shown and discussed.

keywords: Decorrelation, Second-order Statistics, Whiteness, Blind separation of sources, Natural gradient, Kullback divergence, Hadamard inequality, Jacobi Diagonalization, Cyclic Jacobi Diagonalization, Joint Diagonalization. \end{abstract} \pagebreak
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\section{Introduction} The blind separation of sources is a recent and important problem in the signal processing field. It consists of retrieving unknown sources from unknown mixtures observed by multiple sensors. Two fundamental assumptions are generally made \cite{jutten-intsymp-95}. \begin{itemize} \item {\bf H1}: The sources are unknown and statistically independent of each other. \item {\bf H2}: The channel model is known: instantaneous (or memoryless) \cite{cardoso-sp-90,mansour-ieee-95,puntonet-gretsi-95,puntonet-sp-95}, convolutive \cite{nguyen-sp-95}, or non-linear \cite{krob-gretsi-93,taleb-ica-99}. \end{itemize} \noindent For the instantaneous mixture, one must assume that the mixture matrix ${\bf M}$ is a full-rank non-singular matrix \cite{comon-spie-89,amari-neural-95}. Similar assumptions are made for the other kinds of mixtures. For the instantaneous mixture, many algorithms have been proposed by different researchers \cite{delfosse-sp-95,macchi-digital-93,mansour-ieee-99,mansour-ssap-2000}. All of these algorithms are based on high-order statistics, and in most cases fourth-order cumulants or moments are used.\\ \noindent Under further assumptions \cite{belouchrani-eusipco-94,gamboa-ieee-97}, researchers have proposed algorithms and criteria based solely on second-order statistics, exploiting, for example, the subspace properties of the channel \cite{gorokhov-ieee-97,mansour-ieee-2000}, the correlation properties of the sources (i.e., the samples of each source are correlated) \cite{fety-phd-88,amari-ica-99}, or the nonstationarity of the sources \cite{matsuoka-nn-95,kawamoto-ieice-97}.
\\ \pagebreak In this paper, we make the following assumption {\bf H3}: the sources are independent and nonstationary, at least at second order, such as speech signals, whose power can be considered time variant. Our first goal is to prove, using geometrical information, that for such signals, the decorrelation of the output signals at any time implies the separation of the sources. Therefore, the separation of nonstationary signals is possible using only second-order statistics. Finally, simple algorithms suited to speech or music signals are proposed and their performances are discussed.
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\section{Channel Model} Let $X(n)$ be a $p \times 1$ zero-mean random vector denoting the source vector at time $n$. Let $Y(n)$ denote the observed (or mixture) signals (see Fig.~\ref{mod}) at time $n$. According to the instantaneous model, \begin{equation} Y(n) = {\bf M} \ \ X(n), \label{model} \end{equation} \noindent where ${\bf M}=(m_{ij})$ is a $p \times p$ full-rank (non-singular) matrix which represents the unknown mixture.\\ \begin{figure} \centerline{\psfig{figure=model.eps,width=8cm,height=2cm}} \caption{Mixture Model.} \label{mod} \end{figure} \noindent Let ${\bf W} = (w_{ij})$ denote the $p \times p$ weight matrix. The estimated sources are given by \begin{equation} S(n) = {\bf W} \ \ Y(n) = {\bf W M} \ \ X(n) = {\bf G} \ \ X(n), \label{est} \end{equation} where ${\bf G = W M }$ is the global matrix. It is obvious that by using only the source independence assumption and model (\ref{model}), we cannot exactly retrieve the sources ($S(n) \neq X(n)$). Generally, the sources can be separated only up to a permutation and a scaling \cite{comon-sp-94}. The separation is considered to be achieved when the global matrix becomes \begin{equation} {\bf G} = {\bf W M} = {\bf P \Delta}, \label{gpd} \end{equation} where ${\bf P}$ is any $p \times p$ permutation matrix and $\Delta$ is any $p \times p$ diagonal full-rank matrix.
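\noindent As an illustration, the following minimal Mathematica sketch (written in the style of the code given in the appendix) simulates model (\ref{model}) for $p = 2$; the amplitude-modulated noise sources and all numerical values are hypothetical choices used only for illustration, not our speech database:
\begin{verbatim}
(* two nonstationary sources: white noise whose
   power is slowly modulated in time (H3); the
   different modulation periods keep the power
   ratio time variant as well *)
Nbmax = 10000;
x1 = Table[Sin[2. Pi n/3000] RandomReal[{-1, 1}],
     {n, Nbmax}];
x2 = Table[Cos[2. Pi n/4300] RandomReal[{-1, 1}],
     {n, Nbmax}];
X = {x1, x2};          (* p x Nbmax source matrix *)
M = {{4, -1}, {2, 1}}; (* full-rank mixing matrix *)
Y = M . X;             (* observed signals *)
\end{verbatim}
\noindent The simulated observations Y can be fed directly to the algorithm given in the appendix.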
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\section{Decorrelation and Separation} In this section, it is proved that one can separate nonstationary signals using only second-order statistics of the estimated signals (i.e., by diagonalizing the covariance matrix of the output signals). To simplify this idea and to explain the geometrical solutions of this problem, let us first consider the case of two sensors and two sources.
%%%%%%%%%%%%%%%%%%%%%%
\subsection{First Case: Two Sources} In this subsection, we consider that there are two sensors and two sources (i.e., $p = 2$). In the previous section, it was mentioned that the separation is achieved when the global matrix becomes the product of any permutation matrix and any non-singular diagonal matrix, as in (\ref{gpd}); thus one can set $w_{ii} = 1$ without any loss of generality. Using (\ref{est}), the global matrix can be rewritten as \begin{equation} {\bf G} = \left( \begin{array}{cc} m_{11} + m_{21} w_{12} & m_{12} + m_{22} w_{12} \\ m_{21} + m_{11} w_{21} & m_{12} w_{21} + m_{22} \end{array} \right ). \label{glob} \end{equation} Supposing that one can achieve decorrelation of the output signals $S(n)$ and using assumption {\bf H1}, it is possible to prove that the coefficients of the weight matrix satisfy the following condition: \begin{eqnarray} \lefteqn{\mbox{E}\{ s_1(n) \ \ s_2(n) \} = 0 \Longrightarrow} \nonumber \\ && (m_{11} + m_{21} w_{12}) ( m_{21} + m_{11} w_{21} ) P_1 + \nonumber \\ && \ \ \ (m_{12} + m_{22} w_{12})(m_{12} w_{21} + m_{22}) P_2 = 0, \label{deco2} \end{eqnarray} where E\{$x(n)$\} is the expectation of $x(n)$ and $P_i= \mbox{E}\{x_i^2(n)\}$ is the power of the i-th source $x_i(n)$. When the sources are stationary, the powers $P_i$ stay constant. In this case, condition (\ref{deco2}) is the equation of a hyperbola. At the convergence, the point ($w_{12}, w_{21}$) can be any point on the hyperbola. Therefore, separation cannot be achieved by using only second-order~statistics.\\ In the general case, using assumptions {\bf H1} and {\bf H2}, one can also make hereafter the following assumption {\bf H4}: the ratio of the two signal powers $P_i$ is also time variant (the two powers $P_i$ cannot be proportional to each other). Since condition (\ref{deco2}) must be satisfied for any value of $P_i > 0$, the weight matrix coefficients must satisfy the following conditions: \begin{eqnarray} (m_{11} + m_{21} w_{12}) ( m_{21} + m_{11} w_{21} ) & = & 0, \label{cond1} \\ \nonumber \\ (m_{12} + m_{22} w_{12})(m_{12} w_{21} + m_{22}) & = & 0. \label{cond2} \end{eqnarray} The solutions of equations (\ref{cond1}) and (\ref{cond2}) must be considered for the following three cases: \begin{itemize} \item The coefficients of the mixture matrix are nonzero ($m_{ij} \neq 0$). Using equations (\ref{cond1}) and (\ref{cond2}), the coefficients $w_{ij}$ can be evaluated as \begin{eqnarray} w_{12} = - \frac{m_{11}}{m_{21}} & \mbox{ and } & w_{21} = - \frac{m_{22}}{m_{12}}, \label{sol1} \\ \nonumber \\ & \mbox{ or } & \nonumber \\ \nonumber \\ w_{12} = - \frac{m_{12}}{m_{22}} & \mbox{ and } & w_{21} = - \frac{m_{21}}{m_{11}}. \label{sol2} \end{eqnarray} In both (\ref{sol1}) and (\ref{sol2}), the separation of the sources is achieved (i.e., the global matrix ${\bf G}$ satisfies equation (\ref{gpd})). \item One coefficient of the mixture matrix is equal to zero (for example $m_{11} = 0$). Using (\ref{cond1}) and (\ref{cond2}), we can write \begin{equation} w_{12} = 0 \mbox{ and } w_{21} = - \frac{m_{22}}{m_{12}}. \label{sol3} \end{equation} In this case separation is also achieved. \item If more than one coefficient of the mixture matrix is equal to zero, then, since ${\bf M}$ is assumed to be a full-rank non-singular matrix, ${\bf M}$ is the product of a permutation matrix and a diagonal matrix. In this case, there is no mixture problem. \end{itemize} \begin{figure} \centerline{\begin{picture}(200,200) \put(0,0){\hbox{\psfig{figure=/home/mansour/Decor/sol.eps,width=7cm,height=7cm}}} \put(180,110){$w_{12}$} \put(90,180){$w_{21}$} \end{picture}} \caption{A set of hyperbolas, with the same mixing matrix and different stationary sources.} \label{sol} \end{figure} Figure~\ref{sol} shows hyperbolas corresponding to the solutions of equation (\ref{deco2}) for the mixing matrix \mbox{${\bf M}=\left ( \begin{array}{cc} 4 & -1 \\ 2 & 1 \end{array} \right )$} and different stationary sources. All of the hyperbolas pass through the two intersection points corresponding to (\ref{sol1}) and (\ref{sol2}).
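\noindent The two intersection points can be checked symbolically. The following minimal Mathematica sketch (the variable names are ours) writes conditions (\ref{cond1}) and (\ref{cond2}) directly on ${\bf G = W M}$ for the mixing matrix of Fig.~\ref{sol}:
\begin{verbatim}
m = {{4, -1}, {2, 1}};
w = {{1, w12}, {w21, 1}};  (* with w_ii = 1 *)
g = w . m;                 (* global matrix G *)
Solve[{g[[1,1]] g[[2,1]] == 0,
       g[[1,2]] g[[2,2]] == 0}, {w12, w21}]
(* -> {w12 -> -2, w21 -> 1} and
      {w12 -> 1, w21 -> -1/2} (up to ordering),
   i.e., exactly the two solutions given above *)
\end{verbatim}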
%%%%%%%%%%%%%%%%%%%%%%%%
\subsection{General Case} Let ${\bf \Lambda}$ denote the covariance matrix of the sources. Using assumption {\bf H1}, we can deduce that ${\bf \Lambda}$ is a diagonal matrix, ${\bf \Lambda} = \mbox{diag}(P_1, \ldots,P_p)$. After the decorrelation of the output signals $S(n)$, their covariance matrix becomes a diagonal one: \begin{equation} \mbox{E}\{ S(n) \ \ S(n)^T\} = {\bf G \Lambda G}^T = {\bf D}, \label{decoG} \end{equation} where ${\bf D}=(d_{ij})$ is any diagonal matrix. From equation (\ref{decoG}), we can deduce that the rows of ${\bf G}$ are orthogonal to each other with respect to the inner product weighted by ${\bf \Lambda}$; indeed, \begin{eqnarray} \sum_l g_{il}^2 P_l & = & d_{ii}, \\ \sum_l g_{il} g_{jl} P_l & = & 0 \; \; \; \forall i \neq j. \label{import} \end{eqnarray} Generally, this orthogonality of ${\bf G}$ is not sufficient to separate the sources. In the case of nonstationary signals, the covariance matrix ${\bf \Lambda}$ changes with time. This means that equation (\ref{import}) must hold for any value of the $P_l$ (here the $P_l$ are assumed to change independently with time). Thus we can deduce that \begin{equation} g_{il} g_{jl} = 0 \; \; \; \; \forall l, \mbox{ and } i \neq j. \label{fond} \end{equation} Equation (\ref{fond}) implies the following: \begin{itemize} \item {\bf P1: Each column of {\bf G} has at most one nonzero coefficient}. \item {\bf P2: Each row of ${\bf G }$ has at least one nonzero coefficient.} In fact, let $G_i$ (respectively $W_i$) denote the i-th row of {\bf G} (respectively of {\bf W}) and let us set $w_{ii} = 1$, as in the previous subsection. Using equation (\ref{est}), one can write \begin{equation} G_i = W_i \, {\bf M }. \label{fin} \end{equation} Using equation (\ref{fin}) and the conditions that \mbox{$ w_{ii} = 1 $} (i.e., $ W_i \neq 0$) and that ${\bf M}$ is a full-rank matrix, we can deduce that $G_i$ cannot be a zero vector, and proposition {\bf P2} is valid. \item Propositions {\bf P1} and {\bf P2} imply the following: \\ {\bf P3: Each row and each column of $G$ has exactly one nonzero coefficient, i.e., $G$ satisfies condition (\ref{gpd})}. Indeed, by {\bf P1} the $p \times p$ matrix ${\bf G}$ has at most $p$ nonzero coefficients, and by {\bf P2} it has at least $p$; hence it has exactly one per row and one per column.\\ {\bf P3} simply means that separation can be achieved using second-order statistics. \end{itemize}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\section{Algorithms \& Experimental Results} In this section, we discuss three possible approaches to the blind separation of nonstationary sources using only second-order statistics. \subsection{Jacobi Diagonalization} The first approach is based on the Jacobi diagonalization \cite{golub-84} and the joint diagonalization \cite{cardoso-siam-96}. Let ${\bf R} = (r_{ij})$ denote a $p \times p$ full-rank matrix and let ${\bf J}(m,n,\theta)$ be a Givens\footnote{The Givens rotation ${\bf J}(m,n,\theta) =(J_{ij})$ is identical to the identity matrix except for the four elements \mbox{$J_{mm} = J_{nn} = \cos{\theta}$} and \mbox{$J_{mn} = - J_{nm} = \sin{\theta}$}. Givens rotations are also called Jacobi rotations.} rotation matrix.\\ By definition, the Off function of a matrix ${\bf R}$ is \begin{equation} \mbox{Off}({\bf R}) = \sqrt{\sum_{i=1}^{p} \sum_{j=1, j\neq i}^p r_{ij}^2}. \label{off} \end{equation} It is clear that $\mbox{Off}({\bf R})$ is equal to zero if and only if ${\bf R}$ is a diagonal matrix. The Jacobi method seeks a sequence of Givens rotation matrices ${\bf J}(m,n,\theta)$ that minimize the Off function of ${\bf J}^T(m,n,\theta) {\bf R} {\bf J}(m,n,\theta)$.
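\noindent As a minimal Mathematica sketch (the function names are ours), the Off function (\ref{off}) and a Givens rotation can be written as:
\begin{verbatim}
(* square root of the sum of the squared
   off-diagonal entries of r *)
off[r_] := Sqrt[Total[Flatten[r^2]]
              - Total[Diagonal[r]^2]];

(* Givens (Jacobi) rotation J(m,n,theta),
   as defined in the footnote *)
givens[p_, m_, n_, th_] :=
  Module[{j = IdentityMatrix[p]},
    j[[m, m]] = j[[n, n]] = Cos[th];
    j[[m, n]] = Sin[th];
    j[[n, m]] = -Sin[th];
    j];
\end{verbatim}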
Using the same idea, the Cyclic Jacobi method \cite{golub-84} applied to a symmetric matrix ${\bf R}$ gives an orthogonal matrix ${\bf V}$ such that $\mbox{Off}({\bf V}^T {\bf R V}) \leq tol \, \|{\bf R}\|_F$, where $tol > 0$ is the tolerance and $\|{\bf R}\|_F$ is the Frobenius norm\footnote{The Frobenius norm of a $p\times p$ matrix ${\bf R} = (r_{ij})$ is \mbox{$\|{\bf R}\|_F = \sqrt{\sum_{i=1}^p \sum_{j=1}^p r_{ij}^2}$}.}. \\ According to the previous section, one can separate nonstationary sources (speech or music) from an instantaneous mixture by looking for a weight matrix ${\bf W}$ that diagonalizes the covariance matrix of the output signals. Unfortunately, the Cyclic Jacobi method cannot be used directly to achieve our goal: the sources are assumed to be second-order nonstationary signals, so their covariance matrix is time variant.\\ \noindent On the other hand, using the joint diagonalization algorithm proposed by Cardoso and Souloumiac \cite{cardoso-siam-96}, one can jointly diagonalize a set of $q$ covariance matrices ${\bf R}_i = \mbox{E}\{S(n) \ S(n)^T\}$, $1 \leq i \leq q$, each estimated over a different time window. The joint diagonalization algorithm is a modified version of the Cyclic Jacobi method that minimizes the following function with respect to a matrix ${\bf V}$: \begin{equation} \mbox{JOff}({\bf R}_1, \cdots, {\bf R}_q) = \sum_{i=1}^q \mbox{Off}({\bf V}^T {\bf R}_i {\bf V}). \end{equation} It is obvious that $\mbox{JOff}({\bf R}_1, \cdots, {\bf R}_q) = 0$ when ${\bf V}^T {\bf R}_i {\bf V}$ is a diagonal matrix for every $i$. Because of the estimation errors and the noise, one cannot in practice minimize $\mbox{JOff}({\bf R}_1, \cdots, {\bf R}_q)$ down to the lower bound (i.e., $0$).\\ \noindent In our experimental study, the number $q$ of covariance matrices ${\bf R}_i$ was chosen between 10 and 25. The covariance matrices ${\bf R}_i$ were estimated according to the adaptive estimator of \cite{mansour-iconip-98} over sliding windows of 500 to 800 samples, shifted by 100 to 200 samples from one ${\bf R}_i$ to the next. All of these values were determined by an experimental study using our database of signals.\\
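\noindent As an illustration, a minimal Mathematica sketch of this estimation stage is given below; the window parameters are example values taken inside the ranges above, and a plain empirical average over each window stands in for the adaptive estimator of \cite{mansour-iconip-98}:
\begin{verbatim}
(* q covariance matrices of the output signals
   S = W.Y (a p x Nbmax matrix) over sliding
   windows; assumes Nbmax >= (q-1) shift + win *)
win = 600; shift = 150; q = 15;
Rlist = Table[
  Module[{seg = S[[All, 1 + (i - 1) shift ;;
                        (i - 1) shift + win]]},
    seg . Transpose[seg]/win],
  {i, q}];

(* joint diagonalization criterion JOff of a
   candidate V, using off[] defined above *)
joff[v_, rs_] :=
  Total[off[Transpose[v] . # . v] & /@ rs];
\end{verbatim}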
In addition, we should mention that we used a threshold to reduce the effect of silence periods: whenever the observation signals at time $n_0$ are smaller than a predefined threshold $\epsilon$, they are not considered as input signals. Indeed, observation signals below $\epsilon$ at time $n_0$ mean one of two things: \begin{enumerate} \item The sources are in a common silence period, i.e., we are receiving only noise. \item The samples of the sources at time $n_0$ nearly cancel each other: for example, in the case of two sources, the first observation signal satisfies \mbox{$y_1(n_0) = m_{11} x_1(n_0) + m_{12} x_2(n_0) < \epsilon$}. Using the independence assumption {\bf H1}, one can consider, without loss of generality, that the probability of such an instant $n_0$ is so small that discarding it has no effective influence on the signal statistics or on the behavior of the algorithm. \end{enumerate} \begin{figure} \centerline{\psfig{figure=/home/mansour/Decor/JointDiaco/Perfor/EXP/cost.eps,width=8cm}} \caption{Evolution of the cost function with respect to the iteration number.} \label{cout} \end{figure} We conducted many experiments and found that the crosstalk was between -17 dB and -25 dB. Fig.~\ref{cout} shows the evolution of the cost function with respect to the iteration number. The experimental study shows that this algorithm converges in a few iterations. Fig.~\ref{experi} shows the experimental results of the separation of two speech sources.\\ \begin{figure*}[t] \begin{tabular}{cc} \psfig{figure=/home/mansour/Decor/JointDiaco/Perfor/EXP/sor1.eps,width=8cm} & \psfig{figure=/home/mansour/Decor/JointDiaco/Perfor/EXP/sor2.eps,width=8cm} \\ \psfig{figure=/home/mansour/Decor/JointDiaco/Perfor/EXP/mix1.eps,width=8cm} & \psfig{figure=/home/mansour/Decor/JointDiaco/Perfor/EXP/mix2.eps,width=8cm} \\ \psfig{figure=/home/mansour/Decor/JointDiaco/Perfor/EXP/est1.eps,width=8cm} & \psfig{figure=/home/mansour/Decor/JointDiaco/Perfor/EXP/est2.eps,width=8cm} \end{tabular} \caption{The first column contains the signals of the first channel (i.e., the first source, the first mixed signal, and the first estimated source); the second column contains the signals of the second channel.} \label{experi} \end{figure*} Finally, we should mention that the first to suggest separation by multiple diagonalization of the covariance matrix was Fety \cite{fety-phd-88}. The approach of Fety has been the subject of research and discussion by many other researchers: it has been discussed and improved by Comon {\it et al.} \cite{comon-houche-93,comon-gretsi-93,lacoume-97}. Recently, Belouchrani {\it et al.} presented an algorithm based on the approach of Fety and the joint diagonalization \cite{belouchrani-digital-93,belouchrani-gretsi-93,belouchrani-ieee-97} to separate stationary source signals, correlated in time and independent in space, from an instantaneous mixture. In \cite{belouchrani-ieee-97}, Belouchrani {\it et al.} discuss the performance of their algorithm and prove the convergence of this approach. \subsection{Kullback divergence} The second approach is based on the Kullback divergence. The Kullback distance (or divergence) between two probability density functions (pdf) $f_x$ and $f_y$ is given by \cite{basseville-sp-89} \begin{equation} \delta (f_x,f_y) = \int f_x(u) \log \left ( \frac{f_x(u)}{f_y(u)} \right ) \mbox{d}u. \label{kullback1} \end{equation} It is known \cite{laheld-phd-94} that the Kullback divergence between two zero-mean Gaussian random vectors, with covariance matrices ${\bf R}$ and ${\bf I}$ respectively, is given by \begin{equation} \delta ({\bf R},\ { \bf I}) = \frac{1}{2} \left( \mbox{Trace} \{ {\bf R}\} - \log \det({\bf R}) - p \right) \geq 0, \label{kullback} \end{equation} where ${\bf I}$ is the $ p \times p$ identity matrix and \mbox{${\bf R} = \mbox{E}\{ S(n) \ S(n)^T \}$} is the $ p \times p$ covariance matrix of the estimated sources $S(n)$. One of the properties of the Kullback divergence is that \begin{equation} \delta ({\bf R},\ { \bf I}) = 0 \ \ \ \ \mbox{ iff } \ \ \ \ {\bf R = I}. \label{prop} \end{equation} Thus the minimization of divergence (\ref{kullback}) makes the matrix ${\bf R}$ close to the identity matrix (i.e., a diagonal matrix) and induces the separation of the sources, as we explained in the previous section.\\ \noindent The minimization of divergence (\ref{kullback}) is achieved using the natural gradient \cite{amari-nc-98,cardoso-ieee-96}. In this case, the weight matrix ${\bf W}$ is updated at iteration $(k+1)$ by \begin{equation} {\bf W}_{k+1} = {\bf W}_k - \lambda \{ {\bf R} - {\bf I} \} {\bf W}_k, \label{update} \end{equation} where $ 0 < \lambda < 1$ is a step-size parameter and ${\bf R}$ is estimated over sliding windows of a small number of samples, according to the method described in \cite{mansour-iconip-98}.\\ \noindent The advantage of this approach is that the algorithm and the updating rules are simple. However, the convergence point of criterion (\ref{kullback}) is a ${\bf W}^*$ that makes the matrix ${\bf R}$ close to the identity matrix (i.e., a special diagonal matrix). Obviously, this condition is more restrictive than the condition described in the previous section, where ${\bf R}$ must simply be a diagonal matrix.
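\noindent A minimal Mathematica sketch of the divergence (\ref{kullback}) and of one update step (\ref{update}) is given below (the function names are ours; see the appendix for the complete algorithm):
\begin{verbatim}
(* Kullback divergence between N(0,r) and N(0,I) *)
kull[r_] := 0.5 (Tr[r] - Log[Det[r]] - Length[r]);

(* one natural-gradient step:
   W <- W - lam (R - I) W *)
kullStep[w_, r_, lam_] :=
  w - lam (r - IdentityMatrix[Length[w]]) . w;
\end{verbatim}
\noindent In practice, as in the appendix code, ${\bf R}$ is replaced by a running estimate computed from the output samples.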
\begin{figure} \centerline{\psfig{figure=/home/mansour/Decor/Performances/Proj/Try4/cost.eps,width=8cm}} \caption{Evolution of the cost function with respect to the iteration number.} \label{cost} \end{figure} We conducted many experiments and found that the crosstalk was between -15 dB and -23 dB. The evolution of the cost function with respect to the iteration number is shown in Fig.~\ref{cost}. The mixing matrix used was ${\bf M}=\left ( \begin{array}{cc} 1 & -0.6 \\ 0.4 & 1 \end{array} \right )$. Fig.~\ref{experires} shows the experimental results of the separation of two speech sources. \begin{figure*}[t] \begin{tabular}{cc} \psfig{figure=/home/mansour/Decor/Performances/Proj/Try4/sour1.eps,width=8cm} & \psfig{figure=/home/mansour/Decor/Performances/Proj/Try4/sour2.eps,width=8cm} \\ \psfig{figure=/home/mansour/Decor/Performances/Proj/Try4/mix1.eps,width=8cm} & \psfig{figure=/home/mansour/Decor/Performances/Proj/Try4/mix2.eps,width=8cm} \\ \psfig{figure=/home/mansour/Decor/Performances/Proj/Try4/estsou1.eps,width=8cm} & \psfig{figure=/home/mansour/Decor/Performances/Proj/Try4/estsou2.eps,width=8cm} \end{tabular} \caption{The first column contains the signals of the first channel (i.e., the first source, the first mixed signal, and the first estimated source); the second column contains the signals of the second channel.} \label{experires} \end{figure*} \subsection{Hadamard's inequality} The last approach is based on Hadamard's inequality. For an arbitrary positive semidefinite matrix ${\bf R}=(r_{ij})$, Hadamard's inequality \cite{noble-88} states that \begin{equation} \prod_{i=1}^{p} r_{ii} \geq \hbox{det}\{ {\bf R} \}, \label{hadamard} \end{equation} where the equality holds if and only if the matrix ${\bf R}$ is a diagonal matrix. Using equation (\ref{hadamard}), it can be proven that \begin{equation} \sum_{i=1}^p \log r_{ii} - \log \hbox{det}\{ {\bf R} \} \geq 0. \label{hadamard2} \end{equation} Using this property, some authors \cite{matsuoka-nn-95,kawamoto-ieice-97,wu-ica-99} suggested separating nonstationary signals by minimizing the left-hand side of (\ref{hadamard2}), applied to the covariance matrix ${\bf R} = \mbox{E}\{ S(n) \ S(n)^T \}$ of the estimated sources, with respect to the weight matrix ${\bf W}$: \begin{equation} \hspace{-8mm} \min_{\bf W} \sum_{i=1}^p \log \hbox{E}\{s^2_i(n)\} - \log \hbox{det}\{ \hbox{E}\{S(n) \ S^T(n)\} \}. \label{alg} \end{equation} The experimental results of this kind of algorithm are discussed in \cite{matsuoka-nn-95,kawamoto-ica-99}.
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
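\noindent A minimal Mathematica sketch of the cost in (\ref{alg}), for a covariance estimate rs of the output signals (the function name is ours):
\begin{verbatim}
(* Hadamard cost: nonnegative by Hadamard's
   inequality, zero iff rs is diagonal *)
hadamardCost[rs_] :=
  Total[Log[Diagonal[rs]]] - Log[Det[rs]];
\end{verbatim}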
\section{Conclusion} In this paper, we proved that second-order statistics are sufficient to separate an instantaneous mixture of independent nonstationary signals, and that decorrelation is equivalent to separation when the sources satisfy assumptions {\bf H1} to {\bf H4}. The study was divided into two parts: \begin{itemize} \item In the case of two sources, using the geometrical information of the mixed signals, we proved that, with only second-order statistics, one can merely decorrelate stationary signals, whereas one can actually separate nonstationary signals. \item In the general case, we proved that the diagonalization of the covariance matrix of the output signals at any time separates nonstationary signals. \end{itemize} Finally, the application of these theoretical results to real-world situations was discussed by examining three possible approaches. In addition, we should mention that the first algorithm converges in a few iterations but needs more computation per iteration than the second one. On the other hand, the experimental study shows that the second algorithm needs many more iterations to converge than the first one. The comparison of these three algorithms and their performances will be the subject of a submitted paper \cite{mansour-tencon-2000}.
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\appendix
\section{The first algorithm in Mathematica code.}
\begin{verbatim}
lam = 0.001;  (* adaptation step *)
lim = 500;    (* first iteration at which the
                 cost is evaluated *)
alfa = 0.001; (* estimation coefficient
                 (not used below) *)

X = Input[" The source vector X = ?"];
(* X is a (p x Nbmax) matrix *)
Nbmax = Dimensions[X][[2]];
p = Dimensions[X][[1]];
(* Nbmax is the number of samples *)

M = Input["Please type the mixing matrix M = ?"];
Y = M . X;
Yt = Transpose[Y];

W = IdentityMatrix[p];

(* Rs is the running estimate of the
   covariance matrix of S *)
Rs = Table[0, {p}, {p}];

dmin = 1000000;  (* large initial cost *)

For[i = 1, i < Nbmax - 1, i++,
  S = W . Yt[[i]];
  sst = Table[S[[m]] S[[j]], {m, p}, {j, p}];
  (* sst = S . S^T *)

  Rs = (1 - 1/i) Rs + sst/i;
  If[i > lim,
    (* Kullback cost; 0.001 is added to avoid
       numerical errors *)
    d = 0.5 Abs[Sum[Rs[[k, k]], {k, p}]
          - Log[Det[Rs] + 0.001] - p];
    If[d < dmin, dmin = d; Wmin = W];
  ];

  sst -= IdentityMatrix[p];
  (* sst = S . S^T - I *)
  W -= lam sst . W;
  W[[1, 1]] = W[[2, 2]] = 1;
  (* keep w_ii = 1 (case p = 2) *)
];

S = Wmin . Y;
\end{verbatim}
\bibliographystyle{IEEEbib} {\small \bibliography{/home/mansour/SEPSOURCE/sepsources}}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\end{document}