Improved clustering and robust moment estimation via sum-of-squares

with , . STOC 2018.
(to appear).

abstract

\[ \newcommand{\undefined}{} \newcommand{\hfill}{} \newcommand{\qedhere}{\square} \newcommand{\bbA}{\mathbb A} \newcommand{\bbB}{\mathbb B} \newcommand{\bbC}{\mathbb C} \newcommand{\bbD}{\mathbb D} \newcommand{\bbE}{\mathbb E} \newcommand{\bbF}{\mathbb F} \newcommand{\bbG}{\mathbb G} \newcommand{\bbH}{\mathbb H} \newcommand{\bbI}{\mathbb I} \newcommand{\bbJ}{\mathbb J} \newcommand{\bbK}{\mathbb K} \newcommand{\bbL}{\mathbb L} \newcommand{\bbM}{\mathbb M} \newcommand{\bbN}{\mathbb N} \newcommand{\bbO}{\mathbb O} \newcommand{\bbP}{\mathbb P} \newcommand{\bbQ}{\mathbb Q} \newcommand{\bbR}{\mathbb R} \newcommand{\bbS}{\mathbb S} \newcommand{\bbT}{\mathbb T} \newcommand{\bbU}{\mathbb U} \newcommand{\bbV}{\mathbb V} \newcommand{\bbW}{\mathbb W} \newcommand{\bbX}{\mathbb X} \newcommand{\bbY}{\mathbb Y} \newcommand{\bbZ}{\mathbb Z} \newcommand{\sA}{\mathscr A} \newcommand{\sB}{\mathscr B} \newcommand{\sC}{\mathscr C} \newcommand{\sD}{\mathscr D} \newcommand{\sE}{\mathscr E} \newcommand{\sF}{\mathscr F} \newcommand{\sG}{\mathscr G} \newcommand{\sH}{\mathscr H} \newcommand{\sI}{\mathscr I} \newcommand{\sJ}{\mathscr J} \newcommand{\sK}{\mathscr K} \newcommand{\sL}{\mathscr L} \newcommand{\sM}{\mathscr M} \newcommand{\sN}{\mathscr N} \newcommand{\sO}{\mathscr O} \newcommand{\sP}{\mathscr P} \newcommand{\sQ}{\mathscr Q} \newcommand{\sR}{\mathscr R} \newcommand{\sS}{\mathscr S} \newcommand{\sT}{\mathscr T} \newcommand{\sU}{\mathscr U} \newcommand{\sV}{\mathscr V} \newcommand{\sW}{\mathscr W} \newcommand{\sX}{\mathscr X} \newcommand{\sY}{\mathscr Y} \newcommand{\sZ}{\mathscr Z} \newcommand{\sfA}{\mathsf A} \newcommand{\sfB}{\mathsf B} \newcommand{\sfC}{\mathsf C} \newcommand{\sfD}{\mathsf D} \newcommand{\sfE}{\mathsf E} \newcommand{\sfF}{\mathsf F} \newcommand{\sfG}{\mathsf G} \newcommand{\sfH}{\mathsf H} \newcommand{\sfI}{\mathsf I} \newcommand{\sfJ}{\mathsf J} \newcommand{\sfK}{\mathsf K} \newcommand{\sfL}{\mathsf L} \newcommand{\sfM}{\mathsf M} \newcommand{\sfN}{\mathsf N} \newcommand{\sfO}{\mathsf O} \newcommand{\sfP}{\mathsf P} \newcommand{\sfQ}{\mathsf Q} \newcommand{\sfR}{\mathsf R} \newcommand{\sfS}{\mathsf S} \newcommand{\sfT}{\mathsf T} \newcommand{\sfU}{\mathsf U} \newcommand{\sfV}{\mathsf V} \newcommand{\sfW}{\mathsf W} \newcommand{\sfX}{\mathsf X} \newcommand{\sfY}{\mathsf Y} \newcommand{\sfZ}{\mathsf Z} \newcommand{\cA}{\mathcal A} \newcommand{\cB}{\mathcal B} \newcommand{\cC}{\mathcal C} \newcommand{\cD}{\mathcal D} \newcommand{\cE}{\mathcal E} \newcommand{\cF}{\mathcal F} \newcommand{\cG}{\mathcal G} \newcommand{\cH}{\mathcal H} \newcommand{\cI}{\mathcal I} \newcommand{\cJ}{\mathcal J} \newcommand{\cK}{\mathcal K} \newcommand{\cL}{\mathcal L} \newcommand{\cM}{\mathcal M} \newcommand{\cN}{\mathcal N} \newcommand{\cO}{\mathcal O} \newcommand{\cP}{\mathcal P} \newcommand{\cQ}{\mathcal Q} \newcommand{\cR}{\mathcal R} \newcommand{\cS}{\mathcal S} \newcommand{\cT}{\mathcal T} \newcommand{\cU}{\mathcal U} \newcommand{\cV}{\mathcal V} \newcommand{\cW}{\mathcal W} \newcommand{\cX}{\mathcal X} \newcommand{\cY}{\mathcal Y} \newcommand{\cZ}{\mathcal Z} \newcommand{\bfA}{\mathbf A} \newcommand{\bfB}{\mathbf B} \newcommand{\bfC}{\mathbf C} \newcommand{\bfD}{\mathbf D} \newcommand{\bfE}{\mathbf E} \newcommand{\bfF}{\mathbf F} \newcommand{\bfG}{\mathbf G} \newcommand{\bfH}{\mathbf H} \newcommand{\bfI}{\mathbf I} \newcommand{\bfJ}{\mathbf J} \newcommand{\bfK}{\mathbf K} \newcommand{\bfL}{\mathbf L} \newcommand{\bfM}{\mathbf M} \newcommand{\bfN}{\mathbf N} \newcommand{\bfO}{\mathbf O} \newcommand{\bfP}{\mathbf P} \newcommand{\bfQ}{\mathbf Q} \newcommand{\bfR}{\mathbf R} \newcommand{\bfS}{\mathbf S} \newcommand{\bfT}{\mathbf T} \newcommand{\bfU}{\mathbf U} \newcommand{\bfV}{\mathbf V} \newcommand{\bfW}{\mathbf W} \newcommand{\bfX}{\mathbf X} \newcommand{\bfY}{\mathbf Y} \newcommand{\bfZ}{\mathbf Z} \newcommand{\rmA}{\mathrm A} \newcommand{\rmB}{\mathrm B} \newcommand{\rmC}{\mathrm C} \newcommand{\rmD}{\mathrm D} \newcommand{\rmE}{\mathrm E} \newcommand{\rmF}{\mathrm F} \newcommand{\rmG}{\mathrm G} \newcommand{\rmH}{\mathrm H} \newcommand{\rmI}{\mathrm I} \newcommand{\rmJ}{\mathrm J} \newcommand{\rmK}{\mathrm K} \newcommand{\rmL}{\mathrm L} \newcommand{\rmM}{\mathrm M} \newcommand{\rmN}{\mathrm N} \newcommand{\rmO}{\mathrm O} \newcommand{\rmP}{\mathrm P} \newcommand{\rmQ}{\mathrm Q} \newcommand{\rmR}{\mathrm R} \newcommand{\rmS}{\mathrm S} \newcommand{\rmT}{\mathrm T} \newcommand{\rmU}{\mathrm U} \newcommand{\rmV}{\mathrm V} \newcommand{\rmW}{\mathrm W} \newcommand{\rmX}{\mathrm X} \newcommand{\rmY}{\mathrm Y} \newcommand{\rmZ}{\mathrm Z} \newcommand{\paren}[1]{( #1 )} \newcommand{\Paren}[1]{\left( #1 \right)} \newcommand{\bigparen}[1]{\bigl( #1 \bigr)} \newcommand{\Bigparen}[1]{\Bigl( #1 \Bigr)} \newcommand{\biggparen}[1]{\biggl( #1 \biggr)} \newcommand{\Biggparen}[1]{\Biggl( #1 \Biggr)} \newcommand{\abs}[1]{\lvert #1 \rvert} \newcommand{\Abs}[1]{\left\lvert #1 \right\rvert} \newcommand{\bigabs}[1]{\bigl\lvert #1 \bigr\rvert} \newcommand{\Bigabs}[1]{\Bigl\lvert #1 \Bigr\rvert} \newcommand{\biggabs}[1]{\biggl\lvert #1 \biggr\rvert} \newcommand{\Biggabs}[1]{\Biggl\lvert #1 \Biggr\rvert} \newcommand{\card}[1]{\lvert #1 \rvert} \newcommand{\Card}[1]{\left\lvert #1 \right\rvert} \newcommand{\bigcard}[1]{\bigl\lvert #1 \bigr\rvert} \newcommand{\Bigcard}[1]{\Bigl\lvert #1 \Bigr\rvert} \newcommand{\biggcard}[1]{\biggl\lvert #1 \biggr\rvert} \newcommand{\Biggcard}[1]{\Biggl\lvert #1 \Biggr\rvert} \newcommand{\norm}[1]{\lVert #1 \rVert} \newcommand{\Norm}[1]{\left\lVert #1 \right\rVert} \newcommand{\bignorm}[1]{\bigl\lVert #1 \bigr\rVert} \newcommand{\Bignorm}[1]{\Bigl\lVert #1 \Bigr\rVert} \newcommand{\biggnorm}[1]{\biggl\lVert #1 \biggr\rVert} \newcommand{\Biggnorm}[1]{\Biggl\lVert #1 \Biggr\rVert} \newcommand{\iprod}[1]{\langle #1 \rangle} \newcommand{\Iprod}[1]{\left\langle #1 \right\rangle} \newcommand{\bigiprod}[1]{\bigl\langle #1 \bigr\rangle} \newcommand{\Bigiprod}[1]{\Bigl\langle #1 \Bigr\rangle} \newcommand{\biggiprod}[1]{\biggl\langle #1 \biggr\rangle} \newcommand{\Biggiprod}[1]{\Biggl\langle #1 \Biggr\rangle} \newcommand{\set}[1]{\lbrace #1 \rbrace} \newcommand{\Set}[1]{\left\lbrace #1 \right\rbrace} \newcommand{\bigset}[1]{\bigl\lbrace #1 \bigr\rbrace} \newcommand{\Bigset}[1]{\Bigl\lbrace #1 \Bigr\rbrace} \newcommand{\biggset}[1]{\biggl\lbrace #1 \biggr\rbrace} \newcommand{\Biggset}[1]{\Biggl\lbrace #1 \Biggr\rbrace} \newcommand{\bracket}[1]{\lbrack #1 \rbrack} \newcommand{\Bracket}[1]{\left\lbrack #1 \right\rbrack} \newcommand{\bigbracket}[1]{\bigl\lbrack #1 \bigr\rbrack} \newcommand{\Bigbracket}[1]{\Bigl\lbrack #1 \Bigr\rbrack} \newcommand{\biggbracket}[1]{\biggl\lbrack #1 \biggr\rbrack} \newcommand{\Biggbracket}[1]{\Biggl\lbrack #1 \Biggr\rbrack} \newcommand{\ucorner}[1]{\ulcorner #1 \urcorner} \newcommand{\Ucorner}[1]{\left\ulcorner #1 \right\urcorner} \newcommand{\bigucorner}[1]{\bigl\ulcorner #1 \bigr\urcorner} \newcommand{\Bigucorner}[1]{\Bigl\ulcorner #1 \Bigr\urcorner} \newcommand{\biggucorner}[1]{\biggl\ulcorner #1 \biggr\urcorner} \newcommand{\Biggucorner}[1]{\Biggl\ulcorner #1 \Biggr\urcorner} \newcommand{\ceil}[1]{\lceil #1 \rceil} \newcommand{\Ceil}[1]{\left\lceil #1 \right\rceil} \newcommand{\bigceil}[1]{\bigl\lceil #1 \bigr\rceil} \newcommand{\Bigceil}[1]{\Bigl\lceil #1 \Bigr\rceil} \newcommand{\biggceil}[1]{\biggl\lceil #1 \biggr\rceil} \newcommand{\Biggceil}[1]{\Biggl\lceil #1 \Biggr\rceil} \newcommand{\floor}[1]{\lfloor #1 \rfloor} \newcommand{\Floor}[1]{\left\lfloor #1 \right\rfloor} \newcommand{\bigfloor}[1]{\bigl\lfloor #1 \bigr\rfloor} \newcommand{\Bigfloor}[1]{\Bigl\lfloor #1 \Bigr\rfloor} \newcommand{\biggfloor}[1]{\biggl\lfloor #1 \biggr\rfloor} \newcommand{\Biggfloor}[1]{\Biggl\lfloor #1 \Biggr\rfloor} \newcommand{\lcorner}[1]{\llcorner #1 \lrcorner} \newcommand{\Lcorner}[1]{\left\llcorner #1 \right\lrcorner} \newcommand{\biglcorner}[1]{\bigl\llcorner #1 \bigr\lrcorner} \newcommand{\Biglcorner}[1]{\Bigl\llcorner #1 \Bigr\lrcorner} \newcommand{\bigglcorner}[1]{\biggl\llcorner #1 \biggr\lrcorner} \newcommand{\Bigglcorner}[1]{\Biggl\llcorner #1 \Biggr\lrcorner} \newcommand{\e}{\varepsilon} \newcommand{\eps}{\varepsilon} \newcommand{\from}{\colon} \newcommand{\super}[2]{#1^{(#2)}} \newcommand{\varsuper}[2]{#1^{\scriptscriptstyle (#2)}} \newcommand{\tensor}{\otimes} \newcommand{\eset}{\emptyset} \newcommand{\sse}{\subseteq} \newcommand{\sst}{\substack} \newcommand{\ot}{\otimes} \newcommand{\Esst}[1]{\bbE_{\substack{#1}}} \newcommand{\vbig}{\vphantum{\bigoplus}} \newcommand{\seteq}{\mathrel{\mathop:}=} \newcommand{\defeq}{\stackrel{\mathrm{def}}=} \newcommand{\bits}{\{0,1\}} \newcommand{\sbits}{\{\pm 1\}} \newcommand{\Mid}{\mathrel{}\middle|\mathrel{}} \newcommand{\Ind}{\mathbf 1} \newcommand{\R}{\mathbb R} \newcommand{\Rnn}{\R_{\ge 0}} \newcommand{\N}{\mathbb N} \newcommand{\Z}{\mathbb Z} \newcommand{\Q}{\mathbb Q} \newcommand{\mper}{\,.} \newcommand{\mcom}{\,,} \DeclareMathOperator{\Id}{Id} \DeclareMathOperator{\cone}{cone} \DeclareMathOperator{\vol}{vol} \DeclareMathOperator{\val}{val} \DeclareMathOperator{\opt}{opt} \DeclareMathOperator{\Opt}{Opt} \DeclareMathOperator{\Val}{Val} \DeclareMathOperator{\LP}{LP} \DeclareMathOperator{\SDP}{SDP} \DeclareMathOperator{\Tr}{Tr} \DeclareMathOperator{\Inf}{Inf} \DeclareMathOperator{\poly}{poly} \DeclareMathOperator{\polylog}{polylog} \DeclareMathOperator{\argmax}{arg\,max} \DeclareMathOperator{\argmin}{arg\,min} \DeclareMathOperator{\qpoly}{qpoly} \DeclareMathOperator{\qqpoly}{qqpoly} \DeclareMathOperator{\conv}{conv} \DeclareMathOperator{\Conv}{Conv} \DeclareMathOperator{\supp}{supp} \DeclareMathOperator{\sign}{sign} \DeclareMathOperator{\mspan}{span} \DeclareMathOperator{\mrank}{rank} \DeclareMathOperator{\E}{\mathbb E} \DeclareMathOperator{\pE}{\tilde{\mathbb E}} \DeclareMathOperator{\Pr}{\mathbb P} \DeclareMathOperator{\Span}{Span} \DeclareMathOperator{\Cone}{Cone} \DeclareMathOperator{\junta}{junta} \DeclareMathOperator{\NSS}{NSS} \DeclareMathOperator{\SA}{SA} \DeclareMathOperator{\SOS}{SOS} \notag \]

We develop efficient algorithms for clustering and robust moment estimation. Via convex sum-of-squares relaxations, our algorithms can exploit bounds on higher-order moments of the underlying distributions. In this way, our algorithms achieve substantially stronger guarantees than previous approaches.

For example, our clustering algorithm can learn mixtures of \(k\) spherical Gaussians in time \(n^{O(t)}\) as long as the means have minimum separation \(\sqrt t \cdot k^{1/t}\) for any \(t\ge 4\), significantly improving over the minimum separation \(k^{1/4}\) required by the previous best algorithm due to Vempala and Wang (JCSS’04).

Given corrupted samples that contain a constant fraction of adversarial outliers, our moment estimation algorithms can approximate low-degree moments of unknown distributions, achieving guarantees that match information-theoretic lower-bounds for a broad class of distributions. These algorithms allow us to enhance, in a black-box way, many previous learning algorithms based on the method of moments, e.g., for independent component analysis and latent Dirichlet allocation, such that they become resilient to adversarial outliers.

Based on joint work with Pravesh Kothari and Jacob Steinhardt.

keywords

sum-of-squares method, machine learning, semidefinite programming.