class: center, middle # Through the Lens of Oracle: Efficient Algorithm for Finding Selective Classifier ### Presented by Yanlin (Alan) Duan, Rong Zhou --- # Review of selective classifier Recall the definition of a selective classifier: -- A selective classifier `\(C\)` is a tuple `\((h, (\gamma_1, \cdots , \gamma_m)\)`, where `\(h\)` lies in a hypothesis class `\(\mathcal{H}\)`, and `0 \le \gamma_i \le 1\)` for all `\(i = 1, \cdots , m\)`. For any `\(x_{n+j} \in U\)`, `\(C(x_{n+j}) = h(x_{n+j})\)` wp `\(gamma_j\)` and `\(0\)` wp `\(1 − \gamma_j\)`. --- # Review of selective classifier Below is the algorithm (optimization problem) for finding selective classifier: .center[![:scale 60%](https://yanlin-duan.github.io/assets/pics/selective-classifier1.png)] -- How many constraints do we have? -- As many as `\(|V|)`! --- # Motivation of our project -- Can we solve the optimization problem more efficiently by accessing the hypothesis class not directly through (large amount of) constraints, but through some ERM oracle? -- Yes! -- Definition of ERM oracle we use: For a set of hypothesis `\(\mathcal{H}\)`, the `\(\textit{Weighted ERM Oracle}\)` is an algorithm, which for any sequence `\((x_1,y_1,w_1),(x_2,y_2,w_2),...(x_t,y_t,w_t) \in Z \subseteq \mathcal{X} \times Y \times \mathbb{R}_{\ge 0}\)`, returns `\( \text{argmin}_{h \in \mathcal{H}} \text{ } \sum_{(x,y,w) \in Z}\mathbbm{1}\{h(x) \ne y\}\cdot w \)` --- # Main Algorithm .center[![:scale 60%](https://yanlin-duan.github.io/assets/pics/selective-classifier2.png)] --- # Analysis of algorithm Question: After how many rounds will the algorithm halt? -- High level idea: .center[![:scale 60%](https://yanlin-duan.github.io/assets/pics/selective-classifier3.png)] --- # Find step size Transform from primal to dual by introducing Lagrange multipliers: `\( \mathcal{L} = \sum_{i = 1}^m \frac{1}{\gamma_i} + \sum_{j = 1}^N \lambda_j \left( \left( \sum_{i = 1}^m \gamma_i \mathbbm{1}_{h_j(x_{n+i}) \neq h_0(x_{n+i})} \right) - \epsilon m \right) \)` -- Find derivative and set to zero. Substitute `\(\gamma_i\)` using the following: `\( \gamma_i^2 = \left(\sum_{j = 1}^N \lambda_j \mathbbm{1}_{h_j(x_{n+i}) \neq h_0(x_{n+i})} \right)^{-1} \)'