Given the Boom of Deep Learning Approaches
INFOTEC

Federal Government Public Research Center that contributes to the Digital Transformation of Mexico through research, innovation, academic training, and the development of ICT products and services. Its scope covers the public and private sectors, enabling paths that lead toward a modern and digitally inclusive Mexico.
GitHub: https://github.com/INGEOTEC
WebPage: https://ingeotec.github.io/
Artificial Intelligence (AI)
Theory and development of computer systems able to perform tasks that normally require human intelligence, such as visual perception, speech recognition, decision-making, and translation between languages.
Machine Learning
Machine learning (ML) is a subfield of artificial intelligence that focuses on the development and implementation of algorithms capable of learning from data without being explicitly programmed.
Natural Language Processing (NLP)
NLP is a branch of artificial intelligence (AI) that uses machine learning and other technologies to enable computers to understand, process, and manipulate human language.
How we look at it

How I see it

Problem

Classifier

Definition
The aim is the classification of documents into a fixed number of predefined categories.
Polarity
El día de mañana no podré ir con ustedes a la librería ("Tomorrow I will not be able to go with you to the bookstore")
Negative
https://ingeotec.github.io/Delitos
| | texto | etiqueta |
|---|---|---|
| 0 | Venezuela hagamos una paticion al gobierno de ... | N |
| 1 | Texas ejecutó a un hombre por un asesinato rac... | P |
| 2 | @makusdhy @FridaSiKahlo @Eskol69903 @freddy_st... | N |
| 3 | Joven venezolana desaparecida en #CostaRica. A... | P |
| 4 | En qué momento llegaron a ser las 2:47am voy e... | N |
| 5 | #Reddenoticias: Arrestan dos en Salcedo por co... | P |
| 6 | De gea e Cristiano, os salvadores do United | N |
| 7 | Lo mas cabron es que fue porque los clientes d... | P |
Bag of Words

Associate each token \(t\) with a vector
\[ \mathbf{v_t} \in \mathbb R^d \]
Bag of words
\[ \mathbf x = \frac{\sum_t \mathbf{v_t}}{\lVert \sum_t \mathbf{v_t} \rVert} \]
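As a minimal sketch of the formula above, the document vector is the sum of its token vectors divided by the Euclidean norm of that sum. The three-token vocabulary, the vector values, and the names `token_vectors` and `bag_of_words` are assumptions made for illustration; they do not come from any library.

```python
import math

# Hypothetical token vectors v_t in R^3; values chosen only for illustration.
token_vectors = {
    "good": [1.0, 0.0, 0.0],
    "bad": [0.0, 1.0, 0.0],
    "day": [0.0, 0.0, 1.0],
}


def bag_of_words(tokens, vectors):
    """Return x = sum_t v_t / ||sum_t v_t|| for the given tokens."""
    dim = len(next(iter(vectors.values())))
    total = [0.0] * dim
    for token in tokens:
        # Tokens outside the vocabulary are skipped (an assumption of this sketch).
        for i, value in enumerate(vectors.get(token, [0.0] * dim)):
            total[i] += value
    norm = math.sqrt(sum(value * value for value in total))
    if norm == 0.0:
        return total  # no known tokens: keep the zero vector
    return [value / norm for value in total]


x = bag_of_words(["good", "day", "good"], token_vectors)
```

Here `x` has unit length, and the repeated token "good" weighs twice as much as "day" before normalization.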
Orthogonal
\[ \forall_{i \neq j} \mathbf{v_i} \cdot \mathbf{v_j} = 0 \]
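One way to satisfy the orthogonality condition is to use one-hot vectors, with one dimension per token. The sketch below assumes a hypothetical four-token vocabulary (all names are illustrative). It checks the condition and suggests what follows from it: two documents are compared only through the tokens they share.

```python
import math

vocabulary = ["good", "great", "bad", "day"]  # hypothetical vocabulary


def one_hot(index, size):
    """Vector with a 1 at position index and 0 elsewhere."""
    return [1.0 if i == index else 0.0 for i in range(size)]


vectors = {tok: one_hot(i, len(vocabulary)) for i, tok in enumerate(vocabulary)}


def dot(a, b):
    """Dot product of two equally sized vectors."""
    return sum(x * y for x, y in zip(a, b))


def bag_of_words(tokens):
    """Normalized sum of the token vectors (tokens must be in the vocabulary)."""
    total = [0.0] * len(vocabulary)
    for tok in tokens:
        total = [t + v for t, v in zip(total, vectors[tok])]
    norm = math.sqrt(dot(total, total))
    return [t / norm for t in total]


# Every pair of distinct tokens has a zero dot product.
orthogonal = all(
    dot(vectors[a], vectors[b]) == 0.0
    for a in vocabulary for b in vocabulary if a != b
)

# Documents sharing "day" have positive similarity; "good" and "great"
# contribute nothing to each other even though they are near-synonyms.
sim_shared = dot(bag_of_words(["good", "day"]), bag_of_words(["great", "day"]))
sim_disjoint = dot(bag_of_words(["good"]), bag_of_words(["great"]))
```

Under these assumptions, `sim_disjoint` is zero because the representation carries no notion of related meaning between distinct tokens.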
Consequences
Document / TFIDF

Select a token

Supervised learning

Classification

CBOW
Dense representation
Classification

Linear Classifier
Dense representation
Classification

Outside NLP
Attention Is All You Need
BERT
Equation
\[ \textsf{att}(Q, K, V) = \textsf{softmax}\left(\frac{QK^\intercal}{\sqrt{d_k}}\right) V \]
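The equation can be traced step by step with a small pure-Python sketch. It assumes matrices are stored as lists of rows and that the softmax is applied to each row of scores. The toy values for `Q`, `K`, and `V` are made up for illustration.

```python
import math


def softmax(row):
    """Numerically stable softmax of a list of scores."""
    peak = max(row)
    exps = [math.exp(score - peak) for score in row]
    total = sum(exps)
    return [e / total for e in exps]


def attention(Q, K, V):
    """att(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V, with matrices as lists of rows."""
    d_k = len(K[0])
    output = []
    for q in Q:
        # Scaled dot product between this query and every key.
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d_k) for k in K]
        weights = softmax(scores)
        # Weighted average of the value rows.
        output.append([
            sum(w * v[j] for w, v in zip(weights, V))
            for j in range(len(V[0]))
        ])
    return output


# Toy inputs, chosen only for illustration.
Q = [[1.0, 0.0], [0.0, 1.0]]
K = [[1.0, 0.0], [0.0, 1.0]]
V = [[1.0, 2.0], [3.0, 4.0]]
out = attention(Q, K, V)
```

Each output row is a convex combination of the rows of `V`, weighted toward the value whose key best matches the query.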
Parts
Analysis
Definitions