Given the Boom of Deep Learning Approaches
INFOTEC

A Federal Government Public Research Center that contributes to the digital transformation of Mexico through research, innovation, academic training, and the development of ICT products and services. Its scope covers both the public and private sectors, enabling paths that lead toward a modern and digitally inclusive Mexico.
GitHub: https://github.com/INGEOTEC
WebPage: https://ingeotec.github.io/
Artificial Intelligence (AI)
Theory and development of computer systems able to perform tasks that normally require human intelligence, such as visual perception, speech recognition, decision-making, and translation between languages.
Machine Learning
Machine learning (ML) is a subfield of artificial intelligence that focuses on the development and implementation of algorithms capable of learning from data without being explicitly programmed.
Natural Language Processing (NLP)
NLP is a branch of artificial intelligence (AI) that uses machine learning and other technologies to enable computers to understand, process, and manipulate human language.
How we look at it

How I see it

Problem

Classifier

Definition
The aim is the classification of documents into a fixed number of predefined categories.
Polarity
El día de mañana no podré ir con ustedes a la librería ("Tomorrow I will not be able to go with you to the bookstore")
Negative
https://ingeotec.github.io/Delitos
| | texto | etiqueta |
|---|---|---|
| 0 | Jorge Lanata sobre "la detención de Boudou", ... | N |
| 1 | #31Jul #Sucesos #NiUnaMenos \n52 mujeres fuero... | P |
| 2 | @anhetch @SilvieChavez En este caso el débil e... | N |
| 3 | Camión atropella a motochorros en gasolinera d... | P |
| 4 | Qué recuerdos cuando antes de irme al colegio ... | N |
| 5 | #VIOLENCIA \nDe la violencia que se registra e... | P |
| 6 | @osa409 @memohiervas Según vídeo que eh visto,... | N |
| 7 | Reportan una persona ejecutada con disparo de ... | P |
Bag of Words

Associate each token \(t\) with a vector
\[ \mathbf{v_t} \in \mathbb R^d \]
Bag of words
\[ \mathbf x = \frac{\sum_t \mathbf{v_t}}{\lVert \sum_t \mathbf{v_t} \rVert} \]
Orthogonal
\[ \forall_{i \neq j} \mathbf{v_i} \cdot \mathbf{v_j} = 0 \]
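As a rough illustration of the two equations above, the sketch below builds \(\mathbf x\) from one-hot token vectors, which are orthogonal by construction. The function names, the toy vocabulary, and the choice to ignore out-of-vocabulary tokens are assumptions made for this example, not part of the original material.

```python
import math
from collections import Counter


def bag_of_words(tokens, vocabulary):
    """Return x = sum_t v_t / ||sum_t v_t|| using one-hot vectors v_t."""
    index = {tok: i for i, tok in enumerate(vocabulary)}
    total = [0.0] * len(vocabulary)
    for tok, count in Counter(tokens).items():
        if tok in index:  # assumption: tokens outside the vocabulary are ignored
            total[index[tok]] += count
    norm = math.sqrt(sum(v * v for v in total))
    if norm == 0:
        return total  # no known token: leave the zero vector as is
    return [v / norm for v in total]


def dot(a, b):
    """Dot product of two equally sized vectors."""
    return sum(x * y for x, y in zip(a, b))


# Hypothetical toy vocabulary and documents
vocabulary = ["no", "podré", "ir", "librería", "mañana"]
x = bag_of_words("mañana no podré ir".split(), vocabulary)
y = bag_of_words("no podré ir no".split(), vocabulary)
similarity = dot(x, y)  # cosine similarity, since both vectors have unit length
```

Because the token vectors are orthogonal, the dot product of two documents depends only on the tokens they share, so two documents with no token in common always have similarity zero.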
Consequences
Document / TFIDF

Select a token

Supervised learning

Classification

CBOW
Dense representation
Classification

Linear Classifier
Dense representation
Classification

Outside NLP
Attention Is All You Need
BERT
Equation
\[ \textsf{att}(Q, K, V) = \textsf{softmax}(\frac{QK^\intercal}{\sqrt{d_k}}) V \]
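The equation above can be sketched in plain Python as follows. This is a minimal single-head version without masking or batching; the helper names (`softmax`, `matmul`, `transpose`) and the toy matrices are assumptions introduced for this example.

```python
import math


def softmax(row):
    """Numerically stable softmax over a list of scores."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    s = sum(exps)
    return [e / s for e in exps]


def transpose(M):
    return [list(col) for col in zip(*M)]


def matmul(A, B):
    """Matrix product of two matrices stored as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def att(Q, K, V):
    """softmax(Q K^T / sqrt(d_k)) V, with the softmax applied row by row."""
    d_k = len(K[0])
    scores = matmul(Q, transpose(K))
    weights = [softmax([s / math.sqrt(d_k) for s in row]) for row in scores]
    return matmul(weights, V)


# Hypothetical toy inputs: two queries, two keys, two values, d_k = 2
Q = [[1.0, 0.0], [0.0, 1.0]]
K = [[1.0, 0.0], [0.0, 1.0]]
V = [[1.0, 2.0], [3.0, 4.0]]
out = att(Q, K, V)
```

Each output row is a weighted average of the rows of `V`, with weights given by how closely the query matches each key. A query of all zeros matches every key equally and returns the plain average of the value rows.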
Parts
Analysis
Definitions