Given the Boom of Deep Learning Approaches
INFOTEC
Public Research Center of the Federal Government that contributes to the Digital Transformation of Mexico through research, innovation, academic training, and the development of ICT products and services. Its scope covers the public and private sectors, opening paths toward a modern and digitally inclusive Mexico.
GitHub: https://github.com/INGEOTEC
WebPage: https://ingeotec.github.io/
Artificial Intelligence (AI)
Theory and development of computer systems able to perform tasks that normally require human intelligence, such as visual perception, speech recognition, decision-making, and translation between languages.
Machine Learning
Machine learning (ML) is a subfield of artificial intelligence that focuses on the development and implementation of algorithms capable of learning from data without being explicitly programmed.
Natural Language Processing (NLP)
NLP is a branch of artificial intelligence (AI) that uses machine learning and other technologies to enable computers to understand, process, and manipulate human language.
How we look at it
How I see it
Problem
Classifier
Definition
The aim is the classification of documents into a fixed number of predefined categories.
Polarity
El día de mañana no podré ir con ustedes a la librería ("Tomorrow I will not be able to go with you to the bookstore")
Negative
https://ingeotec.github.io/Delitos
index | texto | etiqueta |
---|---|---|
0 | Aut. Acapulco-Cuernavaca, km 100, incidente, c... | N |
1 | #CDMX Hallan cuatro cuerpos en la carretera Pi... | P |
2 | @PepaHoffmann Y su hijo violador ya está en la... | N |
3 | Caos en Dallas luego de que hombres armados ma... | P |
4 | @josemata1974 @amacasqui @diegobravorayo @Cami... | N |
5 | Al llegar, encuentran el cuerpo de la niña qui... | P |
6 | @torrents_d @KRLS @QuimTorraiPla putos botifle... | N |
7 | Chofer de autobús escolar ebria es detenida po... | P |
Bag of Words
Associate token \(t\)
\[ \mathbf{v_t} \in \mathbb R^d \]
Bag of words
\[ \mathbf x = \frac{\sum_t \mathbf{v_t}}{\lVert \sum_t \mathbf{v_t} \rVert} \]
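The bag-of-words equation above can be sketched in a few lines of Python. The token vectors below are hypothetical 2-dimensional values chosen only for illustration, and the function name `bag_of_words` is not taken from any library. The sketch assumes the summed vector is non-zero, since otherwise the normalization is undefined.

```python
import math

def bag_of_words(tokens, vectors):
    """Sum the vectors of the tokens and scale the result to unit L2 norm."""
    d = len(next(iter(vectors.values())))
    total = [0.0] * d
    for t in tokens:
        v = vectors[t]
        for i in range(d):
            total[i] += v[i]
    # Assumes the sum is not the zero vector.
    norm = math.sqrt(sum(c * c for c in total))
    return [c / norm for c in total]

# Hypothetical token vectors v_t in R^2 (illustration only).
vectors = {"good": [1.0, 0.0], "day": [0.0, 1.0], "bad": [-1.0, 0.0]}

x = bag_of_words(["good", "day"], vectors)
x_norm = math.sqrt(sum(c * c for c in x))
```

Whatever the token vectors are, the resulting `x` lies on the unit sphere, so dot products between documents behave like cosine similarities.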
Orthogonal
\[ \forall_{i \neq j} \mathbf{v_i} \cdot \mathbf{v_j} = 0 \]
Consequences
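One consequence that can be checked directly: if the token vectors are orthogonal, the dot product between two documents depends only on the tokens they share. The sketch below assumes the simplest orthogonal choice, one-hot vectors over a hypothetical toy vocabulary, so the sum of token vectors is just a count vector.

```python
import math

def one_hot_bow(tokens, vocabulary):
    """Bag of words with one-hot token vectors: the sum is a count vector."""
    index = {t: i for i, t in enumerate(vocabulary)}
    counts = [0.0] * len(vocabulary)
    for t in tokens:
        counts[index[t]] += 1.0
    norm = math.sqrt(sum(c * c for c in counts))
    return [c / norm for c in counts]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

vocabulary = ["good", "great", "bad", "day"]  # hypothetical toy vocabulary

a = one_hot_bow(["good", "day"], vocabulary)
b = one_hot_bow(["great", "day"], vocabulary)
c = one_hot_bow(["bad", "day"], vocabulary)

# Only the shared token "day" contributes to either similarity.
sim_ab = dot(a, b)
sim_ac = dot(a, c)
```

Both similarities are equal: under this representation "good" is exactly as far from "great" as it is from "bad", because distinct tokens never contribute to the dot product.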
Document / TFIDF
Representation
[(1144, np.float64(0.17310726187637004)),
(7829, np.float64(0.18087521410848678)),
(2463, np.float64(0.6784430819531494)),
(1204, np.float64(0.4246144937481326)),
(663, np.float64(0.18792613863512314)),
(7854, np.float64(0.5112918104893687))]
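Reading each pair above as (token index, weight), which is an assumption about the output format, a quick check is whether the weights form a unit-length vector, as the normalization in the bag-of-words equation would suggest. The sketch copies the values from the output and uses only the standard library.

```python
import math

# Sparse representation copied from the output above,
# read here as (token index, weight) pairs.
representation = [
    (1144, 0.17310726187637004),
    (7829, 0.18087521410848678),
    (2463, 0.6784430819531494),
    (1204, 0.4246144937481326),
    (663, 0.18792613863512314),
    (7854, 0.5112918104893687),
]

# L2 norm of the weights; a value close to 1 indicates a normalized vector.
norm = math.sqrt(sum(w * w for _, w in representation))
is_unit = math.isclose(norm, 1.0, rel_tol=1e-6)

# The token with the largest weight dominates the representation.
top_index, top_weight = max(representation, key=lambda pair: pair[1])
```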
Select a token
Supervised learning
Classification
CBOW
Dense representation
Classification
Linear Classifier
Dense representation
Classification
Outside NLP
Attention Is All You Need
BERT
Equation
\[ \textsf{att}(Q, K, V) = \textsf{softmax}(\frac{QK^\intercal}{\sqrt{d_k}}) V \]
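The attention equation can be written out as a minimal pure-Python sketch. It handles a single head with no masking or batching, matrices are plain lists of rows, and the values of `Q`, `K`, and `V` are made up for illustration.

```python
import math

def softmax(row):
    """Numerically stable softmax over one list of scores."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    s = sum(exps)
    return [e / s for e in exps]

def attention(Q, K, V):
    """softmax(Q K^T / sqrt(d_k)) V, with matrices given as lists of rows."""
    d_k = len(K[0])
    out = []
    for q in Q:
        # Scaled dot product of this query with every key.
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d_k) for k in K]
        weights = softmax(scores)
        # Weighted average of the value rows.
        out.append([sum(w * v[j] for w, v in zip(weights, V))
                    for j in range(len(V[0]))])
    return out

# Toy inputs (illustration only): two queries, two keys, two values.
Q = [[1.0, 0.0], [0.0, 1.0]]
K = [[1.0, 0.0], [0.0, 1.0]]
V = [[1.0, 2.0], [3.0, 4.0]]

out = attention(Q, K, V)
```

Each output row is a convex combination of the rows of `V`: the first query matches the first key best, so its output sits closer to the first value row, and the second query sits closer to the second.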
Parts
Analysis
Definitions