Topic Modelling with Scikit-learn - Derek Greene
Topic Modelling with Scikit-learn
Derek Greene University College Dublin
PyData Dublin - 2017
Overview
• Scikit-learn
• Introduction to topic modelling
• Working with text data
• Topic modelling algorithms
• Non-negative Matrix Factorisation (NMF)
• Topic modelling with NMF in Scikit-learn
• Parameter selection for NMF
• Practical issues
Code, data, and slides:
Scikit-learn
pip install scikit-learn
conda install scikit-learn
Introduction to Topic Modelling
Topic modelling aims to automatically discover the hidden thematic structure in a large corpus of text documents.
Topics:
Topic 1: Basketball, LeBron, NBA, ...
Topic 2: NFL, Football, American, ...
Topic 3: Trump, President, Clinton, ...
Document:
LeBron James says President Trump 'trying to divide through sport'
Basketball star LeBron James has praised the American football players who have protested against Donald Trump, and accused the US president of "using sports to try and divide us".
Trump said that NFL players who fail to stand during the national anthem should be sacked or suspended.
James praised the players' unity, and said: "The people run this country."
James, who plays for the Cleveland Cavaliers and has won three NBA championships, campaigned for Hillary Clinton, Trump's rival, during the 2016 presidential election campaign.
A document is composed of terms related to one or more topics.
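To make this concrete, the following minimal sketch counts how many descriptor terms from each example topic occur in one sentence of the article above. The term sets are copied from the example topics on this slide. Matching against fixed term lists is only an illustration of the idea and is not a topic modelling algorithm; the variable names are hypothetical.

```python
import re
from collections import Counter

# Descriptor terms taken from the three example topics on the slide.
topics = {
    "Topic 1": {"basketball", "lebron", "nba"},
    "Topic 2": {"nfl", "football", "american"},
    "Topic 3": {"trump", "president", "clinton"},
}

# One sentence from the example news article.
document = (
    "Basketball star LeBron James has praised the American football players "
    "who have protested against Donald Trump, and accused the US president "
    "of 'using sports to try and divide us'."
)

# Lowercase the text and split it into alphabetic tokens.
tokens = Counter(re.findall(r"[a-z]+", document.lower()))

# For each topic, count occurrences of its descriptor terms in the document.
counts = {
    name: sum(tokens[term] for term in terms)
    for name, terms in topics.items()
}

for name, n in counts.items():
    print(name, n)
```

Each of the three topics matches terms in this single sentence, which is the sense in which one document can relate to several topics at once.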
Introduction to Topic Modelling
• Topic modelling is an unsupervised text mining approach.
• Input: A corpus of unstructured text documents (e.g. news articles, tweets, speeches etc.). No prior annotation or training set is typically required.
[Figure: Input → Data Preprocessing → Topic Modelling Algorithm → Output (Topic 1, Topic 2, ..., Topic k)]
• Output: A set of k topics, each of which is represented by:
1. A descriptor, based on the top-ranked terms for the topic.
2. Associations for documents relative to the topic.