Building datasets - Florida State University
BUILDING DATASETS
An Introduction
BUILDING DATASETS
• Sometimes it becomes necessary to make your own dataset for training, testing, or verification purposes.
• Alternatively, you might want to build a dataset with "real-world" data from various sources for a project.
• We need well-composed datasets to build and test our ML application before launching it.
• Datasets are usually multivariate. Each data point has multiple variables/dimensions, which may or may not be labeled.
• The advantage of building a test dataset is the existence of the class label (target) for each data point, which allows us to test the application on something besides the training data.
TYPES OF DATASETS
Data can be numerical (just numbers) or categorical (all data types).
• Numerical data could be
  ◦ Discrete - each individual data point is represented by 1 value on each axis
  ◦ Continuous - data could be described over a range
  ◦ Interval - data points are represented as counts per interval, with the intervals ordered
  ◦ Ratio - normalized across the dimensions, so mathematical transformations may be applied without skew
• Categorical data could be
  ◦ Nominal - unordered lists, like lists of colors, courses, etc.
  ◦ Ordinal - ordered, usually exhaustive lists, like T-shirt sizes (XS to XL), letter grades (A-F), etc.
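The nominal/ordinal distinction matters when categorical values are turned into numbers. The following is a minimal sketch in plain Python, not a prescribed method: the size list, the color list, and the helper `one_hot` are illustrative assumptions. Ordinal values are mapped to integer ranks because their order is meaningful, while nominal values get a 0/1 vector that implies no order.

```python
# Ordinal categories carry an order, so each can be mapped to an integer rank.
SIZE_ORDER = ["XS", "S", "M", "L", "XL"]          # assumed full list of sizes
size_rank = {size: rank for rank, size in enumerate(SIZE_ORDER)}

# Nominal categories have no order; a one-hot vector avoids implying one.
COLORS = ["red", "green", "blue"]                 # hypothetical color list


def one_hot(value, categories):
    """Return a 0/1 list with a single 1 at the position of `value`."""
    return [1 if value == category else 0 for category in categories]


sizes = ["L", "XS", "M"]
sorted_sizes = sorted(sizes, key=lambda s: size_rank[s])  # sort by rank, not alphabetically
encoded_color = one_hot("green", COLORS)

print(sorted_sizes)
print(encoded_color)
```

Sorting by rank only makes sense for the ordinal sizes; sorting the colors would impose an order the data does not have.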
CONSTRUCTING DATASETS
There are several options:
• Use a pre-existing dataset from a dataset repository (or buy pre-labeled datasets). You could also get access to an API that would funnel some client data to you.
  ◦ We did this for the iris clustering example
• Make up data points using statistics and pre-existing modules to fit the parameters of the application.
  ◦ We did this for the torch example
• Generate data points using just RNGs (tailored to fit the application). Harder for categorical data.
• Scrape the internet for data
  ◦ Web scrapers exist, but a lot of websites are not scrapable.
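As an illustration of the RNG-only option, here is a minimal sketch using just Python's standard `random` module. The cluster centers, the color categories and their weights, and the helper name `make_point` are all assumptions chosen for the example; a real application would tailor them to its own data.

```python
import random

rng = random.Random(42)  # fixed seed so the dataset is reproducible


def make_point(rng):
    """Build one labeled data point with two numerical and one categorical field."""
    label = rng.choice([0, 1])
    center = 0.0 if label == 0 else 3.0       # assumed class centers
    return {
        "x": rng.gauss(center, 1.0),          # numerical, normally distributed
        "y": rng.gauss(center, 1.0),
        # categorical values must be drawn from an explicit list of categories
        "color": rng.choices(["red", "green", "blue"], weights=[0.5, 0.3, 0.2])[0],
        "label": label,
    }


dataset = [make_point(rng) for _ in range(200)]
print(dataset[0])
```

The categorical field shows why this approach is harder for non-numerical data: every category and its relative frequency has to be written out by hand.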
GENERATING DATASETS WITH SKLEARN
• The sklearn library lets us generate numerical data points for several supervised and unsupervised ML applications, including data that are
  ◦ Linear (for regression-based applications)
  ◦ Circular (for 2-class classification)
  ◦ Moons (for 2-class classification)
  ◦ Blobs (for clustering and multi-class classification)
• We will examine quick ways to generate all 4 of these.
• There are more, but these shall suffice for an example.
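The four shapes listed above correspond to generator functions in `sklearn.datasets`. The sketch below is one possible way to call them; the sample counts, noise levels, `factor`, number of centers, and seeds are arbitrary choices made for illustration, not values taken from the slides.

```python
from sklearn.datasets import make_blobs, make_circles, make_moons, make_regression

# Linear data for regression: one feature plus Gaussian noise on the target.
X_lin, y_lin = make_regression(n_samples=200, n_features=1, noise=10.0, random_state=0)

# Two concentric circles for 2-class classification.
X_circ, y_circ = make_circles(n_samples=200, noise=0.05, factor=0.5, random_state=0)

# Two interleaving half-moons for 2-class classification.
X_moon, y_moon = make_moons(n_samples=200, noise=0.1, random_state=0)

# Three Gaussian blobs for clustering or multi-class classification.
X_blob, y_blob = make_blobs(n_samples=200, n_features=2, centers=3, random_state=0)

for name, X, y in [
    ("linear", X_lin, y_lin),
    ("circles", X_circ, y_circ),
    ("moons", X_moon, y_moon),
    ("blobs", X_blob, y_blob),
]:
    print(name, X.shape, y.shape)
```

Each call returns a feature array `X` and a target array `y`, so the generated points come with labels that can be used for testing.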
SOME DATA POINTS GENERATED WITH SKLEARN
GENERATING CATEGORICAL DATA
There are several libraries that can be used to generate "random" categorical data.
• Faker is one such library
• To install:
    pip install faker
• Faker contains ways to randomly generate
  ◦ Names
  ◦ Addresses
  ◦ Dates and times
  ◦ Job titles
  ◦ Geopositional data
  ◦ Textual "notes"
  ◦ Currencies
  ◦ User IDs and hashes
  ◦ XML data, etc.
• Faker can be configured for a variety of languages, locations, etc.