Commonly used terms for machine learning and data mining

Author: The Little Dream, Created: 2017-03-20 09:58:22

  • Sampling:

    • Simple Random Sampling (SRS)
    • Offline Sampling (e.g., equal-probability K-sample offline sampling)
    • Online Sampling (e.g., equal-probability K-sample online sampling)
    • Ratio-based Sampling
    • Acceptance-Rejection Sampling (see the sketch after this list)
    • Importance Sampling
    • MCMC (Markov Chain Monte Carlo sampling, e.g., Metropolis-Hastings and Gibbs sampling)
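
To make the acceptance-rejection idea concrete, here is a minimal Python sketch: draws from a proposal are kept with probability f(x)/m. The toy target density (proportional to x(1-x) on [0, 1]), the uniform proposal, and the bound `m` are illustrative assumptions, not part of the original list.

```python
import random

def target_density(x):
    # Unnormalized target: proportional to a Beta(2, 2) density on [0, 1].
    return x * (1.0 - x)

def accept_reject_sample(n, m=0.25):
    # m must satisfy target_density(x) <= m * proposal_density(x) everywhere;
    # with a Uniform(0, 1) proposal, the maximum of x*(1-x) is 0.25.
    samples = []
    while len(samples) < n:
        x = random.random()          # draw from the proposal
        u = random.random()          # uniform acceptance threshold
        if u * m <= target_density(x):
            samples.append(x)        # keep the draw with probability f(x) / m
    return samples

print(sum(accept_reject_sample(10000)) / 10000)  # sample mean should be near 0.5
```
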
  • Clustering:

    • K-Means (see the sketch after this list),
    • K-Medoids,
    • Bisecting K-Means,
    • FK-Means,
    • Canopy,
    • Spectral-KMeans (spectral clustering)
    • GMM-EM (Gaussian Mixture Model solved with the Expectation-Maximization algorithm)
    • K-Prototypes, CLARANS (partition-based)
    • BIRCH (hierarchy-based)
    • CURE (hierarchy-based)
    • DBSCAN (density-based)
    • CLIQUE (density- and grid-based)
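
A minimal K-Means sketch, alternating the assignment and centroid-update steps; the 2-D toy data, random initialization, and function names are illustrative choices rather than a reference implementation.

```python
import numpy as np

def kmeans(points, k, iters=100, seed=0):
    """Minimal K-Means: alternate assignment and centroid-update steps."""
    rng = np.random.default_rng(seed)
    centroids = points[rng.choice(len(points), k, replace=False)]
    for _ in range(iters):
        # Assign each point to its nearest centroid (Euclidean distance).
        dists = np.linalg.norm(points[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Move each centroid to the mean of the points assigned to it.
        new_centroids = np.array([
            points[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    return labels, centroids

data = np.vstack([np.random.randn(50, 2), np.random.randn(50, 2) + 5])
labels, centers = kmeans(data, k=2)
print(centers)   # two centers, near (0, 0) and (5, 5) for this toy data
```
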
  • Classification and Regression:

    • LR (Linear Regression)
    • LR (Logistic Regression; see the sketch after this list)
    • SR (Softmax Regression, multi-class logistic regression)
    • GLM (Generalized Linear Model)
    • RR (Ridge Regression, L2-regularized least-squares regression)
    • LASSO (Least Absolute Shrinkage and Selection Operator, L1-regularized least-squares regression)
    • RF (Random Forest)
    • DT (Decision Tree)
    • GBDT (Gradient Boosting Decision Tree)
    • CART (Classification and Regression Tree)
    • KNN (K-Nearest Neighbor)
    • SVM (Support Vector Machine)
    • KF (Kernel Function): Polynomial Kernel Function,
    • Gaussian Kernel Function / Radial Basis Function (RBF),
    • String Kernel Function
    • NB (Naive Bayes), BN (Bayesian Network / Bayesian Belief Network / Belief Network)
    • LDA (Linear Discriminant Analysis / Fisher Linear Discriminant Analysis)
    • EL (Ensemble Learning: Boosting, Bagging, Stacking)
    • AdaBoost (Adaptive Boosting)
    • MEM (Maximum Entropy Model)
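
A minimal logistic-regression sketch using plain batch gradient descent on the cross-entropy loss; the toy data, learning rate, and epoch count are arbitrary illustrative choices.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def fit_logistic(X, y, lr=0.1, epochs=1000):
    """Batch gradient descent on the logistic (cross-entropy) loss."""
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(epochs):
        p = sigmoid(X @ w + b)            # predicted probabilities
        grad_w = X.T @ (p - y) / len(y)   # gradient of the average loss
        grad_b = (p - y).mean()
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b

# Toy data: class 1 when x0 + x1 is positive.
X = np.random.randn(200, 2)
y = (X[:, 0] + X[:, 1] > 0).astype(float)
w, b = fit_logistic(X, y)
preds = (sigmoid(X @ w + b) > 0.5).astype(float)
print("training accuracy:", (preds == y).mean())
```
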
  • Effectiveness Evaluation:

    • Confusion Matrix (see the sketch after this list)
    • Precision, Recall
    • Accuracy, F-score
    • ROC Curve, AUC (area under the ROC curve)
    • Lift Curve, KS Curve
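
A small sketch of how the confusion-matrix counts (TP, FP, FN, TN) turn into precision, recall, accuracy, and F1 for a binary classifier; the example labels are made up.

```python
def binary_metrics(y_true, y_pred):
    """Precision, recall, accuracy, and F1 from the 2x2 confusion matrix."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    accuracy = (tp + tn) / len(y_true)
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"precision": precision, "recall": recall, "accuracy": accuracy, "f1": f1}

print(binary_metrics([1, 0, 1, 1, 0, 1], [1, 0, 0, 1, 1, 1]))
```
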
  • PGM (Probabilistic Graphical Models):

    • BN (Bayesian Network / Bayesian Belief Network / Belief Network)
    • MC (Markov Chain; see the sketch after this list)
    • HMM (Hidden Markov Model)
    • MEMM (Maximum Entropy Markov Model)
    • CRF (Conditional Random Field)
    • MRF (Markov Random Field)
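
A tiny Markov-chain sketch: simulating a hypothetical 2-state transition matrix and checking that the empirical state frequencies approach the chain's stationary distribution.

```python
import random

# Transition matrix of a 2-state Markov chain: P[i][j] = P(next = j | current = i).
P = [[0.9, 0.1],
     [0.5, 0.5]]

def simulate(steps, state=0):
    counts = [0, 0]
    for _ in range(steps):
        counts[state] += 1
        state = 0 if random.random() < P[state][0] else 1
    return [c / steps for c in counts]

# The empirical frequencies approach the stationary distribution (5/6, 1/6).
print(simulate(100000))
```
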
  • NN (neural network):

    • ANN (Artificial Neural Network)
    • BP (Error Back-Propagation; see the sketch below)
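
A compact error back-propagation sketch: a 2-4-1 sigmoid network trained on XOR with plain gradient descent. The architecture, learning rate, and iteration count are illustrative assumptions, and the result can vary with the random seed.

```python
import numpy as np

# Tiny 2-4-1 network trained with error back-propagation on XOR.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([[0], [1], [1], [0]], dtype=float)

rng = np.random.default_rng(0)
W1, b1 = rng.normal(size=(2, 4)), np.zeros(4)
W2, b2 = rng.normal(size=(4, 1)), np.zeros(1)
sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))

for _ in range(10000):
    # Forward pass.
    h = sigmoid(X @ W1 + b1)
    out = sigmoid(h @ W2 + b2)
    # Backward pass: propagate the output error back through each layer.
    d_out = (out - y) * out * (1 - out)
    d_h = (d_out @ W2.T) * h * (1 - h)
    W2 -= 0.5 * h.T @ d_out;  b2 -= 0.5 * d_out.sum(axis=0)
    W1 -= 0.5 * X.T @ d_h;    b1 -= 0.5 * d_h.sum(axis=0)

print(out.round(2).ravel())  # should be close to [0, 1, 1, 0] after training
```
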
  • Deep Learning:

    • Auto-encoder (see the sketch after this list)
    • SAE (Stacked Auto-encoders)
    • Sparse Auto-encoders
    • Denoising Auto-encoders
    • Contractive Auto-encoders
    • RBM (Restricted Boltzmann Machine)
    • DBN (Deep Belief Network)
    • CNN (Convolutional Neural Network)
    • Word2Vec (word vector learning model)
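
As a sketch of the auto-encoder idea, here is a tiny linear auto-encoder (8-dimensional inputs squeezed through a 3-unit bottleneck) trained to minimize reconstruction error with gradient descent; all sizes and hyperparameters are arbitrary toy choices, not from the original list.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 8))                   # toy data: 8-dimensional inputs
W_enc = rng.normal(scale=0.3, size=(8, 3))      # encoder: 8 -> 3 (the bottleneck)
W_dec = rng.normal(scale=0.3, size=(3, 8))      # decoder: 3 -> 8

print("initial MSE:", float(((X @ W_enc @ W_dec - X) ** 2).mean()))

lr = 0.05
for _ in range(3000):
    code = X @ W_enc                            # encode: project to the bottleneck
    recon = code @ W_dec                        # decode: reconstruct the input
    err = recon - X                             # reconstruction error
    grad_dec = code.T @ err / len(X)            # gradients of the mean squared error
    grad_enc = X.T @ (err @ W_dec.T) / len(X)
    W_dec -= lr * grad_dec
    W_enc -= lr * grad_enc

print("final MSE:  ", float(((X @ W_enc @ W_dec - X) ** 2).mean()))  # should decrease
```
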
  • Dimensionality Reduction:

    • LDA (Linear Discriminant Analysis / Fisher Linear Discriminant Analysis)
    • PCA (Principal Component Analysis; see the sketch after this list)
    • ICA (Independent Component Analysis)
    • SVD (Singular Value Decomposition)
    • Factor Analysis
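
A minimal PCA sketch via SVD of the centered data matrix; the toy data and component count are illustrative.

```python
import numpy as np

def pca(X, n_components):
    """PCA by SVD of the centered data matrix; returns the projected data."""
    X_centered = X - X.mean(axis=0)
    U, S, Vt = np.linalg.svd(X_centered, full_matrices=False)
    components = Vt[:n_components]          # principal directions
    explained = (S ** 2) / (len(X) - 1)     # variance along each direction
    return X_centered @ components.T, components, explained[:n_components]

X = np.random.randn(100, 5)
Z, components, var = pca(X, n_components=2)
print(Z.shape, var)   # (100, 2) and the variance explained by each component
```
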
  • Text Mining:

    • VSM (Vector Space Model)
    • Word2Vec (word vector learning model)
    • TF (Term Frequency)
    • TF-IDF (Term Frequency-Inverse Document Frequency; see the sketch after this list)
    • MI (Mutual Information)
    • ECE (Expected Cross Entropy)
    • QEMI (secondary information entropy)
    • IG (Information Gain)
    • IGR (Information Gain Ratio)
    • Gini (Gini coefficient)
    • Chi-squared (χ²) Statistic
    • TEW (Text Evidence Weight)
    • OR (Odds Ratio)
    • N-Gram Model,
    • LSA (Latent Semantic Analysis)
    • PLSA (Probabilistic Latent Semantic Analysis)
    • LDA (Latent Dirichlet Allocation)
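
A short TF-IDF sketch using raw term frequency and a smoothed inverse document frequency; the tokenized toy documents and the exact smoothing are illustrative assumptions (real libraries differ in the details).

```python
import math
from collections import Counter

def tf_idf(docs):
    """TF-IDF with raw term frequency and smoothed inverse document frequency."""
    n = len(docs)
    df = Counter()
    for doc in docs:
        df.update(set(doc))                               # document frequency
    idf = {t: math.log(n / df[t]) + 1.0 for t in df}      # smoothed IDF
    weights = []
    for doc in docs:
        tf = Counter(doc)
        weights.append({t: (tf[t] / len(doc)) * idf[t] for t in tf})
    return weights

docs = [["buy", "low", "sell", "high"],
        ["buy", "and", "hold"],
        ["sell", "signal", "sell", "order"]]
for w in tf_idf(docs):
    print(w)
```
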
  • Association Mining:

    • Apriori (see the sketch after this list),
    • FP-growth (Frequent Pattern Tree Growth),
    • AprioriAll,
    • SPADE.
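
A minimal Apriori sketch that grows candidate itemsets level by level and keeps those meeting a minimum support; the toy baskets and the simplified join step (without full subset pruning) are illustrative.

```python
from itertools import combinations

def apriori(transactions, min_support=2):
    """Minimal Apriori: grow candidate itemsets level by level, pruning by support."""
    transactions = [set(t) for t in transactions]
    items = sorted({i for t in transactions for i in t})
    frequent = {}
    candidates = [frozenset([i]) for i in items]
    k = 1
    while candidates:
        # Count the support of each candidate itemset.
        counts = {c: sum(1 for t in transactions if c <= t) for c in candidates}
        level = {c: n for c, n in counts.items() if n >= min_support}
        frequent.update(level)
        # Join step: build (k+1)-itemsets from the frequent k-itemsets.
        keys = list(level)
        candidates = list({a | b for a, b in combinations(keys, 2) if len(a | b) == k + 1})
        k += 1
    return frequent

baskets = [["milk", "bread"], ["milk", "diaper", "beer"],
           ["milk", "bread", "diaper"], ["bread", "diaper", "beer"]]
for itemset, count in apriori(baskets).items():
    print(set(itemset), count)
```
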
  • Recommendation Engine:

    • DBR (Demographic-based Recommendation)
    • CBR (Context-based Recommendation)
    • CF (Collaborative Filtering)
    • UCF (User-based Collaborative Filtering Recommendation)
    • ICF (Item-based Collaborative Filtering Recommendation; see the sketch after this list)
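
A small item-based collaborative-filtering sketch: unseen items are scored by cosine similarity to the items a user has already rated. The rating dictionary and helper names are hypothetical toy choices.

```python
import math

# Ratings: user -> {item: rating}. Hypothetical toy data.
ratings = {
    "u1": {"A": 5, "B": 3, "C": 4},
    "u2": {"A": 4, "B": 4, "D": 2},
    "u3": {"B": 2, "C": 5, "D": 4},
}

def item_vectors(ratings):
    """Invert the ratings so each item maps to its vector of user ratings."""
    vecs = {}
    for user, items in ratings.items():
        for item, r in items.items():
            vecs.setdefault(item, {})[user] = r
    return vecs

def cosine(a, b):
    shared = set(a) & set(b)
    num = sum(a[u] * b[u] for u in shared)
    den = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return num / den if den else 0.0

def recommend(user, ratings, top_n=2):
    """Score unseen items by similarity-weighted ratings of the items the user rated."""
    vecs = item_vectors(ratings)
    seen = ratings[user]
    scores = {}
    for item in vecs:
        if item in seen:
            continue
        sim_sum = score = 0.0
        for rated_item, r in seen.items():
            sim = cosine(vecs[item], vecs[rated_item])
            score += sim * r
            sim_sum += sim
        scores[item] = score / sim_sum if sim_sum else 0.0
    return sorted(scores.items(), key=lambda kv: -kv[1])[:top_n]

print(recommend("u1", ratings))
```
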
  • Similarity Measure & Distance Measure:

    • Euclidean Distance
    • Manhattan Distance
    • Chebyshev Distance
    • Minkowski Distance
    • Standardized Euclidean Distance
    • Mahalanobis Distance
    • Cosine Similarity (see the sketch after this list)
    • Hamming Distance / Edit Distance
    • Jaccard Distance
    • Correlation Coefficient Distance
    • Information Entropy
    • KL Divergence (Kullback-Leibler Divergence / Relative Entropy)
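
A few of the measures above written out as plain Python functions on toy vectors and strings, just to show the formulas.

```python
import math

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def manhattan(a, b):
    return sum(abs(x - y) for x, y in zip(a, b))

def chebyshev(a, b):
    return max(abs(x - y) for x, y in zip(a, b))

def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def hamming(a, b):
    return sum(x != y for x, y in zip(a, b))

def jaccard_distance(a, b):
    a, b = set(a), set(b)
    return 1.0 - len(a & b) / len(a | b)

u, v = [1.0, 2.0, 3.0], [2.0, 4.0, 6.0]
print(euclidean(u, v), manhattan(u, v), chebyshev(u, v), cosine_similarity(u, v))
print(hamming("10110", "10011"), jaccard_distance("abc", "bcd"))
```
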
  • Feature Selection:

    • Mutual Information
    • DF (Document Frequency)
    • Information Gain (see the sketch after this list)
    • Chi-squared Test
    • Gini Coefficient
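
A short information-gain sketch: the entropy of the labels minus the expected entropy remaining after splitting on a feature; the toy feature/label arrays are made up.

```python
import math
from collections import Counter

def entropy(labels):
    counts = Counter(labels)
    total = len(labels)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

def information_gain(feature_values, labels):
    """Label entropy minus the weighted entropy of each subset after the split."""
    total = len(labels)
    remainder = 0.0
    for value in set(feature_values):
        subset = [l for f, l in zip(feature_values, labels) if f == value]
        remainder += (len(subset) / total) * entropy(subset)
    return entropy(labels) - remainder

# Toy data: the feature "windy" vs. a binary class label.
windy = ["yes", "yes", "no", "no", "no", "yes"]
label = [0, 0, 1, 1, 1, 0]
print(information_gain(windy, label))  # 1.0 here: the feature separates the classes perfectly
```
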
  • Outlier Detection:

    • Statistic-based (see the sketch after this list)
    • Distance-based
    • Density-based
    • Clustering-based.
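
A minimal statistic-based outlier sketch using the z-score rule (flag points more than a chosen number of standard deviations from the mean); the threshold and toy data are illustrative.

```python
def zscore_outliers(values, threshold=3.0):
    """Flag points more than `threshold` standard deviations from the mean."""
    n = len(values)
    mean = sum(values) / n
    std = (sum((x - mean) ** 2 for x in values) / n) ** 0.5
    return [x for x in values if std and abs(x - mean) / std > threshold]

data = [10.1, 9.8, 10.0, 10.3, 9.9, 10.2, 25.0]  # 25.0 is the injected outlier
print(zscore_outliers(data, threshold=2.0))       # -> [25.0]
```
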
  • Learning to Rank:

    • Pointwise: McRank;
    • Pairwise: Ranking SVM, RankNet, FRank, RankBoost (see the sketch after this list);
    • Listwise: AdaRank, SoftRank, LambdaMART.
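
A minimal pairwise learning-to-rank sketch in the Ranking-SVM spirit: a linear scoring function is updated with SGD whenever a sampled pair of documents violates a hinge margin. The synthetic features, the hidden "true" relevance, and the hyperparameters are all toy assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 4))                    # document feature vectors
relevance = X @ np.array([1.0, -2.0, 0.5, 0.0])  # hidden "true" relevance scores
w = np.zeros(4)                                  # linear scoring weights to learn

lr = 0.01
for _ in range(2000):
    i, j = rng.integers(0, len(X), size=2)
    if relevance[i] == relevance[j]:
        continue
    # Order the pair so that xi should outrank xj.
    xi, xj = (X[i], X[j]) if relevance[i] > relevance[j] else (X[j], X[i])
    if w @ xi - w @ xj < 1.0:                    # hinge margin violated: update
        w += lr * (xi - xj)

order = np.argsort(-(X @ w))                     # rank documents by learned score
print("top-5 documents by learned score:", order[:5])
```
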
