proxy.golang.org : github.com/e-gun/nlp : v0.0.0-20230418221101-577c2209ffcc
Package nlp provides implementations of selected machine learning algorithms for natural language processing of text corpora. The primary focus is the statistical semantics of plain-text documents supporting semantic analysis and retrieval of semantically similar documents. The package makes use of the Gonum (http://http//www.gonum.org/) library for linear algebra and scientific computing with some inspiration taken from Python's scikit-learn (http://scikit-learn.org/stable/) and Gensim(https://radimrehurek.com/gensim/) The primary intended use case is to support document input as text strings encoded as a matrix of numerical feature vectors called a `term document matrix`. Each column in the matrix corresponds to a document in the corpus and each row corresponds to a unique term occurring in the corpus. The individual elements within the matrix contain the frequency with which each term occurs within each document (referred to as `term frequency`). Whilst textual data from document corpora are the primary intended use case, the algorithms can be used with other types of data from other sources once encoded (vectorised) into a suitable matrix e.g. image data, sound data, users/products, etc. These matrices can be processed and manipulated through the application of additional transformations for weighting features, identifying relationships or optimising the data for analysis, information retrieval and/or predictions. Typically the algorithms in this package implement one of three primary interfaces: One of the implementations of Vectoriser is Pipeline which can be used to wire together pipelines composed of a Vectoriser and one or more Transformers arranged in serial so that the output from each stage forms the input of the next. This can be used to construct a classic LSI (Latent Semantic Indexing) pipeline (vectoriser -> TF.IDF weighting -> Truncated SVD): Whilst they take different inputs, both Vectorisers and Transformers have 3 primary methods:
Registry -
Documentation -
Download -
JSON -
codemeta.json
purl: pkg:golang/github.com/e-gun/nlp@v0.0.0-20230418221101-577c2209ffcc
Published:
Indexed:
- github.com/armon/go-radix v1.0.0
- github.com/e-gun/sparse v0.0.0-20230418220937-07063da15582
- github.com/spaolacci/murmur3 v1.1.0
- golang.org/x/exp v0.0.0-20230418202329-0354be287a23
- gonum.org/v1/gonum v0.12.0