An open API service providing package, version and dependency metadata of many open source software ecosystems and registries.

Top 2.0% on proxy.golang.org
Top 1.6% dependent packages on proxy.golang.org
Top 1.5% dependent repos on proxy.golang.org
Top 3.7% forks on proxy.golang.org
Top 0.6% docker downloads on proxy.golang.org

proxy.golang.org : github.com/james-bowman/nlp

Package nlp provides implementations of selected machine learning algorithms for natural language processing of text corpora. The primary focus is the statistical semantics of plain-text documents, supporting semantic analysis and retrieval of semantically similar documents. The package makes use of the Gonum (http://www.gonum.org/) library for linear algebra and scientific computing, with some inspiration taken from Python's scikit-learn (http://scikit-learn.org/stable/) and Gensim (https://radimrehurek.com/gensim/).

The primary intended use case is to support document input as text strings encoded as a matrix of numerical feature vectors called a `term document matrix`. Each column in the matrix corresponds to a document in the corpus and each row corresponds to a unique term occurring in the corpus. The individual elements within the matrix contain the frequency with which each term occurs within each document (referred to as `term frequency`). Whilst textual data from document corpora are the primary intended use case, the algorithms can be used with other types of data from other sources once encoded (vectorised) into a suitable matrix, e.g. image data, sound data, users/products, etc.

These matrices can be processed and manipulated through the application of additional transformations for weighting features, identifying relationships or optimising the data for analysis, information retrieval and/or predictions.

Typically the algorithms in this package implement one of three primary interfaces:

One of the implementations of Vectoriser is Pipeline, which can be used to wire together pipelines composed of a Vectoriser and one or more Transformers arranged in serial so that the output from each stage forms the input of the next. This can be used to construct a classic LSI (Latent Semantic Indexing) pipeline (vectoriser -> TF.IDF weighting -> Truncated SVD):

Whilst they take different inputs, both Vectorisers and Transformers have 3 primary methods:
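To make the term document matrix described above concrete, the following is a minimal standard-library sketch of how such a matrix could be built. It does not use the nlp package's own API: the function name `termDocumentMatrix`, the whitespace tokenisation, the lower-casing and the alphabetical row ordering are all assumptions made for illustration.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// termDocumentMatrix is a hypothetical helper (not part of the nlp package).
// It builds a dense term-document matrix from raw documents: rows correspond
// to unique terms (sorted alphabetically), columns correspond to documents,
// and each element holds the term frequency of that term in that document.
func termDocumentMatrix(docs []string) (vocab []string, matrix [][]float64) {
	index := map[string]int{}
	tokenised := make([][]string, len(docs))

	// First pass: tokenise on whitespace and collect the vocabulary.
	for i, d := range docs {
		tokenised[i] = strings.Fields(strings.ToLower(d))
		for _, t := range tokenised[i] {
			if _, seen := index[t]; !seen {
				index[t] = 0 // placeholder, real row index assigned below
				vocab = append(vocab, t)
			}
		}
	}

	// Assign each term a stable row index.
	sort.Strings(vocab)
	for row, t := range vocab {
		index[t] = row
	}

	// Second pass: count occurrences of each term per document.
	matrix = make([][]float64, len(vocab))
	for r := range matrix {
		matrix[r] = make([]float64, len(docs))
	}
	for col, terms := range tokenised {
		for _, t := range terms {
			matrix[index[t]][col]++
		}
	}
	return vocab, matrix
}

func main() {
	docs := []string{
		"the quick brown fox",
		"the lazy dog",
		"the fox and the dog",
	}
	vocab, m := termDocumentMatrix(docs)
	for r, term := range vocab {
		fmt.Printf("%-6s %v\n", term, m[r])
	}
}
```

Each printed row is one term and each entry in it is that term's frequency in the corresponding document. A matrix of this shape is what later weighting or dimensionality-reduction stages would take as input.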

Registry - Source - Documentation - JSON - codemeta.json
purl: pkg:golang/github.com/james-bowman/nlp
Keywords: feature-hash, go, golang, latent-dirichlet-allocation, latent-semantic-analysis, latent-semantic-indexing, lda, locality-sensitive-hashing, lsa, lsh, lsi, machine-learning, natural-language-processing, nlp, random-indexing, random-projections, simhash, singular-value-decomposition, svd, tf-idf
License: MIT
Latest release: over 4 years ago
First release: over 4 years ago
Namespace: github.com/james-bowman
Dependent packages: 11
Dependent repositories: 12
Stars: 390 on GitHub
Forks: 44 on GitHub
Docker dependents: 6
Docker downloads: 965,224
See more repository details: repos.ecosyste.ms
Last synced: 27 days ago
