R-universe - nalimilan (Milan Bouchet-Valat)

SnowballC - Snowball Stemmers Based on the C 'libstemmer' UTF-8 Library

An R interface to the C 'libstemmer' library that implements Porter's word stemming algorithm for collapsing words to a common root to aid comparison of vocabulary. Currently supported languages are Arabic, Basque, Catalan, Danish, Dutch, English, Finnish, French, German, Greek, Hindi, Hungarian, Indonesian, Irish, Italian, Lithuanian, Nepali, Norwegian, Portuguese, Romanian, Russian, Spanish, Swedish, Tamil and Turkish.

Last updated

text-mining

12.22 score 30 stars 179 dependents 5.1k scripts 41k downloads

R2HTML - HTML Exportation for R Objects

Includes HTML function and methods to write in an HTML file. Thus, making HTML reports is easy. Includes a function that allows redirection on the fly, which appears to be very useful for teaching purpose, as the student can keep a copy of the produced output to keep all that he did during the course. Package comes with a vignette describing how to write HTML reports for statistical analysis. Finally, a driver for 'Sweave' allows to parse HTML flat files containing R code and to automatically write the corresponding outputs (tables and graphs).

Last updated

7.70 score 3 stars 18 dependents 427 scripts 4.0k downloads

logmult - Log-Multiplicative Models, Including Association Models

Functions to fit log-multiplicative models using 'gnm', with support for convenient printing, plots, and jackknife/bootstrap standard errors. For complex survey data, models can be fitted from design objects from the 'survey' package. Currently supported models include UNIDIFF (Erikson & Goldthorpe, 1992), a.k.a. log-multiplicative layer effect model (Xie, 1992) <doi:10.2307/2096242>, and several association models: Goodman (1979) <doi:10.2307/2286971> row-column association models of the RC(M) and RC(M)-L families with one or several dimensions; two skew-symmetric association models proposed by Yamaguchi (1990) <doi:10.2307/271086> and by van der Heijden & Mooijaart (1995) <doi:10.1177/0049124195024001002> Functions allow computing the intrinsic association coefficient (see Bouchet-Valat (2022) <doi:10.1177/0049124119852389>) and the Altham (1970) index <doi:10.1111/j.2517-6161.1970.tb00816.x>, including via the Bayes shrinkage estimator proposed by Zhou (2015) <doi:10.1177/0081175015570097>; and the RAS/IPF/Deming-Stephan algorithm.

Last updated

log-linear-modelmodellingstatistics

5.74 score 4 stars 106 scripts 2.6k downloads

tm.plugin.factiva - Import Articles from 'Factiva' Using the 'tm' Text Mining Framework

Provides a 'tm' Source to create corpora from articles exported from the Dow Jones 'Factiva' content provider as XML or HTML files. It is able to read both text content and meta-data information (including source, date, title, author, subject, geographical coverage, company, industry, and various provider-specific fields).

Last updated

text-mining

4.83 score 30 stars 1 dependents 15 scripts 340 downloads

tm.plugin.lexisnexis - Import Articles from 'LexisNexis' Using the 'tm' Text Mining Framework

Provides a 'tm' Source to create corpora from articles exported from the 'LexisNexis' content provider as HTML files. It is able to read both text content and meta-data information (including source, date, title, author and pages). Note that the file format is highly unstable: there is no warranty that this package will work for your corpus, and you may have to adjust the code to adapt it to your particular format.

Last updated

text-mining

4.80 score 30 stars 1 dependents 14 scripts 416 downloads

tm.plugin.europresse - Import Articles from 'Europresse' Using the 'tm' Text Mining Framework

Provides a 'tm' Source to create corpora from articles exported from the 'Europresse' content provider as HTML files. It is able to read both text content and meta-data information (including source, date, title, author and pages).

Last updated

text-mining

4.65 score 30 stars 1 dependents 7 scripts 309 downloads

tm.plugin.alceste - Import Texts from Files in the 'Alceste' Format Using the 'tm' Text Mining Framework

Provides a 'tm' Source to create corpora from a corpus prepared in the format used by the 'Alceste' application (i.e. a single text file with inline meta-data). It is able to import both text contents and meta-data (starred) variables.

Last updated

text-mining

4.65 score 30 stars 1 dependents 7 scripts 334 downloads

R.temis - Integrated Text Mining Solution

An integrated solution to perform a series of text mining tasks such as importing and cleaning a corpus, and analyses like terms and documents counts, lexical summary, terms co-occurrences and documents similarity measures, graphs of terms, correspondence analysis and hierarchical clustering. Corpora can be imported from spreadsheet-like files, directories of raw text files, as well as from 'Dow Jones Factiva', 'LexisNexis', 'Europresse' and 'Alceste' files.

Last updated

text-mining

4.59 score 30 stars 26 scripts 206 downloads

RcmdrPlugin.temis - Graphical Integrated Text Mining Solution

An 'R Commander' plug-in providing an integrated solution to perform a series of text mining tasks such as importing and cleaning a corpus, and analyses like terms and documents counts, vocabulary tables, terms co-occurrences and documents similarity measures, time series analysis, correspondence analysis and hierarchical clustering. Corpora can be imported from spreadsheet-like files, directories of raw text files, as well as from 'Dow Jones Factiva', 'LexisNexis', 'Europresse' and 'Alceste' files.

Last updated

1.00 score 9 scripts 432 downloads