SnowballC - Snowball Stemmers Based on the C 'libstemmer' UTF-8 Library
An R interface to the C 'libstemmer' library that implements Porter's word stemming algorithm for collapsing words to a common root to aid comparison of vocabulary. Currently supported languages are Arabic, Basque, Catalan, Danish, Dutch, English, Finnish, French, German, Greek, Hindi, Hungarian, Indonesian, Irish, Italian, Lithuanian, Nepali, Norwegian, Portuguese, Romanian, Russian, Spanish, Swedish, Tamil and Turkish.
Last updated 5 days ago
text-mining
12.72 score 28 stars 170 dependents 4.4k scripts 56k downloads
R2HTML - HTML Exportation for R Objects
Includes an HTML function and methods for writing R objects to an HTML file, which makes producing HTML reports easy. Also includes a function that redirects output on the fly, which is useful for teaching because students can keep a copy of everything they produced during a course. The package comes with a vignette describing how to write HTML reports for statistical analysis. Finally, a driver for 'Sweave' parses HTML flat files containing R code and automatically writes the corresponding outputs (tables and graphs).
Last updated 10 months ago
7.47 score 3 stars 18 dependents 288 scripts 3.5k downloads
tm.plugin.factiva - Import Articles from 'Factiva' Using the 'tm' Text Mining Framework
Provides a 'tm' Source to create corpora from articles exported from the Dow Jones 'Factiva' content provider as XML or HTML files. It is able to read both text content and meta-data information (including source, date, title, author, subject, geographical coverage, company, industry, and various provider-specific fields).
Last updated 5 days ago
text-mining
5.14 score 28 stars 1 dependents 11 scripts 320 downloads
tm.plugin.alceste - Import Texts from Files in the 'Alceste' Format Using the 'tm' Text Mining Framework
Provides a 'tm' Source to create corpora from a corpus prepared in the format used by the 'Alceste' application (i.e. a single text file with inline meta-data). It is able to import both text contents and meta-data (starred) variables.
Last updated 5 days ago
text-mining
5.10 score 28 stars 1 dependents 5 scripts 419 downloads
tm.plugin.lexisnexis - Import Articles from 'LexisNexis' Using the 'tm' Text Mining Framework
Provides a 'tm' Source to create corpora from articles exported from the 'LexisNexis' content provider as HTML files. It is able to read both text content and meta-data information (including source, date, title, author and pages). Note that the file format is highly unstable: there is no warranty that this package will work for your corpus, and you may have to adjust the code to adapt it to your particular format.
Last updated 5 days ago
text-mining
5.10 score 28 stars 1 dependents 9 scripts 267 downloads
tm.plugin.europresse - Import Articles from 'Europresse' Using the 'tm' Text Mining Framework
Provides a 'tm' Source to create corpora from articles exported from the 'Europresse' content provider as HTML files. It is able to read both text content and meta-data information (including source, date, title, author and pages).
Last updated 5 days ago
text-mining
5.10 score 28 stars 1 dependents 256 downloads
R.temis - Integrated Text Mining Solution
An integrated solution for text mining tasks such as importing and cleaning a corpus, and for analyses such as term and document counts, lexical summaries, term co-occurrences and document similarity measures, graphs of terms, correspondence analysis and hierarchical clustering. Corpora can be imported from spreadsheet-like files, directories of raw text files, as well as from 'Dow Jones Factiva', 'LexisNexis', 'Europresse' and 'Alceste' files.
Last updated 5 days ago
text-mining
5.00 score 28 stars 24 scripts 213 downloads