SnowballC - Snowball Stemmers Based on the C 'libstemmer' UTF-8 Library
An R interface to the C 'libstemmer' library that implements Porter's word stemming algorithm for collapsing words to a common root to aid comparison of vocabulary. Currently supported languages are Arabic, Basque, Catalan, Danish, Dutch, English, Finnish, French, German, Greek, Hindi, Hungarian, Indonesian, Irish, Italian, Lithuanian, Nepali, Norwegian, Portuguese, Romanian, Russian, Spanish, Swedish, Tamil and Turkish.
Last updated 9 months ago
text-mining
12.11 score 26 stars 164 packages 4.3k scripts 47k downloads
R2HTML - HTML Exportation for R Objects
Includes the HTML function and methods for writing R objects to an HTML file, which makes producing HTML reports easy. Also includes a function for redirecting output on the fly, which is useful for teaching, since students can keep a copy of all the output they produced during a course. The package comes with a vignette describing how to write HTML reports for statistical analysis. Finally, a driver for 'Sweave' parses HTML flat files containing R code and automatically writes the corresponding outputs (tables and graphs).
Last updated 5 months ago
7.59 score 3 stars 18 packages 330 scripts 4.1k downloads
tm.plugin.factiva - Import Articles from 'Factiva' Using the 'tm' Text Mining Framework
Provides a 'tm' Source to create corpora from articles exported from the Dow Jones 'Factiva' content provider as XML or HTML files. It is able to read both text content and meta-data information (including source, date, title, author, subject, geographical coverage, company, industry, and various provider-specific fields).
Last updated 9 months ago
text-mining
4.67 score 26 stars 1 packages 12 scripts 270 downloads
tm.plugin.alceste - Import Texts from Files in the 'Alceste' Format Using the 'tm' Text Mining Framework
Provides a 'tm' Source to create corpora from a corpus prepared in the format used by the 'Alceste' application (i.e. a single text file with inline meta-data). It is able to import both text contents and meta-data (starred) variables.
Last updated 9 months ago
text-mining
4.59 score 26 stars 1 packages 5 scripts 214 downloads
tm.plugin.lexisnexis - Import Articles from 'LexisNexis' Using the 'tm' Text Mining Framework
Provides a 'tm' Source to create corpora from articles exported from the 'LexisNexis' content provider as HTML files. It is able to read both text content and meta-data information (including source, date, title, author and pages). Note that the file format is highly unstable: there is no warranty that this package will work for your corpus, and you may have to adjust the code to adapt it to your particular format.
Last updated 9 months ago
text-mining
4.59 score 26 stars 1 packages 9 scripts 254 downloads
R.temis - Integrated Text Mining Solution
An integrated solution for performing a series of text mining tasks, such as importing and cleaning a corpus, and analyses such as term and document counts, lexical summary, term co-occurrences and document similarity measures, graphs of terms, correspondence analysis and hierarchical clustering. Corpora can be imported from spreadsheet-like files, directories of raw text files, as well as from 'Dow Jones Factiva', 'LexisNexis', 'Europresse' and 'Alceste' files.
Last updated 9 months ago
text-mining
4.49 score 26 stars 24 scripts 234 downloads
tm.plugin.europresse - Import Articles from 'Europresse' Using the 'tm' Text Mining Framework
Provides a 'tm' Source to create corpora from articles exported from the 'Europresse' content provider as HTML files. It is able to read both text content and meta-data information (including source, date, title, author and pages).
Last updated 8 years ago
2.48 score 1 packages 5 scripts 288 downloads