SnowballC - Snowball Stemmers Based on the C 'libstemmer' UTF-8 Library
An R interface to the C 'libstemmer' library that implements Porter's word stemming algorithm for collapsing words to a common root to aid comparison of vocabulary. Currently supported languages are Arabic, Basque, Catalan, Danish, Dutch, English, Finnish, French, German, Greek, Hindi, Hungarian, Indonesian, Irish, Italian, Lithuanian, Nepali, Norwegian, Portuguese, Romanian, Russian, Spanish, Swedish, Tamil and Turkish.
Last updated 12 months ago
text-mining
11.96 score 27 stars 167 dependents 4.5k scripts 30k downloads
R2HTML - HTML Exportation for R Objects
Includes the HTML function and methods for writing R objects to an HTML file, which makes producing HTML reports easy. Also includes a function for redirecting output on the fly, which is useful for teaching because students can keep a copy of all the output they produced during a course. The package comes with a vignette describing how to write HTML reports for statistical analysis. Finally, a driver for 'Sweave' parses HTML flat files containing R code and automatically writes the corresponding outputs (tables and graphs).
Last updated 8 months ago
7.38 score 3 stars 18 dependents 288 scripts 2.9k downloads
tm.plugin.factiva - Import Articles from 'Factiva' Using the 'tm' Text Mining Framework
Provides a 'tm' Source to create corpora from articles exported from the Dow Jones 'Factiva' content provider as XML or HTML files. It is able to read both text content and meta-data information (including source, date, title, author, subject, geographical coverage, company, industry, and various provider-specific fields).
Last updated 12 months ago
text-mining
4.65 score 27 stars 1 dependents 11 scripts 306 downloads
tm.plugin.lexisnexis - Import Articles from 'LexisNexis' Using the 'tm' Text Mining Framework
Provides a 'tm' Source to create corpora from articles exported from the 'LexisNexis' content provider as HTML files. It is able to read both text content and meta-data information (including source, date, title, author and pages). Note that the file format is highly unstable: there is no warranty that this package will work for your corpus, and you may have to adjust the code to adapt it to your particular format.
Last updated 12 months ago
text-mining
4.61 score 27 stars 1 dependents 9 scripts 279 downloads
tm.plugin.alceste - Import Texts from Files in the 'Alceste' Format Using the 'tm' Text Mining Framework
Provides a 'tm' Source to create corpora from a corpus prepared in the format used by the 'Alceste' application (i.e. a single text file with inline meta-data). It is able to import both text contents and meta-data (starred) variables.
Last updated 12 months ago
text-mining
4.61 score 27 stars 1 dependents 5 scripts 259 downloads
R.temis - Integrated Text Mining Solution
An integrated solution for a series of text mining tasks, such as importing and cleaning a corpus, and for analyses such as term and document counts, lexical summaries, term co-occurrences and document similarity measures, graphs of terms, correspondence analysis and hierarchical clustering. Corpora can be imported from spreadsheet-like files, directories of raw text files, as well as from 'Dow Jones Factiva', 'LexisNexis', 'Europresse' and 'Alceste' files.
Last updated 12 months ago
text-mining
4.51 score 27 stars 24 scripts 197 downloads
tm.plugin.europresse - Import Articles from 'Europresse' Using the 'tm' Text Mining Framework
Provides a 'tm' Source to create corpora from articles exported from the 'Europresse' content provider as HTML files. It is able to read both text content and meta-data information (including source, date, title, author and pages).
Last updated 8 years ago
2.48 score 1 dependents 5 scripts 285 downloads