ReadWriteHack

Pattern: A Bundle of Data Mining Modules for Python

Pattern is a collection of open source (BSD license) web mining modules for Python from the Computational Linguistics and Psycholinguistics Research Center. It contains tools for data retrieval, text analysis and data visualization and comes with over 30 sample scripts.

Pattern consists of six main modules:

  • pattern.web: A toolkit that includes APIs for various Web services, including Google, Gmail, Bing, Twitter Wikipedia and Flickr. It has its own HTML parser and Web spider.
  • pattern.table: A module for working with tabular data, used for storing data from the pattern.web module.
  • pattern.en: A natural language processing toolkit for English.pattern.search: A module containing a search algorithm.
  • pattern.vector: A module containing various tools for analyzing the text of a document.
  • pattern.graph: A module for data visualization using Canvas.

ReadWriteWeb encourages comments, but please remember: Keep it nice, keep it clean, and avoid promotional comments. We do pre-moderate some comments with links. For more information, please read our full comment policy.
blog comments powered by Disqus
Recommended Story
RWW SPONSORS



RWW PARTNERS