Term Extraction in PHP

The new version of the term extraction tool on fivefilters.org is now in PHP.

Read the blog post explaining what’s new.

For anyone looking for a simple way to carry out term extraction on English text using PHP, here’s a snippet using the PHP port of Topia’s Term Extractor:

<?php
// Load the term extractor class (path is relative to this script)
require 'TermExtractor/TermExtractor.php';

$text = 'Politics is the shadow cast on society by big business';

$extractor = new TermExtractor();
$terms = $extractor->extract($text);

// We're outputting results in plain text...
header('Content-Type: text/plain; charset=UTF-8');

// Loop through extracted terms and print each term on a new line
foreach ($terms as $term_info) {
  // index 0: term
  // index 1: number of occurrences in text
  // index 2: word count
  list($term, $occurrence, $word_count) = $term_info;
  echo "$term\n";
}
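
The loop above only prints the term, but each entry also carries the occurrence count and word count. As a rough sketch of how those extra fields could be used, the following sorts terms by frequency and picks out multi-word terms. It assumes only the array shape described in the comments above (index 0: term, index 1: occurrences, index 2: word count). The `$terms` data here is made up for illustration and is not real extractor output; the variable `$multi_word` is likewise just an illustrative name.

```php
<?php
// Hypothetical sample data in the same shape as the $terms array above:
// [term, number of occurrences, word count]. These values are invented
// for illustration only.
$terms = [
    ['politics', 1, 1],
    ['big business', 3, 2],
    ['society', 2, 1],
];

// Sort by number of occurrences (index 1), highest first
usort($terms, function ($a, $b) {
    return $b[1] - $a[1];
});

// Keep only terms made up of more than one word (index 2)
$multi_word = array_values(array_filter($terms, function ($term_info) {
    return $term_info[2] > 1;
}));

// Print each term with its occurrence count
foreach ($terms as $term_info) {
    list($term, $occurrence, $word_count) = $term_info;
    echo "$term ($occurrence)\n";
}
```

With real extractor output, the same sorting and filtering should apply unchanged, provided the entries keep the three-element layout shown in the snippet above.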