<?xml version="1.0" encoding="utf-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Keyvan Minoukadeh</title>
	<atom:link href="http://www.keyvan.net/feed/" rel="self" type="application/rss+xml" />
	<link>http://www.keyvan.net</link>
	<description></description>
	<lastBuildDate>Tue, 26 Mar 2013 15:16:05 +0000</lastBuildDate>
	<language>en-US</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	
		<item>
		<title></title>
		<link>http://www.keyvan.net/2013/03/first-aid-kit/</link>
		<comments>http://www.keyvan.net/2013/03/first-aid-kit/#comments</comments>
		<pubDate>Tue, 26 Mar 2013 15:14:05 +0000</pubDate>
		<dc:creator>Keyvan</dc:creator>
				<category><![CDATA[General]]></category>

		<guid isPermaLink="false">http://www.keyvan.net/?p=1907</guid>
		<description><![CDATA[]]></description>
				<content:encoded><![CDATA[<p><iframe width="290" height="315" src="http://www.youtube.com/embed/At4HhgHtBHk" frameborder="0" allowfullscreen></iframe></p>
]]></content:encoded>
			<wfw:commentRss>http://www.keyvan.net/2013/03/first-aid-kit/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Term Extraction in PHP</title>
		<link>http://www.keyvan.net/2013/01/term-extraction-in-php/</link>
		<comments>http://www.keyvan.net/2013/01/term-extraction-in-php/#comments</comments>
		<pubDate>Sun, 20 Jan 2013 15:01:36 +0000</pubDate>
		<dc:creator>Keyvan</dc:creator>
				<category><![CDATA[Code]]></category>

		<guid isPermaLink="false">http://www.keyvan.net/?p=1902</guid>
		<description><![CDATA[The new version of the term extraction tool on fivefilters.org is now in PHP. Read the blog post explaining what&#8217;s new. For anyone looking for a simple way to carry out term extraction on English text using PHP, here&#8217;s a snippet using the PHP port of Topia&#8217;s Term Extractor: require 'TermExtractor/TermExtractor.php'; $text = 'Politics is [...]]]></description>
				<content:encoded><![CDATA[<p>The new version of the <a href="http://fivefilters.org/term-extraction/">term extraction tool</a> on fivefilters.org is now in PHP.</p>
<p>Read the <a href="http://blog.fivefilters.org/post/40840393725/term-extraction">blog post</a> explaining what&#8217;s new.</p>
<p>For anyone looking for a simple way to carry out term extraction on English text using PHP, here&#8217;s a snippet using the <a href="http://code.fivefilters.org/term-extraction">PHP port</a> of <a href="http://pypi.python.org/pypi/topia.termextract/">Topia&#8217;s Term Extractor</a>:</p>
<pre class="brush: php">
require 'TermExtractor/TermExtractor.php';

$text = 'Politics is the shadow cast on society by big business';

$extractor = new TermExtractor();
$terms = $extractor->extract($text);

// We're outputting results in plain text...
header('Content-Type: text/plain; charset=UTF-8');

// Loop through extracted terms and print each term on a new line
foreach ($terms as $term_info) {
  // index 0: term
  // index 1: number of occurrences in text
  // index 2: word count
  list($term, $occurrence, $word_count) = $term_info;
  echo "$term\n";
}
</pre>
]]></content:encoded>
			<wfw:commentRss>http://www.keyvan.net/2013/01/term-extraction-in-php/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Chris Hedges: Assault on Gaza is Not a War, it is Murder</title>
		<link>http://www.keyvan.net/2012/11/chris-hedges-assault-on-gaza-is-not-a-war-it-is-murder/</link>
		<comments>http://www.keyvan.net/2012/11/chris-hedges-assault-on-gaza-is-not-a-war-it-is-murder/#comments</comments>
		<pubDate>Sun, 18 Nov 2012 14:31:38 +0000</pubDate>
		<dc:creator>Keyvan</dc:creator>
				<category><![CDATA[General]]></category>

		<guid isPermaLink="false">http://www.keyvan.net/?p=1890</guid>
		<description><![CDATA[via Jonathan Cook]]></description>
				<content:encoded><![CDATA[<p><iframe width="560" height="315" src="http://www.youtube.com/embed/z7kBN9Me4Cs" frameborder="0" allowfullscreen></iframe></p>
<p>via <a href="http://www.jonathan-cook.net/">Jonathan Cook</a></p>
]]></content:encoded>
			<wfw:commentRss>http://www.keyvan.net/2012/11/chris-hedges-assault-on-gaza-is-not-a-war-it-is-murder/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>PHP DOMDocument replace DOMElement contents with HTML string</title>
		<link>http://www.keyvan.net/2012/11/php-domdocument-replace-domelement-child-with-html-string/</link>
		<comments>http://www.keyvan.net/2012/11/php-domdocument-replace-domelement-child-with-html-string/#comments</comments>
		<pubDate>Wed, 14 Nov 2012 17:44:23 +0000</pubDate>
		<dc:creator>Keyvan</dc:creator>
				<category><![CDATA[Code]]></category>

		<guid isPermaLink="false">http://www.keyvan.net/?p=1880</guid>
		<description><![CDATA[This is another StackOverflow answer I&#8217;m moving over to my blog. AWinter asked: Using PHP I&#8217;m attempting to take an HTML string passed from a WYSIWYG editor and replace the children of an element inside of a preloaded HTML document with the new HTML. So far I&#8217;m loading the document identifying the element I want [...]]]></description>
				<content:encoded><![CDATA[<p><em>This is another StackOverflow answer I&#8217;m moving over to my blog.</em></p>
<p>AWinter asked:</p>
<blockquote><p>
Using PHP I&#8217;m attempting to take an HTML string passed from a WYSIWYG editor and replace the children of an element inside of a preloaded HTML document with the new HTML.</p>
<p>So far I&#8217;m loading the document identifying the element I want to change by ID but the process to convert an HTML to something that can be placed inside a DOMElement is eluding me.</p>
<pre>
$doc = new DOMDocument();
$doc->loadHTML($html);

$element = $doc->getElementById($item_id);
if(isset($element)){
    //Remove the old children from the element
    while($element->childNodes->length){
        $element->removeChild($element->firstChild);
    }

    //Need to build the new children from $html_string and append to $element
}
</pre>
</blockquote>
<p>My answer:</p>
<p>If the HTML string can be parsed as XML, you can do this (after clearing the element of all child nodes):</p>
<pre class="brush: php">
$fragment = $doc->createDocumentFragment();
$fragment->appendXML($html_string);
$element->appendChild($fragment);</pre>
<p>If <code>$html_string</code> cannot be parsed as XML, <code>appendXML()</code> will fail. In that case you&#8217;ll have to use <code>loadHTML()</code>, which is less strict, but it will add elements around the fragment which you will have to strip.</p>
<p>Unlike PHP, Javascript has the <code>innerHTML</code> property which allows you to do this very easily. I needed something like it for a project so I extended PHP&#8217;s <code>DOMElement</code> to include <a href="http://www.keyvan.net/2010/07/javascript-like-innerhtml-access-in-php/">Javascript-like <code>innerHTML</code> access</a>.</p>
<p>With it you can access the <code>innerHTML</code> property and change it just as you would in Javascript:</p>
<pre class="brush: php">echo $element->innerHTML;
$element->innerHTML = '&lt;a href="http://example.org"&gt;example&lt;/a&gt;';</pre>
]]></content:encoded>
			<wfw:commentRss>http://www.keyvan.net/2012/11/php-domdocument-replace-domelement-child-with-html-string/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Clean up HTML on paste in CKEditor</title>
		<link>http://www.keyvan.net/2012/11/clean-up-html-on-paste-in-ckeditor/</link>
		<comments>http://www.keyvan.net/2012/11/clean-up-html-on-paste-in-ckeditor/#comments</comments>
		<pubDate>Tue, 13 Nov 2012 13:52:56 +0000</pubDate>
		<dc:creator>Keyvan</dc:creator>
				<category><![CDATA[Code]]></category>

		<guid isPermaLink="false">http://www.keyvan.net/?p=1873</guid>
		<description><![CDATA[We use CKEditor at FiveFilters.org for our PastePad service. The idea is to allow users to paste content that&#8217;s not currently publicly available on the web for processing with one of our web tools. This can be content that&#8217;s in a Word document, an email, or behind a paywall. CKEditor can automatically clean up HTML [...]]]></description>
				<content:encoded><![CDATA[<p>We use <a href="http://ckeditor.com">CKEditor</a> at <a href="http://fivefilters.org">FiveFilters.org</a> for our <a href="http://pastepad.fivefilters.org">PastePad</a> service. The idea is to allow users to paste content that&#8217;s not currently publicly available on the web for processing with one of our web tools. This can be content that&#8217;s in a Word document, an email, or behind a paywall.</p>
<p>CKEditor can automatically clean up HTML it identifies as coming from MS Word, but there&#8217;s no way to force cleanup on all pasted content. By default, HTML cleanup occurs in the following two cases:</p>
<ol>
<li>User clicks the &#8216;paste from word&#8217; toolbar icon</li>
<li>User pastes content copied from MS Word itself</li>
</ol>
<p>In the second case, CKEditor looks for signs of MS Word formatting. It does this by testing whatever you paste against the following regular expression:</p>
<p><code>/(class=\"?Mso|style=\"[^\"]*\bmso\-|w:WordDocument)/</code></p>
<p>If there&#8217;s a match, it will be cleaned up. Otherwise it will paste as normal.</p>
<p>I want to avoid editing core files, so my solution is simply to ensure that this regular expression always matches pasted content. Here&#8217;s what I&#8217;ve come up with:</p>
<pre class="brush: js; gutter: false">
CKEDITOR.on('instanceReady', function(ev) {
    ev.editor.on('paste', function(evt) {
        evt.data['html'] = '&lt;!--class="Mso"--&gt;'+evt.data['html'];
    }, null, null, 9);
});
</pre>
<p>I haven&#8217;t tested extensively, but this appears to work as expected (CKEditor 3.6.2). You can <a href="http://pastepad.fivefilters.org">try it out</a>.</p>
<p>The code registers a new listener for the paste event, just as the Paste from Word plugin does. When it receives the pasted HTML, it prepends an HTML comment containing one of the strings the Paste from Word plugin looks for. The listener has a priority of 9 so that it runs before the plugin&#8217;s own listener (default priority of 10), which triggers the actual cleaning.</p>
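<p>To see why the prepended comment is enough, the check can be reproduced outside the editor. The following is only a sketch: it copies the regular expression quoted earlier, while <code>looksLikeWord()</code> and <code>forceCleanup()</code> are hypothetical helper names that are not part of CKEditor.</p>
<pre class="brush: js; gutter: false">
// The detection pattern quoted earlier in this post
var wordPattern = /(class=\"?Mso|style=\"[^\"]*\bmso\-|w:WordDocument)/;

// Hypothetical helper: would this HTML be treated as coming from MS Word?
function looksLikeWord(html) {
    return wordPattern.test(html);
}

// Hypothetical helper mirroring the paste listener: prepend the marker comment
function forceCleanup(html) {
    return '&lt;!--class="Mso"--&gt;' + html;
}

var fromWord = '&lt;p class="MsoNormal"&gt;Hello&lt;/p&gt;';
var fromWeb = '&lt;p&gt;Hello&lt;/p&gt;';

console.log(looksLikeWord(fromWord));              // true
console.log(looksLikeWord(fromWeb));               // false
console.log(looksLikeWord(forceCleanup(fromWeb))); // true
</pre>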
<p>Note: I posted this solution on StackOverflow as an alternative to another solution, titled &#8220;CKEditor &#8211; use pastefromword filtering on all pasted content.&#8221; StackOverflow recently deleted some of my answers (and hid them from me) so I&#8217;m moving the rest of my meagre contributions over to my own blog.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.keyvan.net/2012/11/clean-up-html-on-paste-in-ckeditor/feed/</wfw:commentRss>
		<slash:comments>6</slash:comments>
		</item>
		<item>
		<title>Push to Kindle e-mail service</title>
		<link>http://www.keyvan.net/2012/10/push-to-kindle-email-service/</link>
		<comments>http://www.keyvan.net/2012/10/push-to-kindle-email-service/#comments</comments>
		<pubDate>Mon, 29 Oct 2012 00:14:16 +0000</pubDate>
		<dc:creator>Keyvan</dc:creator>
				<category><![CDATA[General]]></category>

		<guid isPermaLink="false">http://www.keyvan.net/?p=1837</guid>
		<description><![CDATA[Push to Kindle, FiveFilters.org&#8217;s web service for sending web articles to your Kindle, can now also be used by e-mail. The email service is aimed at iPad and iPhone users. Here&#8217;s a video showing you how to use it on your iPad or iPhone: Step by step On your device, load an article you&#8217;d like [...]]]></description>
				<content:encoded><![CDATA[<p><a href="http://fivefilters.org/kindle-it/">Push to Kindle</a>, FiveFilters.org&#8217;s web service for sending web articles to your Kindle, can now also be used by e-mail. The email service is aimed at iPad and iPhone users. </p>
<p>Here&#8217;s a video showing you how to use it on your iPad or iPhone:</p>
<p><iframe width="500" height="305" src="http://www.youtube.com/embed/u1asTphZ4Ls?rel=0" frameborder="0" allowfullscreen></iframe></p>
<h2>Step by step</h2>
<ol>
<li>On your device, load an article you&#8217;d like to send to your Kindle</li>
<li>Choose share page</li>
<li>In the list of options presented, select Mail</li>
<li>Enter your Kindle email address but instead of <tt>@kindle.com</tt>, enter <tt>@pushtokindle.com</tt></li>
<li>Send!</li>
</ol>
<p>Changing the ending to <tt>@pushtokindle.com</tt> in step 4 ensures our service processes the article first and then sends it to your Kindle account.</p>
<p>The first time you do this, you&#8217;ll receive an email from FiveFilters.org asking you to confirm the address you&#8217;re sending from. After confirming, you&#8217;ll have the opportunity to save your Push to Kindle email address in your contacts list to make future sending easier. (Simply typing &#8216;kin&#8217; into the To: field should show your Push to Kindle address as an option.)</p>
<p><img src="http://fivefilters.org/kindle-it/images/email-ipad-example.png" /><br />
If you own a 3G Kindle device and you want to make sure you will not be charged by Amazon, please send to @free.pushtokindle.com. (For the time being we are only sending to @free.kindle.com, but this might change in future.)</p>
<h2>Why an e-mail service?</h2>
<p>We already have a Push to Kindle <a href="https://play.google.com/store/apps/details?id=org.fivefilters.kindleit">Android app</a>. It adds &#8216;Push to Kindle&#8217; as an entry in your device&#8217;s share menu, so whenever you want to send a web article to your Kindle, you bring up the share menu and choose Push to Kindle.</p>
<p>We considered doing the same for iOS and other mobile devices, but decided to focus on email for two reasons:</p>
<ol>
<li>Unlike Android, iOS and Windows Phone operating systems do not yet allow apps to add entries to the share menu.</li>
<li>The share menu on most mobile devices does, however, include e-mail as an option.</li>
</ol>
<h2>Pricing</h2>
<p>The first 25 articles processed by our e-mail service are free. After that you&#8217;ll be asked to purchase credits, which allows us to maintain the service.</p>
<p>100 credits cost 1.5€ (around £1.20 or $2)</p>
<p>Each article sent uses 1 credit. You will receive an email notice when your credits are low.</p>
<p>Note: credits are linked to the email address you send from, not your Kindle address.</p>
<h2>Compared to Amazon&#8217;s email service</h2>
<p><a href="http://www.amazon.com/gp/sendtokindle/email">Amazon&#8217;s Send to Kindle email service</a> currently works by accepting documents as attachments to an email message.</p>
<p>Web articles you read online are usually not in a format that can be sent to your Kindle account directly. They need to be cleaned up and converted to a suitable format first. That&#8217;s what our Push to Kindle service does. We take care of extracting the content and converting the article to a suitable format for your Kindle. We then send the result as an attachment to your Kindle account.</p>
<h2>Bear in mind</h2>
<p>We&#8217;re working to integrate this service with our <a href="https://member.fivefilters.org/plans.php">sustainer membership</a>. Once that&#8217;s done this service will be free for new and existing sustainers.</p>
<p>All articles are currently considered equal: 1 credit = 1 article. In the future this may change. For example, in line with our goal to encourage use of non-corporate sources, we&#8217;ll be whitelisting many non-corporate sources so no credits will be used if you process articles from these sources. Conversely, we may deduct more credits for articles originating from corporate sources.</p>
<p>Please consider this an experimental service. Let us know if you experience any issues and we&#8217;ll be happy to help. Email help@fivefilters.org.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.keyvan.net/2012/10/push-to-kindle-email-service/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Push to Kindle supports sending to Duokan</title>
		<link>http://www.keyvan.net/2012/10/push-to-kindle-duokan/</link>
		<comments>http://www.keyvan.net/2012/10/push-to-kindle-duokan/#comments</comments>
		<pubDate>Thu, 04 Oct 2012 23:33:45 +0000</pubDate>
		<dc:creator>Keyvan</dc:creator>
				<category><![CDATA[General]]></category>

		<guid isPermaLink="false">http://www.keyvan.net/?p=1826</guid>
		<description><![CDATA[Our Push to Kindle service has been updated to enable delivery to iduokan.com addresses &#8212; a system similar to Amazon&#8217;s personal documents service but designed to work with the Duokan software. Note: Sending to @iduokan.com addresses has been enabled on our web app. The Android app has not yet been updated. Thanks to Daniel Żołopa [...]]]></description>
				<content:encoded><![CDATA[<p>Our <a href="http://fivefilters.org/kindle-it/">Push to Kindle</a> service has been updated to enable delivery to iduokan.com addresses &mdash; a system similar to Amazon&#8217;s personal documents service but designed to work with the <a href="http://wiki.mobileread.com/wiki/Duokan_Kindle">Duokan</a> software.</p>
<p>Note: Sending to @iduokan.com addresses has been enabled on our web app. The Android app has not yet been updated.</p>
<p>Thanks to Daniel Żołopa for testing.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.keyvan.net/2012/10/push-to-kindle-duokan/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Full-Text RSS 3.0</title>
		<link>http://www.keyvan.net/2012/09/full-text-rss-3/</link>
		<comments>http://www.keyvan.net/2012/09/full-text-rss-3/#comments</comments>
		<pubDate>Wed, 05 Sep 2012 14:03:37 +0000</pubDate>
		<dc:creator>Keyvan</dc:creator>
				<category><![CDATA[General]]></category>

		<guid isPermaLink="false">http://www.keyvan.net/?p=1769</guid>
		<description><![CDATA[Full-Text RSS 3.0 is now available. What is it? Full-Text RSS is a free software PHP application to help you extract content from web pages. It can extract content from a standard HTML page and return a 1-item feed or it can transform an existing feed into a full-text feed. It&#8217;s used primarily by news [...]]]></description>
				<content:encoded><![CDATA[<p><a href="http://fivefilters.org/content-only/">Full-Text RSS 3.0 is now available</a>.</p>
<h2>What is it?</h2>
<p>Full-Text RSS is a <a href="http://www.gnu.org/philosophy/free-sw.html">free software</a> PHP application to help you extract content from web pages. It can extract content from a standard HTML page and return a 1-item feed or it can transform an existing feed into a full-text feed.</p>
<p>It&#8217;s used primarily by news enthusiasts and developers. </p>
<p>It&#8217;s used by news enthusiasts who dislike partial web feeds &#8211; feeds which require them to read the full story on a different site, rather than their preferred application. Full-Text RSS can convert these feeds to full-text versions, allowing the reader to stay in his/her preferred environment to read the full story.</p>
<p>It&#8217;s used by developers building applications which need an article extraction component. It allows developers to retrieve and process only the content they&#8217;re interested in.</p>
<h2>Demo</h2>
<p><a href="http://fivefilters.org/content-only/" title="Full-Text RSS 3.0">Try it out</a> &#8211; enter a URL in the form and hit &#8216;Create Feed&#8217;.</p>
<h2>What&#8217;s new in 3.0</h2>
<h3>Extraction</h3>
<dl>
<dt>Multi-page support</dt>
<dd>Many web sites now split their articles into a number of pages. In earlier versions of Full-Text RSS we added support for retrieving the single-page view and extracting content from that page. For sites which do not offer such a single-page view, we can now follow the &#8216;next page&#8217; links and build up the full article page by page.</p>
<p>Multi-page support currently works by specifying a next_page_link in the site config file associated with the website you are extracting from.</p>
<p>Examples:</p>
<pre><code>next_page_link: //a[@id='next-page']
next_page_link: //a[contains(text(), 'Next page')]</code></pre>
</dd>
<dt>HTML5 parser: html5lib</dt>
<dd>
By default we still rely on PHP&#8217;s fast libxml parser. For sites where this proves problematic, you can now specify <a href="http://code.google.com/p/html5lib/">html5lib</a> &#8211; a PHP implementation of an HTML parser based on the HTML5 spec.</p>
<p>Example:</p>
<p><code>parser: html5lib</code>
</dd>
<dt>Better AJAX handling</dt>
<dd>
Full-Text RSS does not interpret any Javascript it comes across when fetching pages. To get at the content, we expect it to be marked up in HTML. Some sites have started relying on the user&#8217;s browser and its Javascript support to load page content. For pages which load content in this way, Google suggests that the publisher also offers the content in plain HTML so Google&#8217;s search engine crawlers can access it. <a href="https://developers.google.com/webmasters/ajax-crawling/docs/specification">Google&#8217;s spec</a> contains two possible triggers which will guide Google&#8217;s crawlers to the HTML version.</p>
<p>The first trigger appears in the URL. Such URLs are often called &#8216;hashbang&#8217; URLs. Example: https://twitter.com/#!/search-home</p>
<p>The second trigger can appear in the HTML header. Example: <code>&lt;meta name="fragment" content="!"&gt;</code></p>
<p>When encountered, these triggers will result in a new URL being generated, what Google terms an &#8216;Ugly URL&#8217;. The new URL will contain additional query string parameters to indicate to the server that the plain HTML version is being requested.</p>
<p>Earlier versions of Full-Text RSS looked for the first trigger (&#8216;hashbang&#8217; in the URL) but not the second trigger. Full-Text RSS 3.0 now handles both.
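</p>
<p>As a rough illustration of the two triggers, the sketch below rewrites a URL along the lines of Google&#8217;s spec, which names the added query string parameter <code>_escaped_fragment_</code>. The function names are hypothetical and the escaping is simplified; this is not the code Full-Text RSS itself uses.</p>
<pre class="brush: js">
// Simplified escaping: only the characters %, #, &, + and whitespace
function escapeFragment(fragment) {
    return fragment.replace(/[%#&+\s]/g, function (ch) {
        return encodeURIComponent(ch);
    });
}

// hasMetaFragment: true if the page contained the meta "fragment" trigger
function toUglyUrl(url, hasMetaFragment) {
    var bang = url.indexOf('#!');
    if (bang !== -1) {
        // First trigger: move everything after '#!' into the query string
        var base = url.slice(0, bang);
        var joiner = base.indexOf('?') === -1 ? '?' : '&';
        return base + joiner + '_escaped_fragment_=' + escapeFragment(url.slice(bang + 2));
    }
    if (hasMetaFragment) {
        // Second trigger: request the same URL with an empty parameter
        var sep = url.indexOf('?') === -1 ? '?' : '&';
        return url + sep + '_escaped_fragment_=';
    }
    return url; // no trigger, nothing to rewrite
}

console.log(toUglyUrl('https://twitter.com/#!/search-home', false));
// https://twitter.com/?_escaped_fragment_=/search-home
</pre>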
</dd>
<dt>Site config extraction patterns updated</dt>
<dd>
Site config files are used to fine-tune extraction where autodetection doesn&#8217;t always work. There are now over 700 site config files. Many old ones have been updated and new ones added.</p>
<p>We also now look for OpenGraph title and date elements.
</dd>
</dl>
<h3>Developers</h3>
<dl>
<dt>Cross-origin resource sharing (CORS) support</dt>
<dd>If Full-Text RSS is hosted on a different domain to your application, enabling CORS will allow your application to request JSON results from Full-Text RSS directly from the user&#8217;s browser, without being blocked by the browser&#8217;s <a href="http://en.wikipedia.org/wiki/Same_origin_policy">same origin policy</a>.</p>
<p>To enable CORS, look at <code>$options-&gt;cors</code> in the config file.</dd>
<dt>JSONP support</dt>
<dd>The old way of circumventing the browser&#8217;s same origin policy was to use JSONP. You can do this by requesting JSON (<code>&amp;format=json</code>) with an additional callback function (<code>&amp;format=json&amp;callback=functionName</code>).</dd>
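<dd>As an illustration of both request styles, the sketch below builds the request URLs a browser script might use. The endpoint address and the <code>url</code> parameter name are assumptions made for this example, so check your own installation for the actual values. Only <code>format=json</code> and <code>callback</code> come from the description above.
<pre class="brush: js">
// Build a request URL; pass callbackName to ask for JSONP instead of plain JSON
function buildRequestUrl(endpoint, articleUrl, callbackName) {
    var params = new URLSearchParams();
    params.set('url', articleUrl); // assumed parameter name
    params.set('format', 'json');
    if (callbackName) {
        params.set('callback', callbackName);
    }
    return endpoint + '?' + params.toString();
}

// Assumed address of a Full-Text RSS installation
var endpoint = 'http://example.org/full-text-rss/makefulltextfeed.php';

// With CORS enabled: plain JSON, requested directly by the browser script
console.log(buildRequestUrl(endpoint, 'http://example.com/story'));

// Without CORS: JSONP, loaded through a script element that calls handleResult()
console.log(buildRequestUrl(endpoint, 'http://example.com/story', 'handleResult'));
</pre>
</dd>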
<dt>Global site config</dt>
<dd>
The global site config accepts everything a regular site config file does, but it&#8217;s applied to all sites, whether or not a specific site config matches.</p>
<p>The global site config file should be named <code>global.txt</code> and placed inside the relevant <code>site_config/</code> subfolder.
</dd>
<dt>Site config merging</dt>
<dd>
Site config files are used to fine-tune extraction where autodetection doesn&#8217;t always work. </p>
<p>Previous versions of Full-Text RSS looked for site config files in the following order:</p>
<ol>
<li>URL hostname match or wildcard match in <code>site_config/custom/</code></li>
<li>URL hostname match or wildcard match in <code>site_config/standard/</code></li>
<li>fingerprint match (HTML fragment mapping to hostname) in <code>site_config/custom/</code></li>
<li>fingerprint match (HTML fragment mapping to hostname) in <code>site_config/standard/</code></li>
</ol>
<p>As soon as an entry was matched, we&#8217;d process it, return it, and stop looking.</p>
<p>In Full-Text RSS 3.0, we follow the same order, but continue looking even if there&#8217;s a match. We build up the site config by appending any new entries we find. In addition, we also look for and combine global site config files:</p>
<ol start="5">
<li>global rules in <code>site_config/custom/global.txt</code></li>
<li>global rules in <code>site_config/standard/global.txt</code></li>
</ol>
<p>To prevent this behaviour, you can enter <code>autodetect_on_failure: no</code> in the site config file. This will end the chain. The config files before and including this one will be loaded and merged, but no others.
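</p>
<p>The merging order can be pictured with a short sketch. It is written in JavaScript purely for illustration (Full-Text RSS itself is PHP), and the object layout with <code>source</code>, <code>rules</code> and <code>autodetectOnFailure</code>, as well as the file names and rules, are hypothetical.</p>
<pre class="brush: js">
// matches: site configs found for a URL, already in the lookup order listed above
function mergeSiteConfigs(matches) {
    var merged = { sources: [], rules: {} };
    for (var config of matches) {
        merged.sources.push(config.source);
        // Append this file's entries to whatever earlier files contributed
        Object.keys(config.rules).forEach(function (name) {
            merged.rules[name] = (merged.rules[name] || []).concat(config.rules[name]);
        });
        // "autodetect_on_failure: no" ends the chain after this file
        if (config.autodetectOnFailure === false) {
            break;
        }
    }
    return merged;
}

var merged = mergeSiteConfigs([
    { source: 'custom/example.org.txt', rules: { body: ["//div[@id='story']"] } },
    { source: 'standard/example.org.txt', rules: { body: ['//article'], title: ['//h1'] },
      autodetectOnFailure: false },
    { source: 'standard/global.txt', rules: { strip: ['//aside'] } }
]);

console.log(merged.sources.join(', '));
// custom/example.org.txt, standard/example.org.txt
console.log(merged.rules.body.length); // 2
</pre>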
</dd>
<dt>XSS filtering</dt>
<dd>
We have not enabled XSS filtering by default because we assume the majority of our users do not display the HTML retrieved by Full-Text RSS in a web page without further processing. If you subscribe to our generated feeds in your news reader application, it should, if it&#8217;s good software, already filter the resulting HTML for XSS attacks, making it redundant for Full-Text RSS to do the same. Similarly with frameworks/CMS which display feed content &#8211; the content should be treated like any other user-submitted content.</p>
<p>If you are writing an application yourself which is processing feeds generated by Full-Text RSS, you can either filter the HTML yourself to remove potential XSS attacks or enable this option. This might be useful if you are processing our generated feeds with JavaScript on the client side &#8211; although there&#8217;s client-side XSS filtering available too, e.g. <a href="https://code.google.com/p/google-caja/wiki/JsHtmlSanitizer">JsHtmlSanitizer</a>.</p>
<p>If enabled, we&#8217;ll pass retrieved HTML content through htmLawed with safe flag on and style attributes denied, see <a href="http://www.bioinformatics.org/phplabware/internal_utilities/htmLawed/htmLawed_README.htm#s3.6">htmLawed&#8217;s readme</a>.</p>
<p>Note: if enabled this will also remove certain elements you may want to preserve, such as iframes.
</dd>
<dt>Site config editor</dt>
<dd>
Full-Text RSS 3.0 now comes with a site config editor available in the admin area (accessible via the admin/ folder). This lets you find, edit, and test existing site config files, or add new ones.</p>
<p>Note: We suggest you make changes to the site config files using a local installation of Full-Text RSS and upload the results to your server when ready. Site config files are simple text files stored on disk. Cloud hosting environments do not always offer persistent file storage, so changes made to a hosted copy on such environments may be lost.
</dd>
<dt>Debug mode</dt>
<dd>
Debug mode allows you to see what happens behind the scenes when Full-Text RSS is running. This is useful if you want to see things such as:</p>
<ul>
<li>URL redirects</li>
<li>Which site config files are loaded</li>
<li>Whether the single_page_link and next_page_link expressions match</li>
<li>Which XPath expressions end up matching title, body, date, author</li>
</ul>
</dd>
</dl>
<h3>Performance</h3>
<dl>
<dt>Site config caching in APC</dt>
<dd>
If you run Full-Text RSS in a hosting environment which has APC enabled, it can take advantage of APC&#8217;s user cache &#8211; a memory cache. If enabled we will store site config files (when requested for the first time) in APC&#8217;s user cache &#8211; avoiding disk access on subsequent requests. See <code>$options-&gt;apc</code> in the config file to enable. Keys in APC are prefixed with &#8216;sc.&#8217;</p>
<p>Note: <code>$options-&gt;apc</code> has no effect if APC is unavailable on your server.
</dd>
<dt>Smart cache (experimental)</dt>
<dd>
If you enable caching and APC, you can also try out the experimental smart cache. The intention here is, again, to reduce disk access. With this enabled we will not write Full-Text RSS&#8217;s results to disk straight away; instead we&#8217;ll store the generated cache key in APC&#8217;s user cache for 10 minutes. If a subsequent request comes in matching the cache key, we&#8217;ll write the result to disk. Requests after that matching the cache key will be loaded from disk. See <code>$options-&gt;smart_cache</code> in the config file to enable. Keys in APC are prefixed with &#8216;cache.&#8217;</p>
<p>Note: this has no effect if APC is disabled or unavailable on your server, or if you have caching disabled.
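</p>
<p>The flow described above can be sketched as follows. This is an illustration in JavaScript, with in-memory maps standing in for APC&#8217;s user cache and for the disk cache; all names are hypothetical and the real PHP code may differ.</p>
<pre class="brush: js">
var TEN_MINUTES = 10 * 60 * 1000;

function createSmartCache(generate) {
    var seenKeys = new Map(); // stands in for APC: cache key -> expiry time
    var disk = new Map();     // stands in for the disk cache: cache key -> result

    // now is passed in (milliseconds) so the sketch is easy to test
    return function request(key, now) {
        if (disk.has(key)) {
            return { result: disk.get(key), from: 'disk' };
        }
        var result = generate(key);
        var expiry = seenKeys.get(key);
        var seenRecently = expiry === undefined ? false : expiry > now;
        if (seenRecently) {
            // Second request within 10 minutes: now worth writing to disk
            disk.set(key, result);
            seenKeys.delete(key);
        } else {
            // First request: only remember the key, skip the disk write
            seenKeys.set(key, now + TEN_MINUTES);
        }
        return { result: result, from: 'generated' };
    };
}

var request = createSmartCache(function (key) { return 'feed for ' + key; });
console.log(request('abc', 0).from);      // generated (key remembered)
console.log(request('abc', 60000).from);  // generated (result written to disk)
console.log(request('abc', 120000).from); // disk
</pre>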
</dd>
</dl>
<h3><a href="http://www.youtube.com/watch?v=9ntPxdWAWq8">Cloud ready</a></h3>
<dl>
<dt>Host for free on AppFog</dt>
<dd>
<a href="http://appfog.com/">AppFog</a> offer users free hosting with 2GB RAM. That&#8217;s more than enough to run Full-Text RSS for most users.</p>
<p>To get started: </p>
<ol>
<li>Create a free account</li>
<li>Install the AppFog command-line client (<tt>af</tt>)</li>
<li>Change into the Full-Text RSS folder</li>
<li>Type <tt>af push</tt></li>
<li>Follow the prompts and you&#8217;re done.</li>
</ol>
<p>Note: if you get a 701 error saying the URL has been taken, edit <code>manifest.yml</code> and comment out the lines starting with <code>name:</code> and <code>url:</code> by inserting a hash sign (<code>#</code>) at the beginning of each line. Save and try again. This time <tt>af</tt> will prompt you for an application name and URL.</p>
</dd>
<dt>Override config options with environment variables</dt>
<dd>
Most of the config options in the config file can now be overridden with environment variables. When creating environment variables, use the option name prefixed with &#8216;<code>ftr_</code>&#8216;. For example, to override <code>$options-&gt;max_entries</code> and limit the maximum to 2, create an environment variable with key <code>ftr_max_entries</code> and value <code>2</code>.
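<p>The naming rule can be sketched like this. The snippet is a JavaScript illustration only: <code>applyEnvOverrides()</code> is a hypothetical helper, the default values are made up, and the way Full-Text RSS actually reads and converts the values may differ.</p>
<pre class="brush: js">
// options: defaults from the config file; env: environment variables (e.g. process.env)
function applyEnvOverrides(options, env) {
    var result = {};
    Object.keys(options).forEach(function (name) {
        var override = env['ftr_' + name];
        // Environment values are strings; keep the default when no override is set
        result[name] = override === undefined ? options[name] : override;
    });
    return result;
}

// Made-up defaults, overridden by an environment variable named ftr_max_entries
var options = applyEnvOverrides({ max_entries: 10, cors: false }, { ftr_max_entries: '2' });
console.log(options.max_entries); // 2
console.log(options.cors);        // false
</pre>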
</dd>
</dl>
<h3>What didn&#8217;t make it</h3>
<dl>
<dt>No monitored feeds</dt>
<dd>
One feature which didn&#8217;t make this release is the ability to create monitored feeds with PubSubHubbub support. This was specifically to improve the speed with which generated feeds updated within Google Reader&#8217;s system. Unfortunately this feature is not yet ready &#8211; we&#8217;ve not had great results in our tests, so won&#8217;t be releasing until we&#8217;re happy.
</dd>
<dt>Config options removed</dt>
<dd>
The following config options were removed:</p>
<ul>
<li><code>$options-&gt;restrict</code></li>
<li><code>$options-&gt;message_to_prepend_with_key</code></li>
<li><code>$options-&gt;message_to_append_with_key</code></li>
<li><code>$options-&gt;error_message_with_key</code></li>
<li><code>$options-&gt;alternative_url</code></li>
</ul>
</dd>
<dt>No extraction with CSS selector</dt>
<dd>
You can no longer specify what should get extracted with a CSS selector passed in the querystring.
</dd>
</dl>
]]></content:encoded>
			<wfw:commentRss>http://www.keyvan.net/2012/09/full-text-rss-3/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>Push to Kindle: some stats</title>
		<link>http://www.keyvan.net/2012/07/push-to-kindle-some-stats/</link>
		<comments>http://www.keyvan.net/2012/07/push-to-kindle-some-stats/#comments</comments>
		<pubDate>Thu, 26 Jul 2012 01:09:26 +0000</pubDate>
		<dc:creator>Keyvan</dc:creator>
				<category><![CDATA[General]]></category>

		<guid isPermaLink="false">http://www.keyvan.net/?p=1752</guid>
		<description><![CDATA[Our Push to Kindle service has become quite popular since we launched. Over 25,000 people currently use our Chrome extension, 7,000 use the Firefox extension and over 2,000 have installed our Android app. I recently decided to check how much of the content processed by our Push to Kindle service comes from corporate news sources. [...]]]></description>
				<content:encoded><![CDATA[<p>Our <a href="http://fivefilters.org/kindle-it/">Push to Kindle</a> service has become quite popular since we launched. Over 25,000 people currently use our <a href="https://chrome.google.com/webstore/detail/pnaiinchjaonopoejhknmgjingcnaloc">Chrome extension</a>, 7,000 use the <a href="https://addons.mozilla.org/en-US/firefox/addon/kindle-it/">Firefox extension</a> and over 2,000 have installed our <a href="https://market.android.com/details?id=org.fivefilters.kindleit">Android app</a>.</p>
<p>I recently decided to check how much of the content processed by our Push to Kindle service comes from corporate news sources. Here&#8217;s what I found:</p>
<table>
<thead>
<tr>
<th>rank</th>
<th>domain</th>
<th>percentage</th>
</tr>
</thead>
<tbody>
<tr>
<td>#1</td>
<td>nytimes.com</td>
<td>2.62%</td>
</tr>
<tr>
<td>#4</td>
<td>guardian.co.uk</td>
<td>1.32%</td>
</tr>
<tr>
<td>#15</td>
<td>bbc.co.uk</td>
<td>0.51%</td>
</tr>
<tr>
<td>#48</td>
<td>telegraph.co.uk</td>
<td>0.22%</td>
</tr>
<tr>
<td>#97</td>
<td>independent.co.uk</td>
<td>0.11%</td>
</tr>
</tbody>
</table>
<p>This is based on data collected over a period of 3 weeks.</p>
<p>I&#8217;m glad to see our users do not rely too much on corporate news sources. However, as the main goal of the FiveFilters.org project is to promote independent, non-corporate media, I&#8217;ll be thinking about ways to direct people to non-corporate sources of news and analysis in future updates. </p>
<p>For the time being, if a New York Times article is loaded, I&#8217;ve added a tab with links to <a href="http://www.nytexaminer.com/">The NYTimes eXaminer</a> (&#8216;An antidote to the &#8220;paper of record&#8221;&#8217;). Similarly, if an article from The Guardian, BBC or Independent is loaded, users will see a tab with links to <a href="http://medialens.org/">Medialens</a>.</p>
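<p>For anyone curious how a rank/domain/percentage table like the one above could be produced, here is a minimal Python sketch. It is not the code used for these stats: the function names are hypothetical, the input is assumed to be a plain list of processed article URLs, and stripping a leading <code>www.</code> is a simplification that does not merge other subdomains.</p>

```python
from collections import Counter
from urllib.parse import urlparse


def registered_domain(url):
    """Return the URL's host without a leading 'www.' (a simplification)."""
    host = urlparse(url).hostname or ""
    return host[4:] if host.startswith("www.") else host


def domain_shares(urls):
    """Return (rank, domain, percentage) tuples, most frequent domain first."""
    counts = Counter(registered_domain(u) for u in urls)
    total = sum(counts.values())
    return [
        (rank, domain, round(100.0 * n / total, 2))
        for rank, (domain, n) in enumerate(counts.most_common(), start=1)
    ]
```

<p>Calling <code>domain_shares()</code> with the full list of URLs collected over the period gives one row per domain, from which the rows of interest can be picked out.</p>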
]]></content:encoded>
			<wfw:commentRss>http://www.keyvan.net/2012/07/push-to-kindle-some-stats/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>Send web articles to multiple Kindle devices</title>
		<link>http://www.keyvan.net/2012/03/send-articles-to-multiple-kindle-devices/</link>
		<comments>http://www.keyvan.net/2012/03/send-articles-to-multiple-kindle-devices/#comments</comments>
		<pubDate>Tue, 27 Mar 2012 13:11:01 +0000</pubDate>
		<dc:creator>Keyvan</dc:creator>
				<category><![CDATA[General]]></category>

		<guid isPermaLink="false">http://www.keyvan.net/?p=1724</guid>
		<description><![CDATA[We&#8217;ve just updated our Kindle It service to allow you to send web articles to up to 5 Kindle devices in one go. Last December Amazon enabled its Kindle Personal Documents Service for iPhone/iPad users, assigning each device a new email address, and this month the same feature has been enabled for Android users. Our [...]]]></description>
				<content:encoded><![CDATA[<p>We&#8217;ve just updated our <a href="http://fivefilters.org/kindle-it/">Kindle It</a> service to allow you to send web articles to up to 5 Kindle devices in one go. </p>
<p>Last December Amazon enabled its <a href="http://www.amazon.co.uk/gp/help/customer/display.html?nodeId=200767360">Kindle Personal Documents Service</a> for iPhone/iPad users, assigning each device a new email address, and this month the same feature has been enabled for Android users. Our Kindle It service has up to now been able to send to only one Kindle email address at a time, but as of today you can enter up to 5 addresses (separated by commas):</p>
<p><img src="http://www.keyvan.net/wp-content/uploads/2012/03/multi-address-kindle-it.jpg" alt="" title="Kindle It sends to multiple addresses" width="560" height="235" class="alignnone size-full wp-image-1728" style="border: 1px solid #999" /><br />
This will also work with our <a href="https://play.google.com/store/apps/details?id=org.fivefilters.kindleit">Push to Kindle</a> Android app (no update necessary).</p>
<p><a href="http://help.fivefilters.org/">Let us know</a> if you have any trouble.</p>
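<p>As an illustration of the input format described above, here is a minimal Python sketch of how a comma-separated address field might be parsed and capped at 5 entries. This is an assumption about one reasonable approach, not the actual Kindle It code: the function name and the very loose address check are hypothetical.</p>

```python
MAX_ADDRESSES = 5  # limit stated in the post


def parse_kindle_addresses(raw, limit=MAX_ADDRESSES):
    """Split a comma-separated string into a list of unique addresses."""
    addresses = []
    for part in raw.split(","):
        address = part.strip()
        if not address:
            continue  # skip empty entries such as a trailing comma
        # Very loose sanity check (assumption): exactly one '@', not at either end.
        if address.count("@") != 1 or address.startswith("@") or address.endswith("@"):
            raise ValueError("not an email address: " + address)
        if address not in addresses:
            addresses.append(address)
    if len(addresses) > limit:
        raise ValueError("at most %d addresses are allowed" % limit)
    return addresses
```

<p>Duplicates are dropped before the limit is checked, so entering the same address twice does not count against the 5-address cap in this sketch.</p>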
]]></content:encoded>
			<wfw:commentRss>http://www.keyvan.net/2012/03/send-articles-to-multiple-kindle-devices/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
	</channel>
</rss>

