<?xml version="1.0" encoding="UTF-8"?><!-- generator="WordPress/2.9.2" -->
<rss version="0.92">
<channel>
	<title>Chunhao's Blog</title>
	<link>http://chunhao.net/blog</link>
	<description>Life happens, love helps…</description>
	<lastBuildDate>Thu, 23 Jul 2009 06:45:49 +0000</lastBuildDate>
	<docs>http://backend.userland.com/rss092</docs>
	<language>en</language>
	
	<item>
		<title>Partial Eclipse</title>
		<description><![CDATA[Thanks to Newton's discovery of the laws of the universe, we can predict eclipses with great accuracy.
On July 22, a solar eclipse occurred over the Yangtze River Basin in China. In my hometown, however, only a partial eclipse was visible. Here are some photos I took, with the assistance of [...]]]></description>
		<link>http://chunhao.net/blog/partial-eclipse</link>
			</item>
	<item>
		<title>Are Your Fingers Long Enough to Use Vim?</title>
		<description><![CDATA[Note: I wrote this post just for fun. Please don’t take it seriously.  Thank you!  
Which editor are you using, Vim or Emacs? And why?
Next time you find yourself arguing about which editor is better and trying to persuade others to use your favorite one, just forget it. If you [...]]]></description>
		<link>http://chunhao.net/blog/are-your-fingers-long-enough-to-use-vim</link>
			</item>
	<item>
		<title>Which email client are you using?</title>
		<description><![CDATA[What can you do to kill time? Maybe you could figure out which email clients your contacts are using. It&#8217;s fun.
Most people prefer webmail, especially Gmail. Webmail is very easy to use and does not require much configuration. Gmail is the most wonderful webmail service. It groups messages into threads and it has a powerful searching [...]]]></description>
		<link>http://chunhao.net/blog/which-email-client-are-you-using</link>
			</item>
	<item>
		<title>Tools for Reading Sources</title>
		<description><![CDATA[Hacking is a good way to learn, and the step before hacking is reading the source code. You may have had this experience: faced with a large amount of source code (typically dozens of files), you don&#8217;t know where to start, or you don&#8217;t even know how to read it. A good tool is very helpful for [...]]]></description>
		<link>http://chunhao.net/blog/tools-for-reading-sources</link>
			</item>
	<item>
		<title>Burning Video DVD on Ubuntu</title>
		<description><![CDATA[Burning a video DVD is harder than you think. It involves transcoding, subtitles, burning and other complicated things. Any of these steps may fail for you, let alone on Ubuntu!
I just created my first video DVD. My experience may save you some trouble when burning a video DVD on Ubuntu/Linux.
Most DVD videos are encoded in MPEG-2 [...]]]></description>
		<link>http://chunhao.net/blog/burning-video-dvd-on-ubuntu</link>
			</item>
	<item>
		<title>My Chinese Blog</title>
		<description><![CDATA[A new blog has been set up. I will post my Chinese articles there.
From now on, only English content will appear here. All the old articles here, in both languages, will be preserved. I will use this blog to post technical articles in English about Linux and programming. It should be a good way [...]]]></description>
		<link>http://chunhao.net/blog/my-chinese-blog</link>
			</item>
	<item>
		<title>How to write an auto-downloading script</title>
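		<description><![CDATA[How do you write an auto-downloading script?
One of the nice things about using Linux is that you can easily make the computer do the grunt work for you. The prerequisite, of course, is that you understand the Linux way of thinking. Below is a real-world example that shows how things get done on Linux.
If you like listening to pingshu (traditional Chinese storytelling), here is a good place to go. The site has many pingshu recordings, as well as Lecture Room (百家讲坛) episodes and even audiobooks of Jin Yong's novels. Everything can be listened to online and also downloaded for free, without registration. However, you can only download one link at a time. For example, to download episode 1 of 《大唐惊雷》, you have to open one page and click download; to download episode 2, you have to open another page and click download again. So there is no way to download an entire pingshu series in one batch.
If you use Windows, you either click these links by hand and download the episodes one by one (there may be 100+ of them), or write a program in C or Java to download them automatically, provided that you know the details of network programming.
Below, I will describe how to write an auto-downloading script in Shell on Linux:
The first step is to analyze the links. On this site, the pages containing the links for episodes 1 through 100 of 《大唐惊雷》 follow a very regular pattern: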
http://www1.5ips.net/down_19_001.htm
http://www1.5ips.net/down_19_002.htm
&#8230;
http://www1.5ips.net/down_19_100.htm
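We need something that counts from 001 to 100. The counting itself can be handled with a for loop, but producing output such as 001, 002 is not so straightforward. In C we could use printf(&#34;%03d&#34;, n); the same works in Shell with the printf command: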

printf &#34;%03d&#34; $i
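For comparison, here is a minimal Python sketch of the same zero-padding idea. It only builds the page addresses listed above; the names SITE_URL, PREFIX and page_urls are made up for this illustration and are not part of the script.

```python
# Sketch: build the zero-padded page URLs shown in the list above.
# SITE_URL and PREFIX mirror the example addresses; page_urls is a hypothetical helper.
SITE_URL = "http://www1.5ips.net/"
PREFIX = "down_19_"


def page_urls(start, end):
    """Return the page URLs for episodes start..end, inclusive."""
    # "%03d" pads the episode number with zeros to three digits, like printf.
    return [SITE_URL + PREFIX + "%03d" % n + ".htm" for n in range(start, end + 1)]


urls = page_urls(1, 100)
print(urls[0])
print(urls[-1])
```

The first and last printed addresses should match the first and last entries of the list above.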

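At this point the addresses of the pages containing the links for episodes 1-100 have been generated. The next step is to extract the download links from these pages. Looking at the source of these pages, we find the following code: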

&#60;li&#62;&#60;a href=&#34;http://dx23a.52ps.cn/pingshu/单田芳_大唐惊雷/单田芳_大唐惊雷_001.mp3?0000060.191.99.1203tflag=1235403802opin=5d44be8c1fba6271d12369537d33a135&#38;amp;ip=60.191.99.1.mp3&#34;&#62;&#60;font color=&#34;blue&#34;&#62;点此下载《大唐惊雷》第001回&#60;/font&#62;&#60;/a&#62;&#60;br /&#62;&#60;/li&#62;

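We can use grep to get this line.
So how do we get the download link inside it? sed or awk could also do the job, but I do not know either of them. I used Python: strings have a split function, so here the line is split on the double quote (&#34;) into a list, and the download link is obtained by picking the appropriate element. The same method also yields the file name, for example: &#34;单田芳_大唐惊雷_001.mp3&#34;.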
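A rough sketch of that split step might look like the following. The sample line is a shortened, made-up stand-in for the real page source, and the way the file name is cut out of the URL is my assumption, not necessarily what geturl.py and getname.py do.

```python
# Sketch: pull the download link and file name out of the grepped line.
# The sample line is a hypothetical, shortened stand-in for the real page source.
line = '<li><a href="http://example.invalid/pingshu/book_001.mp3?key=abc">download</a></li>'

# Splitting on the double quote gives: text before href, the URL, text after it.
parts = line.split('"')
down_url = parts[1]

# Assumed approach: drop the query string, then take the last path segment.
filename = down_url.split("?")[0].split("/")[-1]

print(down_url)
print(filename)
```

Picking element 1 works only because the URL is the first quoted attribute on the line; a line with a different layout would need a different index.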
That is the general idea. Here is the actual script:

#!/bin/sh
&#160;
siteurl=&#34;http://www1.5ips.net/&#34;
prefix=&#34;down_192_&#34;
startnum=1
endnum=100
&#160;
for i in `seq $startnum $endnum`
do
    preurl=$siteurl$prefix`printf &#34;%03d&#34; $i`&#34;.htm&#34;
    wget -q -O prehtml $preurl
    iconv -f GBK -t utf-8 prehtml &#124; grep 点此下载 &#62; down_url_line
    down_url=`./geturl.py`
    filename=`./getname.py`
    echo &#34;Starting downloading $i...&#34;
 [...]]]></description>
		<link>http://chunhao.net/blog/how-to-write-an-auto-downloading-script</link>
			</item>
	<item>
		<title>Unify the themes</title>
		<description><![CDATA[My site is built on several applications: the blog runs on WordPress, the wiki on DokuWiki, and the photo gallery on SimpleViewer. Other pages are written by hand.
Originally, each part had its own theme. It&#8217;s hard to access the other parts directly from any one part. For example, there are no links from my blog [...]]]></description>
		<link>http://chunhao.net/blog/unify-the-themes</link>
			</item>
	<item>
		<title>How to synchronize with SSH</title>
		<description><![CDATA[I will share my experience of synchronizing files over SSH here. The steps below use DokuWiki as the example, but of course you can synchronize anything you like.
The wiki on my homepage is DokuWiki. Its greatest feature is that it does not require a database connection, which makes it very convenient for personal use.
I have two DokuWiki [...]]]></description>
		<link>http://chunhao.net/blog/how-to-synchronize-with-ssh</link>
			</item>
	<item>
		<title>Site moved here</title>
		<description><![CDATA[I have successfully backordered this new domain name “chunhao.net”, which represents me better than my old domain name “chunhao86.cn”.
I am using 000webhost for the old site, and it is really a good free hosting provider. However, due to a mistake of mine, the new domain can no longer be hosted on that free web host. [...]]]></description>
		<link>http://chunhao.net/blog/site-moved-here</link>
			</item>
</channel>
</rss>
