TeknologikL Blog

TeknologikL

"Plug it, play it, burn it, rip it"
TeknologikL is a place for conversation and discussion about new technologies emerging in consumer electronics with a focus on high-definition video and audio. The blog will cover topics including home theater equipment, digital distribution, media streaming, electronic product reviews and more.

The blog's owners are constantly searching for the next device to satisfy their ever growing hunger for technology. Media junkies standing on the edge of reality, ready to take the jump.


Apache, PHP, Python, Web Crawling, and Data Extraction with Raspberry Pi

Posted July 03, 2012 12:00 AM by jweis

Having spent some more time working with my Raspberry Pi, I have begun to get a better idea of what the little device is capable of. The download section on the Raspberry Pi Foundation's website lists three customized distributions of Linux for use on the Raspberry Pi. While I am very interested in trying out the Arch Linux version, some technical issues with the image have kept me from it, so I have continued working with the Debian release. In addition to what is listed on the Foundation's website, there are also beta and release candidate builds of spin-offs of the media-centered XBMC variation of Linux, but that's a different topic.

The Raspberry Pi may not contain the most advanced (or fastest) hardware, but closer examination indicates that it can provide basic server resources effectively and offers a satisfying command-line interface. Apache and PHP5 installed with little setup time. The only unusual complication was a mysterious failure to create the "www-data" group automatically during installation. Ideally, the group would have been created as part of the Apache installation and configuration, but it can also be created manually afterwards. Performance-wise, the Raspberry Pi appears to take its biggest hit during compilation, configuration, and installation, but that slowdown is not as painful as what occurs when loading the web browser in the GUI. It feels a little slow, but it certainly gets the job done, and the system does not appear to have crashed at any point.
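As a rough illustration of the manual fix, here is a minimal Python sketch that checks whether the "www-data" group exists and, if not, prints a command that would create it. The function names are made up for this example, and the use of groupadd is an assumption; the exact command used on the original system is not recorded here.

```python
import grp


def group_exists(name):
    """Return True if a group called `name` is in the system group database."""
    try:
        grp.getgrnam(name)
        return True
    except KeyError:
        return False


def missing_group_command(name="www-data"):
    """Return a command that would create the group, or None if it exists.

    Assumption: a standard Linux `groupadd` is available. The command is
    only returned, never executed, because it has to be run as root.
    """
    if group_exists(name):
        return None
    return ["groupadd", "--system", name]


if __name__ == "__main__":
    command = missing_group_command()
    if command is None:
        print("www-data group already exists")
    else:
        print("Run as root:", " ".join(command))
```

The script only reports what is missing, which seems the safer choice for something that touches system accounts.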

With Apache installed, I was able to test the responsiveness of the Raspberry Pi as a web server. The server was responsive and had absolutely no problems loading 4 KB HTML documents on the fly (not exactly a complex task in the first place). Once I had PHP5 installed, I transferred some photos and a PHP image gallery script to the Raspberry Pi server to test. Initially the gallery script failed to work properly, but this was user error: the script referenced a locally stored copy of jQuery, which had not been transferred with the rest of the files. This was easily remedied by retrieving the file from another local machine using the built-in command-line SFTP application. At that point the image gallery was up and running.
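For anyone who wants numbers rather than impressions, a responsiveness check along these lines can be sketched in a few lines of Python. This is not the method used for the test above, just one possible approach, and the function names and the example URL are placeholders.

```python
import time
import urllib.request


def time_request(url, attempts=5, timeout=10):
    """Fetch `url` several times; return a list of (bytes_read, seconds) pairs."""
    results = []
    for _ in range(attempts):
        start = time.perf_counter()
        with urllib.request.urlopen(url, timeout=timeout) as response:
            body = response.read()
        results.append((len(body), time.perf_counter() - start))
    return results


def summarize(results):
    """Reduce the timing pairs to fastest, slowest, and average seconds."""
    times = [seconds for _, seconds in results]
    return {
        "requests": len(times),
        "fastest": min(times),
        "slowest": max(times),
        "average": sum(times) / len(times),
    }


# Example usage (the address is hypothetical; substitute the Pi's own):
#   print(summarize(time_request("http://raspberrypi.local/index.html")))
```

Timing from another machine on the same network measures the whole path (server, SD card, and network), which is what a visitor would actually experience.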

The image gallery script itself is fairly basic: PHP is used to read the file listing of the current directory and create a hidden HTML list of image filenames. The gallery script ran responsively on the Raspberry Pi, with the only delay occurring when loading the individual image files, a point at which the data transfer could be hitting any number of potential bottlenecks (SD card read speed, processor speed, RAM quantity, network connection speed, etc.). But even though the transfer speed was less than ideal, it was acceptable.
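The idea behind the gallery is simple enough to sketch. The actual script is PHP and is not reproduced here; the following is a Python rendering of the same approach (list the directory, keep the image files, emit a hidden HTML list). The element id, the extension list, and the function name are assumptions made for illustration.

```python
import html
import os

# Assumed set of extensions; the original script's filter is not known.
IMAGE_EXTENSIONS = {".jpg", ".jpeg", ".png", ".gif"}


def build_image_list(directory="."):
    """Return a hidden HTML list naming every image file in `directory`."""
    names = sorted(
        name
        for name in os.listdir(directory)
        if os.path.splitext(name)[1].lower() in IMAGE_EXTENSIONS
    )
    # Escape filenames so unusual characters cannot break the markup.
    items = "\n".join(f"  <li>{html.escape(name)}</li>" for name in names)
    return f'<ul id="gallery-files" style="display: none;">\n{items}\n</ul>'


if __name__ == "__main__":
    print(build_image_list("."))
```

A client-side script (jQuery, in the case of the gallery described above) would then read the hidden list and load each image on demand.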

The next experiment was to install Python on the Raspberry Pi and see what I could break by running web crawlers that were written for desktop machines. This was fun, in that I actually did manage to break things, but it was also a successful test, in that the Raspberry Pi handled Python well. Given Python's platform-independent nature, transferring a basic Python script or application over to a Raspberry Pi is not an overwhelmingly complex process and has a few basic steps:

    1. Install Python.
    2. Install additional dependency modules as needed.
    3. Run code.

While I am confident that not every script transfer will be as quick and simple, it is good to see that it can be.
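Step 2 is the one most likely to cause surprises, so it can help to check for missing modules before running anything. A minimal sketch, assuming only top-level module names are checked; the names in REQUIRED are placeholders, not the dependencies of the crawlers discussed here.

```python
import importlib.util

# Hypothetical dependency list; replace with whatever the script imports.
REQUIRED = ["json", "urllib", "html"]


def missing_modules(module_names):
    """Return the top-level module names that cannot be found on this system."""
    return [
        name
        for name in module_names
        if importlib.util.find_spec(name) is None
    ]


if __name__ == "__main__":
    missing = missing_modules(REQUIRED)
    if missing:
        print("Install before running:", ", ".join(missing))
    else:
        print("All dependencies found")
```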

I made three attempts with two different web crawlers as part of testing out Python on the Raspberry Pi. The first attempt may have fallen victim to the crawled site being non-responsive and otherwise error-prone (later analysis of the website indicated that there were a few server-side issues occurring). The second attempt successfully crawled 30 to 40 pages and then triggered a kernel panic in the operating system. I later decided that I wanted to trigger the kernel panic a second time, so I ran the second crawler script again. Naturally, given that I wanted it to fail, the script ran for 30 hours, crawled 4,206 web pages, and extracted data for 32,000+ points of interest. I would have liked to spend some time testing out the Raspberry Pi's video capabilities at this point, but the little device just had to be difficult and run the web crawler flawlessly instead.
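For readers who want to try something similar, here is a minimal sketch of a same-site crawler using only the Python standard library. It is not either of the crawlers tested above, and every name in it is made up for this example. It skips pages that fail to load (as happened in the first attempt) and stops after a fixed number of pages.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin, urlparse
import urllib.request


class LinkParser(HTMLParser):
    """Collect the href value of every <a> tag fed to the parser."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for key, value in attrs:
                if key == "href" and value:
                    self.links.append(value)


def extract_links(base_url, page_html):
    """Return absolute http(s) links found in `page_html`, fragments removed."""
    parser = LinkParser()
    parser.feed(page_html)
    absolute = (urldefrag(urljoin(base_url, href)).url for href in parser.links)
    return [url for url in absolute if urlparse(url).scheme in ("http", "https")]


def fetch(url, timeout=10):
    """Download a page and decode it leniently as UTF-8."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.read().decode("utf-8", errors="replace")


def crawl(start_url, max_pages=50, fetch_page=fetch):
    """Breadth-first crawl of one host; return {url: html} for fetched pages."""
    host = urlparse(start_url).netloc
    queue = deque([start_url])
    seen = {start_url}
    pages = {}
    while queue and len(pages) < max_pages:
        url = queue.popleft()
        try:
            pages[url] = fetch_page(url)
        except OSError:
            continue  # unresponsive or failing page: skip it and carry on
        for link in extract_links(url, pages[url]):
            if urlparse(link).netloc == host and link not in seen:
                seen.add(link)
                queue.append(link)
    return pages


# Example usage (hypothetical address):
#   pages = crawl("http://example.com/", max_pages=20)
```

Keeping the fetch function as a parameter makes it possible to test the crawl logic against canned pages before pointing it at a real site. A crawler meant for a 30-hour run would also want a delay between requests and respect for robots.txt, both of which are left out here.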

All in all, the $35 computer made a fairly decent web server and Python machine. It could be fun to set up a cluster of Raspberry Pis for a little distributed computing, but now I want to take a good look at its video handling capabilities.

Power-User

Join Date: Jun 2008
Location: Kentucky Lake
Posts: 390
Good Answers: 26
#1

Re: Apache, PHP, Python, Web Crawling, and Data Extraction with Raspberry Pi

07/04/2012 1:57 AM

This will be the next computer I build! Truly amazing! Didn't know they existed, thanks!

Guru
Panama - Member - New Member Hobbies - CNC - New Member Engineering Fields - Marine Engineering - New Member Engineering Fields - Retired Engineers / Mentors - New Member

Join Date: Dec 2006
Location: Panama
Posts: 4273
Good Answers: 213
#2

Re: Apache, PHP, Python, Web Crawling, and Data Extraction with Raspberry Pi

07/04/2012 6:10 PM

What is your USB data transfer speed? I am looking for a 115,200 serial data rate for controlling my 3D printer, so I don't have to tie up my regular computer when printing for several hours. It turns out that I can install an SD card reader for about the same cost as the Raspberry Pi, but that doesn't give me a monitoring feature or the ability to interact on the fly (pausing a print to change colors, for example).

Of course, I could always buy myself a tablet, but that would be overkill (and the wife would more likely approve a $40 purchase than a $300 one).

Member

Join Date: Jun 2012
Posts: 9
Good Answers: 1
#4
In reply to #2

Re: Apache, PHP, Python, Web Crawling, and Data Extraction with Raspberry Pi

07/06/2012 11:07 AM

It's hard to determine what the USB transfer speed is by itself, as opposed to what may be a part of the SD card transfer speed. However, I have been able to fluidly stream video encoded at 2.5 MB/sec directly from a USB flash drive (and I expect that the potential transfer speed is significantly higher).

Anonymous Poster #1
#3

Re: Apache, PHP, Python, Web Crawling, and Data Extraction with Raspberry Pi

07/05/2012 11:51 AM

Another great (free and in Python) web framework is web2py and it should work nicely on this.

Participant

Join Date: Dec 2014
Posts: 1
#5

Re: Apache, PHP, Python, Web Crawling, and Data Extraction with Raspberry Pi

12/28/2014 10:20 AM

You attempted to run two different web crawlers, I would like to try something like this. Which webcrawlers did you test??

Associate
Australia - Member -

Join Date: Sep 2010
Location: Western Australia
Posts: 34
#6
In reply to #5

Re: Apache, PHP, Python, Web Crawling, and Data Extraction with Raspberry Pi

09/23/2017 11:22 PM

You attempted to run two different web crawlers, I would like to try something like this. Which webcrawlers did you test??

YaCy would work and I think wget can be made to spider the server.

__________________
smokingwheels

Users who posted comments:

AdriaanB (1); Barchetta (1); cwarner7_11 (1); jweis (1); smokingwheels (1)
