[Discuss] Having trouble browsing a backed-up MediaWiki (PHP) site.
Alan W. Irwin
irwin at beluga.phys.uvic.ca
Thu Sep 28 09:55:32 PDT 2006
On 2006-09-28 05:12-0000 Michael Foltinek wrote:
> On 9/28/06, Alan W. Irwin <irwin at beluga.phys.uvic.ca> wrote:
> SNIP
>> (unless you substitute %3f for the question mark). If you figure out how to
>> access the URL without making the substitution, please let me know.
>
> I'm guessing that you're butting up against something in the HTML
> specification where the question mark is a special character that
> implies that you're asking for a CGI script and feeding it data.
Exactly. According to the W3C specifications, a question mark in a URL
receives special interpretation: it marks the start of the query string.
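To illustrate the %3f substitution mentioned above, here is a minimal sketch
that turns a local file name containing a question mark into a form a browser
will treat as a plain path. The function name encode_qmark is made up for this
example, and only the question mark is handled, not other special characters.

```shell
#!/bin/sh
# Hypothetical helper (not part of wget or any other tool): write each "?"
# in a file name as %3f so that a browser does not treat the rest of the
# name as a query string.
encode_qmark() {
    # In a basic regular expression "?" is an ordinary character,
    # so it needs no escaping here.
    printf '%s\n' "$1" | sed 's/?/%3f/g'
}

encode_qmark 'index.php?title=Main_Page'
```

The result is the name to put in the browser's address bar or in a link; the
file on disk keeps its question mark.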
>
> If you're just looking to access the content of the saved pages, why
> not approach the problem from the other end and just rename the files
> by replacing the question mark?
Good idea, which avoids screwing around with mod_rewrite or other Apache
complications. A further search of the wget info pages showed there is an
option for escaping question marks in file names. Apparently, we have Windows
to thank for this option, since question marks aren't allowed in file names
there, but it also solves a major browsing problem on the Unix side of things.
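For a mirror that has already been downloaded with question marks in the file
names, the renaming approach suggested above could be sketched roughly as
follows. The function name rename_qmarks is made up for this example, and "@"
is chosen as the replacement only because that is what wget's windows mode
uses in place of the question mark. The sketch assumes that no file name
contains a newline.

```shell
#!/bin/sh
# Hypothetical helper: rename everything under the directory given as $1
# so that each "?" in a file or directory name becomes "@".
rename_qmarks() {
    # -depth lists a directory's contents before the directory itself,
    # so a path is still valid at the moment it gets renamed.
    find "$1" -depth -name '*\?*' | while IFS= read -r old; do
        dir=$(dirname "$old")
        base=$(basename "$old")
        new=$dir/$(printf '%s\n' "$base" | sed 's/?/@/g')
        mv -- "$old" "$new"
    done
}

# Example use (directory name taken from the script in this message):
#   rename_qmarks plplot_wiki
```

Note that this only renames the files. Links inside the saved pages would
still point at the old names, which is why letting wget do the escaping and
the link conversion together is the simpler route.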
So for what it is worth, here is the final wget script that stores a static
version of this particular MediaWiki site in a form that can be browsed with
no broken internal links, either as a set of files or using Apache with the
ForceType text/html directive for the directory:
********
#!/bin/sh
# Convenient form of the wget command to back up
# http://www.miscdebris.net/plplot_wiki
# Recurse up to 5 levels deep, don't get any html above
# plplot_wiki, get the files necessary to display all pages,
# convert links to point at the locally downloaded versions,
# windows ==> escape a long list of characters (including the question
# mark) in file names, start the local hierarchy at $DIR, and keep a log
# of the transaction in $DIR/plplot_wiki_backup.log
DIR=plplot_wiki
# Need to create the directory first if logging to a file in it.
mkdir -p "$DIR"
wget --recursive --level=5 --no-parent \
--page-requisites --convert-links \
--restrict-file-names=windows \
--no-host-directories --cut-dirs=1 \
--directory-prefix="$DIR" \
-o "$DIR/plplot_wiki_backup.log" \
http://www.miscdebris.net/plplot_wiki
********
File results are
plplot_wiki/index.php@title=Main_Page
plplot_wiki/index.php@title=Special%3ARecentchangeslinked&target=Main_Page
plplot_wiki/index.php@title=Special%3AUserlogin&returnto=Main_Page
plplot_wiki/index.php@title=Special%3AWhatlinkshere&target=Main_Page
etc., with no question marks in the file names to mess up browsing.
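A quick way to confirm that nothing slipped through is to count the names
under the mirror that still contain a question mark. This is only a sketch;
the function name count_qmarks is made up for this example.

```shell
#!/bin/sh
# Hypothetical helper: print how many files or directories under $1
# still have a "?" somewhere in their name.
count_qmarks() {
    # tr strips the leading blanks that some wc implementations print.
    find "$1" -name '*\?*' | wc -l | tr -d ' '
}

# Example use (directory name taken from the script in this message):
#   count_qmarks plplot_wiki
```

A result of 0 means every saved name can be used as a path without the
browser splitting it at a question mark.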
It's good to finally have this problem solved and thanks to all who
participated in the discussion.
Alan
__________________________
Alan W. Irwin
Astronomical research affiliation with Department of Physics and Astronomy,
University of Victoria (astrowww.phys.uvic.ca).
Programming affiliations with the FreeEOS equation-of-state implementation
for stellar interiors (freeeos.sf.net); PLplot scientific plotting software
package (plplot.org); the Yorick front-end to PLplot (yplot.sf.net); the
Loads of Linux Links project (loll.sf.net); and the Linux Brochure Project
(lbproject.sf.net).
__________________________
Linux-powered Science
__________________________