|
|||
|
I just had a conversation with someone that led to a great idea:
Quote:
|
|
|||
|
I take it that if the articles themselves are actually going to be in *.txt format, the hyperlinks won't be there? I mean the blue links that pop up every other word and make it really hard to leave the site, because there's always another interesting article.
|
|
|||
|
An idea for that splash screen you wanted. What do you think?
__________________
Want some great music for your iRiver!!! Get some great music free from David Bass!!! |
|
|||
|
Cool. Well, feel free to use it. Glad I could do something, since I can't code.
Dave
|
|||
|
To keep the links working, wouldn't it be easier to use a simple cut-down web browser? I don't actually know anything about coding, but perhaps one could be converted to run on the H3xx architecture.
P.S. Sorry if this was a stupid suggestion.
__________________
H340 - international - Korean Firmware v 1.29 Sennheiser PXC - 250 |
|
|||
|
Not at all, but it would blow up the size of what's needed. However, I think that an XML parser written in Lex could interpret the information from Wikipedia (including formatting) and show it just as it is.
|
|
|||
|
Quote:
Looking at a web browser might be a good idea, as many of the problems will be similar. I would think most existing browsers are so complex that stripping them down would be impractical. HTML is a lot more complicated than the simple markup in most Wikipedia articles. I'm not really sure what the advantages of using XML over the existing wiki markup are. It looks like it would increase the size of the files and make them harder to parse. |
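To give a feel for how little is needed to handle links in the existing wiki markup, here is a minimal Python sketch that could run on the PC side. It assumes the standard MediaWiki internal link forms `[[Target]]` and `[[Target|label]]`; the name `split_links` is made up for illustration, and nested constructs such as image links with captions are not handled.

```python
import re

# Matches [[Target]] and [[Target|label]]; nested brackets are not supported.
LINK = re.compile(r"\[\[([^\[\]|]+)(?:\|([^\[\]]*))?\]\]")

def split_links(text):
    """Return (plain_text, targets).

    Link markup is replaced by its visible label, and the link targets
    (the names of the other articles) are collected in order.
    """
    targets = []

    def repl(match):
        target, label = match.group(1), match.group(2)
        targets.append(target.strip())
        # Fall back to the target when no label is given.
        return label if label else target

    return LINK.sub(repl, text), targets
```

A viewer could either show only the plain text or keep the targets around so that a link can be followed to another article file.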
|
|||
|
Hello,
Has anyone looked between the <page></page> tags of our 4 GB expanded download yet? The hyperlinks embedded in the text are not HTML tags; they are pointers to other document names. There are 5394066 lines of data. I'm sure you won't be able to get inside this with a generic open(FILE_HANDLE), and you will probably have to read directly from the disk in nice blocks.
Regards,
A
PS: Nice logo |
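Reading the dump in blocks could look roughly like the Python sketch below, meant for the PC side rather than the player. The function name `iter_pages` and the 1 MiB default block size are assumptions for illustration. It pulls out the raw bytes between each `<page>` and `</page>` pair without loading the whole file, and it silently drops a final page that never gets its closing tag.

```python
import io

def iter_pages(stream, block_size=1 << 20):
    """Yield the raw bytes between each <page> and </page> pair.

    `stream` is any binary file object. Only one block plus the current
    unfinished page is held in memory at a time.
    """
    open_tag, close_tag = b"<page>", b"</page>"
    buf = b""
    while True:
        block = stream.read(block_size)
        if not block:
            break  # end of file; an unterminated page in buf is discarded
        buf += block
        while True:
            start = buf.find(open_tag)
            if start == -1:
                # Keep only a short tail that might hold a split opening tag.
                buf = buf[-(len(open_tag) - 1):]
                break
            end = buf.find(close_tag, start + len(open_tag))
            if end == -1:
                # Page is not complete yet: drop what precedes it, read more.
                buf = buf[start:]
                break
            yield buf[start + len(open_tag):end]
            buf = buf[end + len(close_tag):]
```

Usage would be along the lines of `for page in iter_pages(open("dump.xml", "rb")): ...`, where `dump.xml` stands in for whatever the expanded file is called.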
|
|||
|
Great idea for distributing it, TheCowBox. I feel that during the parsing process we'll run into problems with its size, whichever XML parser we choose. If or when you get the file, you may need this to look at it:
http://www.movsd.com/download/pfe101i.zip (here is the calling page: http://www.movsd.com/tools.htm)
My usual stable of viewers and editors freaked out! Also, and possibly more importantly for XML parsers, the file does not appear to be a "well formed" document. In fact, the file looks truncated: it stops in mid-word. DrippyDonut, did you notice this? Just press Ctrl+End and you'll see what I mean.
Regards,
A |
|
|||
|
So, how's everything coming along? I appreciate all the work that you all are putting into this.
__________________
H340/1.29K/RockBox Experimental I proudly announce that I discovered the mysterious fruity smell of the H3xx plugs. Smell it. I know you want to. |
|
|||
|
Hmmm...
I dunno what to do. If it's that large when truncated, I'd hate to see how big it is normally. Should we start looking at compression schemes? Also, would it be possible to write a script to divide the files into the format I specified? I don't use *NIX anymore, but I have Cygwin, so I'll look into it. |
|
|||
|
Here is my idea (funny: I thought about this about a week before this thread was opened):
Use a PC to generate a stripped-down version of the Wikipedia XML file, if you want to. The only interesting parts are the <page></page> block and the <title></title> before that. An all-in-one file is possible (because fopen only returns a pointer to a file position; I think we can open a 2 GB file on Rockbox?). Next you need a table file: every record (in alphabetical order) with its fseek position (byte offset from the beginning of the data file), so we can seek to it. And voilà, we have our entry.
As for the layout of the Wikipedia articles: tables and headlines are the most important things. Only a small interpreter is needed, but I'm not really able to code it. |
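The data file plus table file idea can be tried out on the PC with a few lines of Python. This is only a sketch under assumptions: the names `build_index` and `lookup` are made up, the index is kept in memory rather than written to a table file, and each entry stores a length next to the byte offset (the post only mentions the offset) so the reader knows where an article ends.

```python
import bisect
import io

def build_index(articles, data_out):
    """Write all article bodies into one data file.

    Returns a list of (title, offset, length) tuples sorted by title,
    where offset is the byte position of the body in data_out.
    """
    index = []
    for title, text in sorted(articles.items()):
        body = text.encode("utf-8")
        index.append((title, data_out.tell(), len(body)))
        data_out.write(body)
    return index

def lookup(index, data_in, title):
    """Binary-search the sorted index, then seek straight to the article."""
    titles = [entry[0] for entry in index]  # rebuilt per call; fine for a sketch
    i = bisect.bisect_left(titles, title)
    if i == len(index) or index[i][0] != title:
        return None
    _, offset, length = index[i]
    data_in.seek(offset)
    return data_in.read(length).decode("utf-8")
```

On the player the same lookup would presumably be an fseek followed by a read, with the table file searched by bisection so that only a handful of index records need to be touched per query.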
|
|||
|
The big problem with a one-file system isn't the ability to open it (though that could be a factor); it's speed. And besides, it's a lot bigger than two gigs, as we've seen. I think that the iPod people are using a compression scheme. Or maybe if you strip it down enough, you might get it to their alleged size. Either way, start small and work your way up.
|
|
|||
|
Hi,
I'm new to this forum and joined because of this thread. I thought about having Wikipedia on my iRiver last year, and downloaded the XML Wikipedia output file pages-articles.xml.bz2. I then re-imported this into a MySQL database using MediaWiki 1.5. Then, using the iPod Linux script with a few modifications, I started getting the HTML output from the version of Wikipedia now on my computer. This script uses the *nix utility html2text to strip out all HTML tags (including links). I had to stop, as my version of MySQL was too old (and causing problems).
What I was going to do after that was write a shell script to sort the files into folders depending on the title, with a maximum depth of, say, 4-8 directories:
For Apples: A\P\P\L\Apples.txt
For Bus: B\U\S\Bus.txt
This would ensure that it would be quite quick to navigate to an article in the file view, and it would be simple to write a plugin that follows the same pattern and provides a very simple search facility. An index could be created which allows you to search for titles more comprehensively.
I think I'll try to update MySQL and try this again. I'll post if I get any further.
Thanks,
Connell (inspired by Douglas Adams to do this as well) |
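The folder scheme from the two examples can be written down as a small Python function. This is a sketch, not the shell script mentioned in the post: the name `article_path` is made up, skipping non-alphanumeric characters when picking the folder letters is an assumption, and titles containing characters that are not valid in file names would still need escaping.

```python
from pathlib import PureWindowsPath

def article_path(title, max_depth=4):
    """Map an article title to a nested path, one folder per leading letter.

    Up to max_depth leading alphanumeric characters of the title become
    single-letter, upper-case directories; the file keeps the full title.
    """
    letters = [c.upper() for c in title if c.isalnum()][:max_depth]
    return PureWindowsPath(*letters, f"{title}.txt")
```

With the default depth of 4, `article_path("Apples")` gives `A\P\P\L\Apples.txt` and `article_path("Bus")` gives `B\U\S\Bus.txt`, matching the examples in the post.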
|
|||
|
|||
|
Well, since it is on sourceforge and written for iPodLinux, it should be quite useful, right?
__________________
“It is a fool's prerogative to utter truths that no one else will speak.” Proud owner of a H340 + LCD Remote http://www.nexxyz.tk |