Hi all,

I tried to implement a little search engine. Python is not a language I know thoroughly, and I've never written a search engine in any language. That's why I would like to ask (after having read as many Python documents as I could) whether the following sketch is reasonably sound or whether it contains basic mistakes. (The program runs, but it should be as fast as possible! Hm. Not exactly a good job for an outsider.)

Problem: 738 files with Latin text. Every file represents a single source. Find every match of a term. (I need e.g. structura, structuram, structurae etc.)

To do: Enter a query in the form of a regex and look for all occurrences. If there is a match:
(a) if it's the first match in this source, give the contents of the file (in the canonical form used by those who prepared the files, this is always lines 10-15) and then
(aa) give every match with (user-defined) x words before the match, the matched word, and y words after the match;
(b) if there was already a match in this source, do only (aa).

The program:
1. Enter the query and x and y. The query is then compiled into the regex object p.
2. Read every file in (something like fp.readlines()).
3. Transform the file to a string (string.join(list_of_file)).
4. m = p.findall(string). If m is not None then:
   a. Give the file contents (lines 10-15).
   b. Look for the occurrence of every word in m and note the position (with string.find).
   c. Give x words before the matched word, the matched word, and then y words after.
...

The main problem for me is: do I understand correctly how p.findall(string) works in combination with string.find?

Many thanks in advance,
Max
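
A note on steps 4a-4c, offered as a minimal sketch and not a definitive answer: p.findall returns a list of the matched strings (an empty list, never None, when nothing matches) and carries no positions, while string.find reports only the first occurrence from a given start offset, so a form that occurs twice would need extra bookkeeping. The sketch below uses p.finditer instead, whose match objects carry their own offsets. The names contexts and search_source, the whitespace-based word splitting, and the assumption that each match lies within a single word are illustrative choices, not part of the original program.

```python
import re
from bisect import bisect_right


def contexts(text, pattern, before, after):
    """Return (left_words, matched_word, right_words) for every regex match.

    Assumes words are whitespace-delimited and each match starts inside a word.
    """
    # Record every word together with its start offset in the text.
    words = [(m.start(), m.group()) for m in re.finditer(r"\S+", text)]
    starts = [start for start, _ in words]
    results = []
    for m in pattern.finditer(text):
        # Index of the last word that starts at or before the match start.
        i = bisect_right(starts, m.start()) - 1
        if i < 0:
            continue  # match begins before the first word (e.g. in leading whitespace)
        left = [w for _, w in words[max(0, i - before):i]]
        right = [w for _, w in words[i + 1:i + 1 + after]]
        results.append((left, words[i][1], right))
    return results


def search_source(lines, pattern, before, after):
    """Return (header, hits) for one source, or None if nothing matches.

    `lines` is the list from fp.readlines(); the header is lines 10-15
    (1-based), as described in the post.
    """
    hits = contexts("".join(lines), pattern, before, after)
    if not hits:
        return None
    # The header is returned once per source; every hit follows it.
    return lines[9:15], hits


# Hypothetical sample text, for illustration only.
sample = "de structura templi et de structuram murorum atque structurae novae"
p = re.compile(r"\bstructur\w*")
for left, word, right in contexts(sample, p, 1, 2):
    print(" ".join(left), "[" + word + "]", " ".join(right))
```

Because the text is split into words only once per file and each match is located by binary search, the cost per match stays small even when a term occurs many times in one source.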