Please help translate a passage of computer-science English

Source: Baidu Wenku  Editor: 高校问答  Date: 2024/04/29 11:41:20
8.5.2 Text Retrieval
A basic reason to convert paper documents into electronic files is to improve access to the information. After a set of documents has been scanned and the text recognized and converted into ASCII, what's next? Indexing and the creation of a fulltext retrieval database is one possibility.
What is a fulltext database, and how is one used? As the name implies, a fulltext database allows you to search the entire text of a document. Every word is indexed for rapid retrieval. Often, the index takes up as much space as the text. Storage media like CD-ROMs, with over 600 MB of available space, are perfect for these types of databases. However, this involves the classic trade-off between speed and space.
To build a fulltext database, you go through the following steps:
1. Assemble all text into a common area, such as a single directory.
2. Identify and possibly mark up, in any required format, the headings, sections, and subsections that provide a hierarchical structure to the document. (Typically, this is used for the user interface of the text retrieval engine.)
3. Identify the "kill list" words you do not wish to search, such as "the," "and," and so on.
4. Run the database builder software to create the indexes and generate the user interface for the particular set of data.
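The steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not how any particular database builder works: the names `KILL_LIST`, `tokenize`, and `build_index` are hypothetical, the documents are held in an in-memory dict rather than read from a directory, and step 2 (marking up headings) is left out.

```python
import re
from collections import defaultdict

# Hypothetical kill list (step 3): words that are not worth indexing.
KILL_LIST = {"the", "and", "a", "of", "to", "is"}

def tokenize(text):
    """Split text into lowercase words."""
    return re.findall(r"[a-z0-9]+", text.lower())

def build_index(documents):
    """Map each indexed word to {document name: [word positions]} (step 4)."""
    index = defaultdict(lambda: defaultdict(list))
    for name, text in documents.items():
        for position, word in enumerate(tokenize(text)):
            if word not in KILL_LIST:
                index[word][name].append(position)
    return index

# Step 1: all text assembled in one place (a dict stands in for a directory).
documents = {
    "scan1.txt": "The scanned text is converted to ASCII",
    "scan2.txt": "Every word of the text is indexed",
}
index = build_index(documents)
print(sorted(index["text"]))  # names of the documents that contain "text"
print("the" in index)         # kill-list words are never indexed
```

Because the sketch stores a position for every indexed word, it also suggests why an index can grow to roughly the size of the text it covers.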
In a typical case, the textual information that originated with a set of documents is processed by the database builder software to produce a searchable database. A user interface, or run-time system (as opposed to the builder system), is used to search through the text in a variety of ways. The searching flexibility is an important characteristic of fulltext retrieval systems. (Please see the section "Text Retrieval" in the appendix "Resources" for some references to these products.)
Textual searching can take many forms. The complexity of searching can range from a simple word search to Boolean queries with proximity distance and regular expressions. OK, I'll explain that obnoxious jargon.
A proximity search is a way of specifying words that you want to locate that are not necessarily next to each other. They would have to be within some specified distance of each other. Distance is described as a stated number of words.
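A proximity test of this kind can be sketched as follows. It is a minimal illustration, not the algorithm of any specific retrieval product; the function name `within_distance` and the choice to measure distance as the difference between word positions are assumptions.

```python
import re

def within_distance(text, first, second, max_distance):
    """Return True if `first` and `second` occur within `max_distance` words of each other."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    first_positions = [i for i, w in enumerate(words) if w == first.lower()]
    second_positions = [i for i, w in enumerate(words) if w == second.lower()]
    # Compare every occurrence of one word with every occurrence of the other.
    return any(abs(i - j) <= max_distance
               for i in first_positions
               for j in second_positions)

sentence = "The index takes up as much space as the text"
print(within_distance(sentence, "index", "space", 5))  # the two words are 5 positions apart
print(within_distance(sentence, "index", "text", 3))   # the two words are 8 positions apart
```

A real retrieval engine would more likely read the positions from its index than rescan the text on every query.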
Regular expressions are a formal way of using a pattern to represent many letters. You are probably already familiar with the concept of wildcards for file names. For example, in DOS, when you ask to list all file names that start with the letter F, you type the command:
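By the time I saw the translation from "紫羽漫天" I had already translated the passage myself (mine is fairly rough, of course). This was our homework, and I have already handed it in. Still, thank you. To show my gratitude, I'll just give you all of my remaining points :)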

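A basic reason for converting paper documents into electronic files is to make the information easier to access. Once a set of documents has been scanned and the text recognized and converted to ASCII, what comes next? Indexing and building a fulltext retrieval database is one option.
What is a fulltext database, and how is it used? As the name suggests, a fulltext database lets you search all of the text in a document. Every word is indexed so that it can be retrieved quickly. The index often takes up about as much space as the text itself. Storage media such as CD-ROMs, with more than 600 MB of available space, are ideal for this kind of database. Even so, this comes down to the classic trade-off between speed and space.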
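To build a fulltext database, you complete the following steps:
1. Gather all the text into one common area, such as a single directory.
2. Identify, and if necessary mark up in the required format, the headings, sections, and subsections that give the document its hierarchical structure. (Typically, this is used for the user interface of the text retrieval engine.)
3. Identify the words you do not want to search for, such as "the," "and," and so on.
4. Run the database builder software to create the indexes and generate the user interface for this particular set of data.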
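In a typical case, the textual information taken from a set of documents is processed by the database builder software and turned into a searchable database. A user interface, or run-time system (as distinct from the builder system), is then used to search the text in various ways. This flexibility in searching is an important characteristic of fulltext retrieval systems. (See the section "Text Retrieval" in the appendix "Resources" for references to these products.)
Text searching can take many forms. Its complexity ranges from a simple word search to Boolean queries with proximity distance and regular expressions. All right, I will explain this annoying jargon.
A proximity search is a way of searching for words that do not have to be right next to each other. They only have to lie within a specified distance of one another, where the distance is given as a number of words.
A regular expression is a formal way of using a pattern to stand for many characters. You are probably already familiar with the idea of wildcards in file names. For example, in DOS, when you want to list all file names that begin with the letter F, you type the command: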
Man, your 20 points were really hard to earn!! ^-^