Word wrap or line wrap is the feature, supported by most text editors, word processors, and web browsers, of automatically replacing some of the blank spaces between words with line breaks, such that each line fits in the viewable window, allowing text to be read from top to bottom without any horizontal scrolling. It is usually done on the fly when viewing or printing a document, so no line break code is manually entered or stored. If the user changes the margins, the editor will either automatically reposition the line breaks to ensure that all the text will "flow" within the margins and remain visible, or provide the typist some convenient way to reposition the line breaks. Compare soft return and hard return.
Word boundaries, hyphenation, and hard spaces

Soft returns are usually placed after the end of complete words, or after the punctuation that follows complete words. However, word wrap may also occur following a hyphen. Word wrap following a hyphen is sometimes not desired, and can be avoided by using a non-breaking hyphen instead of a regular hyphen. Conversely, word processors allow invisible hyphens, called soft hyphens, to be inserted inside words so that word wrap can occur at those points.

Sometimes word wrap is not desirable between words. In such cases it can usually be avoided by using a hard space, also called a non-breaking space, between the words instead of a regular space.

Word wrapping in text containing Chinese, Japanese, and Korean

In Chinese, Japanese, and Korean, each Han character is normally considered a word, and therefore word wrapping can usually occur before and after any Han character. Under certain circumstances, however, word wrapping is not desired.
Most existing word processors and typesetting software cannot handle such circumstances. CJK punctuation may or may not follow similar rules; such rules are usually referred to by the Japanese term kinsoku shori (禁則処理, literally “prohibition rule handling”).

A special case of kinsoku shori, however, always applies: line wrap must never occur inside the CJK dash or ellipsis. Even though each of these punctuation marks must be represented by two characters due to a limitation of all existing character encodings, each is intrinsically a single punctuation mark that is two ems wide, not two one-em-wide punctuation marks.

Algorithm

Greedy algorithm

The naive way to solve the problem is to use a greedy algorithm that puts as many words on a line as possible, then moves on to the next line and does the same until there are no more words left to place. This method is used by many modern word processors, such as Microsoft Word and OpenOffice.org. The following pseudocode implements this algorithm:
    SpaceLeft := LineWidth
    for each Word in Text
        if Width(Word) > SpaceLeft
            insert line break before Word in Text
            // Word starts the new line; also reserve the space that would follow it
            SpaceLeft := LineWidth - (Width(Word) + SpaceWidth)
        else
            SpaceLeft := SpaceLeft - (Width(Word) + SpaceWidth)
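The greedy pseudocode above can be turned into a short runnable sketch. The version below is only an illustration, not the code of any particular word processor: it assumes a fixed-width font in which every character and every space has width 1, and the function name greedy_wrap is a hypothetical name chosen for this example.

```python
def greedy_wrap(text, line_width):
    """Greedily wrap text, assuming a fixed-width font (width 1 per character)."""
    lines = []
    current = []       # words placed on the line being built
    current_len = 0    # width of that line, counting one space between words
    for word in text.split():
        # Width of the line if this word were appended to it.
        if current:
            needed = current_len + 1 + len(word)
        else:
            needed = len(word)
        if current and needed > line_width:
            # The word does not fit: close the line and start a new one.
            lines.append(" ".join(current))
            current = [word]
            current_len = len(word)
        else:
            current.append(word)
            current_len = needed
    if current:
        lines.append(" ".join(current))
    return lines


print(greedy_wrap("aaa bb cc ddddd", 6))
```

In this sketch a word wider than the line is simply placed on a line of its own and allowed to overflow; a real implementation would need to hyphenate or otherwise break such a word.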
In the pseudocode, LineWidth is the width of a line, SpaceLeft is the width still available on the current line, SpaceWidth is the width of a single space character, Text is the input text, and Word is a word in that text.

Optimal solution

TeX uses a different line-breaking algorithm that considers the paragraph as a whole, breaking it into lines in a way that is often considered more aesthetically pleasing than the result of the greedy algorithm. (TeX also uses a hyphenation algorithm to break words across lines.) While the greedy algorithm is often adequate, it does not give the optimal solution if the goal is to make the space remaining at the end of each line as small as possible. Consider the following text:

    aaa bb cc ddddd

If the cost function of a line is defined as the remaining space squared, the greedy algorithm yields a sub-optimal solution (for simplicity, consider a fixed-width font):

    ------    Line width: 6
    aaa bb    Remaining space: 0 (cost = 0 squared = 0)
    cc        Remaining space: 4 (cost = 4 squared = 16)
    ddddd     Remaining space: 1 (cost = 1 squared = 1)

This sums to a total cost of 17, while the optimal solution looks like this:

    ------    Line width: 6
    aaa       Remaining space: 3 (cost = 3 squared = 9)
    bb cc     Remaining space: 1 (cost = 1 squared = 1)
    ddddd     Remaining space: 1 (cost = 1 squared = 1)

The difference here is that the first line is broken before bb instead of after it, which gives a total cost of 11.

To solve the problem we need to define a cost function c(i,j) that computes the cost of a line consisting of the words Word[i] to Word[j] from the text:

$$c(i,j) = \left(\mathrm{LineWidth} - (j-i)\cdot\mathrm{SpaceWidth} - \sum_{k=i}^{j} \mathrm{Width}(\mathrm{Word}[k])\right)^{P}$$

where P is typically 2 or 3. There are some special cases to consider: if the remaining space (the quantity in parentheses) is negative, that is, the sequence of words cannot fit on a line, the cost needs to reflect the cost of tracking or condensing the text to fit; if that is not possible, it needs to return infinity.

The cost of the optimal solution can be defined as a recurrence:

$$f(j) = \min\Bigl(c(1,j),\ \min_{1 \le k < j}\bigl(f(k) + c(k+1,j)\bigr)\Bigr)$$

Here f(j) denotes the minimum total cost of breaking the first j words into lines, so the optimal cost for a text of n words is f(n).

Computation can be greatly optimized using dynamic programming. In terms of implementation, the computation of c(i,j) is unnecessary when the words Word[i] to Word[k] already overflow the line for some k < j; c(i,j) will be infinite anyway.
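The dynamic-programming approach described above can be sketched as follows. This is a minimal illustration under stated assumptions, not TeX's actual algorithm: it assumes a fixed-width font (every character and space has width 1), takes the cost of a line to be its remaining space raised to the power P, treats a line that overflows as having infinite cost, and does no hyphenation. The names optimal_wrap, best and start are hypothetical names chosen for this example.

```python
import math


def optimal_wrap(words, line_width, power=2):
    """Return (total_cost, lines) minimizing the sum of per-line costs."""
    n = len(words)

    def cost(i, j):
        # Cost of one line holding words[i:j] (0-based, j exclusive).
        used = sum(len(w) for w in words[i:j]) + (j - i - 1)
        remaining = line_width - used
        return math.inf if remaining < 0 else remaining ** power

    best = [0] + [math.inf] * n   # best[j]: minimum cost of the first j words
    start = [0] * (n + 1)         # start[j]: first word of the last line in that solution
    for j in range(1, n + 1):
        # Try every possible last line words[i:j], shortest first.
        for i in range(j - 1, -1, -1):
            c = cost(i, j)
            if c == math.inf:
                break             # adding earlier words only makes the line wider
            if best[i] + c < best[j]:
                best[j] = best[i] + c
                start[j] = i

    if best[n] == math.inf:
        raise ValueError("some word is wider than the line")

    # Walk the recorded break points backwards to recover the lines.
    lines = []
    j = n
    while j > 0:
        i = start[j]
        lines.append(" ".join(words[i:j]))
        j = i
    lines.reverse()
    return best[n], lines


print(optimal_wrap("aaa bb cc ddddd".split(), 6))
```

The inner loop stops as soon as a candidate line overflows, which is the pruning mentioned in the text: once a run of words is too wide, every longer run ending at the same word is too wide as well. Like the worked example, this sketch charges the last line for its unused space; many practical implementations exempt the last line instead.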
This article is from Wikipedia. All text is available under the terms of the GNU Free Documentation License.