Information
'''Information''' as a [[Conveyed concept|concept]] has a diversity of meanings, from everyday usage to technical settings. Generally speaking, the concept of information is closely related to notions of [[constraint]], [[communication]], [[control system|control]], [[data]], [[form]], [[instruction]], [[knowledge]], [[Meaning (linguistics)|meaning]], [[stimulation|mental stimulus]], [[pattern]], [[perception]], and [[knowledge representation|representation]]. Many people speak about the [[Information Age]] as the advent of the Knowledge Age or [[knowledge society]], the [[information society]], the [[Information revolution]], and [[Information technology|information technologies]], and even though [[informatics]], [[information science]] and [[computer science]] are often in the spotlight, the word "information" is often used without careful consideration of the various meanings it has acquired.

== Etymology ==

According to the [[Oxford English Dictionary]], the earliest historical meaning of the word ''information'' in [[English language|English]] was the act of ''informing'', or giving form or shape to the mind, as in education, instruction, or training. A quote from 1387: "Five books come down from heaven for information of mankind." It was also used for an ''item'' of training, ''e.g.'' a particular instruction: "Melibee had heard the great skills and reasons of Dame Prudence, and her wise information and techniques." (1386)

The English word was apparently derived by adding the common "noun of action" ending "''-ation''" (descended through French from Latin "''-tio''") to the earlier verb ''to inform'', in the sense of to give form to the mind, to discipline, instruct, teach: "Men so wise should go and inform their kings." (1330) ''Inform'' itself comes (via French) from the Latin verb ''informare'', to give form to, to form an idea of. Furthermore, Latin itself already contained the word ''informatio'', meaning concept or idea, but the extent to which this may have influenced the development of the word ''information'' in English is unclear.

As a final note, the ancient Greek word for ''form'' was [[eidos]], and this word was famously used in a technical philosophical sense by [[Plato]] (and later Aristotle) to denote the ideal identity or essence of something (see [[Theory of forms]]). "Eidos" can also be associated with [[thought]], [[proposition]] or even [[concept]].

== Information as a message ==

'''Information''' is the state of a system of interest. A message is the information materialized.
Information is a quality of a [[message]] from a [[sender]] to one or more receivers. Information is always ''about'' something (the size of a parameter, the occurrence of an event, etc.). Viewed in this manner, information does not have to be accurate. It may be a truth or a lie, or just the sound of a falling tree. Even a disruptive noise used to inhibit the flow of communication and create misunderstanding would in this view be a form of information. However, generally speaking, if the ''amount'' of information in the received message increases, the message is more accurate.

This model assumes there is a definite [[sender]] and at least one receiver. Many refinements of the model assume the existence of a common language understood by the sender and at least one of the receivers. An important variation identifies information as that which would be communicated by a message if it were sent from a sender to a receiver capable of understanding the message. Notably, it is not required that the sender be capable of understanding the message, or even cognizant that there is a message. Thus, information is something that can be extracted from an environment, e.g., through observation, reading or measurement.

Information is a term with many meanings depending on context, but it is as a rule closely related to such concepts as meaning, knowledge, instruction, communication, representation, and mental stimulus. Simply stated, information is a message received and understood. In terms of data, it can be defined as a collection of facts from which conclusions may be drawn. There are many other aspects of information, since it is also the knowledge acquired through study, experience or instruction. But overall, information is the result of processing, manipulating and organizing data in a way that adds to the knowledge of the person receiving it.

[[Communication theory]] provides a numerical measure of the uncertainty of an outcome. For example, we can say that "the signal contained thousands of bits of information". Communication theory tends to use the concept of [[information entropy]], generally attributed to [[C.E. Shannon]] (see below).

Another form of information is [[Fisher information]], a concept of [[R.A. Fisher]]. This is used in the application of statistics to [[estimation theory]] and to science in general. Fisher information is thought of as the amount of information that a message carries about an unobservable parameter. It can be computed from knowledge of the [[likelihood function]] defining the system. For example, with a normal likelihood function, the Fisher information is the reciprocal of the variance of the law. In the absence of knowledge of the likelihood law, the Fisher information may be computed from normally distributed score data as the reciprocal of their second moment.
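As a brief sketch in standard notation (not taken from this article): for a likelihood <math>f(x;\theta)</math> the Fisher information is

:<math>\mathcal{I}(\theta) = \operatorname{E}\!\left[\left(\frac{\partial}{\partial\theta}\log f(X;\theta)\right)^{2}\right],</math>

and for a normal likelihood with known variance <math>\sigma^2</math> this gives <math>\mathcal{I}(\mu) = 1/\sigma^2</math>, the reciprocal of the variance mentioned above.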
Even though information and data are often used interchangeably, they are actually very different. Data is a set of unrelated information, and as such is of no use until it is properly evaluated. Upon evaluation, once some significant relation between the data is established and some relevance is shown, the data are converted into information. The same data can then be used for different purposes. Thus, until data convey some information, they are not useful.

=== Measuring information entropy ===

The view of information as a message came into prominence with the publication in 1948 of an influential paper by [[Claude Shannon]], "[[A Mathematical Theory of Communication]]." This paper provides the foundations of [[information theory]] and endows the word ''information'' not only with a technical meaning but also a measure. If the sending device is equally likely to send any one of a set of N messages, then the preferred measure of "the information produced when one message is chosen from the set" is the base two [[logarithm]] of N (this measure is called ''[[self-information]]''). In this paper, Shannon continues:

A complementary way of measuring information is provided by [[algorithmic information theory]]. In brief, this measures the information content of a list of symbols based on how predictable they are, or more specifically how easy it is to compute the list through a [[computer program|program]]: the information content of a sequence is the number of bits of the shortest program that computes it. The sequence below would have a very low algorithmic information measurement since it is a very predictable pattern, and as the pattern continues the measurement would not change. Shannon information would give the same information measurement for each symbol, since they are [[statistical randomness|statistically random]], and each new symbol would increase the measurement.

:123456789101112131415161718192021
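The contrast between the two measures can be made concrete with a small sketch. Shannon entropy is estimated below from symbol frequencies, while the (uncomputable) algorithmic information content is approximated by the size of a compressed encoding; the use of zlib as the compressor and the particular sequences are assumptions made only for illustration.

<source lang="python">
import math
import random
import zlib
from collections import Counter

def entropy_bits_per_symbol(text):
    """Empirical Shannon entropy (bits per symbol) estimated from symbol frequencies."""
    counts = Counter(text)
    n = len(text)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

def compressed_bits(text):
    """Rough proxy for algorithmic information content: size of a zlib-compressed encoding."""
    return 8 * len(zlib.compress(text.encode("ascii"), 9))

predictable = "".join(str(i) for i in range(1, 500))   # "123456789101112..." continued
random_text = "".join(random.choice("0123456789") for _ in range(len(predictable)))

for name, text in [("predictable", predictable), ("random", random_text)]:
    print(f"{name:11s} length={len(text)} "
          f"entropy={entropy_bits_per_symbol(text):.2f} bits/symbol "
          f"compressed={compressed_bits(text)} bits")
</source>

Running this prints the per-symbol entropy estimate and the compressed size for both sequences; the compressed size plays the role of the algorithmic measurement, with the caveat that a general-purpose compressor only gives a crude upper bound on the length of the shortest program.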
It is important to recognize the limitations of traditional information theory and algorithmic information theory from the perspective of human meaning. For example, when referring to the meaning content of a message, Shannon noted "Frequently the messages have ''meaning…'' these semantic aspects of communication are irrelevant to the engineering problem. The significant aspect is that the actual message is one selected ''from a set of possible messages''" (emphasis in original). In information theory signals are part of a process, not a substance; they do something, they do not contain any specific meaning. Combining algorithmic information theory and information theory we can conclude that the most random signal contains the most information, as it can be interpreted in any way and cannot be compressed.

Michael Reddy noted that "'signals' of the [[mathematical theory]] are 'patterns that can be exchanged'. There is no message contained in the signal, the signals convey the ability to select from a set of possible messages." In information theory "the system must be designed to operate for each possible selection, not just the one which will actually be chosen since this is unknown at the time of design".

== Information as a pattern ==

Information is any represented [[pattern]]. This view assumes neither accuracy nor directly communicating parties, but instead assumes a separation between an object and its representation. Consider the following example: [[economic statistics]] represent an [[Economics|economy]], however inaccurately. What are commonly referred to as data in [[computing]], [[statistics]], and other fields, are forms of information in this sense. The [[electromagnetism|electro-magnetic]] patterns in a [[computer network]] and connected [[peripheral device|device]]s are related to something other than the pattern itself, such as [[Character (computing)|text characters]] to be displayed and [[Computer keyboard|keyboard]] input. [[Signal (information theory)|Signal]]s, [[Sign (linguistics)|sign]]s, and [[symbol]]s are also in this category. On the other hand, according to [[semiotics]], data are symbols with a certain syntax, and information is data with a certain semantics. [[Painting]] and [[drawing]] contain information to the extent that they represent something such as an assortment of objects on a table, a [[profile]], or a [[landscape]]. In other words, when a pattern of something is transposed to a pattern of something else, the latter is information.
This would be the case whether or not there was anyone to perceive it. But if information can be defined merely as a pattern, does that mean that neither [[utility]] nor meaning are necessary components of information? Arguably a distinction must be made between raw unprocessed data and information which possesses utility, [[value (economics)|value]] or some quantum of meaning. On this view, information may indeed be characterized as a pattern; but this is a [[necessary]] condition, not a [[sufficient]] one. An individual entry in a telephone book, which follows a specific pattern formed by name, address and telephone number, does not become "informative" in some sense unless and until it possesses some degree of utility, value or meaning: for example, someone might look up a girlfriend's number or order a takeaway. The vast majority of numbers will never be construed as "information" in any meaningful sense. The gap between data and information is only closed by a behavioral bridge whereby some value, utility or meaning is added to transform mere data or pattern into information.

When one constructs a representation of an object, one can selectively extract from the object ([[sampling (case studies)|sampling]]) or use a [[system]] of signs to replace it ([[encode|encoding]]), or both. The sampling and encoding result in representation. An example of the former is a "sample" of a product; an example of the latter is a "verbal description" of a product. Both contain information about the product, however inaccurate. When one interprets a representation, one can predict a broader pattern from a limited number of observations (inference) or understand the relation between patterns of two different things ([[decode|decoding]]). One example of the former is sipping [[soup]] to tell whether it is spoiled; an example of the latter is examining footprints to determine the animal and its condition. In both cases, information sources are not constructed or presented by some "sender" of information.

Regardless, information is dependent upon, but usually unrelated to and separate from, the medium or media used to express it. In other words, the position of a theoretical series of bits, or even the output once interpreted by a [[computer]] or similar device, is unimportant, except when someone or something is present to interpret the information. Therefore, a quantity of information is totally distinct from its medium.
== Information as sensory input ==

Often information is viewed as a type of [[input]] to an [[organism]] or designed device. Inputs are of two kinds. Some inputs are important to the function of the organism (for example, food) or device ([[energy]]) by themselves. In his book ''Sensory Ecology'', Dusenbery called these causal inputs. Other inputs (information) are important only because they are associated with causal inputs and can be used to predict the occurrence of a causal input at a later time (and perhaps another place). Some information is important because of its association with other information, but eventually there must be a connection to a causal input. In practice, information is usually carried by weak stimuli that must be detected by specialized sensory systems and amplified by energy inputs before they can be functional to the organism or device. For example, light is often a causal input to plants but provides information to animals. The colored light reflected from a flower is too weak to do much photosynthetic work, but the visual system of the bee detects it and the bee's nervous system uses the information to guide the bee to the flower, where the bee often finds nectar or pollen, which are causal inputs serving a nutritional function.

Information is any type of sensory input. When an organism with a [[nervous system]] receives an input, it transforms the input into an electrical signal. This is regarded as information by some. The idea of representation is still relevant, but in a slightly different manner. That is, while [[abstract painting]] does not represent anything concretely, when the viewer sees the painting, it is nevertheless transformed into electrical signals that create a representation of the painting. Defined this way, information does not have to be related to truth, communication, or representation of an object. [[Entertainment]] in general is not intended to be informative. [[Music]], the [[performing arts]], [[amusement park]]s, works of [[fiction]] and so on are thus forms of information in this sense, but they are not necessarily forms of information according to some definitions given above. Consider another example: food supplies both nutrition and taste for those who eat it. If information is equated to sensory input, then nutrition is not information but taste is.
== Information as an influence which leads to a transformation ==

Information is any type of pattern that influences the formation or transformation of other patterns. In this sense, there is no need for a conscious mind to perceive, much less appreciate, the pattern. Consider, for example, [[DNA]]. The sequence of [[nucleotide]]s is a pattern that influences the formation and development of an organism without any need for a conscious mind. [[Systems theory]] at times seems to refer to information in this sense, assuming information does not necessarily involve any conscious mind, and patterns circulating (due to [[feedback]]) in the system can be called information. In other words, it can be said that information in this sense is something potentially perceived as representation, though not created or presented for that purpose.

When [[Marshall McLuhan]] speaks of [[media (communication)|media]] and their effects on human cultures, he refers to the structure of [[cultural artifact|artifacts]] that in turn shape our behaviors and mindsets. Also, [[pheromone]]s are often said to be "information" in this sense. (See also [[Gregory Bateson]].)

== Information as a property in physics ==
In 2003, J. D. Bekenstein claimed there is a growing trend in [[physics]] to define the physical world as being made of information itself (and thus information is defined in this way). Information has a well-defined meaning in physics. Examples of this include the phenomenon of [[quantum entanglement]], where particles can interact without reference to their separation or the speed of light. Information itself cannot travel faster than light even if the information is transmitted indirectly. One consequence is that all attempts at physically observing a particle with an "entangled" relationship to another are slowed down, even though the particles are not connected in any way other than by the information they carry.

Another link is demonstrated by the [[Maxwell's demon]] thought experiment. In this experiment, a direct relationship between information and another physical property, [[entropy]], is demonstrated. A consequence is that it is impossible to destroy information without increasing the entropy of a system; in practical terms this often means generating heat. Another, more philosophical, outcome is that information could be thought of as interchangeable with [[Energy#Transformations_of_energy|energy]]. Thus, in the study of [[logic gates]], the theoretical lower bound of thermal energy released by an ''AND gate'' is higher than for the ''NOT gate'' (because information is destroyed in an ''AND gate'' and simply converted in a ''NOT gate''). Physical information is of particular importance in the theory of [[quantum computers]].

== Information as records ==

Records are a specialized form of information. Essentially, records are information produced consciously or as by-products of business activities or transactions and retained because of their value. Primarily their value is as evidence of the activities of the organization, but they may also be retained for their informational value. Sound [[records management]] ensures that the integrity of records is preserved for as long as they are required.

The international standard on records management, ISO 15489, defines records as "information created, received, and maintained as evidence and information by an organization or person, in pursuance of legal obligations or in the transaction of business". The International Council on Archives (ICA) Committee on Electronic Records defined a record as "a specific piece of recorded information generated, collected or received in the initiation, conduct or completion of an activity and that comprises sufficient content, context and structure to provide proof or evidence of that activity".
Records may be retained because of their business value, as part of the [[corporate memory]] of the organization or to meet legal, fiscal or accountability requirements imposed on the organization. Willis (2005) expressed the view that sound management of business records and information delivered "…six key requirements for good [[corporate governance]]…transparency; accountability; due process; compliance; meeting statutory and common law requirements; and security of personal and corporate information."

== Information and semiotics ==

Beynon-Davies explains the multi-faceted concept of information in terms of signs and sign-systems. Signs themselves can be considered in terms of four inter-dependent levels, layers or branches of [[semiotics]]: pragmatics, semantics, syntactics and empirics. These four layers serve to connect the social world on the one hand with the physical or technical world on the other.

[[Pragmatics]] is concerned with the purpose of communication. Pragmatics links the issue of signs with that of intention. The focus of pragmatics is on the intentions of human agents underlying communicative behaviour. In other words, intentions link language to action.

[[Semantics]] is concerned with the meaning of a message conveyed in a communicative act. Semantics considers the content of communication. Semantics is the study of the meaning of signs - the association between signs and behaviour. Semantics can be considered as the study of the link between symbols and their referents or concepts, particularly the way in which signs relate to human behaviour.

Syntactics is concerned with the formalism used to represent a message. Syntactics as an area studies the form of communication in terms of the logic and grammar of sign systems. Syntactics is devoted to the study of the form rather than the content of signs and sign-systems.

Empirics is the study of the signals used to carry a message; the physical characteristics of the medium of communication. Empirics is devoted to the study of communication channels and their characteristics, e.g., sound, light, electronic transmission etc.

Communication normally exists within the context of some social situation. The social situation sets the context for the intentions conveyed (pragmatics) and the form in which communication takes place.
In a communicative situation intentions are expressed through messages which comprise collections of inter-related signs taken from a language which is mutually understood by the agents involved in the communication. Mutual understanding implies that agents involved understand the chosen language in terms of its agreed syntax (syntactics) and semantics. The sender codes the message in the language and sends the message as signals along some communication channel (empirics). The chosen communication channel will have inherent properties which determine outcomes such as the speed with which communication can take place and over what distance.
Information extraction
In [[natural language processing]], '''information extraction''' (IE) is a type of [[information retrieval]] whose goal is to automatically extract structured information, i.e. categorized and contextually and semantically well-defined data from a certain domain, from unstructured [[machine-readable]] documents. An example of information extraction is the extraction of instances of corporate mergers, more formally MergerBetween(company_1, company_2, date), from an online news sentence such as: "Yesterday, New-York based Foo Inc. announced their acquisition of Bar Corp."

A broad goal of IE is to allow computation to be done on the previously unstructured data. A more specific goal is to allow logical reasoning to draw inferences based on the logical content of the input data. The significance of IE is determined by the growing amount of information available in unstructured (i.e. without [[metadata]]) form, for instance on the Internet. This knowledge can be made more accessible by means of transformation into [[relational database|relational form]], or by marking up with [[XML]] tags. An intelligent agent monitoring a news data feed requires IE to transform unstructured data into something that can be reasoned with. A typical application of IE is to scan a set of documents written in a [[natural language]] and populate a database with the information extracted.

Current approaches to IE use [[natural language processing]] techniques that focus on very restricted domains. For example, the ''[[Message Understanding Conference]]'' (MUC) is a competition-based conference that focused on the following domains in the past:

*MUC-1 (1987), MUC-2 (1989): Naval operations messages.
*MUC-3 (1991), MUC-4 (1992): Terrorism in Latin American countries.
*MUC-5 (1993): Joint ventures and microelectronics domain.
*MUC-6 (1995): News articles on management changes.
*MUC-7 (1998): Satellite launch reports.

Natural language texts may require some form of [[text simplification]] to create more easily machine-readable text from which to extract sentences.

Typical subtasks of IE are:

* [[Named Entity Recognition]]: recognition of entity names (for people and organizations), place names, temporal expressions, and certain types of numerical expressions.
* [[Coreference]]: identification of chains of [[noun phrase]]s that refer to the same object. For example, [[Anaphora (linguistics)|anaphora]] is a type of coreference.
* [[Terminology extraction]]: finding the relevant terms for a given [[text corpus|corpus]].
* Relation Extraction: identification of relations between entities, such as:
**PERSON works for ORGANIZATION (extracted from the sentence "Bill works for IBM.")
**PERSON located in LOCATION (extracted from the sentence "Bill is in France.")
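A minimal, purely illustrative sketch of pattern-based relation extraction for the two relations above is shown below. The regular expressions and the relation names are assumptions made for this example; real IE systems rely on full NLP pipelines (tokenization, named entity recognition, parsing) rather than surface patterns.

<source lang="python">
import re

# Hypothetical surface patterns for the two example relations.
PATTERNS = [
    (r"(?P<person>[A-Z][a-z]+) works for (?P<org>[A-Z][A-Za-z]+)", "WorksFor"),
    (r"(?P<person>[A-Z][a-z]+) is in (?P<loc>[A-Z][a-z]+)", "LocatedIn"),
]

def extract_relations(sentence):
    """Return (relation name, argument dict) pairs found by the surface patterns."""
    relations = []
    for pattern, name in PATTERNS:
        for match in re.finditer(pattern, sentence):
            relations.append((name, match.groupdict()))
    return relations

print(extract_relations("Bill works for IBM."))  # [('WorksFor', {'person': 'Bill', 'org': 'IBM'})]
print(extract_relations("Bill is in France."))   # [('LocatedIn', {'person': 'Bill', 'loc': 'France'})]
</source>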
Information retrieval
'''Information retrieval''' ('''IR''') is the science of searching for documents, for [[information]] within documents and for [[Metadata (computing)|metadata]] about documents, as well as that of searching [[relational database]]s and the [[World Wide Web]]. There is overlap in the usage of the terms data retrieval, [[document retrieval]], information retrieval, and [[text retrieval]], but each also has its own body of literature, theory, [[Praxis (process)|praxis]] and technologies. IR is [[interdisciplinary]], based on [[computer science]], [[mathematics]], [[library science]], [[information science]], [[information architecture]], [[cognitive psychology]], [[linguistics]], [[statistics]] and [[physics]].

Automated information retrieval systems are used to reduce what has been called "[[information overload]]". Many universities and [[public library|public libraries]] use IR systems to provide access to books, journals and other documents. Web [[Web search engine|search engine]]s are the most visible [[Information retrieval applications|IR applications]].

== History ==

The idea of using computers to search for relevant pieces of information was popularized in the article ''[[As We May Think]]'' by [[Vannevar Bush]] in 1945. The first implementations of information retrieval systems were introduced in the 1950s and 1960s. By 1990 several different techniques had been shown to perform well on small text corpora (several thousand documents). In 1992 the US Department of Defense, along with the [[National Institute of Standards and Technology]] (NIST), cosponsored the [[Text Retrieval Conference]] (TREC) as part of the TIPSTER text program. The aim of this was to support the information retrieval community by supplying the infrastructure needed for the evaluation of text retrieval methodologies on a very large text collection. This catalyzed research on methods that [[scalability|scale]] to huge corpora. The introduction of web [[Web search engine|search engine]]s has boosted the need for very large scale retrieval systems even further.

The use of digital methods for storing and retrieving information has led to the phenomenon of [[digital obsolescence]], where a digital resource ceases to be readable because the physical media, the reader required to read the media, the hardware, or the software that runs on it, is no longer available. The information is initially easier to retrieve than if it were on paper, but is then effectively lost.

=== Timeline ===
* 1890: Hollerith tabulating machines were used to analyze the US census ([[Herman Hollerith]]).
* 1945: [[Vannevar Bush]]'s ''[[As We May Think]]'' appeared in ''[[Atlantic Monthly]]''.
* Late 1940s: The US military confronted problems of indexing and retrieval of wartime scientific research documents captured from Germans.
* 1947: [[Hans Peter Luhn]] (research engineer at IBM since 1941) began work on a mechanized, punch card based system for searching chemical compounds.
* 1950: The term "information retrieval" may have been coined by [[Calvin Mooers]].
* 1950s: Growing concern in the US for a "science gap" with the USSR motivated, encouraged funding, and provided a backdrop for mechanized literature searching systems ([[Allen Kent]] et al.) and the invention of citation indexing ([[Eugene Garfield]]).
* 1955: Allen Kent joined [[Case Western Reserve University]], and eventually became associate director of the Center for Documentation and Communications Research. That same year, Kent and colleagues published a paper in American Documentation describing the precision and recall measures, as well as detailing a proposed "framework" for evaluating an IR system, which included statistical sampling methods for determining the number of relevant documents not retrieved.
* 1958: The International Conference on Scientific Information in Washington DC included consideration of IR systems as a solution to the problems identified. See: Proceedings of the International Conference on Scientific Information, 1958 (National Academy of Sciences, Washington, DC, 1959).
* 1959: Hans Peter Luhn published "Auto-encoding of documents for information retrieval."
* 1960: Melvin Earl (Bill) Maron and J. L. Kuhns published "On relevance, probabilistic indexing, and information retrieval" in Journal of the ACM 7(3):216-244, July 1960.
* Early 1960s: [[Gerard Salton]] began work on IR at Harvard, and later moved to Cornell.
* 1962: [[Cyril W. Cleverdon]] published early findings of the Cranfield studies, developing a model for IR system evaluation. See: Cyril W. Cleverdon, "Report on the Testing and Analysis of an Investigation into the Comparative Efficiency of Indexing Systems". Cranfield Coll. of Aeronautics, Cranfield, England, 1962.
* 1962: Kent published Information Analysis and Retrieval.
* 1963: The Weinberg report "Science, Government and Information" gave a full articulation of the idea of a "crisis of scientific information." The report was named after Dr. [[Alvin Weinberg]].
* 1963: [[Joseph Becker]] and [[Robert M. Hayes]] published a text on information retrieval: Becker, Joseph; Hayes, Robert Mayo. Information storage and retrieval: tools, elements, theories. New York, Wiley (1963).
* 1964: [[Karen Spärck Jones]] finished her thesis at Cambridge, ''Synonymy and Semantic Classification'', and continued work on [[computational linguistics]] as it applies to IR.
* 1964: The [[National Bureau of Standards]] sponsored a symposium titled "Statistical Association Methods for Mechanized Documentation." Several highly significant papers, including G. Salton's first published reference (we believe) to the SMART system.
* Mid-1960s: The National Library of Medicine developed [[MEDLARS]] (Medical Literature Analysis and Retrieval System), the first major machine-readable database and batch retrieval system.
* Mid-1960s: Project Intrex at MIT.
* 1965: [[J. C. R. Licklider]] published ''Libraries of the Future''.
* 1966: [[Don Swanson]] was involved in studies at the University of Chicago on Requirements for Future Catalogs.
* 1968: Gerard Salton published ''Automatic Information Organization and Retrieval''.
* 1968: [[J. W. Sammon]]'s RADC Tech report "Some Mathematics of Information Storage and Retrieval..." outlined the vector model.
* 1969: Sammon's "A nonlinear mapping for data structure analysis" (IEEE Transactions on Computers) was the first proposal for a visualization interface to an IR system.
* Late 1960s: [[F. W. Lancaster]] completed evaluation studies of the MEDLARS system and published the first edition of his text on information retrieval.
* Early 1970s: First online systems: NLM's AIM-TWX, MEDLINE; Lockheed's Dialog; SDC's ORBIT.
* Early 1970s: [[Theodor Nelson]], promoting the concept of [[hypertext]], published ''Computer Lib/Dream Machines''.
* 1971: [[N. Jardine]] and [[C. J. Van Rijsbergen]] published "The use of hierarchic clustering in information retrieval", which articulated the "cluster hypothesis." (Information Storage and Retrieval, 7(5), pp. 217-240, Dec 1971)
* 1975: Three highly influential publications by Salton fully articulated his vector processing framework and term discrimination model:
** ''A Theory of Indexing'' (Society for Industrial and Applied Mathematics)
** "A theory of term importance in automatic text analysis" (JASIS v. 26)
** "A vector space model for automatic indexing" (CACM 18:11)
* 1978: The first [[Association for Computing Machinery|ACM]] [[SIGIR]] conference.
* 1979: C. J. Van Rijsbergen published ''Information Retrieval'' (Butterworths), with heavy emphasis on probabilistic models.
* 1980: First international ACM SIGIR conference, joint with the British Computer Society IR group in Cambridge.
* 1982: [[Nicholas J. Belkin|Belkin]], Oddy, and Brooks proposed the ASK (Anomalous State of Knowledge) viewpoint for information retrieval. This was an important concept, though their automated analysis tool proved ultimately disappointing.
* 1983: Salton (and M. McGill) published ''Introduction to Modern Information Retrieval'' (McGraw-Hill), with heavy emphasis on vector models.
* Mid-1980s: Efforts to develop end-user versions of commercial IR systems.
* 1985-1993: Key papers on, and experimental systems for, visualization interfaces; work by [[D. B. Crouch]], [[Robert R. Korfhage]], [[M. Chalmers]], [[A. Spoerri]] and others.
* 1989: First [[World Wide Web]] proposals by [[Tim Berners-Lee]] at [[CERN]].
* 1992: First TREC conference.
* 1997: Publication of [[Robert R. Korfhage|Korfhage]]'s ''Information Storage and Retrieval'', with emphasis on visualization and multi-reference point systems.
* Late 1990s: Web [[Web search engine|search engine]] implementation of many features formerly found only in experimental IR systems.

== Overview ==

An information retrieval process begins when a user enters a query into the system. Queries are formal statements of [[information need]]s, for example search strings in web search engines. In information retrieval a query does not uniquely identify a single object in the collection. Instead, several objects may match the query, perhaps with different degrees of [[relevance|relevancy]].

An object is an entity which keeps or stores information in a database. User queries are matched to objects stored in the database. Depending on the [[Information retrieval applications|application]] the data objects may be, for example, text documents, images or videos. Often the documents themselves are not kept or stored directly in the IR system, but are instead represented in the system by document surrogates.

Most IR systems compute a numeric score on how well each object in the database matches the query, and rank the objects according to this value. The top ranking objects are then shown to the user. The process may then be iterated if the user wishes to refine the query.

== Performance measures ==

Many different measures for evaluating the performance of information retrieval systems have been proposed. The measures require a collection of documents and a query. All common measures described here assume a ground truth notion of relevancy: every document is known to be either relevant or non-relevant to a particular query. In practice queries may be [[ill-posed]] and there may be different shades of relevancy.

=== Precision ===

Precision is the fraction of the documents retrieved that are [[Relevance (information retrieval)|relevant]] to the user's information need.

:<math>\mbox{precision}=\frac{|\{\mbox{relevant documents}\}\cap\{\mbox{retrieved documents}\}|}{|\{\mbox{retrieved documents}\}|}</math>

In [[binary classification]], precision is analogous to [[positive predictive value]]. Precision takes all retrieved documents into account. It can also be evaluated at a given cut-off rank, considering only the topmost results returned by the system. This measure is called ''precision at n'' or ''P@n''. Note that the meaning and usage of "precision" in the field of information retrieval differs from the definition of [[accuracy and precision]] within other branches of science and technology.
=== Recall ===

Recall is the fraction of the documents that are relevant to the query that are successfully retrieved.

:<math>\mbox{recall}=\frac{|\{\mbox{relevant documents}\}\cap\{\mbox{retrieved documents}\}|}{|\{\mbox{relevant documents}\}|}</math>

In binary classification, recall is called [[sensitivity (tests)|sensitivity]]. So it can be looked at as ''the probability that a relevant document is retrieved by the query''. It is trivial to achieve recall of 100% by returning all documents in response to any query. Therefore recall alone is not enough; one also needs to measure the number of non-relevant documents, for example by computing the precision.

=== Fall-Out ===

Fall-out is the proportion of non-relevant documents that are retrieved, out of all non-relevant documents available:

:<math>\mbox{fall-out}=\frac{|\{\mbox{non-relevant documents}\}\cap\{\mbox{retrieved documents}\}|}{|\{\mbox{non-relevant documents}\}|}</math>

In binary classification, fall-out is closely related to [[specificity (tests)|specificity]]. More precisely: <math>\mbox{fall-out}=1-\mbox{specificity}</math>. It can be looked at as ''the probability that a non-relevant document is retrieved by the query''. It is trivial to achieve fall-out of 0% by returning zero documents in response to any query.

=== F-measure ===

The weighted [[harmonic mean]] of precision and recall, the traditional F-measure or balanced F-score, is:

:<math>F = 2 \cdot (\mathrm{precision} \cdot \mathrm{recall}) / (\mathrm{precision} + \mathrm{recall}).\,</math>

This is also known as the <math>F_1</math> measure, because recall and precision are evenly weighted. The general formula for non-negative real <math>\beta</math> is:

:<math>F_\beta = (1 + \beta^2) \cdot (\mathrm{precision} \cdot \mathrm{recall}) / (\beta^2 \cdot \mathrm{precision} + \mathrm{recall}).\,</math>

Two other commonly used F measures are the <math>F_2</math> measure, which weights recall twice as much as precision, and the <math>F_{0.5}</math> measure, which weights precision twice as much as recall. The F-measure was derived by van Rijsbergen (1979) so that <math>F_\beta</math> "measures the effectiveness of retrieval with respect to a user who attaches <math>\beta</math> times as much importance to recall as precision". It is based on van Rijsbergen's effectiveness measure <math>E = 1 - \frac{1}{\alpha/P + (1-\alpha)/R}</math>. Their relationship is <math>F_\beta = 1 - E</math> where <math>\alpha = 1/(\beta^2+1)</math>.
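These set-based definitions translate directly into code. The sketch below assumes, purely for illustration, that the relevant and retrieved documents are given as sets of document identifiers, and computes precision, recall, fall-out and <math>F_\beta</math> exactly as defined above.

<source lang="python">
def precision(relevant, retrieved):
    return len(relevant & retrieved) / len(retrieved) if retrieved else 0.0

def recall(relevant, retrieved):
    return len(relevant & retrieved) / len(relevant) if relevant else 0.0

def fall_out(relevant, retrieved, collection):
    non_relevant = collection - relevant
    return len(non_relevant & retrieved) / len(non_relevant) if non_relevant else 0.0

def f_measure(relevant, retrieved, beta=1.0):
    p, r = precision(relevant, retrieved), recall(relevant, retrieved)
    return (1 + beta ** 2) * p * r / (beta ** 2 * p + r) if (p or r) else 0.0

# Toy collection of ten documents; the identifiers are made up for the example.
collection = {f"d{i}" for i in range(1, 11)}
relevant = {"d1", "d2", "d3", "d4"}
retrieved = {"d1", "d2", "d5", "d6"}

print(precision(relevant, retrieved))             # 0.5
print(recall(relevant, retrieved))                # 0.5
print(fall_out(relevant, retrieved, collection))  # 0.333...
print(f_measure(relevant, retrieved, beta=1.0))   # 0.5
</source>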
=== Average precision of precision and recall ===

The precision and recall are based on the whole list of documents returned by the system. Average precision emphasizes returning more relevant documents earlier. It is the average of precisions computed after truncating the list after each of the relevant documents in turn:

:<math>\operatorname{AveP} = \frac{\sum_{r=1}^N (P(r) \times \mathrm{rel}(r))}{\mbox{number of relevant documents}} \!</math>

where ''r'' is the rank, ''N'' the number retrieved, ''rel()'' a binary function on the relevance of a given rank, and ''P()'' precision at a given cut-off rank.
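Under the same illustrative assumptions (relevant documents as a set, the system's output as a ranked list), average precision can be computed directly from this formula; the ranking below is invented for the example.

<source lang="python">
def average_precision(ranked, relevant):
    """Average of precision@r over every rank r that holds a relevant document."""
    hits, precisions = 0, []
    for r, doc in enumerate(ranked, start=1):
        if doc in relevant:
            hits += 1
            precisions.append(hits / r)
    return sum(precisions) / len(relevant) if relevant else 0.0

ranked = ["d1", "d5", "d2", "d6", "d3"]
relevant = {"d1", "d2", "d3", "d4"}
print(average_precision(ranked, relevant))  # (1/1 + 2/3 + 3/5) / 4 ≈ 0.567
</source>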
tuples.@@@@1@13@@danf@17-8-2009 10401440@unknown@formal@none@1@S@The similarity of the query vector and document vector is represented as a scalar value.@@@@1@15@@danf@17-8-2009 10401450@unknown@formal@none@1@S@** [[Vector space model]]@@@@1@4@@danf@17-8-2009 10401460@unknown@formal@none@1@S@** [[Generalized vector space model]]@@@@1@5@@danf@17-8-2009 10401470@unknown@formal@none@1@S@** Topic-based vector space model (literature: [http://www.kuropka.net/files/TVSM.pdf], [http://www.logos-verlag.de/cgi-bin/engbuchmid?isbn=0514&lng=eng&id=])@@@@1@8@@danf@17-8-2009 10401480@unknown@formal@none@1@S@** [[Extended Boolean model]]@@@@1@4@@danf@17-8-2009 10401490@unknown@formal@none@1@S@** Enhanced topic-based vector space model (literature: [http://kuropka.net/files/HPI_Evaluation_of_eTVSM.pdf], [http://www.logos-verlag.de/cgi-bin/engbuchmid?isbn=0514&lng=eng&id=])@@@@1@9@@danf@17-8-2009 10401500@unknown@formal@none@1@S@** Latent semantic indexing aka [[latent semantic analysis]]@@@@1@8@@danf@17-8-2009 10401510@unknown@formal@none@1@S@* ''Probabilistic models'' treat the process of document retrieval as a probabilistic inference.@@@@1@13@@danf@17-8-2009 10401520@unknown@formal@none@1@S@Similarities are computed as probabilities that a document is relevant for a given query.@@@@1@14@@danf@17-8-2009 10401530@unknown@formal@none@1@S@Probabilistic theorems like the [[Bayes' theorem]] are often used in these models.@@@@1@12@@danf@17-8-2009 10401540@unknown@formal@none@1@S@** [[Binary independence retrieval]]@@@@1@4@@danf@17-8-2009 10401550@unknown@formal@none@1@S@** [[Probabilistic relevance model (BM25)]]@@@@1@5@@danf@17-8-2009 10401560@unknown@formal@none@1@S@** Uncertain inference@@@@1@3@@danf@17-8-2009 10401570@unknown@formal@none@1@S@** [[Language model]]s@@@@1@3@@danf@17-8-2009 10401580@unknown@formal@none@1@S@** [[Divergence-from-randomness model]]@@@@1@3@@danf@17-8-2009 10401590@unknown@formal@none@1@S@** [[Latent Dirichlet allocation]]@@@@1@4@@danf@17-8-2009 10401600@unknown@formal@none@1@S@=== Second dimension: properties of the model ===@@@@1@8@@danf@17-8-2009 10401610@unknown@formal@none@1@S@* ''Models without term-interdependencies'' treat different terms/words as independent.@@@@1@9@@danf@17-8-2009 10401620@unknown@formal@none@1@S@This fact is usually represented in vector space models by the [[orthogonality]] assumption of term vectors or in probabilistic models by an [[independency]] assumption for term variables.@@@@1@27@@danf@17-8-2009 10401630@unknown@formal@none@1@S@* ''Models with immanent term interdependencies'' allow a representation of interdependencies between terms.@@@@1@13@@danf@17-8-2009 10401640@unknown@formal@none@1@S@However the degree of the interdependency between two terms is defined by the model itself.@@@@1@15@@danf@17-8-2009 10401650@unknown@formal@none@1@S@It is usually directly or indirectly derived (e.g. 
by [[dimension reduction|dimensional reduction]]) from the [[co-occurrence]] of those terms in the whole set of documents.@@@@1@24@@danf@17-8-2009 10401660@unknown@formal@none@1@S@* ''Models with transcendent term interdependencies'' allow a representation of interdependencies between terms, but they do not specify how the interdependency between two terms is defined.@@@@1@26@@danf@17-8-2009 10401670@unknown@formal@none@1@S@They rely on an external source for the degree of interdependency between two terms.@@@@1@13@@danf@17-8-2009 10401680@unknown@formal@none@1@S@(For example, a human or a sophisticated algorithm.)@@@@1@7@@danf@17-8-2009 10401690@unknown@formal@none@1@S@== Major figures ==@@@@1@4@@danf@17-8-2009 10401700@unknown@formal@none@1@S@* [[Gerard Salton]]@@@@1@3@@danf@17-8-2009 10401710@unknown@formal@none@1@S@* [[Hans Peter Luhn]]@@@@1@4@@danf@17-8-2009 10401720@unknown@formal@none@1@S@* [http://ciir.cs.umass.edu/personnel/croft.html W. Bruce Croft]@@@@1@5@@danf@17-8-2009 10401730@unknown@formal@none@1@S@* [[Karen Spärck Jones]]@@@@1@4@@danf@17-8-2009 10401740@unknown@formal@none@1@S@* [[C. J. van Rijsbergen]]@@@@1@5@@danf@17-8-2009 10401750@unknown@formal@none@1@S@* [http://www.soi.city.ac.uk/~ser/homepage.html Stephen E. Robertson]@@@@1@5@@danf@17-8-2009 10401760@unknown@formal@none@1@S@== Awards in the field ==@@@@1@6@@danf@17-8-2009 10401770@unknown@formal@none@1@S@* [[Tony Kent Strix award]]@@@@1@5@@danf@17-8-2009 10401780@unknown@formal@none@1@S@* [[Gerard Salton Award]]@@@@1@4@@danf@17-8-2009 10410010@unknown@formal@none@1@S@
Information theory
@@@@1@2@@danf@17-8-2009 10410020@unknown@formal@none@1@S@'''Information theory''' is a branch of [[applied mathematics]] and [[electrical engineering]] involving the quantification of [[information]].@@@@1@16@@danf@17-8-2009 10410030@unknown@formal@none@1@S@Historically, information theory was developed to find fundamental limits on compressing and reliably [[communication|communicating]] data.@@@@1@15@@danf@17-8-2009 10410040@unknown@formal@none@1@S@Since its inception it has broadened to find applications in many other areas, including [[statistical inference]], [[natural language processing]], [[cryptography]] generally, [[networks]] other than communication networks -- as in [[neurobiology]], the evolution and function of molecular codes, model selection in ecology, thermal physics, [[quantum computing]], plagiarism detection and other forms of [[data analysis]].@@@@1@53@@danf@17-8-2009 10410050@unknown@formal@none@1@S@A key measure of information in the theory is known as [[information entropy]], which is usually expressed by the average number of bits needed for storage or communication.@@@@1@28@@danf@17-8-2009 10410060@unknown@formal@none@1@S@Intuitively, entropy quantifies the uncertainty involved when encountering a [[random variable]].@@@@1@11@@danf@17-8-2009 10410070@unknown@formal@none@1@S@For example, a fair coin flip (2 equally likely outcomes) will have less entropy than a roll of a die (6 equally likely outcomes).@@@@1@24@@danf@17-8-2009 10410080@unknown@formal@none@1@S@Applications of fundamental topics of information theory include [[lossless data compression]] (e.g. [[ZIP (file format)|ZIP files]]), [[lossy data compression]] (e.g. [[MP3]]s), and [[channel capacity|channel coding]] (e.g. for [[DSL]] lines).@@@@1@29@@danf@17-8-2009 10410110@unknown@formal@none@1@S@The field is at the intersection of [[mathematics]], [[statistics]], [[computer science]], [[physics]], [[neurobiology]], and [[electrical engineering]].@@@@1@16@@danf@17-8-2009 10410120@unknown@formal@none@1@S@Its impact has been crucial to the success of the [[Voyager program|Voyager]] missions to deep space, the invention of the CD, the feasibility of mobile phones, the development of the [[Internet]], the study of [[linguistics]] and of human perception, the understanding of [[black hole]]s, and numerous other fields.@@@@1@48@@danf@17-8-2009 10410130@unknown@formal@none@1@S@Important sub-fields of information theory are source coding, channel coding, algorithmic complexity theory, algorithmic information theory, and measures of information.@@@@1@20@@danf@17-8-2009 10410140@unknown@formal@none@1@S@==Overview==@@@@1@1@@danf@17-8-2009 10410150@unknown@formal@none@1@S@The main concepts of information theory can be grasped by considering the most widespread means of human communication: language.@@@@1@19@@danf@17-8-2009 10410160@unknown@formal@none@1@S@Two important aspects of a good language are as follows: First, the most common words (e.g., "a", "the", "I") should be shorter than less common words (e.g., "benefit", "generation", "mediocre"), so that sentences will not be too long.@@@@1@38@@danf@17-8-2009 10410170@unknown@formal@none@1@S@Such a tradeoff in word length is analogous to [[data compression]] and is the essential aspect of [[source coding]].@@@@1@19@@danf@17-8-2009 10410180@unknown@formal@none@1@S@Second, if part of a sentence is unheard or misheard due to noise -— e.g., a passing car -— the listener should still be able to glean the meaning of the underlying message.@@@@1@33@@danf@17-8-2009 
10410190@unknown@formal@none@1@S@Such robustness is as essential for an electronic communication system as it is for a language; properly building such robustness into communications is done by [[Channel capacity|channel coding]].@@@@1@28@@danf@17-8-2009 10410200@unknown@formal@none@1@S@Source coding and channel coding are the fundamental concerns of information theory.@@@@1@12@@danf@17-8-2009 10410210@unknown@formal@none@1@S@Note that these concerns have nothing to do with the ''importance'' of messages.@@@@1@13@@danf@17-8-2009 10410220@unknown@formal@none@1@S@For example, a platitude such as "Thank you; come again" takes about as long to say or write as the urgent plea, "Call an ambulance!" while clearly the latter is more important and more meaningful.@@@@1@35@@danf@17-8-2009 10410230@unknown@formal@none@1@S@Information theory, however, does not consider message importance or meaning, as these are matters of the quality of data rather than the quantity and readability of data, the latter of which is determined solely by probabilities.@@@@1@36@@danf@17-8-2009 10410240@unknown@formal@none@1@S@Information theory is generally considered to have been founded in 1948 by [[Claude Elwood Shannon|Claude Shannon]] in his seminal work, "[[A Mathematical Theory of Communication]]."@@@@1@25@@danf@17-8-2009 10410250@unknown@formal@none@1@S@The central paradigm of classical information theory is the engineering problem of the transmission of information over a noisy channel.@@@@1@20@@danf@17-8-2009 10410260@unknown@formal@none@1@S@The most fundamental results of this theory are Shannon's [[source coding theorem]], which establishes that, on average, the number of ''bits'' needed to represent the result of an uncertain event is given by its [[information entropy|entropy]]; and Shannon's [[noisy-channel coding theorem]], which states that ''reliable'' communication is possible over ''noisy'' channels provided that the rate of communication is below a certain threshold called the channel capacity.@@@@1@66@@danf@17-8-2009 10410270@unknown@formal@none@1@S@The channel capacity can be approached in practice by using appropriate encoding and decoding systems.@@@@1@15@@danf@17-8-2009 10410280@unknown@formal@none@1@S@Information theory is closely associated with a collection of pure and applied disciplines that have been investigated and reduced to engineering practice under a variety of rubrics throughout the world over the past half century or more: [[adaptive system]]s, [[anticipatory system]]s, [[artificial intelligence]], [[complex system]]s, [[complexity science]], [[cybernetics]], [[informatics]], [[machine learning]], along with [[systems science]]s of many descriptions.@@@@1@58@@danf@17-8-2009 10410290@unknown@formal@none@1@S@Information theory is a broad and deep mathematical theory, with equally broad and deep applications, amongst which is the vital field of [[coding theory]].@@@@1@24@@danf@17-8-2009 10410300@unknown@formal@none@1@S@Coding theory is concerned with finding explicit methods, called ''codes'', of increasing the efficiency and reducing the net error rate of data communication over a noisy channel to near the limit that Shannon proved is the maximum possible for that channel.@@@@1@41@@danf@17-8-2009 10410310@unknown@formal@none@1@S@These codes can be roughly subdivided into [[data compression]] (source coding) and [[error-correction]] (channel coding) techniques.@@@@1@16@@danf@17-8-2009 10410320@unknown@formal@none@1@S@In the latter case, it took many years to find the methods Shannon's 
work proved were possible.@@@@1@17@@danf@17-8-2009 10410330@unknown@formal@none@1@S@A third class of information theory codes is cryptographic algorithms (both [[code (cryptography)|code]]s and [[cipher]]s).@@@@1@15@@danf@17-8-2009 10410340@unknown@formal@none@1@S@Concepts, methods and results from coding theory and information theory are widely used in [[cryptography]] and [[cryptanalysis]].@@@@1@17@@danf@17-8-2009 10410350@unknown@formal@none@1@S@''See the article [[ban (information)]] for a historical application.''@@@@1@9@@danf@17-8-2009 10410360@unknown@formal@none@1@S@Information theory is also used in [[information retrieval]], [[intelligence (information gathering)|intelligence gathering]], [[gambling]], [[statistics]], and even in [[musical composition]].@@@@1@19@@danf@17-8-2009 10410370@unknown@formal@none@1@S@==Historical background==@@@@1@2@@danf@17-8-2009 10410380@unknown@formal@none@1@S@The landmark event that established the discipline of information theory, and brought it to immediate worldwide attention, was the publication of [[Claude E. Shannon]]'s classic paper "[[A Mathematical Theory of Communication]]" in the ''[[Bell System Technical Journal]]'' in July and October of 1948.@@@@1@43@@danf@17-8-2009 10410390@unknown@formal@none@1@S@Prior to this paper, limited information theoretic ideas had been developed at Bell Labs, all implicitly assuming events of equal probability.@@@@1@21@@danf@17-8-2009 10410400@unknown@formal@none@1@S@[[Harry Nyquist]]'s 1924 paper, ''Certain Factors Affecting Telegraph Speed,'' contains a theoretical section quantifying "intelligence" and the "line speed" at which it can be transmitted by a communication system, giving the relation W = K \log m, where ''W'' is the speed of transmission of intelligence, ''m'' is the number of different voltage levels to choose from at each time step, and ''K'' is a constant.@@@@1@66@@danf@17-8-2009 10410410@unknown@formal@none@1@S@[[Ralph Hartley]]'s 1928 paper, ''Transmission of Information,'' uses the word ''information'' as a measurable quantity, reflecting the receiver's ability to distinguish one sequence of symbols from any other, thus quantifying information as H = \log S^n = n \log S, where ''S'' was the number of possible symbols, and ''n'' the number of symbols in a transmission.@@@@1@58@@danf@17-8-2009 10410420@unknown@formal@none@1@S@The natural unit of information was therefore the decimal digit, much later renamed the [[ban (information)|hartley]] in his honour as a unit or scale or measure of information.@@@@1@28@@danf@17-8-2009 10410430@unknown@formal@none@1@S@[[Alan Turing]] in 1940 used similar ideas as part of the statistical analysis of the breaking of the German second world war [[Cryptanalysis of the Enigma|Enigma]] ciphers.@@@@1@27@@danf@17-8-2009 10410440@unknown@formal@none@1@S@Much of the mathematics behind information theory with events of different probabilities was developed for the field of [[thermodynamics]] by [[Ludwig Boltzmann]] and [[J. 
Willard Gibbs]].@@@@1@26@@danf@17-8-2009 10410450@unknown@formal@none@1@S@Connections between information-theoretic entropy and thermodynamic entropy, including the important contributions by [[Rolf Landauer]] in the 1960s, are explored in ''[[Entropy in thermodynamics and information theory]]''.@@@@1@26@@danf@17-8-2009 10410460@unknown@formal@none@1@S@In Shannon's revolutionary and groundbreaking paper, the work for which had been substantially completed at Bell Labs by the end of 1944, Shannon for the first time introduced the qualitative and quantitative model of communication as a statistical process underlying information theory, opening with the assertion that@@@@1@47@@danf@17-8-2009 10410470@unknown@formal@none@1@S@:"The fundamental problem of communication is that of reproducing at one point, either exactly or approximately, a message selected at another point."@@@@1@22@@danf@17-8-2009 10410480@unknown@formal@none@1@S@With it came the ideas of@@@@1@6@@danf@17-8-2009 10410490@unknown@formal@none@1@S@* the [[information entropy]] and [[redundancy (information theory)|redundancy]] of a source, and its relevance through the [[source coding theorem]];@@@@1@19@@danf@17-8-2009 10410500@unknown@formal@none@1@S@* the [[mutual information]], and the [[channel capacity]] of a noisy channel, including the promise of perfect loss-free communication given by the [[noisy-channel coding theorem]];@@@@1@25@@danf@17-8-2009 10410510@unknown@formal@none@1@S@* the practical result of the [[Shannon–Hartley law]] for the channel capacity of a Gaussian channel; and of course@@@@1@19@@danf@17-8-2009 10410520@unknown@formal@none@1@S@* the [[bit]]—a new way of seeing the most fundamental unit of information@@@@1@13@@danf@17-8-2009 10410530@unknown@formal@none@1@S@==Ways of measuring information==@@@@1@4@@danf@17-8-2009 10410540@unknown@formal@none@1@S@Information theory is based on [[probability theory]] and [[statistics]].@@@@1@9@@danf@17-8-2009 10410550@unknown@formal@none@1@S@The most important quantities of information are [[Information entropy|entropy]], the information in a [[random variable]], and [[mutual information]], the amount of information in common between two random variables.@@@@1@28@@danf@17-8-2009 10410560@unknown@formal@none@1@S@The former quantity indicates how easily message data can be [[data compression|compressed]] while the latter can be used to find the communication rate across a [[Channel (communications)|channel]].@@@@1@27@@danf@17-8-2009 10410570@unknown@formal@none@1@S@The choice of logarithmic base in the following formulae determines the [[units of measurement|unit]] of [[information entropy]] that is used.@@@@1@20@@danf@17-8-2009 10410580@unknown@formal@none@1@S@The most common unit of information is the [[bit]], based on the [[binary logarithm]].@@@@1@14@@danf@17-8-2009 10410590@unknown@formal@none@1@S@Other units include the [[nat (information)|nat]], which is based on the [[natural logarithm]], and the [[deciban|hartley]], which is based on the [[common logarithm]].@@@@1@23@@danf@17-8-2009 10410600@unknown@formal@none@1@S@In what follows, an expression of the form p \log p \, is considered by convention to be equal to zero whenever p=0.@@@@1@23@@danf@17-8-2009 10410605@unknown@formal@none@1@S@This is justified because \lim_{p \rightarrow 0+} p \log p = 0 for any logarithmic base.@@@@1@16@@danf@17-8-2009 
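As a small worked illustration of the units just mentioned (the numbers are arbitrary and chosen purely for illustration), the following Python lines convert a quantity of information between bits, nats and hartleys.

 import math
 
 # Conversion factors between the units discussed above.
 bits_per_nat = 1 / math.log(2)     # = log2(e), about 1.443
 bits_per_hartley = math.log2(10)   # about 3.322
 
 # Example: an arbitrary quantity of 2.5 nats expressed in bits and in hartleys.
 x_nats = 2.5
 x_bits = x_nats * bits_per_nat
 x_hartleys = x_bits / bits_per_hartley
 print(x_bits, x_hartleys)          # about 3.61 bits, about 1.09 hartleys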
10410610@unknown@formal@none@1@S@===Entropy===@@@@1@1@@danf@17-8-2009 10410620@unknown@formal@none@1@S@The '''[[information entropy|entropy]]''', H, of a discrete random variable X is a measure of the amount of ''uncertainty'' associated with the value of X.@@@@1@24@@danf@17-8-2009 10410630@unknown@formal@none@1@S@Suppose one transmits 1000 bits (0s and 1s).@@@@1@8@@danf@17-8-2009 10410640@unknown@formal@none@1@S@If these bits are known ahead of transmission (to be a certain value with absolute probability), logic dictates that no information has been transmitted.@@@@1@24@@danf@17-8-2009 10410650@unknown@formal@none@1@S@If, however, each is equally and independently likely to be 0 or 1, 1000 bits (in the information theoretic sense) have been transmitted.@@@@1@23@@danf@17-8-2009 10410660@unknown@formal@none@1@S@Between these two extremes, information can be quantified as follows.@@@@1@10@@danf@17-8-2009 10410670@unknown@formal@none@1@S@If \mathbb{X}\, is the set of all messages x that X could be, and p(x) is the probability that X takes the value x, then the entropy of X is defined:@@@@1@29@@danf@17-8-2009 10410680@unknown@formal@none@1@S@: H(X) = \mathbb{E}_{X} [I(x)] = -\sum_{x \in \mathbb{X}} p(x) \log p(x).@@@@1@12@@danf@17-8-2009 10410690@unknown@formal@none@1@S@(Here, I(x) is the [[self-information]], which is the entropy contribution of an individual message.)@@@@1@14@@danf@17-8-2009 10410700@unknown@formal@none@1@S@An important property of entropy is that it is maximized when all the messages in the message space are equiprobable—i.e., most unpredictable—in which case H(X) = \log |\mathbb{X}|.@@@@1@28@@danf@17-8-2009 10410710@unknown@formal@none@1@S@The special case of information entropy for a random variable with two outcomes is the '''[[binary entropy function]]''':@@@@1@18@@danf@17-8-2009 10410720@unknown@formal@none@1@S@:H_\mbox{b}(p) = - p \log p - (1-p)\log (1-p).\,@@@@1@9@@danf@17-8-2009 10410730@unknown@formal@none@1@S@===Joint entropy===@@@@1@2@@danf@17-8-2009 10410740@unknown@formal@none@1@S@The '''[[joint entropy]]''' of two discrete random variables X and Y is merely the entropy of their pairing: (X, Y).@@@@1@20@@danf@17-8-2009 10410750@unknown@formal@none@1@S@This implies that if X and Y are [[statistical independence|independent]], then their joint entropy is the sum of their individual entropies.@@@@1@21@@danf@17-8-2009 10410760@unknown@formal@none@1@S@For example, if (X,Y) represents the position of a [[chess]] piece — X the row and Y the column, then the joint entropy of the row of the piece and the column of the piece will be the entropy of the position of the piece.@@@@1@45@@danf@17-8-2009 10410770@unknown@formal@none@1@S@:H(X, Y) = \mathbb{E}_{X,Y} [-\log p(x,y)] = - \sum_{x, y} p(x, y) \log p(x, y) \,@@@@1@16@@danf@17-8-2009 10410780@unknown@formal@none@1@S@Despite similar notation, joint entropy should not be confused with '''[[cross entropy]]'''.@@@@1@12@@danf@17-8-2009 10410790@unknown@formal@none@1@S@===Conditional entropy (equivocation)===@@@@1@3@@danf@17-8-2009 10410800@unknown@formal@none@1@S@The '''[[conditional entropy]]''' or '''conditional uncertainty''' of X given random variable Y (also called the '''equivocation''' of X about Y) is the average conditional entropy over Y:@@@@1@27@@danf@17-8-2009 10410810@unknown@formal@none@1@S@: H(X|Y) = \mathbb E_Y [H(X|y)] = -\sum_{y \in Y} p(y) \sum_{x \in X} p(x|y) \log p(x|y) = -\sum_{x,y} p(x,y) \log \frac{p(x,y)}{p(y)}.@@@@1@22@@danf@17-8-2009 10410820@unknown@formal@none@1@S@Because entropy can be conditioned on a random variable or on that random variable being a certain value, care should be taken not to confuse these two definitions of conditional entropy, the former of which is in more common use.@@@@1@40@@danf@17-8-2009
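The short Python sketch below (the joint distribution is invented purely for illustration) computes the entropy, joint entropy and conditional entropy defined above directly from a small joint probability table.

 import math
 
 # A hypothetical joint distribution p(x, y) over X in {0, 1} and Y in {0, 1}.
 p_xy = {(0, 0): 0.4, (0, 1): 0.1, (1, 0): 0.1, (1, 1): 0.4}
 
 def H(probabilities):
     # Entropy in bits; terms with p = 0 are skipped, per the convention above.
     return -sum(p * math.log2(p) for p in probabilities if p > 0)
 
 # Marginal distributions p(x) and p(y).
 p_x = {x: sum(p for (xx, _), p in p_xy.items() if xx == x) for x in (0, 1)}
 p_y = {y: sum(p for (_, yy), p in p_xy.items() if yy == y) for y in (0, 1)}
 
 H_X = H(p_x.values())                       # entropy of X, here 1.0 bit
 H_XY = H(p_xy.values())                     # joint entropy H(X, Y), about 1.72 bits
 H_X_given_Y = -sum(p * math.log2(p / p_y[y])
                    for (x, y), p in p_xy.items() if p > 0)   # about 0.72 bits
 print(H_X, H_XY, H_X_given_Y)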
10410830@unknown@formal@none@1@S@A basic property of this form of conditional entropy is that:@@@@1@11@@danf@17-8-2009 10410840@unknown@formal@none@1@S@: H(X|Y) = H(X,Y) - H(Y) .\,@@@@1@8@@danf@17-8-2009 10410850@unknown@formal@none@1@S@===Mutual information (transinformation)===@@@@1@3@@danf@17-8-2009 10410860@unknown@formal@none@1@S@'''[[Mutual information]]''' measures the amount of information that can be obtained about one random variable by observing another.@@@@1@18@@danf@17-8-2009 10410870@unknown@formal@none@1@S@It is important in communication where it can be used to maximize the amount of information shared between sent and received signals.@@@@1@22@@danf@17-8-2009 10410880@unknown@formal@none@1@S@The mutual information of X relative to Y is given by:@@@@1@11@@danf@17-8-2009 10410890@unknown@formal@none@1@S@:I(X;Y) = \mathbb{E}_{X,Y} [SI(x,y)] = \sum_{x,y} p(x,y) \log \frac{p(x,y)}{p(x)\, p(y)}@@@@1@10@@danf@17-8-2009 10410900@unknown@formal@none@1@S@where SI (''S''pecific mutual ''I''nformation) is the [[pointwise mutual information]].@@@@1@10@@danf@17-8-2009 10410910@unknown@formal@none@1@S@A basic property of the mutual information is that@@@@1@9@@danf@17-8-2009 10410920@unknown@formal@none@1@S@: I(X;Y) = H(X) - H(X|Y).\,@@@@1@6@@danf@17-8-2009 10410930@unknown@formal@none@1@S@That is, knowing ''Y'', we can save an average of I(X; Y) bits in encoding ''X'' compared to not knowing ''Y''.@@@@1@21@@danf@17-8-2009 10410940@unknown@formal@none@1@S@Mutual information is [[symmetric function|symmetric]]:@@@@1@5@@danf@17-8-2009 10410950@unknown@formal@none@1@S@: I(X;Y) = I(Y;X) = H(X) + H(Y) - H(X,Y).\,@@@@1@10@@danf@17-8-2009 10410960@unknown@formal@none@1@S@Mutual information can be expressed as the average [[Kullback–Leibler divergence]] (information gain) of the [[posterior probability|posterior probability distribution]] of ''X'' given the value of ''Y'' to the [[prior probability|prior distribution]] on ''X'':@@@@1@32@@danf@17-8-2009 10410970@unknown@formal@none@1@S@: I(X;Y) = \mathbb E_{p(y)} [D_{\mathrm{KL}}( p(X|Y=y) \| p(X) )].@@@@1@10@@danf@17-8-2009 10410980@unknown@formal@none@1@S@In other words, this is a measure of how much, on the average, the probability distribution on ''X'' will change if we are given the value of ''Y''.@@@@1@28@@danf@17-8-2009 10410990@unknown@formal@none@1@S@This is often recalculated as the divergence from the product of the marginal distributions to the actual joint distribution:@@@@1@19@@danf@17-8-2009 10411000@unknown@formal@none@1@S@: I(X; Y) = D_{\mathrm{KL}}(p(X,Y) \| p(X)p(Y)).@@@@1@7@@danf@17-8-2009 10411010@unknown@formal@none@1@S@Mutual information is closely related to the [[likelihood-ratio test|log-likelihood ratio test]] in the context of contingency tables and the [[multinomial distribution]] and to [[Pearson's chi-square test|Pearson's χ2 test]]: mutual information can be considered a statistic for assessing independence between a pair of variables, and has a well-specified asymptotic distribution.@@@@1@49@@danf@17-8-2009
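Continuing the small joint-distribution example from the entropy sketch above (the numbers are again invented), the following Python lines compute the mutual information both from its expression as a divergence from the product of the marginals and via the identity I(X;Y) = H(X) - H(X|Y).

 import math
 
 # The same hypothetical joint distribution as in the entropy sketch.
 p_xy = {(0, 0): 0.4, (0, 1): 0.1, (1, 0): 0.1, (1, 1): 0.4}
 p_x = {0: 0.5, 1: 0.5}   # marginal distribution of X
 p_y = {0: 0.5, 1: 0.5}   # marginal distribution of Y
 
 # Mutual information in bits, computed directly as
 # D_KL( p(X,Y) || p(X) p(Y) ) = sum over x, y of p(x,y) log2( p(x,y) / (p(x) p(y)) ).
 I_xy = sum(p * math.log2(p / (p_x[x] * p_y[y]))
            for (x, y), p in p_xy.items() if p > 0)
 
 # The same value via I(X;Y) = H(X) - H(X|Y).
 H_X = -sum(p * math.log2(p) for p in p_x.values())
 H_X_given_Y = -sum(p * math.log2(p / p_y[y])
                    for (x, y), p in p_xy.items() if p > 0)
 print(I_xy, H_X - H_X_given_Y)   # both about 0.278 bits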
10411020@unknown@formal@none@1@S@===Kullback–Leibler divergence (information gain)===@@@@1@4@@danf@17-8-2009 10411030@unknown@formal@none@1@S@The '''[[Kullback–Leibler divergence]]''' (or '''information divergence''', '''information gain''', or '''relative entropy''') is a way of comparing two distributions: a "true" [[probability distribution]] ''p(X)'', and an arbitrary probability distribution ''q(X)''.@@@@1@29@@danf@17-8-2009 10411040@unknown@formal@none@1@S@If we compress data in a manner that assumes ''q(X)'' is the distribution underlying some data, when, in reality, ''p(X)'' is the correct distribution, the Kullback–Leibler divergence is the average number of additional bits per datum necessary for compression.@@@@1@39@@danf@17-8-2009 10411050@unknown@formal@none@1@S@It is thus defined@@@@1@4@@danf@17-8-2009 10411060@unknown@formal@none@1@S@:D_{\mathrm{KL}}(p(X) \| q(X)) = \sum_{x \in X} -p(x) \log {q(x)} \, - \, \left( -p(x) \log {p(x)}\right) = \sum_{x \in X} p(x) \log \frac{p(x)}{q(x)}.@@@@1@24@@danf@17-8-2009 10411070@unknown@formal@none@1@S@Although it is sometimes used as a 'distance metric', it is not a true [[Metric (mathematics)|metric]] since it is not symmetric and does not satisfy the [[triangle inequality]] (making it a semi-quasimetric).@@@@1@32@@danf@17-8-2009 10411080@unknown@formal@none@1@S@===Other quantities===@@@@1@2@@danf@17-8-2009 10411090@unknown@formal@none@1@S@Other important information theoretic quantities include [[Rényi entropy]] (a generalization of entropy) and [[differential entropy]] (a generalization of quantities of information to continuous distributions.)@@@@1@24@@danf@17-8-2009 10411100@unknown@formal@none@1@S@==Coding theory==@@@@1@2@@danf@17-8-2009 10411110@unknown@formal@none@1@S@[[Coding theory]] is one of the most important and direct applications of information theory.@@@@1@14@@danf@17-8-2009 10411120@unknown@formal@none@1@S@It can be subdivided into [[data compression|source coding]] theory and [[error correction|channel coding]] theory.@@@@1@14@@danf@17-8-2009 10411130@unknown@formal@none@1@S@Using a statistical description for data, information theory quantifies the number of bits needed to describe the data, which is the information entropy of the source.@@@@1@26@@danf@17-8-2009 10411140@unknown@formal@none@1@S@* Data compression (source coding): There are two formulations for the compression problem:@@@@1@13@@danf@17-8-2009 10411150@unknown@formal@none@1@S@#[[lossless data compression]]: the data must be reconstructed exactly;@@@@1@9@@danf@17-8-2009 10411160@unknown@formal@none@1@S@#[[lossy data compression]]: allocates bits needed to reconstruct the data, within a specified fidelity level measured by a distortion function.@@@@1@20@@danf@17-8-2009 10411170@unknown@formal@none@1@S@This subset of Information theory is called [[rate–distortion theory]].@@@@1@9@@danf@17-8-2009 10411180@unknown@formal@none@1@S@* Error-correcting codes (channel coding): While data compression removes as much [[redundancy (information theory)|redundancy]] as possible, an error correcting code adds just the right kind of redundancy (i.e. [[error correction]]) needed to transmit the data efficiently and faithfully across a noisy channel.@@@@1@42@@danf@17-8-2009
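As a small illustration of the source-coding side of this division (the source probabilities are made up, and the construction shown is ordinary binary Huffman coding, used here only as one convenient optimal prefix code), the sketch below compares the average codeword length of such a code with the entropy of the source.

 import heapq
 import math
 
 # A toy memoryless source with invented symbol probabilities.
 probs = {"a": 0.5, "b": 0.25, "c": 0.15, "d": 0.10}
 
 def huffman_lengths(probs):
     # Codeword lengths of a binary Huffman code, obtained by repeatedly
     # merging the two least probable groups; each merge lengthens the
     # codewords of every symbol in the merged groups by one bit.
     lengths = {s: 0 for s in probs}
     heap = [(p, i, [s]) for i, (s, p) in enumerate(probs.items())]
     heapq.heapify(heap)
     tiebreak = len(heap)
     while len(heap) > 1:
         p1, _, group1 = heapq.heappop(heap)
         p2, _, group2 = heapq.heappop(heap)
         for s in group1 + group2:
             lengths[s] += 1
         heapq.heappush(heap, (p1 + p2, tiebreak, group1 + group2))
         tiebreak += 1
     return lengths
 
 lengths = huffman_lengths(probs)
 average_length = sum(probs[s] * lengths[s] for s in probs)   # 1.75 bits/symbol here
 entropy = -sum(p * math.log2(p) for p in probs.values())     # about 1.74 bits/symbol
 print(lengths, average_length, entropy)

The average codeword length cannot fall below the entropy, in line with the source coding theorem mentioned above.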
10411190@unknown@formal@none@1@S@This division of coding theory into compression and transmission is justified by the information transmission theorems, or source–channel separation theorems that justify the use of bits as the universal currency for information in many contexts.@@@@1@35@@danf@17-8-2009 10411200@unknown@formal@none@1@S@However, these theorems only hold in the situation where one transmitting user wishes to communicate to one receiving user.@@@@1@19@@danf@17-8-2009 10411210@unknown@formal@none@1@S@In scenarios with more than one transmitter (the multiple-access channel), more than one receiver (the [[broadcast channel]]) or intermediary "helpers" (the [[relay channel]]), or more general [[computer network|networks]], compression followed by transmission may no longer be optimal.@@@@1@37@@danf@17-8-2009 10411220@unknown@formal@none@1@S@[[Network information theory]] refers to these multi-agent communication models.@@@@1@9@@danf@17-8-2009 10411230@unknown@formal@none@1@S@===Source theory===@@@@1@2@@danf@17-8-2009 10411240@unknown@formal@none@1@S@Any process that generates successive messages can be considered a '''[[Communication source|source]]''' of information.@@@@1@14@@danf@17-8-2009 10411250@unknown@formal@none@1@S@A memoryless source is one in which each message is an [[Independent identically-distributed random variables|independent identically-distributed random variable]], whereas the properties of [[ergodic theory|ergodicity]] and [[stationary process|stationarity]] impose more general constraints.@@@@1@31@@danf@17-8-2009 10411260@unknown@formal@none@1@S@All such sources are [[stochastic process|stochastic]].@@@@1@6@@danf@17-8-2009 10411270@unknown@formal@none@1@S@These terms are well studied in their own right outside information theory.@@@@1@12@@danf@17-8-2009 10411280@unknown@formal@none@1@S@====Rate====@@@@1@1@@danf@17-8-2009 10411290@unknown@formal@none@1@S@Information [[Entropy rate|'''rate''']] is the average entropy per symbol.@@@@1@9@@danf@17-8-2009 10411300@unknown@formal@none@1@S@For memoryless sources, this is merely the entropy of each symbol, while, in the case of a stationary stochastic process, it is@@@@1@22@@danf@17-8-2009 10411310@unknown@formal@none@1@S@:r = \lim_{n \to \infty} H(X_n|X_{n-1},X_{n-2},X_{n-3}, \ldots);@@@@1@7@@danf@17-8-2009 10411320@unknown@formal@none@1@S@that is, the conditional entropy of a symbol given all the previous symbols generated.@@@@1@14@@danf@17-8-2009 10411330@unknown@formal@none@1@S@For the more general case of a process that is not necessarily stationary, the ''average rate'' is@@@@1@17@@danf@17-8-2009 10411340@unknown@formal@none@1@S@:r = \lim_{n \to \infty} \frac{1}{n} H(X_1, X_2, \dots X_n);@@@@1@10@@danf@17-8-2009 10411350@unknown@formal@none@1@S@that is, the limit of the joint entropy per symbol.@@@@1@10@@danf@17-8-2009 10411360@unknown@formal@none@1@S@For stationary sources, these two expressions give the same result.@@@@1@10@@danf@17-8-2009 10411370@unknown@formal@none@1@S@It is common in information theory to speak of the "rate" or "entropy" of a language.@@@@1@16@@danf@17-8-2009 10411380@unknown@formal@none@1@S@This is appropriate, for example, when the source of information is English prose.@@@@1@13@@danf@17-8-2009 10411390@unknown@formal@none@1@S@The rate of a source of information is related to its [[redundancy (information theory)|redundancy]] and how well it can be [[data compression|compressed]], the subject of '''source coding'''.@@@@1@27@@danf@17-8-2009
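To make the rate formula above concrete, the following sketch (a hypothetical two-state Markov source, invented for illustration) estimates the entropy rate as the conditional entropy of the next symbol given the current one, which is what the limit reduces to for a stationary first-order Markov source.

 import math
 
 # Transition probabilities P[next | current] of a hypothetical two-state source.
 P = {"A": {"A": 0.9, "B": 0.1},
      "B": {"A": 0.5, "B": 0.5}}
 
 # Stationary distribution pi, found here simply by iterating pi <- pi P.
 pi = {"A": 0.5, "B": 0.5}
 for _ in range(1000):
     pi = {s: sum(pi[t] * P[t][s] for t in P) for s in P}
 
 def H(probabilities):
     return -sum(p * math.log2(p) for p in probabilities if p > 0)
 
 # Entropy rate in bits per symbol: r = sum over s of pi(s) * H(next symbol | current = s).
 rate = sum(pi[s] * H(P[s].values()) for s in P)
 print(pi, rate)   # pi is roughly {A: 0.83, B: 0.17}; rate is roughly 0.56 bits/symbol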
10411400@unknown@formal@none@1@S@===Channel capacity===@@@@1@2@@danf@17-8-2009 10411410@unknown@formal@none@1@S@Communication over a channel—such as an [[ethernet]] wire—is the primary motivation of information theory.@@@@1@14@@danf@17-8-2009 10411420@unknown@formal@none@1@S@As anyone who's ever used a telephone (mobile or landline) knows, however, such channels often fail to produce exact reconstruction of a signal; noise, periods of silence, and other forms of signal corruption often degrade quality.@@@@1@36@@danf@17-8-2009 10411430@unknown@formal@none@1@S@How much information can one hope to communicate over a noisy (or otherwise imperfect) channel?@@@@1@15@@danf@17-8-2009 10411440@unknown@formal@none@1@S@Consider the communications process over a discrete channel.@@@@1@8@@danf@17-8-2009 10411450@unknown@formal@none@1@S@A simple model of the process is shown below:@@@@1@9@@danf@17-8-2009 10411460@unknown@formal@none@1@S@Here ''X'' represents the space of messages transmitted, and ''Y'' the space of messages received during a unit time over our channel.@@@@1@22@@danf@17-8-2009 10411470@unknown@formal@none@1@S@Let p(y|x) be the [[conditional probability]] distribution function of ''Y'' given ''X''.@@@@1@12@@danf@17-8-2009 10411480@unknown@formal@none@1@S@We will consider p(y|x) to be an inherent fixed property of our communications channel (representing the nature of the '''[[Signal noise|noise]]''' of our channel).@@@@1@24@@danf@17-8-2009 10411490@unknown@formal@none@1@S@Then the joint distribution of ''X'' and ''Y'' is completely determined by our channel and by our choice of f(x), the marginal distribution of messages we choose to send over the channel.@@@@1@32@@danf@17-8-2009 10411500@unknown@formal@none@1@S@Under these constraints, we would like to maximize the rate of information, or the '''[[Signal (electrical engineering)|signal]]''', we can communicate over the channel.@@@@1@23@@danf@17-8-2009 10411510@unknown@formal@none@1@S@The appropriate measure for this is the [[mutual information]], and this maximum mutual information is called the '''[[channel capacity]]''' and is given by:@@@@1@23@@danf@17-8-2009 10411520@unknown@formal@none@1@S@: C = \max_{f} I(X;Y).\!@@@@1@6@@danf@17-8-2009
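As a rough numerical illustration of this definition (the channel matrix is invented, and the brute-force grid search below is only a sketch, not an efficient capacity algorithm such as Blahut–Arimoto), the following Python lines approximate C by maximizing I(X;Y) over input distributions f.

 import math
 
 # A hypothetical binary channel: p(y|x), rows indexed by the input symbol.
 p_y_given_x = {0: {0: 0.9, 1: 0.1},
                1: {0: 0.2, 1: 0.8}}
 
 def mutual_information(f0):
     # I(X;Y) in bits when the input distribution is f(0) = f0, f(1) = 1 - f0.
     f = {0: f0, 1: 1.0 - f0}
     p_y = {y: sum(f[x] * p_y_given_x[x][y] for x in f) for y in (0, 1)}
     return sum(f[x] * p_y_given_x[x][y] * math.log2(p_y_given_x[x][y] / p_y[y])
                for x in f for y in (0, 1)
                if f[x] > 0 and p_y_given_x[x][y] > 0)
 
 # Brute-force search over a grid of input distributions approximates the capacity.
 capacity, best_f0 = max((mutual_information(k / 1000), k / 1000) for k in range(1, 1000))
 print(round(capacity, 4), best_f0)   # approximate capacity in bits per channel use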
10411530@unknown@formal@none@1@S@This capacity has the following property related to communicating at information rate ''R'' (where ''R'' is usually bits per symbol).@@@@1@20@@danf@17-8-2009 10411540@unknown@formal@none@1@S@For any information rate ''R < C'' and coding error ε > 0, for large enough ''N'', there exists a code of length ''N'' and rate ≥ R and a decoding algorithm, such that the maximal probability of block error is ≤ ε; that is, it is always possible to transmit with arbitrarily small block error.@@@@1@56@@danf@17-8-2009 10411550@unknown@formal@none@1@S@In addition, for any rate ''R > C'', it is impossible to transmit with arbitrarily small block error.@@@@1@18@@danf@17-8-2009 10411560@unknown@formal@none@1@S@'''[[Channel code|Channel coding]]''' is concerned with finding such nearly optimal [[error detection and correction|codes]] that can be used to transmit data over a noisy channel with a small coding error at a rate near the channel capacity.@@@@1@37@@danf@17-8-2009 10411570@unknown@formal@none@1@S@====Channel capacity of particular model channels====@@@@1@6@@danf@17-8-2009 10411580@unknown@formal@none@1@S@* A continuous-time analog communications channel subject to Gaussian noise — see [[Shannon–Hartley theorem]].@@@@1@14@@danf@17-8-2009 10411590@unknown@formal@none@1@S@* A [[binary symmetric channel]] (BSC) with crossover probability ''p'' is a binary input, binary output channel that flips the input bit with probability '' p''.@@@@1@26@@danf@17-8-2009 10411600@unknown@formal@none@1@S@The BSC has a capacity of 1 - H_\mbox{b}(p) bits per channel use, where H_\mbox{b} is the [[binary entropy function]]:@@@@1@20@@danf@17-8-2009 10411610@unknown@formal@none@1@S@::@@@@1@1@@danf@17-8-2009 10411620@unknown@formal@none@1@S@* A binary erasure channel (BEC) with erasure probability '' p '' is a binary input, ternary output channel.@@@@1@19@@danf@17-8-2009 10411630@unknown@formal@none@1@S@The possible channel outputs are ''0'', ''1'', and a third symbol 'e' called an erasure.@@@@1@15@@danf@17-8-2009 10411640@unknown@formal@none@1@S@The erasure represents complete loss of information about an input bit.@@@@1@11@@danf@17-8-2009 10411650@unknown@formal@none@1@S@The capacity of the BEC is ''1 - p'' bits per channel use.@@@@1@13@@danf@17-8-2009 10411660@unknown@formal@none@1@S@::@@@@1@1@@danf@17-8-2009 10411670@unknown@formal@none@1@S@==Applications to other fields==@@@@1@4@@danf@17-8-2009 10411680@unknown@formal@none@1@S@===Intelligence uses and secrecy applications===@@@@1@5@@danf@17-8-2009 10411690@unknown@formal@none@1@S@Information theoretic concepts apply to [[cryptography]] and [[cryptanalysis]].@@@@1@8@@danf@17-8-2009 10411700@unknown@formal@none@1@S@[[Turing]]'s information unit, the [[Ban (information)|ban]], was used in the [[Ultra]] project, breaking the German [[Enigma machine]] code and hastening the [[Victory in Europe Day|end of WWII in Europe]].@@@@1@29@@danf@17-8-2009 10411710@unknown@formal@none@1@S@Shannon himself defined an important concept now called the [[unicity distance]].@@@@1@11@@danf@17-8-2009 10411720@unknown@formal@none@1@S@Based on the [[redundancy (information theory)|redundancy]] of the [[plaintext]], it attempts to give a minimum amount of [[ciphertext]] necessary to ensure unique decipherability.@@@@1@23@@danf@17-8-2009 10411730@unknown@formal@none@1@S@Information theory leads us to believe it is much more difficult to keep secrets than it might first appear.@@@@1@19@@danf@17-8-2009 10411740@unknown@formal@none@1@S@A 
[[brute force attack]] can break systems based on [[public-key cryptography|asymmetric key algorithms]] or on most commonly used methods of [[symmetric-key algorithm|symmetric key algorithms]] (sometimes called secret key algorithms), such as [[block cipher]]s.@@@@1@33@@danf@17-8-2009 10411750@unknown@formal@none@1@S@The security of all such methods currently comes from the assumption that no known attack can break them in a practical amount of time.@@@@1@24@@danf@17-8-2009 10411760@unknown@formal@none@1@S@[[Information theoretic security]] refers to methods such as the [[one-time pad]] that are not vulnerable to such brute force attacks.@@@@1@20@@danf@17-8-2009 10411770@unknown@formal@none@1@S@In such cases, the positive conditional [[mutual information]] between the [[plaintext]] and [[ciphertext]] (conditioned on the [[key (cryptography)| key]]) can ensure proper transmission, while the unconditional mutual information between the plaintext and ciphertext remains zero, resulting in absolutely secure communications.@@@@1@40@@danf@17-8-2009 10411780@unknown@formal@none@1@S@In other words, an eavesdropper would not be able to improve his or her guess of the plaintext by gaining knowledge of the ciphertext but not of the key.@@@@1@29@@danf@17-8-2009 10411790@unknown@formal@none@1@S@However, as in any other cryptographic system, care must be used to correctly apply even information-theoretically secure methods; the [[Venona project]] was able to crack the one-time pads of the [[Soviet Union]] due to their improper reuse of key material.@@@@1@40@@danf@17-8-2009 10411800@unknown@formal@none@1@S@===Pseudorandom number generation===@@@@1@3@@danf@17-8-2009 10411810@unknown@formal@none@1@S@[[Pseudorandom number generator]]s are widely available in computer language libraries and application programs.@@@@1@13@@danf@17-8-2009 10411820@unknown@formal@none@1@S@They are, almost universally, unsuited to cryptographic use as they do not evade the deterministic nature of modern computer equipment and software.@@@@1@22@@danf@17-8-2009 10411830@unknown@formal@none@1@S@A class of improved random number generators is termed [[Cryptographically secure pseudorandom number generator]]s, but even they require external to the software [[random seed]]s to work as intended.@@@@1@28@@danf@17-8-2009 10411840@unknown@formal@none@1@S@These can be obtained via [[extractor]]s, if done carefully.@@@@1@9@@danf@17-8-2009 10411850@unknown@formal@none@1@S@The measure of sufficient randomness in extractors is [[min-entropy]], a value related to Shannon entropy through [[Rényi entropy]]; Rényi entropy is also used in evaluating randomness in cryptographic systems.@@@@1@29@@danf@17-8-2009 10411860@unknown@formal@none@1@S@Although related, the distinctions among these measures mean that a [[random variable]] with high Shannon entropy is not necessarily satisfactory for use in an extractor and so for cryptography uses.@@@@1@30@@danf@17-8-2009 10411870@unknown@formal@none@1@S@===Miscellaneous applications===@@@@1@2@@danf@17-8-2009 10411880@unknown@formal@none@1@S@Information theory also has applications in [[Gambling and information theory|gambling and investing]], [[black hole information paradox|black holes]], [[bioinformatics]], and [[music]].@@@@1@20@@danf@17-8-2009 10420010@unknown@formal@none@1@S@
Italian language
@@@@1@2@@danf@17-8-2009 10420020@unknown@formal@none@1@S@'''Italian''' (, or ''lingua italiana'') is a [[Romance languages|Romance language]] spoken as a [[first language]] by about 63 million people, primarily in [[Italy]].@@@@1@23@@danf@17-8-2009 10420030@unknown@formal@none@1@S@In [[Switzerland]], Italian is one of four [[Linguistic geography of Switzerland|official language]]s.@@@@1@12@@danf@17-8-2009 10420040@unknown@formal@none@1@S@It is also the official language of [[San Marino]].@@@@1@9@@danf@17-8-2009 10420050@unknown@formal@none@1@S@It is the primary language of the [[Vatican City]].@@@@1@9@@danf@17-8-2009 10420060@unknown@formal@none@1@S@Standard Italian, adopted by the state after the [[unification of Italy]], is based on [[Tuscan dialect|Tuscan]] and is somewhat intermediate between [[Italo-Western|Italo-Dalmatian languages]] of the [[Mezzogiorno|South]] and [[Northern Italian dialects]] of the [[Northern Italy|North]].@@@@1@34@@danf@17-8-2009 10420070@unknown@formal@none@1@S@Unlike most other Romance languages, Italian has retained the contrast between short and [[consonant length|long consonants]] which existed in Latin.@@@@1@20@@danf@17-8-2009 10420080@unknown@formal@none@1@S@As in most [[Romance languages]], [[stress (linguistics)|stress]] is distinctive.@@@@1@9@@danf@17-8-2009 10420090@unknown@formal@none@1@S@Of the Romance languages, Italian is considered to be one of the closest resembling [[Latin]] in terms of [[vocabulary]].@@@@1@19@@danf@17-8-2009 10420100@unknown@formal@none@1@S@According to Ethnologue, lexical similarity is 89% with [[French language|French]], 87% with [[Catalan language|Catalan]], 85% with [[Sardinian language|Sardinian]], 82% with [[Spanish language|Spanish]], 78% with Rheto-Romance, and 77% with Romanian.@@@@1@29@@danf@17-8-2009 10420110@unknown@formal@none@1@S@It is affectionately called ''il parlar gentile'' (the gentle language) by its speakers.@@@@1@13@@danf@17-8-2009 10420120@unknown@formal@none@1@S@==Writing system==@@@@1@2@@danf@17-8-2009 10420130@unknown@formal@none@1@S@Italian is written using the [[Latin alphabet]].@@@@1@7@@danf@17-8-2009 10420140@unknown@formal@none@1@S@The letters ''J'', ''K'', ''W'', ''X'' and ''Y'' are not considered part of the standard [[Italian alphabet]], but appear in loanwords (such as ''jeans'', ''whisky'', ''taxi'').@@@@1@26@@danf@17-8-2009 10420150@unknown@formal@none@1@S@''X'' has become a commonly used letter in genuine Italian words with the prefix ''extra-''.@@@@1@15@@danf@17-8-2009 10420160@unknown@formal@none@1@S@''J'' in Italian is an old-fashioned orthographic variant of ''I'', appearing in the first name "Jacopo" as well as in some Italian place names, e.g., the towns of [[Bajardo]], [[Bojano]], [[Joppolo]], [[Jesolo]], [[Jesi]], among numerous others, and in the alternate spelling ''Mar Jonio'' (also spelled ''Mar Ionio'') for the [[Ionian Sea]].@@@@1@51@@danf@17-8-2009 10420170@unknown@formal@none@1@S@''J'' may also appear in many words from different dialects, but its use is discouraged in contemporary Italian, and it is not part of the standard 21-letter contemporary Italian alphabet.@@@@1@30@@danf@17-8-2009 10420180@unknown@formal@none@1@S@Each of these foreign letters had an Italian equivalent spelling: ''gi'' for ''j'', ''c'' or ''ch'' for ''k'', ''u'' or ''v'' for ''w'' (depending on what sound it makes), ''s'', ''ss'', or ''cs'' for ''x'', and ''i'' for ''y''.@@@@1@39@@danf@17-8-2009 10420190@unknown@formal@none@1@S@* Italian uses the [[acute accent]] over the letter ''E'' (as in 
''perché'', why/because) to indicate a front mid-close vowel, and the [[grave accent]] (as in ''tè'', tea) to indicate a front mid-open vowel.@@@@1@34@@danf@17-8-2009 10420200@unknown@formal@none@1@S@The [[grave accent]] is also used on letters ''A'', ''I'', ''O'', and ''U'' to mark [[stress (linguistics)|stress]] when it falls on the final vowel of a word (for instance ''gioventù'', youth).@@@@1@31@@danf@17-8-2009 10420210@unknown@formal@none@1@S@Typically, the penultimate syllable is stressed.@@@@1@6@@danf@17-8-2009 10420220@unknown@formal@none@1@S@If syllables other than the last one are stressed, the accent is not mandatory, unlike in [[Spanish language|Spanish]], and, in virtually all cases, it is omitted.@@@@1@26@@danf@17-8-2009 10420230@unknown@formal@none@1@S@In some cases, when the word is ambiguous (as ''principi''), the accent mark is sometimes used in order to disambiguate its meaning (in this case, ''prìncipi'', princes, or ''princìpi'', principles).@@@@1@30@@danf@17-8-2009 10420240@unknown@formal@none@1@S@This is, however, not compulsory.@@@@1@5@@danf@17-8-2009 10420250@unknown@formal@none@1@S@Rare words with three or more syllables can confuse Italians themselves, and the pronunciation of [[Istanbul]] is a common example of a word in which placement of stress is not clearly established.@@@@1@32@@danf@17-8-2009 10420260@unknown@formal@none@1@S@Turkish, like French, tends to put the accent on ultimate syllable, but Italian doesn't.@@@@1@14@@danf@17-8-2009 10420270@unknown@formal@none@1@S@So we can hear "Istànbul" or "Ìstanbul".@@@@1@7@@danf@17-8-2009 10420280@unknown@formal@none@1@S@Another instance is the American State of [[Florida]]: the correct way to pronounce it in Italian is like in Spanish, "Florìda", but since there is an Italian word meaning the same ("flourishing"), "flòrida", and because of the influence of English, most Italians pronounce it that way.@@@@1@46@@danf@17-8-2009 10420290@unknown@formal@none@1@S@Dictionaries give the latter as an alternative pronunciation.@@@@1@8@@danf@17-8-2009 10420300@unknown@formal@none@1@S@* The letter ''H'' at the beginning of a word is used to distinguish ''ho'', ''hai'', ''ha'', ''hanno'' (present indicative of ''avere'', 'to have') from ''o'' ('or'), ''ai'' ('to the'), ''a'' ('to'), ''anno'' ('year').@@@@1@34@@danf@17-8-2009 10420310@unknown@formal@none@1@S@In the spoken language this letter is always silent for the cases given above.@@@@1@14@@danf@17-8-2009 10420320@unknown@formal@none@1@S@''H'' is also used in combinations with other letters (see below), but no [[phoneme]] {{IPA|[h]}} exists in Italian.@@@@1@18@@danf@17-8-2009 10420330@unknown@formal@none@1@S@In foreign words entered in common use, like "hotel" or "hovercraft", the H is commonly silent, so they are pronounced as {{IPA|/oˈtɛl/}} and {{IPA|/ˈɔverkraft/}}@@@@1@24@@danf@17-8-2009 10420340@unknown@formal@none@1@S@* The letter ''Z'' represents {{IPA|/ʣ/}}, for example: ''Zanzara'' {{IPA|/dzan'dzaɾa/}} (mosquito), or {{IPA|/ʦ/}}, for example: ''Nazione'' {{IPA|/naˈttsjone/}} (nation), depending on context, though there are few [[minimal pair]]s.@@@@1@27@@danf@17-8-2009 10420350@unknown@formal@none@1@S@The same goes for ''S'', which can represent {{IPA|/s/}} or {{IPA|/z/}}.@@@@1@11@@danf@17-8-2009 10420360@unknown@formal@none@1@S@However, these two phonemes are in [[complementary distribution]] everywhere except between two vowels in the same word, and even in such environment there are extremely few minimal pairs, so that this distinction is being lost in many 
varieties.@@@@1@38@@danf@17-8-2009 10420370@unknown@formal@none@1@S@* The letters ''C'' and ''G'' represent [[affricate]]s: [[Voiceless postalveolar affricate|{{IPA|/ʧ/}}]] as in "chair" and [[Voiced postalveolar affricate|{{IPA|/ʤ/}}]] as in "gem", respectively, before the [[front vowel]]s ''I'' and ''E''.@@@@1@29@@danf@17-8-2009 10420380@unknown@formal@none@1@S@They are pronounced as [[plosive]]s {{IPA|/k/}}, {{IPA|/g/}} (as in "call" and "gall") otherwise.@@@@1@13@@danf@17-8-2009 10420390@unknown@formal@none@1@S@Front/back vowel rules for ''C'' and ''G'' are similar in [[French language|French]], [[Romanian language|Romanian]], [[Spanish language|Spanish]], and to some extent [[English language|English]] (including [[Old English]]).@@@@1@25@@danf@17-8-2009 10420400@unknown@formal@none@1@S@[[swedish language|Swedish]] and [[Norwegian language|Norwegian]] have similar rules for ''K'' and ''G''.@@@@1@12@@danf@17-8-2009 10420410@unknown@formal@none@1@S@(See also [[palatalization]].)@@@@1@3@@danf@17-8-2009 10420420@unknown@formal@none@1@S@* However, an ''H'' can be added between ''C'' or ''G'' and ''E'' or ''I'' to represent a plosive, and an ''I'' can be added between ''C'' or ''G'' and ''A'', ''O'' or ''U'' to signal that the consonant is an affricate.@@@@1@42@@danf@17-8-2009 10420430@unknown@formal@none@1@S@For example:@@@@1@2@@danf@17-8-2009 10420440@unknown@formal@none@1@S@:Note that the ''H'' is [[silent letter|silent]] in the digraphs ''[[ch (digraph)|CH]]'' and ''[[gh (digraph)|GH]]'', as also the ''I'' in ''cia'', ''cio'', ''ciu'' and even ''cie'' is not pronounced as a separate vowel, unless it carries the primary stress.@@@@1@39@@danf@17-8-2009 10420450@unknown@formal@none@1@S@For example, it is silent in ''[[ciao]]'' {{IPA|/ˈʧa.o/}} and cielo {{IPA|/ˈʧɛ.lo/}}, but it is pronounced in ''farmacia'' {{IPA|/ˌfaɾ.ma.ˈʧi.a/}} and ''farmacie'' {{IPA|/ˌfaɾ.ma.ˈʧi.e/}}.@@@@1@21@@danf@17-8-2009 10420460@unknown@formal@none@1@S@* There are three other special [[digraph (orthography)|digraphs]] in Italian: ''[[gn (digraph)|GN]]'', ''GL'' and ''SC''.@@@@1@15@@danf@17-8-2009 10420470@unknown@formal@none@1@S@''GN'' represents [[Palatal nasal|{{IPA|/ɲ/}}]].@@@@1@4@@danf@17-8-2009 10420480@unknown@formal@none@1@S@''GL'' represents [[Palatal lateral approximant|{{IPA|/ʎ/}}]] only before ''i'', and never at the beginning of a word, except in the [[personal pronoun]] and [[definite article]] ''gli''.@@@@1@25@@danf@17-8-2009 10420490@unknown@formal@none@1@S@(Compare with [[Spanish language|Spanish]] ''ñ'' and ''ll'', [[Portuguese language|Portuguese]] ''nh'' and ''lh''.)@@@@1@12@@danf@17-8-2009 10420500@unknown@formal@none@1@S@''SC'' represents fricative [[Voiceless postalveolar fricative|{{IPA|/ʃ/}}]] before ''i'' or ''e''.@@@@1@10@@danf@17-8-2009 10420510@unknown@formal@none@1@S@Except in the speech of some Northern Italians, all of these are normally [[geminate]] between vowels.@@@@1@16@@danf@17-8-2009 10420520@unknown@formal@none@1@S@* In general, all letters or digraphs represent phonemes rather clearly, and, in standard varieties of Italian, there is little allophonic variation.@@@@1@22@@danf@17-8-2009 10420530@unknown@formal@none@1@S@The most notable exceptions are assimilation of /n/ in point of articulation before consonants, assimilatory voicing of /s/ to following voiced consonants, and vowel length (vowels are long in stressed open syllables, and short elsewhere) — compare with the enormous number of [[allophone]]s of the English phoneme /t/.@@@@1@48@@danf@17-8-2009 
10420540@unknown@formal@none@1@S@Spelling is clearly phonemic and difficult to mistake given a clear pronunciation.@@@@1@12@@danf@17-8-2009 10420550@unknown@formal@none@1@S@Exceptions are generally only found in foreign borrowings.@@@@1@8@@danf@17-8-2009 10420560@unknown@formal@none@1@S@There are fewer cases of [[dyslexia]] than among speakers of languages such as English , and the concept of a spelling bee is strange to Italians.@@@@1@26@@danf@17-8-2009 10420570@unknown@formal@none@1@S@==History==@@@@1@1@@danf@17-8-2009 10420580@unknown@formal@none@1@S@The history of the Italian language is long, but the modern standard of the language was largely shaped by relatively recent events.@@@@1@22@@danf@17-8-2009 10420590@unknown@formal@none@1@S@The earliest surviving texts which can definitely be called Italian (or more accurately, vernacular, as opposed to its predecessor [[Vulgar Latin]]) are legal formulae from the region of [[province of Benevento|Benevento]] dating from 960-963.@@@@1@34@@danf@17-8-2009 10420600@unknown@formal@none@1@S@What would come to be thought of as Italian was first formalized in the first years of the 14th century through the works of [[Dante Alighieri]], who mixed southern Italian languages, especially [[Sicilian language|Sicilian]], with his native Tuscan in his epic poems known collectively as the ''[[Divine Comedy|Commedia]],'' to which [[Giovanni Boccaccio]] later affixed the title ''Divina''.@@@@1@57@@danf@17-8-2009 10420610@unknown@formal@none@1@S@Dante's much-loved works were read throughout Italy and his written dialect became the "canonical standard" that all educated Italians could understand.@@@@1@21@@danf@17-8-2009 10420620@unknown@formal@none@1@S@Dante is still credited with standardizing the Italian language and, thus, the dialect of [[Tuscany]] became the basis for what would become the official language of Italy.@@@@1@27@@danf@17-8-2009 10420630@unknown@formal@none@1@S@Italy has always had a distinctive dialect for each city since the cities were until recently thought of as [[city-state]]s.@@@@1@20@@danf@17-8-2009 10420640@unknown@formal@none@1@S@The latter now has considerable [[variety (linguistics)|variety]], however.@@@@1@8@@danf@17-8-2009 10420650@unknown@formal@none@1@S@As Tuscan-derived Italian came to be used throughout the nation, features of local speech were naturally adopted, producing various versions of Regional Italian.@@@@1@23@@danf@17-8-2009 10420660@unknown@formal@none@1@S@The most characteristic differences, for instance, between [[Romanesco|Roman Italian]] and [[Milanese|Milanese Italian]] are the [[consonant length|gemination]] of initial consonants and the pronunciation of stressed "e", and of "s" in some cases (e.g. 
''va bene'' "all right": is pronounced {{IPA|[va ˈbːɛne]}} by a Roman, {{IPA|[va ˈbene]}} by a Milanese; ''a casa'' "at home": Roman {{IPA|[a ˈkːasa]}}, Milanese {{IPA|[a ˈkaza]}}).@@@@1@58@@danf@17-8-2009 10420670@unknown@formal@none@1@S@In contrast to the [[Northern Italian language|dialects of northern Italy]], [[southern Italian]] dialects were largely untouched by the Franco-[[Occitan language|Occitan]] influences introduced to Italy, mainly by [[bard]]s from [[France]], during the [[Middle Ages]].@@@@1@33@@danf@17-8-2009 10420680@unknown@formal@none@1@S@Even in the case of Northern Italian dialects, however, scholars are careful not to overstate the effects of outsiders on the natural indigenous developments of the languages.@@@@1@27@@danf@17-8-2009 10420690@unknown@formal@none@1@S@(See [[La Spezia-Rimini Line]].)@@@@1@4@@danf@17-8-2009 10420700@unknown@formal@none@1@S@The economic might and relative advanced development of [[Tuscany]] at the time ([[Late Middle Ages]]), gave its dialect weight, though Venetian remained widespread in medieval Italian commercial life.@@@@1@28@@danf@17-8-2009 10420710@unknown@formal@none@1@S@Also, the increasing cultural relevance of [[Florence, Italy|Florence]] during the periods of '[[Humanism|Umanesimo (Humanism)]]' and the [[Renaissance|Rinascimento (Renaissance)]] made its ''volgare'' (dialect), or rather a refined version of it, a standard in the arts.@@@@1@34@@danf@17-8-2009 10420720@unknown@formal@none@1@S@The re-discovery of Dante's ''[[De vulgari eloquentia]]'' and a renewed interest in linguistics in the 16th century sparked a debate which raged throughout Italy concerning which criteria should be chosen to establish a modern Italian standard to be used as much as a literary as a spoken language.@@@@1@48@@danf@17-8-2009 10420730@unknown@formal@none@1@S@Scholars were divided into three factions: the [[purism|purists]], headed by [[Pietro Bembo]] who in his ''[[Gli Asolani]]'' claimed that the language might only be based on the great literary classics (notably, [[Petrarch]], and Boccaccio but not Dante as Bembo believed that the Divine Comedy was not dignified enough as it used elements from other dialects), [[Niccolò Machiavelli]] and other [[Florence|Florentine]]s who preferred the version spoken by ordinary people in their own times, and the [[Courtesan]]s like [[Baldassarre Castiglione]] and [[Gian Giorgio Trissino]] who insisted that each local vernacular must contribute to the new standard.@@@@1@94@@danf@17-8-2009 10420740@unknown@formal@none@1@S@Eventually Bembo's ideas prevailed, the result being the publication of the first Italian dictionary in 1612 and the foundation of the [[Accademia della Crusca]] in Florence (1582-3), the official legislative body of the Italian language.@@@@1@35@@danf@17-8-2009 10420750@unknown@formal@none@1@S@Italian literature's first modern novel, [[The Betrothed|''I Promessi Sposi'']] (The Betrothed), by [[Alessandro Manzoni]] further defined the standard by "rinsing" his Milanese 'in the waters of the [[Arno River|Arno]]" ([[Florence]]'s river), as he states in the Preface to his 1840 edition.@@@@1@41@@danf@17-8-2009 10420760@unknown@formal@none@1@S@After unification a huge number of civil servants and soldiers recruited from all over the country introduced many more words and idioms from their home dialects ("[[ciao]]" is [[Venetian language|Venetian]], "[[panettone]]" is [[Milanese]] etc.).@@@@1@34@@danf@17-8-2009 10420770@unknown@formal@none@1@S@==Classification==@@@@1@1@@danf@17-8-2009 
10420780@unknown@formal@none@1@S@Italian is most closely related to the other two Italo-Dalmatian languages, [[Sicilian language|Sicilian]] and the extinct [[Dalmatian language|Dalmatian]].@@@@1@18@@danf@17-8-2009 10420790@unknown@formal@none@1@S@The three are part of the [[Italo-Western languages|Italo-Western]] grouping of the [[Romance languages]], which are a subgroup of the [[Italic languages|Italic]] branch of [[Indo-European language family|Indo-European]].@@@@1@26@@danf@17-8-2009 10420800@unknown@formal@none@1@S@==Geographic distribution==@@@@1@2@@danf@17-8-2009 10420810@unknown@formal@none@1@S@The total number of speakers of Italian as a mother tongue is between 60 and 70 million.@@@@1@14@@danf@17-8-2009 10420820@unknown@formal@none@1@S@Speakers who use Italian as a second or cultural language are estimated at around 110-120 million.@@@@1@16@@danf@17-8-2009 10420830@unknown@formal@none@1@S@Italian is the official language of [[Italy]] and [[San Marino]], and one of the official languages of [[Switzerland]], spoken mainly in the [[Canton Ticino|Ticino]] and [[Graubünden|Grigioni]] cantons, a region referred to as [[Italian Switzerland]].@@@@1@33@@danf@17-8-2009 10420840@unknown@formal@none@1@S@It is also the second official language in some areas of [[Istria]], in [[Slovenia]] and [[Croatia]], where an Italian minority exists.@@@@1@21@@danf@17-8-2009 10420850@unknown@formal@none@1@S@It is the primary language of the [[Vatican City]] and is widely used and taught in [[Monaco]] and [[Malta]].@@@@1@19@@danf@17-8-2009 10420860@unknown@formal@none@1@S@It is also widely understood in France, with over one million speakers (especially in [[Corsica]] and the [[County of Nice]], areas that historically spoke [[Italian dialects]] before annexation to [[France]]), and in [[Albania]].@@@@1@33@@danf@17-8-2009 10420870@unknown@formal@none@1@S@Italian is also spoken by some in former Italian colonies in [[Africa]] ([[Libya]], [[Somalia]] and [[Eritrea]]).@@@@1@16@@danf@17-8-2009 10420880@unknown@formal@none@1@S@However, its use has sharply dropped off since the colonial period.@@@@1@11@@danf@17-8-2009 10420890@unknown@formal@none@1@S@In [[Eritrea]], [[Italian Language|Italian]] is widely understood.@@@@1@8@@danf@17-8-2009 10420900@unknown@formal@none@1@S@In fact, for fifty years, during the colonial period, Italian was the language of instruction, but [[as of 1997]], there is only one Italian language school remaining, with 470 pupils.@@@@1@30@@danf@17-8-2009 10420910@unknown@formal@none@1@S@In [[Somalia]], Italian used to be a major language, but owing to the civil war and lack of education, only the older generation still uses it.@@@@1@26@@danf@17-8-2009 10420920@unknown@formal@none@1@S@Italian and [[Italian dialects]] are widely used by Italian immigrants and many of their descendants (see ''[[Italians]]'') living throughout [[Western Europe]] (especially [[France]], [[Germany]], [[Belgium]], [[Switzerland]], the [[Britalian|United Kingdom]] and [[Luxembourg]]), the [[Italian Americans|United States]], [[Italian Canadians|Canada]], [[Italian Australians|Australia]], and [[Latin America]] (especially [[Uruguay]], [[Italian Brazilians|Brazil]], [[Argentina]], and [[Venezuela]]).@@@@1@49@@danf@17-8-2009 10420930@unknown@formal@none@1@S@In the United States, Italian speakers are most commonly found in four cities: [[Boston]] (7,000), [[Chicago]] (12,000), [[New York City]] (140,000), and [[Philadelphia]] (15,000).@@@@1@24@@danf@17-8-2009 10420940@unknown@formal@none@1@S@In Canada there are large Italian-speaking
communities in [[Montreal]] (120,000) and [[Toronto]] (195,000).@@@@1@13@@danf@17-8-2009 10420950@unknown@formal@none@1@S@Italian is the second most commonly spoken language in Australia, where 353,605 [[Italian Australian]]s, or 1.9% of the population, reported speaking Italian at home in the 2001 [[Census in Australia|Census]].@@@@1@29@@danf@17-8-2009 10420960@unknown@formal@none@1@S@In 2001 there were 130,000 Italian speakers in [[Melbourne]], and 90,000 in [[Sydney]].@@@@1@13@@danf@17-8-2009 10420970@unknown@formal@none@1@S@===Italian language education===@@@@1@3@@danf@17-8-2009 10420980@unknown@formal@none@1@S@Italian is widely taught in many schools around the world, but rarely as the first non-native language of pupils; in fact, Italian generally is the fourth or fifth most taught second language in the world.@@@@1@34@@danf@17-8-2009 10420990@unknown@formal@none@1@S@In [[anglophone]] parts of [[Canada]], Italian is, after [[French language|French]], the third most taught language.@@@@1@15@@danf@17-8-2009 10421000@unknown@formal@none@1@S@In [[francophone]] Canada it is third after [[English language|English]].@@@@1@9@@danf@17-8-2009 10421010@unknown@formal@none@1@S@In the [[United States]] and the [[United Kingdom]], Italian ranks fourth (after [[Spanish language|Spanish]]-French-[[German language|German]] and French-German-Spanish respectively).@@@@1@18@@danf@17-8-2009 10421020@unknown@formal@none@1@S@Throughout the world, Italian is the fifth most taught non-native language, after [[English language|English]], French, Spanish, and German.@@@@1@18@@danf@17-8-2009 10421030@unknown@formal@none@1@S@In the [[European Union]], Italian is spoken as a mother tongue by 13% of the population (64 million, mainly in Italy itself) and as a second language by 3% (14 million); among EU member states, it is most likely to be desired (and therefore learned) as a second language in [[Malta]] (61%), [[Croatia]] (14%), [[Slovenia]] (12%), [[Austria]] (11%), [[Romania]] (8%), [[France]] (6%), and [[Greece]] (6%).@@@@1@65@@danf@17-8-2009 10421040@unknown@formal@none@1@S@It is also an important second language in [[Albania]] and [[Switzerland]], which are not EU members or candidates.@@@@1@18@@danf@17-8-2009 10421050@unknown@formal@none@1@S@===Influence and derived languages===@@@@1@4@@danf@17-8-2009 10421060@unknown@formal@none@1@S@From the late 19th to the mid 20th century, thousands of Italians settled in Argentina, Uruguay and southern Brazil, where they formed a very strong physical and cultural presence (see the [[Italian diaspora]]).@@@@1@33@@danf@17-8-2009 10421070@unknown@formal@none@1@S@In some cases, colonies were established where variants of [[Italian dialects]] were used, and some continue to use a derived dialect.@@@@1@21@@danf@17-8-2009 10421080@unknown@formal@none@1@S@Examples are [[Rio Grande do Sul]], [[Brazil]], where [[Talian]] is used, and the town of [[Chipilo]] near Puebla, [[Mexico]]; each continues to use a derived form of [[Venetian language|Venetian]] dating back to the 19th century.@@@@1@37@@danf@17-8-2009 10421090@unknown@formal@none@1@S@Other examples are [[Cocoliche]], an Italian-Spanish [[pidgin]] once spoken in [[Argentina]], especially in [[Buenos Aires]], and [[Lunfardo]].@@@@1@18@@danf@17-8-2009 10421100@unknown@formal@none@1@S@[[Rioplatense Spanish]], and particularly the speech of the city of Buenos Aires, has intonation patterns that resemble those of Italian dialects, because Argentina had a constant, large influx of Italian settlers since the
second half of the nineteenth century; initially primarily from Northern Italy and then, from the beginning of the twentieth century, mostly from Southern Italy.@@@@1@60@@danf@17-8-2009 10421110@unknown@formal@none@1@S@===Lingua Franca===@@@@1@2@@danf@17-8-2009 10421120@unknown@formal@none@1@S@Starting in late [[medieval]] times, Italian language variants (especially the Tuscan and Venetian ones) replaced Latin to become the primary commercial language for much of Europe and the Mediterranean Sea.@@@@1@29@@danf@17-8-2009 10421130@unknown@formal@none@1@S@This became solidified during the [[Renaissance]] with the strength of Italian banking and the rise of [[Renaissance humanism|humanism]] in the arts.@@@@1@21@@danf@17-8-2009 10421140@unknown@formal@none@1@S@During the period of the Renaissance, Italy held artistic sway over the rest of Europe.@@@@1@15@@danf@17-8-2009 10421150@unknown@formal@none@1@S@All educated European gentlemen were expected to make the [[Grand Tour]], visiting Italy to see its great historical monuments and works of art.@@@@1@23@@danf@17-8-2009 10421160@unknown@formal@none@1@S@It thus became expected that educated Europeans would learn at least some Italian; the English poet [[John Milton]], for instance, wrote some of his early poetry in Italian.@@@@1@28@@danf@17-8-2009 10421170@unknown@formal@none@1@S@In England, Italian became the second most common modern language to be learned, after [[French language|French]] (though the classical languages, [[Latin]] and [[Greek language|Greek]], came first).@@@@1@26@@danf@17-8-2009 10421180@unknown@formal@none@1@S@However, by the late eighteenth century, Italian tended to be replaced by [[German language|German]] as the second modern language on the curriculum.@@@@1@22@@danf@17-8-2009 10421190@unknown@formal@none@1@S@Yet Italian [[loanword]]s continue to be used in most other [[European languages]] in matters of art and music.@@@@1@18@@danf@17-8-2009 10421200@unknown@formal@none@1@S@Today, the Italian language continues to be used as a [[lingua franca]] in some environments.@@@@1@15@@danf@17-8-2009 10421210@unknown@formal@none@1@S@Within the [[Catholic church]], Italian is known by a large part of the ecclesiastical hierarchy and is used in place of [[Latin]] in some official documents.@@@@1@26@@danf@17-8-2009 10421220@unknown@formal@none@1@S@The presence of Italian as the primary language in the [[Vatican City]] indicates its use not only within the [[Holy See]] but also throughout the world, wherever an episcopal seat is present.@@@@1@31@@danf@17-8-2009 10421230@unknown@formal@none@1@S@It continues to be used in [[music]] and [[opera]].@@@@1@9@@danf@17-8-2009 10421240@unknown@formal@none@1@S@Other contexts where Italian is sometimes used as a means of communication are some sports (occasionally in [[Football (association)|football]] and [[motorsports]]) and the [[design]] and [[fashion]] industries.@@@@1@28@@danf@17-8-2009 10421250@unknown@formal@none@1@S@==Dialects==@@@@1@1@@danf@17-8-2009 10421260@unknown@formal@none@1@S@In Italy, all [[Romance languages]] spoken as the vernacular, other than standard Italian (and excluding unrelated, non-Italian languages), are termed "Italian dialects".@@@@1@22@@danf@17-8-2009 10421270@unknown@formal@none@1@S@Many Italian dialects are, in fact, historical languages in their own right.@@@@1@12@@danf@17-8-2009 10421280@unknown@formal@none@1@S@These include recognized language groups such as [[Friulian language|Friulian]], [[Neapolitan language|Neapolitan]], [[Sardinian language|Sardinian]], [[Sicilian
language|Sicilian]], [[Venetian language|Venetian]], and others, and regional variants of these languages such as [[Calabrian languages|Calabrian]].@@@@1@29@@danf@17-8-2009 10421290@unknown@formal@none@1@S@The division between dialect and language has been used by scholars (such as by [[Francesco Bruni]]) to distinguish between the languages that made up the Italian [[koine]], and those which had very little or no part in it, such as [[Albanian language|Albanian]], [[Greek language|Greek]], [[German language|German]], [[Ladin language|Ladin]], and [[Occitan language|Occitan]], which are still spoken by minorities.@@@@1@57@@danf@17-8-2009 10421300@unknown@formal@none@1@S@Dialects are generally not used for general mass communication and are usually limited to native speakers in informal contexts.@@@@1@19@@danf@17-8-2009 10421310@unknown@formal@none@1@S@In the past, speaking in dialect was often deprecated as a sign of poor education.@@@@1@15@@danf@17-8-2009 10421320@unknown@formal@none@1@S@Younger generations, especially those under 35 (though it may vary in different areas), speak almost exclusively standard Italian in all situations, usually with local accents and idioms.@@@@1@27@@danf@17-8-2009 10421330@unknown@formal@none@1@S@Regional differences can be recognized by various factors: the openness of vowels, the length of the consonants, and the influence of the local dialect (for example, ''annà'' replaces ''andare'' in the area of Rome for the infinitive "to go").@@@@1@38@@danf@17-8-2009 10421340@unknown@formal@none@1@S@==Sounds==@@@@1@1@@danf@17-8-2009 10421350@unknown@formal@none@1@S@{{IPA notice|lang=it}}@@@@1@2@@danf@17-8-2009 10421360@unknown@formal@none@1@S@===Vowels===@@@@1@1@@danf@17-8-2009 10421370@unknown@formal@none@1@S@Italian has seven [[vowel]] phonemes: {{IPA|/a/}}, {{IPA|/e/}}, {{IPA|/ɛ/}}, {{IPA|/i/}}, {{IPA|/o/}}, {{IPA|/ɔ/}}, {{IPA|/u/}}.@@@@1@12@@danf@17-8-2009 10421380@unknown@formal@none@1@S@The pairs {{IPA|/e/}}-{{IPA|/ɛ/}} and {{IPA|/o/}}-{{IPA|/ɔ/}} are seldom distinguished in writing and often confused, even though most varieties of Italian employ both phonemes consistently.@@@@1@23@@danf@17-8-2009 10421390@unknown@formal@none@1@S@Compare, for example: "perché" {{IPA|[perˈkɛ]}} (why, because) and "senti" {{IPA|[ˈsenti]}} (you listen, you are listening, listen!), employed by some northern speakers, with {{IPA|[perˈke]}} and {{IPA|[ˈsɛnti]}}, as pronounced by most central and southern speakers.@@@@1@33@@danf@17-8-2009 10421400@unknown@formal@none@1@S@As a result, the usage is strongly indicative of a person's origin.@@@@1@12@@danf@17-8-2009 10421410@unknown@formal@none@1@S@The standard (Tuscan) usage of these vowels is listed in dictionaries, and employed outside Tuscany mainly by specialists, especially actors and very few (television) journalists.@@@@1@25@@danf@17-8-2009 10421420@unknown@formal@none@1@S@These are truly different [[phonemes]], however: compare {{IPA|/ˈpeska/}} (fishing) and {{IPA|/ˈpɛska/}} (peach), both spelled ''pesca''.@@@@1@16@@danf@17-8-2009 10421430@unknown@formal@none@1@S@Similarly, {{IPA|/ˈbotte/}} ('barrel') and {{IPA|/ˈbɔtte/}} ('beatings'), both spelled ''botte'', distinguish {{IPA|/o/}} and {{IPA|/ɔ/}}.@@@@1@14@@danf@17-8-2009 10421440@unknown@formal@none@1@S@In general, each vowel in a combination is usually pronounced separately.@@@@1@9@@danf@17-8-2009 10421450@unknown@formal@none@1@S@[[Diphthong]]s exist (e.g.
''uo'', ''iu'', ''ie'', ''ai''), but are limited to an unstressed ''u'' or ''i'' before or after a stressed vowel.@@@@1@22@@danf@17-8-2009 10421460@unknown@formal@none@1@S@The unstressed ''u'' in a diphthong approximates the English semivowel ''w'', and the unstressed ''i'' approximates the semivowel ''y''.@@@@1@18@@danf@17-8-2009 10421470@unknown@formal@none@1@S@E.g.: ''buono'' {{IPA|[ˈbwɔno]}}, ''ieri'' {{IPA|[ˈjɛri]}}.@@@@1@5@@danf@17-8-2009 10421480@unknown@formal@none@1@S@[[Triphthong]]s exist in Italian as well, like "contin''uia''mo" ("we continue").@@@@1@10@@danf@17-8-2009 10421490@unknown@formal@none@1@S@Three-vowel combinations exist only in the form of a semiconsonant ({{IPA|/j/}} or {{IPA|/w/}}), followed by a vowel, followed by a desinence vowel (usually {{IPA|/i/}}), as in ''miei'', ''suoi'', or of two semiconsonants followed by a vowel, as in the group ''-uia-'' exemplified above, or ''-iuo-'' in the word ''aiuola''.@@@@1@46@@danf@17-8-2009 10421500@unknown@formal@none@1@S@===Mobile diphthongs===@@@@1@2@@danf@17-8-2009 10421510@unknown@formal@none@1@S@Many Latin words with a short ''e'' or ''o'' have Italian counterparts with a mobile diphthong (''ie'' and ''uo'' respectively).@@@@1@20@@danf@17-8-2009 10421520@unknown@formal@none@1@S@When the vowel sound is stressed, it is pronounced and written as a diphthong; when not stressed, it is pronounced and written as a single vowel.@@@@1@26@@danf@17-8-2009 10421530@unknown@formal@none@1@S@So Latin ''focus'' gave rise to Italian ''fuoco'' (meaning both "fire" and "optical focus"): when unstressed, as in ''focale'' ("focal"), the "o" remains alone.@@@@1@24@@danf@17-8-2009 10421540@unknown@formal@none@1@S@Latin ''pes'' (more precisely its accusative form ''pedem'') is the source of Italian ''piede'' (foot), but unstressed "e" was left unchanged in ''pedone'' (pedestrian) and ''pedale'' (pedal).@@@@1@27@@danf@17-8-2009 10421550@unknown@formal@none@1@S@From Latin ''iocus'' comes Italian ''giuoco'' ("play", "game"), though in this case ''gioco'' is more common: ''giocare'' means "to play (a game)".@@@@1@22@@danf@17-8-2009 10421560@unknown@formal@none@1@S@From Latin ''homo'' comes Italian ''uomo'' (man), but also ''umano'' (human) and ''ominide'' (hominid).@@@@1@14@@danf@17-8-2009 10421570@unknown@formal@none@1@S@From Latin ''ovum'' comes Italian ''uovo'' (egg) and ''ovaie'' (ovaries).@@@@1@10@@danf@17-8-2009 10421580@unknown@formal@none@1@S@(The same phenomenon occurs in [[Spanish language|Spanish]]: ''juego'' (play, game) and ''jugar'' (to play), ''nieve'' (snow) and ''nevar'' (to snow)).@@@@1@20@@danf@17-8-2009 10421590@unknown@formal@none@1@S@===Consonants===@@@@1@1@@danf@17-8-2009 10421600@unknown@formal@none@1@S@Two symbols in a table cell denote the voiceless and voiced consonant, respectively.@@@@1@13@@danf@17-8-2009 10421610@unknown@formal@none@1@S@Nasals undergo assimilation when followed by a consonant: e.g., when preceding a velar ({{IPA|/k/}} or {{IPA|/g/}}), only {{IPA|[ŋ]}} appears, etc.@@@@1@20@@danf@17-8-2009 10421620@unknown@formal@none@1@S@Italian has geminate, or double, consonants, which are distinguished by [[Consonant length|length]].@@@@1@12@@danf@17-8-2009 10421630@unknown@formal@none@1@S@Length is distinctive for all consonants except for {{IPA|/ʃ/}}, {{IPA|/ʦ/}}, {{IPA|/ʣ/}}, {{IPA|/ʎ/}}, {{IPA|/ɲ/}}, which are always geminate, and {{IPA|/z/}}, which is always single.@@@@1@23@@danf@17-8-2009 10421640@unknown@formal@none@1@S@Geminate plosives and affricates are realised as lengthened closures.@@@@1@9@@danf@17-8-2009
10421650@unknown@formal@none@1@S@Geminate fricatives, nasals, and {{IPA|/l/}} are realized as lengthened [[continuant]]s.@@@@1@10@@danf@17-8-2009 10421660@unknown@formal@none@1@S@The flap consonant {{IPA|/ɾː/}} is typically dialectal, and it is called ''erre moscia''.@@@@1@13@@danf@17-8-2009 10421670@unknown@formal@none@1@S@The correct standard pronunciation is {{IPA|[r]}}.@@@@1@6@@danf@17-8-2009 10421680@unknown@formal@none@1@S@Of special interest to the linguistic study of Italian is the ''[[Tuscan gorgia|Gorgia Toscana]]'', or "Tuscan Throat", the weakening or [[lenition]] of certain [[:wiktionary:intervocalic|intervocalic]] consonants in [[Tuscan dialect]]s.@@@@1@28@@danf@17-8-2009 10421690@unknown@formal@none@1@S@See also [[Syntactic doubling]].@@@@1@4@@danf@17-8-2009 10421700@unknown@formal@none@1@S@===Assimilation===@@@@1@1@@danf@17-8-2009 10421710@unknown@formal@none@1@S@Italian has few diphthongs, so most unfamiliar diphthongs that are heard in foreign words (in particular, those beginning with vowel "a", "e", or "o") will be assimilated to the corresponding [[diaeresis]] (i.e., the vowel sounds will be pronounced separately).@@@@1@39@@danf@17-8-2009 10421720@unknown@formal@none@1@S@Italian [[phonotactics]] do not usually permit polysyllabic nouns and verbs to end with consonants, excepting poetry and song, so foreign words may receive extra terminal vowel sounds.@@@@1@27@@danf@17-8-2009 10421730@unknown@formal@none@1@S@==Grammar==@@@@1@1@@danf@17-8-2009 10421740@unknown@formal@none@1@S@===Common variations in the writing systems===@@@@1@6@@danf@17-8-2009 10421750@unknown@formal@none@1@S@Some variations in the usage of the writing system are found in practice.@@@@1@15@@danf@17-8-2009 10421760@unknown@formal@none@1@S@These are scorned by educated people, but they are so common in certain contexts that knowledge of them may be useful.@@@@1@21@@danf@17-8-2009 10421770@unknown@formal@none@1@S@* Usage of ''x'' instead of ''per'': this is very common among teenagers and in [[Text messaging|SMS]] abbreviations.@@@@1@18@@danf@17-8-2009 10421780@unknown@formal@none@1@S@The multiplication operator is pronounced "per" in Italian, and so it is sometimes used to replace the word "per", which means "for"; thus, for example, "per te" ("for you") is shortened to "x te" (compare with English "4 U").@@@@1@39@@danf@17-8-2009 10421790@unknown@formal@none@1@S@Words containing ''per'' can also have it replaced with ''x'': for example, ''perché'' (both "why" and "because") is often shortened to ''xché'', ''xké'' or ''x' '' (see below).@@@@1@28@@danf@17-8-2009 10421800@unknown@formal@none@1@S@This usage can be useful for jotting down quick notes or for fitting more text into the low character limit of an SMS, but it is considered unacceptable in formal writing.@@@@1@31@@danf@17-8-2009 10421810@unknown@formal@none@1@S@* Usage of foreign letters such as ''k'', ''j'' and ''y'', especially in nicknames and SMS language: ''ke'' instead of ''che'', ''Giusy'' instead of ''Giuseppina'' (or sometimes ''Giuseppe'').@@@@1@28@@danf@17-8-2009 10421820@unknown@formal@none@1@S@This is curiously mirrored in the usage of ''i'' in English names such as ''Staci'' instead of ''Stacey'', or in the usage of ''c'' in [[Northern Europe]] (''Jacob'' instead of ''Jakob'').@@@@1@31@@danf@17-8-2009 10421830@unknown@formal@none@1@S@The use of "k" instead of "ch" or "c" to represent a plosive sound is documented in some historical texts from before the standardization of the Italian language; however, that usage is no longer
standard in Italian.@@@@1@37@@danf@17-8-2009 10421840@unknown@formal@none@1@S@Possibly because it is associated with the [[German language]], the letter "k" has sometimes also been used in satire to suggest that a political figure is an authoritarian or even a "pseudo-nazi": [[Francesco Cossiga]] was famously nicknamed ''Kossiga'' by rioting students during his tenure as minister of internal affairs.@@@@1@49@@danf@17-8-2009 10421850@unknown@formal@none@1@S@[Cf. the [[alternative political spelling#"K" replacing "C"|politicized spelling ''Amerika'']] in the USA.]@@@@1@12@@danf@17-8-2009 10421860@unknown@formal@none@1@S@* Usage of the following abbreviations is limited to the electronic communications media and is deprecated in all other cases: '''nn''' instead of ''non'' (not), '''cmq''' instead of ''comunque'' (anyway, however), '''cm''' instead of ''come'' (how, like, as), '''d''' instead of ''di'' (of), '''(io/loro) sn''' instead of ''(io/loro) sono'' (I am/they are), '''(io) dv''' instead of ''(io) devo'' (I must/I have to) or instead of ''dove'' (where), '''(tu) 6''' instead of ''(tu) sei'' (you are).@@@@1@75@@danf@17-8-2009 10421870@unknown@formal@none@1@S@* Inexperienced typists often replace accents with apostrophes, such as in ''perche''' instead of ''perché''.@@@@1@15@@danf@17-8-2009 10421880@unknown@formal@none@1@S@Uppercase ''[[È]]'' is particularly rare, as it is absent from the [[Keyboard layout#Italian|Italian keyboard layout]], and is very often written as ''E''' (even though there are [[:it:Aiuto:Manuale di stile#Scrivere .C3.88|several ways]] of producing the uppercase È on a computer).@@@@1@39@@danf@17-8-2009 10421890@unknown@formal@none@1@S@This never happens in books or other professionally typeset material.@@@@1@10@@danf@17-8-2009 10421910@unknown@formal@none@1@S@==Examples==@@@@1@1@@danf@17-8-2009 10421920@unknown@formal@none@1@S@*Cheers: "Salute!"@@@@1@2@@danf@17-8-2009 10421930@unknown@formal@none@1@S@*English: ''inglese'' {{IPA|/iŋˈglese/}}@@@@1@3@@danf@17-8-2009 10421940@unknown@formal@none@1@S@*Good-bye: ''arrivederci'' {{IPA|/arriveˈdertʃi/}}@@@@1@3@@danf@17-8-2009 10421950@unknown@formal@none@1@S@*Hello: ''[[ciao]]'' {{IPA|/ˈtʃao/}}@@@@1@3@@danf@17-8-2009 10421960@unknown@formal@none@1@S@*Good day: ''buon giorno'' {{IPA|/bwɔnˈdʒorno/}}@@@@1@5@@danf@17-8-2009 10421970@unknown@formal@none@1@S@*Good evening: ''buona sera'' {{IPA|/bwɔnaˈsera/}}@@@@1@5@@danf@17-8-2009 10421980@unknown@formal@none@1@S@*Yes: ''sì'' {{IPA|/si/}}@@@@1@3@@danf@17-8-2009 10421990@unknown@formal@none@1@S@*No: ''no'' {{IPA|/nɔ/}}@@@@1@3@@danf@17-8-2009 10422000@unknown@formal@none@1@S@*How are you?
: Come stai {{IPA|/ˈkome ˈstai/}} (informal); Come sta {{IPA|/ˈkome ˈsta/}} (formal)@@@@1@14@@danf@17-8-2009 10422010@unknown@formal@none@1@S@*Sorry: ''mi dispiace'' {{IPA|/mi disˈpjatʃe/}}@@@@1@5@@danf@17-8-2009 10422020@unknown@formal@none@1@S@*Excuse me: ''scusa'' {{IPA|/ˈskuza/}} (informal); ''scusi'' {{IPA|/ˈskuzi/}} (formal)@@@@1@8@@danf@17-8-2009 10422030@unknown@formal@none@1@S@*Again: ''di nuovo'' {{IPA|/di ˈnwɔvo/}}; ''ancora'' {{IPA|/aŋˈkora/}}@@@@1@7@@danf@17-8-2009 10422040@unknown@formal@none@1@S@*Always: ''sempre'' {{IPA|/ˈsɛmpre/}}@@@@1@3@@danf@17-8-2009 10422050@unknown@formal@none@1@S@*When: ''quando'' {{IPA|/ˈkwando/}}@@@@1@3@@danf@17-8-2009 10422060@unknown@formal@none@1@S@*Where: ''dove'' {{IPA|/ˈdove/}}@@@@1@3@@danf@17-8-2009 10422070@unknown@formal@none@1@S@*Why/Because: ''perché'' {{IPA|/perˈke/}}@@@@1@3@@danf@17-8-2009 10422080@unknown@formal@none@1@S@*How: ''come'' {{IPA|/ˈkome/}}@@@@1@3@@danf@17-8-2009 10422090@unknown@formal@none@1@S@*How much is it?: ''quanto costa?''@@@@1@6@@danf@17-8-2009 10422100@unknown@formal@none@1@S@{{IPA|/ˈkwanto/}}@@@@1@1@@danf@17-8-2009 10422110@unknown@formal@none@1@S@*Thank you!: ''grazie!''@@@@1@3@@danf@17-8-2009 10422120@unknown@formal@none@1@S@{{IPA|/ˈgrattsie/}}@@@@1@1@@danf@17-8-2009 10422130@unknown@formal@none@1@S@*Bon appetit: ''buon appetito'' {{IPA|/ˌbwɔn appeˈtito/}}@@@@1@6@@danf@17-8-2009 10422140@unknown@formal@none@1@S@*You're welcome!: ''prego!''@@@@1@3@@danf@17-8-2009 10422150@unknown@formal@none@1@S@{{IPA|/ˈprɛgo/}}@@@@1@1@@danf@17-8-2009 10422160@unknown@formal@none@1@S@*I love you: ''Ti amo'' {{IPA|/ti ˈamo/}}, ''Ti voglio bene'' {{IPA|/ti ˈvɔʎʎo ˈbɛne/}}.@@@@1@13@@danf@17-8-2009 10422170@unknown@formal@none@1@S@The difference is that you use "Ti amo" when you are in a romantic relationship, and "Ti voglio bene" on any other occasion (to parents, to relatives, to friends...).@@@@1@28@@danf@17-8-2009 10422180@unknown@formal@none@1@S@Counting to twenty:@@@@1@3@@danf@17-8-2009 10422190@unknown@formal@none@1@S@*One: ''uno'' {{IPA|/ˈuno/}}@@@@1@3@@danf@17-8-2009 10422200@unknown@formal@none@1@S@*Two: ''due'' {{IPA|/ˈdue/}}@@@@1@3@@danf@17-8-2009 10422210@unknown@formal@none@1@S@*Three: ''tre'' {{IPA|/tre/}}@@@@1@3@@danf@17-8-2009 10422220@unknown@formal@none@1@S@*Four: ''quattro'' {{IPA|/ˈkwattro/}}@@@@1@3@@danf@17-8-2009 10422230@unknown@formal@none@1@S@*Five: ''cinque'' {{IPA|/ˈʧiŋkwe/}}@@@@1@3@@danf@17-8-2009 10422240@unknown@formal@none@1@S@*Six: ''sei'' {{IPA|/ˈsɛi/}}@@@@1@3@@danf@17-8-2009 10422250@unknown@formal@none@1@S@*Seven: ''sette'' {{IPA|/ˈsɛtte/}}@@@@1@3@@danf@17-8-2009 10422260@unknown@formal@none@1@S@*Eight: ''otto'' {{IPA|/ˈɔtto/}}@@@@1@3@@danf@17-8-2009 10422270@unknown@formal@none@1@S@*Nine: ''nove'' {{IPA|/ˈnɔve/}}@@@@1@3@@danf@17-8-2009 10422280@unknown@formal@none@1@S@*Ten: ''dieci'' {{IPA|/ˈdjɛʧi/}}@@@@1@3@@danf@17-8-2009 10422290@unknown@formal@none@1@S@*Eleven: ''undici'' {{IPA|/ˈundiʧi/}}@@@@1@3@@danf@17-8-2009 10422300@unknown@formal@none@1@S@*Twelve: ''dodici'' {{IPA|/ˈdodiʧi/}}@@@@1@3@@danf@17-8-2009 10422310@unknown@formal@none@1@S@*Thirteen: ''tredici'' {{IPA|/ˈtrediʧi/}}@@@@1@3@@danf@17-8-2009 10422320@unknown@formal@none@1@S@*Fourteen: ''quattordici'' {{IPA|/kwatˈtordiʧi/}}@@@@1@3@@danf@17-8-2009 10422330@unknown@formal@none@1@S@*Fifteen: ''quindici'' {{IPA|/ˈkwindiʧi/}}@@@@1@3@@danf@17-8-2009 10422340@unknown@formal@none@1@S@*Sixteen: ''sedici'' {{IPA|/ˈsediʧi/}}@@@@1@3@@danf@17-8-2009 10422350@unknown@formal@none@1@S@*Seventeen: ''diciassette'' {{IPA|/diʧasˈsɛtte/}}@@@@1@3@@danf@17-8-2009
10422360@unknown@formal@none@1@S@*Eighteen: ''diciotto'' {{IPA|/diˈʧɔtto/}}@@@@1@3@@danf@17-8-2009 10422370@unknown@formal@none@1@S@*Nineteen: ''diciannove'' {{IPA|/diʧanˈnɔve/}}@@@@1@3@@danf@17-8-2009 10422380@unknown@formal@none@1@S@*Twenty: ''venti'' {{IPA|/ˈventi/}}@@@@1@3@@danf@17-8-2009 10422390@unknown@formal@none@1@S@The days of the week:@@@@1@5@@danf@17-8-2009 10422400@unknown@formal@none@1@S@*Monday: ''lunedì'' {{IPA|/luneˈdi/}}@@@@1@3@@danf@17-8-2009 10422410@unknown@formal@none@1@S@*Tuesday: ''martedì'' {{IPA|/marteˈdi/}}@@@@1@3@@danf@17-8-2009 10422420@unknown@formal@none@1@S@*Wednesday: ''mercoledì'' {{IPA|/merkoleˈdi/}}@@@@1@3@@danf@17-8-2009 10422430@unknown@formal@none@1@S@*Thursday: ''giovedì'' {{IPA|/dʒoveˈdi/}}@@@@1@3@@danf@17-8-2009 10422440@unknown@formal@none@1@S@*Friday: ''venerdì'' {{IPA|/venerˈdi/}}@@@@1@3@@danf@17-8-2009 10422450@unknown@formal@none@1@S@*Saturday: ''sabato'' {{IPA|/ˈsabato/}}@@@@1@3@@danf@17-8-2009 10422460@unknown@formal@none@1@S@*Sunday: ''domenica'' {{IPA|/doˈmenika/}}@@@@1@3@@danf@17-8-2009 10422470@unknown@formal@none@1@S@==Sample texts==@@@@1@2@@danf@17-8-2009 10422480@unknown@formal@none@1@S@There is a recording of [[Dante]]'s [[Divine Comedy]] read by [[Lino Pertile]] available at http://etcweb.princeton.edu/dante/pdp/@@@@1@15@@danf@17-8-2009