Heaps' law

http://dbpedia.org/resource/Heaps'_law an entity of type: WikicatStatisticalLaws

Heaps' law is an empirical regularity in linguistics that describes the number of distinct words in a document (or set of documents) as a function of its length. It is given by the formula $V_R(n) = K n^{\beta}$, where V_R is the number of distinct words in a text of size n. K and β are free parameters determined empirically. For English text corpora, K typically lies between 10 and 100, and β between 0.4 and 0.6. The law is often attributed to Harold Stanley Heaps, but it was first discovered by Gustav Herdan. To some approximation, the Herdan–Heaps law is asymptotically equivalent to Zipf's law on the frequency of individual words in a text. rdf:langString

In linguistics, Heaps' law (also called Herdan's law) is an empirical law that describes the number of distinct words in a document (or set of documents) as a function of the document length (the so-called type-token relation). It can be formulated as $V_R(n) = K n^{\beta}$, where V_R is the number of distinct words in an instance text of size n. K and β are free parameters determined empirically. With English text corpora, K is typically between 10 and 100, and β is between 0.4 and 0.6. rdf:langString
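
As a rough illustration of the formula, the following Python sketch evaluates the predicted vocabulary size for a few text lengths. The function name `heaps_vocabulary` and the default values K = 44 and β = 0.5 are assumptions chosen from within the ranges quoted above; they are not fitted to any real corpus.

```python
def heaps_vocabulary(n: int, k: float = 44.0, beta: float = 0.5) -> float:
    """Predicted number of distinct words V_R(n) = K * n**beta.

    k and beta are illustrative defaults (assumed, not measured).
    """
    if n < 0:
        raise ValueError("text size n must be non-negative")
    return k * n ** beta


if __name__ == "__main__":
    # Print the predicted vocabulary for increasingly long texts.
    for n in (10_000, 100_000, 1_000_000):
        print(n, round(heaps_vocabulary(n)))
```

With these assumed parameters, a hundredfold increase in text length (10,000 to 1,000,000 tokens) raises the predicted vocabulary only tenfold (4,400 to 44,000), because β = 0.5 makes growth proportional to the square root of n.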

In linguistics, Heaps' law (also called Herdan's law) is an empirical law that describes the number of distinct words in a document (or set of documents) as a function of the document length. It can be formulated as $V_R(n) = K n^{\beta}$, where V_R is the number of distinct words in a text of size n. K and β are free parameters determined empirically. For English text, K is typically between 10 and 100, and β is between 0.4 and 0.6. Heaps' law means that as more text is generated, it takes longer to find new words. rdf:langString

Heaps' law is an empirical regularity in linguistics that describes the number of distinct words in a document (or set of documents) as a function of its length. It is given by the formula $V_R(n) = K n^{\beta}$, where V_R is the number of distinct words in a text of size n. K and β are free parameters determined empirically. For English text corpora, K typically lies between 10 and 100, and β between 0.4 and 0.6. rdf:langString

rdfs:label

rdf:langString Ley de Heaps

rdf:langString Heaps' law

rdf:langString Закон Хипса

rdf:langString Закон Гіпса

dbpedia-owl:wikiPageID

xsd:integer 436287

dbpedia-owl:wikiPageRevisionID

xsd:integer 1062655297

dbpprop:id

xsd:integer 3431

dbpprop:title

rdf:langString Heaps' law

dbpedia-owl:abstract

rdf:langString In linguistics, Heaps' law (also called Herdan's law) is an empirical law that describes the number of distinct words in a document (or set of documents) as a function of the document length (the so-called type-token relation). It can be formulated as $V_R(n) = K n^{\beta}$, where V_R is the number of distinct words in an instance text of size n. K and β are free parameters determined empirically. With English text corpora, K is typically between 10 and 100, and β is between 0.4 and 0.6. The law is frequently attributed to Harold Stanley Heaps, but was originally discovered by Gustav Herdan. Under mild assumptions, the Herdan–Heaps law is asymptotically equivalent to Zipf's law concerning the frequencies of individual words within a text. This is a consequence of the fact that the type-token relation (in general) of a homogeneous text can be derived from the distribution of its types. Heaps' law means that as more instance text is gathered, there will be diminishing returns in terms of discovery of the full vocabulary from which the distinct terms are drawn. Heaps' law also applies to situations in which the "vocabulary" is just some set of distinct types which are attributes of some collection of objects. For example, the objects could be people, and the types could be the country of origin of each person. If persons are selected randomly (that is, not based on country of origin), then Heaps' law says we will quickly have representatives from most countries (in proportion to their population), but it will become increasingly difficult to cover the entire set of countries by continuing this method of sampling. Heaps' law has also been observed in single-cell transcriptomes, considering genes as the distinct objects in the "vocabulary".
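
Since K and β are determined empirically, one common way to estimate them is a least-squares line fit in log-log space, because log V_R = log K + β log n. The sketch below is a minimal illustration using only the Python standard library; the function names `type_token_curve` and `fit_heaps` and the toy token list are hypothetical, and a realistic estimate would need a far longer text.

```python
import math


def type_token_curve(tokens):
    """Return [(n, V)] pairs: distinct types V seen after the first n tokens."""
    seen = set()
    curve = []
    for n, token in enumerate(tokens, start=1):
        seen.add(token)
        curve.append((n, len(seen)))
    return curve


def fit_heaps(curve):
    """Estimate (K, beta) by ordinary least squares on log V = log K + beta * log n."""
    if len(curve) < 2:
        raise ValueError("need at least two (n, V) points to fit")
    xs = [math.log(n) for n, _ in curve]
    ys = [math.log(v) for _, v in curve]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    beta = sxy / sxx                      # slope of the log-log line
    k = math.exp(mean_y - beta * mean_x)  # intercept, mapped back from log space
    return k, beta


if __name__ == "__main__":
    # Toy input (assumed for illustration only).
    tokens = "the cat sat on the mat and the dog sat on the log".split()
    k, beta = fit_heaps(type_token_curve(tokens))
    print(f"K = {k:.2f}, beta = {beta:.2f}")
```

The same fit applies to the non-linguistic case described above: replace the token list with a sequence of sampled attributes, such as the country of origin of each randomly selected person.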

rdf:langString In linguistics, Heaps' law (also called Herdan's law) is an empirical law that describes the number of distinct words in a document (or set of documents) as a function of the document length. It can be formulated as $V_R(n) = K n^{\beta}$, where V_R is the number of distinct words in a text of size n. K and β are free parameters determined empirically. For English text, K is typically between 10 and 100, and β is between 0.4 and 0.6. The law is frequently attributed to Harold Stanley Heaps, but was originally discovered by Gustav Herdan (1960). Under mild assumptions, the Herdan–Heaps law is asymptotically equivalent to Zipf's law, which concerns the frequencies of individual words within a text. This is a consequence of the fact that the type-token relation (in general) of a homogeneous text can be derived from the distribution of its types. Heaps' law means that as more text is generated, it takes longer to find new words. Heaps' law also applies to situations in which the "vocabulary" is some set of distinct classes of a collection of objects. For example, the objects could be people, and the classes could be each person's country of origin. If people are selected randomly (that is, not on the basis of country of origin), then Heaps' law says how quickly we will find representatives of the countries (in proportion to the number of people sampled at random) and predicts that it will become increasingly difficult to find people from a country not yet included in the sample.

rdf:langString Heaps' law is an empirical regularity in linguistics that describes the number of distinct words in a document (or set of documents) as a function of its length. It is given by the formula $V_R(n) = K n^{\beta}$, where V_R is the number of distinct words in a text of size n. K and β are free parameters determined empirically. For English text corpora, K typically lies between 10 and 100, and β between 0.4 and 0.6. The law is often attributed to Harold Stanley Heaps, but it was first discovered by Gustav Herdan. To some approximation, the Herdan–Heaps law is asymptotically equivalent to Zipf's law on the frequency of individual words in a text.

rdf:langString Heaps' law is an empirical regularity in linguistics that describes the number of distinct words in a document (or set of documents) as a function of its length. It is given by the formula $V_R(n) = K n^{\beta}$, where V_R is the number of distinct words in a text of size n. K and β are free parameters determined empirically. For English text corpora, K typically lies between 10 and 100, and β between 0.4 and 0.6. The law is often attributed to Harold Stanley Heaps, but it was first discovered by Gustav Herdan. To some approximation, the Herdan–Heaps law is asymptotically equivalent to Zipf's law on the frequency of individual words in a text.

dbpedia-owl:wikiPageLength

xsd:nonNegativeInteger 5491

rdf:type

yago:WikicatStatisticalLaws

yago:Abstraction100002137

yago:CausalAgent100007347

yago:Collection107951464

yago:Group100031264

yago:Law108441203

yago:LivingThing100004258

yago:Object100002684

yago:Organism100004475