Swish function

http://dbpedia.org/resource/Swish_function

rdfs:label

rdf:langString Función Swish

rdf:langString Swish function

rdf:langString Swish функція

dbpedia-owl:wikiPageID

xsd:integer 63822450

dbpedia-owl:wikiPageRevisionID

xsd:integer 1124205518

dbpprop:cs1Dates

rdf:langString y

dbpprop:date

rdf:langString June 2020

dbpedia-owl:abstract

rdf:langString The swish function is a mathematical function defined by the following formula: $\operatorname{swish}_\beta(x) = x \cdot \sigma(\beta x) = \frac{x}{1 + e^{-\beta x}}$, where σ is the logistic sigmoid and β can be a constant or a trainable parameter depending on the model. When β = 1, the function is equivalent to the Sigmoid-weighted Linear Unit (SiL) used in reinforcement learning, whereas for β = 0, swish becomes the scaled linear function f(x) = x/2. As β → ∞, the sigmoid component approaches a unit step function, so swish tends to the ReLU function. It can therefore be viewed as a nonlinear interpolation between a linear function and the ReLU.

rdf:langString The swish function is a mathematical function defined as follows: $\operatorname{swish}_\beta(x) = x \cdot \sigma(\beta x) = \frac{x}{1 + e^{-\beta x}}$, where σ is the logistic sigmoid and β is either a constant or a trainable parameter, depending on the model. For β = 1, the function is equivalent to the Sigmoid Linear Unit (SiLU), first proposed alongside the GELU in 2016. The SiLU was rediscovered in 2017 as the Sigmoid-weighted Linear Unit (SiL) function used in reinforcement learning. The SiLU/SiL was then rediscovered as the swish over a year after its initial discovery; it was originally proposed without the learnable parameter β, so that β implicitly equalled 1. The swish paper was later updated to propose the activation with the learnable parameter β, though researchers usually fix β = 1 rather than learn it. For β = 0, the function reduces to the scaled linear function f(x) = x/2. As β → ∞, the sigmoid component approaches a 0-1 step function, so swish approaches the ReLU function. It can thus be viewed as a smoothing function that nonlinearly interpolates between a linear function and the ReLU function. The function is non-monotonic, and may have influenced the proposal of other activation functions with this property, such as Mish. When considering positive values, swish is a particular case of the sigmoid shrinkage function defined in the cited reference (see the doubly parameterized sigmoid shrinkage form given by its Equation (3)).

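rdf:langString The swish function is a mathematical function given by the expression $\operatorname{swish}_\beta(x) = x \cdot \sigma(\beta x) = \frac{x}{1 + e^{-\beta x}}$, where σ is the logistic sigmoid and β is a constant or a parameter that depends on the type of model. The derivative of the function is $\operatorname{swish}_\beta'(x) = \sigma(\beta x) + \beta x \, \sigma(\beta x) \, (1 - \sigma(\beta x))$.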
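The limiting behaviour described in the abstracts can be checked numerically. The following is a minimal sketch in plain Python, not a reference implementation: the function names `sigmoid`, `swish`, `swish_grad`, and `relu` are illustrative choices and do not come from the source, and the derivative is the product-rule form of x · σ(βx).

```python
import math


def sigmoid(z: float) -> float:
    """Logistic sigmoid, written to avoid overflow for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def swish(x: float, beta: float = 1.0) -> float:
    """swish_beta(x) = x * sigmoid(beta * x); beta = 1 gives the SiLU."""
    return x * sigmoid(beta * x)


def swish_grad(x: float, beta: float = 1.0) -> float:
    """Derivative of swish with respect to x, obtained via the product rule."""
    s = sigmoid(beta * x)
    return s + beta * x * s * (1.0 - s)


def relu(x: float) -> float:
    """Rectified linear unit, the beta -> infinity limit of swish."""
    return max(0.0, x)


if __name__ == "__main__":
    for x in (-5.0, -1.0, 0.0, 2.0):
        print(
            f"x={x:+.1f}  "
            f"beta=0: {swish(x, 0.0):+.4f}  "    # equals x / 2
            f"beta=1: {swish(x, 1.0):+.4f}  "    # SiLU
            f"beta=50: {swish(x, 50.0):+.4f}  "  # close to ReLU
            f"relu: {relu(x):+.4f}"
        )
```

With β = 1 the value at x = -1 lies below the values at both x = -5 and x = 0, which illustrates the non-monotonic dip on the negative axis.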

dbpedia-owl:wikiPageLength

xsd:nonNegativeInteger 4395

dcterms:subject

<http://dbpedia.org/resource/Category:Functions_and_mappings>

<http://dbpedia.org/resource/Category:Artificial_neural_networks>

dbpedia-owl:wikiPageWikiLink

<http://dbpedia.org/resource/Vanishing_gradient_problem>

<http://dbpedia.org/resource/Function_(mathematics)>

<http://dbpedia.org/resource/Google>

<http://dbpedia.org/resource/Trainable_parameter>

<http://dbpedia.org/resource/Backpropagation>

<http://dbpedia.org/resource/Activation_function>