T I P

Text & Information Processing


Syllabification - Silabeador TIP (for Spanish)



TIP Syllabler is an online application that syllabifies any Spanish word. It uses a separator of syllables based on orthographic criteria, complemented by a morphological analysis tool and a database of lexical-semantic relationships, to perform an “intelligent” division by components: it first checks whether the word exists and then looks for affixes that can influence the syllabification. The morphological tool recognizes more than 4 million words. The lexical-semantic database holds over eighty thousand relationships. The C++ source code of the separator of syllables is available for free download under the GNU General Public License.

TIP Syllabler takes into account the general hyphenation rules given in the sources listed below:

  • (1) NGLE-RAE-2009. Nueva gramática de la lengua española. Espasa Calpe (2009)
  • (2) DPD-RAE-2005. Diccionario panhispánico de dudas de la Real Academia Española - 1ª edición (2005)
  • (3) Ortografía de la Lengua Española, edición revisada por las Academias de la Lengua Española y publicada por la Real Academia Española (1999).


When citing this source, please use the following reference:

Hernández-Figueroa, Z; Rodríguez-Rodríguez, G; Carreras-Riudavets, F (2012). Separador de sílabas del español – Silabeador TIP. Available at http://tip.dis.ulpgc.es.

The 2009 version is no longer available:

Hernández-Figueroa, Z; Rodríguez-Rodríguez, G; Carreras-Riudavets, F (2009). Separador de sílabas del español – Silabeador TIP. Available at http://tip.dis.ulpgc.es.



Syllabification alternatives

When more than one division is feasible, TIP Syllabler selects one of them on the basis detailed below:

  • Some vowel sequences can be pronounced either as diphthongs or as hiatuses, depending on various factors. However, according to (1), for the purpose of placing the graphic accent those sequences are always considered diphthongs. TIP Syllabifier makes the same choice when it syllabifies, given the difficulty of weighing the competing factors. Examples given in (1) are fluir (syllabified fluir or flu-ir), incluido (in-clui-do or in-clu-i-do), cruel (cruel or cru-el) and desviado (des-via-do or des-vi-a-do).
  • According to (1) and (2), in Spanish America, the Canary Islands and some areas of Spain the sequence "tl" is inseparable (as in "a-tlas"). In the rest of Spain and in Puerto Rico, "tl" is usually divided between two syllables ("at-las"). TIP Syllabler selects the first, majority option.
  • According to (3), prefixes tend not to merge with other syllables, although that depends on the “transparency or opacity” of the prefix. The separator of syllables itself does nothing about prefixes, but TIP Syllabler interprets the concept of “transparency or opacity” by choosing division by components according to the lexical-semantic relationships of the word. If the word without the prefix exists in the same lexical-semantic family as the prefixed word, the prefix is interpreted as visible and taken into account; otherwise it is interpreted as opaque and is not taken into account. For example, the prefix “in-” in the word “inerme” is considered opaque because “erme” (an inflected form of the verb “ermar”) does not belong to the lexical-semantic family of “inerme” (an irregular derivative of the noun “arma”), whereas the prefix “des-” in “desosar” is considered visible because, although the word derives from “hueso” in a very irregular way, the word “osar” exists and belongs to the same lexical-semantic family.
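
The prefix criterion in the last item can be illustrated with a minimal sketch. The names FamilyIndex and prefixIsVisible are hypothetical, and a plain map from each word to a family identifier stands in for the lexical-semantic database, whose real interface is not described on this page.

```cpp
#include <map>
#include <string>

// Hypothetical stand-in for the lexical-semantic database: maps a word to an
// identifier of its family. This is an assumption made for illustration only.
using FamilyIndex = std::map<std::string, std::string>;

// Returns true when the prefix should be treated as visible: the base word
// (the word without the prefix) exists and shares a family with the prefixed word.
bool prefixIsVisible(const FamilyIndex& families,
                     const std::string& word,
                     const std::string& prefix) {
    // The word must start with the prefix and leave a non-empty base.
    if (word.size() <= prefix.size() ||
        word.compare(0, prefix.size(), prefix) != 0) {
        return false;
    }
    const std::string base = word.substr(prefix.size());

    const auto wordIt = families.find(word);
    const auto baseIt = families.find(base);
    if (wordIt == families.end() || baseIt == families.end()) {
        return false;  // unknown word or non-existent base: treat as opaque
    }
    return wordIt->second == baseIt->second;
}
```

With toy data in which “desosar” and “osar” share a family while “inerme” and “erme” do not, the function reports “des-” as visible and “in-” as opaque, matching the examples in the text.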


Separator of syllables

The separator of syllables is built as a C++ class, named SeparatorOfSyllables, with the following methods:

  • SeparatorOfSyllables ().- constructor
  • int NumberOfSyllables (const char *).- returns the number of syllables of a word.
  • int * SyllablePositions (const char *).- returns an array of integers with the start position of every syllable in a word.
  • int StressedSyllable (const char *).- returns the position of the stressed syllable in a word (the first syllable is in position 1).
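
As an illustration of how start positions translate into a visible division, the following sketch rebuilds a hyphenated string from a list of start offsets. The helper hyphenate is hypothetical and is not part of SeparatorOfSyllables. It assumes 0-based byte offsets in ascending order; this page does not state which convention SyllablePositions uses.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Joins the syllables of `word` with '-' given the start offset of each syllable.
// Assumption: offsets are 0-based byte offsets in ascending order, the first
// being 0. The convention of SyllablePositions is not documented here.
std::string hyphenate(const std::string& word, const std::vector<int>& starts) {
    std::string out;
    for (std::size_t i = 0; i < starts.size(); ++i) {
        const std::size_t begin = static_cast<std::size_t>(starts[i]);
        // A syllable ends where the next one starts, or at the end of the word.
        const std::size_t end = (i + 1 < starts.size())
                                    ? static_cast<std::size_t>(starts[i + 1])
                                    : word.size();
        if (i > 0) out += '-';
        out += word.substr(begin, end - begin);
    }
    return out;
}
```

Under those assumptions, the offsets 0 and 1 for "atlas" give "a-tlas", and 0, 3 and 6 for "desviado" give "des-via-do".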

Adverbs built from an adjective by means of the suffix -mente have two phonic accents: that of the adjective and that of the suffix. In this case, StressedSyllable returns the position of the syllable with the graphic accent, if any. If the word has no graphic accent, StressedSyllable returns the position of the phonic accent of the suffix.
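
The rule for -mente adverbs can be sketched as follows. This is not the distributed source code: the function names are hypothetical, the input is assumed to be an already divided adverb in UTF-8 whose last two syllables are "men" and "te", and only lowercase accented vowels are recognized.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// True when the UTF-8 encoded syllable contains a lowercase vowel with an
// acute accent (á, é, í, ó or ú).
bool hasGraphicAccent(const std::string& syllable) {
    static const char* const accented[] = {
        "\xC3\xA1", "\xC3\xA9", "\xC3\xAD", "\xC3\xB3", "\xC3\xBA"};
    for (const char* a : accented) {
        if (syllable.find(a) != std::string::npos) return true;
    }
    return false;
}

// Returns the 1-based stressed position of a -mente adverb as described above.
// Assumption: `syllables` holds the divided adverb and ends in "men", "te".
int stressedSyllableOfMenteAdverb(const std::vector<std::string>& syllables) {
    for (std::size_t i = 0; i < syllables.size(); ++i) {
        if (hasGraphicAccent(syllables[i])) return static_cast<int>(i) + 1;
    }
    // No graphic accent: the suffix carries the stress on "men",
    // which is the penultimate syllable.
    return static_cast<int>(syllables.size()) - 1;
}
```

For "rá-pi-da-men-te" the sketch returns 1 (the syllable with the graphic accent), and for "cla-ra-men-te" it returns 3 (the syllable "men" of the suffix).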


Application

Syllabification results are shown on the main page of the online application, in a four-column table. The first column shows the word. The second column shows the syllables of the word, separated by commas, with the stressed syllable emphasized if that option is selected. The third column shows the lemmatization of the word; that is, its canonical form and grammatical category. If the word is not recognized, possible alternatives are offered.


Lemmatization

As explained above, the third column shows the lemmatization information for the word. This lemmatization is done by an automatic lemmatizer that uses a database of 4,980,387 inflected forms from 196,597 canonical forms. The lemmatization information includes the canonical form and grammatical category. For example, figure 8 shows that the word "soluciones" (solutions) could be an inflected form of the verb "solucionar" (to solve) or an inflected form of the noun "solución" (solution). The verb "solucionar" (to solve) is underlined because it is a hyperlink to Conjugador TIP, a conjugation tool.

When the word is recognized as a “prefixal neologism”, the prefix or prefixes and the canonical form are shown separately.

A word formed by adding a prefix, once it has been consolidated in the language, is not a neologism, but the prefix may nevertheless remain prominent for the user. When such consolidated words can be syllabified in different ways depending on the prominence of the prefix, a colour code is used to indicate that prominence. The figure shows that the word "desarme" (disarmament) has two possible syllabifications:

  • When it is an inflected form of the verb "desarmar" (to disarm), formed by adding the prefix "des-" to the verb "armar" (to arm), the prefix is marked in “apple green” because it is prominent.
  • When it is the noun "desarme" (disarmament), formed by suffixal derivation from the verb "desarmar" (to disarm), the prefix is marked as not prominent in “light green” because the noun "arme" does not exist.

Consolidated words formed by the addition of a prefix cause an icon ( ) to appear, which provides information about the word formation. This is especially useful for words whose syllabification does not differ depending on prefixes, for example words with irregularities, like "contralmirante" (rear admiral).


Remarks

The fourth column of the results table (figure 6) shows information about the syllabification process. This information is presented as mnemonic icons that display a text when the mouse pointer passes over them. There are four different icons, which are explained in the table below.

Icon Description
-mente

Any adverb ending with "-mente" has two stressed syllables, so when an adverb or an unknown word ending with "-mente" is syllabified, the following text is displayed:

"Adverbs ending in -mente are pronounced with two stressed syllables: that of the adjective from which the adverb is derived and that of the component -mente. (Ortografía de la Lengua Española [RAE 1999], Diccionario Panhispánico de Dudas [RAE 2005])."

Pref-

When a word can be syllabified in different ways depending on prefixes, the following text is displayed:

"Prefixes may vary the syllabification of a word depending on their prominence. Prefixes tend to be separated from other syllables. (Nueva Gramática de la Lengua Española [RAE 2009])"

tl / t-l

When a word contains the sequence "tl", it can be syllabified in different ways depending on the geographical area, so the following text is displayed:

"The consonant sequence tl tends to be pronounced in different syllables in most of mainland Spain and in Puerto Rico; in the rest of Latin America, the Canary Islands and some areas of mainland Spain, both consonants are pronounced in the same syllable. (Ortografía de la Lengua Española [RAE 1999], Diccionario Panhispánico de Dudas [RAE 2005])"

áéíóú

When a word has a misplaced graphic accent, one of the following three texts is displayed:

"A word can only have one accent mark"

"A paroxytone word must have an accent mark only when it does not end in n, s, or a vowel"

"An oxytone word must have an accent mark only when it ends in n, s, or a vowel"
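
The three checks behind these messages can be sketched as below. This is a simplified illustration, not the application's code: the names AccentIssue and accentIssue are hypothetical, the input is assumed to be an already divided word in lowercase UTF-8, and the sketch ignores cases the three messages do not mention, such as accents that mark a hiatus (as in "río"), diacritic accents on monosyllables, and words ending in a consonant followed by s.

```cpp
#include <cstddef>
#include <string>
#include <vector>

enum class AccentIssue { None, MultipleMarks, ParoxytoneMark, OxytoneMark };

namespace {
// UTF-8 encodings of á, é, í, ó and ú.
const char* const kAccented[] = {
    "\xC3\xA1", "\xC3\xA9", "\xC3\xAD", "\xC3\xB3", "\xC3\xBA"};

// Counts acute-accented lowercase vowels in a UTF-8 string.
int countAccents(const std::string& s) {
    int n = 0;
    for (const char* a : kAccented) {
        for (std::size_t p = s.find(a); p != std::string::npos;
             p = s.find(a, p + 2)) {
            ++n;
        }
    }
    return n;
}

// True when the word ends in n, s, or a vowel (plain or accented).
bool endsInNSOrVowel(const std::string& word) {
    if (word.empty()) return false;
    if (std::string("nsaeiou").find(word.back()) != std::string::npos) return true;
    if (word.size() >= 2) {
        const std::string tail = word.substr(word.size() - 2);
        for (const char* a : kAccented) {
            if (tail == a) return true;
        }
    }
    return false;
}
}  // namespace

// Classifies a misplaced accent mark; the syllable bearing the mark is taken
// as the stressed one. Monosyllables and unaccented words are not checked.
AccentIssue accentIssue(const std::vector<std::string>& syllables) {
    std::string word;
    int total = 0;
    std::size_t accentedIndex = 0;
    for (std::size_t i = 0; i < syllables.size(); ++i) {
        const int n = countAccents(syllables[i]);
        if (n > 0) accentedIndex = i;
        total += n;
        word += syllables[i];
    }
    if (total > 1) return AccentIssue::MultipleMarks;
    if (total == 0 || syllables.size() < 2) return AccentIssue::None;

    const bool nsVowel = endsInNSOrVowel(word);
    const std::size_t last = syllables.size() - 1;
    if (accentedIndex == last && !nsVowel) return AccentIssue::OxytoneMark;
    if (accentedIndex + 1 == last && nsVowel) return AccentIssue::ParoxytoneMark;
    return AccentIssue::None;
}
```

Each non-None value corresponds to one of the three texts above. Returning an enumeration instead of the text keeps the check separate from the wording shown to the user.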