To adapt or not to adapt in web localization

By Jeremy Armstrong,2014-02-04 20:18

    To adapt or not to adapt in web localization: a contrastive genre-based study of original and localised legal sections in corporate websites

    Miguel A. Jiménez-Crespo, Rutgers University, The State University of New Jersey, USA.


    Since the early 1990s, the localisation industry has striven to produce non-culture-specific texts that can be easily localised into most languages. Nevertheless, international websites include sections, such as legal disclaimers or privacy policies, that preferably need to be adapted in order to be fully effective and increase the credibility of the website (Kenny and Jones 2007). This study explores these two seemingly contradictory perspectives through a comparable corpus analysis of original and localised legal sections in corporate websites. Following a genre-based approach (Swales 1990; Bhatia 1993; Gamero 2001), the main analysis concentrates on macrostructural differences and representative conventional linguistic forms associated with rhetorical moves. The analysis shows significant differences in the prototypical macrostructures of original and localised texts, as well as an impact on their terminology and phraseology. As far as the adaptation is concerned, only 32.60% of websites were somewhat adapted to the Spanish target legal system, while the rest were localised but not legally adapted. The results shed some light on the question of whether current industry strategies favor single internationalised vs. adapted localisations and on the inevitable effect of source text structures and phraseology on the final localised website.

    KEYWORDS

    Web localisation, localization, translation, adaptation, corpus-based translation studies, web genres, genre theory

    1. Introduction

    For most non-English speaking cultures, the ever-increasing digital world cannot be understood without the mediation of web localisation processes. Millions of web users interact daily with localised web content and browsers. In fact, it could be claimed that localised webs might represent the most-used translated texts globally. From a translation perspective, the features of translated texts are, and have been, widely studied since the emergence of corpus-based translation studies (e.g. Baker 1993, 1995; Laviosa 1998), and nowadays, localised texts should be brought to the forefront of this discussion. So far, this translation-mediated communication process has not yet been granted the attention it deserves from both a theoretical and an empirical perspective (Pym 2003, 2005, 2009; Jiménez Crespo 2009b, 2010; Dunne 2006). Among the many issues that demand more detailed analysis, this paper focuses on the special adaptation component that the industry claims is the main difference between localisation and translation. The focus of the analysis is legal texts embedded in websites, a potential probe into whether web texts are localised maintaining macro and microstructures of source texts (Jiménez-Crespo 2009b), or fully adapted to the target sociocultural and legal context. The adaptation of legal sections in localisation to the target sociocultural contexts is of paramount importance as it can impact the end user's confidence in the entire site (Kenny and Jones, 2007). As such, the mere translation of the legal content without appropriate adaptation could be detrimental to the goal of the target text in the sociocultural context of reception. Ideally, this adaptation should be done in consultation with legal experts, as they are also responsible for the drafting of source legal sections in websites (Garrand 2001).

    The localisation industry, in its attempts at defining localisation as a process that goes beyond translation, normally claims the existence of this adaptation component to fulfill the expectations of local users (e.g. Esselink 2000; Önorm D 1200; Microsoft 2003; LISA 2003, 2007:14; Dunne 2006:4; Industrie Canada, 1999:48; Singh and Pereira 2005). This cultural and technical adaptation is widely seen and presented as the most important differential factor between translation and localisation. However, since Nida's proposed dynamic and semantic equivalence models as well as the emergence of functionalist approaches (e.g. Reiss and Vermeer 1984; Nord 1997), translation adaptation to the receiver's context or expectations is regarded as inherent in all target-oriented translation processes. Methodologically, this issue is researched through a contrastive genre-based analysis of localised (LT) and original legal texts (OT) in corporate websites. These texts are understood as a conventionalised move (Swales 1991) or communicative section (Gamero 2001) in the corporate website digital genre. The study contrastively analyses the average frequencies of the many textual sections and subsections, such as privacy policies, legal disclaimers, terms and conditions, etc., in Spanish OT and in LT localised into Spanish. Additionally, and given that in legal texts there is a high level of conventionalisation in the phraseological units associated with the different steps or moves (Borja Albí 2000, 2005), a contrastive phraseological analysis is performed in a second stage.

    2. Conceptualising web localisation

    Among all branches of localisation, web localisation has without any doubt the largest volume of translation (LISA 2007). It can be defined as a complex communicative, textual, cognitive and technological process by which interactive multimedia web texts are modified in order to be used by a target audience whose language and sociocultural context are different from those of original production (Jiménez-Crespo 2010). Web localisation developed by modeling and adapting certain processes and practices already established in software localisation (Dunne 2006; Yunker 2003: 30), but due to the explosion in the volume of information shared through the WWW, the economic impact of the former is currently far greater than that of the latter (Schäler 2005). Its relative youth led several scholars to coin different terms during the last decade, such as e-localization (Cronin 2003), web globalization (Yunker 2003), content localization (Esselink 2006) or web-content localization (Mata Pastor 2005). Nevertheless, a review of recent literature clearly shows that the most conventional term used by both translators and practitioners alike is web localization, and given the need for a common and stable metalanguage of translation and localisation (Chesterman 2005; Mazur 2008), this will be the term used henceforth. Furthermore, and in order to clarify any conceptual ambiguities, it should be mentioned that web localisation concentrates exclusively on multimedia texts stored and distributed through the WWW, but it does not include texts from other Internet-mediated communicative exchanges, such as chats, SMS or forums (O'Hagan and Ashworth 2003).

    Figure 1. Different areas of research in Localisation Studies.

    Most localisation processes, as shown in Figure 1, share several characteristics, such as the digital nature of the text, the presentation on screen, the interactive nature of texts or the necessary collaboration with localisation engineers and developers to produce the final target product. Nevertheless, there are stark differences in the way the actual textual segments are stored, the programming or markup languages used or the potential variation in textual types and genres (Jiménez-Crespo 2008b, 2009b). As an example, most software products entail a relatively standardised textual genre (Austermühl 2007), and videogame localisation also deals with a limited number of genres (Mangiron and O'Hagan 2006). Most web digital genres, on the contrary, are complex genres (Martin 1995; Hanks 1986), that is, genres that can potentially incorporate a wide range of secondary genres, such as online purchase contracts in e-commerce websites. This is what Bhatia (1986) or Martin (1995:25) referred to as genre embedding. Consequently, despite the fact that most widely used digital genres are nowadays highly conventionalised (Shepherd and Watters 1998; Shepherd et al. 2005; Santini 2007; Jiménez Crespo 2008b), a web localiser can potentially encounter a huge variety of secondary genres embedded in any website.

    It is also clear that this new process needs to be contextualised in its relation to the Internet and the WWW. The latter has not only led to the emergence of this new modality, but it has also revolutionised translation and business practices around the world (LISA 2007; Gouadec 2007). It should be mentioned that not all texts distributed on the WWW are the result of the new textual and communicative model that emerged through the hypertextual revolution (Storrer 2002; Crystal 2001). The WWW allows any text created in or converted into digital format to be distributed. As an example, an instruction booklet for any product can be uploaded to a website without modifications; a governmental website normally offers official scanned documents in html or pdf format. These types of texts are what Angelika Storrer (2002) refers to as e-texts: sequentially organised printed documents that are simply uploaded and made available on the WWW. These e-texts can also be conceptualised as digital secondary genres (Martin 1995), as they can be randomly embedded in any hypertext. As such, the object of study of web localisation cannot be the processing of these documents per se, but rather the overall digital genre structure that allows for this genre embedding, that is, the corporate or social networking site as a whole. Additionally, Storrer considers hypertexts as the new textual and communicative model that appears exclusively on the WWW. They can be defined as networks of textual nodes and links that serve a distinct textual function and address a comprehensive, global topic. These hypertexts are open, as the developer can add any other nodes or textual segments at any time. In hypertext theory, nodes are defined as subunits that form independent unitary communicative chunks, such as textual segments, navigation menus, graphics, pictures, ad banners, flash files, etc. (Codina 2003). Thus, this paper proposes that hypertexts can be defined as the prototypical object of study of web localisation following what Toury (1995) and Holmes (1988) would consider a restricted theoretical area inside T&S.

    Moreover, due to storage, retrieval and screen presentation purposes, each webpage in a hypertext is in turn subdivided into interface text and content text (Price and Price 2002). The former includes all textual segments whose function is to help users navigate the hypertextual structure. As such, these types of texts are repeated throughout the website and they help negotiate the global coherence in a complete website (Fritz 1998; Storrer 2002). These textual segments include navigation menus, search functions or web page descriptions and content tags in the headings. Interface texts tend to be more conventionalised as digital genres are gradually being highly conventionalised with a common structure (Santini 2007; Nielsen and Loranger 2007; Jiménez-Crespo 2008b). On the other hand, content text can be defined as the unique differentiated textual content that makes each web page a storage unit as summarised in the webpage title. As an example, in any conventional contact us page, the contact information for the party responsible for the website can be defined as the content text, while the rest of the text, such as navigation menus or banner ads, is the interface text. As an example, digital newspapers constitute a new digital genre that evolved parallel to the expansion of the WWW (Shepherd and Watters 1998). Nevertheless, any piece of printed news simply posted in a digital paper could not be defined as a textual exemplar that is exclusively dependent on the medium; its translation process would be similar to the translation of any other printed piece of


    As far as the localisation process is concerned, and from a Translation Studies perspective, web localisation can be defined mostly as an instrumental (Nord 1997) or covert (House 1997:111) process in which the goal is for end users to interact with the translated text as if the text was directly produced in the target language. This is implicitly indicated in the goals for localisation laid out by the Localization Industry Standards Association (2004, 2003), as websites are to be received as "locally made products" or look like they have been developed in-country. In this translation type, end users are unaware that they are in fact interacting with a translated text, and the adaptation to the cultural and linguistic expectations of the target user is of utmost importance. Nevertheless, the legal texts under study represent a completely different translation type, as legal translation requires a documentary (Nord 1997) or overt (House 1997) translation type. This means that the translation is presented as such and, normally, faithfulness to the source text becomes an essential aspect. Therefore, while translators have a wider range of possibilities when adapting the website to the expectations of the target audience, they face a completely different translation process in these legal sections. This documentary nature is sometimes implicitly formulated in legal texts, normally indicating that the English source version is the only valid one in the case of legal disputes. This poses an interesting challenge to localisers as they need to handle different translation types during the course of a web localisation project. The results from this study will help answer the question of whether, and to what extent, these sections are in fact translated differently. Now that web localisation has been defined and contextualised in the realm of Translation Studies, the next section reviews legal texts in websites from a textual genre perspective in order to clarify the methodological approach taken.

    3. Legal information in websites: a genre description

    Genre-based approaches to legal translation have been extremely productive during the last decade. This is due to the fact that legal genres are highly structured and conventionalised (Alcaraz and Hughes 2002; Borja Albí 2005; Cao 2007). Contrastive genre-based research on legal genres has been extremely beneficial to translation trainers, practitioners and researchers as it allows them to analyze and adjust not only the macrostructure of the source text to the conventionalised macrostructure of the same legal genre in the target sociocultural context, but also the phraseological and terminological conventions associated with any of the many moves, steps or textual blocks. This high level of conventionalisation of legal structures in websites can be witnessed by the existence of standardised privacy or terms and conditions in published books (e.g. Gonzalez et al. 2004; American Bar Association 2007) or online interactive generators that can be directly used in any website. Recent research following this contrastive genre-based approach has led to the development of corpora with the most translated legal genres, such as the GITRAD corpus (Borja Albí 2007), a first step towards the description and analysis of the prototypical macrostructures and microstructures of these genres in several languages and sociocultural contexts.

    Methodologically, these contrastive studies follow an analysis continuum starting from the superstructure and macrostructure (Göpferich 1995), usually describing and then contrasting the prototypical genre's macrostructure. In order to research these prototypical textual structures, any given genre is subdivided into recurring sections>moves>steps>substeps (Swales 1990), triad>keys (Paltridge 1997) or communicative blocks>communicative sections>significant units>significant subunits (Gamero 2001). The frequencies of each textual section identified by the researcher are recorded in order to identify their level of conventionalisation. In a later stage, a microstructural analysis can also be performed in which conventional linguistic forms that recurrently appear in each macrostructural section are identified and contrasted between both cultures.

    Following this approach, the adaptation claim by the localisation industry will be researched through a contrastive study of the prototypical macrostructures in Spanish original and localised web texts. This will be followed by a microstructural contrastive analysis which focuses on conventionalised phraseology that appears in a representative selection of textual sections.

    4. Empirical Study: Methodology

    The comparable corpus of Spanish original (OT) and localised legal texts (LT) was extracted from the Comparable Web Spanish corpus compiled by Jiménez-Crespo (2008a). This wider corpus included 95 localised websites for Spain from the largest US companies, as well as a representative collection of 175 original corporate websites from Spain. It was collected in November of 2006. The subcorpus under study includes all pages in this larger comparable corpus with legal content, such as legal disclaimers, privacy policies, terms and conditions and copyright-trademark pages. The Spanish original section of the corpus under study comprises 64 legal web pages, with 57,718 words and 4776 different tokens. The localised section has 65 legal web pages, with 112,319 words and 7495 different tokens. The number of web pages is higher than the number of total websites given that many websites have two or more pages for legal content, such as a page for a privacy policy and a page for terms and conditions.

                Original web legal corpus    Localised web legal corpus
    Words       57,718                       112,319
    Types       4776                         7495
    Webpages    64                           65
    Websites    54                           47

    Table 1. Description of the comparable web legal corpus of corporate
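    The word and type counts reported for each subcorpus can be illustrated with a minimal sketch. The tokenisation rule (lowercased runs of word characters) and the function name corpus_stats are assumptions made for illustration only; the study does not state which tool or tokenisation produced its figures.

```python
import re

def corpus_stats(pages):
    """Return (running words, distinct word types) for a list of page texts."""
    tokens = []
    for text in pages:
        # Assumed tokenisation: lowercase, then take runs of word characters.
        tokens.extend(re.findall(r"\w+", text.lower()))
    return len(tokens), len(set(tokens))

# Hypothetical two-page mini corpus, not taken from the study.
pages = ["Aviso legal. Lea el aviso legal.", "Política de privacidad"]
words, types = corpus_stats(pages)
print(words, types)
```

    A different tokenisation (for instance, one that keeps hyphenated forms together) would yield slightly different counts, so such figures are only comparable across subcorpora processed in the same way.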


    In order to contrastively analyze the macrostructure of these two textual subcorpora, all pages were analyzed as a single legal move in each website. This was necessary given that the distribution of content is normally uneven among these pages. That is, a privacy policy page might include information about the terms and conditions, and a legal disclaimer page might include all other legal information regarding privacy and terms, etc. The different moves, steps and substeps were carefully analyzed and the frequency of appearance of each of them was recorded. This means that the frequencies recorded indicate the appearance per site for each constituent textual section identified.
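    The per-site counting described above can be sketched as follows. The tag labels follow the naming used in the paper, but the four annotated sites are invented toy data and the function name section_frequencies is hypothetical; this is an illustration of the counting principle under those assumptions, not the procedure actually used in the study.

```python
from collections import Counter

def section_frequencies(sites):
    """Percentage of sites in which each tagged section appears at least once."""
    counts = Counter()
    for tags in sites:
        # All legal pages of a site are treated as one unit, so a section
        # counts once per site however many pages repeat it.
        counts.update(set(tags))
    total = len(sites)
    return {tag: round(100 * n / total, 2) for tag, n in counts.items()}

# Toy annotation: one list of section tags per website (invented data).
sites = [
    ["S1-2-a", "S1-5", "S1-2-a"],
    ["S1-2-a", "S2-2"],
    ["S1-5"],
    ["S1-2-a", "S1-5", "S2-2"],
]
freq = section_frequencies(sites)
print(freq)
```

    Counting per site rather than per page keeps a section that is repeated on several pages of the same website from inflating its frequency.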

    5. Results and discussion

    The first analysis, summarised in Table 2, concerns the average number of words per site and per page: both values are much higher in LT. The average number of words per web page with legal content in OT is 901.84, while LT shows an average of 1727.98 words, almost double the value in original ones. This finding indicates that LT are on average much longer and, consequently, that their macrostructure will inevitably show a higher number of constituent moves and steps. In order to situate this result in the context of the global website, the average number of words per page in the overall Spanish Web Comparable Corpus is 258.87 in the original section and 416.07 in the localised one (Jiménez-Crespo 2008a: 273). Thus, for both sections, legal web pages normally contain roughly four times as many words as the remaining web pages in corporate sites. Furthermore, if the number of words in all legal texts is aggregated per website, localised legal sections show an average of 2415.69 words, while original sites with legal content show on average 1074.94 words per site [the localised figure is 224.72% of the original one]. The longer formulation in localised websites would in principle lessen their usability and readability, as style guidelines and empirical usability research recommend brevity and conciseness in web pages (e.g. Nielsen and Loranger 2006; Jeney 2007; Price and Price 2002). Moreover, usability research recommends avoiding page scrolling, because users normally avoid this process and move to other pages (Nielsen 1999; Price and Price 2002: 147).

    Legal comparable    Word average per web      Word average per web page in      Average per website with
    subcorpus           page in legal subcorpus   Spanish Web Comparable Corpus     legal moves-steps
    Original            901.84                    258.87                            1068.85
    Localised           1727.98                   416.07                            2389.76

    Table 2. Number of words per page with legal content and per corporate website.
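    As a check on the arithmetic, the per-page and per-website averages in the legal subcorpus can be recomputed from the counts given in the corpus description (words, web pages and websites per subcorpus). The short sketch below does only that; the dictionary layout and the function name averages are illustrative choices, not part of the study.

```python
# Counts taken from the description of the comparable web legal corpus.
corpus = {
    "original":  {"words": 57_718,  "pages": 64, "sites": 54},
    "localised": {"words": 112_319, "pages": 65, "sites": 47},
}

def averages(counts):
    """Return (words per page, words per site), rounded to two decimals."""
    per_page = round(counts["words"] / counts["pages"], 2)
    per_site = round(counts["words"] / counts["sites"], 2)
    return per_page, per_site

for name, counts in corpus.items():
    print(name, averages(counts))
```

    Rounded to two decimals, the localised per-website average comes out as 2389.77, whereas the table prints 2389.76, which is what truncation rather than rounding would give; the other three values match the table.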

    A potential explanation of this difference lies in the different legal content in the source and target sociocultural contexts. In the USA, web privacy issues are self-regulated by companies themselves under the guidance of the Federal Trade Commission (Liu and Arnett 2000), while in the Spanish legal system, web privacy is regulated by the 1999 Spanish Data Protection Act. This means that US websites are required to explicitly formulate a full privacy policy block, while Spanish sites only have to indicate that their practices are in compliance with the applicable Spanish law. This again indicates that, to some extent, the texts are not adapted, as the macrostructure from the source text is maintained.

    This first analysis has shown that the length of localised texts tends to be on average twice that of original ones. With this in mind, the next section explores exactly which moves and steps might be contrastively under- or over-represented in both corpora.

    5.1. Contrastive macrostructural analysis

    For the next analysis, all OT and LT were manually examined and the potential constituent moves and steps were identified. After this descriptive analysis of all potential legal moves and steps, the macrostructure of each legal text was examined, each previously identified move or step was tagged and its frequency was recorded. Table 3 shows the contrastive analysis of the frequency of all moves, steps and substeps in these legal texts. Following previous studies in this area (Jiménez Crespo 2008a, 2008b; Nielsen and Tahir 2002), three main moves were identified: legal disclaimers (M1), privacy policy (M2) and terms and conditions (M3). In each move, all different steps and substeps were identified and recorded. As an example, in the legal disclaimer move (M1), ten different steps were identified, such as introduction (S1-1), acceptance of legal terms (S1-2), company registration (S1-3). Globally, ten steps were identified in M1, eleven steps in privacy policy (M2) and thirteen in terms and conditions (M3).

    For many of the steps, several substeps were also identified, and these were marked with a consecutive letter of the alphabet. As an illustration, the first step in the legal disclaimer move includes two substeps, a welcoming statement to the legal webpage (S1-1-a) and an appeal to read the legal text in its entirety (S1-1-b). Only eighteen substeps were recorded in Table 3, but nevertheless, it should be mentioned that the identification of these secondary textual blocks should be understood as an open framework in which more substeps could be potentially included.

    The frequencies recorded in Table 3 are indicative of the appearance in the web pages collected in the legal corpus. In the wider Spanish Web Comparable corpus, as previously reported by Jiménez-Crespo (2009b), the three basic legal sections show much higher frequencies in localised corporate websites than in original Spanish ones: privacy policies appear in 70.52% of localised corporate websites and in 13.37% of original ones, terms and conditions in 38.94% of LT and in 4.65% of OT and legal disclaimers in 47.36% of LT and in 27.90% of OT. This finding is also consistent with the results from another study (Robbins and Stylianou 2003) that concluded that the most consistent difference between US corporate sites and other international sites was that legal webpages were more frequent in the former. It should therefore be mentioned that the values included in Table 3 represent the frequency of moves and substeps in the subcorpus of legal web texts, and not the frequency of appearance in all corporate websites as a whole.

    Move and step                              Substeps                                    Frequency   Frequency
                                                                                           Original    Localised
    M1. LEGAL DISCLAIMERS
    S1-1. Introduction                         a. Welcome                                  0.00*       17.39
                                               b. Please read text                         13.21       65.22
    S1-2. Acceptance legal terms               a. Acceptance                               58.49       93.48
                                               b. Leave website                            7.55        15.22
    S1-3. Company registration                 a. Company legal registration               73.58       34.78
                                               a. Corporate Address                        30.19       30.43
                                               b. Spanish Tax ID number (CIF)              52.83       30.43
    S1-4. Applicable law and jurisdiction                                                  18.87       65.22
    S1-5. Copyright- Protected material                                                    67.92       89.13
                                               a. Written authorisation                    26.42       47.83
                                               b. Which material is protected              26.42       50.00
    S1-6. Where is the information stored                                                  0.00*       4.35
    S1-7. Website owner...                                                                 56.60       71.74
    S1-8. Who is the website addressed at?                                                 5.66        23.91
    S1-9. Using registered trademarks                                                      3.77        32.61
    S1-10. Effective date or revision date                                                 0.00        23.91
    M2. PRIVACY POLICY
    S2-1. Compliance to Spanish Privacy Laws   a. Spanish Personal Data Protection Law     49.06       13.04
                                               b. Law 34/2002 of Information Society
                                                  Services and E-Commerce                  22.64       6.52
    S2-2. Collection of data                                                               47.17       45.65
    S2-3. Right to access, rectify,                                                        47.17       28.26
