
User Documentation¶
Introduction¶
Easyling is a cloud-based translation proxy solution designed to make websites available in several languages. Easily.
A translation proxy is a layer placed between the browser and the original website, through which the visitor sees the original website in a different language.
There are several proxy solutions available on the market, yet the translation proxy is unique in that:
- it is a solution targeted primarily at LSPs - which also means that this is the only LSP-independent solution on the market
- also available as a whitelabel solution
- standard-compliant XLIFF export option ensures CAT-independence
- automated XLIFF export/import option between CAT / Translation management systems and Easyling
- in-context live preview available in certain CAT-tools at translation-time
What does that mean?
If you are a business owner, it can help you reach a wider customer base by providing information to your potential customers in their native language. What’s more, you can do more than just translate the text on your website, you can localize it: you can also adapt your message, the images displayed, or even the product range offered to the target culture. And all this without the need for heavy upfront investment in IT infrastructure and personnel, or the hassle of regular maintenance and upgrades. The translation proxy takes care of the IT part, so that you can concentrate on the content - and on growing your business.
If you are a language service provider (LSP), you can offer cutting-edge website localization services to your customers - even under your own brand name! The translation proxy provides the technology and takes care of the IT infrastructure, leaving you to concentrate on your core business: cross-cultural communication. What’s more, your translators don’t need to learn yet another tool; they can keep using their preferred CAT tools.
Sounds good?
There are several challenges both business owners and language service providers face during website translation. The “ideal” workflow would be to create the content in the original language, get it translated into the desired languages, and then publish all language variants at the same time, from the website owner’s own content management system (CMS) - right from the very first page on the website. But reality is different. Apart from the fact that not many CMSs are capable of handling several languages, usually website localization comes into the picture at a later stage, when there is already a huge amount of data published on the website. And, in most cases, the website owner can’t extract the content for translation. If they can’t extract the original, there’s no easy way to load the translated content back either. Furthermore, if the website owner can’t extract the content into translatable format, it is impossible to get a proper estimate for the translation costs in time and money...
Easyling can, however, discover the website by following links and grabbing translatable and localizable content - and convert it into a translatable format. This gives a realistic view of the magnitude of the translation task, and, thanks to the translation proxy, even a partially translated site can give full user experience on the website visitor’s side.
Data can be extracted with a couple of clicks - and the publication of the translated site is similarly easy.
Where do we fit in the localization workflow
Easyling bridges the gap between the CMS and the translation workflow; it enables you to extract content from the website in a translation-ready format that can be used in any translation environment.
- If you are a content owner, it means that you have a technology solution that enables you to choose any LSP that suits your language requirements the best.
- If you are an LSP, it means that you can take care of the website translation requirements of your clients.
In either case, technology will no longer be the bottleneck.
Features¶
- process HTML, JavaScript/AJAX/JSON, XML (note: translation of Flash is not supported)
- Use the X-Proxy and other Preview mode domains to see everything in context. Use Advanced Settings to translate text coming from JSON/XML sources.
- Automatically crawl static pages / HTML content only. Add extra AJAX URLs with the proper parameters.
- Fine-tune your settings to help the crawler decide what URLs to handle as same, and what to visit looking for new content.
- Translate forms, messages and dynamic content
- Translate images: replace them with their localized counterparts on a target-language basis.
- Link any external domain via multiple projects.
- Modify page behavior using customized CSS and JavaScript injection.
- Use regular expressions to filter for content.
and many more!
Whitelabel¶
Easyling offers a white label version that can be customized with your corporate logos and domains to create a branded version, allowing you to use & sell the translation proxy as your own product. In order to create the branded version, seven criteria must be met. See the “Whitelabel setup” section for details.
Pricing¶
Our pricing follows the ‘pay-as-you-go’ model, so you only get charged for what you use. The total cost is made up of two types of fees: one-time fees and a monthly charge.
One-Time Fees¶
- Discovery: 1 EUR or 1.2 USD / 1000 pages for every Discovery
- Scan: 2 EUR or 2.4 USD / 1000 source words for each new word during a Scan (if no new words are found, a Scan counts as a Discovery)
- Translation Memory: storing your (human or machine) translation in the database costs 10 EUR or 12 USD / 1000 source words / target language
Content extraction and translation memory fees apply only the first time around, which means 102% repetitions are not counted. Once a segment is stored, subsequent scans will treat it as repetition and no additional charges will apply.
Monthly Fees¶
A 1 EUR or 1.2 USD / 1000 page views monthly fee applies to serving your translated content over the proxy.
In exchange, you get a guaranteed 99.99% website availability, and a capacity to handle practically unlimited traffic. You also have the option to serve the translated site from your server (in which case no proxy fee will apply), but in this case availability and traffic handling depends on your infrastructure.
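As a rough illustration of the fee structure above, here is a back-of-the-envelope estimator sketch (EUR figures only; the function and its parameter names are hypothetical, not an official Easyling tool):

```javascript
// Sketch of the pay-as-you-go model described above (EUR figures).
// Hypothetical helper, not an official Easyling calculator.
function estimateCost({ discoveredPages, newSourceWords, tmWords, targetLanguages, monthlyPageViews }) {
  const discoveryFee = (discoveredPages / 1000) * 1;      // 1 EUR / 1000 pages
  const scanFee = (newSourceWords / 1000) * 2;            // 2 EUR / 1000 new source words
  const tmFee = (tmWords / 1000) * 10 * targetLanguages;  // 10 EUR / 1000 words / target language
  const monthlyFee = (monthlyPageViews / 1000) * 1;       // 1 EUR / 1000 page views / month
  return { oneTime: discoveryFee + scanFee + tmFee, monthly: monthlyFee };
}

// Example: a 500-page site, 20,000 new source words all stored in TM,
// 2 target languages, 50,000 page views per month.
const quote = estimateCost({
  discoveredPages: 500,
  newSourceWords: 20000,
  tmWords: 20000,
  targetLanguages: 2,
  monthlyPageViews: 50000,
});
// quote.oneTime → 440.5 (0.5 + 40 + 400); quote.monthly → 50
```

Remember that the TM fee applies per target language, which is why it usually dominates the one-time total on multilingual projects.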
Further Information¶
Visit us at https://easyling.com and sign up for a dedicated demo if you are new to translation over the proxy!
See https://easyling.com/pricing for pricing details. Try our Price Calculator, too!
Contact Support: support@easyling.com
Be sure to check out all the other tutorials we’ve compiled:
Tutorials
- https://www.youtube.com/watch?v=S47kArNiJ1o
- https://www.youtube.com/watch?v=8VsBy2bGo64
- https://gitlab.com/easyling/wikis/home
- https://drive.google.com/open?id=0Bw53oZELMrf8V1FIUnhmNEtubTA
- http://lesson101.tutorial.easyling.com/
- http://lesson102.tutorial.easyling.com/
- http://lesson103.tutorial.easyling.com/
- http://lesson105.tutorial.easyling.com/
Release notes
Getting started¶
Let us give you a quick overview of how the proxy is used. In this section, we introduce the Dashboard and set up a simple project.
Registration & Login¶
To use Easyling, you need to register and set up an account for the service at https://app.easyling.com. You can start using the service right away after registration.
After logging in, you will be taken to the Dashboard, the project management center, every detail of which we’ll get to in later sections of this manual. There will be no existing projects at the outset, so let’s try setting one up.
Setting up Your First Project¶
To do this, click on the Create new project dropdown at the top and choose Add project.
This opens the Add project dialog box, where you can enter the URL of the website you would like to translate, and also select the website language; this sets the source language of the translation project. Click on Advanced Settings to access extended functionality, such as custom SRX files.
Add a Target Language¶
You will also need to add your target language(s), so use the option on the Dashboard to add them to the project. It’s not just that there is not much to do in terms of translation without a target: many crucial features (including the Preview proxy and the entire Workbench) are entirely unavailable as long as no target languages are set.
You can add any number of target languages. You can use the search function to look up languages based on locale code or country name.
Running a Discovery¶
The next step is to figure out what to translate (and how much of it), all starting from a single target URL. For this, a Discovery has to be run on the site.
Discovering a website means running a crawler on it and allowing the proxy to ‘explore’ it in its entirety in order to provide Statistics for you. As you can see in the ‘Add project’ dialog window, the Dashboard is set up to automatically start a Discovery on a webpage after creating a project - but don’t worry! After clicking on the Add project button, a new dialog will open where you get to set up additional details of the Discovery before really starting the process.
There are many details that potentially have to be taken into account when setting up a truly effective Discovery, but let’s set those aside for the moment. For now, simply click on the ‘Add project’ button and, after the Discovery dialog opens, click on Discover to start a crawl on the website. The default settings are safe.
Depending on the size of the site, a Discovery can take quite some time to finish. A spinner on the Discovery page will indicate that the system is currently working, but there is a default page limit of 100, which means that if the Discovery finds more than one hundred pages, it will automatically exit, allowing you to fine-tune your settings.
After the process is over, you’ll receive an e-mail about the results. You also have your first Statistics from the Discovery - a word count total from all Discovered pages.
Giving a Quote¶
You can use the results of the Discovery to give a quote (based on unique source words) to your clients about the estimated work-hours and expenses you forecast for a given project.
The results are an accurate indication of the translation costs associated with the text. However, with websites, it is prudent to consider other (technical) details before taking the word count results of the Discovery process at face value.
Investigate the source site and consider the following:
- Is there a great deal of dynamic content?
- Any Site Search functionality? A webshop? A forum?
- Any other web apps that would have to be localized?
- Do the average word lengths of the source and target languages differ significantly?
- Is the direction of the target language different from that of the source language?
- Which pages are targeted for translation? Which pages need to be excluded? Ask the client if they have a specific page list.
- Does the site have mixed-language content? If yes, ask the client to specify the source language(s) they need translated.
- Is there an extant Translation Memory that could be used?
- Is there any region-specific content? Does the site use geolocation?
- Is there any content behind a secure login?
- Are there any subdomains? example.com and blog.example.com require two separate projects that need to be linked.
- Are there any other special requirements?
- Is there any JavaScript-generated content?
If you answered yes to any of these questions, some deliberation will be required, often beyond the primary focus of translators: UI fixes and a measure of fiddling with what’s under the hood. Take those expenses into account when you make your quote.
Note: If you are unsure as to how to go about translating a part of a website, feel free to contact our Support Centre and we’ll help you get an accurate picture of the required effort and costs.
It is also advised to negotiate the expected workflow with the client at the quoting phase. The translation of a website is, in most cases, a never-ending process, as new content is added to the original site at certain intervals.
The question is, how content added after the initial quote should be treated - both from a technological and fiscal viewpoint. It is a good idea to ask the client about their intentions for update cycles and new content.
Do they wish to publish at the same time in all languages? Or publish on the original site without delay, and then publish the translations later, as they become available? The different options will require different approaches when you get to the maintenance phase of the project.
Since a translation proxy is practically a translation layer on top of the original site, serving translations from the datastore by replacing original content on the fly, new content will not be replaced, as no translation is stored for it yet. In practice this means that newly added content will appear in the original language on the translated site. This is called bleedthrough. There are two approaches to this phenomenon: let bleedthrough happen, making new content available right away, even if it is in a different language; or block new content from appearing until translation is done. Both have their clear advantages and drawbacks, so you have to discuss with your client which option is more acceptable for them - and set up your project accordingly.
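The two policies can be sketched in JavaScript (all names here are illustrative, not actual proxy internals):

```javascript
// Sketch of the two bleedthrough policies described above.
// Hypothetical helper; the real proxy logic is more involved.
function serveSegment(original, translation, policy) {
  if (translation !== undefined) return translation; // a stored translation always wins
  // No translation stored yet for this (new) segment:
  if (policy === "allow-bleedthrough") return original; // show the source-language text
  return ""; // "block" policy: hide untranslated content until translation is done
}

serveSegment("New blog post", undefined, "allow-bleedthrough"); // → "New blog post"
serveSegment("New blog post", undefined, "block");              // → ""
serveSegment("New blog post", "Új blogbejegyzés", "block");     // → "Új blogbejegyzés"
```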
Sales tool for mass production¶
Easyling also offers a Sales Tool to help LSPs and freelancers in growing their business.
If you have a well-defined group of potential customers you’d like to offer your translation services to, like hotels or restaurants with only monolingual websites in your area, the translation proxy makes it easy for you to impress the business owners. Just collect the URL addresses, add them to the Sales Tool, and the translation proxy will automatically create a project for all webpages according to the settings you specify. Once the translation and post-editing of the translated main pages are ready, you can send a link to the business owners. If your potential clients are impressed with the translated page and the fact that no IT involvement is required on their end, you have a better chance to win the deal.
On the Workbench¶
You can export all source segments, translate them in your CAT tool of choice and then reimport your results. But going through that cycle for every small change would rapidly become tedious - wouldn’t it be great if you could edit & control your translations in the cloud, where it would all update in real time? You’re looking for the Workbench.
In Pages View, you can hover over any page - a menu will show up right next to it - choose ‘Translation in List View’ and you’ll be taken to the Workbench in a different tab.
If the Dashboard is the Project Management Center in Easyling, then the Workbench is the cloud CAT tool, where translation itself takes place. There are many features you can use in the Workbench to make working with websites easier - see the ‘Workbench’ section of this manual for the details.
The 3-Phase Workflow¶
Barring some details (withheld for the sake of a convenient introduction), the above process is all you need to get a website translation project going.
Our idea of a project’s lifetime can be summarized in the 3-phase Workflow.
1. Discover & Quote¶
Set up a project and run a Discovery on it. Get a Unique Word Count total and a general idea of any technical issues involved. Give your quote to the client (perhaps demo/impress them via the Live Preview). Win the bid.
2. Ingest & Translate¶
After you are entrusted with the project, collect all text content into a database (overcoming any technical issues that may arise in the process). When you have your data, export it to your CAT tool of choice or translate in the Workbench to a selection of target languages. Reimport and edit. Use our Proofreading and Workflow features to ensure quality.
3. Publish & Maintain¶
After the translation is greenlit by the proofreaders, you can verify the serving domain and publish the translated website. Add a language selector to the source site. Generally, it is with publishing a website that a deadline is met.
But don’t forget that a website is a living thing, with new content arriving every day - the final stage of website localization is always maintenance - making sure that new content gets translated according to schedule, all the while ensuring that visitors to the site will not be inconvenienced by bleedthrough of untranslated content.
Maintenance is the “long tail” of website translation - there are a variety of features in the proxy that make it a lot easier than it would otherwise be.
In the following pages, you will find everything there is to know about using the proxy. Keep reading!
The Dashboard¶
Introduction¶
The Dashboard is your command center. It contains a variety of features you can use to manage your projects. In this manual, we’ll take these options in the order that they appear in the menu on the left side of the screen.
When you open the Project Dashboard for the first time, the screen will display a few general settings described below.
Project Alias¶
This alternative name will be displayed in the Project dropdown below the URL, for easy identification of your projects. Project aliases are project-internal, they will not be displayed anywhere on the translated site.
Website Address¶
Exactly what it says on the tin, the website address is a property of your project that cannot be altered once declared during project creation.
There is one exception to this rule: by default, the proxy will follow redirects from the initial URL and will create the project for the address it is redirected to.
Alternative Domains¶
It sometimes happens that a website serves content both on the `www` subdomain and the naked domain, such as `example.com`. In these cases, it is useful to set things up over the proxy so that the different URLs are handled in the same way.
After creating a project, this field is automatically filled with the complement of the Website Address. Add any further subdomains that contain identical content to this list, separated by commas.
Basic Authentication¶
Not to be confused with the project Access Control features of the proxy, the Basic Authentication username and password fields can be used for automatic authentication on the project website. Basic Authentication windows typically look like this:
If a username and password is provided on the Dashboard, the proxy will rely on this information from then on. Use this option to enable Discoveries and Scans to work properly on these sites. The various Preview proxy modes will also rely on this authentication info to get past the login screen automatically.
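For background, Basic Authentication is simply a base64-encoded `username:password` pair sent in the `Authorization` request header (RFC 7617). A minimal Node.js sketch of the header value any client, proxy included, would send:

```javascript
// Build a Basic Authentication header value from credentials,
// as any HTTP client (including a proxy) would per RFC 7617.
function basicAuthHeader(username, password) {
  const token = Buffer.from(`${username}:${password}`).toString("base64");
  return `Basic ${token}`;
}

// Hypothetical credentials, for illustration only:
basicAuthHeader("alice", "s3cret"); // → "Basic YWxpY2U6czNjcmV0"
```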
Project workflow¶
Change the number of project participants and project workflow type using this dropdown. See Collaboration.
Staging Domain¶
Although it is true that the project address cannot be changed after the project is created, the Staging Domain feature can still be used to change the origin server to which requests are sent.
For details, please see the Cookbook recipe on Staging domains
Language Selector¶
Select one or more of the available target languages into which to translate the source-language content.
Most useful translation facilities, such as the Workbench or the various instant preview features remain unavailable as long as there is no target language on a project.
You can remove or add target languages at any time. Note, however, that it is not recommended that you change the source language on a project that is published with translations.
Cookbook¶
Whitelabel Setup¶
Note: This article uses variable values that will differ for everyone. These variables are written in the UNIX style of `${VARIABLE_NAME}`. When providing us information, simply replace the entire construct (not just the name itself) with the data in mind. There is also one “global” variable to keep in mind: `${APP_DOMAIN}` refers to the domain chosen in point #3 to serve the translation proxy on.
- A €200 top-up per month (recurring): the translation proxy under your own brand name is a special service, offered to customers who cater not only to one or two clients, but put their weight behind the punch and open up whole new markets with our proxy solution.
- A one-time setup fee of €200.
- A custom domain name: you will need a place to serve the translation proxy (as well as any previews) on. Generally, our clients settle on `app.${yourdomain}.com`, but we can use practically anything that comes to mind - the only limitation is that we are unable to serve the proxy on a naked domain (for instance, `yourproxy.com`). Just keep in mind that once you settle on something and we set up your branded Easyling, it becomes fixed, so your decision is final.
- Two logos: one goes on the Dashboard, the other goes on the Workbench. Ideally, they should be transparent PNGs, but we can use other file formats as well. However, their dimensions are fixed: the Dashboard logo is set at 200x62px, while the Workbench logo needs to be 109x44px.
- An SSL certificate: the translation proxy communicates over encrypted channels, and for that, we require an SSL certificate made out for the domain name of your choice and any subdomains it may have - the translation proxy uses your “app domain” to serve previews until they’re published, so your certificate must be a so-called “wildcard certificate”. This is a type of SSL certificate that is valid not only for `app.yourdomain.com`, but also for `*.app.yourdomain.com`, which is needed to ensure SSL coverage of all project-specific preview subdomains (the names of which are created by combining an arbitrary locale code and the randomly generated project code). Certificate issuers are likely to request a Certificate Signing Request (CSR) for the certificate, which we will have to provide.
In order for us to generate a CSR for you, you’ll need to provide a few pieces of data related to your company, which will be incorporated into the certificate. Please provide the following by replacing the fields (this should look fairly familiar to your IT department) with the appropriate data.
1. `countryName_default = ${COUNTRY}`
2. `localityName_default = ${CITY}`
3. `streetAddress_default = ${ADDRESS}`
4. `postalCode_default = ${ZIP}`
5. `0.organizationName_default = ${COMPANY_NAME}`
6. `organizationalUnitName_default = ${ORG_UNIT}`
- The final step is to configure your DNS servers; and if you use Google Apps for `yourdomain.com`, the setup process will require someone with Google Apps admin rights as well. You will need to add the following CNAME records to enable the translation proxy on your domain (the `${APPENGINE_KEY}` and `${APPENGINE_HASH}` values will be provided to you):
  - `${APPENGINE_KEY}.${APP_DOMAIN}` CNAME `${APPENGINE_HASH}`
  - `${APP_DOMAIN}` CNAME `ghs.domainverify.net`
  - `*.${APP_DOMAIN}` CNAME `ghs.domainverify.net`
- In order for the translation proxy to be able to send emails from under your domain, you will need to authorize the email service. This is done by adding the following specialized DNS records (the `${SELECTOR}` and `${DKIM_KEY}` values will be provided to you, as they are domain-dependent):
  - `mandrill._domainkey.${APP_DOMAIN}` TXT `v=DKIM1; k=rsa; p=MIGfMA0GCSqGSIb3DQEBAQUAA4GNADCBiQKBgQCrLHiExVd55zd/IQ/J/mRwSRMAocV/hMB3jXwaHH36d9NaVynQFYV8NaWi69c1veUtRzGt7yAioXqLj7Z4TeEUoOLgrKsn8YnckGs9i3B3tVFB+Ch/4mPhXWiNfNdynHWBcPcbJ8kjEQ2U8y78dHZj1YeRXXVvWob2OaKynO8/lQIDAQAB;`
  - `${SELECTOR}._domainkey.${APP_DOMAIN}` TXT `${DKIM_KEY}`
  - An SPF TXT record: `v=spf1 include:spf.mandrillapp.com include:sparkpostmail.com ?all`; or, if you’re already using an SPF record, add `include:spf.mandrillapp.com include:sparkpostmail.com` just before the last operator.
- Finally, if you want, you can specify the following information to
customize the white label experience (this is completely voluntary
and can be changed at any time):
- name: Name of your branded product
- greeter: Person signing the greeting emails for new users
- team: Team name
- greeter address: Email address of the person sending the greeting emails
- greeter display name: Name of the person sending the greeting emails
- noreply address: Email address used for automated emails
- noreply name: Display name used for automated emails
- quote and wordcount signer: Person signing the quote and word count emails
Once all seven main points are settled, and we have the logos and the SSL certificate with us, we’ll set up the white label version for you, and you’ll be ready to start cracking your target market wide open.
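As an aside on the DNS step above: merging the two `include:` mechanisms into an existing SPF record just before its last operator (such as `?all` or `-all`) is simple string surgery. A hedged sketch, assuming a well-formed single-line record (the `_spf.google.com` include in the example is illustrative):

```javascript
// Insert extra include: mechanisms just before the final "all" operator
// of an existing SPF record, as described above. Illustrative helper only.
function mergeSpf(existingRecord, includes) {
  const parts = existingRecord.trim().split(/\s+/);
  const last = parts.pop(); // e.g. "?all", "~all", "-all"
  return [...parts, ...includes, last].join(" ");
}

mergeSpf("v=spf1 include:_spf.google.com ?all",
         ["include:spf.mandrillapp.com", "include:sparkpostmail.com"]);
// → "v=spf1 include:_spf.google.com include:spf.mandrillapp.com include:sparkpostmail.com ?all"
```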
JS/JSON/XML Translation¶
In this section, we describe the process of translating content in JavaScript files and dynamic (JSON or XML) responses.
General¶
Translation of HTML is mostly automatized over the proxy. But websites rely on many additional resources besides the document itself, such as JS libraries, CSS stylesheets, webfonts, dynamic requests and images. Not all of these resource types require translation, but JSON and XML responses frequently do. Such responses can also be very inhospitable to the translator and proxy specialist.
One of the problems is detection: proxy crawls/analyses do not operate in a browser-like environment. There is no headless browser or VM running in which a page load could be initiated or JavaScript evaluated for content detection purposes. Inherent complexity is another issue: the enormous diversity (to put it charitably) of web technologies in use nowadays prevents reliable automation of such a process.
But, though JS/dynamic content can slip under the radar at first, the proxy can easily translate it with some help.
Finding Content¶
An early investigation will reveal content that is unavailable to the crawler by default, and will save you trouble (of having to deal with untranslated content as late as in the review phase, for instance).
X-proxy¶
To check how a website is doing over the proxy, open it in the X-proxy mode, a specialized Preview mode available through the Dashboard page list. Click on Preview in the hover menu while holding `Ctrl` to open it.
Note that you need to add at least one target language and select it in the left side menu to access the preview.
The X-proxy replaces text it recognizes with x-es. Though not too impressive visually, it is an excellent research tool: it lets you home in on undetected content. To utilize it most effectively, combine it with your browser’s DevTools.
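Conceptually, the substitution works like this toy sketch (an illustration of the idea, not the proxy’s actual implementation): every recognized letter is replaced with an `x`, so anything that survives in its original form stands out.

```javascript
// Toy illustration of the X-proxy idea: replace recognized (Latin) letters
// with "x" so that any untranslated/undetected text stands out visually.
function xify(text) {
  return text.replace(/[A-Za-z]/g, "x");
}

xify("Welcome to our site!"); // → "xxxxxxx xx xxx xxxx!"
```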
Major browsers such as Firefox and Chrome allow you to do full-text search on (almost) all resources/requests used during and after page load. In Chrome, for example, you can press `Ctrl` + `Shift` + `F` to start a full-text search in the DevTools.
The following screenshot demonstrates how the X-proxy can make untranslated/undetected content obvious:
Having removed all known text from the equation, you are free to concentrate on the “untranslated” parts. Findings will naturally be site-specific, but there are some familiar, recurring scenarios:
- content is in `<script>` tags: this is the simple case, as there isn’t even a distinct resource to mark as translatable. `text/html` pages are translated by default, and JS content in them can be annotated right away.
pages are translated by default, and JS content in them can be annotated right away. - content is a string in a JS file: aside from the necessary annotation process, you’ll also need to ensure that the resource in question is marked as translatable.
- content is requested dynamically: dynamic content can be tough to dig up. Many DevTools don’t support full text search in XHR response bodies. If content is in plain sight on a webpage but the DevTool is not reporting any of it, then the content could have arrived via an XHR request. Check the “XHR” section in the Network tab after a reload. Aside from the inconvenience of locating them, dynamic request endpoints can be annotated and marked in the same way as JS files.
- content is on an external domain: this scenario requires some work. External domains require separate, but linked projects (add to this that you also have to ensure that URL references are mapped well, which can be difficult in a JS file), and the resources have to be marked and annotated on those projects to be translated.
- content is in an image: though not strictly connected to the topic of JS translation, an “honorable” mention goes to natural-language content stored as image data, also frequently revealed by the X-proxy.
There are many ingenious ways in which webpages encode content, and the proxy has various levels of support for all these schemes (usually involving a combination of features). When in doubt, feel free to contact support for advice!
Marking Resources¶
You can mark a resource as translatable manually on the Resource screen or using a prefix in Advanced Settings.
Manual¶
All collected resources are listed in Discovery > Resources and Content > Resources. All “pages” and files that are not of the `text/html` content type go here.
What sets resources apart from pages is that by default, they have no associated source entries or translations. Marking a resource as translatable means declaring that it does have translatable text that can be stored as source entries (and accessed via the Workbench).
So, to mark a resource as translatable:
- switch from thumbnail view to list view
- click on the “Translate” button in the hover menu of a resource
The resource is moved to the page list and from that point onward, it can be opened on the Workbench. Note, however, that we have not yet told the proxy what and how to translate in it.
Prefix¶
You can do the marking via prefixes. Go to Advanced settings to find the “Mark Multiple Resources as Translatable” text field. Copy & paste the prefixes of your choice and click on Save to apply your changes.
Note that in the screenshot above, HTTP and HTTPS prefixes are handled separately, a recommended practice for sites that support both kinds of traffic. Prefixes are treated as simple strings by the proxy when matching them against a pathname. You are free to add as many of them as you like.
This feature is made available because cherry-picking resources for translation is not always feasible. For instance, versioned URLs are liable to create new resources on a project whenever a file is updated on the original site (the proxy keeps these URLs separate by default), but the new resources are not marked automatically.
You will recognize those cases where you want to apply the exact same translation rules and process to a set of URLs that differ in minimal ways. A resource prefix will let you do this without having to mark things one-by-one as they come.
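Because prefixes are treated as plain strings matched against the pathname, the check amounts to a `startsWith` test. A sketch with hypothetical prefix values (not the proxy’s actual code):

```javascript
// Prefix-based resource marking, as described above: a resource URL is
// translatable if its pathname starts with any configured prefix.
function isMarkedTranslatable(pathname, prefixes) {
  return prefixes.some((prefix) => pathname.startsWith(prefix));
}

const prefixes = ["/static/js/app", "/assets/i18n/"];
isMarkedTranslatable("/static/js/app.v42.min.js", prefixes); // → true
isMarkedTranslatable("/static/css/site.css", prefixes);      // → false
```

This is why versioned URLs such as `/static/js/app.v42.min.js` and `/static/js/app.v43.min.js` can be covered by a single prefix instead of being marked one by one.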
Annotating JS/JSON¶
Picking up JSON/JS/XML content wholesale would be both costly and unwieldy. When you have successfully identified the source of content and earmarked it for translation, the last major task is to annotate those parts of it that you really want to translate. JS/JSON paths and XPaths can be used for this purpose.
JS/JSON paths¶
Go to Advanced settings > JavaScript translation options, and click on the “JSON Path tester tool” link right below the text field to open the tester dialog. It looks like this:
We’ll use the following JavaScript snippet in the remainder of this section. It illustrates many use cases for JS translation:
(function () {
var exampleVar = "Hello World!";
var exampleUrl = "https://shadowcat.skawa.hu";
var exampleHtmlString = "<p>Hello World!</p>";
var exampleObject = {
"sentence01": "Hello World!",
"sentence02": "Hello Again!",
"nestedObject": {
"sentence03": "Hello World!",
"sentence04": "Hello Again!"
},
"exampleArray": [{ "value": "foo" },
{ "value": "bar" },
{ "value": "baz" }],
"exampleNestedJS": "var nestedVar = { nestedKey: \"Nested sentence\"}",
"exampleNestedHTMLinJS": "var nestedHTML = \"<p>Hello world!</p>\""
};
})();
You can copy & paste code into the upper field (or fetch the entire file via the field & button on top if you have the URL) and click on “Analyze script”. The file/text will be sent for analysis in the cloud; when it’s finished, you should get a highlighted representation of the same code in the dialog.
Click on any of the blue icons to generate a JS path for the string in question. If you generate paths for all available strings in the example, the list of paths in the upper text field should look like this:
"%"."exampleVar"
"%"."exampleUrl"
"%"."exampleHtmlString"
"%"."exampleObject"."sentence01"
"%"."exampleObject"."sentence02"
"%"."exampleObject"."nestedObject"."sentence03"
"%"."exampleObject"."nestedObject"."sentence04"
"%"."exampleObject"."exampleArray".0."value"
"%"."exampleObject"."exampleArray".1."value"
"%"."exampleObject"."exampleArray".2."value"
"%"."exampleObject"."exampleNestedJS"
"%"."exampleObject"."exampleNestedHTMLinJS"
Some of these paths require adjustment before they’ll behave correctly.
Supported strings are highlighted in red, and those that are already covered by a listed JS path are highlighted in green.
When you have all the JS paths you need, copy & paste them into the main JS translation text field in Advanced settings. Click on “Save” to apply your changes.
Translatable elements are specified by a dot-separated list of words, each optionally double quoted and constituting either a.) a valid JS variable/JSON key name or b.) a token specifying one or more hierarchical levels (anonymous function, array index or globbing mark).
var exampleVar = "Hello World!";
The simplest possible case would be "exampleVar"
to mark the value of the top-level element exampleVar
as translatable. Anonymous function calls are denoted with "%"
, and since the entire block of variables is wrapped by an anonymous function (function () { ... })()
, this leading percent sign shows up in each path. Paths for dynamic JSON responses should be prefixed with "json"
.
Use an asterisk (or Kleene star) to collapse a single hierarchical level. E.g., the value of "exampleArray"
is an array of objects. To include every index in the array, you can roll three rules into one:
"%"."exampleObject"."exampleArray".*."value"
Double asterisks are even more inclusive: they recursively glob all child nodes. Exact specification can be restarted by following **
with a double-quoted form. That is, the rule
"%".**."value"
marks any variable or property called value
found at any hierarchical level within an anonymous function call. If a JS path ends with **
, then the entire subtree is marked as translatable. Incautious use of this construct is not recommended.
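To make the wildcard behavior concrete, here is a minimal sketch of how a dot-separated path with a "*" wildcard selects values in an object tree. This is purely illustrative and not Easyling’s actual implementation; the function and variable names are invented:

```javascript
// Illustrative sketch only — NOT the proxy's implementation. It models how a
// dot-separated path with a "*" wildcard selects values in an object tree.
function selectByPath(node, segments) {
  if (segments.length === 0) return [node];            // path consumed: match
  if (node === null || typeof node !== "object") return [];
  const [head, ...rest] = segments;
  let results = [];
  if (head === "*") {
    // "*" collapses exactly one hierarchical level (object key or array index)
    for (const key of Object.keys(node)) {
      results = results.concat(selectByPath(node[key], rest));
    }
  } else if (head in node) {
    results = selectByPath(node[head], rest);
  }
  return results;
}

const exampleObject = {
  exampleArray: [{ value: "foo" }, { value: "bar" }, { value: "baz" }]
};

// Mirrors "exampleObject"."exampleArray".*."value" (minus the "%" level):
console.log(selectByPath(exampleObject, ["exampleArray", "*", "value"]));
// → [ 'foo', 'bar', 'baz' ]
```

A ** rule would differ by also recursing with the wildcard still in place, which is why an unbounded ** can pick up far more than intended.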
Nodes are processed as plain text by default, but you can enable specific processing modes with whitespace-separated postfixes. The available processing modes are url
, html
and javascript
.
Variables can contain either the project URL or some other important location (such as that of a linked project) that you would prefer to have remapped over the proxy. Don’t give in to the temptation to localize URLs in JS as plain text! Instead, use the url
postfix to map them:
"%"."exampleUrl" url
exampleHtmlString
demonstrates the fact that JS variables frequently hold markup (for better or worse). The html
postfix lets you process these strings as HTML.
"%"."exampleHtmlString" html[@process]
The screenshot above demonstrates the difference HTML-processing makes. Picking up HTML-markup explicitly as text is generally considered error-prone and disadvantageous from a localization viewpoint, and isn’t recommended.
[@process]
is optional. By adding it, you instruct the proxy to apply the translation-invariable regular expressions currently set on the project.
Although JS paths are mostly specified in a single line, the javascript
postfix bends this rule. It tells the proxy to apply the rule in the next line to the value of the postfixed JSON path. One level of nesting is supported. It is rarely needed, but invaluable when it is called for.
Plain text:
"%"."exampleObject"."exampleNestedJS" javascript
"%"."exampleObject"."exampleNestedJS"."nestedVar"."nestedKey"
HTML:
"%"."exampleObject"."exampleNestedHTMLinJS" javascript
"%"."exampleObject"."exampleNestedHTMLinJS"."nestedHTML" html
Note that the JSON Path tester tool is not equipped to display the nested use case.
Xpaths¶
Xpaths work for XML AJAX responses in the same way as JS/JSON paths do for their respective content types. For example, /ajax-response/component[1]/text() html
assumes that the first <component>
node contains translatable HTML markup.
Due to space constraints, we decline to reproduce a full Xpath tutorial in this documentation, and direct the reader’s attention to the many tutorial resources available online. The W3Schools summary of Xpath syntax serves as a good starting point.
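As an illustration (the element contents below are invented), an AJAX response matched by the example rule above might look like this; the `html` postfix ensures the markup inside the first `<component>` node is processed as HTML rather than plain text:

```xml
<ajax-response>
  <!-- /ajax-response/component[1]/text() selects this node's text content -->
  <component><![CDATA[<p>Welcome to our <b>store</b>!</p>]]></component>
  <component>42</component>
</ajax-response>
```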
Limitations¶
A simple content extraction crawl takes care of JS content present in the source of an HTML document. In many cases, however, such content is requested as part of a page load or via user action. Here a limitation applies: the crawler cannot interact with the page, so such “interactive” requests are never triggered during a crawl.
The solution is to extract content via Preview. Open the page in Preview mode and go through the required user actions to trigger all necessary events. The proxy will take care of extracting content from the affected dynamic responses (provided that your setup is correct).
Note the influence of TM Freeze on this approach: you need to disable it temporarily for Preview-based content extraction to work.
JS translation cannot be used with computed values such as var string = "You have " + getCartItems().length + " items in your basket"
. In these cases, you either have to forego translation or change the content so that no computed expression is present among the concatenated elements.
This implies that string concatenations in which no token of the expression is computed are supported. This is indeed so: provided that the appropriate tweak is enabled in Advanced settings, the proxy can perform the concatenation upfront and handle the resulting string as a whole.
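To illustrate the distinction (the variable names below are invented for this sketch), a concatenation of string literals can be collapsed into one translatable string, while one containing a computed token cannot:

```javascript
// Supported: every token is a string literal, so (with the relevant tweak
// enabled in Advanced settings) the proxy can concatenate upfront and treat
// the result as a single translatable string.
var greeting = "You have " + "no" + " items in your basket";

// Unsupported: getCartItems().length is computed at runtime, so the full
// string only ever exists in the browser and cannot be extracted.
// var counter = "You have " + getCartItems().length + " items in your basket";

console.log(greeting);
// → You have no items in your basket
```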
Resource Translation¶
Resources are binary content found on the site, such as images, PDFs, CSS and JS files, etc. Please note that the content of these resources is not extracted for translation, so you have to translate/edit them separately.
Replacing Images with Localized Versions¶
You have the option to replace images, downloadable files and other resources with their localized version.
- Please make sure that you see the appropriate target language selected on the left side menu.
- Navigate to the original image in the Resources view of the Content (or the Discovery) section.
- Hover over the thumbnail image. A green ‘+’ icon should appear.
- Click this icon. It marks the image as ‘localizable’, i.e. a candidate for replacement.
- Select the replacement image and upload it. It will immediately replace the original image, and the new image will show up on the translated site.
- Check the Preview to see if the image was replaced properly.
- Repeat these steps for all images and all target languages that need localized versions.
Please note that you can replace only one resource for one target language at a time.
Extraction Issues with the data-image Attribute¶
Images might be referenced in a “data-image” attribute in the source code. In such cases you have to set the “data-image” attribute up to be translated as a link on the Advanced Settings screen. The <img src> attributes are recognized by default as URLs, while data-* attributes are not, as they can contain any sort of data, and must be configured manually to tailor the behavior to the current project’s needs.
The image might also be served from a subdomain. In that case you have to create a second project for the subdomain, mutually link the two projects together, and add the resource in the subdomain’s project. You then have to publish the two projects together in order to preserve the mapping.
HubSpot Forms¶
The proxy supports translation of HubSpot (or similar) forms via a combination of project linking and JS translation.
Method #1 marshals a combination of advanced proxy features. It is entirely hands-off from the site maintainer’s perspective: no change on the original server is necessary (a rather frequent constraint).
Method #2 relies on injected JS and HubSpot to provide separate, localized forms for each target language. Compared to #1, it is a clean and simple approach.
Method #1: Proxy¶
The proxy approach traces the structure of the main and form domains via linked projects. Affected JS resources/endpoints are overridden and the responses marked as translatable.
Project Creation & Setup¶
HubSpot uses several external domains to drive a form. You will see https://js.hsforms.net/forms/v2.js referenced in the page source. This file itself references https://forms.hubspot.com, which is where the translatable form contents are coming from.
Domains used by a HubSpot form are related to the main project and each other in the following manner:
Assuming that example.com
is already set up, at most two additional projects are required:
- js.hsforms.net: creation of this project is optional, though not complicated. At the time of writing, visiting the landing page results in a 403 Forbidden page (but this is not a problem).
- forms.hubspot.com: this URL redirects you to https://developers.hubspot.com/. To create a project for it, disable Redirect checking in the Add Project dialog. Click on Advanced settings to reveal the option and uncheck the checkbox:
Don’t forget to add every target language of the main project to each project you create.
Link Projects¶
Open each project in a separate tab and link each project according to the section on Project Linking. The result should be a chain of projects leading from example.com
to forms.hubspot.com
with js.hsforms.net
as an intermediary.
Alternative: Search & Replace¶
The js.hsforms.net
project is not, strictly speaking, necessary. Its true purpose is merely to expose a slightly modified version of the /forms/v2.js
script. If its URL is referred to in a way that makes it possible, you can sidestep the domain using a combination of Search & Replace and a page content override. The setup steps for this are as follows (done on the main project):
- create a path override for the exact URL where the form is present (the diagram above shows /contact as an illustration).
- add a Search & Replace rule: replace https?://js.hsforms.net (a regex matching both the HTTP and HTTPS versions of the same URL) with the empty string. This turns the reference to /forms/v2.js into a relative URL, pointing it toward a page content override that is to be created in a moment.
Overriding v2.js
¶
This resource contains a crucial variable called urlRoot
, which has to be remappable over the proxy. However, it is set via a computed expression, which is unsupported by the proxy for reasons discussed in the section on JS translation, so an override and a small change is unavoidable (regardless of the presence/absence of the intermediate project). Follow the steps below to create the override:
- visit https://js.hsforms.net/forms/v2.js and copy & paste the contents of the JS file.
- run the code through the DevTool or an online pretty printer before pasting it. Though optional, this is highly recommended (minified code is cumbersome to work with as it is).
- create a PCO for the /forms/v2.js pathname in Page modifiers > Content Override. The response code default is 200, and the Content-Type header is application/javascript; charset=utf-8. We’ll return to Cache-Control and Pragma later, after setup is complete.
- Add the following line to the top of the PCO:
var HUBSPOT_URL_ROOT = "https://forms.hubspot.com";
- Search for this.urlRoot. It is set in a line similar to the one below:
o ? this.urlRoot = "https://f.hsforms" + e + ".net/fallback" : null != a ? this.urlRoot = "" + a : this.urlRoot = "https://forms.hubspot" + e + ".com";
- Add the following line after it to make it use the “accessible” value:
this.urlRoot = HUBSPOT_URL_ROOT
- Use the Mark multiple resources as Translatable text field in Advanced settings. Simply add the pathname prefix of the PCO to the list:
/forms/v2.js
- Finally, to handle URL translation and HTTP/HTTPS, add the following JS path to the list of translatable paths in Advanced settings:
"HUBSPOT_URL_ROOT" url
Open the PCO link over any one of the proxy preview domains to test it. If all projects are correctly linked and you followed the setup steps correctly, the HUBSPOT_URL_ROOT
variable will hold an appropriate proxy-mapped domain (and consequently this.urlRoot
will be set to the same value).
Form Contents¶
Set up the HubSpot content endpoint as translatable on the project for forms.hubspot.com
according to the JS translation section. In summary:
- locate the form request using the DevTool and add it to the “Mark multiple resources as translatable” list of prefixes. For any given HubSpot form, translatable content will usually be associated with a prefix similar to the one below (it will also have a callback query parameter).
/embed/v3/form/{numericId}/{formId}
- use the JSON path tester tool in Advanced settings to process the response. HubSpot forms come in a response format called JSONP or padded JSON (a function call such as hs_request_0 with the form data passed to it as an argument). It is not necessary to prefix JS paths with "json" in this case.
- use the x-proxy to test your JS paths, and use Preview for content extraction.
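To make the JSONP format concrete, here is a minimal sketch. The field structure below is invented for illustration, not HubSpot’s actual schema: the response body is simply a call to the callback named in the request’s callback query parameter, with the (translatable) form data as its argument:

```javascript
// The callback is defined by the page that requested the form; a JSONP
// response body just invokes it with the form data.
function hs_request_0(formData) {
  // Collect the user-visible labels — the strings a JS path would target.
  return formData.fields.map(function (f) { return f.label; });
}

// What the endpoint's response body might look like (invented fields):
var labels = hs_request_0({
  fields: [{ label: "Email address" }, { label: "Message" }]
});
console.log(labels);
// → [ 'Email address', 'Message' ]
```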
Publishing & Caching¶
All projects need to be published together in all target languages. Note that you don’t need to publish on a subdomain of the original server: you are free to proxy the German version of forms.hubspot.com
through hs-de.mydomain.com
, for example.
Once setup, translation, and publishing are complete, you are free to set an appropriate Cache Header on your page content overrides (either on the PCO itself or on a prefix basis) to reduce page request costs.
Method #2: HubSpot & Injected JS¶
You can rely on HubSpot to localize the form property names after cloning your form for each target language, and rely on a little JavaScript to drive the forms on the client side for each target language. This approach is cleaner than the one described above, as long as you don’t mind having a separate form for each target language.
The proxy sets the lang
attribute of the <html>
tag to the appropriate locale code on each target language domain, which you can use for branching. The code below demonstrates one example of how such code could look in practice:
var lang = document.querySelector("html").getAttribute("lang");
var HSFormId = {
  "en-US": "English9-bb4c-45b4-8e32-21cdeaa3a7f0",
  "fr-FR": "Frenche9-bb4c-45b4-8e32-21cdeaa3a7f0",
  "de-DE": "Germane9-bb4c-45b4-8e32-21cdeaa3a7f0"
};
// refer to the HubSpot documentation for further customization at
// https://developers.hubspot.com/docs/methods/forms/advanced_form_options
hbspt.forms.create({
  portalId: "portalId",
  formId: HSFormId[lang] // look up the form cloned for the current language
});
Page requests, CDNs and Caching¶
The number of page requests sent when a page loads predicts monthly costs on a project. The goal of this page is to provide an in-depth description (and an upfront summary) of the matter and offer suggestions for cost reduction.
Summary¶
Page requests (in the strict technical sense of an HTTP request) are a primary concern over the proxy as they form the basis of monthly costs. Since each HTTP request that hits the proxy is billed, it is useful to consider various ways of reducing the number of hits.
The three methods discussed here, URL remap prevention, use of CDNs and public caching have the potential to dramatically reduce monthly costs (and can be combined according to need).
Page Views vs. HTTP requests¶
The difference between a page view as generally seen and a page request as understood as part of pricing is crucial to keep in mind.
Views¶
Website owners/maintainers usually focus on tracking the number of page views (via Google Analytics, for example). This is a useful metric to gain insight about the visitors to a site and make various predictions/business decisions based on user traffic.
Such analytics count end user visits, where we generally expect one additional page view to show up in our Google Analytics View each time a user loads a page on the site.
Requests from search engine bots and requests where JavaScript is not run (e.g. a JS-disabled browser, HTTrack, or command-line tools such as cURL and wget) do not result in a “visit”.
The proxy pricing system does not refer to this understanding of a page request.
HTTP Requests¶
Refreshing the page with a DevTool’s Network tab open, or using the Content Breakdown section of www.webpagetest.org for a page, shows that a modern webpage relies heavily on multiple resources (HTML, CSS, JS files, images, fonts etc.) over several domains to construct the unified whole shown to the user. By page requests, then, we mean the number of distinct HTTP requests for such resources.
It is this sense that the pricing system uses the term. The proxy is a technical solution to process and translate HTTP requests between the visitor and the original site, and any HTTP request that has to be relayed between the user and the server is counted as 1 page request.
This is regardless of the type of content: whether HTML, an XHR/AJAX request or a static resource, it will be counted as a page request if the proxy has to process it.
It is easy to see now why understanding the number of requests going into a page load is important – they act as a multiplier on the number of page visits and become an important predictor of monthly project costs.
The ideal case, of course, is 1 page request per 1 user visit (meaning that only the HTML document has to be translated and served). Although this ideal case might not be attainable in all cases, it very often is.
See the next section for a suggested manual approach of evaluating a page load.
In this section, we’ll go over the various ways you can reduce the number of requests and consequently, project costs.
Possible approaches¶
Optimization means preventing HTTP requests from reaching the proxy. In practice, there are three general approaches:
- prevent URL remapping for non-localized resources
- ensure that “auxiliary” content such as CSS and JS files and images is served from a CDN
- caching intra-domain resources
Remapping¶
The proxy billing system is only concerned with those requests that are forced to go through it, so the simplest way of preventing an HTTP request from going through the proxy is to prevent it from being re-mapped from the original site to the translated domain entirely.
If www.example.com is being translated into German and published on de.example.com, any URLs in the source that point to the original server (that is, are intra-domain) will be remapped to refer to the translated domain over the proxy.
But useful exceptions exist. For example, images that are not localized are not remapped by default, which means that the image will be downloaded from the original www.example.com site instead of the target language domain, naturally preventing an HTTP request from going through the proxy. Although a tweak exists in Advanced settings to force images through the proxy, you should consider the cost implications before turning it on (and look into the subsection on caching to offset increased costs).
The __ptNoRemap
HTML class is handled specially by the proxy. If this class name is detected, the href
or src
attribute of the given element will not be mapped to the proxy (avoiding the request cost).
For example, on a project for www.example.com
, a script tag such as
<script src="https://www.example.com/client.min.js"></script>
would be remapped by default, and it would look like this if published on de.example.com
:
<script src="https://de.example.com/client.min.js"></script>
The __ptNoRemap
class disables this default action, so the script src
is not remapped even if the page is opened over the proxy:
<script src="https://www.example.com/client.min.js" class="__ptNoRemap"></script>
Used in a systematic manner, this change has to be applied on the origin server, which is a potential downside if you don’t have source/admin access. That said, the class is reported to have solved one-off problems when search & replaced into targeted spots in the page source.
CDNs¶
Another way to prevent an HTTP request to the project domain is to offload it from the original server as well. Content Delivery Networks are servers capable of reliably serving source (non-localized) content across the globe.
Public Caching¶
(Note: not to be confused with Source and Target caches!)
Many static resources need no processing whatsoever by the translation proxy and become cost overhead if funneled through it. But it is often impossible to avoid having these requests go through. What to do in these cases? Enter public caching.
HTTP supports what is called a Cache-Control
header that can be added by the server to a response. The content of this header instructs public caching nodes on the network on how to store a static copy of the resource for a time.
A cache (usually affecting a given geographical area or specific network pathway) will serve the resource to visitors until the “Time-To-Live” of the cached resource expires. This TTL is defined by the value of max-age
in the Cache-Control
header, and its current value is tracked in the Age
header.
Until that time is up, requests coming in from a place served by the cache will not, in fact, reach the proxy. After max-age
expires, the cache will re-request the original for another max-age
term; during that term, however, a cache can serve a multitude of requests without burdening the original server (and the proxy in front of it).
Declaring a max-age
of 86,400 on the image /about-us/logo.jpg
, for example, broadcasts on the network that for the duration of one day, any public caching node should feel free to cache the resource – it’s up to them, but if possible, they should not re-request it until then.
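For instance, the response headers of such a cached image might look like this (the values are illustrative): a max-age of 86,400 seconds is one day, and the Age header shows how long this copy has been sitting in the cache:

```
HTTP/1.1 200 OK
Content-Type: image/jpeg
Cache-Control: public, max-age=86400
Age: 3112
```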
This way, the caching/serving/trafficking burden evens out on the network, and many of the repeating requests can be avoided (i.e. an intently browsing user might load the same resource over and over again, but most of those requests will be served from a cache).
Keep the following important points in mind:
- Caching naturally introduces update delays. A user getting cached content will have to wait for it to expire before seeing a new version.
- Frequently updated HTML pages (such as those with a newsstream) should not be targeted for caching, and URLs serving dynamic resources should never be cached!
- Public caching works best with static resources (JS, CSS and images) and versioned URLs.
- It is impossible to reach a cached resource from the server side. Neither the original site nor the proxy can tell a node to throw a cached version away; only expiry will do that. This is architectural: the HTTP protocol does not provide a method to send cache invalidation notifications to arbitrary nodes. The upshot is that while it might seem sensible (from a cost-reduction perspective) to use max-age
values as large as possible, this is strongly discouraged. Unless the URL of the given resource is versioned to allow for updates at any time, users may end up “walled out” by a cache storing an image for weeks, for example.
A measure of caution is advised!
If very sensible defaults are coming in from the original server, that is very good news, but such is not always the case.
To adjust Cache-Control
on the proxy-side, go to Dashboard > Path settings to override headers on a URL or prefix-basis. See the Path settings documentation to learn more about overriding/fine-tuning Cache-Control headers.
References¶
See HTTP caching in Google’s Web Fundamentals on public caching.
See RFC 7234 for the full technical detail on HTTP Cache-Control
.
Page Request Evaluation¶
In this recipe, we go over one possible method of evaluating a page for the number of HTTP requests that it sends to the project domain when it loads.
Opening the DevTool¶
Investigation of the number of requests can be done on the original site. With the site open and selected in a tab, press F12 to open the DevTool for that specific URL. By default, the DevTool will be docked within the browser window, but you can also undock it into a separate window.
1. Click on the Network tab.
2. Select "All" to track all requests.
3. Enable "Use large request rows" (to the right of "View" in the toolbar).
4. You can leave the "Hide data URLs" option on.
At this point, since you have opened the DevTool after the site has already loaded, the Network tab will be completely empty.
5. Refresh the site to start logging requests.
You will see a flurry of activity as the site reloads. After it has finished, you can start analyzing the various requests in the list.
Filter Requests on the same Domain¶
6. Enable "Regex" for filtering
You are free to ignore all requests that go to external domains that will not be part of any linked projects. If you have enabled the “large request rows” option, the DevTool will helpfully list the paths below the resource names.
The DevTool provides detailed information, but not all of it is needed in this case. As soon as you have the full list in view, you can remove from view those requests that go to external domains. The easiest way to do this is to add the “^/” regex to the filtering options on top.
At this point, the first relevant statistic becomes available at the bottom of the request list - the number of currently displayed requests / total number of requests will be a good first indication of the number of requests that potentially have to go through the proxy when the page is served.
Considerations¶
What Cache Headers are present?¶
If you click on a request in the list, a new sidebar appears with information about that particular request. For evaluation purposes, the Headers
tab, and within it the cache-control
directive, is especially important.
cache-control: private
or cache-control: no-cache
indicate a request that the original server expressly states to be non-cacheable - usually, it is not a good idea to change this haphazardly. It is better to count these requests as necessary for the construction of the page.
Is the resource static?¶
PNG/JPG images, JS and CSS files are those resources that tend not to change rapidly. You can override the Cache Headers of such Resources in Dashboard > Path settings - with the effective result being that the burden of serving that content is offloaded to independent caching nodes on the global network.
Re-caching happens for each cached entity after the duration of the max-age
directive passes. Using max-age
, you declare a time-frame during which you will enlist the help of the network to serve the content in unchanged form. It can be used to fine-tune the time that you’ll allow a specific cached instance of a resource to persist.
NOTE While Cache Header overrides work in the overwhelming majority of cases, there is no “law” to force caching nodes to respect them: consequently, the pace at which various Resources are cached/re-cached on the global network is, to a degree, arbitrary.
While technically possible, making Document resources cacheable requires careful consideration; that said, even a max-age as short as 10 to 60 minutes can be very useful. Consider that in most cases the landing page receives the most page requests. Consequently, allowing it to be cached with a controlled max-age
value means considerable savings on the proxy (with the caveat that any changes will take at most the time declared in max-age
to propagate across the network).
Does the requested resource change often/constantly according to context?¶
XHRs/AJAX calls/dynamic content cannot be cached without rapidly running into problems on the published site; they simply should not be cached. This also applies to requests sent throughout the user session after the page has loaded (and in lieu of hard data, it is very difficult to forecast the number of dynamic requests a given user will initiate).
Salient examples are search field handler scripts, web-shop endpoints, PHP scripts, backend endpoints and other similar sources that give wildly varying responses based on the parameters sent in the requests.
If a site is undergoing development, for example, it is usually not a good idea to add Cache Header overrides (certainly not overly long ones). This would in effect delay propagation of any changes by the value of max-age
, resulting in syncing problems between the original site and its translated counterparts.
Overriding Cache Headers¶
Evaluation of the number of requests is most useful when estimating the monthly cost associated with serving a site. For overriding cache headers on the Dashboard, see the Path Settings section of this documentation. Enable the public caching tweak in Dashboard > Advanced settings to facilitate further speedups on the Easyling end.
Example Scenario & Conclusions¶
If the following information is available:
* The original site has 50,000 monthly user visits
* A single target language sub-domain is expected to receive about half of that, 25,000
* Each page uses between 50-70 requests to build
Using the consideration points above, it is determined that most of those requests can be counted out and the rest cached for 24 hours. The site will require 3 non-cacheable requests from a user coming in from a location where there is no caching node on the way.
From this you will be able to conclude the following:
* 3 requests are necessary per visit, so 25,000 * 3 = 75,000 expected page requests
* BUT: also consider the number and visit frequency of returning visitors - each time a cached entity's max-age (24 hours in this hypothetical scenario) expires, that resource has to be re-cached, which will increase the number of requests going through the proxy by a certain amount (although this amount is usually negligible).
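The arithmetic above can be sketched as a back-of-the-envelope model (the numbers are the hypothetical ones from this scenario, not real project data):

```javascript
// Hypothetical inputs from the scenario above.
var monthlyVisits = 25000;   // visits expected on the translated sub-domain
var perVisitRequests = 3;    // non-cacheable requests that must hit the proxy

// Baseline estimate: cacheable requests are served by the network, so only
// the non-cacheable ones multiply with the visit count.
var expectedPageRequests = monthlyVisits * perVisitRequests;

console.log(expectedPageRequests);
// → 75000
```

Re-caching traffic from expiring max-age terms would add a (usually negligible) amount on top of this baseline.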
With the appropriate Cache Headers, Google’s geographically specific Edge Cache, the public internet, the various ISP caching nodes, and at the other end of the process, the user’s browser cache will participate in offloading the page request from the proxy’s translation pipeline.
Conclusion¶
Armed with this overarching view of the requests that will have to go through the proxy, you will be able to provide accurate estimates for the monthly costs of the proxy.
Staging Domains¶
Use Staging Domains to change the origin server to use in Preview or crawls.
Website maintainers like to test any changes they make on a development or staging server before unleashing them on the live site. The same staging server that is in place for the site can be used over the proxy as a testing ground for any translatable updates. Add a Staging Domain to extract and use data from that staging server in the various proxy modes.
If the project URL is example.com
, and a staging server exists at dev.example.com
, you can enter that URL into the Staging Domain field and click on “Add Staging Domain”. The domain is added to the menu below the text field automatically and enabled as default.
All requests going through the proxy (regardless of initiator, such as a user session in Preview or a content extraction Scan) will be mapped to that domain, regardless of the original project URL - the project domain might be example.com
, but the Preview will be displaying content from dev.example.com
.
Very important: translations are propagated after extraction as usual, but this comes with a warning: previous translations will only show up appropriately if the path structures of the original site and the staging server match 100%. If this is not the case, new pages are created in the project.
Default and Live Default Staging domains¶
Hover over the staging domain name to reveal three options: Live default, Default, and Delete.
Click on “Make Live default” to enable the staging domain for the live published site. A tick icon will be displayed next to the staging domain when this option is enabled.
The same goes for Default, which applies to all other proxy modes (such as the Highlight View in the Workbench, the Preview domain or the X-proxy). Click on Delete to remove a Staging Domain.
Naturally, only one staging domain can be designated as Live or Preview default at one time, but it is possible to enable one domain to be both.
When to Use Staging Domains?¶
Content Decoupling¶
The staging domain feature of the proxy is beneficial when updates to the original site are regularly tested on the staging server first.
Staging domains let you make that same content available through the proxy for translation work without disturbing the translation quality over the published domain in the meantime.
The idea is to extract content from the staging domain and begin translation work early – by the time the changes on the original site are moved from the staging domain to the main one, all target language entries will have their translations ready.
Domain Name Changes¶
An alternative use of the staging domain feature is when the original site changes domains. As you know, project addresses cannot be changed once a project is created, but migrating an entire project over a simple name change is rarely an optimal solution.
In these cases, you can use the Staging Domain option to set up the new address as a staging server on the project. By following up on the name change in the Publishing settings, you can move both the origin and the published domains to the new address (note that the project address itself will remain the same).
Left-Right conversion¶
It depends on the site...
- Best case scenario:
html {
direction:rtl;
}
<html dir="rtl">
If the result looks mostly OK, then all that is left is some minor CSS fixes:
- flipping images (e.g. in a carousel / slider)
- list elements whose bullets are defined using a background image
- if text alignment is defined explicitly inline (e.g. using WP’s text editor), each and every element must be overridden using
!important
- Bootstrap and similar frameworks:
  - some frameworks provide a dedicated RTL CSS file, such as bootstrap-rtl.css
  - if this is not the case, each and every element must be positioned individually
Mixed content within text¶
- When it comes to actually rendering the text (numbers), the direction is determined by a couple of rules. Please read this for the details.
- As a rule of thumb: during translation (from an LTR language to RTL), don’t use your CAT tool to change the order of the numbers and text where the translation uses Latin characters. Phone numbers like 1-800-123-1234 should be left in this order.
- To make sure numbers are rendered properly in the end, a Left-to-Right Mark (LRM) must be inserted before every number. The dash between numbers splits them, so an LRM must be inserted after each dash as well. Click to read more on how to insert these LRM characters.
- The same holds for parentheses, etc. Make sure you understand the rules; sometimes the LRM must be inserted after the closing parenthesis.
<p style="direction: rtl;">
<span>(TTY/TDD) 711 </span>
</p>
<p style="direction: rtl;">
<span>‎(TTY/TDD) 711</span>
</p>
<p style="direction: rtl;">
<span>‪(TTY/TDD) 711‬</span>
</p>
- As an alternative, an LRE (Left-to-Right Embedding) character can be inserted before the sequence, terminated by a PDF (Pop Directional Formatting) mark.
- To edit the XML (XLIFF) directly, use Sublime Text, available on Windows, OS X and Linux, which works very well with Regular Expressions. It is a robust way to insert these marks using their Unicode entities, such as ‎
Site Search¶
Transparent multilingual search is a frequently required (though often belatedly acknowledged) part of website translation. We recommend a client-side approach to search (the approach to which the proxy translation model lends itself best), a comparatively involved use case of code injection. Given its pervasiveness and importance, it warrants detailed treatment in our documentation.
Part I of this recipe provides a general description of the site search issue and an introduction to the recommended solution. Part II contains a simple example implementation.
This latter section also serves as an in-depth page modifier tutorial, and as such, it assumes basic familiarity with core web technologies such as HTML, JavaScript and CSS.
Part I - General¶
Issue¶
Visitors expect to be able to search for things in the language of the website – to input any term into the search fields and receive appropriate results immediately.
Let’s say that a visitor uses the search field of www.example.com
to search for the word “product”. Upon pressing enter, they are navigated to https://www.example.com/search?q=product
. The server will detect and use the value of the q
parameter, run whatever server-side search mechanism it uses to search for this term, assemble a list of results and construct a result page to send to the browser.
On a proxied site, however, we run up against a problem: the original does not know about the translations. Requests are automatically relayed to the original server. If a visitor were to type “produkt” on the German domain (resulting in an URL navigation to https://de.example.com/search?q=produkt
), that is the same as relaying a search query with German language content to the original site.
Since the original server does not possess a German-language index, the response is certain to contain 0 results. The fact that the proxy is CMS-agnostic, and that it generally doesn’t require translated content to be shared with the original server, also means that this content will not be available to the indexing/search software running on the original site.
Recommended Solution¶
The way out of this conundrum is that proxied pages themselves are publicly available for indexing by search engine bots. A search engine that supports site-specific queries can provide localized search via client-side AJAX requests.
There are two aspects to this kind of solution that you should consider:
On the one hand, such a site search has to be coded in JavaScript in the form of an override (an example with a detailed explanation follows below), and depending on how ornate/feature-laden the original’s search functionality is, the complexity of the override implementation can vary from the relatively simple to the astonishingly complex.
On the other hand, familiarity with the various indexing-related conditions of your chosen vendor is also important (we provide pointers to documentation for Bing, since it is the vendor used in the example). In our experience, it is not at all unusual for search engines to be leisurely in their pace. Remember that a site can only be indexed after it is published over the proxy.
On the same note, the publishing method you use can also have a bearing on the way search & indexing will work, so a case-by-case analysis is a must. Note that the Client-Side Translation publishing method is not compatible with a site search integration of the type described here.
Third-party Integrations¶
Bing’s Web Search API can be used for site search integration purposes over the proxy. In order to access Bing’s API, you will need to purchase an API key. See the details on Microsoft Azure on procuring one, and see the pricing page for a detailed description of the various service tiers.
Consult the Webmaster Tools Documentation concerning general usage-related matters. You can submit a domain for indexing in Bing here. Submission of specific URLs is possible here, but this feature is limited to root domains at the time of writing: until such time as this limitation is lifted, you might only be able to use this targeted indexing feature with subdirectory publishing.
If you want to track the indexing of your subdomains in Webmaster Tools and do SEO tracking, you’ll need to verify ownership of the target language subdomains. The available methods are described here.
The XML-based approach is the simplest to implement over the proxy (it can be done without having to apply any changes on the origin): create a temporary Page Content Override with the contents of BingSiteAuth.xml
(content type should be text/xml
). This exposes the authorization XML over the proxy domains. From then on, you are free to add the target language subdomains in Webmaster Tools and verify them one-by-one.
Part II: Example¶
In the rest of this documentation page, let us detail how a simple site search integration could be implemented as injected JavaScript over the proxy using v7 of the Microsoft Web Search API.
Page Modifiers¶
Customized JavaScript can be injected into the <head>
tag of each page over the proxy. This capability is the entryway for a search override. To add injected JavaScript you’ll find an editor in Page Modifiers > Javascript editor, where you can type or copy & paste Javascript code. After saving the modifier, it will show up in the page source over the proxy after a refresh (but note that cache settings might get in the way of an instantaneous update on the live domain!).
Site search functionality is frequently displayed to the visitor in the form of an input field and a button. We’ll go with this familiar scenario and use the following minimal webpage for this tutorial:
<html>
<head>
<script src="https://code.jquery.com/jquery-2.2.4.min.js"></script>
</head>
<body>
<script>
function showResults(){
document.querySelector("#results")
.setAttribute("style", "display:block;");
}
</script>
<div id="search">
<form id="form" action="javascript:showResults();">
<label for="input">Search:</label>
<input id="input" type="text"/>
<button id="button" type="submit">Submit</button>
</form>
<div id="results" style="display:none;">
<div class="result-item">
<h2><a href="#">Result title</a></h2>
<p>Result summary</p>
</div>
</div>
</div>
</body>
</html>
First steps¶
showResults()
simulates the original search functionality of a site. The goal of our search integration is to prevent the original from running and replace it with a client-side dynamic request. A site search integration is no different from any other client-side code, and the skeleton that we described previously on the Page Modifiers documentation page is a safe starting point:
(function (){
"use strict";
$(document).ready(overrideSearch);
function overrideSearch () {
...
}
})();
We use jQuery’s $(document).ready()
to wait until the DOM is ready to be manipulated. The IIFE wrapper isolates our modifier from the global namespace, and we never leave the house without use strict
.
We could define a callback on document.onreadystatechange
, but this approach suffers from potential problems: we’d be setting our modifier on a publicly accessible property of document
, exposing us to the possibility that some other script might inadvertently redefine it (or we could be the ones doing the redefining).
We would also end up having to pool different page modifiers in one function definition, which goes against the principle of separation of concerns. For these and similar reasons, vanilla JavaScript’s document.addEventListener
or jQuery is a better fit.
With basic scaffolding in place, we move on to the nitty-gritty of search. The integration needs to do three things:
- override search elements on the page
- send search request based on user input and handle response
- build the results page based on the response
We take each of these responsibilities in turn to see how they can be implemented.
Overriding Elements¶
Having prepared this basic scaffolding, we consider the contents of overrideSearch()
. We would like to attach our event handlers to all search-related elements:
function overrideSearch () {
$("#form").on("keydown", overrideInput);
$("#button").on("click", overrideClick);
}
Next, we implement the callbacks. For the input field, we ensure that the user is not disturbed by the modifier needlessly, which only steps in if the Return key is pressed:
function overrideInput (event) {
if (event.keyCode === 13) {
event.preventDefault();
sendRequest($("#input").val(), renderResult);
}
}
The button is straightforward to override. The use of this.previousElementSibling
is featured here as a suggested alternative to selectors (previousSibling alone would return the whitespace text node between the input and the button).
function overrideClick(event){
event.preventDefault();
sendRequest(this.previousElementSibling.value, renderResult);
}
We are done with the override part. The error message Uncaught ReferenceError: sendRequest is not defined
should appear in the console if we try to search at this point. The default event might have been a navigation or a dynamic request of the original site’s own, but we prevent it from executing and call the function sendRequest
instead, which we’ll implement next.
Sending the Request¶
We assemble a GET
request using the search term passed to sendRequest
and send it to the Bing endpoint. For example:
function sendRequest (term, callback) {
$.ajax(createRequest(term)).success(function (resp) {
callback(resp);
})
}
createRequest
is an important part of this process: it assembles and returns the request object with the various API-related parameters. Using Bing, it could look something like this:
function createRequest (term, offset) {
return {
beforeSend: function (xhr) {
xhr.setRequestHeader("Ocp-Apim-Subscription-Key", config.API_KEY);
},
error: function (xhr, error, thrown) {
console.log("Error during request!");
},
url: config.API_URL,
type: "get",
data: {
q: "site:" + config.DOMAIN + "/ " + term,
count: 10,
offset: offset || 1
}
};
};
beforeSend
sets the Ocp-Apim-Subscription-Key header, which contains the API key to authenticate the request. We introduce Bing’s site-specific search feature into the request by prefixing the value of the q
property with “site:” (which works the same way as on bing.com).
It is also at this point that the code should handle the publishing method you use. If you publish your project in a subdirectory of the original domain, then in addition to the “site:” prefix, you would also have to add the target-language directory prefix (e.g “/ja/” or “/de/”). If you are publishing in multiple languages, you can use the lang
attribute of the html
tag (the value of which is always the current locale) to make this part of your code target-language specific.
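For instance, a language-aware prefix could be sketched like this (the helper name and the locale-to-directory mapping are assumptions for illustration; adapt them to your actual subdirectory layout):

```javascript
// Build the "site:" restriction for a subdirectory-published project.
// Hypothetical helper: assumes locale directories such as /de/ or /ja/.
function sitePrefix(domain, locale) {
    // over the proxy, locale would come from document.documentElement.lang
    var dir = locale ? "/" + locale.split("-")[0].toLowerCase() + "/" : "/";
    return "site:" + domain + dir;
}

// usage over the proxy:
// sitePrefix(location.host, document.documentElement.lang)
```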
A variable called config
was also introduced to hold search-related config parameters in one place. This variable can go to the top of the modifier:
var config = {
API_KEY: "nmtcxylkj56lkjmnnj3mg782nmvf23gz", // example only!
API_URL: "https://api.cognitive.microsoft.com/bing/v7.0/search",
DOMAIN: location.host
}
If the API key is valid and version-compatible with the endpoint, Bing responds to each request with a JSON document, the details of which can be found here. The webPages.value
property of this response is particularly important: it is a JSON array containing the first batch of search results.
Note that config.DOMAIN
is set to location.host
. Encoded in this fact is the assumption that location.host
(the site on which the code is running) is to be used for searches, and that it is already indexed.
During development and testing, however, proper search results might not be available yet. For development purposes, it can be useful to temporarily change DOMAIN
to e.g. en.wikipedia.org
to receive real search results. The use of mock responses is also an option.
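A minimal mock, assuming the webPages.value shape described above (the field values are made up for illustration), could look like this:

```javascript
// Mock of a Bing-style response for offline development (assumed shape,
// matching the webPages.value array described in this tutorial).
var mockResponse = {
    webPages: {
        value: [
            {
                name: "Result title",
                url: "https://de.example.com/page",
                snippet: "Result summary"
            }
        ]
    }
};

// During development, the DOM handler can then be exercised directly,
// e.g. renderResult(mockResponse), without a valid API key.
```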
Search engine APIs have many features that you can provide in your implementation beyond this example, but we will not discuss all the possibilities in this tutorial.
We pass the response to the final component of our override, the DOM handler renderResult
, which has already appeared as the callback argument in the sendRequest
calls.
Building the Results¶
Astute readers will notice that there is hardly any site-specific information in the code above (practically none, except the element selectors). We’d be inclined to think that plugging a form
and button
selector into the functions and changing the API_KEY will get us on our way on any similar site – and to the extent of element overrides and request handling, that might very well be the case. renderResult
, however, does not enjoy the benefit of generality.
The DOM handler is responsible for displaying search results on the proxied site in a way that seamlessly integrates with the original appearance, making this part of a site search integration heavily dependent on the specific context of the website that we are dealing with.
In order to work out the implementation for our example webpage, we turn our attention from #search
to #results
and research the structure of a search result item.
In the same breath, we can add the basic structure of renderResult
right away. The bare minimum functionality is to clear the result list and append the results that were received and passed as an argument.
function renderResult(response) {
$("#results").show().empty();
if (typeof response.webPages.value !== "undefined") {
for (var hit in response.webPages.value)
$("#results").append(createResultItem(response.webPages.value[hit]))
}
}
createResultItem
constructs an element for a result item using information provided by Bing, while copying the element structure from the original, which has a huge benefit: existing CSS styles will apply to the integrated search results automatically. Inspecting the search results produced by the original server, we learn that a search result item looks like this:
<div class="result-item">
<h2><a href="#">Result title</a></h2>
<p>Result summary</p>
</div>
Knowing this, we can implement createResultItem
in the following simple fashion:
function createResultItem (result) {
return $("<div>").attr("class", "result-item")
.append($("<h2>")
.append($("<a>").attr("href", result.url).text(result.name)))
.append($("<p>").text(result.snippet))
};
Among the available options for creating DOM elements (document.createElement
, string concatenation, etc.), the jQuery-based approach is clear and concise. Many bells and whistles can be added to a DOM handler, such as a pager, which would require that we handle offsets (only hinted at by createRequest
in this tutorial) and add Previous/Next buttons, etc.
Our tutorial, however, does not extend to those details and we conclude it here.
Notes¶
Some frequent, but non-essential complexity of client-side search is omitted from our example:
- offsets (a Bing-specific term) are only alluded to. Any search API will support requests for the next batch of search results for the same query. This is usually exposed to the end user in the form of a pager that, when clicked, sends the same request but increments the offset
by 10. Such a feature requires both renderResult
to manage the concept of a pager and sendRequest
/ createRequest
to keep tabs on the current offset
.
- a search field is also generally available on all pages of a site, not just the search results page. This means that an integration needs to ensure that the user is redirected to the search results page and that query parameters are appropriately handled. Besides using the values of the input field (a user-driven query after page load), a search integration usually has to be able to extract produkt
from https://de.example.com/search?q=produkt
to start a default search on the search results page when it loads for the first time. This is usually not difficult, but as we’ve said previously, much depends on specific circumstances.
- a search integration also has to be prepared to display a “No results.” page if the search engine returns no hits. Such natural language has to be exposed in a way that can be translated using the proxy.
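As a sketch of that second point, a hypothetical helper could pull the query out of the URL’s search string (a minimal illustration; real sites may need more robust parsing):

```javascript
// Hypothetical helper: extract the "q" parameter from a query string
// (e.g. location.search) to start a default search when the results
// page first loads.
function extractQuery(search) {
    var match = /[?&]q=([^&]*)/.exec(search);
    // "+" stands for a space in query strings; decode percent-escapes too
    return match ? decodeURIComponent(match[1].replace(/\+/g, " ")) : null;
}

// e.g. extractQuery(location.search) on https://de.example.com/search?q=produkt
```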
Code¶
We repeat the code from the discussion above in its entirety. In summary, when injected into the example webpage, it will override both the search field and the button to request search results from Bing via an AJAX call, and then display those search results in place.
(function (){
"use strict";
var config = {
API_KEY: "nmtcxylkj56lkjmnnj3mg782nmvf23gz", // example only!
API_URL: "https://api.cognitive.microsoft.com/bing/v7.0/search",
DOMAIN: location.host
};
$(document).ready(overrideSearch);
// OVERRIDE
function overrideSearch () {
$("#form").on("keydown", overrideInput);
$("#button").on("click", overrideClick);
};
function overrideInput (event) {
if (event.keyCode === 13) {
event.preventDefault();
sendRequest($("#input").val(), renderResult);
}
};
function overrideClick(event){
event.preventDefault();
sendRequest(this.previousElementSibling.value, renderResult);
};
// REQUEST
function sendRequest (term, callback) {
$.ajax(createRequest(term)).success(function (resp) {
callback(resp);
})
};
function createRequest (term, offset) {
return {
beforeSend: function (xhr) {
xhr.setRequestHeader("Ocp-Apim-Subscription-Key", config.API_KEY);
},
error: function (xhr, error, thrown) {
console.log("Error during request!");
},
url: config.API_URL,
type: "get",
data: {
q: "site:" + config.DOMAIN + "/ " + term,
count: 10,
offset: offset || 1
}
};
};
// DOM
function renderResult(response) {
$("#results").show().empty();
if (typeof response.webPages.value !== "undefined") {
for (var hit in response.webPages.value)
$("#results").append(createResultItem(response.webPages.value[hit]))
}
};
function createResultItem (result) {
return $("<div>").attr("class", "result-item")
.append($("<h2>")
.append($("<a>").attr("href", result.url).text(result.name)))
.append($("<p>").text(result.snippet))
};
})();
SSL Certificates¶
Easyling has the ability to proxy HTTPS pages, but to do so, it must be provided with a certificate and private key matching the URL. Otherwise, the proxy will be unable to identify itself as a valid server, and the browser will abort the connection for security reasons.
Easyling support can assist in deploying an HTTPS site by providing a CSR (Certificate Signing Request) to generate the appropriate certificate if the required information is provided, or the client can prepare the certificate themselves.
Additionally, a certificate is required to provide a branded Easyling instance. For more information, you can check the Whitelabel article of the FAQ.
The protocol¶
HTTPS (also called HTTP over TLS, HTTP over SSL, and HTTP Secure) is a protocol for secure communication over a computer network which is widely used on the Internet. HTTPS consists of communication over Hypertext Transfer Protocol (HTTP) within a connection encrypted by Transport Layer Security or its predecessor, Secure Sockets Layer. The main motivation for HTTPS is authentication of the visited website and protection of the privacy and integrity of the exchanged data.
In its popular deployment on the internet, HTTPS provides authentication of the website and associated web server with which one is communicating, which protects against man-in-the-middle attacks. Additionally, it provides bidirectional encryption of communications between a client and server, which protects against eavesdropping and tampering with and/or forging the contents of the communication. In practice, this provides a reasonable guarantee that one is communicating with precisely the website that one intended to communicate with (as opposed to an impostor), as well as ensuring that the contents of communications between the user and site cannot be read or forged by any third party.
Historically, HTTPS connections were primarily used for payment transactions on the World Wide Web, e-mail and for sensitive transactions in corporate information systems. In the late 2000s and early 2010s, HTTPS began to see widespread use for protecting page authenticity on all types of websites, securing accounts and keeping user communications, identity and web browsing private. (Courtesy of Wikipedia)
Issuing an SSL certificate¶
Issuing universally accepted certificates is restricted to the so-called “Root Certificate Authorities”. However, root authorities will generally delegate their powers to “Intermediate Authorities”, who will sign certificates as requested by the end user (provided ownership of the domain can be verified). When requesting a certificate for your whitelabel installation or HTTPS site, you will most likely interact with an intermediate authority, by providing them an encrypted configuration file (the Certificate Signing Request), which the provider consumes to produce a cryptographically signed certificate.
Using a CSR has several distinct advantages over generating your own certificate at the issuer: since the translation proxy support crew is in control of the final product, we can tailor the request to generate the certificate we need from you; and since the private key remains safe with us, you do not need to take special precautions when sending the signed certificate back. However, we require some information to be provided to us beforehand:
countryName_default = ${COUNTRY}
localityName_default = ${CITY}
streetAddress_default = ${ADDRESS}
postalCode_default = ${ZIP}
0.organizationName_default = ${COMPANY_NAME}
organizationalUnitName_default = ${ORG_UNIT}
Issuing a certificate consists of several predetermined steps. First, a cryptographic key pair is generated, one half public, the other private. Then, a file indicating the domain the certificate is to be issued for, as well as various data about the entity holding the domain (generally, you or your client) is created. This file is then combined with the public half of the key, and signed with the private half. The resulting encrypted file is then handed to the issuer, who verifies the information contained within, and if successful, encodes the information into a public certificate, signing it with its own private key. This is the file that needs to be provided to Easyling support.
Providing us with a certificate¶
Following the previous phase, you are now in possession of a cryptographic certificate, and possibly its private key (if you elected to create your own certificate instead of requesting a CSR from us).
If we provided you with a CSR file, you need to send only the certificate. On its own, the certificate is not viable - it requires the private key to be useful, thus, it can be sent via email safely. The private key remains safe with us, and we will use it to upload the certificate to Google AppEngine, from where it will be available to authenticate the proxied site to the browser, and enable HTTPS for the translations.
If you elected to forgo the CSR and generate your own certificate, we will need the associated private key as well (and any passwords used to lock the private key). In this case, however, care must be taken to prevent the certificate falling into the wrong hands: the certificate and its keyfile, or the keyfile and its password must never travel together - if the email carrying both is intercepted, the malicious third party can use it to impersonate your site! Either use two separate emails, or even two different channels (email and Skype, for instance) to provide us the key and password and the certificate. Once we receive the files, we will again upload them to AppEngine after decrypting the keyfile, at which point it will be available for use with the proxied site.
N.B.: When selecting an SSL provider, bear in mind that Google only accepts certificates that are either self-signed or signed by a publicly trusted root Certificate Authority. One important point that needs to be highlighted is that the root CA for CloudFlare’s Origin certificate system is not publicly trusted. Thus, we are unable to make use of Origin certificates generated via CloudFlare’s system - in this case, please contact us for a CSR file for use with a provider of your choice.
SSL Manipulation Commands¶
Converting private keys to RSA¶
When uploading the keys into AppEngine, the file must be in RSA format. To verify, the beginning of the file should read
-----BEGIN RSA PRIVATE KEY-----
If it reads
-----BEGIN PRIVATE KEY-----
instead, you need to convert it using the following command:
openssl rsa -in key -out key.rsa.key
Extracting Certificate and Private Key Files from a .pfx / PKCS#12 File (includes both the certificate and the private key)¶
- export the private key:
openssl pkcs12 -in certname.pfx -nocerts -out key.pem -nodes
- export the certificate:
openssl pkcs12 -in certname.pfx -nokeys -out cert.pem
- create RSA key / remove passphrase from the key:
openssl rsa -in key.pem -out server.key
Check if a given key matches a certificate (or CSR)¶
By running these commands on the keyfile and certificate, you can verify that the key used to generate the certificate matches the one you have on hand. If the two outputs match, so do the keyfiles. If they don’t, your client may have used their own private key to create the certificate, which you will have to obtain before forwarding it to us.
openssl rsa -noout -modulus -in privateKey.key | openssl md5
openssl x509 -noout -modulus -in certificate.crt | openssl md5
Alternatively, if you have the CSR as well, you can use the following command to obtain the checksum of the CSR’s key and verify that the CSR you have on hand was used to generate the certificate.
openssl req -noout -modulus -in CSR.csr | openssl md5
Proxy modes - X, P, live¶
X-proxy (testing, JS bugs, JS fix domains , CORS)¶
The X-proxy is great for testing: you can spot content that does not get picked up by default, adjust your project configuration, and check for success.
There are a couple of situations where the X-proxy comes in handy:
- Testing regular expressions, for example on e-commerce sites.
- Testing JSON (JavaScript) and XML translation.
- Just browsing through a site, for evaluation purposes.
An example X-proxy URL: https://de-de-{project_code}-x.app.easyling.com
The X-proxy can be accessed from the pages list under Content (or Discovery) by clicking on the Preview button in the hover toolbar while holding down the Ctrl/Cmd key, or you can simply replace the -p with -x in the normal preview URL for the same effect.
P - Preview¶
The standard proxy mode for viewing the translated website before publishing. The preview can also be used for a couple of other things:
- Cookie header extraction to get behind logins
- Visiting pages manually, to ingest content
An example Preview-proxy URL: https://de-de-{project_code}-p.app.easyling.com
C - Channel¶
Live serving mode¶
After publishing the website, the proxy serves content on the chosen domain.
HTTP Headers¶
What are HTTP Headers?¶
HTTP header fields are components of the header section of request and response messages in the Hypertext Transfer Protocol (HTTP). They define the operating parameters of an HTTP transaction. The header fields are transmitted after the request or response line, which is the first line of a message. Header fields are colon-separated name-value pairs in clear-text string format, terminated by a carriage return (CR) and line feed (LF) character sequence. The end of the header section is indicated by an empty field, resulting in the transmission of two consecutive CR-LF pairs. (Wikipedia)
Headers and the proxy¶
The Proxy strives to be as transparent as possible when it comes to headers. Therefore, we forward the majority of headers added to any incoming request, with a few exceptions where the presence of said header could cause undesirable operation on the original server.
Additionally, the Proxy adds a few specialized headers, both to requests to the remote server and to responses to the client. The presence of these headers SHOULD NOT cause erroneous behavior in the server. Request headers contain additional information on the client viewing the site and the language being served, which can be used to provide customized content. Some Proxy-specific request headers are:
- X-TranslationProxy-Translating-To: ja-JP: gives the language of the translated version the client is browsing.
- X-TranslationProxy-Translating-Host: jp.eveonline.com: contains the domain under which the Proxy is serving the translated site.
- X-TranslationProxy-Originator-IP: 192.168.168.192: if enabled, contains the IP address of the requester, which may be hidden by CDNs and other proxies.
- Future headers of the X-TranslationProxy family containing other metadata.
- User-Agent: this header is somewhat special, in the sense that it is not specific to the Proxy; rather, it is sent by almost all browsers to identify themselves. The reason it finds a place in this article is that it can be used to identify Proxy-served requests. Google AppEngine modifies this header when sending requests, in a way that ensures no further tampering before the header reaches the original server, by adding AppEngine-Google; (+http://code.google.com/appengine; appid: u~skawa-easyling) - this can be used to whitelist the Proxy.
HTTP Headers and Security¶
Due to its nature as a proxy, and the fact that Google URLFetch uses a diverse range of addresses allocated randomly, it is easy to see that proxied requests may be caught by Web Application Firewalls, Anti-DDOS software, or even security provider companies. When a project is launched, it is often a good idea to contact the client and have them notify any contracted security providers, or make the necessary changes to firewalls and block-lists.
The easiest way to identify proxied requests is to read the User-Agent
header, and locate the above-mentioned pattern - the application ID is added by Google at the last possible moment, thus, it can be a trusted indicator (along with the other headers, if needed) that the request was initiated by the Proxy. Security providers can be advised that the appearance of these headers is normal and should not be construed as an attack/phishing attempt.
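As a sketch, a server-side check for the pattern might look like this (the helper name is ours; only the AppEngine marker itself comes from the documentation above):

```javascript
// Hypothetical helper: identify requests forwarded by the translation proxy
// by looking for the marker Google AppEngine appends to the User-Agent.
function isProxyRequest(userAgent) {
  if (!userAgent) return false;
  return userAgent.includes("AppEngine-Google") &&
         userAgent.includes("http://code.google.com/appengine");
}

const proxied = "Mozilla/5.0 AppEngine-Google; (+http://code.google.com/appengine; appid: u~skawa-easyling)";
const direct = "Mozilla/5.0 (Windows NT 10.0; Win64; x64)";
// isProxyRequest(proxied) → true, isProxyRequest(direct) → false
```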
The Dashboard 2.0¶
Complexity Matrix¶
The Complexity Matrix (CM) is two things: first, it is a note-taking application where you can get your knowledge of a project domain and various potential issues in order. Second, it is the first feature to showcase the look & feel of the Dashboard 2. Feedback is welcome!
You’ll recall from the introduction that there are a number of questions to ask when you embark on a new project, so as to create better forecasts for your project expenses and the required amount of work to get the project going. The CM is meant to make this easier via a screen that you can use to track issues from within the project itself.
Accessing & Using the CM¶
While incremental development of the Dashboard 2.0 is ongoing, the old interface remains available. From there, go into the Dashboard menu and click on Complexity Matrix to access the new Dashboard and the CM screen.
QA Tests¶
QA tests should be considered as part of the project setup phase. In the remainder of this documentation page, you will find a non-exhaustive list of possible issues that warrant careful consideration.
JavaScript¶
Content in JavaScript¶
The proxy will find all translatable content in the HTML source, but such is not the case with JS: the appropriate JS Translation paths need to be worked out and added to the Advanced settings. Use the x-proxy after Discovering/Scanning a site to reveal any content that was not found by the crawl.
If you notice that specific parts, such as the navigation bar or a user account page, are not x-ed out in the x-proxy, it is likely that the content is in JS: either in <script>
tags, or in discrete JavaScript resources that are requested as the page is loading.
Investigating the original site and the proxy preview for content in JavaScript increases project setup time.
Things to Note
- Watch the domain of origin for JS resources. Translatable content on external domains takes longer to set up.
- String concatenation (especially that of HTML markup) is difficult to overcome. The example below is a sign of trouble!
var exampleVar = "world"
var htmlContent = "<div><p>Hello " + exampleVar + "</p></div>"
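For contrast, here is a sketch of a friendlier pattern (illustrative only, not an Easyling API): keeping the human-readable sentence whole, so a crawler configured with the right JS translation path can treat it as a single segment.

```javascript
// Problematic: the translatable sentence only ever exists in fragments.
function badGreeting(name) {
  return "<div><p>Hello " + name + "</p></div>";
}

// Friendlier: the full sentence lives in one string with a placeholder,
// and the markup is assembled separately from the text.
const MESSAGES = { greeting: "Hello {name}" }; // a translatable dictionary
function goodGreeting(name) {
  const text = MESSAGES.greeting.replace("{name}", name);
  return "<div><p>" + text + "</p></div>";
}

// Both produce the same markup for name = "world", but only the second
// exposes "Hello {name}" as one translatable unit.
```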
URLs in JavaScript¶
JS code often contains URLs or pathnames that are referred to elsewhere when making requests (both for intra and extra-domain Resources). Since the proxy will remap the project domain according to its own preview and publishing settings, it is possible that the JS URLs will point to the wrong place.
Things to Note
- Fully qualified, non-concatenated URLs can be remapped via a tweak available in Advanced settings.
Functionality¶
HTTP and HTTPS¶
If you notice that a site uses HTTPS, pay attention. Since the proxy preview domains are certified themselves, you can go a long way without realizing that the Live target language sub-domains will also require certification. Procuring an appropriate SSL certificate takes time, and it can become blocking in the Publishing phase.
Things to Note
- An SSL certificate for the published sub-domains is necessary.
Site Search¶
Most large-scale websites nowadays have a site search functionality that requires additional work on the proxy end of things to integrate appropriately.
Things to Note
- Evaluation of the Site Search functionality increases project setup time.
- The proxy supports Google’s Custom Search Engine. The integration has to happen via JavaScript page modifiers to maintain the original look and feel of the site - this, however, means coding work.
Language Selector¶
Many sites already have some sort of localization solution in place. Investigate the current language selection mechanism for potential issues.
Things to Note
- If a language selector is already present, it might require some tweaking on the original server to better handle the link mapping that happens over the proxy. The __pTNoremap class needs to be added to href attributes that should not be remapped.
- If there is no selector, one will have to be implemented, or one of the default proxy selectors used. This requires action from someone who has access to the source on the original server - these changes can increase the time it takes for a project to go live.
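If the class cannot be added on the original server, a JS page modifier could add it client-side. A minimal sketch follows; the a.lang-switch selector is an assumed example of the site's own markup, and only the __pTNoremap class name comes from the documentation above.

```javascript
// Hypothetical page-modifier snippet: mark language-selector links so the
// proxy leaves their href attributes unremapped.
function markNoRemap(links) {
  links.forEach(function (link) {
    link.classList.add("__pTNoremap"); // tells the proxy: do not remap this href
  });
  return links;
}

// In the browser, this would be called as:
// markNoRemap(Array.from(document.querySelectorAll("a.lang-switch")));
```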
Plugins¶
External i18n Modules¶
Closely related to the Language Selector, the presence of i18n modules can make life easier or more difficult, depending on what you can do with them. Consider whether these modules can be reused in the context of the proxy or if new translations will have to be provided.
Things to Note
- Is the i18n module part of minified code? Is it dynamic content? These are usually extremely difficult to reliably reuse.
User Sessions & Logins¶
Sites often have sections that require login credentials. The proxy can handle these sections if it is passed the appropriate session cookies.
Things to Note
- Extracting session cookies is an additional step that needs to be executed for each Scan/Content Ingestion Cycle.
Query Parameters¶
Sometimes, the project page list will contain many instances of the same path with different query parameters after a Discovery. The thing about query parameters is that different values do not necessarily imply unique content. For example,
https://example.com/products?sort=asc
https://example.com/products?sort=desc
could plausibly serve exactly the same content, only in different order. Conversely, it could also be requesting an entirely different set of data to use in the browser. This requires careful combing of the site to make sure that all necessary content is extracted, but no superfluous pages are kept.
Things to Note
- The proxy can ignore or group specified query parameters, but figuring out which ones to keep and which ones to throw away will add to project setup time.
- Consider also that independently made changes to the original site might instantly deprecate your current project settings - this also has implications for the maintenance phase of a project.
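The effect of grouping or ignoring query parameters can be sketched as a normalization step (an illustration of the idea, not the proxy's actual implementation; the parameter names are made up):

```javascript
// Two URLs that differ only in an ignored parameter collapse into the
// same page entry once the ignored parameters are stripped.
function normalizeUrl(url, ignoredParams) {
  const u = new URL(url);
  ignoredParams.forEach((p) => u.searchParams.delete(p));
  u.searchParams.sort(); // stable ordering for comparison
  return u.toString();
}

// normalizeUrl("https://example.com/products?sort=asc", ["sort"]) and
// normalizeUrl("https://example.com/products?sort=desc", ["sort"])
// both yield "https://example.com/products".
```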
Dynamic Content¶
XHR, AJAX, XML and JSON¶
Things to Note
- Dynamic content has to be extracted manually via the Preview.
Webshops¶
Single-Page Applications (Angular, etc.)¶
Layout & CSS¶
Word lengths¶
English and German, for example, differ considerably in their average word length. While not normally a problem in the case of paragraphs, this can become an issue if some part of the site layout is too tight (for example, a navigation header with widths declared in pixels). If the layout bakes in too many assumptions about the amount of content that has to fit in a given element, it may need some tweaks.
Other times, enclosing elements (such as border boxes) will change their size based on the amount of text they contain. This potentially requires some target-language specific tweaking of the layout via CSS/JS Page modifiers.
Things to Note
- Some things can be made to fit via CSS tricks and Page Modifiers. In other cases, only rewording will help if extensive reworking of the site layout is not an option.
- The opposite can also happen: Chinese terminology, for example, can be considerably shorter than its German counterpart. Too much room is less often a problem than too little, but it is useful to keep in mind nevertheless.
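As an illustration of a target-language-specific tweak, a JS page modifier could emit different CSS per language. The selector, sizes, and language keys below are all invented for the example:

```javascript
// Sketch: pick a navigation font size based on the target language,
// since average word length varies considerably between languages.
function buildNavCss(lang) {
  const sizes = { "de-DE": "13px", "zh-CN": "16px" }; // assumed values
  const size = sizes[lang] || "15px";
  return ".nav-item { font-size: " + size + "; white-space: nowrap; }";
}

// buildNavCss("de-DE") → ".nav-item { font-size: 13px; white-space: nowrap; }"
```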
Text Direction¶
Some target languages, such as Hebrew or Arabic, require that the site be redesigned from Left-To-Right to Right-To-Left via CSS rules/page modifiers/framework-specific RTL libraries. This usually means an extensive reworking of the site structure using development tools.
Not only do the CSS/JS parts of RTL redesign (which have to be done before the project is published) take a long time, there are some parts of a site that require even more detail work. Images with specific directionality (a left-pointing arrow, for example) are one such case.
Things to Note
- Keep track of UI elements that have explicit directions (such as images).
- Interactive elements and vendor plugins usually mean many hours of work to get everything just so. Be extra careful!
Responsive Layout¶
Responsive site design is a compounding factor for all extant layout issues. Websites are expected to work with a wide variety of screen sizes and browser settings, and you have to take this into account. Use Chrome DevTools' device mode to gain an understanding of site behavior over the proxy (and in general).
The Workbench¶
Introduction¶
The Workbench is a CAT tool in the cloud.
If the Dashboard is our take on the project management side of website localization, then the Workbench is the analogous CAT tool you can use without ever leaving the browser window.
In this chapter, we take a close look at all the various features available on the Workbench. The explanations pertaining to the various elements and functionality of the Workbench screen are grouped into thematically linked subsections. Use the sidebar to navigate to the subsection of your choice.
Opening the Workbench¶
You can access the Workbench anytime after adding at least one target language.
There are a few ways to enter it:
- In the Dashboard menu, see the Languages part. If you hover over any of the target languages, a menu bar will be displayed to the right. Click on “Translate in List View”...
- Click on “Manage segments” in the Content menu...
- Enter the Page list and hover over any entry to display a menu to the right. Click on “Translate in List View”...
... and the Workbench will open in a new tab.
Moving Around¶
The Workbench has a single viewport, so every feature you’d need to navigate between pages and segments is always just a couple of clicks away.
You may switch between pages, search for text, filter for segments based on a variety of metadata, such as Approval state, containing block element, translation source and so forth. This section deals with the various ways to do that.
Workbench Page List¶
A segment is tied to the specific page it was found on. You can use the Page list dropdown right next to the logo to get a list of all pages currently within the scope of translation.
You can click on any page entry in the list to visit that page and get an overview of all segments associated with that page. Websites can get very large with a huge list of pages - use the search field embedded in this dropdown to locate a specific page (you may use regular expressions with the usual format of including them between two slashes like this: /[regular expression]/
).
There are three options at the top of the dropdown that bear special mention:
Show All Entries¶
Most of the time, you will want segments displayed for a specific page, but you may also use this option to get an overview of all segments across all pages.
WARNING! Only List View is available in this view; all other View buttons will be unavailable!
The All Entries list doesn’t flood your browser with every last segment all at once: segments will be loaded in batches of 500. The Workbench will automatically fetch a new batch of entries as you scroll down.
Show Pending Entries¶
By default, Scans will pick up new entries in “Approved” state, which in this case means “Approved for Translation”: immediately available for translation. You can change this default behavior by going into the Dashboard and setting it in Advanced settings to either Pending or Excluded.
By clicking on “Show pending entries” in the page list, you can display all entries that are currently waiting for approval. In their current state, they will not be included in exports unless the relevant option is selected at export time, and will not appear for translation unless filtered for specifically.
Project or backup owners, or users with the Customer role can move these into either one of the other two states, by approving them or excluding them entirely from the scope of translation.
Show Swap Entries¶
Swap entries are those segments that have had the “EL_swap” class added to their enclosing tags on the source site.
They are special in that they are added to the Workbench without processing their tags. They will be displayed verbatim, allowing you to edit the source content markup directly. That content will be sent as-is by the proxy for each request.
Exercise caution when editing swap entries: all responsibility for rendering them successfully and safely is delegated to the requesting browser.
Filters¶
There is a comprehensive assortment of filters available. Click on the Filters icon in the toolbar to get an overview of all available filters:
Use the checkboxes to define your filtering settings in the dialog and click on “Set Filters”.
The dialog will close and a new element will appear in the toolbar, indicating that user-defined Filtering settings are currently active, and the segment list is updated accordingly. Click on the “X” to disable filtering. You may also click anywhere else on the toolbar indicator to open the Filters dialog again and fine-tune your settings.
These filters work with the various types of metadata associated with an Entry (such as currently assigned workflow role, enclosing block element in source, approval state or source of translation), not content - use the Search functionality to filter for source or target text.
Searching for segments¶
Use the full-text Search field in the upper-right hand corner of the screen to search for segments. When you click on the Search field, the Workflow part of the black menu bar will be superimposed with a search field right above the Viewport.
Normal full-text search and regular expressions are supported. The normal search can be a bit misleading, as it is a forward-from-word-boundary search. If, for example, you’d like to find "translation" in a source segment, a search query for "translatio" will turn up all segments containing any word that starts with that exact string (and likely ends with n, ns, or nese, perhaps). Searching for "ranslation" only, however, will return zero matches.
Use regular expressions to search between word boundaries: to extend the previous example, a query for /.ranslation/ will show all segments that contain any character (except newline) exactly once, followed by the literal string ranslation. As with the Page list, a string is interpreted as a regular expression if you enclose it between slashes.
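The two search modes can be modeled roughly like this (a simplified illustration, not the Workbench's actual matching code):

```javascript
// Plain queries match forward from a word boundary; queries wrapped in
// slashes are treated as regular expressions.
function matchesQuery(segment, query) {
  const regexForm = query.match(/^\/(.*)\/$/);
  if (regexForm) return new RegExp(regexForm[1]).test(segment);
  const escaped = query.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
  return new RegExp("\\b" + escaped).test(segment);
}

// matchesQuery("a translation", "translatio")    → true  (word-initial match)
// matchesQuery("a translation", "ranslation")    → false (not word-initial)
// matchesQuery("a translation", "/.ranslation/") → true  (regex crosses the boundary)
```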
You may select between displaying segments or whole entries using the radio buttons next to the Search field.
Closing the Search Field¶
It is important to remember that the search field is also a filter: as long as it is active, segments will be filtered based on its contents regardless of being in All entries view or on a specific page.
If you wish to restore full view of segments, clear the search field and send an empty search. Use the close button next to the Search display options to close the search field.
If a search string is still present, it will be preserved and displayed in the upper right corner Search field in orange, like this:
Deletion of Segments¶
You might be wondering what this has to do with navigation, but executing a regex search reveals another feature of the Workbench you might ordinarily look for someplace else: that of deleting segments. If you click on the Magnifying glass icon while the regex search is active (a bottom-facing triangle will indicate availability), the dialog above will open.
The non-discoverability of this feature is premeditated. Skulls & Bones warnings would generally apply to any situation where the words “delete” and “regex” are found in the same sentence. That being said, to give you a measure of peace of mind, deletion of segments is not as final as we seem to make it out to be: when TM Freeze is disabled, you can re-add segments anytime by Scanning or visiting in Preview the page that contains the deleted segments.
But re-adding them counts as new words each time. So, if for no other reason than to avoid unnecessary expenses, be careful about deleting segments. Buyer, beware!
Preview¶
By clicking on the “Eye” icon, you can visit the temporary domain to check your translations in their original context. In the “All Entries” view, this function is disabled if no segment is selected - without picking a segment, the Workbench has no information on which page to show you. Otherwise, if a Page view is open, the selected page will be loaded.
If you select a segment in All Entries view, the Preview proxy will open on the page where it was seen by the crawler for the first time.
The icon on the Workbench will take you to the Preview mode, but there are a few different Proxy modes available besides that. See here for details.
There is more than one way of looking at content on the Workbench, and the default, the List View is only one of them. In this section, we go over the various in-context ‘Views’ you can access from the Workbench and use to edit your translation while making sure that it is behaving exactly as it should in the original context.
Views¶
List View¶
While not necessarily the most impressive, the List View is certainly one of the most useful views on the Workbench. You can use it to go over each segment being translated and edit, filter and search for any subset.
The List view provides various features to use with each entry. The currently selected Entry is highlighted in yellow, while hovering highlights the Entry under the cursor in blue. The presentation of Entries is clear and simple, yet there are a variety of features you can use with each. Let’s take a detailed look at an entry and see what each part of a line does.
Anatomy of an Entry¶
- Select Checkbox: check this box to select the entry or segment in question. You may use the “Bulk Actions” icon to batch process selected entries (i.e. confirm, exclude or approve them)
- Source Entry: contains the text that was found within a given block element on the source site. You may set the text direction using the “Align source text” icon in the toolbar. Otherwise, the Source Entry is not editable.
- Entry No. + Containing Block Element: use the “Go to Segment” icon to jump to a segment with a given number. This number is assigned to entries in order of arrival (new entries can be found at the bottom of the list). Additionally, the tag that contained the text on the original site is displayed below it (useful when you want to identify a segment in the HTML source).
- Target entry: The translation, provided by a variety of sources: Manual Editing, Translation Imports, Machine Translation or Translation Memory.
- Lock Segment: Prevent any changes from influencing the current content of the Target Entry. Especially useful when you want to run batch processes on your segments, but you want to exclude an entry from the scope.
- Comment on Segment: Use this icon to add comments to an Entry either as a note-to-self, or as part of a collaborative translation effort. All comments have a checkbox next to them, allowing you to mark them as settled.
- Chain link: indicates that the segment is repeated verbatim (102% match) in the current view. Click on the icon to jump to the next repetition.
- Confirm Tick: click on the tick to Confirm the segment for the current workflow and send it to the next stage. Confirmed entries remain editable as long as they are unedited by the next workflow role.
- Workflow Status Indicator: Display the current workflow state of the segment. Note that this might differ depending on which user in which workflow role is currently looking at the segment.
- Flag: Display current translation source.
Workflow Tags¶
Depending on what Workflow role you’re currently in, the following Workflow tags will be displayed for each segment, influencing their availability for editing:
T - Translator
P - Proofreader
Q - Proofreader 2 (Quality Check)
C - Customer (or Client)
See the section on Workflow roles for a detailed description of the various workflow roles.
Flags¶
Ordering Segments in List View¶
Use the “Order by” dropdown to alphabetize the target or source entries or reverse the order of segments based on their Workbench ID. The following Ordering methods are available from the dropdown:
- ID, lower first (default)
- ID, higher first (use to quickly navigate to new segments)
- Source A → Z
- Source Z → A
- Translation A → Z
- Translation Z → A
Highlight View¶
The highlight view is a true in-context editing view that makes the Workbench popular with Translators, and the solution to the problem of adequate context during website localization.
By selecting an Entry in List view and clicking on the Highlight View on the Workbench, you will be shown the text on the original webpage
You may click on any part of a website and have a highlighting frame appear around that segment. At the same time, the editing box below will jump to the segment in question, where you can add/edit your translations in-place.
Really, the Highlight View is simplicity itself with very little in the way of hidden gotchas. Select a page, point & click, and translate away!
But keep in mind that while you are using the Highlight View with a page, links will be unclickable - use Free-click View to navigate on the original site from within the Workbench.
NOTE The Highlight view is a wonderful tool we are very proud of, but don’t forget that much of the textual content of a website is not clickable. Check the other modes and the Preview to make sure that everything is covered!
Free-Click View¶
The Free-Click View is much like the Highlight View, but it allows you to use the links on a site to visit other parts of the website. Alternate between Free-click view and Highlight View to translate segments as you explore the website.
Free-click view will offer to reload your segment list based on which page you are on. The following dialog will be displayed.
Highlight View will only work if the segment list is that of the current page.
Pop-out View¶
The Workbench is a single-viewport application, and the Pop-out View is a feature meant to allow you to have the Highlight View and the List View on your screen at the same time. Select an Entry from the list and click on the Pop-out view icon to open a Highlight View in a separate pop-up window.
Modern browsers block popup windows by default, so you will most likely have to enable popups for the Workbench for this feature to work.
Translation of Segments¶
We are getting to the primary purpose of the Workbench, which is of course, translation of text. In this section, we’ll talk about the different ways of doing that.
It bears mentioning right off the bat that the Workbench is not meant to selfishly yank you out of your CAT tool of choice. As the proxy supports exporting segments in the industry-standard XLIFF format, you can always employ any tool of your preference - SDL Trados and memoQ, for example, being two major players in the field. The choice is yours.
Nevertheless, you will find that the Workbench is closer to where the website really is - up there in the cloud, and therefore immensely useful during the editing process. You’ll see it truly begins to shine when your XLIFF files have done their first round-trip to and from external CAT tools.
There are multiple ways of translating content on the Workbench:
- manual Translation using the Target editor
- using Translation Memories
- using Machine Translation
- importing translated XLIFF files
The following sections provide an overview of the translation methods listed above.
Editing segments¶
Editing of segments happens in the Target editor located on the bottom of the Workbench:
You can use the Highlight View or the List View to select a segment, and the editor will reflect your selection. The Source entry is displayed to the left, the translation (Target) to the right. Only the target is editable.
Editing Window¶
Editing translations on the Web means going around/avoiding/leaving intact the forest of HTML tags that the text is usually embedded in. The Workbench abstracts away these markup details to ease working with text (the first and foremost task of translators), but doesn’t, and will never attempt to, hide the fact of their existence.
Tags are represented as numbered grey widgets around certain parts of the text, which you can use as a yardstick to place your translations in the appropriate tag context without having to worry about what those tags actually do.
Two things:
- You can Drag & Drop the Numbered Tags
- You can NOT delete them
Adding translations is otherwise straightforward text input. Untranslated entries will contain the Source text as a placeholder until such time as a change is made to the contents of the Target, at which point both the List View and the Segment contents are updated.
There are a few smaller buttons in various parts of the Target editor. In the middle, you’ll see these three buttons:
These contain editing functions that relate the contents of the Source and the Target in different ways. Use the top “Equal” button to copy the Source contents to the Target, rendering them identical. You can use the middle, “Tag” button to copy the tag structure of the Source to the Target. The lower “Eraser” button will delete the translation and restore the placeholder.
There is another set of four buttons to the right:
They all deal with the Target. In descending order, these are:
- Toggle sidebar: use this to hide the Suggestions & History sidebar above the button.
- Preserve Whitespace: prevent trimming of whitespaces in the segment
- Insert non-printing characters: there are a number of non-printing characters you might need during a translation project, especially if you are dealing with languages that have a Right-to-Left writing direction, such as Arabic. See the section on RTL conversion for further details.
- Split segment from group
Saving changes¶
Click on “Save & Next” to save your work on the given Entry and jump to the next segment. You can also use Ctrl+Up or Ctrl+Down to do this. This is an explicit, although not strictly necessary action, as any edits are automatically saved upon leaving a page, or otherwise after 60 seconds of inactivity on the Workbench. Any navigation generated by the Workbench itself will also trigger a flush of all unsaved segments.
Automatic Translation¶
There are a number of Auto-translation features that you can employ to ensure quality on the proxy.
The option to run pre-translation or set it up to run automatically is also available on the Dashboard. Here, we discuss the options available after clicking on the “Pre-translate” icon in the toolbar:
The following dialog is displayed:
Pseudo-translation¶
This function is admittedly not translation in a meaningful sense, but string-reversing each and every word of every segment on the spot is very effective during a demo. Use it, then go to Preview to demonstrate to the client that website localization is as painless as using CAT tools in tandem with the Proxy. Good for the wow!-factor. You can also use pseudo-translation to test the various editing features available on the Dashboard and the Workbench.
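The idea can be captured in a few lines (a toy model of the behavior described above, not the proxy's implementation):

```javascript
// Pseudo-translation: reverse each word while keeping word order,
// so translated-looking text appears instantly in the Preview.
function pseudoTranslate(segment) {
  return segment
    .split(" ")
    .map((word) => word.split("").reverse().join(""))
    .join(" ");
}

// pseudoTranslate("hello world") → "olleh dlrow"
```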
Translation Memory¶
If you have a populated Translation Memory on a project, you may use its contents to translate segments. Use this feature to translate your content with a preset match threshold.
Machine Translation¶
You can choose to Machine Translate the currently selected batch of entries using one of the available MT APIs (Google Translate, Bing Translate, iTranslate4u and GeoFluent).
Translation Memory and Machine pre-translation are both reproductions of the options accessible on the Dashboard, with the added functionality of being able to control which specific group of segments should be targeted by the process.
Search & Replace¶
The Workbench has a Search & Replace feature you can access by clicking on this icon in List View:
It always operates on the currently listed set (or subset) of editable segments. The dialog that opens will look like this:
Both ‘Search in target’ and ‘Replace in target’ values are required, while ‘Filter by source’ is optional. You can use either simple strings or regular expressions as search terms (simple strings are the default, check the ‘Regex’ option next to the field to use a regular expression). The function is case-sensitive.
Choose ‘Test run (no changes will be made)’ to check how the operation would affect your translation. To see the entries affected by replacement, check the ‘Preview changes on a few segments first’; the segments will appear in the Preview area (Preview can’t be used without Test run). You can also start replacement right away, without preliminary checking.
Once the operation is ready, you will receive an e-mail notification with a detailed report on the entries affected.
Please note that the replacement is done in the database, not in the TM. If you want your TM to reflect these changes, you need to run ‘Populate TM’ from the Dashboard.
Keep in mind that the operation cannot be undone!
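The replacement semantics described above can be modeled as follows (an illustrative sketch; the real feature operates on the project database, as noted):

```javascript
// Case-sensitive replace over a target string; plain search strings are
// escaped before being compiled, while regex queries are used as-is.
function replaceInTarget(target, search, replacement, isRegex) {
  const pattern = isRegex
    ? new RegExp(search, "g")
    : new RegExp(search.replace(/[.*+?^${}()|[\]\\]/g, "\\$&"), "g");
  return target.replace(pattern, replacement);
}

// replaceInTarget("colour colour", "colour", "color", false) → "color color"
// replaceInTarget("Colour", "colour", "color", false) → "Colour" (case-sensitive)
```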
History¶
The proxy keeps tabs on what happens to each Entry in the project timeline, and the Workbench displays these tabs in the sidebar. You can use these to access previous editing states of an Entry. Here follows a short description of the History functionality.
Translation Memories¶
If you select a segment, TM suggestions will be displayed for it in the sidebar tab labeled “Suggestions”. Click on any one of them to add it as a translation for the given Entry.
A Search field is also provided that you can use for concordance lookups.
Segment History¶
Whether a result of a manual or an automatic edit, each saved state of an Entry will be saved with a username and a timestamp in the Entry history. You can access it in the sidebar tab labeled “History”.
This means you don’t have to worry about ever losing translated content as a result of manual edits - you can always restore a previous state of an Entry by selecting the Entry in List View or Highlight View, going to History and copy & pasting a previous state of your choosing.
Collaboration¶
Collaboration is a must with website localization. If you have used the Dashboard’s Sharing settings to invite other people into the project and granted them the appropriate editing rights, the list of segments in the Workbench will look a bit different to each user.
Workflow Roles¶
As mentioned previously, Roles are predominantly a project management feature associated with work on the Workbench. To reiterate, there are four different roles:
T - Translator (default)
P - Proofreader
Q - Proofreader 2 (Quality Check)
C - Customer (or Client)
There are four different workflows on the proxy you may employ. You may set these on the Dashboard.
- Simple Translation Workflow (T)
- Translation + Proofreading (TP)
- Translation + Proofreading + Client Approval (TPC)
- Translation + Proofreading + Quality Check + Client Approval (TPQC)
Each setting will activate the necessary roles, which the Owner or Backup Owner may assign to any project participant. By default, only the Translator role is required. Owners have access to all workflow roles.
Workflow Roles in Action¶
Use the Workflow role dropdown in the toolbar to switch between the available Workflow roles:
Take TPQC, the workflow with the most participants, for example.
- Each approved segment is assigned to the Translator role.
- When finished with a segment (either through manual edits, automatic translation or via XLIFF importing), the translator clicks on the Confirm tick to declare that segment cleared for that phase and send it to the next role, the proofreader.
- The proofreader (and pretty much everyone else) may use Filters to display only those segments that are assigned to their role. They take the segments sent by the Translator and edit them, changing their wording as required. When finished with an entry, the proofreader clicks on the Confirm tick, sending the segment along to Quality Check.
And so on. This cycle is repeated until a segment (more precisely, all segments) reaches the final workflow role, that of the Customer, who approves translated entries.
A few things to keep in mind:
- Each Role has access to the lists of upstream roles.
- Only Owners, Backup Owners and Project Managers have access to all roles.
- Entries/segments belonging to another role are greyed out.
- A segment remains available for editing after Confirming it just as long as it is not touched by the next Workflow role. If you ever mistakenly Confirm an entry, you may, so to speak, reclaim it for some more work before the next role can get to it.
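The TPQC cycle described above can be thought of as a small state machine. The sketch below is purely illustrative: the class and method names are invented for this example and are not Easyling’s actual API.

```python
# Conceptual model of the TPQC workflow: a segment starts with the
# Translator and each Confirm sends it to the next role downstream.
WORKFLOW = ["T", "P", "Q", "C"]  # Translator, Proofreader, Quality Check, Customer

class Segment:
    def __init__(self):
        self.role = "T"       # new segments are assigned to the Translator
        self.approved = False

    def confirm(self, as_role):
        """Confirm the segment in the given role, passing it downstream."""
        if as_role != self.role:
            # segments belonging to another role are off-limits (greyed out)
            raise PermissionError(f"segment currently belongs to role {self.role}")
        i = WORKFLOW.index(self.role)
        if i + 1 < len(WORKFLOW):
            self.role = WORKFLOW[i + 1]   # send to the next role
        else:
            self.approved = True          # Customer approval is final

seg = Segment()
for r in ["T", "P", "Q", "C"]:
    seg.confirm(r)
assert seg.approved
```

Note how a segment remains editable by a role only while it is still assigned to that role, which mirrors the “reclaim before the next role touches it” behavior described above.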
And that’s about it!
Work Packages¶
If Workflow Roles is a method of grouping your users, then Work Packages are a method of grouping segments. See the Dashboard chapter for the details of how to generate them.
Use the Work Package dropdown to select a Work package, and the Workbench will display only those Entries that belong to that Work Package (note that an Entry may belong to more than one Work Package!). The dropdown looks like this.
The only default entry in the dropdown is “All”, which disables Work Package-based filtering on the Workbench. As new Work Packages are generated, this list updates automatically after refreshing the window. The dropdown always contains the names of the 100 latest Work Packages.
Clicking on “Manage workpackages...” will take you to the Dashboard where you can tend to your existing Work Packages or generate new ones.
Cookbook¶
Search & Replace¶
Tutorial 1: Fix Spelling (String Replacement)¶
A simple use case of the search & replace feature is the old chestnut: the differences between British and American spelling rules.
This could come up whenever you are working on both the `en-US` and `en-GB` locales, or if two translators, each on a different side of the pond, forgot to coordinate their spelling.
Let’s say you have the following targets with German as a source language:
The world's No.1 donut
Vanilla donut
Chocolate-chip donut
Doughnut miss it!
and so on. Replace all instances of the word `donut` with the `doughnut` variant by following the steps below.
- Click on the Search & Replace icon.
- Fill in the `Filter by source` field to work exclusively on those entries that contain the given string in the source language. This example is about the spelling of “donut”, so you would enter the German original, `Krapfen`, to limit the search. Rest assured: source entries are never changed.
- Enter the word that you’d like to replace in the `Search in target` field, in this case, `donut`.
- Enter the replacement, `doughnut`, in the `Replace in target` field.
That’s all there is to setting things up, the rest is about making sure your changes will not cause any problems.
- Click on Preview! to see your changes applied to a subset of segments.
- If the contents of the Preview area look good, uncheck the Preview checkbox.
- (Optionally) do a Test run by clicking on “Go!”.
- Uncheck the `Test run` checkbox and click on “Go!” to really apply the replacement.
Depending on the number of segments, the process can take some time to finish. You will receive an e-mail for both the Test and Live modes, containing a list of proposed (Test) or applied (Live) changes.
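The filter-search-replace logic described above can be sketched in plain Python. This is a conceptual model only, not the proxy’s implementation; the function name and entry structure are invented for illustration.

```python
# Illustrative model of Search & Replace: the source filter narrows
# the scope, replacements apply to targets only, and a test run
# proposes changes without modifying anything.
def search_replace(entries, filter_source, search, replace, test_run=True):
    """Return proposed (old, new) target pairs; apply only if test_run is False."""
    changes = []
    for entry in entries:
        if filter_source not in entry["source"]:
            continue                          # source filter limits the search
        if search in entry["target"]:
            new_target = entry["target"].replace(search, replace)
            changes.append((entry["target"], new_target))
            if not test_run:
                entry["target"] = new_target  # sources are never changed
    return changes

entries = [
    {"source": "Krapfen", "target": "Vanilla donut"},
    {"source": "Brot",    "target": "Fresh bread"},
]
proposed = search_replace(entries, "Krapfen", "donut", "doughnut")
# Test run: a change is proposed, but nothing is modified yet.
assert proposed == [("Vanilla donut", "Vanilla doughnut")]
assert entries[0]["target"] == "Vanilla donut"
```

Running the same call with `test_run=False` corresponds to the final “Go!” step that really applies the replacement.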
Tutorial 2: to-do (Regex Replace)¶
Maintenance¶
Project maintenance¶
The translation project of a website is a continuous task, as new content is regularly added to the site. So uploading the initial translation is not the end of the project, but the beginning of a new phase. This phase is practically a repetitive cycle of the following activities, many of which can be automated:
- checking the site for new content
- extracting new content for translation
- translation of the new content
- uploading the new translation
- and, of course, error fixing, if needed
Automation possibilities¶
The first two activities can be set to be carried out automatically at daily, weekly or monthly intervals, depending on how frequently content is updated on the website. This is called scheduled crawl.
If you turn on this feature, changes will be checked and retrieved for translation at the specified intervals, and e-mail notification will be sent to all project participants. The new content is available right away for translation in the online editor interface, and you can also download an XLIFF for translation.
Please note that this check is technically content extraction, so it has an associated cost of EUR 2 per 1000 words.
You can enable this option in Content > Settings > Look for changes. Of course, you can set it to Only manually, if the site does not update frequently.
The process can automatically extract new content, apply the associated translation memories and machine translation services, prepare a work package and the XLIFF export. This is particularly useful for fast-moving sites where content arrives quickly and time spent untranslated needs to be minimized (possibly at the expense of real-time human oversight).
Automatic pre-translation¶
Easyling can automatically pre-translate incoming content without user intervention or oversight, feeding its translation engine from saved translations (from a translation proxy Translation Memory) above a certain confidence level, or using configured machine translation engines (currently Google Translate, Bing Translate, iTranslate4EU, and GeoFluent).
If new content is encountered, and at least one source is configured, a user-configurable timer starts counting down. Content is collected during this timer, and automatically translated using the configured sources. At the end of the configured window, any content that cannot be translated with the assigned sources (no matches of the desired confidence are found in the TM or the MT-engine returns no translation) is packaged into an auto-generated Work Package and exported into an XLIFF file (being pushed directly to XTM, if an integration is configured). The resulting export can then be translated in any external system and imported back normally.
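The collection-window behavior described above can be sketched roughly as follows. The function and data shapes are invented for illustration and do not reflect the proxy’s internals.

```python
# Sketch: segments collected during the countdown window are split into
# those the TM/MT sources can translate (above the confidence threshold)
# and those left over for the auto-generated Work Package / XLIFF export.
def pretranslate_batch(segments, tm, threshold=99):
    """Return (translated, needs_export) for the collected segments."""
    translated, needs_export = {}, []
    for seg in segments:
        match = tm.get(seg)                  # hypothetical (translation, confidence)
        if match and match[1] >= threshold:
            translated[seg] = match[0]
        else:
            needs_export.append(seg)         # goes into the Work Package / XLIFF
    return translated, needs_export

tm = {"Hello": ("Hallo", 100), "Bye": ("Tschüss", 98)}
done, package = pretranslate_batch(["Hello", "Bye", "New text"], tm)
assert done == {"Hello": "Hallo"}
assert package == ["Bye", "New text"]
```

Lowering the threshold would let more fuzzy matches through automatically, at the cost of less reliable translations reaching the live site.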
Caching And The Proxy¶
Overview¶
Nearly all browsers today implement local caches to accelerate page loading and prevent unnecessary requests from being sent out to the network. However, the operation of these caches is tied to the presence of certain headers on the page, such as `Pragma` and `Cache-Control` - based on their presence and the values communicated in these headers, the browser (and various systems, such as CDNs) may make a decision to intercept the request and serve up certain content without requesting it anew from the server.
Normally, the Proxy simply forwards these headers, much the same way it does with any other. The option to override their presence and values exists (see the Path Settings option on the Dashboard), but by default, they are left unmodified, in the spirit of minimum invasion. This is not always desirable, however, as a site without such cache headers will remain uncached in the visitors’ devices, and each visit to the page will result in another request that is billed.
Inspecting Cache Headers¶
You can investigate how well a site may be cached using the Developer Tools in most major browsers. In Chrome, for instance, the DevTools can be summoned by pressing the F12 key (or Alt+Cmd+I on macOS), and after refreshing the page, the Network tab can be used to browse traffic associated with the tab. By selecting any entry in the list, you can view its details, in particular the request and response headers. To tell whether or not a given resource will be requested again, look at the “Response Headers” section for the keys `Cache-Control` and `Pragma`.
If you see `Pragma: private` and/or `Cache-Control: no-cache`, it is safe to say that the given resource will not be cached and each visitor will result in another hit. Files like this will likely prove resource drains if the site receives large amounts of traffic.
On the other hand, `Pragma: public` and `Cache-Control: public, max-age=\d+` (where `\d+` means one or more digits) are good signs: these files will be stored on the client’s device after the first request, will not be requested again until `max-age` seconds have elapsed since the last load, and will save resources in the long run. Of course, this also means that visitors may be seeing an “outdated” version of the resource for a limited time before their caches expire and are reloaded.
There is a bit of a gray area when seeing `Cache-Control: must-revalidate`: this directive allows the cache to make the final decision based on its own algorithm, and the response may be stored, but not necessarily. When you see this, it is good to prepare for potentially increased traffic, as the browser cache may or may not retain these responses.
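The header combinations discussed above can be classified with a few lines of code. The sketch below is a simplification for orientation only; real caches implement the full HTTP caching rules (RFC 9111).

```python
import re

# Rough classifier for the cache-header cases described above.
def cacheability(headers):
    cc = headers.get("Cache-Control", "").lower()
    pragma = headers.get("Pragma", "").lower()
    if "no-cache" in cc or "private" in pragma:
        return "uncached"        # every visit is a fresh, billed request
    if "must-revalidate" in cc:
        return "gray-area"       # the cache decides; traffic may vary
    m = re.search(r"max-age=(\d+)", cc)
    if "public" in cc and m:
        return f"cached for {m.group(1)}s"
    return "unknown"

assert cacheability({"Cache-Control": "no-cache"}) == "uncached"
assert cacheability({"Cache-Control": "public, max-age=3600"}) == "cached for 3600s"
assert cacheability({"Cache-Control": "must-revalidate"}) == "gray-area"
```

Pasting response headers from the DevTools Network tab into a dict like this is a quick way to audit a handful of resources.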
What to take away¶
If you’re experiencing consistently large numbers of page views on a given project, it is often a good idea to suspect caches (as opposed to search bots, which cause transient spikes). In such cases, you should inspect a few pages using the DevTool, and determine whether or not the site is set up to take advantage of browser caches.
If it turns out to be the case that the site is not set up to use caches, the best course of action is to notify the owners - perhaps they have a good reason for it. In any case, these headers should be added to the source, so that they provide consistent values across all languages. Alternatively, you can use the aforementioned Path Settings dialog to force the Proxy to override the cache headers and make the site cacheable, at the risk of diverging from the original, even if only for a short time.
Technical Reference¶
Architecture¶
Modularity¶
The Translation Proxy is built on Google’s AppEngine infrastructure, split into frontend and backend modules. Each module encompasses a variable number of instances, scaling automatically in response to demand. Modules are versioned and deployed separately, and can be switched independently if needed.
Frontend instances serve requests from visitors to translated pages (in addition to serving the Dashboard and providing user-facing functionality). Requests are routed to the Proxy Application via the CNAME records created during the publishing process.
Backend modules are responsible for billing, statistics aggregation, and handling potentially long-running tasks, like XML import-export. Backend instances are not directly addressable, and provide no user-facing interaction.
Underlying technologies¶
Immediately underlying the Proxy Application is the AppEngine infrastructure, responsible for rapidly scaling the deployed application. AppEngine also handles communication with the Google Cloud Datastore, a high-replication NoSQL database acting as the main persistent storage; as well as the Google Cloud Storage system, an also-distributed long-term storage. Logging is provided by Google Cloud Logging, while BigQuery provides rapid search ability in the saved logs on request.
Encompassing the entire application is the Google EdgeCache network, proactively caching content in various data centers located regionally to the request originator. Any content bearing the appropriate headers (`Cache-Control: public; max-age=\d+` and `Pragma: public` - both are required) is cached by the EdgeCache for as long as needed, for requests originating in the same geographic area.
The current instance of the Proxy Application is hosted in the US central region, as a multi-tenant application (serving multiple users, but dedicated to proxying). However, single-tenant deployments (dedicated to a single user), special deployments to other regions (EU or Asia), or internal systems fulfilling the AppScale system requirements can be discussed on a per-request basis.
Request Handling¶
In the Translation Proxy, frontend instances are responsible for serving translated pages. Thanks to AppEngine’s quick-reaction scaling, the number of frontend instances aggressively follows (and somewhat predicts) demand, keeping latency low. The general life cycle of a proxy request can be described as follows.
- The incoming requests, based on the domain name, reach the Google Cloud (rerouted via a CNAME DNS record pointing to `ghs.domainverify.net`).
- Based on the domain name and the deployed Proxy application, AppEngine decides that this specific request should be routed to the Proxy AppEngine deployment.
- The request reaches the Proxy Application internally; the application does a lookup against the domain for the associated project. There are special domain names, and the final serving domain, for which caching is activated.
- Based on the URL, the Proxy application determines the matching Page in the Proxy database. The database has a list of segments, pointing to our internal Translation Memory (TM). We retrieve all these existing Database entries, including the translations for the given target language.
- The Proxy application processes the incoming URL request, and transforms it to point back to the original site’s domain. Then, the source content of the translation is sourced, according to cache settings in effect on the project.
- If source caching is disabled, the application issues a request, and retrieves the result from the original web server, which is hosting the original website language.
- If source caching is enabled, a local copy (a previously stored version of the source HTML) is used, instead of issuing a request to the original web server.
- Depending on the `Content-Type` of the response, the appropriate `Translator` is selected, and the response is passed to an instance of the `Translator` as a document. The behavior of the `Translator` can be affected by cache settings as well.
  - If binary caching is disabled, the application builds the Document Object Model (DOM) tree of the result, then iterates through all the block-level elements and matches them against the segments loaded from the database. If there’s a match, the text is replaced with the translation. If not, it is reported as a missing translation.
  - If binary caching is enabled and the hash of the source HTML matches the one stored in the cache, a previously prepared and stored translated HTML is served.
  - If binary caching and keep cache are both enabled, and the hash of the source HTML doesn’t match the one stored in the cache, the proxy translates the page using the TM. If the number of translated segments is higher than in the previously prepared and stored translated HTML, the new version is served; otherwise the old one. (Keep cache can be thought of as a “poor man’s staging server”.)
- Hyperlinks in the translated content are altered by the `LinkMapper` to point to the proxied domain instead of the original. This affects all `href` or `src` attributes in the document equally, unless the element is given the `__ptNoRemap` class. At this point, resources may be replaced by their localized counterparts on a string-replacement basis.
- The application serializes the translated DOM tree, and writes it to the response object.
- Before final transmission takes place, the Proxy may rewrite or add any additional headers, such as `Cache-Control` or `Pragma`.
- Finally, the Proxy serves the document as an HTML5 stream, as a response to the original request. AppEngine must close the connection once the response is transmitted, so proxying streaming services is not possible in this fashion!
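The link-remapping step performed by the `LinkMapper` can be approximated as follows. This is a deliberately simplified model: the real proxy rewrites the parsed DOM, not raw markup with regexes, and the function name here is invented.

```python
import re

# Simplified model of link remapping: href/src URLs are re-pointed to
# the proxied domain unless the element carries the __ptNoRemap class.
def remap_links(html, original, proxied):
    def repl(m):
        tag = m.group(0)
        if "__ptNoRemap" in tag:
            return tag                       # exempted elements keep original URLs
        return tag.replace(original, proxied)
    return re.sub(r"<[^>]+>", repl, html)    # only rewrite inside tags

html = '<a href="https://example.com/about">About</a>'
out = remap_links(html, "example.com", "de.example.com")
assert 'href="https://de.example.com/about"' in out

keep = '<a class="__ptNoRemap" href="https://example.com/x">x</a>'
assert "de.example.com" not in remap_links(keep, "example.com", "de.example.com")
```

This illustrates why the `__ptNoRemap` class is the escape hatch when a link must keep pointing at the original domain.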
Classification of Content¶
The Translation Proxy distinguishes two main types of content: text content and resources. The key difference is that text content may be translated, while resource content is treated opaquely, and only references can be replaced as resource localization. It is possible to reclassify entities from Resource to Text, but not the other way around.
During proxying, resources are enumerated, and any already-localized references are replaced, while text content is passed to an applicable Translator
implementation for segmented translation.
Text content¶
By default, the Proxy only handles responses with `Content-Type: text/html` as translatable. To process HTML content, the source response’s content is parsed into a Document, then text content is extracted from the DOM nodes. Additionally, various attribute values are processed (without additional configuration, `title` and `alt`).
The content is then transformed into `SourceEntry` entities server-side. Each block element comprises one source entry, with a globally unique key. If segmentation is enabled on the project, the appropriate rules are loaded (either using the default segmentation or by loading the custom SRX file attached to the project), and the content is segmented accordingly, with the resulting token bounds stored in the `SourceEntry`. Along with the `SourceEntry` entities, the corresponding `TargetEntry` and `SourceEntryTarget` entities are created. `TargetEntry` entities, as the name suggests, hold the translations; `SourceEntryTarget` entities act as the bridge between the two, and hold the segment status indicators for both.
The content of source entries is analyzed in the context of the project, and statistics are computed. These statistics include the amount of repeated content at different confidence levels based on the similarity of the segment - The Translation Proxy differentiates five levels of similarity:
- 102%: Strong contextual matches: every segment in the block-level element (~paragraph) is a 101% match, where all the tags are identical. These matches do not result in the creation of new `SourceEntry` entities, thus changes in one place are propagated instantly to all occurrences.
- 101%: Contextual matches: both tags in the segment, and contexts (segments immediately before and after) match.
- 100%: Regular matches: the segment is repeated exactly, including all tags.
- 99%: Strong fuzzy matches: tags from the ends are stripped out, words lowercased, numbers ignored.
- 98%: Weak fuzzy matches: all tags are stripped out (may have to be adjusted manually afterwards!), words lowercased, numbers ignored. If the Proxy cannot match the tags between the translation and the source due to excessive differences, all tags are placed at the end of the segment, requiring manual review!
These classifications are reused during memory-powered pre-translation in order to select the best applicable translation or to propagate existing translations.
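The 99%/98% normalization described above (strip tags, lowercase, ignore numbers) can be illustrated with a small helper. The function name and exact regexes are assumptions for demonstration, not the proxy’s actual matching code.

```python
import re

# Sketch of fuzzy-match normalization: 99% trims tags from the ends,
# 98% strips all tags; in both cases words are lowercased and numbers
# are treated as interchangeable.
def fuzzy_key(segment, strip_all_tags=False):
    s = segment
    if strip_all_tags:
        s = re.sub(r"<[^>]+>", "", s)                   # 98%: remove every tag
    else:
        s = re.sub(r"^(<[^>]+>)+|(<[^>]+>)+$", "", s)   # 99%: trim edge tags only
    s = re.sub(r"\d+", "#", s)                          # numbers are ignored
    return s.lower().strip()

a = "<b>Order 12 donuts</b>"
b = "Order 15 Donuts"
assert fuzzy_key(a) == fuzzy_key(b)   # a 99%-style match
```

Two segments producing the same key would be offered as fuzzy matches of the corresponding confidence level; the 98% variant may still need manual tag adjustment afterwards, as noted above.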
Resource content¶
By default, any content with a content type other than `text/html` is treated as a resource, and is not a candidate for translation, only replacement en bloc. This mainly includes `application/javascript`, `text/xml`, and various `image/*` content types. Every resource can be given different replacements per target language, and if required, certain resources (`application/javascript` and `text/xml`) can be made translatable after pre-configuration is done. In this case, instead of references being replaced, the appropriate `Translator` will be instantiated and the content passed to it. This can enable partial or complete translation of dynamic content transmitted as JSON or XML.
Translation Memories¶
The Translation Proxy can be configured to maintain and leverage internal translation memories. These memories can contain more than one target locale allowing leveraging them for any pair of locales contained within.
As opposed to project dictionaries, translation memories are keyed to the user creating them, and can be assigned to any project on which the user has Backup Owner privileges or higher. Any project can contain an arbitrary number of memories, but one must always be designated the default: only this memory is written to when segments are saved, while pre-translation and suggestions are fed from all memories assigned to the project with applicable locale configurations.
Using TMs¶
Translation memories are initialized empty, and must be first configured with locales. After the target locales are defined, the memory can be populated. There are three ways a segment can be injected into the memory:
- TMX import: The Proxy can digest a standard `TMX` (Translation Memory eXchange) file and populate a designated memory based on its contents. The memory must be configured with at least one of the target locales of the `TMX` file. Duplicate segments are either merged (if for different locales) or discarded during import.
- Project population: The Proxy can populate the memory from the project it is currently assigned to. The memory must be configured with at least one of the project’s target locales for this to work. If there are several locales assigned to the memory, the UI will treat them as a set, and offer the intersection of the memory’s and the project’s locales as the default. This set can be further restricted by removing locales from the population task before committing it. This action is logged in the project’s Audit Log.
- Individual injection: If a memory is assigned to the project with at least one locale present on both, it will be available on the Workbench for use. Confirming one or more segments triggers the `saveToMemory` action, injecting the segment in its current form into the memory.
Memories are used for two tasks on the UI:
- Pre-translation tasks can leverage any memories assigned to the project, provided the memory is configured with the correct locale. This applies to user-triggered Pre-translation, as well as Automatic Pre-translation triggered by new content. Only content with confidence levels above the user-configured threshold will be used, matches with lower percentages are discarded.
- The Workbench automatically leverages any memories with the appropriate locales on segment selection. Matches are displayed in the Suggestions tab of the sidebar, along with their match percentages. Additionally, all memories on the project with the applicable target segments can be queried at will by entering a search term.
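How suggestions might be gathered from several assigned memories can be sketched like this. The data layout (one dict per memory, keyed by locale) is an assumption made for the example, not the proxy’s storage model.

```python
# Sketch: query every assigned memory for the selected segment and
# sort the hits by confidence, as the Suggestions tab would display them.
def suggestions(segment, memories, locale):
    hits = []
    for tm in memories:
        entry = tm.get(locale, {}).get(segment)
        if entry:
            hits.append(entry)               # hypothetical (translation, confidence)
    return sorted(hits, key=lambda e: e[1], reverse=True)

memories = [
    {"de-DE": {"Welcome": ("Willkommen", 100)}},
    {"de-DE": {"Welcome": ("Herzlich willkommen", 99)}},
]
best = suggestions("Welcome", memories, "de-DE")
assert best[0] == ("Willkommen", 100)
```

A pre-translation run would then take only the top hit above the configured confidence threshold, while the Workbench shows the full sorted list.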
Confidence levels¶
The Proxy differentiates five levels of similarity between individual segments/entries (see here). Memory application yields the best results between 101% and 99%; 98% matches disregard tagging and may need manual adjustment. Searching below 98% is also possible, using the Google Search API, but these matches should be used with caution, as there is no guarantee regarding their accuracy due to the Search API’s word stemming.
Page modifiers¶
Due to the way the Proxy Application operates, it becomes fairly easy to modify the pages as they are being served. Because the datastream must pass through the proxy to have the translation embedded, the Proxy Application can insert JavaScript modifiers, modify style sheets, and even embed entire pages that do not exist on the original.
- CSS Editor: the Proxy Application can be used to insert locale-specific CSS rules into the site being served. The rules are inserted as the last element of the `head` on every page served through the proxy. The most common use of this feature is to alter the writing direction for non-Latin scripts, such as Arabic.
- JavaScript Editor: the JavaScript edited here is inserted into the `head` element of every page being served through the Proxy Application. As the last element of the `head`, it has access to any global variables added by scripts before it.
- Content Override: the Proxy Application can create a “virtual” page in the site or override an existing one with custom code. For any requests to an overridden page, the corresponding remote server request is not sent, and the override contents are used as the basis of the translation. The source is not required to be HTML; custom content types can be entered, along with customized cache headers and status codes (HTTP status codes are restricted to those permitted by the Java Servlet class!) - note that the 300-family of status codes requires the `Location` header to be defined as well.
Both the CSS and JavaScript injectors can use already-existing files for injection instead of copied content. The injected files must be handled by the project in some way (either by being in the project domain, or in the domain of a linked project), or be created by a content override. The order of definition for these entries matters, as they will be inserted into the document in the order they are displayed on the UI, which may cause dependency or concurrency issues!
Troubleshooting & Support¶
Contacting Support¶
Website translation can be a complex task, even with the help of a piece of software like the proxy. Finding the root of the problem can be equally complex for those of us on support duty. Therefore, we would like to give you a few pointers on what information to supply if you decide to contact us.
The first thing we need is the project code. This eight-character string uniquely identifies the project. As you can see in the screenshot below, the project code is located in the address bar of your browser - you can copy the entire URL for us, but the project code by itself should suffice.
We also need a thorough description of the problem. Screenshots are tremendously helpful, especially if you have layout issues on the translated site. If you’re running into translation issues, please give an example segment along with the page link it can be found on.
If you have issues with importing XLIFF or TMX files, please attach them so that we can take a look. If you have questions about statistics, reports, or crawl logs, attaching or linking them in your query will considerably speed things up for us.
The information you provide will help us uncover the root cause of the problem. This often requires a bit of “detective work” on the original site, so we ask for your patience while we figure things out. Someone from the support team will respond shortly with a solution, a request for more information, or simply an update.
Issues¶
Scanning Content Behind Secure Login¶
To scan content behind secure login, please follow this procedure:
- Open your project and navigate to the Content menu.
- Open the Pages list.
- Visit the page with the login, if it is listed, and click Preview. OR: go to the Preview of the front page (the “/”, the first one on the Pages list); it will give you the front page through the proxy. OR: go to the address bar and type in the URL of the login-protected page.
- Enter your login details.
- Open your browser’s DevTools from the Menu (F12 on Windows).
- Go to **Network** and reload the page.
- Scroll up to the first item and click on it.
- Under Headers, scroll to the Cookie header (among the request headers), and copy the entire header.
- Pass it to the Proxy: go back to your project and click on Content. Paste the entire content in the Scanning cookies field.
- Click on Scan manually and specify the required scanning settings. You will receive an e-mail notification once scanning is ready and new content is available for translation.
XLIFF import error¶
When you import your translated XLIFF file back to Easyling, you will receive an e-mail notification when the process is complete. This mail contains the URL of the import log, and an overview of the log entries:
- Error:
- Warning:
- Info:
If you see anything other than ‘Error: 0’ in your notification mail, the XLIFF file needs fixing. Usually these are tag placement errors that can be easily fixed in a text editor like Notepad++ or Sublime Text (ones with syntax highlighting, to make the job easier), yet they do need attention, as the corresponding translation will not show up on the website.
- Open the XLIFF in your text editor
- Open the log file and check the error message(s)
- Do the necessary corrections in your XLIFF (see ‘Troubleshooting’)
- Save & upload the corrected XLIFF
In very serious cases the import might fail completely, but this is very rare. These cases include: attempting to upload an XLIFF belonging to another project, an XLIFF with a target language that doesn’t exist in the project, and fully invalid XML in the file. In most cases the file is imported, and only the faulty entries are omitted.
Please note that you need to upload XLIFF files. Ideally, the export format of the CAT tool should be the same as the import format: as you import an XLIFF file for translation, the output should also be a standard XLIFF file. However, some versions of Studio tend to create an SDLXLIFF file upon exporting the translation. In this case, simply use the “Finalize” batch task or open the document in the Editor, press SHIFT+F12 and select the target file location. This will create the XLIFF file for you (instead of SDLXLIFF).
You might also need to disable segment info storage in Studio (Options -> File Types -> XLIFF -> Settings -> ‘Do not store segmentation information in the translated file’ should be checked). This may require creating a new project.
Troubleshooting¶
Error: The xml structure has been changed so much that it is now unmappable from the source
Fix:
- Open both the XLIFF file and the error log in a text editor.
- Select & copy the TM-key of the faulty entry, the part after ‘(trans-unit id=”xxxxx)_tm:’ in parenthesis right after the error message
- Search for this key in the XLIFF file by pasting it into the ‘Find’ field. Only 1 translation unit will match.
- Compare the tags in the source and target languages, and fix the mismatch by editing the target text. (You can also use an online text comparison tool for this task: copy-paste `<source> … </source>` in one pane and `<target> … </target>` in the other.)
- Save the corrected XLIFF file and upload again. It should give no error message now.
OR, alternatively,
- Go back to your CAT tool, where you did the translation and open the faulty file for editing
- Run QA. It will list you all the tag mismatches
- Navigate to the faulty segment(s) and fix the tags
- Export the corrected file and upload it. It should give no error message now.
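A tag mismatch of the kind discussed above can be spotted before uploading by comparing the tag sequences of a trans-unit’s source and target. This is an illustrative helper only; a CAT tool’s QA check does the same job far more thoroughly.

```python
import re

# Extract the <g>/</g> inline tags of an XLIFF fragment so the
# source and target sequences can be compared directly.
def tags(xml_fragment):
    return re.findall(r"</?g[^>]*>", xml_fragment)

source = '<g id="1">Hello <g id="2">world</g></g>'
target = '<g id="1">Hallo Welt</g>'        # inner <g id="2"> pair is missing
assert tags(source) != tags(target)        # mismatch: fix before upload
```

If the two lists differ, the target text needs editing (or the segment needs fixing in the CAT tool) before the XLIFF will import cleanly.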
Error: Content found outside of outermost element
Fix:
Practically, this means that there is an extra space before the opening `<target><g*>` or after the closing `</g></target>`.
- Open both the XLIFF file and the error log in a text editor
- Select & copy the TM-key of the faulty entry, the part after ‘(trans-unit id=”xxxxx)_tm:’ in parenthesis right after the error message
- Search for this key in the XLIFF file by pasting it into the ‘Find’ field. Only 1 translation unit will match.
- Delete the extra space around the tags
- Save the corrected XLIFF file and upload again. It should give no error message now.
OR, alternatively,
- Go back to your CAT tool, where you did the translation and open the faulty file for editing
- Run QA. It will list you all the tag mismatches
- Navigate to the faulty segment(s) and fix the tags
- Export the corrected file and upload it. It should give no error message now.
IMPORTANT! Most of these issues can be avoided if the QA parameters of your CAT tool are set up properly and you run QA before exporting your XLIFF files. Please make sure to check your translation for tag consistency and extra spaces; these are critical errors in website translation that can break the code.
Error: Illegal character
Fix:
The reason for this error is usually an encoding mismatch: all our XLIFF files are exported using the world-standard UTF-8 encoding. However, your CAT tool may save the file using another encoding, depending on the language, which may cause certain characters to become invalid.
- Open both the XLIFF file and the error log in a text editor
- Select & copy the TM-key of the faulty entry, the part after ‘(trans-unit id=”xxxxx)_tm:’ in parentheses right after the error message
- Search for this key in the XLIFF file by pasting it into the ‘Find’ field. Only 1 translation unit will match.
- Check if you see any strange characters, like squares or other meaningless characters.
- Go back to your CAT tool and change the export options to use UTF-8 encoding. (As UTF-8 is a universal standard, it should be available.)
- Re-export the XLIFF with UTF-8 encoding and upload it. It should give no error message now.
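If you want to locate the offending byte yourself before re-exporting, Python's decoder reports the exact offset of the first invalid sequence. A minimal sketch (the function name is illustrative):

```python
def find_invalid_utf8(data: bytes):
    """Return the byte offset of the first invalid UTF-8 sequence, or None."""
    try:
        data.decode("utf-8")
        return None
    except UnicodeDecodeError as err:
        return err.start

# 0xE9 is "é" in Latin-1, but a lone 0xE9 byte is not valid UTF-8:
assert find_invalid_utf8(b"caf\xe9 from a Latin-1 export") == 3
# A proper UTF-8 export decodes cleanly:
assert find_invalid_utf8("café".encode("utf-8")) is None
```

Read the XLIFF file in binary mode (`open(path, "rb").read()`) and pass the bytes to this function to find where the invalid character sits.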
Publishing issues¶
Translated page doesn’t show up¶
Issue: I’ve just uploaded the translation of some new pages, but they don’t show up on the translated site / they still appear in the source language.
Fix:
You experience this problem because the Target cache is enabled, and you need to clear the cache for the updated content to be served. The very reason for using a Target cache is to mask unfinished translations and avoid bleedthrough. The Target cache serves the last fully translated version of each page - so if content changes on the original page, the change remains hidden on the translated site until a fully translated version is available.
To fix the issue, you need to explicitly delete the page from the cache so that the updated content is loaded in on the very first viewing of the page.
If you have several languages, it might be more convenient to clear the entire Target cache by clicking the Trash icon.
Translated page is not listed in Google search results¶
Issue: The pages served from the Proxy are not listed in Google search results.
Fix:
Google has never indexed the site in the first place, most likely because there are no “hreflang” links on any of the pages, so the Googlebot has no idea there are other pages to look for. More information on the element and how it affects Google rankings can be found at https://support.google.com/webmasters/answer/189077?hl=en
Additionally, creating and submitting a sitemap (more information at https://support.google.com/webmasters/answer/2620865) to Google to force an indexing of the pages can also help. Even so, without the hreflang attributes, some penalty may be applied to the rankings due to perceived duplicate content.
Using the hreflang Element¶
The GoogleBot will eventually find your translated page if there are any links to it. However, if the content there is not marked appropriately, it will not be given the same SEO scores as your main content. In fact, it may even be treated as duplicate content, and a scoring penalty may be applied.
To prevent this from happening, you need to provide the GoogleBot with information on how the translated sites relate to the original. The easiest way to do this is the <link rel="alternate" hreflang="" href=""> element.
These elements have to be placed in the page head (i.e. before the HTML body), and two rules must be satisfied in order for the GoogleBot to consider them:
- hreflang elements must be reciprocal: if a link points to a translated site, the translated site must point back to the original as well.
- hreflang elements must be circular: each language must also refer to itself with a link.
Consider the following HTML snippet from an imaginary site at http://example.com:
<html>
<head>
<title>Title Here</title>
<link rel="alternate" hreflang="en" href="http://example.com" />
<link rel="alternate" hreflang="ja" href="http://jp.example.com" />
</head>
[...]
and its translated counterpart at http://jp.example.com:
<html lang="ja-JP">
<head>
<title>Title Here</title>
<link rel="alternate" hreflang="en" href="http://example.com" />
<link rel="alternate" hreflang="ja" href="http://jp.example.com" />
</head>
[...]
This snippet will provide proper SEO, since it satisfies both criteria: the references between the English and the Japanese sites are reciprocal (one refers to the other and vice versa) and circular (both languages refer to themselves as well as their counterparts). This provides all the information the GoogleBot needs to index each site in its rightful place and apply SEO scores across both domains.
For more information on this topic, see this article from Google:
https://support.google.com/webmasters/answer/189077?hl=en
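The two rules above (reciprocal and circular) are mechanical enough to verify automatically. A minimal sketch in Python, assuming you have already collected each page's hreflang links into a dictionary (the function name and data shape are illustrative, not any Easyling API):

```python
def hreflang_ok(pages):
    """pages maps each page URL to its {hreflang: href} link dict.
    Checks the two GoogleBot rules: every page links to itself
    (circular) and every referenced alternate links back (reciprocal)."""
    for url, links in pages.items():
        if url not in links.values():
            return False  # not circular: the page must reference itself
        for alt in links.values():
            if alt != url and url not in pages.get(alt, {}).values():
                return False  # not reciprocal: the alternate must link back
    return True

pages = {
    "http://example.com":    {"en": "http://example.com", "ja": "http://jp.example.com"},
    "http://jp.example.com": {"en": "http://example.com", "ja": "http://jp.example.com"},
}
assert hreflang_ok(pages)

del pages["http://jp.example.com"]["en"]  # break reciprocity
assert not hreflang_ok(pages)
```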
WPEngine issues¶
Redirections during crawling¶
Due to WPEngine’s caching system and Redirect bots settings you might experience any of the following issues on your WPEngine hosted sites:
- Scan extracts outdated content from WPEngine cache
- Scan returns 301: Moved permanently error message for existing pages
- Translated page is redirected from HTTPS to HTTP - which results in an error due to mixed content
WPEngine caching uses different so-called buckets based on request type, and there is one for bots. If the request comes from Google or another listed user-agent, and/or the URL has ?ver= followed by a number, the redirect bots settings take effect.
The above issues can be resolved on the WPEngine site by turning off the redirect bots.
Intermittent HTTP403
on proxied pages¶
WPEngine automatically blocks traffic from “problem” IP addresses, typically those that generate large amounts of traffic in a short time. Due to the nature of the proxy, requests from several users may appear to have come through one IP, leading to WPEngine blocking that node due to their perception of “increased traffic”.
If that happens and you notice random pages of the proxied site intermittently failing to load, contact WPEngine Customer Service by phone or chat with the following message:
In relation to issue #874002, please enable proxy access on our installation.
According to an agreement with WPEngine, they will enable an alternate method of IP resolution that should no longer prevent access to the translated pages.
Captchas¶
Captcha doesn’t work on the translated site¶
This issue most likely results from page caching. Certain captcha solutions, such as WP plugins, hardcode the image URL into the HTML instead of requesting it asynchronously. During the crawl that builds up the Source cache for the project, one hash is saved, so it becomes static, while it should change on each occasion. As a result, the server rejects the request because of the outdated verification image.
Fix:
- Disable caching altogether to make the captcha work
- Use caching without the captcha
- Use another captcha system that retrieves the verification image with async requests, such as Google’s reCAPTCHA
Another possible cause is the CORS header. If the proxied page is not listed as an allowed origin, the browser blocks the request when the page tries to load the image.
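For illustration only: if the captcha image is served by the origin server, the fix is to add the proxied domain to the allowed origins there. A hypothetical Apache snippet (jp.example.com stands in for your proxied domain; your server and captcha setup may differ):

```apacheconf
# Hypothetical example - adjust the domain to your proxied site.
# Requires mod_headers to be enabled on the origin server.
<IfModule mod_headers.c>
    Header set Access-Control-Allow-Origin "https://jp.example.com"
</IfModule>
```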
FAQ¶
General¶
Where are my translations published?¶
Instead of “where?”, a better question to ask is “how?”.
Imagine the proxy as standing between the original site and a visitor’s browser. Publishing the Japanese translation of example.com on the jp.example.com subdomain means mapping jp.example.com (presumably owned by the owner of example.com) to point to the proxy.
Visiting jp.example.com/contact.html results in that request being caught by the proxy and relayed to example.com/contact.html - the origin server. The contact.html page is served as a response, which is caught on the way back, translated on-the-fly at blazing speeds and served to the visitor.
This requires that jp.example.com be mapped to the cloud proxy application in the owner’s DNS settings.
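As a hypothetical illustration, that mapping is usually a single CNAME record in the site owner's DNS zone (the proxy hostname below is a placeholder - always use the serving domain supplied with your project):

```text
; Delegates jp.example.com to the translation proxy.
; "serving.example-proxy.net" is a placeholder hostname.
jp    IN    CNAME    serving.example-proxy.net.
```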
Does the proxy host a copy of my site?¶
No. The proxy does not store any copies of the original site’s pages; it only stores translations, which it uses to process the responses the original site serves to visitors’ requests.
There is one exception to this principle: if a source cache is built and enabled for a proxy mode, that cached version of the page will be used in place of the origin server’s response.
Some parts of a site are on a subdomain. How will the crawler pick them up?¶
The sites www.company.com and blog.company.com are treated as separate domains by the crawler. From the vantage point of a crawler running on www.company.com, a path on blog.company.com is an external link and will be treated as such. The solution is to create two separate projects and link them to each other.
The Discovery went beyond the limit I set. Why?¶
A crawl will finish its current round and visit the redirects and links on the last page. If the limit were taken too literally, trailing links could be thrown out.
Can I get page-specific repetition statistics?¶
Repetition statistics make the most sense in a site-wide context. The problem with controlling calculations on a per-page basis is that it is not true to life to call a segment on a given page a “canonical instance”. Take a navigation bar or a footer, for example. It will be “repeated” on all pages, but it cannot be said to “belong” to any one of them. the translation proxy stores the first instance it comes across and then propagates its translation to all other instances.
The page I’m trying to translate has prices. What can I do to handle local currencies?¶
The prices themselves can be made translation-invariable, but real-time price handling for different currencies will have to be implemented by the client on the source site, making it possible for the proxied site to access the locale-specific information. Pricing of products and services also has legal / market implications that are beyond the tasks of LSPs. Of course, once currency-specific information is accessible from the original site, we are happy to help with integrating any backend calls / ajax requests on the proxy.
How do I enable automated scan on my project?¶
To enable automated content extraction on your site, please go to Content, and choose any of the daily, weekly or monthly options in the drop-down next to the Look for changes option.
Is it possible to set up automated scanning behind secure login?¶
No, scanning can’t be automated behind secure login. For such processes you need to extract cookies with your browser’s dev tools and pass them on to the proxy. Some cookies get invalidated over time, and we don’t store cookies either.
What do the various tags mean next to each page in the page list?¶
See the Glossary for a description of the various tags encountered in the page list.
Caches¶
Can I preview newer content on the workbench without causing bleedthrough on the published site?¶
You can customize which Source Cache to use on which proxy mode - go into Page Cache, choose custom settings and select Disabled from the dropdown menu. The preview mode will display all new content. It is recommended that you keep TM Freeze turned on while exploring the new content, otherwise everything will be automatically added to the Workbench.
Does building a Source Cache cost any money?¶
You can use Content Scan with the appropriate options checked to build your Source Cache. As long as there is no new content to pick up, Scan costs the same as a Discovery.
How can I check if a page uses the Source Cache?¶
Go into the Pages view in the Dashboard. If you hover over a page in the list, you will see a Cache button. Click it to verify Source Cache information for that page. If there is no Source Cache for that page, the screen will indicate this.
Does building a Target Cache cost any money?¶
Setting aside the inherent cost of the Page Visits you have to accrue to build them, Target Caches are free of charge.
Glossary¶
We use a set of recurring terms in this manual and in our Support Channels - we collect them here for your reference.
- 101% match
- Contextual repetition. Tags within the segment and the neighbouring segments are repetitions / exact matches as well.
- 102% match
- Strong contextual repetition. Every single segment within a block is a 101% match, and all tags are identical.
- Bleedthrough
- When newly added, untranslated content appears on the translated site in the original language
- Dictionary freeze
- No new items can be added to the translation memory. Only available when Page freeze is activated.
- Discovery
- Checking the website for translatable content
- Exclusion rule
- A rule specified for explicitly excluding pages from the translatable list
- Highlight view
- The secondary view mode of workbench, allowing for in-context editing
- Inclusion rule
- A rule specified for explicitly including pages in the translatable list
- Keep Cache Strategy
- The strategy used to avoid bleedthrough. The last fully translated version is available on the translated pages, and new content is only added when the translation is ready
- List view
- The main view of the Workbench; a simple editor for online translation
- Page freeze
- No new items can be added to the page-list marked for translation
- Resource
- Binary content found on the website (images, PDFs, CSS and JS files, etc.)
- Scan
- Extracting content from the website for translation
- Workbench
- The online editing view of the proxy
Page List Tags¶
- Discovered
- the page was visited and content on it was included in a previous word count.
- Excluded by rule
- the page is excluded by a rule declared at the top of the page list.
- Excluded
- page was excluded by clicking on the "Exclude" button in its hover menu.
- New
- the page was content extracted, with no translations on it yet (or pending progress update).
- Progress bar
- repetitions were propagated to the page, or its translation is in-progress.
- Unvisited
- the page was collected as a new URL, but it hasn't been Discovered/Scanned yet.