This document outlines my findings whilst carrying out a website audit for the Trainline website trainline.com. It lists the issues I have uncovered, along with my recommendations to fix them.
Trainline Technical Audit
Table of Contents
Please use the links below to jump to the appropriate section as desired.
- The Current Landscape
- Technical Analysis
- Crawling & Indexation
- Site Performance
- Site Health
- Backlink Analysis
Please bear in mind that in this instance I am not familiar with the CMS and the difficulties/limitations that it may or may not have. I also do not have access to Google Search Console or Google Analytics.
It’s also worth mentioning that, due to tool limitations, I was only able to crawl a sample of 500 URLs on the Trainline website using DeepCrawl.
With more time and access to more crawlers, I would be able to crawl and analyse many more, if not all, of the URLs on the website.
Additionally, this audit was conducted in March 2020 so it is entirely possible that changes have been made since then.
The Current Landscape
This area simply looks at some of the elements that make up the website or are contained/used on the Trainline site as it is currently:
- React JavaScript library
- Apache web server
- Google Analytics
- Google Tag Manager
- Akamai CDN
- New Relic eCommerce
- Usabilla issue tracker
- Mobile responsive website
Observations: From an initial look over the website, it seems to be returning the correct status codes for live (200) and redirected (301) URLs.
Two URLs from my sample of 500 were still on the HTTP protocol. These appear to be external URLs that the site links to, though, so this isn’t too big a problem.
The URLs and parameters do make sense too and don’t appear to be causing a problem, for example:
Recommendations: I would double-check the URLs on the unsecured HTTP protocol and, if they are all external, look at updating the links so that they point to a secure (HTTPS) version.
Why is this important? When a crawler requests a page on your website, your web server returns an HTTP status code along with the response. It is very important to make sure that your server returns a status code with a value of 200, which is the equivalent of saying that everything is OK. These should also be on the secure HTTPS protocol otherwise users can be met with a warning before they visit your website.
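As a rough illustration of the check described above, the sketch below groups crawled URLs by status code and flags any that are not on HTTPS. The function name `audit_crawl_rows` and the sample rows are hypothetical; in practice the rows would come from a crawler export.

```python
from urllib.parse import urlparse


def audit_crawl_rows(rows):
    """Group (url, status_code) pairs by outcome and flag non-HTTPS URLs."""
    report = {"ok": [], "redirect": [], "error": [], "insecure": []}
    for url, status in rows:
        # Anything not served over HTTPS is flagged separately.
        if urlparse(url).scheme != "https":
            report["insecure"].append(url)
        if 200 <= status < 300:
            report["ok"].append(url)
        elif 300 <= status < 400:
            report["redirect"].append(url)
        else:
            report["error"].append(url)
    return report


# Hypothetical sample rows, not real crawl data.
sample_rows = [
    ("https://www.thetrainline.com/", 200),
    ("http://example.com/partner", 200),
    ("https://www.thetrainline.com/old-page", 301),
    ("https://www.thetrainline.com/missing", 404),
]
report = audit_crawl_rows(sample_rows)
print(report["insecure"])
```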
URL structure is also very important for any website that needs categorisation of some sort.
HREFLANG and Internationalisation
Observations: Trainline has an international audience and largely uses international domains to serve it.
This is the current hreflang setup:
These URLs are set up to display local websites and content in each country they serve, for example:
- Display German content for users in Germany (de)
- Display Spanish content for users in Spain (es)
- Display the French website for users in France (fr)
Some pages are missing HREFLANG tags like this one: https://www.thetrainline.com/trains/great-britain/railcards but this could be because this information is only relevant to the UK.
Why is this important? There is no one set approach to internationalisation and there is no reason to change Trainline’s approach here. From what we can see, HREFLANG is currently set up correctly and is working well.
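One way to spot pages that are missing HREFLANG annotations is to parse the page source and compare the languages found against those you expect. This is a minimal sketch using only the standard library; the class and function names are my own, and the `href` values in the sample are made-up placeholders rather than Trainline’s actual setup.

```python
from html.parser import HTMLParser


class HreflangParser(HTMLParser):
    """Collect <link rel="alternate" hreflang="..."> annotations."""

    def __init__(self):
        super().__init__()
        self.alternates = {}

    def handle_starttag(self, tag, attrs):
        if tag != "link":
            return
        attr = dict(attrs)
        rel = (attr.get("rel") or "").lower().split()
        if "alternate" in rel and attr.get("hreflang") and attr.get("href"):
            self.alternates[attr["hreflang"].lower()] = attr["href"]


def missing_hreflang(html, expected_langs):
    """Return the expected language codes that the page does not declare."""
    parser = HreflangParser()
    parser.feed(html)
    return sorted(set(expected_langs) - set(parser.alternates))


# Hypothetical page source; the URLs are placeholders.
SAMPLE_HTML = """
<head>
  <link rel="alternate" hreflang="de" href="https://www.thetrainline.com/de" />
  <link rel="alternate" hreflang="es" href="https://www.thetrainline.com/es" />
</head>
"""
print(missing_hreflang(SAMPLE_HTML, ["de", "es", "fr"]))
```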
Observations: After inspecting the homepage and viewing the Page Source, I can see that there is a self-referencing canonical tag in place, which is good and shows Google that this is the correct version to display.
Most of the pages I audited had self-referencing canonicals, and a small number were canonicalised to other URLs. However, some were missing canonical tags altogether.
Here are some examples of these URLs. In total there were 75 of them in our sample of 500:
Recommendations: I would recommend updating these pages to include self-referencing canonicals.
Why is this important? It is always recommended, where possible, to add self-referencing canonicals as a preventative measure to ensure that incorrect or duplicate versions of these URLs are not indexed instead.
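The canonical check can be sketched as below: read the canonical tag from the page source and classify it as self-referencing, pointing elsewhere, or missing. The names `CanonicalParser` and `canonical_status` are hypothetical, and the comparison simply ignores a trailing slash, which is an assumption rather than a full URL normalisation.

```python
from html.parser import HTMLParser


class CanonicalParser(HTMLParser):
    """Capture the href of the first <link rel="canonical"> tag."""

    def __init__(self):
        super().__init__()
        self.canonical = None

    def handle_starttag(self, tag, attrs):
        attr = dict(attrs)
        rel = (attr.get("rel") or "").lower().split()
        if tag == "link" and "canonical" in rel and self.canonical is None:
            self.canonical = attr.get("href")


def canonical_status(page_url, html):
    """Return 'missing', 'self' or 'other' for the page's canonical tag."""
    parser = CanonicalParser()
    parser.feed(html)
    if not parser.canonical:
        return "missing"
    # Assumption: URLs that differ only by a trailing slash are the same page.
    if parser.canonical.rstrip("/") == page_url.rstrip("/"):
        return "self"
    return "other"


PAGE = "https://www.thetrainline.com/trains/great-britain/railcards"
print(canonical_status(PAGE, f'<link rel="canonical" href="{PAGE}">'))
```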
Observations: The website on the whole seems to have a fairly good, direct internal linking structure with unique anchor text and target URLs – within two clicks users can quickly and easily find the information they need and / or book trains.
Usually I would also run checks for orphaned pages, but unfortunately my access to some tools has been revoked so I was unable to check for this accurately.
Recommendations: The site looks to be well linked internally. As mentioned above, I would usually check for orphaned pages, but with access to most of my current company’s tools revoked I was unable to do so.
Why is this important? Internal linking is very important to the SEO strategy of a site. It is one of the most powerful weapons that you can use to increase your search engine rankings.
Your website needs to have a good internal link structure and all your pages must be linked together for the search engines to be able to crawl through and reach every single page. If some of your pages remain isolated from the rest of the website, crawlers will struggle to find out about their existence and index their content.
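For illustration, an orphaned-page check boils down to comparing the URLs you know exist (for example from the sitemaps) against the URLs that receive at least one internal link. The sketch below assumes you already have both as plain Python data; `find_orphans` and the sample URLs are hypothetical.

```python
def find_orphans(known_urls, internal_links):
    """Return known URLs that no other page links to.

    known_urls: iterable of URLs, e.g. taken from the XML sitemaps.
    internal_links: dict mapping a source URL to the URLs it links to.
    """
    linked = set()
    for source, targets in internal_links.items():
        for target in targets:
            if target != source:  # a page linking to itself does not count
                linked.add(target)
    return sorted(set(known_urls) - linked)


# Hypothetical sample data.
known = {"/", "/trains", "/stations", "/old-offer"}
links = {
    "/": ["/trains", "/stations"],
    "/trains": ["/", "/stations"],
    "/old-offer": ["/old-offer"],
}
print(find_orphans(known, links))
```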
Observations: There is a 404 fallback in place. This essentially ensures that if a bot stumbles across a page that doesn’t exist, the server returns the correct status code (404).
In some cases websites are incorrectly configured, which can cause non-existent pages to return a 200 code. This is problematic as those pages can potentially get indexed. If this happens, crawl budget is wasted on useless pages and the index is bloated with them.
If I had access to Google Search Console I could also check for soft 404 errors. These are often actual pages (or in some cases old product pages) that are deemed thin or lacking usable content. In this instance, I would make a call as to whether to redirect them to the closest relevant page or to improve the quality of the content.
The 404 page itself is customised: it contains a clear message indicating that the page cannot be found and links back to the homepage (as well as other alternatives). I would recommend either placing a search box underneath this message to encourage further site engagement, or placing links to popular categories clearly on the page.
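A simple way to test the 404 behaviour described above is to request a URL that almost certainly does not exist and confirm that the server answers with 404 (or 410) rather than 200. In this sketch the HTTP call is passed in as `fetch_status`, a hypothetical callable that returns a status code for a URL, so the logic can be tried without touching the live site.

```python
import uuid


def returns_real_404(fetch_status, base_url):
    """Probe a random, non-existent path and check the status code.

    fetch_status: callable taking a URL and returning an HTTP status code
    (hypothetical; plug in whichever HTTP client you use).
    """
    probe_url = f"{base_url.rstrip('/')}/{uuid.uuid4().hex}"
    return fetch_status(probe_url) in (404, 410)


# Stand-in fetchers for demonstration only.
correctly_configured = lambda url: 404
misconfigured = lambda url: 200

print(returns_real_404(correctly_configured, "https://www.thetrainline.com"))
print(returns_real_404(misconfigured, "https://www.thetrainline.com"))
```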
Crawling & Indexation
Observations: When looking in the robots.txt file I couldn’t see any immediate issues.
As a general rule, consider the following for your robots.txt file:
- Does it exist?
- Does it disallow all appropriate folders?
- Does it disallow any folders you don’t want it to?
- Are the folders you want excluded from indexation in this list?
- Have you referenced all XML sitemaps?
Why is this important? A well optimised robots.txt file can hugely benefit how search engines navigate your website. You can direct search engines to files such as your XML sitemap and also prevent them from crawling less important pages such as the “terms and conditions” or “admin” pages. Each website is given a limited amount of crawl budget by search engines. A well optimised robots.txt file can reduce the amount of wasted bandwidth and in turn ensure that search engines spend more time crawling the important pages on the website, which can further improve organic visibility.
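The checklist above can be partly automated with Python’s standard `urllib.robotparser`. The sketch below is only an illustration: the robots.txt content is a made-up example (not Trainline’s actual file), `check_robots` is a hypothetical helper, and `site_maps()` needs Python 3.8 or later.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content for demonstration only.
ROBOTS_TXT = """\
User-agent: *
Disallow: /admin/
Sitemap: https://www.thetrainline.com/sitemap.xml.gz
"""


def check_robots(robots_txt, must_allow, must_block):
    """Return a list of problems found in the given robots.txt text."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    problems = []
    for url in must_allow:
        if not parser.can_fetch("*", url):
            problems.append(f"blocked but should be crawlable: {url}")
    for url in must_block:
        if parser.can_fetch("*", url):
            problems.append(f"crawlable but should be blocked: {url}")
    if not parser.site_maps():
        problems.append("no Sitemap directive found")
    return problems


problems = check_robots(
    ROBOTS_TXT,
    must_allow=["https://www.thetrainline.com/trains/great-britain/railcards"],
    must_block=["https://www.thetrainline.com/admin/login"],
)
print(problems)
```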
Observations: There are several versions of sitemaps on the Trainline website.
An example of these can be seen here:
- Sitemap: https://www.thetrainline.com/live/sitemap
- Sitemap: https://www.thetrainline.com/sitemap.xml.gz
- Sitemap: https://www.thetrainline.com/sitemap-cms
- Sitemap: https://www.thetrainline.com/content/ESS/de/trains/sitemap.xml
- Sitemap: https://www.thetrainline.com/content/ESS/de/station/sitemap.xml
- Sitemap: https://www.thetrainline.com/content/ESS/en/trains/sitemap.xml
- Sitemap: https://www.thetrainline.com/content/ESS/en/station/sitemap.xml
- Sitemap: https://www.thetrainline.com/content/ESS/es/trains/sitemap.xml
Sitemaps are important because they make it easier for search engines to find and index your website’s pages. XML sitemaps are highly recommended and widely used. This is because, even if your website is well linked, Google can still overlook some of your pages, especially newer ones, so a sitemap is a good way to make sure all your content can be discovered and indexed.
Conversely, if a website does not have sufficient internal linking, then this is all the more reason to use a sitemap.
At this stage, if I had access, I would refer to Google Search Console to check indexation and compare this with the sitemap URL and search indexation numbers. This would also allow me to see if there are any other sitemaps that I may not have picked up when searching as well as any other wider issues.
Why is this important? XML sitemaps are essential to all websites. They are like a library of pages which search engines use to effectively crawl and index your website. A well optimised sitemap can tell a search engine exactly where a page is, how important that page is to the website (through priorities) and when it was last modified. A search engine can then use this information to determine how often the content on a given page is likely to change and adjust its crawl rate to be in line with the estimated modification frequency, ensuring that any changes to a given page are reindexed in a timely manner.
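To compare sitemap contents against a crawl or against indexation numbers, the URLs first need to be pulled out of the XML. This is a minimal sketch using the standard library and the sitemaps.org namespace; `parse_sitemap` is a hypothetical helper and the sample XML is made up (the second URL is a placeholder).

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol.
NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def parse_sitemap(xml_text):
    """Return a list of (loc, lastmod) tuples from a <urlset> sitemap."""
    root = ET.fromstring(xml_text)
    entries = []
    for url in root.findall("sm:url", NS):
        loc = url.findtext("sm:loc", default="", namespaces=NS).strip()
        lastmod = url.findtext("sm:lastmod", default=None, namespaces=NS)
        entries.append((loc, lastmod))
    return entries


# Hypothetical sitemap content.
SAMPLE_SITEMAP = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>https://www.thetrainline.com/trains/great-britain/railcards</loc>
    <lastmod>2020-03-01</lastmod>
  </url>
  <url>
    <loc>https://www.thetrainline.com/placeholder-page</loc>
  </url>
</urlset>"""
entries = parse_sitemap(SAMPLE_SITEMAP)
print(entries)
```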
Indexation and Accessibility conflicts
Observations: From what I can see, there don’t appear to be any issues with the responsive design from a search engine perspective.
Why is this important? One of the most common indexation issues occurs when the entire website is blocked in the robots.txt file by mistake. Generally the only time you would block a website from being indexed is when it is still in development prior to the launch.
Some common errors that can appear in this mobile responsive report include: content wider than screen, text too small to read, and clickable elements too close together.
Observations: Overall the site speed for both mobile and desktop is below where it would ideally be:
When using speed measuring tools such as GTmetrix it is easy to get caught up in the scores they use. For example, GTmetrix gives you an ‘A’ if things are good and an ‘F’ if they are bad. However, this is not always accurate. I find it much more accurate and rewarding to use the above metrics as a measure of speed and performance.
Ideally the desktop fully loaded time would be less than 3 seconds, but the average time on GTmetrix is 7.2s. Obviously a 3-second target isn’t realistic in all instances, especially on sites of this size, but there are definite improvements that can be made. I have listed below some key issues that, once addressed, should help improve the page speed.
On the whole though the actual page sizes of the Trainline website are good.
Minimise Redirects –
There are a number of redirect chains on the site, which means there are multiple ‘jumps’ that search engines and users have to go through to reach the final URL. It is commonly cited that Googlebot will not follow more than 5 redirects in a chain in a single crawl attempt. The typical guideline is to avoid chains longer than 3 URLs.
However, these are all external resources, meaning they are not part of the main domain. Because of this, it is unlikely that Trainline would have direct control over them to shorten the redirects in the typical way.
In light of this, we would suggest that they assess the use of these resources and whether the additional ‘jumps’ and load time they add to the site are justified.
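To make the idea of a redirect chain concrete, the sketch below follows a mapping of "URL redirects to URL" pairs (as you might export from a crawler) and returns the full chain, stopping on loops. The function name, the `max_hops` default and the sample URLs are all hypothetical.

```python
def redirect_chain(start_url, redirects, max_hops=10):
    """Follow redirects from start_url and return every URL visited in order.

    redirects: dict mapping a URL to the URL it redirects to.
    """
    chain = [start_url]
    seen = {start_url}
    while chain[-1] in redirects and len(chain) <= max_hops:
        next_url = redirects[chain[-1]]
        chain.append(next_url)
        if next_url in seen:
            break  # redirect loop detected
        seen.add(next_url)
    return chain


# Hypothetical redirect map.
redirects = {
    "http://example.com/widget.js": "https://example.com/widget.js",
    "https://example.com/widget.js": "https://cdn.example.com/v2/widget.js",
}
chain = redirect_chain("http://example.com/widget.js", redirects)
print(len(chain) - 1, "hops:", " -> ".join(chain))
```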
Minimise Request Size –
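Requests for some resources don’t fit into a single packet. Reducing the size of these requests (for example by trimming cookies and request headers) so that each fits into a single packet could reduce latency.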
Leverage browser caching –
Page load times can be significantly improved by instructing visitors’ browsers to save and reuse the files included in your website. This is particularly effective on websites where users regularly revisit the same areas of the site. For example, they may visit the train times page several times before making their booking.
Dev access is usually needed to implement this but the rewards should be high.
However, upon further inspection, the flagged resources are all external, meaning they are not part of the main domain. Because of this, it is unlikely that Trainline would have direct control over them to set browser caching rules.
In light of this, we would not recommend that Trainline pursue browser caching for these resources. Instead, we would suggest that they assess the use of these resources and whether the weight and load time they add to the site are justified.
Caching is beneficial as it stores some static information on the user’s hard drive in order to save those resources being called repeatedly from the server. This makes site loading faster. Compressing resources with gzip or deflate can reduce the number of bytes sent over the network.
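As an illustration of what a browser caching check looks at, the sketch below reads the `Cache-Control` response header and works out how long a resource may be reused. `cache_lifetime_seconds` is a hypothetical helper, the sample headers are made up, and the header lookup assumes the key is spelled exactly `Cache-Control`.

```python
def cache_lifetime_seconds(headers):
    """Return the max-age in seconds, 0 if caching is disabled, None if unspecified."""
    cache_control = headers.get("Cache-Control", "")
    directives = [d.strip().lower() for d in cache_control.split(",") if d.strip()]
    if "no-store" in directives or "no-cache" in directives:
        return 0
    for directive in directives:
        if directive.startswith("max-age="):
            try:
                return int(directive.split("=", 1)[1])
            except ValueError:
                return None  # malformed max-age value
    return None


# Hypothetical response headers.
print(cache_lifetime_seconds({"Cache-Control": "public, max-age=31536000"}))
print(cache_lifetime_seconds({"Cache-Control": "no-store"}))
print(cache_lifetime_seconds({}))
```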
Recommendations: I recommend working through the above issues to improve page loading speed.
Why is this important? Generally a website should load as fast as possible for optimal performance. Website load speed is a confirmed ranking factor and as we can influence this we should prioritise performance fixes as much as possible.
Observations: When checking the code against W3C standards, a fair few issues were flagged. Please see the link below for further information.
Recommendations: I would recommend putting a request in with the development team to work through the code discrepancies and amend them to meet the current standards. Search engines prefer it when the code on a website meets the current web standards.
Why is this important? It is essential that any code written for a website meets a given standard. Generally, the more of the code that validates, the better the quality of the web page and the easier it is for search engine bots to read and understand it. This in turn will assist the overall performance of the website.
Tracking Code Implementation
Observations: The Trainline site does currently have Google Tag Manager implemented throughout the site.
At this stage I would usually dig around in Google Analytics or other analytics tools to check tracking and, more importantly, what you are tracking. I would also check for macro goals (sales) and micro goals (events and assisted conversions).
Why is this important? It is vital for any website to ensure that they are tracking site visits properly through a form of analytics tool, the most common being Google Analytics. With Google Analytics we can not only track traffic sources but we can also break these down by channel e.g. direct, organic etc and by location & even country if needed. This is all essential data that can only be monitored once proper tracking is set up.
Observations: In this area I would analyse how the site is currently being crawled, average pages crawled per day and the time spent downloading a page.
These stats can give great insights. For example, if your website had seen an unusually high spike in pages crawled but no major changes had been made to the site, this could indicate an imminent Google penalty.
Recommendations: I would recommend checking this area in Google Search Console daily so that if any unusual spikes or dips are present, especially in pages crawled per day, you can take action more promptly.
Why is this important? Crawl stats provide information on how long it takes Googlebot to download a page, kilobytes downloaded per day and the average number of pages downloaded per day. This is key information that can be used to improve your website. For example, if you noticed a huge drop in pages being crawled per day over a period of time, this would likely indicate that there are issues with the website.
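Spotting spikes and dips can be sketched as comparing each day’s crawl count against a typical baseline. The example below uses the median as that baseline and a 50% tolerance; both choices, the function name `unusual_days`, and the figures are hypothetical and not taken from Trainline’s crawl stats.

```python
from statistics import median


def unusual_days(pages_per_day, tolerance=0.5):
    """Return the days whose crawl count deviates from the median by more than tolerance."""
    baseline = median(pages_per_day.values())
    flagged = {}
    for day, count in pages_per_day.items():
        if baseline and abs(count - baseline) / baseline > tolerance:
            flagged[day] = count
    return flagged


# Made-up figures for illustration only.
crawl_stats = {
    "2020-03-01": 1000,
    "2020-03-02": 1050,
    "2020-03-03": 980,
    "2020-03-04": 2400,
    "2020-03-05": 1020,
    "2020-03-06": 300,
}
print(unusual_days(crawl_stats))
```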
Observations: This area would look at the number of soft 404s and 500, 404 and 410 codes currently present on your website. I would also look at any invalid pages in the sitemap. This area can help find broken pages on your website, which you can then redirect within the .htaccess file.
Recommendations: I would recommend using both Google Search Console and Google Analytics to search for broken pages and, more importantly, broken pages that still receive traffic and pageviews.
Why is this important? Crawl errors indicate the number of resources that Google has failed to crawl. This can be for a number of reasons, the most common being a 404 (not found) error. This can occur when you move pages or products on a website without redirecting them properly. Google has stated that 404 errors are fine to have on a website; however, larger numbers can have a negative effect as they use up valuable crawl budget on pages that do not exist anymore.
Penalty & Disavow Information
Observations: I would check to see if there is a current disavow file. If present, I would download it to double-check that the URLs listed within are in fact bad links. I would also check the manual actions area in Google Search Console and the general messages to see if there had been any notifications of penalties.
Recommendations: I would recommend regularly checking the backlink profile and actively attempting to remove any links that could be deemed harmful, either by adding them to the disavow file or, preferably, by manually contacting the webmasters.
Why is this important? Checking your site messages is an important task and should be carried out daily. The inbox in your Google Search Console (formerly Google Webmaster Tools) account is the preferred method for Google to contact you if there are any issues with your website. If a manual spam action is placed against your website you will be notified here. Similarly, if you successfully get a manual action revoked you will also get a message here.
Content Checks and Optimisations
Typography & Content
Observations: The website makes use of the various heading levels and presents the text in a clear manner on the main website. From our sample of 500 pages, there were 10 pages missing H1 headings and 5 pages with multiple H1 tags.
There are also some instances where page titles or descriptions are not fully optimised and are either wider than the viewable limit on SERPs or not making full use of the available space. Again, this is only a small number given the overall number of pages on the site.
Recommendations: I would recommend revisiting the formatting and reviewing the pages where H1 headings are missing to assess whether it would be worthwhile adding them. I would also do the same with pages that have multiple H1s and assess the value in updating some of them to H2s instead.
I would look at the pages where page titles and descriptions are not fully optimised and make a judgement as to whether editing these would be worthwhile.
Why is this important? Good typography on a website can vastly improve conversion rates and overall user experience. Well laid out content that is easy to read will also help reduce bounce rates and encourage user interaction on the website.
Search engines also use headings to help them understand the topics of pages. Similarly, optimised and descriptive titles and metas can help improve CTR in SERPs.
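The heading and title checks above can be sketched with the standard library’s HTML parser. The names `HeadingAudit` and `audit_headings` are hypothetical, and the 60-character title threshold is only a rough rule of thumb I am assuming here, since the real SERP limit is based on pixel width.

```python
from html.parser import HTMLParser


class HeadingAudit(HTMLParser):
    """Count <h1> tags and capture the <title> text."""

    def __init__(self):
        super().__init__()
        self.h1_count = 0
        self.title = ""
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        if tag == "h1":
            self.h1_count += 1
        elif tag == "title":
            self._in_title = True

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data


def audit_headings(html, max_title_chars=60):
    """Return a list of heading and title issues for one page."""
    audit = HeadingAudit()
    audit.feed(html)
    issues = []
    if audit.h1_count == 0:
        issues.append("missing h1")
    elif audit.h1_count > 1:
        issues.append("multiple h1")
    if not audit.title.strip():
        issues.append("missing title")
    elif len(audit.title.strip()) > max_title_chars:
        issues.append("title may be truncated")
    return issues


print(audit_headings("<title>Train tickets</title><h1>Tickets</h1><h1>Offers</h1>"))
```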
Click to Call Buttons
Observations: There does appear to be a click to call button on the contact page, and this is mobile responsive which is good.
Why is this important? People who browse on mobile devices are generally quite specific with their searches, which means that they are much closer to converting into a customer. For this reason it is always advised to provide a click to call button on the mobile responsive version of your website. This way, if a user wishes to contact you, they can do so with a tap of the screen. This is good for mobile user experience and should result in a higher conversion rate overall.
Observations: There are instances of keyword cannibalisation on the Trainline website, but given the size of the website this is not unexpected.
This can cause significant problems as it can damage rankings. The thing is, it can be an easy problem to run into, especially on large websites. It happens when a website targets a single keyword or phrase on multiple areas of a website.
I could not export all 21,055 instances as the account did not have enough credits left, but I have included the top 10 examples here:
The top keyword on this sheet is “trainline advance tickets”, and 29 URLs rank for this term. This means that Google is unsure which of these URLs is the correct one to use.
If we look at the top 10 of these below, we can see that they are not as relevant as they could be to the main term. For example, we could say that the ‘how-do-i-change-my-booking-page’ is not as relevant for this term, but given that I have conducted very similar searches myself I can see why Google also thinks this page is relevant.
Recommendations: Cannibalisation can be resolved using canonicals. Typically (although this can be more difficult for booking websites due to the nature of how they work), instead of using the same keyword on every page, variations or long-tail versions should be used, with links back to the canonical source for the main term.
Redirects also help. If there are any pages that are no longer used or relevant, they should be 301 redirected to the one main page.
I realise that this is not necessarily the best course of action for this website, nor is it possible.
In this case, I would suggest monitoring performance to see how much ranking fluctuation does happen. If it is significant then I would suggest looking at using varied keywords and consider some of the above fixes.
Why is this important? Cannibalisation is a problem as it can affect rankings as I mentioned. This is because Google does not know which is the single most relevant page for that query / keyword. When this happens, rankings tend to fluctuate as Google tries to determine which is the most relevant URL.
So even if you rank in position 1 at the moment for “trainline advance tickets”, if Google starts to rank the URL https://www.thetrainline.com/information/cheap-train-tickets instead, then you could slip to position 4.
Using redirects and canonicals takes some of the leg-work away from Google by telling them which URL is the correct one. When this is done, rankings usually start to stabilise.
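Finding cannibalised keywords from a rankings export is essentially a group-by: collect the URLs ranking for each keyword and flag any keyword with more than one. The sketch below assumes the export is a list of (keyword, URL) pairs; `cannibalised_keywords` is a hypothetical helper and the second and third URLs in the sample are invented placeholders.

```python
from collections import defaultdict


def cannibalised_keywords(rankings, min_urls=2):
    """Return {keyword: sorted URLs} for keywords with at least min_urls ranking URLs."""
    urls_by_keyword = defaultdict(set)
    for keyword, url in rankings:
        urls_by_keyword[keyword.strip().lower()].add(url)
    return {
        keyword: sorted(urls)
        for keyword, urls in urls_by_keyword.items()
        if len(urls) >= min_urls
    }


# Sample export rows; only the first URL appears in this audit, the rest are placeholders.
rankings = [
    ("trainline advance tickets", "https://www.thetrainline.com/information/cheap-train-tickets"),
    ("Trainline Advance Tickets", "https://www.thetrainline.com/hypothetical/advance-tickets"),
    ("railcards", "https://www.thetrainline.com/trains/great-britain/railcards"),
]
result = cannibalised_keywords(rankings)
print(result)
```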
Observations: The Trainline website already has breadcrumbs implemented.
Because the breadcrumbs are anchor text links, they make it really easy for users to retrace their steps to higher level pages.
Why is this important? Breadcrumb navigation can provide value to both users and search engines alike. In regards to the user, they can see the hierarchy of the website which can help them navigate up or down a level.
From a search engine perspective they can also assist bots in determining the hierarchy of a website and assist them with navigating to the deeper pages on a given website.
Observations: When running a sample URL through the structured data tool it did not flag any errors or warnings with their structured data and the items that have been added are what I would expect to see.
Why is this important? Properly implemented rich snippets on a website can help search engines further semantically understand the content of a given page. They can also show up in the search engine results pages when users are searching, which can further encourage them to click through to your website.
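A basic structured data check can be approximated by pulling the JSON-LD blocks out of the page source, confirming that each one parses, and listing the schema types found. This sketch only covers JSON-LD (not microdata or RDFa), the names are hypothetical, and the sample markup is made up rather than copied from the Trainline site.

```python
import json
from html.parser import HTMLParser


class JsonLdParser(HTMLParser):
    """Collect the raw contents of <script type="application/ld+json"> blocks."""

    def __init__(self):
        super().__init__()
        self.blocks = []
        self._in_jsonld = False
        self._buffer = []

    def handle_starttag(self, tag, attrs):
        script_type = (dict(attrs).get("type") or "").lower()
        if tag == "script" and script_type == "application/ld+json":
            self._in_jsonld = True
            self._buffer = []

    def handle_data(self, data):
        if self._in_jsonld:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_jsonld:
            self._in_jsonld = False
            self.blocks.append("".join(self._buffer))


def structured_data_types(html):
    """Return (list of @type values, number of blocks that failed to parse)."""
    parser = JsonLdParser()
    parser.feed(html)
    types, errors = [], 0
    for raw in parser.blocks:
        try:
            data = json.loads(raw)
        except json.JSONDecodeError:
            errors += 1
            continue
        items = data if isinstance(data, list) else [data]
        types.extend(item.get("@type") for item in items if isinstance(item, dict))
    return types, errors


# Made-up markup: one valid breadcrumb block and one broken block.
SAMPLE_PAGE = """
<script type="application/ld+json">
{"@context": "https://schema.org", "@type": "BreadcrumbList", "itemListElement": []}
</script>
<script type="application/ld+json">{not valid json}</script>
"""
print(structured_data_types(SAMPLE_PAGE))
```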
When a business has multiple physical locations it is important to geo target your businesses to ensure that when a user in the local area is performing a search for a service or product that your business offers you would show up in the given area. The local listings usually show the business name along with the physical address and contact details. For mobile users this is key information as they can easily navigate to your business or give you a call directly from the SERPs.
Due to tool restrictions I am unable to see the whole picture when looking at the backlink profile, so I have listed below some findings from the data that I can access.
Observations: From an initial scan over the anchor text (from the data that I can see), it looks to be a good mixture of variations of branded anchor text. I have put an image below of the top 20 anchor texts for reference. Given the nature of what Trainline offers, there is nothing out of the ordinary here:
As you can see, there are no immediate threats in the list above, although I would take a deeper look at the ones that read “here”, “Empty” etc. to ensure nothing is too spammy.
On the whole, the anchor text that is used is largely brand / product specific so there are no threats that I can see just from the anchor text.
Observations: A large portion of the historic links are from the website allcontactnumbers.co.uk – these all appear ok and this website does appear to have topical relevance. The link text that is used here is “contact trainline direct” which is a popular query that we would expect to see.
There also appear to be links from various smaller travel and timetable websites. These link to the homepage for the most part, which makes sense as it is here that users can search for the trains they need. From the data I can see, the recently acquired links look more varied in link type and depth.
As you can see from the below image, it appears as though the majority of the domains linking to the website are from known websites that do appear to have topical relevance. The international domains here coincide with the international countries that Trainline also serve.
Observations: There are some backlinks that have become broken. This means the linked URL either 301 redirects to a broken page (404), or the target page itself returns a 404 without being redirected. When this happens, the link equity and authority are lost.
Here you can see the links that are broken: https://drive.google.com/file/d/1KCL-3QsU1oFVukr3L-jVwJARRrKyIZU5/view?usp=sharing
This spreadsheet also allows us to see the number of links and domains that are linking to these URLs and therefore the amount of potential equity loss.
Recommendations: At this stage, I would usually conduct a deeper analysis if time and tools allowed. I would do this to check the overall health of the backlink profile and take the necessary action to redirect or update links and disavow those that could pose problems.
Why is this important? If the link profile isn’t cleaned up, and good quality links aren’t acquired, there is a risk of the website being hit by algorithm updates or even having a manual action put against it. In this instance I would recommend a full link audit and clean-up.
Having completed this audit, I’ve identified some areas that Trainline could look to improve and if I were working on their website this would give us a starting point. However, as mentioned at the start of this audit, there will likely be factors at play that we are unaware of that have led to the current website.