Towards Automated eGovernment Monitoring

Thesis Summary

Author: Morten Goodwin


Much interaction with government has been transformed from queue and desk environments to online services, so-called eGovernment. The promises of this transformation include better governance, reduced cost and increased citizen participation. To understand to what degree the services deliver on this promised potential, an array of eGovernment surveys has been established to illuminate characteristics of the services. Most of these surveys are carried out manually and reveal neither their complete methodologies nor their detailed results, which are crucial for enabling policy makers and web site owners to efficiently address the reported issues. Figure 1 presents the relationship between eGovernment testers, web site owners, policy makers and citizens.

The main focus of this thesis is automatic and open benchmarking of government web sites. The work is part of the eGovMon project, which develops open source software, methodologies for quality evaluation of eGovernment services and policy design tools in close collaboration with 20 Norwegian municipalities. The four areas of eGovernment research covered in the project are accessibility, transparency, efficiency and impact.

Figure 1: An overview of eGovernment showing relationships among eGovernment testers, results, web site owners, policy makers and citizens.

Motivation for eGovernment

The main goals of introducing eGovernment include increased availability and quality of information and services for the citizens, more efficient and effective governments including back-end integration, more transparent and accountable governments, and increased participation [1],[2],[3],[4],[5],[6],[7],[8],[9],[10].

However, one of the largest motivations for introducing eGovernment is to use information technology to achieve a better government both for citizens and policy makers [8]. For example, eGovernment enables citizens to participate remotely in the political process using web technologies, to propose questions and comments in meetings, or to watch live and past meetings. Without eGovernment, this close interaction is only possible when the citizens are physically present.

However, introducing information technology by itself is not sufficient to achieve these goals [3]. There are many examples of eGovernment application failures where the implementation did not meet the specification, there was a gap between implementation and citizen demand, or there was a lack of uptake by citizens [11]. For example, government services for the citizens could be hard to locate or introduced without usability in mind. If a service is available online but in practice cannot be found by the citizens, it is rendered useless and will neither be used nor contribute to government improvements. Similarly, government portals could be made available online without considering accessibility, causing web sites to exclude citizens with disabilities from efficient participation.

The main aim of this thesis is to develop methods and tools for measuring public governmental web sites. It should be noted that research on government web sites is only one of several areas within eGovernment. eGovernment research is in no way limited to government web sites, and eGovernment efforts on the local or national level neither are nor should be limited to public web sites. The most common areas that come under the eGovernment umbrella are: enabling electronic interaction between the public and the government, typically through web sites, online portals and mobile phones; back-end integration within the government to streamline services and reduce the needed manual work; building up electronic infrastructure to make telephones, Internet and computers available to the public; reducing digital illiteracy and closing the digital divide; and electronic identification mechanisms for citizens through electronic IDs, smart card systems and similar.

eGovernment Surveys

A significant part of eGovernment research is benchmarking to what extent certain criteria claimed to be positive for eGovernment are reached. These criteria vary between surveys, but are often based on laws, regulations and general eGovernment goals. The features are predominantly collected from governmental web sites.

Surveys can provide useful input for improvements of eGovernment portals. They enable policy makers and web site owners to be informed about accessibility and usability problems on their web sites, to receive information about which laws and regulations are not followed, to see which services they have available compared to other governments, and so on. In many cases, what gets measured in the surveys is what gets attention from web site owners, even though it may not be what needs the most attention or improvement. Because of the influence of the surveys, and because they can result in a wrong focus, it is sometimes argued that surveys are bad for eGovernment [12],[13].

The published survey results are sometimes used directly by web site owners to improve their web sites, and by politicians and policy makers as input to regulations and laws related to eGovernment. Thus, eGovernment surveys can have a direct influence on the development of eGovernment services, and thereby a significant effect on the citizens using the government web sites. Therefore the surveys need to generate data of high quality and minimum bias, cover essential eGovernment areas, reflect the users' needs, and be carried out frequently.

Automatic Measurements

Many of the issues with most eGovernment surveys today can be solved by using tools and introducing automatic algorithms for measurements. In contrast to tests which rely upon human judgement, tools and algorithms are objective and can run effectively and frequently with a minimum of human interaction, on more web pages and sites, and at a lower cost. The limitations of automation lie in a relatively high startup cost, and in the fact that many tests require human judgement and are therefore hard to automate.

All in all, there seems to be a great potential in increased use of automation in eGovernment benchmarking to direct and encourage work in eGovernment improvements.

Similarly to how information technology was introduced to improve governments, this thesis introduces automatic approaches for improving the benchmarking of eGovernment. This is meant to increase the quality and efficiency of eGovernment assessment, and to enable more interactive on demand testing which expands who can run eGovernment assessments.

Thesis Overview

The thesis summary is organised as follows. Section 1 presents an overview of existing eGovernment surveys. Section 2 continues with web accessibility, which is often part of eGovernment surveys. Section 3 introduces automatic testing: deterministic accessibility tests of web pages and PDF documents, more advanced testing of accessibility using classifiers, and tests for automatically locating services and information online. Section 4 compares results from automatic and manual accessibility measurements as a method to understand when automatic testing is sufficient. Section 6 presents eGovernment findings which have been collected using the applications in section 3. Section 7 presents the conclusion, and section 8 presents challenges and further research directions. Finally, section 9 presents the summary of contributions based on the papers that are part of the thesis.

eGovernment Surveys


Conducting eGovernment surveys is an approach used to examine various aspects of eGovernment: how governments use information technology and the digital interaction of government with citizens, businesses and other government institutions. Most surveys focus on measuring governmental web sites against criteria or indicators representing an ideal web site. Examples of typical survey questions are: Is the web site accessible for people with disabilities? Does the web site have contact information? Is it possible to submit requests online?

Existing eGovernment Surveys and Methods

Several eGovernment surveys exist, on global, regional, national and local levels. Paper A presents an overview and a comparison of the most important existing global eGovernment surveys, and an evaluation of how the surveys progress over time. The best known global surveys are carried out by Accenture, Brown University and United Nations (UN) Department of Economic and Social Affairs.

Accenture focuses its annual survey on European national web sites and evaluates the sites against two sets of indicators: service maturity and delivery maturity [15],[16],[17],[18],[19],[20],[21]. From 2000 to 2005, the main components of the survey were the following: service maturity was weighted 70% and represented the number of services implemented and their level of completeness; delivery maturity was weighted 30% and represented indicators related to service delivery, such as single point of entry, portal capabilities and so on. In 2005, delivery maturity was substituted with interviews with citizens and government officials, and the weight of each set of indicators was changed to 50%.

Brown University analyses a broad range of public web sites focusing on information availability, service delivery and public access [22],[23],[24],[25],[26],[27],[28],[29]. In contrast to Accenture, the Brown survey only aims at measuring the presence of features, not maturity. Brown University checks for the presence of 28 specific features in each web site. The survey has been published annually since 2001.

The UN eGovernment survey (previously the eGovernment readiness report) is carried out every second year by the UN Department of Economic and Social Affairs [30],[31],[32],[33],[34],[35]. It examines the national governmental web sites most important for the citizens in each member state. It measures the web sites according to a set of indicators based on service delivery using the following criteria: basic, informational, interactive and transactional. The analysis is based on quantitative measurements related to the presence of information and services, and the maturity of the services. The survey has been running since 2002.

The survey ``Benchmark Measurement of European eGovernment services'' has been carried out by Capgemini for the European Commission since 2001 [36],[37],[38]. Its focus is measurement of the public services in 32 European countries (27 EU countries, Croatia, Iceland, Norway, Switzerland and Turkey), more specifically by benchmarking twenty online services in more than 10,000 web sites. The assessment covers basic services including tax filing, enrolling in schools, obtaining permits and so on. The results include information on the availability of the twenty services (which reached 82% in Europe), online sophistication, user experience, full online availability, portal sophistication, eProcurement visibility and eProcurement availability. In 2010, the indicators were extended to also include analysis on the sub-national level and eProcurement availability for the post-award phase.

Limitations with Current Approaches

Today all major eGovernment surveys rely strongly on manual assessment by expert testers and/or interviews. Tests carried out manually depend completely on human judgement and are influenced by human factors. An interpretation may differ from person to person and even vary from day to day, which makes the test results challenging to reproduce and verify. As an example, one of the tasks carried out in the UN eGovernment survey is to judge whether statements encouraging citizen participation are present on the evaluated web sites. A web site with such a statement will get a better score than a web site without one. However, the testers need to decide whether the statements are in fact encouraging citizens, and the testers may have different opinions. This makes it hard to repeat the tests, which has a negative impact on the reliability of the results. As a mitigation, the tests are run by multiple assessors, and only results the assessors agree upon are accepted. This increases the repeatability, but at the same time increases the effort needed to carry out a survey.

Moreover, the details on the methodologies and the results are not publicly available for any of the surveys. This makes it difficult to understand what is measured, and prevents efficient quality improvements and efficient learning from good examples.

Although most of the surveys are carried out annually, frequent and non-transparent changes to the sets of criteria make it hard to monitor the progress of a single web site.

Automatic testing based on open methods can solve some of these issues. Further, many of the tests carried out by manual assessment in the three surveys can potentially be automated. Because of this, paper A proposes to partially automate the surveys, and to make the test methodology, results and implementation publicly available. This frees up resources so that the testers can focus on tests which cannot be run automatically, enables testing on demand, and makes methods, tests and implementation open for inspection by anyone. Accordingly, the results of the surveys will become more reliable.

Section 3 presents a comparison of the advantages and disadvantages of automatic and manual testing, and section 4 a comparison of corresponding results.

Web Accessibility


Even though the major eGovernment surveys have different focuses and address different areas of eGovernment, such as specific topics or geographical or political areas, a common property of many is that they attempt to measure whether the web sites are accessible for people with special needs or disabilities [34],[35],[39],[38].

This focus is not surprising, as web accessibility has received a lot of international attention. In fact, equal rights to public information and services are part of the Universal Declaration of Human Rights published in 1948. Even though the Universal Declaration of Human Rights does not specifically address the web, these rights apply when the information and services are available online [40]. Equal rights to access public information on the Internet were in 2007 strengthened by the UN Convention on the Rights of Persons with Disabilities, signed by 143 UN member states [41],[42]. It is also addressed in the Riga ministerial declaration, where the EU member states unanimously agreed to ``Enhance eAccessibility by ... Facilitating accessibility and usability of ICT products and services for all ...'' [43],[44].

Paper F, paper G and paper H show that web accessibility barriers are common in both local government and national web sites. This often leads to significant problems for people with special needs and prevents some people from getting access to the information available online and to the public government services on the web. Paper H further shows that neither introducing accessibility laws nor signing the UN Convention on the Rights of Persons with Disabilities is by itself sufficient to avoid accessibility barriers in public web sites. Practical work is required to understand the level of adherence to the laws and the convention, and how the regulations work in practice. This requires knowledge of existing accessibility barriers, for which methods and tools for examining web accessibility are advantageous.

Defining Web Accessibility

There exist several definitions of web accessibility. This thesis is built on the definition offered by the World Wide Web Consortium (W3C); ``that people with disabilities can use the Web. More specifically, Web accessibility means that people with disabilities can perceive, understand, navigate, and interact with the Web, and that they can contribute to the Web. Web accessibility also benefits others, including older people with changing abilities due to aging.'' [45]. Following the W3C definition, if a person with a disability is unable to use a web site, the web site is inaccessible. In contrast, if the person does not encounter any problems, the web site is accessible for him or her.

Paper E and figure 2 present, based on [46],[47], accessibility as a subset of usability. Usability problems are issues affecting many users, independent of whether the users have disabilities. In contrast, accessibility barriers cause problems only for people with disabilities. This means that if a web site is not useful for anyone, it is not an accessibility issue but a usability problem.

Please note that figure 2 is a simplified model. Accessibility and usability are strongly related. Even though accessibility barriers mainly cause problems for people with disabilities, an accessible web site benefits all people. An example is links which do not describe the corresponding web page. If several of the links on a page are titled ``read more'' or similar, this poses a barrier, as the user needs the context to understand what to read more about. Some tools, such as screen readers, collect all links on a page for quick navigation [48]. This list is useless if several link texts are the same. Even though this is mainly a problem for users with disabilities, and thus an accessibility issue, the web site becomes more navigable for all when links are descriptive. Further, figure 2 presents automatic tests as a strict subset of manual tests. However, some accessibility issues are difficult to test without a tool, such as colour contrast and (x)HTML validation. Additionally, it could be argued that certain accessibility barriers are not usability issues, and that some accessibility barriers are hard to separate from usability problems [49]. Problems such as too small images or too much text may cause challenges for many users, but the impact is much worse for people with disabilities. This means that accessibility is not a proper subset of usability. Nevertheless, figure 2 gives a general overview of how accessibility is related to usability, and of what is testable automatically and manually.

Furthermore, accessibility is strongly related to universal design. Several descriptions of universal design exist [50], but a commonly accepted one is that it ``is an approach to design that incorporates products as well as building features which, to the greatest extent possible, can be used by anyone.'' [51]. Thus, the universal design concept does not separate between different groups in society, but rather treats the population as individuals with diverse characteristics. A universally designed web site is barrier free and therefore accessible for all, including people with special needs and disabilities.

Figure 2: Relationship between usability, accessibility, manual and automatic testing.


Accessibility Testing

There are several approaches for testing the accessibility of web sites [52]. The three most common test approaches are: user testing with people with disabilities, expert evaluation based on check lists, and automatic testing with tools.

It is debated whether to check accessibility with user testing or with the check list approach using expert testers. As argued in paper E, a combination of both, supplemented by automatic testing, is in general seen as the most viable approach [53]. Some tests, such as checking for valid (x)HTML, are easy to carry out automatically, while other tests, such as checking for proper descriptions of form elements, need human judgement and are best carried out by experts. Finally, to make sure that all usability and accessibility issues on a web site have been properly addressed, testing with real users is needed.

Accessibility Guidelines and Methodologies

Web Content Accessibility Guidelines

In 1999 the W3C introduced the Web Content Accessibility Guidelines (WCAG 1.0) [54] as guidance on how to make the W3C technologies (x)HTML and CSS accessible for people with disabilities. The guidelines rely upon common web site practices and knowledge of how people with disabilities use web sites and their assistive technologies. WCAG 1.0 quickly became the de facto standard for web accessibility [55].

Since 1999, web technologies have changed fundamentally. Web pages are no longer limited to (x)HTML and CSS, but rely upon technologies not covered and not allowed in WCAG 1.0, such as client side scripting and PDF documents. In addition, there have been significant improvements to assistive technologies since the introduction of WCAG 1.0. In 2008 WCAG 1.0 was superseded by the Web Content Accessibility Guidelines 2.0 (WCAG 2.0) [56]. The main differences between WCAG 1.0 and WCAG 2.0 are technology independence, separate techniques which are continuously updated, and the introduction of sufficient techniques and common failures [57].

Paper H and table 1 describe tests according to WCAG 1.0 and map the tests to WCAG 2.0. Section 3 presents how these tests are used.

Category | UWEM 1.2 | WCAG 1.0 | WCAG 2.0 | Short Description
Alternative text | 1.1.HTML.01 | 1.1 | 1.1.1 | Non-text content without text equivalent.
 | 1.1.HTML.06 | 1.1 | 1.1.1 | Non-text elements embedded using the embed element (which does not support a textual alternative).
 | 12.1.HTML.01 | 12.1 | | Frames without description.
Valid technology | 3.2.HTML.02 | 3.2 | | Invalid (X)HTML.
 | 3.2.CSS.01 | 3.2 | | Invalid CSS used.
 | 3.2.HTML.01 | 3.2 | | No valid doctype found.
Latest technology | 11.2.HTML.01 | 11.2 | - | Deprecated (X)HTML elements.
 | 11.2.HTML.02 | 11.2 | - | Deprecated (X)HTML attributes.
 | 11.1.HTML.01 | 11.1 | - | Latest W3C technology is not used.
Non-descriptive links | 13.1.HTML.01 | 13.1 | | Links with the same title but different targets.
Mouse required | 6.4.HTML.01 | 6.4 | 2.1.3 | Mouse (or similar) required for navigation.
Blinking or moving content | 7.2.HTML.01 | 7.2 | | Blink element used.
 | 7.2.CSS.02 | 7.2 | | Blink property used in CSS.
 | 7.3.HTML.01 | 7.3 | | Marquee element used.
 | 7.4.HTML.01 | 7.4 | 2.2.4 | Page refreshing used.
Missing labels or legends in form elements | 3.5.HTML.03 | 3.5 | | Levels are skipped in the heading hierarchy.
 | 12.3.HTML.01 | 12.3 | | Fieldset without legend.
Refresh and redirection | 7.4.HTML.01 | 7.4 | 2.2.4 | Page refreshing used.
 | 7.5.HTML.01 | 7.5 | 3.2.5 | Page redirection used.
Numbered list simulated | 3.6.HTML.03 | 3.6 | | Numbered list simulated.

Table 1: List of automatic accessibility tests based on UWEM 1.2 and WCAG 1.0.
Unified Web Evaluation Methodology

WCAG 1.0 and WCAG 2.0 are guidelines for making web sites accessible, not methodologies for measuring accessibility. Despite this, many surveys rely on WCAG as a method of measurement. To carry out an eGovernment analysis based on WCAG, additional specifications are needed, such as which pages to test, how to present the results and so on. Because of this lack of evaluation methodology in WCAG, independent methodologies have been devised to carry out accessibility surveys. This leads to subtle but important differences between accessibility surveys. It is therefore hard to compare results between such surveys, although they are based on the same guidelines.

As a mitigation, the Unified Web Evaluation Methodology (UWEM 1.2) was introduced by the European Commission [59],[60],[61]. It introduces a set of tests based on WCAG 1.0; 80% of the tests can only be carried out manually by expert testers, while 20% can also be implemented and run automatically. In addition to formally defined tests, it provides a methodology for sampling and for presentation of the results. UWEM 1.2 can either be used by experts to manually check web sites, or as the basis of an automatic tool. The manual approach aims at applying all tests to a few important pages from each web site. In contrast, the automatic approach selects up to 600 web pages based on near uniform random sampling, and applies the 20% of the tests which can be run automatically. Paper B presents details about UWEM 1.2 with a focus on the automatic testing, including practical implementation information such as sample size, which web pages to download and how to present the results.

Automatic Testing


Automatic testing of eGovernment web sites means applying software applications and algorithms to automatically retrieve quantitative information based on the content and/or structure of web pages and web sites. This could, for example, mean applying automatic tests for the detection of web accessibility barriers. For a comparison between automatic and manual evaluation results for web accessibility, see section 4.

Pros and Cons

As shown in paper G, there are pros and cons of using automatic testing in eGovernment surveys versus manual assessment. The main disadvantages of automatic testing of accessibility are a relatively high startup cost, and the fact that many tests require human judgement and can therefore not be automated. The main advantages are that automatic tests are objective, can be run frequently and on demand with a minimum of human interaction, can cover many more web pages and sites, and have a lower cost per evaluation.

As shown in paper D, automatic web testing can further be divided into two main groups: (1) deterministic tests and (2) heuristic tests, for example built as learning algorithms. A deterministic test is based on formal rules and gives an absolute result, such as whether an (x)HTML tag has a certain attribute. In contrast, heuristic tests are based on algorithms utilising training data and features of web pages, and produce results often based on likelihood or probability. Examples of heuristic accessibility tests are testing for navigability [62],[63], understandable page structures [64] and whether the text on a web page is easy to read for people with reading difficulties such as dyslexia [65].

Deterministic Web Accessibility Testing

Deterministic tests produce absolute results, as formal rules are applied. An example of a deterministic test is checking for the presence of the alt attribute in <img> elements. If the alt attribute exists, the result is positive; otherwise the result is negative, indicating an accessibility barrier which can cause problems for people with disabilities. The deterministic accessibility tests are listed in paper H and table 1.
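A deterministic alt-attribute check of this kind can be sketched in a few lines of Python using the standard library's HTML parser. This is an illustrative stand-in, not the eGovMon implementation; the class and function names are hypothetical.

```python
from html.parser import HTMLParser

class AltTextChecker(HTMLParser):
    """Counts <img> elements, and how many of them lack an alt attribute."""
    def __init__(self):
        super().__init__()
        self.applied = 0   # img elements the test was applied to
        self.barriers = 0  # img elements without an alt attribute

    def handle_starttag(self, tag, attrs):
        if tag == "img":
            self.applied += 1
            if "alt" not in dict(attrs):
                self.barriers += 1

def check_page(html: str):
    """Return (tests applied, barriers found) for one page's markup."""
    checker = AltTextChecker()
    checker.feed(html)
    return checker.applied, checker.barriers

applied, barriers = check_page('<p><img src="a.png" alt="Logo"><img src="b.png"></p>')
# applied == 2, barriers == 1: the second image has no alt attribute
```

Because the rule is purely formal, the same input always yields the same counts, which is exactly what makes such tests repeatable at scale.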

Figure 3: Architecture overview.
Large Scale Testing

Paper B presents an architecture and reference implementation for automatic checking of accessibility for web sites using (x)HTML and CSS technologies. Table 1 presents the tests used in the software.

The fully automatic tool evaluates web sites in line with UWEM 1.2 in a statistically sound way. The architecture consists of a breadth first crawler which identifies 6000 web pages from the link structure of each site and writes the URLs to a URL repository. If the site consists of fewer than 6000 pages, it is exhaustively scanned and all web pages are downloaded. Subsequently, 600 web pages are randomly selected from the set of downloaded web pages. These pages are sent to a Web Accessibility Metric (WAM) component for evaluation. The WAM returns the results as EARL/RDF [67], which are written to custom made RDF databases. Thereafter an Extract Transform Load (ETL) component reads the URL repository and the RDF databases and writes the results to a data staging scheme of a Data Warehouse. When all evaluations have been completed, the data is aggregated to make it quickly available in a user interface. This approach enables large scale testing of the accessibility of a large number of web sites, for example comparing the accessibility of web sites between municipalities, countries and continents.
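The crawl-then-sample strategy can be illustrated with the following sketch. The link structure is given as a plain dictionary rather than live HTTP fetching, and the function names are hypothetical; it only demonstrates the breadth first traversal with a page limit and the uniform random selection.

```python
import random
from collections import deque

MAX_CRAWL = 6000   # pages identified per site, per the architecture
SAMPLE_SIZE = 600  # pages randomly selected for evaluation

def breadth_first_crawl(start, links, limit=MAX_CRAWL):
    """Breadth first traversal of a site's link structure.
    `links` maps each URL to the URLs it links to (a stand-in for real fetching)."""
    seen, queue, order = {start}, deque([start]), []
    while queue and len(order) < limit:
        url = queue.popleft()
        order.append(url)
        for nxt in links.get(url, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return order

def sample_pages(pages, k=SAMPLE_SIZE):
    """Uniform random sample; a site smaller than k is taken exhaustively."""
    if len(pages) <= k:
        return list(pages)
    return random.sample(pages, k)

# Toy site: five pages reachable from the front page.
site = {"/": ["/a", "/b"], "/a": ["/c"], "/b": ["/c", "/d"]}
crawled = breadth_first_crawl("/", site)
sampled = sample_pages(crawled)
# crawled == ["/", "/a", "/b", "/c", "/d"]; all five pages end up in the sample
```

Breadth first order matters here: it favours pages close to the front page, so even a truncated crawl of a huge site covers the most linked-to parts before the limit is reached.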

Paper F updates and simplifies the initial architecture; figure 3 shows the updated architecture. Instead of using three databases with much overlapping data (a URL repository, RDF databases and a Data Warehouse), the architecture relies upon only one database. In the process, the use of RDF in the WAM, the ETL and the data staging was dropped, resulting in significant performance improvements [68],[69]. The updated architecture includes deterministic accessibility tests, a sampler, a breadth first crawler, a database and a presentation layer.

The tool relies upon automatic deterministic tests of (x)HTML and CSS content. A complete list of the deterministic tests based on UWEM 1.2 and WCAG 1.0 is given in paper H and table 1.

Paper B and paper G use the web site score calculation from UWEM 1.2. UWEM 1.2 defines the score as the number of tests with fail results (number of barriers) divided by the number of applied tests, formally as

$F(s) = \frac{\sum_{p \in s} B_{p}}{\sum_{p \in s} N_{p}}$

where $s$ is a web site, $p$ a page within the web site, $N_{p}$ the number of tests applied to page $p$ and $B_{p}$ the number of detected barriers on page $p$. This means that a lower value of $F(s)$ represents a more accessible web site. The score can be interpreted as an error rate.

Paper H proposes a way to calculate an aggregated score beyond web site level. It introduces statistically sound web accessibility scores on regional, ministerial, country and international levels.

UWEM 1.2 presents regional scores as arithmetic means of web site scores. This approach does not take into account that different sub-regions have different numbers of web sites, and it gives equal weight to all web sites independent of size and importance. It is theoretically possible to significantly improve the UWEM 1.2 score for a region by introducing a very accessible web site without any content. In contrast, since the regional scoring system in paper H uses the number of tests and barriers at all levels, all evaluated web sites need to have actual content to affect the score.

On a country level, the score introduced in paper H is as follows:

$F(C) = \frac{\sum_{s \in C} B_{s}}{\sum_{s \in C} N_{s}}$

where $C$ is a country, $s$ a web site in that country, $N_{s}$ the number of tests applied in site $s$ and $B_{s}$ the number of barriers detected in site $s$. The score calculation follows the same pattern for other groups of web sites, such as all web sites from a ministry, a continent, or any other political or geographical region.
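Assuming each page or site is represented simply as a pair of counts (tests applied, barriers detected), both scores reduce to a pooled ratio. The following hypothetical sketch also shows why the pooled country score differs from a plain arithmetic mean of site scores, which is the weakness of the UWEM 1.2 regional mean noted above.

```python
def site_score(pages):
    """UWEM 1.2-style site score F(s): barriers divided by applied tests,
    pooled over all evaluated pages. Lower means more accessible."""
    applied = sum(n for n, _ in pages)
    barriers = sum(b for _, b in pages)
    return barriers / applied

def country_score(sites):
    """Aggregated score F(C) over a group of sites (country, ministry, region):
    total barriers divided by total applied tests, so larger sites weigh more."""
    applied = sum(n for n, _ in sites)
    barriers = sum(b for _, b in sites)
    return barriers / applied

# Two sites, each as (tests applied, barriers found).
sites = [(1000, 50), (200, 40)]
pooled = country_score(sites)                  # 90 / 1200 = 0.075
naive = (50 / 1000 + 40 / 200) / 2             # mean of site scores = 0.125
# The pooled score weights the large site by its volume of tests,
# while the plain mean lets the small, inaccessible site dominate.
```

A site with no content contributes no applied tests and therefore cannot lower the pooled score, which is exactly the property the aggregated score is designed to have.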

Web Pages

Paper B addresses large scale accessibility assessment targeting policy makers, not web site maintainers and owners. Web sites are oftentimes not maintained by software developers, and running the tool introduced in paper B requires detailed technical insight beyond what is common among web site maintainers. Thus, with the approach in paper B, web site maintainers who want to investigate and improve their accessibility need to rely on an expert to run the software. Further, to deal with web accessibility barriers in practice, it is useful to address the problems on web page level. This allows users to input the URL of a web page and get detailed information about which problems exist, including code extracts and examples of how to remove the barriers.

To address this need, paper F introduces a single page checker giving users an interactive approach to removing accessibility barriers, shown in figure 3 as ``checks on pages'', available online at

PDF Documents

The web does not only consist of pages using (x)HTML and CSS technologies, but also of a significant number of static documents. The Portable Document Format (PDF) has become the de facto standard for static documents both in government and in the private sector.

An example of this is the use of PDF documents in local Norwegian governments. Norway has a law requiring that documents published online should be in an open format, which by its definition includes PDF documents [71]. Further, the Norwegian freedom of information legislation (Offentlighetsloven) states that interaction between citizens and local government of public interest should be made public [72]. Common practice is therefore that many important governmental documents are published online, often in PDF format. Thus, PDF documents are of critical importance for citizens, and should be made accessible for all users.

Paper C extends the architecture in figure 3 with testing of PDF documents. It uses the same crawling and sampling strategy as shown in paper B and paper F, and includes both testing of single PDF documents and web sites containing PDF files. Testing of single PDF files is available online at

The paper introduces four main tests based on [73],[74]:

Note that paper C was published prior to the PDF techniques of WCAG 2.0. In May 2011 W3C published a draft version of WCAG 2.0 techniques on PDF. Further note that the PDF checking software has been extended with additional tests beyond what is presented in paper C. A complete list of the current tests is available at:

Testing Based on Learning Algorithms

Testing based on learning algorithms is not common in web accessibility. A discrete classifier is a learned function that performs a classification task based on supervised training data, which is often manually classified. An example is the classification of web pages as either accessible or not accessible [63]. Such an approach uses the features and terms in the training data to calculate the likelihood of a new page belonging to either class. The classification result is the class the new document has the highest probability of belonging to.

Accessibility Testing

To improve the web accessibility of a web site, the individual barriers to be fixed need to be addressed. Manual tests can locate more types of barriers than automatic assessment, but are much more resource-intensive and costly. It is therefore interesting to broaden the scope of what is automatically testable.

Automatic accessibility testing is mostly based on deterministic tests, which has its limitations.10 For example, UWEM 1.2 defines an automatic test on whether an <img> element has an alt attribute, but no automatic test for judging whether the alternative text describes the corresponding image. Accordingly, any alternative text is sufficient to satisfy some automatic web accessibility tests, such as the automatic UWEM 1.2 tests. Some tools will generate an alternative text unless the user specifies one, and these texts may give little information to a person who is unable to see the image.

Paper D shows that the above and similar examples occur frequently. Such in-accessible alternative texts cause significant challenges for people who are unable to see images, such as blind people relying on screen reader technologies. Additionally, they cause problems for software that categorises images, such as the Google crawler [77].
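A deterministic UWEM-style check of this kind can be sketched in a few lines of Python, here using the standard library's html.parser. This is a simplified illustration, not the actual eGovMon checker: it flags <img> elements lacking an alt attribute, but accepts any alternative text, however uninformative.

```python
from html.parser import HTMLParser

class ImgAltChecker(HTMLParser):
    """Flags <img> elements without an alt attribute (a UWEM-style deterministic test)."""
    def __init__(self):
        super().__init__()
        self.barriers = []  # positions of <img> tags failing the test

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs with lower-cased names
        if tag == "img" and "alt" not in dict(attrs):
            self.barriers.append(self.getpos())  # (line, column) of the barrier

checker = ImgAltChecker()
checker.feed('<p><img src="logo.png" alt="Company logo">'
             '<img src="spacer.gif"></p>')
print(len(checker.barriers))  # → 1 (the second <img> lacks alt)
```

Note that an <img> with alt="" passes this test; judging whether an empty or placeholder text is appropriate is precisely what deterministic tests cannot do, which motivates the classifier-based approach below.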

The paper also introduces a new form of accessibility testing using classifiers, shown in the architecture in figure 3 as ``classification (x)HTML tests''. In contrast to deterministic tests, the algorithms presented in paper D are able to judge the validity of alternative texts. This is done by calculating the likelihood of an alternative text being accessible or in-accessible based on pre-collected training material.

Two approaches for judging alternative texts are presented:

  1. The first approach uses predefined features known from the state of the art to represent in-accessible alternative texts [78],[79],[80]. These features include the length of the text, the use of known in-accessible phrases such as ``insert alternative text here'', the use of file type abbreviations, and so on. The feature occurrences are placed in a Euclidean space and the classification is done with a nearest neighbour algorithm.
  2. The second approach uses a bag of words with the frequency of the individual words in the alternative texts. The words in the bag are ordered according to their frequency of occurrence and the classification is done with a Naïve Bayes algorithm.
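A minimal sketch of the first approach could look as follows. The feature set, phrase list and tiny training set here are illustrative stand-ins, not the ones actually used in paper D:

```python
import math

# Hypothetical features inspired by the description above: text length,
# presence of known in-accessible phrases, presence of file-type abbreviations.
BAD_PHRASES = ("insert alternative text", "image", "picture")
FILE_EXTS = (".gif", ".jpg", ".png")

def features(alt_text):
    t = alt_text.lower().strip()
    return (
        len(t),                                 # length of the text
        sum(p in t for p in BAD_PHRASES),       # known in-accessible phrases
        sum(t.endswith(e) for e in FILE_EXTS),  # file-type abbreviations
    )

# Tiny illustrative training set (label True = accessible).
TRAIN = [
    ("Portrait of the minister of finance", True),
    ("Map showing the route to the town hall", True),
    ("insert alternative text here", False),
    ("logo.gif", False),
]

def classify(alt_text):
    """1-nearest-neighbour classification in the Euclidean feature space."""
    fx = features(alt_text)
    _, label = min(
        ((math.dist(fx, features(t)), lab) for t, lab in TRAIN),
        key=lambda pair: pair[0],
    )
    return label

print(classify("photo.jpg"))                              # → False (nearest is "logo.gif")
print(classify("Aerial photo of the harbour area in Oslo"))  # → True
```

In practice the features would be normalised before computing distances, and the training set would contain a large number of manually labelled alternative texts; otherwise the raw text length dominates the Euclidean distance, as it does in this toy example.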

Table 2 presents the classification results for both approaches. The classifiers reach a similar accuracy close to 90% (approach 1: 90.0%, approach 2: 89.1%). The second approach is less dependent on language-specific terms such as known in-accessible phrases. It is therefore easier to adapt to other languages by simply adding appropriate training data. Hence, the second approach requires less practical work.

Predicted ↓ / Actual →     Accessible          In-accessible
                           App. 1    App. 2    App. 1    App. 2
Accessible                 93.9%     92.6%     6.1%      7.4%
In-accessible              27.2%     10.9%     72.8%     89.1%
Overall accuracy           90.0%     89.1%
Confusion matrix with classification accuracies for approach 1, using state-of-the-art features, and approach 2, using a bag of words.

By showing that alternative texts can be accurately judged with classification techniques, paper D demonstrates that classification and learning algorithms have the potential to work well for accessibility tests which in general require human judgement. This can significantly increase what is testable automatically.

Note that the approach in paper D judges the alternative texts, but not the corresponding images. This has some limitations. For example, a picture of a dog could have the wrong alternative text ``cat'' and therefore be in-accessible, while another picture, of a cat, may correctly have the alternative text ``cat''. Further, the alternative text ``cat'' may be appropriate in a web page with simple presentations of barnyard animals, but inadequate in a web page on pedigree cats, which presents cats at a different level of detail. Since there is no image processing, nor any processing of the web page beyond the alternative text, the algorithms in paper D cannot judge whether the alternative text ``cat'' accurately describes the image.

Locating Services

In addition to web accessibility checks, eGovernment surveys commonly include quantitative data on the services and information provided by the evaluated government web sites. To collect this data, testers are often asked to systematically locate information online. Due to the human factors involved, this kind of testing generally needs a verification process involving multiple users.

There are two main methods for collecting quantitative data on the availability of services. The most common method is testing for the existence of material on a governmental web site, independent of whether the information can be located by users. Some surveys take this a step further and include whether the sought-after material can be found by actual users. A typical scenario for the latter is to let representative users try to locate material within a given time frame. Only the material actually found by the users within that time frame contributes to positive results [81]. Paper J formally defines these types of tests as existence tests and findability tests.

The human resources needed to carry out these tests result in two essential limitations:

  1. It limits the scope of the majority of eGovernment surveys to a small set of web sites: either a representative selection of web sites [35],[38],[29],[21], a subset of web sites from e.g. one political or geographical area [81],[82], or web sites on one specific topic [6].
  2. It limits who can carry out large scale web assessment to big organisations such as the UN or Capgemini [35],[38],[83],[84] (see paper A).

As a mitigation, paper I and paper J outline how to automate the process of locating services. Paper I frames the detection of material in a web site as an information retrieval problem: locating a single web page within a directed labelled graph when the web page to be located is unobservable. An unobservable web page means that the algorithm cannot know, without supervision, whether the correct web page has been found. This is similar to classification problems where the classifier cannot know without supervision whether it has made a correct classification.

Many have shown and used the correlations between the content of a web page and the pages it links to [85],[86]. Similarly, paper I utilises the fact that anchor texts, the textual parts of web page links, can accurately predict the content of the corresponding web pages [87], and introduces a novel algorithm for locating web pages, called the lost sheep. In order to minimise the number of web pages downloaded, it uses the link text as a pre-classifier to decide whether the corresponding web page should be downloaded and classified. The goal of the algorithm is therefore twofold: locate the sought web page with a high degree of accuracy, and minimise the number of downloads.
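The core idea, using anchor texts as a cheap pre-classifier so that only promising links are downloaded, can be sketched as follows. This is a simplified illustration of the strategy, not the actual lost sheep algorithm; the keyword list, scoring function, threshold and toy web site are all stand-ins:

```python
from collections import deque

KEYWORDS = {"budget", "annual", "economy"}  # hypothetical training terms for the target category

def anchor_score(anchor_text):
    """Cheap pre-classifier: fraction of target keywords present in the link text."""
    words = set(anchor_text.lower().split())
    return len(words & KEYWORDS) / len(KEYWORDS)

def locate(start_url, get_links, is_target, threshold=0.3):
    """Search that only downloads pages whose incoming anchor text
    scores above `threshold`, minimising the number of downloads."""
    queue = deque([(start_url, 1.0)])   # (url, score of the anchor pointing to it)
    seen, downloads = {start_url}, 0
    while queue:
        url, _ = queue.popleft()
        downloads += 1                  # only now is the page actually fetched
        if is_target(url):
            return url, downloads
        for link_url, anchor in get_links(url):
            if link_url not in seen and anchor_score(anchor) >= threshold:
                seen.add(link_url)
                queue.append((link_url, anchor_score(anchor)))
    return None, downloads

# Toy web site: url -> list of (linked url, anchor text)
SITE = {
    "/":        [("/news", "Latest news"), ("/economy", "Economy and budget")],
    "/news":    [("/news/1", "Storm closes school")],
    "/economy": [("/economy/budget2011", "Annual budget 2011")],
}
found, n = locate("/", lambda u: SITE.get(u, []),
                  lambda u: u == "/economy/budget2011")
print(found, n)  # → /economy/budget2011 3 (found after only three downloads)
```

The real algorithm additionally classifies the downloaded pages with an HMM-based classifier and visits the highest-scoring links first; a plain FIFO queue keeps the sketch short.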

Figures 4 and 5 present the accuracy and the number of downloaded pages for the lost sheep (LS 2F S=100) and comparable algorithms in a synthetic environment [88]. The data in figure 4 show that the lost sheep reaches a higher accuracy than all compared algorithms [89],[90],[91],[92],[93], and that the lost sheep is less sensitive to web site size: the performance of the other algorithms decreases rapidly as the size of the web site increases.

The algorithms evaluated in this experiment, shown in figures 4 and 5, are: the upper bound optimal, the lower bound random walk (RW), a probabilistic focused crawler (FOCUSED) [94], similarity search (SIM) [89], similarity and degree based search (SIMDEG) [89], A* with a similarity search heuristic estimate, a hidden Markov model based classifier with 100 internal states (HMM S=100) [95], an HMM classifier where the transitions are in a non-linear space (HMM 2F S=100), the lost sheep with an HMM classifier (LS S=100) and the lost sheep with a non-linear HMM classifier (LS 2F S=100).

Additionally, figure 5 shows that the lost sheep downloads fewer web pages than most of the comparable algorithms. Some algorithms download fewer web pages than the lost sheep (SIM, SIMDEG and OPTIMAL). However, SIM and SIMDEG both have lower accuracy than the lost sheep. Thus, of the evaluated algorithms, the lost sheep is the one best able to combine the two goals.

Note that the two goals, increasing accuracy and downloading fewer pages, are connected. In general, an algorithm that downloads fewer web pages should, as long as the target page is among the downloaded pages, achieve a higher accuracy, because fewer pages make the classification problem easier. This is observable for the lost sheep, but not for SIM and SIMDEG, which indicates that SIM and SIMDEG download irrelevant pages.

Accuracy of the lost sheep and comparable algorithms in synthetic environments.


The number of downloaded pages of the lost sheep and comparable algorithms in synthetic environments.


Paper J takes the lost sheep one step further and applies the algorithm in real web site environments to locate information and services. This is shown in the architecture in figure 3 as ``lost sheep''.

The paper uses the lost sheep to automatically provide existence results with an accuracy of 85%. By defining an acceptable number of clicks, paper J reports whether the page is reachable within this number of clicks, and in this way provides findability results.11
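The findability criterion, reachable within an acceptable number of clicks, reduces to measuring shortest-path depth in the link graph with a breadth-first search. A minimal sketch, with a toy link graph and the three-click threshold used in paper J:

```python
from collections import deque

def click_depth(links, start, target):
    """Return the minimum number of clicks from `start` to `target`,
    or None if the target page is unreachable."""
    queue, seen = deque([(start, 0)]), {start}
    while queue:
        page, depth = queue.popleft()
        if page == target:
            return depth
        for nxt in links.get(page, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, depth + 1))
    return None

# Toy link graph of a municipal site.
LINKS = {"home": ["services", "news"],
         "services": ["mail-record"],
         "news": []}
depth = click_depth(LINKS, "home", "mail-record")
print(depth is not None and depth <= 3)  # → True (findable within 3 clicks)
```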

Another challenge with eGovernment measurements is that many governmental web pages are in the deep web [96]. Web pages in the deep web cannot be indexed by search engine crawlers because they are located behind password protection or require some other user interaction, such as filling out a form, to be located. In addition, some web pages are not available in search engines because they are updated so frequently that by the time search engines index them, they are already outdated [97].

Paper J presents the idea of live automatic eGovernment testing. Prior to the testing, training data for the services to be found should be provided. When a live test is carried out, a tester enters the URL to the web site to be evaluated. Subsequently, the algorithm is carried out live on the web site and the results presented accordingly. For this to be possible, the algorithms need to run fast so that the report can be returned shortly after the request.

This means that live search has two main advantages for eGovernment testing. Firstly, it produces results on demand. Secondly, since live searching is initiated by a user in an interactive environment, it can be used for crawling the deep web: the algorithm can prompt for passwords, forms to be filled out, and so on, which can give access to pages in the deep web. Paper J shows that the lost sheep runs much faster than the comparable algorithms, and therefore works better for live searching.

The main contributions of the lost sheep algorithm are:

Prior to using the lost sheep, training data for each category needs to be provided. To run the algorithm, the testers then only need to provide the web sites to be evaluated. The results from the lost sheep can be used directly, as input to expert testers, or as priorities indicating which tests should be carried out manually.

To examine how the lost sheep works in practice, paper J defines 13 realistic tasks on locating services and information on the 427 Norwegian municipality web sites.12 These tasks are based on commonly assessed government transparency material in the literature [99],[6],[5],[100].

The tasks are defined so that there is at most one page within the web site that matches the criteria, for example locating the local government budget for the current year.13 The aim for the algorithms is therefore to locate the sought after page, or report that the page cannot be found if it does not exist.

To verify the results, the tasks are also carried out manually. Of the 13 tasks, two are assessed completely manually. To verify the remaining 11 tasks, a random selection of 10% of the sites (43) is assessed manually. If the output from the algorithm matches the manual assessment, it is counted as a correct classification; otherwise as incorrect. The accuracy is defined as the number of correct assessments over the total number of assessments.

Figure 6 presents the accuracy of the lost sheep and comparable algorithms as box plots. This includes the lost sheep with a hidden Markov model based classifier (LS LA), the lost sheep with a hidden Markov model based classifier where the transitions are in a non-linear space (LS NLLA), the lost sheep using cosine similarity search (LS SIM), the random walk (RW), similarity based search (SIM) and the focused crawler shark search (SS). The black line depicts the median value of the 13 tasks, and the box is drawn between the quartiles (from the 75th percentile to the 25th percentile). The dashed lines extend to the minimum and maximum values, excluding outliers.

The lost sheep is designed to minimise the number of web pages downloaded. Figure 7 presents the number of downloaded pages per algorithm. The median size of the evaluated web sites is 26 000 pages. Note that some algorithms are not adaptive and download a fixed number of pages per site; these are not part of figure 7.

Accuracy of algorithms based on 13 tasks and 427 local government web sites. (Results retrieved April 2011.)


The number of downloaded pages based on 13 tasks and 427 local government web sites. (Results retrieved April 2011.)


While figures 4 and 5 show that the lost sheep outperforms comparable algorithms in a synthetic environment, figures 6 and 7 show that the same holds in a realistic setting: real web sites and realistic tasks that are part of eGovernment surveys.

Accessibility Testing with the Lost Sheep

Paper J uses the lost sheep as a technique to check the availability of online services and information. The lost sheep can similarly be used to test issues related to accessibility for people with special needs, including:

Automatic versus Manual Accessibility Testing


Due to the low percentage of accessibility tests which can be assessed automatically, critics claim that automatic accessibility evaluations are not able to produce results representing the experienced level of accessibility of a web site [101].

As argued in paper E, for testing whether a web site is accessible, or for conformance testing, all conceivable accessibility tests need to be applied. This is not possible using automatic tests alone. Hence, the 20% of the tests which can be run automatically are not sufficient for conformance testing [61].

Using Automatic Results to Predict the Manual Results


Despite the claims that automatic accessibility testing has limited practical usefulness, paper E shows that automatic tests are sufficient to reliably calculate an accessibility score for a web site. This is done using the UWEM 1.2 web site accessibility score as follows. First, let $T$ be all accessibility tests in UWEM 1.2, and let $t$ be the 20% of the tests in $T$ which can run automatically. Further, let $F(S)$ be the accessibility score for site $S$ based on $T$, and $f(S)$ be the accessibility score for site $S$ using only $t$. The calculations of $F(S)$ and $f(S)$ are otherwise identical, as defined in section 3. By utilising the correlation between $F(S)$ and $f(S)$, paper E presents an approach using cubic regression with a confidence interval based on the estimated standard error. With this approach, $F(S)$ could in 73% of the cases be calculated automatically, within the confidence interval, based only on $f(S)$. This eliminates all tests in $T$ not in $t$ when calculating an accessibility score at web site level.
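The regression step can be sketched with NumPy as follows. The data below is synthetic and the functional relationship between $f(S)$ and $F(S)$ is invented for illustration; paper E fits real UWEM 1.2 scores:

```python
import numpy as np

# Synthetic data: f(S) = automatic-only score, F(S) = full score, per site.
rng = np.random.default_rng(0)
f = rng.uniform(0.0, 0.4, 40)                          # automatic scores f(S)
F = 1.8 * f + 0.5 * f ** 2 + rng.normal(0, 0.01, 40)   # invented full scores F(S)

coeffs = np.polyfit(f, F, deg=3)                       # cubic regression F(S) ~ p(f(S))
residuals = F - np.polyval(coeffs, f)
se = residuals.std(ddof=4)                             # standard error (4 fitted parameters)

def predict(f_new, z=1.96):
    """Predict F(S) from f(S) with an approximate 95% confidence interval."""
    centre = np.polyval(coeffs, f_new)
    return centre - z * se, centre + z * se

lo, hi = predict(0.2)
print(round(lo, 3), round(hi, 3))
```

A new site's full score $F(S)$ is then reported as lying inside this interval; in paper E the true score fell within the interval in 73% of the cases.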

Thus, if the aim is only to calculate a score for a web site based on the complete set of accessibility tests, automatic testing is in 73% of the cases sufficient.

It should be noted that only 30 pages from two sites, using 5 different templates, are used in this study. This small data set is not enough to prove that automatic testing is sufficient, but it gives a strong indication. Applying the same study to additional and different web pages, web sites and templates could yield different results.

Comparison at Web Site Level

The Norwegian agency for Public Management and eGovernment (Difi) carries out an annual measurement of public web sites. This includes assessment of web sites from all 430 Norwegian municipalities [82]. The results are based on three distinct categories: accessibility, user adaptation and useful content.

Paper F and figure 8 compare the accessibility results from manual assessment by Difi, retrieved at the end of 2008, with automatically retrieved results from 414 of the 430 Norwegian municipalities, retrieved in January 2009. Even though the two data sets are retrieved with completely different methods, expert testers versus automatic tools, there is a strong correlation. The data shows that the fewer barriers detected automatically in a web site, the better the score the site gets from Difi. The only exception is web sites that are categorised as exceptionally good by Difi and received six out of six stars in the scoring system.

A conclusion based on this data is that automatic testing alone can strongly suggest which web sites are in-accessible, but cannot detect the most accessible sites.

Comparison of manually retrieved results by Difi and automatically calculated accessibility results of 414 municipality web sites. (The Difi results were collected in the second half of 2008; the automatically calculated results were retrieved in January 2009.)


Comparison Over Large Regions

Paper G presents measurement results from an evaluation of more than 2300 European web sites and compares the data to other European web accessibility studies. The high-level comparison is done by categorising the results as pass, limited pass, marginal fail and fail. The categorisation is based on the percentage of barriers over applied tests (see section 3). Limited pass means passing all automatic tests; if a site passes both the manual and the automatic accessibility tests, it gets a pass score. The data shows that the distributions are very similar between the automatically retrieved results using the eGovMon framework and the data from the other studies, MeAC 2007 [102] and Cabinet Office 2005 [103].14
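The four-way categorisation can be expressed as a small decision function. The threshold separating marginal fail from fail below is a placeholder value, not the actual cut-off used in paper G, which is defined from the percentage of barriers over applied tests (section 3):

```python
def categorise(barrier_rate, passed_all_automatic, passed_all_manual,
               marginal_threshold=0.1):
    """Four-way categorisation of a site's accessibility result.
    `barrier_rate` is barriers over applied tests; `marginal_threshold`
    is a placeholder, not the cut-off used in the actual study."""
    if passed_all_automatic and passed_all_manual:
        return "pass"
    if passed_all_automatic:
        return "limited pass"   # all automatic tests pass, manual tests do not
    return "marginal fail" if barrier_rate <= marginal_threshold else "fail"

print(categorise(0.0, True, False))    # → limited pass
print(categorise(0.04, False, False))  # → marginal fail
```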

Note that the evaluations are not done on the same web sites. The automatic checking evaluates web sites that are part of the European surveys carried out by Capgemini titled ``Online Availability of Public Services: How Is Europe Progressing?'' [36]. Even so, the web sites in all studies are intended to be representative.

The number of sites that fail either marginally or completely is roughly the same in all studies. In contrast, the number of web sites categorised as limited pass is substantially lower in eGovMon. A limited pass means passing all automatic accessibility tests, but not necessarily the tests run manually by experts.

There are several factors which contribute to this result:

eGovernment Findings


A substantial part of eGovernment benchmarking is presenting the results. In addition to methods for measuring web accessibility and locating services, paper C, paper F, paper G, paper H and paper J present results from measurements of government web sites.


Paper F and figure 9 present accessibility results of Norwegian municipality web sites aggregated into county scores. There are significant differences in the level of web accessibility between municipalities, which gives considerable differences between counties: the average share of tests detecting barriers varies from 10% in the county of Oslo up to 37% in the county of Aust-Agder.
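Aggregating municipality scores into county scores is a plain grouped mean over the per-site error rates (section 3). A minimal sketch, with made-up numbers rather than the actual measurement results:

```python
from collections import defaultdict

# (county, site error rate) pairs -- illustrative values only.
results = [("Oslo", 0.10),
           ("Aust-Agder", 0.35), ("Aust-Agder", 0.39),
           ("Vest-Agder", 0.22), ("Vest-Agder", 0.18)]

by_county = defaultdict(list)
for county, error_rate in results:
    by_county[county].append(error_rate)

# County score = mean error rate over the county's municipality sites.
county_scores = {c: sum(v) / len(v) for c, v in by_county.items()}
print(round(county_scores["Aust-Agder"], 2))  # → 0.37
```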

Map of Norway based on the accessibility scores (error rate as explained in section 3) of 414 municipality web sites. (Results retrieved January 2009.)


Comparison of score and regions. The number of web sites is in parenthesis. (Results retrieved November 2008.)


Paper C and figure 10 further show that there is a significant difference in accessibility between web sites depending on the sector to which the organisation belongs. A conclusion based on the results in paper C and paper F is that the most accessible web sites are found at the national level, such as ministries, departments and parliament. These are followed by web sites on education and health, such as universities and hospitals, which operate at regional and national levels, and then web sites at county level. The study shows that the most in-accessible web sites are found at the municipal level.15

As argued in paper D and paper F, these differences could arise because there is more focus on, and more resources available for, web accessibility at the national level compared to the regional, county and municipal levels. Another explanation is that municipalities differ in how they prioritise web accessibility. In small municipalities, there are often only one or a few people working with the web sites, and most of the time web site maintenance is only part of their work [104]. This means that web accessibility is given lower priority, especially in small municipalities with limited resources [105].

Paper F further presents the most frequent barriers in the various web sites and how these relate to problems people actually experience. The most common automatically detected barriers in local Norwegian government web sites are:

Global Evaluation

Paper H and figure 11 present web accessibility results for all UN member states. The web accessibility score is aggregated from web accessibility evaluation results for the most important government web sites in each country. The same sites are also part of the UN eGovernment report 2010 [35]. Similarly, paper G presents accessibility results for Europe as a whole (not at country level).

The results show that there are significant differences between the web sites of the evaluated countries. The country with the most accessible web sites is Germany, closely followed by Portugal, Spain and the Netherlands.

Map of the world based on the accessibility scores (error rate as explained in section 3) of public government web sites. (Results retrieved May-June 2010.)


Social Influences

Making web sites accessible is a complex task involving several levels of the political and administrative hierarchy, from signing declarations and laws down to technical implementation and the individual training of editors.

Paper H analyses the influence of social factors on web accessibility. The paper includes an analysis of the most important public web sites of all UN member states. It presents hypotheses based on unproven assumptions related to web accessibility found in the literature, and provides evidence for or against these assumptions using correlation analysis between public data [106],[107],[108],[109],[110],[111],[112],[113] and the retrieved web accessibility results.
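Such an analysis can be illustrated with a plain Pearson correlation coefficient between a social indicator and the per-country accessibility score. The numbers below are synthetic; paper H uses the public data sets cited above:

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length samples."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Synthetic example: GNI per capita (kUSD) vs accessibility score (higher = better).
gni = [5, 12, 25, 38, 47, 52]
accessibility = [0.41, 0.48, 0.55, 0.62, 0.74, 0.70]
r = pearson(gni, accessibility)
print(round(r, 2))  # strongly positive for this made-up sample
```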

Web Accessibility Versus Financial Situation

Paper H shows, not surprisingly, that countries in a good financial situation have more accessible web sites than countries with limited financial funds. This is in line with the popular claim that web accessibility is given lower priority when resources are limited [105]. In spite of this, the paper gives examples of the opposite, and thus evidence that it is possible to achieve accessible web sites without financial wealth. As an example, ranking the countries according to the accessibility score achieved in paper H places Hungary at number 6 and Austria at number 30, while ranking them according to GNI per capita places Hungary at 43 and Austria at 13.

Web Accessibility Versus Anti-disability Discrimination Laws

Paper H further shows that well established anti-disability discrimination laws are the analysed factor that influences the web accessibility of public web sites the most. This is in line with assumptions in the literature [110],[111],[112],[114]. However, signing the UN Convention on the Rights and Dignity of Persons with Disabilities has had no such effect yet.16

Hence, the data suggests that to improve the accessibility for the governmental web sites in a country, introducing laws on accessibility helps, while signing the UN convention by itself does not.

Claimed Accessibility Versus Web Accessibility Score

To claim that a web site is accessible, common practice is to use W3C Web Accessibility Initiative (WAI) logos on the site. These logos are intended to represent the level of accessibility of the web site. The available logos are, in order from least to most accessible, A, AA and AAA, corresponding to priority levels 1, 2 and 3 of WCAG 1.0.

The correct use of the logos is verified neither by W3C nor by any other third party organisation. Thus, users need to trust that the logos truthfully represent the level of accessibility of the web site. It is assumed in the literature that the promises of sites carrying these logos are exaggerated [115], but little empirical data is available to support this assumption.

Paper H investigates the correlation between the achieved web accessibility scores and the level of accessibility claimed using WAI logos. The results show that not a single one of the evaluated web sites conforms to its claimed level of accessibility. However, web sites with WAI logos are in general more accessible than web sites without. Furthermore, web sites with the WAI logo AA achieve better accessibility scores than web sites with the logo A.17 The same conclusion was reached in a later, similar study on web accessibility conformance [116].

Web Accessibility Versus Web Site Quality

It is commonly claimed that making web sites accessible amounts to introducing simplified versions of the original web site [117]. To check this, the accessibility scores can be compared with quality measurements. Several surveys aim at measuring the quality of national web sites. An example is the UN global eGovernment survey, based on a questionnaire covering the presence of information, interactive services, online transactions and networked presence. The results are combined into a compound metric for each country, intended to represent the quality of eGovernment in that country [35].

Paper H shows that, despite the popular claim, there is a strong correlation between quality as measured by the UN global eGovernment survey and the accessibility measured in paper H. This shows that accessible web sites are in general also of higher quality in other eGovernment areas than in-accessible web sites, and rejects the claim that accessible web sites are merely simplified versions of the original web site.

Governmental Transparency Results

Paper J uses the lost sheep to locate transparency services and information, i.e. services which promote openness in government. Table 3 presents the results. In the lost sheep algorithm, reachable is defined as being within a pre-defined number of clicks from the starting page; in this experiment it is set to three [118].

The data shows significant differences between the services and information to be located. Some information, such as contact information, is available on almost all web sites, while other services, such as online local government plan meetings, are more sparsely available.

Table 3 further shows that some information is very easily reachable. For example, contact information and mail records are reachable within three clicks for 345 and 311 of the evaluated web sites respectively.

Further, there is in general a connection between whether a service or piece of information is reachable and how many municipalities have the service. As expected, this shows that the most important services are both present and easily reachable for citizens on most municipal web sites. Services that do not follow this pattern are chat with local government officials and video of local government meetings. Even though these are very infrequent (20 and 27 of 427 web sites), they are reachable within three clicks in 9 (45%) and 14 (52%) respectively of the web sites where they were found.

ID  Type         Description                                      # of municipalities  # reachable
1   Information  Contact information                              416                  345
2   Information  Recent information section                       350                  168
3   Information  Budget                                           199                  60
4   Information  Local government calendar                        238                  105
5   Information  Local government plan                            216                  65
6   Information  Zoning information and plans                     155                  33
7   Service      Mail record                                      379                  311
8   Service      Search functionality (on a single page)          364                  179
9   Service      Online local government council meetings         332                  143
10  Service      Online local government executive council        292                  102
11  Service      Chat with administrative or political officials  20                   9
12  Service      Video of local government council meetings       27                   14
13  Service      Online local government plan meeting             39                   13
Results from 13 transparency tasks evaluating 427 local government web sites. (Results retrieved April 2011.)



This thesis introduces innovative ways of automatically measuring quality aspects of eGovernment web sites. Automatic measurements have tremendous potential: they can reduce the need for human interaction, increase the frequency and scope of measurements and even support evaluation on demand. Automation can also reduce cost and limit the bias introduced into the results by human judgement.

The thesis covers automatic testing in two areas which are often part of eGovernment surveys.

For benchmarking web accessibility, the proposed tools use a combination of deterministic tests and learning algorithms to evaluate web sites, web pages and PDF documents.

In addition, the thesis introduces an algorithm for automatically locating services and information online.

Both approaches are shown to work well in practice and yield high accuracy when compared with manual evaluations. The produced results can be used directly, as input to expert testers for verification, or as a mechanism to prioritise which parts should be tested by experts or real users.

The thesis also indicates which social and political aspects influence the test results, suggesting that the most viable method to improve the web accessibility of government web sites is introducing laws and regulations.

The main contributions in this thesis are:

Challenges and Further Research Directions


There are several remaining challenges not addressed in this thesis when it comes to automatic benchmarking of eGovernment.


For measuring accessibility, there are several aspects not covered by the presented eGovMon tools which could be integrated to improve the quality of the measurements. An example is measuring the use of WAI-ARIA, the W3C Accessible Rich Internet Applications suite, which helps users with disabilities interact with dynamic web content.

The current tool tests accessibility according to WCAG 1.0, with each test mapped to WCAG 2.0. Development of deterministic tests for WCAG 2.0 is ongoing; this work also includes tests for client side scripting.

Further, the W3C WCAG 2.0 Evaluation Methodology Task Force (Eval TF) is developing methods for evaluating web site conformance to WCAG 2.0, including sampling, testing and reporting. There are apparent similarities between the thesis work and the aims of the task force, and there are therefore plans to provide input to the task force through the eGovMon project. It should however be noted that the focus of this thesis is on automatic measurements, while the Eval TF does not limit itself to automatic measurements. Further, since the task force aims at developing methods for conformance, heuristic tests, such as learning algorithms, cannot be used alone: for conformance, the results of heuristic methods need to be verified by expert testers.

The presented tool assesses (x)HTML and CSS web pages and PDF documents. However, web sites contain additional technologies that could also be measured, such as Flash, other document formats, and the forthcoming HTML5. For Flash, WCAG 2.0 guidelines already exist. In addition, the PDF tests could be updated to match WCAG 2.0.

This thesis also presents accessibility testing using classifiers for judging alternative texts of images. To increase what is testable automatically, similar classification techniques can be applied to check whether link texts are descriptive, whether the web site is navigable, whether the text is readable, and so on.
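A learning-based test of this kind typically starts from surface features of the text. The sketch below uses invented features and a simple hand-written rule in place of a trained classifier; the thesis's actual feature set and learning algorithm may differ:

```python
import re

# Hypothetical placeholder words that rarely describe an image meaningfully.
SUSPECT = {"image", "img", "photo", "picture", "spacer", "untitled"}

def alt_text_features(alt):
    """Extract simple surface features from an alternative text."""
    words = alt.split()
    return {
        "empty": len(alt.strip()) == 0,
        "looks_like_filename": bool(re.search(r"\.(png|jpe?g|gif)$", alt.lower())),
        "placeholder_word": any(w.lower() in SUSPECT for w in words),
        "word_count": len(words),
    }

def likely_inaccessible(alt):
    # A rule-based stand-in for a trained classifier over these features.
    f = alt_text_features(alt)
    return f["empty"] or f["looks_like_filename"] or f["placeholder_word"]
```

In practice, features like these would be fed to a trained classifier rather than a fixed rule, but the pipeline shape is the same.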

Locating Services

Further work is needed to make the lost sheep operate with multiple languages, or to make the algorithm language independent, which is required for large scale international web assessment.

Further work is also needed to enable the lost sheep to address areas other than transparency, for example collecting online forms, which are related to efficiency and citizen participation.

An additional challenge with internationalisation of transparency tests is that, in contrast to the adopted accessibility guidelines from W3C, there are no widely recognised transparency guidelines.
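The lost sheep belongs to the broader family of focused crawlers, which prioritise the most promising links instead of downloading a site exhaustively. The sketch below is a generic best-first crawler over an in-memory link graph, with a hypothetical anchor-text keyword score; it illustrates the family, not the lost sheep algorithm itself:

```python
import heapq

def score(anchor_text, keywords):
    """Hypothetical relevance score: fraction of keywords in the anchor text."""
    words = anchor_text.lower().split()
    return sum(k in words for k in keywords) / len(keywords)

def best_first_crawl(start, links, keywords, is_target, limit=50):
    """Best-first (focused) crawl over an in-memory site graph.

    links maps a page to a list of (anchor_text, linked_page) pairs;
    an in-memory graph keeps the sketch free of network calls.
    Returns (found_page_or_None, number_of_pages_downloaded)."""
    frontier = [(-1.0, start)]  # max-priority queue via negated scores
    seen = {start}
    downloaded = 0
    while frontier and downloaded < limit:
        _, page = heapq.heappop(frontier)
        downloaded += 1  # "download" the page
        if is_target(page):
            return page, downloaded
        for anchor, nxt in links.get(page, []):
            if nxt not in seen:
                seen.add(nxt)
                heapq.heappush(frontier, (-score(anchor, keywords), nxt))
    return None, downloaded
```

Because promising links are expanded first, such a crawler can locate a target page (for example, a budget) after downloading far fewer pages than an exhaustive crawl, which is the property the lost sheep is evaluated on.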

Potential Broader Monitoring Approach

The tools presented in this thesis are designed so that the results can be used to support a broader monitoring approach in several ways.


To ensure reproducibility and enable external examination, all tools and corresponding results in this thesis are publicly available under an open licence.

All tools in this project are developed under an open licence and are available for download. The deterministic accessibility tests (web page, web site, large scale and PDF) are also available through an online interface.

More specifically, the individual tools can be retrieved from the following URLs:

All results are also publicly available. Note that many of these results are presented in the publications, while others are collected from external sources including the UN, the World Bank and Difi. Restrictions on the usage of this data may apply.

Summary of Contributions


Paper A outlines existing international eGovernment studies and their corresponding methodologies. These state-of-the-art methodologies focus on manually collecting data from web sites and conducting interviews, which makes both comparison between studies and analysis over time challenging. To address this, the paper suggests introducing automatic, unbiased benchmarking.

Paper B introduces a prototype tool for large scale automatic web accessibility measurements. The prototype includes a distributed web crawler and a uniform random sampler. It evaluates web sites using deterministic accessibility tests, stores the results in RDF databases and further aggregates the data into a data warehouse.

Paper C introduces a measurement methodology and reference implementation for accessibility of PDF documents. The paper shows that (x)HTML web pages and PDF documents in Norwegian municipalities are significantly less accessible than web sites in the Norwegian regional and national governments. Further, the paper presents an overview of the different levels of eGovernment where accessibility needs to be considered.

Paper D expands the accessibility testing suite with a learning-algorithm-based test for alternative texts of images. Alternative texts are often automatically generated by web publishing software or not properly provided by the content editor, which imposes significant web accessibility barriers. The introduced classification algorithms can expose inaccessible alternative texts with an accuracy of more than 90%.

Paper E compares automatic and manual accessibility results. Critics claim that results from automatic and manual accessibility benchmarking are unrelated, and that automatic assessment therefore cannot reliably be used to measure accessibility. This paper disproves that claim by showing that in 73% of the cases, results from automatic accessibility measurements can, through regression, accurately predict the manual results.
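The regression step in Paper E can be illustrated with an ordinary least squares fit of manual scores against automatic scores. The scores below are invented for illustration and are not data from the paper:

```python
def fit_line(xs, ys):
    """Ordinary least squares fit y = a*x + b for a single predictor."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    a = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / \
        sum((x - mx) ** 2 for x in xs)
    return a, my - a * mx

# Hypothetical automatic barrier scores and manual expert scores.
auto = [0.1, 0.2, 0.3, 0.4]
manual = [1.0, 2.0, 3.0, 4.0]

a, b = fit_line(auto, manual)

def predict(x):
    """Predict a manual score from an automatic score."""
    return a * x + b
```

Once fitted on sites that have both kinds of evaluation, such a model can estimate a manual-style score for sites where only the automatic measurement exists.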

Paper F presents accessibility results for web sites of Norwegian municipalities and counties. It further shows a solid dependency between automatically collected results and manual evaluations carried out by Difi, even though the two evaluations assess accessibility differently. The data further indicates that good accessibility cannot be detected by automatic evaluations alone, but needs to be supported by manual assessment.

Paper G presents high level accessibility measurement results for more than 2300 public web sites in Europe. The paper shows that inaccessible web sites are very common on the public European web. The paper further introduces severity levels for the web site score and shows that the distribution of scores is similar to the state-of-the-art European measurements ``the UK Cabinet Office'' and ``Measuring Progress of eAccessibility in Europe''.

Paper H presents accessibility evaluation results for public web sites in all UN member states. The paper also extends the existing methodologies with an approach for aggregating results at ministerial, national and continental level. Additionally, the paper shows which social factors are related to reaching a good accessibility score: countries with strong economies and strong accessibility laws have more accessible web sites. In contrast, signing the UN Rights and Dignity of Persons with Disabilities Convention does not influence web accessibility.

Paper I extends the accessibility measurements with a methodology and algorithm for detecting the presence of material in large web sites, introducing a colony-inspired classification algorithm called the lost sheep. The paper shows that, in a synthetic environment, the lost sheep finds the target page with higher accuracy and fewer downloaded pages than comparable algorithms.

Paper J extends the lost sheep to locating services and information in real eGovernment web sites. A reference implementation of the lost sheep is applied to detect targeted transparency components, reporting both whether the material exists and an indication of whether the service can be found by real users. The results show that the lost sheep outperforms all comparable algorithms, with higher accuracy and fewer downloaded pages.


1. [The eGovMon project is co-funded by the Norwegian Council of Research under the VERDIKT program. Project no.: Verdikt 183392/S10. See appendix A for a project overview.]

2. [Note that not all surveys have a high impact on governments; there are examples of results from eGovernment studies being ignored [14].]

3. [Note that this is only a mapping from WCAG 1.0 to WCAG 2.0, which means that it is not WCAG 2.0 compliant. WCAG 2.0 has guidelines that are not part of WCAG 1.0 and are therefore not part of this list.]

4. [To deal with this issue, W3C has proposed to create a WCAG 2.0 Evaluation Methodology Task Force [58] with a planned evaluation methodology finished in December 2013.]

5. [Note that it is well known that test descriptions for experts can be worded in a way that minimises subjectivity. Typically, any automatable test can also be carried out by an expert with high cross-evaluator repeatability.]

6. [Note that the well known readability index is a deterministic test [66].]

7. [Even with the scoring system in Paper H it is still theoretically possible to manipulate the score, for example by introducing an accessible web site with accessible dummy content. In general, it is hard to address manipulation of automatic measurements without manual assessment [70].]

8. [In PDF documents it is possible to specify the language on each paragraph. This test only addresses the language specified at the document level.]

9. [It is possible to use OCR technologies to extract text from scanned PDF documents. However, there are currently no assistive technology solutions in common use that utilise OCR sufficiently well [75].]

10. [Some heuristic accessibility testing is available in the literature such as [76],[63].]

11. [Note that the lost sheep is not guaranteed to find the shortest path. This means that it is theoretically possible for a page to be within three clicks, but that the lost sheep reports otherwise.]

12. [There are 430 Norwegian municipality web sites, but only 427 web sites were available during the experiments.]

13. [In situations where more than one page matched the criteria, the starting page was chosen. This could e.g. be the page which links to the different sections of a budget. Without exception, at most one web page was manually selected as the correct target page.]

14. [Paper G refers to the implementation as European Internet Accessibility Observatory (EIAO). The implementation in EIAO was continued as eGovernment Monitoring (eGovMon).]

15. [Capgemini later reached the same conclusion on the maturity level of the web sites based on data from Europe [38].]

16. [Note that the common practice is that a country ratifies a signed convention into a law. Therefore the correlations may change when more countries ratify the convention.]

17. [Only two web sites had the WAI AAA, so no sound correlation analysis could be made.]


[1] Developing fully functional E-government: A four stage model,Layne, K. and Lee, J.,Government information quarterly,122--136,2001,18,2,0740-624X

[2] E-government in digital era: concept, practice, and development,Fang, Z.,International Journal of The Computer, The Internet and Management,1--22,2002,10,2

[3] E-government strategies: best practice reports from the European front line,Millard, J.,Electronic Government,177--189,2002

[4] Reorganisation of government back-offices for better electronic public services,Millard, J.,Electronic Government,363--370

[5] E-Government as an anti-corruption strategy,Andersen, T.B.,Information Economics and Policy,201--210,2009,21,3

[6] Is E-Government Leading to More Accountable and Transparent Local Governments? An Overall View,Pina, V. and Torres, L. and Royo, S.,Financial Accountability \& Management,3--20,2010,26,1

[7] ‘E-government’: is it the next big public sector trend?,Gauld, R.,International Handbook of Public Management Reform,105,2009

[8] The E-government Imperative: Main Findings,Organisation for Economic Co-operation and Development (OECD),Retrieved 2008 from,March

[9] Developing a Successful E-Government Strategy,Lowery, L.M.,San Francisco, Department of Telecommunications \& Information Services,2001

[10] Digital government: Technology and public sector performance,West, D.M.,2005

[11] Most e-government-for-development projects fail: how can risks be reduced?,Heeks, R. and University of Manchester. Institute for Development Policy and Management,2003

[12] The curse of the benchmark: an assessment of the validity and value of e-government comparisons,Bannister, F.,International Review of Administrative Sciences,171,2007,73,2

[13] Benchmarking eGovernment: tools, theory, and practice,Codagnone, C.,European journal of ePractice,2008

[14] If you measure it they will score: An assessment of international eGovernment benchmarking,Janssen, D. and Rotthier, S. and Snijkers, K.,Information Polity,121--130,2004,9,3-4

[15] E-government Leadership: Rhetoric vs. Reality – Closing the Gap.,Accenture,2001,Retrieved April 12th, 2011, from

[16] E-government Leadership: Realizing the Vision.,Accenture,2002,Retrieved April 12th, 2011, from

[17] E-government Leadership: Engaging the Customer.,Accenture,2003,Retrieved April 12th, 2011, from

[18] E-government Leadership: High Performance, Maximum Value.,Accenture,2004,Retrieved April 12th, 2011, from

[19] Leadership in Customer Service: New Expectations, New Experiences.,Accenture,2005,Retrieved April 12th, 2011, from

[20] Leadership in Customer Service: Building the Trust.,Accenture,2006,Retrieved April 12th, 2011, from trust-summary.aspx

[21] Leadership in Customer Service: Delivering on the Promise.,Accenture,2007,Retrieved April 12th, 2011, from promise.aspx

[22] WMRC global e-government survey, October, 2001,West, D.M.,Taubman Center for Public Policy, Brown University,2001

[23] Global e-government, 2002,West, D.M.,Center for Public Policy, Brown University, at PDF,2002

[24] Global e-government, 2003,West, D.M.,Brown University Report,2003

[25] Global E-government, 2004,West, D.M.,Retrieved March,2004,23

[26] Global e-government, 2005,West, D.M.,Center for Public Policy, Brown University, Providence, RI, http://www.insidepolitics.org/egovt05int.pdf [accessed 21 June 2006],2005

[27] Global e-government, 2006,West, D.M. and others,Providence, RI: Brown University,2006

[28] Global e-government, 2007,West, D.M.,2007

[29] Improving technology utilization in electronic government around the world, 2008,West, D.M. and Brookings Institution. Governance Studies,2008

[30] Benchmarking E-Government: A Global Perspective: Assessing the Progress of the UN Member States,Ronaghan, S.A. and American Society for Public Administration and United Nations. Division for Public Economics and Public Administration,2002

[31] Global E-Government Survey 2003, E-government at the Crossroads,United Nations Department of Economic and Social Affairs,Retrieved March 8th, 2010, from,December

[32] Global E-Government Readiness Report 2004, Towards Access for Opportunity,United Nations Department of Economic and Social Affairs,Retrieved March 8th, 2010, from,December

[33] Global E-Government Readiness Report 2005, From E-Government to E-Inclusion,United Nations Department of Economic and Social Affairs,Retrieved March 8th, 2010, from,December

[34] Global E-Government Survey 2008, From E-Government to Connected Governance,United Nations Department of Economic and Social Affairs,Retrieved March 8th, 2010, from,December

[35] Global E-Government Survey 2010, Leveraging E-government at a Time of Financial and Economic Crisis,United Nations Department of Economic and Social Affairs,Retrieved May 11th, 2010, from,February

[36] Online Availability of Public Services: How Is Europe Progressing?,Capgemini,Retrieved February 1st, 2010, from

[37] Smarter, Faster, Better eGovernment,Capgemini,Retrieved February 1st, 2010, from

[38] Digitizing Public Services in Europe: Putting Ambition into Action,Capgemini,Retrieved March 16th, 2011, from

[39] Lutz Kubitschke and Ingo Meyer Feliccital Lull and Marten Haesner and Kevin Cullen and Shari McDaid and Ciaran Dolphin and Luke Grossey and Brian Grover and Chris Bowden and Carol Pollit and Siobonne Brewster and Tilia Boussios,Retrieved November 4th, 2009, from

[40] The Universal Declaration of Human Rights: origins, drafting, and intent,Morsink, J.,2000

[41] Convention on the Rights of Persons with Disabilities,United Nations,Retrieved November 4th, 2009, from

[42] The Accessibility Imperative,Global Initiative for Inclusive Information and Communication Technologies,Retrieved December 18th, 2009, from

[43] i2010 -- Ministerial Declaration,European Commission,2006,Retrieved 2011 from

[44] i2010 -- A European Information Society,European Commission,Retrieved 2008 from

[45] Introduction to Web Accessibility,World Wide Web Consortium,Retrieved April 19th, 2011, from

[46] The relationship between accessibility and usability of websites,Petrie, H. and Kheir, O.,397--406,2007,Proceedings of the SIGCHI conference on Human factors in computing systems,ACM

[47] Flexible reporting for automated usability and accessibility evaluation of web sites,Beirekdar, A. and Keita, M. and Noirhomme, M. and Randolet, F. and Vanderdonckt, J. and Mariage, C.,Human-Computer Interaction-INTERACT 2005,281--294,2005

[48] What frustrates screen reader users on the web: A study of 100 blind users,Lazar, J. and Allen, A. and Kleinman, J. and Malarkey, C.,International Journal of human-computer interaction,247--269,2007,22,3

[49] Evaluating web site accessibility: validating the WAI guidelines through usability testing with disabled users,,535--538,2008,NordiCHI '08: Proceedings of the 5th Nordic conference on Human-computer interaction,978-1-59593-704-9,,New York, NY, USA,Lund, Sweden

[50] Universal design handbook,Preiser, W.F.E. and Ostroff, E.,2001

[51] Accessibility, usability and universal design -- positioning and definition of concepts describing person-environment relationships,Iwarsson, S. and Ståhl, A.,Disability \& Rehabilitation,57--66,2003,25,2

[52] Web accessibility-automatic/manual evaluation and authoring tools,Petrie, H. and Power, C. and Weber, G.,Computers Helping People with Special Needs,334--337,2008

[53] Evaluating Web Sites for Accessibility: Overview,World Wide Web Consortium,Retrieved April 8th, 2011, from

[54] Web Content Accessibility Guidelines 1.0. W3C Recommendation 5 May 1999,World Wide Web Consortium,Retrieved November 4th, 2009, from

[55] The BenToWeb Test Case Suites for the Web Content Accessibility Guidelines (WCAG) 2.0,Strobbe, C. and Koch, J. and Vlachogiannis, E. and Ruemer, R. and Velasco, C. and Engelen, J.,Computers Helping People with Special Needs,402--409,2008

[56] Web Content Accessibility Guidelines (WCAG) 2.0,World Wide Web Consortium,Retrieved November 4th, 2009, from

[57] How WCAG 2.0 Differs from WCAG 1.0,World Wide Web Consortium (W3C),Retrieved 2008 from

[58] [DRAFT] WCAG 2.0 Evaluation Methodology Task Force (Eval TF) Work Statement,World Wide Web Consortium,Retrieved May 20th, 2011, from

[59] Web Accessibility Benchmarking Cluster,2007,Retrieved November 4th, 2009, from

[60] UWEM Indicator Refinement,,Retrieved 2008 from

[61] The Unified Web Evaluation Methodology (UWEM) 1.2 for WCAG 1.0,Annika Nietzio and Christophe Strobbe and Erik Velleman,394-401,2008

[62] Analysis of navigability of Web applications for improving blind usability,Takagi, H. and Saito, S. and Fukuda, K. and Asakawa, C.,ACM Transactions on Computer-Human Interaction (TOCHI),13--es,2007,14,3

[63] Prediction of web page accessibility based on structural and textual features,Bahram, S. and Sen, D. and Amant, R.S.,31,2011,Proceedings of the International Cross-Disciplinary Conference on Web Accessibility,ACM

[64] Accessibility designer: visualizing usability for the blind,Takagi, H. and Asakawa, C. and Fukuda, K. and Maeda, J.,177--184,2004,77-78,ACM SIGACCESS Accessibility and Computing,ACM

[65] Revisiting readability: A unified framework for predicting text quality,Pitler, E. and Nenkova, A.,186--195,2008,Proceedings of the Conference on Empirical Methods in Natural Language Processing,Association for Computational Linguistics

[66] A lightweight methodology to improve web accessibility,Greeff, M. and Kotzé, P.,30--39,2009,Proceedings of the 2009 Annual Research Conference of the South African Institute of Computer Scientists and Information Technologists,ACM

[67] World Wide Web Consortium,Retrieved November 4th, 2009, from

[68] Test Report for EIAO version 2.2 of the Observatory Software,Morten Goodwin Olsen and Nils Ulltveit-Moe and Annika Nietzio and Mikael Snaprud, Anand Pillai,Retrieved 2008 from

[69] D4.1.1 System Design Specification,Morten Goodwin Olsen,Retrieved 2009 from

[70] Manipulability of PageRank under sybil strategies,Cheng, A. and Friedman, E.,2006,Proceedings of the First Workshop of Networked Systems (NetEcon06)

[71] Referansekatalog for IT-standarder i offentlig sektor,Ministry of Government Administration and Reform,Retrieved 7th of June 2011 from

[72] ,Retrieved 20th of May 2011 from

[73] Status Report 2008. The development in society for persons with disabilities,National Centre of Documentation on Disability Affairs,Retrieved 2008 from,Oslo, Norway

[74] Document management: Electronic document file format for long-term preservation -- Part 1: Use of PDF 1.4 (PDF/A-1),ISO

[75] Assistive technology,Draffan, EA,Neurodiversity in higher education: positive responses to specific learning differences,217,2009

[76] Web Compliance Manager Demo,Web Compliance Center of the Fraunhofer Institute for Applied Information Technology (FIT),Retrieved November 4th, 2010, from

[77] Universal design for Web applications,Chisholm, W. and May, M.,2008,0596518730

[78] Increasing web accessibility by automatically judging alternative text quality,Bigham, J.P.,352,2007,Proceedings of the 12th international conference on Intelligent user interfaces,ACM

[79] Some features of alt text associated with images in web pages,Craven, T.C.,Information Research,2006,11

[80] The art of ALT: toward a more accessible Web,Slatin, J.M.,Computers and Composition,73--81,2001,18,1

[81] Testfakta kommunetest januar 2011,The Consumer Council of Norway,2011,Retrieved March 23rd, 2011, from

[82] Quality of public web sites,Norwegian Agency for Public Management and eGovernment (Difi),Retrieved September 2009 from

[83] Benchmarking e-Government - A Comparative Review of Three International Benchmarking Studies,Lasse Berntzen and Morten Goodwin Olsen,International Conference on the Digital Society,77-82,2009,0,978-0-7695-3526-5,,Los Alamitos, CA, USA

[84] Understanding and measuring eGovernment: international benchmarking studies,Heeks, R.,27--28,2006,UNDESA workshop,“E-Participation and E-Government: Understanding the Present and Creating the Future”, Budapest, Hungary

[85] Data mining for hypertext: A tutorial survey,Chakrabarti, S.,ACM SIGKDD Explorations Newsletter,1--11,2000,1,2

[86] Mapping the semantics of web text and links,Menczer, F.,IEEE Internet Computing,27--36,2005,9,3

[87] Topical locality in the Web,Davison, B.D.,272--279,2000,Proceedings of the 23rd annual international ACM SIGIR conference on Research and development in information retrieval,ACM,1581132263

[88] Stochastic models for the web graph,Kumar, R. and Raghavan, P. and Rajagopalan, S. and Sivakumar, D. and Tomkins, A. and Upfal, E.,57--65,2002,Foundations of Computer Science, 2000. Proceedings. 41st Annual Symposium on,IEEE

[89] Scalability of findability: effective and efficient IR operations in large information networks,Ke, W. and Mostafa, J.,74--81,2010,Proceeding of the 33rd international ACM SIGIR conference on Research and development in information retrieval,ACM

[90] Ant Focused Crawling Algorithm,Dziwinski, P. and Rutkowska, D.,Artificial Intelligence and Soft Computing--ICAISC 2008,1018--1028,2008

[91] Information retrieval in distributed hypertexts,De Bra, P. and Houben, G.J. and Kornatzky, Y. and Post, R.,481--491,1994,Proceedings of the 4th RIAO Conference

[92] The shark-search algorithm. An application: tailored Web site mapping,Hersovici, M. and Jacovi, M. and Maarek, Y.S. and Pelleg, D. and Shtalhaim, M. and Ur, S.,Computer Networks and ISDN Systems,317--326,1998,30,1-7,0169-7552

[93] Intelligent crawling on the World Wide Web with arbitrary predicates,Aggarwal, C.C. and Al-Garawi, F. and Yu, P.S.,96--105,2001,Proceedings of the 10th international conference on World Wide Web,ACM,1581133480

[94] Probabilistic models for focused web crawling,Liu, H. and Milios, E. and Janssen, J.,16--22,2004,Proceedings of the 6th annual ACM international workshop on Web information and data management,ACM,1581139780

[95] Learning automata based classifier,Zahiri, S.H.,Pattern Recognition Letters,40--48,2008,29,1,0167-8655

[96] Accessing the deep web: A survey,He, B. and Patel, M. and Zhang, Z. and Chang, K.C.C.,Communications of the ACM,95--101,2007,50,5

[97] Crawling the hidden web,Raghavan, S. and Garcia-Molina, H.,129--138,2001,Proceedings of the International Conference on Very Large Data Bases

[98] Data mining on the web,Greening, D.R.,Web Techniques,2000,5,1

[99] D2.1.2 State-of-the-art review: transparency indicators,Lasse Berntzen and Annika Nietzio and Morten Goodwin Olsen and Ahmed AbdelGawad and Mikael Snaprud,Retrieved 2009 from

[100] Governance matters VII: Aggregate and individual governance indicators, 1996-2007,Kaufmann, D. and Kraay, A. and Mastruzzi, M.,The World Bank Development Research Group Macroeconomics and Growth Team and World Bank Institute Global Governance Program,2008

[101] Understanding web accessibility,Henry, S.,Web Accessibility,1--51,2006

[102] Kevin Cullen and Lutz Kubitschke and Ingo Meyer,Retrieved March 19th, 2010, from

[103] Cabinet Office,Retrieved 2006 from

[104] Accessibility of eGovernment web sites: towards a collaborative retrofitting approach,Nietzio, A. and Olsen, M. and Eibegger, M. and Snaprud, M.,Computers Helping People with Special Needs,468--475,2010

[105] Human Rights and Disability: The current use and future potential of United Nations human rights instruments in the context of disability,Quinn, G. and Degener, T. and Bruce, A.,2002

[106] Data and Statistics,The World Bank,Retrieved November 4th, 2009, from

[107] Statistics Portal,Organisation for Economic Co-operation and Development,Retrieved November 4th, 2009, from

[108] Government at a Glance,Organisation for Economic Co-operation and Development,Retrieved November 4th, 2009, from,October

[109] Measuring the Information Society - The ICT Development Index,United Nations International Telecommunication Union,Retrieved February 1st, 2010, from

[110] Government Laws, Regulations, Policies, Standards, and Support,Shawn Lawton Henry,Retrieved March 18th, 2010, from

[111] World Laws,WebAIM,Retrieved March 18th, 2010, from

[112] Policies Relating to Web Accessibility,,Retrieved March 18th, 2010, from

[113] Convention and Optional Protocol Signatures and Ratifications,United Nations Rights and Dignity of Persons with Disabilities,Retrieved November 4th, 2009, from

[114] Global e-government Web Accessibility: An Empirical Examination of EU, Asian and African Sites,Kuzma, J.M. and Yen, D. and Oestreicher, K.,The Second International Conference on Information and Communication Technology and Accessibility, Hammamet, Tunisia,May

[115] Sex, lies and Web accessibility: the use of accessibility logos and statements on e-commerce and financial Web sites,Petrie, H. and Badani, A. and Bhalla, A.,23--25,2005,Proceedings of Accessible Design in the Digital World Conference

[116] Declaring conformance on web accessibility.,Suzette Keith and Nikolaos Floratos,Retrieved 1st of July 2011 from

[117] Modeling Web Accessibility for Rich Document Production,Lopes, R. and Carrico, L.,Journal of Access Services,237--260,2009,6,1

[118] Testing the Three-Click Rule,Joshua Porter,2003,Retrieved April 1st, 2011, from

The author of this document is:
Morten Goodwin
E-mail address is:
morten.goodwin __at__
Phone is:
+47 95 24 86 79