US20110314091A1 - Method and system for automated analysis and transformation of web pages - Google Patents
- Publication number
- US20110314091A1 (application US 13/149,025)
- Authority
- US
- United States
- Prior art keywords
- web page
- page
- web
- transformation
- client
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/10—Text processing
- G06F40/12—Use of codes for handling textual entities
- G06F40/14—Tree-structured documents
- G06F40/143—Markup, e.g. Standard Generalized Markup Language [SGML] or Document Type Definition [DTD]
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/25—Integrating or interfacing systems involving database management systems
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/90—Details of database functions independent of the retrieved data types
- G06F16/95—Retrieval from the web
- G06F16/957—Browsing optimisation, e.g. caching or content distillation
- G06F16/9577—Optimising the visualization of content, e.g. distillation of HTML documents
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/10—Text processing
- G06F40/12—Use of codes for handling textual entities
- G06F40/151—Transformation
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L43/00—Arrangements for monitoring or testing data switching networks
- H04L43/04—Processing captured monitoring data, e.g. for logfile generation
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L43/00—Arrangements for monitoring or testing data switching networks
- H04L43/08—Monitoring or testing based on specific metrics, e.g. QoS, energy consumption or environmental parameters
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L67/00—Network arrangements or protocols for supporting network services or applications
- H04L67/01—Protocols
- H04L67/02—Protocols based on web technology, e.g. hypertext transfer protocol [HTTP]
Definitions
- This invention relates generally to a method and system for modifying web pages, including dynamic web pages, based on automated analysis wherein web pages are transformed based on transformation instructions in nearly real-time, and wherein analysis is performed and transformation instructions based on the analysis are prepared prior to a request for the web page.
- the system has two primary components, an analyzer which asynchronously and repeatedly analyzes web pages creating and updating transformation instructions relating to the web pages, and a transformer which intercepts traffic to a web server in response to a request for the web page, receives the returned web pages, and transforms them based on stored transformation instructions.
- Web Pages are complicated entities, made up primarily of Hypertext Markup Language (HTML), but often containing other technologies, such as Cascading Style Sheets (CSS), JavaScript, Flash, and many more. Web Pages can be thought of as programs executed by a browser or client, which is capable of executing software code in the abovementioned languages and technologies. Without a typical user's knowledge, web pages are often generated upon request, created by running dedicated software on the server when a user request is received. Such dedicated software is called a web application, and uses technologies such as J2EE, PHP, ASP.NET and others.
- a web page is defined hereafter as software code for example provided or served as a response to a request for a particular and unique URL or web address, or pointer thereto from a client such as HTML, XHTML or different versions thereof; a web page is therefore software code that embodies or defines the web page, i.e. the software code which allows a web client to render or display a page for viewing.
- a web page at a particular address or pointed thereto whether modified or not is considered to be “the web page”.
- If the response to a request for a web page is altered or transformed as compared to a previous response to the same request, the transformed web page is considered to be a modified version of the “same” web page rather than a “new” web page.
- One page may render much faster than the other; one page may expose a security flaw while the other does not; one page can be successfully loaded in multiple different internet clients or browsers, while the other may only work in Internet Explorer.
- Web applications are embodied in software, and making modifications to them requires development work, subsequent testing and deployment, all of which risk the integrity of the software and require skilled workers' time. Some of these changes require more expertise and time than others. For example, making a web page load faster, fixing a security flaw or making a web page accessible, often require a fair amount of expertise and time.
- a proxy is a software application able to intercept and modify incoming and outgoing communication with the web server.
- a proxy can be implemented in various ways, including the provision of a separate machine that traffic to a web server would go through, or a software proxy deployed as a web-server add-on through which internet traffic is passed.
- a proxy can compress the web page and add an HTTP header indicating it did so, such as the Apache mod_deflate add-on. Making the modifications in a proxy is an alternative to modifying the web application, and provides several benefits:
- Proxy-based manipulations of web pages are relatively common. They generally do not modify the page content, but rather the delivery mechanism wrapper, usually a Hypertext Transfer Protocol (HTTP) response. The modifications performed are typically based on manual configuration, stating which changes to apply under which conditions. The proxies rarely attempt to parse the pages presented, nor do they generally have built-in intelligence to understand them.
- HTTP Hypertext Transfer Protocol
- proxies that not only perform the transformation, but also attempt to analyze the page and transform it based on that analysis.
- the two primary examples are HTML transcoders for mobile browsing and transformation for performance optimization.
- HTML Transcoders for Mobile browsing attempt to modify web pages to look better on the smaller smartphone screens. They try to extract the primary information and design aspects of the page, and modify them to fit on a page. These transcoders exist both as proxy solutions and client-side solutions. The different proxy solutions modify the page anywhere between the client and the server, while the client-side solutions modify the page just before rendering it, usually running on the mobile device itself. These HTML Transcoders perform the analysis of the web pages in real-time, while the client is awaiting the response.
- Performance optimization analysis and transformation tools analyze pages looking for a variety of known performance related impediments, and attempt to obviate or correct them. For example, one optimization technique is to combine all CSS files referenced in a page into one CSS file. If a page referenced 5 external CSS files, combining them into one would eliminate four requests when loading the page, and the combined file, when encoded using gzip compression, would likely compress more efficiently than compressing the files separately.
- a proxy solution may attempt to identify the CSS files in a given page, create a combined file, and modify the page to reference that combined CSS file instead. More examples of web page performance optimizations are explained further down the document.
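The CSS-combining step described above can be sketched as follows. This is a minimal illustration, not the patented implementation: the function name `combine_css_references`, the URL `combined.css`, and the simplified tag pattern (double-quoted attributes, `rel` before `href`) are all assumptions, and creating the combined file itself is left out.

```python
import re

# Simplified pattern (an assumption): <link rel="stylesheet" href="..."> only.
LINK_RE = re.compile(r'<link\s+rel="stylesheet"\s+href="([^"]+)"\s*/?>')

def combine_css_references(html, combined_url):
    """Return (new_html, hrefs): the page with all matched stylesheet links
    replaced by a single link to combined_url, plus the original hrefs."""
    hrefs = LINK_RE.findall(html)
    if len(hrefs) < 2:
        return html, hrefs  # nothing to gain from combining
    first = True

    def repl(match):
        nonlocal first
        if first:
            # Keep one link, pointing at the combined file.
            first = False
            return '<link rel="stylesheet" href="%s">' % combined_url
        return ""  # drop every later stylesheet link

    return LINK_RE.sub(repl, html), hrefs

page = ('<head><link rel="stylesheet" href="a.css">'
        '<link rel="stylesheet" href="b.css"></head>')
new_page, found = combine_css_references(page, "combined.css")
```

The returned list of original hrefs is what a proxy would need in order to build the combined file before serving the modified page.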
- Proxy based analysis and transformation engines face two conflicting requirements; the first is speed. In order to transform web pages in real-time, the transformation must be applied quickly enough so as to not introduce any substantial delay. This tolerance for any delay is usually measured in milliseconds, and becomes even more challenging when the web application is under a heavy user load. This requirement is even more important for solutions looking to optimize the performance of a web page, as any delay introduced takes away from the solution's value.
- This web page references two CSS files.
- the first reference is written clearly into the HTML.
- the second reference is printed by JavaScript, using a variable holding the current menu version.
- Web pages are becoming more and more complex and the technologies incorporated within them are becoming more dynamic and difficult to understand. Therefore, analyzing web pages is a task that will only become more difficult over time, and this conflict will only worsen.
- Another solution is to use heuristics to compensate for shortcomings of analysis. For example, looking for document.write( ) calls in JavaScript code using regular expressions can often be done fast enough for real-time. These techniques are much more error prone and far less complete than the full analysis that could be done with more time.
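A regular-expression heuristic of the kind mentioned above might look like the following sketch. The pattern and the function name `may_call_document_write` are assumptions chosen for illustration; the second and third calls show why such heuristics are error prone, since indirect calls are missed and commented-out code is flagged.

```python
import re

# Heuristic pattern (an assumption): a literal document.write( or document.writeln(
DOC_WRITE_RE = re.compile(r'\bdocument\s*\.\s*write(?:ln)?\s*\(')

def may_call_document_write(script_source):
    """Fast textual check; neither complete nor free of false positives."""
    return DOC_WRITE_RE.search(script_source) is not None

direct = may_call_document_write("document.write('<p>x</p>')")
# Missed: the call is assembled indirectly, so no literal match exists.
indirect = may_call_document_write("var w = document['wri' + 'te']; w.call(document, 'x')")
# False positive: the match is inside a comment.
commented = may_call_document_write("// never use document.write(x) here")
```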
- a method is provided of modifying a web page to the client in response to a request from the client, comprising:
- Transformers can reside on a same system at a same location; in other embodiments, transformers may access transformation instructions from a common repository while each being located in a different physical location, for example, in different cities or countries.
- a system for changing the content of a requested web page, in response to a request for the web page from a client is provided, so as to vary a characteristic thereof which comprises:
- an analyzer including one or more suitably programmed processors for analyzing at least a portion of the web page to identify at least a predetermined characteristic and for creating transformation instructions corresponding to said characteristic;
- a memory for storing the transformation instructions received from the analyzer
- a transformer for modifying the web page based on transformation instructions previously stored in the memory prior to said request for the web page and for returning the modified web page to the client in response to the request for the web page.
- a method is provided of modifying Source on the client side to resemble Target, comprised of: identifying the differences between Source and Target on the server using a computer-based comparison algorithm; and, generating an instruction set, executable by the client, for modifying the Source to be equivalent to the Target based on at least an identified difference, wherein the equivalence criteria is predetermined.
- the server modifies the Source web page before providing it to the client, replacing at least a reference on Source to Target with at least a reference to the instruction set.
- the reference to the instruction set is at least the instruction set itself.
- the reference to the instruction set comprises at least a request to a server to retrieve the instruction set.
- the request to retrieve the instruction set returns an instruction set previously created and stored.
- the request to retrieve the instruction set creates the instruction set and returns it;
- the created instruction set is also stored on a computer accessible medium.
- At least one of the instruction sets modifying Source to Base and Base to Target is stored on a computer accessible medium.
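As a rough illustration of identifying differences between Source and Target and generating an instruction set, the sketch below uses Python's standard `difflib.SequenceMatcher`. It is an assumption-laden stand-in: the text describes an instruction set executable by the client (in practice, likely JavaScript), whereas here both generation and application are shown in Python, and the equivalence criterion is assumed to be exact string equality. The names `make_instructions` and `apply_instructions` are hypothetical.

```python
from difflib import SequenceMatcher

def make_instructions(source, target):
    """Return a list of (start, end, text) edits turning source into target."""
    ops = []
    matcher = SequenceMatcher(None, source, target, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":
            # Replace source[i1:i2] with target[j1:j2] (covers insert/delete too).
            ops.append((i1, i2, target[j1:j2]))
    return ops

def apply_instructions(source, ops):
    """Apply edits from the end backwards so earlier offsets stay valid."""
    out = source
    for start, end, text in reversed(ops):
        out = out[:start] + text + out[end:]
    return out

source_page = "<title>Inbox</title><p>3 new messages</p>"
target_page = "<title>Message</title><p>Hello there</p>"
instruction_set = make_instructions(source_page, target_page)
rebuilt = apply_instructions(source_page, instruction_set)
```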
- FIG. 1 is a system block diagram illustrating a request for a Web page from a client.
- the method and system of this invention use an analyzer for analyzing web pages and for preparing transformation instructions used by a transformer for transforming the content of web pages so as to, for example increase the speed of rendering one or more web pages.
- a solution to the problem of increasing the speed of delivery of web pages to a requesting client such as Internet Explorer, Safari, or Firefox, is achieved by separating the transformation from the analysis, and performing the transformation in near-real time and analyzing web pages to build transformation instructions in a much greater time span outside of the near real-time flow, asynchronous to the request, typically before or after a request and delivery of a web page.
- Transformation Instructions are often simple instructions, as simple as a textual search and replace instructions. The most important trait of a transformation instruction is that it does not require any deep knowledge or understanding of the page it is transforming. Transformation instructions may be grouped together, if the analysis concluded a set of transformations should either all be applied or not at all.
- the Transformer 200 acts as a proxy to the web application. Whenever it receives a web page, it fetches and applies the relevant transformation instructions from the repository 401 . If there are no transformation instructions, the transformer 200 submits an analysis task for this web page to the repository 401 .
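The transformer behaviour described above can be sketched with an in-memory stand-in for the repository. The `Repository` class, its fields, and the use of the URL as the lookup key are assumptions made for this example; the text only requires that instructions can be fetched and analysis tasks recorded.

```python
class Repository:
    """In-memory stand-in (hypothetical) for the repository 401."""
    def __init__(self):
        self.instructions = {}  # url -> list of (original, replacement)
        self.tasks = []         # urls awaiting analysis

def transform(repo, url, html):
    """Apply stored instructions, or queue an analysis task if none exist."""
    instrs = repo.instructions.get(url)
    if not instrs:
        if url not in repo.tasks:
            repo.tasks.append(url)  # the analyzer picks this up asynchronously
        return html                 # page passes through unmodified
    for original, replacement in instrs:
        html = html.replace(original, replacement)
    return html
```

On the first request the page passes through untouched and a task is queued; once an analyzer has written instructions, later requests for the same URL are transformed without any analysis in the request path.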
- FIG. 1 shows the transformer 200 , the analyzer 300 , and the repository 401 shown with the client and web application.
- the transformer 200 and the analyzer 300 are shown to each comprise multiple blocks.
- Each transformer block 200 and analyzer block 300 represents another instance of the transformer and analyzer, therefore it is possible to have multiple transformers and analyzers working with a same repository at the same time.
- the transformer 200 resides between the client and the web application, and is able to modify the returned web page.
- the transformer 200 logs requests and pages as needed, to the repository 401 .
- the analyzer(s) 300 reads a page and/or request from the repository, analyzes it, and writes transformation instructions to the repository 401 which will likely be used for a subsequent request of the web page.
- the transformer 200 and analyzer 300 work asynchronously; therefore there are two sequences, one for each.
- the analyzer 300 sequence is as follows:
- the system includes a transformer 200 and an analyzer 300 .
- Each of the transformer and analyzer includes a program storage device or memory storage device 202 / 302, which may include a computer hard drive or other computer media and is configured to store a program 204 / 304 .
- the program storage device 202 / 302 is further configured to work in conjunction with a processor 203 / 303 on a computer device 201 / 301 to execute program instructions to transform in the transformer and analyze in the analyzer the program 204 / 304 .
- a repository interface 205 / 305 is used to interact with a memory 400 containing the repository 401 .
- the memory is a computer-based storage which allows programmatic access to it, and may include but is not limited to a database, hard drive and RAM.
- a transformation software component 206 is configured to apply needed transformations on a web page.
- An analysis software component 306 is configured to analyze a web page.
- a network component 207 enables the transformer to intercept a request made by a client 102 making a request to a target web server 103 , as well as interact with said client 102 , said target web server 103 , and optionally other web applications and/or external entities.
- a network component 307 enables the analyzer to interact with web applications and other external entities.
- the transformer 200 and analyzer 300 may share the same processor 203 / 303 and network interface 207 / 307 if executed as separate threads.
- the transformer may be implemented such that it is able to intercept the request between the client 102 and the target web server 103 and interact with the target web server 103 without requiring the network component 207 , for example if implemented as an add-on to the web server, the transformer can interact with the target web server 103 without the need of a network interface.
- transformation instructions can be defined in many ways, as long as they can be performed quickly enough by the transformer.
- transformation instruction is a search and replace instruction, made up of the original text and replacement text, and flag indicating if only the first found instance of the original text should be replaced or all instances.
- When receiving a search and replace transformation instruction for a given Web page, the transformer searches for the original text on the Web page, and replaces the first or all matches with the replacement text.
- Search and replace instructions may use regular expressions for the search, to support a more powerful search.
- transformation instructions may be grouped and applied as an “all or nothing” action—either all transformations are applied or none are. For example, in this instance a group of search and replace transformation instructions are only applied if the original text of all the instructions in the group was found on the page.
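One possible shape for a search and replace instruction, including the first-or-all flag, optional regular expressions, and "all or nothing" grouping, is sketched below. The `SearchReplace` dataclass and `apply_group` function are hypothetical names. The sketch checks every original text against the page as received, so it assumes the instructions in a group do not interfere with one another.

```python
import re
from dataclasses import dataclass

@dataclass
class SearchReplace:
    original: str
    replacement: str
    first_only: bool = False  # replace only the first match if True
    is_regex: bool = False    # treat `original` as a regular expression

    def pattern(self):
        return re.compile(self.original if self.is_regex else re.escape(self.original))

def apply_group(html, group):
    """Apply every instruction in the group, or none if any original is absent."""
    compiled = [(ins, ins.pattern()) for ins in group]
    if not all(p.search(html) for _, p in compiled):
        return html
    for ins, p in compiled:
        # Literal replacements go through a function so backslashes are not
        # interpreted as group references.
        repl = ins.replacement if ins.is_regex else (lambda m, r=ins.replacement: r)
        html = p.sub(repl, html, count=1 if ins.first_only else 0)
    return html

page = "<p>a</p><p>a</p><b>x</b>"
applied = apply_group(page, [
    SearchReplace("<p>a</p>", "<p>A</p>", first_only=True),
    SearchReplace("<b>x</b>", "<b>X</b>"),
])
skipped = apply_group(page, [
    SearchReplace("<p>a</p>", "<p>A</p>"),
    SearchReplace("<i>missing</i>", ""),
])
```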
- In step 3, the returned page is the following one, with the added link in bold.
- 3(b)(ii) may occur if the page changed, or a variant of it was returned.
- the Transformer may create a new analysis task for The Page in this instance, to create new instructions for the revised page.
- Transformation instructions may be associated with a request and/or a web page, or any part of them. The only requirement is for the transformer to know which transformation instructions are relevant to the current request/web page.
- the transformer may determine a web page needs to be re-analyzed, and create an analysis task for it even if transformation instructions already exist for it. Examples of such conditions are:
- Analyzers must monitor the repository in a way that enables them to detect or be notified of a new analysis task in a reasonable amount of time. Examples of monitoring techniques include polling the repository every 100 ms for new tasks; being notified by the repository through a programmatic interface when a new task requires analysis.
- Different requests to the same page may be analyzed separately, if deemed different. Examples of such differences could be specific HTTP parameters (in the query or post data), difference in specific headers, difference in specific cookies, etc.
- Another key example is a different client type, specifically browser type, device type (e.g. laptop, desktop, smartphone) and operating system. Different browsers often require slightly or dramatically different transformations, even to achieve the same purpose. Therefore, the analysis and transformations are often done for every client type.
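One way to keep separately analyzed variants apart is to store transformation instructions under a key derived from the request. The sketch below is an assumption, not something the text prescribes: the function names, the crude substring-based browser detection, and the idea of a configured list of "significant" query parameters are all illustrative.

```python
from urllib.parse import urlsplit, parse_qsl

def client_type(user_agent):
    """Very rough browser detection (illustrative only)."""
    ua = user_agent.lower()
    if "firefox" in ua:
        return "firefox"
    if "msie" in ua or "trident" in ua:
        return "ie"
    if "chrome" in ua:       # checked before Safari: Chrome UAs also say "Safari"
        return "chrome"
    if "safari" in ua:
        return "safari"
    return "other"

def variant_key(url, user_agent, significant_params=()):
    """Key under which instructions for this request variant are stored."""
    parts = urlsplit(url)
    params = sorted((k, v) for k, v in parse_qsl(parts.query)
                    if k in significant_params)
    return (parts.netloc, parts.path, tuple(params), client_type(user_agent))

key_a = variant_key("http://site.example/p?lang=en&sid=123",
                    "Mozilla/5.0 (Windows) Gecko/20100101 Firefox/3.6", ("lang",))
key_b = variant_key("http://site.example/p?lang=en&sid=999",
                    "Mozilla/5.0 (Windows) Gecko/20100101 Firefox/3.6", ("lang",))
```

Here the session parameter `sid` is ignored, so both requests share one set of instructions, while a different `lang` value or browser would yield a separate key.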
- the method and system of this invention allow for a deeper analysis of web pages, since they do not delay the page.
- Such deep analysis can result in more intelligent and more powerful transformations, and therefore more valuable ones.
- Below are a few examples of deep analysis that can be performed in such a system. These analysis techniques require a relatively long time to perform, making them not practical in a system where the analysis duration delays the delivery of the web page.
- JavaScript is a programming language, and a very flexible one, and it is therefore very difficult to understand everything a specific piece of JavaScript code may do.
- the two primary techniques for understanding JavaScript are Static Analysis & JavaScript Execution.
- Static analysis analyzes the JavaScript source code, along with any libraries it uses, and attempts to build mathematical models of all the possible executions the code may do.
- Various properties of the JavaScript language, and specifically the eval( ) function can make these models nearly infinite or impracticably large in size.
- static analysis can analyze specific aspects of a JavaScript code snippet with a high percentage of success. For example, static analysis can be used to create a call graph, indicating which function calls which other function. While there may be some minor error in the call graph, it will generally be highly accurate for most JavaScript code snippets.
- static analysis can be used for example to determine whether a JavaScript code snippet calls document.write( ) either directly or through a function in its call graph. Since document.write( ) adds content to the HTML right after the location of the script tag that holds it, such scripts often cannot be moved within the HTML without harming the rendered page or its functionality. Knowledge of which scripts call document.write( ) and which do not helps the analyzer avoid making modifications that will harm the page.
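Given a call graph, checking whether a script reaches document.write( ) directly or indirectly is a graph reachability question. The sketch below assumes the call graph has already been produced by some static analysis step and is available as a plain dictionary; the function name `reaches` and the sample graph are hypothetical.

```python
def reaches(call_graph, start, target="document.write"):
    """Depth-first search: does `start` call `target` directly or transitively?"""
    seen, stack = set(), [start]
    while stack:
        fn = stack.pop()
        if fn == target:
            return True
        if fn in seen:
            continue  # already explored; also guards against recursion cycles
        seen.add(fn)
        stack.extend(call_graph.get(fn, ()))
    return False

# Hypothetical call graph: function name -> functions it calls.
graph = {
    "initMenu": ["buildMenu"],
    "buildMenu": ["document.write"],
    "track": ["sendBeacon"],
    "loop": ["loop"],
}
menu_writes = reaches(graph, "initMenu")
track_writes = reaches(graph, "track")
```

An analyzer could treat scripts for which the result is false as candidates for being moved within the HTML, subject to the accuracy of the call graph.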
- Static analysis can provide different types of information to the analysis process, including but not limited to identifying unused code on a web page; identifying code that will not work on some browsers; and identifying inefficient code and potential infinite loops.
- a second common technique for understanding JavaScript is to execute it and observe the results. This technique is usually done by simulating or automating a browser, loading the page, and monitoring the executed code and its interaction with the Web page. For example, monitoring whether document.write( ) was called, and what content was passed to it.
- JavaScript Execution has various pros and cons when compared to static analysis. Slower performance is one of its primary disadvantages, as static analysis tends to be much faster than JavaScript Execution.
- Static Analysis can easily determine document.write( ) is being called, but cannot easily determine what exactly was written. JavaScript Execution would easily extract the exact HTML added to the page.
- JavaScript Execution can provide a considerable amount of very useful information. The types of information often overlap with those that JavaScript static analysis can extract.
- One primary usage is to use JavaScript execution to identify and extract links created by JavaScript, like the one included in the HTML in the example above.
- JavaScript execution is not nearly fast enough to be performed in real-time. However, with the technique described in this disclosure it can be performed by the analyzer outside the real-time flow.
- Some types of analysis may combine the analysis of more than one Web page, to determine which transformations to apply to a given page.
- optimizing a specific page to load as quickly as possible may harm the load time of a subsequent page.
- Page A links to page B
- Page A references 2 CSS files, “a.css” and “b.css”
- Page B references 2 CSS files, “b.css” and “c.css”.
- If page A is modified to reference a combined CSS file, holding “a.css” & “b.css”, then when page B is loaded, it needs to re-download the content of “b.css”. If this scenario repeats with additional resources on the two pages, then optimizing page A's load time by merging resources may slow down page B's load time.
- the analysis performed by a proxy may attempt to analyze the pages linked to by page A, and perhaps even additional pages, before determining how to transform page A. With such a broad view, the analysis can strike the right balance between minimizing one page's load time and maximizing use of the cache across pages.
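One simple policy for such multi-page analysis is to merge only the resources that linked pages do not also reference, leaving shared files separate so they remain cacheable. This is just one possible policy, sketched under that assumption; the function name `split_for_merge` is hypothetical.

```python
def split_for_merge(page_css, linked_pages_css):
    """Return (merge, keep): files safe to merge for this page, and files
    shared with linked pages that are left as separate cacheable requests."""
    shared = set()
    for other in linked_pages_css:
        shared.update(other)
    merge = [f for f in page_css if f not in shared]
    keep = [f for f in page_css if f in shared]
    return merge, keep

# Page A references a.css and b.css; linked page B references b.css and c.css.
merge_a, keep_a = split_for_merge(["a.css", "b.css"], [["b.css", "c.css"]])
```

In the two-page example, "b.css" stays separate so that page B can reuse the cached copy, at the cost of one extra request when loading page A.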
- When logging into Google's webmail solution, Gmail, a user is presented with an inbox containing a list of email threads. When the user clicks one of these emails, dedicated JavaScript fetches the contents of that email (possibly with additional information on how to display/render). JavaScript on the page then interprets that data, and modifies the loaded page to display the email's content instead of the list of emails shown before.
- a delta script performing a Bridged A->B Transition can also attempt to avoid unnecessary steps. For example, if Transition A->Base includes modifying the page title, and Transition Base->B modifies the title as well, the script can—where possible—skip the first title change.
- the delta script is made of 2 parts:
- applying the 2 step transition would require merging the list of changes required for Transitions A->Base and Base->B, before performing step 2. If Transition Base->B replaces the modified value in Transition A->Base, the change can be done directly from page A to page B.
- Transition A->Base replaces the text <title>A</title> with the text <title>Base</title>
- transition Base->B replaces the text <title>Base</title> with the text <title>B</title>
- the merge operation would perform a single transformation from <title>A</title> to <title>B</title>.
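The merge of the two transitions can be sketched as follows, assuming each transition is represented as a list of (original text, replacement text) pairs and that a change is only collapsed when the second transition's original text exactly equals the first transition's replacement text. The function name `merge_transitions` and this representation are assumptions for illustration.

```python
def merge_transitions(a_to_base, base_to_b):
    """Collapse A->Base followed by Base->B into direct A->B changes."""
    second = dict(base_to_b)
    merged, used = [], set()
    for original, intermediate in a_to_base:
        if intermediate in second:
            # Base->B overwrites this value, so skip the intermediate step.
            merged.append((original, second[intermediate]))
            used.add(intermediate)
        else:
            merged.append((original, intermediate))
    # Base->B changes that do not build on an A->Base change are kept as-is.
    merged.extend((o, r) for o, r in base_to_b if o not in used)
    return merged

a_to_base = [("<title>A</title>", "<title>Base</title>")]
base_to_b = [("<title>Base</title>", "<title>B</title>")]
direct = merge_transitions(a_to_base, base_to_b)
```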
- the base page should be a page that is relatively similar to the rest of the pages on the site.
- This technique doesn't require one single page for all the pages on the site. It is also possible to create several “base pages”, each used to convert between a set of pages, and calculate the deltas between these base pages. For example, a web application might have one base page for each language the website is displayed in, used for all the pages written in that language. In this case a Bridged A->B Transition may include more than one mediating base page.
- the process of modifying page A to replace links with a delta script generated on-demand is made of two parts—modifying page A and creating the delta script.
- step 2(b)(i) ensures the script returned from the proxy would simply load the linked page in such cases, thus maintaining a functionally identical user experience, albeit slower.
- Step 2(b)(ii) can be timed, and if it takes longer than an acceptable threshold, the proxy would revert to step 2(b)(i); the delta script generated in step 2(b)(ii) can be cached, and reused in case the linked page hasn't changed; step 2(b)(ii) could be modified to perform only a partial comparison, based on initial analysis that can be done in step 1. For example, user configuration or the analysis in step 1 may determine that only certain parts of the page may change dynamically. In such cases, the delta script can be pre-created for the page, and only modified based on the comparison of the dynamic parts of the page.
- the required context includes any resources or setup that needs to be in place before the transformation is applied. For example, when merging CSS files as described above, the merged CSS file has to be created and placed in the correct location before the transformations can be applied. If the context is not fully set, the transformations may modify the page to an invalid one—for instance, a page that references a non-existent CSS file.
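A transformer can guard against an incomplete context by applying instructions only once every required resource is reported as present. The sketch below assumes the check is supplied as a callable (`resource_exists`), which might consult the file system or a CDN in a real deployment; all names here are hypothetical.

```python
def apply_when_context_ready(html, instructions, required_resources, resource_exists):
    """Apply (original, replacement) pairs only if every required resource is
    reported present by resource_exists; otherwise return html unchanged."""
    if not all(resource_exists(r) for r in required_resources):
        return html
    for original, replacement in instructions:
        html = html.replace(original, replacement)
    return html

published = {"combined.css"}  # hypothetical set of resources already in place
page = '<link rel="stylesheet" href="a.css">'
rules = [('href="a.css"', 'href="combined.css"')]
ready = apply_when_context_ready(page, rules, ["combined.css"], published.__contains__)
not_ready = apply_when_context_ready(page, rules, ["other.css"], published.__contains__)
```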
- Setting up the context may happen quickly, but in some instances it can be very time consuming, enough so that it would not be reasonable to perform it in real-time, delaying the web page's response.
- An example is when a new resource needs to be communicated to a third party, and the third party does not guarantee the time it takes to perform this communication.
- Content Delivery Networks are solutions in which the data of various resources is duplicated or “mirrored” into various locations around the globe.
- the returned web page references a generic location for a resource (e.g. the URL https://cdn.site.com/resource.css).
- DNS Domain Name System
- IP Internet Protocol
- a proxy performing analysis and transformation may place a newly created resource on a CDN, or move an existing resource referenced by a web page to it. In that case, setting up the context would include copying the resource to the CDN. This copy operation may take a long time, as the copy may need to be mirrored to many different locations. Therefore, performing such a copy usually cannot be done quickly enough to be performed in real-time.
- the method and system in accordance with this invention for analysis and transformation of web pages can be used for many different purposes. Performance Optimization is one possible purpose, as demonstrated above. Making web pages render and load faster is very valuable, and has been shown to tie directly to company revenues and user satisfaction. With the variety of browsers, operating systems and technologies involved, ensuring a web page loads and performs quickly is not easy.
- This task requires expertise and development time, and is hard to apply retroactively to existing web pages. Therefore, it is a task well suited for automated proxy-based analysis and transformation.
- the analysis can identify performance problems and optimization opportunities on each page, and the needed transformations to speed up the page.
- Web Browser Compatibility is another use case. Web Browsers change and advance rapidly, and while much of their functionality is standard, much of it still isn't. This means the same page may render and function well in one browser, but not in another, even if both browsers contain the features logically required to handle the web page. This is most evident in JavaScript, where subtle differences in the different browsers' implementations result in a lot of differences.
- Automated analysis & transformation of web pages can attempt to identify and correct cases of browser incompatibility.
- Internet Explorer allows a web page to perform a background request using a COM object called XmlHttpRequest.
- Firefox and many other browsers do not support this COM object, but offer a specific implementation of the XmlHttpRequest class.
- Automated analysis can identify the use of the COM object in a page returned to a Firefox browser, and replace it with the class built into Firefox instead.
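Such a compatibility fix can be expressed as a regular-expression search and replace applied only for non-IE clients. The sketch assumes the page creates the object in the legacy Internet Explorer form `new ActiveXObject("Microsoft.XMLHTTP")` (or the `Msxml2.XMLHTTP` variant) and replaces it with the built-in `XMLHttpRequest` class; the function name and the client labels are hypothetical.

```python
import re

# Legacy IE instantiation of the XMLHTTP COM object (assumed form).
ACTIVEX_XHR_RE = re.compile(
    r"""new\s+ActiveXObject\(\s*["'](?:Microsoft|Msxml2)\.XMLHTTP["']\s*\)""")

def fix_xhr_for_client(script, client):
    """Swap the COM object for the built-in class unless the client is IE."""
    if client == "ie":
        return script
    return ACTIVEX_XHR_RE.sub("new XMLHttpRequest()", script)

original = 'var x = new ActiveXObject("Microsoft.XMLHTTP");'
for_firefox = fix_xhr_for_client(original, "firefox")
for_ie = fix_xhr_for_client(original, "ie")
```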
- Web Applications today often make use of third party components, and include those in their web pages. These third party components provide a range of functionalities, including web analytics for example, measuring various aspects of site usage, ad networks displaying ads managed by the third party, and many more.
- A proxy-based analysis and transformation system can offer much greater flexibility in replacing these third party components.
- the analysis can contain all the expertise required to interact with a variety of similar third party components, and offer the web application administrator a simple, non programmatic way of choosing the one to apply.
- a proxy based analysis and transformation engine can replace the references to one ad network with references to another for the desired regions.
Description
- The present invention claims priority from U.S. Provisional Patent Application No. 61/357,138 filed Jun. 22, 2010, which is incorporated herein by reference.
- This invention relates generally to a method and system for modifying web pages, including dynamic web pages, based on automated analysis wherein web pages are transformed based on transformation instructions in nearly real-time, and wherein analysis is performed and transformation instructions based on the analysis are prepared prior to a request for the web page. The system has two primary components, an analyzer which asynchronously and repeatedly analyzes web pages creating and updating transformation instructions relating to the web pages, and a transformer which intercepts traffic to a web server in response to a request for the web page, receives the returned web pages, and transforms them based on stored transformation instructions.
- Web Pages are complicated entities, made up primarily of Hypertext Markup Language (HTML), but often containing other technologies, such as Cascading Style Sheets (CSS), JavaScript, Flash, and many more. Web Pages can be thought of as programs executed by a browser or client, which is capable of executing software code in the abovementioned languages and technologies. Without a typical user's knowledge, web pages are often generated upon request, created by running dedicated software on the server when a user request is received. Such dedicated software is called a web application, and uses technologies such as J2EE, PHP, ASP.NET and others.
- A web page is defined hereafter as software code for example provided or served as a response to a request for a particular and unique URL or web address, or pointer thereto from a client such as HTML, XHTML or different versions thereof; a web page is therefore software code that embodies or defines the web page, i.e. the software code which allows a web client to render or display a page for viewing.
- Therefore a web page at a particular address or pointed thereto whether modified or not is considered to be “the web page”. For all intents and purposes, within the context of this document, if the response to a request for a web page is altered or transformed as compared to a previous response to the same request, the transformed web page is considered to be a modified version of the “same” web page rather than a “new” web page.
- One implication of the complexity of web pages is that there are many ways to achieve a same goal. Two web pages can look the same and function the same way for a given client, but their actual content may be very different.
- Even when different implementations result in the same interface presented to a user, they may differ greatly in many different aspects. For example, one page may render much faster than the other; One page may expose a security flaw while the other does not; One page can be successfully loaded in multiple different internet clients or browsers, while the other may only work in Internet Explorer. These are but a few of the implications the specific implementations carry.
- Changing a Web Page, especially one that's auto generated, can be a costly endeavour. Web applications are embodied in software, and making modifications to them requires development work, subsequent testing and deployment, all of which risk the integrity of the software and require skilled workers' time. Some of these changes require more expertise and time than others. For example, making a web page load faster, fixing a security flaw or making a web page accessible, often require a fair amount of expertise and time.
- Note that some changes to web pages are designed and built into the web application. For example, a news site would read the news articles to display from a database; a personalized home page would serve a different structure for different users; and drag-and-drop functionality may only be included in web pages served back to specific browsers able to support it. In the context of this document, changes to the response based on such logic are considered a part of the web page when built into the web application.
- To avoid or reduce the cost of making such changes, these changes are sometimes performed by manipulating the web page after it is generated, using a proxy. A proxy is a software application able to intercept and modify incoming and outgoing communication with the web server. A proxy can be implemented in various ways, including the provision of a separate machine that traffic to a web server would go through, or a software proxy deployed as a web-server add-on through which internet traffic is passed.
- Because internet traffic is intercepted by a proxy, the proxy can modify the responses that are returned. For example, a proxy such as the Apache mod_deflate add-on can compress the web page and add an HTTP header indicating it did so. Making the modifications in a proxy is an alternative to modifying the web application, and provides several benefits:
- Cost: It is often lower cost
- Time to deploy: It can often be up and running more quickly
- Flexibility: It is more dynamic in nature, easier to add/remove as needed
- Field deployable: It can be deployed and configured by people outside the development group, specifically by those who administer the infrastructure of the website
- Proxy-based manipulations of web pages are relatively common. They generally do not modify the page content, but rather the delivery mechanism wrapper, usually a Hypertext Transfer Protocol (HTTP) response. The modifications performed are typically based on manual configuration, stating which changes to apply under which conditions. The proxies rarely attempt to parse the pages presented, nor do they generally have built-in intelligence to understand them.
- In the last few years, there have been a few examples of proxies that not only perform the transformation, but also attempt to analyze the page and transform it based on that analysis. The two primary examples are HTML transcoders for mobile browsing and transformation for performance optimization.
- HTML Transcoders for Mobile browsing attempt to modify web pages to look better on the smaller smartphone screens. They try to extract the primary information and design aspects of the page, and modify them to fit on a page. These transcoders exist both as proxy solutions and client-side solutions. The different proxy solutions modify the page anywhere between the client and the server, while the client-side solutions modify the page just before rendering it, usually running on the mobile device itself. These HTML Transcoders perform the analysis of the web pages in real-time, while the client is awaiting the response.
- Performance optimization analysis and transformation tools analyze pages looking for a variety of known performance related impediments, and attempt to obviate or correct them. For example, one optimization technique is to combine all CSS files referenced in a page into one CSS file. If a page referenced 5 external CSS files, combining them into one would eliminate four requests when loading the page, and the combined file, when encoded using gzip compression, would likely compress more efficiently than compressing the files separately. A proxy solution may attempt to identify the CSS files in a given page, create a combined file, and modify the page to reference that combined CSS file instead. More examples of web page performance optimizations are explained further down the document.
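The effect of combining stylesheets can be sketched numerically. The example below is only an illustration under assumed inputs: the five file names and their contents are made up, and the sizes it computes say nothing about any particular site. It counts the requests saved and compares gzip output for the separate files against the combined file.

```python
import gzip

# Hypothetical stylesheet contents; real files would be fetched from the site.
css_files = {
    f"/style{i}.css": f".block{i} {{ margin: 0; padding: 0; color: #333; }}\n" * 20
    for i in range(1, 6)
}

# Concatenate the files in the order they are referenced on the page.
combined = "".join(css_files.values())

# Each separately compressed file pays its own gzip header and trailer,
# and cannot reuse patterns that already appeared in the other files.
separate_size = sum(len(gzip.compress(body.encode())) for body in css_files.values())
combined_size = len(gzip.compress(combined.encode()))

# Five references become one, so four requests are eliminated.
requests_saved = len(css_files) - 1
```

Order matters when concatenating CSS, since later rules override earlier ones, which is one reason the analysis needs to know every stylesheet reference on the page.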
- Performing analysis alongside the transformation makes these proxy solutions much more powerful than transforming based only on configuration. For example, these solutions are more maintainable, as they adapt to changes in the web application or its content without (or with less) user involvement. These analysis solutions also usually provide built-in expertise in the analysis process, knowing what aspects of a page to look for and how to transform them into the desirable result. Performance optimization is a good example of an area where many developers do not know how to make their web pages load faster, making a solution that automatically applies performance optimizations to web application appealing.
- Proxy based analysis and transformation engines face two conflicting requirements; the first is speed. In order to transform web pages in real-time, the transformation must be applied quickly enough so as to not introduce any substantial delay. This tolerance for any delay is usually measured in milliseconds, and becomes even more challenging when the web application is under a heavy user load. This requirement is even more important for solutions looking to optimize the performance of a web page, as any delay introduced takes away from the solution's value.
- The second is deep analysis. As mentioned above, web pages contain many technologies, and properly understanding a web page is a complicated and CPU intensive task. The most common technology manifesting this problem is JavaScript. While parsing HTML can be done quite efficiently, fully understanding what a snippet of JavaScript code does requires considerably more CPU power and more sophisticated algorithms. Some code snippets are thought to be impossible to analyze in a reasonable time, at least based on current research.
- These two requirements are in conflict. On one hand, one can't perform deep analysis in real-time speed. On the other, without deep analysis, only very basic understanding of a page can be achieved, and the resulting transformations are very limited and error prone.
- Let us consider an example of a case where deep analysis is required for the optimization mentioned before, which attempts to merge all referenced CSS files on a page into one file.
- Web pages often use JavaScript to dynamically add content to the HTML page, using the document.write( ) function, for various reasons. Consider the following web page referred to hereafter as PAGE 1:
-
<html>
<head>
<link rel='stylesheet' type='text/css' href='/main.css'>
<script>
var menuVer = '3.0.2';
document.write('<link rel="stylesheet" href="/menu.' + menuVer + '.css">');
</script>
</head>
<body>
<!-- document body here -->
</body>
</html>
- This web page references two CSS files. The first reference is written clearly into the HTML. The second reference is printed by JavaScript, using a variable holding the current menu version.
- Performing only HTML parsing on this page would conclude there is only one CSS file, and not two, and therefore would not perform the merging of CSS files (or perform it without the menu CSS). However, as mentioned before, executing or statically analyzing JavaScript is a complex and resource-intensive task, and cannot today be done fast enough to meet the real-time speed requirements.
- Web pages are becoming more and more complex and the technologies incorporated within them are becoming more dynamic and difficult to understand. Therefore, analyzing web pages is a task that will only become more difficult over time, and this conflict will only worsen.
- Today, the attempted solutions to this problem only raise the threshold of what can be analyzed quickly, or revert to manual configuration for areas that cannot be analyzed fast enough.
- One very common solution is to use hardware acceleration, building a dedicated appliance that does all or part of the analysis in hardware. This is an effective solution for some types of analysis, but it only slightly increases the types of analysis that can be done in real-time. For example, executing all the JavaScript on a page cannot be done nearly fast enough, even on hardware, for an average page. One drawback of this type of solution is that it is not very flexible: since the hardware is dedicated to a particular task, varying that task or adding functionality can be problematic.
- Another solution is to use heuristics to compensate for shortcomings of analysis. For example, looking for document.write( ) calls in JavaScript code using regular expressions can often be done fast enough for real-time. These techniques are much more error prone and far less complete than the full analysis that could be done with more time.
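A minimal sketch of such a heuristic follows. The regular expressions and the function name are illustrative assumptions, not taken from any existing product; the second script in the sample shows the kind of call a purely textual search misses.

```python
import re

# Heuristic: flag any inline script whose text contains a document.write( call.
SCRIPT_BLOCK = re.compile(r"<script\b[^>]*>(.*?)</script>", re.IGNORECASE | re.DOTALL)
DOC_WRITE = re.compile(r"\bdocument\.write\s*\(")

def scripts_calling_document_write(html: str) -> list[str]:
    """Return the bodies of inline scripts that appear to call document.write()."""
    return [body for body in SCRIPT_BLOCK.findall(html) if DOC_WRITE.search(body)]

html = """<script>document.write('<p>hi</p>');</script>
<script>var w = document['wri' + 'te']; w.call(document, '<p>missed</p>');</script>"""

# Only the first script is reported; the second reaches the same function
# through a computed property name, which the textual search cannot see.
found = scripts_calling_document_write(html)
```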
- Another common solution is to use manual configuration to compensate for the more shallow analysis. For example, the user could manually specify that the page above contains the two CSS references. This technique has the original problem of being extremely hard to maintain and is not viable for many dynamic websites.
- No solution today offers a complete remedy to this problem. Existing solutions only attempt to stretch the boundaries a little further, either by analyzing faster or by making do with shallow analysis. This invention attempts to obviate the problem by providing a method and system which, among other things, reduces the time in which a web page is returned to a client requesting that page.
- In accordance with the invention in a system wherein a web page is accessible to a client from a server, and wherein the web page has an associated url or link thereto, defining an address, wherein in response to a request for the web page the server provides the web page to the client, a method is provided of modifying a web page to the client in response to a request from the client, comprising:
- asynchronous to, and prior to said request from the client, in dependence upon predetermined conditions, analyzing at least a portion of the requested web page with an analyzer in a computer based system to identify at least a predetermined characteristic and creating transformation instructions that will change the predetermined characteristic when the web page is modified; and storing the transformation instructions in a repository;
- modifying the web page provided by the server in response to the request to the web page based on transformation instructions that were stored in the repository prior to said request from the client; and,
- providing the modified web page to the client.
- Although plural transformers can reside on a same system at a same location, in other embodiments transformers may access transformation instructions from a common repository while each being located in different physical locations, for example, in different cities or countries.
- In accordance with the invention a system for changing the content of a requested web page, in response to a request for the web page from a client, is provided, so as to vary a characteristic thereof, which comprises:
- an analyzer including one or more suitably programmed processors for analyzing at least a portion of the web page to identify at least a predetermined characteristic and for creating transformation instructions corresponding to said characteristic;
- a memory for storing the transformation instructions received from the analyzer; and,
- a transformer for modifying the web page based on transformation instructions previously stored in the memory prior to said request for the web page and for returning the modified web page to the client in response to the request for the web page.
- In accordance with another aspect of the invention, in a system having a server provide a client with a web page (Source), and where at least one other web page exists (Target), a method is provided of modifying Source on the client side to resemble Target, comprised of: identifying the differences between Source and Target on the server using a computer-based comparison algorithm; and, generating an instruction set, executable by the client, for modifying the Source to be equivalent to the Target based on at least an identified difference, wherein the equivalence criteria is predetermined.
- In the aspect of the invention above the server modifies the Source web page before providing it to the client, replacing at least a reference on Source to Target with at least a reference to the instruction set.
- In the aspect of the invention above the reference to the instruction set is at least the instruction set itself.
- In the aspect of the invention above the reference to the instruction set comprises at least a request to a server to retrieve the instruction set.
- In an aspect of the invention above the request to retrieve the instruction set returns an instruction set previously created and stored.
- In an aspect of the invention above the request to retrieve the instruction set creates the instruction set and returns it;
- In an aspect of the invention above the created instruction set is also stored on a computer accessible medium.
- In an aspect of the invention above there exists another web page (Base); and the difference and instruction set is calculated both between Source and Base and between Base and Target; and the instruction set for transforming from Source to Target is created at least by combining the two said instruction sets.
- In an aspect of the invention above some of the instructions in the combined instruction set are merged using a computer-based algorithm.
- In an aspect of the invention above at least one of the instruction sets modifying Source to Base and Base to Target is stored on a computer accessible medium.
- In an aspect of the invention above at least one of the instruction sets modifying Source to Base and Base to Target is read from computer accessible medium it was previously stored to.
- Exemplary embodiments of the invention will now be described in conjunction with the drawings in which:
- FIG. 1 is a system block diagram illustrating a request for a Web page from a client; and,
- FIG. 2 is a detailed system block diagram illustrating the components of the system.
- The method and system of this invention use an analyzer for analyzing web pages and for preparing transformation instructions used by a transformer for transforming the content of web pages so as to, for example, increase the speed of rendering one or more web pages. In accordance with this invention a solution to the problem of increasing the speed of delivery of web pages to a requesting client such as Internet Explorer, Safari, or Firefox, is achieved by separating the transformation from the analysis, and performing the transformation in near-real time and analyzing web pages to build transformation instructions in a much greater time span outside of the near real-time flow, asynchronous to the request, typically before or after a request and delivery of a web page.
- Referring now to FIG. 1, a system is shown comprised of two primary components: a transformer 200 for performing a transformation and an analyzer 300 for performing the analysis. The system also includes a data repository 401 used by the transformer 200 and analyzer 300 to communicate and store information.
- The analyzer 300 does not reside between the client and the server, nor does it watch or interfere with that communication channel. The analyzer continuously monitors the repository 401, looking for analysis tasks, that is, requests to analyze a given page. When the analyzer receives such a task, it analyzes the web page, and creates transformation instructions. Since the analysis is done asynchronously to the interaction between the client and the server, it does not delay the delivery of the web page to the client, and is not required to work in real-time speed.
- Transformation Instructions are often simple instructions, as simple as a textual search and replace instruction. The most important trait of a transformation instruction is that it does not require any deep knowledge or understanding of the page it is transforming. Transformation instructions may be grouped together, if the analysis concluded a set of transformations should either all be applied or not at all.
- The Transformer 200 acts as a proxy to the web application. Whenever it receives a web page, it fetches and applies the relevant transformation instructions from the repository 401. If there are no transformation instructions, the transformer 200 submits an analysis task for this web page to the repository 401.
- This system solves any conflict between speed and analysis depth. The analysis does not delay the web page, and can therefore “afford” to perform deeper analysis and take more time to do so. The transformer 200 does not need to understand the web page, only to apply the transformation instructions, and can therefore do so very quickly.
- Separating these two functions so that a transformation can be done essentially immediately in response to a request to a web page, and analysis can be done at another time, for example when the page is not being requested, allows the system to provide relatively up-to-date transformations in near-real time.
- This system and method has one key limitation: sometimes, notably the first time a web page is received, the analysis and transformations are not performed, and the web page is returned as is. This limitation makes it better suited to some tasks and not others. For example, when optimizing web page performance, it's usually acceptable if only the vast majority of pages are optimized, and so this method can be used to do such optimization. However, when fixing a security flaw, the system is likely expected to solve it for all pages, making this system less suitable for such use.
- FIG. 1 shows the transformer 200, the analyzer 300, and the repository 401 shown with the client and web application. The transformer 200 and the analyzer 300 are shown to each comprise multiple blocks. Each transformer block 200 and analyzer block 300 represents another instance of the transformer and analyzer, therefore it is possible to have multiple transformers and analyzers working with a same repository at the same time.
- As shown in FIG. 1, the transformer 200 resides between the client and the web application, and is able to modify the returned web page.
- The transformer 200 logs requests and pages as needed, to the repository 401. The analyzer(s) 300 reads a page and/or request from the repository, analyzes it, and writes transformation instructions to the repository 401 which will likely be used for a subsequent request of the web page.
- In response to a request for a web page, the transformer 200 reads the transformation instructions related to the current request/web-page, and applies them to the web page, returning the modified web page to the client.
- The transformer 200 and analyzer 300 work asynchronously; therefore there are two sequences, one for each.
- The transformer 200 sequence is as follows:
- 1. Intercept a request and the web page returned from the application
- 2. Query the repository 401 (or a partial copy of the repository's data, such as a local cache) for relevant transformation instructions
  - a. If found, transform the web page based on the queried instructions
  - b. If none found, enter an analysis task for the request/page to the repository 401
- 3. Return the web page to the client with any transformations applied.
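The transformer sequence above can be sketched as follows. This is a minimal illustration under stated assumptions: the `Repository` class is an in-memory stand-in for the repository 401, instructions are reduced to plain (original, replacement) text pairs keyed by URL, and interception of the request itself (step 1) is assumed to have already happened.

```python
from dataclasses import dataclass, field

@dataclass
class Repository:
    """In-memory stand-in for repository 401 (hypothetical structure)."""
    instructions: dict[str, list[tuple[str, str]]] = field(default_factory=dict)
    analysis_tasks: list[tuple[str, str]] = field(default_factory=list)

def transform(url: str, page: str, repo: Repository) -> str:
    """Steps 2 and 3 of the transformer sequence for an intercepted page."""
    found = repo.instructions.get(url)            # step 2: query the repository
    if found is None:
        repo.analysis_tasks.append((url, page))   # step 2b: enter an analysis task
        return page                               # step 3: return the page as is
    for original, replacement in found:           # step 2a: apply the instructions
        page = page.replace(original, replacement)
    return page                                   # step 3: return the transformed page

repo = Repository()
first = transform("/index.html", "<head></head>", repo)
# Simulate an analyzer having stored instructions between the two requests.
repo.instructions["/index.html"] = [("<head>", "<head><!-- optimized -->")]
second = transform("/index.html", "<head></head>", repo)
```

The first call returns the page untouched and leaves a task behind; only the second call, made after instructions exist, returns a modified page.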
- The analyzer 300 sequence is as follows:
- 1. Continuously and at regular intervals monitor the repository 401 for new analysis tasks
- 2. After receiving a task, analyze the web page
- 3. Create transformation instructions for the page, based on the analysis
- 4. Write the transformation instructions to the repository 401
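A minimal sketch of the analyzer sequence follows. It assumes a `queue.Queue` stands in for the task list held in the repository and a plain dictionary for the stored instructions; `analyze` is a placeholder, and the `max_idle_polls` bound exists only so the sketch terminates, whereas a deployed analyzer would keep polling.

```python
import queue

def analyze(page: str) -> list[tuple[str, str]]:
    """Placeholder analysis; a real analyzer would parse HTML, CSS and JavaScript."""
    return [("<head>", "<head><!-- analyzed -->")] if "<head>" in page else []

def run_analyzer(tasks: queue.Queue, instructions: dict,
                 poll_seconds: float = 0.1, max_idle_polls: int = 1) -> None:
    """Poll for analysis tasks and store the resulting instructions."""
    idle = 0
    while idle < max_idle_polls:
        try:
            url, page = tasks.get(timeout=poll_seconds)  # step 1: monitor for tasks
        except queue.Empty:
            idle += 1
            continue
        idle = 0
        # Steps 2 to 4: analyze the page, create instructions, write them back.
        instructions[url] = analyze(page)

tasks: queue.Queue = queue.Queue()
tasks.put(("/index.html", "<html><head></head></html>"))
stored: dict = {}
run_analyzer(tasks, stored)
```

Nothing in this loop sits on the request path, so the time spent inside `analyze` does not delay any response.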
- Referring now to FIG. 2, a system 100 for automated analysis and transformation of web pages is shown. The system includes a transformer 200 and an analyzer 300. Each of the transformer and analyzer includes a program storage device or memory storage device 202/302, which may include a computer hard drive or other computer media and is configured to store a program 204/304. The program storage device 202/302 is further configured to work in conjunction with a processor 203/303 on a computer device 201/301 to execute the instructions of the program 204/304, which transform in the transformer and analyze in the analyzer. A repository interface 205/305 is used to interact with a memory 400 containing the repository 401. The memory is a computer-based storage which allows programmatic access to it, and may include but is not limited to a database, hard drive and RAM. A transformation software component 206 is configured to apply needed transformations on a web page. An analysis software component 306 is configured to analyze a web page. A network component 207 enables the transformer to intercept a request made by a client 102 to a target web server 103, as well as interact with said client 102, said target web server 103, and optionally other web applications and/or external entities. A network component 307 enables the analyzer to interact with web applications and other external entities.
- In a particular embodiment the transformer 200 and analyzer 300 may share the same processor 203/303 and network interface 207/307 if executed as separate threads. The transformer may be implemented such that it is able to intercept the request between the client 102 and the target web server 103 and interact with the target web server 103 without requiring the network component 207; for example, if implemented as an add-on to the web server, the transformer can interact with the target web server 103 without the need of a network interface.
- As mentioned heretofore, transformation instructions can be defined in many ways, as long as they can be performed quickly enough by the transformer.
- One example of a transformation instruction is a search and replace instruction, made up of the original text, the replacement text, and a flag indicating whether only the first found instance of the original text or all instances should be replaced.
- When receiving a search and replace transformation instruction for a given Web page, the transformer searches for the original text on the Web page, and replaces the first or all matches with the replacement text.
- Search and replace instructions may use regular expressions for the search, to support a more powerful search.
- As mentioned above, transformation instructions may be grouped and applied as an “all or nothing” action—either all transformations are applied or none are. For example, in this instance a group of search and replace transformation instructions are only applied if the original text of all the instructions in the group was found on the page.
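One possible representation of a search and replace instruction, and of the “all or nothing” group behaviour, is sketched below. The class name, field names and `apply_group` function are hypothetical; the only behaviour taken from the description is the first/all flag, the optional regular expression search, and the rule that a group is applied only if every original text is found.

```python
import re
from dataclasses import dataclass

@dataclass(frozen=True)
class SearchReplace:
    original: str             # text (or regular expression) to search for
    replacement: str          # text to put in its place
    first_only: bool = False  # replace only the first match, or all matches
    is_regex: bool = False    # treat `original` as a regular expression

    def pattern(self) -> re.Pattern:
        return re.compile(self.original if self.is_regex else re.escape(self.original))

def apply_group(page: str, group: list[SearchReplace]) -> str:
    """Apply every instruction in the group, or none if any original is missing."""
    if not all(instr.pattern().search(page) for instr in group):
        return page
    for instr in group:
        # A callable replacement keeps the replacement text literal.
        page = instr.pattern().sub(lambda _m, r=instr.replacement: r, page,
                                   count=1 if instr.first_only else 0)
    return page

group = [SearchReplace("<b>", "<strong>"), SearchReplace("</b>", "</strong>")]
done = apply_group("<b>hi</b>", group)
skipped = apply_group("<b>hi", group)
```

Checking all originals before changing anything is what keeps the page from ending up half transformed when it no longer matches what the analyzer saw.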
- Example—Merging CSS files
- As mentioned above, merging multiple CSS files referenced by a page can improve the page's loading speed. Here is a full flow or sequence of performing such a CSS merge on the web page described above as PAGE 1 and referred to as “The Page” in this sequence:
- 1. Transformer intercepts the first request to The Page
  - a. Discovers there are no transformation instructions
  - b. Creates an analysis Task for The Page in the Repository
  - c. Returns The Page unmodified (with both CSS files referenced)
- 2. Analyzer receives the Analysis Task of analyzing The Page
  - a. Analyzes the page, discovering both the static and dynamic CSS references
  - b. Creates a combined file, called “combined.css”, holding the content of both “main.css” and “menu.3.0.2.css”.
  - c. Creates a group of 3 Search & Replace Transformation Instructions: 2 for removing the old CSS references and one for adding the new one.
    - i. Instruction for removing main.css:
      - Original: <link rel='stylesheet' type='text/css' href='/main.css'>
      - Replacement: <empty string>
    - ii. Instruction for removing menu.3.0.2.css (using a regular expression):
      - Original: <script>\s*var menuVer\s*=\s*'3\.0\.2';\s*document\.write\(\s*'<link rel="stylesheet" href="/menu\.'\s*\+\s*menuVer\s*\+\s*'\.css">'\);\s*</script>
      - Replacement: <empty string>
    - iii. Instruction for adding the combined CSS after the head element:
      - Original: <head>
      - Replacement: <head><link rel='stylesheet' type='text/css' href='/combined.css'>
  - d. Stores the group as The Page's Transformation Instructions in the Repository
- 3. Transformer receives another request to The Page
  - a. Queries the Repository, receives the group of Transformation Instructions
  - b. Searches for the 3 original texts
    - i. If all are found, replaces them with the replacement texts
    - ii. If not all are found, does not modify the page
  - c. Returns the (possibly) modified page
- In this case, if 3(b)(i) occurred, the returned page in step 3 is the following one, which references only the combined CSS file.
<html>
<head><link rel='stylesheet' type='text/css' href='/combined.css'>
</head>
<body>
<!-- document body here -->
</body>
</html>
- Note that 3(b)(ii) may occur if the page changed, or a variant of it was returned. The Transformer may create a new analysis task for The Page in this instance, to create new instructions for the revised page.
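The step 3 behaviour of this example can be checked with a short sketch. The page text below is PAGE 1 with the closing quote of the dynamic link's href attribute restored, and the second pattern is an escaped, cleaned-up form of the regular expression in the example; both are assumptions about the intended text, and the function name is hypothetical.

```python
import re

PAGE_1 = """<html>
<head>
<link rel='stylesheet' type='text/css' href='/main.css'>
<script>
var menuVer = '3.0.2';
document.write(
'<link rel="stylesheet" href="/menu.' + menuVer + '.css">');
</script>
</head>
<body>
<!-- document body here -->
</body>
</html>"""

# (pattern, replacement) pairs for the three instructions; literal texts are escaped.
GROUP = [
    (re.escape("<link rel='stylesheet' type='text/css' href='/main.css'>"), ""),
    (r"""<script>\s*var menuVer\s*=\s*'3\.0\.2';\s*document\.write\(\s*'<link rel="stylesheet" href="/menu\.'\s*\+\s*menuVer\s*\+\s*'\.css">'\);\s*</script>""", ""),
    (re.escape("<head>"), "<head><link rel='stylesheet' type='text/css' href='/combined.css'>"),
]

def apply_all_or_nothing(page: str) -> str:
    if not all(re.search(pattern, page) for pattern, _ in GROUP):
        return page  # not all originals found: leave the page unmodified
    for pattern, replacement in GROUP:
        page = re.sub(pattern, lambda _m, r=replacement: r, page, count=1)
    return page

result = apply_all_or_nothing(PAGE_1)
```

Apart from leftover blank lines inside the head element, `result` corresponds to the returned page shown above.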
- As seen in the system diagram, there may be multiple analyzers and multiple transformers all working as a part of the same system, using the same repository and sharing analysis tasks and transformation instructions.
- The existing solutions described in the background may still be applied here, such as using dedicated hardware for transformation or analysis, performing efficient analysis, leveraging user configuration, etc.
- Transformation instructions may be associated with a request and/or a web page, or any part of them. The only requirement is for the transformer to know which transformation instructions are relevant to the current request/web page.
- Under certain conditions, the transformer may determine a web page needs to be re-analyzed, and create an analysis task for it even if transformation instructions already exist for it. Examples of such conditions are:
-
- New analysis techniques may have been created
- The transformation instructions specify a condition that cannot be met, for example in a search and replace instruction, the searched text was not found on the web page.
- The current transformation instructions have become stale; a predefined time period has elapsed since the transformation instructions were retrieved
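The re-analysis conditions listed above could be checked roughly as follows. The record layout, the version counter and the one-day threshold are all assumptions made for illustration, and the reference time is assumed here to be the moment the instructions were stored.

```python
import time
from dataclasses import dataclass
from typing import Optional

@dataclass
class StoredInstructions:
    search_texts: list[str]  # original texts the instructions search for
    analyzer_version: int    # version of the analysis techniques that produced them
    stored_at: float         # assumed reference time, in seconds since the epoch

CURRENT_ANALYZER_VERSION = 2    # hypothetical version counter
MAX_AGE_SECONDS = 24 * 60 * 60  # hypothetical staleness threshold

def needs_reanalysis(page: str, stored: StoredInstructions,
                     now: Optional[float] = None) -> bool:
    now = time.time() if now is None else now
    if stored.analyzer_version < CURRENT_ANALYZER_VERSION:     # new techniques exist
        return True
    if any(text not in page for text in stored.search_texts):  # condition cannot be met
        return True
    return now - stored.stored_at > MAX_AGE_SECONDS            # instructions are stale

fresh = StoredInstructions(["<head>"], 2, 1000.0)
old_version = StoredInstructions(["<head>"], 1, 1000.0)
```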
- Analyzers must monitor the repository in a way that enables them to detect or be notified of a new analysis task in a reasonable amount of time. Examples of monitoring techniques include polling the repository every 100 ms for new tasks, and being notified by the repository through a programmatic interface when a new task requires analysis.
- Different requests to the same page may be analyzed separately, if deemed different. Examples of such differences could be specific HTTP parameters (in the query or post data), difference in specific headers, difference in specific cookies, etc. Another key example is a different client type, specifically browser type, device type (e.g. laptop, desktop, smartphone) and operating system. Different browsers often require slightly or dramatically different transformations, even to achieve the same purpose. Therefore, the analysis and transformations are often done for every client type.
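One way to keep separate instructions per request variant is to derive a lookup key from the parts of the request deemed significant. The sketch below is an assumption-laden illustration: the substring checks on the User-Agent header are deliberately crude, and which query parameters count as significant (here only `lang`) would be a per-site decision.

```python
from urllib.parse import parse_qsl, urlsplit

def client_type(user_agent: str) -> str:
    """Very rough browser detection from the User-Agent header (illustrative only)."""
    ua = user_agent.lower()
    if "firefox" in ua:
        return "firefox"
    if "msie" in ua or "trident" in ua:
        return "internet-explorer"
    if "safari" in ua and "chrome" not in ua:
        return "safari"
    return "other"

def instruction_key(url: str, user_agent: str,
                    significant_params: tuple[str, ...] = ("lang",)) -> tuple:
    """Build the key under which instructions for this request variant are stored."""
    parts = urlsplit(url)
    params = tuple(sorted((k, v) for k, v in parse_qsl(parts.query)
                          if k in significant_params))
    return (parts.path, params, client_type(user_agent))

FIREFOX_UA = "Mozilla/5.0 (Windows NT 6.1; rv:5.0) Gecko/20100101 Firefox/5.0"
IE_UA = "Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 6.1)"

k1 = instruction_key("/home?lang=en&session=abc", FIREFOX_UA)
k2 = instruction_key("/home?session=xyz&lang=en", FIREFOX_UA)
k3 = instruction_key("/home?lang=en", IE_UA)
```

Requests that differ only in parameters deemed insignificant share a key, while the same page requested by a different browser gets its own analysis and instructions.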
- The method and system of this invention allow for a deeper analysis of web pages, since they do not delay the page. Such deep analysis can result in more intelligent and more powerful transformations, and therefore more valuable ones. Below are a few examples of deep analysis that can be performed in such a system. These analysis techniques require a relatively long time to perform, making them not practical in a system where the analysis duration delays the delivery of the web page.
- Understanding JavaScript is possibly the biggest barrier to performing proper analysis in real-time speed. JavaScript is a programming language, and a very flexible one, and it is therefore very difficult to understand everything a specific piece of JavaScript code may do. The two primary techniques to understanding JavaScript are Static Analysis & JavaScript Execution.
- Static analysis analyzes the JavaScript source code, along with any libraries it uses, and attempts to build mathematical models of all the possible executions the code may do. Various properties of the JavaScript language, and specifically the eval( ) function, can make these models nearly infinite or impracticably large in size. Thus, with today's technologies, it isn't feasible for a program to determine all the possible outputs and context changes every JavaScript code snippet may produce.
- However, static analysis can analyze specific aspects of a JavaScript code snippet with a high percentage of success. For example, static analysis can be used to create a call graph, indicating which function calls which other function. While there may be some minor error in the call graph, it will generally be highly accurate for most JavaScript code snippets.
- For the purpose of Web Page Transformation, static analysis can be used for example to determine whether a JavaScript code snippet calls document.write( ) either directly or through a function in its call graph. Since document.write( ) adds content to the HTML right after the location of the script tag that holds it, such scripts often cannot be moved within the HTML without harming the rendered page or its functionality. Knowledge of which scripts call document.write( ) and which do not helps the analyzer avoid making modifications that will harm the page.
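- Assuming a call graph has already been extracted, the check for a direct or indirect call to document.write( ) reduces to a reachability search. The sketch below assumes the call graph is given as a mapping from function name to the names it calls; building that graph from JavaScript source is the hard part and is not shown.

```python
from typing import Dict, Set


def calls_document_write(call_graph: Dict[str, Set[str]],
                         entry: str,
                         target: str = "document.write") -> bool:
    """Return True if `target` is reachable from `entry` in the call graph."""
    stack = [entry]
    seen: Set[str] = set()
    while stack:
        fn = stack.pop()
        if fn == target:
            return True
        if fn in seen:
            continue  # already explored; also guards against recursion cycles
        seen.add(fn)
        stack.extend(call_graph.get(fn, ()))
    return False


# Hypothetical call graph for two scripts on a page.
example_graph = {
    "initAds": {"renderAd"},
    "renderAd": {"document.write"},
    "track": {"send"},
    "send": {"track"},  # mutual recursion, never reaches document.write
}
```

- Here a script whose entry point is `initAds` would be flagged as unsafe to move, while one starting at `track` would not. Because the call graph may contain minor errors, an analyzer might treat unresolved calls conservatively.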
- Static analysis can provide different types of information to the analysis process, including but not limited to identifying unused code on a web page; identifying code that will not work on some browsers; and identifying inefficient code and potential infinite loops.
- Today, only the shallowest JavaScript static analysis can be performed at real-time speed for the amount of JavaScript on an average web page. However, with the technique described in this disclosure, deeper static analysis can be performed by the analyzer, due to the reduced time constraints.
- A second common technique for understanding JavaScript is to execute it and observe the results. This technique is usually done by simulating or automating a browser, loading the page, and monitoring the executed code and its interaction with the Web page. For example, monitoring whether document.write( ) was called, and what content was passed to it.
- JavaScript Execution has various pros and cons when compared to static analysis. Slower performance is one of its primary disadvantages, as static analysis tends to be much faster than JavaScript Execution.
- However, there are some types of information that are much more easily obtained using JavaScript execution compared to static analysis. For example, for the script contained in PAGE 1 shown heretofore, Static Analysis can easily determine document.write( ) is being called, but cannot easily determine what exactly was written. JavaScript Execution would easily extract the exact HTML added to the page.
- For Web page transformation purposes, JavaScript Execution can provide a considerable amount of very useful information. The types of information often overlap with those that JavaScript static analysis can extract. One primary usage is to use JavaScript execution to identify and extract links created by JavaScript, like the one included in the HTML in the example above.
- JavaScript execution is not nearly fast enough to be performed in real-time. However, with the technique described in this disclosure it can be performed by the analyzer outside the real-time flow.
- Some types of analysis may combine the analysis of more than one Web page, to determine which transformations to apply to a given page.
- Example—Maximizing Subsequent Page performance through caching
- In some instances, optimizing a specific page to load as quickly as possible may harm the load time of a subsequent page.
- For instance, consider the following scenario:
- Page A links to page B
- Page A references 2 CSS files, “a.css” and “b.css”
- Page B references 2 CSS files, “b.css” and “c.css”.
- All CSS files can be cached for a long time
- If page A is modified to reference a combined CSS file, holding “a.css” & “b.css”, then when page B is loaded, it needs to re-download the content of “b.css”. If this scenario repeats with additional resources on the two pages, then optimizing page A's load time by merging resources may slow down page B's load time.
- So, the analysis performed by a proxy may attempt to analyze the pages linked to by page A, and perhaps even additional pages, before determining how to transform page A. With such a broad view, the analysis can strike the right balance between minimizing one page's load time and maximizing cache reuse across pages.
- As with all the previous examples, performing an analysis on multiple pages takes even longer than analyzing a single page. Therefore, it cannot be properly performed when the analysis is done in-line, delaying the web page's return to the client.
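- The trade-off in the page A / page B scenario can be made concrete with a small cache model. The file sizes below are invented for illustration, and the model counts only downloaded bytes; it ignores the per-request overhead that merging is meant to reduce, so it shows the cost side of merging only.

```python
from typing import Dict, Iterable, Sequence, Tuple

Bundle = Tuple[str, ...]  # the CSS files served together as one resource


def bytes_downloaded(visits: Iterable[Sequence[Bundle]],
                     sizes: Dict[str, int]) -> int:
    """Total bytes fetched over a sequence of page visits, assuming every
    bundle is cached indefinitely and identified by its exact contents."""
    cache = set()
    total = 0
    for page in visits:
        for bundle in page:
            if bundle not in cache:
                total += sum(sizes[name] for name in bundle)
                cache.add(bundle)
    return total


# Hypothetical sizes in kilobytes.
sizes = {"a.css": 10, "b.css": 20, "c.css": 30}

# Page A then page B, with every file referenced separately.
separate = bytes_downloaded(
    [[("a.css",), ("b.css",)], [("b.css",), ("c.css",)]], sizes)

# Page A references a merged a.css + b.css; page B is unchanged.
merged_on_a = bytes_downloaded(
    [[("a.css", "b.css")], [("b.css",), ("c.css",)]], sizes)
```

- Under these assumed sizes, the unmerged pages transfer 60 KB in total and the merged variant 80 KB, because the content of "b.css" is downloaded twice. A multi-page analysis would weigh that extra transfer against the requests saved on page A.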
- Example—Maximizing Subsequent Page performance by modifying original page
- When clicking a link from page A to page B, the browser goes through a costly process performance-wise of unloading page A, and loading page B. On most websites, large parts of page A and page B are probably shared (menu, headers, footers, etc), making much of this work unnecessary.
- Web applications looking to eliminate that work sometimes use JavaScript to modify page A, making it look and act like page B would have, instead of actually replacing the page. This has been proven to be a lot faster, and quite a few modern applications do this. This technique is considered to be one of the main aspects of the Asynchronous JavaScript and XML (AJAX) web application development methodology.
- For example, when logging into Gmail, Google's webmail solution, a user is presented with an inbox containing a list of email threads. When the user clicks one of these emails, dedicated JavaScript fetches the contents of that email (possibly with additional information on how to display or render it). JavaScript on the page then interprets that data, and modifies the loaded page to display the email's content instead of the list of emails shown before.
- In order for the application to behave in this manner, it needs to be developed to do so. If the application was not initially developed to act this way, then modifying it to achieve this end requires significant development resources.
- However, with this newly introduced analysis and transformation technique, multiple page analysis can apply this behavior after the fact to existing pages. In case page A links to page B, the analysis can analyze both pages, and extract the delta or difference between the two. The analysis can then create transformation instructions to modify page A, replacing the link to page B with JavaScript that will transform page A to be visually and functionally equivalent to page B.
- The flow or sequence of such an analysis would be as follows:
- 1. Parse page A (included in an analysis task), extracting the links in it
- 2. For each link:
- a. Fetch and parse the linked page
- b. Compare page A and the linked page, extracting the differences between them
- c. Create a dedicated script that converts page A to the linked page (delta script)
- d. Create a Transformation Instruction that replaces the link on page A to execute the delta script
- 3. Store the Transformation Instructions to the Repository
- On a subsequent request for page A, a transformer applying these transformation Instructions would make the links from page A much faster.
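- The analysis flow in steps 1 to 3 can be sketched as follows. This is an illustration under several assumptions: the delta is a line-level diff computed with Python's standard `difflib` rather than a generated client-side script, `fetch` is a caller-supplied function, and the repository is a plain dictionary. All function and key names are hypothetical.

```python
import difflib
from html.parser import HTMLParser
from typing import Callable, Dict, List, Tuple

Delta = List[Tuple[str, int, int, List[str]]]


class LinkExtractor(HTMLParser):
    """Collects the href of every <a> tag (step 1)."""

    def __init__(self) -> None:
        super().__init__()
        self.links: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)


def extract_links(html: str) -> List[str]:
    parser = LinkExtractor()
    parser.feed(html)
    return parser.links


def compute_delta(page_a: str, linked: str) -> Delta:
    """Step 2(b): line-level edits that turn page A into the linked page."""
    a_lines, b_lines = page_a.splitlines(), linked.splitlines()
    matcher = difflib.SequenceMatcher(a=a_lines, b=b_lines, autojunk=False)
    return [(tag, i1, i2, b_lines[j1:j2])
            for tag, i1, i2, j1, j2 in matcher.get_opcodes()
            if tag != "equal"]


def apply_delta(page_a: str, delta: Delta) -> str:
    """What a delta script would do on the client, simulated on text lines."""
    lines = page_a.splitlines()
    for _tag, i1, i2, new_lines in reversed(delta):  # back to front keeps indices valid
        lines[i1:i2] = new_lines
    return "\n".join(lines)


def analyze_page(page_a: str,
                 fetch: Callable[[str], str],
                 repository: Dict[str, list]) -> List[dict]:
    instructions = []
    for link in extract_links(page_a):                # steps 1 and 2
        linked_html = fetch(link)                     # step 2(a)
        delta = compute_delta(page_a, linked_html)    # steps 2(b) and 2(c)
        instructions.append({"replace_link": link, "delta": delta})  # step 2(d)
    repository["instructions"] = instructions         # step 3
    return instructions
```

- In a real system the stored delta would be emitted as a script the client can execute (step 2(c)) and would more likely operate on parsed elements than on text lines.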
- In step 2, it's possible the transformation would only be applied to a subset of the links on a page.
- In step 2(a), the linked page may not only be parsed, but also processed and/or modified before the delta is calculated. For example, the linked page could first be given the same Transformation Instructions a Transformer would have applied had the page been requested through it.
- In step 2(b), there are many ways a delta can be calculated, including comparing the text of the pages, the elements parsed from them, and more.
- In step 2(c), the script may be written in any language a client can understand and execute. JavaScript is the most likely example, but Flash, VBScript, Silverlight and others may be used as well.
- In step 2(c), the script can also know how to transform the new linked page back into page A, to allow a “back” action to use the same technique.
- In step 2(d), the transformation may be to embed the delta script inside page A, or the delta script may be saved to a file (delta file), and the transformation would be to replace the link with a generic script that fetches the delta file and executes it.
- In step 2(a), links on the linked page may be transformed themselves to call delta scripts to their own linked pages.
- Comparing every page on a web site to every page it links to may result in a lot of differences to calculate and possibly store. For instance, if a site has 100 pages and each links to all the others, calculating the deltas between all of them would result in roughly 10,000 deltas or differences.
- One way this can be improved is by calculating the delta of each page to and from a base page, for example the home page of the web application. That means that in order to transition from page A to page B (“Bridged A->B Transition”), the delta script would combine two other delta scripts: the first transitions from page A to the base page (“Transition A->Base”), and the second from the base page to page B (“Transition Base->B”). In the example above, this solution would mean only 100 deltas would be created (and 200 delta scripts, converting the home page to and from every other page).
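- The saving from using a base page can be checked with a short calculation. The figures of 10,000 and 200 in the text are round numbers; the sketch below counts exactly, under the assumptions that each ordered pair of distinct pages needs its own delta and that the base page is itself one of the site's pages. The function names are hypothetical.

```python
def pairwise_delta_count(n_pages: int) -> int:
    """Deltas needed when every page links to every other page and each
    ordered (source, destination) pair gets its own delta."""
    return n_pages * (n_pages - 1)


def bridged_script_count(n_pages: int) -> int:
    """Delta scripts needed with one base page: one script to the base and
    one from the base for each of the other pages."""
    return 2 * (n_pages - 1)
```

- For 100 pages this gives 9,900 pairwise deltas against 198 bridged scripts, so the bridged approach grows linearly with the number of pages rather than quadratically.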
- A delta script performing a Bridged A->B Transition can also attempt to avoid unnecessary steps. For example, if Transition A->Base includes modifying the page title, and Transition Base->B modifies the title as well, the script can—where possible—skip the first title change. One way to perform that is if the delta script is made of 2 parts:
- 1. A list of required modifications a program can understand (e.g. modify the title tag)
- 2. A script, likely shared by all delta scripts, that performs the modifications in the list
- In such a case, applying the 2 step transition would require merging the list of changes required for Transitions A->Base and Base->B, before performing step 2. If Transition Base->B replaces the modified value in Transition A->Base, the change can be done directly from page A to page B.
- For instance, if Transition A->Base replaces the text <title>A</title> with the text <title>Base</title>, and Transition Base->B replaces the text <title>Base</title> with the text <title>B</title>, the merge operation would perform a single transformation from <title>A</title> to <title>B</title>.
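- The merge of two modification lists can be sketched as below. The representation of a modification as a (target, old value, new value) tuple is an assumption made for this example, as is the rule that each target appears at most once per list.

```python
from typing import Dict, List, Tuple

Modification = Tuple[str, str, str]  # (target, old_value, new_value)


def merge_modifications(first: List[Modification],
                        second: List[Modification]) -> List[Modification]:
    """Combine Transition A->Base (`first`) with Transition Base->B (`second`)
    so that each target is modified at most once."""
    merged: Dict[str, Modification] = {}
    order: List[str] = []
    for target, old, new in first:
        merged[target] = (target, old, new)
        order.append(target)
    for target, old, new in second:
        if target in merged:
            # Keep page A's original value and jump straight to page B's value.
            merged[target] = (target, merged[target][1], new)
        else:
            merged[target] = (target, old, new)
            order.append(target)
    # Drop modifications that end where they started (A and B share the value).
    return [merged[t] for t in order if merged[t][1] != merged[t][2]]


a_to_base = [("title", "<title>A</title>", "<title>Base</title>")]
base_to_b = [("title", "<title>Base</title>", "<title>B</title>"),
             ("h1", "<h1>Home</h1>", "<h1>B</h1>")]
```

- Merging `a_to_base` with `base_to_b` yields one title change directly from A to B, plus the heading change that only the second transition required.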
- Note that if the base page is a blank page, the delta would be pointless, as it would always contain the entire page. For example, a Bridged A->B Transition would require clearing all of page A and adding all of page B. This is effectively the same as simply loading page B, and therefore does not add significant performance benefits. Therefore, the base page should be a page that is relatively similar to the rest of the pages on the site.
- This technique doesn't require one single page for all the pages on the site. It is also possible to create several “base pages”, each used to convert between a set of pages, and calculate the deltas between these base pages. For example, a web application might have one base page for each language the website is displayed in, used for all the pages written in that language. In this case a Bridged A->B Transition may include more than one mediating base page.
- Calculating delta on-demand
- There are some scenarios where storing the delta between different pages is problematic. One example is a concern with the amount of stored data, especially when there are many pages involved. Another scenario is when pages change fairly often, effectively with every request. In such a case, the delta script would be invalid practically as soon as it was created.
- In such cases, another option is to generate the delta on-demand. Calculating the delta can be time consuming, but in some cases the performance gain in modifying the page can be worth the additional delay in calculating the delta.
- The process of modifying page A to replace links with a delta script generated on-demand is made of two parts—modifying page A and creating the delta script.
- 1. The flow of generating the modified page A is as follows:
- a. Sign page A (signature referred to as “The Signature”)
- b. Parse page A
- c. For at least one link on page A:
- i. Create a Transformation Instruction that replaces the link with a call to a web service on the proxy, asking it for the delta script between The Signature and this link
- d. Apply the newly created Transformation Instructions to page A
- e. Store the (now modified) page A in the Repository, using The Signature as its ID
- f. Store the Transformation Instructions to the Repository
- 2. The flow of generating the delta script on demand is as follows:
- a. Client clicks the modified link, resulting in a call to the proxy web service, including The Signature and the original link location (the “Linked Page”)
- b. Proxy looks up the passed signature in the Repository
- i. If not found, returns a script making the browser change the page as it normally would (e.g. in JavaScript, the link would look like this: document.location=<link>)
- ii. If the signature points to a page, create a delta script between it and the Linked Page, as shown above
- iii. Proxy returns the delta script
- Several points should be noted. The Transformation Instructions created in step 1 will be applied on subsequent requests to page A. The pages stored in step 1 need to be cleared from time to time, but step 2(b)(i) ensures that in such cases the script returned from the proxy simply loads the linked page, thus maintaining a functionally identical, albeit slower, user experience. Step 2(b)(ii) can be timed, and if it takes longer than an acceptable threshold, the proxy reverts to step 2(b)(i). The delta script generated in step 2(b)(ii) can be cached and reused if the linked page hasn't changed. Step 2(b)(ii) could also be modified to perform only a partial comparison, based on an initial analysis done in step 1. For example, user configuration or the analysis in step 1 may determine that only certain parts of the page may change dynamically. In such cases, the delta script can be pre-created for the page, and only modified based on the comparison of the dynamic parts of the page.
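- The proxy side of the on-demand flow (step 2) might look like the sketch below. It assumes a SHA-256 digest as The Signature, a dictionary as the Repository, and caller-supplied `fetch` and `make_delta` functions; all of these are hypothetical choices. The time check is simplified: it only inspects the elapsed time after the delta has been computed, whereas a real proxy would probably need to cancel a slow computation.

```python
import hashlib
import json
import time
from typing import Callable, Dict


def sign(page_html: str) -> str:
    """Step 1(a): derive The Signature from the page content."""
    return hashlib.sha256(page_html.encode("utf-8")).hexdigest()


def fallback_script(link: str) -> str:
    """Step 2(b)(i): a script that navigates as the original link would."""
    return f"document.location = {json.dumps(link)};"


def delta_script_for(signature: str,
                     link: str,
                     repository: Dict[str, str],
                     fetch: Callable[[str], str],
                     make_delta: Callable[[str, str], str],
                     budget_s: float = 0.5) -> str:
    stored_page = repository.get(signature)       # step 2(b)
    if stored_page is None:
        return fallback_script(link)              # step 2(b)(i): page was cleared
    started = time.monotonic()
    script = make_delta(stored_page, fetch(link)) # step 2(b)(ii)
    if time.monotonic() - started > budget_s:
        return fallback_script(link)              # too slow, revert to 2(b)(i)
    return script                                 # step 2(b)(iii)
```

- Because the fallback script simply loads the linked page, clearing old entries from the repository never breaks navigation; it only removes the speed-up.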
- Example—Time-consuming transformation context preparation
- Besides the considerable resources that deep analysis of a web page can require, it may sometimes take time to create the required context. The required context includes any resources or setup that needs to be in place before the transformation is applied. For example, when merging CSS files as described above, the merged CSS file has to be created and placed in the correct location before the transformations can be applied. If the context is not fully set, the transformations may modify the page to an invalid one, for instance a page that references a non-existent CSS file.
- Setting up the context may happen quickly, but in some instances it can be very time consuming, enough so that it would not be reasonable to perform it in real-time, delaying the web page's response. An example is when a new resource needs to be communicated to a third party, and the third party does not guarantee the time it takes to perform this communication.
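- One way to avoid producing an invalid page is to store transformation instructions only once every resource they depend on is confirmed to exist. The sketch below illustrates that ordering; the function name, the `resource_exists` callback and the list-based repository are hypothetical.

```python
from typing import Callable, List, Sequence


def publish_when_ready(instructions: Sequence[dict],
                       required_resources: Sequence[str],
                       resource_exists: Callable[[str], bool],
                       repository: List[dict]) -> List[str]:
    """Store `instructions` only if the whole context is in place.

    Returns the list of resources still missing; an empty list means the
    instructions were stored and may now be applied by a transformer."""
    missing = [r for r in required_resources if not resource_exists(r)]
    if missing:
        return missing  # context incomplete, keep serving the page unmodified
    repository.extend(instructions)
    return []
```

- The analyzer can call this repeatedly, outside the real-time flow, until the slow step (for example a third-party upload) has completed.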
- Example—Posting resources to a Content Delivery Network (CDN)
- One example of such a third party is a content delivery network. Content Delivery Networks are solutions in which the data of various resources is duplicated or “mirrored” into various locations around the globe. When a client browses the web application, the returned web page references a generic location for a resource (e.g. the URL https://cdn.site.com/resource.css). When the client resolves the domain name (using a Domain Name System—DNS), the returned Internet Protocol (IP) address depends on the client's location on the network. The returned address aims to be the “closest” mirror on the network, meaning the mirror that can communicate the fastest with the client.
- A proxy performing analysis and transformation may place a newly created resource on a CDN, or move an existing resource referenced by a web page to it. In that case, setting up the context would include copying the resource to the CDN. This copy operation may take a long time, as the copy may need to be mirrored to many different locations. Therefore, performing such a copy usually cannot be done quickly enough to be performed in real-time.
- The method and system for analysis and transformation of web pages in accordance with this invention can be used for many different purposes. Performance Optimization is one possible purpose, as demonstrated above. Making web pages render and load faster is very valuable, and has been shown to tie directly to company revenues and user satisfaction. With the variety of browsers, operating systems and technologies involved, ensuring a web page loads and performs quickly is not easy.
- This task requires expertise and development time, and is hard to apply retroactively to existing web pages. Therefore, it is a task well suited for automated proxy-based analysis and transformation. The analysis can identify performance problems and optimization opportunities on each page, and the needed transformations to speed up the page.
- Browser Compatibility is another use case. Web Browsers change and advance rapidly, and while much of their functionality is standard, much of it still isn't. This means the same page may render and function well in one browser, but not in another, even if both browsers contain the features logically required to handle the web page. This is most evident in JavaScript, where subtle differences in the different browsers' implementations result in a lot of differences.
- Automated analysis & transformation of web pages can attempt to identify and correct cases of browser incompatibility. For example, Internet Explorer allows a web page to perform a background request using a COM object called XmlHttpRequest. Firefox and many other browsers do not support this COM object, but offer a specific implementation of the XmlHttpRequest class. Automated analysis can identify the use of the COM object in a page returned to a Firefox browser, and replace it with the class built into Firefox instead.
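- A transformation of this kind could be as simple as a text substitution. The sketch below assumes the COM object is created with the historical Internet Explorer idiom `new ActiveXObject("Microsoft.XMLHTTP")` (or the `Msxml2.XMLHTTP` variant), which is an assumption about the page's code rather than something stated in the disclosure. The function name and browser labels are hypothetical.

```python
import re

# Matches e.g. new ActiveXObject("Microsoft.XMLHTTP") or
# new ActiveXObject('Msxml2.XMLHTTP.6.0').
_ACTIVEX_XHR = re.compile(
    r"""new\s+ActiveXObject\(\s*(['"])(?:Microsoft|Msxml2)\.XMLHTTP(?:\.\d+\.\d+)?\1\s*\)""",
    re.IGNORECASE,
)


def rewrite_xhr_for_client(script: str, browser: str) -> str:
    """Replace the COM-based request object with the built-in class for
    clients that are not Internet Explorer."""
    if browser.lower() in {"internet explorer", "ie"}:
        return script  # the COM object is available, leave the code alone
    return _ACTIVEX_XHR.sub("new XMLHttpRequest()", script)
```

- A purely textual replacement also touches matches inside string literals and comments, so an analyzer would more likely perform this rewrite on parsed JavaScript.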
- Web Applications today often make use of third party components, and include those in their web pages. These third party components provide a range of functionalities, including, for example, web analytics measuring various aspects of site usage, ad networks displaying ads managed by the third party, and many more.
- These third party services are often free or use a pay-per-use model, and normally do not require long term commitments. This makes them more appealing to web application developers, who can swap them as needed. However, the cost of modifying the web application and replacing such a component is often quite high, due to the development costs and required expertise in how to interact with the different third party components. This creates a de-facto commitment to those third party components, which is usually unplanned and often not in the web application owner's best interest.
- Using a proxy-based analysis and transformation system can offer much greater flexibility in replacing these third party components. The analysis can contain all the expertise required to interact with a variety of similar third party components, and offer the web application administrator a simple, non programmatic way of choosing the one to apply.
- For example, if a web application was used primarily in North America, and then started being used in the UK, it may be more lucrative to use different ad networks in those different regions. A proxy based analysis and transformation engine can replace the references to one ad network with references to another for the desired regions.
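- The regional replacement described above can be sketched as a lookup followed by a reference swap. The URLs, region codes and function name below are invented for illustration; how the client's region is determined is outside the scope of the sketch.

```python
from typing import Mapping


def swap_ad_network(html: str,
                    region: str,
                    current_url: str,
                    regional_urls: Mapping[str, str]) -> str:
    """Replace references to the current ad network's script with the one
    configured for `region`; regions without an entry are left unchanged."""
    replacement = regional_urls.get(region)
    if replacement is None:
        return html
    return html.replace(current_url, replacement)


# Hypothetical configuration chosen by the web application administrator.
regional_ads = {"UK": "https://ads-uk.example/ad.js"}
page = '<script src="https://ads-na.example/ad.js"></script>'
```

- With this configuration, a request from the UK receives the page referencing the UK ad network, while all other regions keep the original reference.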
- Such external transformations can also be used to add invisible third party components after the fact. For example, web analytics often do not impact the user interface, and can fairly easily be added by such an external component, again based on logical configuration by the user.
Claims (19)
Priority Applications (3)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US13/149,025 US8788577B2 (en) | 2010-06-22 | 2011-05-31 | Method and system for automated analysis and transformation of web pages |
US14/335,886 US9361345B2 (en) | 2010-06-22 | 2014-07-19 | Method and system for automated analysis and transformation of web pages |
US15/174,326 US10108595B2 (en) | 2010-06-22 | 2016-06-06 | Method and system for automated analysis and transformation of web pages |
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US35713810P | 2010-06-22 | 2010-06-22 | |
US13/149,025 US8788577B2 (en) | 2010-06-22 | 2011-05-31 | Method and system for automated analysis and transformation of web pages |
Related Child Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
US14/335,886 Continuation US9361345B2 (en) | 2010-06-22 | 2014-07-19 | Method and system for automated analysis and transformation of web pages |
Publications (2)
Publication Number | Publication Date |
---|---|
US20110314091A1 (en) | 2011-12-22 |
US8788577B2 US8788577B2 (en) | 2014-07-22 |
Family
ID=44583711
Family Applications (3)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
US13/149,025 Active 2032-05-02 US8788577B2 (en) | 2010-06-22 | 2011-05-31 | Method and system for automated analysis and transformation of web pages |
US14/335,886 Active US9361345B2 (en) | 2010-06-22 | 2014-07-19 | Method and system for automated analysis and transformation of web pages |
US15/174,326 Active 2031-06-25 US10108595B2 (en) | 2010-06-22 | 2016-06-06 | Method and system for automated analysis and transformation of web pages |
Country Status (3)
Country | Link |
---|---|
US (3) | US8788577B2 (en) |
EP (1) | EP2400407A1 (en) |
CA (1) | CA2742059C (en) |
Cited By (50)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20120233250A1 (en) * | 2011-03-11 | 2012-09-13 | International Business Machines Corporation | Auto-updatable document parts within content management systems |
US20130060930A1 (en) * | 2011-09-02 | 2013-03-07 | Kenneth Alexander Ellis | Systems, methods, and interfaces for analyzing webpage portions |
US20130086247A1 (en) * | 2011-09-29 | 2013-04-04 | International Business Machines Corporation | Web page script management |
US20130111449A1 (en) * | 2011-10-26 | 2013-05-02 | International Business Machines Corporation | Static analysis with input reduction |
US20130123948A1 (en) * | 2011-11-11 | 2013-05-16 | Rockwell Automation Technologies, Inc. | Control environment change communication |
US20130123952A1 (en) * | 2011-11-11 | 2013-05-16 | Rockwell Automation Technologies, Inc. | Control environment change communication |
US20130311593A1 (en) * | 2012-05-17 | 2013-11-21 | Matthew Browning Prince | Incorporating web applications into web pages at the network level |
CN103792873A (en) * | 2012-10-26 | 2014-05-14 | 洛克威尔自动控制技术股份有限公司 | Control environment change communication |
US20140136952A1 (en) * | 2012-11-14 | 2014-05-15 | Cisco Technology, Inc. | Improving web sites performance using edge servers in fog computing architecture |
US20140181314A1 (en) * | 2012-10-20 | 2014-06-26 | Tomodo Ltd. | Methods circuits devices systems and associated computer executable code for web augmentation |
US20140201617A1 (en) * | 2011-05-16 | 2014-07-17 | Guangzhou Ucweb Computer Technology Co., Ltd | Method for Browsing Web Page on Mobile Terminal |
US20140281923A1 (en) * | 2013-03-13 | 2014-09-18 | Usablenet Inc. | Methods for processing cascading style sheets and devices thereof |
US8898560B1 (en) | 2012-04-25 | 2014-11-25 | Google, Inc. | Fixing problems with a user interface |
US8943473B1 (en) * | 2012-02-06 | 2015-01-27 | Google Inc. | Consistently delivering a web page having source code with a dynamic instruction |
WO2015048207A1 (en) * | 2013-09-25 | 2015-04-02 | Akamai Technologies, Inc. | Key resource prefetching using front-end optimization (feo) configuration |
US20150339275A1 (en) * | 2014-05-20 | 2015-11-26 | Yahoo! Inc. | Rendering of on-line content |
WO2016053759A1 (en) * | 2014-09-30 | 2016-04-07 | Shape Security, Inc. | Automated hardening of web page content |
US20160112511A1 (en) * | 2014-10-20 | 2016-04-21 | Microsoft Corporation | Pre-fetch cache for visualization modification |
US20160110324A1 (en) * | 2014-10-15 | 2016-04-21 | Alibaba Group Holding Limited | Compression of cascading style sheet files |
US9390177B2 (en) | 2014-03-27 | 2016-07-12 | International Business Machines Corporation | Optimizing web crawling through web page pruning |
US9529994B2 (en) | 2014-11-24 | 2016-12-27 | Shape Security, Inc. | Call stack integrity check on client/server systems |
US9576070B2 (en) | 2014-04-23 | 2017-02-21 | Akamai Technologies, Inc. | Creation and delivery of pre-rendered web pages for accelerated browsing |
US9608975B2 (en) | 2015-03-30 | 2017-03-28 | Shape Security, Inc. | Challenge-dynamic credential pairs for client/server request validation |
US9621583B2 (en) | 2014-05-29 | 2017-04-11 | Shape Security, Inc. | Selectively protecting valid links to pages of a web site |
US9716702B2 (en) | 2014-05-29 | 2017-07-25 | Shape Security, Inc. | Management of dynamic credentials |
US9742858B2 (en) | 2011-12-23 | 2017-08-22 | Akamai Technologies Inc. | Assessment of content delivery services using performance measurements from within an end user client application |
US9785621B2 (en) | 2012-11-26 | 2017-10-10 | Akamai Technologies, Inc. | Progressive consolidation of web page resources |
US9817916B2 (en) | 2012-02-22 | 2017-11-14 | Akamai Technologies Inc. | Methods and apparatus for accelerating content authored for multiple devices |
US9819721B2 (en) | 2013-10-31 | 2017-11-14 | Akamai Technologies, Inc. | Dynamically populated manifests and manifest-based prefetching |
US9866655B2 (en) | 2014-03-31 | 2018-01-09 | Akamai Technologies, Inc. | Server initiated multipath content delivery |
US9986058B2 (en) | 2015-05-21 | 2018-05-29 | Shape Security, Inc. | Security systems for mitigating attacks from a headless browser executing on a client computer |
US10009222B2 (en) | 2016-03-30 | 2018-06-26 | International Business Machines Corporation | Input method engine management for edge services |
US10129289B1 (en) | 2016-03-11 | 2018-11-13 | Shape Security, Inc. | Mitigating attacks on server computers by enforcing platform policies on client computers |
CN109284428A (en) * | 2018-08-13 | 2019-01-29 | 腾讯科技(深圳)有限公司 | Data processing method, device and storage medium |
US10216488B1 (en) | 2016-03-14 | 2019-02-26 | Shape Security, Inc. | Intercepting and injecting calls into operations and objects |
US10218566B2 (en) | 2016-03-30 | 2019-02-26 | International Business Machines Corporation | Proactive input method engine management for edge services based on crowdsourcing data |
US10237763B1 (en) * | 2012-08-18 | 2019-03-19 | Global Eagle Entertainment Inc. | Real time data meter |
US10346483B2 (en) | 2009-10-02 | 2019-07-09 | Akamai Technologies, Inc. | System and method for search engine optimization |
US10417325B2 (en) | 2014-10-16 | 2019-09-17 | Alibaba Group Holding Limited | Reorganizing and presenting data fields with erroneous inputs |
CN110309044A (en) * | 2018-03-20 | 2019-10-08 | 福建天泉教育科技有限公司 | Pattern changed test method and terminal in a kind of Web system |
US10467331B2 (en) * | 2013-05-16 | 2019-11-05 | Toshiba Global Commerce Solutions Holdings Corporation | Systems and methods for processing modifiable files grouped into themed directories for presentation of web content |
US10482578B2 (en) | 2014-11-06 | 2019-11-19 | Alibaba Group Holding Limited | Method and system for controlling display direction of content |
US10567419B2 (en) | 2015-07-06 | 2020-02-18 | Shape Security, Inc. | Asymmetrical challenges for web security |
US10649740B2 (en) * | 2015-01-15 | 2020-05-12 | International Business Machines Corporation | Predicting and using utility of script execution in functional web crawling and other crawling |
US10657315B2 (en) * | 2016-06-28 | 2020-05-19 | Sap Se | Generic and automated CSS scoping |
US10846356B2 (en) * | 2018-06-13 | 2020-11-24 | At&T Intellectual Property I, L.P. | Scalable whittled proxy execution for low-latency web over cellular networks |
CN112769730A (en) * | 2019-10-21 | 2021-05-07 | 北京车和家信息技术有限公司 | Page compression method and device, client and server |
US11088940B2 (en) | 2017-03-07 | 2021-08-10 | Akamai Technologies, Inc. | Cooperative multipath |
US11126787B2 (en) * | 2020-02-11 | 2021-09-21 | Madcap Software, Inc. | Generating responsive content from an electronic document |
US11567822B2 (en) * | 2018-08-30 | 2023-01-31 | Boe Technology Group Co., Ltd. | Method of monitoring closed system, apparatus thereof and monitoring device |
Families Citing this family (17)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US9846893B2 (en) | 2012-07-18 | 2017-12-19 | Google Llc | Systems and methods of serving parameter-dependent content to a resource |
US20140283002A1 (en) * | 2013-03-15 | 2014-09-18 | Stephen Frechette | Method and system for anonymous circumvention of internet filter firewalls without detection or identification |
CN108595468A (en) * | 2013-03-22 | 2018-09-28 | 阿里巴巴集团控股有限公司 | A kind of acquisition methods of web data, device, server, terminal and system |
US9710264B2 (en) | 2013-10-28 | 2017-07-18 | International Business Machines Corporation | Screen oriented data flow analysis |
CN103955361A (en) * | 2014-03-28 | 2014-07-30 | 世纪禾光科技发展(北京)有限公司 | Modular development and publishing system for automatic compiling and establishing of web front-end codes |
CN104182547A (en) * | 2014-09-10 | 2014-12-03 | 北京浩瀚深度信息技术股份有限公司 | Method for optimizing page rendering of server and web cache server |
US20170237823A1 (en) * | 2015-12-07 | 2017-08-17 | Blockthrough Inc. | System and method for transforming online content |
CN106294597B (en) * | 2016-07-28 | 2019-09-20 | 百度在线网络技术(北京)有限公司 | The method and apparatus being grouped for the static resource to webpage |
US10764391B2 (en) | 2017-09-14 | 2020-09-01 | Akamai Technologies, Inc. | Origin and cache server cooperation for compute-intensive content delivery |
CN107729300B (en) * | 2017-09-18 | 2021-12-24 | 百度在线网络技术(北京)有限公司 | Text similarity processing method, device and equipment and computer storage medium |
US10360087B2 (en) * | 2017-10-27 | 2019-07-23 | International Business Machines Corporation | Web API recommendations based on usage in cloud-provided runtimes |
US10630797B2 (en) | 2018-01-30 | 2020-04-21 | Akamai Technologies, Inc. | Systems and methods for content delivery acceleration of virtual reality and augmented reality web pages |
US10810279B2 (en) | 2018-02-07 | 2020-10-20 | Akamai Technologies, Inc. | Content delivery network (CDN) providing accelerated delivery of embedded resources from CDN and third party domains |
US10855552B2 (en) | 2018-03-06 | 2020-12-01 | Bank Of America Corporation | Dynamic user interface computing platform |
CN109474515B (en) * | 2018-11-13 | 2022-06-24 | 平安科技(深圳)有限公司 | Risk event mail pushing method and device, computer equipment and storage medium |
CN111786858B (en) * | 2020-07-06 | 2022-04-15 | 三星(中国)半导体有限公司 | Method for diagnosing abnormal data, user terminal equipment and cloud server |
US11379281B2 (en) | 2020-11-18 | 2022-07-05 | Akamai Technologies, Inc. | Detection and optimization of content in the payloads of API messages |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20050204276A1 (en) * | 2001-02-05 | 2005-09-15 | Predictive Media Corporation | Method and system for web page personalization |
US20080306816A1 (en) * | 2007-06-06 | 2008-12-11 | Nebuad, Inc. | Network devices for replacing an advertisement with another advertisement |
US20110137973A1 (en) * | 2009-12-07 | 2011-06-09 | Yottaa Inc | System and method for website performance optimization and internet traffic processing |
Family Cites Families (19)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US8601050B2 (en) * | 1996-06-12 | 2013-12-03 | Michael Carringer | System and method for generating a modified web page by inline code insertion in response to an information request from a client computer |
US6397217B1 (en) | 1999-03-04 | 2002-05-28 | Futuretense, Inc. | Hierarchical caching techniques for efficient dynamic page generation |
US7047033B2 (en) | 2000-02-01 | 2006-05-16 | Infogin Ltd | Methods and apparatus for analyzing, processing and formatting network information such as web-pages |
US7574486B1 (en) | 2000-11-06 | 2009-08-11 | Telecommunication Systems, Inc. | Web page content translator |
US8635218B2 (en) | 2003-09-02 | 2014-01-21 | International Business Machines Corporation | Generation of XSLT style sheets for different portable devices |
US8037127B2 (en) | 2006-02-21 | 2011-10-11 | Strangeloop Networks, Inc. | In-line network device for storing application-layer data, processing instructions, and/or rule sets |
NO325628B1 (en) | 2006-09-20 | 2008-06-30 | Opera Software Asa | Procedure, computer program, transcoding server and computer system to modify a digital document |
US20080228920A1 (en) | 2007-03-16 | 2008-09-18 | Souders Steven K | System and method for resource aggregation and distribution |
US8060486B2 (en) | 2007-05-07 | 2011-11-15 | Hewlett-Packard Development Company, L.P. | Automatic conversion schema for cached web requests |
NZ566291A (en) | 2008-02-27 | 2008-12-24 | Actionthis Ltd | Methods and devices for post processing rendered web pages and handling requests of post processed web pages |
US20090254707A1 (en) | 2008-04-08 | 2009-10-08 | Strangeloop Networks Inc. | Partial Content Caching |
US9906620B2 (en) | 2008-05-05 | 2018-02-27 | Radware, Ltd. | Extensible, asynchronous, centralized analysis and optimization of server responses to client requests |
US20100050089A1 (en) | 2008-08-20 | 2010-02-25 | Company 100, Inc. | Web browser system of mobile communication terminal, using proxy server |
US8438312B2 (en) | 2009-10-23 | 2013-05-07 | Moov Corporation | Dynamically rehosting web content |
US9003309B1 (en) * | 2010-01-22 | 2015-04-07 | Adobe Systems Incorporated | Method and apparatus for customizing content displayed on a display device |
KR101625858B1 (en) * | 2010-04-19 | 2016-06-13 | 엘지전자 주식회사 | Mobile terminal and method for controlling the same |
US8977653B1 (en) * | 2010-06-17 | 2015-03-10 | Google Inc. | Modifying web pages to reduce retrieval latency |
US9262389B2 (en) * | 2012-08-02 | 2016-02-16 | International Business Machines Corporation | Resource-adaptive content delivery on client devices |
US9178934B1 (en) * | 2014-11-21 | 2015-11-03 | Instart Logic, Inc. | Modifying web content at a client |
2011
- 2011-05-31 CA CA2742059A, patent CA2742059C, status: Active
- 2011-05-31 US US13/149,025, patent US8788577B2, status: Active
- 2011-06-03 EP EP11168663A, patent EP2400407A1, status: Not active (Ceased)
2014
- 2014-07-19 US US14/335,886, patent US9361345B2, status: Active
2016
- 2016-06-06 US US15/174,326, patent US10108595B2, status: Active
Cited By (80)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US10346483B2 (en) | 2009-10-02 | 2019-07-09 | Akamai Technologies, Inc. | System and method for search engine optimization |
US20120284225A1 (en) * | 2011-03-11 | 2012-11-08 | International Business Machines Corporation | Auto-updatable document parts within content management systems |
US20120233250A1 (en) * | 2011-03-11 | 2012-09-13 | International Business Machines Corporation | Auto-updatable document parts within content management systems |
US20140201617A1 (en) * | 2011-05-16 | 2014-07-17 | Guangzhou Ucweb Computer Technology Co., Ltd | Method for Browsing Web Page on Mobile Terminal |
US20130060930A1 (en) * | 2011-09-02 | 2013-03-07 | Kenneth Alexander Ellis | Systems, methods, and interfaces for analyzing webpage portions |
US9846743B2 (en) * | 2011-09-02 | 2017-12-19 | Thomson Reuters Global Resources Unlimited Company | Systems, methods, and interfaces for analyzing webpage portions |
US20130086247A1 (en) * | 2011-09-29 | 2013-04-04 | International Business Machines Corporation | Web page script management |
US20130086255A1 (en) * | 2011-09-29 | 2013-04-04 | International Business Machines Corporation | Web page script management |
US20150074188A1 (en) * | 2011-09-29 | 2015-03-12 | International Business Machines Corporation | Web page script management |
US8924457B2 (en) * | 2011-09-29 | 2014-12-30 | International Business Machines Corporation | Client browser acceleration by having server removed and executed script embedded in web page |
US9516091B2 (en) * | 2011-09-29 | 2016-12-06 | International Business Machines Corporation | Web page script management |
US9503498B2 (en) * | 2011-09-29 | 2016-11-22 | International Business Machines Corporation | Web page script management |
US20130111449A1 (en) * | 2011-10-26 | 2013-05-02 | International Business Machines Corporation | Static analysis with input reduction |
US10157049B2 (en) * | 2011-10-26 | 2018-12-18 | International Business Machines Corporation | Static analysis with input reduction |
US9864365B2 (en) * | 2011-11-11 | 2018-01-09 | Rockwell Automation, Inc. | Control environment change communication |
US20180164790A1 (en) * | 2011-11-11 | 2018-06-14 | Rockwell Automation Technologies, Inc. | Control environment change communication |
US9529355B2 (en) * | 2011-11-11 | 2016-12-27 | Rockwell Automation Technologies, Inc. | Control environment change communication |
US20130123952A1 (en) * | 2011-11-11 | 2013-05-16 | Rockwell Automation Technologies, Inc. | Control environment change communication |
US20130123948A1 (en) * | 2011-11-11 | 2013-05-16 | Rockwell Automation Technologies, Inc. | Control environment change communication |
US10571898B2 (en) * | 2011-11-11 | 2020-02-25 | Rockwell Automation, Inc. | Control environment change communication |
US9742858B2 (en) | 2011-12-23 | 2017-08-22 | Akamai Technologies Inc. | Assessment of content delivery services using performance measurements from within an end user client application |
US8943473B1 (en) * | 2012-02-06 | 2015-01-27 | Google Inc. | Consistently delivering a web page having source code with a dynamic instruction |
US9514241B1 (en) | 2012-02-06 | 2016-12-06 | Google Inc. | Consistently delivering a web page having source code with a dynamic instruction |
US9147005B1 (en) * | 2012-02-06 | 2015-09-29 | Google Inc. | Consistently delivering a web page having source code with a dynamic instruction |
US9817916B2 (en) | 2012-02-22 | 2017-11-14 | Akamai Technologies Inc. | Methods and apparatus for accelerating content authored for multiple devices |
US8898560B1 (en) | 2012-04-25 | 2014-11-25 | Google, Inc. | Fixing problems with a user interface |
US10205674B2 (en) * | 2012-05-17 | 2019-02-12 | Cloudflare, Inc. | Incorporating web applications into web pages at the network level |
US8849904B2 (en) * | 2012-05-17 | 2014-09-30 | Cloudflare, Inc. | Incorporating web applications into web pages at the network level |
US11153226B2 (en) | 2012-05-17 | 2021-10-19 | Cloudflare, Inc. | Incorporating web applications into web pages at the network level |
US11621924B2 (en) | 2012-05-17 | 2023-04-04 | Cloudflare, Inc. | Incorporating web applications into web pages at the network level |
US20130311593A1 (en) * | 2012-05-17 | 2013-11-21 | Matthew Browning Prince | Incorporating web applications into web pages at the network level |
US20150019679A1 (en) * | 2012-05-17 | 2015-01-15 | Matthew Browning Prince | Incorporating web applications into web pages at the network level |
US10237763B1 (en) * | 2012-08-18 | 2019-03-19 | Global Eagle Entertainment Inc. | Real time data meter |
US20140181314A1 (en) * | 2012-10-20 | 2014-06-26 | Tomodo Ltd. | Methods circuits devices systems and associated computer executable code for web augmentation |
US9571555B2 (en) * | 2012-10-20 | 2017-02-14 | Tomodo Ltd. | Methods circuits devices systems and associated computer executable code for web augmentation |
CN103792873A (en) * | 2012-10-26 | 2014-05-14 | 洛克威尔自动控制技术股份有限公司 | Control environment change communication |
CN104798071A (en) * | 2012-11-14 | 2015-07-22 | 思科技术公司 | Improving web sites performance using edge servers in fog computing architecture |
US20140136952A1 (en) * | 2012-11-14 | 2014-05-15 | Cisco Technology, Inc. | Improving web sites performance using edge servers in fog computing architecture |
US9785621B2 (en) | 2012-11-26 | 2017-10-10 | Akamai Technologies, Inc. | Progressive consolidation of web page resources |
US20140281923A1 (en) * | 2013-03-13 | 2014-09-18 | Usablenet Inc. | Methods for processing cascading style sheets and devices thereof |
US10282401B2 (en) * | 2013-03-13 | 2019-05-07 | Usablenet Inc. | Methods for processing cascading style sheets and devices thereof |
US10467331B2 (en) * | 2013-05-16 | 2019-11-05 | Toshiba Global Commerce Solutions Holdings Corporation | Systems and methods for processing modifiable files grouped into themed directories for presentation of web content |
WO2015048207A1 (en) * | 2013-09-25 | 2015-04-02 | Akamai Technologies, Inc. | Key resource prefetching using front-end optimization (feo) configuration |
US9819721B2 (en) | 2013-10-31 | 2017-11-14 | Akamai Technologies, Inc. | Dynamically populated manifests and manifest-based prefetching |
US9495459B2 (en) | 2014-03-27 | 2016-11-15 | International Business Machines Corporation | Optimizing web crawling through web page pruning |
US9390177B2 (en) | 2014-03-27 | 2016-07-12 | International Business Machines Corporation | Optimizing web crawling through web page pruning |
US9866655B2 (en) | 2014-03-31 | 2018-01-09 | Akamai Technologies, Inc. | Server initiated multipath content delivery |
US9576070B2 (en) | 2014-04-23 | 2017-02-21 | Akamai Technologies, Inc. | Creation and delivery of pre-rendered web pages for accelerated browsing |
US20150339275A1 (en) * | 2014-05-20 | 2015-11-26 | Yahoo! Inc. | Rendering of on-line content |
US9716702B2 (en) | 2014-05-29 | 2017-07-25 | Shape Security, Inc. | Management of dynamic credentials |
US11552936B2 (en) | 2014-05-29 | 2023-01-10 | Shape Security, Inc. | Management of dynamic credentials |
US9621583B2 (en) | 2014-05-29 | 2017-04-11 | Shape Security, Inc. | Selectively protecting valid links to pages of a web site |
US10033755B2 (en) | 2014-09-30 | 2018-07-24 | Shape Security, Inc. | Securing web page content |
US9800602B2 (en) | 2014-09-30 | 2017-10-24 | Shape Security, Inc. | Automated hardening of web page content |
WO2016053759A1 (en) * | 2014-09-30 | 2016-04-07 | Shape Security, Inc. | Automated hardening of web page content |
US9747385B2 (en) * | 2014-10-15 | 2017-08-29 | Alibaba Group Holding Limited | Compression of cascading style sheet files |
US20160110324A1 (en) * | 2014-10-15 | 2016-04-21 | Alibaba Group Holding Limited | Compression of cascading style sheet files |
US10417325B2 (en) | 2014-10-16 | 2019-09-17 | Alibaba Group Holding Limited | Reorganizing and presenting data fields with erroneous inputs |
US20160112511A1 (en) * | 2014-10-20 | 2016-04-21 | Microsoft Corporation | Pre-fetch cache for visualization modification |
US10038749B2 (en) * | 2014-10-20 | 2018-07-31 | Microsoft Technology Licensing, Llc | Pre-fetch cache for visualization modification |
US10482578B2 (en) | 2014-11-06 | 2019-11-19 | Alibaba Group Holding Limited | Method and system for controlling display direction of content |
USRE50024E1 (en) | 2014-11-24 | 2024-06-25 | Shape Security, Inc. | Call stack integrity check on client/server systems |
US9529994B2 (en) | 2014-11-24 | 2016-12-27 | Shape Security, Inc. | Call stack integrity check on client/server systems |
US10649740B2 (en) * | 2015-01-15 | 2020-05-12 | International Business Machines Corporation | Predicting and using utility of script execution in functional web crawling and other crawling |
US10740071B2 (en) * | 2015-01-15 | 2020-08-11 | International Business Machines Corporation | Predicting and using utility of script execution in functional web crawling and other crawling |
US9608975B2 (en) | 2015-03-30 | 2017-03-28 | Shape Security, Inc. | Challenge-dynamic credential pairs for client/server request validation |
US9986058B2 (en) | 2015-05-21 | 2018-05-29 | Shape Security, Inc. | Security systems for mitigating attacks from a headless browser executing on a client computer |
US10567419B2 (en) | 2015-07-06 | 2020-02-18 | Shape Security, Inc. | Asymmetrical challenges for web security |
US10129289B1 (en) | 2016-03-11 | 2018-11-13 | Shape Security, Inc. | Mitigating attacks on server computers by enforcing platform policies on client computers |
US10216488B1 (en) | 2016-03-14 | 2019-02-26 | Shape Security, Inc. | Intercepting and injecting calls into operations and objects |
US10218566B2 (en) | 2016-03-30 | 2019-02-26 | International Business Machines Corporation | Proactive input method engine management for edge services based on crowdsourcing data |
US10009222B2 (en) | 2016-03-30 | 2018-06-26 | International Business Machines Corporation | Input method engine management for edge services |
US10657315B2 (en) * | 2016-06-28 | 2020-05-19 | Sap Se | Generic and automated CSS scoping |
US11088940B2 (en) | 2017-03-07 | 2021-08-10 | Akamai Technologies, Inc. | Cooperative multipath |
CN110309044A (en) * | 2018-03-20 | 2019-10-08 | 福建天泉教育科技有限公司 | Pattern changed test method and terminal in a kind of Web system |
US10846356B2 (en) * | 2018-06-13 | 2020-11-24 | At&T Intellectual Property I, L.P. | Scalable whittled proxy execution for low-latency web over cellular networks |
CN109284428A (en) * | 2018-08-13 | 2019-01-29 | 腾讯科技(深圳)有限公司 | Data processing method, device and storage medium |
US11567822B2 (en) * | 2018-08-30 | 2023-01-31 | Boe Technology Group Co., Ltd. | Method of monitoring closed system, apparatus thereof and monitoring device |
CN112769730A (en) * | 2019-10-21 | 2021-05-07 | 北京车和家信息技术有限公司 | Page compression method and device, client and server |
US11126787B2 (en) * | 2020-02-11 | 2021-09-21 | Madcap Software, Inc. | Generating responsive content from an electronic document |
Also Published As
Publication number | Publication date |
---|---|
EP2400407A1 (en) | 2011-12-28 |
US20160283452A1 (en) | 2016-09-29 |
US10108595B2 (en) | 2018-10-23 |
US9361345B2 (en) | 2016-06-07 |
CA2742059A1 (en) | 2011-12-22 |
US8788577B2 (en) | 2014-07-22 |
US20150142838A1 (en) | 2015-05-21 |
CA2742059C (en) | 2019-10-29 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US10108595B2 (en) | | Method and system for automated analysis and transformation of web pages |
US9641591B1 (en) | | Modifying web content at a client |
US9509764B1 (en) | | Updating cached web content |
US20180124203A1 (en) | | Extensible, asynchronous, centralized analysis and optimization of server responses to client requests |
CA2640025C (en) | | Methods and devices for post processing rendered web pages and handling requests of post processed web pages |
US8185621B2 (en) | | Systems and methods for monitoring webpages |
US9077681B2 (en) | | Page loading optimization using page-maintained cache |
US7752258B2 (en) | | Dynamic content assembly on edge-of-network servers in a content delivery network |
US10296567B2 (en) | | Progressive consolidation of web page resources |
US8990289B2 (en) | | Server based framework for improving Ajax performance |
US10015226B2 (en) | | Methods for making AJAX web applications bookmarkable and crawlable and devices thereof |
US10291738B1 (en) | | Speculative prefetch of resources across page loads |
US10009439B1 (en) | | Cache preloading |
US9401949B1 (en) | | Client web content cache purge |
CN110647699A (en) | | Web page rendering method and device, computer equipment and storage medium |
US10178147B1 (en) | | Client-side location address translation |
Mardani et al. | | Fawkes: Faster Mobile Page Loads via App-Inspired Static Templating |
US10187319B1 (en) | | Automatic configuration generation for a proxy optimization server for optimizing the delivery of content of a web publisher |
Goldshtein et al. | | Web Application Performance |
Török et al. | | Optimering av Webbprestanda |
Picchi | | Optimizing iOS WebApps |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
| | AS | Assignment | Owner name: BLAZE SOFTWARE INC, CANADA. Free format text: ASSIGNMENT OF ASSIGNORS INTEREST; ASSIGNOR: PODJARNY, GUY; REEL/FRAME: 026361/0987. Effective date: 20110509 |
| | AS | Assignment | Owner name: AKAMAI TECHNOLOGIES CANADA INC., CANADA. Free format text: CHANGE OF NAME; ASSIGNOR: BLAZE SOFTWARE INC.; REEL/FRAME: 028082/0933. Effective date: 20120305. Owner name: 0931812 BC LTD., CANADA. Free format text: MERGER; ASSIGNOR: BLAZE SOFTWARE INC.; REEL/FRAME: 028082/0894. Effective date: 20120214. Owner name: BLAZE SOFTWARE INC., CANADA. Free format text: CHANGE OF NAME; ASSIGNOR: 0931812 BC LTD.; REEL/FRAME: 028082/0875. Effective date: 20120214 |
| | AS | Assignment | Owner name: AKAMAI TECHNOLOGIES CANADA INC., CANADA. Free format text: ASSIGNMENT OF ASSIGNORS INTEREST; ASSIGNOR: PODJARNY, GUY; REEL/FRAME: 028099/0412. Effective date: 20120424 |
| | AS | Assignment | Owner name: AKAMAI TECHNOLOGIES, INC., MASSACHUSETTS. Free format text: ASSIGNMENT OF ASSIGNORS INTEREST; ASSIGNOR: AKAMAI TECHNOLOGIES CANADA INC.; REEL/FRAME: 033038/0118. Effective date: 20120424 |
| | STCF | Information on status: patent grant | Free format text: PATENTED CASE |
| | MAFP | Maintenance fee payment | Free format text: PAYMENT OF MAINTENANCE FEE, 4TH YEAR, LARGE ENTITY (ORIGINAL EVENT CODE: M1551). Year of fee payment: 4 |
| | MAFP | Maintenance fee payment | Free format text: PAYMENT OF MAINTENANCE FEE, 8TH YEAR, LARGE ENTITY (ORIGINAL EVENT CODE: M1552); ENTITY STATUS OF PATENT OWNER: LARGE ENTITY. Year of fee payment: 8 |