World Wide Web: HTTP Request - Response Cycle

@mit $ingh. March 25, 2017

Important concepts:

  • HTML [Hypertext Markup Language]: the markup language that web pages are written in.

  • HTTP [Hypertext Transfer Protocol]: a common protocol (akin to an internet language) used to pass around HTML and other forms of data (JSON and form data are also common).

    • It governs the data exchange between a client & a server.
    • It is stateless, i.e. it treats each request as an independent transaction, unrelated to any previous request.
    • Thus, communication consists of independent request/response pairs.
  • HTTP relies on two lower-level layers to achieve this:

    • TCP [Transmission Control Protocol]: transports HTTP. It breaks the data into small 'packets' (segments) before they are sent over a network, & reassembles them in the right order when they arrive.
    • IP [Internet Protocol]: addresses & routes those packets across networks, so that the data TCP sends reaches the right host.
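To make the TCP bullet concrete, here is a toy Python model of splitting data into pieces & putting them back together. It is not real TCP (there are no headers, acknowledgements or retransmissions), and the function names are hypothetical. Each piece is tagged with the byte offset where it starts, loosely mirroring how TCP sequence numbers count bytes, so the receiver can restore the original order.

```python
def split_into_segments(data: bytes, size: int) -> list[tuple[int, bytes]]:
    """Toy model: cut data into pieces, each tagged with the byte offset where it starts."""
    return [(offset, data[offset:offset + size]) for offset in range(0, len(data), size)]


def reassemble(segments: list[tuple[int, bytes]]) -> bytes:
    """Put the pieces back in order by offset, whatever order they arrived in."""
    return b"".join(chunk for _, chunk in sorted(segments))


message = b"GET / HTTP/1.1\r\nHost: www.google.com\r\n\r\n"
pieces = split_into_segments(message, 8)
arrived = list(reversed(pieces))  # simulate the pieces arriving out of order
print(reassemble(arrived) == message)  # True
```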

Step-by-step walkthrough of HTTP request/response handling:

Step 1: Parsing the URL

  • A URL is typed into the address bar of a web browser.

    • The URL contains a human-readable host name (e.g. www.google.com) that stands in for an IP [Internet Protocol] address. That IP address is a numeric address identifying the target server, called the host.
  • The browser checks its cache for the host's IP address. If it is not found, it sends a query to DNS.

    • DNS [Domain Name System]: a distributed directory that translates domain names into their IP addresses.
  • The browser parses the URL to find the protocol, host, port, and request-URI path.

    • "[protocol]://[host]:[port]/[request-URI]"
    • Protocol: e.g. HTTP or HTTPS; tells the browser which protocol to use to talk to the server.
    • Host ['host machine' or 'network host']: tells the browser which server to contact.
    • Port: optional; if omitted, the protocol's default is used (80 for HTTP, 443 for HTTPS).
    • Request-URI [Uniform Resource Identifier]: used by the server to identify the resource location.
  • The browser forms an HTTP request:

    • e.g. the request line "GET /URI HTTP/1.1" followed by the header "Host: www.google.com"
    • GET: a token indicating the method to perform on the resource identified by the Request-URI.
    • Host header: required in HTTP/1.1. Because one server may host multiple sites, this header tells it which site to serve back.

Step 2: Sending the request

  • A TCP socket connection is opened from the user's computer to the server's IP address & port.

  • The HTTP request is sent to the host, & the client waits for a response.

  • The web server receives the request.
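A minimal sketch of Step 2 using Python's `socket` module. The function names are hypothetical, and the sketch assumes the request carries a `Connection: close` header (not shown in the example request above), so the server closes the connection after responding and "read until the socket closes" marks the end of the response. Keeping the connection open instead is what Step 5 is about.

```python
import socket


def open_connection(ip: str, port: int, timeout: float = 5.0) -> socket.socket:
    """Open a TCP connection from this machine to the server's IP address and port."""
    return socket.create_connection((ip, port), timeout=timeout)


def exchange(sock: socket.socket, request: str) -> bytes:
    """Send one HTTP request, then collect the response until the server closes the connection."""
    sock.sendall(request.encode("ascii"))  # this sketch assumes an ASCII-only request
    chunks = []
    while True:
        chunk = sock.recv(4096)
        if not chunk:  # b"" means the server closed its side of the connection
            break
        chunks.append(chunk)
    return b"".join(chunks)
```

Usage would look like `exchange(open_connection(ip, 80), request)`, with `ip` & `request` coming from Step 1.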

Step 3: The server response

  • The server inspects the request for the information it needs to process it: the method (GET, POST, PUT, ...), the request-URI & the headers.

    • For static requests:
      • The server locates the requested file (e.g. an HTML file) & sends it back over the internet.
    • For dynamic requests:
      • Files such as .php, .asp, .aspx or .jsp are processed by a separate engine. They are templates: partially complete pages containing sections that change depending on variable values supplied on the server end. The engine combines the dynamic data with the template into a single string of text (HTML, plain text, XML) before the output is sent back over the internet.
  • If successful, the server returns a 200 OK status code (the resource was found), response headers, & the requested data to the browser. Example response headers:

Cache-Control:must-revalidate, private, max-age=0  
Connection:keep-alive  
Content-Encoding:gzip  
Content-Type:text/html; charset=utf-8  
Date:Sun, 10 May 2015 05:41:39 GMT  
Server:nginx + Phusion Passenger 4.0.58  
Status:200 OK  
Strict-Transport-Security:max-age=16070400; includeSubdomains  
Transfer-Encoding:chunked  

 

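  • If the request cannot be fulfilled, the server returns an error status code instead: a 4xx code for client errors (e.g. 404 Not Found when the resource does not exist), or a 5xx code when the server itself fails (e.g. 500 Internal Server Error).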
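The static/dynamic split & the status codes can be illustrated with a small Python function that decides what a server sends back. Everything here is hypothetical: an in-memory dictionary stands in for files on disk, a `str.format` template stands in for a .php/.jsp page, and a real server would handle many more methods, headers & error cases.

```python
from html import escape
from typing import Optional

# Hypothetical stand-ins: a "document root" of static files and one dynamic template
STATIC_FILES = {"/index.html": "<html><body><h1>Home</h1></body></html>"}
GREETING_TEMPLATE = "<html><body><p>Hello, {name}!</p></body></html>"


def handle_request(method: str, path: str, query: Optional[dict[str, str]] = None):
    """Return (status, headers, body) for a request, as a tiny server might."""
    query = query or {}
    if method != "GET":
        status, page = "405 Method Not Allowed", ""
    elif path in STATIC_FILES:
        # Static request: send the stored file back unchanged
        status, page = "200 OK", STATIC_FILES[path]
    elif path == "/greet":
        # Dynamic request: fill the template's changeable section with a value from the request
        name = escape(query.get("name", "stranger"))  # escape so the value cannot inject markup
        status, page = "200 OK", GREETING_TEMPLATE.format(name=name)
    else:
        status, page = "404 Not Found", "<h1>Not Found</h1>"

    body = page.encode("utf-8")
    headers = {
        "Content-Type": "text/html; charset=utf-8",
        "Content-Length": str(len(body)),
    }
    if status.startswith("405"):
        headers["Allow"] = "GET"  # a 405 response lists the methods that are allowed
    return status, headers, body
```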

Step 4: Browser rendering

  • The browser receives the response containing an HTML document & parses it into a DOM [Document Object Model]: HTML elements & attributes are translated into nodes, with the "Document" object as the root of the tree.

  • When references to external scripts, images & style sheets are parsed, the browser makes new requests for them.

  • When style sheets are parsed, each applicable style is attached to the matching node in the DOM tree.

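  • JavaScript files are parsed & executed. By default, a script halts HTML parsing until it has been fetched & run.

    • HTML5 adds the async attribute: an async script is downloaded in parallel while the HTML parser keeps going, & it runs as soon as it is available.
    • This way page parsing & rendering are not halted while the script downloads.
    • Scripts can then update DOM nodes.
    • Note: Firefox blocks script execution while a style sheet is still being loaded & parsed.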
  • While the DOM tree is being constructed, the browser builds a "render tree" of "frames" or "render objects": the visual representation of the elements, in the order they will be displayed.

  • The browser lays out & paints the page, which then becomes viewable & interactive.
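A toy version of the DOM-construction step can be built on Python's `html.parser.HTMLParser`. It only builds a bare node tree rooted at a "#document" node & records which scripts, images & style sheets would trigger new requests. Real browsers follow the much more involved HTML5 parsing algorithm, apply CSS, run scripts and build the render tree too. The class names are hypothetical.

```python
from html.parser import HTMLParser
from typing import Optional

# Elements that never have a closing tag in HTML
VOID_ELEMENTS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
                 "link", "meta", "source", "track", "wbr"}


class Node:
    """One node of a toy DOM tree."""

    def __init__(self, name: str, attrs=None, parent: Optional["Node"] = None):
        self.name = name
        self.attrs = dict(attrs or [])  # HTMLParser supplies (name, value) pairs
        self.parent = parent
        self.children: list["Node"] = []


class TreeBuilder(HTMLParser):
    """Build a toy DOM tree and collect external resources that need new requests."""

    def __init__(self):
        super().__init__()
        self.document = Node("#document")  # root of the tree
        self.current = self.document       # element that new nodes are appended to
        self.resources: list[str] = []     # URLs the browser would request next

    def handle_starttag(self, tag, attrs):
        node = Node(tag, attrs, self.current)
        self.current.children.append(node)
        self._note_resource(node)
        if tag not in VOID_ELEMENTS:
            self.current = node  # later content goes inside this element

    def handle_endtag(self, tag):
        # Walk up to the matching open element; stray end tags are ignored
        node = self.current
        while node is not self.document and node.name != tag:
            node = node.parent
        if node is not self.document:
            self.current = node.parent

    def handle_data(self, data):
        text = data.strip()
        if text:  # skip whitespace-only text between tags
            self.current.children.append(Node("#text", {"value": text}, self.current))

    def _note_resource(self, node):
        attrs = node.attrs
        if node.name in ("script", "img") and attrs.get("src"):
            self.resources.append(attrs["src"])
        elif node.name == "link" and attrs.get("rel") == "stylesheet" and attrs.get("href"):
            self.resources.append(attrs["href"])
```

Feeding it a page with a style sheet link, an async script & an image leaves those three URLs in `resources`, in document order.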

Step 5: HTTP persistent connection

  • 'Connection: Keep-Alive' header.

    • This reuses a single TCP [Transmission Control Protocol] connection for sending & receiving multiple HTTP requests/responses, instead of opening a new connection for every request/response pair.

    • It is sent in the browser's initial request headers & tells the server not to drop the connection. When the client sends another request, it uses the same connection. This continues until either the client or the server closes the connection. In HTTP/1.1, connections are persistent by default unless either side sends 'Connection: close'.
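To see a persistent connection in action, the sketch below starts a throwaway local server with Python's `http.server` & sends two requests through a single `http.client.HTTPConnection`. The handler and function names are hypothetical. The handler reports the client's address & port; since each new TCP connection gets a fresh source port, identical answers indicate both requests travelled over the same connection. The server speaks HTTP/1.1 & sends `Content-Length`, which is what lets the client know where each response ends without closing the connection.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer


class PeerEchoHandler(BaseHTTPRequestHandler):
    protocol_version = "HTTP/1.1"  # lets the server keep connections open between requests
    timeout = 5                    # don't wait forever for a next request that never comes

    def do_GET(self):
        # Report which client address/port this request arrived from
        body = f"{self.client_address[0]}:{self.client_address[1]}".encode()
        self.send_response(200)
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(body)))  # tells the client where the body ends
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the demo quiet


def fetch_twice_on_one_connection() -> list[str]:
    server = HTTPServer(("127.0.0.1", 0), PeerEchoHandler)  # port 0: let the OS pick a free port
    threading.Thread(target=server.serve_forever, daemon=True).start()
    conn = http.client.HTTPConnection("127.0.0.1", server.server_address[1], timeout=5)
    try:
        peers = []
        for path in ("/first", "/second"):
            conn.request("GET", path)  # no "Connection: close", so the socket stays open
            peers.append(conn.getresponse().read().decode())  # read fully before reusing it
        return peers
    finally:
        conn.close()  # closing our side lets the server's handler finish
        server.shutdown()
        server.server_close()
```

Calling `fetch_twice_on_one_connection()` should return the same "127.0.0.1:port" string twice; with a new connection per request, the ports would differ.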

My GitHub repo: Basic static file rendering server in Node.js

In it I created two very basic servers that return static files & API responses: one in bare Node.js, the other in Express & Node. Feel free to take a look if it helps you understand Step 3 better!

Optimizing this process:

Here is an in-depth resource on the ways caching can improve user experience by returning resources faster, & save bandwidth on host servers:

https://www.mnot.net/cache_docs/
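As a small taste of the caching topic, here is a deliberately simplified freshness check driven by the `Cache-Control` header, like the one in the sample response in Step 3. It only looks at `max-age`, `no-store` & `no-cache`; the full rules (Expires, Age, heuristic freshness, shared-cache directives, revalidation) are covered in the resource above and in the HTTP caching specification. The function names are hypothetical.

```python
def parse_cache_control(value: str) -> dict:
    """Turn e.g. 'must-revalidate, private, max-age=0' into a directive -> argument dict."""
    directives = {}
    for part in value.split(","):
        name, _, arg = part.strip().partition("=")
        if name:
            directives[name.strip().lower()] = arg.strip().strip('"') or None
    return directives


def can_reuse_without_asking(cache_control: str, age_seconds: int) -> bool:
    """Simplified freshness check: reuse a stored copy only while it is younger than max-age."""
    directives = parse_cache_control(cache_control)
    if "no-store" in directives or "no-cache" in directives:
        return False  # must not store / must revalidate before every reuse
    max_age = directives.get("max-age")
    if max_age is None or not max_age.isdigit():
        return False  # this sketch ignores Expires and heuristic freshness
    return age_seconds < int(max_age)
```

With the sample header `must-revalidate, private, max-age=0`, `can_reuse_without_asking` returns `False` for any non-negative age, which is why the browser has to check back with the server every time.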
