- application layer protocol to transmit files identified by a URL
- designed to transmit hypertext documents on the Web, but can transmit any file, e.g. can replace file transfer over SSH (SCP, SFTP), FTP, etc.
- communication consists of independent request-response pairs between client, e.g. Web browser, and server
- communication is always initiated by client, server cannot send anything to client on its own later, i.e. pull-based communication
- stateless, i.e. server can't relate a request to previous requests, e.g. can't know if user is already logged in, where customer is in order process, etc.
→ need to implement state keeping yourself, e.g. send unique identifier (cookie) in each request to server
- not encrypted, can be intercepted and manipulated, always use HTTPS, see TLS ☝️
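Since HTTP is stateless, the server keeps the state itself and only hands out a random identifier in a cookie, which the client sends back with every request. A minimal sketch in Python, assuming a hypothetical in-memory store `SESSIONS`, a hypothetical cookie name `session` and a hypothetical handler `handle_request` that only counts visits; a real server would also expire and persist sessions.

```python
import secrets
from http.cookies import SimpleCookie
from typing import Optional, Tuple

# hypothetical in-memory session store: session id -> per-client state
SESSIONS: dict = {}


def handle_request(cookie_header: Optional[str]) -> Tuple[int, Optional[str]]:
    """Return (visit count, Set-Cookie value or None) for one request."""
    cookies = SimpleCookie()
    cookies.load(cookie_header or "")  # parse the "Cookie: name=value; ..." request header
    session_id = cookies["session"].value if "session" in cookies else None

    set_cookie = None
    if session_id not in SESSIONS:
        # missing or unknown id: start a new session, client must send the id back from now on
        session_id = secrets.token_urlsafe(16)
        SESSIONS[session_id] = {"visits": 0}
        # Secure: only sent over HTTPS, HttpOnly: not readable from JavaScript
        set_cookie = f"session={session_id}; HttpOnly; Secure; SameSite=Lax"

    state = SESSIONS[session_id]
    state["visits"] += 1
    return state["visits"], set_cookie
```

On the first call `handle_request(None)` returns a visit count of 1 and a `Set-Cookie` value; passing its `session=...` part back on the next call returns 2 and no new cookie, i.e. the server recognizes the client only through the identifier.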
HTTP session
- client sends request to server for a given resource at an URL
- server processes request and sends response back to client
- request is the payload of transport layer packet(s)
- request contains only the path of the resource, doesn't need the full URL since the server reading the request identifies a resource only by its path ☝️
- transport layer packet is sent to port given in URL, by default port 80 for HTTP and port 443 for HTTPS
- internet layer packet is sent to IP address given in URL, or to the IP address the domain name given in URL resolves to via DNS
- HTTP/1.0 by default uses one transport layer connection per request, i.e. on TCP needs to make new TCP 3-way handshake for every request
- can reuse existing transport layer connection for multiple HTTP requests by sending Connection: keep-alive header in request, i.e. on TCP no need to make new TCP 3-way handshake for every request
- HTTP/1.1 reuses the connection by default (persistent connection), unless Connection: close header is sent
- requests on a single transport layer connection are answered one after the other, a slow response blocks all later ones (head-of-line blocking), hence clients often open multiple connections concurrently using different high-numbered source ports, bad due to increased network load and because it circumvents the per-connection congestion control, see HTTP/2
- beware: keep-alive does not change that HTTP itself is stateless, still need to keep track of state, e.g. using cookies ☝️
- beware: there exists only one type of request and one type of response, only their contents change, there are no other “packets”, e.g. from a response alone one cannot necessarily tell which request method was used, see below
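Connection reuse can be observed with Python's standard library. A minimal sketch, assuming a throwaway local server whose hypothetical handler `PortEchoHandler` answers with the client's source port; since both sides speak HTTP/1.1, the connection stays open and all requests should arrive from the same source port, i.e. over one TCP connection with a single 3-way handshake.

```python
import http.client
import http.server
import threading
from typing import List


class PortEchoHandler(http.server.BaseHTTPRequestHandler):
    protocol_version = "HTTP/1.1"  # keep connections open unless the client says otherwise

    def do_GET(self):
        # answer with the client's source port, which identifies the TCP connection
        body = str(self.client_address[1]).encode("ascii")
        self.send_response(200)
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(body)))  # client needs it to find end of body
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # silence the default request logging


def source_ports(n_requests: int) -> List[int]:
    """Send n GET requests over one HTTPConnection, collect the source port the server saw."""
    # port 0: let the OS pick a free port
    server = http.server.ThreadingHTTPServer(("127.0.0.1", 0), PortEchoHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    try:
        conn = http.client.HTTPConnection("127.0.0.1", server.server_address[1], timeout=5)
        ports = []
        for _ in range(n_requests):
            conn.request("GET", "/")
            response = conn.getresponse()
            # the body must be read completely before the connection can be reused
            ports.append(int(response.read().decode("ascii")))
        conn.close()
        return ports
    finally:
        server.shutdown()
        server.server_close()
```

Setting `protocol_version = "HTTP/1.0"` in the handler should make the server close the connection after each response, `http.client` then reconnects for every request and the ports differ.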
HTTP request
- request for resource at URL
- by convention a server maps a directory path to an HTML file inside it, e.g. http://example.com/dir/ is interpreted as if it were http://example.com/dir/index.html, or any other configured file name
- Web browser sends many requests when loading a single website, often “automatically” in background
- clicking link or entering URL in URL bar sends GET request for resource at URL
- submitting form sends POST (or GET) request
- for each resource referenced in HTML it sends a GET request, e.g. images, iframes, external scripts, external styles, etc.
- for each resource referenced in CSS it sends a GET request, e.g. images, fonts, etc.
- for each resource imported in JavaScript it sends a GET request
- JavaScript can trigger arbitrary requests using Fetch API
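The directory convention can be sketched as a function mapping a request target to a file path relative to the server's web root. `resolve_target` and the default `index.html` are assumptions for illustration; real servers make the index file name configurable and need further checks, e.g. for symlinks.

```python
import posixpath
from urllib.parse import unquote, urlsplit


def resolve_target(target: str, index_file: str = "index.html") -> str:
    """Map a request target like "/dir/?x=1" to a path relative to the web root."""
    path = unquote(urlsplit(target).path) or "/"  # drop query, decode %20 etc.
    is_directory = path.endswith("/")
    # normpath collapses "." and "..", so "/../x" cannot escape the web root
    relative = posixpath.normpath(path).lstrip("/")
    if is_directory or not relative:
        # directory path: serve the index file inside it
        relative = posixpath.join(relative, index_file)
    return relative
```

For example `resolve_target("/dir/")` yields `dir/index.html`, while `resolve_target("/img/logo.png")` is passed through as `img/logo.png`.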
- method: action on resource, e.g. GET
- target: path of resource, can also transmit custom data as query after path, e.g. for dynamic website
beware: often full URL is used even though not necessary, see HTTP session ☝️
- headers: metadata about the request, e.g. browser settings, desired content type, desired compression, etc.
- body: data
- for example, a GET request to the URL http://example.com/about/ could look like
GET /about/ HTTP/1.1
Host: example.com
User-Agent: Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)
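The split between where the packets go (host and port from the URL) and what goes into the request (only path and query, plus the Host header) can be sketched in Python. `build_request` is a hypothetical helper that only serializes the message; it assumes an already percent-encoded URL and neither opens a connection nor handles TLS.

```python
from typing import Dict, Optional, Tuple
from urllib.parse import urlsplit

DEFAULT_PORTS = {"http": 80, "https": 443}


def build_request(method: str, url: str, headers: Optional[Dict[str, str]] = None,
                  body: bytes = b"") -> Tuple[str, int, bytes]:
    """Return (host to connect to, port, raw request bytes) for an HTTP/1.1 request."""
    parts = urlsplit(url)
    default_port = DEFAULT_PORTS[parts.scheme]
    port = parts.port or default_port  # port from URL, else 80 / 443
    target = parts.path or "/"  # only path and query go into the request line
    if parts.query:
        target += "?" + parts.query
    host_header = parts.hostname if port == default_port else f"{parts.hostname}:{port}"
    lines = [f"{method} {target} HTTP/1.1", f"Host: {host_header}"]
    lines += [f"{name}: {value}" for name, value in (headers or {}).items()]
    if body:
        lines.append(f"Content-Length: {len(body)}")
    head = "\r\n".join(lines) + "\r\n\r\n"  # empty line ends the header section
    return parts.hostname, port, head.encode("ascii") + body
```

For plain `http` URLs the returned bytes could be sent with `socket.create_connection((host, port)).sendall(raw)`.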
Methods
- action on resource on the server
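- names are case-sensitive, by convention uppercase
- safe: has no significant side effects on server, except logging, caching, etc.
- idempotent: repeating an identical request has the same effect on the server as sending it once, the responses may still differ, e.g. a repeated DELETE may return 404 Not Found
- cacheable: response may be stored and reused for later identical requests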
HTTP method | description | request body | response body | safe | idempotent | cacheable
--- | --- | --- | --- | --- | --- | ---
GET | get the resource | optional | ✅ | ✅ | ✅ | ✅
HEAD | get metadata about the resource | optional | ❌ | ✅ | ✅ | ✅
POST | create a subordinate resource | ✅ | ✅ | ❌ | ❌ | ✅
PUT | replace / create the resource | ✅ | ✅ | ❌ | ✅ | ❌
PATCH | modify the resource | ✅ | ✅ | ❌ | ❌ | ❌
DELETE | delete the resource | optional | ✅ | ❌ | ✅ | ❌
OPTIONS | list all methods supported on resource | optional | ✅ | ✅ | ✅ | ❌
TRACE | echo back the request | ❌ | ✅ | ✅ | ✅ | ❌
- GET is used to fetch a resource from the server without modifying it
- HEAD is like GET, but the response contains only the headers without the body, used to fetch metadata about a resource from the server without transmitting its content, e.g. Content-Length, Content-Type, etc.
- TRACE is used to see if intermediary servers modified the request
- only GET and HEAD must be implemented by general-purpose servers, in practice most often only GET and POST need to be supported, maybe HEAD
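The safe and idempotent columns matter to clients: if the connection drops before a response arrives, the client can't tell whether the server already processed the request. A sketch of a retry and prefetch policy derived from the methods table; `METHODS`, `may_retry_automatically` and `may_prefetch` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MethodProperties:
    safe: bool
    idempotent: bool


# safe / idempotent columns of the methods table
METHODS = {
    "GET": MethodProperties(safe=True, idempotent=True),
    "HEAD": MethodProperties(safe=True, idempotent=True),
    "OPTIONS": MethodProperties(safe=True, idempotent=True),
    "TRACE": MethodProperties(safe=True, idempotent=True),
    "PUT": MethodProperties(safe=False, idempotent=True),
    "DELETE": MethodProperties(safe=False, idempotent=True),
    "POST": MethodProperties(safe=False, idempotent=False),
    "PATCH": MethodProperties(safe=False, idempotent=False),
}


def may_retry_automatically(method: str) -> bool:
    """Resending is harmless only if repeating has the same effect as sending once."""
    props = METHODS.get(method)  # method names are case-sensitive, "get" is not GET
    return props is not None and props.idempotent


def may_prefetch(method: str) -> bool:
    """Only requests without significant side effects may be sent speculatively."""
    props = METHODS.get(method)
    return props is not None and props.safe
```

This is why a browser asks before resubmitting a POST form, but silently retries a failed GET.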
Headers
- field name and value are separated by colon
- field names are case-insensitive
request header | description | notes
--- | --- | ---
Host | domain name and port of server | mandatory with HTTP/1.1
- Host is used to support multiple websites behind one IP address (virtual hosting), e.g. one per subdomain, it's a separate header because the domain name in the target was optional, and making it mandatory there couldn't easily be enforced in a backwards compatible way
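Virtual hosting boils down to a lookup by the Host header. A minimal sketch, assuming a hypothetical table `SITES` of hosted websites and a hypothetical function `pick_site` that receives the already parsed request headers.

```python
from typing import Dict, Optional
from urllib.parse import urlsplit

# hypothetical virtual hosts served from one IP address
SITES = {"example.com": "main site", "blog.example.com": "blog"}


def pick_site(headers: Dict[str, str]) -> Optional[str]:
    # header field names are case-insensitive, so normalize them first
    normalized = {name.lower(): value for name, value in headers.items()}
    host = normalized.get("host")
    if host is None:
        return None  # an HTTP/1.1 server must answer 400 Bad Request here
    # urlsplit strips an optional ":port" and lowercases the domain name
    hostname = urlsplit("//" + host).hostname
    return SITES.get(hostname)
```

Requests to the same IP address with `Host: blog.example.com` and `Host: example.com` thus end up at different websites.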
Content negotiation
- accepted representations of resource, i.e. client lists which representations it accepts, server picks the best one it has
request header | description | notes
--- | --- | ---
Accept | MIME types accepted as response |
Accept-Charset | character sets accepted as response |
Accept-Encoding | encodings accepted as response |
Accept-Language | languages accepted as response |
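Server-side negotiation for the Accept header can be sketched as follows. This is a simplified sketch: it ignores media type parameters other than the weight `q`, treats a missing or empty header as "anything goes", and applies the weight of the most specific matching range, so `text/html;q=0` overrides `text/*`. All function names are hypothetical.

```python
from typing import List, Optional, Tuple


def parse_accept(header: str) -> List[Tuple[str, float]]:
    """Split an Accept header into (media range, q) pairs."""
    ranges = []
    for item in header.split(","):
        media_range, *params = [part.strip() for part in item.split(";")]
        if not media_range:
            continue
        q = 1.0  # default weight if no q parameter is given
        for param in params:
            key, _, value = param.partition("=")
            if key.strip().lower() == "q":
                q = float(value)
        ranges.append((media_range.lower(), q))
    return ranges


def specificity(media_range: str) -> int:
    if media_range == "*/*":
        return 0
    if media_range.endswith("/*"):
        return 1
    return 2


def matches(media_range: str, media_type: str) -> bool:
    range_type, _, range_subtype = media_range.partition("/")
    main_type, _, subtype = media_type.partition("/")
    if range_type == "*":
        return True
    return range_type == main_type and range_subtype in ("*", subtype)


def negotiate(accept_header: str, available: List[str]) -> Optional[str]:
    """Pick the available type with the highest q, None means 406 Not Acceptable."""
    ranges = parse_accept(accept_header)
    if not ranges:
        return available[0] if available else None  # no preference stated
    best, best_q = None, 0.0
    for media_type in available:
        matching = [(specificity(r), q) for r, q in ranges if matches(r, media_type.lower())]
        # most specific matching range decides the weight, q=0 means "not acceptable"
        q = max(matching)[1] if matching else 0.0
        if q > best_q:
            best, best_q = media_type, q
    return best
```

Accept-Language and Accept-Encoding use the same `q` weights, so the same pattern applies to them.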
HTTP response
- response for resource at URL
- beware: server may not follow HTTP standards, can do whatever it wants, e.g. may not treat POST/GET as un-/safe, may not honor request headers like Accept, may not handle directory paths, may ignore Host header, may send wrong status codes, may serve some random file back, etc. ⚠️
- status code: number representing un-/successful request
- headers: metadata about the response, e.g. timestamp, content type, content character encoding, etc.
- body: file
- for example, a response to a GET request to the URL http://example.com/about/ could look like
HTTP/1.1 200 OK
Date: Fri, 11 Nov 2011 11:11:11 GMT
Server: Apache/2.2.14 (Win32)
Content-Type: text/html; charset=utf-8

<!DOCTYPE html>
<html>
<p>Hello World!</p>
</html>
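A response consists of a status line, header lines, an empty line and the body, and nothing in it refers back to the request method. A minimal parser sketch for such a raw HTTP/1.x response; `parse_response` is a hypothetical helper that overwrites repeated headers and doesn't decode chunked bodies.

```python
from typing import Dict, Tuple


def parse_response(raw: bytes) -> Tuple[str, int, str, Dict[str, str], bytes]:
    """Split a raw HTTP/1.x response into version, status code, reason, headers, body."""
    head, _, body = raw.partition(b"\r\n\r\n")  # empty line ends the header section
    status_line, *header_lines = head.decode("iso-8859-1").split("\r\n")
    version, code, *rest = status_line.split(" ", 2)
    reason = rest[0] if rest else ""  # reason phrase is optional
    headers = {}
    for line in header_lines:
        # split at the first colon only, values like dates contain colons
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()  # field names are case-insensitive
    return version, int(code), reason, headers, body
```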
Status codes
status codes | description | example
--- | --- | ---
1xx | Informational | 100 Continue
2xx | Success | 200 OK
3xx | Redirection | 301 Moved Permanently
4xx | Client Error | 404 Not Found
5xx | Server Error | 500 Internal Server Error
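The first digit alone determines the class, so a client can handle codes it doesn't know like the x00 code of their class. A sketch using Python's `http.HTTPStatus` enum for the reason phrases; `describe_status` is a hypothetical helper.

```python
from http import HTTPStatus

STATUS_CLASSES = {1: "Informational", 2: "Success", 3: "Redirection",
                  4: "Client Error", 5: "Server Error"}


def describe_status(code: int) -> str:
    if not 100 <= code <= 599:
        raise ValueError(f"not an HTTP status code: {code}")
    status_class = STATUS_CLASSES[code // 100]  # first digit decides the class
    try:
        phrase = HTTPStatus(code).phrase
    except ValueError:
        # unregistered code: treat it like the x00 code of its class
        phrase = f"unknown, treat like {code // 100 * 100}"
    return f"{code} {phrase} ({status_class})"
```

For example `describe_status(404)` gives `404 Not Found (Client Error)`.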
Headers
- field name and value are separated by colon
- field names are case-insensitive
response header | description | example
--- | --- | ---
Content-Type | MIME type of response | text/html; charset=utf-8
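The Content-Type value carries both the MIME type and, optionally, the character encoding needed to turn the body bytes into text. A sketch using the standard library's `email.message.Message` to parse the header parameters; `decode_body` and its `utf-8` fallback are assumptions, and an unknown charset name raises `LookupError`.

```python
from email.message import Message
from typing import Tuple


def decode_body(body: bytes, content_type: str,
                default_charset: str = "utf-8") -> Tuple[str, str]:
    """Return (lowercased MIME type, body decoded with the charset from Content-Type)."""
    msg = Message()
    msg["Content-Type"] = content_type
    mime_type = msg.get_content_type()  # e.g. "text/html", without parameters
    charset = msg.get_content_charset(default_charset)  # value of "charset=" or the fallback
    return mime_type, body.decode(charset)
```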
HTTP/2
- successor of HTTP/1.1, new versioning scheme with only a major version, i.e. HTTP/2 instead of HTTP/2.0
- binary protocol instead of plaintext
- multiplexed streams over single TCP connection, streams can be prioritised, streams can be canceled without canceling TCP connection
- server push, server can send additional data proactively to client, e.g. all resources referenced in an HTML document
- header compression (HPACK)
- most implementations, e.g. all major Web browsers, support it only over TLS
- since it uses a single TCP connection, there is again head-of-line blocking, now on the TCP level: a single lost TCP packet stalls all streams until it is retransmitted, so on lossy networks HTTP/1.1 over multiple TCP connections might even be faster
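Being binary, HTTP/2 splits every message into frames, each starting with a fixed 9-byte header that carries the stream identifier, which is what lets frames of different streams interleave on one TCP connection. A sketch of encoding and decoding that header following the frame layout of the HTTP/2 specification (RFC 9113); the function names are hypothetical, and payloads and HPACK are not handled.

```python
from typing import Tuple

# frame types defined by the HTTP/2 specification
FRAME_TYPES = {
    0x0: "DATA", 0x1: "HEADERS", 0x2: "PRIORITY", 0x3: "RST_STREAM",
    0x4: "SETTINGS", 0x5: "PUSH_PROMISE", 0x6: "PING", 0x7: "GOAWAY",
    0x8: "WINDOW_UPDATE", 0x9: "CONTINUATION",
}


def build_frame_header(length: int, frame_type: int, flags: int, stream_id: int) -> bytes:
    # 24-bit payload length, 8-bit type, 8-bit flags, 1 reserved bit + 31-bit stream id
    return (length.to_bytes(3, "big") + bytes([frame_type, flags])
            + (stream_id & 0x7FFF_FFFF).to_bytes(4, "big"))


def parse_frame_header(header: bytes) -> Tuple[int, str, int, int]:
    if len(header) < 9:
        raise ValueError("HTTP/2 frame header is 9 bytes long")
    length = int.from_bytes(header[0:3], "big")
    frame_type, flags = header[3], header[4]
    stream_id = int.from_bytes(header[5:9], "big") & 0x7FFF_FFFF  # ignore reserved bit
    return length, FRAME_TYPES.get(frame_type, f"UNKNOWN({frame_type:#x})"), flags, stream_id
```

A receiver reads 9 bytes, then exactly `length` payload bytes, and hands the frame to the stream with that identifier; stream 0 is reserved for connection-level frames such as SETTINGS.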
HTTP/3
- uses QUIC, which runs on top of UDP, as transport layer
- streams are handled by the underlying QUIC, so a lost packet only stalls the stream it belongs to, prioritization is still done in the application layer
- not over TLS, directly over QUIC, since TLS 1.3 is built into QUIC
- otherwise like HTTP/2
Resources