TCP/IP Tutorial for Beginners

By Xah Lee. Date: 2013-01-27. Last updated: 2025-02-13.

This is a basic tutorial on TCP/IP, for beginner programmers or scientists. In an hour, you should have a basic understanding.

TCP/IP is a set of protocols, and is the primary technology of the internet. When you browse the web, send email, chat online, or play online games, TCP/IP is working busily underneath.

What is a protocol?

A protocol is a set of rules and procedures, such as what format to use, when data should be sent, what the numbers in the data mean, what commands to use, what error codes there are and their meaning, etc.

When two computers exchange data, they can understand each other if both use the same protocol.

Overview of How Internet Works

Suppose you are viewing a web page, or chatting with a friend online, or downloading a file. What happens underneath?

The app (email, chat, etc) breaks the data into thousands of tiny independent pieces. Each piece is called a packet (or datagram). Each packet has the destination IP address embedded in it. Your computer sends each packet to your router, and the router sends it to another router that's closer to the destination.

This process continues until the designated machine with the IP address receives it. This is done for each and every packet.

The receiving machine re-assembles all these packets into the original whole piece in the right order, and sends it to the right application on that machine. (The app is often an email server, web server, or chat server. These servers, in turn, repeat the same process to send it to your friend's machine.)

Computer software follows a set of standardized rules of procedure when talking to each other. The standardized rules of procedure used for the internet are called the Internet Protocol Suite (aka TCP/IP).

Networking Hardware

Before we talk about internet protocols, let's take a look at the hardware needed, because hardware gives us a good overview of how things are connected.

The essential hardware for the internet to work, for our purposes, is:

Network Adapter (aka Network Interface Controller, NIC)
Router

Network Interface Controller

Network Interface Controller (NIC) (aka network adapter, network card, network interface card, LAN adapter) is a piece of hardware that lets your computer talk to the internet. Every internet-capable device has at least one. Today's computers usually have two, one for Ethernet (wired) and one for wireless.

Network Interface Controller provides one of:

wired ethernet port
wireless internet transceiver

network adapter card 2017 03 13 — Network adapter card. It provides an Ethernet port. This is used if your computer does not have wired Ethernet built in. This was popular in the early 2000s. Since 2010, it's usually just a chip on the motherboard.

ethernet cable 20210307 — Ethernet cable, for wired internet.

usb network adapter 68283 — USB wireless network adapter. This is used if your computer does not have a wireless network adapter built in. This was popular in the late 2000s, when wireless became popular but most laptop or desktop computers didn't have it built in yet.

As of year 2020, every phone, laptop, and desktop computer has a wireless network adapter built in.

How to list all Network Interface?

Linux: Type ip link or ifconfig -a
Windows: Type ipconfig /all

Router

The second most important hardware is the router. A router transfers packets between internet devices.

home wifi router ports 2017 03 13 — Home wireless router ports. A home wireless router also serves as a wired router.

The Network Adapter (wired or wireless) in your computer sends signals to the router, then the router either sends them to another computer in your home, or sends them to the internet via a physical cable or phone line connected to it (typically through a device called a cable modem).

Typically, each internet device starts with its software sending info to the Network Adapter, then the Network Adapter sends it to a router, then the router sends it to another router, and so on, until a router sends it to the destination computer's Network Adapter. (The destination is usually a corporation's machine, which we call a “server”, e.g. when you go to google.com or any other website. The server either stores your info (e.g. photos), or sends it to another user's machine (e.g. online chat messages).)

Internet Addresses

In order to send info, devices must have addresses for destination. Two of them are most important:

MAC Address
IP Address

MAC Address (aka Hardware Address, Physical Address)

Network: MAC Address

Binary Number, Hexadecimal Number

When working with networking protocols, you need to understand Binary Number and Hexadecimal Number in detail, and you need to be able to convert them.

bit means binary digit. A binary digit is either 1 or 0. e.g. 4 bits look like this: 1001, or 0011, or 1100, etc.

octet means 8 bits.

byte means 8 bits. (This wasn't always so before the 1990s.)

1 hexadecimal digit is equivalent to 4 bits.
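
Here's a small sketch of converting between decimal, binary, and hexadecimal, using Python's built-in format and int. The helper names (to_binary, to_hex, octets_to_bits) are made up for this illustration.

```python
def to_binary(n, width=8):
    """Return n as a binary string, zero-padded to `width` digits."""
    return format(n, "0{}b".format(width))


def to_hex(n, width=2):
    """Return n as a hexadecimal string, zero-padded to `width` digits."""
    return format(n, "0{}x".format(width))


def octets_to_bits(dotted):
    """Show each octet of a dotted-decimal address as 8 bits."""
    return ".".join(to_binary(int(part)) for part in dotted.split("."))


print(to_binary(9, 4))                # 1001
print(to_hex(255))                    # ff
print(int("1100", 2))                 # 12
print(int("ff", 16))                  # 255
print(octets_to_bits("192.168.0.1"))  # 11000000.10101000.00000000.00000001
```

One octet (8 bits) is always exactly 2 hexadecimal digits, which is why addresses are often written in hexadecimal.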

IP Address

Network IP Address

Host, Hostname

Network: Host, Hostname

Network vs Host

Hosts are grouped into the concept of “network”. e.g. all computers in a company can be one network. All computers in a home can be one network.

Each host is a part of a network.

TCP/IP Protocol Layers

IP stack connections — TCP/IP data flow. The solid lines are the actual data connections. The dotted lines are abstract connections. 〔image source 2013-01-27 http://en.wikipedia.org/wiki/File:IP_stack_connections.svg 〕

“Host” means a computer.
The “process” means a running software program, such as web browser.
The “link” means the link layer, which connects a host to a router, or one router to the next.
{Ethernet, fiber, satellite} are physical links (e.g. cable or radio wave transmissions.)

TCP/IP is a set of protocols that are logically separated into 4 layers. They are:

Application layer
Transport layer
Internet layer
Link layer

Each layer down covers more detail about how to send a datagram.

Here's a human example. If I send you a letter, I'm not concerned about how it gets there (by car, by plane, by boat), or who delivers the letter, or what happens if it's raining. All I care about is the mail content and address, and whether you got the letter (and how soon you can get it). This is the highest level. But beneath it, there must be a system, such as an address system, a transportation system, government law or structure for delivering mail, etc.

In TCP/IP, the highest layer, the Application Layer, is concerned only about software sending some content (a sequence of bytes) to another address such as an email address or URL or IP address. The lowest layer, the link layer, is concerned about how to actually connect hardware physically, over cable/wire or radio waves, such as the design of the cable and the electric signals.

Here's more detail about each layer.

Application layer (process-to-process): This is the highest level layer. Application layer protocols focus on communication from the application's perspective, such as sending or receiving the data, or the format of the data. For example, {HTTP (web), SMTP (email), DHCP (automatic host config)} are protocols at this level.

Transport layer (host-to-host): provides end-to-end communication services for applications. The transport layer provides convenient services such as connection-oriented data stream support, reliability, flow control, and multiplexing. Two most used protocols in this layer are TCP and UDP.

Internet layer (internetworking): The internet layer is about exchanging datagrams across machines. This layer defines the addressing and routing structures used in TCP/IP. The primary example is the IP (Internet Protocol), which defines IP addresses. Its function in routing is to send datagrams to the next router that is closer to the destination IP address.

Link layer: This layer is pretty much about physical connection technology. That is, translating packets to various electric or optical wire signals, or wireless by radio waves or satellite transmission. Ethernet is a standard of the link layer.

UDP encapsulation 〔image source 2013-01-27 http://en.wikipedia.org/wiki/File:UDP_encapsulation.svg 〕

Sample encapsulation of data in TCP/IP.
At the top, the highest abstraction layer, the data is simply what we want to send, such as chat text. The data is then broken into many small packets.
Below it, at the transport layer, a “header” is added to the data to form a datagram. This header contains info about how exactly the data is sent, such as port numbers.
Below it, the IP header contains even lower level info, such as the source and destination IP addresses. And so on.
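
The encapsulation above can be sketched in Python with the struct module. This is a minimal illustration, not a working network stack: the function names, addresses, and port numbers are made up, and both checksum fields are left as 0 (a real IP stack must compute the IP header checksum).

```python
import socket
import struct


def udp_segment(src_port, dst_port, payload):
    """Prepend an 8-byte UDP header: source port, dest port, length, checksum."""
    length = 8 + len(payload)
    header = struct.pack("!HHHH", src_port, dst_port, length, 0)  # checksum not computed
    return header + payload


def ipv4_packet(src_ip, dst_ip, segment, protocol=17, ttl=64):
    """Prepend a 20-byte IPv4 header (no options). protocol 17 means UDP."""
    version_ihl = (4 << 4) | 5          # version 4, header length 5 x 4 = 20 bytes
    total_length = 20 + len(segment)
    header = struct.pack(
        "!BBHHHBBH4s4s",
        version_ihl, 0, total_length,
        0, 0,                           # identification, flags/fragment offset
        ttl, protocol, 0,               # header checksum left as 0 in this sketch
        socket.inet_aton(src_ip), socket.inet_aton(dst_ip),
    )
    return header + segment


data = b"hello"                                      # application data
segment = udp_segment(50000, 9999, data)             # transport layer adds its header
packet = ipv4_packet("192.168.0.2", "192.168.0.3", segment)  # internet layer adds its header
print(len(data), len(segment), len(packet))          # 5 13 33
```

Each layer treats everything it got from the layer above as opaque data and just puts its own header in front.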

Network Port Number

Network Port Number

Network Socket

Network Socket

Connection Oriented vs Connectionless

There are 2 types of connection in TCP/IP:

Connection Oriented.
Connectionless.

TCP/IP is by nature a connectionless network, because each packet is independent. This is called packet switching. (Meaning, lots of small data “packets” are sent, each one independent of the others. They swarm towards the destination via routing (the “switch” part).)

Packet Switching is in contrast to circuit switching tech.

A circuit switching network is a connection-oriented networking approach. When a caller calls another, an electric circuit is established between the callers. It was used by early analog telephone networks. A circuit switching network in a sense dedicates the cable (or channel, medium) to each active call/connection.

JT Switchboard 770x540 — A telephone operator manually connecting calls with cord pairs at a telephone switchboard. Photo taken in 1975. (photo by Joseph A Carr. Used with permission) 〔image source 2013-02-20 http://en.wikipedia.org/wiki/File:JT_Switchboard_770x540.jpg 〕

However, a packet switching network (TCP/IP) can emulate the effects of a physical connection by using protocols that acknowledge transmission, thus establishing a virtual connection. TCP does this.

Here's how connection-oriented networking works. When a packet is sent, the receiver sends back an acknowledgement. If the sender doesn't receive it, it re-sends the packet. When a session of communication is over, the sender and receiver say goodbye to each other, thereby “closing” the connection. In this way, communication is established as if through a physical connection, even though the data units transmitted are actually discrete and go through many routers that don't have any notion of who is connected to whom.

TCP connection — TCP protocol connection. 〔image source 2013-02-09 http://en.wikipedia.org/wiki/File:TCP_CLOSE.svg 〕

TCP is a connection oriented protocol.
UDP is a connectionless protocol.
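
The acknowledge-and-resend idea can be sketched as a simulation in plain Python. This is a simplified “stop and wait” scheme over a pretend lossy network, not how TCP is actually implemented (real TCP keeps many packets in flight and uses timers). The function name, the loss rate, and the seed are assumptions made for this example.

```python
import random


def send_reliably(chunks, loss_rate=0.3, seed=1, max_tries=50):
    """Resend each chunk until its acknowledgement arrives.

    Returns (chunks the receiver ended up with, number of transmissions).
    """
    rng = random.Random(seed)      # seeded, so the simulation is repeatable
    received = []
    transmissions = 0
    for seq, chunk in enumerate(chunks):
        for _ in range(max_tries):
            transmissions += 1
            if rng.random() < loss_rate:
                continue           # packet lost on the way: no ack, so resend
            if seq == len(received):
                received.append(chunk)   # new packet; a duplicate is ignored
            if rng.random() < loss_rate:
                continue           # ack lost on the way back: sender resends
            break                  # ack received, move on to the next chunk
        else:
            raise RuntimeError("gave up after too many tries")
    return received, transmissions


received, transmissions = send_reliably(["he", "llo", " wor", "ld"])
print("".join(received))           # hello world
print(transmissions >= 4)          # True
```

The receiver gets every chunk exactly once and in order, even though the pretend network drops packets and acknowledgements.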

IP Datagram

An IP packet consists of a header section and a data section.

An IP packet has no data checksum or any other footer after the data section. Typically the link layer encapsulates IP packets in frames with a CRC footer that detects most errors, and typically the end-to-end TCP layer checksum detects most other errors.

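The header's protocol field is a number that tells which protocol the data section carries. For example:

ICMP = 1
TCP = 6
UDP = 17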
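
Here's a sketch of reading a few fields out of an IPv4 header in Python, including the protocol number. The sample bytes are hand-made for this illustration (checksum left as 0), and parse_ipv4_header is a made-up helper name.

```python
import socket
import struct

PROTOCOL_NAMES = {1: "ICMP", 6: "TCP", 17: "UDP"}


def parse_ipv4_header(raw):
    """Decode the fixed 20-byte part of an IPv4 header into a dict."""
    (ver_ihl, _tos, total_length, _ident, _frag,
     ttl, proto, _checksum, src, dst) = struct.unpack("!BBHHHBBH4s4s", raw[:20])
    return {
        "version": ver_ihl >> 4,
        "header_bytes": (ver_ihl & 0x0F) * 4,   # header length is stored in 4-byte units
        "total_length": total_length,
        "ttl": ttl,
        "protocol": PROTOCOL_NAMES.get(proto, str(proto)),
        "src": socket.inet_ntoa(src),
        "dst": socket.inet_ntoa(dst),
    }


# Hand-made sample header: IPv4, 20-byte header, total length 28, TTL 64, UDP.
sample = bytes.fromhex("4500001c" "00000000" "40110000" "c0a80001" "c0a80002")
info = parse_ipv4_header(sample)
print(info["protocol"], info["src"], info["dst"])   # UDP 192.168.0.1 192.168.0.2
```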

Routing schemes: unicast, anycast, multicast, broadcast

Routing

The Internet Protocol addressing system recognizes 3 main types of addressing:

Unicast addressing uses a one-to-one association between destination address and network endpoint: each destination address uniquely identifies a single receiver endpoint.
Broadcast or multicast addressing uses a one-to-many association: datagrams are routed from a single sender to multiple endpoints simultaneously in a single transmission. The network automatically replicates datagrams as needed for all network segments (links) that contain an eligible receiver.
Anycast addressing routes datagrams to a single member of a group of potential receivers that are all identified by the same destination address. This is a one-to-one-of-many association.

Transmission Control Protocol (TCP)

Transmission Control Protocol (TCP)

UDP (User Datagram Protocol)

UDP (User Datagram Protocol)

Datagram Congestion Control Protocol (DCCP)

Stream Control Transmission Protocol (SCTP)

Address Resolution Protocol (ARP)

Address Resolution Protocol = a protocol that creates a look-up table for mapping IP addresses to MAC addresses.

Each host has an ARP cache.
If a host wants to send data to another host in the same segment, it checks if the MAC address is in the ARP cache. If not, the host sends a broadcast called an ARP request frame.
The receiver with that IP address will respond and give its MAC address.
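
The cache-then-broadcast logic can be sketched as a toy model in Python. Nothing here touches a real network: the Segment class, the resolve function, and the addresses are made up for illustration.

```python
class Segment:
    """A toy network segment that knows the MAC address of each attached host."""

    def __init__(self, hosts):
        self.hosts = dict(hosts)   # IP address -> MAC address
        self.broadcasts = 0        # how many ARP requests were broadcast

    def arp_request(self, ip):
        """Pretend to broadcast; only the host that owns `ip` answers."""
        self.broadcasts += 1
        return self.hosts.get(ip)


def resolve(ip, cache, segment):
    """Return the MAC address for ip, broadcasting only on a cache miss."""
    if ip in cache:
        return cache[ip]
    mac = segment.arp_request(ip)
    if mac is not None:
        cache[ip] = mac            # remember the answer for next time
    return mac


segment = Segment({"192.168.0.3": "00:1a:2b:3c:4d:5e"})
cache = {}
print(resolve("192.168.0.3", cache, segment))   # 00:1a:2b:3c:4d:5e
print(resolve("192.168.0.3", cache, segment))   # 00:1a:2b:3c:4d:5e
print(segment.broadcasts)                       # 1
```

The second lookup is answered from the cache, so only one broadcast is sent.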

Reverse Address Resolution Protocol (RARP) is obsolete, replaced by Bootstrap Protocol (BOOTP) then by Dynamic Host Configuration Protocol (DHCP).

Internet Control Message Protocol (ICMP)

Internet Control Message Protocol (ICMP)

IGMP

Internet Group Management Protocol

The Internet Group Management Protocol (IGMP) is a communications protocol used by hosts and adjacent routers on IP networks to establish multicast group memberships. IGMP is an integral part of IP multicast.

IGMP can be used for one-to-many networking applications such as online streaming video and gaming, and allows more efficient use of resources when supporting these types of applications.

IGMP is used on IPv4 networks. Multicast management on IPv6 networks is handled by Multicast Listener Discovery (MLD) which uses ICMPv6 messaging in contrast to IGMP's bare IP encapsulation.

Routing

Routing is one of the most important elements of the internet, because it is routing that moves data.

By definition, a router has 2 or more network adapters, because a router is used to forward data between different networks. For home routers, usually one end is connected to a cable modem or DSL modem that leads to the internet, and the other end has Ethernet ports for the home network.

A router does the following:

Receive data from one of its attached networks.
Check the destination address in the IP header. If the destination is on the network the data came from, the datagram is ignored, because it has already reached its destination network. (Ethernet delivers it to all hosts on the same network.)
If the destination IP address is on a different network, the router checks the routing table to determine where to forward the datagram.
It dis-assembles and re-assembles the datagram and sends it out through the right adapter.

The most critical part is the routing table. A routing table can be set up manually, called static routing, but is almost always constructed automatically by other “discovery” protocols, called dynamic routing (because manually setting up the routing table is humanly impossible when there are more than a handful of networks). The routing table can still be manually adjusted, however.

Routing Table

Routing table

Routing table, aka Routing Information Base (RIB), is a data table stored in a router or a computer that lists the routes to particular network destinations, and in some cases, metrics (distances) associated with those routes. The routing table contains information about the topology of the network immediately around it.

The construction of routing tables is the primary goal of routing protocols. Static routes are entries made in a routing table by non-automatic means and which are fixed rather than being the result of some network topology “discovery” procedure.
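
Here's a sketch of a routing table lookup in Python, using the standard ipaddress module. The table entries, gateway addresses, and interface name are made up. The rule used to pick among matching routes, “most specific network wins” (longest prefix match), is the common convention for IP routing, though this tutorial does not otherwise cover it.

```python
import ipaddress

# (destination network, gateway, interface). A gateway of None means the
# network is directly attached. All values are made up for this example.
ROUTES = [
    ("0.0.0.0/0", "192.168.1.1", "eth0"),       # default route
    ("192.168.1.0/24", None, "eth0"),           # the local network
    ("10.0.0.0/8", "192.168.1.254", "eth0"),
]


def lookup(dest, routes=ROUTES):
    """Return (network, gateway, interface) of the most specific matching route."""
    addr = ipaddress.ip_address(dest)
    best = None
    for cidr, gateway, iface in routes:
        net = ipaddress.ip_network(cidr)
        if addr in net and (best is None or net.prefixlen > best[0].prefixlen):
            best = (net, gateway, iface)
    if best is None:
        return None
    return str(best[0]), best[1], best[2]


print(lookup("10.1.2.3"))       # ('10.0.0.0/8', '192.168.1.254', 'eth0')
print(lookup("192.168.1.7"))    # ('192.168.1.0/24', None, 'eth0')
print(lookup("8.8.8.8"))        # ('0.0.0.0/0', '192.168.1.1', 'eth0')
```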

How to see the routing table of my computer?

Linux: Type ip route or route
Windows:

Routing Protocols

The job of a routing protocol is to fill the routing table.

There are 2 major types of routing protocol: link-state and distance-vector.

A link-state routing protocol is one of the two main classes of routing protocols used in packet switching networks for computer communications (the other is the distance-vector routing protocol). Examples of link-state routing protocols include open shortest path first (OSPF) and intermediate system to intermediate system (IS-IS).

The link-state protocol is performed by every router in the network. The basic concept of link-state routing is that every node constructs a map of the connectivity to the network, in the form of a graph, showing which nodes are connected to which other nodes. Each node then independently calculates the next best logical path from it to every possible destination in the network. The collection of best paths will then form the node's routing table.

This contrasts with distance-vector routing protocols, which work by having each node share its routing table with its neighbors. In a link-state protocol the only information passed between nodes is connectivity related.
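
The “each node calculates the best path from its map” step can be sketched with Dijkstra's shortest path algorithm, which is what OSPF uses. The network map and link costs below are made up, and routing_table is a hypothetical helper name; real link-state protocols also have to flood and synchronize the map, which is not shown.

```python
import heapq


def routing_table(graph, source):
    """Return {destination: (total cost, next hop)} for one node.

    graph maps each node to {neighbor: link cost}.
    """
    best = {source: (0, None)}
    heap = [(0, source, None)]     # (cost so far, node, first hop on the path)
    while heap:
        cost, node, first_hop = heapq.heappop(heap)
        if cost > best[node][0]:
            continue               # stale entry, a cheaper path was already found
        for neighbor, weight in graph[node].items():
            new_cost = cost + weight
            hop = neighbor if node == source else first_hop
            if neighbor not in best or new_cost < best[neighbor][0]:
                best[neighbor] = (new_cost, hop)
                heapq.heappush(heap, (new_cost, neighbor, hop))
    return best


# A made-up map of 4 routers and the cost of each link.
graph = {
    "A": {"B": 1, "C": 4},
    "B": {"A": 1, "C": 1, "D": 5},
    "C": {"A": 4, "B": 1, "D": 1},
    "D": {"B": 5, "C": 1},
}
table = routing_table(graph, "A")
print(table["D"])   # (3, 'B')
```

Router A reaches D at cost 3 by way of B then C, so its routing table only needs to record “for D, forward to B”.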

Routing Information Protocol (RIP) is a distance-vector routing protocol.

A RIP router broadcasts an update message every 30 seconds. It can also request an update.
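
What a distance-vector router does when it receives a neighbor's update can be sketched as follows. This is a bare-bones illustration using hop count as the distance. The function name and table layout are made up, and real RIP details (timers, route expiry, the 15-hop limit, loop prevention) are left out.

```python
def merge_neighbor_table(my_table, neighbor, neighbor_table, link_cost=1):
    """Update my_table in place from a neighbor's advertised table.

    Tables map destination -> (hop count, next hop). Returns True if changed.
    """
    changed = False
    for dest, (cost, _next_hop) in neighbor_table.items():
        new_cost = cost + link_cost          # going via the neighbor adds one hop
        if dest not in my_table or new_cost < my_table[dest][0]:
            my_table[dest] = (new_cost, neighbor)
            changed = True
    return changed


# Router A knows itself and its neighbor B. B advertises that it can reach C.
table_a = {"A": (0, None), "B": (1, "B")}
table_b = {"B": (0, None), "A": (1, "A"), "C": (1, "C")}

print(merge_neighbor_table(table_a, "B", table_b))   # True
print(table_a["C"])                                  # (2, 'B')
```

Router A never sees the whole map. It only learns “C is 2 hops away, by way of B”.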

Open Shortest Path First OSPF (a link-state routing protocol).

hop count

Routing loop problem

Core router

A core router is a router designed to operate in the Internet backbone, or core. To fulfill this role, a router must be able to support multiple telecommunications interfaces of the highest speed in use in the core Internet and must be able to forward IP packets at full speed on all of them. It must also support the routing protocols being used in the core. A core router is distinct from an edge router: edge routers sit at the edge of a backbone network and connect to core routers.

Autonomous System (Internet)

Dynamic Host Configuration Protocol (DHCP)

Dynamic Host Configuration Protocol

Zero configuration networking

multiplexing = multiple analog message signals or digital data streams are combined into one signal over a shared medium.
duplex = a point-to-point system composed of two connected parties or devices that can communicate with one another in both directions.

Tunneling protocol

Computer networks use a tunneling protocol when one network protocol (the delivery protocol) encapsulates a different payload protocol. By using tunneling one can (for example) carry a payload over an incompatible delivery-network, or provide a secure path through an untrusted network.

Virtual private network

Simple Service Discovery Protocol (SSDP)

Simple Network Management Protocol

Network segment. A term for a portion of network. For example, an Ethernet hub is a device for connecting multiple Ethernet devices together and making them act as a single network segment.

Ethernet hub

An Ethernet hub, active hub, network hub, repeater hub, multiport repeater or hub is a device for connecting multiple Ethernet devices together and making them act as a single network segment.
It has multiple input/output (I/O) ports, in which a signal introduced at the input of any port appears at the output of every port except the original incoming.
A hub works at the physical layer (layer 1) of the OSI model.
The device is a form of multiport repeater. Repeater hubs also participate in collision detection, forwarding a jam signal to all ports if it detects a collision.
A network hub is an unsophisticated device in comparison with, for example, a switch. A hub does not examine or manage any of the traffic that comes through it: any packet entering any port is rebroadcast on all other ports.
The availability of low-priced network switches has largely rendered hubs obsolete.

Network switch

A switch is a telecommunication device which receives a message from any device connected to it and then transmits the message only to the device for which the message was meant. This makes the switch a more intelligent device than a hub (which receives a message and then transmits it to all the other devices on its network).
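
The difference between a hub and a switch can be sketched in Python. The port numbers and short MAC strings are made up. How the switch knows which device is on which port is an assumption added here: it uses the common “learn the sender's address from each incoming frame” approach, and sends to all other ports when the destination is not yet known.

```python
def hub_forward(in_port, ports):
    """A hub repeats the signal on every port except the one it came in on."""
    return [p for p in ports if p != in_port]


class Switch:
    """A toy switch that learns which MAC address is behind which port."""

    def __init__(self, ports):
        self.ports = list(ports)
        self.mac_table = {}        # MAC address -> port

    def forward(self, in_port, src_mac, dst_mac):
        """Return the list of ports the frame is sent out on."""
        self.mac_table[src_mac] = in_port          # learn where the sender is
        if dst_mac in self.mac_table:
            return [self.mac_table[dst_mac]]       # known: send to that port only
        return [p for p in self.ports if p != in_port]   # unknown: act like a hub


ports = [1, 2, 3, 4]
print(hub_forward(1, ports))        # [2, 3, 4]

switch = Switch(ports)
print(switch.forward(1, "aa", "bb"))   # [2, 3, 4]
print(switch.forward(2, "bb", "aa"))   # [1]
print(switch.forward(1, "aa", "bb"))   # [2]
```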

Promiscuous mode

In computer networking, promiscuous mode or promisc mode is a mode for a wired network interface controller (NIC) or wireless network interface controller (WNIC) that causes the controller to pass all traffic it receives to the central processing unit (CPU) rather than passing only the frames that the controller is intended to receive. This mode is normally used for packet sniffing that takes place on a router or on a computer connected to a hub (instead of a switch) or one being part of a WLAN. The mode is also required for bridged networking for hardware virtualization.

In IEEE 802 networks such as Ethernet, token ring, and IEEE 802.11, and in FDDI, each frame includes a destination Media Access Control address (MAC address). In non-promiscuous mode, when a NIC receives a frame, it normally drops it unless the frame is addressed to that NIC's MAC address or is a broadcast or multicast frame. In promiscuous mode, however, the card allows all frames through, thus allowing the computer to read frames intended for other machines or network devices.

wireless

IEEE 802.11

Service set (802.11 network)

common problems

See: How to Diagnose Computer Networking Problems

Firewall

A firewall filters traffic. Firewalls can be classified by their power:

Basic firewall (aka packet filter). Simply looks at each packet and decides whether to drop it based on any of {IP address, port number, protocol, TCP/UDP traffic} in the packet. When the packet fits a filter rule, the firewall may simply drop the packet or send an error response.
Stateful firewall. Understands up to the transport layer. This is done by accumulating (caching) packets. It can detect invalid packets, session hijacking, and some DoS attacks.
Application firewall. A more advanced firewall that understands the application layer.
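
The basic packet filter can be sketched in Python. The Packet and Rule classes, the rule list, and the “first matching rule wins, otherwise drop” policy are assumptions made for this illustration, not the behavior of any particular firewall product.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Packet:
    src_ip: str
    dst_port: int
    protocol: str                      # "tcp" or "udp"


@dataclass
class Rule:
    action: str                        # "accept" or "drop"
    src_ip: Optional[str] = None       # None means "any"
    dst_port: Optional[int] = None
    protocol: Optional[str] = None

    def matches(self, packet):
        return ((self.src_ip is None or self.src_ip == packet.src_ip)
                and (self.dst_port is None or self.dst_port == packet.dst_port)
                and (self.protocol is None or self.protocol == packet.protocol))


def decide(packet, rules, default="drop"):
    """Return the action of the first rule that matches, else the default."""
    for rule in rules:
        if rule.matches(packet):
            return rule.action
    return default


rules = [
    Rule("drop", src_ip="10.0.0.66"),                 # block one bad host
    Rule("accept", dst_port=80, protocol="tcp"),      # allow web traffic
]
print(decide(Packet("10.0.0.5", 80, "tcp"), rules))    # accept
print(decide(Packet("10.0.0.66", 80, "tcp"), rules))   # drop
print(decide(Packet("10.0.0.5", 53, "udp"), rules))    # drop
```

Note that each packet is judged on its own. A stateful firewall would also remember earlier packets of the same connection.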

Placement of firewall:

Normal: between the local network and the outside.
Put public services outside the firewall.
Two firewalls, separating the outside world, public services, and the local network. The middle zone is called the DMZ. (Two firewalls are not strictly necessary for this; a single firewall can filter/direct traffic among the 3 zones (3 network interfaces).)

A firewall can be software based or hardware based. The function of a firewall is often part of other services or devices. Most operating systems have a software based firewall. Some routers can also do some firewall functions, or be a powerful firewall. A firewall can also be a proxy server.

On Linux, the firewall framework is netfilter (iptables). For an intro, see: Linux: What's Netfilter, iptables, Their Differences?

Port scanner

DNS and host file

Hosts (file)

Domain Name System

WAN

Wide area network

Integrated Services Digital Network ISDN

High-Level Data Link Control HDLC

ATM

Asynchronous Transfer Mode

OpenWrt

DD-WRT

FON

Keynote talk by Radia Perlman at Linux.conf.au 2013 http://mirror.linux.org.au/linux.conf.au/2013/mp4/Keynote_Radia_Perlman.mp4