CS144

2022-10-04

Web

字数统计: 3.7k | 阅读时长≈ 18 分钟

本文介绍了对Stanford的经典计网课程CS144的相关实验的学习和完成记录。

环境测试

注意这里我没办法在自己的centos7下顺利完成所有的环境搭建，估计是centos版本不够，导致很多默认的应用版本都没有达到要求。无奈选择购买一台阿里云服务器，使用centos8.5系统，顺利完成所有环境搭建。

121.199.60.140

安装telnet

测试：

[zhangqi@localhost ~]$ telnet cs144.keithw.org http
Trying 104.196.238.229...
Connected to cs144.keithw.org.
Escape character is '^]'.

继续输入http request，结束后多输一个空行，表示请求输入完毕

该请求等同于在浏览器输入 http://cs144.keithw.org/hello

GET /hello HTTP/1.1 
Host: cs144.keithw.org
Connection: close

可以看到http的返回

HTTP/1.1 200 OK
Date: Tue, 04 Oct 2022 05:42:33 GMT
Server: Apache
Last-Modified: Thu, 13 Dec 2018 15:45:29 GMT
ETag: "e-57ce93446cb64"
Accept-Ranges: bytes
Content-Length: 14
Connection: close
Content-Type: text/plain

Hello, CS144!
Connection closed by foreign host.

安装netcat

测试：(注意不同操作系统下使用的指令可能不一样)

[zhangqi@localhost ~]$ nc -v -l -p 9090
Ncat: Version 7.50 ( https://nmap.org/ncat )
Ncat: Listening on :::9090
Ncat: Listening on 0.0.0.0:9090

打开另一个终端

[zhangqi@localhost ~]$ telnet localhost 9090
Trying ::1...
Connected to localhost.
Escape character is '^]'.

在启动netcat的终端可以看到

1 2	Ncat: Connection from ::1. Ncat: Connection from ::1:59256.

这时候在任一终端键入任一一些字符并敲回车，都会在另一头接收到同样的信息

在netcat窗口Ctrl-C退出，telnet窗口也会立即退出

升级gcc和g++版本

centos7下参考这篇文章

另：文章

注意需要升级完需要删除build文件夹重新编译

Lab0. Warm Up

clone代码仓库：

1	$ git clone https://github.com/cs144/sponge

cd sponge
mkdir build
cd build

cmake ..
make

一些开发规范

不要使用malloc/free和new/delete
不要使用原始指针，实在有需要，使用智能指针
不要使用模板、线程、锁和虚函数
避免C风格的字符串即相关字符串函数（strlen，strcpy），而是使用std::string
不要使用C风格的转型，而是使用C++的static_cast
最好在函数中传递常量引用
尽量设置每个变量为常量除非必要
尽量设置每个方法为常量方法除非需要改变对象
避免全局变量，尽量给每个变量最小的作用域

webget

写一个名为webget的程序用于创建TCP流socket，连接到一个web服务器，并且抓取一个页面（就像前面使用telnet的示例一样）

使用socket.h中的相关类

#include "socket.hh"
#include "util.hh"

#include <cstdlib>
#include <iostream>

using namespace std;

void get_URL(const string &host, const string &path) {
    // Your code here.

    TCPSocket sock;
    const string addr = Address(host,"http");
    sock.connect(addr);
    sock.write("GET "+ path + " HTTP/1.1\r\nHost: "+ host + "\r\n\r\n");

    sock.shutdown(SHUT_WR);
    while(!sock.eof()){
        cout<<sock.read();
    }
    sock.close();
    return;
}

int main(int argc, char *argv[]) {
    try {
        if (argc <= 0) {
            abort();  // For sticklers: don't try to access argv[0] if argc <= 0.
        }

        // The program takes two command-line arguments: the hostname and "path" part of the URL.
        // Print the usage message unless there are these two arguments (plus the program name
        // itself, so arg count = 3 in total).
        if (argc != 3) {
            cerr << "Usage: " << argv[0] << " HOST PATH\n";
            cerr << "\tExample: " << argv[0] << " stanford.edu /class/cs144\n";
            return EXIT_FAILURE;
        }

        // Get the command-line arguments.
        const string host = argv[1];
        const string path = argv[2];

        // Call the student-written function.
        get_URL(host, path);
    } catch (const exception &e) {
        cerr << e.what() << "\n";
        return EXIT_FAILURE;
    }

    return EXIT_SUCCESS;
}

测试程序：

[root@iZbp1aamvzn845uqnj516qZ build]# ./apps/webget cs144.keithw.org /hello
HTTP/1.1 200 OK
Date: Tue, 04 Oct 2022 12:38:56 GMT
Server: Apache
Last-Modified: Thu, 13 Dec 2018 15:45:29 GMT
ETag: "e-57ce93446cb64"
Accept-Ranges: bytes
Content-Length: 14
Content-Type: text/plain

Hello, CS144!

TCPSocket及其父类Socket的声明：

class Socket : public FileDescriptor {
  private:
    //! Get the local or peer address the socket is connected to
    Address get_address(const std::string &name_of_function,
                        const std::function<int(int, sockaddr *, socklen_t *)> &function) const;

  protected:
    //! Construct via [socket(2)](\ref man2::socket)
    Socket(const int domain, const int type);

    //! Construct from a file descriptor.
    Socket(FileDescriptor &&fd, const int domain, const int type);

    //! Wrapper around [setsockopt(2)](\ref man2::setsockopt)
    template <typename option_type>
    void setsockopt(const int level, const int option, const option_type &option_value);

  public:
    //! Bind a socket to a specified address with [bind(2)](\ref man2::bind), usually for listen/accept
    void bind(const Address &address);

    //! Connect a socket to a specified peer address with [connect(2)](\ref man2::connect)
    void connect(const Address &address);

    //! Shut down a socket via [shutdown(2)](\ref man2::shutdown)
    void shutdown(const int how);

    //! Get local address of socket with [getsockname(2)](\ref man2::getsockname)
    Address local_address() const;
    //! Get peer address of socket with [getpeername(2)](\ref man2::getpeername)
    Address peer_address() const;

    //! Allow local address to be reused sooner via [SO_REUSEADDR](\ref man7::socket)
    void set_reuseaddr();
};

class TCPSocket : public Socket {
  private:
    //! \brief Construct from FileDescriptor (used by accept())
    //! \param[in] fd is the FileDescriptor from which to construct
    explicit TCPSocket(FileDescriptor &&fd) : Socket(std::move(fd), AF_INET, SOCK_STREAM) {}

  public:
    //! Default: construct an unbound, unconnected TCP socket
    TCPSocket() : Socket(AF_INET, SOCK_STREAM) {}

    //! Mark a socket as listening for incoming connections
    void listen(const int backlog = 16);

    //! Accept a new incoming connection
    TCPSocket accept();
};

An in-memory reliable byte system

要求实现一个有序字节流类（in-order byte stream），使之支持读写、容量控制。这个字节流类似于一个带容量的队列，从一头读，从另一头写。当流中的数据达到容量上限时，便无法再写入新的数据。特别的，写操作被分为了peek和pop两步。peek为从头部开始读取指定数量的字节，pop为弹出指定数量的字节。

使用 std::deque作为数据结构

byte_stream.hh

class ByteStream {
  private:
    // Your code here -- add private members as necessary.
    std::deque<char> _buffer = {};
    size_t _capacity = 0;
    size_t _read_count = 0;
    size_t _write_count = 0;
    bool _input_ended_flag = false;
    bool _error = false;  //!< Flag indicating that the stream suffered an error.
    //......

byte_stream.cc

ByteStream::ByteStream(const size_t capacity) : _capacity(capacity) {}

size_t ByteStream::write(const string &data) {
    size_t len = data.length();
    if (len > _capacity - _buffer.size()) {
        len = _capacity - _buffer.size();
    }
    _write_count += len;
    for (size_t i = 0; i < len; i++) {
        _buffer.push_back(data[i]);
    }
    return len;
}

//! \param[in] len bytes will be copied from the output side of the buffer
string ByteStream::peek_output(const size_t len) const {
    size_t length = len;
    if (length > _buffer.size()) {
        length = _buffer.size();
    }
    return string().assign(_buffer.begin(), _buffer.begin() + length);
}

//! \param[in] len bytes will be removed from the output side of the buffer
void ByteStream::pop_output(const size_t len) {
    size_t length = len;
    if (length > _buffer.size()) {
        length = _buffer.size();
    }
    _read_count += length;
    while (length--) {
        _buffer.pop_front();
    }
    return;
}

void ByteStream::end_input() { _input_ended_flag = true; }

bool ByteStream::input_ended() const { return _input_ended_flag; }

size_t ByteStream::buffer_size() const { return _buffer.size(); }

bool ByteStream::buffer_empty() const { return _buffer.size() == 0; }

bool ByteStream::eof() const { return buffer_empty() && input_ended(); }

size_t ByteStream::bytes_written() const { return _write_count; }

size_t ByteStream::bytes_read() const { return _read_count; }

size_t ByteStream::remaining_capacity() const { return _capacity - _buffer.size(); }

调试方法论

如下，发现测试样例出错

[root@iZbp1aamvzn845uqnj516qZ build]# make check_lab0
[100%] Testing Lab 0...
Test project /root/sponge/build
    Start 26: t_byte_stream_construction
1/9 Test #26: t_byte_stream_construction .......***Failed    0.00 sec
Test Failure on expectation:
	Expectation: buffer_empty: 1

Failure message:
	The ByteStream should have had buffer_empty equal to 1 but instead it was 0

List of steps that executed successfully:
	Initialized with (capacity=15)
	Expectation: input_ended: 0

Exception: The test "construction" failed

在sponge目录下全局搜索可发现，所有的测试样例对应的命令都在spong/etc/tests.cmake中定义，我们从中找到需要的部分

[root@iZbp1aamvzn845uqnj516qZ sponge]# grep -Rn t_byte_stream_construction
etc/tests.cmake:34:add_test(NAME t_byte_stream_construction COMMAND byte_stream_construction)
build/CTestTestfile.cmake:57:add_test(t_byte_stream_construction "/root/sponge/build/tests/byte_stream_construction")
build/CTestTestfile.cmake:58:set_tests_properties(t_byte_stream_construction PROPERTIES  _BACKTRACE_TRIPLES "/root/sponge/etc/tests.cmake;34;add_test;/root/sponge/etc/tests.cmake;0;;/root/sponge/CMakeLists.txt;27;include;/root/sponge/CMakeLists.txt;0;")
build/Testing/Temporary/LastTest.log:3:26/169 Testing: t_byte_stream_construction
build/Testing/Temporary/LastTest.log:4:26/169 Test: t_byte_stream_construction
build/Testing/Temporary/LastTest.log:7:"t_byte_stream_construction" start time: Oct 05 11:03 CST
build/Testing/Temporary/LastTest.log:25:"t_byte_stream_construction" end time: Oct 05 11:03 CST
build/Testing/Temporary/LastTest.log:26:"t_byte_stream_construction" time elapsed: 00:00:00
build/Testing/Temporary/LastTestsFailed.log:1:26:t_byte_stream_construction
build/Testing/Temporary/CTestCostData.txt:10:t_byte_stream_construction 0 0
build/Testing/Temporary/CTestCostData.txt:19:t_byte_stream_construction

[root@iZbp1aamvzn845uqnj516qZ sponge]# cd build/
[root@iZbp1aamvzn845uqnj516qZ build]# cd tests/
[root@iZbp1aamvzn845uqnj516qZ tests]# ls
byte_stream_capacity      byte_stream_one_write   cmake_install.cmake
byte_stream_construction  byte_stream_two_writes  libspongechecks.a
byte_stream_many_writes   CMakeFiles              Makefile

而所有测试程序对应的源文件存放在sponge/tests/下

gdb调试基础

（最后发现是因为make check_lab0不会自动把所有文件编译，因此需要先经过一次make才行）

Lab1

在lab0中实际上我们是用的Linux内部实现的TCP协议。

这个lab中实现一个流重组器（stream reassemebler），可以将带索引的字节流碎片重组成有序的字节流。每个字节流碎片都通过索引、长度、内容三要素进行描述。重组完的字节流应当被送入指定的字节流（byte stream）对象_output中。

特别注意：

这节需要安装pcap库和pcap-dev库才能正常编译，如果没编译没报错那就没事了。
碎片可能交叉或重叠。
如果某次新碎片到达后字节流的开头部分被凑齐，那就应当立刻把凑齐的部分立刻写入到_output中。即对应讲义中的:

When should bytes be written to the stream?

As soon as possible. The only situation in which a byte should not be in the stream is that when there is a byte before it that has not been “pushed” yet.

碎片可能是一个只包含EOF标志的空串
LAB0的顺序字节流和LAB1的流重组器各有各的容量限制。流重组器把字节流写满后，只有当字节流腾出空后才能继续写，相当于字节流满时流重组器出口被“堵住”了。同样当流重组器容量满了后自身也无法被写入新数据，此时到来的新碎片只能被丢弃掉。

实验准备

git merge origin/lab1-startercode
make

实验说明

In this and the next lab, you will implement a TCP receiver: the module that receives datagrams and turns them into a reliable byte stream to be read from the socket by the application—just as your webget program read the byte stream from the webserver in Lab 0.

The TCP sender is dividing its byte stream up into short segments (substrings no more than about 1,460 bytes apiece) so that they each fit inside a datagram. But the network might reorder these datagrams, or drop them, or deliver them more than once. The receiver must reassemble the segments into the contiguous stream of bytes that they started out as.

In this lab you’ll write the data structure that will be responsible for this reassembly: a StreamReassembler. It will receive substrings, consisting of a string of bytes, and the index of the first byte of that string within the larger stream. Each byte of the stream has its own unique index, starting from zero and counting upwards.

The StreamReassembler will own a ByteStream for the output: as soon as the reassembler knows the next byte of the stream, it will write it into the ByteStream. The owner can access and read from the ByteStream whenever it wants.

接口：

// Construct a `StreamReassembler` that will store up to `capacity` bytes.
StreamReassembler(const size_t capacity);
// Receive a substring and write any newly contiguous bytes into the stream,
// while staying within the memory limits of the `capacity`. Bytes that would
// exceed the capacity are silently discarded.
//
// `data`: the substring
// `index` indicates the index (place in sequence) of the first byte in `data`
// `eof`: the last byte of this substring will be the last byte in the entire stream
void push_substring(const string &data, const uint64_t index, const bool eof);
// Access the reassembled ByteStream (your code from Lab 0)
ByteStream &stream_out();
// The number of bytes in the substrings stored but not yet reassembled
size_t unassembled_bytes() const;
// Is the internal state empty (other than the output stream)?
bool empty() const;

实现思路

用一个block_node结构体来存放每个碎片的索引、长度、内容。又因为set排序实现基于对应节点类型的小于运算符规则，所以把block_node结构体的小于运算符重载为按索引值升序。

push_substring处理流程：

容量判断：满了就立刻返回。
处理子串的冗余前缀：如果子串包含已经被写入字节流的部分，就把这部分剪掉。
合并子串：运用set自带的lowerbound快速确定插入位置，前后重复比较，用个自己写的子函数判断重叠的字顺便合并之。
写入字节流：如果流重组器头部非空，就把头部写入字节流，并更新指示头部的游标。
EOF判断

stream_reassembler.hh

class StreamReassembler {
  private:
    // Your code here -- add private members as necessary.
    // block_node可以看作网络中的每个连续的数据碎片，包含这个碎片开头的索引及长度
    struct block_node {
        size_t begin = 0;
        size_t length = 0;
        std::string data = "";
        bool operator<(const block_node t) const { return begin < t.begin; }
    };
    std::set<block_node> _blocks = {};
    std::vector<char> _buffer = {};
    size_t _unassembled_byte = 0;
    size_t _head_index = 0; //即first_unassemabled
    bool _eof_flag = false;
    ByteStream _output;  //!< The reassembled in-order byte stream
    size_t _capacity;    //!< The maximum number of bytes

    //! merge elm2 to elm1, return merged bytes
    long merge_block(block_node &elm1, const block_node &elm2);
    //......

stream_reassembler.cc

StreamReassembler::StreamReassembler(const size_t capacity) : _output(capacity), _capacity(capacity) {
    _buffer.resize(capacity);
}

long StreamReassembler::merge_block(block_node &elm1, const block_node &elm2) {
    block_node x, y;//x存储开头索引较小的block_node
    if (elm1.begin > elm2.begin) {
        x = elm2;
        y = elm1;
    } else {
        x = elm1;
        y = elm2;
    }
    if (x.begin + x.length < y.begin) {
        return -1;  // no intersection, couldn't merge
    } else if (x.begin + x.length >= y.begin + y.length) {
        // x完全包含y
        elm1 = x;
        return y.length;
    } else {
        // x与y交叉
        elm1.begin = x.begin;
        // eg: x:"ab" 0,2   y:"bcd" 1,3  "ab"+"bcd".substr(0+2-1)
        elm1.data = x.data + y.data.substr(x.begin + x.length - y.begin);
        elm1.length = elm1.data.length();
        //返回新增的字符个数
        return x.begin + x.length - y.begin;
    }
}

//! \details This function accepts a substring (aka a segment) of bytes,
//! possibly out-of-order, from the logical stream, and assembles any newly
//! contiguous substrings and writes them into the output stream in order.
void StreamReassembler::push_substring(const string &data, const size_t index, const bool eof) {
    if (index >= _head_index + _capacity) {  // capacity over
        return;
    }

    // handle extra substring prefix
    block_node elm;
    if (index + data.length() <= _head_index) {  // couldn't equal, because there have emtpy substring
        goto JUDGE_EOF;
    } else if (index < _head_index) {
        size_t offset = _head_index - index;//只保留_head_index之后的部分
        elm.data.assign(data.begin() + offset, data.end());
        elm.begin = index + offset;
        elm.length = elm.data.length();
    } else {
        elm.begin = index;
        elm.length = data.length();
        elm.data = data;
    }
    _unassembled_byte += elm.length;

    // merge substring
    do {
        // merge next
        long merged_bytes = 0;
        auto iter = _blocks.lower_bound(elm);
        while (iter != _blocks.end() && (merged_bytes = merge_block(elm, *iter)) >= 0) {
            _unassembled_byte -= merged_bytes;
            _blocks.erase(iter);
            iter = _blocks.lower_bound(elm);
        }
        // merge prev
        if (iter == _blocks.begin()) {
            break;
        }
        iter--;
        while ((merged_bytes = merge_block(elm, *iter)) >= 0) {
            _unassembled_byte -= merged_bytes;
            _blocks.erase(iter);
            iter = _blocks.lower_bound(elm);
            if (iter == _blocks.begin()) {
                break;
            }
            iter--;
        }
    } while (false);
    _blocks.insert(elm);

    // write to ByteStream
    if (!_blocks.empty() && _blocks.begin()->begin == _head_index) {
        const block_node head_block = *_blocks.begin();
        // modify _head_index and _unassembled_byte according to successful write to _output
        size_t write_bytes = _output.write(head_block.data);
        _head_index += write_bytes;
        _unassembled_byte -= write_bytes;
        _blocks.erase(_blocks.begin());
    }

JUDGE_EOF:
    if (eof) {
        _eof_flag = true;
    }
    if (_eof_flag && empty()) {
        _output.end_input();
    }
}

size_t StreamReassembler::unassembled_bytes() const { return _unassembled_byte; }

bool StreamReassembler::empty() const { return _unassembled_byte == 0; }

Lab2

实现接收端

作为receiver，它关心的是下图蓝色方框的部分（seqno、SYN、FIN、Payload）

要求实现序列号、绝对序列号与流索引间的转换。照着讲义的表格写就行：

Sequence Numbers	Absolute Sequence Numbers	Stream Indices
Start at the ISN	Start at 0	Start at 0
Include SYN/FIN	Include SYN/FIN	Omit SYN/FIN
32 bits, wrapping	64 bits, non-wrapping	64 bits, non-wrapping
“seqno”	“absolute seqno”	“stream index”

需要提一下的地方有checkpoint表示最近一次转换求得的absolute seqno，而本次转换出的absolute seqno应该选择与上次值最为接近的那一个。原理是虽然segment不一定按序到达，但几乎不可能出现相邻到达的两个segment序号差值超过INT32_MAX的情况，除非延迟以年为单位，或者产生了比特差错（后面的LAB可能涉及）。

实际操作就是把算出来的绝对序号分别加减1ul << 32做比较，选择与checkpoing差的绝对值最小的那个。

版权声明： 本博客所有文章除特别声明外，著作权归作者所有。转载请注明出处！