Clickhouse简介
Clickhouse是一个用于联机分析处理(OLAP)的列式数据库管理系统(DBMS)。它由俄罗斯的搜索引擎公司Yandex开发,并且是开源的,源代码主要使用C++编写。Clickhouse特别适用于运行实时数据分析查询的场景,可以处理非常大的数据量,包括万亿行数据以上。
以下是一些Clickhouse的关键特性:
- 列式存储:数据按列存储,这样可以大大提高对特定列数据的查询性能,并且高效地压缩数据。
- 数据压缩:Clickhouse使用不同的压缩算法来减少存储空间需求,提高I/O效率。
- 矢量化查询执行:执行查询时能够一次处理数据列的多个值,从而提升性能。
- 实时查询性能:即使是针对数十到数百TB数据的复杂查询,也能在几秒到几分钟内返回结果。
- 分布式架构:Clickhouse可以在多个节点上水平扩展,支持分布式查询处理。
- 高容错性:支持数据复制和数据冗余,确保数据的高可用性。
- 强大的SQL支持:Clickhouse支持SQL查询,使那些已经熟悉SQL的用户能够容易上手。
- 高度可定制的:可以根据需求定制数据分片和复制配置。
- 兼容多种数据格式:支持各种数据格式的输入和输出,可以轻松与其他系统集成。
Clickhouse适合处理各种分析型业务场景,如数据仓库、日志分析、BI应用、监控等,并且在互联网、金融、电信、广电等行业中得到广泛使用。它的性能和可伸缩性使它成为处理大规模数据集时的一个强有力的选择。
Clickhouse安装
创建pvc
1
2
3
4
5
6
7
8
9
10
11
|
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
name: clickhouse-pvc
spec:
resources:
requests:
storage: 500Gi #数据大小
accessModes:
- ReadWriteMany # pvc数据访问类型
storageClassName: "csi-cephfs-247" #storageclass 名称
|
创建ConfigMap
将users.xml配置修改挂载
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
|
apiVersion: v1
kind: ConfigMap
metadata:
name: clickhouse-users
data:
users.xml: |
<?xml version="1.0"?>
<clickhouse>
<!-- See also the files in users.d directory where the settings can be overridden. -->
<!-- Profiles of settings. -->
<profiles>
<!-- Default settings. -->
<default>
</default>
<!-- Profile that allows only read queries. -->
<readonly>
<readonly>1</readonly>
</readonly>
</profiles>
<!-- Users and ACL. -->
<users>
<!-- If user name was not specified, 'default' user is used. -->
<default>
<!-- See also the files in users.d directory where the password can be overridden.
Password could be specified in plaintext or in SHA256 (in hex format).
If you want to specify password in plaintext (not recommended), place it in 'password' element.
Example: <password>qwerty</password>.
Password could be empty.
If you want to specify SHA256, place it in 'password_sha256_hex' element.
Example: <password_sha256_hex>65e84be33532fb784c48129675f9eff3a682b27168c0ea744b2cf58ee02337c5</password_sha256_hex>
Restrictions of SHA256: impossibility to connect to ClickHouse using MySQL JS client (as of July 2019).
If you want to specify double SHA1, place it in 'password_double_sha1_hex' element.
Example: <password_double_sha1_hex>e395796d6546b1b65db9d665cd43f0e858dd4303</password_double_sha1_hex>
If you want to specify a previously defined LDAP server (see 'ldap_servers' in the main config) for authentication,
place its name in 'server' element inside 'ldap' element.
Example: <ldap><server>my_ldap_server</server></ldap>
If you want to authenticate the user via Kerberos (assuming Kerberos is enabled, see 'kerberos' in the main config),
place 'kerberos' element instead of 'password' (and similar) elements.
The name part of the canonical principal name of the initiator must match the user name for authentication to succeed.
You can also place 'realm' element inside 'kerberos' element to further restrict authentication to only those requests
whose initiator's realm matches it.
Example: <kerberos />
Example: <kerberos><realm>EXAMPLE.COM</realm></kerberos>
How to generate decent password:
Execute: PASSWORD=$(base64 < /dev/urandom | head -c8); echo "$PASSWORD"; echo -n "$PASSWORD" | sha256sum | tr -d '-'
In first line will be password and in second - corresponding SHA256.
How to generate double SHA1:
Execute: PASSWORD=$(base64 < /dev/urandom | head -c8); echo "$PASSWORD"; echo -n "$PASSWORD" | sha1sum | tr -d '-' | xxd -r -p | sha1sum | tr -d '-'
In first line will be password and in second - corresponding double SHA1.
-->
<password></password>
<!-- List of networks with open access.
To open access from everywhere, specify:
<ip>::/0</ip>
To open access only from localhost, specify:
<ip>::1</ip>
<ip>127.0.0.1</ip>
Each element of list has one of the following forms:
<ip> IP-address or network mask. Examples: 213.180.204.3 or 10.0.0.1/8 or 10.0.0.1/255.255.255.0
2a02:6b8::3 or 2a02:6b8::3/64 or 2a02:6b8::3/ffff:ffff:ffff:ffff::.
<host> Hostname. Example: server01.clickhouse.com.
To check access, DNS query is performed, and all received addresses compared to peer address.
<host_regexp> Regular expression for host names. Example, ^server\d\d-\d\d-\d\.clickhouse\.com$
To check access, DNS PTR query is performed for peer address and then regexp is applied.
Then, for result of PTR query, another DNS query is performed and all received addresses compared to peer address.
Strongly recommended that regexp is ends with $
All results of DNS requests are cached till server restart.
-->
<networks>
<ip>::/0</ip>
</networks>
<!-- Settings profile for user. -->
<profile>default</profile>
<!-- Quota for user. -->
<quota>default</quota>
<!-- User can create other users and grant rights to them. -->
<access_management>1</access_management>
<!-- User can manipulate named collections. -->
<named_collection_control>1</named_collection_control>
<!-- User permissions can be granted here -->
<!--
<grants>
<query>GRANT ALL ON *.*</query>
</grants>
-->
</default>
</users>
<!-- Quotas. -->
<quotas>
<!-- Name of quota. -->
<default>
<!-- Limits for time interval. You could specify many intervals with different limits. -->
<interval>
<!-- Length of interval. -->
<duration>3600</duration>
<!-- No limits. Just calculate resource usage for time interval. -->
<queries>0</queries>
<errors>0</errors>
<result_rows>0</result_rows>
<read_rows>0</read_rows>
<execution_time>0</execution_time>
</interval>
</default>
</quotas>
</clickhouse>
|
创建clickhouse部署文件
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
|
apiVersion: apps/v1
kind: Deployment
metadata:
labels:
app: clickhouse
name: clickhouse
spec:
replicas: 1
revisionHistoryLimit: 10
selector:
matchLabels:
app: clickhouse
template:
metadata:
labels:
app: clickhouse
spec:
containers:
- image: clickhouse/clickhouse-server:23.8
imagePullPolicy: IfNotPresent
name: clickhouse
ports:
- containerPort: 8123
name: http
protocol: TCP
- containerPort: 9000
name: tcp
protocol: TCP
- containerPort: 9009
name: interserver
protocol: TCP
resources:
limits:
cpu: 1048m
memory: 2Gi
requests:
cpu: 1048m
memory: 2Gi
volumeMounts:
- mountPath: /var/lib/clickhouse
name: clickhouse-volume
- mountPath: /etc/clickhouse-server/users.xml
subPath: users.xml
name: clickhouse-users
volumes:
- name: clickhouse-users
configMap:
name: clickhouse-users
defaultMode: 511
- name: clickhouse-volume
persistentVolumeClaim:
claimName: clickhouse-pvc
---
apiVersion: v1
kind: Service
metadata:
name: clickhouse
spec:
ports:
- port: 8123
name: http
protocol: TCP
targetPort: 8123
- port: 9000
name: tcp
protocol: TCP
targetPort: 9000
# - port: 9009
# name: interserver
# protocol: TCP
# targetPort: 9009
selector:
app: clickhouse
type: ClusterIP
|
外部访问
创建NodePort的svc
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
|
apiVersion: v1
kind: Service
metadata:
name: clickhouse-node
spec:
ports:
- port: 8123
protocol: TCP
name: http
targetPort: 8123
nodePort: 30410
- port: 9000
name: tcp
protocol: TCP
targetPort: 9000
nodePort: 30411
selector:
app: clickhouse
type: NodePort
|
内部调用测试
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
|
[root@k8s-7-132 ~]# kubectl exec -it clickhouse-5ff4545d7b-hvdqv /bin/bash
kubectl exec [POD] [COMMAND] is DEPRECATED and will be removed in a future version. Use kubectl exec [POD] -- [COMMAND] instead.
root@clickhouse-5ff4545d7b-hvdqv:/# clickhouse-client
ClickHouse client version 23.8.11.28 (official build).
Connecting to localhost:9000 as user default.
Connected to ClickHouse server version 23.8.11 revision 54465.
Warnings:
* Linux transparent hugepages are set to "always". Check /sys/kernel/mm/transparent_hugepage/enabled
clickhouse-5ff4545d7b-hvdqv :) show databases; #查看数据库
SHOW DATABASES
Query id: b499e5c5-a8cf-4e4c-9efc-1b8f2cf033d4
┌─name───────────────┐
│ INFORMATION_SCHEMA │
│ default │
│ information_schema │
│ system │
└────────────────────┘
4 rows in set. Elapsed: 0.001 sec.
clickhouse-5ff4545d7b-hvdqv :) create database test; #创建数据库
CREATE DATABASE test
Query id: b0f73c06-c8d8-4cbb-aefb-99f72dae7747
Ok.
0 rows in set. Elapsed: 0.014 sec.
clickhouse-5ff4545d7b-hvdqv :) show databases;
SHOW DATABASES
Query id: 4151b3b6-c54d-4c08-a20a-b6c3e04af33c
┌─name───────────────┐
│ INFORMATION_SCHEMA │
│ default │
│ information_schema │
│ system │
│ test │
└────────────────────┘
5 rows in set. Elapsed: 0.002 sec.
clickhouse-5ff4545d7b-hvdqv :) use test;
USE test
Query id: 31d2ecd2-d872-4c00-9704-4feec26e93ff
Ok.
0 rows in set. Elapsed: 0.001 sec.
clickhouse-5ff4545d7b-hvdqv :) show tables;
SHOW TABLES
Query id: 614138ca-499e-4c37-aa8d-69329fb06946
Ok.
0 rows in set. Elapsed: 0.002 sec.
clickhouse-5ff4545d7b-hvdqv :) SELECT * FROM system.clusters; #查看当前集群
SELECT *
FROM system.clusters
Query id: 776df3bc-99a5-4010-8aea-06b84fdc88b6
┌─cluster─┬─shard_num─┬─shard_weight─┬─replica_num─┬─host_name─┬─host_address─┬─port─┬─is_local─┬─user────┬─default_database─┬─errors_count─┬─slowdowns_count─┬─estimated_recovery_time─┬─database_shard_name─┬─database_replica_name─┬─is_active─┐
│ default │ 1 │ 1 │ 1 │ localhost │ 127.0.0.1 │ 9000 │ 1 │ default │ │ 0 │ 0 │ 0 │ │ │ ᴺᵁᴸᴸ │
└─────────┴───────────┴──────────────┴─────────────┴───────────┴──────────────┴──────┴──────────┴─────────┴──────────────────┴──────────────┴─────────────────┴─────────────────────────┴─────────────────────┴───────────────────────┴───────────┘
1 row in set. Elapsed: 0.002 sec.
clickhouse-5ff4545d7b-hvdqv :)
|
外部访问
