性能方案设计
调研清单
powerfulseal
schemathesis
Ceph 自动化
参考文档:https://teams.microsoft.com/l/file/2043B749-2D71-4ACF-88C9-12DF50126452?tenantId=19441c6a-f224-41c8-ac36-82464c2d9b13&fileType=pdf&objectUrl=https%3A%2F%2Fapulistech.sharepoint.com%2Fsites%2FQA-Ops%2FShared%20Documents%2FGeneral%2F2019-05-20%20teuthology.pdf&baseUrl=https%3A%2F%2Fapulistech.sharepoint.com%2Fsites%2FQA-Ops&serviceName=teams&threadId=19:30392c27d21c45a7ac07848d310b4347@thread.tacv2&groupId=846ac8d2-94e2-44a9-a6e3-0386803e6f89
测试工具:https://github.com/ceph/teuthology/tree/master/docs
https://github.com/ceph/teuthology/ https://www.youtube.com/watch?v=XwZqo_KQEag
https://www.youtube.com/channel/UCno-Fry25FJ7B4RycCxOtfw Ceph - YouTubeEnjoy the videos and music you love, upload original content, and share it all with friends, family, and the world on YouTube.www.youtube.com
https://www.youtube.com/watch?v=gj1OXrKdSrs
系统备份还原
https://clonezilla.org/downloads.php
业界参考书籍
AI自动化测试:https://item.jd.com/70985685552.html
京东质量团队转型实践: https://item.jd.com/12442039.html
机器学习测试入门与实践: https://item.jd.com/12958088.html
iOS/Android 自动化测试实战:https://search.jd.com/Search?keyword=%E6%B5%8B%E8%AF%95%E4%B9%A6%E7%B1%8D%20%E8%85%BE%E8%AE%AF&enc=utf-8&wq=%E6%B5%8B%E8%AF%95%E4%B9%A6%E7%B1%8D%20%E8%85%BE%E8%AE%AF&pvid=22b5893757e341608b0edcc81a17dd18
k8s 性能
kubeflow - kubeserving
平台性能,稳定性,异常
Results of measuring of API performance of Kubernetes kubernetes-storage-performance-comparison-v2-2020 scalability network-plugins-cni etcd community sig-testing kubemark perf-tests kubetest2 test-infra prow.k8s.io Kubernetes Aggregated Failures from
PXE Server 自动安装系统
参考:
Kubernetes 维护
编码规范
命名规范【主谓/动宾表】
脚本命名规范
主体_操作_属性_实体_环境
jmeter 脚本命名示例:
more_user_submit_more_CPU_job.jmx
代码命名规范
代码文件, 函数命名:小写单词 + “_”
get_job_name
变量命名: 双驼峰
TestList
常量,特殊标记: 全大写
WAITTIME
AI(HPC)高性能计算框架, NPU, GPU
kubic-control for openSUSE Kubic OpenQA kubernetes-perf-tests argo Kubeadm on SLES Jenkins X k8s 获取master token kubeadm国内源
Azurre k8s最佳实践 https://docs.microsoft.com/en-us/azure/aks/quotas-skus-regions
k8s-kpis-with-kuberhealthy:
https://kubernetes.io/blog/2020/05/29/k8s-kpis-with-kuberhealthy/
https://github.com/Comcast/kuberhealthy/blob/master/docs/EXTERNAL_CHECKS_REGISTRY.md
kubeflow/testing:
https://github.com/kubeflow/testing https://github.com/kubernetes/test-infra
argo
https://github.com/argoproj/argo/
https://argoproj.github.io/argo-cd/getting_started/#1-install-argo-cd
https://blog.argoproj.io/about
jenkins-x https://www.inovex.de/blog/spinnaker-vs-argo-cd-vs-tekton-vs-jenkins-x/
kubernetes perf-tests:
https://github.com/kubernetes/perf-tests
k8s Limits
https://kubernetes.io/docs/setup/best-practices/cluster-large/#:~:text=More%20specifically%2C%20Kubernetes%20is%20designed,more%20than%20150000%20total%20pods
No more than 100 pods per node No more than 5000 nodes No more than 150000 total pods No more than 300000 total containers
https://cloud.google.com/kubernetes-engine/docs/best-practices/scalability 默认情况下:
Pod 次要范围默认为 /14(262144 个 IP 地址)。 每个节点都有为其 Pod 分配的 /24 范围(用于其 Pod 的 256 个 IP 地址)。 节点的子网为 /20(4092 个 IP 地址)。
testExample: https://www.jeffgeerling.com/blog/2020/10000-kubernetes-pods-10000-subscribers https://docs.openshift.com/container-platform/3.7/scaling_performance/cluster_limits.html https://learnk8s.io/setting-cpu-memory-limits-requests https://blog.newrelic.com/engineering/kubernetes-request-and-limits/ https://kubernetes.io/docs/concepts/configuration/manage-resources-containers/ https://github.com/kubernetes/community/blob/master/sig-scalability/configs-and-limits/thresholds.md
稳定性测试与功能(自动化)测试的相比的特殊之处
不同职责的性能测试相关人员需要专业的知识背景
需要系统的搭建基础环境
是平台稳定性、性能调优的前提
一般需要至少1个月时间,不能在短时间内完成
系统模块层级、通信等越来越复杂,某些模块的“性能很好”或“性能很差”不能解释或解决问题 page: 158
测试环境和生产环境的差异可能使得测试失败或结果没有参考价值
压力场景设计要充分考虑多类,多种情况的操作
要考虑访问、查找等的负载较大页面的比例差异
要合理估计用户停留时间(思考时间)
要考虑会话在多个页面之间迁移,会话中的(迁移) 信息累计对内存的占用
要用不同的ID, 不同的对象,不同的条件触发动态请求,避免“缓存命中”
测试报告不能简单的通过PASS, FAIL来反馈;面向客户的测试报告有别于内部详细认真的数据报告;更多的,或者不得不考虑客户的理解和信任感,而做取舍或“概述”
测试范围(种类)
性能测试有很多种分类比如:压力测试,负载测试、(上,下)临界测试,狭义性能测试,可靠性测试,耐久测试或稳定性测试。
我们目前关注:稳定性、可靠性测试,压力测试
由于平台有别于WEB系统的特殊性,狭义性能测试和临界值测试在当前的平台架构和实际部署环境不太适用。
既定的平台可靠运行的资源上限
CPU负载: 75% 内存使用率:
IO使用率:
外出网络带宽: 10Gbps 业务网络带宽: 500Mbps OS盘使用率: 75% (超过80%,k8s会将pod驱逐,已驱逐的Pod不能自行恢复) 存储使用率: 90%
注意事项:
在日志中记录各个处理的会话ID,序列号
工具
Apache Bench (license)
Oracle Application Testing suite
待确认问题
内存缓冲,缓存机制 是否抢先加载占用内存空间
用户ID(IP, Head)是怎么区分的,
虚拟CPU的分配机制,超线程情况
虚拟内存的分配、过载使用情况,内存共享,(Large Page)大页面,从POD中回收内存页 均一分配,动态分配
IO延时 存储命令等待时间 GAVG 存储队列 SCSI预约竞争 分布式的存储迁移
任务批处理,异步处理机制
自动扩容和容量管理
故障发生时的降容
WAN, LAN网络 WAN 口访问 内部大LAN的,存储,前端,后端,DB通信
Restful 请求
参考:
接口响应: [Results of measuring of API performance of Kubernetes] https://docs.openstack.org/developer/performance-docs/test_results/container_cluster_systems/kubernetes/API_testing/index.html#kubernetes-pod-startup-latency-measurement [Kubernetes网络和集群性能测试]https://jimmysong.io/kubernetes-handbook/practice/network-and-cluster-perfermance-test.html [Kubernetes Performance Measurements and Roadmap]https://kubernetes.io/blog/2015/09/kubernetes-performance-measurements-and/
存储性能基线 Kubernetes Storage Performance Comparison v2 (2020 Updated) 翻译:https://toutiao.io/posts/nmflsd/preview [logs]https://gist.github.com/pupapaik/76c5b7f124dbb69080840f01bf71f924
k8s集群可扩展性和性能SLI/SLO Kubernetes scalability and performance SLIs/SLOs 中文:https://www.jianshu.com/p/de768ea3fc19
Benchmark results of Kubernetes network plugins (CNI) over 10Gbit/s network 参考: [2020-Benchmark results of Kubernetes network plugins (CNI) over 10Gbit/s network] (https://itnext.io/benchmark-results-of-kubernetes-network-plugins-cni-over-10gbit-s-network-updated-august-2020-6e1b757b9e49) [2018-Benchmark results of Kubernetes network plugins (CNI) over 10Gbit/s network] https://itnext.io/benchmark-results-of-kubernetes-network-plugins-cni-over-10gbit-s-network-36475925a560
eted 性能基线与优化 参考:[When to update the heartbeat interval and election timeout settings]https://github.com/etcd-io/etcd/blob/master/Documentation/tuning.md etcd 性能测试与调优 (https://github.com/etcd-io/etcd/blob/master/Documentation/op-guide/performance.md) limit
性能优化参考点: kubernetes 集群的性能优化 eBay应用程序集群管理器TESS.IO在大规模集群下的性能优化 网易云基于Kubernetes的深度定制化实践 开放下载《阿里巴巴云原生实践 15 讲》揭秘九年云原生规模化落地 使用 K8S 几年后,这些技术专家有话要说 腾讯成本优化黑科技:整机CPU利用率最高提升至90% 华为云在 K8S 大规模场景下的 Service 性能优化实践
测试场景和用例 https://kiddie92.github.io/2019/01/23/kubernetes-%E6%80%A7%E8%83%BD%E6%B5%8B%E8%AF%95%E6%96%B9%E6%B3%95%E7%AE%80%E4%BB%8B/ https://supereagle.github.io/2017/03/09/kubemark/
自动化测试参考:
-
1,000 runs per month
Chrome browser tests only
Serial tests runs
Runs tests on Testim’s cloud
Advanced branching and merging
File testing
Self-help only (Docs and Resources)
Limit: one account per organization
API Key Id: a5ba1e4e2d9afc743a481d9f
API Key Secret: 433cd60a89dda9e400497a8b5b04f0ba5dc988ba1319ea9d0d5c836d8610bbf9cf188ba6
# Usage example:
$ curl https://a.blazemeter.com/api/v4/user --user 'a5ba1e4e2d9afc743a481d9f:433cd60a89dda9e400497a8b5b04f0ba5dc988ba1319ea9d0d5c836d8610bbf9cf188ba6'
测试报告
ludeknovy/jtl-reporter Generate intuitive Locust.io reports with Jtl Reporter