If Sourcegraph with Kubernetes does not start up or shows unexpected behavior, there are a variety of ways you can determine the root cause of the failure.
See our operations guide for more useful commands and operations.
Common errors

The account you are using to apply the Kubernetes configuration doesn't have sufficient permissions to create roles.

This can be resolved by creating a cluster-admin role binding for your user with the following command:
$ kubectl create clusterrolebinding cluster-admin-binding \
    --clusterrole cluster-admin \
    --user $YOUR_EMAIL

NOTE: ClusterRoleBindings are cluster-scoped, so no --namespace flag is needed.
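To confirm the new binding works, you can ask the API server whether your account may now create cluster-level roles; kubectl auth can-i answers yes or no for the current context:

$ kubectl auth can-i create clusterrole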
"kubectl get pv" shows no Persistent Volumes, and/or "kubectl get events" shows a Failed to provision volume with StorageClass "sourcegraph"
error.
Make sure a storage class named "sourcegraph" exists in your cluster, in the same zone as your nodes:
$ kubectl get storageclass sourcegraph -o=yaml
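If the class is missing, you can create one. Below is a minimal sketch for Google Cloud, assuming the GCE persistent-disk CSI driver is installed; the provisioner and parameters are platform-specific, so treat these values as illustrative:

# sourcegraph.StorageClass.yaml -- illustrative example for GKE
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: sourcegraph
provisioner: pd.csi.storage.gke.io   # assumption: GKE CSI driver; use your platform's provisioner
parameters:
  type: pd-ssd                       # SSD-backed disks for better performance
reclaimPolicy: Retain

Apply it with kubectl apply -f sourcegraph.StorageClass.yaml.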
NOTE: Google Cloud Platform users may need to request an increase in storage quota.

Error: error retrieving RESTMappings to prune: invalid resource networking.k8s.io/v1, Kind=Ingress, Namespaced=true: no matches for kind "Ingress" in version "networking.k8s.io/v1"
Run kubectl version to verify the Client Version matches the Server Version.
Run kubectl get ingresses -A to check if there is more than one ingress for sourcegraph-frontend. You can delete the duplicate with kubectl delete ingress sourcegraph-frontend --namespace $YOUR_NAMESPACE.
NOTE: See our "configuration guide" for more information on network access.

Error: error when creating "base/cadvisor/cadvisor.ClusterRoleBinding.yaml": subjects[0].namespace: Required value
This error occurs when using legacy deployment manifests. For modern deployments using deploy-sourcegraph-k8s, cadvisor is configured as a DaemonSet with a ServiceAccount and doesn't require a ClusterRoleBinding. If you encounter this error, ensure you're using the latest deploy-sourcegraph-k8s repository.
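If you are intentionally staying on the legacy manifests, the message means the ServiceAccount subject in the ClusterRoleBinding is missing its namespace field. A sketch of the corrected subject (names are illustrative):

subjects:
  - kind: ServiceAccount
    name: cadvisor
    namespace: $YOUR_NAMESPACE # leaving this field empty triggers the error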
Multiple pods are stuck in Pending

Lack of resources could be a contributing factor. Dump the current cluster state and look for error messages. Below is an example of a message that indicates the cluster is currently under-provisioned.
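One way to capture the cluster state into a searchable file is kubectl cluster-info dump (the --namespaces flag limits the dump to the given namespaces):

$ kubectl cluster-info dump --namespaces $YOUR_NAMESPACE > dump.txt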
# dump.txt
"Reason": "FailedScheduling",
"Message": "0/3 nodes are available: 1 Insufficient memory, 3 Insufficient cpu.",
NOTE: The default node type for clusters on Google Cloud Platform is n1-standard-1, a machine with only one CPU, while some Sourcegraph components require a 2-CPU node. We recommend setting the machine type to n1-standard-16.

ImagePullBackOff / 429 Too Many Requests errors
This indicates the instance is being rate-limited by Docker Hub, where our images are stored, as unauthenticated users are limited to 100 image pulls within a 6-hour period. Possible solutions include:
Create an ImagePullSecrets Kubernetes object containing your Docker Hub credentials, and attach it to the service account used by the Sourcegraph pods (see the sketch after this list).
Alternatively, you can wait until the rate limits are reset.
[OPTIONAL] You can also upgrade your account to a Docker Pro or Team subscription with higher rate limits (see Docker Hub for more information).
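A minimal sketch of the ImagePullSecrets approach, assuming the Sourcegraph pods run under the default service account (the secret name is illustrative):

$ kubectl create secret docker-registry docker-hub-credentials \
    --docker-username=$DOCKER_USERNAME \
    --docker-password=$DOCKER_PASSWORD \
    --namespace $YOUR_NAMESPACE
$ kubectl patch serviceaccount default \
    --namespace $YOUR_NAMESPACE \
    -p '{"imagePullSecrets": [{"name": "docker-hub-credentials"}]}'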
Irrelevant cAdvisor metrics are causing strange alerts and performance issues

This is most likely due to cAdvisor picking up metrics from other deployments in the cluster. A workaround is available: Filtering cAdvisor metrics.
I don't see any metrics on my Grafana dashboard

Missing metrics indicate Sourcegraph is having issues connecting to the Kubernetes API. For instance, running a Sourcegraph instance as non-privileged prevents services from picking up metrics through the Kubernetes API. One potential solution is to grant Prometheus and cAdvisor root access.
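As a sketch, granting root access usually means loosening the container securityContext; the exact fields depend on your deployment, so treat these values as illustrative:

securityContext:
  runAsUser: 0     # run as root so host cgroup data is readable
  privileged: true # full access to node-level metrics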
Which metrics are using the most resources?

$ kubectl port-forward svc/prometheus 9090:30090
$ open http://localhost:9090

Run the following query to check the values:

topk(10, count by (__name__)({__name__=~".+"}))

Make sure the namespace of the ingress-controller is ingress-nginx. See the Troubleshooting ingress-nginx docs for more information.
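A quick check, assuming the standard ingress-nginx installation:

$ kubectl get pods --namespace ingress-nginx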
{$portName}: invalid syntax error

This can occur when the Readiness or Liveness probe refers to a port that is not defined. Please ensure the port name is consistent with upstream. For example:
ports:
  - containerPort: 3188
    name: minio
...
livenessProbe:
  httpGet:
    path: /minio/health/live
    port: minio # this port name MUST exist in the same spec
Service mesh
Known issues when using a service mesh (e.g. Istio or Linkerd)
Error message: Git command [git rev-parse HEAD] failed (stderr: ""): strconv.Atoi: parsing "": invalid syntax
This error occurs because Envoy, the proxy used by Istio, drops proxied trailers for the requests made over HTTP/1.1 protocol by default. To resolve this issue, enable trailers in your instance following the examples provided for Kubernetes and Kubernetes with Helm.
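For Istio, trailers are typically enabled with an EnvoyFilter. The manifest below is a hedged sketch rather than the exact example from our repositories: it patches Envoy's http_connection_manager to set enable_trailers for HTTP/1.1, and the name and namespace are placeholders:

apiVersion: networking.istio.io/v1alpha3
kind: EnvoyFilter
metadata:
  name: enable-http1-trailers # illustrative name
  namespace: $YOUR_NAMESPACE
spec:
  configPatches:
    - applyTo: NETWORK_FILTER
      match:
        listener:
          filterChain:
            filter:
              name: envoy.filters.network.http_connection_manager
      patch:
        operation: MERGE
        value:
          typed_config:
            "@type": type.googleapis.com/envoy.extensions.filters.network.http_connection_manager.v3.HttpConnectionManager
            http_protocol_options:
              enable_trailers: true # keep trailers on proxied HTTP/1.1 requests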
Symbols sidebar and hovers are not working

In a service mesh like Istio, communication between services is secured using a feature called mutual Transport Layer Security (mTLS). mTLS relies on services communicating with each other using DNS names, rather than IP addresses, to identify the specific services or pods the communication is intended for.
To illustrate this, consider the following examples of communication flows between the "frontend" component and the "searcher" component:
Example 1: Approved Communication Flow

http://searcher:3184

Example 2: Disapproved Communication Flow

http://searcher_pod_ip:3184

In Example 2, the request targets the pod's IP address (searcher_pod_ip) directly instead of the searcher service's DNS name, so the mesh rejects it.

NOTE: When using mTLS, communication between services must be made using the DNS names of the services, rather than their IP addresses. This ensures that the service mesh can properly identify and secure the communication.
To resolve this issue, redeploy the frontend after specifying the service address for searcher by setting the SEARCHER_URL environment variable in frontend. Please make sure the old frontend pods are removed.

SEARCHER_URL=http://searcher:3184
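One way to apply this, assuming the frontend deployment is named sourcegraph-frontend (kubectl set env triggers a rollout, which replaces the old pods):

$ kubectl set env deployment/sourcegraph-frontend \
    SEARCHER_URL=http://searcher:3184 \
    --namespace $YOUR_NAMESPACE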
WARNING: This option is recommended only for searcher with a single replica. Enabling this option will negatively impact the performance of the searcher service when it has multiple replicas, as it will no longer be able to distribute requests by repository/commit.

Squirrel.LocalCodeIntel http status 502
The issue described is related to the Code Intel hover feature, where it may get stuck in a loading state or return a 502 error with the message Squirrel.LocalCodeIntel http status 502. This is caused by the same issue described in Symbols sidebar and hovers are not working. See that section for the solution.
Still need additional help? Please contact us.