Zscaler MCP on AWS: Review, Bugs, and Fixes
Hey guys! Let's dive deep into the deployment of Zscaler's MCP (Model Context Protocol) AgentCore on AWS, focusing on the nitty-gritty details, the problems we ran into, and, most importantly, the solutions. We'll be looking at the official Zscaler MCP AgentCore Docker image and exploring some critical issues that need addressing: bugs in the code, security vulnerabilities, and the fixes that get the most out of your Zscaler MCP setup on AWS.
Unveiling the Issues: Critical Bugs in Zscaler MCP AgentCore
First off, we've spotted some critical bugs in the official Zscaler MCP AgentCore Docker image (specifically `zscaler/zscaler-mcp-server:0.4.0-bedrock`). These aren't just minor annoyances; they're fundamental flaws that prevent the server from working as it should, especially when dealing with standard MCP clients. On top of that, the image doesn't quite align with the security best practices that AWS recommends. Let's break down the major problems and what we can do about them. For reference, the image we're talking about is: `709825985650.dkr.ecr.us-east-1.amazonaws.com/zscaler/zscaler-mcp-server:0.4.0-bedrock`.
The tools/list Bug: A Deep Dive
The `handle_tools_list()` function is where the trouble begins, guys. We found a few critical bugs here that completely mess up the MCP protocol, making it impossible for standard MCP clients to discover the available tools. Let's see what's going wrong.
The Buggy Implementation
Here's a simplified look at the buggy code:
```python
async def handle_tools_list() -> Dict[str, Any]:
    tools = mcp_server.server.list_tools()  # ❌ Missing await
    return {
        "status": "success",
        "tool": "tools/list",
        "result": [json.dumps(tools, indent=2)]  # ❌ Double serialization
    }
```
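If you want to see the first bug in isolation, here's a tiny standalone sketch (with a hypothetical `FakeServer` stand-in, not the real Zscaler server class) showing what happens when an async method isn't awaited:

```python
# Standalone demo of the missing-await bug. FakeServer is a stand-in,
# not the real Zscaler MCP server class.
import asyncio

class FakeServer:
    async def list_tools(self):
        return [{"name": "zpa_list_app_segments"}]

server = FakeServer()

# Without await you get a coroutine object (plus a RuntimeWarning at exit),
# which is exactly what the buggy handler tries to serialize.
print(server.list_tools())               # <coroutine object FakeServer.list_tools ...>

# Awaited properly, you get the actual tool list.
print(asyncio.run(server.list_tools()))  # [{'name': 'zpa_list_app_segments'}]
```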
The Problems, Guys!
- Missing `await` keyword: The async call isn't awaited, which means the function returns a coroutine object instead of the actual tools. Yikes!
- Double JSON serialization: The tools get serialized into a JSON string, and then that string is wrapped in an array. Why, though?
- Incorrect response format: The function returns `{"status": "success", "result": [...]}` instead of the MCP-compliant `{"tools": [...]}`. This is a big no-no.
- Object serialization failure: It tries to serialize Python `Tool` objects without converting them into dictionaries. This is a recipe for disaster.
What the Output Looks Like (Broken)
Here's what you get, which is not what we want:
```json
{
  "status": "success",
  "tool": "tools/list",
  "result": [
    "[{\"name\": \"zpa_list_app_segments\", ...}]"  // ❌ String, not object
  ]
}
```
What the Output Should Look Like (MCP Protocol)
This is what we're aiming for. It's clean and follows the MCP spec:
```json
{
  "tools": [
    {
      "name": "zpa_list_app_segments",
      "description": "List all application segments in ZPA",
      "inputSchema": {
        "type": "object",
        "properties": {...}
      }
    }
  ]
}
```
The Impact
- ❌ Breaks all standard MCP clients (Claude Desktop, QuickSuite, you name it).
- ❌ Violates the MCP protocol specification. Come on, guys!
- ❌ Tools are undiscoverable and unusable.
- ⚠️ Might work with Genesis (which wraps everything), masking the bug. Sneaky.
Proposed Fix
Here's the suggested fix:
```python
async def handle_tools_list() -> Dict[str, Any]:
    # Get the list of tools from the MCP server
    tools = await mcp_server.server.list_tools()  # ✅ Added await

    # Convert Tool objects to dictionaries for JSON serialization
    tools_list = []
    for tool in tools:
        tool_dict = {
            "name": tool.name,
            "description": tool.description,
        }
        # MCP spec uses inputSchema (camelCase)
        if hasattr(tool, 'inputSchema'):
            tool_dict["inputSchema"] = tool.inputSchema
        tools_list.append(tool_dict)

    # Return MCP protocol format: {"tools": [...]}
    return {"tools": tools_list}  # ✅ Correct format
```
The Fix in a Nutshell
Essentially, we add the `await` keyword, correctly format the response, and make sure the `Tool` objects are converted to dictionaries. This means our standard MCP clients can discover and use the tools. It's all about getting the format right, so the clients can do their job.
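To double-check that the fix behaves, a quick smoke test like the one below helps. This is just a sketch: the `web_server` import path is an assumption based on the extracted filename, so adjust it to wherever the patched handler actually lives.

```python
# Smoke-test sketch for the patched handler. The module path `web_server`
# is an assumption based on the extracted file web_server.py.
import asyncio
import json

from web_server import handle_tools_list  # hypothetical import path

async def main():
    response = await handle_tools_list()
    # MCP clients expect a top-level "tools" array of plain dict objects
    assert "tools" in response
    assert all(isinstance(t, dict) and "name" in t for t in response["tools"])
    print(json.dumps(response["tools"][:3], indent=2))

asyncio.run(main())
```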
Security Alert: AWS Secrets Manager Support Needed
This is crucial, guys. The current setup requires you to pass Zscaler API credentials as plain-text environment variables, which is a big no-no when it comes to AWS security best practices. We need to fix this ASAP.
The Problem: Plain-Text Credentials
Here's what the current implementation looks like:
```dockerfile
# Credentials must be passed as plain-text environment variables
ENV ZSCALER_CLIENT_ID=iq7u4xxxxxk6
ENV ZSCALER_CLIENT_SECRET=supersecretvalue123   # ❌ Plain text!
ENV ZSCALER_CUSTOMER_ID=2xxxxxxxxxxxx8
```
The Risks, Explained
| Risk | Impact |
|---|---|
| ECS Task Definition Exposure | Anyone with `ecs:DescribeTaskDefinition` can read secrets |
| CloudFormation Exposure | Secrets visible in stack parameters and outputs |
| Container Inspection | `docker inspect` reveals all environment variables |
| No Encryption at Rest | Credentials stored in plain text in AWS APIs |
| No Audit Trail | No CloudTrail logs for credential access |
| No Rotation Support | Requires redeployment to update credentials |
| Compliance Failures | Fails SOC2, PCI-DSS, HIPAA, ISO 27001 audits |
An Example of the Exposure
Anyone with ECS read permissions can easily extract your secrets:
```bash
# Anyone with ECS read permissions can extract secrets
aws ecs describe-task-definition --task-definition zscaler-mcp

# Output exposes credentials in plain text:
{
  "environment": [
    {"name": "ZSCALER_CLIENT_SECRET", "value": "supersecretvalue123"}
  ]
}
```
The Solution: AWS Secrets Manager Integration
Hereβs how we should do it:
```python
import json
import logging
import os

import boto3
from botocore.exceptions import ClientError

logger = logging.getLogger(__name__)

# Fetch credentials from Secrets Manager if configured
secret_arn = os.environ.get('ZSCALER_SECRET_ARN')
if secret_arn:
    try:
        # The region is the 4th field of the ARN:
        # arn:aws:secretsmanager:<region>:<account>:secret:<name>
        region = secret_arn.split(':')[3]
        client = boto3.client('secretsmanager', region_name=region)
        response = client.get_secret_value(SecretId=secret_arn)
        secret = json.loads(response['SecretString'])

        # Set all secret keys as environment variables
        for key, value in secret.items():
            os.environ[key] = str(value)
        logger.info("Loaded credentials from Secrets Manager")
    except ClientError as e:
        logger.error(f"Failed to fetch credentials: {e}")
        raise
```
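On the setup side, here's a rough boto3 sketch for creating the secret that `ZSCALER_SECRET_ARN` would point at. The secret name and all credential values are placeholders, not real ones:

```python
# One-time setup sketch: store the Zscaler credentials as a JSON secret.
# The secret name and every value below are placeholders.
import json

import boto3

client = boto3.client('secretsmanager', region_name='us-east-1')
response = client.create_secret(
    Name='zscaler-mcp-credentials',  # hypothetical secret name
    SecretString=json.dumps({
        "ZSCALER_CLIENT_ID": "iq7u4xxxxxk6",
        "ZSCALER_CLIENT_SECRET": "supersecretvalue123",
        "ZSCALER_CUSTOMER_ID": "2xxxxxxxxxxxx8",
    }),
)

# Pass this ARN to the container as ZSCALER_SECRET_ARN
print(response['ARN'])
```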
The Benefits
- ✅ Credentials are encrypted at rest with AWS KMS.
- ✅ IAM-based access control.
- ✅ CloudTrail audit logging.
- ✅ Automatic rotation support.
- ✅ Compliance with SOC2, PCI-DSS, HIPAA.
- ✅ Zero plain-text credential exposure.
Missing Features: Protocol Negotiation and Client Support
Let's move on to some other missing pieces that are preventing Zscaler MCP from working smoothly. These include things like not handling the MCP `initialize` and `ping` methods, and a lack of support for standard MCP clients.
MCP Protocol Negotiation
The Missing Implementation
We need to handle the `initialize` and `ping` methods. Here's what's missing:

```python
# No handling for these required MCP methods:
# - initialize (protocol version negotiation)
# - ping (health check)
```
The Impact
- ❌ Cannot negotiate protocol versions with clients.
- ❌ No support for MCP 2024-11-05 or 2025-03-26 protocols.
- ❌ Breaks the handshake with standard MCP clients.
- ❌ No health check mechanism.
The Solution
```python
if method == "ping":
    logger.info("Handling MCP ping request")
    result = {}  # MCP spec: ping returns an empty object
elif method == "initialize":
    logger.info("Handling MCP initialize request")
    # Support both 2024-11-05 and 2025-03-26 protocol versions
    client_protocol = payload.get("params", {}).get("protocolVersion", "2024-11-05")
    logger.info(f"Client requested protocol version: {client_protocol}")
    result = {
        "protocolVersion": client_protocol,  # Echo back the client's version
        "capabilities": {"tools": {}},
        "serverInfo": {"name": "zscaler-mcp", "version": "1.0.0"}
    }
```
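To exercise the handshake end to end, a throwaway client like this works. It's a sketch: the URL, port, and endpoint path are assumptions about your local deployment, so adjust them to match yours.

```python
# Handshake-test sketch. The URL and endpoint path are assumptions
# about where the container is exposed locally.
import requests

init_request = {
    "jsonrpc": "2.0",
    "id": 1,
    "method": "initialize",
    "params": {
        "protocolVersion": "2025-03-26",
        "capabilities": {},
        "clientInfo": {"name": "test-client", "version": "0.1.0"},
    },
}

resp = requests.post("http://localhost:8080/", json=init_request, timeout=10)
# Expect the server to echo the protocol version and identify itself
print(resp.json())
```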
Standard MCP Client Support
The Limitation
The image currently only supports the AWS Genesis NDJSON format, not standard MCP clients like Claude Desktop or QuickSuite, which use JSON-RPC or SSE (Server-Sent Events).
```python
# Only returns Genesis NDJSON format
return StreamingResponse(
    generate_streaming_response(response_data, session_id),
    media_type="application/x-ndjson",  # Genesis only
)
```
The Impact
- ❌ Cannot be used with Claude Desktop.
- ❌ Cannot be used with QuickSuite.
- ❌ Cannot be used with standard MCP testing tools.
- ❌ Limited to AWS Genesis runtime only.
The Solution
Add content negotiation based on request format:
```python
# Check if this is a standard MCP client or Genesis
is_jsonrpc = payload.get("jsonrpc") == "2.0"
accept_header = request.headers.get("accept", "")
prefers_sse = "text/event-stream" in accept_header

if is_jsonrpc:
    # Standard JSON-RPC response for MCP clients
    response_content = {
        "jsonrpc": "2.0",
        "id": payload.get("id"),
        "result": result
    }
    if prefers_sse:
        # SSE format for streaming clients
        async def sse_generator():
            yield f"data: {json.dumps(response_content)}\n\n"
        return StreamingResponse(
            sse_generator(),
            media_type="text/event-stream",
        )
    else:
        # Standard JSON response
        return JSONResponse(content=response_content)
else:
    # Genesis streaming NDJSON response
    return StreamingResponse(
        generate_streaming_response(response_data, session_id),
        media_type="application/x-ndjson",
    )
```
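And here's a sketch of how an SSE-preferring client would hit that same endpoint; again, the URL and path are assumptions about your deployment.

```python
# SSE-consumption sketch: ask for text/event-stream and read the frames.
# The URL and endpoint path are assumptions.
import requests

headers = {"Accept": "text/event-stream"}
body = {"jsonrpc": "2.0", "id": 2, "method": "tools/list", "params": {}}

with requests.post("http://localhost:8080/", json=body,
                   headers=headers, stream=True, timeout=10) as resp:
    for line in resp.iter_lines(decode_unicode=True):
        if line.startswith("data: "):
            print(line[len("data: "):])  # the JSON-RPC response payload
```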
Service Filtering: Keeping Things Lean
The current setup loads all Zscaler services (ZPA, ZIA, ZDX, ZCC, ZIdentity) without the ability to filter. This often leads to exceeding MCP client tool limits.
The Problem with Too Many Tools
The Zscaler MCP server exposes a ton of tools:
- ZPA: ~30 tools
- ZIA: ~40 tools
- ZDX: ~15 tools
- ZCC: ~10 tools
- ZIdentity: ~10 tools
Many MCP clients have hard limits on the number of tools they can handle:
| MCP Client | Tool Limit | Result with All Services |
|---|---|---|
| Claude Desktop | ~50 tools | ❌ Fails to load or truncates |
| Some Genesis Agents | ~100 tools | ⚠️ Performance degradation |
| QuickSuite | ~200 tools | ✅ Works but slow |
| Custom Clients | Varies | ❌ May fail silently |
The Real-World Impact
When testing with Claude Desktop:
```bash
# Without filtering (100+ tools)
❌ Error: "Too many tools provided. Maximum 50 tools supported."

# With filtering to only ZPA (30 tools)
✅ Success: All tools loaded and functional
```
The Impacts of Loading Everything
- 🚫 Client Compatibility: Exceeds tool limits in Claude Desktop and other clients.
- 💰 Higher AWS costs: Bedrock charges per tool invocation.
- ⏱️ Slower startup: Initializes all services even if they are unused.
- 🔧 No flexibility: Cannot disable unused services.
- 🐛 Harder debugging: More tools to troubleshoot.
- ⚡ Performance degradation: Large tool lists slow down the client UX.
The Solution
```python
# Read ZSCALER_MCP_SERVICES environment variable to filter services
services_env = os.environ.get('ZSCALER_MCP_SERVICES', '')
if services_env:
    enabled_services = set(s.strip() for s in services_env.split(',') if s.strip())
    logger.info(f"Filtering to services: {enabled_services}")
    mcp_server = ZscalerMCPServer(enabled_services=enabled_services)
else:
    logger.info("Loading all services")
    mcp_server = ZscalerMCPServer()
```
How to Use It
```bash
# Only enable ZPA and ZIA
ZSCALER_MCP_SERVICES="zpa,zia"
```
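A quick way to confirm the filter took effect is to count tools per service prefix in the (now-fixed) `tools/list` response. This sketch assumes the JSON-RPC support described earlier plus a local endpoint, both of which are assumptions about your setup:

```python
# Verification sketch: count tools per service prefix after filtering.
# Assumes the JSON-RPC path from earlier; the URL is an assumption.
import collections

import requests

body = {"jsonrpc": "2.0", "id": 3, "method": "tools/list", "params": {}}
result = requests.post("http://localhost:8080/", json=body, timeout=10).json()["result"]

counts = collections.Counter(t["name"].split("_", 1)[0] for t in result["tools"])
print(counts)  # e.g. Counter({'zia': 40, 'zpa': 30}) with ZSCALER_MCP_SERVICES="zpa,zia"
```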
Logging Configuration
Right now, the official image uses a fixed INFO logging level, which isn't much help for detailed debugging or production troubleshooting. Let's fix this.
The Problem: Fixed Logging Level
```python
logging.basicConfig(
    level=logging.INFO,  # ❌ Fixed, cannot change
    format='%(asctime)s - %(name)s - %(levelname)s - %(message)s'
)
```
The Impact
- 🐛 Harder debugging: Can't enable DEBUG logs.
- 🔍 No traffic inspection: Can't log HTTP headers/bodies.
- 📉 Limited troubleshooting: Missing critical diagnostic information.
The Solution
```python
# Configure logging with environment variable
log_level = os.environ.get('LOG_LEVEL', 'INFO').upper()
logging.basicConfig(
    level=getattr(logging, log_level, logging.INFO),
    format='%(asctime)s - %(name)s - %(levelname)s - %(message)s'
)
logger.info(f"Logging level set to: {log_level}")

# Optional HTTP traffic logging middleware
@app.middleware("http")
async def log_request_response(request: Request, call_next):
    if os.environ.get('LOG_HEADERS', 'false').lower() == 'true':
        logger.info(f"Request: {request.method} {request.url.path}")
        logger.info(f"Headers: {dict(request.headers)}")
    response = await call_next(request)
    return response
```
Usage
```bash
# Enable debug logging
LOG_LEVEL=DEBUG

# Enable HTTP traffic logging
LOG_HEADERS=true
```
Summary of Issues and Recommended Actions
Letβs put it all together in a quick summary table:
| Issue | Severity | Impact | Status |
|---|---|---|---|
| `tools/list` bug | 🔴 Critical | Breaks MCP clients | Not fixed |
| No Secrets Manager | 🔴 Critical | Security vulnerability | Not implemented |
| No protocol negotiation | 🟡 High | Breaks handshake | Not implemented |
| Genesis-only support | 🟡 High | Limited compatibility | Not implemented |
| No service filtering | 🟡 Medium | Higher costs | Not implemented |
| Fixed logging | 🟡 Medium | Harder debugging | Not implemented |
Recommended Actions
- Immediate (Critical):
  - Fix the `tools/list` async/await and response format bug.
  - Add AWS Secrets Manager support for credential management.
- High Priority:
  - Implement MCP `initialize` and `ping` methods.
  - Add JSON-RPC and SSE support for standard MCP clients.
- Medium Priority:
  - Add service filtering via environment variable.
  - Implement configurable logging levels.
Testing and Contributing
We've validated these issues by:
- Extracting the official Docker image filesystem.
- Comparing with a working production implementation.
- Testing with multiple MCP clients.
- Reviewing MCP protocol specification compliance.
Test Environment:
- Image: `zscaler/zscaler-mcp-server:0.4.0-bedrock`
- Platform: `linux/arm64`
- Extracted: `/tmp/zscaler-official/app/web_server.py`
Weβre ready to help. We have working implementations of all these fixes and are happy to contribute them back to the project. Just let us know the best way to do it!
Recommendation: Open-Source the AgentCore Build
Hereβs a suggestion that we think would make a big difference in terms of usability, security, and overall adoption.
The Current Situation
The AgentCore/Bedrock-specific build is only available as a pre-built Docker image in AWS Marketplace ECR.
- Image: `709825985650.dkr.ecr.us-east-1.amazonaws.com/zscaler/zscaler-mcp-server:0.4.0-bedrock`
- Source code: Not available in the public repository
- Build process: Undocumented
The Inconsistency
It's odd because the rest of the Zscaler MCP project is fully open source.
| Component | Status | Repository |
|---|---|---|
| Core MCP Server | ✅ Open Source | zscaler/zscaler-sdk-python-mcp |
| All Tool Implementations | ✅ Open Source | Public GitHub |
| ZPA Tools | ✅ Open Source | Public GitHub |
| ZIA Tools | ✅ Open Source | Public GitHub |
| ZDX Tools | ✅ Open Source | Public GitHub |
| ZCC Tools | ✅ Open Source | Public GitHub |
| ZIdentity Tools | ✅ Open Source | Public GitHub |
| AgentCore Wrapper | ❌ Closed | Only pre-built image |
The Problem
Why hide only the AgentCore wrapper? It's just an HTTP adapter (~300 lines) that translates Genesis NDJSON to MCP protocol calls. It has no proprietary logic, algorithms, or competitive advantages.
Why This Is Problematic
- Security Concerns: Users canβt audit the build process, verify whatβs in the container, or validate security practices.
- Lack of Transparency: The build process is hidden, and there's no visibility into dependencies or configurations.
- Easy to Reverse Engineer: Container images are easy to extract, making the obscurity pointless.
- Hinders Adoption: Enterprise customers often need source code review, custom builds, and vulnerability scanning.
- Prevents Bug Fixes: Users can't submit fixes or validate proposed solutions.
Recommended Approach
Make the AgentCore/Genesis wrapper code publicly available in the repository.
```
zscaler-mcp/
├── src/
│   └── zscaler_mcp/
│       ├── server.py                 # Core MCP server (already public)
│       ├── tools/                    # Tool implementations (already public)
│       └── web_server.py             # Genesis wrapper (currently hidden)
├── docker/
│   ├── Dockerfile                    # Build instructions (currently hidden)
│   └── requirements.txt              # Dependencies (currently hidden)
└── docs/
    └── agentcore-deployment.md       # Deployment guide (currently missing)
```
Benefits of Making It Public
- ✅ Increased Trust
- ✅ Better Security
- ✅ Faster Bug Fixes
- ✅ Improved Quality
- ✅ Easier Adoption
- ✅ Community Growth
- ✅ Better Documentation
- ✅ Reduced Support Burden
Critical for Enterprise Adoption
Enterprise security requirements often include source code review, custom container builds, vulnerability scanning, and supply chain security.
Real-World Enterprise Blockers
Without source code and a Dockerfile, enterprises can't build from source, scan dependencies, generate SBOMs, or apply internal security policies.
Enterprise Approval Process
The current closed-source approach blocks enterprises from adopting the solution, regardless of its technical merit.
Conclusion
We strongly urge Zscaler to make the AgentCore build publicly available. The current model creates unnecessary friction, reduces trust, and hinders adoption. Making the code public would align with industry best practices and accelerate adoption of the Zscaler MCP server in AWS environments.
Precedent
Most successful MCP server implementations are fully open source, including Anthropic's, AWS's, and the community's.